This is the start of the stable review cycle for the 6.8.7 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.8.7-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.8.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.8.7-rc1
Fudongwang fudong.wang@amd.com drm/amd/display: fix disable otg wa logic in DCN316
Wenjing Liu wenjing.liu@amd.com drm/amd/display: always reset ODM mode in context when adding first plane
Alex Hung alex.hung@amd.com drm/amd/display: Return max resolution supported by DWB
Dillon Varone dillon.varone@amd.com drm/amd/display: Do not recursively call manual trigger programming
Harry Wentland harry.wentland@amd.com drm/amd/display: Set VSC SDP Colorimetry same way for MST and SST
Harry Wentland harry.wentland@amd.com drm/amd/display: Program VSC SDP colorimetry for all DP sinks >= 1.4
Yifan Zhang yifan1.zhang@amd.com drm/amdgpu: differentiate external rev id for gfx 11.5.0
Tim Huang Tim.Huang@amd.com drm/amdgpu: fix incorrect number of active RBs for gfx11
Alex Deucher alexander.deucher@amd.com drm/amdgpu: always force full reset for SOC21
Lijo Lazar lijo.lazar@amd.com drm/amdgpu: Reset dGPU if suspend got aborted
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915: Disable live M/N updates when using bigjoiner
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915: Disable port sync when bigjoiner is used
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915/psr: Disable PSR when bigjoiner is used
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915/cdclk: Fix CDCLK programming order when pipes are active
Josh Poimboeuf jpoimboe@kernel.org x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI
Josh Poimboeuf jpoimboe@kernel.org x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto
Josh Poimboeuf jpoimboe@kernel.org x86/bugs: Clarify that syscall hardening isn't a BHI mitigation
Josh Poimboeuf jpoimboe@kernel.org x86/bugs: Fix BHI handling of RRSBA
Ingo Molnar mingo@kernel.org x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr'
Josh Poimboeuf jpoimboe@kernel.org x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES
Josh Poimboeuf jpoimboe@kernel.org x86/bugs: Fix BHI documentation
Daniel Sneddon daniel.sneddon@linux.intel.com x86/bugs: Fix return type of spectre_bhi_state()
Amir Goldstein amir73il@gmail.com kernfs: annotate different lockdep class for of->mutex of writable files
Oleg Nesterov oleg@redhat.com selftests: kselftest: Fix build failure with NOLIBC
Arnd Bergmann arnd@arndb.de irqflags: Explicitly ignore lockdep_hrtimer_exit() argument
Adam Dunlap acdunlap@google.com x86/apic: Force native_apic_mem_read() to use the MOV instruction
Nathan Chancellor nathan@kernel.org selftests: kselftest: Mark functions that unconditionally call exit() as __noreturn
John Stultz jstultz@google.com selftests: timers: Fix abs() warning in posix_timers test
John Stultz jstultz@google.com selftests: timers: Fix posix_timers ksft_print_msg() warning
Oleg Nesterov oleg@redhat.com selftests/timers/posix_timers: Reimplement check_timer_distribution()
Sean Christopherson seanjc@google.com x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n
Namhyung Kim namhyung@kernel.org perf/x86: Fix out of range data
Gavin Shan gshan@redhat.com vhost: Add smp_rmb() in vhost_enable_notify()
Gavin Shan gshan@redhat.com vhost: Add smp_rmb() in vhost_vq_avail_empty()
Frank Li Frank.Li@nxp.com arm64: dts: imx8-ss-dma: fix spi lpcg indices
Frank Li Frank.Li@nxp.com arm64: dts: imx8-ss-lsio: fix pwm lpcg indices
Frank Li Frank.Li@nxp.com arm64: dts: imx8-ss-dma: fix pwm lpcg indices
Frank Li Frank.Li@nxp.com arm64: dts: imx8-ss-conn: fix usb lpcg indices
Frank Li Frank.Li@nxp.com arm64: dts: imx8-ss-dma: fix adc lpcg indices
Frank Li Frank.Li@nxp.com arm64: dts: imx8-ss-dma: fix can lpcg indices
Frank Li Frank.Li@nxp.com arm64: dts: imx8qm-ss-dma: fix can lpcg indices
Lang Yu Lang.Yu@amd.com drm/amdgpu/umsch: reinitialize write pointer in hw init
Johan Hovold johan+linaro@kernel.org drm/msm/dp: fix runtime PM leak on connect failure
Johan Hovold johan+linaro@kernel.org drm/msm/dp: fix runtime PM leak on disconnect
Ville Syrjälä ville.syrjala@linux.intel.com drm/client: Fully protect modes[] with dev->mode_config.mutex
Boris Brezillon boris.brezillon@collabora.com drm/panfrost: Fix the error path in panfrost_mmu_map_fault_addr()
Jammy Huang jammy_huang@aspeedtech.com drm/ast: Fix soft lockup
Harish Kasiviswanathan Harish.Kasiviswanathan@amd.com drm/amdkfd: Reset GPU on queue preemption failure
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915/vrr: Disable VRR when using bigjoiner
Zack Rusin zack.rusin@broadcom.com drm/vmwgfx: Enable DMA mappings with SEV
Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com accel/ivpu: Fix deadlock in context_xa
Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com accel/ivpu: Return max freq for DRM_IVPU_PARAM_CORE_CLOCK_RATE
Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com accel/ivpu: Put NPU back to D3hot after failed resume
Wachowski, Karol karol.wachowski@intel.com accel/ivpu: Fix PCI D0 state entry in resume
Wachowski, Karol karol.wachowski@intel.com accel/ivpu: Check return code of ipc->lock init
Alexander Wetzel Alexander@wetzel-home.de scsi: sg: Avoid race in error handling & drop bogus warn
Alexander Wetzel Alexander@wetzel-home.de scsi: sg: Avoid sg device teardown race
Masami Hiramatsu mhiramat@kernel.org fs/proc: Skip bootloader comment if no embedded kernel parameters
Zhenhua Huang quic_zhenhuah@quicinc.com fs/proc: remove redundant comments from /proc/bootconfig
Zheng Yejian zhengyejian1@huawei.com kprobes: Fix possible use-after-free issue on kprobe registration
Pavel Begunkov asml.silence@gmail.com io_uring/net: restore msg_control on sendzc retry
Boris Burkov boris@bur.io btrfs: qgroup: convert PREALLOC to PERTRANS after record_root_in_trans
Boris Burkov boris@bur.io btrfs: record delayed inode root in transaction
Boris Burkov boris@bur.io btrfs: qgroup: fix qgroup prealloc rsv leak in subvolume operations
Boris Burkov boris@bur.io btrfs: qgroup: correctly model root qgroup rsv in convert
Jens Axboe axboe@kernel.dk io_uring: disable io-wq execution of multishot NOWAIT requests
Pavel Begunkov asml.silence@gmail.com io_uring: refactor DEFER_TASKRUN multishot checks
Lu Baolu baolu.lu@linux.intel.com iommu/vt-d: Fix WARN_ON in iommu probe path
Jacob Pan jacob.jun.pan@linux.intel.com iommu/vt-d: Allocate local memory for page request queue
Xuchun Shang xuchun.shang@linux.alibaba.com iommu/vt-d: Fix wrong use of pasid config
Arnd Bergmann arnd@arndb.de tracing: hide unused ftrace_event_id_fops
Karthik Poosa karthik.poosa@intel.com drm/xe/hwmon: Cast result to output precision on left shift of operand
Lucas De Marchi lucas.demarchi@intel.com drm/xe/display: Fix double mutex initialization
David Arinzon darinzon@amazon.com net: ena: Set tx_info->xdpf value to NULL
David Arinzon darinzon@amazon.com net: ena: Fix incorrect descriptor free behavior
David Arinzon darinzon@amazon.com net: ena: Wrong missing IO completions check order
David Arinzon darinzon@amazon.com net: ena: Fix potential sign extension issue
Michal Luczaj mhal@rbox.co af_unix: Fix garbage collector racing against connect()
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Do not use atomic ops for unix_sk(sk)->inflight.
Arınç ÜNAL arinc.unal@arinc9.com net: dsa: mt7530: trap link-local frames regardless of ST Port State
Gerd Bayer gbayer@linux.ibm.com Revert "s390/ism: fix receive message buffer allocation"
Daniel Machon daniel.machon@microchip.com net: sparx5: fix wrong config being used when reconfiguring PCS
Rahul Rameshbabu rrameshbabu@nvidia.com net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit
Carolina Jubran cjubran@nvidia.com net/mlx5e: HTB, Fix inconsistencies with QoS SQs number
Carolina Jubran cjubran@nvidia.com net/mlx5e: Fix mlx5e_priv_init() cleanup flow
Carolina Jubran cjubran@nvidia.com net/mlx5e: RSS, Block changing channels number when RXFH is configured
Cosmin Ratiu cratiu@nvidia.com net/mlx5: Correctly compare pkt reformat ids
Cosmin Ratiu cratiu@nvidia.com net/mlx5: Properly link new fs rules into the tree
Michael Liang mliang@purestorage.com net/mlx5: offset comp irq index in name by one
Shay Drory shayd@nvidia.com net/mlx5: Register devlink first under devlink lock
Moshe Shemesh moshe@nvidia.com net/mlx5: SF, Stop waiting for FW as teardown was called
Eric Dumazet edumazet@google.com netfilter: complete validation of user input
Archie Pusaka apusaka@chromium.org Bluetooth: l2cap: Don't double set the HCI_CONN_MGMT_CONNECTED bit
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: hci_sock: Fix not validating setsockopt user input
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: ISO: Fix not validating setsockopt user input
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: L2CAP: Fix not validating setsockopt user input
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: RFCOMM: Fix not validating setsockopt user input
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: SCO: Fix not validating setsockopt user input
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: hci_sync: Fix using the same interval and window for Coded PHY
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: hci_sync: Use QoS to determine which PHY to scan
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: ISO: Don't reject BT_ISO_QOS if parameters are unset
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: ISO: Align broadcast sync_timeout with connection timeout
Brett Creeley brett.creeley@amd.com pds_core: Fix pdsc_check_pci_health function to use work thread
Shannon Nelson shannon.nelson@amd.com pds_core: use pci_reset_function for health reset
Jiri Benc jbenc@redhat.com ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr
Arnd Bergmann arnd@arndb.de ipv4/route: avoid unused-but-set-variable warning
Arnd Bergmann arnd@arndb.de ipv6: fib: hide unused 'pn' variable
Geetha sowjanya gakula@marvell.com octeontx2-af: Fix NIX SQ mode and BP config
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Clear stale u->oob_skb.
Marek Vasut marex@denx.de net: ks8851: Handle softirqs at the end of IRQ thread to fix hang
Marek Vasut marex@denx.de net: ks8851: Inline ks8851_rx_skb()
Dave Jiang dave.jiang@intel.com cxl: Fix retrieving of access_coordinates in PCIe path
Dave Jiang dave.jiang@intel.com cxl: Remove checking of iter in cxl_endpoint_get_perf_coordinates()
Dave Jiang dave.jiang@intel.com cxl: Split out host bridge access coordinates
Dave Jiang dave.jiang@intel.com cxl: Split out combine_coordinates() for common shared usage
Dave Jiang dave.jiang@intel.com ACPI: HMAT / cxl: Add retrieval of generic port coordinates for both access classes
Dave Jiang dave.jiang@intel.com ACPI: HMAT: Introduce 2 levels of generic port access class
Dave Jiang dave.jiang@intel.com base/node / ACPI: Enumerate node access class for 'struct access_coordinate'
Raag Jadav raag.jadav@intel.com ACPI: bus: allow _UID matching for integer zero
Pavan Chebbi pavan.chebbi@broadcom.com bnxt_en: Reset PTP tx_avail after possible firmware reset
Vikas Gupta vikas.gupta@broadcom.com bnxt_en: Fix error recovery for RoCE ulp client
Vikas Gupta vikas.gupta@broadcom.com bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init()
Gerd Bayer gbayer@linux.ibm.com s390/ism: fix receive message buffer allocation
Eric Dumazet edumazet@google.com geneve: fix header validation in geneve[6]_xmit_skb
Arnd Bergmann arnd@arndb.de lib: checksum: hide unused expected_csum_ipv6_magic[]
Ming Lei ming.lei@redhat.com block: fix q->blkg_list corruption during disk rebind
Hariprasad Kelam hkelam@marvell.com octeontx2-pf: Fix transmit scheduler resource leak
Eric Dumazet edumazet@google.com xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING
Petr Tesarik petr@tesarici.cz u64_stats: fix u64_stats_init() for lockdep when used repeatedly in one file
Ilya Maximets i.maximets@ovn.org net: openvswitch: fix unwanted error log on timeout policy probing
Dan Carpenter dan.carpenter@linaro.org scsi: qla2xxx: Fix off by one in qla_edif_app_getstats()
Xiang Chen chenxiang66@hisilicon.com scsi: hisi_sas: Modify the deadline for ata_wait_after_reset()
Luca Weiss luca.weiss@fairphone.com drm/msm/adreno: Set highest_bank_bit for A619
Arnd Bergmann arnd@arndb.de nouveau: fix function cast warning
Alex Constantino dreaming.about.electric.sheep@gmail.com Revert "drm/qxl: simplify qxl_fence_wait"
Kwangjin Ko kwangjin.ko@sk.com cxl/core: Fix initialization of mbox_cmd.size_out in get event
Frank Li Frank.Li@nxp.com arm64: dts: imx8-ss-conn: fix usdhc wrong lpcg clock order
Dmitry Baryshkov dmitry.baryshkov@linaro.org dt-bindings: display/msm: sm8150-mdss: add DP node
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/msm/dpu: make error messages at dpu_core_irq_register_callback() more sensible
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/msm/dpu: don't allow overriding data from catalog
Stephen Boyd swboyd@chromium.org drm/msm: Add newlines to some debug prints
Tim Harvey tharvey@gateworks.com arm64: dts: freescale: imx8mp-venice-gw73xx-2x: fix USB vbus regulator
Tim Harvey tharvey@gateworks.com arm64: dts: freescale: imx8mp-venice-gw72xx-2x: fix USB vbus regulator
Dave Jiang dave.jiang@intel.com cxl/core/regs: Fix usage of map->reg_type in cxl_decode_regblock() before assigned
Yuquan Wang wangyuquan1236@phytium.com.cn cxl/mem: Fix for the index of Clear Event Record Handle
Cristian Marussi cristian.marussi@arm.com firmware: arm_scmi: Make raw debugfs entries non-seekable
Jens Wiklander jens.wiklander@linaro.org firmware: arm_ffa: Fix the partition ID check in ffa_notification_info_get()
Aaro Koskinen aaro.koskinen@iki.fi ARM: OMAP2+: fix USB regression on Nokia N8x0
Aaro Koskinen aaro.koskinen@iki.fi mmc: omap: restore original power up/down steps
Aaro Koskinen aaro.koskinen@iki.fi mmc: omap: fix deferred probe
Aaro Koskinen aaro.koskinen@iki.fi mmc: omap: fix broken slot switch lookup
Aaro Koskinen aaro.koskinen@iki.fi ARM: OMAP2+: fix N810 MMC gpiod table
Aaro Koskinen aaro.koskinen@iki.fi ARM: OMAP2+: fix bogus MMC GPIO labels on Nokia N8x0
David Sterba dsterba@suse.com btrfs: tests: allocate dummy fs_info and root in test_find_delalloc()
Nini Song nini.song@mediatek.com media: cec: core: remove length check of Timer Status
Anna-Maria Behnsen anna-maria@linutronix.de PM: s2idle: Make sure CPUs will wakeup directly on resume
Hans de Goede hdegoede@redhat.com ACPI: scan: Do not increase dep_unmet for already met dependencies
Noah Loomans noah@noahloomans.com platform/chrome: cros_ec_uart: properly fix race condition
Tim Huang Tim.Huang@amd.com drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11
Dmitry Antipov dmantipov@yandex.ru Bluetooth: Fix memory leak in hci_req_sync_complete()
Steven Rostedt (Google) rostedt@goodmis.org ring-buffer: Only update pages_touched when a new page is touched
Yu Kuai yukuai3@huawei.com raid1: fix use-after-free for original bio in raid1_write_request()
Fabio Estevam festevam@denx.de ARM: dts: imx7s-warp: Pass OV2680 link-frequencies
Gavin Shan gshan@redhat.com arm64: tlb: Fix TLBI RANGE operand
Breno Leitao leitao@debian.org virtio_net: Do not send RSS key if it is not supported
Xiubo Li xiubli@redhat.com ceph: switch to use cap_delay_lock for the unlink delay list
NeilBrown neilb@suse.de ceph: redirty page before returning AOP_WRITEPAGE_ACTIVATE
Sven Eckelmann sven@narfation.org batman-adv: Avoid infinite loop trying to resize local TT
Peyton Lee peytolee@amd.com drm/amdgpu/vpe: power on vpe when hw_init
Damien Le Moal dlemoal@kernel.org ata: libata-scsi: Fix ata_scsi_dev_rescan() error path
Igor Pylypiv ipylypiv@google.com ata: libata-core: Allow command duration limits detection for ACS-4 drives
Steve French stfrench@microsoft.com smb3: fix Open files on server counter going negative
-------------
Diffstat:
Documentation/admin-guide/hw-vuln/spectre.rst | 22 +- Documentation/admin-guide/kernel-parameters.txt | 12 +- .../bindings/display/msm/qcom,sm8150-mdss.yaml | 9 + Makefile | 4 +- arch/arm/boot/dts/nxp/imx/imx7s-warp.dts | 1 + arch/arm/mach-omap2/board-n8x0.c | 23 +-- arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi | 16 +- arch/arm64/boot/dts/freescale/imx8-ss-dma.dtsi | 40 ++-- arch/arm64/boot/dts/freescale/imx8-ss-lsio.dtsi | 16 +- .../boot/dts/freescale/imx8mp-venice-gw72xx.dtsi | 2 +- .../boot/dts/freescale/imx8mp-venice-gw73xx.dtsi | 2 +- arch/arm64/boot/dts/freescale/imx8qm-ss-dma.dtsi | 8 +- arch/arm64/include/asm/tlbflush.h | 20 +- arch/x86/Kconfig | 21 +- arch/x86/events/core.c | 1 + arch/x86/include/asm/apic.h | 3 +- arch/x86/kernel/apic/apic.c | 6 +- arch/x86/kernel/cpu/bugs.c | 82 ++++---- arch/x86/kernel/cpu/common.c | 48 ++--- block/blk-cgroup.c | 9 +- block/blk-cgroup.h | 2 + block/blk-core.c | 2 + drivers/accel/ivpu/ivpu_drv.c | 20 +- drivers/accel/ivpu/ivpu_hw.h | 6 + drivers/accel/ivpu/ivpu_hw_37xx.c | 7 +- drivers/accel/ivpu/ivpu_hw_40xx.c | 6 + drivers/accel/ivpu/ivpu_ipc.c | 8 +- drivers/accel/ivpu/ivpu_pm.c | 7 +- drivers/acpi/numa/hmat.c | 43 ++-- drivers/acpi/scan.c | 3 +- drivers/ata/libata-core.c | 2 +- drivers/ata/libata-scsi.c | 9 +- drivers/base/node.c | 6 +- drivers/cxl/acpi.c | 8 +- drivers/cxl/core/cdat.c | 58 ++++-- drivers/cxl/core/mbox.c | 5 +- drivers/cxl/core/port.c | 76 ++++--- drivers/cxl/core/regs.c | 5 +- drivers/cxl/cxl.h | 8 +- drivers/firmware/arm_ffa/driver.c | 2 +- drivers/firmware/arm_scmi/raw_mode.c | 7 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 6 + drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 2 +- drivers/gpu/drm/amd/amdgpu/soc21.c | 32 ++- drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c | 2 + .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 + drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 15 +- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c | 6 +- .../amd/display/dc/clk_mgr/dcn316/dcn316_clk_mgr.c | 19 +- drivers/gpu/drm/amd/display/dc/core/dc_state.c | 9 + .../gpu/drm/amd/display/dc/optc/dcn32/dcn32_optc.c | 3 - .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c | 12 +- drivers/gpu/drm/ast/ast_dp.c | 3 + drivers/gpu/drm/drm_client_modeset.c | 3 +- drivers/gpu/drm/i915/display/intel_cdclk.c | 7 +- drivers/gpu/drm/i915/display/intel_cdclk.h | 3 + drivers/gpu/drm/i915/display/intel_ddi.c | 5 + drivers/gpu/drm/i915/display/intel_dp.c | 6 +- drivers/gpu/drm/i915/display/intel_psr.c | 11 + drivers/gpu/drm/i915/display/intel_vrr.c | 7 + drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 4 + drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c | 10 +- drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c | 8 +- drivers/gpu/drm/msm/dp/dp_display.c | 2 + drivers/gpu/drm/msm/msm_fb.c | 6 +- drivers/gpu/drm/msm/msm_kms.c | 4 +- .../gpu/drm/nouveau/nvkm/subdev/bios/shadowof.c | 7 +- drivers/gpu/drm/panfrost/panfrost_mmu.c | 13 +- drivers/gpu/drm/qxl/qxl_release.c | 50 ++++- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 11 +- drivers/gpu/drm/xe/xe_display.c | 5 - drivers/gpu/drm/xe/xe_hwmon.c | 4 +- drivers/iommu/intel/iommu.c | 11 +- drivers/iommu/intel/perfmon.c | 2 +- drivers/iommu/intel/svm.c | 2 +- drivers/md/raid1.c | 2 +- drivers/media/cec/core/cec-adap.c | 14 -- drivers/mmc/host/omap.c | 48 +++-- drivers/net/dsa/mt7530.c | 229 ++++++++++++++++++--- drivers/net/dsa/mt7530.h | 5 + drivers/net/ethernet/amazon/ena/ena_com.c | 2 +- drivers/net/ethernet/amazon/ena/ena_netdev.c | 35 ++-- drivers/net/ethernet/amazon/ena/ena_xdp.c | 4 +- drivers/net/ethernet/amd/pds_core/core.c | 14 +- drivers/net/ethernet/amd/pds_core/core.h | 5 +- drivers/net/ethernet/amd/pds_core/dev.c | 3 + drivers/net/ethernet/amd/pds_core/main.c | 8 +- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 + drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 6 +- .../net/ethernet/marvell/octeontx2/af/rvu_nix.c | 22 +- drivers/net/ethernet/marvell/octeontx2/nic/qos.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h | 8 +- drivers/net/ethernet/mellanox/mlx5/core/en/qos.c | 33 +-- drivers/net/ethernet/mellanox/mlx5/core/en/selq.c | 2 + .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 17 ++ drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 - drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 7 +- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 17 +- drivers/net/ethernet/mellanox/mlx5/core/main.c | 37 ++-- drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 4 +- .../ethernet/mellanox/mlx5/core/sf/dev/driver.c | 22 +- drivers/net/ethernet/micrel/ks8851.h | 3 - drivers/net/ethernet/micrel/ks8851_common.c | 16 +- drivers/net/ethernet/micrel/ks8851_par.c | 11 - drivers/net/ethernet/micrel/ks8851_spi.c | 11 - .../net/ethernet/microchip/sparx5/sparx5_port.c | 4 +- drivers/net/geneve.c | 4 +- drivers/net/virtio_net.c | 28 ++- drivers/platform/chrome/cros_ec_uart.c | 28 +-- drivers/scsi/hisi_sas/hisi_sas_main.c | 2 +- drivers/scsi/qla2xxx/qla_edif.c | 2 +- drivers/scsi/sg.c | 20 +- drivers/vhost/vhost.c | 28 ++- fs/btrfs/delayed-inode.c | 3 + fs/btrfs/inode.c | 13 +- fs/btrfs/ioctl.c | 37 +++- fs/btrfs/qgroup.c | 2 + fs/btrfs/root-tree.c | 10 - fs/btrfs/root-tree.h | 2 - fs/btrfs/tests/extent-io-tests.c | 28 ++- fs/btrfs/transaction.c | 17 +- fs/ceph/addr.c | 4 +- fs/ceph/caps.c | 4 +- fs/ceph/mds_client.c | 9 +- fs/ceph/mds_client.h | 3 +- fs/kernfs/file.c | 9 +- fs/proc/bootconfig.c | 12 +- fs/smb/client/cached_dir.c | 4 +- include/acpi/acpi_bus.h | 8 +- include/linux/bootconfig.h | 1 + include/linux/dma-fence.h | 7 + include/linux/irqflags.h | 2 +- include/linux/node.h | 18 +- include/linux/u64_stats_sync.h | 9 +- include/net/addrconf.h | 4 + include/net/af_unix.h | 2 +- include/net/bluetooth/bluetooth.h | 11 + include/net/ip_tunnels.h | 33 +++ init/main.c | 5 + io_uring/io_uring.c | 25 +++ io_uring/net.c | 22 +- io_uring/rw.c | 2 - kernel/cpu.c | 3 +- kernel/kprobes.c | 18 +- kernel/power/suspend.c | 6 + kernel/trace/ring_buffer.c | 6 +- kernel/trace/trace_events.c | 4 + lib/checksum_kunit.c | 5 +- net/batman-adv/translation-table.c | 2 +- net/bluetooth/hci_request.c | 4 +- net/bluetooth/hci_sock.c | 21 +- net/bluetooth/hci_sync.c | 66 +++++- net/bluetooth/iso.c | 50 ++--- net/bluetooth/l2cap_core.c | 3 +- net/bluetooth/l2cap_sock.c | 52 ++--- net/bluetooth/rfcomm/sock.c | 14 +- net/bluetooth/sco.c | 23 +-- net/ipv4/netfilter/arp_tables.c | 4 + net/ipv4/netfilter/ip_tables.c | 4 + net/ipv4/route.c | 4 +- net/ipv6/addrconf.c | 7 +- net/ipv6/ip6_fib.c | 7 +- net/ipv6/netfilter/ip6_tables.c | 4 + net/openvswitch/conntrack.c | 5 +- net/unix/af_unix.c | 8 +- net/unix/garbage.c | 35 +++- net/unix/scm.c | 8 +- net/xdp/xsk.c | 2 + tools/testing/selftests/kselftest.h | 33 ++- tools/testing/selftests/timers/posix_timers.c | 105 +++++----- 170 files changed, 1559 insertions(+), 882 deletions(-)
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steve French stfrench@microsoft.com
commit 28e0947651ce6a2200b9a7eceb93282e97d7e51a upstream.
We were decrementing the count of open files on server twice for the case where we were closing cached directories.
Fixes: 8e843bf38f7b ("cifs: return a single-use cfid if we did not get a lease") Cc: stable@vger.kernel.org Acked-by: Bharath SM bharathsm@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/cached_dir.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/smb/client/cached_dir.c +++ b/fs/smb/client/cached_dir.c @@ -433,8 +433,8 @@ smb2_close_cached_fid(struct kref *ref) if (cfid->is_open) { rc = SMB2_close(0, cfid->tcon, cfid->fid.persistent_fid, cfid->fid.volatile_fid); - if (rc != -EBUSY && rc != -EAGAIN) - atomic_dec(&cfid->tcon->num_remote_opens); + if (rc) /* should we retry on -EBUSY or -EAGAIN? */ + cifs_dbg(VFS, "close cached dir rc %d\n", rc); }
free_cached_dir(cfid);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Igor Pylypiv ipylypiv@google.com
commit c0297e7dd50795d559f3534887a6de1756b35d0f upstream.
Even though the command duration limits (CDL) feature was first added in ACS-5 (major version 12), there are some ACS-4 (major version 11) drives that implement CDL as well.
IDENTIFY_DEVICE, SUPPORTED_CAPABILITIES, and CURRENT_SETTINGS log pages are mandatory in the ACS-4 standard so it should be safe to read these log pages on older drives implementing the ACS-4 standard.
Fixes: 62e4a60e0cdb ("scsi: ata: libata: Detect support for command duration limits") Cc: stable@vger.kernel.org Signed-off-by: Igor Pylypiv ipylypiv@google.com Signed-off-by: Damien Le Moal dlemoal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ata/libata-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -2539,7 +2539,7 @@ static void ata_dev_config_cdl(struct at bool cdl_enabled; u64 val;
- if (ata_id_major_version(dev->id) < 12) + if (ata_id_major_version(dev->id) < 11) goto not_supported;
if (!ata_log_supported(dev, ATA_LOG_IDENTIFY_DEVICE) ||
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Damien Le Moal dlemoal@kernel.org
commit 79336504781e7fee5ddaf046dcc186c8dfdf60b1 upstream.
Commit 0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume") incorrectly handles failures of scsi_resume_device() in ata_scsi_dev_rescan(), leading to a double call to spin_unlock_irqrestore() to unlock a device port. Fix this by redefining the goto labels used in case of errors and only unlock the port scsi_scan_mutex when scsi_resume_device() fails.
Bug found with the Smatch static checker warning:
drivers/ata/libata-scsi.c:4774 ata_scsi_dev_rescan() error: double unlocked 'ap->lock' (orig line 4757)
Reported-by: Dan Carpenter dan.carpenter@linaro.org Fixes: 0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal dlemoal@kernel.org Reviewed-by: Niklas Cassel cassel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ata/libata-scsi.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -4745,7 +4745,7 @@ void ata_scsi_dev_rescan(struct work_str * bail out. */ if (ap->pflags & ATA_PFLAG_SUSPENDED) - goto unlock; + goto unlock_ap;
if (!sdev) continue; @@ -4758,7 +4758,7 @@ void ata_scsi_dev_rescan(struct work_str if (do_resume) { ret = scsi_resume_device(sdev); if (ret == -EWOULDBLOCK) - goto unlock; + goto unlock_scan; dev->flags &= ~ATA_DFLAG_RESUMING; } ret = scsi_rescan_device(sdev); @@ -4766,12 +4766,13 @@ void ata_scsi_dev_rescan(struct work_str spin_lock_irqsave(ap->lock, flags);
if (ret) - goto unlock; + goto unlock_ap; } }
-unlock: +unlock_ap: spin_unlock_irqrestore(ap->lock, flags); +unlock_scan: mutex_unlock(&ap->scsi_scan_mutex);
/* Reschedule with a delay if scsi_rescan_device() returned an error */
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peyton Lee peytolee@amd.com
commit eed14eb48ee176fe0144c6a999d00c855d0b199b upstream.
To fix mode2 reset failure. Should power on VPE when hw_init.
Signed-off-by: Peyton Lee peytolee@amd.com Reviewed-by: Lang Yu lang.yu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: "Gong, Richard" richard.gong@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c @@ -390,6 +390,12 @@ static int vpe_hw_init(void *handle) struct amdgpu_vpe *vpe = &adev->vpe; int ret;
+ /* Power on VPE */ + ret = amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VPE, + AMD_PG_STATE_UNGATE); + if (ret) + return ret; + ret = vpe_load_microcode(vpe); if (ret) return ret;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sven Eckelmann sven@narfation.org
commit b1f532a3b1e6d2e5559c7ace49322922637a28aa upstream.
If the MTU of one of an attached interface becomes too small to transmit the local translation table then it must be resized to fit inside all fragments (when enabled) or a single packet.
But if the MTU becomes too low to transmit even the header + the VLAN specific part then the resizing of the local TT will never succeed. This can for example happen when the usable space is 110 bytes and 11 VLANs are on top of batman-adv. In this case, at least 116 byte would be needed. There will just be an endless spam of
batman_adv: batadv0: Forced to purge local tt entries to fit new maximum fragment MTU (110)
in the log but the function will never finish. Problem here is that the timeout will be halved all the time and will then stagnate at 0 and therefore never be able to reduce the table even more.
There are other scenarios possible with a similar result. The number of BATADV_TT_CLIENT_NOPURGE entries in the local TT can for example be too high to fit inside a packet. Such a scenario can therefore happen also with only a single VLAN + 7 non-purgable addresses - requiring at least 120 bytes.
While this should be handled proactively when:
* interface with too low MTU is added * VLAN is added * non-purgeable local mac is added * MTU of an attached interface is reduced * fragmentation setting gets disabled (which most likely requires dropping attached interfaces)
not all of these scenarios can be prevented because batman-adv is only consuming events without the the possibility to prevent these actions (non-purgable MAC address added, MTU of an attached interface is reduced). It is therefore necessary to also make sure that the code is able to handle also the situations when there were already incompatible system configuration are present.
Cc: stable@vger.kernel.org Fixes: a19d3d85e1b8 ("batman-adv: limit local translation table max size") Reported-by: syzbot+a6a4b5bb3da165594cff@syzkaller.appspotmail.com Signed-off-by: Sven Eckelmann sven@narfation.org Signed-off-by: Simon Wunderlich sw@simonwunderlich.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/batman-adv/translation-table.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -3948,7 +3948,7 @@ void batadv_tt_local_resize_to_mtu(struc
spin_lock_bh(&bat_priv->tt.commit_lock);
- while (true) { + while (timeout) { table_size = batadv_tt_local_table_transmit_size(bat_priv); if (packet_size_max >= table_size) break;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
commit b372e96bd0a32729d55d27f613c8bc80708a82e1 upstream.
The page has been marked clean before writepage is called. If we don't redirty it before postponing the write, it might never get written.
Cc: stable@vger.kernel.org Fixes: 503d4fa6ee28 ("ceph: remove reliance on bdi congestion") Signed-off-by: NeilBrown neilb@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org Reviewed-by: Xiubo Li xiubli@redhat.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ceph/addr.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -795,8 +795,10 @@ static int ceph_writepage(struct page *p ihold(inode);
if (wbc->sync_mode == WB_SYNC_NONE && - ceph_inode_to_fs_client(inode)->write_congested) + ceph_inode_to_fs_client(inode)->write_congested) { + redirty_page_for_writepage(wbc, page); return AOP_WRITEPAGE_ACTIVATE; + }
wait_on_page_fscache(page);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xiubo Li xiubli@redhat.com
commit 17f8dc2db52185460f212052f3a692c1fdc167ba upstream.
The same list item will be used in both cap_delay_list and cap_unlink_delay_list, so it's buggy to use two different locks to protect them.
Cc: stable@vger.kernel.org Fixes: dbc347ef7f0c ("ceph: add ceph_cap_unlink_work to fire check_caps() immediately") Link: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/AODC76VXRAMX... Reported-by: Marc Ruhmann ruhmann@luis.uni-hannover.de Signed-off-by: Xiubo Li xiubli@redhat.com Reviewed-by: Ilya Dryomov idryomov@gmail.com Tested-by: Marc Ruhmann ruhmann@luis.uni-hannover.de Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ceph/caps.c | 4 ++-- fs/ceph/mds_client.c | 9 ++++----- fs/ceph/mds_client.h | 3 +-- 3 files changed, 7 insertions(+), 9 deletions(-)
--- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -4775,13 +4775,13 @@ int ceph_drop_caps_for_unlink(struct ino
doutc(mdsc->fsc->client, "%p %llx.%llx\n", inode, ceph_vinop(inode)); - spin_lock(&mdsc->cap_unlink_delay_lock); + spin_lock(&mdsc->cap_delay_lock); ci->i_ceph_flags |= CEPH_I_FLUSH; if (!list_empty(&ci->i_cap_delay_list)) list_del_init(&ci->i_cap_delay_list); list_add_tail(&ci->i_cap_delay_list, &mdsc->cap_unlink_delay_list); - spin_unlock(&mdsc->cap_unlink_delay_lock); + spin_unlock(&mdsc->cap_delay_lock);
/* * Fire the work immediately, because the MDS maybe --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -2504,7 +2504,7 @@ static void ceph_cap_unlink_work(struct struct ceph_client *cl = mdsc->fsc->client;
doutc(cl, "begin\n"); - spin_lock(&mdsc->cap_unlink_delay_lock); + spin_lock(&mdsc->cap_delay_lock); while (!list_empty(&mdsc->cap_unlink_delay_list)) { struct ceph_inode_info *ci; struct inode *inode; @@ -2516,15 +2516,15 @@ static void ceph_cap_unlink_work(struct
inode = igrab(&ci->netfs.inode); if (inode) { - spin_unlock(&mdsc->cap_unlink_delay_lock); + spin_unlock(&mdsc->cap_delay_lock); doutc(cl, "on %p %llx.%llx\n", inode, ceph_vinop(inode)); ceph_check_caps(ci, CHECK_CAPS_FLUSH); iput(inode); - spin_lock(&mdsc->cap_unlink_delay_lock); + spin_lock(&mdsc->cap_delay_lock); } } - spin_unlock(&mdsc->cap_unlink_delay_lock); + spin_unlock(&mdsc->cap_delay_lock); doutc(cl, "done\n"); }
@@ -5404,7 +5404,6 @@ int ceph_mdsc_init(struct ceph_fs_client INIT_LIST_HEAD(&mdsc->cap_wait_list); spin_lock_init(&mdsc->cap_delay_lock); INIT_LIST_HEAD(&mdsc->cap_unlink_delay_list); - spin_lock_init(&mdsc->cap_unlink_delay_lock); INIT_LIST_HEAD(&mdsc->snap_flush_list); spin_lock_init(&mdsc->snap_flush_lock); mdsc->last_cap_flush_tid = 1; --- a/fs/ceph/mds_client.h +++ b/fs/ceph/mds_client.h @@ -461,9 +461,8 @@ struct ceph_mds_client { struct delayed_work delayed_work; /* delayed work */ unsigned long last_renew_caps; /* last time we renewed our caps */ struct list_head cap_delay_list; /* caps with delayed release */ - spinlock_t cap_delay_lock; /* protects cap_delay_list */ struct list_head cap_unlink_delay_list; /* caps with delayed release for unlink */ - spinlock_t cap_unlink_delay_lock; /* protects cap_unlink_delay_list */ + spinlock_t cap_delay_lock; /* protects cap_delay_list and cap_unlink_delay_list */ struct list_head snap_flush_list; /* cap_snaps ready to flush */ spinlock_t snap_flush_lock;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Breno Leitao leitao@debian.org
commit 059a49aa2e25c58f90b50151f109dd3c4cdb3a47 upstream.
There is a bug when setting the RSS options in virtio_net that can break the whole machine, getting the kernel into an infinite loop.
Running the following command in any QEMU virtual machine with virtionet will reproduce this problem:
# ethtool -X eth0 hfunc toeplitz
This is how the problem happens:
1) ethtool_set_rxfh() calls virtnet_set_rxfh()
2) virtnet_set_rxfh() calls virtnet_commit_rss_command()
3) virtnet_commit_rss_command() populates 4 entries for the rss scatter-gather
4) Since the command above does not have a key, then the last scatter-gatter entry will be zeroed, since rss_key_size == 0. sg_buf_size = vi->rss_key_size;
5) This buffer is passed to qemu, but qemu is not happy with a buffer with zero length, and do the following in virtqueue_map_desc() (QEMU function):
if (!sz) { virtio_error(vdev, "virtio: zero sized buffers are not allowed");
6) virtio_error() (also QEMU function) set the device as broken
vdev->broken = true;
7) Qemu bails out, and do not repond this crazy kernel.
8) The kernel is waiting for the response to come back (function virtnet_send_command())
9) The kernel is waiting doing the following :
while (!virtqueue_get_buf(vi->cvq, &tmp) && !virtqueue_is_broken(vi->cvq)) cpu_relax();
10) None of the following functions above is true, thus, the kernel loops here forever. Keeping in mind that virtqueue_is_broken() does not look at the qemu `vdev->broken`, so, it never realizes that the vitio is broken at QEMU side.
Fix it by not sending RSS commands if the feature is not available in the device.
Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.") Cc: stable@vger.kernel.org Cc: qemu-devel@nongnu.org Signed-off-by: Breno Leitao leitao@debian.org Reviewed-by: Heng Qi hengqi@linux.alibaba.com Reviewed-by: Xuan Zhuo xuanzhuo@linux.alibaba.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/virtio_net.c | 26 ++++++++++++++++++++++---- 1 file changed, 22 insertions(+), 4 deletions(-)
--- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -3768,6 +3768,7 @@ static int virtnet_set_rxfh(struct net_d struct netlink_ext_ack *extack) { struct virtnet_info *vi = netdev_priv(dev); + bool update = false; int i;
if (rxfh->hfunc != ETH_RSS_HASH_NO_CHANGE && @@ -3775,13 +3776,28 @@ static int virtnet_set_rxfh(struct net_d return -EOPNOTSUPP;
if (rxfh->indir) { + if (!vi->has_rss) + return -EOPNOTSUPP; + for (i = 0; i < vi->rss_indir_table_size; ++i) vi->ctrl->rss.indirection_table[i] = rxfh->indir[i]; + update = true; } - if (rxfh->key) + + if (rxfh->key) { + /* If either _F_HASH_REPORT or _F_RSS are negotiated, the + * device provides hash calculation capabilities, that is, + * hash_key is configured. + */ + if (!vi->has_rss && !vi->has_rss_hash_report) + return -EOPNOTSUPP; + memcpy(vi->ctrl->rss.key, rxfh->key, vi->rss_key_size); + update = true; + }
- virtnet_commit_rss_command(vi); + if (update) + virtnet_commit_rss_command(vi);
return 0; } @@ -4686,13 +4702,15 @@ static int virtnet_probe(struct virtio_d if (virtio_has_feature(vdev, VIRTIO_NET_F_HASH_REPORT)) vi->has_rss_hash_report = true;
- if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS)) + if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS)) { vi->has_rss = true;
- if (vi->has_rss || vi->has_rss_hash_report) { vi->rss_indir_table_size = virtio_cread16(vdev, offsetof(struct virtio_net_config, rss_max_indirection_table_length)); + } + + if (vi->has_rss || vi->has_rss_hash_report) { vi->rss_key_size = virtio_cread8(vdev, offsetof(struct virtio_net_config, rss_max_key_size));
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gavin Shan gshan@redhat.com
commit e3ba51ab24fddef79fc212f9840de54db8fd1685 upstream.
KVM/arm64 relies on TLBI RANGE feature to flush TLBs when the dirty pages are collected by VMM and the page table entries become write protected during live migration. Unfortunately, the operand passed to the TLBI RANGE instruction isn't correctly sorted out due to the commit 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()"). It leads to crash on the destination VM after live migration because TLBs aren't flushed completely and some of the dirty pages are missed.
For example, I have a VM where 8GB memory is assigned, starting from 0x40000000 (1GB). Note that the host has 4KB as the base page size. In the middile of migration, kvm_tlb_flush_vmid_range() is executed to flush TLBs. It passes MAX_TLBI_RANGE_PAGES as the argument to __kvm_tlb_flush_vmid_range() and __flush_s2_tlb_range_op(). SCALE#3 and NUM#31, corresponding to MAX_TLBI_RANGE_PAGES, isn't supported by __TLBI_RANGE_NUM(). In this specific case, -1 has been returned from __TLBI_RANGE_NUM() for SCALE#3/2/1/0 and rejected by the loop in the __flush_tlb_range_op() until the variable @scale underflows and becomes -9, 0xffff708000040000 is set as the operand. The operand is wrong since it's sorted out by __TLBI_VADDR_RANGE() according to invalid @scale and @num.
Fix it by extending __TLBI_RANGE_NUM() to support the combination of SCALE#3 and NUM#31. With the changes, [-1 31] instead of [-1 30] can be returned from the macro, meaning the TLBs for 0x200000 pages in the above example can be flushed in one shoot with SCALE#3 and NUM#31. The macro TLBI_RANGE_MASK is dropped since no one uses it any more. The comments are also adjusted accordingly.
Fixes: 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()") Cc: stable@kernel.org # v6.6+ Reported-by: Yihuang Yu yihyu@redhat.com Suggested-by: Marc Zyngier maz@kernel.org Signed-off-by: Gavin Shan gshan@redhat.com Reviewed-by: Catalin Marinas catalin.marinas@arm.com Reviewed-by: Ryan Roberts ryan.roberts@arm.com Reviewed-by: Anshuman Khandual anshuman.khandual@arm.com Reviewed-by: Shaoqin Huang shahuang@redhat.com Link: https://lore.kernel.org/r/20240405035852.1532010-2-gshan@redhat.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/tlbflush.h | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-)
--- a/arch/arm64/include/asm/tlbflush.h +++ b/arch/arm64/include/asm/tlbflush.h @@ -161,12 +161,18 @@ static inline unsigned long get_trans_gr #define MAX_TLBI_RANGE_PAGES __TLBI_RANGE_PAGES(31, 3)
/* - * Generate 'num' values from -1 to 30 with -1 rejected by the - * __flush_tlb_range() loop below. - */ -#define TLBI_RANGE_MASK GENMASK_ULL(4, 0) -#define __TLBI_RANGE_NUM(pages, scale) \ - ((((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1) + * Generate 'num' values from -1 to 31 with -1 rejected by the + * __flush_tlb_range() loop below. Its return value is only + * significant for a maximum of MAX_TLBI_RANGE_PAGES pages. If + * 'pages' is more than that, you must iterate over the overall + * range. + */ +#define __TLBI_RANGE_NUM(pages, scale) \ + ({ \ + int __pages = min((pages), \ + __TLBI_RANGE_PAGES(31, (scale))); \ + (__pages >> (5 * (scale) + 1)) - 1; \ + })
/* * TLB Invalidation @@ -379,10 +385,6 @@ static inline void arch_tlbbatch_flush(s * 3. If there is 1 page remaining, flush it through non-range operations. Range * operations can only span an even number of pages. We save this for last to * ensure 64KB start alignment is maintained for the LPA2 case. - * - * Note that certain ranges can be represented by either num = 31 and - * scale or num = 0 and scale + 1. The loop below favours the latter - * since num is limited to 30 by the __TLBI_RANGE_NUM() macro. */ #define __flush_tlb_range_op(op, start, pages, stride, \ asid, tlb_level, tlbi_user, lpa2) \
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fabio Estevam festevam@denx.de
commit 135f218255b28c5bbf71e9e32a49e5c734cabbe5 upstream.
Since commit 63b0cd30b78e ("media: ov2680: Add bus-cfg / endpoint property verification") the ov2680 no longer probes on a imx7s-warp7:
ov2680 1-0036: error -EINVAL: supported link freq 330000000 not found ov2680 1-0036: probe with driver ov2680 failed with error -22
Fix it by passing the required 'link-frequencies' property as recommended by:
https://www.kernel.org/doc/html/v6.9-rc1/driver-api/media/camera-sensor.html...
Cc: stable@vger.kernel.org Fixes: 63b0cd30b78e ("media: ov2680: Add bus-cfg / endpoint property verification") Signed-off-by: Fabio Estevam festevam@denx.de Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/nxp/imx/imx7s-warp.dts | 1 + 1 file changed, 1 insertion(+)
--- a/arch/arm/boot/dts/nxp/imx/imx7s-warp.dts +++ b/arch/arm/boot/dts/nxp/imx/imx7s-warp.dts @@ -210,6 +210,7 @@ remote-endpoint = <&mipi_from_sensor>; clock-lanes = <0>; data-lanes = <1>; + link-frequencies = /bits/ 64 <330000000>; }; }; };
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yu Kuai yukuai3@huawei.com
commit fcf3f7e2fc8a53a6140beee46ec782a4c88e4744 upstream.
r1_bio->bios[] is used to record new bios that will be issued to underlying disks, however, in raid1_write_request(), r1_bio->bios[] will set to the original bio temporarily. Meanwhile, if blocked rdev is set, free_r1bio() will be called causing that all r1_bio->bios[] to be freed:
raid1_write_request() r1_bio = alloc_r1bio(mddev, bio); -> r1_bio->bios[] is NULL for (i = 0; i < disks; i++) -> for each rdev in conf // first rdev is normal r1_bio->bios[0] = bio; -> set to original bio // second rdev is blocked if (test_bit(Blocked, &rdev->flags)) break
if (blocked_rdev) free_r1bio() put_all_bios() bio_put(r1_bio->bios[0]) -> original bio is freed
Test scripts:
mdadm -CR /dev/md0 -l1 -n4 /dev/sd[abcd] --assume-clean fio -filename=/dev/md0 -ioengine=libaio -rw=write -bs=4k -numjobs=1 \ -iodepth=128 -name=test -direct=1 echo blocked > /sys/block/md0/md/rd2/state
Test result:
BUG bio-264 (Not tainted): Object already free -----------------------------------------------------------------------------
Allocated in mempool_alloc_slab+0x24/0x50 age=1 cpu=1 pid=869 kmem_cache_alloc+0x324/0x480 mempool_alloc_slab+0x24/0x50 mempool_alloc+0x6e/0x220 bio_alloc_bioset+0x1af/0x4d0 blkdev_direct_IO+0x164/0x8a0 blkdev_write_iter+0x309/0x440 aio_write+0x139/0x2f0 io_submit_one+0x5ca/0xb70 __do_sys_io_submit+0x86/0x270 __x64_sys_io_submit+0x22/0x30 do_syscall_64+0xb1/0x210 entry_SYSCALL_64_after_hwframe+0x6c/0x74 Freed in mempool_free_slab+0x1f/0x30 age=1 cpu=1 pid=869 kmem_cache_free+0x28c/0x550 mempool_free_slab+0x1f/0x30 mempool_free+0x40/0x100 bio_free+0x59/0x80 bio_put+0xf0/0x220 free_r1bio+0x74/0xb0 raid1_make_request+0xadf/0x1150 md_handle_request+0xc7/0x3b0 md_submit_bio+0x76/0x130 __submit_bio+0xd8/0x1d0 submit_bio_noacct_nocheck+0x1eb/0x5c0 submit_bio_noacct+0x169/0xd40 submit_bio+0xee/0x1d0 blkdev_direct_IO+0x322/0x8a0 blkdev_write_iter+0x309/0x440 aio_write+0x139/0x2f0
Since that bios for underlying disks are not allocated yet, fix this problem by using mempool_free() directly to free the r1_bio.
Fixes: 992db13a4aee ("md/raid1: free the r1bio before waiting for blocked rdev") Cc: stable@vger.kernel.org # v6.6+ Reported-by: Coly Li colyli@suse.de Signed-off-by: Yu Kuai yukuai3@huawei.com Tested-by: Coly Li colyli@suse.de Signed-off-by: Song Liu song@kernel.org Link: https://lore.kernel.org/r/20240308093726.1047420-1-yukuai1@huaweicloud.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/raid1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -1474,7 +1474,7 @@ static void raid1_write_request(struct m for (j = 0; j < i; j++) if (r1_bio->bios[j]) rdev_dec_pending(conf->mirrors[j].rdev, mddev); - free_r1bio(r1_bio); + mempool_free(r1_bio, &conf->r1bio_pool); allow_barrier(conf, bio->bi_iter.bi_sector);
if (bio->bi_opf & REQ_NOWAIT) {
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt (Google) rostedt@goodmis.org
commit ffe3986fece696cf65e0ef99e74c75f848be8e30 upstream.
The "buffer_percent" logic that is used by the ring buffer splice code to only wake up the tasks when there's no data after the buffer is filled to the percentage of the "buffer_percent" file is dependent on three variables that determine the amount of data that is in the ring buffer:
1) pages_read - incremented whenever a new sub-buffer is consumed 2) pages_lost - incremented every time a writer overwrites a sub-buffer 3) pages_touched - incremented when a write goes to a new sub-buffer
The percentage is the calculation of:
(pages_touched - (pages_lost + pages_read)) / nr_pages
Basically, the amount of data is the total number of sub-bufs that have been touched, minus the number of sub-bufs lost and sub-bufs consumed. This is divided by the total count to give the buffer percentage. When the percentage is greater than the value in the "buffer_percent" file, it wakes up splice readers waiting for that amount.
It was observed that over time, the amount read from the splice was constantly decreasing the longer the trace was running. That is, if one asked for 60%, it would read over 60% when it first starts tracing, but then it would be woken up at under 60% and would slowly decrease the amount of data read after being woken up, where the amount becomes much less than the buffer percent.
This was due to an accounting of the pages_touched incrementation. This value is incremented whenever a writer transfers to a new sub-buffer. But the place where it was incremented was incorrect. If a writer overflowed the current sub-buffer it would go to the next one. If it gets preempted by an interrupt at that time, and the interrupt performs a trace, it too will end up going to the next sub-buffer. But only one should increment the counter. Unfortunately, that was not the case.
Change the cmpxchg() that does the real switch of the tail-page into a try_cmpxchg(), and on success, perform the increment of pages_touched. This will only increment the counter once for when the writer moves to a new sub-buffer, and not when there's a race and is incremented for when a writer and its preempting writer both move to the same new sub-buffer.
Link: https://lore.kernel.org/linux-trace-kernel/20240409151309.0d0e5056@gandalf.l...
Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader") Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/ring_buffer.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -1400,7 +1400,6 @@ static void rb_tail_page_update(struct r old_write = local_add_return(RB_WRITE_INTCNT, &next_page->write); old_entries = local_add_return(RB_WRITE_INTCNT, &next_page->entries);
- local_inc(&cpu_buffer->pages_touched); /* * Just make sure we have seen our old_write and synchronize * with any interrupts that come in. @@ -1437,8 +1436,9 @@ static void rb_tail_page_update(struct r */ local_set(&next_page->page->commit, 0);
- /* Again, either we update tail_page or an interrupt does */ - (void)cmpxchg(&cpu_buffer->tail_page, tail_page, next_page); + /* Either we update tail_page or an interrupt does */ + if (try_cmpxchg(&cpu_buffer->tail_page, &tail_page, next_page)) + local_inc(&cpu_buffer->pages_touched); } }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Antipov dmantipov@yandex.ru
commit 45d355a926ab40f3ae7bc0b0a00cb0e3e8a5a810 upstream.
In 'hci_req_sync_complete()', always free the previous sync request state before assigning reference to a new one.
Reported-by: syzbot+39ec16ff6cc18b1d066d@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=39ec16ff6cc18b1d066d Cc: stable@vger.kernel.org Fixes: f60cb30579d3 ("Bluetooth: Convert hci_req_sync family of function to new request API") Signed-off-by: Dmitry Antipov dmantipov@yandex.ru Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/hci_request.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/net/bluetooth/hci_request.c +++ b/net/bluetooth/hci_request.c @@ -105,8 +105,10 @@ void hci_req_sync_complete(struct hci_de if (hdev->req_status == HCI_REQ_PEND) { hdev->req_result = result; hdev->req_status = HCI_REQ_DONE; - if (skb) + if (skb) { + kfree_skb(hdev->req_skb); hdev->req_skb = skb_get(skb); + } wake_up_interruptible(&hdev->req_wait_q); } }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tim Huang Tim.Huang@amd.com
commit 31729e8c21ecfd671458e02b6511eb68c2225113 upstream.
While doing multiple S4 stress tests, GC/RLC/PMFW get into an invalid state resulting into hard hangs.
Adding a GFX reset as workaround just before sending the MP1_UNLOAD message avoids this failure.
Signed-off-by: Tim Huang Tim.Huang@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: Mario Limonciello superm1@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c @@ -226,8 +226,18 @@ static int smu_v13_0_4_system_features_c struct amdgpu_device *adev = smu->adev; int ret = 0;
- if (!en && !adev->in_s0ix) + if (!en && !adev->in_s0ix) { + /* Adds a GFX reset as workaround just before sending the + * MP1_UNLOAD message to prevent GC/RLC/PMFW from entering + * an invalid state. + */ + ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_GfxDeviceDriverReset, + SMU_RESET_MODE_2, NULL); + if (ret) + return ret; + ret = smu_cmn_send_smc_msg(smu, SMU_MSG_PrepareMp1ForUnload, NULL); + }
return ret; }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Noah Loomans noah@noahloomans.com
commit 5e700b384ec13f5bcac9855cb28fcc674f1d3593 upstream.
The cros_ec_uart_probe() function calls devm_serdev_device_open() before it calls serdev_device_set_client_ops(). This can trigger a NULL pointer dereference:
BUG: kernel NULL pointer dereference, address: 0000000000000000 ... Call Trace: <TASK> ... ? ttyport_receive_buf
A simplified version of crashing code is as follows:
static inline size_t serdev_controller_receive_buf(struct serdev_controller *ctrl, const u8 *data, size_t count) { struct serdev_device *serdev = ctrl->serdev;
if (!serdev || !serdev->ops->receive_buf) // CRASH! return 0;
return serdev->ops->receive_buf(serdev, data, count); }
It assumes that if SERPORT_ACTIVE is set and serdev exists, serdev->ops will also exist. This conflicts with the existing cros_ec_uart_probe() logic, as it first calls devm_serdev_device_open() (which sets SERPORT_ACTIVE), and only later sets serdev->ops via serdev_device_set_client_ops().
Commit 01f95d42b8f4 ("platform/chrome: cros_ec_uart: fix race condition") attempted to fix a similar race condition, but while doing so, made the window of error for this race condition to happen much wider.
Attempt to fix the race condition again, making sure we fully setup before calling devm_serdev_device_open().
Fixes: 01f95d42b8f4 ("platform/chrome: cros_ec_uart: fix race condition") Cc: stable@vger.kernel.org Signed-off-by: Noah Loomans noah@noahloomans.com Reviewed-by: Guenter Roeck groeck@chromium.org Link: https://lore.kernel.org/r/20240410182618.169042-2-noah@noahloomans.com Signed-off-by: Tzung-Bi Shih tzungbi@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/platform/chrome/cros_ec_uart.c | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-)
--- a/drivers/platform/chrome/cros_ec_uart.c +++ b/drivers/platform/chrome/cros_ec_uart.c @@ -263,12 +263,6 @@ static int cros_ec_uart_probe(struct ser if (!ec_dev) return -ENOMEM;
- ret = devm_serdev_device_open(dev, serdev); - if (ret) { - dev_err(dev, "Unable to open UART device"); - return ret; - } - serdev_device_set_drvdata(serdev, ec_dev); init_waitqueue_head(&ec_uart->response.wait_queue);
@@ -280,14 +274,6 @@ static int cros_ec_uart_probe(struct ser return ret; }
- ret = serdev_device_set_baudrate(serdev, ec_uart->baudrate); - if (ret < 0) { - dev_err(dev, "Failed to set up host baud rate (%d)", ret); - return ret; - } - - serdev_device_set_flow_control(serdev, ec_uart->flowcontrol); - /* Initialize ec_dev for cros_ec */ ec_dev->phys_name = dev_name(dev); ec_dev->dev = dev; @@ -301,6 +287,20 @@ static int cros_ec_uart_probe(struct ser
serdev_device_set_client_ops(serdev, &cros_ec_uart_client_ops);
+ ret = devm_serdev_device_open(dev, serdev); + if (ret) { + dev_err(dev, "Unable to open UART device"); + return ret; + } + + ret = serdev_device_set_baudrate(serdev, ec_uart->baudrate); + if (ret < 0) { + dev_err(dev, "Failed to set up host baud rate (%d)", ret); + return ret; + } + + serdev_device_set_flow_control(serdev, ec_uart->flowcontrol); + return cros_ec_register(ec_dev); }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
commit d730192ff0246356a2d7e63ff5bd501060670eec upstream.
On the Toshiba Encore WT10-A tablet the BATC battery ACPI device depends on 3 other devices:
Name (_DEP, Package (0x03) // _DEP: Dependencies { I2C1, GPO2, GPO0 })
acpi_scan_check_dep() adds all 3 of these to the acpi_dep_list and then before an acpi_device is created for the BATC handle (and thus before acpi_scan_dep_init() runs) acpi_scan_clear_dep() gets called for both GPIO depenencies, with free_when_met not set for the dependencies.
Since there is no adev for BATC yet, there also is no dep_unmet to decrement. The only result of acpi_scan_clear_dep() in this case is dep->met getting set.
Soon after acpi_scan_clear_dep() has been called for the GPIO dependencies the acpi_device gets created for the BATC handle and acpi_scan_dep_init() runs, this sees 3 dependencies on the acpi_dep_list and initializes unmet_dep to 3. Later when the dependency for I2C1 is met unmet_dep becomes 2, but since the 2 GPIO deps where already met it never becomes 0 causing battery monitoring to not work.
Fix this by modifying acpi_scan_dep_init() to not increase dep_met for dependencies which have already been marked as being met.
Fixes: 3ba12d8de3fa ("ACPI: scan: Reduce overhead related to devices with dependencies") Signed-off-by: Hans de Goede hdegoede@redhat.com Cc: 6.5+ stable@vger.kernel.org # 6.5+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/scan.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -1802,7 +1802,8 @@ static void acpi_scan_dep_init(struct ac if (dep->honor_dep) adev->flags.honor_deps = 1;
- adev->dep_unmet++; + if (!dep->met) + adev->dep_unmet++; } } }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anna-Maria Behnsen anna-maria@linutronix.de
commit 3c89a068bfd0698a5478f4cf39493595ef757d5e upstream.
s2idle works like a regular suspend with freezing processes and freezing devices. All CPUs except the control CPU go into idle. Once this is completed the control CPU kicks all other CPUs out of idle, so that they reenter the idle loop and then enter s2idle state. The control CPU then issues an swait() on the suspend state and therefore enters the idle loop as well.
Due to being kicked out of idle, the other CPUs leave their NOHZ states, which means the tick is active and the corresponding hrtimer is programmed to the next jiffie.
On entering s2idle the CPUs shut down their local clockevent device to prevent wakeups. The last CPU which enters s2idle shuts down its local clockevent and freezes timekeeping.
On resume, one of the CPUs receives the wakeup interrupt, unfreezes timekeeping and its local clockevent and starts the resume process. At that point all other CPUs are still in s2idle with their clockevents switched off. They only resume when they are kicked by another CPU or after resuming devices and then receiving a device interrupt.
That means there is no guarantee that all CPUs will wakeup directly on resume. As a consequence there is no guarantee that timers which are queued on those CPUs and should expire directly after resume, are handled. Also timer list timers which are remotely queued to one of those CPUs after resume will not result in a reprogramming IPI as the tick is active. Queueing a hrtimer will also not result in a reprogramming IPI because the first hrtimer event is already in the past.
The recent introduction of the timer pull model (7ee988770326 ("timers: Implement the hierarchical pull model")) amplifies this problem, if the current migrator is one of the non woken up CPUs. When a non pinned timer list timer is queued and the queuing CPU goes idle, it relies on the still suspended migrator CPU to expire the timer which will happen by chance.
The problem exists since commit 8d89835b0467 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path"). There the cpuidle_pause() call which in turn invoked a wakeup for all idle CPUs was moved to a later point in the resume process. This might not be reached or reached very late because it waits on a timer of a still suspended CPU.
Address this by kicking all CPUs out of idle after the control CPU returns from swait() so that they resume their timers and restore consistent system state.
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218641 Fixes: 8d89835b0467 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path") Signed-off-by: Anna-Maria Behnsen anna-maria@linutronix.de Reviewed-by: Thomas Gleixner tglx@linutronix.de Tested-by: Mario Limonciello mario.limonciello@amd.com Cc: 5.16+ stable@kernel.org # 5.16+ Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/power/suspend.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -106,6 +106,12 @@ static void s2idle_enter(void) swait_event_exclusive(s2idle_wait_head, s2idle_state == S2IDLE_STATE_WAKE);
+ /* + * Kick all CPUs to ensure that they resume their timers and restore + * consistent system state. + */ + wake_up_all_idle_cpus(); + cpus_read_unlock();
raw_spin_lock_irq(&s2idle_lock);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nini Song nini.song@mediatek.com
commit ce5d241c3ad4568c12842168288993234345c0eb upstream.
The valid_la is used to check the length requirements, including special cases of Timer Status. If the length is shorter than 5, that means no Duration Available is returned, the message will be forced to be invalid.
However, the description of Duration Available in the spec is that this parameter may be returned when these cases, or that it can be optionally return when these cases. The key words in the spec description are flexible choices.
Remove the special length check of Timer Status to fit the spec which is not compulsory about that.
Signed-off-by: Nini Song nini.song@mediatek.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/cec/core/cec-adap.c | 14 -------------- 1 file changed, 14 deletions(-)
--- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -1151,20 +1151,6 @@ void cec_received_msg_ts(struct cec_adap if (valid_la && min_len) { /* These messages have special length requirements */ switch (cmd) { - case CEC_MSG_TIMER_STATUS: - if (msg->msg[2] & 0x10) { - switch (msg->msg[2] & 0xf) { - case CEC_OP_PROG_INFO_NOT_ENOUGH_SPACE: - case CEC_OP_PROG_INFO_MIGHT_NOT_BE_ENOUGH_SPACE: - if (msg->len < 5) - valid_la = false; - break; - } - } else if ((msg->msg[2] & 0xf) == CEC_OP_PROG_ERROR_DUPLICATE) { - if (msg->len < 5) - valid_la = false; - } - break; case CEC_MSG_RECORD_ON: switch (msg->msg[2]) { case CEC_OP_RECORD_SRC_OWN:
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Sterba dsterba@suse.com
commit b2136cc288fce2f24a92f3d656531b2d50ebec5a upstream.
Allocate fs_info and root to have a valid fs_info pointer in case it's dereferenced by a helper outside of tests, like find_lock_delalloc_range().
Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/tests/extent-io-tests.c | 28 ++++++++++++++++++++++++---- 1 file changed, 24 insertions(+), 4 deletions(-)
--- a/fs/btrfs/tests/extent-io-tests.c +++ b/fs/btrfs/tests/extent-io-tests.c @@ -11,6 +11,7 @@ #include "btrfs-tests.h" #include "../ctree.h" #include "../extent_io.h" +#include "../disk-io.h" #include "../btrfs_inode.h"
#define PROCESS_UNLOCK (1 << 0) @@ -105,9 +106,11 @@ static void dump_extent_io_tree(const st } }
-static int test_find_delalloc(u32 sectorsize) +static int test_find_delalloc(u32 sectorsize, u32 nodesize) { - struct inode *inode; + struct btrfs_fs_info *fs_info; + struct btrfs_root *root = NULL; + struct inode *inode = NULL; struct extent_io_tree *tmp; struct page *page; struct page *locked_page = NULL; @@ -121,12 +124,27 @@ static int test_find_delalloc(u32 sector
test_msg("running find delalloc tests");
+ fs_info = btrfs_alloc_dummy_fs_info(nodesize, sectorsize); + if (!fs_info) { + test_std_err(TEST_ALLOC_FS_INFO); + return -ENOMEM; + } + + root = btrfs_alloc_dummy_root(fs_info); + if (IS_ERR(root)) { + test_std_err(TEST_ALLOC_ROOT); + ret = PTR_ERR(root); + goto out; + } + inode = btrfs_new_test_inode(); if (!inode) { test_std_err(TEST_ALLOC_INODE); - return -ENOMEM; + ret = -ENOMEM; + goto out; } tmp = &BTRFS_I(inode)->io_tree; + BTRFS_I(inode)->root = root;
/* * Passing NULL as we don't have fs_info but tracepoints are not used @@ -316,6 +334,8 @@ out: process_page_range(inode, 0, total_dirty - 1, PROCESS_UNLOCK | PROCESS_RELEASE); iput(inode); + btrfs_free_dummy_root(root); + btrfs_free_dummy_fs_info(fs_info); return ret; }
@@ -794,7 +814,7 @@ int btrfs_test_extent_io(u32 sectorsize,
test_msg("running extent I/O tests");
- ret = test_find_delalloc(sectorsize); + ret = test_find_delalloc(sectorsize, nodesize); if (ret) goto out;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aaro Koskinen aaro.koskinen@iki.fi
[ Upstream commit 95f37eb52e18879a1b16e51b972d992b39e50a81 ]
The GPIO bank width is 32 on OMAP2, so all labels are incorrect.
Fixes: e519f0bb64ef ("ARM/mmc: Convert old mmci-omap to GPIO descriptors") Signed-off-by: Aaro Koskinen aaro.koskinen@iki.fi Message-ID: 20240223181439.1099750-2-aaro.koskinen@iki.fi Reviewed-by: Linus Walleij linus.walleij@linaro.org Acked-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/mach-omap2/board-n8x0.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/arch/arm/mach-omap2/board-n8x0.c b/arch/arm/mach-omap2/board-n8x0.c index 31755a378c736..3e48f34016c19 100644 --- a/arch/arm/mach-omap2/board-n8x0.c +++ b/arch/arm/mach-omap2/board-n8x0.c @@ -144,8 +144,7 @@ static struct gpiod_lookup_table nokia8xx_mmc_gpio_table = { .dev_id = "mmci-omap.0", .table = { /* Slot switch, GPIO 96 */ - GPIO_LOOKUP("gpio-80-111", 16, - "switch", GPIO_ACTIVE_HIGH), + GPIO_LOOKUP("gpio-96-127", 0, "switch", GPIO_ACTIVE_HIGH), { } }, }; @@ -154,11 +153,9 @@ static struct gpiod_lookup_table nokia810_mmc_gpio_table = { .dev_id = "mmci-omap.0", .table = { /* Slot index 1, VSD power, GPIO 23 */ - GPIO_LOOKUP_IDX("gpio-16-31", 7, - "vsd", 1, GPIO_ACTIVE_HIGH), + GPIO_LOOKUP_IDX("gpio-0-31", 23, "vsd", 1, GPIO_ACTIVE_HIGH), /* Slot index 1, VIO power, GPIO 9 */ - GPIO_LOOKUP_IDX("gpio-0-15", 9, - "vio", 1, GPIO_ACTIVE_HIGH), + GPIO_LOOKUP_IDX("gpio-0-31", 9, "vio", 1, GPIO_ACTIVE_HIGH), { } }, };
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aaro Koskinen aaro.koskinen@iki.fi
[ Upstream commit 480d44d0820dd5ae043dc97c0b46dabbe53cb1cf ]
Trying to append a second table for the same dev_id doesn't seem to work. The second table is just silently ignored. As a result eMMC GPIOs are not present.
Fix by using separate tables for N800 and N810.
Fixes: e519f0bb64ef ("ARM/mmc: Convert old mmci-omap to GPIO descriptors") Signed-off-by: Aaro Koskinen aaro.koskinen@iki.fi Message-ID: 20240223181439.1099750-3-aaro.koskinen@iki.fi Reviewed-by: Linus Walleij linus.walleij@linaro.org Acked-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/mach-omap2/board-n8x0.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/arm/mach-omap2/board-n8x0.c b/arch/arm/mach-omap2/board-n8x0.c index 3e48f34016c19..c933a91751e4f 100644 --- a/arch/arm/mach-omap2/board-n8x0.c +++ b/arch/arm/mach-omap2/board-n8x0.c @@ -140,7 +140,7 @@ static int slot1_cover_open; static int slot2_cover_open; static struct device *mmc_device;
-static struct gpiod_lookup_table nokia8xx_mmc_gpio_table = { +static struct gpiod_lookup_table nokia800_mmc_gpio_table = { .dev_id = "mmci-omap.0", .table = { /* Slot switch, GPIO 96 */ @@ -152,6 +152,8 @@ static struct gpiod_lookup_table nokia8xx_mmc_gpio_table = { static struct gpiod_lookup_table nokia810_mmc_gpio_table = { .dev_id = "mmci-omap.0", .table = { + /* Slot switch, GPIO 96 */ + GPIO_LOOKUP("gpio-96-127", 0, "switch", GPIO_ACTIVE_HIGH), /* Slot index 1, VSD power, GPIO 23 */ GPIO_LOOKUP_IDX("gpio-0-31", 23, "vsd", 1, GPIO_ACTIVE_HIGH), /* Slot index 1, VIO power, GPIO 9 */ @@ -412,8 +414,6 @@ static struct omap_mmc_platform_data *mmc_data[OMAP24XX_NR_MMC];
static void __init n8x0_mmc_init(void) { - gpiod_add_lookup_table(&nokia8xx_mmc_gpio_table); - if (board_is_n810()) { mmc1_data.slots[0].name = "external";
@@ -426,6 +426,8 @@ static void __init n8x0_mmc_init(void) mmc1_data.slots[1].name = "internal"; mmc1_data.slots[1].ban_openended = 1; gpiod_add_lookup_table(&nokia810_mmc_gpio_table); + } else { + gpiod_add_lookup_table(&nokia800_mmc_gpio_table); }
mmc1_data.nr_slots = 2;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aaro Koskinen aaro.koskinen@iki.fi
[ Upstream commit d4debbcbffa45c3de5df0040af2eea74a9e794a3 ]
The lookup is done before host->dev is initialized. It will always just fail silently, and the MMC behaviour is totally unpredictable as the switch is left in an undefined state. Fix that.
Fixes: e519f0bb64ef ("ARM/mmc: Convert old mmci-omap to GPIO descriptors") Signed-off-by: Aaro Koskinen aaro.koskinen@iki.fi Message-ID: 20240223181439.1099750-4-aaro.koskinen@iki.fi Reviewed-by: Linus Walleij linus.walleij@linaro.org Acked-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/omap.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/mmc/host/omap.c b/drivers/mmc/host/omap.c index 9fb8995b43a1c..aa40e1a9dc29e 100644 --- a/drivers/mmc/host/omap.c +++ b/drivers/mmc/host/omap.c @@ -1384,13 +1384,6 @@ static int mmc_omap_probe(struct platform_device *pdev) if (IS_ERR(host->virt_base)) return PTR_ERR(host->virt_base);
- host->slot_switch = gpiod_get_optional(host->dev, "switch", - GPIOD_OUT_LOW); - if (IS_ERR(host->slot_switch)) - return dev_err_probe(host->dev, PTR_ERR(host->slot_switch), - "error looking up slot switch GPIO\n"); - - INIT_WORK(&host->slot_release_work, mmc_omap_slot_release_work); INIT_WORK(&host->send_stop_work, mmc_omap_send_stop_work);
@@ -1409,6 +1402,12 @@ static int mmc_omap_probe(struct platform_device *pdev) host->dev = &pdev->dev; platform_set_drvdata(pdev, host);
+ host->slot_switch = gpiod_get_optional(host->dev, "switch", + GPIOD_OUT_LOW); + if (IS_ERR(host->slot_switch)) + return dev_err_probe(host->dev, PTR_ERR(host->slot_switch), + "error looking up slot switch GPIO\n"); + host->id = pdev->id; host->irq = irq; host->phys_base = res->start;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aaro Koskinen aaro.koskinen@iki.fi
[ Upstream commit f6862c7f156d04f81c38467e1c304b7e9517e810 ]
After a deferred probe, GPIO descriptor lookup will fail with EBUSY. Fix by using managed descriptors.
Fixes: e519f0bb64ef ("ARM/mmc: Convert old mmci-omap to GPIO descriptors") Signed-off-by: Aaro Koskinen aaro.koskinen@iki.fi Message-ID: 20240223181439.1099750-5-aaro.koskinen@iki.fi Reviewed-by: Linus Walleij linus.walleij@linaro.org Acked-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/omap.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/mmc/host/omap.c b/drivers/mmc/host/omap.c index aa40e1a9dc29e..50408771ae01c 100644 --- a/drivers/mmc/host/omap.c +++ b/drivers/mmc/host/omap.c @@ -1259,18 +1259,18 @@ static int mmc_omap_new_slot(struct mmc_omap_host *host, int id) slot->pdata = &host->pdata->slots[id];
/* Check for some optional GPIO controls */ - slot->vsd = gpiod_get_index_optional(host->dev, "vsd", - id, GPIOD_OUT_LOW); + slot->vsd = devm_gpiod_get_index_optional(host->dev, "vsd", + id, GPIOD_OUT_LOW); if (IS_ERR(slot->vsd)) return dev_err_probe(host->dev, PTR_ERR(slot->vsd), "error looking up VSD GPIO\n"); - slot->vio = gpiod_get_index_optional(host->dev, "vio", - id, GPIOD_OUT_LOW); + slot->vio = devm_gpiod_get_index_optional(host->dev, "vio", + id, GPIOD_OUT_LOW); if (IS_ERR(slot->vio)) return dev_err_probe(host->dev, PTR_ERR(slot->vio), "error looking up VIO GPIO\n"); - slot->cover = gpiod_get_index_optional(host->dev, "cover", - id, GPIOD_IN); + slot->cover = devm_gpiod_get_index_optional(host->dev, "cover", + id, GPIOD_IN); if (IS_ERR(slot->cover)) return dev_err_probe(host->dev, PTR_ERR(slot->cover), "error looking up cover switch GPIO\n"); @@ -1402,8 +1402,8 @@ static int mmc_omap_probe(struct platform_device *pdev) host->dev = &pdev->dev; platform_set_drvdata(pdev, host);
- host->slot_switch = gpiod_get_optional(host->dev, "switch", - GPIOD_OUT_LOW); + host->slot_switch = devm_gpiod_get_optional(host->dev, "switch", + GPIOD_OUT_LOW); if (IS_ERR(host->slot_switch)) return dev_err_probe(host->dev, PTR_ERR(host->slot_switch), "error looking up slot switch GPIO\n");
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aaro Koskinen aaro.koskinen@iki.fi
[ Upstream commit 894ad61b85d6ba8efd4274aa8719d9ff1c89ea54 ]
Commit e519f0bb64ef ("ARM/mmc: Convert old mmci-omap to GPIO descriptors") moved Nokia N810 MMC power up/down from the board file into the MMC driver.
The change removed some delays, and ordering without a valid reason. Restore power up/down to match the original code. This matters only on N810 where the 2nd GPIO is in use. Other boards will see an additional delay but that should be a lesser concern than omitting delays altogether.
Fixes: e519f0bb64ef ("ARM/mmc: Convert old mmci-omap to GPIO descriptors") Signed-off-by: Aaro Koskinen aaro.koskinen@iki.fi Message-ID: 20240223181439.1099750-6-aaro.koskinen@iki.fi Reviewed-by: Linus Walleij linus.walleij@linaro.org Acked-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/omap.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/drivers/mmc/host/omap.c b/drivers/mmc/host/omap.c index 50408771ae01c..13fa8588e38c1 100644 --- a/drivers/mmc/host/omap.c +++ b/drivers/mmc/host/omap.c @@ -1119,10 +1119,25 @@ static void mmc_omap_set_power(struct mmc_omap_slot *slot, int power_on,
host = slot->host;
- if (slot->vsd) - gpiod_set_value(slot->vsd, power_on); - if (slot->vio) - gpiod_set_value(slot->vio, power_on); + if (power_on) { + if (slot->vsd) { + gpiod_set_value(slot->vsd, power_on); + msleep(1); + } + if (slot->vio) { + gpiod_set_value(slot->vio, power_on); + msleep(1); + } + } else { + if (slot->vio) { + gpiod_set_value(slot->vio, power_on); + msleep(50); + } + if (slot->vsd) { + gpiod_set_value(slot->vsd, power_on); + msleep(50); + } + }
if (slot->pdata->set_power != NULL) slot->pdata->set_power(mmc_dev(slot->mmc), slot->id, power_on,
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aaro Koskinen aaro.koskinen@iki.fi
[ Upstream commit 4421405e3634a3189b541cf1e34598e44260720d ]
GPIO chip labels are wrong for OMAP2, so the USB does not work. Fix.
Fixes: 8e0285ab95a9 ("ARM/musb: omap2: Remove global GPIO numbers from TUSB6010") Signed-off-by: Aaro Koskinen aaro.koskinen@iki.fi Reviewed-by: Linus Walleij linus.walleij@linaro.org Message-ID: 20240223181656.1099845-1-aaro.koskinen@iki.fi Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/mach-omap2/board-n8x0.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/arch/arm/mach-omap2/board-n8x0.c b/arch/arm/mach-omap2/board-n8x0.c index c933a91751e4f..ff2a4a4d82204 100644 --- a/arch/arm/mach-omap2/board-n8x0.c +++ b/arch/arm/mach-omap2/board-n8x0.c @@ -79,10 +79,8 @@ static struct musb_hdrc_platform_data tusb_data = { static struct gpiod_lookup_table tusb_gpio_table = { .dev_id = "musb-tusb", .table = { - GPIO_LOOKUP("gpio-0-15", 0, "enable", - GPIO_ACTIVE_HIGH), - GPIO_LOOKUP("gpio-48-63", 10, "int", - GPIO_ACTIVE_HIGH), + GPIO_LOOKUP("gpio-0-31", 0, "enable", GPIO_ACTIVE_HIGH), + GPIO_LOOKUP("gpio-32-63", 26, "int", GPIO_ACTIVE_HIGH), { } }, };
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Wiklander jens.wiklander@linaro.org
[ Upstream commit 1a4bd2b128fb5ca62e4d1c5ca298d3d06b9c1e8e ]
FFA_NOTIFICATION_INFO_GET retrieves information about pending notifications. Notifications can be either global or per VCPU. Global notifications are reported with the partition ID only in the list of endpoints with pending notifications. ffa_notification_info_get() incorrectly expect no ID at all for global notifications. Fix this by checking for ID = 1 instead of ID = 0.
Fixes: 3522be48d82b ("firmware: arm_ffa: Implement the NOTIFICATION_INFO_GET interface") Signed-off-by: Jens Wiklander jens.wiklander@linaro.org Reviewed-by: Lorenzo Pieralisi lpieralisi@kernel.org Link: https://lore.kernel.org/r/20240311110700.2367142-1-jens.wiklander@linaro.org Signed-off-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/arm_ffa/driver.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c index f2556a8e94015..9bc2e10381afd 100644 --- a/drivers/firmware/arm_ffa/driver.c +++ b/drivers/firmware/arm_ffa/driver.c @@ -790,7 +790,7 @@ static void ffa_notification_info_get(void)
part_id = packed_id_list[ids_processed++];
- if (!ids_count[list]) { /* Global Notification */ + if (ids_count[list] == 1) { /* Global Notification */ __do_sched_recv_cb(part_id, 0, false); continue; }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cristian Marussi cristian.marussi@arm.com
[ Upstream commit b70c7996d4ffb2e02895132e8a79a37cee66504f ]
SCMI raw debugfs entries are used to inject and snoop messages out of the SCMI core and, as such, the underlying virtual files have no reason to support seeking.
Modify the related file_operations descriptors to be non-seekable.
Fixes: 3c3d818a9317 ("firmware: arm_scmi: Add core raw transmission support") Signed-off-by: Cristian Marussi cristian.marussi@arm.com Link: https://lore.kernel.org/r/20240315140324.231830-1-cristian.marussi@arm.com Signed-off-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/arm_scmi/raw_mode.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/arm_scmi/raw_mode.c b/drivers/firmware/arm_scmi/raw_mode.c index 3505735185033..130d13e9cd6be 100644 --- a/drivers/firmware/arm_scmi/raw_mode.c +++ b/drivers/firmware/arm_scmi/raw_mode.c @@ -921,7 +921,7 @@ static int scmi_dbg_raw_mode_open(struct inode *inode, struct file *filp) rd->raw = raw; filp->private_data = rd;
- return 0; + return nonseekable_open(inode, filp); }
static int scmi_dbg_raw_mode_release(struct inode *inode, struct file *filp) @@ -950,6 +950,7 @@ static const struct file_operations scmi_dbg_raw_mode_reset_fops = { .open = scmi_dbg_raw_mode_open, .release = scmi_dbg_raw_mode_release, .write = scmi_dbg_raw_mode_reset_write, + .llseek = no_llseek, .owner = THIS_MODULE, };
@@ -959,6 +960,7 @@ static const struct file_operations scmi_dbg_raw_mode_message_fops = { .read = scmi_dbg_raw_mode_message_read, .write = scmi_dbg_raw_mode_message_write, .poll = scmi_dbg_raw_mode_message_poll, + .llseek = no_llseek, .owner = THIS_MODULE, };
@@ -975,6 +977,7 @@ static const struct file_operations scmi_dbg_raw_mode_message_async_fops = { .read = scmi_dbg_raw_mode_message_read, .write = scmi_dbg_raw_mode_message_async_write, .poll = scmi_dbg_raw_mode_message_poll, + .llseek = no_llseek, .owner = THIS_MODULE, };
@@ -998,6 +1001,7 @@ static const struct file_operations scmi_dbg_raw_mode_notification_fops = { .release = scmi_dbg_raw_mode_release, .read = scmi_test_dbg_raw_mode_notif_read, .poll = scmi_test_dbg_raw_mode_notif_poll, + .llseek = no_llseek, .owner = THIS_MODULE, };
@@ -1021,6 +1025,7 @@ static const struct file_operations scmi_dbg_raw_mode_errors_fops = { .release = scmi_dbg_raw_mode_release, .read = scmi_test_dbg_raw_mode_errors_read, .poll = scmi_test_dbg_raw_mode_errors_poll, + .llseek = no_llseek, .owner = THIS_MODULE, };
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yuquan Wang wangyuquan1236@phytium.com.cn
[ Upstream commit b7c59b038c656214f56432867056997c2e0fc268 ]
The dev_dbg info for Clear Event Records mailbox command would report the handle of the next record to clear not the current one.
This was because the index 'i' had incremented before printing the current handle value.
Fixes: 6ebe28f9ec72 ("cxl/mem: Read, trace, and clear events on driver load") Signed-off-by: Yuquan Wang wangyuquan1236@phytium.com.cn Reviewed-by: Jonathan Cameron Jonathan.Cameron@huawei.com Reviewed-by: Dan Williams dan.j.williams@intel.com Reviewed-by: Fan Ni fan.ni@samsung.com Signed-off-by: Dave Jiang dave.jiang@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cxl/core/mbox.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index 9adda4795eb78..50146161887d5 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -915,7 +915,7 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
payload->handles[i++] = gen->hdr.handle; dev_dbg(mds->cxlds.dev, "Event log '%d': Clearing %u\n", log, - le16_to_cpu(payload->handles[i])); + le16_to_cpu(payload->handles[i - 1]));
if (i == max_handles) { payload->nr_recs = i;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Jiang dave.jiang@intel.com
[ Upstream commit 5c88a9ccd4c431d58b532e4158b6999a8350062c ]
In the error path, map->reg_type is being used for kernel warning before its value is setup. Found by code inspection. Exposure to user is wrong reg_type being emitted via kernel log. Use a local var for reg_type and retrieve value for usage.
Fixes: 6c7f4f1e51c2 ("cxl/core/regs: Make cxl_map_{component, device}_regs() device generic") Reviewed-by: Dan Williams dan.j.williams@intel.com Reviewed-by: Davidlohr Bueso dave@stgolabs.net Signed-off-by: Dave Jiang dave.jiang@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cxl/core/regs.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c index 372786f809555..3c42f984eeafa 100644 --- a/drivers/cxl/core/regs.c +++ b/drivers/cxl/core/regs.c @@ -271,6 +271,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_map_device_regs, CXL); static bool cxl_decode_regblock(struct pci_dev *pdev, u32 reg_lo, u32 reg_hi, struct cxl_register_map *map) { + u8 reg_type = FIELD_GET(CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK, reg_lo); int bar = FIELD_GET(CXL_DVSEC_REG_LOCATOR_BIR_MASK, reg_lo); u64 offset = ((u64)reg_hi << 32) | (reg_lo & CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK); @@ -278,11 +279,11 @@ static bool cxl_decode_regblock(struct pci_dev *pdev, u32 reg_lo, u32 reg_hi, if (offset > pci_resource_len(pdev, bar)) { dev_warn(&pdev->dev, "BAR%d: %pr: too small (offset: %pa, type: %d)\n", bar, - &pdev->resource[bar], &offset, map->reg_type); + &pdev->resource[bar], &offset, reg_type); return false; }
- map->reg_type = FIELD_GET(CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK, reg_lo); + map->reg_type = reg_type; map->resource = pci_resource_start(pdev, bar) + offset; map->max_size = pci_resource_len(pdev, bar) - offset; return true;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tim Harvey tharvey@gateworks.com
[ Upstream commit 8cb10cba124c4798b6cb333245ecdc8dde78aeae ]
When using usb-conn-gpio to control USB role and VBUS, the vbus-supply property must be present in the usb-conn-gpio node. Additionally it should not be present in the phy node as that isn't what controls vbus and will upset the use count.
This resolves an issue where VBUS is enabled with OTG in peripheral mode.
Fixes: ad9a12f7a522 ("arm64: dts: imx8mp-venice: Fix USB connector description") Signed-off-by: Tim Harvey tharvey@gateworks.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/freescale/imx8mp-venice-gw72xx.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/freescale/imx8mp-venice-gw72xx.dtsi b/arch/arm64/boot/dts/freescale/imx8mp-venice-gw72xx.dtsi index 41c79d2ebdd62..f24b14744799e 100644 --- a/arch/arm64/boot/dts/freescale/imx8mp-venice-gw72xx.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mp-venice-gw72xx.dtsi @@ -14,6 +14,7 @@ connector { pinctrl-0 = <&pinctrl_usbcon1>; type = "micro"; label = "otg"; + vbus-supply = <®_usb1_vbus>; id-gpios = <&gpio3 21 GPIO_ACTIVE_HIGH>;
port { @@ -183,7 +184,6 @@ &usb3_0 { };
&usb3_phy0 { - vbus-supply = <®_usb1_vbus>; status = "okay"; };
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tim Harvey tharvey@gateworks.com
[ Upstream commit 6f8e0aca838e163e81fde176e945161d50679339 ]
When using usb-conn-gpio to control USB role and VBUS, the vbus-supply property must be present in the usb-conn-gpio node. Additionally it should not be present in the phy node as that isn't what controls vbus and will upset the use count.
This resolves an issue where VBUS is enabled with OTG in peripheral mode.
Fixes: ad9a12f7a522 ("arm64: dts: imx8mp-venice: Fix USB connector description") Signed-off-by: Tim Harvey tharvey@gateworks.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/freescale/imx8mp-venice-gw73xx.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/freescale/imx8mp-venice-gw73xx.dtsi b/arch/arm64/boot/dts/freescale/imx8mp-venice-gw73xx.dtsi index d5c400b355af5..f5491a608b2f3 100644 --- a/arch/arm64/boot/dts/freescale/imx8mp-venice-gw73xx.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mp-venice-gw73xx.dtsi @@ -14,6 +14,7 @@ connector { pinctrl-0 = <&pinctrl_usbcon1>; type = "micro"; label = "otg"; + vbus-supply = <®_usb1_vbus>; id-gpios = <&gpio3 21 GPIO_ACTIVE_HIGH>;
port { @@ -202,7 +203,6 @@ &usb3_0 { };
&usb3_phy0 { - vbus-supply = <®_usb1_vbus>; status = "okay"; };
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stephen Boyd swboyd@chromium.org
[ Upstream commit c588f7d67044d6d59ef92d75a970b64929984d89 ]
These debug prints are missing newlines, leading to multiple messages being printed on one line and hard to read logs. Add newlines to have the debug prints on separate lines. The DBG macro used to add a newline, but I missed that while migrating to drm_dbg wrappers.
Fixes: 7cb017db1896 ("drm/msm: Move FB debug prints to drm_dbg_state()") Fixes: 721c6e0c6aed ("drm/msm: Move vblank debug prints to drm_dbg_vbl()") Signed-off-by: Stephen Boyd swboyd@chromium.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Patchwork: https://patchwork.freedesktop.org/patch/584769/ Link: https://lore.kernel.org/r/20240325210810.1340820-1-swboyd@chromium.org Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/msm_fb.c | 6 +++--- drivers/gpu/drm/msm/msm_kms.c | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c index e3f61c39df69b..80166f702a0db 100644 --- a/drivers/gpu/drm/msm/msm_fb.c +++ b/drivers/gpu/drm/msm/msm_fb.c @@ -89,7 +89,7 @@ int msm_framebuffer_prepare(struct drm_framebuffer *fb,
for (i = 0; i < n; i++) { ret = msm_gem_get_and_pin_iova(fb->obj[i], aspace, &msm_fb->iova[i]); - drm_dbg_state(fb->dev, "FB[%u]: iova[%d]: %08llx (%d)", + drm_dbg_state(fb->dev, "FB[%u]: iova[%d]: %08llx (%d)\n", fb->base.id, i, msm_fb->iova[i], ret); if (ret) return ret; @@ -176,7 +176,7 @@ static struct drm_framebuffer *msm_framebuffer_init(struct drm_device *dev, const struct msm_format *format; int ret, i, n;
- drm_dbg_state(dev, "create framebuffer: mode_cmd=%p (%dx%d@%4.4s)", + drm_dbg_state(dev, "create framebuffer: mode_cmd=%p (%dx%d@%4.4s)\n", mode_cmd, mode_cmd->width, mode_cmd->height, (char *)&mode_cmd->pixel_format);
@@ -232,7 +232,7 @@ static struct drm_framebuffer *msm_framebuffer_init(struct drm_device *dev,
refcount_set(&msm_fb->dirtyfb, 1);
- drm_dbg_state(dev, "create: FB ID: %d (%p)", fb->base.id, fb); + drm_dbg_state(dev, "create: FB ID: %d (%p)\n", fb->base.id, fb);
return fb;
diff --git a/drivers/gpu/drm/msm/msm_kms.c b/drivers/gpu/drm/msm/msm_kms.c index 84c21ec2ceeae..af6a6fcb11736 100644 --- a/drivers/gpu/drm/msm/msm_kms.c +++ b/drivers/gpu/drm/msm/msm_kms.c @@ -149,7 +149,7 @@ int msm_crtc_enable_vblank(struct drm_crtc *crtc) struct msm_kms *kms = priv->kms; if (!kms) return -ENXIO; - drm_dbg_vbl(dev, "crtc=%u", crtc->base.id); + drm_dbg_vbl(dev, "crtc=%u\n", crtc->base.id); return vblank_ctrl_queue_work(priv, crtc, true); }
@@ -160,7 +160,7 @@ void msm_crtc_disable_vblank(struct drm_crtc *crtc) struct msm_kms *kms = priv->kms; if (!kms) return; - drm_dbg_vbl(dev, "crtc=%u", crtc->base.id); + drm_dbg_vbl(dev, "crtc=%u\n", crtc->base.id); vblank_ctrl_queue_work(priv, crtc, false); }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Baryshkov dmitry.baryshkov@linaro.org
[ Upstream commit 4f3b77ae5ff5b5ba9d99c5d5450db388dbee5107 ]
The data from catalog is marked as const, so it is a part of the RO segment. Allowing userspace to write to it through debugfs can cause protection faults. Set debugfs file mode to read-only for debug entries corresponding to perf_cfg coming from catalog.
Fixes: abda0d925f9c ("drm/msm/dpu: Mark various data tables as const") Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Patchwork: https://patchwork.freedesktop.org/patch/582844/ Link: https://lore.kernel.org/r/20240314-dpu-perf-rework-v3-1-79fa4e065574@linaro.... Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c index ef871239adb2a..68fae048a9a83 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c @@ -459,15 +459,15 @@ int dpu_core_perf_debugfs_init(struct dpu_kms *dpu_kms, struct dentry *parent) &perf->core_clk_rate); debugfs_create_u32("enable_bw_release", 0600, entry, (u32 *)&perf->enable_bw_release); - debugfs_create_u32("threshold_low", 0600, entry, + debugfs_create_u32("threshold_low", 0400, entry, (u32 *)&perf->perf_cfg->max_bw_low); - debugfs_create_u32("threshold_high", 0600, entry, + debugfs_create_u32("threshold_high", 0400, entry, (u32 *)&perf->perf_cfg->max_bw_high); - debugfs_create_u32("min_core_ib", 0600, entry, + debugfs_create_u32("min_core_ib", 0400, entry, (u32 *)&perf->perf_cfg->min_core_ib); - debugfs_create_u32("min_llcc_ib", 0600, entry, + debugfs_create_u32("min_llcc_ib", 0400, entry, (u32 *)&perf->perf_cfg->min_llcc_ib); - debugfs_create_u32("min_dram_ib", 0600, entry, + debugfs_create_u32("min_dram_ib", 0400, entry, (u32 *)&perf->perf_cfg->min_dram_ib); debugfs_create_file("perf_mode", 0600, entry, (u32 *)perf, &dpu_core_perf_mode_fops);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Baryshkov dmitry.baryshkov@linaro.org
[ Upstream commit 8844f467d6a58dc915f241e81c46e0c126f8c070 ]
There is little point in using %ps to print a value known to be NULL. On the other hand it makes sense to print the callback symbol in the 'invalid IRQ' message. Correct those two error messages to make more sense.
Fixes: 6893199183f8 ("drm/msm/dpu: stop using raw IRQ indices in the kernel output") Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Marijn Suijten marijn.suijten@somainline.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Patchwork: https://patchwork.freedesktop.org/patch/585565/ Link: https://lore.kernel.org/r/20240330-dpu-irq-messages-v1-1-9ce782ae35f9@linaro... Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c index 946dd0135dffc..6a0a74832fb64 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c @@ -525,14 +525,14 @@ int dpu_core_irq_register_callback(struct dpu_kms *dpu_kms, int ret;
if (!irq_cb) { - DPU_ERROR("invalid IRQ=[%d, %d] irq_cb:%ps\n", - DPU_IRQ_REG(irq_idx), DPU_IRQ_BIT(irq_idx), irq_cb); + DPU_ERROR("IRQ=[%d, %d] NULL callback\n", + DPU_IRQ_REG(irq_idx), DPU_IRQ_BIT(irq_idx)); return -EINVAL; }
if (!dpu_core_irq_is_valid(irq_idx)) { - DPU_ERROR("invalid IRQ=[%d, %d]\n", - DPU_IRQ_REG(irq_idx), DPU_IRQ_BIT(irq_idx)); + DPU_ERROR("invalid IRQ=[%d, %d] irq_cb:%ps\n", + DPU_IRQ_REG(irq_idx), DPU_IRQ_BIT(irq_idx), irq_cb); return -EINVAL; }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Baryshkov dmitry.baryshkov@linaro.org
[ Upstream commit be1b7acb929137e3943fe380671242beb485190c ]
As Qualcomm SM8150 got support for the DisplayPort, add displayport@ node as a valid child to the MDSS node.
Fixes: 88806318e2c2 ("dt-bindings: display: msm: dp: declare compatible string for sm8150") Reviewed-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Patchwork: https://patchwork.freedesktop.org/patch/586156/ Link: https://lore.kernel.org/r/20240402-fd-fix-schema-v3-1-817ea6ddf775@linaro.or... Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../bindings/display/msm/qcom,sm8150-mdss.yaml | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/Documentation/devicetree/bindings/display/msm/qcom,sm8150-mdss.yaml b/Documentation/devicetree/bindings/display/msm/qcom,sm8150-mdss.yaml index c0d6a4fdff97e..e6dc5494baee2 100644 --- a/Documentation/devicetree/bindings/display/msm/qcom,sm8150-mdss.yaml +++ b/Documentation/devicetree/bindings/display/msm/qcom,sm8150-mdss.yaml @@ -53,6 +53,15 @@ patternProperties: compatible: const: qcom,sm8150-dpu
+ "^displayport-controller@[0-9a-f]+$": + type: object + additionalProperties: true + + properties: + compatible: + contains: + const: qcom,sm8150-dp + "^dsi@[0-9a-f]+$": type: object additionalProperties: true
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Li Frank.Li@nxp.com
[ Upstream commit c6ddd6e7b166532a0816825442ff60f70aed9647 ]
The actual clock show wrong frequency:
echo on >/sys/devices/platform/bus@5b000000/5b010000.mmc/power/control cat /sys/kernel/debug/mmc0/ios
clock: 200000000 Hz actual clock: 166000000 Hz ^^^^^^^^^ .....
According to
sdhc0_lpcg: clock-controller@5b200000 { compatible = "fsl,imx8qxp-lpcg"; reg = <0x5b200000 0x10000>; #clock-cells = <1>; clocks = <&clk IMX_SC_R_SDHC_0 IMX_SC_PM_CLK_PER>, <&conn_ipg_clk>, <&conn_axi_clk>; clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_4>, <IMX_LPCG_CLK_5>; clock-output-names = "sdhc0_lpcg_per_clk", "sdhc0_lpcg_ipg_clk", "sdhc0_lpcg_ahb_clk"; power-domains = <&pd IMX_SC_R_SDHC_0>; }
"per_clk" should be IMX_LPCG_CLK_0 instead of IMX_LPCG_CLK_5.
After correct clocks order:
echo on >/sys/devices/platform/bus@5b000000/5b010000.mmc/power/control cat /sys/kernel/debug/mmc0/ios
clock: 200000000 Hz actual clock: 198000000 Hz ^^^^^^^^ ...
Fixes: 16c4ea7501b1 ("arm64: dts: imx8: switch to new lpcg clock binding") Signed-off-by: Frank Li Frank.Li@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi b/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi index 3c42240e78e24..af2259e997967 100644 --- a/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi @@ -67,8 +67,8 @@ usdhc1: mmc@5b010000 { interrupts = <GIC_SPI 232 IRQ_TYPE_LEVEL_HIGH>; reg = <0x5b010000 0x10000>; clocks = <&sdhc0_lpcg IMX_LPCG_CLK_4>, - <&sdhc0_lpcg IMX_LPCG_CLK_0>, - <&sdhc0_lpcg IMX_LPCG_CLK_5>; + <&sdhc0_lpcg IMX_LPCG_CLK_5>, + <&sdhc0_lpcg IMX_LPCG_CLK_0>; clock-names = "ipg", "ahb", "per"; power-domains = <&pd IMX_SC_R_SDHC_0>; status = "disabled"; @@ -78,8 +78,8 @@ usdhc2: mmc@5b020000 { interrupts = <GIC_SPI 233 IRQ_TYPE_LEVEL_HIGH>; reg = <0x5b020000 0x10000>; clocks = <&sdhc1_lpcg IMX_LPCG_CLK_4>, - <&sdhc1_lpcg IMX_LPCG_CLK_0>, - <&sdhc1_lpcg IMX_LPCG_CLK_5>; + <&sdhc1_lpcg IMX_LPCG_CLK_5>, + <&sdhc1_lpcg IMX_LPCG_CLK_0>; clock-names = "ipg", "ahb", "per"; power-domains = <&pd IMX_SC_R_SDHC_1>; fsl,tuning-start-tap = <20>; @@ -91,8 +91,8 @@ usdhc3: mmc@5b030000 { interrupts = <GIC_SPI 234 IRQ_TYPE_LEVEL_HIGH>; reg = <0x5b030000 0x10000>; clocks = <&sdhc2_lpcg IMX_LPCG_CLK_4>, - <&sdhc2_lpcg IMX_LPCG_CLK_0>, - <&sdhc2_lpcg IMX_LPCG_CLK_5>; + <&sdhc2_lpcg IMX_LPCG_CLK_5>, + <&sdhc2_lpcg IMX_LPCG_CLK_0>; clock-names = "ipg", "ahb", "per"; power-domains = <&pd IMX_SC_R_SDHC_2>; status = "disabled";
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kwangjin Ko kwangjin.ko@sk.com
[ Upstream commit f7c52345ccc96343c0a05bdea3121c8ac7b67d5f ]
Since mbox_cmd.size_out is overwritten with the actual output size in the function below, it needs to be initialized every time.
cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd
Problem scenario:
1) The size_out variable is initially set to the size of the mailbox. 2) Read an event. - size_out is set to 160 bytes(header 32B + one event 128B). - Two event are created while reading. 3) Read the new *two* events. - size_out is still set to 160 bytes. - Although the value of out_len is 288 bytes, only 160 bytes are copied from the mailbox register to the local variable. - record_count is set to 2. - Accessing records[1] will result in reading incorrect data.
Fixes: 6ebe28f9ec72 ("cxl/mem: Read, trace, and clear events on driver load") Tested-by: Ira Weiny ira.weiny@intel.com Reviewed-by: Ira Weiny ira.weiny@intel.com Reviewed-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Kwangjin Ko kwangjin.ko@sk.com Signed-off-by: Dave Jiang dave.jiang@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cxl/core/mbox.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index 50146161887d5..f0f54aeccc872 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -958,13 +958,14 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds, .payload_in = &log_type, .size_in = sizeof(log_type), .payload_out = payload, - .size_out = mds->payload_size, .min_out = struct_size(payload, records, 0), };
do { int rc, i;
+ mbox_cmd.size_out = mds->payload_size; + rc = cxl_internal_send_cmd(mds, &mbox_cmd); if (rc) { dev_err_ratelimited(dev,
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Constantino dreaming.about.electric.sheep@gmail.com
[ Upstream commit 07ed11afb68d94eadd4ffc082b97c2331307c5ea ]
This reverts commit 5a838e5d5825c85556011478abde708251cc0776.
Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would result in a '[TTM] Buffer eviction failed' exception whenever it reached a timeout. Due to a dependency to DMA_FENCE_WARN this also restores some code deleted by commit d72277b6c37d ("dma-buf: nuke DMA_FENCE_TRACE macros v2").
Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") Link: https://lore.kernel.org/regressions/ZTgydqRlK6WX_b29@eldamar.lan/ Reported-by: Timo Lindfors timo.lindfors@iki.fi Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054514 Signed-off-by: Alex Constantino dreaming.about.electric.sheep@gmail.com Signed-off-by: Maxime Ripard mripard@kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20240404181448.1643-2-dreaming... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/qxl/qxl_release.c | 50 +++++++++++++++++++++++++++---- include/linux/dma-fence.h | 7 +++++ 2 files changed, 52 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c index 368d26da0d6a2..9febc8b73f09e 100644 --- a/drivers/gpu/drm/qxl/qxl_release.c +++ b/drivers/gpu/drm/qxl/qxl_release.c @@ -58,16 +58,56 @@ static long qxl_fence_wait(struct dma_fence *fence, bool intr, signed long timeout) { struct qxl_device *qdev; + struct qxl_release *release; + int count = 0, sc = 0; + bool have_drawable_releases; unsigned long cur, end = jiffies + timeout;
qdev = container_of(fence->lock, struct qxl_device, release_lock); + release = container_of(fence, struct qxl_release, base); + have_drawable_releases = release->type == QXL_RELEASE_DRAWABLE;
- if (!wait_event_timeout(qdev->release_event, - (dma_fence_is_signaled(fence) || - (qxl_io_notify_oom(qdev), 0)), - timeout)) - return 0; +retry: + sc++; + + if (dma_fence_is_signaled(fence)) + goto signaled; + + qxl_io_notify_oom(qdev); + + for (count = 0; count < 11; count++) { + if (!qxl_queue_garbage_collect(qdev, true)) + break; + + if (dma_fence_is_signaled(fence)) + goto signaled; + } + + if (dma_fence_is_signaled(fence)) + goto signaled; + + if (have_drawable_releases || sc < 4) { + if (sc > 2) + /* back off */ + usleep_range(500, 1000); + + if (time_after(jiffies, end)) + return 0; + + if (have_drawable_releases && sc > 300) { + DMA_FENCE_WARN(fence, + "failed to wait on release %llu after spincount %d\n", + fence->context & ~0xf0000000, sc); + goto signaled; + } + goto retry; + } + /* + * yeah, original sync_obj_wait gave up after 3 spins when + * have_drawable_releases is not set. + */
+signaled: cur = jiffies; if (time_after(cur, end)) return 0; diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index e06bad467f55e..c3f9bb6602ba2 100644 --- a/include/linux/dma-fence.h +++ b/include/linux/dma-fence.h @@ -682,4 +682,11 @@ static inline bool dma_fence_is_container(struct dma_fence *fence) return dma_fence_is_array(fence) || dma_fence_is_chain(fence); }
+#define DMA_FENCE_WARN(f, fmt, args...) \ + do { \ + struct dma_fence *__ff = (f); \ + pr_warn("f %llu#%llu: " fmt, __ff->context, __ff->seqno,\ + ##args); \ + } while (0) + #endif /* __LINUX_DMA_FENCE_H */
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit 185fdb4697cc9684a02f2fab0530ecdd0c2f15d4 ]
Calling a function through an incompatible pointer type causes breaks kcfi, so clang warns about the assignment:
drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadowof.c:73:10: error: cast from 'void (*)(const void *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict] 73 | .fini = (void(*)(void *))kfree,
Avoid this with a trivial wrapper.
Fixes: c39f472e9f14 ("drm/nouveau: remove symlinks, move core/ to nvkm/ (no code changes)") Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Danilo Krummrich dakr@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20240404160234.2923554-1-arnd@... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadowof.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadowof.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadowof.c index 4bf486b571013..cb05f7f48a98b 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadowof.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadowof.c @@ -66,11 +66,16 @@ of_init(struct nvkm_bios *bios, const char *name) return ERR_PTR(-EINVAL); }
+static void of_fini(void *p) +{ + kfree(p); +} + const struct nvbios_source nvbios_of = { .name = "OpenFirmware", .init = of_init, - .fini = (void(*)(void *))kfree, + .fini = of_fini, .read = of_read, .size = of_size, .rw = false,
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luca Weiss luca.weiss@fairphone.com
[ Upstream commit 9dc23cba0927d09cb481da064c8413eb9df42e2b ]
The default highest_bank_bit of 15 didn't seem to cause issues so far but downstream defines it to be 14. But similar to [0] leaving it on 14 (or 15 for that matter) causes some corruption issues with some resolutions with DisplayPort, like 1920x1200.
So set it to 13 for now so that there's no screen corruption.
[0] commit 6a0dbcd20ef2 ("drm/msm/a6xx: set highest_bank_bit to 13 for a610")
Fixes: b7616b5c69e6 ("drm/msm/adreno: Add A619 support") Signed-off-by: Luca Weiss luca.weiss@fairphone.com Patchwork: https://patchwork.freedesktop.org/patch/585215/ Signed-off-by: Rob Clark robdclark@chromium.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c index fd60e49b8ec4d..792a4c60a20c2 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c @@ -1295,6 +1295,10 @@ static void a6xx_calc_ubwc_config(struct adreno_gpu *gpu) if (adreno_is_a618(gpu)) gpu->ubwc_config.highest_bank_bit = 14;
+ if (adreno_is_a619(gpu)) + /* TODO: Should be 14 but causes corruption at e.g. 1920x1200 on DP */ + gpu->ubwc_config.highest_bank_bit = 13; + if (adreno_is_a619_holi(gpu)) gpu->ubwc_config.highest_bank_bit = 13;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xiang Chen chenxiang66@hisilicon.com
[ Upstream commit 0098c55e0881f0b32591f2110410d5c8b7f9bd5a ]
We found that the second parameter of function ata_wait_after_reset() is incorrectly used. We call smp_ata_check_ready_type() to poll the device type until the 30s timeout, so the correct deadline should be (jiffies + 30000).
Fixes: 3c2673a09cf1 ("scsi: hisi_sas: Fix SATA devices missing issue during I_T nexus reset") Co-developed-by: xiabing xiabing12@h-partners.com Signed-off-by: xiabing xiabing12@h-partners.com Co-developed-by: Yihang Li liyihang9@huawei.com Signed-off-by: Yihang Li liyihang9@huawei.com Signed-off-by: Xiang Chen chenxiang66@hisilicon.com Link: https://lore.kernel.org/r/20240402035513.2024241-3-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/hisi_sas/hisi_sas_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c index 1abc62b07d24c..05c38e43f140a 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_main.c +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c @@ -1792,7 +1792,7 @@ static int hisi_sas_debug_I_T_nexus_reset(struct domain_device *device) if (dev_is_sata(device)) { struct ata_link *link = &device->sata_dev.ap->link;
- rc = ata_wait_after_reset(link, HISI_SAS_WAIT_PHYUP_TIMEOUT, + rc = ata_wait_after_reset(link, jiffies + HISI_SAS_WAIT_PHYUP_TIMEOUT, smp_ata_check_ready_type); } else { msleep(2000);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit 4406e4176f47177f5e51b4cc7e6a7a2ff3dbfbbd ]
The app_reply->elem[] array is allocated earlier in this function and it has app_req.num_ports elements. Thus this > comparison needs to be >= to prevent memory corruption.
Fixes: 7878f22a2e03 ("scsi: qla2xxx: edif: Add getfcinfo and statistic bsgs") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Link: https://lore.kernel.org/r/5c125b2f-92dd-412b-9b6f-fc3a3207bd60@moroto.mounta... Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/qla2xxx/qla_edif.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/qla2xxx/qla_edif.c b/drivers/scsi/qla2xxx/qla_edif.c index 26e6b3e3af431..dcde55c8ee5de 100644 --- a/drivers/scsi/qla2xxx/qla_edif.c +++ b/drivers/scsi/qla2xxx/qla_edif.c @@ -1100,7 +1100,7 @@ qla_edif_app_getstats(scsi_qla_host_t *vha, struct bsg_job *bsg_job)
list_for_each_entry_safe(fcport, tf, &vha->vp_fcports, list) { if (fcport->edif.enable) { - if (pcnt > app_req.num_ports) + if (pcnt >= app_req.num_ports) break;
app_reply->elem[pcnt].rekey_count =
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilya Maximets i.maximets@ovn.org
[ Upstream commit 4539f91f2a801c0c028c252bffae56030cfb2cae ]
On startup, ovs-vswitchd probes different datapath features including support for timeout policies. While probing, it tries to execute certain operations with OVS_PACKET_ATTR_PROBE or OVS_FLOW_ATTR_PROBE attributes set. These attributes tell the openvswitch module to not log any errors when they occur as it is expected that some of the probes will fail.
For some reason, setting the timeout policy ignores the PROBE attribute and logs a failure anyway. This is causing the following kernel log on each re-start of ovs-vswitchd:
kernel: Failed to associated timeout policy `ovs_test_tp'
Fix that by using the same logging macro that all other messages are using. The message will still be printed at info level when needed and will be rate limited, but with a net rate limiter instead of generic printk one.
The nf_ct_set_timeout() itself will still print some info messages, but at least this change makes logging in openvswitch module more consistent.
Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action") Signed-off-by: Ilya Maximets i.maximets@ovn.org Acked-by: Eelco Chaudron echaudro@redhat.com Link: https://lore.kernel.org/r/20240403203803.2137962-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/openvswitch/conntrack.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 3019a4406ca4f..74b63cdb59923 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -1380,8 +1380,9 @@ int ovs_ct_copy_action(struct net *net, const struct nlattr *attr, if (ct_info.timeout[0]) { if (nf_ct_set_timeout(net, ct_info.ct, family, key->ip.proto, ct_info.timeout)) - pr_info_ratelimited("Failed to associated timeout " - "policy `%s'\n", ct_info.timeout); + OVS_NLERR(log, + "Failed to associated timeout policy '%s'", + ct_info.timeout); else ct_info.nf_ct_timeout = rcu_dereference( nf_ct_timeout_find(ct_info.ct)->timeout);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Petr Tesarik petr@tesarici.cz
[ Upstream commit 38a15d0a50e0a43778561a5861403851f0b0194c ]
Fix bogus lockdep warnings if multiple u64_stats_sync variables are initialized in the same file.
With CONFIG_LOCKDEP, seqcount_init() is a macro which declares:
static struct lock_class_key __key;
Since u64_stats_init() is a function (albeit an inline one), all calls within the same file end up using the same instance, effectively treating them all as a single lock-class.
Fixes: 9464ca650008 ("net: make u64_stats_init() a function") Closes: https://lore.kernel.org/netdev/ea1567d9-ce66-45e6-8168-ac40a47d1821@roeck-us... Signed-off-by: Petr Tesarik petr@tesarici.cz Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20240404075740.30682-1-petr@tesarici.cz Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/u64_stats_sync.h | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h index ffe48e69b3f3a..457879938fc19 100644 --- a/include/linux/u64_stats_sync.h +++ b/include/linux/u64_stats_sync.h @@ -135,10 +135,11 @@ static inline void u64_stats_inc(u64_stats_t *p) p->v++; }
-static inline void u64_stats_init(struct u64_stats_sync *syncp) -{ - seqcount_init(&syncp->seq); -} +#define u64_stats_init(syncp) \ + do { \ + struct u64_stats_sync *__s = (syncp); \ + seqcount_init(&__s->seq); \ + } while (0)
static inline void __u64_stats_update_begin(struct u64_stats_sync *syncp) {
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 237f3cf13b20db183d3706d997eedc3c49eacd44 ]
syzbot reported an illegal copy in xsk_setsockopt() [1]
Make sure to validate setsockopt() @optlen parameter.
[1]
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline] BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline] BUG: KASAN: slab-out-of-bounds in xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420 Read of size 4 at addr ffff888028c6cde3 by task syz-executor.0/7549
CPU: 0 PID: 7549 Comm: syz-executor.0 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 copy_from_sockptr_offset include/linux/sockptr.h:49 [inline] copy_from_sockptr include/linux/sockptr.h:55 [inline] xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420 do_sock_setsockopt+0x3af/0x720 net/socket.c:2311 __sys_setsockopt+0x1ae/0x250 net/socket.c:2334 __do_sys_setsockopt net/socket.c:2343 [inline] __se_sys_setsockopt net/socket.c:2340 [inline] __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340 do_syscall_64+0xfb/0x240 entry_SYSCALL_64_after_hwframe+0x6d/0x75 RIP: 0033:0x7fb40587de69 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fb40665a0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 00007fb4059abf80 RCX: 00007fb40587de69 RDX: 0000000000000005 RSI: 000000000000011b RDI: 0000000000000006 RBP: 00007fb4058ca47a R08: 0000000000000002 R09: 0000000000000000 R10: 0000000020001980 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007fb4059abf80 R15: 00007fff57ee4d08 </TASK>
Allocated by task 7549: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 poison_kmalloc_redzone mm/kasan/common.c:370 [inline] __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387 kasan_kmalloc include/linux/kasan.h:211 [inline] __do_kmalloc_node mm/slub.c:3966 [inline] __kmalloc+0x233/0x4a0 mm/slub.c:3979 kmalloc include/linux/slab.h:632 [inline] __cgroup_bpf_run_filter_setsockopt+0xd2f/0x1040 kernel/bpf/cgroup.c:1869 do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293 __sys_setsockopt+0x1ae/0x250 net/socket.c:2334 __do_sys_setsockopt net/socket.c:2343 [inline] __se_sys_setsockopt net/socket.c:2340 [inline] __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340 do_syscall_64+0xfb/0x240 entry_SYSCALL_64_after_hwframe+0x6d/0x75
The buggy address belongs to the object at ffff888028c6cde0 which belongs to the cache kmalloc-8 of size 8 The buggy address is located 1 bytes to the right of allocated 2-byte region [ffff888028c6cde0, ffff888028c6cde2)
The buggy address belongs to the physical page: page:ffffea0000a31b00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888028c6c9c0 pfn:0x28c6c anon flags: 0xfff00000000800(slab|node=0|zone=1|lastcpupid=0x7ff) page_type: 0xffffffff() raw: 00fff00000000800 ffff888014c41280 0000000000000000 dead000000000001 raw: ffff888028c6c9c0 0000000080800057 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112cc0(GFP_USER|__GFP_NOWARN|__GFP_NORETRY), pid 6648, tgid 6644 (syz-executor.0), ts 133906047828, free_ts 133859922223 set_page_owner include/linux/page_owner.h:31 [inline] post_alloc_hook+0x1ea/0x210 mm/page_alloc.c:1533 prep_new_page mm/page_alloc.c:1540 [inline] get_page_from_freelist+0x33ea/0x3580 mm/page_alloc.c:3311 __alloc_pages+0x256/0x680 mm/page_alloc.c:4569 __alloc_pages_node include/linux/gfp.h:238 [inline] alloc_pages_node include/linux/gfp.h:261 [inline] alloc_slab_page+0x5f/0x160 mm/slub.c:2175 allocate_slab mm/slub.c:2338 [inline] new_slab+0x84/0x2f0 mm/slub.c:2391 ___slab_alloc+0xc73/0x1260 mm/slub.c:3525 __slab_alloc mm/slub.c:3610 [inline] __slab_alloc_node mm/slub.c:3663 [inline] slab_alloc_node mm/slub.c:3835 [inline] __do_kmalloc_node mm/slub.c:3965 [inline] __kmalloc_node+0x2db/0x4e0 mm/slub.c:3973 kmalloc_node include/linux/slab.h:648 [inline] __vmalloc_area_node mm/vmalloc.c:3197 [inline] __vmalloc_node_range+0x5f9/0x14a0 mm/vmalloc.c:3392 __vmalloc_node mm/vmalloc.c:3457 [inline] vzalloc+0x79/0x90 mm/vmalloc.c:3530 bpf_check+0x260/0x19010 kernel/bpf/verifier.c:21162 bpf_prog_load+0x1667/0x20f0 kernel/bpf/syscall.c:2895 __sys_bpf+0x4ee/0x810 kernel/bpf/syscall.c:5631 __do_sys_bpf kernel/bpf/syscall.c:5738 [inline] __se_sys_bpf kernel/bpf/syscall.c:5736 [inline] __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5736 do_syscall_64+0xfb/0x240 entry_SYSCALL_64_after_hwframe+0x6d/0x75 page last free pid 6650 tgid 6647 stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1140 [inline] free_unref_page_prepare+0x95d/0xa80 mm/page_alloc.c:2346 free_unref_page_list+0x5a3/0x850 mm/page_alloc.c:2532 release_pages+0x2117/0x2400 mm/swap.c:1042 tlb_batch_pages_flush mm/mmu_gather.c:98 [inline] tlb_flush_mmu_free mm/mmu_gather.c:293 [inline] tlb_flush_mmu+0x34d/0x4e0 mm/mmu_gather.c:300 tlb_finish_mmu+0xd4/0x200 mm/mmu_gather.c:392 exit_mmap+0x4b6/0xd40 mm/mmap.c:3300 __mmput+0x115/0x3c0 kernel/fork.c:1345 exit_mm+0x220/0x310 kernel/exit.c:569 do_exit+0x99e/0x27e0 kernel/exit.c:865 do_group_exit+0x207/0x2c0 kernel/exit.c:1027 get_signal+0x176e/0x1850 kernel/signal.c:2907 arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:310 exit_to_user_mode_loop kernel/entry/common.c:105 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:201 [inline] syscall_exit_to_user_mode+0xc9/0x360 kernel/entry/common.c:212 do_syscall_64+0x10a/0x240 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x6d/0x75
Memory state around the buggy address: ffff888028c6cc80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc ffff888028c6cd00: fa fc fc fc fa fc fc fc 00 fc fc fc 06 fc fc fc
ffff888028c6cd80: fa fc fc fc fa fc fc fc fa fc fc fc 02 fc fc fc
^ ffff888028c6ce00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc ffff888028c6ce80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
Fixes: 423f38329d26 ("xsk: add umem fill queue support and mmap") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Cc: "Björn Töpel" bjorn@kernel.org Cc: Magnus Karlsson magnus.karlsson@intel.com Cc: Maciej Fijalkowski maciej.fijalkowski@intel.com Cc: Jonathan Lemon jonathan.lemon@gmail.com Acked-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/r/20240404202738.3634547-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/xdp/xsk.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index b78c0e095e221..7d1c0986f9bb3 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -1414,6 +1414,8 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname, struct xsk_queue **q; int entries;
+ if (optlen < sizeof(entries)) + return -EINVAL; if (copy_from_sockptr(&entries, optval, sizeof(entries))) return -EFAULT;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hariprasad Kelam hkelam@marvell.com
[ Upstream commit bccb798e07f8bb8b91212fe8ed1e421685449076 ]
Inorder to support shaping and scheduling, Upon class creation Netdev driver allocates trasmit schedulers.
The previous patch which added support for Round robin scheduling has a bug due to which driver is not freeing transmit schedulers post class deletion.
This patch fixes the same.
Fixes: 47a9656f168a ("octeontx2-pf: htb offload support for Round Robin scheduling") Signed-off-by: Hariprasad Kelam hkelam@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/octeontx2/nic/qos.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/qos.c b/drivers/net/ethernet/marvell/octeontx2/nic/qos.c index 1e77bbf5d22a1..1723e9912ae07 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/qos.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/qos.c @@ -382,6 +382,7 @@ static void otx2_qos_read_txschq_cfg_tl(struct otx2_qos_node *parent, otx2_qos_read_txschq_cfg_tl(node, cfg); cnt = cfg->static_node_pos[node->level]; cfg->schq_contig_list[node->level][cnt] = node->schq; + cfg->schq_index_used[node->level][cnt] = true; cfg->schq_contig[node->level]++; cfg->static_node_pos[node->level]++; otx2_qos_read_txschq_cfg_schq(node, cfg);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ming Lei ming.lei@redhat.com
[ Upstream commit 8b8ace080319a866f5dfe9da8e665ae51d971c54 ]
Multiple gendisk instances can allocated/added for single request queue in case of disk rebind. blkg may still stay in q->blkg_list when calling blkcg_init_disk() for rebind, then q->blkg_list becomes corrupted.
Fix the list corruption issue by:
- add blkg_init_queue() to initialize q->blkg_list & q->blkcg_mutex only - move calling blkg_init_queue() into blk_alloc_queue()
The list corruption should be started since commit f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()") which delays removing blkg from q->blkg_list into blkg_free_workfn().
Fixes: f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()") Fixes: 1059699f87eb ("block: move blkcg initialization/destroy into disk allocation/release handler") Cc: Yu Kuai yukuai3@huawei.com Cc: Tejun Heo tj@kernel.org Signed-off-by: Ming Lei ming.lei@redhat.com Reviewed-by: Yu Kuai yukuai3@huawei.com Link: https://lore.kernel.org/r/20240407125910.4053377-1-ming.lei@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-cgroup.c | 9 ++++++--- block/blk-cgroup.h | 2 ++ block/blk-core.c | 2 ++ 3 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index ff93c385ba5af..4529122e0cbdb 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -1409,6 +1409,12 @@ static int blkcg_css_online(struct cgroup_subsys_state *css) return 0; }
+void blkg_init_queue(struct request_queue *q) +{ + INIT_LIST_HEAD(&q->blkg_list); + mutex_init(&q->blkcg_mutex); +} + int blkcg_init_disk(struct gendisk *disk) { struct request_queue *q = disk->queue; @@ -1416,9 +1422,6 @@ int blkcg_init_disk(struct gendisk *disk) bool preloaded; int ret;
- INIT_LIST_HEAD(&q->blkg_list); - mutex_init(&q->blkcg_mutex); - new_blkg = blkg_alloc(&blkcg_root, disk, GFP_KERNEL); if (!new_blkg) return -ENOMEM; diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h index b927a4a0ad030..5b0bdc268ade9 100644 --- a/block/blk-cgroup.h +++ b/block/blk-cgroup.h @@ -188,6 +188,7 @@ struct blkcg_policy { extern struct blkcg blkcg_root; extern bool blkcg_debug_stats;
+void blkg_init_queue(struct request_queue *q); int blkcg_init_disk(struct gendisk *disk); void blkcg_exit_disk(struct gendisk *disk);
@@ -481,6 +482,7 @@ struct blkcg { };
static inline struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, void *key) { return NULL; } +static inline void blkg_init_queue(struct request_queue *q) { } static inline int blkcg_init_disk(struct gendisk *disk) { return 0; } static inline void blkcg_exit_disk(struct gendisk *disk) { } static inline int blkcg_policy_register(struct blkcg_policy *pol) { return 0; } diff --git a/block/blk-core.c b/block/blk-core.c index de771093b5268..99d684085719d 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -431,6 +431,8 @@ struct request_queue *blk_alloc_queue(int node_id) init_waitqueue_head(&q->mq_freeze_wq); mutex_init(&q->mq_freeze_lock);
+ blkg_init_queue(q); + /* * Init percpu_ref in atomic mode so that it's faster to shutdown. * See blk_register_queue() for details.
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit e9d47b7b31563a6524b9f64ea70ed0289cc4d9c4 ]
When CONFIG_NET is disabled, an extra warning shows up for this unused variable:
lib/checksum_kunit.c:218:18: error: 'expected_csum_ipv6_magic' defined but not used [-Werror=unused-const-variable=]
Replace the #ifdef with an IS_ENABLED() check that makes the compiler's dead-code-elimination take care of the link failure.
Fixes: f24a70106dc1 ("lib: checksum: Fix build with CONFIG_NET=n") Suggested-by: Christophe Leroy christophe.leroy@csgroup.eu Acked-by: Palmer Dabbelt palmer@rivosinc.com Acked-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Simon Horman horms@kernel.org Tested-by: Simon Horman horms@kernel.org # build-tested Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- lib/checksum_kunit.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c index bf70850035c76..404dba36bae38 100644 --- a/lib/checksum_kunit.c +++ b/lib/checksum_kunit.c @@ -594,13 +594,15 @@ static void test_ip_fast_csum(struct kunit *test)
static void test_csum_ipv6_magic(struct kunit *test) { -#if defined(CONFIG_NET) const struct in6_addr *saddr; const struct in6_addr *daddr; unsigned int len; unsigned char proto; __wsum csum;
+ if (!IS_ENABLED(CONFIG_NET)) + return; + const int daddr_offset = sizeof(struct in6_addr); const int len_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr); const int proto_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr) + @@ -618,7 +620,6 @@ static void test_csum_ipv6_magic(struct kunit *test) CHECK_EQ(to_sum16(expected_csum_ipv6_magic[i]), csum_ipv6_magic(saddr, daddr, len, proto, csum)); } -#endif /* !CONFIG_NET */ }
static struct kunit_case __refdata checksum_test_cases[] = {
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit d8a6213d70accb403b82924a1c229e733433a5ef ]
syzbot is able to trigger an uninit-value in geneve_xmit() [1]
Problem : While most ip tunnel helpers (like ip_tunnel_get_dsfield()) uses skb_protocol(skb, true), pskb_inet_may_pull() is only using skb->protocol.
If anything else than ETH_P_IPV6 or ETH_P_IP is found in skb->protocol, pskb_inet_may_pull() does nothing at all.
If a vlan tag was provided by the caller (af_packet in the syzbot case), the network header might not point to the correct location, and skb linear part could be smaller than expected.
Add skb_vlan_inet_prepare() to perform a complete mac validation.
Use this in geneve for the moment, I suspect we need to adopt this more broadly.
v4 - Jakub reported v3 broke l2_tos_ttl_inherit.sh selftest - Only call __vlan_get_protocol() for vlan types. Link: https://lore.kernel.org/netdev/20240404100035.3270a7d5@kernel.org/
v2,v3 - Addressed Sabrina comments on v1 and v2 Link: https://lore.kernel.org/netdev/Zg1l9L2BNoZWZDZG@hog/
[1]
BUG: KMSAN: uninit-value in geneve_xmit_skb drivers/net/geneve.c:910 [inline] BUG: KMSAN: uninit-value in geneve_xmit+0x302d/0x5420 drivers/net/geneve.c:1030 geneve_xmit_skb drivers/net/geneve.c:910 [inline] geneve_xmit+0x302d/0x5420 drivers/net/geneve.c:1030 __netdev_start_xmit include/linux/netdevice.h:4903 [inline] netdev_start_xmit include/linux/netdevice.h:4917 [inline] xmit_one net/core/dev.c:3531 [inline] dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3547 __dev_queue_xmit+0x348d/0x52c0 net/core/dev.c:4335 dev_queue_xmit include/linux/netdevice.h:3091 [inline] packet_xmit+0x9c/0x6c0 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3081 [inline] packet_sendmsg+0x8bb0/0x9ef0 net/packet/af_packet.c:3113 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 __sys_sendto+0x685/0x830 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x125/0x1d0 net/socket.c:2199 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75
Uninit was created at: slab_post_alloc_hook mm/slub.c:3804 [inline] slab_alloc_node mm/slub.c:3845 [inline] kmem_cache_alloc_node+0x613/0xc50 mm/slub.c:3888 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:577 __alloc_skb+0x35b/0x7a0 net/core/skbuff.c:668 alloc_skb include/linux/skbuff.h:1318 [inline] alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6504 sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2795 packet_alloc_skb net/packet/af_packet.c:2930 [inline] packet_snd net/packet/af_packet.c:3024 [inline] packet_sendmsg+0x722d/0x9ef0 net/packet/af_packet.c:3113 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 __sys_sendto+0x685/0x830 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x125/0x1d0 net/socket.c:2199 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75
CPU: 0 PID: 5033 Comm: syz-executor346 Not tainted 6.9.0-rc1-syzkaller-00005-g928a87efa423 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
Fixes: d13f048dd40e ("net: geneve: modify IP header check in geneve6_xmit_skb and geneve_xmit_skb") Reported-by: syzbot+9ee20ec1de7b3168db09@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/000000000000d19c3a06152f9ee4@google.com/ Signed-off-by: Eric Dumazet edumazet@google.com Cc: Phillip Potter phil@philpotter.co.uk Cc: Sabrina Dubroca sd@queasysnail.net Reviewed-by: Sabrina Dubroca sd@queasysnail.net Reviewed-by: Phillip Potter phil@philpotter.co.uk Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/geneve.c | 4 ++-- include/net/ip_tunnels.h | 33 +++++++++++++++++++++++++++++++++ 2 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index 097a8db0d1d99..7f00fca0c538c 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -830,7 +830,7 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev, __be16 sport; int err;
- if (!pskb_inet_may_pull(skb)) + if (!skb_vlan_inet_prepare(skb)) return -EINVAL;
if (!gs4) @@ -937,7 +937,7 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev, __be16 sport; int err;
- if (!pskb_inet_may_pull(skb)) + if (!skb_vlan_inet_prepare(skb)) return -EINVAL;
if (!gs6) diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h index 2d746f4c9a0a4..6690939f241a4 100644 --- a/include/net/ip_tunnels.h +++ b/include/net/ip_tunnels.h @@ -360,6 +360,39 @@ static inline bool pskb_inet_may_pull(struct sk_buff *skb) return pskb_network_may_pull(skb, nhlen); }
+/* Variant of pskb_inet_may_pull(). + */ +static inline bool skb_vlan_inet_prepare(struct sk_buff *skb) +{ + int nhlen = 0, maclen = ETH_HLEN; + __be16 type = skb->protocol; + + /* Essentially this is skb_protocol(skb, true) + * And we get MAC len. + */ + if (eth_type_vlan(type)) + type = __vlan_get_protocol(skb, type, &maclen); + + switch (type) { +#if IS_ENABLED(CONFIG_IPV6) + case htons(ETH_P_IPV6): + nhlen = sizeof(struct ipv6hdr); + break; +#endif + case htons(ETH_P_IP): + nhlen = sizeof(struct iphdr); + break; + } + /* For ETH_P_IPV6/ETH_P_IP we make sure to pull + * a base network header in skb->head. + */ + if (!pskb_may_pull(skb, maclen + nhlen)) + return false; + + skb_set_network_header(skb, maclen); + return true; +} + static inline int ip_encap_hlen(struct ip_tunnel_encap *e) { const struct ip_tunnel_encap_ops *ops;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gerd Bayer gbayer@linux.ibm.com
[ Upstream commit 58effa3476536215530c9ec4910ffc981613b413 ]
Since [1], dma_alloc_coherent() does not accept requests for GFP_COMP anymore, even on archs that may be able to fulfill this. Functionality that relied on the receive buffer being a compound page broke at that point: The SMC-D protocol, that utilizes the ism device driver, passes receive buffers to the splice processor in a struct splice_pipe_desc with a single entry list of struct pages. As the buffer is no longer a compound page, the splice processor now rejects requests to handle more than a page worth of data.
Replace dma_alloc_coherent() and allocate a buffer with folio_alloc and create a DMA map for it with dma_map_page(). Since only receive buffers on ISM devices use DMA, qualify the mapping as FROM_DEVICE. Since ISM devices are available on arch s390, only and on that arch all DMA is coherent, there is no need to introduce and export some kind of dma_sync_to_cpu() method to be called by the SMC-D protocol layer.
Analogously, replace dma_free_coherent by a two step dma_unmap_page, then folio_put to free the receive buffer.
[1] https://lore.kernel.org/all/20221113163535.884299-1-hch@lst.de/
Fixes: c08004eede4b ("s390/ism: don't pass bogus GFP_ flags to dma_alloc_coherent") Signed-off-by: Gerd Bayer gbayer@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/s390/net/ism_drv.c | 38 +++++++++++++++++++++++++++++--------- 1 file changed, 29 insertions(+), 9 deletions(-)
diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c index 2c8e964425dc3..affb05521e146 100644 --- a/drivers/s390/net/ism_drv.c +++ b/drivers/s390/net/ism_drv.c @@ -14,6 +14,8 @@ #include <linux/err.h> #include <linux/ctype.h> #include <linux/processor.h> +#include <linux/dma-mapping.h> +#include <linux/mm.h>
#include "ism.h"
@@ -292,13 +294,15 @@ static int ism_read_local_gid(struct ism_dev *ism) static void ism_free_dmb(struct ism_dev *ism, struct ism_dmb *dmb) { clear_bit(dmb->sba_idx, ism->sba_bitmap); - dma_free_coherent(&ism->pdev->dev, dmb->dmb_len, - dmb->cpu_addr, dmb->dma_addr); + dma_unmap_page(&ism->pdev->dev, dmb->dma_addr, dmb->dmb_len, + DMA_FROM_DEVICE); + folio_put(virt_to_folio(dmb->cpu_addr)); }
static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb) { unsigned long bit; + int rc;
if (PAGE_ALIGN(dmb->dmb_len) > dma_get_max_seg_size(&ism->pdev->dev)) return -EINVAL; @@ -315,14 +319,30 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb) test_and_set_bit(dmb->sba_idx, ism->sba_bitmap)) return -EINVAL;
- dmb->cpu_addr = dma_alloc_coherent(&ism->pdev->dev, dmb->dmb_len, - &dmb->dma_addr, - GFP_KERNEL | __GFP_NOWARN | - __GFP_NOMEMALLOC | __GFP_NORETRY); - if (!dmb->cpu_addr) - clear_bit(dmb->sba_idx, ism->sba_bitmap); + dmb->cpu_addr = + folio_address(folio_alloc(GFP_KERNEL | __GFP_NOWARN | + __GFP_NOMEMALLOC | __GFP_NORETRY, + get_order(dmb->dmb_len)));
- return dmb->cpu_addr ? 0 : -ENOMEM; + if (!dmb->cpu_addr) { + rc = -ENOMEM; + goto out_bit; + } + dmb->dma_addr = dma_map_page(&ism->pdev->dev, + virt_to_page(dmb->cpu_addr), 0, + dmb->dmb_len, DMA_FROM_DEVICE); + if (dma_mapping_error(&ism->pdev->dev, dmb->dma_addr)) { + rc = -ENOMEM; + goto out_free; + } + + return 0; + +out_free: + kfree(dmb->cpu_addr); +out_bit: + clear_bit(dmb->sba_idx, ism->sba_bitmap); + return rc; }
int ism_register_dmb(struct ism_dev *ism, struct ism_dmb *dmb,
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vikas Gupta vikas.gupta@broadcom.com
[ Upstream commit 7ac10c7d728d75bc9daaa8fade3c7a3273b9a9ff ]
If ulp = kzalloc() fails, the allocated edev will leak because it is not properly assigned and the cleanup path will not be able to free it. Fix it by assigning it properly immediately after allocation.
Fixes: 303432211324 ("bnxt_en: Remove runtime interrupt vector allocation") Reviewed-by: Andy Gospodarek andrew.gospodarek@broadcom.com Signed-off-by: Vikas Gupta vikas.gupta@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c index 93f9bd55020f2..a5f9c9090a6b0 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c @@ -392,12 +392,13 @@ void bnxt_rdma_aux_device_init(struct bnxt *bp) if (!edev) goto aux_dev_uninit;
+ aux_priv->edev = edev; + ulp = kzalloc(sizeof(*ulp), GFP_KERNEL); if (!ulp) goto aux_dev_uninit;
edev->ulp_tbl = ulp; - aux_priv->edev = edev; bp->edev = edev; bnxt_set_edev_info(edev, bp);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vikas Gupta vikas.gupta@broadcom.com
[ Upstream commit b5ea7d33ba2a42b95b4298d08d2af9cdeeaf0090 ]
Since runtime MSIXs vector allocation/free has been removed, the L2 driver needs to repopulate the MSIX entries for the ulp client as the irq table may change during the recovery process.
Fixes: 303432211324 ("bnxt_en: Remove runtime interrupt vector allocation") Reviewed-by: Andy Gospodarek andrew.gospodarek@broadcom.com Signed-off-by: Vikas Gupta vikas.gupta@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c index a5f9c9090a6b0..195c02dc06830 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c @@ -210,6 +210,9 @@ void bnxt_ulp_start(struct bnxt *bp, int err) if (err) return;
+ if (edev->ulp_tbl->msix_requested) + bnxt_fill_msix_vecs(bp, edev->msix_entries); + if (aux_priv) { struct auxiliary_device *adev;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavan Chebbi pavan.chebbi@broadcom.com
[ Upstream commit faa12ca245585379d612736a4b5e98e88481ea59 ]
It is possible that during error recovery and firmware reset, there is a pending TX PTP packet waiting for the timestamp. We need to reset this condition so that after recovery, the tx_avail count for PTP is reset back to the initial value. Otherwise, we may not accept any PTP TX timestamps after recovery.
Fixes: 118612d519d8 ("bnxt_en: Add PTP clock APIs, ioctls, and ethtool methods") Reviewed-by: Kalesh AP kalesh-anakkur.purayil@broadcom.com Signed-off-by: Pavan Chebbi pavan.chebbi@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 39845d556bafc..5e6e32d708e24 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -11526,6 +11526,8 @@ static int __bnxt_open_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init) /* VF-reps may need to be re-opened after the PF is re-opened */ if (BNXT_PF(bp)) bnxt_vf_reps_open(bp); + if (bp->ptp_cfg) + atomic_set(&bp->ptp_cfg->tx_avail, BNXT_MAX_TX_TS); bnxt_ptp_init_rtc(bp, true); bnxt_ptp_cfg_tstamp_filters(bp); return 0;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Raag Jadav raag.jadav@intel.com
[ Upstream commit aca1a5287ea328fd1f7e2bfa6806646486d86a70 ]
Commit b2b32a173881 ("ACPI: bus: update acpi_dev_hid_uid_match() to support multiple types") added _UID matching support for both integer and string types, which satisfies NULL @uid2 argument for string types using inversion, but this logic prevents _UID comparision in case the argument is integer 0, which may result in false positives.
Fix this using _Generic(), which will allow NULL @uid2 argument for string types as well as _UID matching for all possible integer values.
Fixes: b2b32a173881 ("ACPI: bus: update acpi_dev_hid_uid_match() to support multiple types") Signed-off-by: Raag Jadav raag.jadav@intel.com [ rjw: Comment adjustment ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/acpi/acpi_bus.h | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index 446225aada50d..8b45b82cd5edc 100644 --- a/include/acpi/acpi_bus.h +++ b/include/acpi/acpi_bus.h @@ -911,17 +911,19 @@ static inline bool acpi_int_uid_match(struct acpi_device *adev, u64 uid2) * acpi_dev_hid_uid_match - Match device by supplied HID and UID * @adev: ACPI device to match. * @hid2: Hardware ID of the device. - * @uid2: Unique ID of the device, pass 0 or NULL to not check _UID. + * @uid2: Unique ID of the device, pass NULL to not check _UID. * * Matches HID and UID in @adev with given @hid2 and @uid2. Absence of @uid2 * will be treated as a match. If user wants to validate @uid2, it should be * done before calling this function. * - * Returns: %true if matches or @uid2 is 0 or NULL, %false otherwise. + * Returns: %true if matches or @uid2 is NULL, %false otherwise. */ #define acpi_dev_hid_uid_match(adev, hid2, uid2) \ (acpi_dev_hid_match(adev, hid2) && \ - (!(uid2) || acpi_dev_uid_match(adev, uid2))) + /* Distinguish integer 0 from NULL @uid2 */ \ + (_Generic(uid2, ACPI_STR_TYPES(!(uid2)), default: 0) || \ + acpi_dev_uid_match(adev, uid2)))
void acpi_dev_clear_dependencies(struct acpi_device *supplier); bool acpi_dev_ready_for_enumeration(const struct acpi_device *device);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Jiang dave.jiang@intel.com
[ Upstream commit 11270e526276ffad4c4237acb393da82a3287487 ]
Both generic node and HMAT handling code have been using magic numbers to indicate access classes for 'struct access_coordinate'. Introduce enums to enumerate the access0 and access1 classes shared by the two subsystems. Update the function parameters and callers as appropriate to utilize the new enum.
Access0 is named to ACCESS_COORDINATE_LOCAL in order to indicate that the access class is for 'struct access_coordinate' between a target node and the nearest initiator node.
Access1 is named to ACCESS_COORDINATE_CPU in order to indicate that the access class is for 'struct access_coordinate' between a target node and the nearest CPU node.
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Rafael J. Wysocki rafael@kernel.org Reviewed-by: Jonathan Cameron Jonathan.Cameron@huawei.com Tested-by: Jonathan Cameron Jonathan.Cameron@huawei.com Acked-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Dave Jiang dave.jiang@intel.com Link: https://lore.kernel.org/r/20240308220055.2172956-3-dave.jiang@intel.com Signed-off-by: Dan Williams dan.j.williams@intel.com Stable-dep-of: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/numa/hmat.c | 26 ++++++++++++++------------ drivers/base/node.c | 6 +++--- include/linux/node.h | 18 +++++++++++++++--- 3 files changed, 32 insertions(+), 18 deletions(-)
diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c index a26e7793ec4ef..e0144cfbf1f31 100644 --- a/drivers/acpi/numa/hmat.c +++ b/drivers/acpi/numa/hmat.c @@ -59,9 +59,7 @@ struct target_cache { };
enum { - NODE_ACCESS_CLASS_0 = 0, - NODE_ACCESS_CLASS_1, - NODE_ACCESS_CLASS_GENPORT_SINK, + NODE_ACCESS_CLASS_GENPORT_SINK = ACCESS_COORDINATE_MAX, NODE_ACCESS_CLASS_MAX, };
@@ -374,11 +372,11 @@ static __init void hmat_update_target(unsigned int tgt_pxm, unsigned int init_px
if (target && target->processor_pxm == init_pxm) { hmat_update_target_access(target, type, value, - NODE_ACCESS_CLASS_0); + ACCESS_COORDINATE_LOCAL); /* If the node has a CPU, update access 1 */ if (node_state(pxm_to_node(init_pxm), N_CPU)) hmat_update_target_access(target, type, value, - NODE_ACCESS_CLASS_1); + ACCESS_COORDINATE_CPU); } }
@@ -709,7 +707,8 @@ static void hmat_update_target_attrs(struct memory_target *target, */ if (target->processor_pxm != PXM_INVAL) { cpu_nid = pxm_to_node(target->processor_pxm); - if (access == 0 || node_state(cpu_nid, N_CPU)) { + if (access == ACCESS_COORDINATE_LOCAL || + node_state(cpu_nid, N_CPU)) { set_bit(target->processor_pxm, p_nodes); return; } @@ -737,7 +736,8 @@ static void hmat_update_target_attrs(struct memory_target *target, list_for_each_entry(initiator, &initiators, node) { u32 value;
- if (access == 1 && !initiator->has_cpu) { + if (access == ACCESS_COORDINATE_CPU && + !initiator->has_cpu) { clear_bit(initiator->processor_pxm, p_nodes); continue; } @@ -782,8 +782,10 @@ static void hmat_register_target_initiators(struct memory_target *target) { static DECLARE_BITMAP(p_nodes, MAX_NUMNODES);
- __hmat_register_target_initiators(target, p_nodes, 0); - __hmat_register_target_initiators(target, p_nodes, 1); + __hmat_register_target_initiators(target, p_nodes, + ACCESS_COORDINATE_LOCAL); + __hmat_register_target_initiators(target, p_nodes, + ACCESS_COORDINATE_CPU); }
static void hmat_register_target_cache(struct memory_target *target) @@ -854,8 +856,8 @@ static void hmat_register_target(struct memory_target *target) if (!target->registered) { hmat_register_target_initiators(target); hmat_register_target_cache(target); - hmat_register_target_perf(target, NODE_ACCESS_CLASS_0); - hmat_register_target_perf(target, NODE_ACCESS_CLASS_1); + hmat_register_target_perf(target, ACCESS_COORDINATE_LOCAL); + hmat_register_target_perf(target, ACCESS_COORDINATE_CPU); target->registered = true; } mutex_unlock(&target_lock); @@ -927,7 +929,7 @@ static int hmat_calculate_adistance(struct notifier_block *self, return NOTIFY_OK;
mutex_lock(&target_lock); - hmat_update_target_attrs(target, p_nodes, 1); + hmat_update_target_attrs(target, p_nodes, ACCESS_COORDINATE_CPU); mutex_unlock(&target_lock);
perf = &target->coord[1]; diff --git a/drivers/base/node.c b/drivers/base/node.c index 1c05640461dd1..a73b0c9a401ad 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -126,7 +126,7 @@ static void node_access_release(struct device *dev) }
static struct node_access_nodes *node_init_node_access(struct node *node, - unsigned int access) + enum access_coordinate_class access) { struct node_access_nodes *access_node; struct device *dev; @@ -191,7 +191,7 @@ static struct attribute *access_attrs[] = { * @access: The access class the for the given attributes */ void node_set_perf_attrs(unsigned int nid, struct access_coordinate *coord, - unsigned int access) + enum access_coordinate_class access) { struct node_access_nodes *c; struct node *node; @@ -689,7 +689,7 @@ int register_cpu_under_node(unsigned int cpu, unsigned int nid) */ int register_memory_node_under_compute_node(unsigned int mem_nid, unsigned int cpu_nid, - unsigned int access) + enum access_coordinate_class access) { struct node *init_node, *targ_node; struct node_access_nodes *initiator, *target; diff --git a/include/linux/node.h b/include/linux/node.h index 25b66d705ee2e..dfc004e4bee74 100644 --- a/include/linux/node.h +++ b/include/linux/node.h @@ -34,6 +34,18 @@ struct access_coordinate { unsigned int write_latency; };
+/* + * ACCESS_COORDINATE_LOCAL correlates to ACCESS CLASS 0 + * - access_coordinate between target node and nearest initiator node + * ACCESS_COORDINATE_CPU correlates to ACCESS CLASS 1 + * - access_coordinate between target node and nearest CPU node + */ +enum access_coordinate_class { + ACCESS_COORDINATE_LOCAL, + ACCESS_COORDINATE_CPU, + ACCESS_COORDINATE_MAX +}; + enum cache_indexing { NODE_CACHE_DIRECT_MAP, NODE_CACHE_INDEXED, @@ -66,7 +78,7 @@ struct node_cache_attrs { #ifdef CONFIG_HMEM_REPORTING void node_add_cache(unsigned int nid, struct node_cache_attrs *cache_attrs); void node_set_perf_attrs(unsigned int nid, struct access_coordinate *coord, - unsigned access); + enum access_coordinate_class access); #else static inline void node_add_cache(unsigned int nid, struct node_cache_attrs *cache_attrs) @@ -75,7 +87,7 @@ static inline void node_add_cache(unsigned int nid,
static inline void node_set_perf_attrs(unsigned int nid, struct access_coordinate *coord, - unsigned access) + enum access_coordinate_class access) { } #endif @@ -137,7 +149,7 @@ extern void unregister_memory_block_under_nodes(struct memory_block *mem_blk);
extern int register_memory_node_under_compute_node(unsigned int mem_nid, unsigned int cpu_nid, - unsigned access); + enum access_coordinate_class access); #else static inline void node_dev_init(void) {
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Jiang dave.jiang@intel.com
[ Upstream commit 1745a7b364dfd339ab2696b7d51d7ed950ed2598 ]
In order to compute access0 and access1 classes for CXL memory, 2 levels of generic port information must be stored. Access0 will indicate the generic port access coordinates to the closest initiator and access1 will indicate the generic port access coordinates to the cloest CPU.
Cc: Rafael J. Wysocki rafael@kernel.org Reviewed-by: Jonathan Cameron Jonathan.Cameron@huawei.com Tested-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Dave Jiang dave.jiang@intel.com Link: https://lore.kernel.org/r/20240308220055.2172956-4-dave.jiang@intel.com Signed-off-by: Dan Williams dan.j.williams@intel.com Stable-dep-of: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/numa/hmat.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c index e0144cfbf1f31..a1257888a6dfd 100644 --- a/drivers/acpi/numa/hmat.c +++ b/drivers/acpi/numa/hmat.c @@ -59,7 +59,8 @@ struct target_cache { };
enum { - NODE_ACCESS_CLASS_GENPORT_SINK = ACCESS_COORDINATE_MAX, + NODE_ACCESS_CLASS_GENPORT_SINK_LOCAL = ACCESS_COORDINATE_MAX, + NODE_ACCESS_CLASS_GENPORT_SINK_CPU, NODE_ACCESS_CLASS_MAX, };
@@ -141,7 +142,7 @@ int acpi_get_genport_coordinates(u32 uid, if (!target) return -ENOENT;
- *coord = target->coord[NODE_ACCESS_CLASS_GENPORT_SINK]; + *coord = target->coord[NODE_ACCESS_CLASS_GENPORT_SINK_LOCAL];
return 0; } @@ -695,7 +696,8 @@ static void hmat_update_target_attrs(struct memory_target *target, int i;
/* Don't update for generic port if there's no device handle */ - if (access == NODE_ACCESS_CLASS_GENPORT_SINK && + if ((access == NODE_ACCESS_CLASS_GENPORT_SINK_LOCAL || + access == NODE_ACCESS_CLASS_GENPORT_SINK_CPU) && !(*(u16 *)target->gen_port_device_handle)) return;
@@ -736,7 +738,8 @@ static void hmat_update_target_attrs(struct memory_target *target, list_for_each_entry(initiator, &initiators, node) { u32 value;
- if (access == ACCESS_COORDINATE_CPU && + if ((access == ACCESS_COORDINATE_CPU || + access == NODE_ACCESS_CLASS_GENPORT_SINK_CPU) && !initiator->has_cpu) { clear_bit(initiator->processor_pxm, p_nodes); continue; @@ -775,7 +778,9 @@ static void hmat_update_generic_target(struct memory_target *target) static DECLARE_BITMAP(p_nodes, MAX_NUMNODES);
hmat_update_target_attrs(target, p_nodes, - NODE_ACCESS_CLASS_GENPORT_SINK); + NODE_ACCESS_CLASS_GENPORT_SINK_LOCAL); + hmat_update_target_attrs(target, p_nodes, + NODE_ACCESS_CLASS_GENPORT_SINK_CPU); }
static void hmat_register_target_initiators(struct memory_target *target)
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Jiang dave.jiang@intel.com
[ Upstream commit bd98cbbbf82a3086423865816e1b5ab4bb4b6c60 ]
Update acpi_get_genport_coordinates() to allow retrieval of both access classes of the 'struct access_coordinate' for a generic target. The update will allow CXL code to compute access coordinates for both access class.
Cc: Rafael J. Wysocki rafael@kernel.org Reviewed-by: Jonathan Cameron Jonathan.Cameron@huawei.com Tested-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Dave Jiang dave.jiang@intel.com Link: https://lore.kernel.org/r/20240308220055.2172956-5-dave.jiang@intel.com Signed-off-by: Dan Williams dan.j.williams@intel.com Stable-dep-of: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/numa/hmat.c | 8 ++++++-- drivers/cxl/acpi.c | 8 +++++--- drivers/cxl/core/port.c | 2 +- drivers/cxl/cxl.h | 2 +- 4 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c index a1257888a6dfd..75e9aac43228a 100644 --- a/drivers/acpi/numa/hmat.c +++ b/drivers/acpi/numa/hmat.c @@ -126,7 +126,8 @@ static struct memory_target *acpi_find_genport_target(u32 uid) /** * acpi_get_genport_coordinates - Retrieve the access coordinates for a generic port * @uid: ACPI unique id - * @coord: The access coordinates written back out for the generic port + * @coord: The access coordinates written back out for the generic port. + * Expect 2 levels array. * * Return: 0 on success. Errno on failure. * @@ -142,7 +143,10 @@ int acpi_get_genport_coordinates(u32 uid, if (!target) return -ENOENT;
- *coord = target->coord[NODE_ACCESS_CLASS_GENPORT_SINK_LOCAL]; + coord[ACCESS_COORDINATE_LOCAL] = + target->coord[NODE_ACCESS_CLASS_GENPORT_SINK_LOCAL]; + coord[ACCESS_COORDINATE_CPU] = + target->coord[NODE_ACCESS_CLASS_GENPORT_SINK_CPU];
return 0; } diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c index 1a3e6aafbdcc3..af5cb818f84d6 100644 --- a/drivers/cxl/acpi.c +++ b/drivers/cxl/acpi.c @@ -530,13 +530,15 @@ static int get_genport_coordinates(struct device *dev, struct cxl_dport *dport) if (kstrtou32(acpi_device_uid(hb), 0, &uid)) return -EINVAL;
- rc = acpi_get_genport_coordinates(uid, &dport->hb_coord); + rc = acpi_get_genport_coordinates(uid, dport->hb_coord); if (rc < 0) return rc;
/* Adjust back to picoseconds from nanoseconds */ - dport->hb_coord.read_latency *= 1000; - dport->hb_coord.write_latency *= 1000; + for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) { + dport->hb_coord[i].read_latency *= 1000; + dport->hb_coord[i].write_latency *= 1000; + }
return 0; } diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index e59d9d37aa650..612bf7e1e8474 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -2152,7 +2152,7 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port, }
/* Augment with the generic port (host bridge) perf data */ - combine_coordinates(&c, &dport->hb_coord); + combine_coordinates(&c, &dport->hb_coord[ACCESS_COORDINATE_LOCAL]);
/* Get the calculated PCI paths bandwidth */ pdev = to_pci_dev(port->uport_dev->parent); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 003feebab79b5..fe7448f2745e1 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -671,7 +671,7 @@ struct cxl_dport { struct cxl_port *port; struct cxl_regs regs; struct access_coordinate sw_coord; - struct access_coordinate hb_coord; + struct access_coordinate hb_coord[ACCESS_COORDINATE_MAX]; long link_latency; };
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Jiang dave.jiang@intel.com
[ Upstream commit 032f7b37adff6985e22516053698b77131c2ce96 ]
Refactor the common code of combining coordinates in order to reduce code. Create a new function cxl_cooordinates_combine() it combine two 'struct access_coordinate'.
Reviewed-by: Jonathan Cameron Jonathan.Cameron@huawei.com Tested-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Dave Jiang dave.jiang@intel.com Link: https://lore.kernel.org/r/20240308220055.2172956-6-dave.jiang@intel.com Signed-off-by: Dan Williams dan.j.williams@intel.com Stable-dep-of: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cxl/core/cdat.c | 32 +++++++++++++++++++++++--------- drivers/cxl/core/port.c | 18 ++---------------- drivers/cxl/cxl.h | 4 ++++ 3 files changed, 29 insertions(+), 25 deletions(-)
diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c index 0363ca434ef45..4739a9d776a65 100644 --- a/drivers/cxl/core/cdat.c +++ b/drivers/cxl/core/cdat.c @@ -185,15 +185,7 @@ static int cxl_port_perf_data_calculate(struct cxl_port *port, xa_for_each(dsmas_xa, index, dent) { int qos_class;
- dent->coord.read_latency = dent->coord.read_latency + - c.read_latency; - dent->coord.write_latency = dent->coord.write_latency + - c.write_latency; - dent->coord.read_bandwidth = min_t(int, c.read_bandwidth, - dent->coord.read_bandwidth); - dent->coord.write_bandwidth = min_t(int, c.write_bandwidth, - dent->coord.write_bandwidth); - + cxl_coordinates_combine(&dent->coord, &dent->coord, &c); dent->entries = 1; rc = cxl_root->ops->qos_class(cxl_root, &dent->coord, 1, &qos_class); @@ -484,4 +476,26 @@ void cxl_switch_parse_cdat(struct cxl_port *port) } EXPORT_SYMBOL_NS_GPL(cxl_switch_parse_cdat, CXL);
+/** + * cxl_coordinates_combine - Combine the two input coordinates + * + * @out: Output coordinate of c1 and c2 combined + * @c1: input coordinates + * @c2: input coordinates + */ +void cxl_coordinates_combine(struct access_coordinate *out, + struct access_coordinate *c1, + struct access_coordinate *c2) +{ + if (c1->write_bandwidth && c2->write_bandwidth) + out->write_bandwidth = min(c1->write_bandwidth, + c2->write_bandwidth); + out->write_latency = c1->write_latency + c2->write_latency; + + if (c1->read_bandwidth && c2->read_bandwidth) + out->read_bandwidth = min(c1->read_bandwidth, + c2->read_bandwidth); + out->read_latency = c1->read_latency + c2->read_latency; +} + MODULE_IMPORT_NS(CXL); diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 612bf7e1e8474..af9458b2678cf 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -2096,20 +2096,6 @@ bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd) } EXPORT_SYMBOL_NS_GPL(schedule_cxl_memdev_detach, CXL);
-static void combine_coordinates(struct access_coordinate *c1, - struct access_coordinate *c2) -{ - if (c2->write_bandwidth) - c1->write_bandwidth = min(c1->write_bandwidth, - c2->write_bandwidth); - c1->write_latency += c2->write_latency; - - if (c2->read_bandwidth) - c1->read_bandwidth = min(c1->read_bandwidth, - c2->read_bandwidth); - c1->read_latency += c2->read_latency; -} - /** * cxl_endpoint_get_perf_coordinates - Retrieve performance numbers stored in dports * of CXL path @@ -2143,7 +2129,7 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port, * nothing to gather. */ while (iter && !is_cxl_root(to_cxl_port(iter->dev.parent))) { - combine_coordinates(&c, &dport->sw_coord); + cxl_coordinates_combine(&c, &c, &dport->sw_coord); c.write_latency += dport->link_latency; c.read_latency += dport->link_latency;
@@ -2152,7 +2138,7 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port, }
/* Augment with the generic port (host bridge) perf data */ - combine_coordinates(&c, &dport->hb_coord[ACCESS_COORDINATE_LOCAL]); + cxl_coordinates_combine(&c, &c, &dport->hb_coord[ACCESS_COORDINATE_LOCAL]);
/* Get the calculated PCI paths bandwidth */ pdev = to_pci_dev(port->uport_dev->parent); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index fe7448f2745e1..fab2da4b1f04e 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -882,6 +882,10 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
void cxl_memdev_update_perf(struct cxl_memdev *cxlmd);
+void cxl_coordinates_combine(struct access_coordinate *out, + struct access_coordinate *c1, + struct access_coordinate *c2); + /* * Unit test builds overrides this to __weak, find the 'strong' version * of these symbols in tools/testing/cxl/.
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Jiang dave.jiang@intel.com
[ Upstream commit 863027d40993f13155451bd898bfe4c4e9b7002f ]
The difference between access class 0 and access class 1 for 'struct access_coordinate', if any, is that class 0 is for the distance from the target to the closest initiator and that class 1 is for the distance from the target to the closest CPU. For CXL memory, the nearest initiator may not necessarily be a CPU node. The performance path from the CXL endpoint to the host bridge should remain the same. However, the numbers extracted and stored from HMAT is the difference for the two access classes. Split out the performance numbers for the host bridge (generic target) from the calculation of the entire path in order to allow calculation of both access classes for a CXL region.
Reviewed-by: Jonathan Cameron Jonathan.Cameron@huawei.com Tested-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Dave Jiang dave.jiang@intel.com Link: https://lore.kernel.org/r/20240308220055.2172956-7-dave.jiang@intel.com Signed-off-by: Dan Williams dan.j.williams@intel.com Stable-dep-of: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cxl/core/cdat.c | 28 ++++++++++++++++++++++------ drivers/cxl/core/port.c | 35 ++++++++++++++++++++++++++++++++--- drivers/cxl/cxl.h | 2 ++ 3 files changed, 56 insertions(+), 9 deletions(-)
diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c index 4739a9d776a65..fbf167f9d59d4 100644 --- a/drivers/cxl/core/cdat.c +++ b/drivers/cxl/core/cdat.c @@ -162,15 +162,22 @@ static int cxl_cdat_endpoint_process(struct cxl_port *port, static int cxl_port_perf_data_calculate(struct cxl_port *port, struct xarray *dsmas_xa) { - struct access_coordinate c; + struct access_coordinate ep_c; + struct access_coordinate coord[ACCESS_COORDINATE_MAX]; struct dsmas_entry *dent; int valid_entries = 0; unsigned long index; int rc;
- rc = cxl_endpoint_get_perf_coordinates(port, &c); + rc = cxl_endpoint_get_perf_coordinates(port, &ep_c); if (rc) { - dev_dbg(&port->dev, "Failed to retrieve perf coordinates.\n"); + dev_dbg(&port->dev, "Failed to retrieve ep perf coordinates.\n"); + return rc; + } + + rc = cxl_hb_get_perf_coordinates(port, coord); + if (rc) { + dev_dbg(&port->dev, "Failed to retrieve hb perf coordinates.\n"); return rc; }
@@ -185,10 +192,19 @@ static int cxl_port_perf_data_calculate(struct cxl_port *port, xa_for_each(dsmas_xa, index, dent) { int qos_class;
- cxl_coordinates_combine(&dent->coord, &dent->coord, &c); + cxl_coordinates_combine(&dent->coord, &dent->coord, &ep_c); + /* + * Keeping the host bridge coordinates separate from the dsmas + * coordinates in order to allow calculation of access class + * 0 and 1 for region later. + */ + cxl_coordinates_combine(&coord[ACCESS_COORDINATE_LOCAL], + &coord[ACCESS_COORDINATE_LOCAL], + &dent->coord); dent->entries = 1; - rc = cxl_root->ops->qos_class(cxl_root, &dent->coord, 1, - &qos_class); + rc = cxl_root->ops->qos_class(cxl_root, + &coord[ACCESS_COORDINATE_LOCAL], + 1, &qos_class); if (rc != 1) continue;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index af9458b2678cf..b2a2f6c34886d 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -2096,6 +2096,38 @@ bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd) } EXPORT_SYMBOL_NS_GPL(schedule_cxl_memdev_detach, CXL);
+/** + * cxl_hb_get_perf_coordinates - Retrieve performance numbers between initiator + * and host bridge + * + * @port: endpoint cxl_port + * @coord: output access coordinates + * + * Return: errno on failure, 0 on success. + */ +int cxl_hb_get_perf_coordinates(struct cxl_port *port, + struct access_coordinate *coord) +{ + struct cxl_port *iter = port; + struct cxl_dport *dport; + + if (!is_cxl_endpoint(port)) + return -EINVAL; + + dport = iter->parent_dport; + while (iter && !is_cxl_root(to_cxl_port(iter->dev.parent))) { + iter = to_cxl_port(iter->dev.parent); + dport = iter->parent_dport; + } + + coord[ACCESS_COORDINATE_LOCAL] = + dport->hb_coord[ACCESS_COORDINATE_LOCAL]; + coord[ACCESS_COORDINATE_CPU] = + dport->hb_coord[ACCESS_COORDINATE_CPU]; + + return 0; +} + /** * cxl_endpoint_get_perf_coordinates - Retrieve performance numbers stored in dports * of CXL path @@ -2137,9 +2169,6 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port, dport = iter->parent_dport; }
- /* Augment with the generic port (host bridge) perf data */ - cxl_coordinates_combine(&c, &c, &dport->hb_coord[ACCESS_COORDINATE_LOCAL]); - /* Get the calculated PCI paths bandwidth */ pdev = to_pci_dev(port->uport_dev->parent); bw = pcie_bandwidth_available(pdev, NULL, NULL, NULL); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index fab2da4b1f04e..de477eb7f5d54 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -879,6 +879,8 @@ void cxl_switch_parse_cdat(struct cxl_port *port);
int cxl_endpoint_get_perf_coordinates(struct cxl_port *port, struct access_coordinate *coord); +int cxl_hb_get_perf_coordinates(struct cxl_port *port, + struct access_coordinate *coord);
void cxl_memdev_update_perf(struct cxl_memdev *cxlmd);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Jiang dave.jiang@intel.com
[ Upstream commit 648dae58a830ecceea3b1bebf68432435980f137 ]
The while() loop in cxl_endpoint_get_perf_coordinates() checks to see if 'iter' is valid as part of the condition breaking out of the loop. is_cxl_root() will stop the loop before the next iteration could go NULL. Remove the iter check.
The presence of the iter or removing the iter does not impact the behavior of the code. This is a code clean up and not a bug fix.
Reviewed-by: Jonathan Cameron Jonathan.Cameron@huawei.com Reviewed-by: Davidlohr Bueso dave@stgolabs.net Reviewed-by: Dan Williams dan.j.williams@intel.com Link: https://lore.kernel.org/r/20240403154844.3403859-2-dave.jiang@intel.com Signed-off-by: Dave Jiang dave.jiang@intel.com Stable-dep-of: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cxl/core/port.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index b2a2f6c34886d..0332b431117db 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -2160,7 +2160,7 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port, * port each iteration. If the parent is cxl root then there is * nothing to gather. */ - while (iter && !is_cxl_root(to_cxl_port(iter->dev.parent))) { + while (!is_cxl_root(to_cxl_port(iter->dev.parent))) { cxl_coordinates_combine(&c, &c, &dport->sw_coord); c.write_latency += dport->link_latency; c.read_latency += dport->link_latency;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Jiang dave.jiang@intel.com
[ Upstream commit 592780b8391fe31f129ef4823c1513528f4dcb76 ]
Current loop in cxl_endpoint_get_perf_coordinates() incorrectly assumes the Root Port (RP) dport is the one with generic port access_coordinate. However those coordinates are one level up in the Host Bridge (HB). Current code causes the computation code to pick up 0s as the coordinates and cause minimal bandwidth to result in 0.
Add check to skip RP when combining coordinates.
Fixes: 14a6960b3e92 ("cxl: Add helper function that calculate performance data for downstream ports") Reported-by: Jonathan Cameron Jonathan.Cameron@huawei.com Reviewed-by: Jonathan Cameron Jonathan.Cameron@huawei.com Reviewed-by: Dan Williams dan.j.williams@intel.com Link: https://lore.kernel.org/r/20240403154844.3403859-3-dave.jiang@intel.com Signed-off-by: Dave Jiang dave.jiang@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cxl/core/port.c | 35 ++++++++++++++++++++++------------- 1 file changed, 22 insertions(+), 13 deletions(-)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 0332b431117db..4ae441ef32174 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -2128,6 +2128,11 @@ int cxl_hb_get_perf_coordinates(struct cxl_port *port, return 0; }
+static bool parent_port_is_cxl_root(struct cxl_port *port) +{ + return is_cxl_root(to_cxl_port(port->dev.parent)); +} + /** * cxl_endpoint_get_perf_coordinates - Retrieve performance numbers stored in dports * of CXL path @@ -2147,27 +2152,31 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port, struct cxl_dport *dport; struct pci_dev *pdev; unsigned int bw; + bool is_cxl_root;
if (!is_cxl_endpoint(port)) return -EINVAL;
- dport = iter->parent_dport; - /* - * Exit the loop when the parent port of the current port is cxl root. - * The iterative loop starts at the endpoint and gathers the - * latency of the CXL link from the current iter to the next downstream - * port each iteration. If the parent is cxl root then there is - * nothing to gather. + * Exit the loop when the parent port of the current iter port is cxl + * root. The iterative loop starts at the endpoint and gathers the + * latency of the CXL link from the current device/port to the connected + * downstream port each iteration. */ - while (!is_cxl_root(to_cxl_port(iter->dev.parent))) { - cxl_coordinates_combine(&c, &c, &dport->sw_coord); + do { + dport = iter->parent_dport; + iter = to_cxl_port(iter->dev.parent); + is_cxl_root = parent_port_is_cxl_root(iter); + + /* + * There's no valid access_coordinate for a root port since RPs do not + * have CDAT and therefore needs to be skipped. + */ + if (!is_cxl_root) + cxl_coordinates_combine(&c, &c, &dport->sw_coord); c.write_latency += dport->link_latency; c.read_latency += dport->link_latency; - - iter = to_cxl_port(iter->dev.parent); - dport = iter->parent_dport; - } + } while (!is_cxl_root);
/* Get the calculated PCI paths bandwidth */ pdev = to_pci_dev(port->uport_dev->parent);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marek Vasut marex@denx.de
[ Upstream commit f96f700449b6d190e06272f1cf732ae8e45b73df ]
Both ks8851_rx_skb_par() and ks8851_rx_skb_spi() call netif_rx(skb), inline the netif_rx(skb) call directly into ks8851_common.c and drop the .rx_skb callback and ks8851_rx_skb() wrapper. This removes one indirect call from the driver, no functional change otherwise.
Signed-off-by: Marek Vasut marex@denx.de Link: https://lore.kernel.org/r/20240405203204.82062-1-marex@denx.de Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: be0384bf599c ("net: ks8851: Handle softirqs at the end of IRQ thread to fix hang") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/micrel/ks8851.h | 3 --- drivers/net/ethernet/micrel/ks8851_common.c | 12 +----------- drivers/net/ethernet/micrel/ks8851_par.c | 11 ----------- drivers/net/ethernet/micrel/ks8851_spi.c | 11 ----------- 4 files changed, 1 insertion(+), 36 deletions(-)
diff --git a/drivers/net/ethernet/micrel/ks8851.h b/drivers/net/ethernet/micrel/ks8851.h index e5ec0a363aff8..31f75b4a67fd7 100644 --- a/drivers/net/ethernet/micrel/ks8851.h +++ b/drivers/net/ethernet/micrel/ks8851.h @@ -368,7 +368,6 @@ union ks8851_tx_hdr { * @rdfifo: FIFO read callback * @wrfifo: FIFO write callback * @start_xmit: start_xmit() implementation callback - * @rx_skb: rx_skb() implementation callback * @flush_tx_work: flush_tx_work() implementation callback * * The @statelock is used to protect information in the structure which may @@ -423,8 +422,6 @@ struct ks8851_net { struct sk_buff *txp, bool irq); netdev_tx_t (*start_xmit)(struct sk_buff *skb, struct net_device *dev); - void (*rx_skb)(struct ks8851_net *ks, - struct sk_buff *skb); void (*flush_tx_work)(struct ks8851_net *ks); };
diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c index 0bf13b38b8f5b..896d43bb8883d 100644 --- a/drivers/net/ethernet/micrel/ks8851_common.c +++ b/drivers/net/ethernet/micrel/ks8851_common.c @@ -231,16 +231,6 @@ static void ks8851_dbg_dumpkkt(struct ks8851_net *ks, u8 *rxpkt) rxpkt[12], rxpkt[13], rxpkt[14], rxpkt[15]); }
-/** - * ks8851_rx_skb - receive skbuff - * @ks: The device state. - * @skb: The skbuff - */ -static void ks8851_rx_skb(struct ks8851_net *ks, struct sk_buff *skb) -{ - ks->rx_skb(ks, skb); -} - /** * ks8851_rx_pkts - receive packets from the host * @ks: The device information. @@ -309,7 +299,7 @@ static void ks8851_rx_pkts(struct ks8851_net *ks) ks8851_dbg_dumpkkt(ks, rxpkt);
skb->protocol = eth_type_trans(skb, ks->netdev); - ks8851_rx_skb(ks, skb); + netif_rx(skb);
ks->netdev->stats.rx_packets++; ks->netdev->stats.rx_bytes += rxlen; diff --git a/drivers/net/ethernet/micrel/ks8851_par.c b/drivers/net/ethernet/micrel/ks8851_par.c index 2a7f298542670..381b9cd285ebd 100644 --- a/drivers/net/ethernet/micrel/ks8851_par.c +++ b/drivers/net/ethernet/micrel/ks8851_par.c @@ -210,16 +210,6 @@ static void ks8851_wrfifo_par(struct ks8851_net *ks, struct sk_buff *txp, iowrite16_rep(ksp->hw_addr, txp->data, len / 2); }
-/** - * ks8851_rx_skb_par - receive skbuff - * @ks: The device state. - * @skb: The skbuff - */ -static void ks8851_rx_skb_par(struct ks8851_net *ks, struct sk_buff *skb) -{ - netif_rx(skb); -} - static unsigned int ks8851_rdreg16_par_txqcr(struct ks8851_net *ks) { return ks8851_rdreg16_par(ks, KS_TXQCR); @@ -298,7 +288,6 @@ static int ks8851_probe_par(struct platform_device *pdev) ks->rdfifo = ks8851_rdfifo_par; ks->wrfifo = ks8851_wrfifo_par; ks->start_xmit = ks8851_start_xmit_par; - ks->rx_skb = ks8851_rx_skb_par;
#define STD_IRQ (IRQ_LCI | /* Link Change */ \ IRQ_RXI | /* RX done */ \ diff --git a/drivers/net/ethernet/micrel/ks8851_spi.c b/drivers/net/ethernet/micrel/ks8851_spi.c index 54f2eac11a631..55f6f9f6d030e 100644 --- a/drivers/net/ethernet/micrel/ks8851_spi.c +++ b/drivers/net/ethernet/micrel/ks8851_spi.c @@ -298,16 +298,6 @@ static unsigned int calc_txlen(unsigned int len) return ALIGN(len + 4, 4); }
-/** - * ks8851_rx_skb_spi - receive skbuff - * @ks: The device state - * @skb: The skbuff - */ -static void ks8851_rx_skb_spi(struct ks8851_net *ks, struct sk_buff *skb) -{ - netif_rx(skb); -} - /** * ks8851_tx_work - process tx packet(s) * @work: The work strucutre what was scheduled. @@ -435,7 +425,6 @@ static int ks8851_probe_spi(struct spi_device *spi) ks->rdfifo = ks8851_rdfifo_spi; ks->wrfifo = ks8851_wrfifo_spi; ks->start_xmit = ks8851_start_xmit_spi; - ks->rx_skb = ks8851_rx_skb_spi; ks->flush_tx_work = ks8851_flush_tx_work_spi;
#define STD_IRQ (IRQ_LCI | /* Link Change */ \
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marek Vasut marex@denx.de
[ Upstream commit be0384bf599cf1eb8d337517feeb732d71f75a6f ]
The ks8851_irq() thread may call ks8851_rx_pkts() in case there are any packets in the MAC FIFO, which calls netif_rx(). This netif_rx() implementation is guarded by local_bh_disable() and local_bh_enable(). The local_bh_enable() may call do_softirq() to run softirqs in case any are pending. One of the softirqs is net_rx_action, which ultimately reaches the driver .start_xmit callback. If that happens, the system hangs. The entire call chain is below:
ks8851_start_xmit_par from netdev_start_xmit netdev_start_xmit from dev_hard_start_xmit dev_hard_start_xmit from sch_direct_xmit sch_direct_xmit from __dev_queue_xmit __dev_queue_xmit from __neigh_update __neigh_update from neigh_update neigh_update from arp_process.constprop.0 arp_process.constprop.0 from __netif_receive_skb_one_core __netif_receive_skb_one_core from process_backlog process_backlog from __napi_poll.constprop.0 __napi_poll.constprop.0 from net_rx_action net_rx_action from __do_softirq __do_softirq from call_with_stack call_with_stack from do_softirq do_softirq from __local_bh_enable_ip __local_bh_enable_ip from netif_rx netif_rx from ks8851_irq ks8851_irq from irq_thread_fn irq_thread_fn from irq_thread irq_thread from kthread kthread from ret_from_fork
The hang happens because ks8851_irq() first locks a spinlock in ks8851_par.c ks8851_lock_par() spin_lock_irqsave(&ksp->lock, ...) and with that spinlock locked, calls netif_rx(). Once the execution reaches ks8851_start_xmit_par(), it calls ks8851_lock_par() again which attempts to claim the already locked spinlock again, and the hang happens.
Move the do_softirq() call outside of the spinlock protected section of ks8851_irq() by disabling BHs around the entire spinlock protected section of ks8851_irq() handler. Place local_bh_enable() outside of the spinlock protected section, so that it can trigger do_softirq() without the ks8851_par.c ks8851_lock_par() spinlock being held, and safely call ks8851_start_xmit_par() without attempting to lock the already locked spinlock.
Since ks8851_irq() is protected by local_bh_disable()/local_bh_enable() now, replace netif_rx() with __netif_rx() which is not duplicating the local_bh_disable()/local_bh_enable() calls.
Fixes: 797047f875b5 ("net: ks8851: Implement Parallel bus operations") Signed-off-by: Marek Vasut marex@denx.de Link: https://lore.kernel.org/r/20240405203204.82062-2-marex@denx.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/micrel/ks8851_common.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c index 896d43bb8883d..d4cdf3d4f5525 100644 --- a/drivers/net/ethernet/micrel/ks8851_common.c +++ b/drivers/net/ethernet/micrel/ks8851_common.c @@ -299,7 +299,7 @@ static void ks8851_rx_pkts(struct ks8851_net *ks) ks8851_dbg_dumpkkt(ks, rxpkt);
skb->protocol = eth_type_trans(skb, ks->netdev); - netif_rx(skb); + __netif_rx(skb);
ks->netdev->stats.rx_packets++; ks->netdev->stats.rx_bytes += rxlen; @@ -330,6 +330,8 @@ static irqreturn_t ks8851_irq(int irq, void *_ks) unsigned long flags; unsigned int status;
+ local_bh_disable(); + ks8851_lock(ks, &flags);
status = ks8851_rdreg16(ks, KS_ISR); @@ -406,6 +408,8 @@ static irqreturn_t ks8851_irq(int irq, void *_ks) if (status & IRQ_LCI) mii_check_link(&ks->mii);
+ local_bh_enable(); + return IRQ_HANDLED; }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit b46f4eaa4f0ec38909fb0072eea3aeddb32f954e ]
syzkaller started to report deadlock of unix_gc_lock after commit 4090fa373f0e ("af_unix: Replace garbage collection algorithm."), but it just uncovers the bug that has been there since commit 314001f0bf92 ("af_unix: Add OOB support").
The repro basically does the following.
from socket import * from array import array
c1, c2 = socketpair(AF_UNIX, SOCK_STREAM) c1.sendmsg([b'a'], [(SOL_SOCKET, SCM_RIGHTS, array("i", [c2.fileno()]))], MSG_OOB) c2.recv(1) # blocked as no normal data in recv queue
c2.close() # done async and unblock recv() c1.close() # done async and trigger GC
A socket sends its file descriptor to itself as OOB data and tries to receive normal data, but finally recv() fails due to async close().
The problem here is wrong handling of OOB skb in manage_oob(). When recvmsg() is called without MSG_OOB, manage_oob() is called to check if the peeked skb is OOB skb. In such a case, manage_oob() pops it out of the receive queue but does not clear unix_sock(sk)->oob_skb. This is wrong in terms of uAPI.
Let's say we send "hello" with MSG_OOB, and "world" without MSG_OOB. The 'o' is handled as OOB data. When recv() is called twice without MSG_OOB, the OOB data should be lost.
from socket import * c1, c2 = socketpair(AF_UNIX, SOCK_STREAM, 0) c1.send(b'hello', MSG_OOB) # 'o' is OOB data
5
c1.send(b'world')
5
c2.recv(5) # OOB data is not received
b'hell'
c2.recv(5) # OOB date is skipped
b'world'
c2.recv(5, MSG_OOB) # This should return an error
b'o'
In the same situation, TCP actually returns -EINVAL for the last recv().
Also, if we do not clear unix_sk(sk)->oob_skb, unix_poll() always set EPOLLPRI even though the data has passed through by previous recv().
To avoid these issues, we must clear unix_sk(sk)->oob_skb when dequeuing it from recv queue.
The reason why the old GC did not trigger the deadlock is because the old GC relied on the receive queue to detect the loop.
When it is triggered, the socket with OOB data is marked as GC candidate because file refcount == inflight count (1). However, after traversing all inflight sockets, the socket still has a positive inflight count (1), thus the socket is excluded from candidates. Then, the old GC lose the chance to garbage-collect the socket.
With the old GC, the repro continues to create true garbage that will never be freed nor detected by kmemleak as it's linked to the global inflight list. That's why we couldn't even notice the issue.
Fixes: 314001f0bf92 ("af_unix: Add OOB support") Reported-by: syzbot+7f7f201cc2668a8fd169@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7f7f201cc2668a8fd169 Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20240405221057.2406-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/af_unix.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 0748e7ea5210e..484874872fa6f 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -2604,7 +2604,9 @@ static struct sk_buff *manage_oob(struct sk_buff *skb, struct sock *sk, } } else if (!(flags & MSG_PEEK)) { skb_unlink(skb, &sk->sk_receive_queue); - consume_skb(skb); + WRITE_ONCE(u->oob_skb, NULL); + if (!WARN_ON_ONCE(skb_unref(skb))) + kfree_skb(skb); skb = skb_peek(&sk->sk_receive_queue); } }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geetha sowjanya gakula@marvell.com
[ Upstream commit faf23006185e777db18912685922c5ddb2df383f ]
NIX SQ mode and link backpressure configuration is required for all platforms. But in current driver this code is wrongly placed under specific platform check. This patch fixes the issue by moving the code out of platform check.
Fixes: 5d9b976d4480 ("octeontx2-af: Support fixed transmit scheduler topology") Signed-off-by: Geetha sowjanya gakula@marvell.com Link: https://lore.kernel.org/r/20240408063643.26288-1-gakula@marvell.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/marvell/octeontx2/af/rvu_nix.c | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c index 66203a90f052b..42db213fb69a6 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c @@ -4721,18 +4721,18 @@ static int rvu_nix_block_init(struct rvu *rvu, struct nix_hw *nix_hw) */ rvu_write64(rvu, blkaddr, NIX_AF_CFG, rvu_read64(rvu, blkaddr, NIX_AF_CFG) | 0x40ULL); + }
- /* Set chan/link to backpressure TL3 instead of TL2 */ - rvu_write64(rvu, blkaddr, NIX_AF_PSE_CHANNEL_LEVEL, 0x01); + /* Set chan/link to backpressure TL3 instead of TL2 */ + rvu_write64(rvu, blkaddr, NIX_AF_PSE_CHANNEL_LEVEL, 0x01);
- /* Disable SQ manager's sticky mode operation (set TM6 = 0) - * This sticky mode is known to cause SQ stalls when multiple - * SQs are mapped to same SMQ and transmitting pkts at a time. - */ - cfg = rvu_read64(rvu, blkaddr, NIX_AF_SQM_DBG_CTL_STATUS); - cfg &= ~BIT_ULL(15); - rvu_write64(rvu, blkaddr, NIX_AF_SQM_DBG_CTL_STATUS, cfg); - } + /* Disable SQ manager's sticky mode operation (set TM6 = 0) + * This sticky mode is known to cause SQ stalls when multiple + * SQs are mapped to same SMQ and transmitting pkts at a time. + */ + cfg = rvu_read64(rvu, blkaddr, NIX_AF_SQM_DBG_CTL_STATUS); + cfg &= ~BIT_ULL(15); + rvu_write64(rvu, blkaddr, NIX_AF_SQM_DBG_CTL_STATUS, cfg);
ltdefs = rvu->kpu.lt_def; /* Calibrate X2P bus to check if CGX/LBK links are fine */
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit 74043489fcb5e5ca4074133582b5b8011b67f9e7 ]
When CONFIG_IPV6_SUBTREES is disabled, the only user is hidden, causing a 'make W=1' warning:
net/ipv6/ip6_fib.c: In function 'fib6_add': net/ipv6/ip6_fib.c:1388:32: error: variable 'pn' set but not used [-Werror=unused-but-set-variable]
Add another #ifdef around the variable declaration, matching the other uses in this file.
Fixes: 66729e18df08 ("[IPV6] ROUTE: Make sure we have fn->leaf when adding a node on subtree.") Link: https://lore.kernel.org/netdev/20240322131746.904943-1-arnd@kernel.org/ Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20240408074219.3030256-1-arnd@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_fib.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 54294f6a8ec51..8184076a3924e 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -1375,7 +1375,10 @@ int fib6_add(struct fib6_node *root, struct fib6_info *rt, struct nl_info *info, struct netlink_ext_ack *extack) { struct fib6_table *table = rt->fib6_table; - struct fib6_node *fn, *pn = NULL; + struct fib6_node *fn; +#ifdef CONFIG_IPV6_SUBTREES + struct fib6_node *pn = NULL; +#endif int err = -ENOMEM; int allow_create = 1; int replace_required = 0; @@ -1399,9 +1402,9 @@ int fib6_add(struct fib6_node *root, struct fib6_info *rt, goto out; }
+#ifdef CONFIG_IPV6_SUBTREES pn = fn;
-#ifdef CONFIG_IPV6_SUBTREES if (rt->fib6_src.plen) { struct fib6_node *sn;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit cf1b7201df59fb936f40f4a807433fe3f2ce310a ]
The log_martians variable is only used in an #ifdef, causing a 'make W=1' warning with gcc:
net/ipv4/route.c: In function 'ip_rt_send_redirect': net/ipv4/route.c:880:13: error: variable 'log_martians' set but not used [-Werror=unused-but-set-variable]
Change the #ifdef to an equivalent IS_ENABLED() to let the compiler see where the variable is used.
Fixes: 30038fc61adf ("net: ip_rt_send_redirect() optimization") Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20240408074219.3030256-2-arnd@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/route.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 16615d107cf06..15c37c8113fc8 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -926,13 +926,11 @@ void ip_rt_send_redirect(struct sk_buff *skb) icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST, gw); peer->rate_last = jiffies; ++peer->n_redirects; -#ifdef CONFIG_IP_ROUTE_VERBOSE - if (log_martians && + if (IS_ENABLED(CONFIG_IP_ROUTE_VERBOSE) && log_martians && peer->n_redirects == ip_rt_redirect_number) net_warn_ratelimited("host %pI4/if%d ignores redirects for %pI4 to %pI4\n", &ip_hdr(skb)->saddr, inet_iif(skb), &ip_hdr(skb)->daddr, &gw); -#endif } out_put_peer: inet_putpeer(peer);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Benc jbenc@redhat.com
[ Upstream commit 7633c4da919ad51164acbf1aa322cc1a3ead6129 ]
Although ipv6_get_ifaddr walks inet6_addr_lst under the RCU lock, it still means hlist_for_each_entry_rcu can return an item that got removed from the list. The memory itself of such item is not freed thanks to RCU but nothing guarantees the actual content of the memory is sane.
In particular, the reference count can be zero. This can happen if ipv6_del_addr is called in parallel. ipv6_del_addr removes the entry from inet6_addr_lst (hlist_del_init_rcu(&ifp->addr_lst)) and drops all references (__in6_ifa_put(ifp) + in6_ifa_put(ifp)). With bad enough timing, this can happen:
1. In ipv6_get_ifaddr, hlist_for_each_entry_rcu returns an entry.
2. Then, the whole ipv6_del_addr is executed for the given entry. The reference count drops to zero and kfree_rcu is scheduled.
3. ipv6_get_ifaddr continues and tries to increments the reference count (in6_ifa_hold).
4. The rcu is unlocked and the entry is freed.
5. The freed entry is returned.
Prevent increasing of the reference count in such case. The name in6_ifa_hold_safe is chosen to mimic the existing fib6_info_hold_safe.
[ 41.506330] refcount_t: addition on 0; use-after-free. [ 41.506760] WARNING: CPU: 0 PID: 595 at lib/refcount.c:25 refcount_warn_saturate+0xa5/0x130 [ 41.507413] Modules linked in: veth bridge stp llc [ 41.507821] CPU: 0 PID: 595 Comm: python3 Not tainted 6.9.0-rc2.main-00208-g49563be82afa #14 [ 41.508479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) [ 41.509163] RIP: 0010:refcount_warn_saturate+0xa5/0x130 [ 41.509586] Code: ad ff 90 0f 0b 90 90 c3 cc cc cc cc 80 3d c0 30 ad 01 00 75 a0 c6 05 b7 30 ad 01 01 90 48 c7 c7 38 cc 7a 8c e8 cc 18 ad ff 90 <0f> 0b 90 90 c3 cc cc cc cc 80 3d 98 30 ad 01 00 0f 85 75 ff ff ff [ 41.510956] RSP: 0018:ffffbda3c026baf0 EFLAGS: 00010282 [ 41.511368] RAX: 0000000000000000 RBX: ffff9e9c46914800 RCX: 0000000000000000 [ 41.511910] RDX: ffff9e9c7ec29c00 RSI: ffff9e9c7ec1c900 RDI: ffff9e9c7ec1c900 [ 41.512445] RBP: ffff9e9c43660c9c R08: 0000000000009ffb R09: 00000000ffffdfff [ 41.512998] R10: 00000000ffffdfff R11: ffffffff8ca58a40 R12: ffff9e9c4339a000 [ 41.513534] R13: 0000000000000001 R14: ffff9e9c438a0000 R15: ffffbda3c026bb48 [ 41.514086] FS: 00007fbc4cda1740(0000) GS:ffff9e9c7ec00000(0000) knlGS:0000000000000000 [ 41.514726] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 41.515176] CR2: 000056233b337d88 CR3: 000000000376e006 CR4: 0000000000370ef0 [ 41.515713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 41.516252] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 41.516799] Call Trace: [ 41.517037] <TASK> [ 41.517249] ? __warn+0x7b/0x120 [ 41.517535] ? refcount_warn_saturate+0xa5/0x130 [ 41.517923] ? report_bug+0x164/0x190 [ 41.518240] ? handle_bug+0x3d/0x70 [ 41.518541] ? exc_invalid_op+0x17/0x70 [ 41.520972] ? asm_exc_invalid_op+0x1a/0x20 [ 41.521325] ? refcount_warn_saturate+0xa5/0x130 [ 41.521708] ipv6_get_ifaddr+0xda/0xe0 [ 41.522035] inet6_rtm_getaddr+0x342/0x3f0 [ 41.522376] ? __pfx_inet6_rtm_getaddr+0x10/0x10 [ 41.522758] rtnetlink_rcv_msg+0x334/0x3d0 [ 41.523102] ? netlink_unicast+0x30f/0x390 [ 41.523445] ? __pfx_rtnetlink_rcv_msg+0x10/0x10 [ 41.523832] netlink_rcv_skb+0x53/0x100 [ 41.524157] netlink_unicast+0x23b/0x390 [ 41.524484] netlink_sendmsg+0x1f2/0x440 [ 41.524826] __sys_sendto+0x1d8/0x1f0 [ 41.525145] __x64_sys_sendto+0x1f/0x30 [ 41.525467] do_syscall_64+0xa5/0x1b0 [ 41.525794] entry_SYSCALL_64_after_hwframe+0x72/0x7a [ 41.526213] RIP: 0033:0x7fbc4cfcea9a [ 41.526528] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 [ 41.527942] RSP: 002b:00007ffcf54012a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 41.528593] RAX: ffffffffffffffda RBX: 00007ffcf5401368 RCX: 00007fbc4cfcea9a [ 41.529173] RDX: 000000000000002c RSI: 00007fbc4b9d9bd0 RDI: 0000000000000005 [ 41.529786] RBP: 00007fbc4bafb040 R08: 00007ffcf54013e0 R09: 000000000000000c [ 41.530375] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 41.530977] R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007fbc4ca85d1b [ 41.531573] </TASK>
Fixes: 5c578aedcb21d ("IPv6: convert addrconf hash list to RCU") Reviewed-by: Eric Dumazet edumazet@google.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Jiri Benc jbenc@redhat.com Link: https://lore.kernel.org/r/8ab821e36073a4a406c50ec83c9e8dc586c539e4.171258580... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/addrconf.h | 4 ++++ net/ipv6/addrconf.c | 7 ++++--- 2 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/include/net/addrconf.h b/include/net/addrconf.h index 61ebe723ee4d5..facb7a469efad 100644 --- a/include/net/addrconf.h +++ b/include/net/addrconf.h @@ -437,6 +437,10 @@ static inline void in6_ifa_hold(struct inet6_ifaddr *ifp) refcount_inc(&ifp->refcnt); }
+static inline bool in6_ifa_hold_safe(struct inet6_ifaddr *ifp) +{ + return refcount_inc_not_zero(&ifp->refcnt); +}
/* * compute link-local solicited-node multicast address diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 055230b669cf2..37d48aa073c3c 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -2061,9 +2061,10 @@ struct inet6_ifaddr *ipv6_get_ifaddr(struct net *net, const struct in6_addr *add if (ipv6_addr_equal(&ifp->addr, addr)) { if (!dev || ifp->idev->dev == dev || !(ifp->scope&(IFA_LINK|IFA_HOST) || strict)) { - result = ifp; - in6_ifa_hold(ifp); - break; + if (in6_ifa_hold_safe(ifp)) { + result = ifp; + break; + } } } }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shannon Nelson shannon.nelson@amd.com
[ Upstream commit 2cbab3c296f1addd73b40549a2271b30f960df8b ]
We get the benefit of all the PCI reset locking and recovery if we use the existing pci_reset_function() that will call our local reset handlers.
Reviewed-by: Brett Creeley brett.creeley@amd.com Signed-off-by: Shannon Nelson shannon.nelson@amd.com Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: 81665adf25d2 ("pds_core: Fix pdsc_check_pci_health function to use work thread") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amd/pds_core/core.c | 3 +-- drivers/net/ethernet/amd/pds_core/core.h | 3 --- drivers/net/ethernet/amd/pds_core/main.c | 7 ++++--- 3 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/amd/pds_core/core.c b/drivers/net/ethernet/amd/pds_core/core.c index 7658a72867675..0d148795a8d09 100644 --- a/drivers/net/ethernet/amd/pds_core/core.c +++ b/drivers/net/ethernet/amd/pds_core/core.c @@ -609,8 +609,7 @@ static void pdsc_check_pci_health(struct pdsc *pdsc) if (fw_status != PDS_RC_BAD_PCI) return;
- pdsc_reset_prepare(pdsc->pdev); - pdsc_reset_done(pdsc->pdev); + pci_reset_function(pdsc->pdev); }
void pdsc_health_thread(struct work_struct *work) diff --git a/drivers/net/ethernet/amd/pds_core/core.h b/drivers/net/ethernet/amd/pds_core/core.h index 110c4b826b22d..f410f7d132056 100644 --- a/drivers/net/ethernet/amd/pds_core/core.h +++ b/drivers/net/ethernet/amd/pds_core/core.h @@ -283,9 +283,6 @@ int pdsc_devcmd_init(struct pdsc *pdsc); int pdsc_devcmd_reset(struct pdsc *pdsc); int pdsc_dev_init(struct pdsc *pdsc);
-void pdsc_reset_prepare(struct pci_dev *pdev); -void pdsc_reset_done(struct pci_dev *pdev); - int pdsc_intr_alloc(struct pdsc *pdsc, char *name, irq_handler_t handler, void *data); void pdsc_intr_free(struct pdsc *pdsc, int index); diff --git a/drivers/net/ethernet/amd/pds_core/main.c b/drivers/net/ethernet/amd/pds_core/main.c index 0050c5894563b..345b16127fe8b 100644 --- a/drivers/net/ethernet/amd/pds_core/main.c +++ b/drivers/net/ethernet/amd/pds_core/main.c @@ -468,7 +468,7 @@ static void pdsc_restart_health_thread(struct pdsc *pdsc) mod_timer(&pdsc->wdtimer, jiffies + 1); }
-void pdsc_reset_prepare(struct pci_dev *pdev) +static void pdsc_reset_prepare(struct pci_dev *pdev) { struct pdsc *pdsc = pci_get_drvdata(pdev);
@@ -477,10 +477,11 @@ void pdsc_reset_prepare(struct pci_dev *pdev)
pdsc_unmap_bars(pdsc); pci_release_regions(pdev); - pci_disable_device(pdev); + if (pci_is_enabled(pdev)) + pci_disable_device(pdev); }
-void pdsc_reset_done(struct pci_dev *pdev) +static void pdsc_reset_done(struct pci_dev *pdev) { struct pdsc *pdsc = pci_get_drvdata(pdev); struct device *dev = pdsc->dev;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Brett Creeley brett.creeley@amd.com
[ Upstream commit 81665adf25d28a00a986533f1d3a5df76b79cad9 ]
When the driver notices fw_status == 0xff it tries to perform a PCI reset on itself via pci_reset_function() in the context of the driver's health thread. However, pdsc_reset_prepare calls pdsc_stop_health_thread(), which attempts to stop/flush the health thread. This results in a deadlock because the stop/flush will never complete since the driver called pci_reset_function() from the health thread context. Fix by changing the pdsc_check_pci_health_function() to queue a newly introduced pdsc_pci_reset_thread() on the pdsc's work queue.
Unloading the driver in the fw_down/dead state uncovered another issue, which can be seen in the following trace:
WARNING: CPU: 51 PID: 6914 at kernel/workqueue.c:1450 __queue_work+0x358/0x440 [...] RIP: 0010:__queue_work+0x358/0x440 [...] Call Trace: <TASK> ? __warn+0x85/0x140 ? __queue_work+0x358/0x440 ? report_bug+0xfc/0x1e0 ? handle_bug+0x3f/0x70 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? __queue_work+0x358/0x440 queue_work_on+0x28/0x30 pdsc_devcmd_locked+0x96/0xe0 [pds_core] pdsc_devcmd_reset+0x71/0xb0 [pds_core] pdsc_teardown+0x51/0xe0 [pds_core] pdsc_remove+0x106/0x200 [pds_core] pci_device_remove+0x37/0xc0 device_release_driver_internal+0xae/0x140 driver_detach+0x48/0x90 bus_remove_driver+0x6d/0xf0 pci_unregister_driver+0x2e/0xa0 pdsc_cleanup_module+0x10/0x780 [pds_core] __x64_sys_delete_module+0x142/0x2b0 ? syscall_trace_enter.isra.18+0x126/0x1a0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7fbd9d03a14b [...]
Fix this by preventing the devcmd reset if the FW is not running.
Fixes: d9407ff11809 ("pds_core: Prevent health thread from running during reset/remove") Reviewed-by: Shannon Nelson shannon.nelson@amd.com Signed-off-by: Brett Creeley brett.creeley@amd.com Reviewed-by: Jacob Keller jacob.e.keller@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amd/pds_core/core.c | 13 ++++++++++++- drivers/net/ethernet/amd/pds_core/core.h | 2 ++ drivers/net/ethernet/amd/pds_core/dev.c | 3 +++ drivers/net/ethernet/amd/pds_core/main.c | 1 + 4 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/amd/pds_core/core.c b/drivers/net/ethernet/amd/pds_core/core.c index 0d148795a8d09..dd4f0965bbe64 100644 --- a/drivers/net/ethernet/amd/pds_core/core.c +++ b/drivers/net/ethernet/amd/pds_core/core.c @@ -595,6 +595,16 @@ void pdsc_fw_up(struct pdsc *pdsc) pdsc_teardown(pdsc, PDSC_TEARDOWN_RECOVERY); }
+void pdsc_pci_reset_thread(struct work_struct *work) +{ + struct pdsc *pdsc = container_of(work, struct pdsc, pci_reset_work); + struct pci_dev *pdev = pdsc->pdev; + + pci_dev_get(pdev); + pci_reset_function(pdev); + pci_dev_put(pdev); +} + static void pdsc_check_pci_health(struct pdsc *pdsc) { u8 fw_status; @@ -609,7 +619,8 @@ static void pdsc_check_pci_health(struct pdsc *pdsc) if (fw_status != PDS_RC_BAD_PCI) return;
- pci_reset_function(pdsc->pdev); + /* prevent deadlock between pdsc_reset_prepare and pdsc_health_thread */ + queue_work(pdsc->wq, &pdsc->pci_reset_work); }
void pdsc_health_thread(struct work_struct *work) diff --git a/drivers/net/ethernet/amd/pds_core/core.h b/drivers/net/ethernet/amd/pds_core/core.h index f410f7d132056..401ff56eba0dc 100644 --- a/drivers/net/ethernet/amd/pds_core/core.h +++ b/drivers/net/ethernet/amd/pds_core/core.h @@ -197,6 +197,7 @@ struct pdsc { struct pdsc_qcq notifyqcq; u64 last_eid; struct pdsc_viftype *viftype_status; + struct work_struct pci_reset_work; };
/** enum pds_core_dbell_bits - bitwise composition of dbell values. @@ -312,5 +313,6 @@ int pdsc_firmware_update(struct pdsc *pdsc, const struct firmware *fw,
void pdsc_fw_down(struct pdsc *pdsc); void pdsc_fw_up(struct pdsc *pdsc); +void pdsc_pci_reset_thread(struct work_struct *work);
#endif /* _PDSC_H_ */ diff --git a/drivers/net/ethernet/amd/pds_core/dev.c b/drivers/net/ethernet/amd/pds_core/dev.c index e65a1632df505..bfb79c5aac391 100644 --- a/drivers/net/ethernet/amd/pds_core/dev.c +++ b/drivers/net/ethernet/amd/pds_core/dev.c @@ -229,6 +229,9 @@ int pdsc_devcmd_reset(struct pdsc *pdsc) .reset.opcode = PDS_CORE_CMD_RESET, };
+ if (!pdsc_is_fw_running(pdsc)) + return 0; + return pdsc_devcmd(pdsc, &cmd, &comp, pdsc->devcmd_timeout); }
diff --git a/drivers/net/ethernet/amd/pds_core/main.c b/drivers/net/ethernet/amd/pds_core/main.c index 345b16127fe8b..a375d612d2875 100644 --- a/drivers/net/ethernet/amd/pds_core/main.c +++ b/drivers/net/ethernet/amd/pds_core/main.c @@ -238,6 +238,7 @@ static int pdsc_init_pf(struct pdsc *pdsc) snprintf(wq_name, sizeof(wq_name), "%s.%d", PDS_CORE_DRV_NAME, pdsc->uid); pdsc->wq = create_singlethread_workqueue(wq_name); INIT_WORK(&pdsc->health_work, pdsc_health_thread); + INIT_WORK(&pdsc->pci_reset_work, pdsc_pci_reset_thread); timer_setup(&pdsc->wdtimer, pdsc_wdtimer_cb, 0); pdsc->wdtimer_period = PDSC_WATCHDOG_SECS * HZ;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 42ed95de82c01184a88945d3ca274be6a7ea607d ]
This aligns broadcast sync_timeout with existing connection timeouts which are 20 seconds long.
Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Stable-dep-of: b37cab587aa3 ("Bluetooth: ISO: Don't reject BT_ISO_QOS if parameters are unset") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/bluetooth/bluetooth.h | 2 ++ net/bluetooth/iso.c | 4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h index 7ffa8c192c3f2..9fe95a22abeb7 100644 --- a/include/net/bluetooth/bluetooth.h +++ b/include/net/bluetooth/bluetooth.h @@ -164,6 +164,8 @@ struct bt_voice { #define BT_ISO_QOS_BIG_UNSET 0xff #define BT_ISO_QOS_BIS_UNSET 0xff
+#define BT_ISO_SYNC_TIMEOUT 0x07d0 /* 20 secs */ + struct bt_iso_io_qos { __u32 interval; __u16 latency; diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c index 04f6572d35f17..4fa1f3b779a71 100644 --- a/net/bluetooth/iso.c +++ b/net/bluetooth/iso.c @@ -837,10 +837,10 @@ static struct bt_iso_qos default_qos = { .bcode = {0x00}, .options = 0x00, .skip = 0x0000, - .sync_timeout = 0x4000, + .sync_timeout = BT_ISO_SYNC_TIMEOUT, .sync_cte_type = 0x00, .mse = 0x00, - .timeout = 0x4000, + .timeout = BT_ISO_SYNC_TIMEOUT, }, };
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit b37cab587aa3c9ab29c6b10aa55627dad713011f ]
Consider certain values (0x00) as unset and load proper default if an application has not set them properly.
Fixes: 0fe8c8d07134 ("Bluetooth: Split bt_iso_qos into dedicated structures") Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/iso.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c index 4fa1f3b779a71..3681e3673654a 100644 --- a/net/bluetooth/iso.c +++ b/net/bluetooth/iso.c @@ -1430,8 +1430,8 @@ static bool check_ucast_qos(struct bt_iso_qos *qos)
static bool check_bcast_qos(struct bt_iso_qos *qos) { - if (qos->bcast.sync_factor == 0x00) - return false; + if (!qos->bcast.sync_factor) + qos->bcast.sync_factor = 0x01;
if (qos->bcast.packing > 0x01) return false; @@ -1454,6 +1454,9 @@ static bool check_bcast_qos(struct bt_iso_qos *qos) if (qos->bcast.skip > 0x01f3) return false;
+ if (!qos->bcast.sync_timeout) + qos->bcast.sync_timeout = BT_ISO_SYNC_TIMEOUT; + if (qos->bcast.sync_timeout < 0x000a || qos->bcast.sync_timeout > 0x4000) return false;
@@ -1463,6 +1466,9 @@ static bool check_bcast_qos(struct bt_iso_qos *qos) if (qos->bcast.mse > 0x1f) return false;
+ if (!qos->bcast.timeout) + qos->bcast.sync_timeout = BT_ISO_SYNC_TIMEOUT; + if (qos->bcast.timeout < 0x000a || qos->bcast.timeout > 0x4000) return false;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 22cbf4f84c00da64196eb15034feee868e63eef0 ]
This used the hci_conn QoS to determine which PHY to scan when creating a PA Sync.
Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Stable-dep-of: 53cb4197e63a ("Bluetooth: hci_sync: Fix using the same interval and window for Coded PHY") Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_sync.c | 66 +++++++++++++++++++++++++++++++++------- 1 file changed, 55 insertions(+), 11 deletions(-)
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c index 824ce03bb361b..89bd1c1a3e0e8 100644 --- a/net/bluetooth/hci_sync.c +++ b/net/bluetooth/hci_sync.c @@ -2611,6 +2611,14 @@ static u8 hci_update_accept_list_sync(struct hci_dev *hdev) return filter_policy; }
+static void hci_le_scan_phy_params(struct hci_cp_le_scan_phy_params *cp, + u8 type, u16 interval, u16 window) +{ + cp->type = type; + cp->interval = cpu_to_le16(interval); + cp->window = cpu_to_le16(window); +} + static int hci_le_set_ext_scan_param_sync(struct hci_dev *hdev, u8 type, u16 interval, u16 window, u8 own_addr_type, u8 filter_policy) @@ -2618,7 +2626,7 @@ static int hci_le_set_ext_scan_param_sync(struct hci_dev *hdev, u8 type, struct hci_cp_le_set_ext_scan_params *cp; struct hci_cp_le_scan_phy_params *phy; u8 data[sizeof(*cp) + sizeof(*phy) * 2]; - u8 num_phy = 0; + u8 num_phy = 0x00;
cp = (void *)data; phy = (void *)cp->data; @@ -2628,28 +2636,64 @@ static int hci_le_set_ext_scan_param_sync(struct hci_dev *hdev, u8 type, cp->own_addr_type = own_addr_type; cp->filter_policy = filter_policy;
+ /* Check if PA Sync is in progress then select the PHY based on the + * hci_conn.iso_qos. + */ + if (hci_dev_test_flag(hdev, HCI_PA_SYNC)) { + struct hci_cp_le_add_to_accept_list *sent; + + sent = hci_sent_cmd_data(hdev, HCI_OP_LE_ADD_TO_ACCEPT_LIST); + if (sent) { + struct hci_conn *conn; + + conn = hci_conn_hash_lookup_ba(hdev, ISO_LINK, + &sent->bdaddr); + if (conn) { + struct bt_iso_qos *qos = &conn->iso_qos; + + if (qos->bcast.in.phy & BT_ISO_PHY_1M || + qos->bcast.in.phy & BT_ISO_PHY_2M) { + cp->scanning_phys |= LE_SCAN_PHY_1M; + hci_le_scan_phy_params(phy, type, + interval, + window); + num_phy++; + phy++; + } + + if (qos->bcast.in.phy & BT_ISO_PHY_CODED) { + cp->scanning_phys |= LE_SCAN_PHY_CODED; + hci_le_scan_phy_params(phy, type, + interval, + window); + num_phy++; + phy++; + } + + if (num_phy) + goto done; + } + } + } + if (scan_1m(hdev) || scan_2m(hdev)) { cp->scanning_phys |= LE_SCAN_PHY_1M; - - phy->type = type; - phy->interval = cpu_to_le16(interval); - phy->window = cpu_to_le16(window); - + hci_le_scan_phy_params(phy, type, interval, window); num_phy++; phy++; }
if (scan_coded(hdev)) { cp->scanning_phys |= LE_SCAN_PHY_CODED; - - phy->type = type; - phy->interval = cpu_to_le16(interval); - phy->window = cpu_to_le16(window); - + hci_le_scan_phy_params(phy, type, interval, window); num_phy++; phy++; }
+done: + if (!num_phy) + return -EINVAL; + return __hci_cmd_sync_status(hdev, HCI_OP_LE_SET_EXT_SCAN_PARAMS, sizeof(*cp) + sizeof(*phy) * num_phy, data, HCI_CMD_TIMEOUT);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 53cb4197e63ab2363aa28c3029061e4d516e7626 ]
Coded PHY recommended intervals are 3 time bigger than the 1M PHY so this aligns with that by multiplying by 3 the values given to 1M PHY since the code already used recommended values for that.
Fixes: 288c90224eec ("Bluetooth: Enable all supported LE PHY by default") Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_sync.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c index 89bd1c1a3e0e8..e1050d7d21a59 100644 --- a/net/bluetooth/hci_sync.c +++ b/net/bluetooth/hci_sync.c @@ -2664,8 +2664,8 @@ static int hci_le_set_ext_scan_param_sync(struct hci_dev *hdev, u8 type, if (qos->bcast.in.phy & BT_ISO_PHY_CODED) { cp->scanning_phys |= LE_SCAN_PHY_CODED; hci_le_scan_phy_params(phy, type, - interval, - window); + interval * 3, + window * 3); num_phy++; phy++; } @@ -2685,7 +2685,7 @@ static int hci_le_set_ext_scan_param_sync(struct hci_dev *hdev, u8 type,
if (scan_coded(hdev)) { cp->scanning_phys |= LE_SCAN_PHY_CODED; - hci_le_scan_phy_params(phy, type, interval, window); + hci_le_scan_phy_params(phy, type, interval * 3, window * 3); num_phy++; phy++; }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 51eda36d33e43201e7a4fd35232e069b2c850b01 ]
syzbot reported sco_sock_setsockopt() is copying data without checking user input length.
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline] BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline] BUG: KASAN: slab-out-of-bounds in sco_sock_setsockopt+0xc0b/0xf90 net/bluetooth/sco.c:893 Read of size 4 at addr ffff88805f7b15a3 by task syz-executor.5/12578
Fixes: ad10b1a48754 ("Bluetooth: Add Bluetooth socket voice option") Fixes: b96e9c671b05 ("Bluetooth: Add BT_DEFER_SETUP option to sco socket") Fixes: 00398e1d5183 ("Bluetooth: Add support for BT_PKT_STATUS CMSG data for SCO connections") Fixes: f6873401a608 ("Bluetooth: Allow setting of codec for HFP offload use case") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/bluetooth/bluetooth.h | 9 +++++++++ net/bluetooth/sco.c | 23 ++++++++++------------- 2 files changed, 19 insertions(+), 13 deletions(-)
diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h index 9fe95a22abeb7..eaec5d6caa29d 100644 --- a/include/net/bluetooth/bluetooth.h +++ b/include/net/bluetooth/bluetooth.h @@ -585,6 +585,15 @@ static inline struct sk_buff *bt_skb_sendmmsg(struct sock *sk, return skb; }
+static inline int bt_copy_from_sockptr(void *dst, size_t dst_size, + sockptr_t src, size_t src_size) +{ + if (dst_size > src_size) + return -EINVAL; + + return copy_from_sockptr(dst, src, dst_size); +} + int bt_to_errno(u16 code); __u8 bt_status(int err);
diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c index c736186aba26b..8e4f39b8601cb 100644 --- a/net/bluetooth/sco.c +++ b/net/bluetooth/sco.c @@ -823,7 +823,7 @@ static int sco_sock_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval, unsigned int optlen) { struct sock *sk = sock->sk; - int len, err = 0; + int err = 0; struct bt_voice voice; u32 opt; struct bt_codecs *codecs; @@ -842,10 +842,9 @@ static int sco_sock_setsockopt(struct socket *sock, int level, int optname, break; }
- if (copy_from_sockptr(&opt, optval, sizeof(u32))) { - err = -EFAULT; + err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen); + if (err) break; - }
if (opt) set_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags); @@ -862,11 +861,10 @@ static int sco_sock_setsockopt(struct socket *sock, int level, int optname,
voice.setting = sco_pi(sk)->setting;
- len = min_t(unsigned int, sizeof(voice), optlen); - if (copy_from_sockptr(&voice, optval, len)) { - err = -EFAULT; + err = bt_copy_from_sockptr(&voice, sizeof(voice), optval, + optlen); + if (err) break; - }
/* Explicitly check for these values */ if (voice.setting != BT_VOICE_TRANSPARENT && @@ -889,10 +887,9 @@ static int sco_sock_setsockopt(struct socket *sock, int level, int optname, break;
case BT_PKT_STATUS: - if (copy_from_sockptr(&opt, optval, sizeof(u32))) { - err = -EFAULT; + err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen); + if (err) break; - }
if (opt) set_bit(BT_SK_PKT_STATUS, &bt_sk(sk)->flags); @@ -933,9 +930,9 @@ static int sco_sock_setsockopt(struct socket *sock, int level, int optname, break; }
- if (copy_from_sockptr(buffer, optval, optlen)) { + err = bt_copy_from_sockptr(buffer, optlen, optval, optlen); + if (err) { hci_dev_put(hdev); - err = -EFAULT; break; }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit a97de7bff13b1cc825c1b1344eaed8d6c2d3e695 ]
syzbot reported rfcomm_sock_setsockopt_old() is copying data without checking user input length.
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline] BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline] BUG: KASAN: slab-out-of-bounds in rfcomm_sock_setsockopt_old net/bluetooth/rfcomm/sock.c:632 [inline] BUG: KASAN: slab-out-of-bounds in rfcomm_sock_setsockopt+0x893/0xa70 net/bluetooth/rfcomm/sock.c:673 Read of size 4 at addr ffff8880209a8bc3 by task syz-executor632/5064
Fixes: 9f2c8a03fbb3 ("Bluetooth: Replace RFCOMM link mode with security level") Fixes: bb23c0ab8246 ("Bluetooth: Add support for deferring RFCOMM connection setup") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/rfcomm/sock.c | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c index b54e8a530f55a..29aa07e9db9d7 100644 --- a/net/bluetooth/rfcomm/sock.c +++ b/net/bluetooth/rfcomm/sock.c @@ -629,7 +629,7 @@ static int rfcomm_sock_setsockopt_old(struct socket *sock, int optname,
switch (optname) { case RFCOMM_LM: - if (copy_from_sockptr(&opt, optval, sizeof(u32))) { + if (bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen)) { err = -EFAULT; break; } @@ -664,7 +664,6 @@ static int rfcomm_sock_setsockopt(struct socket *sock, int level, int optname, struct sock *sk = sock->sk; struct bt_security sec; int err = 0; - size_t len; u32 opt;
BT_DBG("sk %p", sk); @@ -686,11 +685,9 @@ static int rfcomm_sock_setsockopt(struct socket *sock, int level, int optname,
sec.level = BT_SECURITY_LOW;
- len = min_t(unsigned int, sizeof(sec), optlen); - if (copy_from_sockptr(&sec, optval, len)) { - err = -EFAULT; + err = bt_copy_from_sockptr(&sec, sizeof(sec), optval, optlen); + if (err) break; - }
if (sec.level > BT_SECURITY_HIGH) { err = -EINVAL; @@ -706,10 +703,9 @@ static int rfcomm_sock_setsockopt(struct socket *sock, int level, int optname, break; }
- if (copy_from_sockptr(&opt, optval, sizeof(u32))) { - err = -EFAULT; + err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen); + if (err) break; - }
if (opt) set_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 4f3951242ace5efc7131932e2e01e6ac6baed846 ]
Check user input length before copying data.
Fixes: 33575df7be67 ("Bluetooth: move l2cap_sock_setsockopt() to l2cap_sock.c") Fixes: 3ee7b7cd8390 ("Bluetooth: Add BT_MODE socket option") Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/l2cap_sock.c | 52 +++++++++++++++----------------------- 1 file changed, 20 insertions(+), 32 deletions(-)
diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c index ee7a41d6994fc..1eeea5d1306c2 100644 --- a/net/bluetooth/l2cap_sock.c +++ b/net/bluetooth/l2cap_sock.c @@ -726,7 +726,7 @@ static int l2cap_sock_setsockopt_old(struct socket *sock, int optname, struct sock *sk = sock->sk; struct l2cap_chan *chan = l2cap_pi(sk)->chan; struct l2cap_options opts; - int len, err = 0; + int err = 0; u32 opt;
BT_DBG("sk %p", sk); @@ -753,11 +753,9 @@ static int l2cap_sock_setsockopt_old(struct socket *sock, int optname, opts.max_tx = chan->max_tx; opts.txwin_size = chan->tx_win;
- len = min_t(unsigned int, sizeof(opts), optlen); - if (copy_from_sockptr(&opts, optval, len)) { - err = -EFAULT; + err = bt_copy_from_sockptr(&opts, sizeof(opts), optval, optlen); + if (err) break; - }
if (opts.txwin_size > L2CAP_DEFAULT_EXT_WINDOW) { err = -EINVAL; @@ -800,10 +798,9 @@ static int l2cap_sock_setsockopt_old(struct socket *sock, int optname, break;
case L2CAP_LM: - if (copy_from_sockptr(&opt, optval, sizeof(u32))) { - err = -EFAULT; + err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen); + if (err) break; - }
if (opt & L2CAP_LM_FIPS) { err = -EINVAL; @@ -884,7 +881,7 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname, struct bt_security sec; struct bt_power pwr; struct l2cap_conn *conn; - int len, err = 0; + int err = 0; u32 opt; u16 mtu; u8 mode; @@ -910,11 +907,9 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname,
sec.level = BT_SECURITY_LOW;
- len = min_t(unsigned int, sizeof(sec), optlen); - if (copy_from_sockptr(&sec, optval, len)) { - err = -EFAULT; + err = bt_copy_from_sockptr(&sec, sizeof(sec), optval, optlen); + if (err) break; - }
if (sec.level < BT_SECURITY_LOW || sec.level > BT_SECURITY_FIPS) { @@ -959,10 +954,9 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname, break; }
- if (copy_from_sockptr(&opt, optval, sizeof(u32))) { - err = -EFAULT; + err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen); + if (err) break; - }
if (opt) { set_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags); @@ -974,10 +968,9 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname, break;
case BT_FLUSHABLE: - if (copy_from_sockptr(&opt, optval, sizeof(u32))) { - err = -EFAULT; + err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen); + if (err) break; - }
if (opt > BT_FLUSHABLE_ON) { err = -EINVAL; @@ -1009,11 +1002,9 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname,
pwr.force_active = BT_POWER_FORCE_ACTIVE_ON;
- len = min_t(unsigned int, sizeof(pwr), optlen); - if (copy_from_sockptr(&pwr, optval, len)) { - err = -EFAULT; + err = bt_copy_from_sockptr(&pwr, sizeof(pwr), optval, optlen); + if (err) break; - }
if (pwr.force_active) set_bit(FLAG_FORCE_ACTIVE, &chan->flags); @@ -1022,10 +1013,9 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname, break;
case BT_CHANNEL_POLICY: - if (copy_from_sockptr(&opt, optval, sizeof(u32))) { - err = -EFAULT; + err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen); + if (err) break; - }
err = -EOPNOTSUPP; break; @@ -1054,10 +1044,9 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname, break; }
- if (copy_from_sockptr(&mtu, optval, sizeof(u16))) { - err = -EFAULT; + err = bt_copy_from_sockptr(&mtu, sizeof(mtu), optval, optlen); + if (err) break; - }
if (chan->mode == L2CAP_MODE_EXT_FLOWCTL && sk->sk_state == BT_CONNECTED) @@ -1085,10 +1074,9 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname, break; }
- if (copy_from_sockptr(&mode, optval, sizeof(u8))) { - err = -EFAULT; + err = bt_copy_from_sockptr(&mode, sizeof(mode), optval, optlen); + if (err) break; - }
BT_DBG("mode %u", mode);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 9e8742cdfc4b0e65266bb4a901a19462bda9285e ]
Check user input length before copying data.
Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type") Fixes: 0731c5ab4d51 ("Bluetooth: ISO: Add support for BT_PKT_STATUS") Fixes: f764a6c2c1e4 ("Bluetooth: ISO: Add broadcast support") Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/iso.c | 36 ++++++++++++------------------------ 1 file changed, 12 insertions(+), 24 deletions(-)
diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c index 3681e3673654a..a8b05baa8e5a9 100644 --- a/net/bluetooth/iso.c +++ b/net/bluetooth/iso.c @@ -1479,7 +1479,7 @@ static int iso_sock_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval, unsigned int optlen) { struct sock *sk = sock->sk; - int len, err = 0; + int err = 0; struct bt_iso_qos qos = default_qos; u32 opt;
@@ -1494,10 +1494,9 @@ static int iso_sock_setsockopt(struct socket *sock, int level, int optname, break; }
- if (copy_from_sockptr(&opt, optval, sizeof(u32))) { - err = -EFAULT; + err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen); + if (err) break; - }
if (opt) set_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags); @@ -1506,10 +1505,9 @@ static int iso_sock_setsockopt(struct socket *sock, int level, int optname, break;
case BT_PKT_STATUS: - if (copy_from_sockptr(&opt, optval, sizeof(u32))) { - err = -EFAULT; + err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen); + if (err) break; - }
if (opt) set_bit(BT_SK_PKT_STATUS, &bt_sk(sk)->flags); @@ -1524,17 +1522,9 @@ static int iso_sock_setsockopt(struct socket *sock, int level, int optname, break; }
- len = min_t(unsigned int, sizeof(qos), optlen); - - if (copy_from_sockptr(&qos, optval, len)) { - err = -EFAULT; - break; - } - - if (len == sizeof(qos.ucast) && !check_ucast_qos(&qos)) { - err = -EINVAL; + err = bt_copy_from_sockptr(&qos, sizeof(qos), optval, optlen); + if (err) break; - }
iso_pi(sk)->qos = qos; iso_pi(sk)->qos_user_set = true; @@ -1549,18 +1539,16 @@ static int iso_sock_setsockopt(struct socket *sock, int level, int optname, }
if (optlen > sizeof(iso_pi(sk)->base)) { - err = -EOVERFLOW; + err = -EINVAL; break; }
- len = min_t(unsigned int, sizeof(iso_pi(sk)->base), optlen); - - if (copy_from_sockptr(iso_pi(sk)->base, optval, len)) { - err = -EFAULT; + err = bt_copy_from_sockptr(iso_pi(sk)->base, optlen, optval, + optlen); + if (err) break; - }
- iso_pi(sk)->base_len = len; + iso_pi(sk)->base_len = optlen;
break;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit b2186061d6043d6345a97100460363e990af0d46 ]
Check user input length before copying data.
Fixes: 09572fca7223 ("Bluetooth: hci_sock: Add support for BT_{SND,RCV}BUF") Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_sock.c | 21 ++++++++------------- 1 file changed, 8 insertions(+), 13 deletions(-)
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c index 3e7cd330d731a..3f5f0932330d2 100644 --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -1946,10 +1946,9 @@ static int hci_sock_setsockopt_old(struct socket *sock, int level, int optname,
switch (optname) { case HCI_DATA_DIR: - if (copy_from_sockptr(&opt, optval, sizeof(opt))) { - err = -EFAULT; + err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, len); + if (err) break; - }
if (opt) hci_pi(sk)->cmsg_mask |= HCI_CMSG_DIR; @@ -1958,10 +1957,9 @@ static int hci_sock_setsockopt_old(struct socket *sock, int level, int optname, break;
case HCI_TIME_STAMP: - if (copy_from_sockptr(&opt, optval, sizeof(opt))) { - err = -EFAULT; + err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, len); + if (err) break; - }
if (opt) hci_pi(sk)->cmsg_mask |= HCI_CMSG_TSTAMP; @@ -1979,11 +1977,9 @@ static int hci_sock_setsockopt_old(struct socket *sock, int level, int optname, uf.event_mask[1] = *((u32 *) f->event_mask + 1); }
- len = min_t(unsigned int, len, sizeof(uf)); - if (copy_from_sockptr(&uf, optval, len)) { - err = -EFAULT; + err = bt_copy_from_sockptr(&uf, sizeof(uf), optval, len); + if (err) break; - }
if (!capable(CAP_NET_RAW)) { uf.type_mask &= hci_sec_filter.type_mask; @@ -2042,10 +2038,9 @@ static int hci_sock_setsockopt(struct socket *sock, int level, int optname, goto done; }
- if (copy_from_sockptr(&opt, optval, sizeof(opt))) { - err = -EFAULT; + err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, len); + if (err) break; - }
hci_pi(sk)->mtu = opt; break;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Archie Pusaka apusaka@chromium.org
[ Upstream commit 600b0bbe73d3a9a264694da0e4c2c0800309141e ]
The bit is set and tested inside mgmt_device_connected(), therefore we must not set it just outside the function.
Fixes: eeda1bf97bb5 ("Bluetooth: hci_event: Fix not indicating new connection for BIG Sync") Signed-off-by: Archie Pusaka apusaka@chromium.org Reviewed-by: Manish Mandlik mmandlik@chromium.org Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/l2cap_core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index ab5a9d42fae71..706d2478ddb33 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -4054,8 +4054,7 @@ static int l2cap_connect_req(struct l2cap_conn *conn, return -EPROTO;
hci_dev_lock(hdev); - if (hci_dev_test_flag(hdev, HCI_MGMT) && - !test_and_set_bit(HCI_CONN_MGMT_CONNECTED, &hcon->flags)) + if (hci_dev_test_flag(hdev, HCI_MGMT)) mgmt_device_connected(hdev, hcon, NULL, 0); hci_dev_unlock(hdev);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 65acf6e0501ac8880a4f73980d01b5d27648b956 ]
In my recent commit, I missed that do_replace() handlers use copy_from_sockptr() (which I fixed), followed by unsafe copy_from_sockptr_offset() calls.
In all functions, we can perform the @optlen validation before even calling xt_alloc_table_info() with the following check:
if ((u64)optlen < (u64)tmp.size + sizeof(tmp)) return -EINVAL;
Fixes: 0c83842df40f ("netfilter: validate user input for expected length") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Pablo Neira Ayuso pablo@netfilter.org Link: https://lore.kernel.org/r/20240409120741.3538135-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/netfilter/arp_tables.c | 4 ++++ net/ipv4/netfilter/ip_tables.c | 4 ++++ net/ipv6/netfilter/ip6_tables.c | 4 ++++ 3 files changed, 12 insertions(+)
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c index b150c9929b12e..14365b20f1c5c 100644 --- a/net/ipv4/netfilter/arp_tables.c +++ b/net/ipv4/netfilter/arp_tables.c @@ -966,6 +966,8 @@ static int do_replace(struct net *net, sockptr_t arg, unsigned int len) return -ENOMEM; if (tmp.num_counters == 0) return -EINVAL; + if ((u64)len < (u64)tmp.size + sizeof(tmp)) + return -EINVAL;
tmp.name[sizeof(tmp.name)-1] = 0;
@@ -1266,6 +1268,8 @@ static int compat_do_replace(struct net *net, sockptr_t arg, unsigned int len) return -ENOMEM; if (tmp.num_counters == 0) return -EINVAL; + if ((u64)len < (u64)tmp.size + sizeof(tmp)) + return -EINVAL;
tmp.name[sizeof(tmp.name)-1] = 0;
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index 4876707595781..fe89a056eb06c 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -1118,6 +1118,8 @@ do_replace(struct net *net, sockptr_t arg, unsigned int len) return -ENOMEM; if (tmp.num_counters == 0) return -EINVAL; + if ((u64)len < (u64)tmp.size + sizeof(tmp)) + return -EINVAL;
tmp.name[sizeof(tmp.name)-1] = 0;
@@ -1504,6 +1506,8 @@ compat_do_replace(struct net *net, sockptr_t arg, unsigned int len) return -ENOMEM; if (tmp.num_counters == 0) return -EINVAL; + if ((u64)len < (u64)tmp.size + sizeof(tmp)) + return -EINVAL;
tmp.name[sizeof(tmp.name)-1] = 0;
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c index 636b360311c53..131f7bb2110d3 100644 --- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -1135,6 +1135,8 @@ do_replace(struct net *net, sockptr_t arg, unsigned int len) return -ENOMEM; if (tmp.num_counters == 0) return -EINVAL; + if ((u64)len < (u64)tmp.size + sizeof(tmp)) + return -EINVAL;
tmp.name[sizeof(tmp.name)-1] = 0;
@@ -1513,6 +1515,8 @@ compat_do_replace(struct net *net, sockptr_t arg, unsigned int len) return -ENOMEM; if (tmp.num_counters == 0) return -EINVAL; + if ((u64)len < (u64)tmp.size + sizeof(tmp)) + return -EINVAL;
tmp.name[sizeof(tmp.name)-1] = 0;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Moshe Shemesh moshe@nvidia.com
[ Upstream commit 137cef6d55564fb687d12fbc5f85be43ff7b53a7 ]
When PF/VF teardown is called the driver sets the flag MLX5_BREAK_FW_WAIT to stop waiting for FW loading and initializing. Same should be applied to SF driver teardown to cut waiting time. On mlx5_sf_dev_remove() set the flag before draining health WQ as recovery flow may also wait for FW reloading while it is not relevant anymore.
Signed-off-by: Moshe Shemesh moshe@nvidia.com Reviewed-by: Aya Levin ayal@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Stable-dep-of: c6e77aa9dd82 ("net/mlx5: Register devlink first under devlink lock") Signed-off-by: Sasha Levin sashal@kernel.org --- .../mellanox/mlx5/core/sf/dev/driver.c | 21 ++++++++++++------- 1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c index 169c2c68ed5c2..bc863e1f062e6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c @@ -95,24 +95,29 @@ static int mlx5_sf_dev_probe(struct auxiliary_device *adev, const struct auxilia static void mlx5_sf_dev_remove(struct auxiliary_device *adev) { struct mlx5_sf_dev *sf_dev = container_of(adev, struct mlx5_sf_dev, adev); - struct devlink *devlink = priv_to_devlink(sf_dev->mdev); + struct mlx5_core_dev *mdev = sf_dev->mdev; + struct devlink *devlink;
- mlx5_drain_health_wq(sf_dev->mdev); + devlink = priv_to_devlink(mdev); + set_bit(MLX5_BREAK_FW_WAIT, &mdev->intf_state); + mlx5_drain_health_wq(mdev); devlink_unregister(devlink); - if (mlx5_dev_is_lightweight(sf_dev->mdev)) - mlx5_uninit_one_light(sf_dev->mdev); + if (mlx5_dev_is_lightweight(mdev)) + mlx5_uninit_one_light(mdev); else - mlx5_uninit_one(sf_dev->mdev); - iounmap(sf_dev->mdev->iseg); - mlx5_mdev_uninit(sf_dev->mdev); + mlx5_uninit_one(mdev); + iounmap(mdev->iseg); + mlx5_mdev_uninit(mdev); mlx5_devlink_free(devlink); }
static void mlx5_sf_dev_shutdown(struct auxiliary_device *adev) { struct mlx5_sf_dev *sf_dev = container_of(adev, struct mlx5_sf_dev, adev); + struct mlx5_core_dev *mdev = sf_dev->mdev;
- mlx5_unload_one(sf_dev->mdev, false); + set_bit(MLX5_BREAK_FW_WAIT, &mdev->intf_state); + mlx5_unload_one(mdev, false); }
static const struct auxiliary_device_id mlx5_sf_dev_id_table[] = {
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shay Drory shayd@nvidia.com
[ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]
In case device is having a non fatal FW error during probe, the driver will report the error to user via devlink. This will trigger a WARN_ON, since mlx5 is calling devlink_register() last. In order to avoid the WARN_ON[1], change mlx5 to invoke devl_register() first under devlink lock.
[1] WARNING: CPU: 5 PID: 227 at net/devlink/health.c:483 devlink_recover_notify.constprop.0+0xb8/0xc0 CPU: 5 PID: 227 Comm: kworker/u16:3 Not tainted 6.4.0-rc5_for_upstream_min_debug_2023_06_12_12_38 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5_health0000:08:00.0 mlx5_fw_reporter_err_work [mlx5_core] RIP: 0010:devlink_recover_notify.constprop.0+0xb8/0xc0 Call Trace: <TASK> ? __warn+0x79/0x120 ? devlink_recover_notify.constprop.0+0xb8/0xc0 ? report_bug+0x17c/0x190 ? handle_bug+0x3c/0x60 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? devlink_recover_notify.constprop.0+0xb8/0xc0 devlink_health_report+0x4a/0x1c0 mlx5_fw_reporter_err_work+0xa4/0xd0 [mlx5_core] process_one_work+0x1bb/0x3c0 ? process_one_work+0x3c0/0x3c0 worker_thread+0x4d/0x3c0 ? process_one_work+0x3c0/0x3c0 kthread+0xc6/0xf0 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK>
Fixes: cf530217408e ("devlink: Notify users when objects are accessible") Signed-off-by: Shay Drory shayd@nvidia.com Reviewed-by: Moshe Shemesh moshe@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Link: https://lore.kernel.org/r/20240409190820.227554-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/main.c | 37 ++++++++++--------- .../mellanox/mlx5/core/sf/dev/driver.c | 1 - 2 files changed, 20 insertions(+), 18 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index bccf6e53556c6..131a836c127e3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -1480,6 +1480,14 @@ int mlx5_init_one_devl_locked(struct mlx5_core_dev *dev) if (err) goto err_register;
+ err = mlx5_crdump_enable(dev); + if (err) + mlx5_core_err(dev, "mlx5_crdump_enable failed with error code %d\n", err); + + err = mlx5_hwmon_dev_register(dev); + if (err) + mlx5_core_err(dev, "mlx5_hwmon_dev_register failed with error code %d\n", err); + mutex_unlock(&dev->intf_state_mutex); return 0;
@@ -1505,7 +1513,10 @@ int mlx5_init_one(struct mlx5_core_dev *dev) int err;
devl_lock(devlink); + devl_register(devlink); err = mlx5_init_one_devl_locked(dev); + if (err) + devl_unregister(devlink); devl_unlock(devlink); return err; } @@ -1517,6 +1528,8 @@ void mlx5_uninit_one(struct mlx5_core_dev *dev) devl_lock(devlink); mutex_lock(&dev->intf_state_mutex);
+ mlx5_hwmon_dev_unregister(dev); + mlx5_crdump_disable(dev); mlx5_unregister_device(dev);
if (!test_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state)) { @@ -1534,6 +1547,7 @@ void mlx5_uninit_one(struct mlx5_core_dev *dev) mlx5_function_teardown(dev, true); out: mutex_unlock(&dev->intf_state_mutex); + devl_unregister(devlink); devl_unlock(devlink); }
@@ -1680,16 +1694,20 @@ int mlx5_init_one_light(struct mlx5_core_dev *dev) }
devl_lock(devlink); + devl_register(devlink); + err = mlx5_devlink_params_register(priv_to_devlink(dev)); - devl_unlock(devlink); if (err) { mlx5_core_warn(dev, "mlx5_devlink_param_reg err = %d\n", err); goto query_hca_caps_err; }
+ devl_unlock(devlink); return 0;
query_hca_caps_err: + devl_unregister(devlink); + devl_unlock(devlink); mlx5_function_disable(dev, true); out: dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR; @@ -1702,6 +1720,7 @@ void mlx5_uninit_one_light(struct mlx5_core_dev *dev)
devl_lock(devlink); mlx5_devlink_params_unregister(priv_to_devlink(dev)); + devl_unregister(devlink); devl_unlock(devlink); if (dev->state != MLX5_DEVICE_STATE_UP) return; @@ -1943,16 +1962,7 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id) goto err_init_one; }
- err = mlx5_crdump_enable(dev); - if (err) - dev_err(&pdev->dev, "mlx5_crdump_enable failed with error code %d\n", err); - - err = mlx5_hwmon_dev_register(dev); - if (err) - mlx5_core_err(dev, "mlx5_hwmon_dev_register failed with error code %d\n", err); - pci_save_state(pdev); - devlink_register(devlink); return 0;
err_init_one: @@ -1973,16 +1983,9 @@ static void remove_one(struct pci_dev *pdev) struct devlink *devlink = priv_to_devlink(dev);
set_bit(MLX5_BREAK_FW_WAIT, &dev->intf_state); - /* mlx5_drain_fw_reset() and mlx5_drain_health_wq() are using - * devlink notify APIs. - * Hence, we must drain them before unregistering the devlink. - */ mlx5_drain_fw_reset(dev); mlx5_drain_health_wq(dev); - devlink_unregister(devlink); mlx5_sriov_disable(pdev, false); - mlx5_hwmon_dev_unregister(dev); - mlx5_crdump_disable(dev); mlx5_uninit_one(dev); mlx5_pci_close(dev); mlx5_mdev_uninit(dev); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c index bc863e1f062e6..e3bf8c7e4baa6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c @@ -101,7 +101,6 @@ static void mlx5_sf_dev_remove(struct auxiliary_device *adev) devlink = priv_to_devlink(mdev); set_bit(MLX5_BREAK_FW_WAIT, &mdev->intf_state); mlx5_drain_health_wq(mdev); - devlink_unregister(devlink); if (mlx5_dev_is_lightweight(mdev)) mlx5_uninit_one_light(mdev); else
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Liang mliang@purestorage.com
[ Upstream commit 9f7e8fbb91f8fa29548e2f6ab50c03b628c67ede ]
The mlx5 comp irq name scheme is changed a little bit between commit 3663ad34bc70 ("net/mlx5: Shift control IRQ to the last index") and commit 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation"). The index in the comp irq name used to start from 0 but now it starts from 1. There is nothing critical here, but it's harmless to change back to the old behavior, a.k.a starting from 0.
Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation") Reviewed-by: Mohamed Khalfella mkhalfella@purestorage.com Reviewed-by: Yuanyuan Zhong yzhong@purestorage.com Signed-off-by: Michael Liang mliang@purestorage.com Reviewed-by: Shay Drory shayd@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Link: https://lore.kernel.org/r/20240409190820.227554-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c index 4dcf995cb1a20..6bac8ad70ba60 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c @@ -19,6 +19,7 @@ #define MLX5_IRQ_CTRL_SF_MAX 8 /* min num of vectors for SFs to be enabled */ #define MLX5_IRQ_VEC_COMP_BASE_SF 2 +#define MLX5_IRQ_VEC_COMP_BASE 1
#define MLX5_EQ_SHARE_IRQ_MAX_COMP (8) #define MLX5_EQ_SHARE_IRQ_MAX_CTRL (UINT_MAX) @@ -246,6 +247,7 @@ static void irq_set_name(struct mlx5_irq_pool *pool, char *name, int vecidx) return; }
+ vecidx -= MLX5_IRQ_VEC_COMP_BASE; snprintf(name, MLX5_MAX_IRQ_NAME, "mlx5_comp%d", vecidx); }
@@ -585,7 +587,7 @@ struct mlx5_irq *mlx5_irq_request_vector(struct mlx5_core_dev *dev, u16 cpu, struct mlx5_irq_table *table = mlx5_irq_table_get(dev); struct mlx5_irq_pool *pool = table->pcif_pool; struct irq_affinity_desc af_desc; - int offset = 1; + int offset = MLX5_IRQ_VEC_COMP_BASE;
if (!pool->xa_num_irqs.max) offset = 0;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cosmin Ratiu cratiu@nvidia.com
[ Upstream commit 7c6782ad4911cbee874e85630226ed389ff2e453 ]
Previously, add_rule_fg would only add newly created rules from the handle into the tree when they had a refcount of 1. On the other hand, create_flow_handle tries hard to find and reference already existing identical rules instead of creating new ones.
These two behaviors can result in a situation where create_flow_handle 1) creates a new rule and references it, then 2) in a subsequent step during the same handle creation references it again, resulting in a rule with a refcount of 2 that is not linked into the tree, will have a NULL parent and root and will result in a crash when the flow group is deleted because del_sw_hw_rule, invoked on rule deletion, assumes node->parent is != NULL.
This happened in the wild, due to another bug related to incorrect handling of duplicate pkt_reformat ids, which lead to the code in create_flow_handle incorrectly referencing a just-added rule in the same flow handle, resulting in the problem described above. Full details are at [1].
This patch changes add_rule_fg to add new rules without parents into the tree, properly initializing them and avoiding the crash. This makes it more consistent with how rules are added to an FTE in create_flow_handle.
Fixes: 74491de93712 ("net/mlx5: Add multi dest support") Link: https://lore.kernel.org/netdev/ea5264d6-6b55-4449-a602-214c6f509c1e@163.com/... [1] Signed-off-by: Cosmin Ratiu cratiu@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Reviewed-by: Mark Bloch mbloch@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Link: https://lore.kernel.org/r/20240409190820.227554-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index e6bfa7e4f146c..2a9421342a503 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -1808,8 +1808,9 @@ static struct mlx5_flow_handle *add_rule_fg(struct mlx5_flow_group *fg, } trace_mlx5_fs_set_fte(fte, false);
+ /* Link newly added rules into the tree. */ for (i = 0; i < handle->num_rules; i++) { - if (refcount_read(&handle->rule[i]->node.refcount) == 1) { + if (!handle->rule[i]->node.parent) { tree_add_node(&handle->rule[i]->node, &fte->node); trace_mlx5_fs_add_rule(handle->rule[i]); }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cosmin Ratiu cratiu@nvidia.com
[ Upstream commit 9eca93f4d5ab03905516a68683674d9c50ff95bd ]
struct mlx5_pkt_reformat contains a naked union of a u32 id and a dr_action pointer which is used when the action is SW-managed (when pkt_reformat.owner is set to MLX5_FLOW_RESOURCE_OWNER_SW). Using id directly in that case is incorrect, as it maps to the least significant 32 bits of the 64-bit pointer in mlx5_fs_dr_action and not to the pkt reformat id allocated in firmware.
For the purpose of comparing whether two rules are identical, interpreting the least significant 32 bits of the mlx5_fs_dr_action pointer as an id mostly works... until it breaks horribly and produces the outcome described in [1].
This patch fixes mlx5_flow_dests_cmp to correctly compare ids using mlx5_fs_dr_action_get_pkt_reformat_id for the SW-managed rules.
Link: https://lore.kernel.org/netdev/ea5264d6-6b55-4449-a602-214c6f509c1e@163.com/... [1]
Fixes: 6a48faeeca10 ("net/mlx5: Add direct rule fs_cmd implementation") Signed-off-by: Cosmin Ratiu cratiu@nvidia.com Reviewed-by: Mark Bloch mbloch@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Link: https://lore.kernel.org/r/20240409190820.227554-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index 2a9421342a503..cf085a478e3e4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -1664,6 +1664,16 @@ static int create_auto_flow_group(struct mlx5_flow_table *ft, return err; }
+static bool mlx5_pkt_reformat_cmp(struct mlx5_pkt_reformat *p1, + struct mlx5_pkt_reformat *p2) +{ + return p1->owner == p2->owner && + (p1->owner == MLX5_FLOW_RESOURCE_OWNER_FW ? + p1->id == p2->id : + mlx5_fs_dr_action_get_pkt_reformat_id(p1) == + mlx5_fs_dr_action_get_pkt_reformat_id(p2)); +} + static bool mlx5_flow_dests_cmp(struct mlx5_flow_destination *d1, struct mlx5_flow_destination *d2) { @@ -1675,8 +1685,8 @@ static bool mlx5_flow_dests_cmp(struct mlx5_flow_destination *d1, ((d1->vport.flags & MLX5_FLOW_DEST_VPORT_VHCA_ID) ? (d1->vport.vhca_id == d2->vport.vhca_id) : true) && ((d1->vport.flags & MLX5_FLOW_DEST_VPORT_REFORMAT_ID) ? - (d1->vport.pkt_reformat->id == - d2->vport.pkt_reformat->id) : true)) || + mlx5_pkt_reformat_cmp(d1->vport.pkt_reformat, + d2->vport.pkt_reformat) : true)) || (d1->type == MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE && d1->ft == d2->ft) || (d1->type == MLX5_FLOW_DESTINATION_TYPE_TIR &&
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Carolina Jubran cjubran@nvidia.com
[ Upstream commit ee3572409f74a838154af74ce1e56e62c17786a8 ]
Changing the channels number after configuring the receive flow hash indirection table may affect the RSS table size. The previous configuration may no longer be compatible with the new receive flow hash indirection table.
Block changing the channels number when RXFH is configured and changing the channels number requires resizing the RSS table size.
Fixes: 74a8dadac17e ("net/mlx5e: Preparations for supporting larger number of channels") Signed-off-by: Carolina Jubran cjubran@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Link: https://lore.kernel.org/r/20240409190820.227554-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/mellanox/mlx5/core/en_ethtool.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index cc51ce16df14a..93461b0c5703b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -451,6 +451,23 @@ int mlx5e_ethtool_set_channels(struct mlx5e_priv *priv,
mutex_lock(&priv->state_lock);
+ /* If RXFH is configured, changing the channels number is allowed only if + * it does not require resizing the RSS table. This is because the previous + * configuration may no longer be compatible with the new RSS table. + */ + if (netif_is_rxfh_configured(priv->netdev)) { + int cur_rqt_size = mlx5e_rqt_size(priv->mdev, cur_params->num_channels); + int new_rqt_size = mlx5e_rqt_size(priv->mdev, count); + + if (new_rqt_size != cur_rqt_size) { + err = -EINVAL; + netdev_err(priv->netdev, + "%s: RXFH is configured, block changing channels number that affects RSS table size (new: %d, current: %d)\n", + __func__, new_rqt_size, cur_rqt_size); + goto out; + } + } + /* Don't allow changing the number of channels if HTB offload is active, * because the numeration of the QoS SQs will change, while per-queue * qdiscs are attached.
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Carolina Jubran cjubran@nvidia.com
[ Upstream commit ecb829459a841198e142f72fadab56424ae96519 ]
When mlx5e_priv_init() fails, the cleanup flow calls mlx5e_selq_cleanup which calls mlx5e_selq_apply() that assures that the `priv->state_lock` is held using lockdep_is_held().
Acquire the state_lock in mlx5e_selq_cleanup().
Kernel log: ============================= WARNING: suspicious RCU usage 6.8.0-rc3_net_next_841a9b5 #1 Not tainted ----------------------------- drivers/net/ethernet/mellanox/mlx5/core/en/selq.c:124 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1 2 locks held by systemd-modules/293: #0: ffffffffa05067b0 (devices_rwsem){++++}-{3:3}, at: ib_register_client+0x109/0x1b0 [ib_core] #1: ffff8881096c65c0 (&device->client_data_rwsem){++++}-{3:3}, at: add_client_context+0x104/0x1c0 [ib_core]
stack backtrace: CPU: 4 PID: 293 Comm: systemd-modules Not tainted 6.8.0-rc3_net_next_841a9b5 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x8a/0xa0 lockdep_rcu_suspicious+0x154/0x1a0 mlx5e_selq_apply+0x94/0xa0 [mlx5_core] mlx5e_selq_cleanup+0x3a/0x60 [mlx5_core] mlx5e_priv_init+0x2be/0x2f0 [mlx5_core] mlx5_rdma_setup_rn+0x7c/0x1a0 [mlx5_core] rdma_init_netdev+0x4e/0x80 [ib_core] ? mlx5_rdma_netdev_free+0x70/0x70 [mlx5_core] ipoib_intf_init+0x64/0x550 [ib_ipoib] ipoib_intf_alloc+0x4e/0xc0 [ib_ipoib] ipoib_add_one+0xb0/0x360 [ib_ipoib] add_client_context+0x112/0x1c0 [ib_core] ib_register_client+0x166/0x1b0 [ib_core] ? 0xffffffffa0573000 ipoib_init_module+0xeb/0x1a0 [ib_ipoib] do_one_initcall+0x61/0x250 do_init_module+0x8a/0x270 init_module_from_file+0x8b/0xd0 idempotent_init_module+0x17d/0x230 __x64_sys_finit_module+0x61/0xb0 do_syscall_64+0x71/0x140 entry_SYSCALL_64_after_hwframe+0x46/0x4e </TASK>
Fixes: 8bf30be75069 ("net/mlx5e: Introduce select queue parameters") Signed-off-by: Carolina Jubran cjubran@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Reviewed-by: Dragos Tatulea dtatulea@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Link: https://lore.kernel.org/r/20240409190820.227554-8-tariqt@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en/selq.c | 2 ++ drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 -- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/selq.c b/drivers/net/ethernet/mellanox/mlx5/core/en/selq.c index f675b1926340f..f66bbc8464645 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/selq.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/selq.c @@ -57,6 +57,7 @@ int mlx5e_selq_init(struct mlx5e_selq *selq, struct mutex *state_lock)
void mlx5e_selq_cleanup(struct mlx5e_selq *selq) { + mutex_lock(selq->state_lock); WARN_ON_ONCE(selq->is_prepared);
kvfree(selq->standby); @@ -67,6 +68,7 @@ void mlx5e_selq_cleanup(struct mlx5e_selq *selq)
kvfree(selq->standby); selq->standby = NULL; + mutex_unlock(selq->state_lock); }
void mlx5e_selq_prepare_params(struct mlx5e_selq *selq, struct mlx5e_params *params) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index c8e8f512803ef..952f1f98138cc 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -5695,9 +5695,7 @@ void mlx5e_priv_cleanup(struct mlx5e_priv *priv) kfree(priv->tx_rates); kfree(priv->txq2sq); destroy_workqueue(priv->wq); - mutex_lock(&priv->state_lock); mlx5e_selq_cleanup(&priv->selq); - mutex_unlock(&priv->state_lock); free_cpumask_var(priv->scratchpad.cpumask);
for (i = 0; i < priv->htb_max_qos_sqs; i++)
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Carolina Jubran cjubran@nvidia.com
[ Upstream commit 2f436f1869771d46e1a9f85738d5a1a7c5653a4e ]
When creating a new HTB class while the interface is down, the variable that follows the number of QoS SQs (htb_max_qos_sqs) may not be consistent with the number of HTB classes.
Previously, we compared these two values to ensure that the node_qid is lower than the number of QoS SQs, and we allocated stats for that SQ when they are equal.
Change the check to compare the node_qid with the current number of leaf nodes and fix the checking conditions to ensure allocation of stats_list and stats for each node.
Fixes: 214baf22870c ("net/mlx5e: Support HTB offload") Signed-off-by: Carolina Jubran cjubran@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Reviewed-by: Dragos Tatulea dtatulea@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Link: https://lore.kernel.org/r/20240409190820.227554-9-tariqt@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/en/qos.c | 33 ++++++++++--------- 1 file changed, 17 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c index 34adf8c3f81a0..922bc5b7c10e3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c @@ -83,24 +83,25 @@ int mlx5e_open_qos_sq(struct mlx5e_priv *priv, struct mlx5e_channels *chs,
txq_ix = mlx5e_qid_from_qos(chs, node_qid);
- WARN_ON(node_qid > priv->htb_max_qos_sqs); - if (node_qid == priv->htb_max_qos_sqs) { - struct mlx5e_sq_stats *stats, **stats_list = NULL; - - if (priv->htb_max_qos_sqs == 0) { - stats_list = kvcalloc(mlx5e_qos_max_leaf_nodes(priv->mdev), - sizeof(*stats_list), - GFP_KERNEL); - if (!stats_list) - return -ENOMEM; - } + WARN_ON(node_qid >= mlx5e_htb_cur_leaf_nodes(priv->htb)); + if (!priv->htb_qos_sq_stats) { + struct mlx5e_sq_stats **stats_list; + + stats_list = kvcalloc(mlx5e_qos_max_leaf_nodes(priv->mdev), + sizeof(*stats_list), GFP_KERNEL); + if (!stats_list) + return -ENOMEM; + + WRITE_ONCE(priv->htb_qos_sq_stats, stats_list); + } + + if (!priv->htb_qos_sq_stats[node_qid]) { + struct mlx5e_sq_stats *stats; + stats = kzalloc(sizeof(*stats), GFP_KERNEL); - if (!stats) { - kvfree(stats_list); + if (!stats) return -ENOMEM; - } - if (stats_list) - WRITE_ONCE(priv->htb_qos_sq_stats, stats_list); + WRITE_ONCE(priv->htb_qos_sq_stats[node_qid], stats); /* Order htb_max_qos_sqs increment after writing the array pointer. * Pairs with smp_load_acquire in en_stats.c.
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rahul Rameshbabu rrameshbabu@nvidia.com
[ Upstream commit 86b0ca5b118d3a0bae5e5645a13e66f8a4f6c525 ]
Free Tx port timestamping metadata entries in the NAPI poll context and consume metadata enties in the WQE xmit path. Do not free a Tx port timestamping metadata entry in the WQE xmit path even in the error path to avoid a race between two metadata entry producers.
Fixes: 3178308ad4ca ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs") Signed-off-by: Rahul Rameshbabu rrameshbabu@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Link: https://lore.kernel.org/r/20240409190820.227554-10-tariqt@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h | 8 +++++++- drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 7 +++---- 2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h index 86f1854698b4e..883c044852f1d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h @@ -95,9 +95,15 @@ static inline void mlx5e_ptp_metadata_fifo_push(struct mlx5e_ptp_metadata_fifo * }
static inline u8 +mlx5e_ptp_metadata_fifo_peek(struct mlx5e_ptp_metadata_fifo *fifo) +{ + return fifo->data[fifo->mask & fifo->cc]; +} + +static inline void mlx5e_ptp_metadata_fifo_pop(struct mlx5e_ptp_metadata_fifo *fifo) { - return fifo->data[fifo->mask & fifo->cc++]; + fifo->cc++; }
static inline void diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c index 2fa076b23fbea..e21a3b4128ce8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c @@ -398,6 +398,8 @@ mlx5e_txwqe_complete(struct mlx5e_txqsq *sq, struct sk_buff *skb, (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))) { u8 metadata_index = be32_to_cpu(eseg->flow_table_metadata);
+ mlx5e_ptp_metadata_fifo_pop(&sq->ptpsq->metadata_freelist); + mlx5e_skb_cb_hwtstamp_init(skb); mlx5e_ptp_metadata_map_put(&sq->ptpsq->metadata_map, skb, metadata_index); @@ -496,9 +498,6 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
err_drop: stats->dropped++; - if (unlikely(sq->ptpsq && (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))) - mlx5e_ptp_metadata_fifo_push(&sq->ptpsq->metadata_freelist, - be32_to_cpu(eseg->flow_table_metadata)); dev_kfree_skb_any(skb); mlx5e_tx_flush(sq); } @@ -657,7 +656,7 @@ static void mlx5e_cqe_ts_id_eseg(struct mlx5e_ptpsq *ptpsq, struct sk_buff *skb, { if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)) eseg->flow_table_metadata = - cpu_to_be32(mlx5e_ptp_metadata_fifo_pop(&ptpsq->metadata_freelist)); + cpu_to_be32(mlx5e_ptp_metadata_fifo_peek(&ptpsq->metadata_freelist)); }
static void mlx5e_txwqe_build_eseg(struct mlx5e_priv *priv, struct mlx5e_txqsq *sq,
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Machon daniel.machon@microchip.com
[ Upstream commit 33623113a48ea906f1955cbf71094f6aa4462e8f ]
The wrong port config is being used if the PCS is reconfigured. Fix this by correctly using the new config instead of the old one.
Fixes: 946e7fd5053a ("net: sparx5: add port module support") Signed-off-by: Daniel Machon daniel.machon@microchip.com Reviewed-by: Jacob Keller jacob.e.keller@intel.com Link: https://lore.kernel.org/r/20240409-link-mode-reconfiguration-fix-v2-1-db6a50... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/microchip/sparx5/sparx5_port.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_port.c b/drivers/net/ethernet/microchip/sparx5/sparx5_port.c index 3a1b1a1f5a195..60dd2fd603a85 100644 --- a/drivers/net/ethernet/microchip/sparx5/sparx5_port.c +++ b/drivers/net/ethernet/microchip/sparx5/sparx5_port.c @@ -731,7 +731,7 @@ static int sparx5_port_pcs_low_set(struct sparx5 *sparx5, bool sgmii = false, inband_aneg = false; int err;
- if (port->conf.inband) { + if (conf->inband) { if (conf->portmode == PHY_INTERFACE_MODE_SGMII || conf->portmode == PHY_INTERFACE_MODE_QSGMII) inband_aneg = true; /* Cisco-SGMII in-band-aneg */ @@ -948,7 +948,7 @@ int sparx5_port_pcs_set(struct sparx5 *sparx5, if (err) return -EINVAL;
- if (port->conf.inband) { + if (conf->inband) { /* Enable/disable 1G counters in ASM */ spx5_rmw(ASM_PORT_CFG_CSC_STAT_DIS_SET(high_speed_dev), ASM_PORT_CFG_CSC_STAT_DIS,
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gerd Bayer gbayer@linux.ibm.com
[ Upstream commit d51dc8dd6ab6f93a894ff8b38d3b8d02c98eb9fb ]
This reverts commit 58effa3476536215530c9ec4910ffc981613b413. Review was not finished on this patch. So it's not ready for upstreaming.
Signed-off-by: Gerd Bayer gbayer@linux.ibm.com Link: https://lore.kernel.org/r/20240409113753.2181368-1-gbayer@linux.ibm.com Fixes: 58effa347653 ("s390/ism: fix receive message buffer allocation") Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/s390/net/ism_drv.c | 38 +++++++++----------------------------- 1 file changed, 9 insertions(+), 29 deletions(-)
diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c index affb05521e146..2c8e964425dc3 100644 --- a/drivers/s390/net/ism_drv.c +++ b/drivers/s390/net/ism_drv.c @@ -14,8 +14,6 @@ #include <linux/err.h> #include <linux/ctype.h> #include <linux/processor.h> -#include <linux/dma-mapping.h> -#include <linux/mm.h>
#include "ism.h"
@@ -294,15 +292,13 @@ static int ism_read_local_gid(struct ism_dev *ism) static void ism_free_dmb(struct ism_dev *ism, struct ism_dmb *dmb) { clear_bit(dmb->sba_idx, ism->sba_bitmap); - dma_unmap_page(&ism->pdev->dev, dmb->dma_addr, dmb->dmb_len, - DMA_FROM_DEVICE); - folio_put(virt_to_folio(dmb->cpu_addr)); + dma_free_coherent(&ism->pdev->dev, dmb->dmb_len, + dmb->cpu_addr, dmb->dma_addr); }
static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb) { unsigned long bit; - int rc;
if (PAGE_ALIGN(dmb->dmb_len) > dma_get_max_seg_size(&ism->pdev->dev)) return -EINVAL; @@ -319,30 +315,14 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb) test_and_set_bit(dmb->sba_idx, ism->sba_bitmap)) return -EINVAL;
- dmb->cpu_addr = - folio_address(folio_alloc(GFP_KERNEL | __GFP_NOWARN | - __GFP_NOMEMALLOC | __GFP_NORETRY, - get_order(dmb->dmb_len))); + dmb->cpu_addr = dma_alloc_coherent(&ism->pdev->dev, dmb->dmb_len, + &dmb->dma_addr, + GFP_KERNEL | __GFP_NOWARN | + __GFP_NOMEMALLOC | __GFP_NORETRY); + if (!dmb->cpu_addr) + clear_bit(dmb->sba_idx, ism->sba_bitmap);
- if (!dmb->cpu_addr) { - rc = -ENOMEM; - goto out_bit; - } - dmb->dma_addr = dma_map_page(&ism->pdev->dev, - virt_to_page(dmb->cpu_addr), 0, - dmb->dmb_len, DMA_FROM_DEVICE); - if (dma_mapping_error(&ism->pdev->dev, dmb->dma_addr)) { - rc = -ENOMEM; - goto out_free; - } - - return 0; - -out_free: - kfree(dmb->cpu_addr); -out_bit: - clear_bit(dmb->sba_idx, ism->sba_bitmap); - return rc; + return dmb->cpu_addr ? 0 : -ENOMEM; }
int ism_register_dmb(struct ism_dev *ism, struct ism_dmb *dmb,
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arınç ÜNAL arinc.unal@arinc9.com
[ Upstream commit 17c560113231ddc20088553c7b499b289b664311 ]
In Clause 5 of IEEE Std 802-2014, two sublayers of the data link layer (DLL) of the Open Systems Interconnection basic reference model (OSI/RM) are described; the medium access control (MAC) and logical link control (LLC) sublayers. The MAC sublayer is the one facing the physical layer.
In 8.2 of IEEE Std 802.1Q-2022, the Bridge architecture is described. A Bridge component comprises a MAC Relay Entity for interconnecting the Ports of the Bridge, at least two Ports, and higher layer entities with at least a Spanning Tree Protocol Entity included.
Each Bridge Port also functions as an end station and shall provide the MAC Service to an LLC Entity. Each instance of the MAC Service is provided to a distinct LLC Entity that supports protocol identification, multiplexing, and demultiplexing, for protocol data unit (PDU) transmission and reception by one or more higher layer entities.
It is described in 8.13.9 of IEEE Std 802.1Q-2022 that in a Bridge, the LLC Entity associated with each Bridge Port is modeled as being directly connected to the attached Local Area Network (LAN).
On the switch with CPU port architecture, CPU port functions as Management Port, and the Management Port functionality is provided by software which functions as an end station. Software is connected to an IEEE 802 LAN that is wholly contained within the system that incorporates the Bridge. Software provides access to the LLC Entity associated with each Bridge Port by the value of the source port field on the special tag on the frame received by software.
We call frames that carry control information to determine the active topology and current extent of each Virtual Local Area Network (VLAN), i.e., spanning tree or Shortest Path Bridging (SPB) and Multiple VLAN Registration Protocol Data Units (MVRPDUs), and frames from other link constrained protocols, such as Extensible Authentication Protocol over LAN (EAPOL) and Link Layer Discovery Protocol (LLDP), link-local frames. They are not forwarded by a Bridge. Permanently configured entries in the filtering database (FDB) ensure that such frames are discarded by the Forwarding Process. In 8.6.3 of IEEE Std 802.1Q-2022, this is described in detail:
Each of the reserved MAC addresses specified in Table 8-1 (01-80-C2-00-00-[00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F]) shall be permanently configured in the FDB in C-VLAN components and ERs.
Each of the reserved MAC addresses specified in Table 8-2 (01-80-C2-00-00-[01,02,03,04,05,06,07,08,09,0A,0E]) shall be permanently configured in the FDB in S-VLAN components.
Each of the reserved MAC addresses specified in Table 8-3 (01-80-C2-00-00-[01,02,04,0E]) shall be permanently configured in the FDB in TPMR components.
The FDB entries for reserved MAC addresses shall specify filtering for all Bridge Ports and all VIDs. Management shall not provide the capability to modify or remove entries for reserved MAC addresses.
The addresses in Table 8-1, Table 8-2, and Table 8-3 determine the scope of propagation of PDUs within a Bridged Network, as follows:
The Nearest Bridge group address (01-80-C2-00-00-0E) is an address that no conformant Two-Port MAC Relay (TPMR) component, Service VLAN (S-VLAN) component, Customer VLAN (C-VLAN) component, or MAC Bridge can forward. PDUs transmitted using this destination address, or any other addresses that appear in Table 8-1, Table 8-2, and Table 8-3 (01-80-C2-00-00-[00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F]), can therefore travel no further than those stations that can be reached via a single individual LAN from the originating station.
The Nearest non-TPMR Bridge group address (01-80-C2-00-00-03), is an address that no conformant S-VLAN component, C-VLAN component, or MAC Bridge can forward; however, this address is relayed by a TPMR component. PDUs using this destination address, or any of the other addresses that appear in both Table 8-1 and Table 8-2 but not in Table 8-3 (01-80-C2-00-00-[00,03,05,06,07,08,09,0A,0B,0C,0D,0F]), will be relayed by any TPMRs but will propagate no further than the nearest S-VLAN component, C-VLAN component, or MAC Bridge.
The Nearest Customer Bridge group address (01-80-C2-00-00-00) is an address that no conformant C-VLAN component, MAC Bridge can forward; however, it is relayed by TPMR components and S-VLAN components. PDUs using this destination address, or any of the other addresses that appear in Table 8-1 but not in either Table 8-2 or Table 8-3 (01-80-C2-00-00-[00,0B,0C,0D,0F]), will be relayed by TPMR components and S-VLAN components but will propagate no further than the nearest C-VLAN component or MAC Bridge.
Because the LLC Entity associated with each Bridge Port is provided via CPU port, we must not filter these frames but forward them to CPU port.
In a Bridge, the transmission Port is majorly decided by ingress and egress rules, FDB, and spanning tree Port State functions of the Forwarding Process. For link-local frames, only CPU port should be designated as destination port in the FDB, and the other functions of the Forwarding Process must not interfere with the decision of the transmission Port. We call this process trapping frames to CPU port.
Therefore, on the switch with CPU port architecture, link-local frames must be trapped to CPU port, and certain link-local frames received by a Port of a Bridge comprising a TPMR component or an S-VLAN component must be excluded from it.
A Bridge of the switch with CPU port architecture cannot comprise a Two-Port MAC Relay (TPMR) component as a TPMR component supports only a subset of the functionality of a MAC Bridge. A Bridge comprising two Ports (Management Port doesn't count) of this architecture will either function as a standard MAC Bridge or a standard VLAN Bridge.
Therefore, a Bridge of this architecture can only comprise S-VLAN components, C-VLAN components, or MAC Bridge components. Since there's no TPMR component, we don't need to relay PDUs using the destination addresses specified on the Nearest non-TPMR section, and the proportion of the Nearest Customer Bridge section where they must be relayed by TPMR components.
One option to trap link-local frames to CPU port is to add static FDB entries with CPU port designated as destination port. However, because that Independent VLAN Learning (IVL) is being used on every VID, each entry only applies to a single VLAN Identifier (VID). For a Bridge comprising a MAC Bridge component or a C-VLAN component, there would have to be 16 times 4096 entries. This switch intellectual property can only hold a maximum of 2048 entries. Using this option, there also isn't a mechanism to prevent link-local frames from being discarded when the spanning tree Port State of the reception Port is discarding.
The remaining option is to utilise the BPC, RGAC1, RGAC2, RGAC3, and RGAC4 registers. Whilst this applies to every VID, it doesn't contain all of the reserved MAC addresses without affecting the remaining Standard Group MAC Addresses. The REV_UN frame tag utilised using the RGAC4 register covers the remaining 01-80-C2-00-00-[04,05,06,07,08,09,0A,0B,0C,0D,0F] destination addresses. It also includes the 01-80-C2-00-00-22 to 01-80-C2-00-00-FF destination addresses which may be relayed by MAC Bridges or VLAN Bridges. The latter option provides better but not complete conformance.
This switch intellectual property also does not provide a mechanism to trap link-local frames with specific destination addresses to CPU port by Bridge, to conform to the filtering rules for the distinct Bridge components.
Therefore, regardless of the type of the Bridge component, link-local frames with these destination addresses will be trapped to CPU port:
01-80-C2-00-00-[00,01,02,03,0E]
In a Bridge comprising a MAC Bridge component or a C-VLAN component:
Link-local frames with these destination addresses won't be trapped to CPU port which won't conform to IEEE Std 802.1Q-2022:
01-80-C2-00-00-[04,05,06,07,08,09,0A,0B,0C,0D,0F]
In a Bridge comprising an S-VLAN component:
Link-local frames with these destination addresses will be trapped to CPU port which won't conform to IEEE Std 802.1Q-2022:
01-80-C2-00-00-00
Link-local frames with these destination addresses won't be trapped to CPU port which won't conform to IEEE Std 802.1Q-2022:
01-80-C2-00-00-[04,05,06,07,08,09,0A]
Currently on this switch intellectual property, if the spanning tree Port State of the reception Port is discarding, link-local frames will be discarded.
To trap link-local frames regardless of the spanning tree Port State, make the switch regard them as Bridge Protocol Data Units (BPDUs). This switch intellectual property only lets the frames regarded as BPDUs bypass the spanning tree Port State function of the Forwarding Process.
With this change, the only remaining interference is the ingress rules. When the reception Port has no PVID assigned on software, VLAN-untagged frames won't be allowed in. There doesn't seem to be a mechanism on the switch intellectual property to have link-local frames bypass this function of the Forwarding Process.
Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Reviewed-by: Daniel Golle daniel@makrotopia.org Signed-off-by: Arınç ÜNAL arinc.unal@arinc9.com Link: https://lore.kernel.org/r/20240409-b4-for-net-mt7530-fix-link-local-when-stp... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/mt7530.c | 229 +++++++++++++++++++++++++++++++++------ drivers/net/dsa/mt7530.h | 5 + 2 files changed, 200 insertions(+), 34 deletions(-)
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c index 40ae44c9945b1..22b97505fa536 100644 --- a/drivers/net/dsa/mt7530.c +++ b/drivers/net/dsa/mt7530.c @@ -998,20 +998,173 @@ static void mt7530_setup_port5(struct dsa_switch *ds, phy_interface_t interface) mutex_unlock(&priv->reg_mutex); }
-/* On page 205, section "8.6.3 Frame filtering" of the active standard, IEEE Std - * 802.1Q™-2022, it is stated that frames with 01:80:C2:00:00:00-0F as MAC DA - * must only be propagated to C-VLAN and MAC Bridge components. That means - * VLAN-aware and VLAN-unaware bridges. On the switch designs with CPU ports, - * these frames are supposed to be processed by the CPU (software). So we make - * the switch only forward them to the CPU port. And if received from a CPU - * port, forward to a single port. The software is responsible of making the - * switch conform to the latter by setting a single port as destination port on - * the special tag. +/* In Clause 5 of IEEE Std 802-2014, two sublayers of the data link layer (DLL) + * of the Open Systems Interconnection basic reference model (OSI/RM) are + * described; the medium access control (MAC) and logical link control (LLC) + * sublayers. The MAC sublayer is the one facing the physical layer. * - * This switch intellectual property cannot conform to this part of the standard - * fully. Whilst the REV_UN frame tag covers the remaining :04-0D and :0F MAC - * DAs, it also includes :22-FF which the scope of propagation is not supposed - * to be restricted for these MAC DAs. + * In 8.2 of IEEE Std 802.1Q-2022, the Bridge architecture is described. A + * Bridge component comprises a MAC Relay Entity for interconnecting the Ports + * of the Bridge, at least two Ports, and higher layer entities with at least a + * Spanning Tree Protocol Entity included. + * + * Each Bridge Port also functions as an end station and shall provide the MAC + * Service to an LLC Entity. Each instance of the MAC Service is provided to a + * distinct LLC Entity that supports protocol identification, multiplexing, and + * demultiplexing, for protocol data unit (PDU) transmission and reception by + * one or more higher layer entities. + * + * It is described in 8.13.9 of IEEE Std 802.1Q-2022 that in a Bridge, the LLC + * Entity associated with each Bridge Port is modeled as being directly + * connected to the attached Local Area Network (LAN). + * + * On the switch with CPU port architecture, CPU port functions as Management + * Port, and the Management Port functionality is provided by software which + * functions as an end station. Software is connected to an IEEE 802 LAN that is + * wholly contained within the system that incorporates the Bridge. Software + * provides access to the LLC Entity associated with each Bridge Port by the + * value of the source port field on the special tag on the frame received by + * software. + * + * We call frames that carry control information to determine the active + * topology and current extent of each Virtual Local Area Network (VLAN), i.e., + * spanning tree or Shortest Path Bridging (SPB) and Multiple VLAN Registration + * Protocol Data Units (MVRPDUs), and frames from other link constrained + * protocols, such as Extensible Authentication Protocol over LAN (EAPOL) and + * Link Layer Discovery Protocol (LLDP), link-local frames. They are not + * forwarded by a Bridge. Permanently configured entries in the filtering + * database (FDB) ensure that such frames are discarded by the Forwarding + * Process. In 8.6.3 of IEEE Std 802.1Q-2022, this is described in detail: + * + * Each of the reserved MAC addresses specified in Table 8-1 + * (01-80-C2-00-00-[00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F]) shall be + * permanently configured in the FDB in C-VLAN components and ERs. + * + * Each of the reserved MAC addresses specified in Table 8-2 + * (01-80-C2-00-00-[01,02,03,04,05,06,07,08,09,0A,0E]) shall be permanently + * configured in the FDB in S-VLAN components. + * + * Each of the reserved MAC addresses specified in Table 8-3 + * (01-80-C2-00-00-[01,02,04,0E]) shall be permanently configured in the FDB in + * TPMR components. + * + * The FDB entries for reserved MAC addresses shall specify filtering for all + * Bridge Ports and all VIDs. Management shall not provide the capability to + * modify or remove entries for reserved MAC addresses. + * + * The addresses in Table 8-1, Table 8-2, and Table 8-3 determine the scope of + * propagation of PDUs within a Bridged Network, as follows: + * + * The Nearest Bridge group address (01-80-C2-00-00-0E) is an address that no + * conformant Two-Port MAC Relay (TPMR) component, Service VLAN (S-VLAN) + * component, Customer VLAN (C-VLAN) component, or MAC Bridge can forward. + * PDUs transmitted using this destination address, or any other addresses + * that appear in Table 8-1, Table 8-2, and Table 8-3 + * (01-80-C2-00-00-[00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F]), can + * therefore travel no further than those stations that can be reached via a + * single individual LAN from the originating station. + * + * The Nearest non-TPMR Bridge group address (01-80-C2-00-00-03), is an + * address that no conformant S-VLAN component, C-VLAN component, or MAC + * Bridge can forward; however, this address is relayed by a TPMR component. + * PDUs using this destination address, or any of the other addresses that + * appear in both Table 8-1 and Table 8-2 but not in Table 8-3 + * (01-80-C2-00-00-[00,03,05,06,07,08,09,0A,0B,0C,0D,0F]), will be relayed by + * any TPMRs but will propagate no further than the nearest S-VLAN component, + * C-VLAN component, or MAC Bridge. + * + * The Nearest Customer Bridge group address (01-80-C2-00-00-00) is an address + * that no conformant C-VLAN component, MAC Bridge can forward; however, it is + * relayed by TPMR components and S-VLAN components. PDUs using this + * destination address, or any of the other addresses that appear in Table 8-1 + * but not in either Table 8-2 or Table 8-3 (01-80-C2-00-00-[00,0B,0C,0D,0F]), + * will be relayed by TPMR components and S-VLAN components but will propagate + * no further than the nearest C-VLAN component or MAC Bridge. + * + * Because the LLC Entity associated with each Bridge Port is provided via CPU + * port, we must not filter these frames but forward them to CPU port. + * + * In a Bridge, the transmission Port is majorly decided by ingress and egress + * rules, FDB, and spanning tree Port State functions of the Forwarding Process. + * For link-local frames, only CPU port should be designated as destination port + * in the FDB, and the other functions of the Forwarding Process must not + * interfere with the decision of the transmission Port. We call this process + * trapping frames to CPU port. + * + * Therefore, on the switch with CPU port architecture, link-local frames must + * be trapped to CPU port, and certain link-local frames received by a Port of a + * Bridge comprising a TPMR component or an S-VLAN component must be excluded + * from it. + * + * A Bridge of the switch with CPU port architecture cannot comprise a Two-Port + * MAC Relay (TPMR) component as a TPMR component supports only a subset of the + * functionality of a MAC Bridge. A Bridge comprising two Ports (Management Port + * doesn't count) of this architecture will either function as a standard MAC + * Bridge or a standard VLAN Bridge. + * + * Therefore, a Bridge of this architecture can only comprise S-VLAN components, + * C-VLAN components, or MAC Bridge components. Since there's no TPMR component, + * we don't need to relay PDUs using the destination addresses specified on the + * Nearest non-TPMR section, and the proportion of the Nearest Customer Bridge + * section where they must be relayed by TPMR components. + * + * One option to trap link-local frames to CPU port is to add static FDB entries + * with CPU port designated as destination port. However, because that + * Independent VLAN Learning (IVL) is being used on every VID, each entry only + * applies to a single VLAN Identifier (VID). For a Bridge comprising a MAC + * Bridge component or a C-VLAN component, there would have to be 16 times 4096 + * entries. This switch intellectual property can only hold a maximum of 2048 + * entries. Using this option, there also isn't a mechanism to prevent + * link-local frames from being discarded when the spanning tree Port State of + * the reception Port is discarding. + * + * The remaining option is to utilise the BPC, RGAC1, RGAC2, RGAC3, and RGAC4 + * registers. Whilst this applies to every VID, it doesn't contain all of the + * reserved MAC addresses without affecting the remaining Standard Group MAC + * Addresses. The REV_UN frame tag utilised using the RGAC4 register covers the + * remaining 01-80-C2-00-00-[04,05,06,07,08,09,0A,0B,0C,0D,0F] destination + * addresses. It also includes the 01-80-C2-00-00-22 to 01-80-C2-00-00-FF + * destination addresses which may be relayed by MAC Bridges or VLAN Bridges. + * The latter option provides better but not complete conformance. + * + * This switch intellectual property also does not provide a mechanism to trap + * link-local frames with specific destination addresses to CPU port by Bridge, + * to conform to the filtering rules for the distinct Bridge components. + * + * Therefore, regardless of the type of the Bridge component, link-local frames + * with these destination addresses will be trapped to CPU port: + * + * 01-80-C2-00-00-[00,01,02,03,0E] + * + * In a Bridge comprising a MAC Bridge component or a C-VLAN component: + * + * Link-local frames with these destination addresses won't be trapped to CPU + * port which won't conform to IEEE Std 802.1Q-2022: + * + * 01-80-C2-00-00-[04,05,06,07,08,09,0A,0B,0C,0D,0F] + * + * In a Bridge comprising an S-VLAN component: + * + * Link-local frames with these destination addresses will be trapped to CPU + * port which won't conform to IEEE Std 802.1Q-2022: + * + * 01-80-C2-00-00-00 + * + * Link-local frames with these destination addresses won't be trapped to CPU + * port which won't conform to IEEE Std 802.1Q-2022: + * + * 01-80-C2-00-00-[04,05,06,07,08,09,0A] + * + * To trap link-local frames to CPU port as conformant as this switch + * intellectual property can allow, link-local frames are made to be regarded as + * Bridge Protocol Data Units (BPDUs). This is because this switch intellectual + * property only lets the frames regarded as BPDUs bypass the spanning tree Port + * State function of the Forwarding Process. + * + * The only remaining interference is the ingress rules. When the reception Port + * has no PVID assigned on software, VLAN-untagged frames won't be allowed in. + * There doesn't seem to be a mechanism on the switch intellectual property to + * have link-local frames bypass this function of the Forwarding Process. */ static void mt753x_trap_frames(struct mt7530_priv *priv) @@ -1019,35 +1172,43 @@ mt753x_trap_frames(struct mt7530_priv *priv) /* Trap 802.1X PAE frames and BPDUs to the CPU port(s) and egress them * VLAN-untagged. */ - mt7530_rmw(priv, MT753X_BPC, MT753X_PAE_EG_TAG_MASK | - MT753X_PAE_PORT_FW_MASK | MT753X_BPDU_EG_TAG_MASK | - MT753X_BPDU_PORT_FW_MASK, - MT753X_PAE_EG_TAG(MT7530_VLAN_EG_UNTAGGED) | - MT753X_PAE_PORT_FW(MT753X_BPDU_CPU_ONLY) | - MT753X_BPDU_EG_TAG(MT7530_VLAN_EG_UNTAGGED) | - MT753X_BPDU_CPU_ONLY); + mt7530_rmw(priv, MT753X_BPC, + MT753X_PAE_BPDU_FR | MT753X_PAE_EG_TAG_MASK | + MT753X_PAE_PORT_FW_MASK | MT753X_BPDU_EG_TAG_MASK | + MT753X_BPDU_PORT_FW_MASK, + MT753X_PAE_BPDU_FR | + MT753X_PAE_EG_TAG(MT7530_VLAN_EG_UNTAGGED) | + MT753X_PAE_PORT_FW(MT753X_BPDU_CPU_ONLY) | + MT753X_BPDU_EG_TAG(MT7530_VLAN_EG_UNTAGGED) | + MT753X_BPDU_CPU_ONLY);
/* Trap frames with :01 and :02 MAC DAs to the CPU port(s) and egress * them VLAN-untagged. */ - mt7530_rmw(priv, MT753X_RGAC1, MT753X_R02_EG_TAG_MASK | - MT753X_R02_PORT_FW_MASK | MT753X_R01_EG_TAG_MASK | - MT753X_R01_PORT_FW_MASK, - MT753X_R02_EG_TAG(MT7530_VLAN_EG_UNTAGGED) | - MT753X_R02_PORT_FW(MT753X_BPDU_CPU_ONLY) | - MT753X_R01_EG_TAG(MT7530_VLAN_EG_UNTAGGED) | - MT753X_BPDU_CPU_ONLY); + mt7530_rmw(priv, MT753X_RGAC1, + MT753X_R02_BPDU_FR | MT753X_R02_EG_TAG_MASK | + MT753X_R02_PORT_FW_MASK | MT753X_R01_BPDU_FR | + MT753X_R01_EG_TAG_MASK | MT753X_R01_PORT_FW_MASK, + MT753X_R02_BPDU_FR | + MT753X_R02_EG_TAG(MT7530_VLAN_EG_UNTAGGED) | + MT753X_R02_PORT_FW(MT753X_BPDU_CPU_ONLY) | + MT753X_R01_BPDU_FR | + MT753X_R01_EG_TAG(MT7530_VLAN_EG_UNTAGGED) | + MT753X_BPDU_CPU_ONLY);
/* Trap frames with :03 and :0E MAC DAs to the CPU port(s) and egress * them VLAN-untagged. */ - mt7530_rmw(priv, MT753X_RGAC2, MT753X_R0E_EG_TAG_MASK | - MT753X_R0E_PORT_FW_MASK | MT753X_R03_EG_TAG_MASK | - MT753X_R03_PORT_FW_MASK, - MT753X_R0E_EG_TAG(MT7530_VLAN_EG_UNTAGGED) | - MT753X_R0E_PORT_FW(MT753X_BPDU_CPU_ONLY) | - MT753X_R03_EG_TAG(MT7530_VLAN_EG_UNTAGGED) | - MT753X_BPDU_CPU_ONLY); + mt7530_rmw(priv, MT753X_RGAC2, + MT753X_R0E_BPDU_FR | MT753X_R0E_EG_TAG_MASK | + MT753X_R0E_PORT_FW_MASK | MT753X_R03_BPDU_FR | + MT753X_R03_EG_TAG_MASK | MT753X_R03_PORT_FW_MASK, + MT753X_R0E_BPDU_FR | + MT753X_R0E_EG_TAG(MT7530_VLAN_EG_UNTAGGED) | + MT753X_R0E_PORT_FW(MT753X_BPDU_CPU_ONLY) | + MT753X_R03_BPDU_FR | + MT753X_R03_EG_TAG(MT7530_VLAN_EG_UNTAGGED) | + MT753X_BPDU_CPU_ONLY); }
static int diff --git a/drivers/net/dsa/mt7530.h b/drivers/net/dsa/mt7530.h index 75bc9043c8c0a..ddefeb69afda1 100644 --- a/drivers/net/dsa/mt7530.h +++ b/drivers/net/dsa/mt7530.h @@ -65,6 +65,7 @@ enum mt753x_id {
/* Registers for BPDU and PAE frame control*/ #define MT753X_BPC 0x24 +#define MT753X_PAE_BPDU_FR BIT(25) #define MT753X_PAE_EG_TAG_MASK GENMASK(24, 22) #define MT753X_PAE_EG_TAG(x) FIELD_PREP(MT753X_PAE_EG_TAG_MASK, x) #define MT753X_PAE_PORT_FW_MASK GENMASK(18, 16) @@ -75,20 +76,24 @@ enum mt753x_id {
/* Register for :01 and :02 MAC DA frame control */ #define MT753X_RGAC1 0x28 +#define MT753X_R02_BPDU_FR BIT(25) #define MT753X_R02_EG_TAG_MASK GENMASK(24, 22) #define MT753X_R02_EG_TAG(x) FIELD_PREP(MT753X_R02_EG_TAG_MASK, x) #define MT753X_R02_PORT_FW_MASK GENMASK(18, 16) #define MT753X_R02_PORT_FW(x) FIELD_PREP(MT753X_R02_PORT_FW_MASK, x) +#define MT753X_R01_BPDU_FR BIT(9) #define MT753X_R01_EG_TAG_MASK GENMASK(8, 6) #define MT753X_R01_EG_TAG(x) FIELD_PREP(MT753X_R01_EG_TAG_MASK, x) #define MT753X_R01_PORT_FW_MASK GENMASK(2, 0)
/* Register for :03 and :0E MAC DA frame control */ #define MT753X_RGAC2 0x2c +#define MT753X_R0E_BPDU_FR BIT(25) #define MT753X_R0E_EG_TAG_MASK GENMASK(24, 22) #define MT753X_R0E_EG_TAG(x) FIELD_PREP(MT753X_R0E_EG_TAG_MASK, x) #define MT753X_R0E_PORT_FW_MASK GENMASK(18, 16) #define MT753X_R0E_PORT_FW(x) FIELD_PREP(MT753X_R0E_PORT_FW_MASK, x) +#define MT753X_R03_BPDU_FR BIT(9) #define MT753X_R03_EG_TAG_MASK GENMASK(8, 6) #define MT753X_R03_EG_TAG(x) FIELD_PREP(MT753X_R03_EG_TAG_MASK, x) #define MT753X_R03_PORT_FW_MASK GENMASK(2, 0)
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 97af84a6bba2ab2b9c704c08e67de3b5ea551bb2 ]
When touching unix_sk(sk)->inflight, we are always under spin_lock(&unix_gc_lock).
Let's convert unix_sk(sk)->inflight to the normal unsigned long.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240123170856.41348-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/af_unix.h | 2 +- net/unix/af_unix.c | 4 ++-- net/unix/garbage.c | 17 ++++++++--------- net/unix/scm.c | 8 +++++--- 4 files changed, 16 insertions(+), 15 deletions(-)
diff --git a/include/net/af_unix.h b/include/net/af_unix.h index afd40dce40f3d..d1b07ddbe677e 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -55,7 +55,7 @@ struct unix_sock { struct mutex iolock, bindlock; struct sock *peer; struct list_head link; - atomic_long_t inflight; + unsigned long inflight; spinlock_t lock; unsigned long gc_flags; #define UNIX_GC_CANDIDATE 0 diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 484874872fa6f..e37cf913818a1 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -980,11 +980,11 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern, sk->sk_write_space = unix_write_space; sk->sk_max_ack_backlog = net->unx.sysctl_max_dgram_qlen; sk->sk_destruct = unix_sock_destructor; - u = unix_sk(sk); + u = unix_sk(sk); + u->inflight = 0; u->path.dentry = NULL; u->path.mnt = NULL; spin_lock_init(&u->lock); - atomic_long_set(&u->inflight, 0); INIT_LIST_HEAD(&u->link); mutex_init(&u->iolock); /* single task reading lock */ mutex_init(&u->bindlock); /* single task binding lock */ diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 027c86e804f8a..aea222796dfdc 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -166,17 +166,18 @@ static void scan_children(struct sock *x, void (*func)(struct unix_sock *),
static void dec_inflight(struct unix_sock *usk) { - atomic_long_dec(&usk->inflight); + usk->inflight--; }
static void inc_inflight(struct unix_sock *usk) { - atomic_long_inc(&usk->inflight); + usk->inflight++; }
static void inc_inflight_move_tail(struct unix_sock *u) { - atomic_long_inc(&u->inflight); + u->inflight++; + /* If this still might be part of a cycle, move it to the end * of the list, so that it's checked even if it was already * passed over @@ -237,14 +238,12 @@ void unix_gc(void) */ list_for_each_entry_safe(u, next, &gc_inflight_list, link) { long total_refs; - long inflight_refs;
total_refs = file_count(u->sk.sk_socket->file); - inflight_refs = atomic_long_read(&u->inflight);
- BUG_ON(inflight_refs < 1); - BUG_ON(total_refs < inflight_refs); - if (total_refs == inflight_refs) { + BUG_ON(!u->inflight); + BUG_ON(total_refs < u->inflight); + if (total_refs == u->inflight) { list_move_tail(&u->link, &gc_candidates); __set_bit(UNIX_GC_CANDIDATE, &u->gc_flags); __set_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags); @@ -271,7 +270,7 @@ void unix_gc(void) /* Move cursor to after the current position. */ list_move(&cursor, &u->link);
- if (atomic_long_read(&u->inflight) > 0) { + if (u->inflight) { list_move_tail(&u->link, ¬_cycle_list); __clear_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags); scan_children(&u->sk, inc_inflight_move_tail, NULL); diff --git a/net/unix/scm.c b/net/unix/scm.c index 822ce0d0d7915..e92f2fad64105 100644 --- a/net/unix/scm.c +++ b/net/unix/scm.c @@ -53,12 +53,13 @@ void unix_inflight(struct user_struct *user, struct file *fp) if (s) { struct unix_sock *u = unix_sk(s);
- if (atomic_long_inc_return(&u->inflight) == 1) { + if (!u->inflight) { BUG_ON(!list_empty(&u->link)); list_add_tail(&u->link, &gc_inflight_list); } else { BUG_ON(list_empty(&u->link)); } + u->inflight++; /* Paired with READ_ONCE() in wait_for_unix_gc() */ WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + 1); } @@ -75,10 +76,11 @@ void unix_notinflight(struct user_struct *user, struct file *fp) if (s) { struct unix_sock *u = unix_sk(s);
- BUG_ON(!atomic_long_read(&u->inflight)); + BUG_ON(!u->inflight); BUG_ON(list_empty(&u->link));
- if (atomic_long_dec_and_test(&u->inflight)) + u->inflight--; + if (!u->inflight) list_del_init(&u->link); /* Paired with READ_ONCE() in wait_for_unix_gc() */ WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - 1);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Luczaj mhal@rbox.co
[ Upstream commit 47d8ac011fe1c9251070e1bd64cb10b48193ec51 ]
Garbage collector does not take into account the risk of embryo getting enqueued during the garbage collection. If such embryo has a peer that carries SCM_RIGHTS, two consecutive passes of scan_children() may see a different set of children. Leading to an incorrectly elevated inflight count, and then a dangling pointer within the gc_inflight_list.
sockets are AF_UNIX/SOCK_STREAM S is an unconnected socket L is a listening in-flight socket bound to addr, not in fdtable V's fd will be passed via sendmsg(), gets inflight count bumped
connect(S, addr) sendmsg(S, [V]); close(V) __unix_gc() ---------------- ------------------------- -----------
NS = unix_create1() skb1 = sock_wmalloc(NS) L = unix_find_other(addr) unix_state_lock(L) unix_peer(S) = NS // V count=1 inflight=0
NS = unix_peer(S) skb2 = sock_alloc() skb_queue_tail(NS, skb2[V])
// V became in-flight // V count=2 inflight=1
close(V)
// V count=1 inflight=1 // GC candidate condition met
for u in gc_inflight_list: if (total_refs == inflight_refs) add u to gc_candidates
// gc_candidates={L, V}
for u in gc_candidates: scan_children(u, dec_inflight)
// embryo (skb1) was not // reachable from L yet, so V's // inflight remains unchanged __skb_queue_tail(L, skb1) unix_state_unlock(L) for u in gc_candidates: if (u.inflight) scan_children(u, inc_inflight_move_tail)
// V count=1 inflight=2 (!)
If there is a GC-candidate listening socket, lock/unlock its state. This makes GC wait until the end of any ongoing connect() to that socket. After flipping the lock, a possibly SCM-laden embryo is already enqueued. And if there is another embryo coming, it can not possibly carry SCM_RIGHTS. At this point, unix_inflight() can not happen because unix_gc_lock is already taken. Inflight graph remains unaffected.
Fixes: 1fd05ba5a2f2 ("[AF_UNIX]: Rewrite garbage collector, fixes race.") Signed-off-by: Michal Luczaj mhal@rbox.co Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20240409201047.1032217-1-mhal@rbox.co Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/garbage.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c index aea222796dfdc..8734c0c1fc197 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -235,11 +235,22 @@ void unix_gc(void) * receive queues. Other, non candidate sockets _can_ be * added to queue, so we must make sure only to touch * candidates. + * + * Embryos, though never candidates themselves, affect which + * candidates are reachable by the garbage collector. Before + * being added to a listener's queue, an embryo may already + * receive data carrying SCM_RIGHTS, potentially making the + * passed socket a candidate that is not yet reachable by the + * collector. It becomes reachable once the embryo is + * enqueued. Therefore, we must ensure that no SCM-laden + * embryo appears in a (candidate) listener's queue between + * consecutive scan_children() calls. */ list_for_each_entry_safe(u, next, &gc_inflight_list, link) { + struct sock *sk = &u->sk; long total_refs;
- total_refs = file_count(u->sk.sk_socket->file); + total_refs = file_count(sk->sk_socket->file);
BUG_ON(!u->inflight); BUG_ON(total_refs < u->inflight); @@ -247,6 +258,11 @@ void unix_gc(void) list_move_tail(&u->link, &gc_candidates); __set_bit(UNIX_GC_CANDIDATE, &u->gc_flags); __set_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags); + + if (sk->sk_state == TCP_LISTEN) { + unix_state_lock(sk); + unix_state_unlock(sk); + } } }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Arinzon darinzon@amazon.com
[ Upstream commit 713a85195aad25d8a26786a37b674e3e5ec09e3c ]
Small unsigned types are promoted to larger signed types in the case of multiplication, the result of which may overflow. In case the result of such a multiplication has its MSB turned on, it will be sign extended with '1's. This changes the multiplication result.
Code example of the phenomenon: ------------------------------- u16 x, y; size_t z1, z2;
x = y = 0xffff; printk("x=%x y=%x\n",x,y);
z1 = x*y; z2 = (size_t)x*y;
printk("z1=%lx z2=%lx\n", z1, z2);
Output: ------- x=ffff y=ffff z1=fffffffffffe0001 z2=fffe0001
The expected result of ffff*ffff is fffe0001, and without the explicit casting to avoid the unwanted sign extension we got fffffffffffe0001.
This commit adds an explicit casting to avoid the sign extension issue.
Fixes: 689b2bdaaa14 ("net: ena: add functions for handling Low Latency Queues in ena_com") Signed-off-by: Arthur Kiyanovski akiyano@amazon.com Signed-off-by: David Arinzon darinzon@amazon.com Reviewed-by: Shannon Nelson shannon.nelson@amd.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_com.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c index 633b321d7fdd9..4db689372980e 100644 --- a/drivers/net/ethernet/amazon/ena/ena_com.c +++ b/drivers/net/ethernet/amazon/ena/ena_com.c @@ -362,7 +362,7 @@ static int ena_com_init_io_sq(struct ena_com_dev *ena_dev, ENA_COM_BOUNCE_BUFFER_CNTRL_CNT; io_sq->bounce_buf_ctrl.next_to_use = 0;
- size = io_sq->bounce_buf_ctrl.buffer_size * + size = (size_t)io_sq->bounce_buf_ctrl.buffer_size * io_sq->bounce_buf_ctrl.buffers_num;
dev_node = dev_to_node(ena_dev->dmadev);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Arinzon darinzon@amazon.com
[ Upstream commit f7e417180665234fdb7af2ebe33d89aaa434d16f ]
Missing IO completions check is called every second (HZ jiffies). This commit fixes several issues with this check:
1. Duplicate queues check: Max of 4 queues are scanned on each check due to monitor budget. Once reaching the budget, this check exits under the assumption that the next check will continue to scan the remainder of the queues, but in practice, next check will first scan the last already scanned queue which is not necessary and may cause the full queue scan to last a couple of seconds longer. The fix is to start every check with the next queue to scan. For example, on 8 IO queues: Bug: [0,1,2,3], [3,4,5,6], [6,7] Fix: [0,1,2,3], [4,5,6,7]
2. Unbalanced queues check: In case the number of active IO queues is not a multiple of budget, there will be checks which don't utilize the full budget because the full scan exits when reaching the last queue id. The fix is to run every TX completion check with exact queue budget regardless of the queue id. For example, on 7 IO queues: Bug: [0,1,2,3], [4,5,6], [0,1,2,3] Fix: [0,1,2,3], [4,5,6,0], [1,2,3,4] The budget may be lowered in case the number of IO queues is less than the budget (4) to make sure there are no duplicate queues on the same check. For example, on 3 IO queues: Bug: [0,1,2,0], [1,2,0,1] Fix: [0,1,2], [0,1,2]
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Amit Bernstein amitbern@amazon.com Signed-off-by: David Arinzon darinzon@amazon.com Reviewed-by: Shannon Nelson shannon.nelson@amd.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 21 +++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 5482015411f2f..1835581cd7230 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -3421,10 +3421,11 @@ static void check_for_missing_completions(struct ena_adapter *adapter) { struct ena_ring *tx_ring; struct ena_ring *rx_ring; - int i, budget, rc; + int qid, budget, rc; int io_queue_count;
io_queue_count = adapter->xdp_num_queues + adapter->num_io_queues; + /* Make sure the driver doesn't turn the device in other process */ smp_rmb();
@@ -3437,27 +3438,29 @@ static void check_for_missing_completions(struct ena_adapter *adapter) if (adapter->missing_tx_completion_to == ENA_HW_HINTS_NO_TIMEOUT) return;
- budget = ENA_MONITORED_TX_QUEUES; + budget = min_t(u32, io_queue_count, ENA_MONITORED_TX_QUEUES);
- for (i = adapter->last_monitored_tx_qid; i < io_queue_count; i++) { - tx_ring = &adapter->tx_ring[i]; - rx_ring = &adapter->rx_ring[i]; + qid = adapter->last_monitored_tx_qid; + + while (budget) { + qid = (qid + 1) % io_queue_count; + + tx_ring = &adapter->tx_ring[qid]; + rx_ring = &adapter->rx_ring[qid];
rc = check_missing_comp_in_tx_queue(adapter, tx_ring); if (unlikely(rc)) return;
- rc = !ENA_IS_XDP_INDEX(adapter, i) ? + rc = !ENA_IS_XDP_INDEX(adapter, qid) ? check_for_rx_interrupt_queue(adapter, rx_ring) : 0; if (unlikely(rc)) return;
budget--; - if (!budget) - break; }
- adapter->last_monitored_tx_qid = i % io_queue_count; + adapter->last_monitored_tx_qid = qid; }
/* trigger napi schedule after 2 consecutive detections */
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Arinzon darinzon@amazon.com
[ Upstream commit bf02d9fe00632d22fa91d34749c7aacf397b6cde ]
ENA has two types of TX queues: - queues which only process TX packets arriving from the network stack - queues which only process TX packets forwarded to it by XDP_REDIRECT or XDP_TX instructions
The ena_free_tx_bufs() cycles through all descriptors in a TX queue and unmaps + frees every descriptor that hasn't been acknowledged yet by the device (uncompleted TX transactions). The function assumes that the processed TX queue is necessarily from the first category listed above and ends up using napi_consume_skb() for descriptors belonging to an XDP specific queue.
This patch solves a bug in which, in case of a VF reset, the descriptors aren't freed correctly, leading to crashes.
Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") Signed-off-by: Shay Agroskin shayagr@amazon.com Signed-off-by: David Arinzon darinzon@amazon.com Reviewed-by: Shannon Nelson shannon.nelson@amd.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 1835581cd7230..95ed32542edfe 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -696,8 +696,11 @@ void ena_unmap_tx_buff(struct ena_ring *tx_ring, static void ena_free_tx_bufs(struct ena_ring *tx_ring) { bool print_once = true; + bool is_xdp_ring; u32 i;
+ is_xdp_ring = ENA_IS_XDP_INDEX(tx_ring->adapter, tx_ring->qid); + for (i = 0; i < tx_ring->ring_size; i++) { struct ena_tx_buffer *tx_info = &tx_ring->tx_buffer_info[i];
@@ -717,10 +720,15 @@ static void ena_free_tx_bufs(struct ena_ring *tx_ring)
ena_unmap_tx_buff(tx_ring, tx_info);
- dev_kfree_skb_any(tx_info->skb); + if (is_xdp_ring) + xdp_return_frame(tx_info->xdpf); + else + dev_kfree_skb_any(tx_info->skb); } - netdev_tx_reset_queue(netdev_get_tx_queue(tx_ring->netdev, - tx_ring->qid)); + + if (!is_xdp_ring) + netdev_tx_reset_queue(netdev_get_tx_queue(tx_ring->netdev, + tx_ring->qid)); }
static void ena_free_all_tx_bufs(struct ena_adapter *adapter)
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Arinzon darinzon@amazon.com
[ Upstream commit 36a1ca01f0452f2549420e7279c2588729bd94df ]
The patch mentioned in the `Fixes` tag removed the explicit assignment of tx_info->xdpf to NULL with the justification that there's no need to set tx_info->xdpf to NULL and tx_info->num_of_bufs to 0 in case of a mapping error. Both values won't be used once the mapping function returns an error, and their values would be overridden by the next transmitted packet.
While both values do indeed get overridden in the next transmission call, the value of tx_info->xdpf is also used to check whether a TX descriptor's transmission has been completed (i.e. a completion for it was polled).
An example scenario: 1. Mapping failed, tx_info->xdpf wasn't set to NULL 2. A VF reset occurred leading to IO resource destruction and a call to ena_free_tx_bufs() function 3. Although the descriptor whose mapping failed was freed by the transmission function, it still passes the check if (!tx_info->skb)
(skb and xdp_frame are in a union) 4. The xdp_frame associated with the descriptor is freed twice
This patch returns the assignment of NULL to tx_info->xdpf to make the cleaning function knows that the descriptor is already freed.
Fixes: 504fd6a5390c ("net: ena: fix DMA mapping function issues in XDP") Signed-off-by: Shay Agroskin shayagr@amazon.com Signed-off-by: David Arinzon darinzon@amazon.com Reviewed-by: Shannon Nelson shannon.nelson@amd.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_xdp.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_xdp.c b/drivers/net/ethernet/amazon/ena/ena_xdp.c index fc1c4ef73ba32..34d73c72f7803 100644 --- a/drivers/net/ethernet/amazon/ena/ena_xdp.c +++ b/drivers/net/ethernet/amazon/ena/ena_xdp.c @@ -89,7 +89,7 @@ int ena_xdp_xmit_frame(struct ena_ring *tx_ring,
rc = ena_xdp_tx_map_frame(tx_ring, tx_info, xdpf, &ena_tx_ctx); if (unlikely(rc)) - return rc; + goto err;
ena_tx_ctx.req_id = req_id;
@@ -112,7 +112,9 @@ int ena_xdp_xmit_frame(struct ena_ring *tx_ring,
error_unmap_dma: ena_unmap_tx_buff(tx_ring, tx_info); +err: tx_info->xdpf = NULL; + return rc; }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lucas De Marchi lucas.demarchi@intel.com
[ Upstream commit 50a9b7fc151e67b9e642232d32e8c5a5ac13e64a ]
All of these mutexes are already initialized by the display side since commit 3fef3e6ff86a ("drm/i915: move display mutex inits to display code"), so the xe shouldn´t initialize them.
Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Cc: Jani Nikula jani.nikula@linux.intel.com Cc: Arun R Murthy arun.r.murthy@intel.com Reviewed-by: Jani Nikula jani.nikula@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240405200711.2041428-1-lucas... Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com (cherry picked from commit 117de185edf2c5767f03575219bf7a43b161ff0d) Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_display.c | 5 ----- 1 file changed, 5 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_display.c b/drivers/gpu/drm/xe/xe_display.c index e4db069f0db3f..6ec375c1c4b6c 100644 --- a/drivers/gpu/drm/xe/xe_display.c +++ b/drivers/gpu/drm/xe/xe_display.c @@ -108,11 +108,6 @@ int xe_display_create(struct xe_device *xe) xe->display.hotplug.dp_wq = alloc_ordered_workqueue("xe-dp", 0);
drmm_mutex_init(&xe->drm, &xe->sb_lock); - drmm_mutex_init(&xe->drm, &xe->display.backlight.lock); - drmm_mutex_init(&xe->drm, &xe->display.audio.mutex); - drmm_mutex_init(&xe->drm, &xe->display.wm.wm_mutex); - drmm_mutex_init(&xe->drm, &xe->display.pps.mutex); - drmm_mutex_init(&xe->drm, &xe->display.hdcp.hdcp_mutex); xe->enabled_irq_mask = ~0;
err = drmm_add_action_or_reset(&xe->drm, display_destroy, NULL);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Karthik Poosa karthik.poosa@intel.com
[ Upstream commit a8ad8715472bb8f6a2ea8b4072a28151eb9f4f24 ]
Address potential overflow in result of left shift of a lower precision (u32) operand before assignment to higher precision (u64) variable.
v2: - Update commit message. (Himal)
Fixes: 4446fcf220ce ("drm/xe/hwmon: Expose power1_max_interval") Signed-off-by: Karthik Poosa karthik.poosa@intel.com Reviewed-by: Anshuman Gupta anshuman.gupta@intel.com Cc: Badal Nilawar badal.nilawar@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240405130127.1392426-5-karth... Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com (cherry picked from commit 883232b47b81108b0252197c747f396ecd51455a) Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_hwmon.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_hwmon.c b/drivers/gpu/drm/xe/xe_hwmon.c index 174ed2185481e..a6f43446c779a 100644 --- a/drivers/gpu/drm/xe/xe_hwmon.c +++ b/drivers/gpu/drm/xe/xe_hwmon.c @@ -288,7 +288,7 @@ xe_hwmon_power1_max_interval_show(struct device *dev, struct device_attribute *a * As y can be < 2, we compute tau4 = (4 | x) << y * and then add 2 when doing the final right shift to account for units */ - tau4 = ((1 << x_w) | x) << y; + tau4 = (u64)((1 << x_w) | x) << y;
/* val in hwmon interface units (millisec) */ out = mul_u64_u32_shr(tau4, SF_TIME, hwmon->scl_shift_time + x_w); @@ -328,7 +328,7 @@ xe_hwmon_power1_max_interval_store(struct device *dev, struct device_attribute * r = FIELD_PREP(PKG_MAX_WIN, PKG_MAX_WIN_DEFAULT); x = REG_FIELD_GET(PKG_MAX_WIN_X, r); y = REG_FIELD_GET(PKG_MAX_WIN_Y, r); - tau4 = ((1 << x_w) | x) << y; + tau4 = (u64)((1 << x_w) | x) << y; max_win = mul_u64_u32_shr(tau4, SF_TIME, hwmon->scl_shift_time + x_w);
if (val > max_win)
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit 5281ec83454d70d98b71f1836fb16512566c01cd ]
When CONFIG_PERF_EVENTS, a 'make W=1' build produces a warning about the unused ftrace_event_id_fops variable:
kernel/trace/trace_events.c:2155:37: error: 'ftrace_event_id_fops' defined but not used [-Werror=unused-const-variable=] 2155 | static const struct file_operations ftrace_event_id_fops = {
Hide this in the same #ifdef as the reference to it.
Link: https://lore.kernel.org/linux-trace-kernel/20240403080702.3509288-7-arnd@ker...
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Oleg Nesterov oleg@redhat.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Zheng Yejian zhengyejian1@huawei.com Cc: Kees Cook keescook@chromium.org Cc: Ajay Kaher akaher@vmware.com Cc: Jinjie Ruan ruanjinjie@huawei.com Cc: Clément Léger cleger@rivosinc.com Cc: Dan Carpenter dan.carpenter@linaro.org Cc: "Tzvetomir Stoyanov (VMware)" tz.stoyanov@gmail.com Fixes: 620a30e97feb ("tracing: Don't pass file_operations array to event_create_dir()") Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/trace_events.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index 7c364b87352ee..52f75c36bbca4 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -1670,6 +1670,7 @@ static int trace_format_open(struct inode *inode, struct file *file) return 0; }
+#ifdef CONFIG_PERF_EVENTS static ssize_t event_id_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos) { @@ -1684,6 +1685,7 @@ event_id_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
return simple_read_from_buffer(ubuf, cnt, ppos, buf, len); } +#endif
static ssize_t event_filter_read(struct file *filp, char __user *ubuf, size_t cnt, @@ -2152,10 +2154,12 @@ static const struct file_operations ftrace_event_format_fops = { .release = seq_release, };
+#ifdef CONFIG_PERF_EVENTS static const struct file_operations ftrace_event_id_fops = { .read = event_id_read, .llseek = default_llseek, }; +#endif
static const struct file_operations ftrace_event_filter_fops = { .open = tracing_open_file_tr,
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xuchun Shang xuchun.shang@linux.alibaba.com
[ Upstream commit 5b3625a4f6422e8982f90f0c11b5546149c962b8 ]
The commit "iommu/vt-d: Add IOMMU perfmon support" introduce IOMMU PMU feature, but use the wrong config when set pasid filter.
Fixes: 7232ab8b89e9 ("iommu/vt-d: Add IOMMU perfmon support") Signed-off-by: Xuchun Shang xuchun.shang@linux.alibaba.com Reviewed-by: Kan Liang kan.liang@linux.intel.com Link: https://lore.kernel.org/r/20240401060753.3321318-1-xuchun.shang@linux.alibab... Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/intel/perfmon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/perfmon.c b/drivers/iommu/intel/perfmon.c index cf43e798eca49..44083d01852db 100644 --- a/drivers/iommu/intel/perfmon.c +++ b/drivers/iommu/intel/perfmon.c @@ -438,7 +438,7 @@ static int iommu_pmu_assign_event(struct iommu_pmu *iommu_pmu, iommu_pmu_set_filter(domain, event->attr.config1, IOMMU_PMU_FILTER_DOMAIN, idx, event->attr.config1); - iommu_pmu_set_filter(pasid, event->attr.config1, + iommu_pmu_set_filter(pasid, event->attr.config2, IOMMU_PMU_FILTER_PASID, idx, event->attr.config1); iommu_pmu_set_filter(ats, event->attr.config2,
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jacob Pan jacob.jun.pan@linux.intel.com
[ Upstream commit a34f3e20ddff02c4f12df2c0635367394e64c63d ]
The page request queue is per IOMMU, its allocation should be made NUMA-aware for performance reasons.
Fixes: a222a7f0bb6c ("iommu/vt-d: Implement page request handling") Signed-off-by: Jacob Pan jacob.jun.pan@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Link: https://lore.kernel.org/r/20240403214007.985600-1-jacob.jun.pan@linux.intel.... Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/intel/svm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c index ec47ec81f0ecd..4d269df0082fb 100644 --- a/drivers/iommu/intel/svm.c +++ b/drivers/iommu/intel/svm.c @@ -67,7 +67,7 @@ int intel_svm_enable_prq(struct intel_iommu *iommu) struct page *pages; int irq, ret;
- pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, PRQ_ORDER); + pages = alloc_pages_node(iommu->node, GFP_KERNEL | __GFP_ZERO, PRQ_ORDER); if (!pages) { pr_warn("IOMMU: %s: Failed to allocate page request queue\n", iommu->name);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lu Baolu baolu.lu@linux.intel.com
[ Upstream commit 89436f4f54125b1297aec1f466efd8acb4ec613d ]
Commit 1a75cc710b95 ("iommu/vt-d: Use rbtree to track iommu probed devices") adds all devices probed by the iommu driver in a rbtree indexed by the source ID of each device. It assumes that each device has a unique source ID. This assumption is incorrect and the VT-d spec doesn't state this requirement either.
The reason for using a rbtree to track devices is to look up the device with PCI bus and devfunc in the paths of handling ATS invalidation time out error and the PRI I/O page faults. Both are PCI ATS feature related.
Only track the devices that have PCI ATS capabilities in the rbtree to avoid unnecessary WARN_ON in the iommu probe path. Otherwise, on some platforms below kernel splat will be displayed and the iommu probe results in failure.
WARNING: CPU: 3 PID: 166 at drivers/iommu/intel/iommu.c:158 intel_iommu_probe_device+0x319/0xd90 Call Trace: <TASK> ? __warn+0x7e/0x180 ? intel_iommu_probe_device+0x319/0xd90 ? report_bug+0x1f8/0x200 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x18/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? intel_iommu_probe_device+0x319/0xd90 ? debug_mutex_init+0x37/0x50 __iommu_probe_device+0xf2/0x4f0 iommu_probe_device+0x22/0x70 iommu_bus_notifier+0x1e/0x40 notifier_call_chain+0x46/0x150 blocking_notifier_call_chain+0x42/0x60 bus_notify+0x2f/0x50 device_add+0x5ed/0x7e0 platform_device_add+0xf5/0x240 mfd_add_devices+0x3f9/0x500 ? preempt_count_add+0x4c/0xa0 ? up_write+0xa2/0x1b0 ? __debugfs_create_file+0xe3/0x150 intel_lpss_probe+0x49f/0x5b0 ? pci_conf1_write+0xa3/0xf0 intel_lpss_pci_probe+0xcf/0x110 [intel_lpss_pci] pci_device_probe+0x95/0x120 really_probe+0xd9/0x370 ? __pfx___driver_attach+0x10/0x10 __driver_probe_device+0x73/0x150 driver_probe_device+0x19/0xa0 __driver_attach+0xb6/0x180 ? __pfx___driver_attach+0x10/0x10 bus_for_each_dev+0x77/0xd0 bus_add_driver+0x114/0x210 driver_register+0x5b/0x110 ? __pfx_intel_lpss_pci_driver_init+0x10/0x10 [intel_lpss_pci] do_one_initcall+0x57/0x2b0 ? kmalloc_trace+0x21e/0x280 ? do_init_module+0x1e/0x210 do_init_module+0x5f/0x210 load_module+0x1d37/0x1fc0 ? init_module_from_file+0x86/0xd0 init_module_from_file+0x86/0xd0 idempotent_init_module+0x17c/0x230 __x64_sys_finit_module+0x56/0xb0 do_syscall_64+0x6e/0x140 entry_SYSCALL_64_after_hwframe+0x71/0x79
Fixes: 1a75cc710b95 ("iommu/vt-d: Use rbtree to track iommu probed devices") Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10689 Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20240407011429.136282-1-baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/intel/iommu.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 5dba58f322f03..d7e10f1311aaf 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4381,9 +4381,11 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev) }
dev_iommu_priv_set(dev, info); - ret = device_rbtree_insert(iommu, info); - if (ret) - goto free; + if (pdev && pci_ats_supported(pdev)) { + ret = device_rbtree_insert(iommu, info); + if (ret) + goto free; + }
if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) { ret = intel_pasid_alloc_table(dev); @@ -4410,7 +4412,8 @@ static void intel_iommu_release_device(struct device *dev) struct intel_iommu *iommu = info->iommu;
mutex_lock(&iommu->iopf_lock); - device_rbtree_remove(info); + if (dev_is_pci(dev) && pci_ats_supported(to_pci_dev(dev))) + device_rbtree_remove(info); mutex_unlock(&iommu->iopf_lock);
if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev) &&
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Begunkov asml.silence@gmail.com
Commit e0e4ab52d17096d96c21a6805ccd424b283c3c6d upstream.
We disallow DEFER_TASKRUN multishots from running by io-wq, which is checked by individual opcodes in the issue path. We can consolidate all it in io_wq_submit_work() at the same time moving the checks out of the hot path.
Suggested-by: Jens Axboe axboe@kernel.dk Signed-off-by: Pavel Begunkov asml.silence@gmail.com Link: https://lore.kernel.org/r/e492f0f11588bb5aa11d7d24e6f53b7c7628afdb.170990572... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/io_uring.c | 20 ++++++++++++++++++++ io_uring/net.c | 21 --------------------- io_uring/rw.c | 2 -- 3 files changed, 20 insertions(+), 23 deletions(-)
--- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -949,6 +949,8 @@ bool io_fill_cqe_req_aux(struct io_kiocb u64 user_data = req->cqe.user_data; struct io_uring_cqe *cqe;
+ lockdep_assert(!io_wq_current_is_worker()); + if (!defer) return __io_post_aux_cqe(ctx, user_data, res, cflags, false);
@@ -1950,6 +1952,24 @@ fail: goto fail; }
+ /* + * If DEFER_TASKRUN is set, it's only allowed to post CQEs from the + * submitter task context. Final request completions are handed to the + * right context, however this is not the case of auxiliary CQEs, + * which is the main mean of operation for multishot requests. + * Don't allow any multishot execution from io-wq. It's more restrictive + * than necessary and also cleaner. + */ + if (req->flags & REQ_F_APOLL_MULTISHOT) { + err = -EBADFD; + if (!file_can_poll(req->file)) + goto fail; + err = -ECANCELED; + if (io_arm_poll_handler(req, issue_flags) != IO_APOLL_OK) + goto fail; + return; + } + if (req->flags & REQ_F_FORCE_ASYNC) { bool opcode_poll = def->pollin || def->pollout;
--- a/io_uring/net.c +++ b/io_uring/net.c @@ -78,19 +78,6 @@ struct io_sr_msg { */ #define MULTISHOT_MAX_RETRY 32
-static inline bool io_check_multishot(struct io_kiocb *req, - unsigned int issue_flags) -{ - /* - * When ->locked_cq is set we only allow to post CQEs from the original - * task context. Usual request completions will be handled in other - * generic paths but multipoll may decide to post extra cqes. - */ - return !(issue_flags & IO_URING_F_IOWQ) || - !(req->flags & REQ_F_APOLL_MULTISHOT) || - !req->ctx->task_complete; -} - int io_shutdown_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_shutdown *shutdown = io_kiocb_to_cmd(req, struct io_shutdown); @@ -837,9 +824,6 @@ int io_recvmsg(struct io_kiocb *req, uns (sr->flags & IORING_RECVSEND_POLL_FIRST)) return io_setup_async_msg(req, kmsg, issue_flags);
- if (!io_check_multishot(req, issue_flags)) - return io_setup_async_msg(req, kmsg, issue_flags); - retry_multishot: if (io_do_buffer_select(req)) { void __user *buf; @@ -935,9 +919,6 @@ int io_recv(struct io_kiocb *req, unsign (sr->flags & IORING_RECVSEND_POLL_FIRST)) return -EAGAIN;
- if (!io_check_multishot(req, issue_flags)) - return -EAGAIN; - sock = sock_from_file(req->file); if (unlikely(!sock)) return -ENOTSOCK; @@ -1386,8 +1367,6 @@ int io_accept(struct io_kiocb *req, unsi struct file *file; int ret, fd;
- if (!io_check_multishot(req, issue_flags)) - return -EAGAIN; retry: if (!fixed) { fd = __get_unused_fd_flags(accept->flags, accept->nofile); --- a/io_uring/rw.c +++ b/io_uring/rw.c @@ -932,8 +932,6 @@ int io_read_mshot(struct io_kiocb *req, */ if (!file_can_poll(req->file)) return -EBADFD; - if (issue_flags & IO_URING_F_IOWQ) - return -EAGAIN;
ret = __io_read(req, issue_flags);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe axboe@kernel.dk
Commit bee1d5becdf5bf23d4ca0cd9c6b60bdf3c61d72b upstream.
Do the same check for direct io-wq execution for multishot requests that commit 2a975d426c82 did for the inline execution, and disable multishot mode (and revert to single shot) if the file type doesn't support NOWAIT, and isn't opened in O_NONBLOCK mode. For multishot to work properly, it's a requirement that nonblocking read attempts can be done.
Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/io_uring.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
--- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -1964,10 +1964,15 @@ fail: err = -EBADFD; if (!file_can_poll(req->file)) goto fail; - err = -ECANCELED; - if (io_arm_poll_handler(req, issue_flags) != IO_APOLL_OK) - goto fail; - return; + if (req->file->f_flags & O_NONBLOCK || + req->file->f_mode & FMODE_NOWAIT) { + err = -ECANCELED; + if (io_arm_poll_handler(req, issue_flags) != IO_APOLL_OK) + goto fail; + return; + } else { + req->flags &= ~REQ_F_APOLL_MULTISHOT; + } }
if (req->flags & REQ_F_FORCE_ASYNC) {
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boris Burkov boris@bur.io
commit 141fb8cd206ace23c02cd2791c6da52c1d77d42a upstream.
We use add_root_meta_rsv and sub_root_meta_rsv to track prealloc and pertrans reservations for subvolumes when quotas are enabled. The convert function does not properly increment pertrans after decrementing prealloc, so the count is not accurate.
Note: we check that the fs is not read-only to mirror the logic in qgroup_convert_meta, which checks that before adding to the pertrans rsv.
Fixes: 8287475a2055 ("btrfs: qgroup: Use root::qgroup_meta_rsv_* to record qgroup meta reserved space") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Boris Burkov boris@bur.io Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/qgroup.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -4432,6 +4432,8 @@ void btrfs_qgroup_convert_reserved_meta( BTRFS_QGROUP_RSV_META_PREALLOC); trace_qgroup_meta_convert(root, num_bytes); qgroup_convert_meta(fs_info, root->root_key.objectid, num_bytes); + if (!sb_rdonly(fs_info->sb)) + add_root_meta_rsv(root, num_bytes, BTRFS_QGROUP_RSV_META_PERTRANS); }
/*
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boris Burkov boris@bur.io
commit 74e97958121aa1f5854da6effba70143f051b0cd upstream.
Create subvolume, create snapshot and delete subvolume all use btrfs_subvolume_reserve_metadata() to reserve metadata for the changes done to the parent subvolume's fs tree, which cannot be mediated in the normal way via start_transaction. When quota groups (squota or qgroups) are enabled, this reserves qgroup metadata of type PREALLOC. Once the operation is associated to a transaction, we convert PREALLOC to PERTRANS, which gets cleared in bulk at the end of the transaction.
However, the error paths of these three operations were not implementing this lifecycle correctly. They unconditionally converted the PREALLOC to PERTRANS in a generic cleanup step regardless of errors or whether the operation was fully associated to a transaction or not. This resulted in error paths occasionally converting this rsv to PERTRANS without calling record_root_in_trans successfully, which meant that unless that root got recorded in the transaction by some other thread, the end of the transaction would not free that root's PERTRANS, leaking it. Ultimately, this resulted in hitting a WARN in CONFIG_BTRFS_DEBUG builds at unmount for the leaked reservation.
The fix is to ensure that every qgroup PREALLOC reservation observes the following properties:
1. any failure before record_root_in_trans is called successfully results in freeing the PREALLOC reservation. 2. after record_root_in_trans, we convert to PERTRANS, and now the transaction owns freeing the reservation.
This patch enforces those properties on the three operations. Without it, generic/269 with squotas enabled at mkfs time would fail in ~5-10 runs on my system. With this patch, it ran successfully 1000 times in a row.
Fixes: e85fde5162bf ("btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Boris Burkov boris@bur.io Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/inode.c | 13 ++++++++++++- fs/btrfs/ioctl.c | 37 ++++++++++++++++++++++++++++--------- fs/btrfs/root-tree.c | 10 ---------- fs/btrfs/root-tree.h | 2 -- 4 files changed, 40 insertions(+), 22 deletions(-)
--- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4476,6 +4476,7 @@ int btrfs_delete_subvolume(struct btrfs_ struct btrfs_trans_handle *trans; struct btrfs_block_rsv block_rsv; u64 root_flags; + u64 qgroup_reserved = 0; int ret;
down_write(&fs_info->subvol_sem); @@ -4520,12 +4521,20 @@ int btrfs_delete_subvolume(struct btrfs_ ret = btrfs_subvolume_reserve_metadata(root, &block_rsv, 5, true); if (ret) goto out_undead; + qgroup_reserved = block_rsv.qgroup_rsv_reserved;
trans = btrfs_start_transaction(root, 0); if (IS_ERR(trans)) { ret = PTR_ERR(trans); goto out_release; } + ret = btrfs_record_root_in_trans(trans, root); + if (ret) { + btrfs_abort_transaction(trans, ret); + goto out_end_trans; + } + btrfs_qgroup_convert_reserved_meta(root, qgroup_reserved); + qgroup_reserved = 0; trans->block_rsv = &block_rsv; trans->bytes_reserved = block_rsv.size;
@@ -4584,7 +4593,9 @@ out_end_trans: ret = btrfs_end_transaction(trans); inode->i_flags |= S_DEAD; out_release: - btrfs_subvolume_release_metadata(root, &block_rsv); + btrfs_block_rsv_release(fs_info, &block_rsv, (u64)-1, NULL); + if (qgroup_reserved) + btrfs_qgroup_free_meta_prealloc(root, qgroup_reserved); out_undead: if (ret) { spin_lock(&dest->root_item_lock); --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -603,6 +603,7 @@ static noinline int create_subvol(struct int ret; dev_t anon_dev; u64 objectid; + u64 qgroup_reserved = 0;
root_item = kzalloc(sizeof(*root_item), GFP_KERNEL); if (!root_item) @@ -640,13 +641,18 @@ static noinline int create_subvol(struct trans_num_items, false); if (ret) goto out_new_inode_args; + qgroup_reserved = block_rsv.qgroup_rsv_reserved;
trans = btrfs_start_transaction(root, 0); if (IS_ERR(trans)) { ret = PTR_ERR(trans); - btrfs_subvolume_release_metadata(root, &block_rsv); - goto out_new_inode_args; + goto out_release_rsv; } + ret = btrfs_record_root_in_trans(trans, BTRFS_I(dir)->root); + if (ret) + goto out; + btrfs_qgroup_convert_reserved_meta(root, qgroup_reserved); + qgroup_reserved = 0; trans->block_rsv = &block_rsv; trans->bytes_reserved = block_rsv.size; /* Tree log can't currently deal with an inode which is a new root. */ @@ -757,9 +763,11 @@ static noinline int create_subvol(struct out: trans->block_rsv = NULL; trans->bytes_reserved = 0; - btrfs_subvolume_release_metadata(root, &block_rsv); - btrfs_end_transaction(trans); +out_release_rsv: + btrfs_block_rsv_release(fs_info, &block_rsv, (u64)-1, NULL); + if (qgroup_reserved) + btrfs_qgroup_free_meta_prealloc(root, qgroup_reserved); out_new_inode_args: btrfs_new_inode_args_destroy(&new_inode_args); out_inode: @@ -781,6 +789,8 @@ static int create_snapshot(struct btrfs_ struct btrfs_pending_snapshot *pending_snapshot; unsigned int trans_num_items; struct btrfs_trans_handle *trans; + struct btrfs_block_rsv *block_rsv; + u64 qgroup_reserved = 0; int ret;
/* We do not support snapshotting right now. */ @@ -817,19 +827,19 @@ static int create_snapshot(struct btrfs_ goto free_pending; }
- btrfs_init_block_rsv(&pending_snapshot->block_rsv, - BTRFS_BLOCK_RSV_TEMP); + block_rsv = &pending_snapshot->block_rsv; + btrfs_init_block_rsv(block_rsv, BTRFS_BLOCK_RSV_TEMP); /* * 1 to add dir item * 1 to add dir index * 1 to update parent inode item */ trans_num_items = create_subvol_num_items(inherit) + 3; - ret = btrfs_subvolume_reserve_metadata(BTRFS_I(dir)->root, - &pending_snapshot->block_rsv, + ret = btrfs_subvolume_reserve_metadata(BTRFS_I(dir)->root, block_rsv, trans_num_items, false); if (ret) goto free_pending; + qgroup_reserved = block_rsv->qgroup_rsv_reserved;
pending_snapshot->dentry = dentry; pending_snapshot->root = root; @@ -842,6 +852,13 @@ static int create_snapshot(struct btrfs_ ret = PTR_ERR(trans); goto fail; } + ret = btrfs_record_root_in_trans(trans, BTRFS_I(dir)->root); + if (ret) { + btrfs_end_transaction(trans); + goto fail; + } + btrfs_qgroup_convert_reserved_meta(root, qgroup_reserved); + qgroup_reserved = 0;
trans->pending_snapshot = pending_snapshot;
@@ -871,7 +888,9 @@ fail: if (ret && pending_snapshot->snap) pending_snapshot->snap->anon_dev = 0; btrfs_put_root(pending_snapshot->snap); - btrfs_subvolume_release_metadata(root, &pending_snapshot->block_rsv); + btrfs_block_rsv_release(fs_info, block_rsv, (u64)-1, NULL); + if (qgroup_reserved) + btrfs_qgroup_free_meta_prealloc(root, qgroup_reserved); free_pending: if (pending_snapshot->anon_dev) free_anon_bdev(pending_snapshot->anon_dev); --- a/fs/btrfs/root-tree.c +++ b/fs/btrfs/root-tree.c @@ -539,13 +539,3 @@ int btrfs_subvolume_reserve_metadata(str } return ret; } - -void btrfs_subvolume_release_metadata(struct btrfs_root *root, - struct btrfs_block_rsv *rsv) -{ - struct btrfs_fs_info *fs_info = root->fs_info; - u64 qgroup_to_release; - - btrfs_block_rsv_release(fs_info, rsv, (u64)-1, &qgroup_to_release); - btrfs_qgroup_convert_reserved_meta(root, qgroup_to_release); -} --- a/fs/btrfs/root-tree.h +++ b/fs/btrfs/root-tree.h @@ -8,8 +8,6 @@ struct fscrypt_str; int btrfs_subvolume_reserve_metadata(struct btrfs_root *root, struct btrfs_block_rsv *rsv, int nitems, bool use_global_rsv); -void btrfs_subvolume_release_metadata(struct btrfs_root *root, - struct btrfs_block_rsv *rsv); int btrfs_add_root_ref(struct btrfs_trans_handle *trans, u64 root_id, u64 ref_id, u64 dirid, u64 sequence, const struct fscrypt_str *name);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boris Burkov boris@bur.io
commit 71537e35c324ea6fbd68377a4f26bb93a831ae35 upstream.
When running delayed inode updates, we do not record the inode's root in the transaction, but we do allocate PREALLOC and thus converted PERTRANS space for it. To be sure we free that PERTRANS meta rsv, we must ensure that we record the root in the transaction.
Fixes: 4f5427ccce5d ("btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Boris Burkov boris@bur.io Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/delayed-inode.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -1128,6 +1128,9 @@ __btrfs_commit_inode_delayed_items(struc if (ret) return ret;
+ ret = btrfs_record_root_in_trans(trans, node->root); + if (ret) + return ret; ret = btrfs_update_delayed_inode(trans, node->root, path, node); return ret; }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boris Burkov boris@bur.io
commit 211de93367304ab395357f8cb12568a4d1e20701 upstream.
The transaction is only able to free PERTRANS reservations for a root once that root has been recorded with the TRANS tag on the roots radix tree. Therefore, until we are sure that this root will get tagged, it isn't safe to convert. Generally, this is not an issue as *some* transaction will likely tag the root before long and this reservation will get freed in that transaction, but technically it could stick around until unmount and result in a warning about leaked metadata reservation space.
This path is most exercised by running the generic/269 fstest with CONFIG_BTRFS_DEBUG.
Fixes: a6496849671a ("btrfs: fix start transaction qgroup rsv double free") CC: stable@vger.kernel.org # 6.6+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Boris Burkov boris@bur.io Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/transaction.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-)
--- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -747,14 +747,6 @@ again: h->reloc_reserved = reloc_reserved; }
- /* - * Now that we have found a transaction to be a part of, convert the - * qgroup reservation from prealloc to pertrans. A different transaction - * can't race in and free our pertrans out from under us. - */ - if (qgroup_reserved) - btrfs_qgroup_convert_reserved_meta(root, qgroup_reserved); - got_it: if (!current->journal_info) current->journal_info = h; @@ -788,8 +780,15 @@ got_it: * not just freed. */ btrfs_end_transaction(h); - return ERR_PTR(ret); + goto reserve_fail; } + /* + * Now that we have found a transaction to be a part of, convert the + * qgroup reservation from prealloc to pertrans. A different transaction + * can't race in and free our pertrans out from under us. + */ + if (qgroup_reserved) + btrfs_qgroup_convert_reserved_meta(root, qgroup_reserved);
return h;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Begunkov asml.silence@gmail.com
commit 4fe82aedeb8a8cb09bfa60f55ab57b5c10a74ac4 upstream.
cac9e4418f4cb ("io_uring/net: save msghdr->msg_control for retries") reinstatiates msg_control before every __sys_sendmsg_sock(), since the function can overwrite the value in msghdr. We need to do same for zerocopy sendmsg.
Cc: stable@vger.kernel.org Fixes: 493108d95f146 ("io_uring/net: zerocopy sendmsg") Link: https://github.com/axboe/liburing/issues/1067 Signed-off-by: Pavel Begunkov asml.silence@gmail.com Link: https://lore.kernel.org/r/cc1d5d9df0576fa66ddad4420d240a98a020b267.171259617... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/net.c | 1 + 1 file changed, 1 insertion(+)
--- a/io_uring/net.c +++ b/io_uring/net.c @@ -1255,6 +1255,7 @@ int io_sendmsg_zc(struct io_kiocb *req,
if (req_has_async_data(req)) { kmsg = req->async_data; + kmsg->msg.msg_control_user = sr->msg_control; } else { ret = io_sendmsg_copy_hdr(req, &iomsg); if (ret)
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zheng Yejian zhengyejian1@huawei.com
commit 325f3fb551f8cd672dbbfc4cf58b14f9ee3fc9e8 upstream.
When unloading a module, its state is changing MODULE_STATE_LIVE -> MODULE_STATE_GOING -> MODULE_STATE_UNFORMED. Each change will take a time. `is_module_text_address()` and `__module_text_address()` works with MODULE_STATE_LIVE and MODULE_STATE_GOING. If we use `is_module_text_address()` and `__module_text_address()` separately, there is a chance that the first one is succeeded but the next one is failed because module->state becomes MODULE_STATE_UNFORMED between those operations.
In `check_kprobe_address_safe()`, if the second `__module_text_address()` is failed, that is ignored because it expected a kernel_text address. But it may have failed simply because module->state has been changed to MODULE_STATE_UNFORMED. In this case, arm_kprobe() will try to modify non-exist module text address (use-after-free).
To fix this problem, we should not use separated `is_module_text_address()` and `__module_text_address()`, but use only `__module_text_address()` once and do `try_module_get(module)` which is only available with MODULE_STATE_LIVE.
Link: https://lore.kernel.org/all/20240410015802.265220-1-zhengyejian1@huawei.com/
Fixes: 28f6c37a2910 ("kprobes: Forbid probing on trampoline and BPF code areas") Cc: stable@vger.kernel.org Signed-off-by: Zheng Yejian zhengyejian1@huawei.com Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/kprobes.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
--- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -1567,10 +1567,17 @@ static int check_kprobe_address_safe(str jump_label_lock(); preempt_disable();
- /* Ensure it is not in reserved area nor out of text */ - if (!(core_kernel_text((unsigned long) p->addr) || - is_module_text_address((unsigned long) p->addr)) || - in_gate_area_no_mm((unsigned long) p->addr) || + /* Ensure the address is in a text area, and find a module if exists. */ + *probed_mod = NULL; + if (!core_kernel_text((unsigned long) p->addr)) { + *probed_mod = __module_text_address((unsigned long) p->addr); + if (!(*probed_mod)) { + ret = -EINVAL; + goto out; + } + } + /* Ensure it is not in reserved area. */ + if (in_gate_area_no_mm((unsigned long) p->addr) || within_kprobe_blacklist((unsigned long) p->addr) || jump_label_text_reserved(p->addr, p->addr) || static_call_text_reserved(p->addr, p->addr) || @@ -1580,8 +1587,7 @@ static int check_kprobe_address_safe(str goto out; }
- /* Check if 'p' is probing a module. */ - *probed_mod = __module_text_address((unsigned long) p->addr); + /* Get module refcount and reject __init functions for loaded modules. */ if (*probed_mod) { /* * We must hold a refcount of the probed module while updating
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhenhua Huang quic_zhenhuah@quicinc.com
commit fbbdc255fbee59b4207a5398fdb4f04590681a79 upstream.
commit 717c7c894d4b ("fs/proc: Add boot loader arguments as comment to /proc/bootconfig") adds bootloader argument comments into /proc/bootconfig.
/proc/bootconfig shows boot_command_line[] multiple times following every xbc key value pair, that's duplicated and not necessary. Remove redundant ones.
Output before and after the fix is like: key1 = value1 *bootloader argument comments* key2 = value2 *bootloader argument comments* key3 = value3 *bootloader argument comments* ...
key1 = value1 key2 = value2 key3 = value3 *bootloader argument comments* ...
Link: https://lore.kernel.org/all/20240409044358.1156477-1-paulmck@kernel.org/
Fixes: 717c7c894d4b ("fs/proc: Add boot loader arguments as comment to /proc/bootconfig") Signed-off-by: Zhenhua Huang quic_zhenhuah@quicinc.com Signed-off-by: Paul E. McKenney paulmck@kernel.org Cc: linux-trace-kernel@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/proc/bootconfig.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/proc/bootconfig.c b/fs/proc/bootconfig.c index 902b326e1e56..e5635a6b127b 100644 --- a/fs/proc/bootconfig.c +++ b/fs/proc/bootconfig.c @@ -62,12 +62,12 @@ static int __init copy_xbc_key_value_list(char *dst, size_t size) break; dst += ret; } - if (ret >= 0 && boot_command_line[0]) { - ret = snprintf(dst, rest(dst, end), "# Parameters from bootloader:\n# %s\n", - boot_command_line); - if (ret > 0) - dst += ret; - } + } + if (ret >= 0 && boot_command_line[0]) { + ret = snprintf(dst, rest(dst, end), "# Parameters from bootloader:\n# %s\n", + boot_command_line); + if (ret > 0) + dst += ret; } out: kfree(key);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu mhiramat@kernel.org
commit c722cea208789d9e2660992bcd05fb9fac3adb56 upstream.
If the "bootconfig" kernel command-line argument was specified or if the kernel was built with CONFIG_BOOT_CONFIG_FORCE, but if there are no embedded kernel parameter, omit the "# Parameters from bootloader:" comment from the /proc/bootconfig file. This will cause automation to fall back to the /proc/cmdline file, which will be identical to the comment in this no-embedded-kernel-parameters case.
Link: https://lore.kernel.org/all/20240409044358.1156477-2-paulmck@kernel.org/
Fixes: 8b8ce6c75430 ("fs/proc: remove redundant comments from /proc/bootconfig") Signed-off-by: Masami Hiramatsu mhiramat@kernel.org Signed-off-by: Paul E. McKenney paulmck@kernel.org Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/proc/bootconfig.c | 2 +- include/linux/bootconfig.h | 1 + init/main.c | 5 +++++ 3 files changed, 7 insertions(+), 1 deletion(-)
--- a/fs/proc/bootconfig.c +++ b/fs/proc/bootconfig.c @@ -63,7 +63,7 @@ static int __init copy_xbc_key_value_lis dst += ret; } } - if (ret >= 0 && boot_command_line[0]) { + if (cmdline_has_extra_options() && ret >= 0 && boot_command_line[0]) { ret = snprintf(dst, rest(dst, end), "# Parameters from bootloader:\n# %s\n", boot_command_line); if (ret > 0) --- a/include/linux/bootconfig.h +++ b/include/linux/bootconfig.h @@ -10,6 +10,7 @@ #ifdef __KERNEL__ #include <linux/kernel.h> #include <linux/types.h> +bool __init cmdline_has_extra_options(void); #else /* !__KERNEL__ */ /* * NOTE: This is only for tools/bootconfig, because tools/bootconfig will --- a/init/main.c +++ b/init/main.c @@ -485,6 +485,11 @@ static int __init warn_bootconfig(char *
early_param("bootconfig", warn_bootconfig);
+bool __init cmdline_has_extra_options(void) +{ + return extra_command_line || extra_init_args; +} + /* Change NUL term back to "=", to make "param" the whole string. */ static void __init repair_env_string(char *param, char *val) {
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Wetzel Alexander@wetzel-home.de
commit 27f58c04a8f438078583041468ec60597841284d upstream.
sg_remove_sfp_usercontext() must not use sg_device_destroy() after calling scsi_device_put().
sg_device_destroy() is accessing the parent scsi_device request_queue which will already be set to NULL when the preceding call to scsi_device_put() removed the last reference to the parent scsi_device.
The resulting NULL pointer exception will then crash the kernel.
Link: https://lore.kernel.org/r/20240305150509.23896-1-Alexander@wetzel-home.de Fixes: db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage") Cc: stable@vger.kernel.org Signed-off-by: Alexander Wetzel Alexander@wetzel-home.de Link: https://lore.kernel.org/r/20240320213032.18221-1-Alexander@wetzel-home.de Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/sg.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -2207,6 +2207,7 @@ sg_remove_sfp_usercontext(struct work_st { struct sg_fd *sfp = container_of(work, struct sg_fd, ew.work); struct sg_device *sdp = sfp->parentdp; + struct scsi_device *device = sdp->device; Sg_request *srp; unsigned long iflags;
@@ -2232,8 +2233,9 @@ sg_remove_sfp_usercontext(struct work_st "sg_remove_sfp: sfp=0x%p\n", sfp)); kfree(sfp);
- scsi_device_put(sdp->device); + WARN_ON_ONCE(kref_read(&sdp->d_ref) != 1); kref_put(&sdp->d_ref, sg_device_destroy); + scsi_device_put(device); module_put(THIS_MODULE); }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Wetzel Alexander@wetzel-home.de
commit d4e655c49f474deffaf5ed7e65034b8167ee39c8 upstream.
Commit 27f58c04a8f4 ("scsi: sg: Avoid sg device teardown race") introduced an incorrect WARN_ON_ONCE() and missed a sequence where sg_device_destroy() was used after scsi_device_put().
sg_device_destroy() is accessing the parent scsi_device request_queue which will already be set to NULL when the preceding call to scsi_device_put() removed the last reference to the parent scsi_device.
Drop the incorrect WARN_ON_ONCE() - allowing more than one concurrent access to the sg device - and make sure sg_device_destroy() is not used after scsi_device_put() in the error handling.
Link: https://lore.kernel.org/all/5375B275-D137-4D5F-BE25-6AF8ACAE41EF@linux.ibm.c... Fixes: 27f58c04a8f4 ("scsi: sg: Avoid sg device teardown race") Cc: stable@vger.kernel.org Signed-off-by: Alexander Wetzel Alexander@wetzel-home.de Link: https://lore.kernel.org/r/20240401191038.18359-1-Alexander@wetzel-home.de Tested-by: Sachin Sant sachinp@linux.ibm.com Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/sg.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-)
--- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -285,6 +285,7 @@ sg_open(struct inode *inode, struct file int dev = iminor(inode); int flags = filp->f_flags; struct request_queue *q; + struct scsi_device *device; Sg_device *sdp; Sg_fd *sfp; int retval; @@ -301,11 +302,12 @@ sg_open(struct inode *inode, struct file
/* This driver's module count bumped by fops_get in <linux/fs.h> */ /* Prevent the device driver from vanishing while we sleep */ - retval = scsi_device_get(sdp->device); + device = sdp->device; + retval = scsi_device_get(device); if (retval) goto sg_put;
- retval = scsi_autopm_get_device(sdp->device); + retval = scsi_autopm_get_device(device); if (retval) goto sdp_put;
@@ -313,7 +315,7 @@ sg_open(struct inode *inode, struct file * check if O_NONBLOCK. Permits SCSI commands to be issued * during error recovery. Tread carefully. */ if (!((flags & O_NONBLOCK) || - scsi_block_when_processing_errors(sdp->device))) { + scsi_block_when_processing_errors(device))) { retval = -ENXIO; /* we are in error recovery for this device */ goto error_out; @@ -344,7 +346,7 @@ sg_open(struct inode *inode, struct file
if (sdp->open_cnt < 1) { /* no existing opens */ sdp->sgdebug = 0; - q = sdp->device->request_queue; + q = device->request_queue; sdp->sg_tablesize = queue_max_segments(q); } sfp = sg_add_sfp(sdp); @@ -370,10 +372,11 @@ out_undo: error_mutex_locked: mutex_unlock(&sdp->open_rel_lock); error_out: - scsi_autopm_put_device(sdp->device); + scsi_autopm_put_device(device); sdp_put: - scsi_device_put(sdp->device); - goto sg_put; + kref_put(&sdp->d_ref, sg_device_destroy); + scsi_device_put(device); + return retval; }
/* Release resources associated with a successful sg_open() @@ -2233,7 +2236,6 @@ sg_remove_sfp_usercontext(struct work_st "sg_remove_sfp: sfp=0x%p\n", sfp)); kfree(sfp);
- WARN_ON_ONCE(kref_read(&sdp->d_ref) != 1); kref_put(&sdp->d_ref, sg_device_destroy); scsi_device_put(device); module_put(THIS_MODULE);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wachowski, Karol karol.wachowski@intel.com
commit f0cf7ffcd02953c72fed5995378805883d16203e upstream.
Return value of drmm_mutex_init(ipc->lock) was unchecked.
Fixes: 5d7422cfb498 ("accel/ivpu: Add IPC driver and JSM messages") Cc: stable@vger.kernel.org # v6.3+ Signed-off-by: Wachowski, Karol karol.wachowski@intel.com Signed-off-by: Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com Reviewed-by: Jeffrey Hugo quic_jhugo@quicinc.com Link: https://patchwork.freedesktop.org/patch/msgid/20240402104929.941186-2-jacek.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/accel/ivpu/ivpu_ipc.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/accel/ivpu/ivpu_ipc.c +++ b/drivers/accel/ivpu/ivpu_ipc.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0-only /* - * Copyright (C) 2020-2023 Intel Corporation + * Copyright (C) 2020-2024 Intel Corporation */
#include <linux/genalloc.h> @@ -501,7 +501,11 @@ int ivpu_ipc_init(struct ivpu_device *vd spin_lock_init(&ipc->cons_lock); INIT_LIST_HEAD(&ipc->cons_list); INIT_LIST_HEAD(&ipc->cb_msg_list); - drmm_mutex_init(&vdev->drm, &ipc->lock); + ret = drmm_mutex_init(&vdev->drm, &ipc->lock); + if (ret) { + ivpu_err(vdev, "Failed to initialize ipc->lock, ret %d\n", ret); + goto err_free_rx; + } ivpu_ipc_reset(vdev); return 0;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wachowski, Karol karol.wachowski@intel.com
commit 3534eacbf101f6e66105f03d869a03893407c384 upstream.
In case of failed power up we end up left in PCI D3hot state making it impossible to access NPU registers on retry. Enter D0 state on retry before proceeding with power up sequence.
Fixes: 28083ff18d3f ("accel/ivpu: Fix DevTLB errors on suspend/resume and recovery") Cc: stable@vger.kernel.org # v6.8+ Signed-off-by: Wachowski, Karol karol.wachowski@intel.com Signed-off-by: Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com Reviewed-by: Jeffrey Hugo quic_jhugo@quicinc.com Link: https://patchwork.freedesktop.org/patch/msgid/20240402104929.941186-4-jacek.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/accel/ivpu/ivpu_pm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/accel/ivpu/ivpu_pm.c +++ b/drivers/accel/ivpu/ivpu_pm.c @@ -74,10 +74,10 @@ static int ivpu_resume(struct ivpu_devic { int ret;
- pci_set_power_state(to_pci_dev(vdev->drm.dev), PCI_D0); +retry: pci_restore_state(to_pci_dev(vdev->drm.dev)); + pci_set_power_state(to_pci_dev(vdev->drm.dev), PCI_D0);
-retry: ret = ivpu_hw_power_up(vdev); if (ret) { ivpu_err(vdev, "Failed to power up HW: %d\n", ret);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com
commit 875bc9cd1b33eb027a5663f5e6878a43d98e9a16 upstream.
Put NPU in D3hot after ivpu_resume() fails to power up the device. This will assure that D3->D0 power cycle will be performed before the next resume and also will minimize power usage in this corner case.
Fixes: 28083ff18d3f ("accel/ivpu: Fix DevTLB errors on suspend/resume and recovery") Cc: stable@vger.kernel.org # v6.8+ Signed-off-by: Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com Reviewed-by: Jeffrey Hugo quic_jhugo@quicinc.com Link: https://patchwork.freedesktop.org/patch/msgid/20240402104929.941186-5-jacek.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/accel/ivpu/ivpu_pm.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/accel/ivpu/ivpu_pm.c +++ b/drivers/accel/ivpu/ivpu_pm.c @@ -100,6 +100,7 @@ err_mmu_disable: ivpu_mmu_disable(vdev); err_power_down: ivpu_hw_power_down(vdev); + pci_set_power_state(to_pci_dev(vdev->drm.dev), PCI_D3hot);
if (!ivpu_fw_is_cold_boot(vdev)) { ivpu_pm_prepare_cold_boot(vdev);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com
commit c52c35e5b404b95a5bcff39af9be1b9293be3434 upstream.
DRM_IVPU_PARAM_CORE_CLOCK_RATE returns current NPU frequency which could be 0 if device was sleeping. This value isn't really useful to the user space, so return max freq instead which can be used to estimate NPU performance.
Fixes: c39dc15191c4 ("accel/ivpu: Read clock rate only if device is up") Cc: stable@vger.kernel.org # v6.7 Signed-off-by: Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com Reviewed-by: Jeffrey Hugo quic_jhugo@quicinc.com Link: https://patchwork.freedesktop.org/patch/msgid/20240402104929.941186-7-jacek.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/accel/ivpu/ivpu_drv.c | 18 +----------------- drivers/accel/ivpu/ivpu_hw.h | 6 ++++++ drivers/accel/ivpu/ivpu_hw_37xx.c | 7 ++++--- drivers/accel/ivpu/ivpu_hw_40xx.c | 6 ++++++ 4 files changed, 17 insertions(+), 20 deletions(-)
--- a/drivers/accel/ivpu/ivpu_drv.c +++ b/drivers/accel/ivpu/ivpu_drv.c @@ -131,22 +131,6 @@ static int ivpu_get_capabilities(struct return 0; }
-static int ivpu_get_core_clock_rate(struct ivpu_device *vdev, u64 *clk_rate) -{ - int ret; - - ret = ivpu_rpm_get_if_active(vdev); - if (ret < 0) - return ret; - - *clk_rate = ret ? ivpu_hw_reg_pll_freq_get(vdev) : 0; - - if (ret) - ivpu_rpm_put(vdev); - - return 0; -} - static int ivpu_get_param_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { struct ivpu_file_priv *file_priv = file->driver_priv; @@ -170,7 +154,7 @@ static int ivpu_get_param_ioctl(struct d args->value = vdev->platform; break; case DRM_IVPU_PARAM_CORE_CLOCK_RATE: - ret = ivpu_get_core_clock_rate(vdev, &args->value); + args->value = ivpu_hw_ratio_to_freq(vdev, vdev->hw->pll.max_ratio); break; case DRM_IVPU_PARAM_NUM_CONTEXTS: args->value = ivpu_get_context_count(vdev); --- a/drivers/accel/ivpu/ivpu_hw.h +++ b/drivers/accel/ivpu/ivpu_hw.h @@ -21,6 +21,7 @@ struct ivpu_hw_ops { u32 (*profiling_freq_get)(struct ivpu_device *vdev); void (*profiling_freq_drive)(struct ivpu_device *vdev, bool enable); u32 (*reg_pll_freq_get)(struct ivpu_device *vdev); + u32 (*ratio_to_freq)(struct ivpu_device *vdev, u32 ratio); u32 (*reg_telemetry_offset_get)(struct ivpu_device *vdev); u32 (*reg_telemetry_size_get)(struct ivpu_device *vdev); u32 (*reg_telemetry_enable_get)(struct ivpu_device *vdev); @@ -130,6 +131,11 @@ static inline u32 ivpu_hw_reg_pll_freq_g return vdev->hw->ops->reg_pll_freq_get(vdev); };
+static inline u32 ivpu_hw_ratio_to_freq(struct ivpu_device *vdev, u32 ratio) +{ + return vdev->hw->ops->ratio_to_freq(vdev, ratio); +} + static inline u32 ivpu_hw_reg_telemetry_offset_get(struct ivpu_device *vdev) { return vdev->hw->ops->reg_telemetry_offset_get(vdev); --- a/drivers/accel/ivpu/ivpu_hw_37xx.c +++ b/drivers/accel/ivpu/ivpu_hw_37xx.c @@ -805,12 +805,12 @@ static void ivpu_hw_37xx_profiling_freq_ /* Profiling freq - is a debug feature. Unavailable on VPU 37XX. */ }
-static u32 ivpu_hw_37xx_pll_to_freq(u32 ratio, u32 config) +static u32 ivpu_hw_37xx_ratio_to_freq(struct ivpu_device *vdev, u32 ratio) { u32 pll_clock = PLL_REF_CLK_FREQ * ratio; u32 cpu_clock;
- if ((config & 0xff) == PLL_RATIO_4_3) + if ((vdev->hw->config & 0xff) == PLL_RATIO_4_3) cpu_clock = pll_clock * 2 / 4; else cpu_clock = pll_clock * 2 / 5; @@ -829,7 +829,7 @@ static u32 ivpu_hw_37xx_reg_pll_freq_get if (!ivpu_is_silicon(vdev)) return PLL_SIMULATION_FREQ;
- return ivpu_hw_37xx_pll_to_freq(pll_curr_ratio, vdev->hw->config); + return ivpu_hw_37xx_ratio_to_freq(vdev, pll_curr_ratio); }
static u32 ivpu_hw_37xx_reg_telemetry_offset_get(struct ivpu_device *vdev) @@ -1052,6 +1052,7 @@ const struct ivpu_hw_ops ivpu_hw_37xx_op .profiling_freq_get = ivpu_hw_37xx_profiling_freq_get, .profiling_freq_drive = ivpu_hw_37xx_profiling_freq_drive, .reg_pll_freq_get = ivpu_hw_37xx_reg_pll_freq_get, + .ratio_to_freq = ivpu_hw_37xx_ratio_to_freq, .reg_telemetry_offset_get = ivpu_hw_37xx_reg_telemetry_offset_get, .reg_telemetry_size_get = ivpu_hw_37xx_reg_telemetry_size_get, .reg_telemetry_enable_get = ivpu_hw_37xx_reg_telemetry_enable_get, --- a/drivers/accel/ivpu/ivpu_hw_40xx.c +++ b/drivers/accel/ivpu/ivpu_hw_40xx.c @@ -980,6 +980,11 @@ static u32 ivpu_hw_40xx_reg_pll_freq_get return PLL_RATIO_TO_FREQ(pll_curr_ratio); }
+static u32 ivpu_hw_40xx_ratio_to_freq(struct ivpu_device *vdev, u32 ratio) +{ + return PLL_RATIO_TO_FREQ(ratio); +} + static u32 ivpu_hw_40xx_reg_telemetry_offset_get(struct ivpu_device *vdev) { return REGB_RD32(VPU_40XX_BUTTRESS_VPU_TELEMETRY_OFFSET); @@ -1230,6 +1235,7 @@ const struct ivpu_hw_ops ivpu_hw_40xx_op .profiling_freq_get = ivpu_hw_40xx_profiling_freq_get, .profiling_freq_drive = ivpu_hw_40xx_profiling_freq_drive, .reg_pll_freq_get = ivpu_hw_40xx_reg_pll_freq_get, + .ratio_to_freq = ivpu_hw_40xx_ratio_to_freq, .reg_telemetry_offset_get = ivpu_hw_40xx_reg_telemetry_offset_get, .reg_telemetry_size_get = ivpu_hw_40xx_reg_telemetry_size_get, .reg_telemetry_enable_get = ivpu_hw_40xx_reg_telemetry_enable_get,
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com
commit fd7726e75968b27fe98534ccbf47ccd6fef686f3 upstream.
ivpu_device->context_xa is locked both in kernel thread and IRQ context. It requires XA_FLAGS_LOCK_IRQ flag to be passed during initialization otherwise the lock could be acquired from a thread and interrupted by an IRQ that locks it for the second time causing the deadlock.
This deadlock was reported by lockdep and observed in internal tests.
Fixes: 35b137630f08 ("accel/ivpu: Introduce a new DRM driver for Intel VPU") Cc: stable@vger.kernel.org # v6.3+ Signed-off-by: Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com Reviewed-by: Jeffrey Hugo quic_jhugo@quicinc.com Link: https://patchwork.freedesktop.org/patch/msgid/20240402104929.941186-9-jacek.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/accel/ivpu/ivpu_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/accel/ivpu/ivpu_drv.c +++ b/drivers/accel/ivpu/ivpu_drv.c @@ -514,7 +514,7 @@ static int ivpu_dev_init(struct ivpu_dev vdev->context_xa_limit.min = IVPU_USER_CONTEXT_MIN_SSID; vdev->context_xa_limit.max = IVPU_USER_CONTEXT_MAX_SSID; atomic64_set(&vdev->unique_id_counter, 0); - xa_init_flags(&vdev->context_xa, XA_FLAGS_ALLOC); + xa_init_flags(&vdev->context_xa, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ); xa_init_flags(&vdev->submitted_jobs_xa, XA_FLAGS_ALLOC1); lockdep_set_class(&vdev->submitted_jobs_xa.xa_lock, &submitted_jobs_xa_lock_class_key); INIT_LIST_HEAD(&vdev->bo_list);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zack Rusin zack.rusin@broadcom.com
commit 4c08f01934ab67d1d283d5cbaa52b923abcfe4cd upstream.
Enable DMA mappings in vmwgfx after TTM has been fixed in commit 3bf3710e3718 ("drm/ttm: Add a generic TTM memcpy move for page-based iomem")
This enables full guest-backed memory support and in particular allows usage of screen targets as the presentation mechanism.
Signed-off-by: Zack Rusin zack.rusin@broadcom.com Reported-by: Ye Li ye.li@broadcom.com Tested-by: Ye Li ye.li@broadcom.com Fixes: 3b0d6458c705 ("drm/vmwgfx: Refuse DMA operation when SEV encryption is active") Cc: Broadcom internal kernel review list bcm-kernel-feedback-list@broadcom.com Cc: dri-devel@lists.freedesktop.org Cc: stable@vger.kernel.org # v6.6+ Reviewed-by: Martin Krastev martin.krastev@broadcom.com Link: https://patchwork.freedesktop.org/patch/msgid/20240408022802.358641-1-zack.r... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c @@ -666,11 +666,12 @@ static int vmw_dma_select_mode(struct vm [vmw_dma_map_populate] = "Caching DMA mappings.", [vmw_dma_map_bind] = "Giving up DMA mappings early."};
- /* TTM currently doesn't fully support SEV encryption. */ - if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) - return -EINVAL; - - if (vmw_force_coherent) + /* + * When running with SEV we always want dma mappings, because + * otherwise ttm tt pool pages will bounce through swiotlb running + * out of available space. + */ + if (vmw_force_coherent || cc_platform_has(CC_ATTR_MEM_ENCRYPT)) dev_priv->map_mode = vmw_dma_alloc_coherent; else if (vmw_restrict_iommu) dev_priv->map_mode = vmw_dma_map_bind;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ville Syrjälä ville.syrjala@linux.intel.com
commit dcd8992e47f13afb5c11a61e8d9c141c35e23751 upstream.
All joined pipes share the same transcoder/timing generator. Currently we just do the commits per-pipe, which doesn't really work if we need to change switch between non-VRR and VRR timings generators on the fly, or even when sending the push to the transcoder. For now just disable VRR when bigjoiner is needed.
Cc: stable@vger.kernel.org Tested-by: Vidya Srinivas vidya.srinivas@intel.com Reviewed-by: Vandita Kulkarni vandita.kulkarni@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240404213441.17637-6-ville.s... Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com (cherry picked from commit f9d5e51db65652dbd8a2102fd7619440e3599fd2) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_vrr.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/gpu/drm/i915/display/intel_vrr.c +++ b/drivers/gpu/drm/i915/display/intel_vrr.c @@ -117,6 +117,13 @@ intel_vrr_compute_config(struct intel_cr const struct drm_display_info *info = &connector->base.display_info; int vmin, vmax;
+ /* + * FIXME all joined pipes share the same transcoder. + * Need to account for that during VRR toggle/push/etc. + */ + if (crtc_state->bigjoiner_pipes) + return; + if (adjusted_mode->flags & DRM_MODE_FLAG_INTERLACE) return;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Harish Kasiviswanathan Harish.Kasiviswanathan@amd.com
commit 8bdfb4ea95ca738d33ef71376c21eba20130f2eb upstream.
Currently, with F32 HWS GPU reset is only when unmap queue fails.
However, if compute queue doesn't repond to preemption request in time unmap will return without any error. In this case, only preemption error is logged and Reset is not triggered. Call GPU reset in this case also.
Reviewed-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Harish Kasiviswanathan Harish.Kasiviswanathan@amd.com Reviewed-by: Mukul Joshi mukul.joshi@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1997,6 +1997,7 @@ static int unmap_queues_cpsch(struct dev dev_err(dev, "HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out\n"); while (halt_if_hws_hang) schedule(); + kfd_hws_hang(dqm); return -ETIME; }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jammy Huang jammy_huang@aspeedtech.com
commit bc004f5038220b1891ef4107134ccae44be55109 upstream.
There is a while-loop in ast_dp_set_on_off() that could lead to infinite-loop. This is because the register, VGACRI-Dx, checked in this API is a scratch register actually controlled by a MCU, named DPMCU, in BMC.
These scratch registers are protected by scu-lock. If suc-lock is not off, DPMCU can not update these registers and then host will have soft lockup due to never updated status.
DPMCU is used to control DP and relative registers to handshake with host's VGA driver. Even the most time-consuming task, DP's link training, is less than 100ms. 200ms should be enough.
Signed-off-by: Jammy Huang jammy_huang@aspeedtech.com Fixes: 594e9c04b586 ("drm/ast: Create the driver for ASPEED proprietory Display-Port") Reviewed-by: Jocelyn Falempe jfalempe@redhat.com Reviewed-by: Thomas Zimmermann tzimmermann@suse.de Signed-off-by: Thomas Zimmermann tzimmermann@suse.de Cc: KuoHsiang Chou kuohsiang_chou@aspeedtech.com Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Dave Airlie airlied@redhat.com Cc: Jocelyn Falempe jfalempe@redhat.com Cc: dri-devel@lists.freedesktop.org Cc: stable@vger.kernel.org # v5.19+ Link: https://patchwork.freedesktop.org/patch/msgid/20240403090246.1495487-1-jammy... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/ast/ast_dp.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/gpu/drm/ast/ast_dp.c +++ b/drivers/gpu/drm/ast/ast_dp.c @@ -180,6 +180,7 @@ void ast_dp_set_on_off(struct drm_device { struct ast_device *ast = to_ast_device(dev); u8 video_on_off = on; + u32 i = 0;
// Video On/Off ast_set_index_reg_mask(ast, AST_IO_VGACRI, 0xE3, (u8) ~AST_DP_VIDEO_ENABLE, on); @@ -192,6 +193,8 @@ void ast_dp_set_on_off(struct drm_device ASTDP_MIRROR_VIDEO_ENABLE) != video_on_off) { // wait 1 ms mdelay(1); + if (++i > 200) + break; } } }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boris Brezillon boris.brezillon@collabora.com
commit 1fc9af813b25e146d3607669247d0f970f5a87c3 upstream.
Subject: [PATCH 6.8 127/172] drm/panfrost: Fix the error path in panfrost_mmu_map_fault_addr()
If some the pages or sgt allocation failed, we shouldn't release the pages ref we got earlier, otherwise we will end up with unbalanced get/put_pages() calls. We should instead leave everything in place and let the BO release function deal with extra cleanup when the object is destroyed, or let the fault handler try again next time it's called.
Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations") Cc: stable@vger.kernel.org Reviewed-by: Steven Price steven.price@arm.com Reviewed-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Signed-off-by: Boris Brezillon boris.brezillon@collabora.com Co-developed-by: Dmitry Osipenko dmitry.osipenko@collabora.com Signed-off-by: Dmitry Osipenko dmitry.osipenko@collabora.com Link: https://patchwork.freedesktop.org/patch/msgid/20240105184624.508603-18-dmitr... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/panfrost/panfrost_mmu.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -502,11 +502,18 @@ static int panfrost_mmu_map_fault_addr(s mapping_set_unevictable(mapping);
for (i = page_offset; i < page_offset + NUM_FAULT_PAGES; i++) { + /* Can happen if the last fault only partially filled this + * section of the pages array before failing. In that case + * we skip already filled pages. + */ + if (pages[i]) + continue; + pages[i] = shmem_read_mapping_page(mapping, i); if (IS_ERR(pages[i])) { ret = PTR_ERR(pages[i]); pages[i] = NULL; - goto err_pages; + goto err_unlock; } }
@@ -514,7 +521,7 @@ static int panfrost_mmu_map_fault_addr(s ret = sg_alloc_table_from_pages(sgt, pages + page_offset, NUM_FAULT_PAGES, 0, SZ_2M, GFP_KERNEL); if (ret) - goto err_pages; + goto err_unlock;
ret = dma_map_sgtable(pfdev->dev, sgt, DMA_BIDIRECTIONAL, 0); if (ret) @@ -537,8 +544,6 @@ out:
err_map: sg_free_table(sgt); -err_pages: - drm_gem_shmem_put_pages(&bo->base); err_unlock: dma_resv_unlock(obj->resv); err_bo:
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ville Syrjälä ville.syrjala@linux.intel.com
commit 3eadd887dbac1df8f25f701e5d404d1b90fd0fea upstream.
The modes[] array contains pointers to modes on the connectors' mode lists, which are protected by dev->mode_config.mutex. Thus we need to extend modes[] the same protection or by the time we use it the elements may already be pointing to freed/reused memory.
Cc: stable@vger.kernel.org Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10583 Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240404203336.10454-2-ville.s... Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Jani Nikula jani.nikula@intel.com Reviewed-by: Thomas Zimmermann tzimmermann@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/drm_client_modeset.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/drm_client_modeset.c +++ b/drivers/gpu/drm/drm_client_modeset.c @@ -777,6 +777,7 @@ int drm_client_modeset_probe(struct drm_ unsigned int total_modes_count = 0; struct drm_client_offset *offsets; unsigned int connector_count = 0; + /* points to modes protected by mode_config.mutex */ struct drm_display_mode **modes; struct drm_crtc **crtcs; int i, ret = 0; @@ -845,7 +846,6 @@ int drm_client_modeset_probe(struct drm_ drm_client_pick_crtcs(client, connectors, connector_count, crtcs, modes, 0, width, height); } - mutex_unlock(&dev->mode_config.mutex);
drm_client_modeset_release(client);
@@ -875,6 +875,7 @@ int drm_client_modeset_probe(struct drm_ modeset->y = offset->y; } } + mutex_unlock(&dev->mode_config.mutex);
mutex_unlock(&client->modeset_mutex); out:
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit 0640f47b742667fca6aac174f7cd62b6c2c7532c upstream.
Make sure to put the runtime PM usage count (and suspend) also when receiving a disconnect event while in the ST_MAINLINK_READY state.
This specifically avoids leaking a runtime PM usage count on every disconnect with display servers that do not automatically enable external displays when receiving a hotplug notification.
Fixes: 5814b8bf086a ("drm/msm/dp: incorporate pm_runtime framework into DP driver") Cc: stable@vger.kernel.org # 6.8 Cc: Kuogee Hsieh quic_khsieh@quicinc.com Signed-off-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Patchwork: https://patchwork.freedesktop.org/patch/582744/ Link: https://lore.kernel.org/r/20240313164306.23133-2-johan+linaro@kernel.org Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/msm/dp/dp_display.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/msm/dp/dp_display.c +++ b/drivers/gpu/drm/msm/dp/dp_display.c @@ -655,6 +655,7 @@ static int dp_hpd_unplug_handle(struct d dp_display_host_phy_exit(dp); dp->hpd_state = ST_DISCONNECTED; dp_display_notify_disconnect(&dp->dp_display.pdev->dev); + pm_runtime_put_sync(&pdev->dev); mutex_unlock(&dp->event_mutex); return 0; }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit e86750b01a1560f198e4b3e21bb3f78bfd5bb2c3 upstream.
Make sure to balance the runtime PM usage counter (and suspend) before returning on connect failures (e.g. DPCD read failures after a spurious connect event or if link training fails).
Fixes: 5814b8bf086a ("drm/msm/dp: incorporate pm_runtime framework into DP driver") Cc: stable@vger.kernel.org # 6.8 Cc: Kuogee Hsieh quic_khsieh@quicinc.com Signed-off-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Patchwork: https://patchwork.freedesktop.org/patch/582746/ Link: https://lore.kernel.org/r/20240313164306.23133-3-johan+linaro@kernel.org Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/msm/dp/dp_display.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/msm/dp/dp_display.c +++ b/drivers/gpu/drm/msm/dp/dp_display.c @@ -598,6 +598,7 @@ static int dp_hpd_plug_handle(struct dp_ ret = dp_display_usbpd_configure_cb(&pdev->dev); if (ret) { /* link train failed */ dp->hpd_state = ST_DISCONNECTED; + pm_runtime_put_sync(&pdev->dev); } else { dp->hpd_state = ST_MAINLINK_READY; }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lang Yu Lang.Yu@amd.com
commit 0f1bbcc2bab25d5fb2dfb1ee3e08131437690d3d upstream.
Otherwise the old one will be used during GPU reset. That's not expected.
Signed-off-by: Lang Yu Lang.Yu@amd.com Reviewed-by: Feifei Xu Feifei.Xu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c +++ b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c @@ -225,6 +225,8 @@ static int umsch_mm_v4_0_ring_start(stru
WREG32_SOC15(VCN, 0, regVCN_UMSCH_RB_SIZE, ring->ring_size);
+ ring->wptr = 0; + data = RREG32_SOC15(VCN, 0, regVCN_RB_ENABLE); data &= ~(VCN_RB_ENABLE__AUDIO_RB_EN_MASK); WREG32_SOC15(VCN, 0, regVCN_RB_ENABLE, data);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Li Frank.Li@nxp.com
commit 00b436182138310bb8d362b912b12a9df8f72ca3 upstream.
can1_lpcg: clock-controller@5ace0000 { ... Col1 Col2 clocks = <&clk IMX_SC_R_CAN_1 IMX_SC_PM_CLK_PER>,// 0 0 <&dma_ipg_clk>, // 1 4 <&dma_ipg_clk>; // 2 5 clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_4>, <IMX_LPCG_CLK_5>; };
Col1: index, which existing dts try to get. Col2: actual index in lpcg driver
&flexcan2 { clocks = <&can1_lpcg 1>, <&can1_lpcg 0>; ^^ ^^ Should be: clocks = <&can1_lpcg IMX_LPCG_CLK_4>, <&can1_lpcg IMX_LPCG_CLK_0>; };
Arg0 is divided by 4 in lpcg driver. So flexcan get IMX_SC_PM_CLK_PER by <&can1_lpcg 1> and <&can1_lpcg 0>. Although function work, code logic is wrong. Fix it by using correct clock indices.
Cc: stable@vger.kernel.org Fixes: be85831de020 ("arm64: dts: imx8qm: add can node in devicetree") Signed-off-by: Frank Li Frank.Li@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/freescale/imx8qm-ss-dma.dtsi | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/arm64/boot/dts/freescale/imx8qm-ss-dma.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8qm-ss-dma.dtsi @@ -127,15 +127,15 @@ };
&flexcan2 { - clocks = <&can1_lpcg 1>, - <&can1_lpcg 0>; + clocks = <&can1_lpcg IMX_LPCG_CLK_4>, + <&can1_lpcg IMX_LPCG_CLK_0>; assigned-clocks = <&clk IMX_SC_R_CAN_1 IMX_SC_PM_CLK_PER>; fsl,clk-source = /bits/ 8 <1>; };
&flexcan3 { - clocks = <&can2_lpcg 1>, - <&can2_lpcg 0>; + clocks = <&can2_lpcg IMX_LPCG_CLK_4>, + <&can2_lpcg IMX_LPCG_CLK_0>; assigned-clocks = <&clk IMX_SC_R_CAN_2 IMX_SC_PM_CLK_PER>; fsl,clk-source = /bits/ 8 <1>; };
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Li Frank.Li@nxp.com
commit 0893392334b5dffdf616a53679c6a2942c46391b upstream.
can0_lpcg: clock-controller@5acd0000 { ... Col1 Col2 clocks = <&clk IMX_SC_R_CAN_0 IMX_SC_PM_CLK_PER>, // 0 0 <&dma_ipg_clk>, // 1 4 <&dma_ipg_clk>; // 2 5 clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_4>, <IMX_LPCG_CLK_5>; }
Col1: index, which existing dts try to get. Col2: actual index in lpcg driver.
flexcan1: can@5a8d0000 { clocks = <&can0_lpcg 1>, <&can0_lpcg 0>; ^^ ^^ Should be: clocks = <&can0_lpcg IMX_LPCG_CLK_4>, <&can0_lpcg IMX_LPCG_CLK_0>; };
Arg0 is divided by 4 in lpcg driver. flexcan driver get IMX_SC_PM_CLK_PER by <&can0_lpcg 1> and <&can0_lpcg 0>. Although function can work, code logic is wrong. Fix it by using correct clock indices.
Cc: stable@vger.kernel.org Fixes: 5e7d5b023e03 ("arm64: dts: imx8qxp: add flexcan in adma") Signed-off-by: Frank Li Frank.Li@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/freescale/imx8-ss-dma.dtsi | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
--- a/arch/arm64/boot/dts/freescale/imx8-ss-dma.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8-ss-dma.dtsi @@ -406,8 +406,8 @@ dma_subsys: bus@5a000000 { reg = <0x5a8d0000 0x10000>; interrupts = <GIC_SPI 235 IRQ_TYPE_LEVEL_HIGH>; interrupt-parent = <&gic>; - clocks = <&can0_lpcg 1>, - <&can0_lpcg 0>; + clocks = <&can0_lpcg IMX_LPCG_CLK_4>, + <&can0_lpcg IMX_LPCG_CLK_0>; clock-names = "ipg", "per"; assigned-clocks = <&clk IMX_SC_R_CAN_0 IMX_SC_PM_CLK_PER>; assigned-clock-rates = <40000000>; @@ -427,8 +427,8 @@ dma_subsys: bus@5a000000 { * CAN1 shares CAN0's clock and to enable CAN0's clock it * has to be powered on. */ - clocks = <&can0_lpcg 1>, - <&can0_lpcg 0>; + clocks = <&can0_lpcg IMX_LPCG_CLK_4>, + <&can0_lpcg IMX_LPCG_CLK_0>; clock-names = "ipg", "per"; assigned-clocks = <&clk IMX_SC_R_CAN_0 IMX_SC_PM_CLK_PER>; assigned-clock-rates = <40000000>; @@ -448,8 +448,8 @@ dma_subsys: bus@5a000000 { * CAN2 shares CAN0's clock and to enable CAN0's clock it * has to be powered on. */ - clocks = <&can0_lpcg 1>, - <&can0_lpcg 0>; + clocks = <&can0_lpcg IMX_LPCG_CLK_4>, + <&can0_lpcg IMX_LPCG_CLK_0>; clock-names = "ipg", "per"; assigned-clocks = <&clk IMX_SC_R_CAN_0 IMX_SC_PM_CLK_PER>; assigned-clock-rates = <40000000>;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Li Frank.Li@nxp.com
commit 81975080f14167610976e968e8016e92d836266f upstream.
adc0_lpcg: clock-controller@5ac80000 { ... Col1 Col2 clocks = <&clk IMX_SC_R_ADC_0 IMX_SC_PM_CLK_PER>, // 0 0 <&dma_ipg_clk>; // 1 4 clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_4>; };
Col1: index, which existing dts try to get. Col2: actual index in lpcg driver.
adc0: adc@5a880000 { clocks = <&adc0_lpcg 0>, <&adc0_lpcg 1>; ^^ ^^ clocks = <&adc0_lpcg IMX_LPCG_CLK_0>, <&adc0_lpcg IMX_LPCG_CLK_4>;
Arg0 is divided by 4 in lpcg driver. So adc get IMX_SC_PM_CLK_PER by <&adc0_lpcg 0>, <&adc0_lpcg 1>. Although function can work, code logic is wrong. Fix it by using correct indices.
Cc: stable@vger.kernel.org Fixes: 1db044b25d2e ("arm64: dts: imx8dxl: add adc0 support") Signed-off-by: Frank Li Frank.Li@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/freescale/imx8-ss-dma.dtsi | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/arm64/boot/dts/freescale/imx8-ss-dma.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8-ss-dma.dtsi @@ -377,8 +377,8 @@ dma_subsys: bus@5a000000 { reg = <0x5a880000 0x10000>; interrupts = <GIC_SPI 240 IRQ_TYPE_LEVEL_HIGH>; interrupt-parent = <&gic>; - clocks = <&adc0_lpcg 0>, - <&adc0_lpcg 1>; + clocks = <&adc0_lpcg IMX_LPCG_CLK_0>, + <&adc0_lpcg IMX_LPCG_CLK_4>; clock-names = "per", "ipg"; assigned-clocks = <&clk IMX_SC_R_ADC_0 IMX_SC_PM_CLK_PER>; assigned-clock-rates = <24000000>; @@ -392,8 +392,8 @@ dma_subsys: bus@5a000000 { reg = <0x5a890000 0x10000>; interrupts = <GIC_SPI 241 IRQ_TYPE_LEVEL_HIGH>; interrupt-parent = <&gic>; - clocks = <&adc1_lpcg 0>, - <&adc1_lpcg 1>; + clocks = <&adc1_lpcg IMX_LPCG_CLK_0>, + <&adc1_lpcg IMX_LPCG_CLK_4>; clock-names = "per", "ipg"; assigned-clocks = <&clk IMX_SC_R_ADC_1 IMX_SC_PM_CLK_PER>; assigned-clock-rates = <24000000>;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Li Frank.Li@nxp.com
commit 808e7716edcdb39d3498b9f567ef6017858b49aa upstream.
usb2_lpcg: clock-controller@5b270000 { ... Col1 Col2 clocks = <&conn_ahb_clk>, <&conn_ipg_clk>; // 0 6 clock-indices = <IMX_LPCG_CLK_6>, <IMX_LPCG_CLK_7>; // 0 7 ... };
Col1: index, which existing dts try to get. Col2: actual index in lpcg driver.
usbotg1: usb@5b0d0000 { ... clocks = <&usb2_lpcg 0>; ^^ Should be: clocks = <&usb2_lpcg IMX_LPCG_CLK_6>; };
usbphy1: usbphy@5b100000 { clocks = <&usb2_lpcg 1>; ^^ SHould be: clocks = <&usb2_lpcg IMX_LPCG_CLK_7>; };
Arg0 is divided by 4 in lpcg driver. So lpcg will do dummy enable. Fix it by use correct clock indices.
Cc: stable@vger.kernel.org Fixes: 8065fc937f0f ("arm64: dts: imx8dxl: add usb1 and usb2 support") Signed-off-by: Frank Li Frank.Li@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi @@ -41,7 +41,7 @@ conn_subsys: bus@5b000000 { interrupts = <GIC_SPI 267 IRQ_TYPE_LEVEL_HIGH>; fsl,usbphy = <&usbphy1>; fsl,usbmisc = <&usbmisc1 0>; - clocks = <&usb2_lpcg 0>; + clocks = <&usb2_lpcg IMX_LPCG_CLK_6>; ahb-burst-config = <0x0>; tx-burst-size-dword = <0x10>; rx-burst-size-dword = <0x10>; @@ -58,7 +58,7 @@ conn_subsys: bus@5b000000 { usbphy1: usbphy@5b100000 { compatible = "fsl,imx7ulp-usbphy"; reg = <0x5b100000 0x1000>; - clocks = <&usb2_lpcg 1>; + clocks = <&usb2_lpcg IMX_LPCG_CLK_7>; power-domains = <&pd IMX_SC_R_USB_0_PHY>; status = "disabled"; };
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Li Frank.Li@nxp.com
commit 9055d87bce7276234173fa90e9702af31b3f5353 upstream.
adma_pwm_lpcg: clock-controller@5a590000 { ... col1 col2 clocks = <&clk IMX_SC_R_LCD_0_PWM_0 IMX_SC_PM_CLK_PER>,// 0 0 <&dma_ipg_clk>; // 1 4 clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_4>; ... };
Col1: index, which existing dts try to get. Col2: actual index in lpcg driver.
adma_pwm: pwm@5a190000 { ... clocks = <&adma_pwm_lpcg 1>, <&adma_pwm_lpcg 0>; ^^ ^^ Should be clocks = <&adma_pwm_lpcg IMX_LPCG_CLK_4>, <&adma_pwm_lpcg IMX_LPCG_CLK_0>; };
Arg0 will be divided by 4 in lcpg driver, so pwm will get IMX_SC_PM_CLK_PER by <&adma_pwm_lpcg 1>, <&adma_pwm_lpcg 0>. Although function can work, code logic is wrong. Fix it by use correct indices.
Cc: stable@vger.kernel.org Fixes: f1d6a6b991ef ("arm64: dts: imx8qxp: add adma_pwm in adma") Signed-off-by: Frank Li Frank.Li@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/freescale/imx8-ss-dma.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/boot/dts/freescale/imx8-ss-dma.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8-ss-dma.dtsi @@ -144,8 +144,8 @@ dma_subsys: bus@5a000000 { compatible = "fsl,imx8qxp-pwm", "fsl,imx27-pwm"; reg = <0x5a190000 0x1000>; interrupts = <GIC_SPI 127 IRQ_TYPE_LEVEL_HIGH>; - clocks = <&adma_pwm_lpcg 1>, - <&adma_pwm_lpcg 0>; + clocks = <&adma_pwm_lpcg IMX_LPCG_CLK_4>, + <&adma_pwm_lpcg IMX_LPCG_CLK_0>; clock-names = "ipg", "per"; assigned-clocks = <&clk IMX_SC_R_LCD_0_PWM_0 IMX_SC_PM_CLK_PER>; assigned-clock-rates = <24000000>;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Li Frank.Li@nxp.com
commit 1d86c2b3946e69d6b0b93568d312aae6247847c0 upstream.
lpcg's arg0 should use clock indices instead of index.
pwm0_lpcg: clock-controller@5d400000 { ... // Col1 Col2 clocks = <&clk IMX_SC_R_PWM_0 IMX_SC_PM_CLK_PER>, // 0 0 <&clk IMX_SC_R_PWM_0 IMX_SC_PM_CLK_PER>, // 1 1 <&clk IMX_SC_R_PWM_0 IMX_SC_PM_CLK_PER>, // 2 4 <&lsio_bus_clk>, // 3 5 <&clk IMX_SC_R_PWM_0 IMX_SC_PM_CLK_PER>; // 4 6 clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_1>, <IMX_LPCG_CLK_4>, <IMX_LPCG_CLK_5>, <IMX_LPCG_CLK_6>; };
Col1: index, which existing dts try to get. Col2: actual index in lpcg driver.
pwm1 { .... clocks = <&pwm1_lpcg 4>, <&pwm1_lpcg 1>; ^^ ^^ should be:
clocks = <&pwm1_lpcg IMX_LPCG_CLK_6>, <&pwm1_lpcg IMX_LPCG_CLK_1>; };
Arg0 is divided by 4 in lpcg driver, so index 0 and 1 will be get by pwm driver, which are same as IMX_LPCG_CLK_6 and IMX_LPCG_CLK_1. Even it can work, but code logic is wrong. Fixed it by use correct indices.
Cc: stable@vger.kernel.org Fixes: 23fa99b205ea ("arm64: dts: freescale: imx8-ss-lsio: add support for lsio_pwm0-3") Signed-off-by: Frank Li Frank.Li@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/freescale/imx8-ss-lsio.dtsi | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/arch/arm64/boot/dts/freescale/imx8-ss-lsio.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8-ss-lsio.dtsi @@ -25,8 +25,8 @@ lsio_subsys: bus@5d000000 { compatible = "fsl,imx27-pwm"; reg = <0x5d000000 0x10000>; clock-names = "ipg", "per"; - clocks = <&pwm0_lpcg 4>, - <&pwm0_lpcg 1>; + clocks = <&pwm0_lpcg IMX_LPCG_CLK_6>, + <&pwm0_lpcg IMX_LPCG_CLK_1>; assigned-clocks = <&clk IMX_SC_R_PWM_0 IMX_SC_PM_CLK_PER>; assigned-clock-rates = <24000000>; #pwm-cells = <3>; @@ -38,8 +38,8 @@ lsio_subsys: bus@5d000000 { compatible = "fsl,imx27-pwm"; reg = <0x5d010000 0x10000>; clock-names = "ipg", "per"; - clocks = <&pwm1_lpcg 4>, - <&pwm1_lpcg 1>; + clocks = <&pwm1_lpcg IMX_LPCG_CLK_6>, + <&pwm1_lpcg IMX_LPCG_CLK_1>; assigned-clocks = <&clk IMX_SC_R_PWM_1 IMX_SC_PM_CLK_PER>; assigned-clock-rates = <24000000>; #pwm-cells = <3>; @@ -51,8 +51,8 @@ lsio_subsys: bus@5d000000 { compatible = "fsl,imx27-pwm"; reg = <0x5d020000 0x10000>; clock-names = "ipg", "per"; - clocks = <&pwm2_lpcg 4>, - <&pwm2_lpcg 1>; + clocks = <&pwm2_lpcg IMX_LPCG_CLK_6>, + <&pwm2_lpcg IMX_LPCG_CLK_1>; assigned-clocks = <&clk IMX_SC_R_PWM_2 IMX_SC_PM_CLK_PER>; assigned-clock-rates = <24000000>; #pwm-cells = <3>; @@ -64,8 +64,8 @@ lsio_subsys: bus@5d000000 { compatible = "fsl,imx27-pwm"; reg = <0x5d030000 0x10000>; clock-names = "ipg", "per"; - clocks = <&pwm3_lpcg 4>, - <&pwm3_lpcg 1>; + clocks = <&pwm3_lpcg IMX_LPCG_CLK_6>, + <&pwm3_lpcg IMX_LPCG_CLK_1>; assigned-clocks = <&clk IMX_SC_R_PWM_3 IMX_SC_PM_CLK_PER>; assigned-clock-rates = <24000000>; #pwm-cells = <3>;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Li Frank.Li@nxp.com
commit f72b544a514c07d34a0d9d5380f5905b3731e647 upstream.
spi0_lpcg: clock-controller@5a400000 { ... Col0 Col1 clocks = <&clk IMX_SC_R_SPI_0 IMX_SC_PM_CLK_PER>,// 0 1 <&dma_ipg_clk>; // 1 4 clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_4>; };
Col1: index, which existing dts try to get. Col2: actual index in lpcg driver.
lpspi0: spi@5a000000 { ... clocks = <&spi0_lpcg 0>, <&spi0_lpcg 1>; ^ ^ Should be: clocks = <&spi0_lpcg IMX_LPCG_CLK_0>, <&spi0_lpcg IMX_LPCG_CLK_4>; };
Arg0 is divided by 4 in lpcg driver. <&spi0_lpcg 0> and <&spi0_lpcg 1> are IMX_SC_PM_CLK_PER. Although code can work, code logic is wrong. It should use IMX_LPCG_CLK_0 and IMX_LPCG_CLK_4 for lpcg arg0.
Cc: stable@vger.kernel.org Fixes: c4098885e790 ("arm64: dts: imx8dxl: add lpspi support") Signed-off-by: Frank Li Frank.Li@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/freescale/imx8-ss-dma.dtsi | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/arch/arm64/boot/dts/freescale/imx8-ss-dma.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8-ss-dma.dtsi @@ -27,8 +27,8 @@ dma_subsys: bus@5a000000 { #size-cells = <0>; interrupts = <GIC_SPI 336 IRQ_TYPE_LEVEL_HIGH>; interrupt-parent = <&gic>; - clocks = <&spi0_lpcg 0>, - <&spi0_lpcg 1>; + clocks = <&spi0_lpcg IMX_LPCG_CLK_0>, + <&spi0_lpcg IMX_LPCG_CLK_4>; clock-names = "per", "ipg"; assigned-clocks = <&clk IMX_SC_R_SPI_0 IMX_SC_PM_CLK_PER>; assigned-clock-rates = <60000000>; @@ -43,8 +43,8 @@ dma_subsys: bus@5a000000 { #size-cells = <0>; interrupts = <GIC_SPI 337 IRQ_TYPE_LEVEL_HIGH>; interrupt-parent = <&gic>; - clocks = <&spi1_lpcg 0>, - <&spi1_lpcg 1>; + clocks = <&spi1_lpcg IMX_LPCG_CLK_0>, + <&spi1_lpcg IMX_LPCG_CLK_4>; clock-names = "per", "ipg"; assigned-clocks = <&clk IMX_SC_R_SPI_1 IMX_SC_PM_CLK_PER>; assigned-clock-rates = <60000000>; @@ -59,8 +59,8 @@ dma_subsys: bus@5a000000 { #size-cells = <0>; interrupts = <GIC_SPI 338 IRQ_TYPE_LEVEL_HIGH>; interrupt-parent = <&gic>; - clocks = <&spi2_lpcg 0>, - <&spi2_lpcg 1>; + clocks = <&spi2_lpcg IMX_LPCG_CLK_0>, + <&spi2_lpcg IMX_LPCG_CLK_4>; clock-names = "per", "ipg"; assigned-clocks = <&clk IMX_SC_R_SPI_2 IMX_SC_PM_CLK_PER>; assigned-clock-rates = <60000000>; @@ -75,8 +75,8 @@ dma_subsys: bus@5a000000 { #size-cells = <0>; interrupts = <GIC_SPI 339 IRQ_TYPE_LEVEL_HIGH>; interrupt-parent = <&gic>; - clocks = <&spi3_lpcg 0>, - <&spi3_lpcg 1>; + clocks = <&spi3_lpcg IMX_LPCG_CLK_0>, + <&spi3_lpcg IMX_LPCG_CLK_4>; clock-names = "per", "ipg"; assigned-clocks = <&clk IMX_SC_R_SPI_3 IMX_SC_PM_CLK_PER>; assigned-clock-rates = <60000000>;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gavin Shan gshan@redhat.com
commit 22e1992cf7b034db5325660e98c41ca5afa5f519 upstream.
A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by Will. Otherwise, it's not ensured the available ring entries pushed by guest can be observed by vhost in time, leading to stale available ring entries fetched by vhost in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's grace-hopper (ARM64) platform.
/home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \ -accel kvm -machine virt,gic-version=host -cpu host \ -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \ -m 4096M,slots=16,maxmem=64G \ -object memory-backend-ram,id=mem0,size=4096M \ : \ -netdev tap,id=vnet0,vhost=true \ -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0 : guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM virtio_net virtio0: output.0:id 100 is not a head!
Add the missed smp_rmb() in vhost_vq_avail_empty(). When tx_can_batch() returns true, it means there's still pending tx buffers. Since it might read indices, so it still can bypass the smp_rmb() in vhost_get_vq_desc(). Note that it should be safe until vq->avail_idx is changed by commit 275bf960ac697 ("vhost: better detection of available buffers").
Fixes: 275bf960ac69 ("vhost: better detection of available buffers") Cc: stable@kernel.org # v4.11+ Reported-by: Yihuang Yu yihyu@redhat.com Suggested-by: Will Deacon will@kernel.org Signed-off-by: Gavin Shan gshan@redhat.com Acked-by: Jason Wang jasowang@redhat.com Message-Id: 20240328002149.1141302-2-gshan@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Reviewed-by: Stefano Garzarella sgarzare@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/vhost/vhost.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
--- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2799,9 +2799,19 @@ bool vhost_vq_avail_empty(struct vhost_d r = vhost_get_avail_idx(vq, &avail_idx); if (unlikely(r)) return false; + vq->avail_idx = vhost16_to_cpu(vq, avail_idx); + if (vq->avail_idx != vq->last_avail_idx) { + /* Since we have updated avail_idx, the following + * call to vhost_get_vq_desc() will read available + * ring entries. Make sure that read happens after + * the avail_idx read. + */ + smp_rmb(); + return false; + }
- return vq->avail_idx == vq->last_avail_idx; + return true; } EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gavin Shan gshan@redhat.com
commit df9ace7647d4123209395bb9967e998d5758c645 upstream.
A smp_rmb() has been missed in vhost_enable_notify(), inspired by Will. Otherwise, it's not ensured the available ring entries pushed by guest can be observed by vhost in time, leading to stale available ring entries fetched by vhost in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's grace-hopper (ARM64) platform.
/home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \ -accel kvm -machine virt,gic-version=host -cpu host \ -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \ -m 4096M,slots=16,maxmem=64G \ -object memory-backend-ram,id=mem0,size=4096M \ : \ -netdev tap,id=vnet0,vhost=true \ -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0 : guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM virtio_net virtio0: output.0:id 100 is not a head!
Add the missed smp_rmb() in vhost_enable_notify(). When it returns true, it means there's still pending tx buffers. Since it might read indices, so it still can bypass the smp_rmb() in vhost_get_vq_desc(). Note that it should be safe until vq->avail_idx is changed by commit d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()").
Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()") Cc: stable@kernel.org # v5.18+ Reported-by: Yihuang Yu yihyu@redhat.com Suggested-by: Will Deacon will@kernel.org Signed-off-by: Gavin Shan gshan@redhat.com Acked-by: Jason Wang jasowang@redhat.com Message-Id: 20240328002149.1141302-3-gshan@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Reviewed-by: Stefano Garzarella sgarzare@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/vhost/vhost.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
--- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2848,9 +2848,19 @@ bool vhost_enable_notify(struct vhost_de &vq->avail->idx, r); return false; } + vq->avail_idx = vhost16_to_cpu(vq, avail_idx); + if (vq->avail_idx != vq->last_avail_idx) { + /* Since we have updated avail_idx, the following + * call to vhost_get_vq_desc() will read available + * ring entries. Make sure that read happens after + * the avail_idx read. + */ + smp_rmb(); + return true; + }
- return vq->avail_idx != vq->last_avail_idx; + return false; } EXPORT_SYMBOL_GPL(vhost_enable_notify);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namhyung Kim namhyung@kernel.org
commit dec8ced871e17eea46f097542dd074d022be4bd1 upstream.
On x86 each struct cpu_hw_events maintains a table for counter assignment but it missed to update one for the deleted event in x86_pmu_del(). This can make perf_clear_dirty_counters() reset used counter if it's called before event scheduling or enabling. Then it would return out of range data which doesn't make sense.
The following code can reproduce the problem.
$ cat repro.c #include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <linux/perf_event.h> #include <sys/ioctl.h> #include <sys/mman.h> #include <sys/syscall.h>
struct perf_event_attr attr = { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES, .disabled = 1, };
void *worker(void *arg) { int cpu = (long)arg; int fd1 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0); int fd2 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0); void *p;
do { ioctl(fd1, PERF_EVENT_IOC_ENABLE, 0); p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd1, 0); ioctl(fd2, PERF_EVENT_IOC_ENABLE, 0);
ioctl(fd2, PERF_EVENT_IOC_DISABLE, 0); munmap(p, 4096); ioctl(fd1, PERF_EVENT_IOC_DISABLE, 0); } while (1);
return NULL; }
int main(void) { int i; int n = sysconf(_SC_NPROCESSORS_ONLN); pthread_t *th = calloc(n, sizeof(*th));
for (i = 0; i < n; i++) pthread_create(&th[i], NULL, worker, (void *)(long)i); for (i = 0; i < n; i++) pthread_join(th[i], NULL);
free(th); return 0; }
And you can see the out of range data using perf stat like this. Probably it'd be easier to see on a large machine.
$ gcc -o repro repro.c -pthread $ ./repro & $ sudo perf stat -A -I 1000 2>&1 | awk '{ if (length($3) > 15) print }' 1.001028462 CPU6 196,719,295,683,763 cycles # 194290.996 GHz (71.54%) 1.001028462 CPU3 396,077,485,787,730 branch-misses # 15804359784.80% of all branches (71.07%) 1.001028462 CPU17 197,608,350,727,877 branch-misses # 14594186554.56% of all branches (71.22%) 2.020064073 CPU4 198,372,472,612,140 cycles # 194681.113 GHz (70.95%) 2.020064073 CPU6 199,419,277,896,696 cycles # 195720.007 GHz (70.57%) 2.020064073 CPU20 198,147,174,025,639 cycles # 194474.654 GHz (71.03%) 2.020064073 CPU20 198,421,240,580,145 stalled-cycles-frontend # 100.14% frontend cycles idle (70.93%) 3.037443155 CPU4 197,382,689,923,416 cycles # 194043.065 GHz (71.30%) 3.037443155 CPU20 196,324,797,879,414 cycles # 193003.773 GHz (71.69%) 3.037443155 CPU5 197,679,956,608,205 stalled-cycles-backend # 1315606428.66% backend cycles idle (71.19%) 3.037443155 CPU5 198,571,860,474,851 instructions # 13215422.58 insn per cycle
It should move the contents in the cpuc->assign as well.
Fixes: 5471eea5d3bf ("perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task") Signed-off-by: Namhyung Kim namhyung@kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Reviewed-by: Kan Liang kan.liang@linux.intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240306061003.1894224-1-namhyung@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/events/core.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -1644,6 +1644,7 @@ static void x86_pmu_del(struct perf_even while (++i < cpuc->n_events) { cpuc->event_list[i-1] = cpuc->event_list[i]; cpuc->event_constraint[i-1] = cpuc->event_constraint[i]; + cpuc->assign[i-1] = cpuc->assign[i]; } cpuc->event_constraint[i-1] = NULL; --cpuc->n_events;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
commit f337a6a21e2fd67eadea471e93d05dd37baaa9be upstream.
Initialize cpu_mitigations to CPU_MITIGATIONS_OFF if the kernel is built with CONFIG_SPECULATION_MITIGATIONS=n, as the help text quite clearly states that disabling SPECULATION_MITIGATIONS is supposed to turn off all mitigations by default.
│ If you say N, all mitigations will be disabled. You really │ should know what you are doing to say so.
As is, the kernel still defaults to CPU_MITIGATIONS_AUTO, which results in some mitigations being enabled in spite of SPECULATION_MITIGATIONS=n.
Fixes: f43b9876e857 ("x86/retbleed: Add fine grained Kconfig knobs") Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Ingo Molnar mingo@kernel.org Reviewed-by: Daniel Sneddon daniel.sneddon@linux.intel.com Cc: stable@vger.kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Link: https://lore.kernel.org/r/20240409175108.1512861-2-seanjc@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/cpu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -3207,7 +3207,8 @@ enum cpu_mitigations { };
static enum cpu_mitigations cpu_mitigations __ro_after_init = - CPU_MITIGATIONS_AUTO; + IS_ENABLED(CONFIG_SPECULATION_MITIGATIONS) ? CPU_MITIGATIONS_AUTO : + CPU_MITIGATIONS_OFF;
static int __init mitigations_parse_cmdline(char *arg) {
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oleg Nesterov oleg@redhat.com
commit 6d029c25b71f2de2838a6f093ce0fa0e69336154 upstream.
check_timer_distribution() runs ten threads in a busy loop and tries to test that the kernel distributes a process posix CPU timer signal to every thread over time.
There is not guarantee that this is true even after commit bcb7ee79029d ("posix-timers: Prefer delivery of signals to the current thread") because that commit only avoids waking up the sleeping process leader thread, but that has nothing to do with the actual signal delivery.
As the signal is process wide the first thread which observes sigpending and wins the race to lock sighand will deliver the signal. Testing shows that this hangs on a regular base because some threads never win the race.
The comment "This primarily tests that the kernel does not favour any one." is wrong. The kernel does favour a thread which hits the timer interrupt when CLOCK_PROCESS_CPUTIME_ID expires.
Rewrite the test so it only checks that the group leader sleeping in join() never receives SIGALRM and the thread which burns CPU cycles receives all signals.
In older kernels which do not have commit bcb7ee79029d ("posix-timers: Prefer delivery of signals to the current thread") the test-case fails immediately, the very 1st tick wakes the leader up. Otherwise it quickly succeeds after 100 ticks.
CI testing wants to use newer selftest versions on stable kernels. In this case the test is guaranteed to fail.
So check in the failure case whether the kernel version is less than v6.3 and skip the test result in that case.
[ tglx: Massaged change log, renamed the version check helper ]
Fixes: e797203fb3ba ("selftests/timers/posix_timers: Test delivery of signals across threads") Signed-off-by: Oleg Nesterov oleg@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240409133802.GD29396@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/kselftest.h | 13 +++ tools/testing/selftests/timers/posix_timers.c | 111 +++++++++++--------------- 2 files changed, 64 insertions(+), 60 deletions(-)
--- a/tools/testing/selftests/kselftest.h +++ b/tools/testing/selftests/kselftest.h @@ -50,6 +50,7 @@ #include <stdarg.h> #include <string.h> #include <stdio.h> +#include <sys/utsname.h> #endif
#ifndef ARRAY_SIZE @@ -343,4 +344,16 @@ static inline __printf(1, 2) int ksft_ex exit(KSFT_SKIP); }
+static inline int ksft_min_kernel_version(unsigned int min_major, + unsigned int min_minor) +{ + unsigned int major, minor; + struct utsname info; + + if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2) + ksft_exit_fail_msg("Can't parse kernel version\n"); + + return major > min_major || (major == min_major && minor >= min_minor); +} + #endif /* __KSELFTEST_H */ --- a/tools/testing/selftests/timers/posix_timers.c +++ b/tools/testing/selftests/timers/posix_timers.c @@ -184,80 +184,71 @@ static int check_timer_create(int which) return 0; }
-int remain; -__thread int got_signal; +static pthread_t ctd_thread; +static volatile int ctd_count, ctd_failed;
-static void *distribution_thread(void *arg) +static void ctd_sighandler(int sig) { - while (__atomic_load_n(&remain, __ATOMIC_RELAXED)); - return NULL; + if (pthread_self() != ctd_thread) + ctd_failed = 1; + ctd_count--; }
-static void distribution_handler(int nr) +static void *ctd_thread_func(void *arg) { - if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED)) - __atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED); -} - -/* - * Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID - * timer signals. This primarily tests that the kernel does not favour any one. - */ -static int check_timer_distribution(void) -{ - int err, i; - timer_t id; - const int nthreads = 10; - pthread_t threads[nthreads]; struct itimerspec val = { .it_value.tv_sec = 0, .it_value.tv_nsec = 1000 * 1000, .it_interval.tv_sec = 0, .it_interval.tv_nsec = 1000 * 1000, }; + timer_t id; + + /* 1/10 seconds to ensure the leader sleeps */ + usleep(10000); + + ctd_count = 100; + if (timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id)) + return "Can't create timer\n"; + if (timer_settime(id, 0, &val, NULL)) + return "Can't set timer\n"; + + while (ctd_count > 0 && !ctd_failed) + ; + + if (timer_delete(id)) + return "Can't delete timer\n"; + + return NULL; +} + +/* + * Test that only the running thread receives the timer signal. + */ +static int check_timer_distribution(void) +{ + const char *errmsg;
- remain = nthreads + 1; /* worker threads + this thread */ - signal(SIGALRM, distribution_handler); - err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id); - if (err < 0) { - ksft_perror("Can't create timer"); - return -1; - } - err = timer_settime(id, 0, &val, NULL); - if (err < 0) { - ksft_perror("Can't set timer"); - return -1; - } - - for (i = 0; i < nthreads; i++) { - err = pthread_create(&threads[i], NULL, distribution_thread, - NULL); - if (err) { - ksft_print_msg("Can't create thread: %s (%d)\n", - strerror(errno), errno); - return -1; - } - } - - /* Wait for all threads to receive the signal. */ - while (__atomic_load_n(&remain, __ATOMIC_RELAXED)); - - for (i = 0; i < nthreads; i++) { - err = pthread_join(threads[i], NULL); - if (err) { - ksft_print_msg("Can't join thread: %s (%d)\n", - strerror(errno), errno); - return -1; - } - } - - if (timer_delete(id)) { - ksft_perror("Can't delete timer"); - return -1; - } + signal(SIGALRM, ctd_sighandler);
- ksft_test_result_pass("check_timer_distribution\n"); + errmsg = "Can't create thread\n"; + if (pthread_create(&ctd_thread, NULL, ctd_thread_func, NULL)) + goto err; + + errmsg = "Can't join thread\n"; + if (pthread_join(ctd_thread, (void **)&errmsg) || errmsg) + goto err; + + if (!ctd_failed) + ksft_test_result_pass("check signal distribution\n"); + else if (ksft_min_kernel_version(6, 3)) + ksft_test_result_fail("check signal distribution\n"); + else + ksft_test_result_skip("check signal distribution (old kernel)\n"); return 0; +err: + ksft_print_msg(errmsg); + return -1; }
int main(int argc, char **argv)
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: John Stultz jstultz@google.com
commit e4a6bceac98eba3c00e874892736b34ea5fdaca3 upstream.
After commit 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()") the following warning occurs when building with an older gcc:
posix_timers.c:250:2: warning: format not a string literal and no format arguments [-Wformat-security] 250 | ksft_print_msg(errmsg); | ^~~~~~~~~~~~~~
Fix this up by changing it to ksft_print_msg("%s", errmsg)
Fixes: 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()") Signed-off-by: John Stultz jstultz@google.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Justin Stitt justinstitt@google.com Acked-by: Shuah Khan skhan@linuxfoundation.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240410232637.4135564-1-jstultz@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/timers/posix_timers.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c index d86a0e00711e..348f47176e0a 100644 --- a/tools/testing/selftests/timers/posix_timers.c +++ b/tools/testing/selftests/timers/posix_timers.c @@ -247,7 +247,7 @@ static int check_timer_distribution(void) ksft_test_result_skip("check signal distribution (old kernel)\n"); return 0; err: - ksft_print_msg(errmsg); + ksft_print_msg("%s", errmsg); return -1; }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: John Stultz jstultz@google.com
commit ed366de8ec89d4f960d66c85fc37d9de22f7bf6d upstream.
Building with clang results in the following warning:
posix_timers.c:69:6: warning: absolute value function 'abs' given an argument of type 'long long' but has parameter of type 'int' which may cause truncation of value [-Wabsolute-value] if (abs(diff - DELAY * USECS_PER_SEC) > USECS_PER_SEC / 2) { ^ So switch to using llabs() instead.
Fixes: 0bc4b0cf1570 ("selftests: add basic posix timers selftests") Signed-off-by: John Stultz jstultz@google.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240410232637.4135564-3-jstultz@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/timers/posix_timers.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/timers/posix_timers.c +++ b/tools/testing/selftests/timers/posix_timers.c @@ -66,7 +66,7 @@ static int check_diff(struct timeval sta diff = end.tv_usec - start.tv_usec; diff += (end.tv_sec - start.tv_sec) * USECS_PER_SEC;
- if (abs(diff - DELAY * USECS_PER_SEC) > USECS_PER_SEC / 2) { + if (llabs(diff - DELAY * USECS_PER_SEC) > USECS_PER_SEC / 2) { printf("Diff too high: %lld..", diff); return -1; }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nathan Chancellor nathan@kernel.org
commit f7d5bcd35d427daac7e206b1073ca14f5db85c27 upstream.
After commit 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()"), clang warns:
tools/testing/selftests/timers/../kselftest.h:398:6: warning: variable 'major' is used uninitialized whenever '||' condition is true [-Wsometimes-uninitialized] 398 | if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2) | ^~~~~~~~~~~~ tools/testing/selftests/timers/../kselftest.h:401:9: note: uninitialized use occurs here 401 | return major > min_major || (major == min_major && minor >= min_minor); | ^~~~~ tools/testing/selftests/timers/../kselftest.h:398:6: note: remove the '||' if its condition is always false 398 | if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2) | ^~~~~~~~~~~~~~~ tools/testing/selftests/timers/../kselftest.h:395:20: note: initialize the variable 'major' to silence this warning 395 | unsigned int major, minor; | ^ | = 0
This is a false positive because if uname() fails, ksft_exit_fail_msg() will be called, which unconditionally calls exit(), a noreturn function. However, clang does not know that ksft_exit_fail_msg() will call exit() at the point in the pipeline that the warning is emitted because inlining has not occurred, so it assumes control flow will resume normally after ksft_exit_fail_msg() is called.
Make it clear to clang that all of the functions that call exit() unconditionally in kselftest.h are noreturn transitively by marking them explicitly with '__attribute__((__noreturn__))', which clears up the warning above and any future warnings that may appear for the same reason.
Fixes: 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()") Reported-by: John Stultz jstultz@google.com Signed-off-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Shuah Khan skhan@linuxfoundation.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240411-mark-kselftest-exit-funcs-noreturn-v1-1-b... Closes: https://lore.kernel.org/all/20240410232637.4135564-2-jstultz@google.com/ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/kselftest.h | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)
--- a/tools/testing/selftests/kselftest.h +++ b/tools/testing/selftests/kselftest.h @@ -79,6 +79,9 @@ #define KSFT_XPASS 3 #define KSFT_SKIP 4
+#ifndef __noreturn +#define __noreturn __attribute__((__noreturn__)) +#endif #define __printf(a, b) __attribute__((format(printf, a, b)))
/* counters */ @@ -255,13 +258,13 @@ static inline __printf(1, 2) void ksft_t va_end(args); }
-static inline int ksft_exit_pass(void) +static inline __noreturn int ksft_exit_pass(void) { ksft_print_cnts(); exit(KSFT_PASS); }
-static inline int ksft_exit_fail(void) +static inline __noreturn int ksft_exit_fail(void) { ksft_print_cnts(); exit(KSFT_FAIL); @@ -288,7 +291,7 @@ static inline int ksft_exit_fail(void) ksft_cnt.ksft_xfail + \ ksft_cnt.ksft_xskip)
-static inline __printf(1, 2) int ksft_exit_fail_msg(const char *msg, ...) +static inline __noreturn __printf(1, 2) int ksft_exit_fail_msg(const char *msg, ...) { int saved_errno = errno; va_list args; @@ -303,19 +306,19 @@ static inline __printf(1, 2) int ksft_ex exit(KSFT_FAIL); }
-static inline int ksft_exit_xfail(void) +static inline __noreturn int ksft_exit_xfail(void) { ksft_print_cnts(); exit(KSFT_XFAIL); }
-static inline int ksft_exit_xpass(void) +static inline __noreturn int ksft_exit_xpass(void) { ksft_print_cnts(); exit(KSFT_XPASS); }
-static inline __printf(1, 2) int ksft_exit_skip(const char *msg, ...) +static inline __noreturn __printf(1, 2) int ksft_exit_skip(const char *msg, ...) { int saved_errno = errno; va_list args;
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Adam Dunlap acdunlap@google.com
commit 5ce344beaca688f4cdea07045e0b8f03dc537e74 upstream.
When done from a virtual machine, instructions that touch APIC memory must be emulated. By convention, MMIO accesses are typically performed via io.h helpers such as readl() or writeq() to simplify instruction emulation/decoding (ex: in KVM hosts and SEV guests) [0].
Currently, native_apic_mem_read() does not follow this convention, allowing the compiler to emit instructions other than the MOV instruction generated by readl(). In particular, when the kernel is compiled with clang and run as a SEV-ES or SEV-SNP guest, the compiler would emit a TESTL instruction which is not supported by the SEV-ES emulator, causing a boot failure in that environment. It is likely the same problem would happen in a TDX guest as that uses the same instruction emulator as SEV-ES.
To make sure all emulators can emulate APIC memory reads via MOV, use the readl() function in native_apic_mem_read(). It is expected that any emulator would support MOV in any addressing mode as it is the most generic and is what is usually emitted currently.
The TESTL instruction is emitted when native_apic_mem_read() is inlined into apic_mem_wait_icr_idle(). The emulator comes from insn_decode_mmio() in arch/x86/lib/insn-eval.c. It's not worth it to extend insn_decode_mmio() to support more instructions since, in theory, the compiler could choose to output nearly any instruction for such reads which would bloat the emulator beyond reason.
[0] https://lore.kernel.org/all/20220405232939.73860-12-kirill.shutemov@linux.in...
[ bp: Massage commit message, fix typos. ]
Signed-off-by: Adam Dunlap acdunlap@google.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Reviewed-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ard Biesheuvel ardb@kernel.org Tested-by: Kevin Loughlin kevinloughlin@google.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240318230927.2191933-1-acdunlap@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/apic.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -13,6 +13,7 @@ #include <asm/mpspec.h> #include <asm/msr.h> #include <asm/hardirq.h> +#include <asm/io.h>
#define ARCH_APICTIMER_STOPS_ON_C3 1
@@ -96,7 +97,7 @@ static inline void native_apic_mem_write
static inline u32 native_apic_mem_read(u32 reg) { - return *((volatile u32 *)(APIC_BASE + reg)); + return readl((void __iomem *)(APIC_BASE + reg)); }
static inline void native_apic_mem_eoi(void)
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
commit c1d11fc2c8320871b40730991071dd0a0b405bc8 upstream.
When building with 'make W=1' but CONFIG_TRACE_IRQFLAGS=n, the unused argument to lockdep_hrtimer_exit() causes a warning:
kernel/time/hrtimer.c:1655:14: error: variable 'expires_in_hardirq' set but not used [-Werror=unused-but-set-variable]
This is intentional behavior, so add a cast to void to shut up the warning.
Fixes: 73d20564e0dc ("hrtimer: Don't dereference the hrtimer pointer after the callback") Reported-by: kernel test robot lkp@intel.com Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240408074609.3170807-1-arnd@kernel.org Closes: https://lore.kernel.org/oe-kbuild-all/202311191229.55QXHVc6-lkp@intel.com/ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/irqflags.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -114,7 +114,7 @@ do { \ # define lockdep_softirq_enter() do { } while (0) # define lockdep_softirq_exit() do { } while (0) # define lockdep_hrtimer_enter(__hrtimer) false -# define lockdep_hrtimer_exit(__context) do { } while (0) +# define lockdep_hrtimer_exit(__context) do { (void)(__context); } while (0) # define lockdep_posixtimer_enter() do { } while (0) # define lockdep_posixtimer_exit() do { } while (0) # define lockdep_irq_work_enter(__work) do { } while (0)
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oleg Nesterov oleg@redhat.com
commit 16767502aa990cca2cb7d1372b31d328c4c85b40 upstream.
As Mark explains ksft_min_kernel_version() can't be compiled with nolibc, it doesn't implement uname().
Fixes: 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()") Reported-by: Mark Brown broonie@kernel.org Signed-off-by: Oleg Nesterov oleg@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Link: https://lore.kernel.org/r/20240412123536.GA32444@redhat.com Closes: https://lore.kernel.org/all/f0523b3a-ea08-4615-b0fb-5b504a2d39df@sirena.org.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/kselftest.h | 5 +++++ 1 file changed, 5 insertions(+)
--- a/tools/testing/selftests/kselftest.h +++ b/tools/testing/selftests/kselftest.h @@ -350,6 +350,10 @@ static inline __noreturn __printf(1, 2) static inline int ksft_min_kernel_version(unsigned int min_major, unsigned int min_minor) { +#ifdef NOLIBC + ksft_print_msg("NOLIBC: Can't check kernel version: Function not implemented\n"); + return 0; +#else unsigned int major, minor; struct utsname info;
@@ -357,6 +361,7 @@ static inline int ksft_min_kernel_versio ksft_exit_fail_msg("Can't parse kernel version\n");
return major > min_major || (major == min_major && minor >= min_minor); +#endif }
#endif /* __KSELFTEST_H */
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
commit 16b52bbee4823b01ab7fe3919373c981a38f3797 upstream.
The writable file /sys/power/resume may call vfs lookup helpers for arbitrary paths and readonly files can be read by overlayfs from vfs helpers when sysfs is a lower layer of overalyfs.
To avoid a lockdep warning of circular dependency between overlayfs inode lock and kernfs of->mutex, use a different lockdep class for writable and readonly kernfs files.
Reported-by: syzbot+9a5b0ced8b1bfb238b56@syzkaller.appspotmail.com Fixes: 0fedefd4c4e3 ("kernfs: sysfs: support custom llseek method for sysfs entries") Suggested-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/kernfs/file.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -634,11 +634,18 @@ static int kernfs_fop_open(struct inode * each file a separate locking class. Let's differentiate on * whether the file has mmap or not for now. * - * Both paths of the branch look the same. They're supposed to + * For similar reasons, writable and readonly files are given different + * lockdep key, because the writable file /sys/power/resume may call vfs + * lookup helpers for arbitrary paths and readonly files can be read by + * overlayfs from vfs helpers when sysfs is a lower layer of overalyfs. + * + * All three cases look the same. They're supposed to * look that way and give @of->mutex different static lockdep keys. */ if (has_mmap) mutex_init(&of->mutex); + else if (file->f_mode & FMODE_WRITE) + mutex_init(&of->mutex); else mutex_init(&of->mutex);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Sneddon daniel.sneddon@linux.intel.com
commit 04f4230e2f86a4e961ea5466eda3db8c1762004d upstream.
The definition of spectre_bhi_state() incorrectly returns a const char * const. This causes the a compiler warning when building with W=1:
warning: type qualifiers ignored on function return type [-Wignored-qualifiers] 2812 | static const char * const spectre_bhi_state(void)
Remove the const qualifier from the pointer.
Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Reported-by: Sean Christopherson seanjc@google.com Signed-off-by: Daniel Sneddon daniel.sneddon@linux.intel.com Signed-off-by: Ingo Molnar mingo@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Link: https://lore.kernel.org/r/20240409230806.1545822-1-daniel.sneddon@linux.inte... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -2808,7 +2808,7 @@ static char *pbrsb_eibrs_state(void) } }
-static const char * const spectre_bhi_state(void) +static const char *spectre_bhi_state(void) { if (!boot_cpu_has_bug(X86_BUG_BHI)) return "; BHI: Not affected";
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@kernel.org
commit dfe648903f42296866d79f10d03f8c85c9dfba30 upstream.
Fix up some inaccuracies in the BHI documentation.
Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf jpoimboe@kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Reviewed-by: Nikolay Borisov nik.borisov@suse.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Sean Christopherson seanjc@google.com Link: https://lore.kernel.org/r/8c84f7451bfe0dd08543c6082a383f390d4aa7e2.171281347... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/admin-guide/hw-vuln/spectre.rst | 15 ++++++++------- Documentation/admin-guide/kernel-parameters.txt | 12 +++++++----- 2 files changed, 15 insertions(+), 12 deletions(-)
--- a/Documentation/admin-guide/hw-vuln/spectre.rst +++ b/Documentation/admin-guide/hw-vuln/spectre.rst @@ -439,11 +439,11 @@ The possible values in this file are: - System is protected by retpoline * - BHI: BHI_DIS_S - System is protected by BHI_DIS_S - * - BHI: SW loop; KVM SW loop + * - BHI: SW loop, KVM SW loop - System is protected by software clearing sequence * - BHI: Syscall hardening - Syscalls are hardened against BHI - * - BHI: Syscall hardening; KVM: SW loop + * - BHI: Syscall hardening, KVM: SW loop - System is protected from userspace attacks by syscall hardening; KVM is protected by software clearing sequence
Full mitigation might require a microcode update from the CPU @@ -666,13 +666,14 @@ kernel command line. of the HW BHI control and the SW BHB clearing sequence.
on - unconditionally enable. + (default) Enable the HW or SW mitigation as + needed. off - unconditionally disable. + Disable the mitigation. auto - enable if hardware mitigation - control(BHI_DIS_S) is available, otherwise - enable alternate mitigation in KVM. + Enable the HW mitigation if needed, but + *don't* enable the SW mitigation except for KVM. + The system may be vulnerable.
For spectre_v2_user see Documentation/admin-guide/kernel-parameters.txt
--- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3419,6 +3419,7 @@ reg_file_data_sampling=off [X86] retbleed=off [X86] spec_store_bypass_disable=off [X86,PPC] + spectre_bhi=off [X86] spectre_v2_user=off [X86] srbds=off [X86,INTEL] ssbd=force-off [ARM64] @@ -6037,11 +6038,12 @@ deployment of the HW BHI control and the SW BHB clearing sequence.
- on - unconditionally enable. - off - unconditionally disable. - auto - (default) enable hardware mitigation - (BHI_DIS_S) if available, otherwise enable - alternate mitigation in KVM. + on - (default) Enable the HW or SW mitigation + as needed. + off - Disable the mitigation. + auto - Enable the HW mitigation if needed, but + *don't* enable the SW mitigation except + for KVM. The system may be vulnerable.
spectre_v2= [X86] Control mitigation of Spectre variant 2 (indirect branch speculation) vulnerability.
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@kernel.org
commit cb2db5bb04d7f778fbc1a1ea2507aab436f1bff3 upstream.
There's no need to keep reading MSR_IA32_ARCH_CAPABILITIES over and over. It's even read in the BHI sysfs function which is a big no-no. Just read it once and cache it.
Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf jpoimboe@kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Reviewed-by: Nikolay Borisov nik.borisov@suse.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Sean Christopherson seanjc@google.com Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.171281347... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 22 +++++++--------------- 1 file changed, 7 insertions(+), 15 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -61,6 +61,8 @@ EXPORT_SYMBOL_GPL(x86_spec_ctrl_current) u64 x86_pred_cmd __ro_after_init = PRED_CMD_IBPB; EXPORT_SYMBOL_GPL(x86_pred_cmd);
+static u64 __ro_after_init ia32_cap; + static DEFINE_MUTEX(spec_ctrl_mutex);
void (*x86_return_thunk)(void) __ro_after_init = __x86_return_thunk; @@ -144,6 +146,8 @@ void __init cpu_select_mitigations(void) x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK; }
+ ia32_cap = x86_read_arch_cap_msr(); + /* Select the proper CPU mitigations before patching alternatives: */ spectre_v1_select_mitigation(); spectre_v2_select_mitigation(); @@ -301,8 +305,6 @@ static const char * const taa_strings[]
static void __init taa_select_mitigation(void) { - u64 ia32_cap; - if (!boot_cpu_has_bug(X86_BUG_TAA)) { taa_mitigation = TAA_MITIGATION_OFF; return; @@ -341,7 +343,6 @@ static void __init taa_select_mitigation * On MDS_NO=1 CPUs if ARCH_CAP_TSX_CTRL_MSR is not set, microcode * update is required. */ - ia32_cap = x86_read_arch_cap_msr(); if ( (ia32_cap & ARCH_CAP_MDS_NO) && !(ia32_cap & ARCH_CAP_TSX_CTRL_MSR)) taa_mitigation = TAA_MITIGATION_UCODE_NEEDED; @@ -401,8 +402,6 @@ static const char * const mmio_strings[]
static void __init mmio_select_mitigation(void) { - u64 ia32_cap; - if (!boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA) || boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN) || cpu_mitigations_off()) { @@ -413,8 +412,6 @@ static void __init mmio_select_mitigatio if (mmio_mitigation == MMIO_MITIGATION_OFF) return;
- ia32_cap = x86_read_arch_cap_msr(); - /* * Enable CPU buffer clear mitigation for host and VMM, if also affected * by MDS or TAA. Otherwise, enable mitigation for VMM only. @@ -508,7 +505,7 @@ static void __init rfds_select_mitigatio if (rfds_mitigation == RFDS_MITIGATION_OFF) return;
- if (x86_read_arch_cap_msr() & ARCH_CAP_RFDS_CLEAR) + if (ia32_cap & ARCH_CAP_RFDS_CLEAR) setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF); else rfds_mitigation = RFDS_MITIGATION_UCODE_NEEDED; @@ -659,8 +656,6 @@ void update_srbds_msr(void)
static void __init srbds_select_mitigation(void) { - u64 ia32_cap; - if (!boot_cpu_has_bug(X86_BUG_SRBDS)) return;
@@ -669,7 +664,6 @@ static void __init srbds_select_mitigati * are only exposed to SRBDS when TSX is enabled or when CPU is affected * by Processor MMIO Stale Data vulnerability. */ - ia32_cap = x86_read_arch_cap_msr(); if ((ia32_cap & ARCH_CAP_MDS_NO) && !boot_cpu_has(X86_FEATURE_RTM) && !boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA)) srbds_mitigation = SRBDS_MITIGATION_TSX_OFF; @@ -813,7 +807,7 @@ static void __init gds_select_mitigation /* Will verify below that mitigation _can_ be disabled */
/* No microcode */ - if (!(x86_read_arch_cap_msr() & ARCH_CAP_GDS_CTRL)) { + if (!(ia32_cap & ARCH_CAP_GDS_CTRL)) { if (gds_mitigation == GDS_MITIGATION_FORCE) { /* * This only needs to be done on the boot CPU so do it @@ -1907,8 +1901,6 @@ static void update_indir_branch_cond(voi /* Update the static key controlling the MDS CPU buffer clear in idle */ static void update_mds_branch_idle(void) { - u64 ia32_cap = x86_read_arch_cap_msr(); - /* * Enable the idle clearing if SMT is active on CPUs which are * affected only by MSBDS and not any other MDS variant. @@ -2817,7 +2809,7 @@ static const char *spectre_bhi_state(voi else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP)) return "; BHI: SW loop, KVM: SW loop"; else if (boot_cpu_has(X86_FEATURE_RETPOLINE) && - !(x86_read_arch_cap_msr() & ARCH_CAP_RRSBA)) + !(ia32_cap & ARCH_CAP_RRSBA)) return "; BHI: Retpoline"; else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT)) return "; BHI: Syscall hardening, KVM: SW loop";
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ingo Molnar mingo@kernel.org
commit d0485730d2189ffe5d986d4e9e191f1e4d5ffd24 upstream.
So we are using the 'ia32_cap' value in a number of places, which got its name from MSR_IA32_ARCH_CAPABILITIES MSR register.
But there's very little 'IA32' about it - this isn't 32-bit only code, nor does it originate from there, it's just a historic quirk that many Intel MSR names are prefixed with IA32_.
This is already clear from the helper method around the MSR: x86_read_arch_cap_msr(), which doesn't have the IA32 prefix.
So rename 'ia32_cap' to 'x86_arch_cap_msr' to be consistent with its role and with the naming of the helper function.
Signed-off-by: Ingo Molnar mingo@kernel.org Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Nikolay Borisov nik.borisov@suse.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Sean Christopherson seanjc@google.com Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.171281347... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/apic/apic.c | 6 ++--- arch/x86/kernel/cpu/bugs.c | 30 +++++++++++++------------- arch/x86/kernel/cpu/common.c | 48 +++++++++++++++++++++---------------------- 3 files changed, 42 insertions(+), 42 deletions(-)
--- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -1724,11 +1724,11 @@ static int x2apic_state;
static bool x2apic_hw_locked(void) { - u64 ia32_cap; + u64 x86_arch_cap_msr; u64 msr;
- ia32_cap = x86_read_arch_cap_msr(); - if (ia32_cap & ARCH_CAP_XAPIC_DISABLE) { + x86_arch_cap_msr = x86_read_arch_cap_msr(); + if (x86_arch_cap_msr & ARCH_CAP_XAPIC_DISABLE) { rdmsrl(MSR_IA32_XAPIC_DISABLE_STATUS, msr); return (msr & LEGACY_XAPIC_DISABLED); } --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -61,7 +61,7 @@ EXPORT_SYMBOL_GPL(x86_spec_ctrl_current) u64 x86_pred_cmd __ro_after_init = PRED_CMD_IBPB; EXPORT_SYMBOL_GPL(x86_pred_cmd);
-static u64 __ro_after_init ia32_cap; +static u64 __ro_after_init x86_arch_cap_msr;
static DEFINE_MUTEX(spec_ctrl_mutex);
@@ -146,7 +146,7 @@ void __init cpu_select_mitigations(void) x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK; }
- ia32_cap = x86_read_arch_cap_msr(); + x86_arch_cap_msr = x86_read_arch_cap_msr();
/* Select the proper CPU mitigations before patching alternatives: */ spectre_v1_select_mitigation(); @@ -343,8 +343,8 @@ static void __init taa_select_mitigation * On MDS_NO=1 CPUs if ARCH_CAP_TSX_CTRL_MSR is not set, microcode * update is required. */ - if ( (ia32_cap & ARCH_CAP_MDS_NO) && - !(ia32_cap & ARCH_CAP_TSX_CTRL_MSR)) + if ( (x86_arch_cap_msr & ARCH_CAP_MDS_NO) && + !(x86_arch_cap_msr & ARCH_CAP_TSX_CTRL_MSR)) taa_mitigation = TAA_MITIGATION_UCODE_NEEDED;
/* @@ -434,7 +434,7 @@ static void __init mmio_select_mitigatio * be propagated to uncore buffers, clearing the Fill buffers on idle * is required irrespective of SMT state. */ - if (!(ia32_cap & ARCH_CAP_FBSDP_NO)) + if (!(x86_arch_cap_msr & ARCH_CAP_FBSDP_NO)) static_branch_enable(&mds_idle_clear);
/* @@ -444,10 +444,10 @@ static void __init mmio_select_mitigatio * FB_CLEAR or by the presence of both MD_CLEAR and L1D_FLUSH on MDS * affected systems. */ - if ((ia32_cap & ARCH_CAP_FB_CLEAR) || + if ((x86_arch_cap_msr & ARCH_CAP_FB_CLEAR) || (boot_cpu_has(X86_FEATURE_MD_CLEAR) && boot_cpu_has(X86_FEATURE_FLUSH_L1D) && - !(ia32_cap & ARCH_CAP_MDS_NO))) + !(x86_arch_cap_msr & ARCH_CAP_MDS_NO))) mmio_mitigation = MMIO_MITIGATION_VERW; else mmio_mitigation = MMIO_MITIGATION_UCODE_NEEDED; @@ -505,7 +505,7 @@ static void __init rfds_select_mitigatio if (rfds_mitigation == RFDS_MITIGATION_OFF) return;
- if (ia32_cap & ARCH_CAP_RFDS_CLEAR) + if (x86_arch_cap_msr & ARCH_CAP_RFDS_CLEAR) setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF); else rfds_mitigation = RFDS_MITIGATION_UCODE_NEEDED; @@ -664,7 +664,7 @@ static void __init srbds_select_mitigati * are only exposed to SRBDS when TSX is enabled or when CPU is affected * by Processor MMIO Stale Data vulnerability. */ - if ((ia32_cap & ARCH_CAP_MDS_NO) && !boot_cpu_has(X86_FEATURE_RTM) && + if ((x86_arch_cap_msr & ARCH_CAP_MDS_NO) && !boot_cpu_has(X86_FEATURE_RTM) && !boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA)) srbds_mitigation = SRBDS_MITIGATION_TSX_OFF; else if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) @@ -807,7 +807,7 @@ static void __init gds_select_mitigation /* Will verify below that mitigation _can_ be disabled */
/* No microcode */ - if (!(ia32_cap & ARCH_CAP_GDS_CTRL)) { + if (!(x86_arch_cap_msr & ARCH_CAP_GDS_CTRL)) { if (gds_mitigation == GDS_MITIGATION_FORCE) { /* * This only needs to be done on the boot CPU so do it @@ -1540,14 +1540,14 @@ static enum spectre_v2_mitigation __init /* Disable in-kernel use of non-RSB RET predictors */ static void __init spec_ctrl_disable_kernel_rrsba(void) { - u64 ia32_cap; + u64 x86_arch_cap_msr;
if (!boot_cpu_has(X86_FEATURE_RRSBA_CTRL)) return;
- ia32_cap = x86_read_arch_cap_msr(); + x86_arch_cap_msr = x86_read_arch_cap_msr();
- if (ia32_cap & ARCH_CAP_RRSBA) { + if (x86_arch_cap_msr & ARCH_CAP_RRSBA) { x86_spec_ctrl_base |= SPEC_CTRL_RRSBA_DIS_S; update_spec_ctrl(x86_spec_ctrl_base); } @@ -1915,7 +1915,7 @@ static void update_mds_branch_idle(void) if (sched_smt_active()) { static_branch_enable(&mds_idle_clear); } else if (mmio_mitigation == MMIO_MITIGATION_OFF || - (ia32_cap & ARCH_CAP_FBSDP_NO)) { + (x86_arch_cap_msr & ARCH_CAP_FBSDP_NO)) { static_branch_disable(&mds_idle_clear); } } @@ -2809,7 +2809,7 @@ static const char *spectre_bhi_state(voi else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP)) return "; BHI: SW loop, KVM: SW loop"; else if (boot_cpu_has(X86_FEATURE_RETPOLINE) && - !(ia32_cap & ARCH_CAP_RRSBA)) + !(x86_arch_cap_msr & ARCH_CAP_RRSBA)) return "; BHI: Retpoline"; else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT)) return "; BHI: Syscall hardening, KVM: SW loop"; --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1327,25 +1327,25 @@ static bool __init cpu_matches(const str
u64 x86_read_arch_cap_msr(void) { - u64 ia32_cap = 0; + u64 x86_arch_cap_msr = 0;
if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) - rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap); + rdmsrl(MSR_IA32_ARCH_CAPABILITIES, x86_arch_cap_msr);
- return ia32_cap; + return x86_arch_cap_msr; }
-static bool arch_cap_mmio_immune(u64 ia32_cap) +static bool arch_cap_mmio_immune(u64 x86_arch_cap_msr) { - return (ia32_cap & ARCH_CAP_FBSDP_NO && - ia32_cap & ARCH_CAP_PSDP_NO && - ia32_cap & ARCH_CAP_SBDR_SSDP_NO); + return (x86_arch_cap_msr & ARCH_CAP_FBSDP_NO && + x86_arch_cap_msr & ARCH_CAP_PSDP_NO && + x86_arch_cap_msr & ARCH_CAP_SBDR_SSDP_NO); }
-static bool __init vulnerable_to_rfds(u64 ia32_cap) +static bool __init vulnerable_to_rfds(u64 x86_arch_cap_msr) { /* The "immunity" bit trumps everything else: */ - if (ia32_cap & ARCH_CAP_RFDS_NO) + if (x86_arch_cap_msr & ARCH_CAP_RFDS_NO) return false;
/* @@ -1353,7 +1353,7 @@ static bool __init vulnerable_to_rfds(u6 * indicate that mitigation is needed because guest is running on a * vulnerable hardware or may migrate to such hardware: */ - if (ia32_cap & ARCH_CAP_RFDS_CLEAR) + if (x86_arch_cap_msr & ARCH_CAP_RFDS_CLEAR) return true;
/* Only consult the blacklist when there is no enumeration: */ @@ -1362,11 +1362,11 @@ static bool __init vulnerable_to_rfds(u6
static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c) { - u64 ia32_cap = x86_read_arch_cap_msr(); + u64 x86_arch_cap_msr = x86_read_arch_cap_msr();
/* Set ITLB_MULTIHIT bug if cpu is not in the whitelist and not mitigated */ if (!cpu_matches(cpu_vuln_whitelist, NO_ITLB_MULTIHIT) && - !(ia32_cap & ARCH_CAP_PSCHANGE_MC_NO)) + !(x86_arch_cap_msr & ARCH_CAP_PSCHANGE_MC_NO)) setup_force_cpu_bug(X86_BUG_ITLB_MULTIHIT);
if (cpu_matches(cpu_vuln_whitelist, NO_SPECULATION)) @@ -1378,7 +1378,7 @@ static void __init cpu_set_bug_bits(stru setup_force_cpu_bug(X86_BUG_SPECTRE_V2);
if (!cpu_matches(cpu_vuln_whitelist, NO_SSB) && - !(ia32_cap & ARCH_CAP_SSB_NO) && + !(x86_arch_cap_msr & ARCH_CAP_SSB_NO) && !cpu_has(c, X86_FEATURE_AMD_SSB_NO)) setup_force_cpu_bug(X86_BUG_SPEC_STORE_BYPASS);
@@ -1386,15 +1386,15 @@ static void __init cpu_set_bug_bits(stru * AMD's AutoIBRS is equivalent to Intel's eIBRS - use the Intel feature * flag and protect from vendor-specific bugs via the whitelist. */ - if ((ia32_cap & ARCH_CAP_IBRS_ALL) || cpu_has(c, X86_FEATURE_AUTOIBRS)) { + if ((x86_arch_cap_msr & ARCH_CAP_IBRS_ALL) || cpu_has(c, X86_FEATURE_AUTOIBRS)) { setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED); if (!cpu_matches(cpu_vuln_whitelist, NO_EIBRS_PBRSB) && - !(ia32_cap & ARCH_CAP_PBRSB_NO)) + !(x86_arch_cap_msr & ARCH_CAP_PBRSB_NO)) setup_force_cpu_bug(X86_BUG_EIBRS_PBRSB); }
if (!cpu_matches(cpu_vuln_whitelist, NO_MDS) && - !(ia32_cap & ARCH_CAP_MDS_NO)) { + !(x86_arch_cap_msr & ARCH_CAP_MDS_NO)) { setup_force_cpu_bug(X86_BUG_MDS); if (cpu_matches(cpu_vuln_whitelist, MSBDS_ONLY)) setup_force_cpu_bug(X86_BUG_MSBDS_ONLY); @@ -1413,9 +1413,9 @@ static void __init cpu_set_bug_bits(stru * TSX_CTRL check alone is not sufficient for cases when the microcode * update is not present or running as guest that don't get TSX_CTRL. */ - if (!(ia32_cap & ARCH_CAP_TAA_NO) && + if (!(x86_arch_cap_msr & ARCH_CAP_TAA_NO) && (cpu_has(c, X86_FEATURE_RTM) || - (ia32_cap & ARCH_CAP_TSX_CTRL_MSR))) + (x86_arch_cap_msr & ARCH_CAP_TSX_CTRL_MSR))) setup_force_cpu_bug(X86_BUG_TAA);
/* @@ -1441,7 +1441,7 @@ static void __init cpu_set_bug_bits(stru * Set X86_BUG_MMIO_UNKNOWN for CPUs that are neither in the blacklist, * nor in the whitelist and also don't enumerate MSR ARCH_CAP MMIO bits. */ - if (!arch_cap_mmio_immune(ia32_cap)) { + if (!arch_cap_mmio_immune(x86_arch_cap_msr)) { if (cpu_matches(cpu_vuln_blacklist, MMIO)) setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA); else if (!cpu_matches(cpu_vuln_whitelist, NO_MMIO)) @@ -1449,7 +1449,7 @@ static void __init cpu_set_bug_bits(stru }
if (!cpu_has(c, X86_FEATURE_BTC_NO)) { - if (cpu_matches(cpu_vuln_blacklist, RETBLEED) || (ia32_cap & ARCH_CAP_RSBA)) + if (cpu_matches(cpu_vuln_blacklist, RETBLEED) || (x86_arch_cap_msr & ARCH_CAP_RSBA)) setup_force_cpu_bug(X86_BUG_RETBLEED); }
@@ -1467,15 +1467,15 @@ static void __init cpu_set_bug_bits(stru * disabling AVX2. The only way to do this in HW is to clear XCR0[2], * which means that AVX will be disabled. */ - if (cpu_matches(cpu_vuln_blacklist, GDS) && !(ia32_cap & ARCH_CAP_GDS_NO) && + if (cpu_matches(cpu_vuln_blacklist, GDS) && !(x86_arch_cap_msr & ARCH_CAP_GDS_NO) && boot_cpu_has(X86_FEATURE_AVX)) setup_force_cpu_bug(X86_BUG_GDS);
- if (vulnerable_to_rfds(ia32_cap)) + if (vulnerable_to_rfds(x86_arch_cap_msr)) setup_force_cpu_bug(X86_BUG_RFDS);
/* When virtualized, eIBRS could be hidden, assume vulnerable */ - if (!(ia32_cap & ARCH_CAP_BHI_NO) && + if (!(x86_arch_cap_msr & ARCH_CAP_BHI_NO) && !cpu_matches(cpu_vuln_whitelist, NO_BHI) && (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) || boot_cpu_has(X86_FEATURE_HYPERVISOR))) @@ -1485,7 +1485,7 @@ static void __init cpu_set_bug_bits(stru return;
/* Rogue Data Cache Load? No! */ - if (ia32_cap & ARCH_CAP_RDCL_NO) + if (x86_arch_cap_msr & ARCH_CAP_RDCL_NO) return;
setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@kernel.org
commit 1cea8a280dfd1016148a3820676f2f03e3f5b898 upstream.
The ARCH_CAP_RRSBA check isn't correct: RRSBA may have already been disabled by the Spectre v2 mitigation (or can otherwise be disabled by the BHI mitigation itself if needed). In that case retpolines are fine.
Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf jpoimboe@kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Sean Christopherson seanjc@google.com Link: https://lore.kernel.org/r/6f56f13da34a0834b69163467449be7f58f253dc.171281347... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 30 ++++++++++++++++++------------ 1 file changed, 18 insertions(+), 12 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -1537,20 +1537,25 @@ static enum spectre_v2_mitigation __init return SPECTRE_V2_RETPOLINE; }
+static bool __ro_after_init rrsba_disabled; + /* Disable in-kernel use of non-RSB RET predictors */ static void __init spec_ctrl_disable_kernel_rrsba(void) { - u64 x86_arch_cap_msr; + if (rrsba_disabled) + return;
- if (!boot_cpu_has(X86_FEATURE_RRSBA_CTRL)) + if (!(x86_arch_cap_msr & ARCH_CAP_RRSBA)) { + rrsba_disabled = true; return; + }
- x86_arch_cap_msr = x86_read_arch_cap_msr(); + if (!boot_cpu_has(X86_FEATURE_RRSBA_CTRL)) + return;
- if (x86_arch_cap_msr & ARCH_CAP_RRSBA) { - x86_spec_ctrl_base |= SPEC_CTRL_RRSBA_DIS_S; - update_spec_ctrl(x86_spec_ctrl_base); - } + x86_spec_ctrl_base |= SPEC_CTRL_RRSBA_DIS_S; + update_spec_ctrl(x86_spec_ctrl_base); + rrsba_disabled = true; }
static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_mitigation mode) @@ -1651,9 +1656,11 @@ static void __init bhi_select_mitigation return;
/* Retpoline mitigates against BHI unless the CPU has RRSBA behavior */ - if (cpu_feature_enabled(X86_FEATURE_RETPOLINE) && - !(x86_read_arch_cap_msr() & ARCH_CAP_RRSBA)) - return; + if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) { + spec_ctrl_disable_kernel_rrsba(); + if (rrsba_disabled) + return; + }
if (spec_ctrl_bhi_dis()) return; @@ -2808,8 +2815,7 @@ static const char *spectre_bhi_state(voi return "; BHI: BHI_DIS_S"; else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP)) return "; BHI: SW loop, KVM: SW loop"; - else if (boot_cpu_has(X86_FEATURE_RETPOLINE) && - !(x86_arch_cap_msr & ARCH_CAP_RRSBA)) + else if (boot_cpu_has(X86_FEATURE_RETPOLINE) && rrsba_disabled) return "; BHI: Retpoline"; else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT)) return "; BHI: Syscall hardening, KVM: SW loop";
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@kernel.org
commit 5f882f3b0a8bf0788d5a0ee44b1191de5319bb8a upstream.
While syscall hardening helps prevent some BHI attacks, there's still other low-hanging fruit remaining. Don't classify it as a mitigation and make it clear that the system may still be vulnerable if it doesn't have a HW or SW mitigation enabled.
Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf jpoimboe@kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Sean Christopherson seanjc@google.com Link: https://lore.kernel.org/r/b5951dae3fdee7f1520d5136a27be3bdfe95f88b.171281347... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/admin-guide/hw-vuln/spectre.rst | 11 +++++------ Documentation/admin-guide/kernel-parameters.txt | 3 +-- arch/x86/kernel/cpu/bugs.c | 6 +++--- 3 files changed, 9 insertions(+), 11 deletions(-)
--- a/Documentation/admin-guide/hw-vuln/spectre.rst +++ b/Documentation/admin-guide/hw-vuln/spectre.rst @@ -441,10 +441,10 @@ The possible values in this file are: - System is protected by BHI_DIS_S * - BHI: SW loop, KVM SW loop - System is protected by software clearing sequence - * - BHI: Syscall hardening - - Syscalls are hardened against BHI - * - BHI: Syscall hardening, KVM: SW loop - - System is protected from userspace attacks by syscall hardening; KVM is protected by software clearing sequence + * - BHI: Vulnerable + - System is vulnerable to BHI + * - BHI: Vulnerable, KVM: SW loop + - System is vulnerable; KVM is protected by software clearing sequence
Full mitigation might require a microcode update from the CPU vendor. When the necessary microcode is not available, the kernel will @@ -661,8 +661,7 @@ kernel command line. spectre_bhi=
[X86] Control mitigation of Branch History Injection - (BHI) vulnerability. Syscalls are hardened against BHI - regardless of this setting. This setting affects the deployment + (BHI) vulnerability. This setting affects the deployment of the HW BHI control and the SW BHB clearing sequence.
on --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -6033,8 +6033,7 @@ See Documentation/admin-guide/laptops/sonypi.rst
spectre_bhi= [X86] Control mitigation of Branch History Injection - (BHI) vulnerability. Syscalls are hardened against BHI - reglardless of this setting. This setting affects the + (BHI) vulnerability. This setting affects the deployment of the HW BHI control and the SW BHB clearing sequence.
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -2817,10 +2817,10 @@ static const char *spectre_bhi_state(voi return "; BHI: SW loop, KVM: SW loop"; else if (boot_cpu_has(X86_FEATURE_RETPOLINE) && rrsba_disabled) return "; BHI: Retpoline"; - else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT)) - return "; BHI: Syscall hardening, KVM: SW loop"; + else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT)) + return "; BHI: Vulnerable, KVM: SW loop";
- return "; BHI: Vulnerable (Syscall hardening enabled)"; + return "; BHI: Vulnerable"; }
static ssize_t spectre_v2_show_state(char *buf)
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@kernel.org
commit 36d4fe147c870f6d3f6602befd7ef44393a1c87a upstream.
Unlike most other mitigations' "auto" options, spectre_bhi=auto only mitigates newer systems, which is confusing and not particularly useful.
Remove it.
Signed-off-by: Josh Poimboeuf jpoimboe@kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Reviewed-by: Nikolay Borisov nik.borisov@suse.com Cc: Sean Christopherson seanjc@google.com Cc: Linus Torvalds torvalds@linux-foundation.org Link: https://lore.kernel.org/r/412e9dc87971b622bbbaf64740ebc1f140bff343.171281347... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/admin-guide/hw-vuln/spectre.rst | 4 ---- Documentation/admin-guide/kernel-parameters.txt | 3 --- arch/x86/Kconfig | 4 ---- arch/x86/kernel/cpu/bugs.c | 10 +--------- 4 files changed, 1 insertion(+), 20 deletions(-)
--- a/Documentation/admin-guide/hw-vuln/spectre.rst +++ b/Documentation/admin-guide/hw-vuln/spectre.rst @@ -669,10 +669,6 @@ kernel command line. needed. off Disable the mitigation. - auto - Enable the HW mitigation if needed, but - *don't* enable the SW mitigation except for KVM. - The system may be vulnerable.
For spectre_v2_user see Documentation/admin-guide/kernel-parameters.txt
--- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -6040,9 +6040,6 @@ on - (default) Enable the HW or SW mitigation as needed. off - Disable the mitigation. - auto - Enable the HW mitigation if needed, but - *don't* enable the SW mitigation except - for KVM. The system may be vulnerable.
spectre_v2= [X86] Control mitigation of Spectre variant 2 (indirect branch speculation) vulnerability. --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2630,10 +2630,6 @@ config SPECTRE_BHI_OFF bool "off" help Equivalent to setting spectre_bhi=off command line parameter. -config SPECTRE_BHI_AUTO - bool "auto" - help - Equivalent to setting spectre_bhi=auto command line parameter.
endchoice
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -1624,13 +1624,10 @@ static bool __init spec_ctrl_bhi_dis(voi enum bhi_mitigations { BHI_MITIGATION_OFF, BHI_MITIGATION_ON, - BHI_MITIGATION_AUTO, };
static enum bhi_mitigations bhi_mitigation __ro_after_init = - IS_ENABLED(CONFIG_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON : - IS_ENABLED(CONFIG_SPECTRE_BHI_OFF) ? BHI_MITIGATION_OFF : - BHI_MITIGATION_AUTO; + IS_ENABLED(CONFIG_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;
static int __init spectre_bhi_parse_cmdline(char *str) { @@ -1641,8 +1638,6 @@ static int __init spectre_bhi_parse_cmdl bhi_mitigation = BHI_MITIGATION_OFF; else if (!strcmp(str, "on")) bhi_mitigation = BHI_MITIGATION_ON; - else if (!strcmp(str, "auto")) - bhi_mitigation = BHI_MITIGATION_AUTO; else pr_err("Ignoring unknown spectre_bhi option (%s)", str);
@@ -1672,9 +1667,6 @@ static void __init bhi_select_mitigation setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT); pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");
- if (bhi_mitigation == BHI_MITIGATION_AUTO) - return; - /* Mitigate syscalls when the mitigation is forced =on */ setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP); pr_info("Spectre BHI mitigation: SW BHB clearing on syscall\n");
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@kernel.org
commit 4f511739c54b549061993b53fc0380f48dfca23b upstream.
For consistency with the other CONFIG_MITIGATION_* options, replace the CONFIG_SPECTRE_BHI_{ON,OFF} options with a single CONFIG_MITIGATION_SPECTRE_BHI option.
[ mingo: Fix ]
Signed-off-by: Josh Poimboeuf jpoimboe@kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Cc: Sean Christopherson seanjc@google.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Nikolay Borisov nik.borisov@suse.com Link: https://lore.kernel.org/r/3833812ea63e7fdbe36bf8b932e63f70d18e2a2a.171281347... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/Kconfig | 17 +++-------------- arch/x86/kernel/cpu/bugs.c | 2 +- 2 files changed, 4 insertions(+), 15 deletions(-)
--- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2612,27 +2612,16 @@ config MITIGATION_RFDS stored in floating point, vector and integer registers. See also file:Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
-choice - prompt "Clear branch history" +config MITIGATION_SPECTRE_BHI + bool "Mitigate Spectre-BHB (Branch History Injection)" depends on CPU_SUP_INTEL - default SPECTRE_BHI_ON + default y help Enable BHI mitigations. BHI attacks are a form of Spectre V2 attacks where the branch history buffer is poisoned to speculatively steer indirect branches. See file:Documentation/admin-guide/hw-vuln/spectre.rst
-config SPECTRE_BHI_ON - bool "on" - help - Equivalent to setting spectre_bhi=on command line parameter. -config SPECTRE_BHI_OFF - bool "off" - help - Equivalent to setting spectre_bhi=off command line parameter. - -endchoice - endif
config ARCH_HAS_ADD_PAGES --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -1627,7 +1627,7 @@ enum bhi_mitigations { };
static enum bhi_mitigations bhi_mitigation __ro_after_init = - IS_ENABLED(CONFIG_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF; + IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;
static int __init spectre_bhi_parse_cmdline(char *str) {
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ville Syrjälä ville.syrjala@linux.intel.com
commit 7b1f6b5aaec0f849e19c3e99d4eea75876853cdd upstream.
Currently we always reprogram CDCLK from the intel_set_cdclk_pre_plane_update() when using squash/crawl. The code only works correctly for the cd2x update or full modeset cases, and it was simply never updated to deal with squash/crawl.
If the CDCLK frequency is increasing we must reprogram it before we do anything else that might depend on the new higher frequency, and conversely we must not decrease the frequency until everything that might still depend on the old higher frequency has been dealt with.
Since cdclk_state->pipe is only relevant when doing a cd2x update we can't use it to determine the correct sequence during squash/crawl. To that end introduce cdclk_state->disable_pipes which simply indicates that we must perform the update while the pipes are disable (ie. during intel_set_cdclk_pre_plane_update()). Otherwise we use the same old vs. new CDCLK frequency comparsiong as for cd2x updates.
The only remaining problem case is when the voltage_level needs to increase due to a DDI port, but the CDCLK frequency is decreasing (and not all pipes are being disabled). The current approach will not bump the voltage level up until after the port has already been enabled, which is too late. But we'll take care of that case separately.
v2: Don't break the "must disable pipes case" v3: Keep the on stack 'pipe' for future use
Cc: stable@vger.kernel.org Fixes: d62686ba3b54 ("drm/i915/adl_p: CDCLK crawl support for ADL") Reviewed-by: Uma Shankar uma.shankar@intel.com Reviewed-by: Gustavo Sousa gustavo.sousa@intel.com Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240402155016.13733-2-ville.s... (cherry picked from commit 3aecee90ac12a351905f12dda7643d5b0676d6ca) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_cdclk.c | 7 +++++-- drivers/gpu/drm/i915/display/intel_cdclk.h | 3 +++ 2 files changed, 8 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/i915/display/intel_cdclk.c +++ b/drivers/gpu/drm/i915/display/intel_cdclk.c @@ -2521,7 +2521,7 @@ intel_set_cdclk_pre_plane_update(struct if (IS_DG2(i915)) intel_cdclk_pcode_pre_notify(state);
- if (pipe == INVALID_PIPE || + if (new_cdclk_state->disable_pipes || old_cdclk_state->actual.cdclk <= new_cdclk_state->actual.cdclk) { drm_WARN_ON(&i915->drm, !new_cdclk_state->base.changed);
@@ -2553,7 +2553,7 @@ intel_set_cdclk_post_plane_update(struct if (IS_DG2(i915)) intel_cdclk_pcode_post_notify(state);
- if (pipe != INVALID_PIPE && + if (!new_cdclk_state->disable_pipes && old_cdclk_state->actual.cdclk > new_cdclk_state->actual.cdclk) { drm_WARN_ON(&i915->drm, !new_cdclk_state->base.changed);
@@ -3036,6 +3036,7 @@ static struct intel_global_state *intel_ return NULL;
cdclk_state->pipe = INVALID_PIPE; + cdclk_state->disable_pipes = false;
return &cdclk_state->base; } @@ -3214,6 +3215,8 @@ int intel_modeset_calc_cdclk(struct inte if (ret) return ret;
+ new_cdclk_state->disable_pipes = true; + drm_dbg_kms(&dev_priv->drm, "Modeset required for cdclk change\n"); } --- a/drivers/gpu/drm/i915/display/intel_cdclk.h +++ b/drivers/gpu/drm/i915/display/intel_cdclk.h @@ -51,6 +51,9 @@ struct intel_cdclk_state {
/* bitmask of active pipes */ u8 active_pipes; + + /* update cdclk with pipes disabled */ + bool disable_pipes; };
int intel_crtc_compute_min_cdclk(const struct intel_crtc_state *crtc_state);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ville Syrjälä ville.syrjala@linux.intel.com
commit e3d4ead4d48c05355bd3b99c8162428f68c3c1a5 upstream.
Bigjoiner seem to be causing all kinds of grief to the PSR code currently. I don't believe there is any hardware issue but the code simply not handling this correctly. For now just disable PSR when bigjoiner is needed.
Cc: stable@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20240404213441.17637-3-ville.s... Reviewed-by: Arun R Murthy arun.r.mruthy@intel.com Acked-by: Jouni Högander jouni.hogander@intel.com Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com (cherry picked from commit 372fa0c79d3f289f813d8001e0a8a96d1011826c) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_psr.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
--- a/drivers/gpu/drm/i915/display/intel_psr.c +++ b/drivers/gpu/drm/i915/display/intel_psr.c @@ -1368,6 +1368,17 @@ void intel_psr_compute_config(struct int return; }
+ /* + * FIXME figure out what is wrong with PSR+bigjoiner and + * fix it. Presumably something related to the fact that + * PSR is a transcoder level feature. + */ + if (crtc_state->bigjoiner_pipes) { + drm_dbg_kms(&dev_priv->drm, + "PSR disabled due to bigjoiner\n"); + return; + } + if (CAN_PANEL_REPLAY(intel_dp)) crtc_state->has_panel_replay = true; else
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ville Syrjälä ville.syrjala@linux.intel.com
commit 0653d501409eeb9f1deb7e4c12e4d0d2c9f1cba1 upstream.
The current modeset sequence can't handle port sync and bigjoiner at the same time. Refuse port sync when bigjoiner is needed, at least until we fix the modeset sequence.
v2: Add a FIXME (Vandite)
Cc: stable@vger.kernel.org Tested-by: Vidya Srinivas vidya.srinivas@intel.com Reviewed-by: Vandita Kulkarni vandita.kulkarni@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240404213441.17637-4-ville.s... Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com (cherry picked from commit b37e1347b991459c38c56ec2476087854a4f720b) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_ddi.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/gpu/drm/i915/display/intel_ddi.c +++ b/drivers/gpu/drm/i915/display/intel_ddi.c @@ -4229,7 +4229,12 @@ static bool m_n_equal(const struct intel static bool crtcs_port_sync_compatible(const struct intel_crtc_state *crtc_state1, const struct intel_crtc_state *crtc_state2) { + /* + * FIXME the modeset sequence is currently wrong and + * can't deal with bigjoiner + port sync at the same time. + */ return crtc_state1->hw.active && crtc_state2->hw.active && + !crtc_state1->bigjoiner_pipes && !crtc_state2->bigjoiner_pipes && crtc_state1->output_types == crtc_state2->output_types && crtc_state1->output_format == crtc_state2->output_format && crtc_state1->lane_count == crtc_state2->lane_count &&
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ville Syrjälä ville.syrjala@linux.intel.com
commit 4a36e46df7aa781c756f09727d37dc2783f1ee75 upstream.
All joined pipes share the same transcoder/timing generator. Currently we just do the commits per-pipe, which doesn't really work if we need to change the timings at the same time. For now just disable live M/N updates when bigjoiner is needed.
Cc: stable@vger.kernel.org Tested-by: Vidya Srinivas vidya.srinivas@intel.com Reviewed-by: Arun R Murthy arun.r.murthy@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240404213441.17637-5-ville.s... Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com (cherry picked from commit ef79820db723a2a7c229a7251c12859e7e25a247) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_dp.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/i915/display/intel_dp.c +++ b/drivers/gpu/drm/i915/display/intel_dp.c @@ -2756,7 +2756,11 @@ intel_dp_drrs_compute_config(struct inte intel_panel_downclock_mode(connector, &pipe_config->hw.adjusted_mode); int pixel_clock;
- if (has_seamless_m_n(connector)) + /* + * FIXME all joined pipes share the same transcoder. + * Need to account for that when updating M/N live. + */ + if (has_seamless_m_n(connector) && !pipe_config->bigjoiner_pipes) pipe_config->update_m_n = true;
if (!can_enable_drrs(connector, pipe_config, downclock_mode)) {
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lijo Lazar lijo.lazar@amd.com
commit 8b2be55f4d6c1099d7f629b0ed7535a5be788c83 upstream.
For SOC21 ASICs, there is an issue in re-enabling PM features if a suspend got aborted. In such cases, reset the device during resume phase. This is a workaround till a proper solution is finalized.
Signed-off-by: Lijo Lazar lijo.lazar@amd.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Reviewed-by: Yang Wang kevinyang.wang@amd.com Reviewed-by: Hawking Zhang Hawking.Zhang@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/soc21.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c @@ -832,10 +832,35 @@ static int soc21_common_suspend(void *ha return soc21_common_hw_fini(adev); }
+static bool soc21_need_reset_on_resume(struct amdgpu_device *adev) +{ + u32 sol_reg1, sol_reg2; + + /* Will reset for the following suspend abort cases. + * 1) Only reset dGPU side. + * 2) S3 suspend got aborted and TOS is active. + */ + if (!(adev->flags & AMD_IS_APU) && adev->in_s3 && + !adev->suspend_complete) { + sol_reg1 = RREG32_SOC15(MP0, 0, regMP0_SMN_C2PMSG_81); + msleep(100); + sol_reg2 = RREG32_SOC15(MP0, 0, regMP0_SMN_C2PMSG_81); + + return (sol_reg1 != sol_reg2); + } + + return false; +} + static int soc21_common_resume(void *handle) { struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ if (soc21_need_reset_on_resume(adev)) { + dev_info(adev->dev, "S3 suspend aborted, resetting..."); + soc21_asic_reset(adev); + } + return soc21_common_hw_init(adev); }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
commit 65ff8092e4802f96d87d3d7cde146961f5228265 upstream.
There are cases where soft reset seems to succeed, but does not, so always use mode1/2 for now.
Reviewed-by: Harish Kasiviswanathan Harish.Kasiviswanathan@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/soc21.c | 2 -- 1 file changed, 2 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c @@ -450,10 +450,8 @@ static bool soc21_need_full_reset(struct { switch (amdgpu_ip_version(adev, GC_HWIP, 0)) { case IP_VERSION(11, 0, 0): - return amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__UMC); case IP_VERSION(11, 0, 2): case IP_VERSION(11, 0, 3): - return false; default: return true; }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tim Huang Tim.Huang@amd.com
commit bbca7f414ae9a12ea231cdbafd79c607e3337ea8 upstream.
The RB bitmap should be global active RB bitmap & active RB bitmap based on active SA.
Signed-off-by: Tim Huang Tim.Huang@amd.com Reviewed-by: Yifan Zhang yifan1.zhang@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -1630,7 +1630,7 @@ static void gfx_v11_0_setup_rb(struct am active_rb_bitmap |= (0x3 << (i * rb_bitmap_width_per_sa)); }
- active_rb_bitmap |= global_active_rb_bitmap; + active_rb_bitmap &= global_active_rb_bitmap; adev->gfx.config.backend_enable_mask = active_rb_bitmap; adev->gfx.config.num_rbs = hweight32(active_rb_bitmap); }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yifan Zhang yifan1.zhang@amd.com
commit 6dba20d23e85034901ccb765a7ca71199bcca4df upstream.
This patch to differentiate external rev id for gfx 11.5.0.
Signed-off-by: Yifan Zhang yifan1.zhang@amd.com Reviewed-by: Tim Huang Tim.Huang@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/soc21.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c @@ -712,7 +712,10 @@ static int soc21_common_early_init(void AMD_PG_SUPPORT_VCN | AMD_PG_SUPPORT_JPEG | AMD_PG_SUPPORT_GFX_PG; - adev->external_rev_id = adev->rev_id + 0x1; + if (adev->rev_id == 0) + adev->external_rev_id = 0x1; + else + adev->external_rev_id = adev->rev_id + 0x10; break; default: /* FIXME: not supported yet */
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Harry Wentland harry.wentland@amd.com
commit 9e61ef8d219877202d4ee51d0d2ad9072c99a262 upstream.
In order for display colorimetry to work correctly on DP displays we need to send the VSC SDP packet. We should only do so for panels with DPCD revision greater or equal to 1.4 as older receivers might have problems with it.
Cc: stable@vger.kernel.org Cc: Joshua Ashton joshua@froggi.es Cc: Xaver Hugl xaver.hugl@gmail.com Cc: Melissa Wen mwen@igalia.com Cc: Agustin Gutierrez Agustin.Gutierrez@amd.com Reviewed-by: Agustin Gutierrez agustin.gutierrez@amd.com Acked-by: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Harry Wentland harry.wentland@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -6257,7 +6257,9 @@ create_stream_for_sink(struct drm_connec if (stream->signal == SIGNAL_TYPE_HDMI_TYPE_A) mod_build_hf_vsif_infopacket(stream, &stream->vsp_infopacket);
- if (stream->link->psr_settings.psr_feature_enabled || stream->link->replay_settings.replay_feature_enabled) { + if (stream->signal == SIGNAL_TYPE_DISPLAY_PORT || + stream->signal == SIGNAL_TYPE_DISPLAY_PORT_MST || + stream->signal == SIGNAL_TYPE_EDP) { // // should decide stream support vsc sdp colorimetry capability // before building vsc info packet @@ -6267,7 +6269,8 @@ create_stream_for_sink(struct drm_connec stream->use_vsc_sdp_for_colorimetry = aconnector->dc_sink->is_vsc_sdp_colorimetry_supported; } else { - if (stream->link->dpcd_caps.dprx_feature.bits.VSC_SDP_COLORIMETRY_SUPPORTED) + if (stream->link->dpcd_caps.dpcd_rev.raw >= 0x14 && + stream->link->dpcd_caps.dprx_feature.bits.VSC_SDP_COLORIMETRY_SUPPORTED) stream->use_vsc_sdp_for_colorimetry = true; } if (stream->out_transfer_func->tf == TRANSFER_FUNCTION_GAMMA22)
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Harry Wentland harry.wentland@amd.com
commit c3e2a5f2da904a18661335e8be2b961738574998 upstream.
The previous check for the is_vsc_sdp_colorimetry_supported flag for MST sink signals did nothing. Simplify the code and use the same check for MST and SST.
Cc: stable@vger.kernel.org Reviewed-by: Agustin Gutierrez agustin.gutierrez@amd.com Acked-by: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Harry Wentland harry.wentland@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -6264,15 +6264,9 @@ create_stream_for_sink(struct drm_connec // should decide stream support vsc sdp colorimetry capability // before building vsc info packet // - stream->use_vsc_sdp_for_colorimetry = false; - if (aconnector->dc_sink->sink_signal == SIGNAL_TYPE_DISPLAY_PORT_MST) { - stream->use_vsc_sdp_for_colorimetry = - aconnector->dc_sink->is_vsc_sdp_colorimetry_supported; - } else { - if (stream->link->dpcd_caps.dpcd_rev.raw >= 0x14 && - stream->link->dpcd_caps.dprx_feature.bits.VSC_SDP_COLORIMETRY_SUPPORTED) - stream->use_vsc_sdp_for_colorimetry = true; - } + stream->use_vsc_sdp_for_colorimetry = stream->link->dpcd_caps.dpcd_rev.raw >= 0x14 && + stream->link->dpcd_caps.dprx_feature.bits.VSC_SDP_COLORIMETRY_SUPPORTED; + if (stream->out_transfer_func->tf == TRANSFER_FUNCTION_GAMMA22) tf = TRANSFER_FUNC_GAMMA_22; mod_build_vsc_infopacket(stream, &stream->vsc_infopacket, stream->output_color_space, tf);
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dillon Varone dillon.varone@amd.com
commit 953927587f37b731abdeabe46ad44a3b3ec67a52 upstream.
[WHY&HOW] We should not be recursively calling the manual trigger programming function when FAMS is not in use.
Cc: stable@vger.kernel.org Reviewed-by: Alvin Lee alvin.lee2@amd.com Acked-by: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Dillon Varone dillon.varone@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/optc/dcn32/dcn32_optc.c | 3 --- 1 file changed, 3 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/optc/dcn32/dcn32_optc.c +++ b/drivers/gpu/drm/amd/display/dc/optc/dcn32/dcn32_optc.c @@ -267,9 +267,6 @@ static void optc32_setup_manual_trigger( OTG_V_TOTAL_MAX_SEL, 1, OTG_FORCE_LOCK_ON_EVENT, 0, OTG_SET_V_TOTAL_MIN_MASK, (1 << 1)); /* TRIGA */ - - // Setup manual flow control for EOF via TRIG_A - optc->funcs->setup_manual_trigger(optc); } }
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Hung alex.hung@amd.com
commit 2cc69a10d83180f3de9f5afe3a98e972b1453d4c upstream.
mode_config's max width x height is 4096x2160 and is higher than DWB's max resolution 3840x2160 which is returned instead.
Cc: stable@vger.kernel.org Reviewed-by: Harry Wentland harry.wentland@amd.com Acked-by: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Alex Hung alex.hung@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c @@ -76,10 +76,8 @@ static int amdgpu_dm_wb_encoder_atomic_c
static int amdgpu_dm_wb_connector_get_modes(struct drm_connector *connector) { - struct drm_device *dev = connector->dev; - - return drm_add_modes_noedid(connector, dev->mode_config.max_width, - dev->mode_config.max_height); + /* Maximum resolution supported by DWB */ + return drm_add_modes_noedid(connector, 3840, 2160); }
static int amdgpu_dm_wb_prepare_job(struct drm_writeback_connector *wb_connector,
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wenjing Liu wenjing.liu@amd.com
commit 81901d8d0472e9a19d294ae1dea76b950548195d upstream.
[why] In current implemenation ODM mode is only reset when the last plane is removed from dc state. For any dc validate we will always remove all current planes and add new planes. However when switching from no planes to 1 plane, ODM mode is not reset because no planes get removed. This has caused an issue where we kept ODM combine when it should have been remove when a plane is added. The change is to reset ODM mode when adding the first plane.
Cc: stable@vger.kernel.org Reviewed-by: Alvin Lee alvin.lee2@amd.com Acked-by: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Wenjing Liu wenjing.liu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/core/dc_state.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/drivers/gpu/drm/amd/display/dc/core/dc_state.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_state.c @@ -436,6 +436,15 @@ bool dc_state_add_plane( goto out; }
+ if (stream_status->plane_count == 0 && dc->config.enable_windowed_mpo_odm) + /* ODM combine could prevent us from supporting more planes + * we will reset ODM slice count back to 1 when all planes have + * been removed to maximize the amount of planes supported when + * new planes are added. + */ + resource_update_pipes_for_stream_with_slice_count( + state, dc->current_state, dc->res_pool, stream, 1); + otg_master_pipe = resource_get_otg_master_for_stream( &state->res_ctx, stream); if (otg_master_pipe)
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fudongwang fudong.wang@amd.com
commit cf79814cb0bf5749b9f0db53ca231aa540c02768 upstream.
[Why] Wrong logic cause screen corruption.
[How] Port logic from DCN35/314.
Cc: stable@vger.kernel.org Reviewed-by: Nicholas Kazlauskas nicholas.kazlauskas@amd.com Acked-by: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Fudongwang fudong.wang@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/clk_mgr/dcn316/dcn316_clk_mgr.c | 19 ++++++---- 1 file changed, 12 insertions(+), 7 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn316/dcn316_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn316/dcn316_clk_mgr.c @@ -99,20 +99,25 @@ static int dcn316_get_active_display_cnt return display_count; }
-static void dcn316_disable_otg_wa(struct clk_mgr *clk_mgr_base, struct dc_state *context, bool disable) +static void dcn316_disable_otg_wa(struct clk_mgr *clk_mgr_base, struct dc_state *context, + bool safe_to_lower, bool disable) { struct dc *dc = clk_mgr_base->ctx->dc; int i;
for (i = 0; i < dc->res_pool->pipe_count; ++i) { - struct pipe_ctx *pipe = &dc->current_state->res_ctx.pipe_ctx[i]; + struct pipe_ctx *pipe = safe_to_lower + ? &context->res_ctx.pipe_ctx[i] + : &dc->current_state->res_ctx.pipe_ctx[i];
if (pipe->top_pipe || pipe->prev_odm_pipe) continue; - if (pipe->stream && (pipe->stream->dpms_off || pipe->plane_state == NULL || - dc_is_virtual_signal(pipe->stream->signal))) { + if (pipe->stream && (pipe->stream->dpms_off || dc_is_virtual_signal(pipe->stream->signal) || + !pipe->stream->link_enc)) { if (disable) { - pipe->stream_res.tg->funcs->immediate_disable_crtc(pipe->stream_res.tg); + if (pipe->stream_res.tg && pipe->stream_res.tg->funcs->immediate_disable_crtc) + pipe->stream_res.tg->funcs->immediate_disable_crtc(pipe->stream_res.tg); + reset_sync_context_for_pipe(dc, context, i); } else pipe->stream_res.tg->funcs->enable_crtc(pipe->stream_res.tg); @@ -207,11 +212,11 @@ static void dcn316_update_clocks(struct }
if (should_set_clock(safe_to_lower, new_clocks->dispclk_khz, clk_mgr_base->clks.dispclk_khz)) { - dcn316_disable_otg_wa(clk_mgr_base, context, true); + dcn316_disable_otg_wa(clk_mgr_base, context, safe_to_lower, true);
clk_mgr_base->clks.dispclk_khz = new_clocks->dispclk_khz; dcn316_smu_set_dispclk(clk_mgr, clk_mgr_base->clks.dispclk_khz); - dcn316_disable_otg_wa(clk_mgr_base, context, false); + dcn316_disable_otg_wa(clk_mgr_base, context, safe_to_lower, false);
update_dispclk = true; }
On 4/15/2024 7:18 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.8.7 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.8.7-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.8.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com
On Mon, Apr 15, 2024 at 04:18:19PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.8.7 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Tested-by: Mark Brown broonie@kernel.org
On Mon, Apr 15, 2024 at 04:18:19PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.8.7 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Successfully compiled and installed the kernel on my computer (Acer Aspire E15, Intel Core i3 Haswell). No noticeable regressions.
Tested-by: Bagas Sanjaya bagasdotme@gmail.com
On 4/15/24 7:18 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.8.7 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.8.7-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.8.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
On 15/04/2024 15:18, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.8.7 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.8.7-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.8.y and the diffstat can be found below.
thanks,
greg k-h
No new regressions for Tegra ...
Test results for stable-v6.8: 10 builds: 10 pass, 0 fail 26 boots: 26 pass, 0 fail 106 tests: 105 pass, 1 fail
Linux version: 6.8.7-rc1-g367141eaada2 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Test failures: tegra194-p2972-0000: boot.py
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
[2024-04-15 16:18] Greg Kroah-Hartman:
This is the start of the stable review cycle for the 6.8.7 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.8.7-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.8.y and the diffstat can be found below.
thanks,
greg k-h
Hi, 6.8.7-rc1 is running fine on various x86_64 bare metal and virtual machines of mine (Haswell, Kaby Lake, Coffee Lake).
Regards Pascal
On Mon, 15 Apr 2024 at 19:54, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.8.7 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.8.7-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.8.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 6.8.7-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-6.8.y * git commit: 367141eaada2c7635234576914bcd1f480d62731 * git describe: v6.8.6-173-g367141eaada2 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.8.y/build/v6.8.6-...
## Test Regressions (compared to v6.8.6)
## Metric Regressions (compared to v6.8.6)
## Test Fixes (compared to v6.8.6)
## Metric Fixes (compared to v6.8.6)
## Test result summary total: 285130, pass: 247210, fail: 3619, skip: 33930, xfail: 371
## Build Summary * arc: 5 total, 5 passed, 0 failed * arm: 133 total, 131 passed, 2 failed * arm64: 41 total, 41 passed, 0 failed * i386: 32 total, 32 passed, 0 failed * mips: 25 total, 25 passed, 0 failed * parisc: 4 total, 4 passed, 0 failed * powerpc: 36 total, 36 passed, 0 failed * riscv: 19 total, 19 passed, 0 failed * s390: 13 total, 13 passed, 0 failed * sh: 10 total, 10 passed, 0 failed * sparc: 8 total, 6 passed, 2 failed * x86_64: 37 total, 36 passed, 1 failed
## Test suites summary * boot * kselftest-android * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers-dma-buf * kselftest-efivarfs * kselftest-exec * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-filesystems-epoll * kselftest-firmware * kselftest-fpu * kselftest-ftrace * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-memory-hotplug * kselftest-mincore * kselftest-mm * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-net-forwarding * kselftest-net-mptcp * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-user_events * kselftest-vDSO * kselftest-watchdog * kselftest-x86 * kselftest-zram * kunit * libgpiod * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-hugetlb * ltp-io * ltp-io[ * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-pty * ltp-sched * ltp-securebits * ltp-smoke * ltp-smoketest * ltp-syscalls * ltp-tracing * perf * rcutorture
-- Linaro LKFT https://lkft.linaro.org
linux-stable-mirror@lists.linaro.org