This is the start of the stable review cycle for the 6.1.140 release. There are 97 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 22 May 2025 12:57:37 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.140-rc1...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.1.140-rc1
Alex Deucher alexander.deucher@amd.com drm/amdgpu: fix pm notifier handling
Théo Lebrun theo.lebrun@bootlin.com spi: cadence-qspi: fix pointer reference in runtime PM hooks
Shigeru Yoshida syoshida@redhat.com ipv4: Fix uninit-value access in __ip_make_skb()
Shigeru Yoshida syoshida@redhat.com ipv6: Fix potential uninit-value access in __ip6_make_skb()
Shravya KN shravya.k-n@broadcom.com bnxt_en: Fix receive ring space parameters when XDP is active
Maciej S. Szmigiero mail@maciej.szmigiero.name platform/x86/amd/pmc: Only disable IRQ1 wakeup where i8042 actually enabled it
Mark Brown broonie@kernel.org arm64/sme: Always exit sme_alloc() early with existing storage
Florian Westphal fw@strlen.de netfilter: nf_tables: do not defer rule destruction via call_rcu
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: wait for rcu grace period on net_device removal
Florian Westphal fw@strlen.de netfilter: nf_tables: pass nft_chain to destroy function, not nft_ctx
Filipe Manana fdmanana@suse.com btrfs: don't BUG_ON() when 0 reference count at btrfs_lookup_extent_info()
Eric Dumazet edumazet@google.com sctp: add mutual exclusion in proc_sctp_do_udp_port()
Ma Wupeng mawupeng1@huawei.com hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio
Huacai Chen chenhuacai@kernel.org LoongArch: Explicitly specify code model in Makefile
Peter Collingbourne pcc@google.com bpf, arm64: Fix address emission with tag-based KASAN enabled
Puranjay Mohan puranjay@kernel.org bpf, arm64: Fix trampoline for BPF_TRAMP_F_CALL_ORIG
Xu Lu luxu.kernel@bytedance.com riscv: mm: Fix the out of bound issue of vmemmap address
Byungchul Park byungchul@sk.com mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index
Feng Tang feng.tang@linux.alibaba.com selftests/mm: compaction_test: support platform with huge mount of memory
GONG Ruiqi gongruiqi1@huawei.com usb: typec: fix pm usage counter imbalance in ucsi_ccg_sync_control()
Dan Carpenter dan.carpenter@linaro.org usb: typec: fix potential array underflow in ucsi_ccg_sync_control()
RD Babiera rdbabiera@google.com usb: typec: altmodes/displayport: create sysfs nodes as driver's default device attribute group
Andrei Kuchynski akuchynski@chromium.org usb: typec: ucsi: displayport: Fix deadlock
Shuai Xue xueshuai@linux.alibaba.com dmaengine: idxd: fix memory leak in error handling path of idxd_pci_probe
Shuai Xue xueshuai@linux.alibaba.com dmaengine: idxd: fix memory leak in error handling path of idxd_alloc
Shuai Xue xueshuai@linux.alibaba.com dmaengine: idxd: Add missing idxd cleanup to fix memory leak in remove call
Shuai Xue xueshuai@linux.alibaba.com dmaengine: idxd: Add missing cleanups in cleanup internals
Shuai Xue xueshuai@linux.alibaba.com dmaengine: idxd: Add missing cleanup for early error out in idxd_setup_internals
Shuai Xue xueshuai@linux.alibaba.com dmaengine: idxd: fix memory leak in error handling path of idxd_setup_groups
Shuai Xue xueshuai@linux.alibaba.com dmaengine: idxd: fix memory leak in error handling path of idxd_setup_engines
Shuai Xue xueshuai@linux.alibaba.com dmaengine: idxd: fix memory leak in error handling path of idxd_setup_wqs
Yemike Abhilash Chandra y-abhilashchandra@ti.com dmaengine: ti: k3-udma: Use cap_mask directly from dma_device structure instead of a local copy
Ronald Wahl ronald.wahl@legrand.com dmaengine: ti: k3-udma: Add missing locking
Nathan Chancellor nathan@kernel.org net: qede: Initialize qede_ll_ops with designated initializer
Fedor Pchelkin pchelkin@ispras.ru wifi: mt76: disable napi on driver removal
Jethro Donaldson devel@jro.nz smb: client: fix memory leak during error handling for POSIX mkdir
Steve Siwinski ssiwinski@atto.com scsi: sd_zbc: block: Respect bio vector limits for REPORT ZONES buffer
Claudiu Beznea claudiu.beznea.uj@bp.renesas.com phy: renesas: rcar-gen3-usb2: Set timing registers only once
Claudiu Beznea claudiu.beznea.uj@bp.renesas.com phy: renesas: rcar-gen3-usb2: Fix role detection on unbind/bind
Ma Ke make24@iscas.ac.cn phy: Fix error handling in tegra_xusb_port_init
Steven Rostedt rostedt@goodmis.org tracing: samples: Initialize trace_array_printk() with the correct function
pengdonglin pengdonglin@xiaomi.com ftrace: Fix preemption accounting for stacktrace filter command
pengdonglin pengdonglin@xiaomi.com ftrace: Fix preemption accounting for stacktrace trigger command
Michael Kelley mhklinux@outlook.com Drivers: hv: vmbus: Remove vmbus_sendpacket_pagebuffer()
Michael Kelley mhklinux@outlook.com Drivers: hv: Allow vmbus_sendpacket_mpb_desc() to create multiple ranges
Michael Kelley mhklinux@outlook.com hv_netvsc: Remove rmsg_pgcnt
Michael Kelley mhklinux@outlook.com hv_netvsc: Preserve contiguous PFN grouping in the page buffer array
Michael Kelley mhklinux@outlook.com hv_netvsc: Use vmbus_sendpacket_mpb_desc() to send VMBus messages
Hyejeong Choi hjeong.choi@samsung.com dma-buf: insert memory barrier before updating num_fences
Nicolas Chauvet kwizart@gmail.com ALSA: usb-audio: Add sample rate quirk for Microdia JP001 USB Camera
Christian Heusel christian@heusel.eu ALSA: usb-audio: Add sample rate quirk for Audioengine D1
Wentao Liang vulab@iscas.ac.cn ALSA: es1968: Add error handling for snd_pcm_hw_constraint_pow2()
Jeremy Linton jeremy.linton@arm.com ACPI: PPTT: Fix processor subtable walk
Wayne Lin Wayne.Lin@amd.com drm/amd/display: Avoid flooding unnecessary info messages
Wayne Lin Wayne.Lin@amd.com drm/amd/display: Correct the reply value when AUX write incomplete
Filipe Manana fdmanana@suse.com btrfs: fix discard worker infinite loop after disabling discard
Huacai Chen chenhuacai@kernel.org LoongArch: Fix MAX_REG_OFFSET calculation
Nathan Lynch nathan.lynch@amd.com dmaengine: Revert "dmaengine: dmatest: Fix dmatest waiting less when interrupted"
Trond Myklebust trond.myklebust@hammerspace.com NFSv4/pnfs: Reset the layout state after a layoutreturn
Pengtao He hept.hept.hept@gmail.com net/tls: fix kernel panic when alloc_page failed
Subbaraya Sundeep sbhatta@marvell.com octeontx2-pf: macsec: Fix incorrect max transmit size in TX secy
Cosmin Tanislav demonsingur@gmail.com regulator: max20086: fix invalid memory access
Abdun Nihaal abdun.nihaal@gmail.com qlcnic: fix memory leak in qlcnic_sriov_channel_cfg_cmd()
Carolina Jubran cjubran@nvidia.com net/mlx5e: Disable MACsec offload for uplink representor profile
Geert Uytterhoeven geert+renesas@glider.be ALSA: sh: SND_AICA should depend on SH_DMA_API
Keith Busch kbusch@kernel.org nvme-pci: acquire cq_poll_lock in nvme_poll_irqdisable
Kees Cook kees@kernel.org nvme-pci: make nvme_pci_npages_prp() __always_inline
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: sja1105: discard incoming frames in BR_STATE_LISTENING
Mathieu Othacehe othacehe@gnu.org net: cadence: macb: Fix a possible deadlock in macb_halt_tx.
Andrew Jeffery andrew@codeconstruct.com.au net: mctp: Ensure keys maintain only one ref to corresponding dev
Cong Wang xiyou.wangcong@gmail.com net_sched: Flush gso_skb list too during ->change()
Geert Uytterhoeven geert+renesas@glider.be spi: loopback-test: Do not split 1024-byte hexdumps
Li Lingfeng lilingfeng3@huawei.com nfs: handle failure of nfs_get_lock_context in unlock path
Henry Martin bsdhenrymartin@gmail.com HID: uclogic: Add NULL check in uclogic_input_configured()
Qasim Ijaz qasdev00@gmail.com HID: thrustmaster: fix memory leak in thrustmaster_interrupts()
Zhu Yanjun yanjun.zhu@linux.dev RDMA/rxe: Fix slab-use-after-free Read in rxe_queue_cleanup bug
Sebastian Andrzej Siewior bigeasy@linutronix.de clocksource/i8253: Use raw_spinlock_irqsave() in clockevent_i8253_disable()
David Lechner dlechner@baylibre.com iio: chemical: sps30: use aligned_s64 for timestamp
Jonathan Cameron Jonathan.Cameron@huawei.com iio: adc: ad7768-1: Fix insufficient alignment of timestamp.
Alex Deucher alexander.deucher@amd.com Revert "drm/amd: Stop evicting resources on APUs in suspend"
Mario Limonciello mario.limonciello@amd.com drm/amd: Add Suspend/Hibernate notification callback support
Zhigang Luo Zhigang.Luo@amd.com drm/amdgpu: trigger flr_work if reading pf2vf data failed
Ma Jun Jun.Ma2@amd.com drm/amdgpu: Fix the runtime resume failure issue
Mario Limonciello mario.limonciello@amd.com drm/amd: Stop evicting resources on APUs in suspend
Jonathan Cameron Jonathan.Cameron@huawei.com iio: adc: ad7266: Fix potential timestamp alignment issue.
Michal Suchanek msuchanek@suse.de tpm: tis: Double the timeout B to 4s
Masami Hiramatsu (Google) mhiramat@kernel.org tracing: probes: Fix a possible race in trace_probe_log APIs
Hans de Goede hdegoede@redhat.com platform/x86: asus-wmi: Fix wlan_ctrl_by_user detection
Kees Cook kees@kernel.org binfmt_elf: Move brk for static PIE even if ASLR disabled
Kees Cook kees@kernel.org binfmt_elf: Honor PT_LOAD alignment for static PIE
Kees Cook kees@kernel.org binfmt_elf: Calculate total_size earlier
Kees Cook kees@kernel.org selftests/exec: Build both static and non-static load_address tests
Kees Cook keescook@chromium.org binfmt_elf: Leave a gap between .bss and brk
Muhammad Usama Anjum usama.anjum@collabora.com selftests/exec: load_address: conform test to TAP format output
Kees Cook keescook@chromium.org binfmt_elf: elf_bss no longer used by load_elf_binary()
Eric W. Biederman ebiederm@xmission.com binfmt_elf: Support segments with 0 filesz and misaligned starts
Kees Cook keescook@chromium.org binfmt: Fix whitespace issues
-------------
Diffstat:
 Makefile | 4 +-
 arch/arm64/kernel/fpsimd.c | 6 +-
 arch/arm64/net/bpf_jit_comp.c | 12 +-
 arch/loongarch/Makefile | 2 +-
 arch/loongarch/include/asm/ptrace.h | 2 +-
 arch/riscv/include/asm/page.h | 1 +
 arch/riscv/include/asm/pgtable.h | 2 +-
 arch/riscv/mm/init.c | 17 +-
 block/bio.c | 2 +-
 drivers/acpi/pptt.c | 11 +-
 drivers/char/tpm/tpm_tis_core.h | 2 +-
 drivers/clocksource/i8253.c | 4 +-
 drivers/dma-buf/dma-resv.c | 5 +-
 drivers/dma/dmatest.c | 6 +-
 drivers/dma/idxd/init.c | 145 +++++++---
 drivers/dma/ti/k3-udma.c | 10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 51 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 11 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 29 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 3 +
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 2 +
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 2 +
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +-
 .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 16 +-
 drivers/hid/hid-thrustmaster.c | 1 +
 drivers/hid/hid-uclogic-core.c | 7 +-
 drivers/hv/channel.c | 65 +----
 drivers/iio/adc/ad7266.c | 2 +-
 drivers/iio/adc/ad7768-1.c | 2 +-
 drivers/iio/chemical/sps30.c | 2 +-
 drivers/infiniband/sw/rxe/rxe_cq.c | 5 +-
 drivers/net/dsa/sja1105/sja1105_main.c | 6 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 10 +-
 drivers/net/ethernet/cadence/macb_main.c | 19 +-
 .../ethernet/marvell/octeontx2/nic/cn10k_macsec.c | 3 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 +
 drivers/net/ethernet/qlogic/qede/qede_main.c | 2 +-
 .../ethernet/qlogic/qlcnic/qlcnic_sriov_common.c | 7 +-
 drivers/net/hyperv/hyperv_net.h | 13 +-
 drivers/net/hyperv/netvsc.c | 57 +++-
 drivers/net/hyperv/netvsc_drv.c | 62 +----
 drivers/net/hyperv/rndis_filter.c | 24 +-
 drivers/net/wireless/mediatek/mt76/dma.c | 1 +
 drivers/nvme/host/pci.c | 4 +-
 drivers/phy/renesas/phy-rcar-gen3-usb2.c | 38 ++-
 drivers/phy/tegra/xusb.c | 8 +-
 drivers/platform/x86/amd/pmc.c | 8 +-
 drivers/platform/x86/asus-wmi.c | 3 +-
 drivers/regulator/max20086-regulator.c | 7 +-
 drivers/scsi/sd_zbc.c | 6 +-
 drivers/scsi/storvsc_drv.c | 1 +
 drivers/spi/spi-cadence-quadspi.c | 6 +-
 drivers/spi/spi-loopback-test.c | 2 +-
 drivers/usb/typec/altmodes/displayport.c | 18 +-
 drivers/usb/typec/ucsi/displayport.c | 19 +-
 drivers/usb/typec/ucsi/ucsi.c | 34 +++
 drivers/usb/typec/ucsi/ucsi.h | 3 +
 drivers/usb/typec/ucsi/ucsi_ccg.c | 5 +
 fs/binfmt_elf.c | 293 ++++++++++++---------
 fs/binfmt_elf_fdpic.c | 2 +-
 fs/btrfs/discard.c | 17 +-
 fs/btrfs/extent-tree.c | 25 +-
 fs/exec.c | 2 +-
 fs/nfs/nfs4proc.c | 9 +-
 fs/nfs/pnfs.c | 9 +
 fs/smb/client/smb2pdu.c | 2 +-
 include/linux/bio.h | 1 +
 include/linux/hyperv.h | 7 -
 include/linux/tpm.h | 2 +-
 include/net/netfilter/nf_tables.h | 3 +-
 include/net/sch_generic.h | 15 ++
 include/uapi/linux/elf.h | 2 +-
 kernel/trace/trace_dynevent.c | 16 +-
 kernel/trace/trace_dynevent.h | 1 +
 kernel/trace/trace_events_trigger.c | 2 +-
 kernel/trace/trace_functions.c | 6 +-
 kernel/trace/trace_kprobe.c | 2 +-
 kernel/trace/trace_probe.c | 9 +
 kernel/trace/trace_uprobe.c | 2 +-
 mm/memory_hotplug.c | 6 +-
 mm/migrate.c | 8 +
 net/ipv4/ip_output.c | 3 +-
 net/ipv4/raw.c | 3 +
 net/ipv6/ip6_output.c | 3 +-
 net/mctp/route.c | 4 +-
 net/netfilter/nf_tables_api.c | 54 ++--
 net/netfilter/nft_immediate.c | 2 +-
 net/sched/sch_codel.c | 2 +-
 net/sched/sch_fq.c | 2 +-
 net/sched/sch_fq_codel.c | 2 +-
 net/sched/sch_fq_pie.c | 2 +-
 net/sched/sch_hhf.c | 2 +-
 net/sched/sch_pie.c | 2 +-
 net/sctp/sysctl.c | 4 +
 net/tls/tls_strp.c | 3 +-
 samples/ftrace/sample-trace-array.c | 2 +-
 sound/pci/es1968.c | 6 +-
 sound/sh/Kconfig | 2 +-
 sound/usb/quirks.c | 4 +
 tools/testing/selftests/exec/Makefile | 19 +-
 tools/testing/selftests/exec/load_address.c | 83 ++++--
 tools/testing/selftests/vm/compaction_test.c | 19 +-
 103 files changed, 937 insertions(+), 530 deletions(-)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook keescook@chromium.org
[ Upstream commit 8f6e3f9e5a0f58e458a348b7e36af11d0e9702af ]
Fix the annoying whitespace issues that have been following these files around for years.
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Kees Cook keescook@chromium.org
Acked-by: Christian Brauner (Microsoft) brauner@kernel.org
Link: https://lore.kernel.org/r/20221018071350.never.230-kees@kernel.org
Stable-dep-of: 11854fe263eb ("binfmt_elf: Move brk for static PIE even if ASLR disabled")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/binfmt_elf.c | 14 +++++++-------
 fs/binfmt_elf_fdpic.c | 2 +-
 fs/exec.c | 2 +-
 include/uapi/linux/elf.h | 2 +-
 4 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 89e7e4826efce..d1cf3d25da4b6 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -248,7 +248,7 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec, } while (0)
#ifdef ARCH_DLINFO - /* + /* * ARCH_DLINFO must come first so PPC can do its special alignment of * AUXV. * update AT_VECTOR_SIZE_ARCH if the number of NEW_AUX_ENT() in @@ -1021,7 +1021,7 @@ static int load_elf_binary(struct linux_binprm *bprm) executable_stack); if (retval < 0) goto out_free_dentry; - + elf_bss = 0; elf_brk = 0;
@@ -1044,7 +1044,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
if (unlikely (elf_brk > elf_bss)) { unsigned long nbyte; - + /* There was a PT_LOAD segment with p_memsz > p_filesz before this one. Map anonymous pages, if needed, and clear the area. */ @@ -1522,7 +1522,7 @@ static void fill_elf_note_phdr(struct elf_phdr *phdr, int sz, loff_t offset) phdr->p_align = 0; }
-static void fill_note(struct memelfnote *note, const char *name, int type, +static void fill_note(struct memelfnote *note, const char *name, int type, unsigned int sz, void *data) { note->name = name; @@ -2005,8 +2005,8 @@ static int elf_dump_thread_status(long signr, struct elf_thread_status *t) t->num_notes = 0;
fill_prstatus(&t->prstatus.common, p, signr); - elf_core_copy_task_regs(p, &t->prstatus.pr_reg); - + elf_core_copy_task_regs(p, &t->prstatus.pr_reg); + fill_note(&t->notes[0], "CORE", NT_PRSTATUS, sizeof(t->prstatus), &(t->prstatus)); t->num_notes++; @@ -2296,7 +2296,7 @@ static int elf_core_dump(struct coredump_params *cprm) if (!elf_core_write_extra_phdrs(cprm, offset)) goto end_coredump;
- /* write out the notes section */ + /* write out the notes section */ if (!write_note_info(&info, cprm)) goto end_coredump;
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c index c71a409273150..b2d3b6e43bb56 100644 --- a/fs/binfmt_elf_fdpic.c +++ b/fs/binfmt_elf_fdpic.c @@ -1603,7 +1603,7 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm) if (!elf_core_write_extra_phdrs(cprm, offset)) goto end_coredump;
- /* write out the notes section */ + /* write out the notes section */ if (!writenote(thread_list->notes, cprm)) goto end_coredump; if (!writenote(&psinfo_note, cprm)) diff --git a/fs/exec.c b/fs/exec.c index 2039414cc6621..b65af8f9a4f9b 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -169,7 +169,7 @@ SYSCALL_DEFINE1(uselib, const char __user *, library) exit: fput(file); out: - return error; + return error; } #endif /* #ifdef CONFIG_USELIB */
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h index c7b056af9ef0a..3e5f07be26fc8 100644 --- a/include/uapi/linux/elf.h +++ b/include/uapi/linux/elf.h @@ -91,7 +91,7 @@ typedef __s64 Elf64_Sxword; #define DT_INIT 12 #define DT_FINI 13 #define DT_SONAME 14 -#define DT_RPATH 15 +#define DT_RPATH 15 #define DT_SYMBOLIC 16 #define DT_REL 17 #define DT_RELSZ 18
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit 585a018627b4d7ed37387211f667916840b5c5ea ]
Implement a helper elf_load() that wraps elf_map() and performs all of the necessary work to ensure that when "memsz > filesz" the bytes described by "memsz > filesz" are zeroed.
An outstanding issue is if the first segment has filesz 0, and has a randomized location. But that is the same as today.
In this change I replaced an open coded padzero() that did not clear all of the way to the end of the page, with padzero() that does.
I also stopped checking the return of padzero() as there is at least one known case where testing for failure is the wrong thing to do. It looks like binfmt_elf_fdpic may have the proper set of tests for when error handling can be safely completed.
I found a couple of commits in the old history https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git, that look very interesting in understanding this code.
commit 39b56d902bf3 ("[PATCH] binfmt_elf: clearing bss may fail")
commit c6e2227e4a3e ("[SPARC64]: Missing user access return value checks in fs/binfmt_elf.c and fs/compat.c")
commit 5bf3be033f50 ("v2.4.10.1 -> v2.4.10.2")
Looking at commit 39b56d902bf3 ("[PATCH] binfmt_elf: clearing bss may fail"):
commit 39b56d902bf35241e7cba6cc30b828ed937175ad
Author: Pavel Machek pavel@ucw.cz
Date:   Wed Feb 9 22:40:30 2005 -0800
    [PATCH] binfmt_elf: clearing bss may fail

    So we discover that Borland's Kylix application builder emits weird elf
    files which describe a non-writeable bss segment. So remove the
    clear_user() check at the place where we zero out the bss. I don't
    _think_ there are any security implications here (plus we've never
    checked that clear_user() return value, so whoops if it is a problem).

    Signed-off-by: Pavel Machek <pavel@suse.cz>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It seems pretty clear that the binfmt_elf_fdpic approach, skipping clear_user() for non-writable segments and otherwise calling clear_user(), aka padzero(), and checking its return code, is the right thing to do.
I just skipped the error checking as that avoids breaking things.
And notably, it looks like Borland's Kylix died in 2005 so it might be safe to just consider read-only segments with memsz > filesz an error.
Reported-by: Sebastian Ott sebott@redhat.com
Reported-by: Thomas Weißschuh linux@weissschuh.net
Closes: https://lkml.kernel.org/r/20230914-bss-alloc-v1-1-78de67d2c6dd@weissschuh.ne...
Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com
Link: https://lore.kernel.org/r/87sf71f123.fsf@email.froward.int.ebiederm.org
Tested-by: Pedro Falcato pedro.falcato@gmail.com
Signed-off-by: Sebastian Ott sebott@redhat.com
Link: https://lore.kernel.org/r/20230929032435.2391507-1-keescook@chromium.org
Signed-off-by: Kees Cook keescook@chromium.org
Stable-dep-of: 11854fe263eb ("binfmt_elf: Move brk for static PIE even if ASLR disabled")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/binfmt_elf.c | 111 +++++++++++++++++++++---------------------
 1 file changed, 48 insertions(+), 63 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index d1cf3d25da4b6..ea2968d343bb6 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -109,25 +109,6 @@ static struct linux_binfmt elf_format = {
#define BAD_ADDR(x) (unlikely((unsigned long)(x) >= TASK_SIZE))
-static int set_brk(unsigned long start, unsigned long end, int prot) -{ - start = ELF_PAGEALIGN(start); - end = ELF_PAGEALIGN(end); - if (end > start) { - /* - * Map the last of the bss segment. - * If the header is requesting these pages to be - * executable, honour that (ppc32 needs this). - */ - int error = vm_brk_flags(start, end - start, - prot & PROT_EXEC ? VM_EXEC : 0); - if (error) - return error; - } - current->mm->start_brk = current->mm->brk = end; - return 0; -} - /* We need to explicitly zero any fractional pages after the data section (i.e. bss). This would contain the junk from the file that should not @@ -401,6 +382,51 @@ static unsigned long elf_map(struct file *filep, unsigned long addr, return(map_addr); }
+static unsigned long elf_load(struct file *filep, unsigned long addr, + const struct elf_phdr *eppnt, int prot, int type, + unsigned long total_size) +{ + unsigned long zero_start, zero_end; + unsigned long map_addr; + + if (eppnt->p_filesz) { + map_addr = elf_map(filep, addr, eppnt, prot, type, total_size); + if (BAD_ADDR(map_addr)) + return map_addr; + if (eppnt->p_memsz > eppnt->p_filesz) { + zero_start = map_addr + ELF_PAGEOFFSET(eppnt->p_vaddr) + + eppnt->p_filesz; + zero_end = map_addr + ELF_PAGEOFFSET(eppnt->p_vaddr) + + eppnt->p_memsz; + + /* Zero the end of the last mapped page */ + padzero(zero_start); + } + } else { + map_addr = zero_start = ELF_PAGESTART(addr); + zero_end = zero_start + ELF_PAGEOFFSET(eppnt->p_vaddr) + + eppnt->p_memsz; + } + if (eppnt->p_memsz > eppnt->p_filesz) { + /* + * Map the last of the segment. + * If the header is requesting these pages to be + * executable, honour that (ppc32 needs this). + */ + int error; + + zero_start = ELF_PAGEALIGN(zero_start); + zero_end = ELF_PAGEALIGN(zero_end); + + error = vm_brk_flags(zero_start, zero_end - zero_start, + prot & PROT_EXEC ? VM_EXEC : 0); + if (error) + map_addr = error; + } + return map_addr; +} + + static unsigned long total_mapping_size(const struct elf_phdr *phdr, int nr) { elf_addr_t min_addr = -1; @@ -830,7 +856,6 @@ static int load_elf_binary(struct linux_binprm *bprm) struct elf_phdr *elf_ppnt, *elf_phdata, *interp_elf_phdata = NULL; struct elf_phdr *elf_property_phdata = NULL; unsigned long elf_bss, elf_brk; - int bss_prot = 0; int retval, i; unsigned long elf_entry; unsigned long e_entry; @@ -1042,33 +1067,6 @@ static int load_elf_binary(struct linux_binprm *bprm) if (elf_ppnt->p_type != PT_LOAD) continue;
- if (unlikely (elf_brk > elf_bss)) { - unsigned long nbyte; - - /* There was a PT_LOAD segment with p_memsz > p_filesz - before this one. Map anonymous pages, if needed, - and clear the area. */ - retval = set_brk(elf_bss + load_bias, - elf_brk + load_bias, - bss_prot); - if (retval) - goto out_free_dentry; - nbyte = ELF_PAGEOFFSET(elf_bss); - if (nbyte) { - nbyte = ELF_MIN_ALIGN - nbyte; - if (nbyte > elf_brk - elf_bss) - nbyte = elf_brk - elf_bss; - if (clear_user((void __user *)elf_bss + - load_bias, nbyte)) { - /* - * This bss-zeroing can fail if the ELF - * file specifies odd protections. So - * we don't check the return value - */ - } - } - } - elf_prot = make_prot(elf_ppnt->p_flags, &arch_state, !!interpreter, false);
@@ -1164,7 +1162,7 @@ static int load_elf_binary(struct linux_binprm *bprm) } }
- error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt, + error = elf_load(bprm->file, load_bias + vaddr, elf_ppnt, elf_prot, elf_flags, total_size); if (BAD_ADDR(error)) { retval = IS_ERR((void *)error) ? @@ -1219,10 +1217,8 @@ static int load_elf_binary(struct linux_binprm *bprm) if (end_data < k) end_data = k; k = elf_ppnt->p_vaddr + elf_ppnt->p_memsz; - if (k > elf_brk) { - bss_prot = elf_prot; + if (k > elf_brk) elf_brk = k; - } }
e_entry = elf_ex->e_entry + load_bias; @@ -1234,18 +1230,7 @@ static int load_elf_binary(struct linux_binprm *bprm) start_data += load_bias; end_data += load_bias;
- /* Calling set_brk effectively mmaps the pages that we need - * for the bss and break sections. We must do this before - * mapping in the interpreter, to make sure it doesn't wind - * up getting placed where the bss needs to go. - */ - retval = set_brk(elf_bss, elf_brk, bss_prot); - if (retval) - goto out_free_dentry; - if (likely(elf_bss != elf_brk) && unlikely(padzero(elf_bss))) { - retval = -EFAULT; /* Nobody gets to see this, but.. */ - goto out_free_dentry; - } + current->mm->start_brk = current->mm->brk = ELF_PAGEALIGN(elf_brk);
if (interpreter) { elf_entry = load_elf_interp(interp_elf_ex,
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook keescook@chromium.org
[ Upstream commit 8ed2ef21ff564cf4a25c098ace510ee6513c9836 ]
With the BSS handled generically via the new filesz/memsz mismatch handling logic in elf_load(), elf_bss no longer needs to be tracked. Drop the variable.
Cc: Eric Biederman ebiederm@xmission.com
Cc: Alexander Viro viro@zeniv.linux.org.uk
Cc: Christian Brauner brauner@kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Suggested-by: Eric Biederman ebiederm@xmission.com
Tested-by: Pedro Falcato pedro.falcato@gmail.com
Signed-off-by: Sebastian Ott sebott@redhat.com
Link: https://lore.kernel.org/r/20230929032435.2391507-2-keescook@chromium.org
Signed-off-by: Kees Cook keescook@chromium.org
Stable-dep-of: 11854fe263eb ("binfmt_elf: Move brk for static PIE even if ASLR disabled")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/binfmt_elf.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index ea2968d343bb6..2672b9dca1af0 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -855,7 +855,7 @@ static int load_elf_binary(struct linux_binprm *bprm) unsigned long error; struct elf_phdr *elf_ppnt, *elf_phdata, *interp_elf_phdata = NULL; struct elf_phdr *elf_property_phdata = NULL; - unsigned long elf_bss, elf_brk; + unsigned long elf_brk; int retval, i; unsigned long elf_entry; unsigned long e_entry; @@ -1047,7 +1047,6 @@ static int load_elf_binary(struct linux_binprm *bprm) if (retval < 0) goto out_free_dentry;
- elf_bss = 0; elf_brk = 0;
start_code = ~0UL; @@ -1210,8 +1209,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
k = elf_ppnt->p_vaddr + elf_ppnt->p_filesz;
- if (k > elf_bss) - elf_bss = k; if ((elf_ppnt->p_flags & PF_X) && end_code < k) end_code = k; if (end_data < k) @@ -1223,7 +1220,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
e_entry = elf_ex->e_entry + load_bias; phdr_addr += load_bias; - elf_bss += load_bias; elf_brk += load_bias; start_code += load_bias; end_code += load_bias;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Muhammad Usama Anjum usama.anjum@collabora.com
[ Upstream commit c4095067736b7ed50316a2bc7c9577941e87ad45 ]
Conform the layout, informational and status messages to TAP. No functional change is intended other than the layout of output messages.
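For readers unfamiliar with the kselftest TAP helpers, the conversion below boils down to the following skeleton. This is only an illustrative sketch: the ksft_* calls are the ones used in the diff that follows, while the comments describing the emitted TAP lines reflect my reading of the kselftest helpers rather than anything stated in this patch.

/* Build next to tools/testing/selftests/kselftest.h */
#include "../kselftest.h"

int main(void)
{
	ksft_print_header();			/* prints "TAP version 13" */
	ksft_set_plan(1);			/* prints the plan line "1..1" */

	if (0 /* some fatal setup failure */)
		ksft_exit_fail_msg("FAILED: reason\n");	/* report failure and exit non-zero */

	ksft_test_result_pass("Completed\n");	/* prints "ok 1 Completed" */
	ksft_finished();			/* prints totals and exits with the right code */
}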
Signed-off-by: Muhammad Usama Anjum usama.anjum@collabora.com
Link: https://lore.kernel.org/r/20240304155928.1818928-2-usama.anjum@collabora.com
Signed-off-by: Kees Cook keescook@chromium.org
Stable-dep-of: 11854fe263eb ("binfmt_elf: Move brk for static PIE even if ASLR disabled")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/testing/selftests/exec/load_address.c | 34 +++++++++------------
 1 file changed, 15 insertions(+), 19 deletions(-)
diff --git a/tools/testing/selftests/exec/load_address.c b/tools/testing/selftests/exec/load_address.c index d487c2f6a6150..17e3207d34ae7 100644 --- a/tools/testing/selftests/exec/load_address.c +++ b/tools/testing/selftests/exec/load_address.c @@ -5,6 +5,7 @@ #include <link.h> #include <stdio.h> #include <stdlib.h> +#include "../kselftest.h"
struct Statistics { unsigned long long load_address; @@ -41,28 +42,23 @@ int main(int argc, char **argv) unsigned long long misalign; int ret;
+ ksft_print_header(); + ksft_set_plan(1); + ret = dl_iterate_phdr(ExtractStatistics, &extracted); - if (ret != 1) { - fprintf(stderr, "FAILED\n"); - return 1; - } + if (ret != 1) + ksft_exit_fail_msg("FAILED: dl_iterate_phdr\n");
- if (extracted.alignment == 0) { - fprintf(stderr, "No alignment found\n"); - return 1; - } else if (extracted.alignment & (extracted.alignment - 1)) { - fprintf(stderr, "Alignment is not a power of 2\n"); - return 1; - } + if (extracted.alignment == 0) + ksft_exit_fail_msg("FAILED: No alignment found\n"); + else if (extracted.alignment & (extracted.alignment - 1)) + ksft_exit_fail_msg("FAILED: Alignment is not a power of 2\n");
misalign = extracted.load_address & (extracted.alignment - 1); - if (misalign) { - printf("alignment = %llu, load_address = %llu\n", - extracted.alignment, extracted.load_address); - fprintf(stderr, "FAILED\n"); - return 1; - } + if (misalign) + ksft_exit_fail_msg("FAILED: alignment = %llu, load_address = %llu\n", + extracted.alignment, extracted.load_address);
- fprintf(stderr, "PASS\n"); - return 0; + ksft_test_result_pass("Completed\n"); + ksft_finished(); }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook keescook@chromium.org
[ Upstream commit 2a5eb9995528441447d33838727f6ec1caf08139 ]
Currently the brk starts its randomization immediately after .bss, which means there is a chance that when the random offset is 0, linear overflows from .bss can reach into the brk area. Leave at least a single page gap between .bss and brk (when it has not already been explicitly relocated into the mmap range).
Reported-by: y0un9n132@gmail.com
Closes: https://lore.kernel.org/linux-hardening/CA+2EKTVLvc8hDZc+2Yhwmus=dzOUG5E4gV7...
Link: https://lore.kernel.org/r/20240217062545.1631668-2-keescook@chromium.org
Signed-off-by: Kees Cook keescook@chromium.org
Stable-dep-of: 11854fe263eb ("binfmt_elf: Move brk for static PIE even if ASLR disabled")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/binfmt_elf.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 2672b9dca1af0..ff796910265da 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -1294,6 +1294,9 @@ static int load_elf_binary(struct linux_binprm *bprm) if (IS_ENABLED(CONFIG_ARCH_HAS_ELF_RANDOMIZE) && elf_ex->e_type == ET_DYN && !interpreter) { mm->brk = mm->start_brk = ELF_ET_DYN_BASE; + } else { + /* Otherwise leave a gap between .bss and brk. */ + mm->brk = mm->start_brk = mm->brk + PAGE_SIZE; }
mm->brk = mm->start_brk = arch_randomize_brk(mm);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook kees@kernel.org
[ Upstream commit b57a2907c9d96c56494ef25f8ec821cd0b355dd6 ]
After commit 4d1cd3b2c5c1 ("tools/testing/selftests/exec: fix link error"), the load address alignment tests tried to build statically. This was silently ignored in some cases. However, after attempting to further fix the build by switching to "-static-pie", the test started failing. This appears to be due to non-PT_INTERP ET_DYN execs ("static PIE") not doing alignment correctly, which remains unfixed[1]. See commit aeb7923733d1 ("revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"") for more details.
Provide rules to build both static and non-static PIE binaries, improve debug reporting, and perform several test steps instead of a single all-or-nothing test. However, do not actually enable static-pie tests; alignment specification is only supported for ET_DYN with PT_INTERP ("regular PIE").
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215275 [1]
Link: https://lore.kernel.org/r/20240508173149.677910-1-keescook@chromium.org
Signed-off-by: Kees Cook kees@kernel.org
Stable-dep-of: 11854fe263eb ("binfmt_elf: Move brk for static PIE even if ASLR disabled")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/testing/selftests/exec/Makefile | 19 +++---
 tools/testing/selftests/exec/load_address.c | 67 +++++++++++++++++----
 2 files changed, 66 insertions(+), 20 deletions(-)
diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile index a0b8688b08369..b54986078d7ea 100644 --- a/tools/testing/selftests/exec/Makefile +++ b/tools/testing/selftests/exec/Makefile @@ -3,8 +3,13 @@ CFLAGS = -Wall CFLAGS += -Wno-nonnull CFLAGS += -D_GNU_SOURCE
+ALIGNS := 0x1000 0x200000 0x1000000 +ALIGN_PIES := $(patsubst %,load_address.%,$(ALIGNS)) +ALIGN_STATIC_PIES := $(patsubst %,load_address.static.%,$(ALIGNS)) +ALIGNMENT_TESTS := $(ALIGN_PIES) + TEST_PROGS := binfmt_script.py -TEST_GEN_PROGS := execveat load_address_4096 load_address_2097152 load_address_16777216 non-regular +TEST_GEN_PROGS := execveat non-regular $(ALIGNMENT_TESTS) TEST_GEN_FILES := execveat.symlink execveat.denatured script subdir # Makefile is a run-time dependency, since it's accessed by the execveat test TEST_FILES := Makefile @@ -28,9 +33,9 @@ $(OUTPUT)/execveat.symlink: $(OUTPUT)/execveat $(OUTPUT)/execveat.denatured: $(OUTPUT)/execveat cp $< $@ chmod -x $@ -$(OUTPUT)/load_address_4096: load_address.c - $(CC) $(CFLAGS) $(LDFLAGS) -Wl,-z,max-page-size=0x1000 -pie -static $< -o $@ -$(OUTPUT)/load_address_2097152: load_address.c - $(CC) $(CFLAGS) $(LDFLAGS) -Wl,-z,max-page-size=0x200000 -pie -static $< -o $@ -$(OUTPUT)/load_address_16777216: load_address.c - $(CC) $(CFLAGS) $(LDFLAGS) -Wl,-z,max-page-size=0x1000000 -pie -static $< -o $@ +$(OUTPUT)/load_address.0x%: load_address.c + $(CC) $(CFLAGS) $(LDFLAGS) -Wl,-z,max-page-size=$(lastword $(subst ., ,$@)) \ + -fPIE -pie $< -o $@ +$(OUTPUT)/load_address.static.0x%: load_address.c + $(CC) $(CFLAGS) $(LDFLAGS) -Wl,-z,max-page-size=$(lastword $(subst ., ,$@)) \ + -fPIE -static-pie $< -o $@ diff --git a/tools/testing/selftests/exec/load_address.c b/tools/testing/selftests/exec/load_address.c index 17e3207d34ae7..8257fddba8c8d 100644 --- a/tools/testing/selftests/exec/load_address.c +++ b/tools/testing/selftests/exec/load_address.c @@ -5,11 +5,13 @@ #include <link.h> #include <stdio.h> #include <stdlib.h> +#include <stdbool.h> #include "../kselftest.h"
struct Statistics { unsigned long long load_address; unsigned long long alignment; + bool interp; };
int ExtractStatistics(struct dl_phdr_info *info, size_t size, void *data) @@ -26,11 +28,20 @@ int ExtractStatistics(struct dl_phdr_info *info, size_t size, void *data) stats->alignment = 0;
for (i = 0; i < info->dlpi_phnum; i++) { + unsigned long long align; + + if (info->dlpi_phdr[i].p_type == PT_INTERP) { + stats->interp = true; + continue; + } + if (info->dlpi_phdr[i].p_type != PT_LOAD) continue;
- if (info->dlpi_phdr[i].p_align > stats->alignment) - stats->alignment = info->dlpi_phdr[i].p_align; + align = info->dlpi_phdr[i].p_align; + + if (align > stats->alignment) + stats->alignment = align; }
return 1; // Terminate dl_iterate_phdr. @@ -38,27 +49,57 @@ int ExtractStatistics(struct dl_phdr_info *info, size_t size, void *data)
int main(int argc, char **argv) { - struct Statistics extracted; - unsigned long long misalign; + struct Statistics extracted = { }; + unsigned long long misalign, pow2; + bool interp_needed; + char buf[1024]; + FILE *maps; int ret;
ksft_print_header(); - ksft_set_plan(1); + ksft_set_plan(4); + + /* Dump maps file for debugging reference. */ + maps = fopen("/proc/self/maps", "r"); + if (!maps) + ksft_exit_fail_msg("FAILED: /proc/self/maps: %s\n", strerror(errno)); + while (fgets(buf, sizeof(buf), maps)) { + ksft_print_msg("%s", buf); + } + fclose(maps);
+ /* Walk the program headers. */ ret = dl_iterate_phdr(ExtractStatistics, &extracted); if (ret != 1) ksft_exit_fail_msg("FAILED: dl_iterate_phdr\n");
- if (extracted.alignment == 0) - ksft_exit_fail_msg("FAILED: No alignment found\n"); - else if (extracted.alignment & (extracted.alignment - 1)) - ksft_exit_fail_msg("FAILED: Alignment is not a power of 2\n"); + /* Report our findings. */ + ksft_print_msg("load_address=%#llx alignment=%#llx\n", + extracted.load_address, extracted.alignment); + + /* If we're named with ".static." we expect no INTERP. */ + interp_needed = strstr(argv[0], ".static.") == NULL; + + /* Were we built as expected? */ + ksft_test_result(interp_needed == extracted.interp, + "%s INTERP program header %s\n", + interp_needed ? "Wanted" : "Unwanted", + extracted.interp ? "seen" : "missing"); + + /* Did we find an alignment? */ + ksft_test_result(extracted.alignment != 0, + "Alignment%s found\n", extracted.alignment ? "" : " NOT"); + + /* Is the alignment sane? */ + pow2 = extracted.alignment & (extracted.alignment - 1); + ksft_test_result(pow2 == 0, + "Alignment is%s a power of 2: %#llx\n", + pow2 == 0 ? "" : " NOT", extracted.alignment);
+ /* Is the load address aligned? */ misalign = extracted.load_address & (extracted.alignment - 1); - if (misalign) - ksft_exit_fail_msg("FAILED: alignment = %llu, load_address = %llu\n", - extracted.alignment, extracted.load_address); + ksft_test_result(misalign == 0, "Load Address is %saligned (%#llx)\n", + misalign ? "MIS" : "", misalign);
- ksft_test_result_pass("Completed\n"); ksft_finished(); }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook kees@kernel.org
[ Upstream commit 2d4cf7b190bbfadd4986bf5c34da17c1a88adf8e ]
In preparation to support PT_LOAD with large p_align values on non-PT_INTERP ET_DYN executables (i.e. "static pie"), we'll need to use the total_size details earlier. Move this separately now to make the next patch more readable. As total_size and load_bias are currently calculated separately, this has no behavioral impact.
Link: https://lore.kernel.org/r/20240508173149.677910-2-keescook@chromium.org
Signed-off-by: Kees Cook kees@kernel.org
Stable-dep-of: 11854fe263eb ("binfmt_elf: Move brk for static PIE even if ASLR disabled")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/binfmt_elf.c | 52 +++++++++++++++++++++++++------------------------
 1 file changed, 27 insertions(+), 25 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index ff796910265da..26abe256e91fb 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -1093,7 +1093,34 @@ static int load_elf_binary(struct linux_binprm *bprm) * Header for ET_DYN binaries to calculate the * randomization (load_bias) for all the LOAD * Program Headers. + */ + + /* + * Calculate the entire size of the ELF mapping + * (total_size), used for the initial mapping, + * due to load_addr_set which is set to true later + * once the initial mapping is performed. + * + * Note that this is only sensible when the LOAD + * segments are contiguous (or overlapping). If + * used for LOADs that are far apart, this would + * cause the holes between LOADs to be mapped, + * running the risk of having the mapping fail, + * as it would be larger than the ELF file itself. * + * As a result, only ET_DYN does this, since + * some ET_EXEC (e.g. ia64) may have large virtual + * memory holes between LOADs. + * + */ + total_size = total_mapping_size(elf_phdata, + elf_ex->e_phnum); + if (!total_size) { + retval = -EINVAL; + goto out_free_dentry; + } + + /* * There are effectively two types of ET_DYN * binaries: programs (i.e. PIE: ET_DYN with INTERP) * and loaders (ET_DYN without INTERP, since they @@ -1134,31 +1161,6 @@ static int load_elf_binary(struct linux_binprm *bprm) * is then page aligned. */ load_bias = ELF_PAGESTART(load_bias - vaddr); - - /* - * Calculate the entire size of the ELF mapping - * (total_size), used for the initial mapping, - * due to load_addr_set which is set to true later - * once the initial mapping is performed. - * - * Note that this is only sensible when the LOAD - * segments are contiguous (or overlapping). If - * used for LOADs that are far apart, this would - * cause the holes between LOADs to be mapped, - * running the risk of having the mapping fail, - * as it would be larger than the ELF file itself. - * - * As a result, only ET_DYN does this, since - * some ET_EXEC (e.g. ia64) may have large virtual - * memory holes between LOADs. - * - */ - total_size = total_mapping_size(elf_phdata, - elf_ex->e_phnum); - if (!total_size) { - retval = -EINVAL; - goto out_free_dentry; - } }
error = elf_load(bprm->file, load_bias + vaddr, elf_ppnt,
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook kees@kernel.org
[ Upstream commit 3545deff0ec7a37de7ed9632e262598582b140e9 ]
The p_align values in PT_LOAD were ignored for static PIE executables (i.e. ET_DYN without PT_INTERP). This is because there is no way to request a non-fixed mmap region with a specific alignment. ET_DYN with PT_INTERP uses a separate base address (ELF_ET_DYN_BASE) and binfmt_elf performs the ASLR itself, which means it can also apply alignment. For the mmap region, the address selection happens deep within the vm_mmap() implementation (when the requested address is 0).
The earlier attempt to implement this:
commit 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE") commit 925346c129da ("fs/binfmt_elf: fix PT_LOAD p_align values for loaders")
did not take into account the different base address origins, and were eventually reverted:
aeb7923733d1 ("revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"")
In order to get the correct alignment from an mmap base, binfmt_elf must perform a 0-address load first, then tear down the mapping and perform alignment on the resulting address. Since this is slightly more overhead, only do this when it is needed (i.e. the alignment is not the default ELF alignment). This does, however, have the benefit of being able to use MAP_FIXED_NOREPLACE, to avoid potential collisions.
With this fixed, enable the static PIE self tests again.
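As a concrete illustration of the map, tear down, and re-map dance described above, here is a small userspace sketch using plain mmap(). All names, the sizes, and the 2 MiB alignment are invented for the example; the kernel does the equivalent with elf_load(), vm_munmap() and MAP_FIXED_NOREPLACE as shown in the diff below, and it likewise has to cope with the aligned address already being taken.

#define _GNU_SOURCE		/* for MAP_FIXED_NOREPLACE */
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	const size_t total_size = 4UL << 20;	/* pretend total ELF mapping size */
	const uintptr_t alignment = 2UL << 20;	/* p_align larger than the page size */

	/* 1. Probe: let mmap pick an address, as a 0-address load would. */
	void *probe = mmap(NULL, total_size, PROT_NONE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (probe == MAP_FAILED)
		return 1;

	/* 2. Tear the probe mapping down and round the address down to the
	 *    requested alignment. */
	munmap(probe, total_size);
	uintptr_t load_bias = (uintptr_t)probe & ~(alignment - 1);

	/* 3. Redo the mapping at the aligned address; NOREPLACE makes a
	 *    collision with an existing mapping fail instead of clobbering it. */
	void *fixed = mmap((void *)load_bias, total_size, PROT_NONE,
			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
	if (fixed == MAP_FAILED) {
		perror("aligned mmap");
		return 1;
	}

	printf("probe=%p aligned load_bias=%#lx\n", probe, (unsigned long)load_bias);
	return 0;
}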
Reported-by: H.J. Lu hjl.tools@gmail.com
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215275
Link: https://lore.kernel.org/r/20240508173149.677910-3-keescook@chromium.org
Signed-off-by: Kees Cook kees@kernel.org
Stable-dep-of: 11854fe263eb ("binfmt_elf: Move brk for static PIE even if ASLR disabled")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/binfmt_elf.c | 42 +++++++++++++++++++++++----
 tools/testing/selftests/exec/Makefile | 2 +-
 2 files changed, 38 insertions(+), 6 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 26abe256e91fb..fd3d800f69c1c 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -1120,10 +1120,13 @@ static int load_elf_binary(struct linux_binprm *bprm) goto out_free_dentry; }
+ /* Calculate any requested alignment. */ + alignment = maximum_alignment(elf_phdata, elf_ex->e_phnum); + /* * There are effectively two types of ET_DYN - * binaries: programs (i.e. PIE: ET_DYN with INTERP) - * and loaders (ET_DYN without INTERP, since they + * binaries: programs (i.e. PIE: ET_DYN with PT_INTERP) + * and loaders (ET_DYN without PT_INTERP, since they * _are_ the ELF interpreter). The loaders must * be loaded away from programs since the program * may otherwise collide with the loader (especially @@ -1143,15 +1146,44 @@ static int load_elf_binary(struct linux_binprm *bprm) * without MAP_FIXED nor MAP_FIXED_NOREPLACE). */ if (interpreter) { + /* On ET_DYN with PT_INTERP, we do the ASLR. */ load_bias = ELF_ET_DYN_BASE; if (current->flags & PF_RANDOMIZE) load_bias += arch_mmap_rnd(); - alignment = maximum_alignment(elf_phdata, elf_ex->e_phnum); + /* Adjust alignment as requested. */ if (alignment) load_bias &= ~(alignment - 1); elf_flags |= MAP_FIXED_NOREPLACE; - } else - load_bias = 0; + } else { + /* + * For ET_DYN without PT_INTERP, we rely on + * the architectures's (potentially ASLR) mmap + * base address (via a load_bias of 0). + * + * When a large alignment is requested, we + * must do the allocation at address "0" right + * now to discover where things will load so + * that we can adjust the resulting alignment. + * In this case (load_bias != 0), we can use + * MAP_FIXED_NOREPLACE to make sure the mapping + * doesn't collide with anything. + */ + if (alignment > ELF_MIN_ALIGN) { + load_bias = elf_load(bprm->file, 0, elf_ppnt, + elf_prot, elf_flags, total_size); + if (BAD_ADDR(load_bias)) { + retval = IS_ERR_VALUE(load_bias) ? + PTR_ERR((void*)load_bias) : -EINVAL; + goto out_free_dentry; + } + vm_munmap(load_bias, total_size); + /* Adjust alignment as requested. */ + if (alignment) + load_bias &= ~(alignment - 1); + elf_flags |= MAP_FIXED_NOREPLACE; + } else + load_bias = 0; + }
/* * Since load_bias is used for all subsequent loading diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile index b54986078d7ea..a705493c04bb7 100644 --- a/tools/testing/selftests/exec/Makefile +++ b/tools/testing/selftests/exec/Makefile @@ -6,7 +6,7 @@ CFLAGS += -D_GNU_SOURCE ALIGNS := 0x1000 0x200000 0x1000000 ALIGN_PIES := $(patsubst %,load_address.%,$(ALIGNS)) ALIGN_STATIC_PIES := $(patsubst %,load_address.static.%,$(ALIGNS)) -ALIGNMENT_TESTS := $(ALIGN_PIES) +ALIGNMENT_TESTS := $(ALIGN_PIES) $(ALIGN_STATIC_PIES)
TEST_PROGS := binfmt_script.py TEST_GEN_PROGS := execveat non-regular $(ALIGNMENT_TESTS)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook kees@kernel.org
[ Upstream commit 11854fe263eb1b9a8efa33b0c087add7719ea9b4 ]
In commit bbdc6076d2e5 ("binfmt_elf: move brk out of mmap when doing direct loader exec"), the brk was moved out of the mmap region when loading static PIE binaries (ET_DYN without INTERP). The common case for these binaries was testing new ELF loaders, so the brk needed to be away from mmap to avoid colliding with stack, future mmaps (of the loader-loaded binary), etc. But this was only done when ASLR was enabled, in an attempt to minimize changes to memory layouts.
After adding support to respect alignment requirements for static PIE binaries in commit 3545deff0ec7 ("binfmt_elf: Honor PT_LOAD alignment for static PIE"), it became possible to have a large gap after the final PT_LOAD segment and the top of the mmap region. This means that future mmap allocations might go after the last PT_LOAD segment (where brk might be if ASLR was disabled) instead of before them (where they traditionally ended up).
On arm64, running with ASLR disabled, Ubuntu 22.04's "ldconfig" binary, a static PIE, has alignment requirements that leaves a gap large enough after the last PT_LOAD segment to fit the vdso and vvar, but still leave enough space for the brk (which immediately follows the last PT_LOAD segment) to be allocated by the binary.
  fffff7f20000-fffff7fde000 r-xp 00000000 fe:02 8110426   /sbin/ldconfig.real
  fffff7fee000-fffff7ff5000 rw-p 000be000 fe:02 8110426   /sbin/ldconfig.real
  fffff7ff5000-fffff7ffa000 rw-p 00000000 00:00 0
  ***[brk will go here at fffff7ffa000]***
  fffff7ffc000-fffff7ffe000 r--p 00000000 00:00 0         [vvar]
  fffff7ffe000-fffff8000000 r-xp 00000000 00:00 0         [vdso]
  fffffffdf000-1000000000000 rw-p 00000000 00:00 0        [stack]
After commit 0b3bc3354eb9 ("arm64: vdso: Switch to generic storage implementation"), the arm64 vvar grew slightly, and suddenly the brk collided with the allocation.
  fffff7f20000-fffff7fde000 r-xp 00000000 fe:02 8110426   /sbin/ldconfig.real
  fffff7fee000-fffff7ff5000 rw-p 000be000 fe:02 8110426   /sbin/ldconfig.real
  fffff7ff5000-fffff7ffa000 rw-p 00000000 00:00 0
  ***[oops, no room any more, vvar is at fffff7ffa000!]***
  fffff7ffa000-fffff7ffe000 r--p 00000000 00:00 0         [vvar]
  fffff7ffe000-fffff8000000 r-xp 00000000 00:00 0         [vdso]
  fffffffdf000-1000000000000 rw-p 00000000 00:00 0        [stack]
The solution is to unconditionally move the brk out of the mmap region for static PIE binaries. Whether ASLR is enabled or not does not change if there may be future mmap allocation collisions with a growing brk region.
Update memory layout comments (with kernel-doc headings), consolidate the setting of mm->brk to later (it isn't needed early), move static PIE brk out of mmap unconditionally, and make sure brk(2) knows to base brk position off of mm->start_brk not mm->end_data no matter what the cause of moving it is (via current->brk_randomized).
For the CONFIG_COMPAT_BRK case, though, leave the logic unchanged, as we can never safely move the brk. These systems, however, are not using specially aligned static PIE binaries.
Reported-by: Ryan Roberts ryan.roberts@arm.com
Closes: https://lore.kernel.org/lkml/f93db308-4a0e-4806-9faf-98f890f5a5e6@arm.com/
Fixes: bbdc6076d2e5 ("binfmt_elf: move brk out of mmap when doing direct loader exec")
Link: https://lore.kernel.org/r/20250425224502.work.520-kees@kernel.org
Reviewed-by: Ryan Roberts ryan.roberts@arm.com
Tested-by: Ryan Roberts ryan.roberts@arm.com
Signed-off-by: Kees Cook kees@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/binfmt_elf.c | 71 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 47 insertions(+), 24 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index fd3d800f69c1c..762704eed9ce9 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -856,6 +856,7 @@ static int load_elf_binary(struct linux_binprm *bprm) struct elf_phdr *elf_ppnt, *elf_phdata, *interp_elf_phdata = NULL; struct elf_phdr *elf_property_phdata = NULL; unsigned long elf_brk; + bool brk_moved = false; int retval, i; unsigned long elf_entry; unsigned long e_entry; @@ -1123,15 +1124,19 @@ static int load_elf_binary(struct linux_binprm *bprm) /* Calculate any requested alignment. */ alignment = maximum_alignment(elf_phdata, elf_ex->e_phnum);
- /* - * There are effectively two types of ET_DYN - * binaries: programs (i.e. PIE: ET_DYN with PT_INTERP) - * and loaders (ET_DYN without PT_INTERP, since they - * _are_ the ELF interpreter). The loaders must - * be loaded away from programs since the program - * may otherwise collide with the loader (especially - * for ET_EXEC which does not have a randomized - * position). For example to handle invocations of + /** + * DOC: PIE handling + * + * There are effectively two types of ET_DYN ELF + * binaries: programs (i.e. PIE: ET_DYN with + * PT_INTERP) and loaders (i.e. static PIE: ET_DYN + * without PT_INTERP, usually the ELF interpreter + * itself). Loaders must be loaded away from programs + * since the program may otherwise collide with the + * loader (especially for ET_EXEC which does not have + * a randomized position). + * + * For example, to handle invocations of * "./ld.so someprog" to test out a new version of * the loader, the subsequent program that the * loader loads must avoid the loader itself, so @@ -1144,6 +1149,9 @@ static int load_elf_binary(struct linux_binprm *bprm) * ELF_ET_DYN_BASE and loaders are loaded into the * independently randomized mmap region (0 load_bias * without MAP_FIXED nor MAP_FIXED_NOREPLACE). + * + * See below for "brk" handling details, which is + * also affected by program vs loader and ASLR. */ if (interpreter) { /* On ET_DYN with PT_INTERP, we do the ASLR. */ @@ -1260,8 +1268,6 @@ static int load_elf_binary(struct linux_binprm *bprm) start_data += load_bias; end_data += load_bias;
- current->mm->start_brk = current->mm->brk = ELF_PAGEALIGN(elf_brk); - if (interpreter) { elf_entry = load_elf_interp(interp_elf_ex, interpreter, @@ -1317,27 +1323,44 @@ static int load_elf_binary(struct linux_binprm *bprm) mm->end_data = end_data; mm->start_stack = bprm->p;
- if ((current->flags & PF_RANDOMIZE) && (snapshot_randomize_va_space > 1)) { + /** + * DOC: "brk" handling + * + * For architectures with ELF randomization, when executing a + * loader directly (i.e. static PIE: ET_DYN without PT_INTERP), + * move the brk area out of the mmap region and into the unused + * ELF_ET_DYN_BASE region. Since "brk" grows up it may collide + * early with the stack growing down or other regions being put + * into the mmap region by the kernel (e.g. vdso). + * + * In the CONFIG_COMPAT_BRK case, though, everything is turned + * off because we're not allowed to move the brk at all. + */ + if (!IS_ENABLED(CONFIG_COMPAT_BRK) && + IS_ENABLED(CONFIG_ARCH_HAS_ELF_RANDOMIZE) && + elf_ex->e_type == ET_DYN && !interpreter) { + elf_brk = ELF_ET_DYN_BASE; + /* This counts as moving the brk, so let brk(2) know. */ + brk_moved = true; + } + mm->start_brk = mm->brk = ELF_PAGEALIGN(elf_brk); + + if ((current->flags & PF_RANDOMIZE) && snapshot_randomize_va_space > 1) { /* - * For architectures with ELF randomization, when executing - * a loader directly (i.e. no interpreter listed in ELF - * headers), move the brk area out of the mmap region - * (since it grows up, and may collide early with the stack - * growing down), and into the unused ELF_ET_DYN_BASE region. + * If we didn't move the brk to ELF_ET_DYN_BASE (above), + * leave a gap between .bss and brk. */ - if (IS_ENABLED(CONFIG_ARCH_HAS_ELF_RANDOMIZE) && - elf_ex->e_type == ET_DYN && !interpreter) { - mm->brk = mm->start_brk = ELF_ET_DYN_BASE; - } else { - /* Otherwise leave a gap between .bss and brk. */ + if (!brk_moved) mm->brk = mm->start_brk = mm->brk + PAGE_SIZE; - }
mm->brk = mm->start_brk = arch_randomize_brk(mm); + brk_moved = true; + } + #ifdef compat_brk_randomized + if (brk_moved) current->brk_randomized = 1; #endif - }
if (current->personality & MMAP_PAGE_ZERO) { /* Why this, you ask??? Well SVr4 maps page 0 as read-only,
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit bfcfe6d335a967f8ea0c1980960e6f0205b5de6e ]
The wlan_ctrl_by_user detection was introduced by commit a50bd128f28c ("asus-wmi: record wlan status while controlled by userapp").
Quoting from that commit's commit message:
""" When you call WMIMethod(DSTS, 0x00010011) to get WLAN status, it may return
(1) 0x00050001 (On)
(2) 0x00050000 (Off)
(3) 0x00030001 (On)
(4) 0x00030000 (Off)
(5) 0x00000002 (Unknown)
(1), (2) means that the model has hardware GPIO for WLAN, you can call WMIMethod(DEVS, 0x00010011, 1 or 0) to turn WLAN on/off. (3), (4) means that the model doesn’t have hardware GPIO, you need to use API or driver library to turn WLAN on/off, and call WMIMethod(DEVS, 0x00010012, 1 or 0) to set WLAN LED status. After you set WLAN LED status, you can see the WLAN status is changed with WMIMethod(DSTS, 0x00010011). Because the status is recorded lastly (ex: Windows), you can use it for synchronization. (5) means that the model doesn’t have WLAN device.
WLAN is the ONLY special case with upper rule. """
The wlan_ctrl_by_user flag should be set on 0x0003000? ((3), (4) above) return values, but the flag mistakenly also gets set on laptops with 0x0005000? ((1), (2)) return values. This is causing rfkill problems on laptops where 0x0005000? is returned.
Fix the check to only set the wlan_ctrl_by_user flag for 0x0003000? return values.
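To make the difference concrete, here is a tiny sketch that runs the DSTS samples quoted above through both checks. The two bit macros are assumptions chosen so that 0x0003000x sets both bits and 0x0005000x sets only the presence bit; the real definitions live in the asus-wmi headers, not in this patch.

#include <stdio.h>

#define ASUS_WMI_DSTS_PRESENCE_BIT	0x00010000	/* assumed value */
#define ASUS_WMI_DSTS_USER_BIT		0x00020000	/* assumed value */

int main(void)
{
	const unsigned int samples[] = { 0x00050001, 0x00050000, 0x00030001, 0x00030000 };
	const unsigned int mask = ASUS_WMI_DSTS_PRESENCE_BIT | ASUS_WMI_DSTS_USER_BIT;

	for (unsigned int i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		unsigned int result = samples[i];
		/* Old check: true whenever either bit is set, so the
		 * GPIO-controlled 0x0005000x models matched as well. */
		int old_check = !!(result & mask);
		/* Fixed check: both bits must be set, i.e. only 0x0003000x. */
		int new_check = (result & mask) == mask;

		printf("DSTS=%#010x old=%d fixed=%d\n", result, old_check, new_check);
	}
	return 0;
}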
Fixes: a50bd128f28c ("asus-wmi: record wlan status while controlled by userapp")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219786
Signed-off-by: Hans de Goede hdegoede@redhat.com
Reviewed-by: Armin Wolf W_Armin@gmx.de
Link: https://lore.kernel.org/r/20250501131702.103360-2-hdegoede@redhat.com
Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/platform/x86/asus-wmi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c index 296150eaef929..33eacb4fc4c45 100644 --- a/drivers/platform/x86/asus-wmi.c +++ b/drivers/platform/x86/asus-wmi.c @@ -3804,7 +3804,8 @@ static int asus_wmi_add(struct platform_device *pdev) goto fail_leds;
asus_wmi_get_devstate(asus, ASUS_WMI_DEVID_WLAN, &result); - if (result & (ASUS_WMI_DSTS_PRESENCE_BIT | ASUS_WMI_DSTS_USER_BIT)) + if ((result & (ASUS_WMI_DSTS_PRESENCE_BIT | ASUS_WMI_DSTS_USER_BIT)) == + (ASUS_WMI_DSTS_PRESENCE_BIT | ASUS_WMI_DSTS_USER_BIT)) asus->driver->wlan_ctrl_by_user = 1;
if (!(asus->driver->wlan_ctrl_by_user && ashs_present())) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu (Google) mhiramat@kernel.org
[ Upstream commit fd837de3c9cb1a162c69bc1fb1f438467fe7f2f5 ]
Since the shared trace_probe_log variable can be accessed and modified via the probe event create operations of kprobe_events, uprobe_events, and dynamic_events, it should be protected. In dynamic_events, all operations are serialized by `dyn_event_ops_mutex`, but the kprobe_events and uprobe_events interfaces are not serialized.
To solve this issue, introduce dyn_event_create(), which runs the create() operation under the mutex, for kprobe_events and uprobe_events. Also use lockdep to check that the mutex is held when the trace_probe_log* APIs are used.
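For readers unfamiliar with the pattern, a simplified sketch of the idea (names invented, not the actual trace code): one locked entry point wraps create(), and the shared-state helpers assert the lock so any unserialized caller is flagged by lockdep:

#include <linux/mutex.h>
#include <linux/lockdep.h>

static DEFINE_MUTEX(create_ops_mutex);		/* stands in for dyn_event_ops_mutex */

static struct {
	const char *subsystem;
	int argc;
	const char **argv;
	int index;
} shared_log;					/* stands in for trace_probe_log */

static void shared_log_set_index(int index)
{
	lockdep_assert_held(&create_ops_mutex);	/* catches lockless callers */
	shared_log.index = index;
}

static int locked_create(int (*create)(const char *), const char *raw_command)
{
	int ret;

	mutex_lock(&create_ops_mutex);
	ret = create(raw_command);	/* may call shared_log_set_index() etc. */
	mutex_unlock(&create_ops_mutex);
	return ret;
}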
Link: https://lore.kernel.org/all/174684868120.551552.3068655787654268804.stgit@de...
Reported-by: Paul Cacheux paulcacheux@gmail.com Closes: https://lore.kernel.org/all/20250510074456.805a16872b591e2971a4d221@kernel.o... Fixes: ab105a4fb894 ("tracing: Use tracing error_log with probe events") Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/trace_dynevent.c | 16 +++++++++++++++- kernel/trace/trace_dynevent.h | 1 + kernel/trace/trace_kprobe.c | 2 +- kernel/trace/trace_probe.c | 9 +++++++++ kernel/trace/trace_uprobe.c | 2 +- 5 files changed, 27 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/trace_dynevent.c b/kernel/trace/trace_dynevent.c index 4376887e0d8aa..c9b0533407ede 100644 --- a/kernel/trace/trace_dynevent.c +++ b/kernel/trace/trace_dynevent.c @@ -16,7 +16,7 @@ #include "trace_output.h" /* for trace_event_sem */ #include "trace_dynevent.h"
-static DEFINE_MUTEX(dyn_event_ops_mutex); +DEFINE_MUTEX(dyn_event_ops_mutex); static LIST_HEAD(dyn_event_ops_list);
bool trace_event_dyn_try_get_ref(struct trace_event_call *dyn_call) @@ -125,6 +125,20 @@ int dyn_event_release(const char *raw_command, struct dyn_event_operations *type return ret; }
+/* + * Locked version of event creation. The event creation must be protected by + * dyn_event_ops_mutex because of protecting trace_probe_log. + */ +int dyn_event_create(const char *raw_command, struct dyn_event_operations *type) +{ + int ret; + + mutex_lock(&dyn_event_ops_mutex); + ret = type->create(raw_command); + mutex_unlock(&dyn_event_ops_mutex); + return ret; +} + static int create_dyn_event(const char *raw_command) { struct dyn_event_operations *ops; diff --git a/kernel/trace/trace_dynevent.h b/kernel/trace/trace_dynevent.h index 936477a111d3e..beee3f8d75444 100644 --- a/kernel/trace/trace_dynevent.h +++ b/kernel/trace/trace_dynevent.h @@ -100,6 +100,7 @@ void *dyn_event_seq_next(struct seq_file *m, void *v, loff_t *pos); void dyn_event_seq_stop(struct seq_file *m, void *v); int dyn_events_release_all(struct dyn_event_operations *type); int dyn_event_release(const char *raw_command, struct dyn_event_operations *type); +int dyn_event_create(const char *raw_command, struct dyn_event_operations *type);
/* * for_each_dyn_event - iterate over the dyn_event list diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 72655d81b37d3..cc155c4117684 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -975,7 +975,7 @@ static int create_or_delete_trace_kprobe(const char *raw_command) if (raw_command[0] == '-') return dyn_event_release(raw_command, &trace_kprobe_ops);
- ret = trace_kprobe_create(raw_command); + ret = dyn_event_create(raw_command, &trace_kprobe_ops); return ret == -ECANCELED ? -EINVAL : ret; }
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c index ba48b5e270e1f..3888a59c9dfe9 100644 --- a/kernel/trace/trace_probe.c +++ b/kernel/trace/trace_probe.c @@ -143,9 +143,12 @@ static const struct fetch_type *find_fetch_type(const char *type) }
static struct trace_probe_log trace_probe_log; +extern struct mutex dyn_event_ops_mutex;
void trace_probe_log_init(const char *subsystem, int argc, const char **argv) { + lockdep_assert_held(&dyn_event_ops_mutex); + trace_probe_log.subsystem = subsystem; trace_probe_log.argc = argc; trace_probe_log.argv = argv; @@ -154,11 +157,15 @@ void trace_probe_log_init(const char *subsystem, int argc, const char **argv)
void trace_probe_log_clear(void) { + lockdep_assert_held(&dyn_event_ops_mutex); + memset(&trace_probe_log, 0, sizeof(trace_probe_log)); }
void trace_probe_log_set_index(int index) { + lockdep_assert_held(&dyn_event_ops_mutex); + trace_probe_log.index = index; }
@@ -167,6 +174,8 @@ void __trace_probe_log_err(int offset, int err_type) char *command, *p; int i, len = 0, pos = 0;
+ lockdep_assert_held(&dyn_event_ops_mutex); + if (!trace_probe_log.argv) return;
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c index a6a3ff2a441ed..53ef3cb65098d 100644 --- a/kernel/trace/trace_uprobe.c +++ b/kernel/trace/trace_uprobe.c @@ -732,7 +732,7 @@ static int create_or_delete_trace_uprobe(const char *raw_command) if (raw_command[0] == '-') return dyn_event_release(raw_command, &trace_uprobe_ops);
- ret = trace_uprobe_create(raw_command); + ret = dyn_event_create(raw_command, &trace_uprobe_ops); return ret == -ECANCELED ? -EINVAL : ret; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Suchanek msuchanek@suse.de
[ Upstream commit 2f661f71fda1fc0c42b7746ca5b7da529eb6b5be ]
With some Infineon chips the timeouts in tpm_tis_send_data (both B and C) can reach up to about 2250 ms.
Timeout C is retried since commit de9e33df7762 ("tpm, tpm_tis: Workaround failed command reception on Infineon devices")
Timeout B still needs to be extended.
The problem is most commonly encountered with context-related operations such as load context/save context. These are issued directly by the kernel, and there is no retry logic for them.
When a filesystem is set up to use the TPM for unlocking, the boot fails, and restarting the userspace service is ineffective. This is likely because ignoring a load context/save context result puts the real TPM state and the TPM state expected by the kernel out of sync.
Chips known to be affected: tpm_tis IFX1522:00: 2.0 TPM (device-id 0x1D, rev-id 54) Description: SLB9672 Firmware Revision: 15.22
tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1B, rev-id 22) Firmware Revision: 7.83
tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1A, rev-id 16) Firmware Revision: 5.63
Link: https://lore.kernel.org/linux-integrity/Z5pI07m0Muapyu9w@kitsune.suse.cz/ Signed-off-by: Michal Suchanek msuchanek@suse.de Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/char/tpm/tpm_tis_core.h | 2 +- include/linux/tpm.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/char/tpm/tpm_tis_core.h b/drivers/char/tpm/tpm_tis_core.h index be72681ab8ea2..5f29eebef52b8 100644 --- a/drivers/char/tpm/tpm_tis_core.h +++ b/drivers/char/tpm/tpm_tis_core.h @@ -53,7 +53,7 @@ enum tis_int_flags { enum tis_defaults { TIS_MEM_LEN = 0x5000, TIS_SHORT_TIMEOUT = 750, /* ms */ - TIS_LONG_TIMEOUT = 2000, /* 2 sec */ + TIS_LONG_TIMEOUT = 4000, /* 4 secs */ TIS_TIMEOUT_MIN_ATML = 14700, /* usecs */ TIS_TIMEOUT_MAX_ATML = 15000, /* usecs */ }; diff --git a/include/linux/tpm.h b/include/linux/tpm.h index dd0784a6e07d9..4a4112bb1d1b8 100644 --- a/include/linux/tpm.h +++ b/include/linux/tpm.h @@ -181,7 +181,7 @@ enum tpm2_const {
enum tpm2_timeouts { TPM2_TIMEOUT_A = 750, - TPM2_TIMEOUT_B = 2000, + TPM2_TIMEOUT_B = 4000, TPM2_TIMEOUT_C = 200, TPM2_TIMEOUT_D = 30, TPM2_DURATION_SHORT = 20,
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jonathan Cameron Jonathan.Cameron@huawei.com
[ Upstream commit 52d349884738c346961e153f195f4c7fe186fcf4 ]
On architectures where an s64 is only 32-bit aligned, insufficient padding would be left between the earlier elements and the timestamp. Use aligned_s64 to enforce the correct placement and ensure the storage is large enough.
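The effect is easiest to see on a 32-bit ABI, where int64_t has only 4-byte alignment; the sketch below approximates aligned_s64 with a local typedef (an assumption for illustration, not the kernel definition):

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t __attribute__((aligned(8))) forced_s64;

struct scan_plain {
	uint16_t sample[2];	/* 4 bytes */
	int64_t timestamp;	/* may sit at offset 4 on 32-bit ABIs */
};

struct scan_forced {
	uint16_t sample[2];
	forced_s64 timestamp;	/* always starts on an 8-byte boundary */
};

int main(void)
{
	printf("plain:  size %zu, timestamp offset %zu\n",
	       sizeof(struct scan_plain), offsetof(struct scan_plain, timestamp));
	printf("forced: size %zu, timestamp offset %zu\n",
	       sizeof(struct scan_forced), offsetof(struct scan_forced, timestamp));
	return 0;
}

On x86-64 both layouts coincide; building with -m32 shows the smaller, misaligned variant.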
Fixes: 54e018da3141 ("iio:ad7266: Mark transfer buffer as __be16") # aligned_s64 is much newer. Reported-by: David Lechner dlechner@baylibre.com Reviewed-by: Nuno Sá nuno.sa@analog.com Reviewed-by: David Lechner dlechner@baylibre.com Link: https://patch.msgid.link/20250413103443.2420727-2-jic23@kernel.org Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iio/adc/ad7266.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/adc/ad7266.c b/drivers/iio/adc/ad7266.c index 98648c679a55c..2ace3aafe4978 100644 --- a/drivers/iio/adc/ad7266.c +++ b/drivers/iio/adc/ad7266.c @@ -44,7 +44,7 @@ struct ad7266_state { */ struct { __be16 sample[2]; - s64 timestamp; + aligned_s64 timestamp; } data __aligned(IIO_DMA_MINALIGN); };
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
[ Upstream commit 226db36032c61d8717dfdd052adac351b22d3e83 ]
commit 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback") intentionally moved the eviction of resources to earlier in the suspend process, but this introduced a subtle change: the eviction now occurs before adev->in_s0ix or adev->in_s3 are set. This meant that APUs actually started to evict resources at suspend time as well.
Explicitly set s0ix or s3 in the prepare() stage, and unset them if the prepare() stage failed.
v2: squash in warning fix from Stephen Rothwell
Reported-by: Jürg Billeter j@bitron.ch Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132#note_2271038 Fixes: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback") Acked-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Stable-dep-of: d0ce1aaa8531 ("Revert "drm/amd: Stop evicting resources on APUs in suspend"") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 15 +++++++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 +++++++++-- 3 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index dd22d2559720c..705703bab2fdd 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1408,9 +1408,11 @@ static inline int amdgpu_acpi_smart_shift_update(struct drm_device *dev, #if defined(CONFIG_ACPI) && defined(CONFIG_SUSPEND) bool amdgpu_acpi_is_s3_active(struct amdgpu_device *adev); bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev); +void amdgpu_choose_low_power_state(struct amdgpu_device *adev); #else static inline bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev) { return false; } static inline bool amdgpu_acpi_is_s3_active(struct amdgpu_device *adev) { return false; } +static inline void amdgpu_choose_low_power_state(struct amdgpu_device *adev) { } #endif
#if defined(CONFIG_DRM_AMD_DC) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c index 5fa7f6d8aa309..70d761f79770d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c @@ -1118,4 +1118,19 @@ bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev) #endif /* CONFIG_AMD_PMC */ }
+/** + * amdgpu_choose_low_power_state + * + * @adev: amdgpu_device_pointer + * + * Choose the target low power state for the GPU + */ +void amdgpu_choose_low_power_state(struct amdgpu_device *adev) +{ + if (amdgpu_acpi_is_s0ix_active(adev)) + adev->in_s0ix = true; + else if (amdgpu_acpi_is_s3_active(adev)) + adev->in_s3 = true; +} + #endif /* CONFIG_SUSPEND */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index fcd0c61499f89..52f4ef4e8b4b0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4195,13 +4195,15 @@ int amdgpu_device_prepare(struct drm_device *dev) struct amdgpu_device *adev = drm_to_adev(dev); int i, r;
+ amdgpu_choose_low_power_state(adev); + if (dev->switch_power_state == DRM_SWITCH_POWER_OFF) return 0;
/* Evict the majority of BOs before starting suspend sequence */ r = amdgpu_device_evict_resources(adev); if (r) - return r; + goto unprepare;
flush_delayed_work(&adev->gfx.gfx_off_delay_work);
@@ -4212,10 +4214,15 @@ int amdgpu_device_prepare(struct drm_device *dev) continue; r = adev->ip_blocks[i].version->funcs->prepare_suspend((void *)adev); if (r) - return r; + goto unprepare; }
return 0; + +unprepare: + adev->in_s0ix = adev->in_s3 = false; + + return r; }
/**
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ma Jun Jun.Ma2@amd.com
[ Upstream commit bbfaf2aea7164db59739728d62d9cc91d64ff856 ]
Don't set the power state flags when the system enters runtime suspend, or it may cause a runtime resume failure.
Fixes: 3a9626c816db ("drm/amd: Stop evicting resources on APUs in suspend") Signed-off-by: Ma Jun Jun.Ma2@amd.com Reviewed-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Stable-dep-of: d0ce1aaa8531 ("Revert "drm/amd: Stop evicting resources on APUs in suspend"") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c index 70d761f79770d..46916680f044b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c @@ -1127,6 +1127,9 @@ bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev) */ void amdgpu_choose_low_power_state(struct amdgpu_device *adev) { + if (adev->in_runpm) + return; + if (amdgpu_acpi_is_s0ix_active(adev)) adev->in_s0ix = true; else if (amdgpu_acpi_is_s3_active(adev))
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhigang Luo Zhigang.Luo@amd.com
[ Upstream commit ab66c832847fcdffc97d4591ba5547e3990d9d33 ]
If reading pf2vf data fails 30 times in a row, something is wrong, and flr_work needs to be triggered to recover.
Also use dev_err to print the error message so it is clear which device has the issue, and add a warning message if waiting for IDH_FLR_NOTIFICATION_CMPL times out.
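In generic terms, the recovery logic follows the sketch below (illustrative names, not the amdgpu symbols): count consecutive read failures, kick recovery once a limit is hit, and reset the counter on success:

#define UPDATE_MAX_RETRY_LIMIT	30

struct vf_ctx {
	int retry_cnt;
};

static void periodic_exchange(struct vf_ctx *vf,
			      int (*read_pf2vf)(struct vf_ctx *),
			      int (*schedule_flr)(struct vf_ctx *))
{
	if (read_pf2vf(vf)) {				/* read failed */
		if (++vf->retry_cnt >= UPDATE_MAX_RETRY_LIMIT)
			schedule_flr(vf);		/* trigger flr_work once */
		return;
	}

	vf->retry_cnt = 0;				/* success resets the count */
	/* ... write vf2pf data as usual ... */
}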
Signed-off-by: Zhigang Luo Zhigang.Luo@amd.com Acked-by: Hawking Zhang Hawking.Zhang@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Stable-dep-of: d0ce1aaa8531 ("Revert "drm/amd: Stop evicting resources on APUs in suspend"") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +++++++---- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 29 ++++++++++++++++++---- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 3 +++ drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 2 ++ drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 2 ++ 5 files changed, 41 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 52f4ef4e8b4b0..aac0e49f77c7f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -139,6 +139,8 @@ const char *amdgpu_asic_name[] = { "LAST", };
+static inline void amdgpu_device_stop_pending_resets(struct amdgpu_device *adev); + /** * DOC: pcie_replay_count * @@ -4638,6 +4640,8 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, retry: amdgpu_amdkfd_pre_reset(adev);
+ amdgpu_device_stop_pending_resets(adev); + if (from_hypervisor) r = amdgpu_virt_request_full_gpu(adev, true); else @@ -5509,11 +5513,12 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, tmp_adev->asic_reset_res = r; }
- /* - * Drop all pending non scheduler resets. Scheduler resets - * were already dropped during drm_sched_stop - */ - amdgpu_device_stop_pending_resets(tmp_adev); + if (!amdgpu_sriov_vf(tmp_adev)) + /* + * Drop all pending non scheduler resets. Scheduler resets + * were already dropped during drm_sched_stop + */ + amdgpu_device_stop_pending_resets(tmp_adev); }
tmp_vram_lost_counter = atomic_read(&((adev)->vram_lost_counter)); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c index d7b76a3d2d558..6174bef0ecb88 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c @@ -32,6 +32,7 @@
#include "amdgpu.h" #include "amdgpu_ras.h" +#include "amdgpu_reset.h" #include "vi.h" #include "soc15.h" #include "nv.h" @@ -456,7 +457,7 @@ static int amdgpu_virt_read_pf2vf_data(struct amdgpu_device *adev) return -EINVAL;
if (pf2vf_info->size > 1024) { - DRM_ERROR("invalid pf2vf message size\n"); + dev_err(adev->dev, "invalid pf2vf message size: 0x%x\n", pf2vf_info->size); return -EINVAL; }
@@ -467,7 +468,9 @@ static int amdgpu_virt_read_pf2vf_data(struct amdgpu_device *adev) adev->virt.fw_reserve.p_pf2vf, pf2vf_info->size, adev->virt.fw_reserve.checksum_key, checksum); if (checksum != checkval) { - DRM_ERROR("invalid pf2vf message\n"); + dev_err(adev->dev, + "invalid pf2vf message: header checksum=0x%x calculated checksum=0x%x\n", + checksum, checkval); return -EINVAL; }
@@ -481,7 +484,9 @@ static int amdgpu_virt_read_pf2vf_data(struct amdgpu_device *adev) adev->virt.fw_reserve.p_pf2vf, pf2vf_info->size, 0, checksum); if (checksum != checkval) { - DRM_ERROR("invalid pf2vf message\n"); + dev_err(adev->dev, + "invalid pf2vf message: header checksum=0x%x calculated checksum=0x%x\n", + checksum, checkval); return -EINVAL; }
@@ -517,7 +522,7 @@ static int amdgpu_virt_read_pf2vf_data(struct amdgpu_device *adev) ((struct amd_sriov_msg_pf2vf_info *)pf2vf_info)->uuid; break; default: - DRM_ERROR("invalid pf2vf version\n"); + dev_err(adev->dev, "invalid pf2vf version: 0x%x\n", pf2vf_info->version); return -EINVAL; }
@@ -617,8 +622,21 @@ static void amdgpu_virt_update_vf2pf_work_item(struct work_struct *work) int ret;
ret = amdgpu_virt_read_pf2vf_data(adev); - if (ret) + if (ret) { + adev->virt.vf2pf_update_retry_cnt++; + if ((adev->virt.vf2pf_update_retry_cnt >= AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT) && + amdgpu_sriov_runtime(adev) && !amdgpu_in_reset(adev)) { + if (amdgpu_reset_domain_schedule(adev->reset_domain, + &adev->virt.flr_work)) + return; + else + dev_err(adev->dev, "Failed to queue work! at %s", __func__); + } + goto out; + } + + adev->virt.vf2pf_update_retry_cnt = 0; amdgpu_virt_write_vf2pf_data(adev);
out: @@ -639,6 +657,7 @@ void amdgpu_virt_init_data_exchange(struct amdgpu_device *adev) adev->virt.fw_reserve.p_pf2vf = NULL; adev->virt.fw_reserve.p_vf2pf = NULL; adev->virt.vf2pf_update_interval_ms = 0; + adev->virt.vf2pf_update_retry_cnt = 0;
if (adev->mman.fw_vram_usage_va != NULL) { /* go through this logic in ip_init and reset to init workqueue*/ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h index dc6aaa4d67be7..fc2859726f0a6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h @@ -51,6 +51,8 @@ /* tonga/fiji use this offset */ #define mmBIF_IOV_FUNC_IDENTIFIER 0x1503
+#define AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT 30 + enum amdgpu_sriov_vf_mode { SRIOV_VF_MODE_BARE_METAL = 0, SRIOV_VF_MODE_ONE_VF, @@ -250,6 +252,7 @@ struct amdgpu_virt { /* vf2pf message */ struct delayed_work vf2pf_work; uint32_t vf2pf_update_interval_ms; + int vf2pf_update_retry_cnt;
/* multimedia bandwidth config */ bool is_mm_bw_enabled; diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c index 12906ba74462f..d39c6baae007a 100644 --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c @@ -276,6 +276,8 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work) timeout -= 10; } while (timeout > 1);
+ dev_warn(adev->dev, "waiting IDH_FLR_NOTIFICATION_CMPL timeout\n"); + flr_done: atomic_set(&adev->reset_domain->in_gpu_reset, 0); up_write(&adev->reset_domain->sem); diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c index e07757eea7adf..a311a2425ec01 100644 --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c @@ -300,6 +300,8 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work) timeout -= 10; } while (timeout > 1);
+ dev_warn(adev->dev, "waiting IDH_FLR_NOTIFICATION_CMPL timeout\n"); + flr_done: atomic_set(&adev->reset_domain->in_gpu_reset, 0); up_write(&adev->reset_domain->sem);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
[ Upstream commit 2965e6355dcdf157b5fafa25a2715f00064da8bf ]
As part of the suspend sequence, VRAM needs to be evicted on dGPUs. In order to make suspend/resume more reliable, we moved this into the pmops prepare() callback so that under high memory usage the suspend sequence would fail but the system would remain operational.
Another class of issues exists, though, where due to memory fragmentation there isn't a large enough contiguous space and swap isn't accessible.
Add support for a suspend/hibernate notification callback that could evict VRAM before tasks are frozen. This should allow paging out to swap if necessary.
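For context, a PM notifier is registered and used roughly as in the sketch below (illustrative my_gpu_* names, error handling trimmed); the key point is that the callback runs on PM_SUSPEND_PREPARE/PM_HIBERNATION_PREPARE, before tasks are frozen:

#include <linux/kernel.h>
#include <linux/notifier.h>
#include <linux/suspend.h>

struct my_gpu {
	struct notifier_block pm_nb;
};

static void my_gpu_evict_resources(struct my_gpu *gpu)
{
	/* page large allocations out while swap is still available */
}

static int my_gpu_pm_notifier(struct notifier_block *nb, unsigned long mode,
			      void *data)
{
	struct my_gpu *gpu = container_of(nb, struct my_gpu, pm_nb);

	switch (mode) {
	case PM_HIBERNATION_PREPARE:
	case PM_SUSPEND_PREPARE:
		/* Runs before user space is frozen, unlike prepare(). */
		my_gpu_evict_resources(gpu);
		break;
	}
	return NOTIFY_DONE;
}

static int my_gpu_init(struct my_gpu *gpu)
{
	gpu->pm_nb.notifier_call = my_gpu_pm_notifier;
	return register_pm_notifier(&gpu->pm_nb);	/* paired with unregister_pm_notifier() */
}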
Link: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3476 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2362 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3781 Reviewed-by: Lijo Lazar lijo.lazar@amd.com Link: https://lore.kernel.org/r/20241128032656.2090059-2-superm1@kernel.org Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Stable-dep-of: d0ce1aaa8531 ("Revert "drm/amd: Stop evicting resources on APUs in suspend"") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 46 +++++++++++++++++++++- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 - 3 files changed, 46 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 705703bab2fdd..019a83c7170d4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -781,6 +781,7 @@ struct amdgpu_device { bool need_swiotlb; bool accel_working; struct notifier_block acpi_nb; + struct notifier_block pm_nb; struct amdgpu_i2c_chan *i2c_bus[AMDGPU_MAX_I2C_BUS]; struct debugfs_blob_wrapper debugfs_vbios_blob; struct debugfs_blob_wrapper debugfs_discovery_blob; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index aac0e49f77c7f..32c571e9288ae 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -140,6 +140,8 @@ const char *amdgpu_asic_name[] = { };
static inline void amdgpu_device_stop_pending_resets(struct amdgpu_device *adev); +static int amdgpu_device_pm_notifier(struct notifier_block *nb, unsigned long mode, + void *data);
/** * DOC: pcie_replay_count @@ -3992,6 +3994,11 @@ int amdgpu_device_init(struct amdgpu_device *adev,
amdgpu_device_check_iommu_direct_map(adev);
+ adev->pm_nb.notifier_call = amdgpu_device_pm_notifier; + r = register_pm_notifier(&adev->pm_nb); + if (r) + goto failed; + return 0;
release_ras_con: @@ -4053,6 +4060,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) flush_delayed_work(&adev->delayed_init_work); adev->shutdown = true;
+ unregister_pm_notifier(&adev->pm_nb); + /* make sure IB test finished before entering exclusive mode * to avoid preemption on IB test * */ @@ -4183,6 +4192,41 @@ static int amdgpu_device_evict_resources(struct amdgpu_device *adev) /* * Suspend & resume. */ +/** + * amdgpu_device_pm_notifier - Notification block for Suspend/Hibernate events + * @nb: notifier block + * @mode: suspend mode + * @data: data + * + * This function is called when the system is about to suspend or hibernate. + * It is used to evict resources from the device before the system goes to + * sleep while there is still access to swap. + */ +static int amdgpu_device_pm_notifier(struct notifier_block *nb, unsigned long mode, + void *data) +{ + struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, pm_nb); + int r; + + switch (mode) { + case PM_HIBERNATION_PREPARE: + adev->in_s4 = true; + fallthrough; + case PM_SUSPEND_PREPARE: + r = amdgpu_device_evict_resources(adev); + /* + * This is considered non-fatal at this time because + * amdgpu_device_prepare() will also fatally evict resources. + * See https://gitlab.freedesktop.org/drm/amd/-/issues/3781 + */ + if (r) + drm_warn(adev_to_drm(adev), "Failed to evict resources, freeze active processes if problems occur: %d\n", r); + break; + } + + return NOTIFY_DONE; +} + /** * amdgpu_device_prepare - prepare for device suspend * @@ -4222,7 +4266,7 @@ int amdgpu_device_prepare(struct drm_device *dev) return 0;
unprepare: - adev->in_s0ix = adev->in_s3 = false; + adev->in_s0ix = adev->in_s3 = adev->in_s4 = false;
return r; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 0a85a59519cac..06958608c9849 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -2468,7 +2468,6 @@ static int amdgpu_pmops_freeze(struct device *dev) struct amdgpu_device *adev = drm_to_adev(drm_dev); int r;
- adev->in_s4 = true; r = amdgpu_device_suspend(drm_dev, true); if (r) return r;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
[ Upstream commit d0ce1aaa8531a4a4707711cab5721374751c51b0 ]
This reverts commit 3a9626c816db901def438dc2513622e281186d39.
This breaks S4 because we end up setting the s3/s0ix flags even when we are entering s4, since prepare is used by both flows. This causes both the s3/s0ix and s4 flags to be set, which breaks several checks in the driver that assume they are mutually exclusive.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3634 Cc: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit ce8f7d95899c2869b47ea6ce0b3e5bf304b2fff4) Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 -- drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 18 ------------------ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ++--------- 3 files changed, 2 insertions(+), 29 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 019a83c7170d4..af86402c70a9f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1409,11 +1409,9 @@ static inline int amdgpu_acpi_smart_shift_update(struct drm_device *dev, #if defined(CONFIG_ACPI) && defined(CONFIG_SUSPEND) bool amdgpu_acpi_is_s3_active(struct amdgpu_device *adev); bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev); -void amdgpu_choose_low_power_state(struct amdgpu_device *adev); #else static inline bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev) { return false; } static inline bool amdgpu_acpi_is_s3_active(struct amdgpu_device *adev) { return false; } -static inline void amdgpu_choose_low_power_state(struct amdgpu_device *adev) { } #endif
#if defined(CONFIG_DRM_AMD_DC) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c index 46916680f044b..5fa7f6d8aa309 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c @@ -1118,22 +1118,4 @@ bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev) #endif /* CONFIG_AMD_PMC */ }
-/** - * amdgpu_choose_low_power_state - * - * @adev: amdgpu_device_pointer - * - * Choose the target low power state for the GPU - */ -void amdgpu_choose_low_power_state(struct amdgpu_device *adev) -{ - if (adev->in_runpm) - return; - - if (amdgpu_acpi_is_s0ix_active(adev)) - adev->in_s0ix = true; - else if (amdgpu_acpi_is_s3_active(adev)) - adev->in_s3 = true; -} - #endif /* CONFIG_SUSPEND */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 32c571e9288ae..57a7f72f366f7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4241,15 +4241,13 @@ int amdgpu_device_prepare(struct drm_device *dev) struct amdgpu_device *adev = drm_to_adev(dev); int i, r;
- amdgpu_choose_low_power_state(adev); - if (dev->switch_power_state == DRM_SWITCH_POWER_OFF) return 0;
/* Evict the majority of BOs before starting suspend sequence */ r = amdgpu_device_evict_resources(adev); if (r) - goto unprepare; + return r;
flush_delayed_work(&adev->gfx.gfx_off_delay_work);
@@ -4260,15 +4258,10 @@ int amdgpu_device_prepare(struct drm_device *dev) continue; r = adev->ip_blocks[i].version->funcs->prepare_suspend((void *)adev); if (r) - goto unprepare; + return r; }
return 0; - -unprepare: - adev->in_s0ix = adev->in_s3 = adev->in_s4 = false; - - return r; }
/**
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jonathan Cameron Jonathan.Cameron@huawei.com
[ Upstream commit ffbc26bc91c1f1eb3dcf5d8776e74cbae21ee13a ]
On architectures where an s64 is not 64-bit aligned, this may result in insufficient alignment of the timestamp and the structure being too small. Use aligned_s64 to force the alignment.
Fixes: a1caeebab07e ("iio: adc: ad7768-1: Fix too small buffer passed to iio_push_to_buffers_with_timestamp()") # aligned_s64 newer Reported-by: David Lechner dlechner@baylibre.com Reviewed-by: Nuno Sá nuno.sa@analog.com Reviewed-by: David Lechner dlechner@baylibre.com Link: https://patch.msgid.link/20250413103443.2420727-3-jic23@kernel.org Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iio/adc/ad7768-1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/adc/ad7768-1.c b/drivers/iio/adc/ad7768-1.c index 74b0c85944bd6..967f06cd3f94e 100644 --- a/drivers/iio/adc/ad7768-1.c +++ b/drivers/iio/adc/ad7768-1.c @@ -169,7 +169,7 @@ struct ad7768_state { union { struct { __be32 chan; - s64 timestamp; + aligned_s64 timestamp; } scan; __be32 d32; u8 d8[2];
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Lechner dlechner@baylibre.com
[ Upstream commit bb49d940344bcb8e2b19e69d7ac86f567887ea9a ]
Follow the pattern of other drivers and use aligned_s64 for the timestamp. This will ensure that the timestamp is correctly aligned on all architectures.
Fixes: a5bf6fdd19c3 ("iio:chemical:sps30: Fix timestamp alignment") Signed-off-by: David Lechner dlechner@baylibre.com Reviewed-by: Nuno Sá nuno.sa@analog.com Link: https://patch.msgid.link/20250417-iio-more-timestamp-alignment-v1-5-eafac1e2... Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iio/chemical/sps30.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/chemical/sps30.c b/drivers/iio/chemical/sps30.c index 814ce0aad1ccc..4085a36cd1db7 100644 --- a/drivers/iio/chemical/sps30.c +++ b/drivers/iio/chemical/sps30.c @@ -108,7 +108,7 @@ static irqreturn_t sps30_trigger_handler(int irq, void *p) int ret; struct { s32 data[4]; /* PM1, PM2P5, PM4, PM10 */ - s64 ts; + aligned_s64 ts; } scan;
mutex_lock(&state->lock);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
[ Upstream commit 94cff94634e506a4a44684bee1875d2dbf782722 ]
On x86 during boot, clockevent_i8253_disable() can be invoked via x86_late_time_init -> hpet_time_init() -> pit_timer_init() which happens with enabled interrupts.
If some of the old i8253 hardware is actually used then lockdep will notice that i8253_lock is used in hard interrupt context. This causes lockdep to complain because it observed the lock being acquired with interrupts enabled and in hard interrupt context.
Make clockevent_i8253_disable() acquire the lock with raw_spinlock_irqsave() to cure this.
[ tglx: Massage change log and use guard() ]
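For readers not yet used to the guard() helpers, the shape of the change is roughly as below (illustrative lock name; assumes a kernel with the <linux/cleanup.h>-based lock guards):

#include <linux/spinlock.h>

static DEFINE_RAW_SPINLOCK(example_lock);	/* stands in for i8253_lock */

static void example_disable(void)
{
	/* Old pattern: raw_spin_lock()/raw_spin_unlock() without disabling
	 * interrupts, which lockdep flags once the same lock is also taken
	 * from hard interrupt context. */

	/* New pattern: irqsave + scoped guard; the lock is released and the
	 * interrupt state restored automatically when the scope ends. */
	guard(raw_spinlock_irqsave)(&example_lock);

	/* ... program the PIT registers here ... */
}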
Fixes: c8c4076723dac ("x86/timer: Skip PIT initialization on modern chipsets") Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20250404133116.p-XRWJXf@linutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clocksource/i8253.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/clocksource/i8253.c b/drivers/clocksource/i8253.c index 39f7c2d736d16..b603c25f3dfaa 100644 --- a/drivers/clocksource/i8253.c +++ b/drivers/clocksource/i8253.c @@ -103,7 +103,7 @@ int __init clocksource_i8253_init(void) #ifdef CONFIG_CLKEVT_I8253 void clockevent_i8253_disable(void) { - raw_spin_lock(&i8253_lock); + guard(raw_spinlock_irqsave)(&i8253_lock);
/* * Writing the MODE register should stop the counter, according to @@ -132,8 +132,6 @@ void clockevent_i8253_disable(void) outb_p(0, PIT_CH0);
outb_p(0x30, PIT_MODE); - - raw_spin_unlock(&i8253_lock); }
static int pit_shutdown(struct clock_event_device *evt)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhu Yanjun yanjun.zhu@linux.dev
[ Upstream commit f81b33582f9339d2dc17c69b92040d3650bb4bae ]
Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x7d/0xa0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xcf/0x610 mm/kasan/report.c:489 kasan_report+0xb5/0xe0 mm/kasan/report.c:602 rxe_queue_cleanup+0xd0/0xe0 drivers/infiniband/sw/rxe/rxe_queue.c:195 rxe_cq_cleanup+0x3f/0x50 drivers/infiniband/sw/rxe/rxe_cq.c:132 __rxe_cleanup+0x168/0x300 drivers/infiniband/sw/rxe/rxe_pool.c:232 rxe_create_cq+0x22e/0x3a0 drivers/infiniband/sw/rxe/rxe_verbs.c:1109 create_cq+0x658/0xb90 drivers/infiniband/core/uverbs_cmd.c:1052 ib_uverbs_create_cq+0xc7/0x120 drivers/infiniband/core/uverbs_cmd.c:1095 ib_uverbs_write+0x969/0xc90 drivers/infiniband/core/uverbs_main.c:679 vfs_write fs/read_write.c:677 [inline] vfs_write+0x26a/0xcc0 fs/read_write.c:659 ksys_write+0x1b8/0x200 fs/read_write.c:731 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xaa/0x1b0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f
In the function rxe_create_cq, when rxe_cq_from_init fails, the function rxe_cleanup will be called to handle the allocated resources. But some of those memory resources have already been freed inside rxe_cq_from_init, so the cleanup path then operates on memory that was already freed, which triggers the KASAN report above.
The solution is to let rxe_cleanup do all the work.
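The ownership rule behind the fix, reduced to a sketch (example_* names are made up): the init helper reports failure but leaves all freeing to the single cleanup path, so nothing is released twice:

#include <linux/slab.h>
#include <linux/vmalloc.h>

struct example_queue { void *buf; };
struct example_cq { struct example_queue *queue; };

/* Single owner of the teardown: called on normal destroy and on the
 * error path of create. */
static void example_cq_cleanup(struct example_cq *cq)
{
	if (cq->queue) {
		vfree(cq->queue->buf);
		kfree(cq->queue);
		cq->queue = NULL;
	}
}

static int example_cq_from_init(struct example_cq *cq,
				int (*setup_mmap_info)(struct example_cq *))
{
	int err = setup_mmap_info(cq);

	/* Before the fix, the error path also freed cq->queue->buf and
	 * cq->queue right here, so the caller's cleanup freed them again. */
	return err;
}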
Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://paste.ubuntu.com/p/tJgC42wDf6/ Tested-by: liuyi liuy22@mails.tsinghua.edu.cn Signed-off-by: Zhu Yanjun yanjun.zhu@linux.dev Link: https://patch.msgid.link/20250412075714.3257358-1-yanjun.zhu@linux.dev Reviewed-by: Daisuke Matsuda matsuda-daisuke@fujitsu.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/sw/rxe/rxe_cq.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_cq.c b/drivers/infiniband/sw/rxe/rxe_cq.c index b1a0ab3cd4bd1..43dfc6fd8a3ed 100644 --- a/drivers/infiniband/sw/rxe/rxe_cq.c +++ b/drivers/infiniband/sw/rxe/rxe_cq.c @@ -71,11 +71,8 @@ int rxe_cq_from_init(struct rxe_dev *rxe, struct rxe_cq *cq, int cqe,
err = do_mmap_info(rxe, uresp ? &uresp->mi : NULL, udata, cq->queue->buf, cq->queue->buf_size, &cq->queue->ip); - if (err) { - vfree(cq->queue->buf); - kfree(cq->queue); + if (err) return err; - }
cq->is_user = uresp;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qasim Ijaz qasdev00@gmail.com
[ Upstream commit 09d546303b370113323bfff456c4e8cff8756005 ]
In thrustmaster_interrupts(), the allocated send_buf is not freed if the usb_check_int_endpoints() check fails, leading to a memory leak.
Fix this by ensuring send_buf is freed before returning in the error path.
Fixes: 50420d7c79c3 ("HID: hid-thrustmaster: Fix warning in thrustmaster_probe by adding endpoint check") Signed-off-by: Qasim Ijaz qasdev00@gmail.com Signed-off-by: Jiri Kosina jkosina@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-thrustmaster.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/hid/hid-thrustmaster.c b/drivers/hid/hid-thrustmaster.c index 3b81468a1df29..0bf70664c35ee 100644 --- a/drivers/hid/hid-thrustmaster.c +++ b/drivers/hid/hid-thrustmaster.c @@ -174,6 +174,7 @@ static void thrustmaster_interrupts(struct hid_device *hdev) u8 ep_addr[2] = {b_ep, 0};
if (!usb_check_int_endpoints(usbif, ep_addr)) { + kfree(send_buf); hid_err(hdev, "Unexpected non-int endpoint\n"); return; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Henry Martin bsdhenrymartin@gmail.com
[ Upstream commit bd07f751208ba190f9b0db5e5b7f35d5bb4a8a1e ]
devm_kasprintf() returns NULL when memory allocation fails. Currently, uclogic_input_configured() does not check for this case, which results in a NULL pointer dereference.
Add NULL check after devm_kasprintf() to prevent this issue.
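The general shape of the fix, as a sketch (example_* helper invented for illustration):

#include <linux/device.h>
#include <linux/errno.h>
#include <linux/gfp.h>

static int example_set_input_name(struct device *dev, const char **name,
				  const char *base, const char *suffix)
{
	*name = devm_kasprintf(dev, GFP_KERNEL, "%s %s", base, suffix);
	if (!*name)			/* allocation failed */
		return -ENOMEM;

	return 0;
}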
Fixes: dd613a4e45f8 ("HID: uclogic: Correct devm device reference for hidinput input_dev name") Signed-off-by: Henry Martin bsdhenrymartin@gmail.com Signed-off-by: Jiri Kosina jkosina@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-uclogic-core.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/hid/hid-uclogic-core.c b/drivers/hid/hid-uclogic-core.c index 39114d5c55a0e..5b35f9f321d41 100644 --- a/drivers/hid/hid-uclogic-core.c +++ b/drivers/hid/hid-uclogic-core.c @@ -142,11 +142,12 @@ static int uclogic_input_configured(struct hid_device *hdev, suffix = "System Control"; break; } - } - - if (suffix) + } else { hi->input->name = devm_kasprintf(&hdev->dev, GFP_KERNEL, "%s %s", hdev->name, suffix); + if (!hi->input->name) + return -ENOMEM; + }
return 0; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Li Lingfeng lilingfeng3@huawei.com
[ Upstream commit c457dc1ec770a22636b473ce5d35614adfe97636 ]
When memory is insufficient, the allocation of nfs_lock_context in nfs_get_lock_context() fails and returns -ENOMEM. If we mistakenly treat an nfs4_unlockdata structure (whose l_ctx member has been set to -ENOMEM) as valid and proceed to execute rpc_run_task(), this will trigger a NULL pointer dereference in nfs4_locku_prepare. For example:
BUG: kernel NULL pointer dereference, address: 000000000000000c PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 15 UID: 0 PID: 12 Comm: kworker/u64:0 Not tainted 6.15.0-rc2-dirty #60 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 Workqueue: rpciod rpc_async_schedule RIP: 0010:nfs4_locku_prepare+0x35/0xc2 Code: 89 f2 48 89 fd 48 c7 c7 68 69 ef b5 53 48 8b 8e 90 00 00 00 48 89 f3 RSP: 0018:ffffbbafc006bdb8 EFLAGS: 00010246 RAX: 000000000000004b RBX: ffff9b964fc1fa00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: fffffffffffffff4 RDI: ffff9ba53fddbf40 RBP: ffff9ba539934000 R08: 0000000000000000 R09: ffffbbafc006bc38 R10: ffffffffb6b689c8 R11: 0000000000000003 R12: ffff9ba539934030 R13: 0000000000000001 R14: 0000000004248060 R15: ffffffffb56d1c30 FS: 0000000000000000(0000) GS:ffff9ba5881f0000(0000) knlGS:00000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000000c CR3: 000000093f244000 CR4: 00000000000006f0 Call Trace: <TASK> __rpc_execute+0xbc/0x480 rpc_async_schedule+0x2f/0x40 process_one_work+0x232/0x5d0 worker_thread+0x1da/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0x10d/0x240 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Modules linked in: CR2: 000000000000000c ---[ end trace 0000000000000000 ]---
Free the allocated nfs4_unlockdata when nfs_get_lock_context() fails and return NULL to terminate subsequent rpc_run_task, preventing NULL pointer dereference.
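The shape of the fix, as a sketch (example_* names invented; the get_lock_ctx callback stands in for nfs_get_lock_context()): check the ERR_PTR-returning helper before storing its result, and unwind the allocation if it failed:

#include <linux/err.h>
#include <linux/slab.h>

struct nfs_lock_context;

struct example_unlockdata {
	struct nfs_lock_context *l_ctx;
};

static struct example_unlockdata *
example_alloc_unlockdata(struct nfs_lock_context *(*get_lock_ctx)(void))
{
	struct example_unlockdata *p = kzalloc(sizeof(*p), GFP_KERNEL);
	struct nfs_lock_context *l_ctx;

	if (!p)
		return NULL;

	l_ctx = get_lock_ctx();
	if (IS_ERR(l_ctx)) {		/* e.g. ERR_PTR(-ENOMEM) */
		kfree(p);
		return NULL;		/* caller then skips rpc_run_task() */
	}
	p->l_ctx = l_ctx;
	return p;
}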
Fixes: f30cb757f680 ("NFS: Always wait for I/O completion before unlock") Signed-off-by: Li Lingfeng lilingfeng3@huawei.com Reviewed-by: Jeff Layton jlayton@kernel.org Link: https://lore.kernel.org/r/20250417072508.3850532-1-lilingfeng3@huawei.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4proc.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 5b06b8d4e0147..acef50824d1a2 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -6886,10 +6886,18 @@ static struct nfs4_unlockdata *nfs4_alloc_unlockdata(struct file_lock *fl, struct nfs4_unlockdata *p; struct nfs4_state *state = lsp->ls_state; struct inode *inode = state->inode; + struct nfs_lock_context *l_ctx;
p = kzalloc(sizeof(*p), GFP_KERNEL); if (p == NULL) return NULL; + l_ctx = nfs_get_lock_context(ctx); + if (!IS_ERR(l_ctx)) { + p->l_ctx = l_ctx; + } else { + kfree(p); + return NULL; + } p->arg.fh = NFS_FH(inode); p->arg.fl = &p->fl; p->arg.seqid = seqid; @@ -6897,7 +6905,6 @@ static struct nfs4_unlockdata *nfs4_alloc_unlockdata(struct file_lock *fl, p->lsp = lsp; /* Ensure we don't close file until we're done freeing locks! */ p->ctx = get_nfs_open_context(ctx); - p->l_ctx = nfs_get_lock_context(ctx); locks_init_lock(&p->fl); locks_copy_lock(&p->fl, fl); p->server = NFS_SERVER(inode);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geert Uytterhoeven geert+renesas@glider.be
[ Upstream commit a73fa3690a1f3014d6677e368dce4e70767a6ba2 ]
spi_test_print_hex_dump() prints buffers holding less than 1024 bytes in full. Larger buffers are truncated: only the first 512 and the last 512 bytes are printed, separated by a truncation message. The latter is confusing in case the buffer holds exactly 1024 bytes, as all data is printed anyway.
Fix this by printing buffers holding up to and including 1024 bytes in full.
Fixes: 84e0c4e5e2c4ef42 ("spi: add loopback test driver to allow for spi_master regression tests") Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://patch.msgid.link/37ee1bc90c6554c9347040adabf04188c8f704aa.1746184171... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-loopback-test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/spi/spi-loopback-test.c b/drivers/spi/spi-loopback-test.c index dd7de8fa37d04..ab29ae463f677 100644 --- a/drivers/spi/spi-loopback-test.c +++ b/drivers/spi/spi-loopback-test.c @@ -410,7 +410,7 @@ MODULE_LICENSE("GPL"); static void spi_test_print_hex_dump(char *pre, const void *ptr, size_t len) { /* limit the hex_dump */ - if (len < 1024) { + if (len <= 1024) { print_hex_dump(KERN_INFO, pre, DUMP_PREFIX_OFFSET, 16, 1, ptr, len, 0);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit 2d3cbfd6d54a2c39ce3244f33f85c595844bd7b8 ]
Previously, when reducing a qdisc's limit via the ->change() operation, only the main skb queue was trimmed, potentially leaving packets in the gso_skb list. This could result in a NULL pointer dereference, because we only check sch->limit against sch->q.qlen.
This patch introduces a new helper, qdisc_dequeue_internal(), which ensures both the gso_skb list and the main queue are properly flushed when trimming excess packets. All relevant qdiscs (codel, fq, fq_codel, fq_pie, hhf, pie) are updated to use this helper in their ->change() routines.
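As a usage sketch (the example_* wrapper is invented, and dropped-length/backlog accounting is elided), each ->change() trim loop now looks roughly like:

#include <net/sch_generic.h>

/* "direct" is true for qdiscs that dequeue straight from sch->q
 * (codel, pie) and false for those that go through their own
 * ->dequeue() (fq, fq_codel, fq_pie, hhf). */
static void example_trim_to_limit(struct Qdisc *sch, bool direct)
{
	while (sch->q.qlen > sch->limit) {
		/* drains sch->gso_skb first, then the underlying queue */
		struct sk_buff *skb = qdisc_dequeue_internal(sch, direct);

		if (!skb)
			break;
		rtnl_kfree_skbs(skb, skb);
	}
}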
Fixes: 76e3cc126bb2 ("codel: Controlled Delay AQM") Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM") Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler") Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler") Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc") Fixes: d4b36210c2e6 ("net: pkt_sched: PIE AQM scheme") Reported-by: Will willsroot@protonmail.com Reported-by: Savy savy@syst3mfailure.io Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/sch_generic.h | 15 +++++++++++++++ net/sched/sch_codel.c | 2 +- net/sched/sch_fq.c | 2 +- net/sched/sch_fq_codel.c | 2 +- net/sched/sch_fq_pie.c | 2 +- net/sched/sch_hhf.c | 2 +- net/sched/sch_pie.c | 2 +- 7 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 80f657bf2e047..b34e9e93a1463 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -997,6 +997,21 @@ static inline struct sk_buff *__qdisc_dequeue_head(struct qdisc_skb_head *qh) return skb; }
+static inline struct sk_buff *qdisc_dequeue_internal(struct Qdisc *sch, bool direct) +{ + struct sk_buff *skb; + + skb = __skb_dequeue(&sch->gso_skb); + if (skb) { + sch->q.qlen--; + return skb; + } + if (direct) + return __qdisc_dequeue_head(&sch->q); + else + return sch->dequeue(sch); +} + static inline struct sk_buff *qdisc_dequeue_head(struct Qdisc *sch) { struct sk_buff *skb = __qdisc_dequeue_head(&sch->q); diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c index 5f2e068157456..63c02040b426a 100644 --- a/net/sched/sch_codel.c +++ b/net/sched/sch_codel.c @@ -168,7 +168,7 @@ static int codel_change(struct Qdisc *sch, struct nlattr *opt,
qlen = sch->q.qlen; while (sch->q.qlen > sch->limit) { - struct sk_buff *skb = __qdisc_dequeue_head(&sch->q); + struct sk_buff *skb = qdisc_dequeue_internal(sch, true);
dropped += qdisc_pkt_len(skb); qdisc_qstats_backlog_dec(sch, skb); diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c index f59a2cb2c803d..91f5ef6be0f23 100644 --- a/net/sched/sch_fq.c +++ b/net/sched/sch_fq.c @@ -901,7 +901,7 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt, sch_tree_lock(sch); } while (sch->q.qlen > sch->limit) { - struct sk_buff *skb = fq_dequeue(sch); + struct sk_buff *skb = qdisc_dequeue_internal(sch, false);
if (!skb) break; diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c index 9330923a624c0..47b5a056165cb 100644 --- a/net/sched/sch_fq_codel.c +++ b/net/sched/sch_fq_codel.c @@ -431,7 +431,7 @@ static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt,
while (sch->q.qlen > sch->limit || q->memory_usage > q->memory_limit) { - struct sk_buff *skb = fq_codel_dequeue(sch); + struct sk_buff *skb = qdisc_dequeue_internal(sch, false);
q->cstats.drop_len += qdisc_pkt_len(skb); rtnl_kfree_skbs(skb, skb); diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c index 68e6acd0f130d..607c580d75e4b 100644 --- a/net/sched/sch_fq_pie.c +++ b/net/sched/sch_fq_pie.c @@ -357,7 +357,7 @@ static int fq_pie_change(struct Qdisc *sch, struct nlattr *opt,
/* Drop excess packets if new limit is lower */ while (sch->q.qlen > sch->limit) { - struct sk_buff *skb = fq_pie_qdisc_dequeue(sch); + struct sk_buff *skb = qdisc_dequeue_internal(sch, false);
len_dropped += qdisc_pkt_len(skb); num_dropped += 1; diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c index d26cd436cbe31..83fc44f20e31c 100644 --- a/net/sched/sch_hhf.c +++ b/net/sched/sch_hhf.c @@ -560,7 +560,7 @@ static int hhf_change(struct Qdisc *sch, struct nlattr *opt, qlen = sch->q.qlen; prev_backlog = sch->qstats.backlog; while (sch->q.qlen > sch->limit) { - struct sk_buff *skb = hhf_dequeue(sch); + struct sk_buff *skb = qdisc_dequeue_internal(sch, false);
rtnl_kfree_skbs(skb, skb); } diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c index b60b31ef71cc5..e1bb151a97195 100644 --- a/net/sched/sch_pie.c +++ b/net/sched/sch_pie.c @@ -190,7 +190,7 @@ static int pie_change(struct Qdisc *sch, struct nlattr *opt, /* Drop excess packets if new limit is lower */ qlen = sch->q.qlen; while (sch->q.qlen > sch->limit) { - struct sk_buff *skb = __qdisc_dequeue_head(&sch->q); + struct sk_buff *skb = qdisc_dequeue_internal(sch, true);
dropped += qdisc_pkt_len(skb); qdisc_qstats_backlog_dec(sch, skb);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrew Jeffery andrew@codeconstruct.com.au
[ Upstream commit e4f349bd6e58051df698b82f94721f18a02a293d ]
mctp_flow_prepare_output() is called in mctp_route_output(), which places outbound packets onto a given interface. The packet may represent a message fragment, in which case we provoke an unbalanced reference count to the underlying device. This causes trouble if we ever attempt to remove the interface:
[ 48.702195] usb 1-1: USB disconnect, device number 2 [ 58.883056] unregister_netdevice: waiting for mctpusb0 to become free. Usage count = 2 [ 69.022548] unregister_netdevice: waiting for mctpusb0 to become free. Usage count = 2 [ 79.172568] unregister_netdevice: waiting for mctpusb0 to become free. Usage count = 2 ...
Predicate the invocation of mctp_dev_set_key() in mctp_flow_prepare_output() on not already having associated the device with the key. It's not yet realistic to uphold the property that the key maintains only one device reference earlier in the transmission sequence as the route (and therefore the device) may not be known at the time the key is associated with the socket.
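Reduced to a sketch (example_* names invented; set_key stands in for mctp_dev_set_key()), the guard looks like:

#include <linux/bug.h>

struct example_dev;

struct example_key {
	struct example_dev *dev;
};

static void example_prepare_output(struct example_key *key,
				   struct example_dev *dev,
				   void (*set_key)(struct example_dev *,
						   struct example_key *))
{
	if (key->dev) {			/* already bound, e.g. a later fragment */
		WARN_ON(key->dev != dev);
		return;			/* don't take another device reference */
	}

	set_key(dev, key);		/* takes exactly one reference on dev */
}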
Fixes: 67737c457281 ("mctp: Pass flow data & flow release events to drivers") Acked-by: Jeremy Kerr jk@codeconstruct.com.au Signed-off-by: Andrew Jeffery andrew@codeconstruct.com.au Link: https://patch.msgid.link/20250508-mctp-dev-refcount-v1-1-d4f965c67bb5@codeco... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/mctp/route.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/mctp/route.c b/net/mctp/route.c index e72cdd4ce588f..62952ad5cb636 100644 --- a/net/mctp/route.c +++ b/net/mctp/route.c @@ -274,8 +274,10 @@ static void mctp_flow_prepare_output(struct sk_buff *skb, struct mctp_dev *dev)
key = flow->key;
- if (WARN_ON(key->dev && key->dev != dev)) + if (key->dev) { + WARN_ON(key->dev != dev); return; + }
mctp_dev_set_key(dev, key); }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathieu Othacehe othacehe@gnu.org
[ Upstream commit c92d6089d8ad7d4d815ebcedee3f3907b539ff1f ]
There is a situation where, after THALT is set high, TGO stays high as well. Because jiffies are never updated while we are in a context with interrupts disabled, we never exit that loop and end up in a deadlock.
That deadlock was noticed on a sama5d4 device that stayed locked for days.
Use retries instead of jiffies so that the timeout really works and we do not have a deadlock anymore.
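For reference, the replacement pattern looks like the sketch below (example_* names and values are illustrative, not the macb register layout):

#include <linux/bits.h>
#include <linux/io.h>
#include <linux/iopoll.h>

#define EXAMPLE_TGO_BIT		BIT(3)		/* illustrative bit position */
#define EXAMPLE_HALT_TIMEOUT_US	16000		/* illustrative budget */

static int example_halt_tx(void __iomem *tsr)
{
	u32 status;

	/* Re-reads the register every 250us up to the timeout, counting
	 * elapsed delays instead of relying on jiffies, so it also works
	 * with interrupts disabled; returns 0 or -ETIMEDOUT. */
	return read_poll_timeout_atomic(readl, status,
					!(status & EXAMPLE_TGO_BIT),
					250, EXAMPLE_HALT_TIMEOUT_US, false,
					tsr);
}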
Fixes: e86cd53afc590 ("net/macb: better manage tx errors") Signed-off-by: Mathieu Othacehe othacehe@gnu.org Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20250509121935.16282-1-othacehe@gnu.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/cadence/macb_main.c | 19 ++++++------------- 1 file changed, 6 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index fc3342944dbcc..d2f4709dee0de 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -962,22 +962,15 @@ static void macb_update_stats(struct macb *bp)
static int macb_halt_tx(struct macb *bp) { - unsigned long halt_time, timeout; - u32 status; + u32 status;
macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(THALT));
- timeout = jiffies + usecs_to_jiffies(MACB_HALT_TIMEOUT); - do { - halt_time = jiffies; - status = macb_readl(bp, TSR); - if (!(status & MACB_BIT(TGO))) - return 0; - - udelay(250); - } while (time_before(halt_time, timeout)); - - return -ETIMEDOUT; + /* Poll TSR until TGO is cleared or timeout. */ + return read_poll_timeout_atomic(macb_readl, status, + !(status & MACB_BIT(TGO)), + 250, MACB_HALT_TIMEOUT, false, + bp, TSR); }
static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb, int budget)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 498625a8ab2c8e1c9ab5105744310e8d6952cc01 ]
It has been reported that when under a bridge with stp_state=1, the logs get spammed with this message:
[ 251.734607] fsl_dpaa2_eth dpni.5 eth0: Couldn't decode source port
Further debugging shows the following info associated with packets: source_port=-1, switch_id=-1, vid=-1, vbid=1
In other words, they are data plane packets which are supposed to be decoded by dsa_tag_8021q_find_port_by_vbid(), but the latter (correctly) refuses to do so, because no switch port is currently in BR_STATE_LEARNING or BR_STATE_FORWARDING - so the packet is effectively unexpected.
The error goes away after the port progresses to BR_STATE_LEARNING in 15 seconds (the default forward_time of the bridge), because then, dsa_tag_8021q_find_port_by_vbid() can correctly associate the data plane packets with a plausible bridge port in a plausible STP state.
Re-reading IEEE 802.1D-1990, I see the following:
"4.4.2 Learning: (...) The Forwarding Process shall discard received frames."
IEEE 802.1D-2004 further clarifies:
"DISABLED, BLOCKING, LISTENING, and BROKEN all correspond to the DISCARDING port state. While those dot1dStpPortStates serve to distinguish reasons for discarding frames, the operation of the Forwarding and Learning processes is the same for all of them. (...) LISTENING represents a port that the spanning tree algorithm has selected to be part of the active topology (computing a Root Port or Designated Port role) but is temporarily discarding frames to guard against loops or incorrect learning."
Well, this is not what the driver does - instead it sets mac[port].ingress = true.
To get rid of the log spam, prevent unexpected data plane packets to be received by software by discarding them on ingress in the LISTENING state.
In terms of blame attribution: the prints only date back to commit d7f9787a763f ("net: dsa: tag_8021q: add support for imprecise RX based on the VBID"). However, the settings would permit a LISTENING port to forward to a FORWARDING port, and the standard suggests that's not OK.
Fixes: 640f763f98c2 ("net: dsa: sja1105: Add support for Spanning Tree Protocol") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Link: https://patch.msgid.link/20250509113816.2221992-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/sja1105/sja1105_main.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c index f1f1368e8146f..d51d982c4bc01 100644 --- a/drivers/net/dsa/sja1105/sja1105_main.c +++ b/drivers/net/dsa/sja1105/sja1105_main.c @@ -2083,6 +2083,7 @@ static void sja1105_bridge_stp_state_set(struct dsa_switch *ds, int port, switch (state) { case BR_STATE_DISABLED: case BR_STATE_BLOCKING: + case BR_STATE_LISTENING: /* From UM10944 description of DRPDTAG (why put this there?): * "Management traffic flows to the port regardless of the state * of the INGRESS flag". So BPDUs are still be allowed to pass. @@ -2092,11 +2093,6 @@ static void sja1105_bridge_stp_state_set(struct dsa_switch *ds, int port, mac[port].egress = false; mac[port].dyn_learn = false; break; - case BR_STATE_LISTENING: - mac[port].ingress = true; - mac[port].egress = false; - mac[port].dyn_learn = false; - break; case BR_STATE_LEARNING: mac[port].ingress = true; mac[port].egress = false;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook kees@kernel.org
[ Upstream commit 40696426b8c8c4f13cf6ac52f0470eed144be4b2 ]
The only reason nvme_pci_npages_prp() could be used as a compile-time known result in BUILD_BUG_ON() is because the compiler was always choosing to inline the function. Under special circumstances (sanitizer coverage functions disabled for __init functions on ARCH=um), the compiler decided to stop inlining it:
drivers/nvme/host/pci.c: In function 'nvme_init': include/linux/compiler_types.h:557:45: error: call to '__compiletime_assert_678' declared with attribute error: BUILD_BUG_ON failed: nvme_pci_npages_prp() > NVME_MAX_NR_ALLOCATIONS 557 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) | ^ include/linux/compiler_types.h:538:25: note: in definition of macro '__compiletime_assert' 538 | prefix ## suffix(); \ | ^~~~~~ include/linux/compiler_types.h:557:9: note: in expansion of macro '_compiletime_assert' 557 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) | ^~~~~~~~~~~~~~~~~~~ include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert' 39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) | ^~~~~~~~~~~~~~~~~~ include/linux/build_bug.h:50:9: note: in expansion of macro 'BUILD_BUG_ON_MSG' 50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition) | ^~~~~~~~~~~~~~~~ drivers/nvme/host/pci.c:3804:9: note: in expansion of macro 'BUILD_BUG_ON' 3804 | BUILD_BUG_ON(nvme_pci_npages_prp() > NVME_MAX_NR_ALLOCATIONS); | ^~~~~~~~~~~~
Force it to be __always_inline to make sure it is always available for use with BUILD_BUG_ON().
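For illustration, a minimal sketch of the pattern this relies on (hypothetical helper and values, not the actual nvme code): a function used inside BUILD_BUG_ON() must constant-fold, which only happens reliably when the compiler is forced to inline it.

  #include <linux/build_bug.h>
  #include <linux/compiler.h>
  #include <linux/init.h>

  /* Hypothetical helper: must fold to a constant at compile time. */
  static __always_inline int demo_npages(void)
  {
  	return (128 * 1024) / 4096 + 1;		/* always 33 */
  }

  static int __init demo_init(void)
  {
  	/* Constant-folds to "33 > 64", i.e. false, so the build passes.
  	 * If demo_npages() were emitted as a real call, the condition would
  	 * no longer be a compile-time constant and the assertion itself
  	 * would fail to build, as in the report above.
  	 */
  	BUILD_BUG_ON(demo_npages() > 64);
  	return 0;
  }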
Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202505061846.12FMyRjj-lkp@intel.com/ Fixes: c372cdd1efdf ("nvme-pci: iod npages fits in s8") Signed-off-by: Kees Cook kees@kernel.org Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index da858463b2557..553e6bb461b39 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -376,7 +376,7 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, __le32 *dbbuf_db, * as it only leads to a small amount of wasted memory for the lifetime of * the I/O. */ -static int nvme_pci_npages_prp(void) +static __always_inline int nvme_pci_npages_prp(void) { unsigned max_bytes = (NVME_MAX_KB_SZ * 1024) + NVME_CTRL_PAGE_SIZE; unsigned nprps = DIV_ROUND_UP(max_bytes, NVME_CTRL_PAGE_SIZE);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Keith Busch kbusch@kernel.org
[ Upstream commit 3d8932133dcecbd9bef1559533c1089601006f45 ]
We need to take the cq_poll_lock when polling this queue's CQ because the timeout work executes per-namespace and can poll the poll CQ concurrently.
Reported-by: Hannes Reinecke hare@kernel.org Closes: https://lore.kernel.org/all/20240902130728.1999-1-hare@kernel.org/ Fixes: a0fa9647a54e ("NVMe: add blk polling support") Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Daniel Wagner wagi@kernel.org Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 553e6bb461b39..49a3cb8f1f105 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -1154,7 +1154,9 @@ static void nvme_poll_irqdisable(struct nvme_queue *nvmeq) WARN_ON_ONCE(test_bit(NVMEQ_POLLED, &nvmeq->flags));
disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector)); + spin_lock(&nvmeq->cq_poll_lock); nvme_poll_cq(nvmeq, NULL); + spin_unlock(&nvmeq->cq_poll_lock); enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector)); }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geert Uytterhoeven geert+renesas@glider.be
[ Upstream commit 66e48ef6ef506c89ec1b3851c6f9f5f80b5835ff ]
If CONFIG_SH_DMA_API=n:
WARNING: unmet direct dependencies detected for G2_DMA Depends on [n]: SH_DREAMCAST [=y] && SH_DMA_API [=n] Selected by [y]: - SND_AICA [=y] && SOUND [=y] && SND [=y] && SND_SUPERH [=y] && SH_DREAMCAST [=y]
SND_AICA selects G2_DMA. As the latter depends on SH_DMA_API, the former should depend on SH_DMA_API, too.
Fixes: f477a538c14d07f8 ("sh: dma: fix kconfig dependency for G2_DMA") Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202505131320.PzgTtl9H-lkp@intel.com/ Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://patch.msgid.link/b90625f8a9078d0d304bafe862cbe3a3fab40082.1747121335... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/sh/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/sh/Kconfig b/sound/sh/Kconfig index b75fbb3236a7b..f5fa09d740b4c 100644 --- a/sound/sh/Kconfig +++ b/sound/sh/Kconfig @@ -14,7 +14,7 @@ if SND_SUPERH
config SND_AICA tristate "Dreamcast Yamaha AICA sound" - depends on SH_DREAMCAST + depends on SH_DREAMCAST && SH_DMA_API select SND_PCM select G2_DMA help
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Carolina Jubran cjubran@nvidia.com
[ Upstream commit 588431474eb7572e57a927fa8558c9ba2f8af143 ]
MACsec offload is not supported in switchdev mode for uplink representors. When switching to the uplink representor profile, the MACsec offload feature must be cleared from the netdevice's features.
If left enabled, attempts to add offloads result in a null pointer dereference, as the uplink representor does not support MACsec offload even though the feature bit remains set.
Clear NETIF_F_HW_MACSEC in mlx5e_fix_uplink_rep_features().
Kernel log:
Oops: general protection fault, probably for non-canonical address 0xdffffc000000000f: 0000 [#1] SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000078-0x000000000000007f] CPU: 29 UID: 0 PID: 4714 Comm: ip Not tainted 6.14.0-rc4_for_upstream_debug_2025_03_02_17_35 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:__mutex_lock+0x128/0x1dd0 Code: d0 7c 08 84 d2 0f 85 ad 15 00 00 8b 35 91 5c fe 03 85 f6 75 29 49 8d 7e 60 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 a6 15 00 00 4d 3b 76 60 0f 85 fd 0b 00 00 65 ff RSP: 0018:ffff888147a4f160 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000001 RDX: 000000000000000f RSI: 0000000000000000 RDI: 0000000000000078 RBP: ffff888147a4f2e0 R08: ffffffffa05d2c19 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: dffffc0000000000 R14: 0000000000000018 R15: ffff888152de0000 FS: 00007f855e27d800(0000) GS:ffff88881ee80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004e5768 CR3: 000000013ae7c005 CR4: 0000000000372eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 Call Trace: <TASK> ? die_addr+0x3d/0xa0 ? exc_general_protection+0x144/0x220 ? asm_exc_general_protection+0x22/0x30 ? mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] ? __mutex_lock+0x128/0x1dd0 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] ? mutex_lock_io_nested+0x1ae0/0x1ae0 ? lock_acquire+0x1c2/0x530 ? macsec_upd_offload+0x145/0x380 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? kasan_save_stack+0x30/0x40 ? kasan_save_stack+0x20/0x40 ? kasan_save_track+0x10/0x30 ? __kasan_kmalloc+0x77/0x90 ? __kmalloc_noprof+0x249/0x6b0 ? genl_family_rcv_msg_attrs_parse.constprop.0+0xb5/0x240 ? mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] ? mlx5e_macsec_add_rxsa+0x11a0/0x11a0 [mlx5_core] macsec_update_offload+0x26c/0x820 ? macsec_set_mac_address+0x4b0/0x4b0 ? lockdep_hardirqs_on_prepare+0x284/0x400 ? _raw_spin_unlock_irqrestore+0x47/0x50 macsec_upd_offload+0x2c8/0x380 ? macsec_update_offload+0x820/0x820 ? __nla_parse+0x22/0x30 ? genl_family_rcv_msg_attrs_parse.constprop.0+0x15e/0x240 genl_family_rcv_msg_doit+0x1cc/0x2a0 ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240 ? cap_capable+0xd4/0x330 genl_rcv_msg+0x3ea/0x670 ? genl_family_rcv_msg_dumpit+0x2a0/0x2a0 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? macsec_update_offload+0x820/0x820 netlink_rcv_skb+0x12b/0x390 ? genl_family_rcv_msg_dumpit+0x2a0/0x2a0 ? netlink_ack+0xd80/0xd80 ? rwsem_down_read_slowpath+0xf90/0xf90 ? netlink_deliver_tap+0xcd/0xac0 ? netlink_deliver_tap+0x155/0xac0 ? _copy_from_iter+0x1bb/0x12c0 genl_rcv+0x24/0x40 netlink_unicast+0x440/0x700 ? netlink_attachskb+0x760/0x760 ? lock_acquire+0x1c2/0x530 ? __might_fault+0xbb/0x170 netlink_sendmsg+0x749/0xc10 ? netlink_unicast+0x700/0x700 ? __might_fault+0xbb/0x170 ? netlink_unicast+0x700/0x700 __sock_sendmsg+0xc5/0x190 ____sys_sendmsg+0x53f/0x760 ? import_iovec+0x7/0x10 ? kernel_sendmsg+0x30/0x30 ? __copy_msghdr+0x3c0/0x3c0 ? filter_irq_stacks+0x90/0x90 ? stack_depot_save_flags+0x28/0xa30 ___sys_sendmsg+0xeb/0x170 ? kasan_save_stack+0x30/0x40 ? copy_msghdr_from_user+0x110/0x110 ? do_syscall_64+0x6d/0x140 ? lock_acquire+0x1c2/0x530 ? __virt_addr_valid+0x116/0x3b0 ? __virt_addr_valid+0x1da/0x3b0 ? 
lock_downgrade+0x680/0x680 ? __delete_object+0x21/0x50 __sys_sendmsg+0xf7/0x180 ? __sys_sendmsg_sock+0x20/0x20 ? kmem_cache_free+0x14c/0x4e0 ? __x64_sys_close+0x78/0xd0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f855e113367 Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 RSP: 002b:00007ffd15e90c88 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f855e113367 RDX: 0000000000000000 RSI: 00007ffd15e90cf0 RDI: 0000000000000004 RBP: 00007ffd15e90dbc R08: 0000000000000028 R09: 000000000045d100 R10: 00007f855e011dd8 R11: 0000000000000246 R12: 0000000000000019 R13: 0000000067c6b785 R14: 00000000004a1e80 R15: 0000000000000000 </TASK> Modules linked in: 8021q garp mrp sch_ingress openvswitch nsh mlx5_ib mlx5_fwctl mlx5_dpll mlx5_core rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: mlx5_core] ---[ end trace 0000000000000000 ]---
Fixes: 8ff0ac5be144 ("net/mlx5: Add MACsec offload Tx command support") Signed-off-by: Carolina Jubran cjubran@nvidia.com Reviewed-by: Shahar Shitrit shshitrit@nvidia.com Reviewed-by: Dragos Tatulea dtatulea@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/1746958552-561295-1-git-send-email-tariqt@nvidia.co... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index f520949b2024c..887d446354006 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -4023,6 +4023,10 @@ static netdev_features_t mlx5e_fix_uplink_rep_features(struct net_device *netdev if (netdev->features & NETIF_F_HW_VLAN_CTAG_FILTER) netdev_warn(netdev, "Disabling HW_VLAN CTAG FILTERING, not supported in switchdev mode\n");
+ features &= ~NETIF_F_HW_MACSEC; + if (netdev->features & NETIF_F_HW_MACSEC) + netdev_warn(netdev, "Disabling HW MACsec offload, not supported in switchdev mode\n"); + return features; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Abdun Nihaal abdun.nihaal@gmail.com
[ Upstream commit 9d8a99c5a7c7f4f7eca2c168a4ec254409670035 ]
In one of the error paths in qlcnic_sriov_channel_cfg_cmd(), the memory allocated in qlcnic_sriov_alloc_bc_mbx_args() for mailbox arguments is not freed. Fix that by jumping to the error path that frees them, by calling qlcnic_free_mbx_args(). This was found using static analysis.
Fixes: f197a7aa6288 ("qlcnic: VF-PF communication channel implementation") Signed-off-by: Abdun Nihaal abdun.nihaal@gmail.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20250512044829.36400-1-abdun.nihaal@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c index 28d24d59efb84..d57b976b90409 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c @@ -1484,8 +1484,11 @@ static int qlcnic_sriov_channel_cfg_cmd(struct qlcnic_adapter *adapter, u8 cmd_o }
cmd_op = (cmd.rsp.arg[0] & 0xff); - if (cmd.rsp.arg[0] >> 25 == 2) - return 2; + if (cmd.rsp.arg[0] >> 25 == 2) { + ret = 2; + goto out; + } + if (cmd_op == QLCNIC_BC_CMD_CHANNEL_INIT) set_bit(QLC_BC_VF_STATE, &vf->state); else
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cosmin Tanislav demonsingur@gmail.com
[ Upstream commit 6b0cd72757c69bc2d45da42b41023e288d02e772 ]
max20086_parse_regulators_dt() calls of_regulator_match() using an array of struct of_regulator_match allocated on the stack for the matches argument.
of_regulator_match() calls devm_of_regulator_put_matches(), which calls devres_alloc() to allocate a struct devm_of_regulator_matches which will be de-allocated using devm_of_regulator_put_matches().
struct devm_of_regulator_matches is populated with the stack allocated matches array.
If the device fails to probe, devm_of_regulator_put_matches() will be called and will try to call of_node_put() on that stack pointer, generating the following dmesg entries:
max20086 6-0028: Failed to read DEVICE_ID reg: -121 kobject: '\xc0$\xa5\x03' (000000002cebcb7a): is not initialized, yet kobject_put() is being called.
Followed by a stack trace matching the call flow described above.
Switch to allocating the matches array using devm_kcalloc() to avoid accessing the stack pointer long after it's out of scope.
This also has the advantage of allowing multiple max20086 to probe without overriding the data stored inside the global of_regulator_match.
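For illustration, the general shape of the lifetime bug being fixed (a sketch with simplified names, not the driver's actual code):

  /*
   * int probe(struct device *dev, struct device_node *np)
   * {
   *         struct of_regulator_match matches[N] = { };     // lives on the stack
   *
   *         of_regulator_match(dev, np, matches, N);        // devres now keeps a
   *                                                         // pointer to 'matches'
   *         ...
   *         return -EIO;            // probe fails; the devres release callback
   * }                               // later dereferences the dead stack array
   *
   * Allocating 'matches' with devm_kcalloc() ties its lifetime to the device
   * rather than to the probe stack frame, so the release callback stays safe.
   */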
Fixes: bfff546aae50 ("regulator: Add MAX20086-MAX20089 driver") Signed-off-by: Cosmin Tanislav demonsingur@gmail.com Link: https://patch.msgid.link/20250508064947.2567255-1-demonsingur@gmail.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/max20086-regulator.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/regulator/max20086-regulator.c b/drivers/regulator/max20086-regulator.c index b8bf76c170fea..332fb58f90952 100644 --- a/drivers/regulator/max20086-regulator.c +++ b/drivers/regulator/max20086-regulator.c @@ -133,7 +133,7 @@ static int max20086_regulators_register(struct max20086 *chip)
static int max20086_parse_regulators_dt(struct max20086 *chip, bool *boot_on) { - struct of_regulator_match matches[MAX20086_MAX_REGULATORS] = { }; + struct of_regulator_match *matches; struct device_node *node; unsigned int i; int ret; @@ -144,6 +144,11 @@ static int max20086_parse_regulators_dt(struct max20086 *chip, bool *boot_on) return -ENODEV; }
+ matches = devm_kcalloc(chip->dev, chip->info->num_outputs, + sizeof(*matches), GFP_KERNEL); + if (!matches) + return -ENOMEM; + for (i = 0; i < chip->info->num_outputs; ++i) matches[i].name = max20086_output_names[i];
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Subbaraya Sundeep sbhatta@marvell.com
[ Upstream commit 865ab2461375e3a5a2526f91f9a9f17b8931bc9e ]
The MACSEC hardware block has a field called maximum transmit size for the TX secy. The max packet size going out of the MCS block has to be programmed taking into account the full packet size, which includes the L2 header, SecTag and ICV. The MACsec offload driver was configuring the max transmit size as the macsec interface MTU, which is incorrect. For example, with a real device MTU of 1500, a macsec interface created on top of the real device will have an MTU of 1468 (1500 - (SecTag + ICV)). This caused packets from the macsec interface of size greater than or equal to 1468 to not be transmitted, because the driver programmed the max transmit size as 1468 instead of 1514 (1500 + ETH_HDR_LEN).
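Summarising the arithmetic from the example above:

  real device MTU                   = 1500
  macsec interface MTU              = 1500 - (SecTag + ICV) = 1468
  previously programmed max TX size = 1468   (frames of 1468 bytes or more are dropped)
  required max TX size              = 1500 + ETH_HDR_LEN = 1514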
Fixes: c54ffc73601c ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading") Signed-off-by: Subbaraya Sundeep sbhatta@marvell.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/1747053756-4529-1-git-send-email-sbhatta@marvell.co... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c index a487a98eac88c..6da8d8f2a8701 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c @@ -428,7 +428,8 @@ static int cn10k_mcs_write_tx_secy(struct otx2_nic *pfvf, if (sw_tx_sc->encrypt) sectag_tci |= (MCS_TCI_E | MCS_TCI_C);
- policy = FIELD_PREP(MCS_TX_SECY_PLCY_MTU, secy->netdev->mtu); + policy = FIELD_PREP(MCS_TX_SECY_PLCY_MTU, + pfvf->netdev->mtu + OTX2_ETH_HLEN); /* Write SecTag excluding AN bits(1..0) */ policy |= FIELD_PREP(MCS_TX_SECY_PLCY_ST_TCI, sectag_tci >> 2); policy |= FIELD_PREP(MCS_TX_SECY_PLCY_ST_OFFSET, tag_offset);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pengtao He hept.hept.hept@gmail.com
[ Upstream commit 491deb9b8c4ad12fe51d554a69b8165b9ef9429f ]
We cannot set frag_list to a NULL pointer when alloc_page() fails. It will be used in tls_strp_check_queue_ok() the next time tls_strp_read_sock() is called.
This is because we don't reset full_len in tls_strp_flush_anchor_copy(), so the recv path will try to continue handling the partial record on the next call, but we have detached the rcvq from the frag list. An alternative fix would be to reset full_len.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000028 Call trace: tls_strp_check_rcv+0x128/0x27c tls_strp_data_ready+0x34/0x44 tls_data_ready+0x3c/0x1f0 tcp_data_ready+0x9c/0xe4 tcp_data_queue+0xf6c/0x12d0 tcp_rcv_established+0x52c/0x798
Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Signed-off-by: Pengtao He hept.hept.hept@gmail.com Link: https://patch.msgid.link/20250514132013.17274-1-hept.hept.hept@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_strp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/tls/tls_strp.c b/net/tls/tls_strp.c index f37f4a0fcd3c2..44d7f1aef9f12 100644 --- a/net/tls/tls_strp.c +++ b/net/tls/tls_strp.c @@ -396,7 +396,6 @@ static int tls_strp_read_copy(struct tls_strparser *strp, bool qshort) return 0;
shinfo = skb_shinfo(strp->anchor); - shinfo->frag_list = NULL;
/* If we don't know the length go max plus page for cipher overhead */ need_spc = strp->stm.full_len ?: TLS_MAX_PAYLOAD_SIZE + PAGE_SIZE; @@ -412,6 +411,8 @@ static int tls_strp_read_copy(struct tls_strparser *strp, bool qshort) page, 0, 0); }
+ shinfo->frag_list = NULL; + strp->copy_mode = 1; strp->stm.offset = 0;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 6d6d7f91cc8c111d40416ac9240a3bb9396c5235 ]
If there are still layout segments in the layout's plh_return_segs list after a layout return, we should be resetting the state to ensure they eventually get returned as well.
Fixes: 68f744797edd ("pNFS: Do not free layout segments that are marked for return") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/pnfs.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/fs/nfs/pnfs.c +++ b/fs/nfs/pnfs.c @@ -732,6 +732,14 @@ pnfs_mark_matching_lsegs_invalid(struct return remaining; }
+static void pnfs_reset_return_info(struct pnfs_layout_hdr *lo) +{ + struct pnfs_layout_segment *lseg; + + list_for_each_entry(lseg, &lo->plh_return_segs, pls_list) + pnfs_set_plh_return_info(lo, lseg->pls_range.iomode, 0); +} + static void pnfs_free_returned_lsegs(struct pnfs_layout_hdr *lo, struct list_head *free_me, @@ -1180,6 +1188,7 @@ void pnfs_layoutreturn_free_lsegs(struct pnfs_mark_matching_lsegs_invalid(lo, &freeme, range, seq); pnfs_free_returned_lsegs(lo, &freeme, range, seq); pnfs_set_layout_stateid(lo, stateid, NULL, true); + pnfs_reset_return_info(lo); } else pnfs_mark_layout_stateid_invalid(lo, &freeme); out_unlock:
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nathan Lynch nathan.lynch@amd.com
commit df180e65305f8c1e020d54bfc2132349fd693de1 upstream.
Several issues with this change:
* The analysis is flawed and it's unclear what problem is being fixed. There is no difference between wait_event_freezable_timeout() and wait_event_timeout() with respect to device interrupts. And of course "the interrupt notifying the finish of an operation happens during wait_event_freezable_timeout()" -- that's how it's supposed to work.
* The link at the "Closes:" tag appears to be an unrelated use-after-free in idxd.
* It introduces a regression: dmatest threads are meant to be freezable and this change breaks that.
See discussion here: https://lore.kernel.org/dmaengine/878qpa13fe.fsf@AUSNATLYNCH.amd.com/
Fixes: e87ca16e9911 ("dmaengine: dmatest: Fix dmatest waiting less when interrupted") Signed-off-by: Nathan Lynch nathan.lynch@amd.com Link: https://lore.kernel.org/r/20250403-dmaengine-dmatest-revert-waiting-less-v1-... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/dmatest.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/dma/dmatest.c +++ b/drivers/dma/dmatest.c @@ -827,9 +827,9 @@ static int dmatest_func(void *data) } else { dma_async_issue_pending(chan);
- wait_event_timeout(thread->done_wait, - done->done, - msecs_to_jiffies(params->timeout)); + wait_event_freezable_timeout(thread->done_wait, + done->done, + msecs_to_jiffies(params->timeout));
status = dma_async_is_tx_complete(chan, cookie, NULL, NULL);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Huacai Chen chenhuacai@loongson.cn
commit 90436d234230e9a950ccd87831108b688b27a234 upstream.
Fix the MAX_REG_OFFSET calculation, making it point to the last register in 'struct pt_regs' and not to the marker itself, which could otherwise allow regs_get_register() to access an invalid offset.
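For illustration, a sketch of the usual regs_get_register() bounds check (as found on other architectures; the exact LoongArch helper may differ in detail) and why the marker-based definition is one slot too permissive:

  /* sketch only -- assumes the common "offset > MAX_REG_OFFSET" style check */
  static inline unsigned long regs_get_register(struct pt_regs *regs,
  					       unsigned int offset)
  {
  	if (unlikely(offset > MAX_REG_OFFSET))
  		return 0;
  	return *(unsigned long *)((unsigned long)regs + offset);
  }

  /* old: MAX_REG_OFFSET == offsetof(struct pt_regs, __last)
   *      -> offset == offsetof(struct pt_regs, __last) passes the check but
   *         reads one slot past the last register.
   * new: MAX_REG_OFFSET == offsetof(struct pt_regs, __last) - sizeof(unsigned long)
   *      -> the largest accepted offset is the last real register.
   */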
Cc: stable@vger.kernel.org Fixes: 803b0fc5c3f2baa6e5 ("LoongArch: Add process management") Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/include/asm/ptrace.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/loongarch/include/asm/ptrace.h +++ b/arch/loongarch/include/asm/ptrace.h @@ -54,7 +54,7 @@ static inline void instruction_pointer_s
/* Query offset/name of register from its name/offset */ extern int regs_query_register_offset(const char *name); -#define MAX_REG_OFFSET (offsetof(struct pt_regs, __last)) +#define MAX_REG_OFFSET (offsetof(struct pt_regs, __last) - sizeof(unsigned long))
/** * regs_get_register() - get register value from its offset
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit 54db6d1bdd71fa90172a2a6aca3308bbf7fa7eb5 upstream.
If the discard worker is running and there's currently only one block group, that block group is a data block group, it's in the unused block groups discard list and is being used (it got an extent allocated from it after becoming unused), the worker can end up in an infinite loop if a transaction abort happens or the async discard is disabled (during remount or unmount for example).
This happens like this:
1) Task A, the discard worker, is at peek_discard_list() and find_next_block_group() returns block group X;
2) Block group X is in the unused block groups discard list (its discard index is BTRFS_DISCARD_INDEX_UNUSED) since at some point in the past it became an unused block group and was added to that list, but then later it got an extent allocated from it, so its ->used counter is not zero anymore;
3) The current transaction is aborted by task B and we end up at __btrfs_handle_fs_error() in the transaction abort path, where we call btrfs_discard_stop(), which clears BTRFS_FS_DISCARD_RUNNING from fs_info, and then at __btrfs_handle_fs_error() we set the fs to RO mode (setting SB_RDONLY in the super block's s_flags field);
4) Task A calls __add_to_discard_list() with the goal of moving the block group from the unused block groups discard list into another discard list, but at __add_to_discard_list() we end up doing nothing because btrfs_run_discard_work() returns false, since the super block has SB_RDONLY set in its flags and BTRFS_FS_DISCARD_RUNNING is not set anymore in fs_info->flags. So block group X remains in the unused block groups discard list;
5) Task A then does a goto into the 'again' label, calls find_next_block_group() again and gets block group X again. Then it repeats the previous steps over and over, since there are no other block groups in the discard lists and block group X is never moved out of the unused block groups discard list, because btrfs_run_discard_work() keeps returning false and therefore __add_to_discard_list() doesn't move block group X out of that discard list.
When this happens we can get a soft lockup report like this:
[71.957] watchdog: BUG: soft lockup - CPU#0 stuck for 27s! [kworker/u4:3:97] [71.957] Modules linked in: xfs af_packet rfkill (...) [71.957] CPU: 0 UID: 0 PID: 97 Comm: kworker/u4:3 Tainted: G W 6.14.2-1-default #1 openSUSE Tumbleweed 968795ef2b1407352128b466fe887416c33af6fa [71.957] Tainted: [W]=WARN [71.957] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 [71.957] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs] [71.957] RIP: 0010:btrfs_discard_workfn+0xc4/0x400 [btrfs] [71.957] Code: c1 01 48 83 (...) [71.957] RSP: 0018:ffffafaec03efe08 EFLAGS: 00000246 [71.957] RAX: ffff897045500000 RBX: ffff8970413ed8d0 RCX: 0000000000000000 [71.957] RDX: 0000000000000001 RSI: ffff8970413ed8d0 RDI: 0000000a8f1272ad [71.957] RBP: 0000000a9d61c60e R08: ffff897045500140 R09: 8080808080808080 [71.957] R10: ffff897040276800 R11: fefefefefefefeff R12: ffff8970413ed860 [71.957] R13: ffff897045500000 R14: ffff8970413ed868 R15: 0000000000000000 [71.957] FS: 0000000000000000(0000) GS:ffff89707bc00000(0000) knlGS:0000000000000000 [71.957] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [71.957] CR2: 00005605bcc8d2f0 CR3: 000000010376a001 CR4: 0000000000770ef0 [71.957] PKRU: 55555554 [71.957] Call Trace: [71.957] <TASK> [71.957] process_one_work+0x17e/0x330 [71.957] worker_thread+0x2ce/0x3f0 [71.957] ? __pfx_worker_thread+0x10/0x10 [71.957] kthread+0xef/0x220 [71.957] ? __pfx_kthread+0x10/0x10 [71.957] ret_from_fork+0x34/0x50 [71.957] ? __pfx_kthread+0x10/0x10 [71.957] ret_from_fork_asm+0x1a/0x30 [71.957] </TASK> [71.957] Kernel panic - not syncing: softlockup: hung tasks [71.987] CPU: 0 UID: 0 PID: 97 Comm: kworker/u4:3 Tainted: G W L 6.14.2-1-default #1 openSUSE Tumbleweed 968795ef2b1407352128b466fe887416c33af6fa [71.989] Tainted: [W]=WARN, [L]=SOFTLOCKUP [71.989] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 [71.991] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs] [71.992] Call Trace: [71.993] <IRQ> [71.994] dump_stack_lvl+0x5a/0x80 [71.994] panic+0x10b/0x2da [71.995] watchdog_timer_fn.cold+0x9a/0xa1 [71.996] ? __pfx_watchdog_timer_fn+0x10/0x10 [71.997] __hrtimer_run_queues+0x132/0x2a0 [71.997] hrtimer_interrupt+0xff/0x230 [71.998] __sysvec_apic_timer_interrupt+0x55/0x100 [71.999] sysvec_apic_timer_interrupt+0x6c/0x90 [72.000] </IRQ> [72.000] <TASK> [72.001] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [72.002] RIP: 0010:btrfs_discard_workfn+0xc4/0x400 [btrfs] [72.002] Code: c1 01 48 83 (...) [72.005] RSP: 0018:ffffafaec03efe08 EFLAGS: 00000246 [72.006] RAX: ffff897045500000 RBX: ffff8970413ed8d0 RCX: 0000000000000000 [72.006] RDX: 0000000000000001 RSI: ffff8970413ed8d0 RDI: 0000000a8f1272ad [72.007] RBP: 0000000a9d61c60e R08: ffff897045500140 R09: 8080808080808080 [72.008] R10: ffff897040276800 R11: fefefefefefefeff R12: ffff8970413ed860 [72.009] R13: ffff897045500000 R14: ffff8970413ed868 R15: 0000000000000000 [72.010] ? btrfs_discard_workfn+0x51/0x400 [btrfs 23b01089228eb964071fb7ca156eee8cd3bf996f] [72.011] process_one_work+0x17e/0x330 [72.012] worker_thread+0x2ce/0x3f0 [72.013] ? __pfx_worker_thread+0x10/0x10 [72.014] kthread+0xef/0x220 [72.014] ? __pfx_kthread+0x10/0x10 [72.015] ret_from_fork+0x34/0x50 [72.015] ? __pfx_kthread+0x10/0x10 [72.016] ret_from_fork_asm+0x1a/0x30 [72.017] </TASK> [72.017] Kernel Offset: 0x15000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [72.019] Rebooting in 90 seconds..
So fix this by making sure we move a block group out of the unused block groups discard list when calling __add_to_discard_list().
Fixes: 2bee7eb8bb81 ("btrfs: discard one region at a time in async discard") Link: https://bugzilla.suse.com/show_bug.cgi?id=1242012 CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Boris Burkov boris@bur.io Reviewed-by: Daniel Vacek neelx@suse.com Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/discard.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-)
--- a/fs/btrfs/discard.c +++ b/fs/btrfs/discard.c @@ -78,8 +78,6 @@ static void __add_to_discard_list(struct struct btrfs_block_group *block_group) { lockdep_assert_held(&discard_ctl->lock); - if (!btrfs_run_discard_work(discard_ctl)) - return;
if (list_empty(&block_group->discard_list) || block_group->discard_index == BTRFS_DISCARD_INDEX_UNUSED) { @@ -102,6 +100,9 @@ static void add_to_discard_list(struct b if (!btrfs_is_block_group_data_only(block_group)) return;
+ if (!btrfs_run_discard_work(discard_ctl)) + return; + spin_lock(&discard_ctl->lock); __add_to_discard_list(discard_ctl, block_group); spin_unlock(&discard_ctl->lock); @@ -233,6 +234,18 @@ again: block_group->used != 0) { if (btrfs_is_block_group_data_only(block_group)) { __add_to_discard_list(discard_ctl, block_group); + /* + * The block group must have been moved to other + * discard list even if discard was disabled in + * the meantime or a transaction abort happened, + * otherwise we can end up in an infinite loop, + * always jumping into the 'again' label and + * keep getting this block group over and over + * in case there are no other block groups in + * the discard lists. + */ + ASSERT(block_group->discard_index != + BTRFS_DISCARD_INDEX_UNUSED); } else { list_del_init(&block_group->discard_list); btrfs_put_block_group(block_group);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wayne Lin Wayne.Lin@amd.com
commit d433981385c62c72080e26f1c00a961d18b233be upstream.
[Why] Currently, forcing aux->transfer to return 0 on an incomplete AUX write is inappropriate. It should return the number of bytes that have been transferred.
[How] aux->transfer is expected not to change the original msg except for the reply field of the drm_dp_aux_msg structure. Copy msg->buffer when it's a write request, and overwrite the first byte of the copy when the sink replies with 1 byte indicating the number of partially written bytes. Then we can return the correct value without changing the original msg.
Fixes: 3637e457eb00 ("drm/amd/display: Fix wrong handling for AUX_DEFER case") Cc: Mario Limonciello mario.limonciello@amd.com Cc: Alex Deucher alexander.deucher@amd.com Reviewed-by: Ray Wu ray.wu@amd.com Signed-off-by: Wayne Lin Wayne.Lin@amd.com Signed-off-by: Ray Wu ray.wu@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 7ac37f0dcd2e0b729fa7b5513908dc8ab802b540) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 10 ++++++++-- 2 files changed, 10 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -10707,7 +10707,8 @@ int amdgpu_dm_process_dmub_aux_transfer_ /* The reply is stored in the top nibble of the command. */ payload->reply[0] = (adev->dm.dmub_notify->aux_reply.command >> 4) & 0xF;
- if (!payload->write && p_notify->aux_reply.length) + /*write req may receive a byte indicating partially written number as well*/ + if (p_notify->aux_reply.length) memcpy(payload->data, p_notify->aux_reply.data, p_notify->aux_reply.length);
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c @@ -66,6 +66,7 @@ static ssize_t dm_dp_aux_transfer(struct enum aux_return_code_type operation_result; struct amdgpu_device *adev; struct ddc_service *ddc; + uint8_t copy[16];
if (WARN_ON(msg->size > 16)) return -E2BIG; @@ -81,6 +82,11 @@ static ssize_t dm_dp_aux_transfer(struct (msg->request & DP_AUX_I2C_WRITE_STATUS_UPDATE) != 0; payload.defer_delay = 0;
+ if (payload.write) { + memcpy(copy, msg->buffer, msg->size); + payload.data = copy; + } + result = dc_link_aux_transfer_raw(TO_DM_AUX(aux)->ddc_service, &payload, &operation_result);
@@ -104,9 +110,9 @@ static ssize_t dm_dp_aux_transfer(struct */ if (payload.write && result >= 0) { if (result) { - /*one byte indicating partially written bytes. Force 0 to retry*/ + /*one byte indicating partially written bytes*/ drm_info(adev_to_drm(adev), "amdgpu: AUX partially written\n"); - result = 0; + result = payload.data[0]; } else if (!payload.reply[0]) /*I2C_ACK|AUX_ACK*/ result = msg->size;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wayne Lin Wayne.Lin@amd.com
commit d33724ffb743d3d2698bd969e29253ae0cff9739 upstream.
It's expected that we'll encounter temporary exceptions during aux transactions. Adjust logging from drm_info to drm_dbg_dp to prevent flooding with unnecessary log messages.
Fixes: 3637e457eb00 ("drm/amd/display: Fix wrong handling for AUX_DEFER case") Cc: Mario Limonciello mario.limonciello@amd.com Cc: Alex Deucher alexander.deucher@amd.com Signed-off-by: Wayne Lin Wayne.Lin@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Link: https://lore.kernel.org/r/20250513032026.838036-1-Wayne.Lin@amd.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 9a9c3e1fe5256da14a0a307dff0478f90c55fc8c) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c @@ -111,7 +111,7 @@ static ssize_t dm_dp_aux_transfer(struct if (payload.write && result >= 0) { if (result) { /*one byte indicating partially written bytes*/ - drm_info(adev_to_drm(adev), "amdgpu: AUX partially written\n"); + drm_dbg_dp(adev_to_drm(adev), "amdgpu: AUX partially written\n"); result = payload.data[0]; } else if (!payload.reply[0]) /*I2C_ACK|AUX_ACK*/ @@ -137,11 +137,11 @@ static ssize_t dm_dp_aux_transfer(struct break; }
- drm_info(adev_to_drm(adev), "amdgpu: DP AUX transfer fail:%d\n", operation_result); + drm_dbg_dp(adev_to_drm(adev), "amdgpu: DP AUX transfer fail:%d\n", operation_result); }
if (payload.reply[0]) - drm_info(adev_to_drm(adev), "amdgpu: AUX reply command not ACK: 0x%02x.", + drm_dbg_dp(adev_to_drm(adev), "amdgpu: AUX reply command not ACK: 0x%02x.", payload.reply[0]);
return result;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeremy Linton jeremy.linton@arm.com
commit adfab6b39202481bb43286fff94def4953793fdb upstream.
The original PPTT code had a bug where the processor subtable length was not correctly validated when encountering a truncated acpi_pptt_processor node.
Commit 7ab4f0e37a0f4 ("ACPI PPTT: Fix coding mistakes in a couple of sizeof() calls") attempted to fix this by validating that the size is as large as the acpi_pptt_processor node structure. This introduced a regression where the last processor node in the PPTT table is ignored if it doesn't contain any private resources. That results in errors like:
ACPI PPTT: PPTT table found, but unable to locate core XX (XX) ACPI: SPE must be homogeneous
Furthermore, it fails in a common case where the node length isn't equal to the acpi_pptt_processor structure size, leaving the original bug in a modified form.
Correct the regression by adjusting the loop termination conditions as suggested by the bug reporters. An additional check, performed after the subtable node type is detected, validates that the acpi_pptt_processor node is fully contained in the PPTT table. Repeating the check in acpi_pptt_leaf_node() is largely redundant, as the node is already known to be fully contained in the table.
The case where a final truncated node's parent property is accepted, but the node itself is rejected should not be considered a bug.
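A concrete instance of the new length validation (illustrative values; proc_sz is sizeof(struct acpi_pptt_processor) as in the code below):

  /* A processor node advertising two private resources is accepted only if:
   *
   *     (unsigned long)entry + entry->length <= table_end        and
   *     entry->length == proc_sz + 2 * sizeof(u32)
   *
   * A trailing node truncated below that size no longer matches and is
   * skipped instead of being mis-parsed.
   */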
Fixes: 7ab4f0e37a0f4 ("ACPI PPTT: Fix coding mistakes in a couple of sizeof() calls") Reported-by: Maximilian Heyne mheyne@amazon.de Closes: https://lore.kernel.org/linux-acpi/20250506-draco-taped-15f475cd@mheyne-amaz... Reported-by: Yicong Yang yangyicong@hisilicon.com Closes: https://lore.kernel.org/linux-acpi/20250507035124.28071-1-yangyicong@huawei.... Signed-off-by: Jeremy Linton jeremy.linton@arm.com Tested-by: Yicong Yang yangyicong@hisilicon.com Reviewed-by: Sudeep Holla sudeep.holla@arm.com Tested-by: Maximilian Heyne mheyne@amazon.de Cc: All applicable stable@vger.kernel.org # 7ab4f0e37a0f4: ACPI PPTT: Fix coding mistakes ... Link: https://patch.msgid.link/20250508023025.1301030-1-jeremy.linton@arm.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/pptt.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
--- a/drivers/acpi/pptt.c +++ b/drivers/acpi/pptt.c @@ -219,16 +219,18 @@ static int acpi_pptt_leaf_node(struct ac sizeof(struct acpi_table_pptt)); proc_sz = sizeof(struct acpi_pptt_processor);
- while ((unsigned long)entry + proc_sz < table_end) { + /* ignore subtable types that are smaller than a processor node */ + while ((unsigned long)entry + proc_sz <= table_end) { cpu_node = (struct acpi_pptt_processor *)entry; + if (entry->type == ACPI_PPTT_TYPE_PROCESSOR && cpu_node->parent == node_entry) return 0; if (entry->length == 0) return 0; + entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry, entry->length); - } return 1; } @@ -261,15 +263,18 @@ static struct acpi_pptt_processor *acpi_ proc_sz = sizeof(struct acpi_pptt_processor);
/* find the processor structure associated with this cpuid */ - while ((unsigned long)entry + proc_sz < table_end) { + while ((unsigned long)entry + proc_sz <= table_end) { cpu_node = (struct acpi_pptt_processor *)entry;
if (entry->length == 0) { pr_warn("Invalid zero length subtable\n"); break; } + /* entry->length may not equal proc_sz, revalidate the processor structure length */ if (entry->type == ACPI_PPTT_TYPE_PROCESSOR && acpi_cpu_id == cpu_node->acpi_processor_id && + (unsigned long)entry + entry->length <= table_end && + entry->length == proc_sz + cpu_node->number_of_priv_resources * sizeof(u32) && acpi_pptt_leaf_node(table_hdr, cpu_node)) { return (struct acpi_pptt_processor *)entry; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wentao Liang vulab@iscas.ac.cn
commit 9e000f1b7f31684cc5927e034360b87ac7919593 upstream.
The function snd_es1968_capture_open() calls the function snd_pcm_hw_constraint_pow2(), but does not check its return value. A proper implementation can be found in snd_cx25821_pcm_open().
Add error handling for snd_pcm_hw_constraint_pow2() and propagate its error code.
Fixes: b942cf815b57 ("[ALSA] es1968 - Fix stuttering capture") Cc: stable@vger.kernel.org # v2.6.22 Signed-off-by: Wentao Liang vulab@iscas.ac.cn Link: https://patch.msgid.link/20250514092444.331-1-vulab@iscas.ac.cn Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/es1968.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/sound/pci/es1968.c +++ b/sound/pci/es1968.c @@ -1569,7 +1569,7 @@ static int snd_es1968_capture_open(struc struct snd_pcm_runtime *runtime = substream->runtime; struct es1968 *chip = snd_pcm_substream_chip(substream); struct esschan *es; - int apu1, apu2; + int err, apu1, apu2;
apu1 = snd_es1968_alloc_apu_pair(chip, ESM_APU_PCM_CAPTURE); if (apu1 < 0) @@ -1613,7 +1613,9 @@ static int snd_es1968_capture_open(struc runtime->hw = snd_es1968_capture; runtime->hw.buffer_bytes_max = runtime->hw.period_bytes_max = calc_available_memory_size(chip) - 1024; /* keep MIXBUF size */ - snd_pcm_hw_constraint_pow2(runtime, 0, SNDRV_PCM_HW_PARAM_BUFFER_BYTES); + err = snd_pcm_hw_constraint_pow2(runtime, 0, SNDRV_PCM_HW_PARAM_BUFFER_BYTES); + if (err < 0) + return err;
spin_lock_irq(&chip->substream_lock); list_add(&es->list, &chip->substream_list);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Heusel christian@heusel.eu
commit 2b24eb060c2bb9ef79e1d3bcf633ba1bc95215d6 upstream.
A user reported on the Arch Linux Forums that their device is emitting the following message in the kernel journal, which is fixed by adding the quirk as submitted in this patch:
> kernel: usb 1-2: current rate 8436480 is different from the runtime rate 48000
There is also an entry for this product line that was added a long time ago. Their specific device has the following ID:
$ lsusb | grep Audio Bus 001 Device 002: ID 1101:0003 EasyPass Industrial Co., Ltd Audioengine D1
Link: https://bbs.archlinux.org/viewtopic.php?id=305494 Fixes: 93f9d1a4ac593 ("ALSA: usb-audio: Apply sample rate quirk for Audioengine D1") Cc: stable@vger.kernel.org Signed-off-by: Christian Heusel christian@heusel.eu Link: https://patch.msgid.link/20250512-audioengine-quirk-addition-v1-1-4c370af6ef... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/quirks.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/usb/quirks.c +++ b/sound/usb/quirks.c @@ -2146,6 +2146,8 @@ static const struct usb_audio_quirk_flag QUIRK_FLAG_FIXED_RATE), DEVICE_FLG(0x0fd9, 0x0008, /* Hauppauge HVR-950Q */ QUIRK_FLAG_SHARE_MEDIA_DEVICE | QUIRK_FLAG_ALIGN_TRANSFER), + DEVICE_FLG(0x1101, 0x0003, /* Audioengine D1 */ + QUIRK_FLAG_GET_SAMPLE_RATE), DEVICE_FLG(0x1224, 0x2a25, /* Jieli Technology USB PHY 2.0 */ QUIRK_FLAG_GET_SAMPLE_RATE | QUIRK_FLAG_MIC_RES_16), DEVICE_FLG(0x1395, 0x740a, /* Sennheiser DECT */
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nicolas Chauvet kwizart@gmail.com
commit 7b9938a14460e8ec7649ca2e80ac0aae9815bf02 upstream.
Microdia JP001 does not support reading the sample rate which leads to many lines of "cannot get freq at ep 0x84". This patch adds the USB ID to quirks.c and avoids those error messages.
usb 7-4: New USB device found, idVendor=0c45, idProduct=636b, bcdDevice= 1.00 usb 7-4: New USB device strings: Mfr=2, Product=1, SerialNumber=3 usb 7-4: Product: JP001 usb 7-4: Manufacturer: JP001 usb 7-4: SerialNumber: JP001 usb 7-4: 3:1: cannot get freq at ep 0x84
Cc: stable@vger.kernel.org Signed-off-by: Nicolas Chauvet kwizart@gmail.com Link: https://patch.msgid.link/20250515102132.73062-1-kwizart@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/quirks.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/usb/quirks.c +++ b/sound/usb/quirks.c @@ -2138,6 +2138,8 @@ static const struct usb_audio_quirk_flag QUIRK_FLAG_CTL_MSG_DELAY_1M), DEVICE_FLG(0x0c45, 0x6340, /* Sonix HD USB Camera */ QUIRK_FLAG_GET_SAMPLE_RATE), + DEVICE_FLG(0x0c45, 0x636b, /* Microdia JP001 USB Camera */ + QUIRK_FLAG_GET_SAMPLE_RATE), DEVICE_FLG(0x0d8c, 0x0014, /* USB Audio Device */ QUIRK_FLAG_CTL_MSG_DELAY_1M), DEVICE_FLG(0x0ecb, 0x205c, /* JBL Quantum610 Wireless */
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hyejeong Choi hjeong.choi@samsung.com
commit 72c7d62583ebce7baeb61acce6057c361f73be4a upstream.
smp_store_mb() inserts a memory barrier after the store operation. This differs from what the comment originally intended, so a NULL pointer dereference can happen if the memory updates are reordered.
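For reference, a simplified view of the ordering difference (a sketch, not the exact architecture-specific definitions):

  /*
   *   smp_store_mb(var, val):   roughly WRITE_ONCE(var, val); smp_mb();
   *       the barrier sits *after* the store to var, so the earlier store of
   *       the fence pointer is not guaranteed to be visible first.
   *
   *   smp_wmb(); var = val;
   *       all prior stores (the fence pointer written by dma_resv_list_set())
   *       become visible before var (num_fences), which is what the original
   *       comment intended.
   */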
Signed-off-by: Hyejeong Choi hjeong.choi@samsung.com Fixes: a590d0fdbaa5 ("dma-buf: Update reservation shared_count after adding the new fence") CC: stable@vger.kernel.org Reviewed-by: Christian König christian.koenig@amd.com Link: https://lore.kernel.org/r/20250513020638.GA2329653@au1-maretx-p37.eng.sarc.s... Signed-off-by: Christian König christian.koenig@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma-buf/dma-resv.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -308,8 +308,9 @@ void dma_resv_add_fence(struct dma_resv count++;
dma_resv_list_set(fobj, i, fence, usage); - /* pointer update must be visible before we extend the num_fences */ - smp_store_mb(fobj->num_fences, count); + /* fence update must be visible before we extend the num_fences */ + smp_wmb(); + fobj->num_fences = count; } EXPORT_SYMBOL(dma_resv_add_fence);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Kelley mhklinux@outlook.com
commit 4f98616b855cb0e3b5917918bb07b44728eb96ea upstream.
netvsc currently uses vmbus_sendpacket_pagebuffer() to send VMBus messages. This function creates a series of GPA ranges, each of which contains a single PFN. However, if the rndis header in the VMBus message crosses a page boundary, the netvsc protocol with the host requires that both PFNs for the rndis header must be in a single "GPA range" data structure, which isn't possible with vmbus_sendpacket_pagebuffer(). As the first step in fixing this, add a new function netvsc_build_mpb_array() to build a VMBus message with multiple GPA ranges, each of which may contain multiple PFNs. Use vmbus_sendpacket_mpb_desc() to send this VMBus message to the host.
There's no functional change since higher levels of netvsc don't maintain or propagate knowledge of contiguous PFNs. Based on its input, netvsc_build_mpb_array() still produces a separate GPA range for each PFN and the behavior is the same as with vmbus_sendpacket_pagebuffer(). But the groundwork is laid for a subsequent patch to provide the necessary grouping.
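As a worked example of the helper's contract (illustrative values, with HV_HYP_PAGE_SIZE = 4096; input entries describing contiguous multi-page ranges only arrive with the follow-up patch):

  /* One pb[] entry whose data crosses a Hyper-V page boundary:
   *
   *     pb[i].offset = 3968, pb[i].len = 256, pb[i].pfn = N
   *
   *     HVPFN_UP(3968 + 256) = HVPFN_UP(4224) = 2
   *
   * netvsc_build_mpb_array() emits a single GPA range with offset 3968,
   * len 256 and the two sequential PFNs {N, N + 1} -- the layout Hyper-V
   * expects for an rndis header that straddles a page boundary.
   */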
Cc: stable@vger.kernel.org # 6.1.x Signed-off-by: Michael Kelley mhklinux@outlook.com Link: https://patch.msgid.link/20250513000604.1396-3-mhklinux@outlook.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/hyperv/netvsc.c | 50 +++++++++++++++++++++++++++++++++++++++----- 1 file changed, 45 insertions(+), 5 deletions(-)
--- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -1082,6 +1082,42 @@ static int netvsc_dma_map(struct hv_devi return 0; }
+/* Build an "array" of mpb entries describing the data to be transferred + * over VMBus. After the desc header fields, each "array" entry is variable + * size, and each entry starts after the end of the previous entry. The + * "offset" and "len" fields for each entry imply the size of the entry. + * + * The pfns are in HV_HYP_PAGE_SIZE, because all communication with Hyper-V + * uses that granularity, even if the system page size of the guest is larger. + * Each entry in the input "pb" array must describe a contiguous range of + * guest physical memory so that the pfns are sequential if the range crosses + * a page boundary. The offset field must be < HV_HYP_PAGE_SIZE. + */ +static inline void netvsc_build_mpb_array(struct hv_page_buffer *pb, + u32 page_buffer_count, + struct vmbus_packet_mpb_array *desc, + u32 *desc_size) +{ + struct hv_mpb_array *mpb_entry = &desc->range; + int i, j; + + for (i = 0; i < page_buffer_count; i++) { + u32 offset = pb[i].offset; + u32 len = pb[i].len; + + mpb_entry->offset = offset; + mpb_entry->len = len; + + for (j = 0; j < HVPFN_UP(offset + len); j++) + mpb_entry->pfn_array[j] = pb[i].pfn + j; + + mpb_entry = (struct hv_mpb_array *)&mpb_entry->pfn_array[j]; + } + + desc->rangecount = page_buffer_count; + *desc_size = (char *)mpb_entry - (char *)desc; +} + static inline int netvsc_send_pkt( struct hv_device *device, struct hv_netvsc_packet *packet, @@ -1124,6 +1160,9 @@ static inline int netvsc_send_pkt(
packet->dma_range = NULL; if (packet->page_buf_cnt) { + struct vmbus_channel_packet_page_buffer desc; + u32 desc_size; + if (packet->cp_partial) pb += packet->rmsg_pgcnt;
@@ -1133,11 +1172,12 @@ static inline int netvsc_send_pkt( goto exit; }
- ret = vmbus_sendpacket_pagebuffer(out_channel, - pb, packet->page_buf_cnt, - &nvmsg, sizeof(nvmsg), - req_id); - + netvsc_build_mpb_array(pb, packet->page_buf_cnt, + (struct vmbus_packet_mpb_array *)&desc, + &desc_size); + ret = vmbus_sendpacket_mpb_desc(out_channel, + (struct vmbus_packet_mpb_array *)&desc, + desc_size, &nvmsg, sizeof(nvmsg), req_id); if (ret) netvsc_dma_unmap(ndev_ctx->device_ctx, packet); } else {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Kelley mhklinux@outlook.com
commit 41a6328b2c55276f89ea3812069fd7521e348bbf upstream.
Starting with commit dca5161f9bd0 ("hv_netvsc: Check status in SEND_RNDIS_PKT completion message") in the 6.3 kernel, the Linux driver for Hyper-V synthetic networking (netvsc) occasionally reports "nvsp_rndis_pkt_complete error status: 2".[1] This error indicates that Hyper-V has rejected a network packet transmit request from the guest, and the outgoing network packet is dropped. Higher level network protocols presumably recover and resend the packet so there is no functional error, but performance is slightly impacted. Commit dca5161f9bd0 is not the cause of the error -- it only added reporting of an error that was already happening without any notice. The error has presumably been present since the netvsc driver was originally introduced into Linux.
The root cause of the problem is that the netvsc driver in Linux may send an incorrectly formatted VMBus message to Hyper-V when transmitting the network packet. The incorrect formatting occurs when the rndis header of the VMBus message crosses a page boundary due to how the Linux skb head memory is aligned. In such a case, two PFNs are required to describe the location of the rndis header, even though they are contiguous in guest physical address (GPA) space. Hyper-V requires that two rndis header PFNs be in a single "GPA range" data structure, but current netvsc code puts each PFN in its own GPA range, which Hyper-V rejects as an error.
The incorrect formatting occurs only for larger packets that netvsc must transmit via a VMBus "GPA Direct" message. There's no problem when netvsc transmits a smaller packet by copying it into a pre-allocated send buffer slot because the pre-allocated slots don't have page crossing issues.
After commit 14ad6ed30a10 ("net: allow small head cache usage with large MAX_SKB_FRAGS values") in the 6.14-rc4 kernel, the error occurs much more frequently in VMs with 16 or more vCPUs. It may occur every few seconds, or even more frequently, in an ssh session that outputs a lot of text. Commit 14ad6ed30a10 subtly changes how skb head memory is allocated, making it much more likely that the rndis header will cross a page boundary when the vCPU count is 16 or more. The changes in commit 14ad6ed30a10 are perfectly valid -- they just had the side effect of making the netvsc bug more prominent.
Current code in init_page_array() creates a separate page buffer array entry for each PFN required to identify the data to be transmitted. Contiguous PFNs get separate entries in the page buffer array, and any information about contiguity is lost.
Fix the core issue by having init_page_array() construct the page buffer array to represent contiguous ranges rather than individual pages. When these ranges are subsequently passed to netvsc_build_mpb_array(), it can build GPA ranges that contain multiple PFNs, as required to avoid the error "nvsp_rndis_pkt_complete error status: 2". If instead the network packet is sent by copying into a pre-allocated send buffer slot, the copy proceeds using the contiguous ranges rather than individual pages, but the result of the copying is the same. Also fix rndis_filter_send_request() to construct a contiguous range, since it has its own page buffer array.
This change has a side benefit in CoCo VMs in that netvsc_dma_map() calls dma_map_single() on each contiguous range instead of on each page. This results in fewer calls to dma_map_single() but on larger chunks of memory, which should reduce contention on the swiotlb.
Since the page buffer array now contains one entry for each contiguous range instead of for each individual page, the number of entries in the array can be reduced, saving 208 bytes of stack space in netvsc_xmit() when MAX_SKB_FRAGS has the default value of 17.
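The 208-byte figure can be checked from the definitions in the diff below (assuming MAX_PAGE_BUFFER_COUNT is 32 and sizeof(struct hv_page_buffer) is 16 bytes, as in current kernels):

  MAX_DATA_RANGES                           = MAX_SKB_FRAGS + 2 = 17 + 2 = 19
  entries removed from netvsc_xmit()'s pb[] = 32 - 19 = 13
  stack space saved                         = 13 * 16 = 208 bytes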
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217503
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217503 Cc: stable@vger.kernel.org # 6.1.x Signed-off-by: Michael Kelley mhklinux@outlook.com Link: https://patch.msgid.link/20250513000604.1396-4-mhklinux@outlook.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/hyperv/hyperv_net.h | 12 +++++++ drivers/net/hyperv/netvsc_drv.c | 65 +++++++++----------------------------- drivers/net/hyperv/rndis_filter.c | 24 ++------------ 3 files changed, 33 insertions(+), 68 deletions(-)
--- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -891,6 +891,18 @@ struct nvsp_message { sizeof(struct nvsp_message)) #define NETVSC_MIN_IN_MSG_SIZE sizeof(struct vmpacket_descriptor)
+/* Maximum # of contiguous data ranges that can make up a trasmitted packet. + * Typically it's the max SKB fragments plus 2 for the rndis packet and the + * linear portion of the SKB. But if MAX_SKB_FRAGS is large, the value may + * need to be limited to MAX_PAGE_BUFFER_COUNT, which is the max # of entries + * in a GPA direct packet sent to netvsp over VMBus. + */ +#if MAX_SKB_FRAGS + 2 < MAX_PAGE_BUFFER_COUNT +#define MAX_DATA_RANGES (MAX_SKB_FRAGS + 2) +#else +#define MAX_DATA_RANGES MAX_PAGE_BUFFER_COUNT +#endif + /* Estimated requestor size: * out_ring_size/min_out_msg_size + in_ring_size/min_in_msg_size */ --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -325,43 +325,10 @@ static u16 netvsc_select_queue(struct ne return txq; }
-static u32 fill_pg_buf(unsigned long hvpfn, u32 offset, u32 len, - struct hv_page_buffer *pb) -{ - int j = 0; - - hvpfn += offset >> HV_HYP_PAGE_SHIFT; - offset = offset & ~HV_HYP_PAGE_MASK; - - while (len > 0) { - unsigned long bytes; - - bytes = HV_HYP_PAGE_SIZE - offset; - if (bytes > len) - bytes = len; - pb[j].pfn = hvpfn; - pb[j].offset = offset; - pb[j].len = bytes; - - offset += bytes; - len -= bytes; - - if (offset == HV_HYP_PAGE_SIZE && len) { - hvpfn++; - offset = 0; - j++; - } - } - - return j + 1; -} - static u32 init_page_array(void *hdr, u32 len, struct sk_buff *skb, struct hv_netvsc_packet *packet, struct hv_page_buffer *pb) { - u32 slots_used = 0; - char *data = skb->data; int frags = skb_shinfo(skb)->nr_frags; int i;
@@ -370,28 +337,28 @@ static u32 init_page_array(void *hdr, u3 * 2. skb linear data * 3. skb fragment data */ - slots_used += fill_pg_buf(virt_to_hvpfn(hdr), - offset_in_hvpage(hdr), - len, - &pb[slots_used]);
+ pb[0].offset = offset_in_hvpage(hdr); + pb[0].len = len; + pb[0].pfn = virt_to_hvpfn(hdr); packet->rmsg_size = len; - packet->rmsg_pgcnt = slots_used; + packet->rmsg_pgcnt = 1;
- slots_used += fill_pg_buf(virt_to_hvpfn(data), - offset_in_hvpage(data), - skb_headlen(skb), - &pb[slots_used]); + pb[1].offset = offset_in_hvpage(skb->data); + pb[1].len = skb_headlen(skb); + pb[1].pfn = virt_to_hvpfn(skb->data);
for (i = 0; i < frags; i++) { skb_frag_t *frag = skb_shinfo(skb)->frags + i; - - slots_used += fill_pg_buf(page_to_hvpfn(skb_frag_page(frag)), - skb_frag_off(frag), - skb_frag_size(frag), - &pb[slots_used]); + struct hv_page_buffer *cur_pb = &pb[i + 2]; + u64 pfn = page_to_hvpfn(skb_frag_page(frag)); + u32 offset = skb_frag_off(frag); + + cur_pb->offset = offset_in_hvpage(offset); + cur_pb->len = skb_frag_size(frag); + cur_pb->pfn = pfn + (offset >> HV_HYP_PAGE_SHIFT); } - return slots_used; + return frags + 2; }
static int count_skb_frag_slots(struct sk_buff *skb) @@ -482,7 +449,7 @@ static int netvsc_xmit(struct sk_buff *s struct net_device *vf_netdev; u32 rndis_msg_size; u32 hash; - struct hv_page_buffer pb[MAX_PAGE_BUFFER_COUNT]; + struct hv_page_buffer pb[MAX_DATA_RANGES];
/* If VF is present and up then redirect packets to it. * Skip the VF if it is marked down or has no carrier. --- a/drivers/net/hyperv/rndis_filter.c +++ b/drivers/net/hyperv/rndis_filter.c @@ -225,8 +225,7 @@ static int rndis_filter_send_request(str struct rndis_request *req) { struct hv_netvsc_packet *packet; - struct hv_page_buffer page_buf[2]; - struct hv_page_buffer *pb = page_buf; + struct hv_page_buffer pb; int ret;
/* Setup the packet to send it */ @@ -235,27 +234,14 @@ static int rndis_filter_send_request(str packet->total_data_buflen = req->request_msg.msg_len; packet->page_buf_cnt = 1;
- pb[0].pfn = virt_to_phys(&req->request_msg) >> - HV_HYP_PAGE_SHIFT; - pb[0].len = req->request_msg.msg_len; - pb[0].offset = offset_in_hvpage(&req->request_msg); - - /* Add one page_buf when request_msg crossing page boundary */ - if (pb[0].offset + pb[0].len > HV_HYP_PAGE_SIZE) { - packet->page_buf_cnt++; - pb[0].len = HV_HYP_PAGE_SIZE - - pb[0].offset; - pb[1].pfn = virt_to_phys((void *)&req->request_msg - + pb[0].len) >> HV_HYP_PAGE_SHIFT; - pb[1].offset = 0; - pb[1].len = req->request_msg.msg_len - - pb[0].len; - } + pb.pfn = virt_to_phys(&req->request_msg) >> HV_HYP_PAGE_SHIFT; + pb.len = req->request_msg.msg_len; + pb.offset = offset_in_hvpage(&req->request_msg);
trace_rndis_send(dev->ndev, 0, &req->request_msg);
rcu_read_lock_bh(); - ret = netvsc_send(dev->ndev, packet, NULL, pb, NULL, false); + ret = netvsc_send(dev->ndev, packet, NULL, &pb, NULL, false); rcu_read_unlock_bh();
return ret;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Kelley mhklinux@outlook.com
commit 5bbc644bbf4e97a05bc0cb052189004588ff8a09 upstream.
init_page_array() now always creates a single page buffer array entry for the rndis message, even if the rndis message crosses a page boundary. As such, the number of page buffer array entries used for the rndis message must no longer be tracked -- it is always just 1. Remove the rmsg_pgcnt field and use "1" where the value is needed.
Cc: stable@vger.kernel.org # 6.1.x Signed-off-by: Michael Kelley mhklinux@outlook.com Link: https://patch.msgid.link/20250513000604.1396-5-mhklinux@outlook.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/hyperv/hyperv_net.h | 1 - drivers/net/hyperv/netvsc.c | 7 +++---- drivers/net/hyperv/netvsc_drv.c | 1 - 3 files changed, 3 insertions(+), 6 deletions(-)
--- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -156,7 +156,6 @@ struct hv_netvsc_packet { u8 cp_partial; /* partial copy into send buffer */
u8 rmsg_size; /* RNDIS header and PPI size */ - u8 rmsg_pgcnt; /* page count of RNDIS header and PPI */ u8 page_buf_cnt;
u16 q_idx; --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -980,8 +980,7 @@ static void netvsc_copy_to_send_buf(stru + pend_size; int i; u32 padding = 0; - u32 page_count = packet->cp_partial ? packet->rmsg_pgcnt : - packet->page_buf_cnt; + u32 page_count = packet->cp_partial ? 1 : packet->page_buf_cnt; u32 remain;
/* Add padding */ @@ -1164,7 +1163,7 @@ static inline int netvsc_send_pkt( u32 desc_size;
if (packet->cp_partial) - pb += packet->rmsg_pgcnt; + pb++;
ret = netvsc_dma_map(ndev_ctx->device_ctx, packet, pb); if (ret) { @@ -1326,7 +1325,7 @@ int netvsc_send(struct net_device *ndev, packet->send_buf_index = section_index;
if (packet->cp_partial) { - packet->page_buf_cnt -= packet->rmsg_pgcnt; + packet->page_buf_cnt--; packet->total_data_buflen = msd_len + packet->rmsg_size; } else { packet->page_buf_cnt = 0; --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -342,7 +342,6 @@ static u32 init_page_array(void *hdr, u3 pb[0].len = len; pb[0].pfn = virt_to_hvpfn(hdr); packet->rmsg_size = len; - packet->rmsg_pgcnt = 1;
pb[1].offset = offset_in_hvpage(skb->data); pb[1].len = skb_headlen(skb);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Kelley mhklinux@outlook.com
commit 380b75d3078626aadd0817de61f3143f5db6e393 upstream.
vmbus_sendpacket_mpb_desc() is currently used only by the storvsc driver and is hardcoded to create a single GPA range. To allow it to also be used by the netvsc driver to create multiple GPA ranges, no longer hardcode it as having a single GPA range. Allow the calling driver to specify the rangecount in the supplied descriptor.
Update the storvsc driver to reflect this new approach.
Cc: stable@vger.kernel.org # 6.1.x Signed-off-by: Michael Kelley mhklinux@outlook.com Link: https://patch.msgid.link/20250513000604.1396-2-mhklinux@outlook.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hv/channel.c | 6 +++--- drivers/scsi/storvsc_drv.c | 1 + 2 files changed, 4 insertions(+), 3 deletions(-)
--- a/drivers/hv/channel.c +++ b/drivers/hv/channel.c @@ -1172,9 +1172,10 @@ int vmbus_sendpacket_pagebuffer(struct v EXPORT_SYMBOL_GPL(vmbus_sendpacket_pagebuffer);
/* - * vmbus_sendpacket_multipagebuffer - Send a multi-page buffer packet + * vmbus_sendpacket_mpb_desc - Send one or more multi-page buffer packets * using a GPADL Direct packet type. - * The buffer includes the vmbus descriptor. + * The desc argument must include space for the VMBus descriptor. The + * rangecount field must already be set. */ int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel, struct vmbus_packet_mpb_array *desc, @@ -1196,7 +1197,6 @@ int vmbus_sendpacket_mpb_desc(struct vmb desc->length8 = (u16)(packetlen_aligned >> 3); desc->transactionid = VMBUS_RQST_ERROR; /* will be updated in hv_ringbuffer_write() */ desc->reserved = 0; - desc->rangecount = 1;
bufferlist[0].iov_base = desc; bufferlist[0].iov_len = desc_size; --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -1810,6 +1810,7 @@ static int storvsc_queuecommand(struct S return SCSI_MLQUEUE_DEVICE_BUSY; }
+ payload->rangecount = 1; payload->range.len = length; payload->range.offset = offset_in_hvpg;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Kelley mhklinux@outlook.com
commit 45a442fe369e6c4e0b4aa9f63b31c3f2f9e2090e upstream.
With the netvsc driver changed to use vmbus_sendpacket_mpb_desc() instead of vmbus_sendpacket_pagebuffer(), the latter has no remaining callers. Remove it.
Cc: stable@vger.kernel.org # 6.1.x Signed-off-by: Michael Kelley mhklinux@outlook.com Link: https://patch.msgid.link/20250513000604.1396-6-mhklinux@outlook.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hv/channel.c | 59 ------------------------------------------------- include/linux/hyperv.h | 7 ----- 2 files changed, 66 deletions(-)
--- a/drivers/hv/channel.c +++ b/drivers/hv/channel.c @@ -1113,65 +1113,6 @@ int vmbus_sendpacket(struct vmbus_channe EXPORT_SYMBOL(vmbus_sendpacket);
/* - * vmbus_sendpacket_pagebuffer - Send a range of single-page buffer - * packets using a GPADL Direct packet type. This interface allows you - * to control notifying the host. This will be useful for sending - * batched data. Also the sender can control the send flags - * explicitly. - */ -int vmbus_sendpacket_pagebuffer(struct vmbus_channel *channel, - struct hv_page_buffer pagebuffers[], - u32 pagecount, void *buffer, u32 bufferlen, - u64 requestid) -{ - int i; - struct vmbus_channel_packet_page_buffer desc; - u32 descsize; - u32 packetlen; - u32 packetlen_aligned; - struct kvec bufferlist[3]; - u64 aligned_data = 0; - - if (pagecount > MAX_PAGE_BUFFER_COUNT) - return -EINVAL; - - /* - * Adjust the size down since vmbus_channel_packet_page_buffer is the - * largest size we support - */ - descsize = sizeof(struct vmbus_channel_packet_page_buffer) - - ((MAX_PAGE_BUFFER_COUNT - pagecount) * - sizeof(struct hv_page_buffer)); - packetlen = descsize + bufferlen; - packetlen_aligned = ALIGN(packetlen, sizeof(u64)); - - /* Setup the descriptor */ - desc.type = VM_PKT_DATA_USING_GPA_DIRECT; - desc.flags = VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED; - desc.dataoffset8 = descsize >> 3; /* in 8-bytes granularity */ - desc.length8 = (u16)(packetlen_aligned >> 3); - desc.transactionid = VMBUS_RQST_ERROR; /* will be updated in hv_ringbuffer_write() */ - desc.reserved = 0; - desc.rangecount = pagecount; - - for (i = 0; i < pagecount; i++) { - desc.range[i].len = pagebuffers[i].len; - desc.range[i].offset = pagebuffers[i].offset; - desc.range[i].pfn = pagebuffers[i].pfn; - } - - bufferlist[0].iov_base = &desc; - bufferlist[0].iov_len = descsize; - bufferlist[1].iov_base = buffer; - bufferlist[1].iov_len = bufferlen; - bufferlist[2].iov_base = &aligned_data; - bufferlist[2].iov_len = (packetlen_aligned - packetlen); - - return hv_ringbuffer_write(channel, bufferlist, 3, requestid, NULL); -} -EXPORT_SYMBOL_GPL(vmbus_sendpacket_pagebuffer); - -/* * vmbus_sendpacket_mpb_desc - Send one or more multi-page buffer packets * using a GPADL Direct packet type. * The desc argument must include space for the VMBus descriptor. The --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1224,13 +1224,6 @@ extern int vmbus_sendpacket(struct vmbus enum vmbus_packet_type type, u32 flags);
-extern int vmbus_sendpacket_pagebuffer(struct vmbus_channel *channel, - struct hv_page_buffer pagebuffers[], - u32 pagecount, - void *buffer, - u32 bufferlen, - u64 requestid); - extern int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel, struct vmbus_packet_mpb_array *mpb, u32 desc_size,
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: pengdonglin pengdonglin@xiaomi.com
commit e333332657f615ac2b55aa35565c4a882018bbe9 upstream.
When using the stacktrace trigger command to trace syscalls, the preemption count was consistently reported as 1 when the system call event itself had 0 (".").
For example:
root@ubuntu22-vm:/sys/kernel/tracing/events/syscalls/sys_enter_read
$ echo stacktrace > trigger
$ echo 1 > enable
sshd-416 [002] ..... 232.864910: sys_read(fd: a, buf: 556b1f3221d0, count: 8000)
sshd-416 [002] ...1. 232.864913: <stack trace>
 => ftrace_syscall_enter
 => syscall_trace_enter
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe
The root cause is that the trace framework disables preemption in __DO_TRACE before invoking the trigger callback.
Use tracing_gen_ctx_dec(), which accounts for the preemption count increase that __DO_TRACE adds when calling the callback. The result is the accurate reporting of:
sshd-410 [004] ..... 210.117660: sys_read(fd: 4, buf: 559b725ba130, count: 40000)
sshd-410 [004] ..... 210.117662: <stack trace>
 => ftrace_syscall_enter
 => syscall_trace_enter
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe
Cc: stable@vger.kernel.org Fixes: ce33c845b030c ("tracing: Dump stacktrace trigger to the corresponding instance") Link: https://lore.kernel.org/20250512094246.1167956-1-dolinux.peng@gmail.com Signed-off-by: pengdonglin dolinux.peng@gmail.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace_events_trigger.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/trace/trace_events_trigger.c +++ b/kernel/trace/trace_events_trigger.c @@ -1539,7 +1539,7 @@ stacktrace_trigger(struct event_trigger_ struct trace_event_file *file = data->private_data;
if (file) - __trace_stack(file->tr, tracing_gen_ctx(), STACK_SKIP); + __trace_stack(file->tr, tracing_gen_ctx_dec(), STACK_SKIP); else trace_dump_stack(STACK_SKIP); }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: pengdonglin pengdonglin@xiaomi.com
commit 11aff32439df6ca5b3b891b43032faf88f4a6a29 upstream.
The preemption count of the stacktrace filter command to trace ksys_read is consistently incorrect:
$ echo ksys_read:stacktrace > set_ftrace_filter
<...>-453 [004] ...1. 38.308956: <stack trace>
 => ksys_read
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe
The root cause is that the trace framework disables preemption when invoking the filter command callback in function_trace_probe_call:
preempt_disable_notrace();
probe_ops->func(ip, parent_ip, probe->tr, probe_ops, probe->data);
preempt_enable_notrace();
Use tracing_gen_ctx_dec() to account for the preempt_disable_notrace(), which will output the correct preemption count:
$ echo ksys_read:stacktrace > set_ftrace_filter
<...>-410 [006] ..... 31.420396: <stack trace>
 => ksys_read
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe
Cc: stable@vger.kernel.org Fixes: 36590c50b2d07 ("tracing: Merge irqflags + preempt counter.") Link: https://lore.kernel.org/20250512094246.1167956-2-dolinux.peng@gmail.com Signed-off-by: pengdonglin dolinux.peng@gmail.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace_functions.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)
--- a/kernel/trace/trace_functions.c +++ b/kernel/trace/trace_functions.c @@ -561,11 +561,7 @@ ftrace_traceoff(unsigned long ip, unsign
static __always_inline void trace_stack(struct trace_array *tr) { - unsigned int trace_ctx; - - trace_ctx = tracing_gen_ctx(); - - __trace_stack(tr, trace_ctx, FTRACE_STACK_SKIP); + __trace_stack(tr, tracing_gen_ctx_dec(), FTRACE_STACK_SKIP); }
static void
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt rostedt@goodmis.org
commit 1b0c192c92ea1fe2dcb178f84adf15fe37c3e7c8 upstream.
When using trace_array_printk() on a created instance, the correct function to use to initialize it is:
trace_array_init_printk()
Not
trace_printk_init_buffers()
The former is the proper function to use; the latter is for initializing trace_printk() and causes the NOTICE banner to be displayed.
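As a rough usage sketch only (assuming the 6.1 instance API; this is not code taken from the sample module), the intended flow in a module looks like:

	struct trace_array *tr;

	tr = trace_array_get_by_name("sample-instance");
	if (!tr)
		return -ENOMEM;

	/* Per-instance trace_printk() buffers; no global NOTICE banner. */
	if (trace_array_init_printk(tr))
		return -EINVAL;

	trace_array_printk(tr, _THIS_IP_, "hello from the instance\n");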
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Divya Indi divya.indi@oracle.com Link: https://lore.kernel.org/20250509152657.0f6744d9@gandalf.local.home Fixes: 89ed42495ef4a ("tracing: Sample module to demonstrate kernel access to Ftrace instances.") Fixes: 38ce2a9e33db6 ("tracing: Add trace_array_init_printk() to initialize instance trace_printk() buffers") Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- samples/ftrace/sample-trace-array.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/samples/ftrace/sample-trace-array.c +++ b/samples/ftrace/sample-trace-array.c @@ -112,7 +112,7 @@ static int __init sample_trace_array_ini /* * If context specific per-cpu buffers havent already been allocated. */ - trace_printk_init_buffers(); + trace_array_init_printk(tr);
simple_tsk = kthread_run(simple_thread, NULL, "sample-instance"); if (IS_ERR(simple_tsk)) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ma Ke make24@iscas.ac.cn
commit b2ea5f49580c0762d17d80d8083cb89bc3acf74f upstream.
If device_add() fails, do not use device_unregister() for error handling. device_unregister() consists of two functions: device_del() and put_device(). device_unregister() should only be called after device_add() succeeds, because device_del() undoes what device_add() does if it was successful. Change the device_unregister() call to put_device() before returning from the function.
As the comment of device_add() says, 'if device_add() succeeds, you should call device_del() when you want to get rid of it. If device_add() has not succeeded, use only put_device() to drop the reference count'.
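Restated as a minimal sketch of that rule (illustrative only; it mirrors the pattern the patch applies rather than being the driver's exact code):

	device_initialize(&port->dev);		/* takes the initial reference */

	err = dev_set_name(&port->dev, "%s-%u", name, index);
	if (err < 0)
		goto put_device;

	err = device_add(&port->dev);
	if (err < 0)
		goto put_device;		/* device_add() failed: no device_del() */

	return 0;

put_device:
	put_device(&port->dev);			/* only drop the reference count */
	return err;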
Found by code review.
Cc: stable@vger.kernel.org Fixes: 53d2a715c240 ("phy: Add Tegra XUSB pad controller support") Signed-off-by: Ma Ke make24@iscas.ac.cn Acked-by: Thierry Reding treding@nvidia.com Link: https://lore.kernel.org/r/20250303072739.3874987-1-make24@iscas.ac.cn Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/phy/tegra/xusb.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/phy/tegra/xusb.c +++ b/drivers/phy/tegra/xusb.c @@ -542,16 +542,16 @@ static int tegra_xusb_port_init(struct t
err = dev_set_name(&port->dev, "%s-%u", name, index); if (err < 0) - goto unregister; + goto put_device;
err = device_add(&port->dev); if (err < 0) - goto unregister; + goto put_device;
return 0;
-unregister: - device_unregister(&port->dev); +put_device: + put_device(&port->dev); return err; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Claudiu Beznea claudiu.beznea.uj@bp.renesas.com
commit 54c4c58713aaff76c2422ff5750e557ab3b100d7 upstream.
It has been observed on the Renesas RZ/G3S SoC that unbinding and binding the PHY driver leads to role autodetection failures. This issue occurs when PHY 3 is the first initialized PHY. PHY 3 does not have an interrupt associated with the USB2_INT_ENABLE register (as rcar_gen3_int_enable[3] = 0). As a result, rcar_gen3_init_otg() is called to initialize OTG without enabling PHY interrupts.
To resolve this, add rcar_gen3_is_any_otg_rphy_initialized() and call it in role_store(), role_show(), and rcar_gen3_init_otg(). At the same time, rcar_gen3_init_otg() is only called when initialization for a PHY with interrupt bits is in progress. As a result, the struct rcar_gen3_phy::otg_initialized is no longer needed.
Fixes: 549b6b55b005 ("phy: renesas: rcar-gen3-usb2: enable/disable independent irqs") Cc: stable@vger.kernel.org Reviewed-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Tested-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Reviewed-by: Lad Prabhakar prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Claudiu Beznea claudiu.beznea.uj@bp.renesas.com Link: https://lore.kernel.org/r/20250507125032.565017-2-claudiu.beznea.uj@bp.renes... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/phy/renesas/phy-rcar-gen3-usb2.c | 33 +++++++++++++------------------ 1 file changed, 14 insertions(+), 19 deletions(-)
--- a/drivers/phy/renesas/phy-rcar-gen3-usb2.c +++ b/drivers/phy/renesas/phy-rcar-gen3-usb2.c @@ -103,7 +103,6 @@ struct rcar_gen3_phy { struct rcar_gen3_chan *ch; u32 int_enable_bits; bool initialized; - bool otg_initialized; bool powered; };
@@ -311,16 +310,15 @@ static bool rcar_gen3_is_any_rphy_initia return false; }
-static bool rcar_gen3_needs_init_otg(struct rcar_gen3_chan *ch) +static bool rcar_gen3_is_any_otg_rphy_initialized(struct rcar_gen3_chan *ch) { - int i; - - for (i = 0; i < NUM_OF_PHYS; i++) { - if (ch->rphys[i].otg_initialized) - return false; + for (enum rcar_gen3_phy_index i = PHY_INDEX_BOTH_HC; i <= PHY_INDEX_EHCI; + i++) { + if (ch->rphys[i].initialized) + return true; }
- return true; + return false; }
static bool rcar_gen3_are_all_rphys_power_off(struct rcar_gen3_chan *ch) @@ -342,7 +340,7 @@ static ssize_t role_store(struct device bool is_b_device; enum phy_mode cur_mode, new_mode;
- if (!ch->is_otg_channel || !rcar_gen3_is_any_rphy_initialized(ch)) + if (!ch->is_otg_channel || !rcar_gen3_is_any_otg_rphy_initialized(ch)) return -EIO;
if (sysfs_streq(buf, "host")) @@ -380,7 +378,7 @@ static ssize_t role_show(struct device * { struct rcar_gen3_chan *ch = dev_get_drvdata(dev);
- if (!ch->is_otg_channel || !rcar_gen3_is_any_rphy_initialized(ch)) + if (!ch->is_otg_channel || !rcar_gen3_is_any_otg_rphy_initialized(ch)) return -EIO;
return sprintf(buf, "%s\n", rcar_gen3_is_host(ch) ? "host" : @@ -393,6 +391,9 @@ static void rcar_gen3_init_otg(struct rc void __iomem *usb2_base = ch->base; u32 val;
+ if (!ch->is_otg_channel || rcar_gen3_is_any_otg_rphy_initialized(ch)) + return; + /* Should not use functions of read-modify-write a register */ val = readl(usb2_base + USB2_LINECTRL1); val = (val & ~USB2_LINECTRL1_DP_RPD) | USB2_LINECTRL1_DPRPD_EN | @@ -456,12 +457,9 @@ static int rcar_gen3_phy_usb2_init(struc writel(USB2_SPD_RSM_TIMSET_INIT, usb2_base + USB2_SPD_RSM_TIMSET); writel(USB2_OC_TIMSET_INIT, usb2_base + USB2_OC_TIMSET);
- /* Initialize otg part */ - if (channel->is_otg_channel) { - if (rcar_gen3_needs_init_otg(channel)) - rcar_gen3_init_otg(channel); - rphy->otg_initialized = true; - } + /* Initialize otg part (only if we initialize a PHY with IRQs). */ + if (rphy->int_enable_bits) + rcar_gen3_init_otg(channel);
rphy->initialized = true;
@@ -477,9 +475,6 @@ static int rcar_gen3_phy_usb2_exit(struc
rphy->initialized = false;
- if (channel->is_otg_channel) - rphy->otg_initialized = false; - val = readl(usb2_base + USB2_INT_ENABLE); val &= ~rphy->int_enable_bits; if (!rcar_gen3_is_any_rphy_initialized(channel))
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Claudiu Beznea claudiu.beznea.uj@bp.renesas.com
commit 86e70849f4b2b4597ac9f7c7931f2a363774be25 upstream.
The phy-rcar-gen3-usb2 driver exports 4 PHYs. The timing registers are common to all PHYs, so there is no need to set them every time a PHY is initialized. Set the timing registers only when the first PHY is initialized.
Fixes: f3b5a8d9b50d ("phy: rcar-gen3-usb2: Add R-Car Gen3 USB2 PHY driver") Cc: stable@vger.kernel.org Reviewed-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Tested-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Reviewed-by: Lad Prabhakar prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Claudiu Beznea claudiu.beznea.uj@bp.renesas.com Link: https://lore.kernel.org/r/20250507125032.565017-6-claudiu.beznea.uj@bp.renes... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/phy/renesas/phy-rcar-gen3-usb2.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/drivers/phy/renesas/phy-rcar-gen3-usb2.c +++ b/drivers/phy/renesas/phy-rcar-gen3-usb2.c @@ -454,8 +454,11 @@ static int rcar_gen3_phy_usb2_init(struc val = readl(usb2_base + USB2_INT_ENABLE); val |= USB2_INT_ENABLE_UCOM_INTEN | rphy->int_enable_bits; writel(val, usb2_base + USB2_INT_ENABLE); - writel(USB2_SPD_RSM_TIMSET_INIT, usb2_base + USB2_SPD_RSM_TIMSET); - writel(USB2_OC_TIMSET_INIT, usb2_base + USB2_OC_TIMSET); + + if (!rcar_gen3_is_any_rphy_initialized(channel)) { + writel(USB2_SPD_RSM_TIMSET_INIT, usb2_base + USB2_SPD_RSM_TIMSET); + writel(USB2_OC_TIMSET_INIT, usb2_base + USB2_OC_TIMSET); + }
/* Initialize otg part (only if we initialize a PHY with IRQs). */ if (rphy->int_enable_bits)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steve Siwinski ssiwinski@atto.com
commit e8007fad5457ea547ca63bb011fdb03213571c7e upstream.
The REPORT ZONES buffer size is currently limited by the HBA's maximum segment count to ensure the buffer can be mapped. However, the block layer further limits the number of iovec entries to 1024 when allocating a bio.
To avoid allocation of buffers too large to be mapped, further restrict the maximum buffer size to BIO_MAX_INLINE_VECS.
Replace the UIO_MAXIOV symbolic name with the more contextually appropriate BIO_MAX_INLINE_VECS.
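For scale: BIO_MAX_INLINE_VECS is defined in this patch as UIO_MAXIOV, i.e. 1024 vectors, so assuming 4 KiB pages the report buffer allocation is now capped at 1024 << PAGE_SHIFT = 4 MiB, even on HBAs that advertise a larger segment count.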
Fixes: b091ac616846 ("sd_zbc: Fix report zones buffer allocation") Cc: stable@vger.kernel.org Signed-off-by: Steve Siwinski ssiwinski@atto.com Link: https://lore.kernel.org/r/20250508200122.243129-1-ssiwinski@atto.com Reviewed-by: Damien Le Moal dlemoal@kernel.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- block/bio.c | 2 +- drivers/scsi/sd_zbc.c | 6 +++++- include/linux/bio.h | 1 + 3 files changed, 7 insertions(+), 2 deletions(-)
--- a/block/bio.c +++ b/block/bio.c @@ -576,7 +576,7 @@ struct bio *bio_kmalloc(unsigned short n { struct bio *bio;
- if (nr_vecs > UIO_MAXIOV) + if (nr_vecs > BIO_MAX_INLINE_VECS) return NULL; return kmalloc(struct_size(bio, bi_inline_vecs, nr_vecs), gfp_mask); } --- a/drivers/scsi/sd_zbc.c +++ b/drivers/scsi/sd_zbc.c @@ -197,6 +197,7 @@ static void *sd_zbc_alloc_report_buffer( unsigned int nr_zones, size_t *buflen) { struct request_queue *q = sdkp->disk->queue; + unsigned int max_segments; size_t bufsize; void *buf;
@@ -208,12 +209,15 @@ static void *sd_zbc_alloc_report_buffer( * Furthermore, since the report zone command cannot be split, make * sure that the allocated buffer can always be mapped by limiting the * number of pages allocated to the HBA max segments limit. + * Since max segments can be larger than the max inline bio vectors, + * further limit the allocated buffer to BIO_MAX_INLINE_VECS. */ nr_zones = min(nr_zones, sdkp->zone_info.nr_zones); bufsize = roundup((nr_zones + 1) * 64, SECTOR_SIZE); bufsize = min_t(size_t, bufsize, queue_max_hw_sectors(q) << SECTOR_SHIFT); - bufsize = min_t(size_t, bufsize, queue_max_segments(q) << PAGE_SHIFT); + max_segments = min(BIO_MAX_INLINE_VECS, queue_max_segments(q)); + bufsize = min_t(size_t, bufsize, max_segments << PAGE_SHIFT);
while (bufsize >= SECTOR_SIZE) { buf = kvzalloc(bufsize, GFP_KERNEL | __GFP_NORETRY); --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -11,6 +11,7 @@ #include <linux/uio.h>
#define BIO_MAX_VECS 256U +#define BIO_MAX_INLINE_VECS UIO_MAXIOV
static inline unsigned int bio_max_segs(unsigned int nr_segs) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jethro Donaldson devel@jro.nz
commit 1fe4a44b7fa3955bcb7b4067c07b778fe90d8ee7 upstream.
The response buffer for the CREATE request handled by smb311_posix_mkdir() is leaked on the error path (goto err_free_rsp_buf) because the structure pointer *rsp passed to free_rsp_buf() is not assigned until *after* the error condition is checked.
As *rsp is initialised to NULL, free_rsp_buf() becomes a no-op and the leak is instead reported by __kmem_cache_shutdown() upon subsequent rmmod of cifs.ko if (and only if) the error path has been hit.
Pass rsp_iov.iov_base to free_rsp_buf() instead, similar to the code in other functions in smb2pdu.c for which *rsp is assigned late.
Cc: stable@vger.kernel.org Signed-off-by: Jethro Donaldson devel@jro.nz Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/smb2pdu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -2826,7 +2826,7 @@ int smb311_posix_mkdir(const unsigned in /* Eventually save off posix specific response info and timestaps */
err_free_rsp_buf: - free_rsp_buf(resp_buftype, rsp); + free_rsp_buf(resp_buftype, rsp_iov.iov_base); kfree(pc_buf); err_free_req: cifs_small_buf_release(req);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fedor Pchelkin pchelkin@ispras.ru
commit 78ab4be549533432d97ea8989d2f00b508fa68d8 upstream.
A warning on driver removal started occurring after commit 9dd05df8403b ("net: warn if NAPI instance wasn't shut down"). Disable tx napi before deleting it in mt76_dma_cleanup().
WARNING: CPU: 4 PID: 18828 at net/core/dev.c:7288 __netif_napi_del_locked+0xf0/0x100 CPU: 4 UID: 0 PID: 18828 Comm: modprobe Not tainted 6.15.0-rc4 #4 PREEMPT(lazy) Hardware name: ASUS System Product Name/PRIME X670E-PRO WIFI, BIOS 3035 09/05/2024 RIP: 0010:__netif_napi_del_locked+0xf0/0x100 Call Trace: <TASK> mt76_dma_cleanup+0x54/0x2f0 [mt76] mt7921_pci_remove+0xd5/0x190 [mt7921e] pci_device_remove+0x47/0xc0 device_release_driver_internal+0x19e/0x200 driver_detach+0x48/0x90 bus_remove_driver+0x6d/0xf0 pci_unregister_driver+0x2e/0xb0 __do_sys_delete_module.isra.0+0x197/0x2e0 do_syscall_64+0x7b/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e
Tested with mt7921e but the same pattern can be actually applied to other mt76 drivers calling mt76_dma_cleanup() during removal. Tx napi is enabled in their *_dma_init() functions and only toggled off and on again inside their suspend/resume/reset paths. So it should be okay to disable tx napi in such a generic way.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 2ac515a5d74f ("mt76: mt76x02: use napi polling for tx cleanup") Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin pchelkin@ispras.ru Tested-by: Ming Yen Hsieh mingyen.hsieh@mediatek.com Link: https://patch.msgid.link/20250506115540.19045-1-pchelkin@ispras.ru Signed-off-by: Felix Fietkau nbd@nbd.name Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/mediatek/mt76/dma.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/net/wireless/mediatek/mt76/dma.c +++ b/drivers/net/wireless/mediatek/mt76/dma.c @@ -789,6 +789,7 @@ void mt76_dma_cleanup(struct mt76_dev *d int i;
mt76_worker_disable(&dev->tx_worker); + napi_disable(&dev->tx_napi); netif_napi_del(&dev->tx_napi);
for (i = 0; i < ARRAY_SIZE(dev->phys); i++) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nathan Chancellor nathan@kernel.org
commit 6b3ab7f2cbfaeb6580709cd8ef4d72cfd01bfde4 upstream.
After a recent change [1] in clang's randstruct implementation to randomize structures that only contain function pointers, there is an error because qede_ll_ops gets randomized but does not use a designated initializer for the first member:
drivers/net/ethernet/qlogic/qede/qede_main.c:206:2: error: a randomized struct can only be initialized with a designated initializer
  206 |         {
      |         ^
Explicitly initialize the common member using a designated initializer to fix the build.
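A minimal, hypothetical illustration of the rule (the struct and function names below are invented, not taken from qede):

struct common_cb_ops {
	void (*link_update)(void *dev);		/* only function pointers => randomized layout */
};

struct eth_cb_ops {
	struct common_cb_ops common;		/* first member */
};

static void my_link_update(void *dev) { }

/* A positional initializer such as { { my_link_update } } is rejected once
 * common_cb_ops gets a randomized layout; naming the members is required: */
static struct eth_cb_ops my_ops = {
	.common = {
		.link_update = my_link_update,
	},
};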
Cc: stable@vger.kernel.org Fixes: 035f7f87b729 ("randstruct: Enable Clang support") Link: https://github.com/llvm/llvm-project/commit/04364fb888eea6db9811510607bed4b2... [1] Signed-off-by: Nathan Chancellor nathan@kernel.org Link: https://patch.msgid.link/20250507-qede-fix-clang-randstruct-v1-1-5ccc15626fb... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/qlogic/qede/qede_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -204,7 +204,7 @@ static struct pci_driver qede_pci_driver };
static struct qed_eth_cb_ops qede_ll_ops = { - { + .common = { #ifdef CONFIG_RFS_ACCEL .arfs_filter_op = qede_arfs_filter_op, #endif
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ronald Wahl ronald.wahl@legrand.com
commit fca280992af8c2fbd511bc43f65abb4a17363f2f upstream.
Recent kernels complain about a missing lock in k3-udma.c when the lock validator is enabled:
[ 4.128073] WARNING: CPU: 0 PID: 746 at drivers/dma/ti/../virt-dma.h:169 udma_start.isra.0+0x34/0x238 [ 4.137352] CPU: 0 UID: 0 PID: 746 Comm: kworker/0:3 Not tainted 6.12.9-arm64 #28 [ 4.144867] Hardware name: pp-v12 (DT) [ 4.148648] Workqueue: events udma_check_tx_completion [ 4.153841] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 4.160834] pc : udma_start.isra.0+0x34/0x238 [ 4.165227] lr : udma_start.isra.0+0x30/0x238 [ 4.169618] sp : ffffffc083cabcf0 [ 4.172963] x29: ffffffc083cabcf0 x28: 0000000000000000 x27: ffffff800001b005 [ 4.180167] x26: ffffffc0812f0000 x25: 0000000000000000 x24: 0000000000000000 [ 4.187370] x23: 0000000000000001 x22: 00000000e21eabe9 x21: ffffff8000fa0670 [ 4.194571] x20: ffffff8001b6bf00 x19: ffffff8000fa0430 x18: ffffffc083b95030 [ 4.201773] x17: 0000000000000000 x16: 00000000f0000000 x15: 0000000000000048 [ 4.208976] x14: 0000000000000048 x13: 0000000000000000 x12: 0000000000000001 [ 4.216179] x11: ffffffc08151a240 x10: 0000000000003ea1 x9 : ffffffc08046ab68 [ 4.223381] x8 : ffffffc083cabac0 x7 : ffffffc081df3718 x6 : 0000000000029fc8 [ 4.230583] x5 : ffffffc0817ee6d8 x4 : 0000000000000bc0 x3 : 0000000000000000 [ 4.237784] x2 : 0000000000000000 x1 : 00000000001fffff x0 : 0000000000000000 [ 4.244986] Call trace: [ 4.247463] udma_start.isra.0+0x34/0x238 [ 4.251509] udma_check_tx_completion+0xd0/0xdc [ 4.256076] process_one_work+0x244/0x3fc [ 4.260129] process_scheduled_works+0x6c/0x74 [ 4.264610] worker_thread+0x150/0x1dc [ 4.268398] kthread+0xd8/0xe8 [ 4.271492] ret_from_fork+0x10/0x20 [ 4.275107] irq event stamp: 220 [ 4.278363] hardirqs last enabled at (219): [<ffffffc080a27c7c>] _raw_spin_unlock_irq+0x38/0x50 [ 4.287183] hardirqs last disabled at (220): [<ffffffc080a1c154>] el1_dbg+0x24/0x50 [ 4.294879] softirqs last enabled at (182): [<ffffffc080037e68>] handle_softirqs+0x1c0/0x3cc [ 4.303437] softirqs last disabled at (177): [<ffffffc080010170>] __do_softirq+0x1c/0x28 [ 4.311559] ---[ end trace 0000000000000000 ]---
This commit adds the missing locking.
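In outline, the fixed work handler now looks roughly like this (a sketch, not the full function; need_more_waiting stands in for the driver's residue/timestamp checks). The lock is dropped before the delay because a spinlock cannot be held across a sleeping call such as usleep_range():

	unsigned long flags;

	while (1) {
		spin_lock_irqsave(&uc->vc.lock, flags);

		/* ... check uc->desc, residue and time stamps under the lock ... */

		if (need_more_waiting) {	/* placeholder condition */
			spin_unlock_irqrestore(&uc->vc.lock, flags);
			usleep_range(ktime_to_us(delay), ktime_to_us(delay) + 10);
			continue;
		}

		/* ... complete the descriptor or restart the channel ... */
		break;
	}

	spin_unlock_irqrestore(&uc->vc.lock, flags);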
Fixes: 25dcb5dd7b7c ("dmaengine: ti: New driver for K3 UDMA") Cc: Peter Ujfalusi peter.ujfalusi@gmail.com Cc: Vignesh Raghavendra vigneshr@ti.com Cc: Vinod Koul vkoul@kernel.org Cc: dmaengine@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Ronald Wahl ronald.wahl@legrand.com Acked-by: Peter Ujfalusi peter.ujfalusi@gmail.com Link: https://lore.kernel.org/r/20250414173113.80677-1-rwahl@gmx.de Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/ti/k3-udma.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/dma/ti/k3-udma.c +++ b/drivers/dma/ti/k3-udma.c @@ -1088,8 +1088,11 @@ static void udma_check_tx_completion(str u32 residue_diff; ktime_t time_diff; unsigned long delay; + unsigned long flags;
while (1) { + spin_lock_irqsave(&uc->vc.lock, flags); + if (uc->desc) { /* Get previous residue and time stamp */ residue_diff = uc->tx_drain.residue; @@ -1124,6 +1127,8 @@ static void udma_check_tx_completion(str break; }
+ spin_unlock_irqrestore(&uc->vc.lock, flags); + usleep_range(ktime_to_us(delay), ktime_to_us(delay) + 10); continue; @@ -1140,6 +1145,8 @@ static void udma_check_tx_completion(str
break; } + + spin_unlock_irqrestore(&uc->vc.lock, flags); }
static irqreturn_t udma_ring_irq_handler(int irq, void *data)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yemike Abhilash Chandra y-abhilashchandra@ti.com
commit 8ca9590c39b69b55a8de63d2b21b0d44f523b43a upstream.
Currently, a local dma_cap_mask_t variable is used to store device cap_mask within udma_of_xlate(). However, the DMA_PRIVATE flag in the device cap_mask can get cleared when the last channel is released. This can happen right after storing the cap_mask locally in udma_of_xlate(), and subsequent dma_request_channel() can fail due to mismatch in the cap_mask. Fix this by removing the local dma_cap_mask_t variable and directly using the one from the dma_device structure.
Fixes: 25dcb5dd7b7c ("dmaengine: ti: New driver for K3 UDMA") Cc: stable@vger.kernel.org Signed-off-by: Vaishnav Achath vaishnav.a@ti.com Acked-by: Peter Ujfalusi peter.ujfalusi@gmail.com Reviewed-by: Udit Kumar u-kumar1@ti.com Signed-off-by: Yemike Abhilash Chandra y-abhilashchandra@ti.com Link: https://lore.kernel.org/r/20250417075521.623651-1-y-abhilashchandra@ti.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/ti/k3-udma.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/dma/ti/k3-udma.c +++ b/drivers/dma/ti/k3-udma.c @@ -4216,7 +4216,6 @@ static struct dma_chan *udma_of_xlate(st struct of_dma *ofdma) { struct udma_dev *ud = ofdma->of_dma_data; - dma_cap_mask_t mask = ud->ddev.cap_mask; struct udma_filter_param filter_param; struct dma_chan *chan;
@@ -4248,7 +4247,7 @@ static struct dma_chan *udma_of_xlate(st } }
- chan = __dma_request_channel(&mask, udma_dma_filter_fn, &filter_param, + chan = __dma_request_channel(&ud->ddev.cap_mask, udma_dma_filter_fn, &filter_param, ofdma->of_node); if (!chan) { dev_err(ud->dev, "get channel fail in %s.\n", __func__);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuai Xue xueshuai@linux.alibaba.com
commit 3fd2f4bc010cdfbc07dd21018dc65bd9370eb7a4 upstream.
Memory allocated for wqs is not freed if an error occurs during idxd_setup_wqs(). To fix it, free the allocated memory in the reverse order of allocation before exiting the function in case of an error.
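The shape of the fix is the usual goto-based unwind; a minimal, hypothetical sketch of the pattern this series applies throughout idxd (the structure and allocation sizes below are made up):

struct three_bufs {
	void *a, *b, *c;
};

static int setup_three(struct three_bufs *t)
{
	t->a = kzalloc(SZ_4K, GFP_KERNEL);
	if (!t->a)
		return -ENOMEM;

	t->b = kzalloc(SZ_4K, GFP_KERNEL);
	if (!t->b)
		goto err_free_a;

	t->c = kzalloc(SZ_4K, GFP_KERNEL);
	if (!t->c)
		goto err_free_b;

	return 0;

err_free_b:
	kfree(t->b);		/* release in reverse order of allocation */
err_free_a:
	kfree(t->a);
	return -ENOMEM;
}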
Fixes: 7c5dd23e57c1 ("dmaengine: idxd: fix wq conf_dev 'struct device' lifetime") Fixes: 700af3a0a26c ("dmaengine: idxd: add 'struct idxd_dev' as wrapper for conf_dev") Fixes: de5819b99489 ("dmaengine: idxd: track enabled workqueues in bitmap") Fixes: b0325aefd398 ("dmaengine: idxd: add WQ operation cap restriction support") Cc: stable@vger.kernel.org Signed-off-by: Shuai Xue xueshuai@linux.alibaba.com Reviewed-by: Dave Jiang dave.jiang@intel.com Reviewed-by: Fenghua Yu fenghuay@nvidia.com Link: https://lore.kernel.org/r/20250404120217.48772-2-xueshuai@linux.alibaba.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/idxd/init.c | 30 +++++++++++++++++++++--------- 1 file changed, 21 insertions(+), 9 deletions(-)
--- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -155,8 +155,8 @@ static int idxd_setup_wqs(struct idxd_de
idxd->wq_enable_map = bitmap_zalloc_node(idxd->max_wqs, GFP_KERNEL, dev_to_node(dev)); if (!idxd->wq_enable_map) { - kfree(idxd->wqs); - return -ENOMEM; + rc = -ENOMEM; + goto err_bitmap; }
for (i = 0; i < idxd->max_wqs; i++) { @@ -175,10 +175,8 @@ static int idxd_setup_wqs(struct idxd_de conf_dev->bus = &dsa_bus_type; conf_dev->type = &idxd_wq_device_type; rc = dev_set_name(conf_dev, "wq%d.%d", idxd->id, wq->id); - if (rc < 0) { - put_device(conf_dev); + if (rc < 0) goto err; - }
mutex_init(&wq->wq_lock); init_waitqueue_head(&wq->err_queue); @@ -189,7 +187,6 @@ static int idxd_setup_wqs(struct idxd_de wq->enqcmds_retries = IDXD_ENQCMDS_RETRIES; wq->wqcfg = kzalloc_node(idxd->wqcfg_size, GFP_KERNEL, dev_to_node(dev)); if (!wq->wqcfg) { - put_device(conf_dev); rc = -ENOMEM; goto err; } @@ -197,9 +194,8 @@ static int idxd_setup_wqs(struct idxd_de if (idxd->hw.wq_cap.op_config) { wq->opcap_bmap = bitmap_zalloc(IDXD_MAX_OPCAP_BITS, GFP_KERNEL); if (!wq->opcap_bmap) { - put_device(conf_dev); rc = -ENOMEM; - goto err; + goto err_opcap_bmap; } bitmap_copy(wq->opcap_bmap, idxd->opcap_bmap, IDXD_MAX_OPCAP_BITS); } @@ -208,12 +204,28 @@ static int idxd_setup_wqs(struct idxd_de
return 0;
- err: +err_opcap_bmap: + kfree(wq->wqcfg); + +err: + put_device(conf_dev); + kfree(wq); + while (--i >= 0) { wq = idxd->wqs[i]; + if (idxd->hw.wq_cap.op_config) + bitmap_free(wq->opcap_bmap); + kfree(wq->wqcfg); conf_dev = wq_confdev(wq); put_device(conf_dev); + kfree(wq); + } + bitmap_free(idxd->wq_enable_map); + +err_bitmap: + kfree(idxd->wqs); + return rc; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuai Xue xueshuai@linux.alibaba.com
commit 817bced19d1dbdd0b473580d026dc0983e30e17b upstream.
Memory allocated for engines is not freed if an error occurs during idxd_setup_engines(). To fix it, free the allocated memory in the reverse order of allocation before exiting the function in case of an error.
Fixes: 75b911309060 ("dmaengine: idxd: fix engine conf_dev lifetime") Cc: stable@vger.kernel.org Signed-off-by: Shuai Xue xueshuai@linux.alibaba.com Reviewed-by: Dave Jiang dave.jiang@intel.com Reviewed-by: Fenghua Yu fenghuay@nvidia.com Link: https://lore.kernel.org/r/20250404120217.48772-3-xueshuai@linux.alibaba.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/idxd/init.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -259,6 +259,7 @@ static int idxd_setup_engines(struct idx rc = dev_set_name(conf_dev, "engine%d.%d", idxd->id, engine->id); if (rc < 0) { put_device(conf_dev); + kfree(engine); goto err; }
@@ -272,7 +273,10 @@ static int idxd_setup_engines(struct idx engine = idxd->engines[i]; conf_dev = engine_confdev(engine); put_device(conf_dev); + kfree(engine); } + kfree(idxd->engines); + return rc; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuai Xue xueshuai@linux.alibaba.com
commit aa6f4f945b10eac57aed46154ae7d6fada7fccc7 upstream.
Memory allocated for groups is not freed if an error occurs during idxd_setup_groups(). To fix it, free the allocated memory in the reverse order of allocation before exiting the function in case of an error.
Fixes: defe49f96012 ("dmaengine: idxd: fix group conf_dev lifetime") Cc: stable@vger.kernel.org Signed-off-by: Shuai Xue xueshuai@linux.alibaba.com Reviewed-by: Dave Jiang dave.jiang@intel.com Reviewed-by: Fenghua Yu fenghuay@nvidia.com Link: https://lore.kernel.org/r/20250404120217.48772-4-xueshuai@linux.alibaba.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/idxd/init.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -310,6 +310,7 @@ static int idxd_setup_groups(struct idxd rc = dev_set_name(conf_dev, "group%d.%d", idxd->id, group->id); if (rc < 0) { put_device(conf_dev); + kfree(group); goto err; }
@@ -329,7 +330,10 @@ static int idxd_setup_groups(struct idxd while (--i >= 0) { group = idxd->groups[i]; put_device(group_confdev(group)); + kfree(group); } + kfree(idxd->groups); + return rc; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuai Xue xueshuai@linux.alibaba.com
commit 61259fb96e023f7299c442c48b13e72c441fc0f2 upstream.
The idxd_setup_internals() is missing some cleanup when things fail in the middle.
Add the appropriate cleanup routines:
- cleanup groups
- cleanup engines
- cleanup wqs
to make sure it exits gracefully.
Fixes: defe49f96012 ("dmaengine: idxd: fix group conf_dev lifetime") Cc: stable@vger.kernel.org Suggested-by: Fenghua Yu fenghuay@nvidia.com Signed-off-by: Shuai Xue xueshuai@linux.alibaba.com Reviewed-by: Fenghua Yu fenghuay@nvidia.com Reviewed-by: Dave Jiang dave.jiang@intel.com Link: https://lore.kernel.org/r/20250404120217.48772-5-xueshuai@linux.alibaba.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/idxd/init.c | 58 ++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 51 insertions(+), 7 deletions(-)
--- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -141,6 +141,25 @@ static void idxd_cleanup_interrupts(stru pci_free_irq_vectors(pdev); }
+static void idxd_clean_wqs(struct idxd_device *idxd) +{ + struct idxd_wq *wq; + struct device *conf_dev; + int i; + + for (i = 0; i < idxd->max_wqs; i++) { + wq = idxd->wqs[i]; + if (idxd->hw.wq_cap.op_config) + bitmap_free(wq->opcap_bmap); + kfree(wq->wqcfg); + conf_dev = wq_confdev(wq); + put_device(conf_dev); + kfree(wq); + } + bitmap_free(idxd->wq_enable_map); + kfree(idxd->wqs); +} + static int idxd_setup_wqs(struct idxd_device *idxd) { struct device *dev = &idxd->pdev->dev; @@ -229,6 +248,21 @@ err_bitmap: return rc; }
+static void idxd_clean_engines(struct idxd_device *idxd) +{ + struct idxd_engine *engine; + struct device *conf_dev; + int i; + + for (i = 0; i < idxd->max_engines; i++) { + engine = idxd->engines[i]; + conf_dev = engine_confdev(engine); + put_device(conf_dev); + kfree(engine); + } + kfree(idxd->engines); +} + static int idxd_setup_engines(struct idxd_device *idxd) { struct idxd_engine *engine; @@ -280,6 +314,19 @@ static int idxd_setup_engines(struct idx return rc; }
+static void idxd_clean_groups(struct idxd_device *idxd) +{ + struct idxd_group *group; + int i; + + for (i = 0; i < idxd->max_groups; i++) { + group = idxd->groups[i]; + put_device(group_confdev(group)); + kfree(group); + } + kfree(idxd->groups); +} + static int idxd_setup_groups(struct idxd_device *idxd) { struct device *dev = &idxd->pdev->dev; @@ -353,7 +400,7 @@ static void idxd_cleanup_internals(struc static int idxd_setup_internals(struct idxd_device *idxd) { struct device *dev = &idxd->pdev->dev; - int rc, i; + int rc;
init_waitqueue_head(&idxd->cmd_waitq);
@@ -378,14 +425,11 @@ static int idxd_setup_internals(struct i return 0;
err_wkq_create: - for (i = 0; i < idxd->max_groups; i++) - put_device(group_confdev(idxd->groups[i])); + idxd_clean_groups(idxd); err_group: - for (i = 0; i < idxd->max_engines; i++) - put_device(engine_confdev(idxd->engines[i])); + idxd_clean_engines(idxd); err_engine: - for (i = 0; i < idxd->max_wqs; i++) - put_device(wq_confdev(idxd->wqs[i])); + idxd_clean_wqs(idxd); err_wqs: return rc; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuai Xue xueshuai@linux.alibaba.com
commit 61d651572b6c4fe50c7b39a390760f3a910c7ccf upstream.
The idxd_cleanup_internals() function only decreases the reference count of groups, engines, and wqs but is missing the step to release memory resources.
To fix this, use the cleanup helper to properly release the memory resources.
Fixes: ddf742d4f3f1 ("dmaengine: idxd: Add missing cleanup for early error out in probe call") Cc: stable@vger.kernel.org Signed-off-by: Shuai Xue xueshuai@linux.alibaba.com Reviewed-by: Fenghua Yu fenghuay@nvidia.com Reviewed-by: Dave Jiang dave.jiang@intel.com Link: https://lore.kernel.org/r/20250404120217.48772-6-xueshuai@linux.alibaba.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/idxd/init.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-)
--- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -386,14 +386,9 @@ static int idxd_setup_groups(struct idxd
static void idxd_cleanup_internals(struct idxd_device *idxd) { - int i; - - for (i = 0; i < idxd->max_groups; i++) - put_device(group_confdev(idxd->groups[i])); - for (i = 0; i < idxd->max_engines; i++) - put_device(engine_confdev(idxd->engines[i])); - for (i = 0; i < idxd->max_wqs; i++) - put_device(wq_confdev(idxd->wqs[i])); + idxd_clean_groups(idxd); + idxd_clean_engines(idxd); + idxd_clean_wqs(idxd); destroy_workqueue(idxd->wq); }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuai Xue xueshuai@linux.alibaba.com
commit d5449ff1b04dfe9ed8e455769aa01e4c2ccf6805 upstream.
The remove call stack is missing the idxd cleanup to free the bitmap, ida and the idxd_device. Call the idxd_free() helper routine to make sure we exit gracefully.
Fixes: bfe1d56091c1 ("dmaengine: idxd: Init and probe for Intel data accelerators") Cc: stable@vger.kernel.org Suggested-by: Vinicius Costa Gomes vinicius.gomes@intel.com Signed-off-by: Shuai Xue xueshuai@linux.alibaba.com Reviewed-by: Fenghua Yu fenghuay@nvidia.com Reviewed-by: Dave Jiang dave.jiang@intel.com Link: https://lore.kernel.org/r/20250404120217.48772-9-xueshuai@linux.alibaba.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/idxd/init.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -796,6 +796,7 @@ static void idxd_remove(struct pci_dev * destroy_workqueue(idxd->wq); perfmon_pmu_remove(idxd); put_device(idxd_confdev(idxd)); + idxd_free(idxd); }
static struct pci_driver idxd_pci_driver = {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuai Xue xueshuai@linux.alibaba.com
commit 46a5cca76c76c86063000a12936f8e7875295838 upstream.
Memory allocated for idxd is not freed if an error occurs during idxd_alloc(). To fix it, free the allocated memory in the reverse order of allocation before exiting the function in case of an error.
Fixes: a8563a33a5e2 ("dmanegine: idxd: reformat opcap output to match bitmap_parse() input") Cc: stable@vger.kernel.org Signed-off-by: Shuai Xue xueshuai@linux.alibaba.com Reviewed-by: Dave Jiang dave.jiang@intel.com Reviewed-by: Fenghua Yu fenghuay@nvidia.com Link: https://lore.kernel.org/r/20250404120217.48772-7-xueshuai@linux.alibaba.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/idxd/init.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-)
--- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -537,28 +537,34 @@ static struct idxd_device *idxd_alloc(st idxd_dev_set_type(&idxd->idxd_dev, idxd->data->type); idxd->id = ida_alloc(&idxd_ida, GFP_KERNEL); if (idxd->id < 0) - return NULL; + goto err_ida;
idxd->opcap_bmap = bitmap_zalloc_node(IDXD_MAX_OPCAP_BITS, GFP_KERNEL, dev_to_node(dev)); - if (!idxd->opcap_bmap) { - ida_free(&idxd_ida, idxd->id); - return NULL; - } + if (!idxd->opcap_bmap) + goto err_opcap;
device_initialize(conf_dev); conf_dev->parent = dev; conf_dev->bus = &dsa_bus_type; conf_dev->type = idxd->data->dev_type; rc = dev_set_name(conf_dev, "%s%d", idxd->data->name_prefix, idxd->id); - if (rc < 0) { - put_device(conf_dev); - return NULL; - } + if (rc < 0) + goto err_name;
spin_lock_init(&idxd->dev_lock); spin_lock_init(&idxd->cmd_lock);
return idxd; + +err_name: + put_device(conf_dev); + bitmap_free(idxd->opcap_bmap); +err_opcap: + ida_free(&idxd_ida, idxd->id); +err_ida: + kfree(idxd); + + return NULL; }
static int idxd_enable_system_pasid(struct idxd_device *idxd)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuai Xue xueshuai@linux.alibaba.com
commit 90022b3a6981ec234902be5dbf0f983a12c759fc upstream.
Memory allocated for idxd is not freed if an error occurs during idxd_pci_probe(). To fix it, free the allocated memory in the reverse order of allocation before exiting the function in case of an error.
Fixes: bfe1d56091c1 ("dmaengine: idxd: Init and probe for Intel data accelerators") Cc: stable@vger.kernel.org Signed-off-by: Shuai Xue xueshuai@linux.alibaba.com Reviewed-by: Dave Jiang dave.jiang@intel.com Reviewed-by: Fenghua Yu fenghuay@nvidia.com Link: https://lore.kernel.org/r/20250404120217.48772-8-xueshuai@linux.alibaba.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/idxd/init.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
--- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -520,6 +520,17 @@ static void idxd_read_caps(struct idxd_d multi_u64_to_bmap(idxd->opcap_bmap, &idxd->hw.opcap.bits[0], 4); }
+static void idxd_free(struct idxd_device *idxd) +{ + if (!idxd) + return; + + put_device(idxd_confdev(idxd)); + bitmap_free(idxd->opcap_bmap); + ida_free(&idxd_ida, idxd->id); + kfree(idxd); +} + static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_data *data) { struct device *dev = &pdev->dev; @@ -739,7 +750,7 @@ static int idxd_pci_probe(struct pci_dev err: pci_iounmap(pdev, idxd->reg_base); err_iomap: - put_device(idxd_confdev(idxd)); + idxd_free(idxd); err_idxd_alloc: pci_disable_device(pdev); return rc;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrei Kuchynski akuchynski@chromium.org
commit 364618c89d4c57c85e5fc51a2446cd939bf57802 upstream.
This patch introduces the ucsi_con_mutex_lock / ucsi_con_mutex_unlock functions to the UCSI driver. ucsi_con_mutex_lock ensures the connector mutex is only locked if a connection is established and the partner pointer is valid. This resolves a deadlock scenario where ucsi_displayport_remove_partner holds con->mutex waiting for dp_altmode_work to complete while dp_altmode_work attempts to acquire it.
Cc: stable stable@kernel.org Fixes: af8622f6a585 ("usb: typec: ucsi: Support for DisplayPort alt mode") Signed-off-by: Andrei Kuchynski akuchynski@chromium.org Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/20250424084429.3220757-2-akuchynski@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/ucsi/displayport.c | 19 +++++++++++-------- drivers/usb/typec/ucsi/ucsi.c | 34 ++++++++++++++++++++++++++++++++++ drivers/usb/typec/ucsi/ucsi.h | 3 +++ 3 files changed, 48 insertions(+), 8 deletions(-)
--- a/drivers/usb/typec/ucsi/displayport.c +++ b/drivers/usb/typec/ucsi/displayport.c @@ -54,7 +54,8 @@ static int ucsi_displayport_enter(struct u8 cur = 0; int ret;
- mutex_lock(&dp->con->lock); + if (!ucsi_con_mutex_lock(dp->con)) + return -ENOTCONN;
if (!dp->override && dp->initialized) { const struct typec_altmode *p = typec_altmode_get_partner(alt); @@ -100,7 +101,7 @@ static int ucsi_displayport_enter(struct schedule_work(&dp->work); ret = 0; err_unlock: - mutex_unlock(&dp->con->lock); + ucsi_con_mutex_unlock(dp->con);
return ret; } @@ -112,7 +113,8 @@ static int ucsi_displayport_exit(struct u64 command; int ret = 0;
- mutex_lock(&dp->con->lock); + if (!ucsi_con_mutex_lock(dp->con)) + return -ENOTCONN;
if (!dp->override) { const struct typec_altmode *p = typec_altmode_get_partner(alt); @@ -144,7 +146,7 @@ static int ucsi_displayport_exit(struct schedule_work(&dp->work);
out_unlock: - mutex_unlock(&dp->con->lock); + ucsi_con_mutex_unlock(dp->con);
return ret; } @@ -202,20 +204,21 @@ static int ucsi_displayport_vdm(struct t int cmd = PD_VDO_CMD(header); int svdm_version;
- mutex_lock(&dp->con->lock); + if (!ucsi_con_mutex_lock(dp->con)) + return -ENOTCONN;
if (!dp->override && dp->initialized) { const struct typec_altmode *p = typec_altmode_get_partner(alt);
dev_warn(&p->dev, "firmware doesn't support alternate mode overriding\n"); - mutex_unlock(&dp->con->lock); + ucsi_con_mutex_unlock(dp->con); return -EOPNOTSUPP; }
svdm_version = typec_altmode_get_svdm_version(alt); if (svdm_version < 0) { - mutex_unlock(&dp->con->lock); + ucsi_con_mutex_unlock(dp->con); return svdm_version; }
@@ -259,7 +262,7 @@ static int ucsi_displayport_vdm(struct t break; }
- mutex_unlock(&dp->con->lock); + ucsi_con_mutex_unlock(dp->con);
return 0; } --- a/drivers/usb/typec/ucsi/ucsi.c +++ b/drivers/usb/typec/ucsi/ucsi.c @@ -1399,6 +1399,40 @@ void ucsi_set_drvdata(struct ucsi *ucsi, EXPORT_SYMBOL_GPL(ucsi_set_drvdata);
/** + * ucsi_con_mutex_lock - Acquire the connector mutex + * @con: The connector interface to lock + * + * Returns true on success, false if the connector is disconnected + */ +bool ucsi_con_mutex_lock(struct ucsi_connector *con) +{ + bool mutex_locked = false; + bool connected = true; + + while (connected && !mutex_locked) { + mutex_locked = mutex_trylock(&con->lock) != 0; + connected = con->status.flags & UCSI_CONSTAT_CONNECTED; + if (connected && !mutex_locked) + msleep(20); + } + + connected = connected && con->partner; + if (!connected && mutex_locked) + mutex_unlock(&con->lock); + + return connected; +} + +/** + * ucsi_con_mutex_unlock - Release the connector mutex + * @con: The connector interface to unlock + */ +void ucsi_con_mutex_unlock(struct ucsi_connector *con) +{ + mutex_unlock(&con->lock); +} + +/** * ucsi_create - Allocate UCSI instance * @dev: Device interface to the PPM (Platform Policy Manager) * @ops: I/O routines --- a/drivers/usb/typec/ucsi/ucsi.h +++ b/drivers/usb/typec/ucsi/ucsi.h @@ -15,6 +15,7 @@
struct ucsi; struct ucsi_altmode; +struct ucsi_connector;
/* UCSI offsets (Bytes) */ #define UCSI_VERSION 0 @@ -62,6 +63,8 @@ int ucsi_register(struct ucsi *ucsi); void ucsi_unregister(struct ucsi *ucsi); void *ucsi_get_drvdata(struct ucsi *ucsi); void ucsi_set_drvdata(struct ucsi *ucsi, void *data); +bool ucsi_con_mutex_lock(struct ucsi_connector *con); +void ucsi_con_mutex_unlock(struct ucsi_connector *con);
void ucsi_connector_change(struct ucsi *ucsi, u8 num);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: RD Babiera rdbabiera@google.com
commit 165376f6b23e9a779850e750fb2eb06622e5a531 upstream.
The DisplayPort driver's sysfs nodes may be present to userspace before typec_altmode_set_drvdata() completes in dp_altmode_probe. This means that a sysfs read can trigger a NULL pointer error by dereferencing dp->hpd in hpd_show or dp->lock in pin_assignment_show, as dev_get_drvdata() returns NULL in those cases.
Remove manual sysfs node creation in favor of adding the attribute group as the default for devices bound to the driver. The ATTRIBUTE_GROUPS() macro is not used here because otherwise the path to the sysfs nodes would no longer be compliant with the ABI.
Fixes: 0e3bb7d6894d ("usb: typec: Add driver for DisplayPort alternate mode") Cc: stable@vger.kernel.org Signed-off-by: RD Babiera rdbabiera@google.com Link: https://lore.kernel.org/r/20240229001101.3889432-2-rdbabiera@google.com [Minor conflict resolved due to code context change.] Signed-off-by: Jianqi Ren jianqi.ren.cn@windriver.com Signed-off-by: He Zhe zhe.he@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/altmodes/displayport.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-)
--- a/drivers/usb/typec/altmodes/displayport.c
+++ b/drivers/usb/typec/altmodes/displayport.c
@@ -543,15 +543,20 @@ static ssize_t pin_assignment_show(struc
 }
 static DEVICE_ATTR_RW(pin_assignment);

-static struct attribute *dp_altmode_attrs[] = {
+static struct attribute *displayport_attrs[] = {
 	&dev_attr_configuration.attr,
 	&dev_attr_pin_assignment.attr,
 	NULL
 };

-static const struct attribute_group dp_altmode_group = {
+static const struct attribute_group displayport_group = {
 	.name = "displayport",
-	.attrs = dp_altmode_attrs,
+	.attrs = displayport_attrs,
+};
+
+static const struct attribute_group *displayport_groups[] = {
+	&displayport_group,
+	NULL,
 };

 int dp_altmode_probe(struct typec_altmode *alt)
@@ -559,7 +564,6 @@ int dp_altmode_probe(struct typec_altmod
 	const struct typec_altmode *port = typec_altmode_get_partner(alt);
 	struct fwnode_handle *fwnode;
 	struct dp_altmode *dp;
-	int ret;

 	/* FIXME: Port can only be DFP_U. */

@@ -570,10 +574,6 @@ int dp_altmode_probe(struct typec_altmod
 		      DP_CAP_PIN_ASSIGN_DFP_D(alt->vdo)))
 		return -ENODEV;

-	ret = sysfs_create_group(&alt->dev.kobj, &dp_altmode_group);
-	if (ret)
-		return ret;
-
 	dp = devm_kzalloc(&alt->dev, sizeof(*dp), GFP_KERNEL);
 	if (!dp)
 		return -ENOMEM;
@@ -604,7 +604,6 @@ void dp_altmode_remove(struct typec_altm
 {
 	struct dp_altmode *dp = typec_altmode_get_drvdata(alt);

-	sysfs_remove_group(&alt->dev.kobj, &dp_altmode_group);
 	cancel_work_sync(&dp->work);

 	if (dp->connector_fwnode) {
@@ -629,6 +628,7 @@ static struct typec_altmode_driver dp_al
 	.driver = {
 		.name = "typec_displayport",
 		.owner = THIS_MODULE,
+		.dev_groups = displayport_groups,
 	},
 };
 module_typec_altmode_driver(dp_altmode_driver);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
commit e56aac6e5a25630645607b6856d4b2a17b2311a5 upstream.
The "command" variable can be controlled by the user via debugfs. The worry is that if con_index is zero then "&uc->ucsi->connector[con_index - 1]" would be an array underflow.
Fixes: 170a6726d0e2 ("usb: typec: ucsi: add support for separate DP altmode devices") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/c69ef0b3-61b0-4dde-98dd-97b97f81d912@stanley.mount... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org [ The function ucsi_ccg_sync_write() is renamed to ucsi_ccg_sync_control() in commit 13f2ec3115c8 ("usb: typec: ucsi:simplify command sending API"). Apply this patch to ucsi_ccg_sync_write() in 6.1.y accordingly. ] Signed-off-by: Bin Lan bin.lan.cn@windriver.com Signed-off-by: He Zhe zhe.he@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/ucsi/ucsi_ccg.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/usb/typec/ucsi/ucsi_ccg.c
+++ b/drivers/usb/typec/ucsi/ucsi_ccg.c
@@ -585,6 +585,10 @@ static int ucsi_ccg_sync_write(struct uc
 	    uc->has_multiple_dp) {
 		con_index = (uc->last_cmd_sent >> 16) &
 			UCSI_CMD_CONNECTOR_MASK;
+		if (con_index == 0) {
+			ret = -EINVAL;
+			goto unlock;
+		}
 		con = &uc->ucsi->connector[con_index - 1];
 		ucsi_ccg_update_set_new_cam_cmd(uc, con, (u64 *)val);
 	}
@@ -600,6 +604,7 @@ static int ucsi_ccg_sync_write(struct uc
 err_clear_bit:
 	clear_bit(DEV_CMD_PENDING, &uc->flags);
 	pm_runtime_put_sync(uc->dev);
+unlock:
 	mutex_unlock(&uc->lock);
return ret;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: GONG Ruiqi gongruiqi1@huawei.com
commit b0e525d7a22ea350e75e2aec22e47fcfafa4cacd upstream.
The error handling for the case `con_index == 0` should involve dropping the pm usage counter, as ucsi_ccg_sync_control() gets it at the beginning. Fix it.
Cc: stable stable@kernel.org Fixes: e56aac6e5a25 ("usb: typec: fix potential array underflow in ucsi_ccg_sync_control()") Signed-off-by: GONG Ruiqi gongruiqi1@huawei.com Reviewed-by: Dan Carpenter dan.carpenter@linaro.org Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/20250107015750.2778646-1-gongruiqi1@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org [Minor context change fixed.] Signed-off-by: Bin Lan bin.lan.cn@windriver.com Signed-off-by: He Zhe zhe.he@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/ucsi/ucsi_ccg.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/usb/typec/ucsi/ucsi_ccg.c
+++ b/drivers/usb/typec/ucsi/ucsi_ccg.c
@@ -587,7 +587,7 @@ static int ucsi_ccg_sync_write(struct uc
 			UCSI_CMD_CONNECTOR_MASK;
 		if (con_index == 0) {
 			ret = -EINVAL;
-			goto unlock;
+			goto err_put;
 		}
 		con = &uc->ucsi->connector[con_index - 1];
 		ucsi_ccg_update_set_new_cam_cmd(uc, con, (u64 *)val);
@@ -603,8 +603,8 @@ static int ucsi_ccg_sync_write(struct uc

 err_clear_bit:
 	clear_bit(DEV_CMD_PENDING, &uc->flags);
+err_put:
 	pm_runtime_put_sync(uc->dev);
-unlock:
 	mutex_unlock(&uc->lock);
return ret;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Feng Tang feng.tang@linux.alibaba.com
commit ab00ddd802f80e31fc9639c652d736fe3913feae upstream.
When running the mm selftests to verify mm patches, the 'compaction_test' case failed on an x86 server with 1TB of memory. The root cause is that the machine has more free memory than the test supports.

The test case tries to allocate 100000 huge pages, which is about 200 GB on that x86 server, and when it succeeds, it expects that to be larger than 1/3 of 80% of the free memory in the system. This logic only works for platforms with 750 GB (200 / (1/3) / 80%) or less free memory, and may raise false alarms on others.

Fix it by changing the fixed page number to a self-adjusting number derived from the actual amount of free memory.
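A rough back-of-the-envelope check of the numbers above (plain userspace C, not part of the selftest; the 2 MB huge page size and the 1 TB free-memory figure are assumptions for the sketch):

#include <stdio.h>

int main(void)
{
	double hugepage_mb = 2;					/* assumed 2 MB huge pages */
	double fixed_request_gb = 100000 * hugepage_mb / 1024;	/* old hard-coded request */

	/* The old check passes only if the request exceeds 1/3 of 80% of free memory. */
	double max_free_gb = fixed_request_gb / (1.0 / 3.0) / 0.80;
	printf("old scheme: ~%.0f GB requested, only valid up to ~%.0f GB free\n",
	       fixed_request_gb, max_free_gb);

	/* New scheme: size the request from the free memory itself (half of it). */
	double mem_free_gb = 1024;				/* e.g. the 1 TB server from the report */
	double nr_hugepages = mem_free_gb * 1024 / hugepage_mb / 2;
	printf("new scheme on %.0f GB free: request %.0f huge pages\n",
	       mem_free_gb, nr_hugepages);
	return 0;
}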
Link: https://lkml.kernel.org/r/20250423103645.2758-1-feng.tang@linux.alibaba.com Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Feng Tang feng.tang@linux.alibaba.com Acked-by: Dev Jain dev.jain@arm.com Reviewed-by: Baolin Wang baolin.wang@linux.alibaba.com Tested-by: Baolin Wang baolin.wang@inux.alibaba.com Cc: Shuah Khan shuah@kernel.org Cc: Sri Jayaramappa sjayaram@akamai.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/vm/compaction_test.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-)
--- a/tools/testing/selftests/vm/compaction_test.c
+++ b/tools/testing/selftests/vm/compaction_test.c
@@ -89,6 +89,8 @@ int check_compaction(unsigned long mem_f
 	int compaction_index = 0;
 	char initial_nr_hugepages[20] = {0};
 	char nr_hugepages[20] = {0};
+	char target_nr_hugepages[24] = {0};
+	int slen;

 	/* We want to test with 80% of available memory. Else, OOM killer comes
 	   in to play */
@@ -119,11 +121,18 @@ int check_compaction(unsigned long mem_f

 	lseek(fd, 0, SEEK_SET);

-	/* Request a large number of huge pages. The Kernel will allocate
-	   as much as it can */
-	if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) {
-		ksft_print_msg("Failed to write 100000 to /proc/sys/vm/nr_hugepages: %s\n",
-			       strerror(errno));
+	/*
+	 * Request huge pages for about half of the free memory. The Kernel
+	 * will allocate as much as it can, and we expect it will get at least 1/3
+	 */
+	nr_hugepages_ul = mem_free / hugepage_size / 2;
+	snprintf(target_nr_hugepages, sizeof(target_nr_hugepages),
+		 "%lu", nr_hugepages_ul);
+
+	slen = strlen(target_nr_hugepages);
+	if (write(fd, target_nr_hugepages, slen) != slen) {
+		ksft_print_msg("Failed to write %lu to /proc/sys/vm/nr_hugepages: %s\n",
+			       nr_hugepages_ul, strerror(errno));
 		goto close_fd;
 	}
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Byungchul Park byungchul@sk.com
commit 2774f256e7c0219e2b0a0894af1c76bdabc4f974 upstream.
With NUMA balancing on, when a NUMA system is running where a NUMA node doesn't have its local memory and therefore has no managed zones, the following oops has been observed. It happens because wakeup_kswapd() is called with a wrong zone index, -1. Fix it by checking the index before calling wakeup_kswapd().
BUG: unable to handle page fault for address: 00000000000033f3 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 2 PID: 895 Comm: masim Not tainted 6.6.0-dirty #255 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:wakeup_kswapd (./linux/mm/vmscan.c:7812) Code: (omitted) RSP: 0000:ffffc90004257d58 EFLAGS: 00010286 RAX: ffffffffffffffff RBX: ffff88883fff0480 RCX: 0000000000000003 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88883fff0480 RBP: ffffffffffffffff R08: ff0003ffffffffff R09: ffffffffffffffff R10: ffff888106c95540 R11: 0000000055555554 R12: 0000000000000003 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88883fff0940 FS: 00007fc4b8124740(0000) GS:ffff888827c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000033f3 CR3: 000000026cc08004 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace:
<TASK> ? __die ? page_fault_oops ? __pte_offset_map_lock ? exc_page_fault ? asm_exc_page_fault ? wakeup_kswapd migrate_misplaced_page __handle_mm_fault handle_mm_fault do_user_addr_fault exc_page_fault asm_exc_page_fault RIP: 0033:0x55b897ba0808 Code: (omitted) RSP: 002b:00007ffeefa821a0 EFLAGS: 00010287 RAX: 000055b89983acd0 RBX: 00007ffeefa823f8 RCX: 000055b89983acd0 RDX: 00007fc2f8122010 RSI: 0000000000020000 RDI: 000055b89983acd0 RBP: 00007ffeefa821a0 R08: 0000000000000037 R09: 0000000000000075 R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 R13: 00007ffeefa82410 R14: 000055b897ba5dd8 R15: 00007fc4b8340000 </TASK>
Link: https://lkml.kernel.org/r/20240216111502.79759-1-byungchul@sk.com Signed-off-by: Byungchul Park byungchul@sk.com Reported-by: Hyeongtak Ji hyeongtak.ji@sk.com Fixes: c574bbe917036 ("NUMA balancing: optimize page placement for memory tiering system") Reviewed-by: Oscar Salvador osalvador@suse.de Cc: Baolin Wang baolin.wang@linux.alibaba.com Cc: "Huang, Ying" ying.huang@intel.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Jianqi Ren jianqi.ren.cn@windriver.com Signed-off-by: He Zhe zhe.he@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/migrate.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2422,6 +2422,14 @@ static int numamigrate_isolate_page(pg_d
 		if (managed_zone(pgdat->node_zones + z))
 			break;
 	}
+
+	/*
+	 * If there are no managed zones, it should not proceed
+	 * further.
+	 */
+	if (z < 0)
+		return 0;
+
 	wakeup_kswapd(pgdat->node_zones + z, 0, order, ZONE_MOVABLE);
 	return 0;
 }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xu Lu luxu.kernel@bytedance.com
commit f754f27e98f88428aaf6be6e00f5cbce97f62d4b upstream.
In sparse vmemmap model, the virtual address of vmemmap is calculated as: ((struct page *)VMEMMAP_START - (phys_ram_base >> PAGE_SHIFT)). And the struct page's va can be calculated with an offset: (vmemmap + (pfn)).
However, when initializing struct pages, the kernel actually starts from the first page of the section that phys_ram_base belongs to. If that first page's pfn is not (phys_ram_base >> PAGE_SHIFT), then we get a va below VMEMMAP_START when calculating the va for its struct page.
For example, if phys_ram_base starts from 0x82000000 with pfn 0x82000, the first page in the same section is actually pfn 0x80000. During init_unavailable_range(), we will initialize struct page for pfn 0x80000 with virtual address ((struct page *)VMEMMAP_START - 0x2000), which is below VMEMMAP_START as well as PCI_IO_END.
This commit fixes this bug by introducing a new variable 'vmemmap_start_pfn' which is aligned with memory section size and using it to calculate vmemmap address instead of phys_ram_base.
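A standalone sketch of the address arithmetic (plain userspace C, not the riscv code), using the example values from the commit message and an assumed 128 MB section size (SECTION_SIZE_BITS = 27): aligning the base down to the section boundary keeps the computed struct page index non-negative.

#include <stdio.h>

#define PAGE_SHIFT		12
#define SECTION_SIZE_BITS	27	/* assumed section size for the sketch */
#define SECTION_ALIGN		(1ULL << SECTION_SIZE_BITS)

static unsigned long long round_down_ull(unsigned long long x, unsigned long long a)
{
	return x & ~(a - 1);	/* a must be a power of two */
}

int main(void)
{
	unsigned long long phys_ram_base = 0x82000000ULL;	/* example from the commit */
	unsigned long long first_pfn_in_section = 0x80000ULL;

	/* Old scheme: offset vmemmap by phys_ram_base's pfn. */
	long long old_index = (long long)first_pfn_in_section -
			      (long long)(phys_ram_base >> PAGE_SHIFT);
	/* New scheme: offset by the section-aligned pfn. */
	unsigned long long vmemmap_start_pfn =
		round_down_ull(phys_ram_base, SECTION_ALIGN) >> PAGE_SHIFT;
	long long new_index = (long long)first_pfn_in_section -
			      (long long)vmemmap_start_pfn;

	printf("old struct page index: %lld (negative -> below VMEMMAP_START)\n", old_index);
	printf("new struct page index: %lld\n", new_index);
	return 0;
}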
Fixes: a11dd49dcb93 ("riscv: Sparse-Memory/vmemmap out-of-bounds fix") Signed-off-by: Xu Lu luxu.kernel@bytedance.com Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Tested-by: Björn Töpel bjorn@rivosinc.com Reviewed-by: Björn Töpel bjorn@rivosinc.com Link: https://lore.kernel.org/r/20241209122617.53341-1-luxu.kernel@bytedance.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Zhaoyang Li lizy04@hust.edu.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/include/asm/page.h | 1 + arch/riscv/include/asm/pgtable.h | 2 +- arch/riscv/mm/init.c | 17 ++++++++++++++++- 3 files changed, 18 insertions(+), 2 deletions(-)
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -115,6 +115,7 @@ struct kernel_mapping {

 extern struct kernel_mapping kernel_map;
 extern phys_addr_t phys_ram_base;
+extern unsigned long vmemmap_start_pfn;

 #define is_kernel_mapping(x)	\
 	((x) >= kernel_map.virt_addr && (x) < (kernel_map.virt_addr + kernel_map.size))
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -79,7 +79,7 @@
  * Define vmemmap for pfn_to_page & page_to_pfn calls. Needed if kernel
  * is configured with CONFIG_SPARSEMEM_VMEMMAP enabled.
  */
-#define vmemmap		((struct page *)VMEMMAP_START - (phys_ram_base >> PAGE_SHIFT))
+#define vmemmap		((struct page *)VMEMMAP_START - vmemmap_start_pfn)

 #define PCI_IO_SIZE      SZ_16M
 #define PCI_IO_END       VMEMMAP_START
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -22,6 +22,7 @@
 #include <linux/hugetlb.h>

 #include <asm/fixmap.h>
+#include <asm/sparsemem.h>
 #include <asm/tlbflush.h>
 #include <asm/sections.h>
 #include <asm/soc.h>
@@ -52,6 +53,13 @@ EXPORT_SYMBOL(pgtable_l5_enabled);
 phys_addr_t phys_ram_base __ro_after_init;
 EXPORT_SYMBOL(phys_ram_base);

+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+#define VMEMMAP_ADDR_ALIGN	(1ULL << SECTION_SIZE_BITS)
+
+unsigned long vmemmap_start_pfn __ro_after_init;
+EXPORT_SYMBOL(vmemmap_start_pfn);
+#endif
+
 unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)]
							__page_aligned_bss;
 EXPORT_SYMBOL(empty_zero_page);
@@ -210,8 +218,12 @@ static void __init setup_bootmem(void)
 	memblock_reserve(vmlinux_start, vmlinux_end - vmlinux_start);

 	phys_ram_end = memblock_end_of_DRAM();
-	if (!IS_ENABLED(CONFIG_XIP_KERNEL))
+	if (!IS_ENABLED(CONFIG_XIP_KERNEL)) {
 		phys_ram_base = memblock_start_of_DRAM();
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+		vmemmap_start_pfn = round_down(phys_ram_base, VMEMMAP_ADDR_ALIGN) >> PAGE_SHIFT;
+#endif
+	}
 	/*
 	 * Reserve physical address space that would be mapped to virtual
 	 * addresses greater than (void *)(-PAGE_SIZE) because:
@@ -946,6 +958,9 @@ asmlinkage void __init setup_vm(uintptr_
 	kernel_map.xiprom_sz = (uintptr_t)(&_exiprom) - (uintptr_t)(&_xiprom);

 	phys_ram_base = CONFIG_PHYS_RAM_BASE;
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+	vmemmap_start_pfn = round_down(phys_ram_base, VMEMMAP_ADDR_ALIGN) >> PAGE_SHIFT;
+#endif
 	kernel_map.phys_addr = (uintptr_t)CONFIG_PHYS_RAM_BASE;
 	kernel_map.size = (uintptr_t)(&_end) - (uintptr_t)(&_start);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Puranjay Mohan puranjay@kernel.org
commit 19d3c179a37730caf600a97fed3794feac2b197b upstream.
When BPF_TRAMP_F_CALL_ORIG is set, the trampoline calls __bpf_tramp_enter() and __bpf_tramp_exit() functions, passing them the struct bpf_tramp_image *im pointer as an argument in R0.
The trampoline generation code uses emit_addr_mov_i64() to emit instructions for moving the bpf_tramp_image address into R0, but emit_addr_mov_i64() assumes the address to be in the vmalloc() space and uses only 48 bits. Because bpf_tramp_image is allocated using kzalloc(), its address can use more than 48 bits; in this case the trampoline will pass an invalid address to __bpf_tramp_enter/exit(), causing a kernel crash.

Fix this by using emit_a64_mov_i64() in place of emit_addr_mov_i64(), as it can work with addresses that are wider than 48 bits.
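To see why the 48-bit assumption breaks, here is a standalone sketch (plain userspace C, not the JIT code; the sample addresses are made up) that counts how many 16-bit chunks of a 64-bit value actually need a mov/movk instruction. An address with bits set above bit 47 needs a fourth chunk that a 48-bit emitter never produces.

#include <stdio.h>
#include <stdint.h>

/* Number of 16-bit chunks that carry bits (the first one always gets a movz). */
static int chunks_needed(uint64_t addr)
{
	int n = 1;
	for (int shift = 16; shift < 64; shift += 16)
		if ((addr >> shift) & 0xffff)
			n++;
	return n;
}

int main(void)
{
	uint64_t vmalloc_like = 0x0000ffff12345678ULL;	/* fits in 48 bits */
	uint64_t slab_like    = 0xffff00801234abcdULL;	/* hypothetical kzalloc address */

	printf("48-bit address needs %d mov instructions\n", chunks_needed(vmalloc_like));
	printf("full 64-bit address needs %d mov instructions\n", chunks_needed(slab_like));
	/* A 48-bit emitter only ever produces 3, so the top chunk would be lost. */
	return 0;
}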
Fixes: efc9909fdce0 ("bpf, arm64: Add bpf trampoline for arm64") Signed-off-by: Puranjay Mohan puranjay@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Closes: https://lore.kernel.org/all/SJ0PR15MB461564D3F7E7A763498CA6A8CBDB2@SJ0PR15MB... Link: https://lore.kernel.org/bpf/20240711151838.43469-1-puranjay@kernel.org [Minor context change fixed.] Signed-off-by: Bin Lan bin.lan.cn@windriver.com Signed-off-by: He Zhe zhe.he@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/net/bpf_jit_comp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1942,7 +1942,7 @@ static int prepare_trampoline(struct jit
 	emit(A64_STR64I(A64_R(20), A64_SP, regs_off + 8), ctx);

 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
-		emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
+		emit_a64_mov_i64(A64_R(0), (const u64)im, ctx);
 		emit_call((const u64)__bpf_tramp_enter, ctx);
 	}

@@ -1986,7 +1986,7 @@ static int prepare_trampoline(struct jit

 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
 		im->ip_epilogue = ctx->image + ctx->idx;
-		emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
+		emit_a64_mov_i64(A64_R(0), (const u64)im, ctx);
 		emit_call((const u64)__bpf_tramp_exit, ctx);
 	}
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Collingbourne pcc@google.com
commit a552e2ef5fd1a6c78267cd4ec5a9b49aa11bbb1c upstream.
When BPF_TRAMP_F_CALL_ORIG is enabled, the address of a bpf_tramp_image struct on the stack is passed during the size calculation pass and an address on the heap is passed during code generation. This may cause a heap buffer overflow if the heap address is tagged because emit_a64_mov_i64() will emit longer code than it did during the size calculation pass. The same problem could occur without tag-based KASAN if one of the 16-bit words of the stack address happened to be all-ones during the size calculation pass. Fix the problem by assuming the worst case (4 instructions) when calculating the size of the bpf_tramp_image address emission.
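A compact sketch of the two-pass sizing problem (plain userspace C, not the actual JIT; both sample addresses are invented): if the dry-run pass sizes the image from a different address than the one used during code generation, the emitted code can outgrow the buffer, which is why the fix reserves the worst case of 4 instructions whenever ctx->image is NULL.

#include <stdio.h>
#include <stdint.h>

/* Instructions needed to materialize a 64-bit immediate (movz plus movk per chunk). */
static int mov_i64_len(uint64_t imm)
{
	int n = 1;
	for (int shift = 16; shift < 64; shift += 16)
		if ((imm >> shift) & 0xffff)
			n++;
	return n;
}

int main(void)
{
	uint64_t stack_addr  = 0x0000ffffc90003f0ULL;	/* address seen in the size pass */
	uint64_t tagged_heap = 0xf2ff008012345000ULL;	/* hypothetical tagged address at codegen */

	int sized   = mov_i64_len(stack_addr);
	int emitted = mov_i64_len(tagged_heap);

	printf("size pass reserved %d insns, codegen emitted %d insns%s\n",
	       sized, emitted, emitted > sized ? " -> buffer overflow" : "");
	printf("fix: always reserve the worst case of 4 insns in the size pass\n");
	return 0;
}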
Fixes: 19d3c179a377 ("bpf, arm64: Fix trampoline for BPF_TRAMP_F_CALL_ORIG") Signed-off-by: Peter Collingbourne pcc@google.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Xu Kuohai xukuohai@huawei.com Link: https://linux-review.googlesource.com/id/I1496f2bc24fba7a1d492e16e2b94cf4371... Link: https://lore.kernel.org/bpf/20241018221644.3240898-1-pcc@google.com [Minor context change fixed.] Signed-off-by: Bin Lan bin.lan.cn@windriver.com Signed-off-by: He Zhe zhe.he@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/net/bpf_jit_comp.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1942,7 +1942,11 @@ static int prepare_trampoline(struct jit
 	emit(A64_STR64I(A64_R(20), A64_SP, regs_off + 8), ctx);

 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
-		emit_a64_mov_i64(A64_R(0), (const u64)im, ctx);
+		/* for the first pass, assume the worst case */
+		if (!ctx->image)
+			ctx->idx += 4;
+		else
+			emit_a64_mov_i64(A64_R(0), (const u64)im, ctx);
 		emit_call((const u64)__bpf_tramp_enter, ctx);
 	}

@@ -1986,7 +1990,11 @@ static int prepare_trampoline(struct jit

 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
 		im->ip_epilogue = ctx->image + ctx->idx;
-		emit_a64_mov_i64(A64_R(0), (const u64)im, ctx);
+		/* for the first pass, assume the worst case */
+		if (!ctx->image)
+			ctx->idx += 4;
+		else
+			emit_a64_mov_i64(A64_R(0), (const u64)im, ctx);
 		emit_call((const u64)__bpf_tramp_exit, ctx);
 	}
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Huacai Chen chenhuacai@loongson.cn
commit e67e0eb6a98b261caf45048f9eb95fd7609289c0 upstream.
LoongArch's toolchain may change the default code model from normal to medium. This is unnecessary for the kernel, and generates some relocations which cannot be handled by the module loader. So explicitly specify the code model as normal in the Makefile (for Rust, 'normal' is 'small').
Cc: stable@vger.kernel.org Tested-by: Haiyong Sun sunhaiyong@loongson.cn Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/loongarch/Makefile
+++ b/arch/loongarch/Makefile
@@ -38,7 +38,7 @@ endif

 ifdef CONFIG_64BIT
 ld-emul			= $(64bit-emul)
-cflags-y		+= -mabi=lp64s
+cflags-y		+= -mabi=lp64s -mcmodel=normal
 endif
cflags-y += -pipe -msoft-float
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ma Wupeng mawupeng1@huawei.com
commit af288a426c3e3552b62595c6138ec6371a17dbba upstream.
Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined") added page poison checks in do_migrate_range in order to make offlining hwpoisoned pages possible by introducing isolate_lru_page and try_to_unmap for hwpoisoned pages. However, the folio lock must be held before calling try_to_unmap. Add it to fix this problem.
Warning will be produced if folio is not locked during unmap:
------------[ cut here ]------------ kernel BUG at ./include/linux/swapops.h:400! Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP Modules linked in: CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G W 6.13.0-rc1-00016-g3c434c7ee82a-dirty #41 Tainted: [W]=WARN Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : try_to_unmap_one+0xb08/0xd3c lr : try_to_unmap_one+0x3dc/0xd3c Call trace: try_to_unmap_one+0xb08/0xd3c (P) try_to_unmap_one+0x3dc/0xd3c (L) rmap_walk_anon+0xdc/0x1f8 rmap_walk+0x3c/0x58 try_to_unmap+0x88/0x90 unmap_poisoned_folio+0x30/0xa8 do_migrate_range+0x4a0/0x568 offline_pages+0x5a4/0x670 memory_block_action+0x17c/0x374 memory_subsys_offline+0x3c/0x78 device_offline+0xa4/0xd0 state_store+0x8c/0xf0 dev_attr_store+0x18/0x2c sysfs_kf_write+0x44/0x54 kernfs_fop_write_iter+0x118/0x1a8 vfs_write+0x3a8/0x4bc ksys_write+0x6c/0xf8 __arm64_sys_write+0x1c/0x28 invoke_syscall+0x44/0x100 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x30/0xd0 el0t_64_sync_handler+0xc8/0xcc el0t_64_sync+0x198/0x19c Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000) ---[ end trace 0000000000000000 ]---
Link: https://lkml.kernel.org/r/20250217014329.3610326-4-mawupeng1@huawei.com Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined") Signed-off-by: Ma Wupeng mawupeng1@huawei.com Acked-by: David Hildenbrand david@redhat.com Acked-by: Miaohe Lin linmiaohe@huawei.com Cc: Michal Hocko mhocko@suse.com Cc: Naoya Horiguchi nao.horiguchi@gmail.com Cc: Oscar Salvador osalvador@suse.de Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Xiangyu Chen xiangyu.chen@windriver.com Signed-off-by: He Zhe zhe.he@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memory_hotplug.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1656,8 +1656,12 @@ do_migrate_range(unsigned long start_pfn
 		if (PageHWPoison(page)) {
 			if (WARN_ON(folio_test_lru(folio)))
 				folio_isolate_lru(folio);
-			if (folio_mapped(folio))
+			if (folio_mapped(folio)) {
+				folio_lock(folio);
 				try_to_unmap(folio, TTU_IGNORE_MLOCK);
+				folio_unlock(folio);
+			}
+
 			continue;
 		}
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit 10206302af856791fbcc27a33ed3c3eb09b2793d upstream.
We must serialize calls to sctp_udp_sock_stop() and sctp_udp_sock_start() or risk a crash as syzbot reported:
Oops: general protection fault, probably for non-canonical address 0xdffffc000000000d: 0000 [#1] SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f] CPU: 1 UID: 0 PID: 6551 Comm: syz.1.44 Not tainted 6.14.0-syzkaller-g7f2ff7b62617 #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025 RIP: 0010:kernel_sock_shutdown+0x47/0x70 net/socket.c:3653 Call Trace: <TASK> udp_tunnel_sock_release+0x68/0x80 net/ipv4/udp_tunnel_core.c:181 sctp_udp_sock_stop+0x71/0x160 net/sctp/protocol.c:930 proc_sctp_do_udp_port+0x264/0x450 net/sctp/sysctl.c:553 proc_sys_call_handler+0x3d0/0x5b0 fs/proc/proc_sysctl.c:601 iter_file_splice_write+0x91c/0x1150 fs/splice.c:738 do_splice_from fs/splice.c:935 [inline] direct_splice_actor+0x18f/0x6c0 fs/splice.c:1158 splice_direct_to_actor+0x342/0xa30 fs/splice.c:1102 do_splice_direct_actor fs/splice.c:1201 [inline] do_splice_direct+0x174/0x240 fs/splice.c:1227 do_sendfile+0xafd/0xe50 fs/read_write.c:1368 __do_sys_sendfile64 fs/read_write.c:1429 [inline] __se_sys_sendfile64 fs/read_write.c:1415 [inline] __x64_sys_sendfile64+0x1d8/0x220 fs/read_write.c:1415 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
Fixes: 046c052b475e ("sctp: enable udp tunneling socks") Reported-by: syzbot+fae49d997eb56fa7c74d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/67ea5c01.050a0220.1547ec.012b.GAE@google.com/... Signed-off-by: Eric Dumazet edumazet@google.com Cc: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Acked-by: Xin Long lucien.xin@gmail.com Link: https://patch.msgid.link/20250331091532.224982-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org [Minor conflict resolved due to code context change.] Signed-off-by: Jianqi Ren jianqi.ren.cn@windriver.com Signed-off-by: He Zhe zhe.he@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/sysctl.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -518,6 +518,8 @@ static int proc_sctp_do_auth(struct ctl_
 	return ret;
 }

+static DEFINE_MUTEX(sctp_sysctl_mutex);
+
 static int proc_sctp_do_udp_port(struct ctl_table *ctl, int write,
 				 void *buffer, size_t *lenp, loff_t *ppos)
 {
@@ -542,6 +544,7 @@ static int proc_sctp_do_udp_port(struct
 		if (new_value > max || new_value < min)
 			return -EINVAL;

+		mutex_lock(&sctp_sysctl_mutex);
 		net->sctp.udp_port = new_value;
 		sctp_udp_sock_stop(net);
 		if (new_value) {
@@ -554,6 +557,7 @@ static int proc_sctp_do_udp_port(struct
 		lock_sock(sk);
 		sctp_sk(sk)->udp_port = htons(net->sctp.udp_port);
 		release_sock(sk);
+		mutex_unlock(&sctp_sysctl_mutex);
 	}
return ret;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit 28cb13f29faf6290597b24b728dc3100c019356f upstream.
Instead of doing a BUG_ON() handle the error by returning -EUCLEAN, aborting the transaction and logging an error message.
Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com [Minor conflict resolved due to code context change.] Signed-off-by: Jianqi Ren jianqi.ren.cn@windriver.com Signed-off-by: He Zhe zhe.he@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/extent-tree.c | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-)
--- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -179,6 +179,14 @@ search_again: ei = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_extent_item); num_refs = btrfs_extent_refs(leaf, ei); + if (unlikely(num_refs == 0)) { + ret = -EUCLEAN; + btrfs_err(fs_info, + "unexpected zero reference count for extent item (%llu %u %llu)", + key.objectid, key.type, key.offset); + btrfs_abort_transaction(trans, ret); + goto out_free; + } extent_flags = btrfs_extent_flags(leaf, ei); } else { ret = -EINVAL; @@ -190,8 +198,6 @@ search_again:
goto out_free; } - - BUG_ON(num_refs == 0); } else { num_refs = 0; extent_flags = 0; @@ -221,10 +227,19 @@ search_again: goto search_again; } spin_lock(&head->lock); - if (head->extent_op && head->extent_op->update_flags) + if (head->extent_op && head->extent_op->update_flags) { extent_flags |= head->extent_op->flags_to_set; - else - BUG_ON(num_refs == 0); + } else if (unlikely(num_refs == 0)) { + spin_unlock(&head->lock); + mutex_unlock(&head->mutex); + spin_unlock(&delayed_refs->lock); + ret = -EUCLEAN; + btrfs_err(fs_info, + "unexpected zero reference count for extent %llu (%s)", + bytenr, metadata ? "metadata" : "data"); + btrfs_abort_transaction(trans, ret); + goto out_free; + }
num_refs += head->ref_mod; spin_unlock(&head->lock);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
commit 8965d42bcf54d42cbc72fe34a9d0ec3f8527debd upstream.
It would be better not to store nft_ctx inside the nft_trans object; the netlink ctx structure is huge and most of its information is never needed in places that use trans->ctx.
Avoid/reduce its usage if possible, no runtime behaviour change intended.
Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: c03d278fdf35 ("netfilter: nf_tables: wait for rcu grace period on net_device removal") Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/netfilter/nf_tables.h | 2 +- net/netfilter/nf_tables_api.c | 17 ++++++++--------- net/netfilter/nft_immediate.c | 2 +- 3 files changed, 10 insertions(+), 11 deletions(-)
--- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -1121,7 +1121,7 @@ static inline bool nft_chain_is_bound(st
int nft_chain_add(struct nft_table *table, struct nft_chain *chain); void nft_chain_del(struct nft_chain *chain); -void nf_tables_chain_destroy(struct nft_ctx *ctx); +void nf_tables_chain_destroy(struct nft_chain *chain);
struct nft_stats { u64 bytes; --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -2034,9 +2034,9 @@ static void nf_tables_chain_free_chain_r kvfree(chain->blob_next); }
-void nf_tables_chain_destroy(struct nft_ctx *ctx) +void nf_tables_chain_destroy(struct nft_chain *chain) { - struct nft_chain *chain = ctx->chain; + const struct nft_table *table = chain->table; struct nft_hook *hook, *next;
if (WARN_ON(chain->use > 0)) @@ -2048,7 +2048,7 @@ void nf_tables_chain_destroy(struct nft_ if (nft_is_base_chain(chain)) { struct nft_base_chain *basechain = nft_base_chain(chain);
- if (nft_base_chain_netdev(ctx->family, basechain->ops.hooknum)) { + if (nft_base_chain_netdev(table->family, basechain->ops.hooknum)) { list_for_each_entry_safe(hook, next, &basechain->hook_list, list) { list_del_rcu(&hook->list); @@ -2515,7 +2515,7 @@ err_chain_add: err_trans: nft_use_dec_restore(&table->use); err_destroy_chain: - nf_tables_chain_destroy(ctx); + nf_tables_chain_destroy(chain);
return err; } @@ -8994,7 +8994,7 @@ static void nft_commit_release(struct nf kfree(nft_trans_chain_name(trans)); break; case NFT_MSG_DELCHAIN: - nf_tables_chain_destroy(&trans->ctx); + nf_tables_chain_destroy(nft_trans_chain(trans)); break; case NFT_MSG_DELRULE: nf_tables_rule_destroy(&trans->ctx, nft_trans_rule(trans)); @@ -9955,7 +9955,7 @@ static void nf_tables_abort_release(stru nf_tables_table_destroy(&trans->ctx); break; case NFT_MSG_NEWCHAIN: - nf_tables_chain_destroy(&trans->ctx); + nf_tables_chain_destroy(nft_trans_chain(trans)); break; case NFT_MSG_NEWRULE: nf_tables_rule_destroy(&trans->ctx, nft_trans_rule(trans)); @@ -10677,7 +10677,7 @@ int __nft_release_basechain(struct nft_c } nft_chain_del(ctx->chain); nft_use_dec(&ctx->table->use); - nf_tables_chain_destroy(ctx); + nf_tables_chain_destroy(ctx->chain);
return 0; } @@ -10753,10 +10753,9 @@ static void __nft_release_table(struct n nft_obj_destroy(&ctx, obj); } list_for_each_entry_safe(chain, nc, &table->chains, list) { - ctx.chain = chain; nft_chain_del(chain); nft_use_dec(&table->use); - nf_tables_chain_destroy(&ctx); + nf_tables_chain_destroy(chain); } nf_tables_table_destroy(&ctx); } --- a/net/netfilter/nft_immediate.c +++ b/net/netfilter/nft_immediate.c @@ -221,7 +221,7 @@ static void nft_immediate_destroy(const list_del(&rule->list); nf_tables_rule_destroy(&chain_ctx, rule); } - nf_tables_chain_destroy(&chain_ctx); + nf_tables_chain_destroy(chain); break; default: break;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit c03d278fdf35e73dd0ec543b9b556876b9d9a8dc upstream.
8c873e219970 ("netfilter: core: free hooks with call_rcu") removed synchronize_net() call when unregistering basechain hook, however, net_device removal event handler for the NFPROTO_NETDEV was not updated to wait for RCU grace period.
Note that 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks on net_device removal") does not remove basechain rules on device removal, I was hinted to remove rules on net_device removal later, see 5ebe0b0eec9d ("netfilter: nf_tables: destroy basechain and rules on netdevice removal").
Although NETDEV_UNREGISTER event is guaranteed to be handled after synchronize_net() call, this path needs to wait for rcu grace period via rcu callback to release basechain hooks if netns is alive because an ongoing netlink dump could be in progress (sockets hold a reference on the netns).
Note that nf_tables_pre_exit_net() unregisters and releases basechain hooks but it is possible to see NETDEV_UNREGISTER at a later stage in the netns exit path, eg. veth peer device in another netns:
cleanup_net() default_device_exit_batch() unregister_netdevice_many_notify() notifier_call_chain() nf_tables_netdev_event() __nft_release_basechain()
In this particular case, the same rule of thumb applies: if the netns is alive, then wait for the rcu grace period because a netlink dump in the other netns could be in progress. Otherwise, if the other netns is going away then no netlink dump can be in progress and basechain hooks can be released immediately.
While at it, turn WARN_ON() into WARN_ON_ONCE() for the basechain validation, which should not ever happen.
Fixes: 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks on net_device removal") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/netfilter/nf_tables.h | 4 +++ net/netfilter/nf_tables_api.c | 41 +++++++++++++++++++++++++++++++------- 2 files changed, 38 insertions(+), 7 deletions(-)
--- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -1045,6 +1045,7 @@ struct nft_rule_blob { * @use: number of jump references to this chain * @flags: bitmask of enum nft_chain_flags * @name: name of the chain + * @rcu_head: rcu head for deferred release */ struct nft_chain { struct nft_rule_blob __rcu *blob_gen_0; @@ -1061,6 +1062,7 @@ struct nft_chain { char *name; u16 udlen; u8 *udata; + struct rcu_head rcu_head;
/* Only used during control plane commit phase: */ struct nft_rule_blob *blob_next; @@ -1203,6 +1205,7 @@ static inline void nft_use_inc_restore(u * @sets: sets in the table * @objects: stateful objects in the table * @flowtables: flow tables in the table + * @net: netnamespace this table belongs to * @hgenerator: handle generator state * @handle: table handle * @use: number of chain references to this table @@ -1218,6 +1221,7 @@ struct nft_table { struct list_head sets; struct list_head objects; struct list_head flowtables; + possible_net_t net; u64 hgenerator; u64 handle; u32 use; --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -1413,6 +1413,7 @@ static int nf_tables_newtable(struct sk_ INIT_LIST_HEAD(&table->sets); INIT_LIST_HEAD(&table->objects); INIT_LIST_HEAD(&table->flowtables); + write_pnet(&table->net, net); table->family = family; table->flags = flags; table->handle = ++nft_net->table_handle; @@ -10662,22 +10663,48 @@ int nft_data_dump(struct sk_buff *skb, i } EXPORT_SYMBOL_GPL(nft_data_dump);
-int __nft_release_basechain(struct nft_ctx *ctx) +static void __nft_release_basechain_now(struct nft_ctx *ctx) { struct nft_rule *rule, *nr;
- if (WARN_ON(!nft_is_base_chain(ctx->chain))) - return 0; - - nf_tables_unregister_hook(ctx->net, ctx->chain->table, ctx->chain); list_for_each_entry_safe(rule, nr, &ctx->chain->rules, list) { list_del(&rule->list); - nft_use_dec(&ctx->chain->use); nf_tables_rule_release(ctx, rule); } + nf_tables_chain_destroy(ctx->chain); +} + +static void nft_release_basechain_rcu(struct rcu_head *head) +{ + struct nft_chain *chain = container_of(head, struct nft_chain, rcu_head); + struct nft_ctx ctx = { + .family = chain->table->family, + .chain = chain, + .net = read_pnet(&chain->table->net), + }; + + __nft_release_basechain_now(&ctx); + put_net(ctx.net); +} + +int __nft_release_basechain(struct nft_ctx *ctx) +{ + struct nft_rule *rule; + + if (WARN_ON_ONCE(!nft_is_base_chain(ctx->chain))) + return 0; + + nf_tables_unregister_hook(ctx->net, ctx->chain->table, ctx->chain); + list_for_each_entry(rule, &ctx->chain->rules, list) + nft_use_dec(&ctx->chain->use); + nft_chain_del(ctx->chain); nft_use_dec(&ctx->table->use); - nf_tables_chain_destroy(ctx->chain); + + if (maybe_get_net(ctx->net)) + call_rcu(&ctx->chain->rcu_head, nft_release_basechain_rcu); + else + __nft_release_basechain_now(ctx);
return 0; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
commit b04df3da1b5c6f6dc7cdccc37941740c078c4043 upstream.
nf_tables_chain_destroy can sleep, it can't be used from call_rcu callbacks.
Moreover, nf_tables_rule_release() is only safe for error unwinding, while the transaction mutex is held and the to-be-destroyed rule was not exposed to either the dataplane or dumps, as it deactivates+frees without the required synchronize_rcu() in between.
nft_rule_expr_deactivate() callbacks will change ->use counters of other chains/sets, see e.g. nft_lookup .deactivate callback, these must be serialized via transaction mutex.
Also add a few lockdep asserts to make this more explicit.
Calling synchronize_rcu() isn't ideal, but fixing this without is hard and way more intrusive. As-is, we can get:
WARNING: .. net/netfilter/nf_tables_api.c:5515 nft_set_destroy+0x.. Workqueue: events nf_tables_trans_destroy_work RIP: 0010:nft_set_destroy+0x3fe/0x5c0 Call Trace: <TASK> nf_tables_trans_destroy_work+0x6b7/0xad0 process_one_work+0x64a/0xce0 worker_thread+0x613/0x10d0
In case the synchronize_rcu becomes an issue, we can explore alternatives.
One way would be to allocate nft_trans_rule objects + one nft_trans_chain object, deactivate the rules + the chain and then defer the freeing to the nft destroy workqueue. We'd still need to keep the synchronize_rcu path as a fallback to handle -ENOMEM corner cases though.
Reported-by: syzbot+b26935466701e56cfdc2@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67478d92.050a0220.253251.0062.GAE@google.com/T/ Fixes: c03d278fdf35 ("netfilter: nf_tables: wait for rcu grace period on net_device removal") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/netfilter/nf_tables.h | 3 --- net/netfilter/nf_tables_api.c | 32 +++++++++++++++----------------- 2 files changed, 15 insertions(+), 20 deletions(-)
--- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -1062,7 +1062,6 @@ struct nft_chain { char *name; u16 udlen; u8 *udata; - struct rcu_head rcu_head;
/* Only used during control plane commit phase: */ struct nft_rule_blob *blob_next; @@ -1205,7 +1204,6 @@ static inline void nft_use_inc_restore(u * @sets: sets in the table * @objects: stateful objects in the table * @flowtables: flow tables in the table - * @net: netnamespace this table belongs to * @hgenerator: handle generator state * @handle: table handle * @use: number of chain references to this table @@ -1221,7 +1219,6 @@ struct nft_table { struct list_head sets; struct list_head objects; struct list_head flowtables; - possible_net_t net; u64 hgenerator; u64 handle; u32 use; --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -1413,7 +1413,6 @@ static int nf_tables_newtable(struct sk_ INIT_LIST_HEAD(&table->sets); INIT_LIST_HEAD(&table->objects); INIT_LIST_HEAD(&table->flowtables); - write_pnet(&table->net, net); table->family = family; table->flags = flags; table->handle = ++nft_net->table_handle; @@ -3511,8 +3510,11 @@ void nf_tables_rule_destroy(const struct kfree(rule); }
+/* can only be used if rule is no longer visible to dumps */ static void nf_tables_rule_release(const struct nft_ctx *ctx, struct nft_rule *rule) { + lockdep_commit_lock_is_held(ctx->net); + nft_rule_expr_deactivate(ctx, rule, NFT_TRANS_RELEASE); nf_tables_rule_destroy(ctx, rule); } @@ -5248,6 +5250,8 @@ void nf_tables_deactivate_set(const stru struct nft_set_binding *binding, enum nft_trans_phase phase) { + lockdep_commit_lock_is_held(ctx->net); + switch (phase) { case NFT_TRANS_PREPARE_ERROR: nft_set_trans_unbind(ctx, set); @@ -10674,19 +10678,6 @@ static void __nft_release_basechain_now( nf_tables_chain_destroy(ctx->chain); }
-static void nft_release_basechain_rcu(struct rcu_head *head) -{ - struct nft_chain *chain = container_of(head, struct nft_chain, rcu_head); - struct nft_ctx ctx = { - .family = chain->table->family, - .chain = chain, - .net = read_pnet(&chain->table->net), - }; - - __nft_release_basechain_now(&ctx); - put_net(ctx.net); -} - int __nft_release_basechain(struct nft_ctx *ctx) { struct nft_rule *rule; @@ -10701,11 +10692,18 @@ int __nft_release_basechain(struct nft_c nft_chain_del(ctx->chain); nft_use_dec(&ctx->table->use);
- if (maybe_get_net(ctx->net)) - call_rcu(&ctx->chain->rcu_head, nft_release_basechain_rcu); - else + if (!maybe_get_net(ctx->net)) { __nft_release_basechain_now(ctx); + return 0; + } + + /* wait for ruleset dumps to complete. Owning chain is no longer in + * lists, so new dumps can't find any of these rules anymore. + */ + synchronize_rcu();
+ __nft_release_basechain_now(ctx); + put_net(ctx->net); return 0; } EXPORT_SYMBOL_GPL(__nft_release_basechain);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Brown broonie@kernel.org
commit dc7eb8755797ed41a0d1b5c0c39df3c8f401b3d9 upstream.
When sme_alloc() is called with existing storage and we are not flushing we will always allocate new storage, both leaking the existing storage and corrupting the state. Fix this by separating the checks for flushing and for existing storage as we do for SVE.
Callers that reallocate (eg, due to changing the vector length) should call sme_free() themselves.
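A minimal userspace stand-in for the allocate-or-flush logic after the fix (not the arm64 code; the structure and sizes are invented): existing storage is reused, flushed only when requested, and never reallocated.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

struct task_state {
	void *za_state;
	size_t za_size;
};

/* Mirrors the fixed sme_alloc() flow: keep existing storage, flush on request. */
static void state_alloc(struct task_state *t, bool flush)
{
	if (t->za_state) {
		if (flush)
			memset(t->za_state, 0, t->za_size);
		return;		/* never leak or replace existing storage */
	}
	t->za_size = 512;	/* arbitrary size for the sketch */
	t->za_state = calloc(1, t->za_size);
}

int main(void)
{
	struct task_state t = { 0 };

	state_alloc(&t, true);		/* first call allocates */
	void *first = t.za_state;
	state_alloc(&t, false);		/* no flush: storage and contents are kept */
	printf("same buffer reused: %s\n", t.za_state == first ? "yes" : "no");
	free(t.za_state);
	return 0;
}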
Fixes: 5d0a8d2fba50 ("arm64/ptrace: Ensure that SME is set up for target when writing SSVE state") Signed-off-by: Mark Brown broonie@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240115-arm64-sme-flush-v1-1-7472bd3459b7@kernel.... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Zhaoyang Li lizy04@hust.edu.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/fpsimd.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1259,8 +1259,10 @@ void fpsimd_release_task(struct task_str
  */
 void sme_alloc(struct task_struct *task, bool flush)
 {
-	if (task->thread.za_state && flush) {
-		memset(task->thread.za_state, 0, za_state_size(task));
+	if (task->thread.za_state) {
+		if (flush)
+			memset(task->thread.za_state, 0,
+			       za_state_size(task));
 		return;
 	}
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej S. Szmigiero mail@maciej.szmigiero.name
commit dd410d784402c5775f66faf8b624e85e41c38aaf upstream.
Wakeup for IRQ1 should be disabled only in cases where i8042 had actually enabled it, otherwise "wake_depth" for this IRQ will try to drop below zero and there will be an unpleasant WARN() logged:
kernel: atkbd serio0: Disabling IRQ1 wakeup source to avoid platform firmware bug kernel: ------------[ cut here ]------------ kernel: Unbalanced IRQ 1 wake disable kernel: WARNING: CPU: 10 PID: 6431 at kernel/irq/manage.c:920 irq_set_irq_wake+0x147/0x1a0
The PMC driver uses DEFINE_SIMPLE_DEV_PM_OPS() to define its dev_pm_ops which sets amd_pmc_suspend_handler() to the .suspend, .freeze, and .poweroff handlers. i8042_pm_suspend(), however, is only set as the .suspend handler.
Fix the issue by calling the PMC suspend handler only from the same set of dev_pm_ops handlers as i8042_pm_suspend(), which currently means just the .suspend handler.
To reproduce this issue try hibernating (S4) the machine after a fresh boot without putting it into s2idle first.
Fixes: 8e60615e8932 ("platform/x86/amd: pmc: Disable IRQ1 wakeup for RN/CZN") Reviewed-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Maciej S. Szmigiero mail@maciej.szmigiero.name Link: https://lore.kernel.org/r/c8f28c002ca3c66fbeeb850904a1f43118e17200.173618460... [ij: edited the commit message.] Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Zhaoyang Li lizy04@hust.edu.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/platform/x86/amd/pmc.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
--- a/drivers/platform/x86/amd/pmc.c
+++ b/drivers/platform/x86/amd/pmc.c
@@ -834,6 +834,10 @@ static int __maybe_unused amd_pmc_suspen
 {
 	struct amd_pmc_dev *pdev = dev_get_drvdata(dev);

+	/*
+	 * Must be called only from the same set of dev_pm_ops handlers
+	 * as i8042_pm_suspend() is called: currently just from .suspend.
+	 */
 	if (pdev->cpu_id == AMD_CPU_ID_CZN) {
 		int rc = amd_pmc_czn_wa_irq1(pdev);

@@ -846,7 +850,9 @@ static int __maybe_unused amd_pmc_suspen
 	return 0;
 }

-static SIMPLE_DEV_PM_OPS(amd_pmc_pm, amd_pmc_suspend_handler, NULL);
+static const struct dev_pm_ops amd_pmc_pm = {
+	.suspend = amd_pmc_suspend_handler,
+};
#endif
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shravya KN shravya.k-n@broadcom.com
commit 3051a77a09dfe3022aa012071346937fdf059033 upstream.
The MTU setting at the time an XDP multi-buffer is attached determines whether the aggregation ring will be used and the rx_skb_func handler. This is done in bnxt_set_rx_skb_mode().
If the MTU is later changed, the aggregation ring setting may need to be changed and it may become out-of-sync with the settings initially done in bnxt_set_rx_skb_mode(). This may result in random memory corruption and crashes as the HW may DMA data larger than the allocated buffer size, such as:
BUG: kernel NULL pointer dereference, address: 00000000000003c0 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 17 PID: 0 Comm: swapper/17 Kdump: loaded Tainted: G S OE 6.1.0-226bf9805506 #1 Hardware name: Wiwynn Delta Lake PVT BZA.02601.0150/Delta Lake-Class1, BIOS F0E_3A12 08/26/2021 RIP: 0010:bnxt_rx_pkt+0xe97/0x1ae0 [bnxt_en] Code: 8b 95 70 ff ff ff 4c 8b 9d 48 ff ff ff 66 41 89 87 b4 00 00 00 e9 0b f7 ff ff 0f b7 43 0a 49 8b 95 a8 04 00 00 25 ff 0f 00 00 <0f> b7 14 42 48 c1 e2 06 49 03 95 a0 04 00 00 0f b6 42 33f RSP: 0018:ffffa19f40cc0d18 EFLAGS: 00010202 RAX: 00000000000001e0 RBX: ffff8e2c805c6100 RCX: 00000000000007ff RDX: 0000000000000000 RSI: ffff8e2c271ab990 RDI: ffff8e2c84f12380 RBP: ffffa19f40cc0e48 R08: 000000000001000d R09: 974ea2fcddfa4cbf R10: 0000000000000000 R11: ffffa19f40cc0ff8 R12: ffff8e2c94b58980 R13: ffff8e2c952d6600 R14: 0000000000000016 R15: ffff8e2c271ab990 FS: 0000000000000000(0000) GS:ffff8e3b3f840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000003c0 CR3: 0000000e8580a004 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> __bnxt_poll_work+0x1c2/0x3e0 [bnxt_en]
To address the issue, we now call bnxt_set_rx_skb_mode() within bnxt_change_mtu() to properly set the AGG rings configuration and update rx_skb_func based on the new MTU value. Additionally, BNXT_FLAG_NO_AGG_RINGS is cleared at the beginning of bnxt_set_rx_skb_mode() to make sure it gets set or cleared based on the current MTU.
Fixes: 08450ea98ae9 ("bnxt_en: Fix max_mtu setting for multi-buf XDP") Co-developed-by: Somnath Kotur somnath.kotur@broadcom.com Signed-off-by: Somnath Kotur somnath.kotur@broadcom.com Signed-off-by: Shravya KN shravya.k-n@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Zhaoyang Li lizy04@hust.edu.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4041,7 +4041,7 @@ int bnxt_set_rx_skb_mode(struct bnxt *bp
 	struct net_device *dev = bp->dev;

 	if (page_mode) {
-		bp->flags &= ~BNXT_FLAG_AGG_RINGS;
+		bp->flags &= ~(BNXT_FLAG_AGG_RINGS | BNXT_FLAG_NO_AGG_RINGS);
 		bp->flags |= BNXT_FLAG_RX_PAGE_MODE;

 		if (bp->xdp_prog->aux->xdp_has_frags)
@@ -12799,6 +12799,14 @@ static int bnxt_change_mtu(struct net_de
 		bnxt_close_nic(bp, true, false);

 	dev->mtu = new_mtu;
+
+	/* MTU change may change the AGG ring settings if an XDP multi-buffer
+	 * program is attached. We need to set the AGG rings settings and
+	 * rx_skb_func accordingly.
+	 */
+	if (READ_ONCE(bp->xdp_prog))
+		bnxt_set_rx_skb_mode(bp, true);
+
 	bnxt_set_ring_params(bp);
if (netif_running(dev))
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shigeru Yoshida syoshida@redhat.com
commit 4e13d3a9c25b7080f8a619f961e943fe08c2672c upstream.
As it was done in commit fc1092f51567 ("ipv4: Fix uninit-value access in __ip_make_skb()") for IPv4, check FLOWI_FLAG_KNOWN_NH on fl6->flowi6_flags instead of testing HDRINCL on the socket to avoid a race condition which causes uninit-value access.
Fixes: ea30388baebc ("ipv6: Fix an uninit variable access bug in __ip6_make_skb()") Signed-off-by: Shigeru Yoshida syoshida@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Zhaoyang Li lizy04@hust.edu.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_output.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1985,7 +1985,8 @@ struct sk_buff *__ip6_make_skb(struct so
 		struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
 		u8 icmp6_type;

-		if (sk->sk_socket->type == SOCK_RAW && !inet_sk(sk)->hdrincl)
+		if (sk->sk_socket->type == SOCK_RAW &&
+		    !(fl6->flowi6_flags & FLOWI_FLAG_KNOWN_NH))
 			icmp6_type = fl6->fl6_icmp_type;
 		else
 			icmp6_type = icmp6_hdr(skb)->icmp6_type;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shigeru Yoshida syoshida@redhat.com
commit fc1092f51567277509563800a3c56732070b6aa4 upstream.
KMSAN reported uninit-value access in __ip_make_skb() [1]. __ip_make_skb() tests HDRINCL to know if the skb has icmphdr. However, HDRINCL can cause a race condition. If calling setsockopt(2) with IP_HDRINCL changes HDRINCL while __ip_make_skb() is running, the function will access icmphdr in the skb even if it is not included. This causes the issue reported by KMSAN.
Check FLOWI_FLAG_KNOWN_NH on fl4->flowi4_flags instead of testing HDRINCL on the socket.
Also, fl4->fl4_icmp_type and fl4->fl4_icmp_code are not initialized. These are union in struct flowi4 and are implicitly initialized by flowi4_init_output(), but we should not rely on specific union layout.
Initialize these explicitly in raw_sendmsg().
[1] BUG: KMSAN: uninit-value in __ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481 __ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481 ip_finish_skb include/net/ip.h:243 [inline] ip_push_pending_frames+0x4c/0x5c0 net/ipv4/ip_output.c:1508 raw_sendmsg+0x2381/0x2690 net/ipv4/raw.c:654 inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x274/0x3c0 net/socket.c:745 __sys_sendto+0x62c/0x7b0 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x130/0x200 net/socket.c:2199 do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6d/0x75
Uninit was created at: slab_post_alloc_hook mm/slub.c:3804 [inline] slab_alloc_node mm/slub.c:3845 [inline] kmem_cache_alloc_node+0x5f6/0xc50 mm/slub.c:3888 kmalloc_reserve+0x13c/0x4a0 net/core/skbuff.c:577 __alloc_skb+0x35a/0x7c0 net/core/skbuff.c:668 alloc_skb include/linux/skbuff.h:1318 [inline] __ip_append_data+0x49ab/0x68c0 net/ipv4/ip_output.c:1128 ip_append_data+0x1e7/0x260 net/ipv4/ip_output.c:1365 raw_sendmsg+0x22b1/0x2690 net/ipv4/raw.c:648 inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x274/0x3c0 net/socket.c:745 __sys_sendto+0x62c/0x7b0 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x130/0x200 net/socket.c:2199 do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6d/0x75
CPU: 1 PID: 15709 Comm: syz-executor.7 Not tainted 6.8.0-11567-gb3603fcb79b1 #25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
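A short standalone illustration of the union point above (plain C, not the networking code; the structure is a simplified stand-in for the flow key): setting one member of a union says nothing about the bytes another member overlays, so fields that will be read must be initialized explicitly.

#include <stdio.h>
#include <string.h>

/* Simplified stand-in: ports and ICMP fields share storage, as in struct flowi4. */
struct flow_key {
	union {
		struct { unsigned short sport, dport; } ports;
		struct { unsigned char type, code; } icmp;
	} u;
};

int main(void)
{
	struct flow_key fl4;

	memset(&fl4, 0xa5, sizeof(fl4));	/* stale stack contents */
	fl4.u.ports.sport = 0x1234;		/* "initialized" via another member */

	/* Whatever appears here is just the overlaid bytes, not a valid ICMP type. */
	printf("icmp type read back: 0x%02x\n", fl4.u.icmp.type);

	/* The fix: initialize the members that will actually be read. */
	fl4.u.icmp.type = 0;
	fl4.u.icmp.code = 0;
	printf("after explicit init: 0x%02x 0x%02x\n", fl4.u.icmp.type, fl4.u.icmp.code);
	return 0;
}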
Fixes: 99e5acae193e ("ipv4: Fix potential uninit variable access bug in __ip_make_skb()") Reported-by: syzkaller syzkaller@googlegroups.com Signed-off-by: Shigeru Yoshida syoshida@redhat.com Link: https://lore.kernel.org/r/20240430123945.2057348-1-syoshida@redhat.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Zhaoyang Li lizy04@hust.edu.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/ip_output.c | 3 ++- net/ipv4/raw.c | 3 +++ 2 files changed, 5 insertions(+), 1 deletion(-)
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1580,7 +1580,8 @@ struct sk_buff *__ip_make_skb(struct soc
 	 * so icmphdr does not in skb linear region and can not get icmp_type
 	 * by icmp_hdr(skb)->type.
 	 */
-	if (sk->sk_type == SOCK_RAW && !inet_sk(sk)->hdrincl)
+	if (sk->sk_type == SOCK_RAW &&
+	    !(fl4->flowi4_flags & FLOWI_FLAG_KNOWN_NH))
 		icmp_type = fl4->fl4_icmp_type;
 	else
 		icmp_type = icmp_hdr(skb)->type;
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -608,6 +608,9 @@ static int raw_sendmsg(struct sock *sk,
 			   (hdrincl ? FLOWI_FLAG_KNOWN_NH : 0),
 			   daddr, saddr, 0, 0, sk->sk_uid);

+	fl4.fl4_icmp_type = 0;
+	fl4.fl4_icmp_code = 0;
+
 	if (!hdrincl) {
 		rfv.msg = msg;
 		rfv.hlen = 0;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Théo Lebrun theo.lebrun@bootlin.com
commit 32ce3bb57b6b402de2aec1012511e7ac4e7449dc upstream.
dev_get_drvdata() gets used to acquire the pointer to cqspi and the SPI controller. Neither embeds the other; this leads to memory corruption.
On a given platform (Mobileye EyeQ5) the memory corruption is hidden inside cqspi->f_pdata. Also, this uninitialised memory is used as a mutex (ctlr->bus_lock_mutex) by spi_controller_suspend().
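A userspace caricature of the bug (plain C, not the driver itself; the structures are stripped-down stand-ins): the same drvdata pointer cannot be interpreted as two unrelated structures, so the controller has to be reached through the one structure that was actually stored.

#include <stdio.h>

struct spi_master { int bus_num; };

struct cqspi_st {
	struct spi_master *master;	/* controller is referenced, not embedded */
	int current_cs;
};

/* Stand-in for dev_set_drvdata()/dev_get_drvdata(). */
static void *drvdata;

int main(void)
{
	struct spi_master master = { .bus_num = 0 };
	struct cqspi_st cqspi = { .master = &master, .current_cs = -1 };

	drvdata = &cqspi;			/* probe stored the cqspi_st pointer */

	/* Buggy pattern: reading the same drvdata as a different type.
	 * &cqspi is not a spi_master, so any field accessed that way is garbage.
	 */
	struct spi_master *wrong = (struct spi_master *)drvdata;
	printf("wrong-type pointer: %p (not a spi_master)\n", (void *)wrong);

	/* Fixed pattern: fetch cqspi_st, then follow its master pointer. */
	struct cqspi_st *c = (struct cqspi_st *)drvdata;
	printf("correct bus_num via cqspi->master: %d\n", c->master->bus_num);
	return 0;
}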
Fixes: 2087e85bb66e ("spi: cadence-quadspi: fix suspend-resume implementations") Reviewed-by: Dhruva Gole d-gole@ti.com Signed-off-by: Théo Lebrun theo.lebrun@bootlin.com Link: https://msgid.link/r/20240222-cdns-qspi-pm-fix-v4-1-6b6af8bcbf59@bootlin.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Zhaoyang Li lizy04@hust.edu.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/spi/spi-cadence-quadspi.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
--- a/drivers/spi/spi-cadence-quadspi.c
+++ b/drivers/spi/spi-cadence-quadspi.c
@@ -1775,10 +1775,9 @@ static int cqspi_remove(struct platform_
 static int cqspi_suspend(struct device *dev)
 {
 	struct cqspi_st *cqspi = dev_get_drvdata(dev);
-	struct spi_master *master = dev_get_drvdata(dev);
 	int ret;

-	ret = spi_master_suspend(master);
+	ret = spi_master_suspend(cqspi->master);
 	cqspi_controller_enable(cqspi, 0);

 	clk_disable_unprepare(cqspi->clk);
@@ -1789,7 +1788,6 @@ static int cqspi_suspend(struct device *
 static int cqspi_resume(struct device *dev)
 {
 	struct cqspi_st *cqspi = dev_get_drvdata(dev);
-	struct spi_master *master = dev_get_drvdata(dev);

 	clk_prepare_enable(cqspi->clk);
 	cqspi_wait_idle(cqspi);
@@ -1798,7 +1796,7 @@ static int cqspi_resume(struct device *d
 	cqspi->current_cs = -1;
 	cqspi->sclk = 0;

-	return spi_master_resume(master);
+	return spi_master_resume(cqspi->master);
 }
static DEFINE_SIMPLE_DEV_PM_OPS(cqspi_dev_pm_ops, cqspi_suspend, cqspi_resume);
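As a rough plain-C sketch of the bug pattern, assuming the probe path stores the cqspi pointer via dev_set_drvdata() (which is what the fix implies; the structures below are toy stand-ins, not the driver's definitions): dev_get_drvdata() simply hands back whatever pointer was stored, so reinterpreting that pointer as a different structure reads unrelated memory, while dereferencing cqspi->master stays inside the object that was actually stored.

/* Toy stand-ins only -- not the real driver structures. */
#include <stdio.h>

struct spi_master { int bus_lock_mutex; };
struct cqspi_st   { struct spi_master *master; int f_pdata[4]; };

static void *driver_data; /* stands in for dev->driver_data */

static void probe(struct cqspi_st *cqspi, struct spi_master *master)
{
	cqspi->master = master;
	driver_data = cqspi; /* like dev_set_drvdata(dev, cqspi) in probe */
}

static void suspend(void)
{
	struct cqspi_st *cqspi = driver_data; /* matches what probe stored */
	/*
	 * struct spi_master *master = driver_data;
	 * would alias the cqspi object as a different type: anything accessed
	 * through "master" then reads (or locks) memory that is really cqspi
	 * state, which is the corruption described above.
	 */
	printf("suspending controller %p\n", (void *)cqspi->master);
}

int main(void)
{
	static struct spi_master master;
	static struct cqspi_st cqspi;

	probe(&cqspi, &master);
	suspend();
	return 0;
}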
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
commit 4aaffc85751da5722e858e4333e8cf0aa4b6c78f upstream.
Set the s3/s0ix and s4 flags in the pm notifier so that we can skip the resource evictions properly in pm prepare based on whether we are suspending or hibernating. Drop the eviction from the notifier itself: processes are not frozen at this time, so we can end up getting stuck trying to evict VRAM while applications continue to submit work, which causes the buffers to get pulled back into VRAM.
v2: Move suspend flags out of pm notifier (Mario)
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4178
Fixes: 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification callback support")
Cc: Mario Limonciello mario.limonciello@amd.com
Reviewed-by: Mario Limonciello mario.limonciello@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
(cherry picked from commit 06f2dcc241e7e5c681f81fbc46cacdf4bfd7d6d7)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 18 +++++-------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 10 +---------
 2 files changed, 6 insertions(+), 22 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4199,28 +4199,20 @@ static int amdgpu_device_evict_resources
  * @data: data
  *
  * This function is called when the system is about to suspend or hibernate.
- * It is used to evict resources from the device before the system goes to
- * sleep while there is still access to swap.
+ * It is used to set the appropriate flags so that eviction can be optimized
+ * in the pm prepare callback.
  */
 static int amdgpu_device_pm_notifier(struct notifier_block *nb, unsigned long mode,
 				     void *data)
 {
 	struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, pm_nb);
-	int r;
 
 	switch (mode) {
 	case PM_HIBERNATION_PREPARE:
 		adev->in_s4 = true;
-		fallthrough;
-	case PM_SUSPEND_PREPARE:
-		r = amdgpu_device_evict_resources(adev);
-		/*
-		 * This is considered non-fatal at this time because
-		 * amdgpu_device_prepare() will also fatally evict resources.
-		 * See https://gitlab.freedesktop.org/drm/amd/-/issues/3781
-		 */
-		if (r)
-			drm_warn(adev_to_drm(adev), "Failed to evict resources, freeze active processes if problems occur: %d\n", r);
+		break;
+	case PM_POST_HIBERNATION:
+		adev->in_s4 = false;
 		break;
 	}
 
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2480,13 +2480,8 @@ static int amdgpu_pmops_freeze(struct de
 static int amdgpu_pmops_thaw(struct device *dev)
 {
 	struct drm_device *drm_dev = dev_get_drvdata(dev);
-	struct amdgpu_device *adev = drm_to_adev(drm_dev);
-	int r;
 
-	r = amdgpu_device_resume(drm_dev, true);
-	adev->in_s4 = false;
-
-	return r;
+	return amdgpu_device_resume(drm_dev, true);
 }
 
 static int amdgpu_pmops_poweroff(struct device *dev)
@@ -2499,9 +2494,6 @@ static int amdgpu_pmops_poweroff(struct
 static int amdgpu_pmops_restore(struct device *dev)
 {
 	struct drm_device *drm_dev = dev_get_drvdata(dev);
-	struct amdgpu_device *adev = drm_to_adev(drm_dev);
-
-	adev->in_s4 = false;
 
 	return amdgpu_device_resume(drm_dev, true);
 }
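For readers unfamiliar with PM notifiers, a minimal hypothetical module sketch of the flag-only pattern the patch moves to (not the amdgpu code; demo_in_s4 and the demo_* names are invented for the example): the notifier merely records whether a hibernation cycle is starting or ending, and heavy work such as eviction is left to the driver's later prepare callback, which runs after user space has been frozen.

/* Hypothetical demo module -- illustrates the pattern, not amdgpu itself. */
#include <linux/module.h>
#include <linux/notifier.h>
#include <linux/suspend.h>

static bool demo_in_s4; /* consumed later, e.g. by a pm .prepare callback */

static int demo_pm_notifier(struct notifier_block *nb, unsigned long mode,
			    void *data)
{
	switch (mode) {
	case PM_HIBERNATION_PREPARE:
		demo_in_s4 = true;	/* heading into hibernation */
		break;
	case PM_POST_HIBERNATION:
		demo_in_s4 = false;	/* hibernation finished or aborted */
		break;
	}
	return NOTIFY_DONE;
}

static struct notifier_block demo_pm_nb = {
	.notifier_call = demo_pm_notifier,
};

static int __init demo_init(void)
{
	return register_pm_notifier(&demo_pm_nb);
}

static void __exit demo_exit(void)
{
	unregister_pm_notifier(&demo_pm_nb);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("PM notifier flag-setting sketch");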
On 5/20/25 06:49, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.140 release. There are 97 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 22 May 2025 12:57:37 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.140-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com
On 5/20/25 07:49, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.140 release. There are 97 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 22 May 2025 12:57:37 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.140-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On 5/20/25 06:49, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.140 release. There are 97 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 22 May 2025 12:57:37 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.140-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
On Tue, 20 May 2025 15:49:25 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.140 release. There are 97 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 22 May 2025 12:57:37 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.140-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v6.1:
  10 builds: 10 pass, 0 fail
  28 boots: 28 pass, 0 fail
  115 tests: 115 pass, 0 fail
Linux version: 6.1.140-rc1-g1fb2f21fca77
Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra186-p3509-0000+p3636-0001, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On Tue, 20 May 2025 at 19:26, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.1.140 release. There are 97 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 22 May 2025 12:57:37 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.140-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build

* kernel: 6.1.140-rc1
* git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
* git commit: 1fb2f21fca7786e898e99a8b92606fa61519c261
* git describe: v6.1.139-98-g1fb2f21fca77
* test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.1.y/build/v6.1.13...
## Test Regressions (compared to v6.1.138-97-g03bf4e168bff)
## Metric Regressions (compared to v6.1.138-97-g03bf4e168bff)
## Test Fixes (compared to v6.1.138-97-g03bf4e168bff)
## Metric Fixes (compared to v6.1.138-97-g03bf4e168bff)
## Test result summary

total: 159745, pass: 140741, fail: 4152, skip: 14627, xfail: 225
## Build Summary

* arc: 5 total, 5 passed, 0 failed
* arm: 133 total, 133 passed, 0 failed
* arm64: 41 total, 41 passed, 0 failed
* i386: 21 total, 19 passed, 2 failed
* mips: 26 total, 22 passed, 4 failed
* parisc: 4 total, 4 passed, 0 failed
* powerpc: 32 total, 31 passed, 1 failed
* riscv: 11 total, 11 passed, 0 failed
* s390: 14 total, 14 passed, 0 failed
* sh: 10 total, 10 passed, 0 failed
* sparc: 7 total, 7 passed, 0 failed
* x86_64: 33 total, 31 passed, 2 failed
## Test suites summary

* boot
* commands
* kselftest-arm64
* kselftest-breakpoints
* kselftest-capabilities
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-exec
* kselftest-fpu
* kselftest-futex
* kselftest-intel_pstate
* kselftest-kcmp
* kselftest-kvm
* kselftest-livepatch
* kselftest-membarrier
* kselftest-mincore
* kselftest-mqueue
* kselftest-openat2
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-sigaltstack
* kselftest-size
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user_events
* kselftest-vDSO
* kselftest-x86
* kunit
* kvm-unit-tests
* lava
* libgpiod
* libhugetlbfs
* log-parser-boot
* log-parser-build-clang
* log-parser-build-gcc
* log-parser-test
* ltp-capability
* ltp-commands
* ltp-containers
* ltp-controllers
* ltp-cpuhotplug
* ltp-crypto
* ltp-cve
* ltp-dio
* ltp-fcntl-locktests
* ltp-fs
* ltp-fs_bind
* ltp-fs_perms_simple
* ltp-hugetlb
* ltp-ipc
* ltp-math
* ltp-mm
* ltp-nptl
* ltp-pty
* ltp-sched
* ltp-smoke
* ltp-syscalls
* ltp-tracing
* modules
* perf
* rcutorture
-- Linaro LKFT https://lkft.linaro.org
On Tue, May 20, 2025 at 03:49:25PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.140 release. There are 97 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Tested-by: Mark Brown broonie@kernel.org
On 20.05.2025 at 15:49, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.140 release. There are 97 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Builds, boots and works on my 2-socket Ivy Bridge Xeon E5-2697 v2 server. No dmesg oddities or regressions found.
Tested-by: Peter Schneider pschneider1968@googlemail.com
Best regards, Peter Schneider
Hi!
This is the start of the stable review cycle for the 6.1.140 release. There are 97 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
CIP testing did not find any problems here:
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-6...
Tested-by: Pavel Machek (CIP) pavel@denx.de
Best regards, Pavel
The kernel, bpf tool and perf tool build fine for v6.1.140-rc1 on x86 and arm64 Azure VMs.
Tested-by: Hardik Garg hargar@linux.microsoft.com
Thanks, Hardik
On 5/20/2025 6:49 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.140 release. There are 97 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 22 May 2025 12:57:37 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.140-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.1.140-rc1
Alex Deucher alexander.deucher@amd.com drm/amdgpu: fix pm notifier handling
Théo Lebrun theo.lebrun@bootlin.com spi: cadence-qspi: fix pointer reference in runtime PM hooks
Shigeru Yoshida syoshida@redhat.com ipv4: Fix uninit-value access in __ip_make_skb()
Shigeru Yoshida syoshida@redhat.com ipv6: Fix potential uninit-value access in __ip6_make_skb()
Shravya KN shravya.k-n@broadcom.com bnxt_en: Fix receive ring space parameters when XDP is active
Maciej S. Szmigiero mail@maciej.szmigiero.name platform/x86/amd/pmc: Only disable IRQ1 wakeup where i8042 actually enabled it
Mark Brown broonie@kernel.org arm64/sme: Always exit sme_alloc() early with existing storage
Florian Westphal fw@strlen.de netfilter: nf_tables: do not defer rule destruction via call_rcu
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: wait for rcu grace period on net_device removal
Florian Westphal fw@strlen.de netfilter: nf_tables: pass nft_chain to destroy function, not nft_ctx
Filipe Manana fdmanana@suse.com btrfs: don't BUG_ON() when 0 reference count at btrfs_lookup_extent_info()
Eric Dumazet edumazet@google.com sctp: add mutual exclusion in proc_sctp_do_udp_port()
Ma Wupeng mawupeng1@huawei.com hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio
Huacai Chen chenhuacai@kernel.org LoongArch: Explicitly specify code model in Makefile
Peter Collingbourne pcc@google.com bpf, arm64: Fix address emission with tag-based KASAN enabled
Puranjay Mohan puranjay@kernel.org bpf, arm64: Fix trampoline for BPF_TRAMP_F_CALL_ORIG
Xu Lu luxu.kernel@bytedance.com riscv: mm: Fix the out of bound issue of vmemmap address
Byungchul Park byungchul@sk.com mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index
Feng Tang feng.tang@linux.alibaba.com selftests/mm: compaction_test: support platform with huge mount of memory
GONG Ruiqi gongruiqi1@huawei.com usb: typec: fix pm usage counter imbalance in ucsi_ccg_sync_control()
Dan Carpenter dan.carpenter@linaro.org usb: typec: fix potential array underflow in ucsi_ccg_sync_control()
RD Babiera rdbabiera@google.com usb: typec: altmodes/displayport: create sysfs nodes as driver's default device attribute group
Andrei Kuchynski akuchynski@chromium.org usb: typec: ucsi: displayport: Fix deadlock
Shuai Xue xueshuai@linux.alibaba.com dmaengine: idxd: fix memory leak in error handling path of idxd_pci_probe
Shuai Xue xueshuai@linux.alibaba.com dmaengine: idxd: fix memory leak in error handling path of idxd_alloc
Shuai Xue xueshuai@linux.alibaba.com dmaengine: idxd: Add missing idxd cleanup to fix memory leak in remove call
Shuai Xue xueshuai@linux.alibaba.com dmaengine: idxd: Add missing cleanups in cleanup internals
Shuai Xue xueshuai@linux.alibaba.com dmaengine: idxd: Add missing cleanup for early error out in idxd_setup_internals
Shuai Xue xueshuai@linux.alibaba.com dmaengine: idxd: fix memory leak in error handling path of idxd_setup_groups
Shuai Xue xueshuai@linux.alibaba.com dmaengine: idxd: fix memory leak in error handling path of idxd_setup_engines
Shuai Xue xueshuai@linux.alibaba.com dmaengine: idxd: fix memory leak in error handling path of idxd_setup_wqs
Yemike Abhilash Chandra y-abhilashchandra@ti.com dmaengine: ti: k3-udma: Use cap_mask directly from dma_device structure instead of a local copy
Ronald Wahl ronald.wahl@legrand.com dmaengine: ti: k3-udma: Add missing locking
Nathan Chancellor nathan@kernel.org net: qede: Initialize qede_ll_ops with designated initializer
Fedor Pchelkin pchelkin@ispras.ru wifi: mt76: disable napi on driver removal
Jethro Donaldson devel@jro.nz smb: client: fix memory leak during error handling for POSIX mkdir
Steve Siwinski ssiwinski@atto.com scsi: sd_zbc: block: Respect bio vector limits for REPORT ZONES buffer
Claudiu Beznea claudiu.beznea.uj@bp.renesas.com phy: renesas: rcar-gen3-usb2: Set timing registers only once
Claudiu Beznea claudiu.beznea.uj@bp.renesas.com phy: renesas: rcar-gen3-usb2: Fix role detection on unbind/bind
Ma Ke make24@iscas.ac.cn phy: Fix error handling in tegra_xusb_port_init
Steven Rostedt rostedt@goodmis.org tracing: samples: Initialize trace_array_printk() with the correct function
pengdonglin pengdonglin@xiaomi.com ftrace: Fix preemption accounting for stacktrace filter command
pengdonglin pengdonglin@xiaomi.com ftrace: Fix preemption accounting for stacktrace trigger command
Michael Kelley mhklinux@outlook.com Drivers: hv: vmbus: Remove vmbus_sendpacket_pagebuffer()
Michael Kelley mhklinux@outlook.com Drivers: hv: Allow vmbus_sendpacket_mpb_desc() to create multiple ranges
Michael Kelley mhklinux@outlook.com hv_netvsc: Remove rmsg_pgcnt
Michael Kelley mhklinux@outlook.com hv_netvsc: Preserve contiguous PFN grouping in the page buffer array
Michael Kelley mhklinux@outlook.com hv_netvsc: Use vmbus_sendpacket_mpb_desc() to send VMBus messages
Hyejeong Choi hjeong.choi@samsung.com dma-buf: insert memory barrier before updating num_fences
Nicolas Chauvet kwizart@gmail.com ALSA: usb-audio: Add sample rate quirk for Microdia JP001 USB Camera
Christian Heusel christian@heusel.eu ALSA: usb-audio: Add sample rate quirk for Audioengine D1
Wentao Liang vulab@iscas.ac.cn ALSA: es1968: Add error handling for snd_pcm_hw_constraint_pow2()
Jeremy Linton jeremy.linton@arm.com ACPI: PPTT: Fix processor subtable walk
Wayne Lin Wayne.Lin@amd.com drm/amd/display: Avoid flooding unnecessary info messages
Wayne Lin Wayne.Lin@amd.com drm/amd/display: Correct the reply value when AUX write incomplete
Filipe Manana fdmanana@suse.com btrfs: fix discard worker infinite loop after disabling discard
Huacai Chen chenhuacai@kernel.org LoongArch: Fix MAX_REG_OFFSET calculation
Nathan Lynch nathan.lynch@amd.com dmaengine: Revert "dmaengine: dmatest: Fix dmatest waiting less when interrupted"
Trond Myklebust trond.myklebust@hammerspace.com NFSv4/pnfs: Reset the layout state after a layoutreturn
Pengtao He hept.hept.hept@gmail.com net/tls: fix kernel panic when alloc_page failed
Subbaraya Sundeep sbhatta@marvell.com octeontx2-pf: macsec: Fix incorrect max transmit size in TX secy
Cosmin Tanislav demonsingur@gmail.com regulator: max20086: fix invalid memory access
Abdun Nihaal abdun.nihaal@gmail.com qlcnic: fix memory leak in qlcnic_sriov_channel_cfg_cmd()
Carolina Jubran cjubran@nvidia.com net/mlx5e: Disable MACsec offload for uplink representor profile
Geert Uytterhoeven geert+renesas@glider.be ALSA: sh: SND_AICA should depend on SH_DMA_API
Keith Busch kbusch@kernel.org nvme-pci: acquire cq_poll_lock in nvme_poll_irqdisable
Kees Cook kees@kernel.org nvme-pci: make nvme_pci_npages_prp() __always_inline
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: sja1105: discard incoming frames in BR_STATE_LISTENING
Mathieu Othacehe othacehe@gnu.org net: cadence: macb: Fix a possible deadlock in macb_halt_tx.
Andrew Jeffery andrew@codeconstruct.com.au net: mctp: Ensure keys maintain only one ref to corresponding dev
Cong Wang xiyou.wangcong@gmail.com net_sched: Flush gso_skb list too during ->change()
Geert Uytterhoeven geert+renesas@glider.be spi: loopback-test: Do not split 1024-byte hexdumps
Li Lingfeng lilingfeng3@huawei.com nfs: handle failure of nfs_get_lock_context in unlock path
Henry Martin bsdhenrymartin@gmail.com HID: uclogic: Add NULL check in uclogic_input_configured()
Qasim Ijaz qasdev00@gmail.com HID: thrustmaster: fix memory leak in thrustmaster_interrupts()
Zhu Yanjun yanjun.zhu@linux.dev RDMA/rxe: Fix slab-use-after-free Read in rxe_queue_cleanup bug
Sebastian Andrzej Siewior bigeasy@linutronix.de clocksource/i8253: Use raw_spinlock_irqsave() in clockevent_i8253_disable()
David Lechner dlechner@baylibre.com iio: chemical: sps30: use aligned_s64 for timestamp
Jonathan Cameron Jonathan.Cameron@huawei.com iio: adc: ad7768-1: Fix insufficient alignment of timestamp.
Alex Deucher alexander.deucher@amd.com Revert "drm/amd: Stop evicting resources on APUs in suspend"
Mario Limonciello mario.limonciello@amd.com drm/amd: Add Suspend/Hibernate notification callback support
Zhigang Luo Zhigang.Luo@amd.com drm/amdgpu: trigger flr_work if reading pf2vf data failed
Ma Jun Jun.Ma2@amd.com drm/amdgpu: Fix the runtime resume failure issue
Mario Limonciello mario.limonciello@amd.com drm/amd: Stop evicting resources on APUs in suspend
Jonathan Cameron Jonathan.Cameron@huawei.com iio: adc: ad7266: Fix potential timestamp alignment issue.
Michal Suchanek msuchanek@suse.de tpm: tis: Double the timeout B to 4s
Masami Hiramatsu (Google) mhiramat@kernel.org tracing: probes: Fix a possible race in trace_probe_log APIs
Hans de Goede hdegoede@redhat.com platform/x86: asus-wmi: Fix wlan_ctrl_by_user detection
Kees Cook kees@kernel.org binfmt_elf: Move brk for static PIE even if ASLR disabled
Kees Cook kees@kernel.org binfmt_elf: Honor PT_LOAD alignment for static PIE
Kees Cook kees@kernel.org binfmt_elf: Calculate total_size earlier
Kees Cook kees@kernel.org selftests/exec: Build both static and non-static load_address tests
Kees Cook keescook@chromium.org binfmt_elf: Leave a gap between .bss and brk
Muhammad Usama Anjum usama.anjum@collabora.com selftests/exec: load_address: conform test to TAP format output
Kees Cook keescook@chromium.org binfmt_elf: elf_bss no longer used by load_elf_binary()
Eric W. Biederman ebiederm@xmission.com binfmt_elf: Support segments with 0 filesz and misaligned starts
Kees Cook keescook@chromium.org binfmt: Fix whitespace issues
Diffstat:
Makefile | 4 +- arch/arm64/kernel/fpsimd.c | 6 +- arch/arm64/net/bpf_jit_comp.c | 12 +- arch/loongarch/Makefile | 2 +- arch/loongarch/include/asm/ptrace.h | 2 +- arch/riscv/include/asm/page.h | 1 + arch/riscv/include/asm/pgtable.h | 2 +- arch/riscv/mm/init.c | 17 +- block/bio.c | 2 +- drivers/acpi/pptt.c | 11 +- drivers/char/tpm/tpm_tis_core.h | 2 +- drivers/clocksource/i8253.c | 4 +- drivers/dma-buf/dma-resv.c | 5 +- drivers/dma/dmatest.c | 6 +- drivers/dma/idxd/init.c | 145 +++++++--- drivers/dma/ti/k3-udma.c | 10 +- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 51 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 11 +- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 29 +- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 3 + drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 2 + drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 2 + drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +- .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 16 +- drivers/hid/hid-thrustmaster.c | 1 + drivers/hid/hid-uclogic-core.c | 7 +- drivers/hv/channel.c | 65 +---- drivers/iio/adc/ad7266.c | 2 +- drivers/iio/adc/ad7768-1.c | 2 +- drivers/iio/chemical/sps30.c | 2 +- drivers/infiniband/sw/rxe/rxe_cq.c | 5 +- drivers/net/dsa/sja1105/sja1105_main.c | 6 +- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 10 +- drivers/net/ethernet/cadence/macb_main.c | 19 +- .../ethernet/marvell/octeontx2/nic/cn10k_macsec.c | 3 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 + drivers/net/ethernet/qlogic/qede/qede_main.c | 2 +- .../ethernet/qlogic/qlcnic/qlcnic_sriov_common.c | 7 +- drivers/net/hyperv/hyperv_net.h | 13 +- drivers/net/hyperv/netvsc.c | 57 +++- drivers/net/hyperv/netvsc_drv.c | 62 +---- drivers/net/hyperv/rndis_filter.c | 24 +- drivers/net/wireless/mediatek/mt76/dma.c | 1 + drivers/nvme/host/pci.c | 4 +- drivers/phy/renesas/phy-rcar-gen3-usb2.c | 38 ++- drivers/phy/tegra/xusb.c | 8 +- drivers/platform/x86/amd/pmc.c | 8 +- drivers/platform/x86/asus-wmi.c | 3 +- drivers/regulator/max20086-regulator.c | 7 +- drivers/scsi/sd_zbc.c | 6 +- drivers/scsi/storvsc_drv.c | 1 + drivers/spi/spi-cadence-quadspi.c | 6 +- drivers/spi/spi-loopback-test.c | 2 +- drivers/usb/typec/altmodes/displayport.c | 18 +- drivers/usb/typec/ucsi/displayport.c | 19 +- drivers/usb/typec/ucsi/ucsi.c | 34 +++ drivers/usb/typec/ucsi/ucsi.h | 3 + drivers/usb/typec/ucsi/ucsi_ccg.c | 5 + fs/binfmt_elf.c | 293 ++++++++++++--------- fs/binfmt_elf_fdpic.c | 2 +- fs/btrfs/discard.c | 17 +- fs/btrfs/extent-tree.c | 25 +- fs/exec.c | 2 +- fs/nfs/nfs4proc.c | 9 +- fs/nfs/pnfs.c | 9 + fs/smb/client/smb2pdu.c | 2 +- include/linux/bio.h | 1 + include/linux/hyperv.h | 7 - include/linux/tpm.h | 2 +- include/net/netfilter/nf_tables.h | 3 +- include/net/sch_generic.h | 15 ++ include/uapi/linux/elf.h | 2 +- kernel/trace/trace_dynevent.c | 16 +- kernel/trace/trace_dynevent.h | 1 + kernel/trace/trace_events_trigger.c | 2 +- kernel/trace/trace_functions.c | 6 +- kernel/trace/trace_kprobe.c | 2 +- kernel/trace/trace_probe.c | 9 + kernel/trace/trace_uprobe.c | 2 +- mm/memory_hotplug.c | 6 +- mm/migrate.c | 8 + net/ipv4/ip_output.c | 3 +- net/ipv4/raw.c | 3 + net/ipv6/ip6_output.c | 3 +- net/mctp/route.c | 4 +- net/netfilter/nf_tables_api.c | 54 ++-- net/netfilter/nft_immediate.c | 2 +- net/sched/sch_codel.c | 2 +- net/sched/sch_fq.c | 2 +- net/sched/sch_fq_codel.c | 2 +- net/sched/sch_fq_pie.c | 2 +- net/sched/sch_hhf.c | 2 +- net/sched/sch_pie.c | 2 +- net/sctp/sysctl.c | 4 + net/tls/tls_strp.c | 3 +- samples/ftrace/sample-trace-array.c | 2 +- 
sound/pci/es1968.c | 6 +- sound/sh/Kconfig | 2 +- sound/usb/quirks.c | 4 + tools/testing/selftests/exec/Makefile | 19 +- tools/testing/selftests/exec/load_address.c | 83 ++++-- tools/testing/selftests/vm/compaction_test.c | 19 +- 103 files changed, 937 insertions(+), 530 deletions(-)