This is the start of the stable review cycle for the 5.10.193 release.
There are 84 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 30 Aug 2023 10:11:30 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.193-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.193-rc1
Oscar Salvador <osalvador(a)suse.de>
mm,hwpoison: fix printing of page flags
Bard Liao <yung-chuan.liao(a)linux.intel.com>
ASoC: Intel: sof_sdw: include rt711.h for RT711 JD mode
Miaohe Lin <linmiaohe(a)huawei.com>
mm: memory-failure: fix unexpected return value in soft_offline_page()
Kefeng Wang <wangkefeng.wang(a)huawei.com>
mm: memory-failure: kill soft_offline_free_page()
Dan Williams <dan.j.williams(a)intel.com>
mm: fix page reference leak in soft_offline_page()
Oscar Salvador <osalvador(a)suse.de>
mm,hwpoison: refactor get_any_page
Rob Clark <robdclark(a)chromium.org>
dma-buf/sw_sync: Avoid recursive lock during fence signal
Biju Das <biju.das.jz(a)bp.renesas.com>
pinctrl: renesas: rza2: Add lock around pinctrl_generic{{add,remove}_group,{add,remove}_function}
Biju Das <biju.das.jz(a)bp.renesas.com>
clk: Fix undefined reference to `clk_rate_exclusive_{get,put}'
Zhu Wang <wangzhu9(a)huawei.com>
scsi: core: raid_class: Remove raid_component_add()
Zhu Wang <wangzhu9(a)huawei.com>
scsi: snic: Fix double free in snic_tgt_create()
Shuming Fan <shumingf(a)realtek.com>
ASoC: rt711: add two jack detection modes
Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>
drm/i915: Fix premature release of request's reusable memory
Dietmar Eggemann <dietmar.eggemann(a)arm.com>
cgroup/cpuset: Free DL BW in case can_attach() fails
Dietmar Eggemann <dietmar.eggemann(a)arm.com>
sched/deadline: Create DL BW alloc, free & check overflow interface
Juri Lelli <juri.lelli(a)redhat.com>
cgroup/cpuset: Iterate only if DEADLINE tasks are present
Juri Lelli <juri.lelli(a)redhat.com>
sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
Juri Lelli <juri.lelli(a)redhat.com>
sched/cpuset: Bring back cpuset_mutex
Juri Lelli <juri.lelli(a)redhat.com>
cgroup/cpuset: Rename functions dealing with DEADLINE accounting
Nicholas Piggin <npiggin(a)gmail.com>
timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped
Frederic Weisbecker <frederic(a)kernel.org>
tick: Detect and fix jiffies update stall
Joel Fernandes (Google) <joel(a)joelfernandes.org>
torture: Fix hang during kthread shutdown phase
Feng Tang <feng.tang(a)intel.com>
x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
Ankit Nautiyal <ankit.k.nautiyal(a)intel.com>
drm/display/dp: Fix the DP DSC Receiver cap size
Zack Rusin <zackr(a)vmware.com>
drm/vmwgfx: Fix shader stage validation
Igor Mammedov <imammedo(a)redhat.com>
PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus
Wei Chen <harperchen1110(a)gmail.com>
media: vcodec: Fix potential array out-of-bounds in encoder queue_setup
Rob Herring <robh(a)kernel.org>
of: dynamic: Refactor action prints to not use "%pOF" inside devtree_lock
Arnd Bergmann <arnd(a)arndb.de>
radix tree: remove unused variable
Helge Deller <deller(a)gmx.de>
lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Hold rtnl lock during MTU update via netlink
Remi Pommarel <repk(a)triplefau.lt>
batman-adv: Fix batadv_v_ogm_aggr_send memory leak
Remi Pommarel <repk(a)triplefau.lt>
batman-adv: Fix TT global entry leak when client roamed back
Remi Pommarel <repk(a)triplefau.lt>
batman-adv: Do not get eth header before batadv_check_management_packet
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Don't increase MTU when set by user
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Trigger events for auto adjusted MTU
Christian Göttsche <cgzones(a)googlemail.com>
selinux: set next pointer before attaching to list
Benjamin Coddington <bcodding(a)redhat.com>
nfsd: Fix race to FREE_STATEID and cl_revoked
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFS: Fix a use after free in nfs_direct_join_group()
Alexandre Ghiti <alexghiti(a)rivosinc.com>
mm: add a call to flush_cache_vmap() in vmap_pfn()
Andrey Skvortsov <andrej.skvortzov(a)gmail.com>
clk: Fix slab-out-of-bounds error in devm_clk_release()
Benjamin Coddington <bcodding(a)redhat.com>
NFSv4: Fix dropped lock for racing OPEN and delegation return
Michael Ellerman <mpe(a)ellerman.id.au>
ibmveth: Use dcbf rather than dcbfl
Hangbin Liu <liuhangbin(a)gmail.com>
bonding: fix macvlan over alb bond support
Jakub Kicinski <kuba(a)kernel.org>
net: remove bond_slave_has_mac_rcu()
Ido Schimmel <idosch(a)nvidia.com>
rtnetlink: Reject negative ifindexes in RTM_NEWLINK
Florent Fourcot <florent.fourcot(a)wifirst.fr>
rtnetlink: return ENODEV when ifname does not exist and group is given
Florian Westphal <fw(a)strlen.de>
netfilter: nf_tables: fix out of memory error handling
Jamal Hadi Salim <jhs(a)mojatatu.com>
net/sched: fix a qdisc modification with ambiguous command request
Alessio Igor Bogani <alessio.bogani(a)elettra.eu>
igb: Avoid starting unnecessary workqueues
Jesse Brandeburg <jesse.brandeburg(a)intel.com>
ice: fix receive buffer size miscalculation
Jakub Kicinski <kuba(a)kernel.org>
net: validate veth and vxcan peer ifindexes
Ruan Jinjie <ruanjinjie(a)huawei.com>
net: bcmgenet: Fix return value check for fixed_phy_register()
Ruan Jinjie <ruanjinjie(a)huawei.com>
net: bgmac: Fix return value check for fixed_phy_register()
Lu Wei <luwei32(a)huawei.com>
ipvlan: Fix a reference count leak warning in ipvlan_ns_exit()
Eric Dumazet <edumazet(a)google.com>
dccp: annotate data-races in dccp_poll()
Eric Dumazet <edumazet(a)google.com>
sock: annotate data-races around prot->memory_pressure
Hariprasad Kelam <hkelam(a)marvell.com>
octeontx2-af: SDP: fix receive link config
Zheng Yejian <zhengyejian1(a)huawei.com>
tracing: Fix memleak due to race between current_tracer and trace
Zheng Yejian <zhengyejian1(a)huawei.com>
tracing: Fix cpu buffers unavailable due to 'record_disabled' missed
Ilya Dryomov <idryomov(a)gmail.com>
rbd: prevent busy loop when requesting exclusive lock
Ilya Dryomov <idryomov(a)gmail.com>
rbd: retrieve and check lock owner twice before blocklisting
Ilya Dryomov <idryomov(a)gmail.com>
rbd: make get_lock_owner_info() return a single locker or NULL
Ilya Dryomov <idryomov(a)gmail.com>
libceph, rbd: ignore addr->type while comparing in some cases
Taimur Hassan <syed.hassan(a)amd.com>
drm/amd/display: check TG is non-null before checking if enabled
Josip Pavic <Josip.Pavic(a)amd.com>
drm/amd/display: do not wait for mpc idle if tg is disabled
Takashi Iwai <tiwai(a)suse.de>
ALSA: pcm: Fix potential data race at PCM memory allocation helpers
Mikulas Patocka <mpatocka(a)redhat.com>
dm integrity: reduce vmalloc space footprint on 32-bit architectures
Mikulas Patocka <mpatocka(a)redhat.com>
dm integrity: increase RECALC_SECTORS to improve recalculate speed
Zhang Shurong <zhang_shurong(a)foxmail.com>
fbdev: fix potential OOB read in fast_imageblit()
Thomas Zimmermann <tzimmermann(a)suse.de>
fbdev: Fix sys_imageblit() for arbitrary image widths
Thomas Zimmermann <tzimmermann(a)suse.de>
fbdev: Improve performance of sys_imageblit()
Jiaxun Yang <jiaxun.yang(a)flygoat.com>
MIPS: cpu-features: Use boot_cpu_type for CPU type based features
Jiaxun Yang <jiaxun.yang(a)flygoat.com>
MIPS: cpu-features: Enable octeon_cache by cpu_type
Alexander Aring <aahringo(a)redhat.com>
fs: dlm: fix mismatch of plock results from userspace
Alexander Aring <aahringo(a)redhat.com>
fs: dlm: use dlm_plock_info for do_unlock_close
Alexander Aring <aahringo(a)redhat.com>
fs: dlm: change plock interrupted message to debug again
Alexander Aring <aahringo(a)redhat.com>
fs: dlm: add pid to debug log
Jakob Koschel <jakobkoschel(a)gmail.com>
dlm: replace usage of found with dedicated list iterator variable
Alexander Aring <aahringo(a)redhat.com>
dlm: improve plock logging if interrupted
Igor Mammedov <imammedo(a)redhat.com>
PCI: acpiphp: Reassign resources on bridge if necessary
Chuck Lever <chuck.lever(a)oracle.com>
xprtrdma: Remap Receive buffers after a reconnect
Fedor Pchelkin <pchelkin(a)ispras.ru>
NFSv4: fix out path in __nfs4_get_acl_uncached
Peter Zijlstra <peterz(a)infradead.org>
objtool/x86: Fix SRSO mess
-------------
Diffstat:
Makefile | 4 +-
arch/mips/include/asm/cpu-features.h | 21 ++-
arch/x86/kernel/fpu/xstate.c | 8 +
drivers/block/rbd.c | 139 ++++++++++++------
drivers/clk/clk-devres.c | 13 +-
drivers/dma-buf/sw_sync.c | 18 +--
.../drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 4 +-
drivers/gpu/drm/i915/i915_active.c | 99 +++++++++----
drivers/gpu/drm/i915/i915_request.c | 2 +
drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 13 ++
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 29 ++--
drivers/md/dm-integrity.c | 4 +-
drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c | 2 +
drivers/net/bonding/bond_alb.c | 6 +-
drivers/net/can/vxcan.c | 7 +-
drivers/net/ethernet/broadcom/bgmac.c | 2 +-
drivers/net/ethernet/broadcom/genet/bcmmii.c | 2 +-
drivers/net/ethernet/ibm/ibmveth.c | 2 +-
drivers/net/ethernet/intel/ice/ice_base.c | 3 +-
drivers/net/ethernet/intel/igb/igb_ptp.c | 24 +--
.../net/ethernet/marvell/octeontx2/af/rvu_nix.c | 3 +-
drivers/net/ipvlan/ipvlan_main.c | 3 +-
drivers/net/veth.c | 5 +-
drivers/of/dynamic.c | 31 ++--
drivers/pci/hotplug/acpiphp_glue.c | 9 +-
drivers/pinctrl/renesas/pinctrl-rza2.c | 17 ++-
drivers/scsi/raid_class.c | 48 ------
drivers/scsi/snic/snic_disc.c | 3 +-
drivers/video/fbdev/core/sysimgblt.c | 64 +++++++-
fs/dlm/lock.c | 53 ++++---
fs/dlm/plock.c | 89 ++++++++----
fs/dlm/recover.c | 39 +++--
fs/nfs/direct.c | 26 ++--
fs/nfs/nfs4proc.c | 14 +-
fs/nfsd/nfs4state.c | 2 +-
include/drm/drm_dp_helper.h | 2 +-
include/linux/ceph/msgr.h | 9 +-
include/linux/clk.h | 80 +++++-----
include/linux/cpuset.h | 12 +-
include/linux/raid_class.h | 4 -
include/linux/sched.h | 4 +-
include/net/bonding.h | 25 +---
include/net/rtnetlink.h | 4 +-
include/net/sock.h | 7 +-
kernel/cgroup/cgroup.c | 4 +
kernel/cgroup/cpuset.c | 161 +++++++++++++++------
kernel/sched/core.c | 41 +++---
kernel/sched/deadline.c | 66 +++++++--
kernel/sched/sched.h | 2 +-
kernel/time/tick-sched.c | 29 +++-
kernel/time/tick-sched.h | 4 +
kernel/torture.c | 2 +-
kernel/trace/trace.c | 15 +-
kernel/trace/trace_irqsoff.c | 3 +-
kernel/trace/trace_sched_wakeup.c | 2 +
lib/clz_ctz.c | 32 +---
lib/radix-tree.c | 1 -
mm/memory-failure.c | 134 ++++++++---------
mm/vmalloc.c | 4 +
net/batman-adv/bat_v_elp.c | 3 +-
net/batman-adv/bat_v_ogm.c | 7 +-
net/batman-adv/hard-interface.c | 14 +-
net/batman-adv/netlink.c | 3 +
net/batman-adv/soft-interface.c | 3 +
net/batman-adv/translation-table.c | 1 -
net/batman-adv/types.h | 6 +
net/ceph/mon_client.c | 6 +-
net/core/rtnetlink.c | 43 +++++-
net/dccp/proto.c | 20 ++-
net/netfilter/nft_set_pipapo.c | 13 +-
net/sched/sch_api.c | 53 +++++--
net/sctp/socket.c | 2 +-
net/sunrpc/xprtrdma/verbs.c | 9 +-
security/selinux/ss/policydb.c | 2 +-
sound/core/pcm_memory.c | 44 +++++-
sound/soc/codecs/rt711-sdw.h | 2 +
sound/soc/codecs/rt711.c | 30 ++++
sound/soc/codecs/rt711.h | 29 +++-
sound/soc/intel/boards/sof_sdw.c | 23 +--
sound/soc/intel/boards/sof_sdw_common.h | 5 -
tools/objtool/arch.h | 1 +
tools/objtool/arch/x86/decode.c | 11 +-
tools/objtool/check.c | 22 ++-
tools/objtool/elf.h | 1 +
84 files changed, 1140 insertions(+), 668 deletions(-)
Hi,
I noticed a regression report on Bugzilla [1]. Quoting from it:
> vmlinuz-6.3.7-200.fc38.x86_64 and vmlinuz-6.3.8-200.fc38.x86_64
>
> when switching the TV channel to another one, VLC stays blocked and the loading of the firmware never ends. The power must be switched off.
> from dmesg:
>
> dvb-tuner-si2141-a10-01.fw
> dvb-demod-si2168-d60-01.fw
> firmware version: D 6.0.13
>
> Restarting the dvb modules is impossible; they stay in use.
>
> Kernel 6.3.5-200.fc38.x86_64 works OK. I don't know how kernel 6.3.6-200.fc38.x86_64 behaves.
>
> The same error is in vmlinuz-6.3.5-200.fc38.x86_64 and earlier, but there the frequency of occurrence is acceptable.
>
> listing from 6.3.5 kernel:
>
> [ 3378.026035] si2168 7-0064: firmware version: D 6.0.13
> [ 3431.891579] si2168 7-0064: downloading firmware from file 'dvb-demod-si2168-d60-01.fw'
> [ 3434.037278] usb 1-4: dvb_usb_v2: 2nd usb_bulk_msg() failed=-110
> [ 3434.037289] si2168 7-0064: firmware download failed -110
> [ 3436.085265] usb 1-4: dvb_usb_v2: usb_bulk_msg() failed=-110
>
> lsmod |grep dvb
> dvb_usb_dvbsky 32768 2
> dvb_usb_v2 57344 1 dvb_usb_dvbsky
> m88ds3103 49152 1 dvb_usb_dvbsky
> dvb_core 192512 3 dvb_usb_v2,m88ds3103,dvb_usb_dvbsky
> m
>
> modprobe -r dvb_usb_dvbsky
> modprobe: FATAL: Module dvb_usb_dvbsky is in use.
>
> Is the problem in dvb_usb_dvbsky or in the firmware?
>
> The answer from https://bugzilla.rpmfusion.org was:
>
> If it was working with a previous kernel this is likely a kernel regression.
> Please forward it to the dvb sub-system maintainers.
>
> This problem has existed for a long time (years), but it was not too horrible.
See Bugzilla for the full thread and attached dmesg and lsusb.
Thorsten: The reporter can't perform a bisection because he has problems
compiling his own vanilla kernel (see comment 8 on the Bugzilla
ticket, albeit with translated make(1) output). Can you take a look at
it?
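In case it helps whoever ends up bisecting this, the procedure would look
roughly like the following (just a sketch; the v6.3.5/v6.3.7 endpoints are
taken from the report above, and the config path is only a guess at the
Fedora naming):
git bisect start
git bisect bad v6.3.7
git bisect good v6.3.5
# at each step: reuse the distro config, rebuild, install, reboot, e.g.
cp /boot/config-6.3.5-200.fc38.x86_64 .config   # assumed Fedora config name
make olddefconfig
make -j"$(nproc)" && sudo make modules_install install
# then try switching TV channels in vlc and tell git the result:
git bisect good    # or: git bisect bad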
Anyway, I'm adding it to regzbot (as a stable-specific regression
for now):
#regzbot introduced: v6.3.5..v6.3.7 https://bugzilla.kernel.org/show_bug.cgi?id=217566
#regzbot title: switching TV channel causes VLC and firmware loading hang
Thanks.
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217566
--
An old man doll... just what I always wanted! - Clara
Hi Greg,
would you please queue up this upstream patch:
0a6b58c5cd0d ("lockdep: fix static memory detection even more")
to the stable kernels 6.1, 6.4 and 6.5?
Thanks,
Helge
From 0a6b58c5cd0dfd7961e725212f0fc8dfc5d96195 Mon Sep 17 00:00:00 2001
From: Helge Deller <deller(a)gmx.de>
Date: Tue, 15 Aug 2023 00:31:09 +0200
Subject: [PATCH] lockdep: fix static memory detection even more
On the parisc architecture, lockdep reports this warning for all static
objects which are in the __initdata section (e.g. "setup_done" in devtmpfs,
"kthreadd_done" in init/main.c):
INFO: trying to register non-static key.
The warning itself is wrong, because those objects are in the __initdata
section, but on parisc that section lies outside of the range from
_stext to _end, which is why the static_obj() function returns a wrong
answer.
While fixing this issue, I noticed that the whole existing check can
be simplified a lot.
Instead of checking against the _stext and _end symbols (which include
code areas too) just check for the .data and .bss segments (since we check a
data object). This can be done with the existing is_kernel_core_data()
macro.
In addition objects in the __initdata section can be checked with
init_section_contains(), and is_kernel_rodata() allows keys to be in the
_ro_after_init section.
This partly reverts and simplifies commit bac59d18c701 ("x86/setup: Fix static
memory detection").
Link: https://lkml.kernel.org/r/ZNqrLRaOi/3wPAdp@p100
Fixes: bac59d18c701 ("x86/setup: Fix static memory detection")
Signed-off-by: Helge Deller <deller(a)gmx.de>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Guenter Roeck <linux(a)roeck-us.net>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
index a6e8373a5170..3fa87e5e11ab 100644
--- a/arch/x86/include/asm/sections.h
+++ b/arch/x86/include/asm/sections.h
@@ -2,8 +2,6 @@
#ifndef _ASM_X86_SECTIONS_H
#define _ASM_X86_SECTIONS_H
-#define arch_is_kernel_initmem_freed arch_is_kernel_initmem_freed
-
#include <asm-generic/sections.h>
#include <asm/extable.h>
@@ -18,20 +16,4 @@ extern char __end_of_kernel_reserve[];
extern unsigned long _brk_start, _brk_end;
-static inline bool arch_is_kernel_initmem_freed(unsigned long addr)
-{
- /*
- * If _brk_start has not been cleared, brk allocation is incomplete,
- * and we can not make assumptions about its use.
- */
- if (_brk_start)
- return 0;
-
- /*
- * After brk allocation is complete, space between _brk_end and _end
- * is available for allocation.
- */
- return addr >= _brk_end && addr < (unsigned long)&_end;
-}
-
#endif /* _ASM_X86_SECTIONS_H */
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 111607d91489..e85b5ad3e206 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -819,34 +819,26 @@ static int very_verbose(struct lock_class *class)
* Is this the address of a static object:
*/
#ifdef __KERNEL__
-/*
- * Check if an address is part of freed initmem. After initmem is freed,
- * memory can be allocated from it, and such allocations would then have
- * addresses within the range [_stext, _end].
- */
-#ifndef arch_is_kernel_initmem_freed
-static int arch_is_kernel_initmem_freed(unsigned long addr)
-{
- if (system_state < SYSTEM_FREEING_INITMEM)
- return 0;
-
- return init_section_contains((void *)addr, 1);
-}
-#endif
-
static int static_obj(const void *obj)
{
- unsigned long start = (unsigned long) &_stext,
- end = (unsigned long) &_end,
- addr = (unsigned long) obj;
+ unsigned long addr = (unsigned long) obj;
- if (arch_is_kernel_initmem_freed(addr))
- return 0;
+ if (is_kernel_core_data(addr))
+ return 1;
+
+ /*
+ * keys are allowed in the __ro_after_init section.
+ */
+ if (is_kernel_rodata(addr))
+ return 1;
/*
- * static variable?
+ * in initdata section and used during bootup only?
+ * NOTE: On some platforms the initdata section is
+ * outside of the _stext ... _end range.
*/
- if ((addr >= start) && (addr < end))
+ if (system_state < SYSTEM_FREEING_INITMEM &&
+ init_section_contains((void *)addr, 1))
return 1;
/*
From: "Paul E. McKenney" <paulmck(a)kernel.org>
[ Upstream commit 147f04b14adde831eb4a0a1e378667429732f9e8 ]
If an RCU expedited grace period starts just when a CPU is in the process
of going offline, so that the outgoing CPU has completed its pass through
stop-machine but has not yet completed its final dive into the idle loop,
RCU will attempt to enable that CPU's scheduling-clock tick via a call
to tick_dep_set_cpu(). For this to happen, that CPU has to have been
online when the expedited grace period completed its CPU-selection phase.
This is pointless: The outgoing CPU has interrupts disabled, so it cannot
take a scheduling-clock tick anyway. In addition, the tick_dep_set_cpu()
function's eventual call to irq_work_queue_on() will splat as follows:
smpboot: CPU 1 is now offline
WARNING: CPU: 6 PID: 124 at kernel/irq_work.c:95
+irq_work_queue_on+0x57/0x60
Modules linked in:
CPU: 6 PID: 124 Comm: kworker/6:2 Not tainted 5.15.0-rc1+ #3
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
+rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
Workqueue: rcu_gp wait_rcu_exp_gp
RIP: 0010:irq_work_queue_on+0x57/0x60
Code: 8b 05 1d c7 ea 62 a9 00 00 f0 00 75 21 4c 89 ce 44 89 c7 e8
+9b 37 fa ff ba 01 00 00 00 89 d0 c3 4c 89 cf e8 3b ff ff ff eb ee <0f> 0b eb b7
+0f 0b eb db 90 48 c7 c0 98 2a 02 00 65 48 03 05 91
6f
RSP: 0000:ffffb12cc038fe48 EFLAGS: 00010282
RAX: 0000000000000001 RBX: 0000000000005208 RCX: 0000000000000020
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9ad01f45a680
RBP: 000000000004c990 R08: 0000000000000001 R09: ffff9ad01f45a680
R10: ffffb12cc0317db0 R11: 0000000000000001 R12: 00000000fffecee8
R13: 0000000000000001 R14: 0000000000026980 R15: ffffffff9e53ae00
FS: 0000000000000000(0000) GS:ffff9ad01f580000(0000)
+knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000000de0c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tick_nohz_dep_set_cpu+0x59/0x70
rcu_exp_wait_wake+0x54e/0x870
? sync_rcu_exp_select_cpus+0x1fc/0x390
process_one_work+0x1ef/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x28/0x3c0
? process_one_work+0x3c0/0x3c0
kthread+0x115/0x140
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30
---[ end trace c5bf75eb6aa80bc6 ]---
This commit therefore avoids invoking tick_dep_set_cpu() on offlined
CPUs to limit both futility and false-positive splats.
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
kernel/rcu/tree_exp.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 401c1f331caf..07a284a18645 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -507,7 +507,10 @@ static void synchronize_rcu_expedited_wait(void)
if (rdp->rcu_forced_tick_exp)
continue;
rdp->rcu_forced_tick_exp = true;
- tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
+ preempt_disable();
+ if (cpu_online(cpu))
+ tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
+ preempt_enable();
}
}
j = READ_ONCE(jiffies_till_first_fqs);
--
2.42.0.rc2.253.gd59a3bf2b4-goog
Hi!
On 02.08.23 23:28, Olaf Skibbe wrote:
> Dear Maintainers,
>
> Hereby I would like to report an apparent bug in the nouveau driver in
> linux/6.1.38-2.
Thanks for your report. Maybe your problem is caused by an incomplete
backport. I CCed the maintainers for the driver (and the regressions
and the stable lists); maybe one of them has an idea, as they know the
driver.
If they don't reply in the next few days, please check if the problem is
also present in mainline. If not, check if the latest 6.1.y release
already fixes this. If not, try to find out which of the four patches you
reverted to get things going again is actually causing this (e.g. first
revert only the one that was applied last; then the last two; ...).
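In practice that narrowing could look roughly like this (just a sketch; the
four suspect commits are the ones you list below, so check in the stable
tree first which of them was applied last):
git checkout v6.1.38
git log --oneline v6.1.27..v6.1.38 -- drivers/gpu/drm/nouveau   # order of the four suspects
git revert --no-edit <last-applied-of-the-four>                 # step 1: revert only the newest one
# rebuild, install, reboot; if the display still stays black, additionally
# revert the next one, and so on until the offending commit is identified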
> Running a current debian stable on a Dell Latitude E6510 with a
> "NVIDIA Corporation GT218M" graphic card, the monitor turns black
> after the grub screen. Also switching to a console (Ctrl-Alt-F2) shows
> just a black screen. Access via ssh is possible.
>
> ~# uname -r
> 6.1.0-10-amd64
>
> dmesg shows the following error message:
>
> [ 3.560153] WARNING: CPU: 0 PID: 176 at
> drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460
> nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [ 3.560287] Modules linked in: sd_mod t10_pi sr_mod crc64_rocksoft
> cdrom crc64 crc_t10dif crct10dif_generic nouveau(+) ahci libahci mxm_wmi
> i2c_algo_bit drm_display_helper libata cec rc_core drm_ttm_helper ttm
> scsi_mod e1000e drm_kms_helper ptp firewire_ohci sdhci_pci cqhci
> ehci_pci sdhci ehci_hcd firewire_core i2c_i801 crct10dif_pclmul
> crct10dif_common drm crc32_pclmul crc32c_intel psmouse usbcore mmc_core
> crc_itu_t pps_core scsi_common i2c_smbus lpc_ich usb_common battery
> video wmi button
> [ 3.560322] CPU: 0 PID: 176 Comm: kworker/u16:5 Not tainted
> 6.1.0-10-amd64 #1 Debian 6.1.38-2
> [ 3.560325] Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A17
> 05/12/2017
> [ 3.560327] Workqueue: nvkm-disp nv50_disp_super [nouveau]
> [ 3.560433] RIP: 0010:nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [ 3.560538] Code: 48 8b 44 24 58 65 48 2b 04 25 28 00 00 00 0f 85 37
> 02 00 00 48 83 c4 60 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc
> cc <0f> 0b c1 e8 03 41 88 6d 62 44 89 fe 48 89 df 48 69 c0 cf 0d d6 26
> [ 3.560541] RSP: 0018:ffff9899c048bd60 EFLAGS: 00010246
> [ 3.560542] RAX: 0000000000041eb0 RBX: ffff88e0209d2600 RCX:
> 0000000000041eb0
> [ 3.560544] RDX: ffffffffc079f760 RSI: 0000000000000000 RDI:
> ffff9899c048bcf0
> [ 3.560545] RBP: 0000000000000001 R08: ffff9899c048bc64 R09:
> 0000000000005b76
> [ 3.560546] R10: 000000000000000d R11: ffff9899c048bde0 R12:
> 00000000ffffffea
> [ 3.560548] R13: ffff88e00b39e480 R14: 0000000000044d45 R15:
> 0000000000000000
> [ 3.560549] FS: 0000000000000000(0000) GS:ffff88e123c00000(0000)
> knlGS:0000000000000000
> [ 3.560551] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3.560552] CR2: 00007f57f4e90451 CR3: 0000000181410000 CR4:
> 00000000000006f0
> [ 3.560554] Call Trace:
> [ 3.560558] <TASK>
> [ 3.560560] ? __warn+0x7d/0xc0
> [ 3.560566] ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [ 3.560671] ? report_bug+0xe6/0x170
> [ 3.560675] ? handle_bug+0x41/0x70
> [ 3.560679] ? exc_invalid_op+0x13/0x60
> [ 3.560681] ? asm_exc_invalid_op+0x16/0x20
> [ 3.560685] ? init_reset_begun+0x20/0x20 [nouveau]
> [ 3.560769] ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [ 3.560888] nv50_disp_super_2_2+0x70/0x430 [nouveau]
> [ 3.560997] nv50_disp_super+0x113/0x210 [nouveau]
> [ 3.561103] process_one_work+0x1c7/0x380
> [ 3.561109] worker_thread+0x4d/0x380
> [ 3.561113] ? rescuer_thread+0x3a0/0x3a0
> [ 3.561116] kthread+0xe9/0x110
> [ 3.561120] ? kthread_complete_and_exit+0x20/0x20
> [ 3.561122] ret_from_fork+0x22/0x30
> [ 3.561130] </TASK>
>
> Further information:
>
> $ lspci -v -s $(lspci | grep -i vga | awk '{ print $1 }')
> 01:00.0 VGA compatible controller: NVIDIA Corporation GT218M [NVS 3100M]
> (rev a2) (prog-if 00 [VGA controller])
> Subsystem: Dell Latitude E6510
> Flags: bus master, fast devsel, latency 0, IRQ 27
> Memory at e2000000 (32-bit, non-prefetchable) [size=16M]
> Memory at d0000000 (64-bit, prefetchable) [size=256M]
> Memory at e0000000 (64-bit, prefetchable) [size=32M]
> I/O ports at 7000 [size=128]
> Expansion ROM at 000c0000 [disabled] [size=128K]
> Capabilities: <access denied>
> Kernel driver in use: nouveau
> Kernel modules: nouveau
>
> I reported this bug to debian already, see
> https://bugs.debian.org/1042753 for context.
>
> With support (thanks Diederik!) I managed to figure out that the cause
> was a regression between upstream kernel version 6.1.27 and 6.1.38.
>
> I build a new 6.1.38 kernel with these commits reverted:
>
> 62aecf23f3d1 drm/nouveau: add nv_encoder pointer check for NULL
> fb725beca62d drm/nouveau/dp: check for NULL nv_connector->native_mode
> 90748be0f4f3 drm/nouveau: don't detect DSM for non-NVIDIA device
> 5a144bad3e75 nouveau: fix client work fence deletion race
>
> With that kernel the graphic works again.
>
> Please inform me if further tests are required.
FWIW, to be sure the issue doesn't fall through the cracks unnoticed,
I'm adding it to regzbot, the Linux kernel regression tracking bot:
#regzbot ^introduced v6.1.27..v6.1.38
#regzbot title drm/nouveau: display stays black
#regzbot ignore-activity
This isn't a regression? This issue or a fix for it is already being
discussed somewhere else? It was already fixed? You want to clarify when
the regression started to happen? Or point out that I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained on the page listed in
the footer of this mail.
Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
Hi Greg,
On 06.06.23 08:45, Greg KH wrote:
>>
>> Lino, it looks like this regression is caused by (backported) commit of yours.
>> Would you like to take a look on it?
>>
>> Anyway, telling regzbot:
>>
>> #regzbot introduced: 51162b05a44cb5
>
> There's some tpm backports to 5.15.y that were suspect and I'll look
> into reverting them and see if this was one of the ones that was on that
> list. Give me a few days...
>
Could you please consider applying (mainline) commit 0c7e66e5fd69 ("tpm, tpm_tis: Request threaded
interrupt handler") to 5.15.y?
As Chris confirmed, it fixes the regression caused by 51162b05a44cb5 ("tpm, tpm_tis: Claim locality
before writing interrupt registers").
Commit 0c7e66e5fd69 is also needed for 5.10.y, 6.1.y and 6.3.y.
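In case it saves a step, the pick itself should be the usual stable
workflow (just a sketch, starting with 5.15.y; I have not checked here
whether it is a clean cherry-pick on every branch):
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 0c7e66e5fd69
# build and boot-test with tpm_tis interrupts enabled, then repeat for
# linux-5.10.y, linux-6.1.y and linux-6.3.y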
Best regards,
Lino
From: Walter Chang <walter.chang(a)mediatek.com>
Due to the fact that the use of `writeq_relaxed()` to program CVAL is
not guaranteed to be atomic, it is necessary to disable the timer before
programming CVAL.
However, if the MMIO timer is already enabled and has not yet expired,
there is a possibility of unexpected behavior occurring: when the CPU
enters the idle state during this period, and if the CPU's local event
is earlier than the broadcast event, the following process occurs:
tick_broadcast_enter()
tick_broadcast_oneshot_control(TICK_BROADCAST_ENTER)
__tick_broadcast_oneshot_control()
___tick_broadcast_oneshot_control()
tick_broadcast_set_event()
clockevents_program_event()
set_next_event_mem()
During this process, the MMIO timer remains enabled while programming
CVAL. To prevent such behavior, disable the timer explicitly prior to
programming CVAL.
Fixes: 8b82c4f883a7 ("clocksource/drivers/arm_arch_timer: Move MMIO timer programming over to CVAL")
Cc: stable(a)vger.kernel.org
Signed-off-by: Walter Chang <walter.chang(a)mediatek.com>
---
drivers/clocksource/arm_arch_timer.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index e733a2a1927a..7dd2c615bce2 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -792,6 +792,13 @@ static __always_inline void set_next_event_mem(const int access, unsigned long e
u64 cnt;
ctrl = arch_timer_reg_read(access, ARCH_TIMER_REG_CTRL, clk);
+
+ /* Timer must be disabled before programming CVAL */
+ if (ctrl & ARCH_TIMER_CTRL_ENABLE) {
+ ctrl &= ~ARCH_TIMER_CTRL_ENABLE;
+ arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
+ }
+
ctrl |= ARCH_TIMER_CTRL_ENABLE;
ctrl &= ~ARCH_TIMER_CTRL_IT_MASK;
--
2.18.0
Hi Greg and Sasha,
Please consider applying commit 9451c79bc39e ("powerpc/pmac/smp: Avoid
unused-variable warnings") to 5.4, as it resolves a build failure that
we see building ppc64_guest_defconfig with clang due to arch/powerpc
compiling with -Werror by default:
arch/powerpc/platforms/powermac/smp.c:664:26: error: unused variable 'core99_l2_cache' [-Werror,-Wunused-variable]
664 | volatile static long int core99_l2_cache;
| ^~~~~~~~~~~~~~~
arch/powerpc/platforms/powermac/smp.c:665:26: error: unused variable 'core99_l3_cache' [-Werror,-Wunused-variable]
665 | volatile static long int core99_l3_cache;
| ^~~~~~~~~~~~~~~
2 errors generated.
I have verified that it applies cleanly and does not appear to have any
direct follow up fixes, although commit a4037d1f1fc4 ("powerpc/pmac/smp:
Drop unnecessary volatile qualifier") was in the same area around the
same time so maybe it makes sense to take that one as well but I don't
think it has any functional impact.
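For anyone who wants to double-check, the verification boils down to
roughly this (a sketch; the clang build invocation itself is omitted since
it depends on the toolchain setup):
git checkout linux-5.4.y
git cherry-pick -x 9451c79bc39e   # applies cleanly, no follow-up fixups needed
# rebuild ppc64_guest_defconfig with clang; arch/powerpc defaults to -Werror,
# so the build only succeeds once the two -Wunused-variable errors are gone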
Cheers,
Nathan
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 253f3399f4c09ce6f4e67350f839be0361b4d5ff
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023083002-unguarded-radiance-bab8@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 253f3399f4c09ce6f4e67350f839be0361b4d5ff Mon Sep 17 00:00:00 2001
From: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Date: Tue, 22 Aug 2023 12:02:03 -0700
Subject: [PATCH] Bluetooth: HCI: Introduce HCI_QUIRK_BROKEN_LE_CODED
This introduces HCI_QUIRK_BROKEN_LE_CODED, which is used to indicate
that LE Coded PHY shall not be used. It is then set for some Intel
models that claim to support it but cause many problems when it is used.
Cc: stable(a)vger.kernel.org # 6.4.y+
Link: https://github.com/bluez/bluez/issues/577
Link: https://github.com/bluez/bluez/issues/582
Link: https://lore.kernel.org/linux-bluetooth/CABBYNZKco-v7wkjHHexxQbgwwSz-S=GZ=d…
Fixes: 288c90224eec ("Bluetooth: Enable all supported LE PHY by default")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
diff --git a/drivers/bluetooth/btintel.c b/drivers/bluetooth/btintel.c
index 9b239ce96fa4..2462796a512a 100644
--- a/drivers/bluetooth/btintel.c
+++ b/drivers/bluetooth/btintel.c
@@ -2787,6 +2787,9 @@ static int btintel_setup_combined(struct hci_dev *hdev)
set_bit(HCI_QUIRK_WIDEBAND_SPEECH_SUPPORTED,
&hdev->quirks);
+ /* These variants don't seem to support LE Coded PHY */
+ set_bit(HCI_QUIRK_BROKEN_LE_CODED, &hdev->quirks);
+
/* Setup MSFT Extension support */
btintel_set_msft_opcode(hdev, ver.hw_variant);
@@ -2858,6 +2861,9 @@ static int btintel_setup_combined(struct hci_dev *hdev)
*/
set_bit(HCI_QUIRK_WIDEBAND_SPEECH_SUPPORTED, &hdev->quirks);
+ /* These variants don't seem to support LE Coded PHY */
+ set_bit(HCI_QUIRK_BROKEN_LE_CODED, &hdev->quirks);
+
/* Set Valid LE States quirk */
set_bit(HCI_QUIRK_VALID_LE_STATES, &hdev->quirks);
diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index c58425d4c592..87d92accc26e 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -319,6 +319,16 @@ enum {
* This quirk must be set before hci_register_dev is called.
*/
HCI_QUIRK_USE_MSFT_EXT_ADDRESS_FILTER,
+
+ /*
+ * When this quirk is set, LE Coded PHY shall not be used. This is
+ * required for some Intel controllers which erroneously claim to
+ * support it but it causes problems with extended scanning.
+ *
+ * This quirk can be set before hci_register_dev is called or
+ * during the hdev->setup vendor callback.
+ */
+ HCI_QUIRK_BROKEN_LE_CODED,
};
/* HCI device flags */
diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index 6e2988b11f99..e6359f7346f1 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -1817,7 +1817,9 @@ void hci_conn_del_sysfs(struct hci_conn *conn);
#define scan_2m(dev) (((dev)->le_tx_def_phys & HCI_LE_SET_PHY_2M) || \
((dev)->le_rx_def_phys & HCI_LE_SET_PHY_2M))
-#define le_coded_capable(dev) (((dev)->le_features[1] & HCI_LE_PHY_CODED))
+#define le_coded_capable(dev) (((dev)->le_features[1] & HCI_LE_PHY_CODED) && \
+ !test_bit(HCI_QUIRK_BROKEN_LE_CODED, \
+ &(dev)->quirks))
#define scan_coded(dev) (((dev)->le_tx_def_phys & HCI_LE_SET_PHY_CODED) || \
((dev)->le_rx_def_phys & HCI_LE_SET_PHY_CODED))
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index 0cb780817198..9b93653c6197 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -4668,7 +4668,10 @@ static const struct {
"advertised, but not supported."),
HCI_QUIRK_BROKEN(SET_RPA_TIMEOUT,
"HCI LE Set Random Private Address Timeout command is "
- "advertised, but not supported.")
+ "advertised, but not supported."),
+ HCI_QUIRK_BROKEN(LE_CODED,
+ "HCI LE Coded PHY feature bit is set, "
+ "but its usage is not supported.")
};
/* This function handles hdev setup stage:
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 253f3399f4c09ce6f4e67350f839be0361b4d5ff
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023083002-resilient-washed-2807@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 253f3399f4c09ce6f4e67350f839be0361b4d5ff Mon Sep 17 00:00:00 2001
From: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Date: Tue, 22 Aug 2023 12:02:03 -0700
Subject: [PATCH] Bluetooth: HCI: Introduce HCI_QUIRK_BROKEN_LE_CODED
This introduces HCI_QUIRK_BROKEN_LE_CODED, which is used to indicate
that LE Coded PHY shall not be used. It is then set for some Intel
models that claim to support it but cause many problems when it is used.
Cc: stable(a)vger.kernel.org # 6.4.y+
Link: https://github.com/bluez/bluez/issues/577
Link: https://github.com/bluez/bluez/issues/582
Link: https://lore.kernel.org/linux-bluetooth/CABBYNZKco-v7wkjHHexxQbgwwSz-S=GZ=d…
Fixes: 288c90224eec ("Bluetooth: Enable all supported LE PHY by default")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
diff --git a/drivers/bluetooth/btintel.c b/drivers/bluetooth/btintel.c
index 9b239ce96fa4..2462796a512a 100644
--- a/drivers/bluetooth/btintel.c
+++ b/drivers/bluetooth/btintel.c
@@ -2787,6 +2787,9 @@ static int btintel_setup_combined(struct hci_dev *hdev)
set_bit(HCI_QUIRK_WIDEBAND_SPEECH_SUPPORTED,
&hdev->quirks);
+ /* These variants don't seem to support LE Coded PHY */
+ set_bit(HCI_QUIRK_BROKEN_LE_CODED, &hdev->quirks);
+
/* Setup MSFT Extension support */
btintel_set_msft_opcode(hdev, ver.hw_variant);
@@ -2858,6 +2861,9 @@ static int btintel_setup_combined(struct hci_dev *hdev)
*/
set_bit(HCI_QUIRK_WIDEBAND_SPEECH_SUPPORTED, &hdev->quirks);
+ /* These variants don't seem to support LE Coded PHY */
+ set_bit(HCI_QUIRK_BROKEN_LE_CODED, &hdev->quirks);
+
/* Set Valid LE States quirk */
set_bit(HCI_QUIRK_VALID_LE_STATES, &hdev->quirks);
diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index c58425d4c592..87d92accc26e 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -319,6 +319,16 @@ enum {
* This quirk must be set before hci_register_dev is called.
*/
HCI_QUIRK_USE_MSFT_EXT_ADDRESS_FILTER,
+
+ /*
+ * When this quirk is set, LE Coded PHY shall not be used. This is
+ * required for some Intel controllers which erroneously claim to
+ * support it but it causes problems with extended scanning.
+ *
+ * This quirk can be set before hci_register_dev is called or
+ * during the hdev->setup vendor callback.
+ */
+ HCI_QUIRK_BROKEN_LE_CODED,
};
/* HCI device flags */
diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index 6e2988b11f99..e6359f7346f1 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -1817,7 +1817,9 @@ void hci_conn_del_sysfs(struct hci_conn *conn);
#define scan_2m(dev) (((dev)->le_tx_def_phys & HCI_LE_SET_PHY_2M) || \
((dev)->le_rx_def_phys & HCI_LE_SET_PHY_2M))
-#define le_coded_capable(dev) (((dev)->le_features[1] & HCI_LE_PHY_CODED))
+#define le_coded_capable(dev) (((dev)->le_features[1] & HCI_LE_PHY_CODED) && \
+ !test_bit(HCI_QUIRK_BROKEN_LE_CODED, \
+ &(dev)->quirks))
#define scan_coded(dev) (((dev)->le_tx_def_phys & HCI_LE_SET_PHY_CODED) || \
((dev)->le_rx_def_phys & HCI_LE_SET_PHY_CODED))
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index 0cb780817198..9b93653c6197 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -4668,7 +4668,10 @@ static const struct {
"advertised, but not supported."),
HCI_QUIRK_BROKEN(SET_RPA_TIMEOUT,
"HCI LE Set Random Private Address Timeout command is "
- "advertised, but not supported.")
+ "advertised, but not supported."),
+ HCI_QUIRK_BROKEN(LE_CODED,
+ "HCI LE Coded PHY feature bit is set, "
+ "but its usage is not supported.")
};
/* This function handles hdev setup stage:
check_clock doesn't account for vfe_lite, which means that vfe_lite will
never get validated by this routine. Add the clock name to the expected set
to remedy this.
Fixes: 7319cdf189bb ("media: camss: Add support for VFE hardware version Titan 170")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
---
drivers/media/platform/qcom/camss/camss-vfe.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/media/platform/qcom/camss/camss-vfe.c b/drivers/media/platform/qcom/camss/camss-vfe.c
index 938f373bcd1fd..b021f81cef123 100644
--- a/drivers/media/platform/qcom/camss/camss-vfe.c
+++ b/drivers/media/platform/qcom/camss/camss-vfe.c
@@ -535,7 +535,8 @@ static int vfe_check_clock_rates(struct vfe_device *vfe)
struct camss_clock *clock = &vfe->clock[i];
if (!strcmp(clock->name, "vfe0") ||
- !strcmp(clock->name, "vfe1")) {
+ !strcmp(clock->name, "vfe1") ||
+ !strcmp(clock->name, "vfe_lite")) {
u64 min_rate = 0;
unsigned long rate;
--
2.41.0
There are two problems with the current vfe_disable_output() routine.
Firstly, we rightly use a spinlock to protect output->gen2.active_num
everywhere except in the IDLE timeout path of vfe_disable_output().
Even if that somehow is not racy "in practice", it is so by happenstance,
not by design.
Secondly we do not get consistent behaviour from this routine. On
sc8280xp 50% of the time I get "VFE idle timeout - resetting". In this
case the subsequent capture will succeed. The other 50% of the time, we
don't hit the idle timeout, never do the VFE reset and subsequent
captures stall indefinitely.
Rewrite the vfe_disable_output() routine to
- Quiesce write masters with vfe_wm_stop()
- Set active_num = 0
remembering to hold the spinlock when we do so followed by
- Reset the VFE
Testing on sc8280xp and sdm845 shows this to be a valid fix.
Fixes: 7319cdf189bb ("media: camss: Add support for VFE hardware version Titan 170")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
.../media/platform/qcom/camss/camss-vfe-170.c | 22 +++----------------
1 file changed, 3 insertions(+), 19 deletions(-)
diff --git a/drivers/media/platform/qcom/camss/camss-vfe-170.c b/drivers/media/platform/qcom/camss/camss-vfe-170.c
index 02494c89da91c..168baaa80d4e6 100644
--- a/drivers/media/platform/qcom/camss/camss-vfe-170.c
+++ b/drivers/media/platform/qcom/camss/camss-vfe-170.c
@@ -7,7 +7,6 @@
* Copyright (C) 2020-2021 Linaro Ltd.
*/
-#include <linux/delay.h>
#include <linux/interrupt.h>
#include <linux/io.h>
#include <linux/iopoll.h>
@@ -494,35 +493,20 @@ static int vfe_enable_output(struct vfe_line *line)
return 0;
}
-static int vfe_disable_output(struct vfe_line *line)
+static void vfe_disable_output(struct vfe_line *line)
{
struct vfe_device *vfe = to_vfe(line);
struct vfe_output *output = &line->output;
unsigned long flags;
unsigned int i;
- bool done;
- int timeout = 0;
-
- do {
- spin_lock_irqsave(&vfe->output_lock, flags);
- done = !output->gen2.active_num;
- spin_unlock_irqrestore(&vfe->output_lock, flags);
- usleep_range(10000, 20000);
-
- if (timeout++ == 100) {
- dev_err(vfe->camss->dev, "VFE idle timeout - resetting\n");
- vfe_reset(vfe);
- output->gen2.active_num = 0;
- return 0;
- }
- } while (!done);
spin_lock_irqsave(&vfe->output_lock, flags);
for (i = 0; i < output->wm_num; i++)
vfe_wm_stop(vfe, output->wm_idx[i]);
+ output->gen2.active_num = 0;
spin_unlock_irqrestore(&vfe->output_lock, flags);
- return 0;
+ vfe_reset(vfe);
}
/*
--
2.41.0
We need to make sure camss_configure_pd() happens before
camss_register_entities() as the vfe_get() path relies on the pointer
provided by camss_configure_pd().
Fix the ordering sequence in probe to ensure the pointers vfe_get() demands
are present by the time camss_register_entities() runs.
In order to facilitate backporting to stable kernels I've moved the
configure_pd() call pretty early in the probe() function so that,
irrespective of the existence of the old error handling jump labels, this
patch should still apply anywhere from -next circa Aug 2023 back to v5.13 inclusive.
Fixes: 2f6f8af67203 ("media: camss: Refactor VFE power domain toggling")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
---
drivers/media/platform/qcom/camss/camss.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/media/platform/qcom/camss/camss.c b/drivers/media/platform/qcom/camss/camss.c
index f11dc59135a5a..75991d849b571 100644
--- a/drivers/media/platform/qcom/camss/camss.c
+++ b/drivers/media/platform/qcom/camss/camss.c
@@ -1619,6 +1619,12 @@ static int camss_probe(struct platform_device *pdev)
if (ret < 0)
goto err_cleanup;
+ ret = camss_configure_pd(camss);
+ if (ret < 0) {
+ dev_err(dev, "Failed to configure power domains: %d\n", ret);
+ goto err_cleanup;
+ }
+
ret = camss_init_subdevices(camss);
if (ret < 0)
goto err_cleanup;
@@ -1678,12 +1684,6 @@ static int camss_probe(struct platform_device *pdev)
}
}
- ret = camss_configure_pd(camss);
- if (ret < 0) {
- dev_err(dev, "Failed to configure power domains: %d\n", ret);
- return ret;
- }
-
pm_runtime_enable(dev);
return 0;
--
2.41.0
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 5f641174a12b8a876a4101201a21ef4675ecc014
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023083048-startup-unreached-148d@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
5f641174a12b ("ACPI: thermal: Drop nocrt parameter")
f86b15a1e654 ("ACPI: thermal: Clean up printing messages")
7f49a5cb94e6 ("ACPI: thermal: switch to use <linux/units.h> helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5f641174a12b8a876a4101201a21ef4675ecc014 Mon Sep 17 00:00:00 2001
From: Mario Limonciello <mario.limonciello(a)amd.com>
Date: Wed, 12 Jul 2023 12:24:59 -0500
Subject: [PATCH] ACPI: thermal: Drop nocrt parameter
The `nocrt` module parameter has no code associated with it and does
nothing. As `crt=-1` has the same functionality as what nocrt should be
doing, drop `nocrt` and the associated documentation.
This should fix a quirk for Gigabyte GA-7ZX that used `nocrt` and
thus didn't function properly.
Fixes: 8c99fdce3078 ("ACPI: thermal: set "thermal.nocrt" via DMI on Gigabyte GA-7ZX")
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: All applicable <stable(a)vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a1457995fd41..2de235d52fac 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6243,10 +6243,6 @@
-1: disable all critical trip points in all thermal zones
<degrees C>: override all critical trip points
- thermal.nocrt= [HW,ACPI]
- Set to disable actions on ACPI thermal zone
- critical and hot trip points.
-
thermal.off= [HW,ACPI]
1: disable ACPI thermal control
diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
index f9f6ebb08fdb..3163a40f02e3 100644
--- a/drivers/acpi/thermal.c
+++ b/drivers/acpi/thermal.c
@@ -82,10 +82,6 @@ static int tzp;
module_param(tzp, int, 0444);
MODULE_PARM_DESC(tzp, "Thermal zone polling frequency, in 1/10 seconds.");
-static int nocrt;
-module_param(nocrt, int, 0);
-MODULE_PARM_DESC(nocrt, "Set to take no action upon ACPI thermal zone critical trips points.");
-
static int off;
module_param(off, int, 0);
MODULE_PARM_DESC(off, "Set to disable ACPI thermal support.");
@@ -1094,7 +1090,7 @@ static int thermal_act(const struct dmi_system_id *d) {
static int thermal_nocrt(const struct dmi_system_id *d) {
pr_notice("%s detected: disabling all critical thermal trip point actions.\n",
d->ident);
- nocrt = 1;
+ crt = -1;
return 0;
}
static int thermal_tzp(const struct dmi_system_id *d) {
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 5f641174a12b8a876a4101201a21ef4675ecc014
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023083047-alive-reburial-8310@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
5f641174a12b ("ACPI: thermal: Drop nocrt parameter")
f86b15a1e654 ("ACPI: thermal: Clean up printing messages")
7f49a5cb94e6 ("ACPI: thermal: switch to use <linux/units.h> helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5f641174a12b8a876a4101201a21ef4675ecc014 Mon Sep 17 00:00:00 2001
From: Mario Limonciello <mario.limonciello(a)amd.com>
Date: Wed, 12 Jul 2023 12:24:59 -0500
Subject: [PATCH] ACPI: thermal: Drop nocrt parameter
The `nocrt` module parameter has no code associated with it and does
nothing. As `crt=-1` has the same functionality as what nocrt should be
doing, drop `nocrt` and the associated documentation.
This should fix a quirk for Gigabyte GA-7ZX that used `nocrt` and
thus didn't function properly.
Fixes: 8c99fdce3078 ("ACPI: thermal: set "thermal.nocrt" via DMI on Gigabyte GA-7ZX")
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: All applicable <stable(a)vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a1457995fd41..2de235d52fac 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6243,10 +6243,6 @@
-1: disable all critical trip points in all thermal zones
<degrees C>: override all critical trip points
- thermal.nocrt= [HW,ACPI]
- Set to disable actions on ACPI thermal zone
- critical and hot trip points.
-
thermal.off= [HW,ACPI]
1: disable ACPI thermal control
diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
index f9f6ebb08fdb..3163a40f02e3 100644
--- a/drivers/acpi/thermal.c
+++ b/drivers/acpi/thermal.c
@@ -82,10 +82,6 @@ static int tzp;
module_param(tzp, int, 0444);
MODULE_PARM_DESC(tzp, "Thermal zone polling frequency, in 1/10 seconds.");
-static int nocrt;
-module_param(nocrt, int, 0);
-MODULE_PARM_DESC(nocrt, "Set to take no action upon ACPI thermal zone critical trips points.");
-
static int off;
module_param(off, int, 0);
MODULE_PARM_DESC(off, "Set to disable ACPI thermal support.");
@@ -1094,7 +1090,7 @@ static int thermal_act(const struct dmi_system_id *d) {
static int thermal_nocrt(const struct dmi_system_id *d) {
pr_notice("%s detected: disabling all critical thermal trip point actions.\n",
d->ident);
- nocrt = 1;
+ crt = -1;
return 0;
}
static int thermal_tzp(const struct dmi_system_id *d) {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 5f641174a12b8a876a4101201a21ef4675ecc014
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023083046-dawn-shaking-47c2@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
5f641174a12b ("ACPI: thermal: Drop nocrt parameter")
f86b15a1e654 ("ACPI: thermal: Clean up printing messages")
7f49a5cb94e6 ("ACPI: thermal: switch to use <linux/units.h> helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5f641174a12b8a876a4101201a21ef4675ecc014 Mon Sep 17 00:00:00 2001
From: Mario Limonciello <mario.limonciello(a)amd.com>
Date: Wed, 12 Jul 2023 12:24:59 -0500
Subject: [PATCH] ACPI: thermal: Drop nocrt parameter
The `nocrt` module parameter has no code associated with it and does
nothing. As `crt=-1` has the same functionality as what nocrt should be
doing, drop `nocrt` and the associated documentation.
This should fix a quirk for Gigabyte GA-7ZX that used `nocrt` and
thus didn't function properly.
Fixes: 8c99fdce3078 ("ACPI: thermal: set "thermal.nocrt" via DMI on Gigabyte GA-7ZX")
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: All applicable <stable(a)vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a1457995fd41..2de235d52fac 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6243,10 +6243,6 @@
-1: disable all critical trip points in all thermal zones
<degrees C>: override all critical trip points
- thermal.nocrt= [HW,ACPI]
- Set to disable actions on ACPI thermal zone
- critical and hot trip points.
-
thermal.off= [HW,ACPI]
1: disable ACPI thermal control
diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
index f9f6ebb08fdb..3163a40f02e3 100644
--- a/drivers/acpi/thermal.c
+++ b/drivers/acpi/thermal.c
@@ -82,10 +82,6 @@ static int tzp;
module_param(tzp, int, 0444);
MODULE_PARM_DESC(tzp, "Thermal zone polling frequency, in 1/10 seconds.");
-static int nocrt;
-module_param(nocrt, int, 0);
-MODULE_PARM_DESC(nocrt, "Set to take no action upon ACPI thermal zone critical trips points.");
-
static int off;
module_param(off, int, 0);
MODULE_PARM_DESC(off, "Set to disable ACPI thermal support.");
@@ -1094,7 +1090,7 @@ static int thermal_act(const struct dmi_system_id *d) {
static int thermal_nocrt(const struct dmi_system_id *d) {
pr_notice("%s detected: disabling all critical thermal trip point actions.\n",
d->ident);
- nocrt = 1;
+ crt = -1;
return 0;
}
static int thermal_tzp(const struct dmi_system_id *d) {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 5f641174a12b8a876a4101201a21ef4675ecc014
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023083040-spiritism-sphere-d595@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
5f641174a12b ("ACPI: thermal: Drop nocrt parameter")
f86b15a1e654 ("ACPI: thermal: Clean up printing messages")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5f641174a12b8a876a4101201a21ef4675ecc014 Mon Sep 17 00:00:00 2001
From: Mario Limonciello <mario.limonciello(a)amd.com>
Date: Wed, 12 Jul 2023 12:24:59 -0500
Subject: [PATCH] ACPI: thermal: Drop nocrt parameter
The `nocrt` module parameter has no code associated with it and does
nothing. As `crt=-1` has the same functionality as what nocrt should be
doing, drop `nocrt` and the associated documentation.
This should fix a quirk for Gigabyte GA-7ZX that used `nocrt` and
thus didn't function properly.
Fixes: 8c99fdce3078 ("ACPI: thermal: set "thermal.nocrt" via DMI on Gigabyte GA-7ZX")
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: All applicable <stable(a)vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a1457995fd41..2de235d52fac 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6243,10 +6243,6 @@
-1: disable all critical trip points in all thermal zones
<degrees C>: override all critical trip points
- thermal.nocrt= [HW,ACPI]
- Set to disable actions on ACPI thermal zone
- critical and hot trip points.
-
thermal.off= [HW,ACPI]
1: disable ACPI thermal control
diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
index f9f6ebb08fdb..3163a40f02e3 100644
--- a/drivers/acpi/thermal.c
+++ b/drivers/acpi/thermal.c
@@ -82,10 +82,6 @@ static int tzp;
module_param(tzp, int, 0444);
MODULE_PARM_DESC(tzp, "Thermal zone polling frequency, in 1/10 seconds.");
-static int nocrt;
-module_param(nocrt, int, 0);
-MODULE_PARM_DESC(nocrt, "Set to take no action upon ACPI thermal zone critical trips points.");
-
static int off;
module_param(off, int, 0);
MODULE_PARM_DESC(off, "Set to disable ACPI thermal support.");
@@ -1094,7 +1090,7 @@ static int thermal_act(const struct dmi_system_id *d) {
static int thermal_nocrt(const struct dmi_system_id *d) {
pr_notice("%s detected: disabling all critical thermal trip point actions.\n",
d->ident);
- nocrt = 1;
+ crt = -1;
return 0;
}
static int thermal_tzp(const struct dmi_system_id *d) {
This is the start of the stable review cycle for the 5.15.129 release.
There are 89 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 30 Aug 2023 10:11:30 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.129-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.15.129-rc1
Rik van Riel <riel(a)surriel.com>
mm,ima,kexec,of: use memblock_free_late from ima_free_kexec_buffer
Miaohe Lin <linmiaohe(a)huawei.com>
mm: memory-failure: fix unexpected return value in soft_offline_page()
Kefeng Wang <wangkefeng.wang(a)huawei.com>
mm: memory-failure: kill soft_offline_free_page()
Rob Clark <robdclark(a)chromium.org>
dma-buf/sw_sync: Avoid recursive lock during fence signal
Biju Das <biju.das.jz(a)bp.renesas.com>
pinctrl: renesas: rza2: Add lock around pinctrl_generic{{add,remove}_group,{add,remove}_function}
Biju Das <biju.das.jz(a)bp.renesas.com>
clk: Fix undefined reference to `clk_rate_exclusive_{get,put}'
Zhu Wang <wangzhu9(a)huawei.com>
scsi: core: raid_class: Remove raid_component_add()
Zhu Wang <wangzhu9(a)huawei.com>
scsi: snic: Fix double free in snic_tgt_create()
Oliver Hartkopp <socketcan(a)hartkopp.net>
can: raw: add missing refcount for memory leak fix
Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>
drm/i915: Fix premature release of request's reusable memory
Dietmar Eggemann <dietmar.eggemann(a)arm.com>
cgroup/cpuset: Free DL BW in case can_attach() fails
Dietmar Eggemann <dietmar.eggemann(a)arm.com>
sched/deadline: Create DL BW alloc, free & check overflow interface
Juri Lelli <juri.lelli(a)redhat.com>
cgroup/cpuset: Iterate only if DEADLINE tasks are present
Juri Lelli <juri.lelli(a)redhat.com>
sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
Juri Lelli <juri.lelli(a)redhat.com>
sched/cpuset: Bring back cpuset_mutex
Juri Lelli <juri.lelli(a)redhat.com>
cgroup/cpuset: Rename functions dealing with DEADLINE accounting
Joel Fernandes (Google) <joel(a)joelfernandes.org>
torture: Fix hang during kthread shutdown phase
Christian Brauner <brauner(a)kernel.org>
nfsd: use vfs setgid helper
Christian Brauner <brauner(a)kernel.org>
nfs: use vfs setgid helper
Feng Tang <feng.tang(a)intel.com>
x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
Rick Edgecombe <rick.p.edgecombe(a)intel.com>
x86/fpu: Invalidate FPU state correctly on exec()
Ankit Nautiyal <ankit.k.nautiyal(a)intel.com>
drm/display/dp: Fix the DP DSC Receiver cap size
Zack Rusin <zackr(a)vmware.com>
drm/vmwgfx: Fix shader stage validation
Igor Mammedov <imammedo(a)redhat.com>
PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus
Wei Chen <harperchen1110(a)gmail.com>
media: vcodec: Fix potential array out-of-bounds in encoder queue_setup
Rob Herring <robh(a)kernel.org>
of: dynamic: Refactor action prints to not use "%pOF" inside devtree_lock
Rob Herring <robh(a)kernel.org>
of: unittest: Fix EXPECT for parse_phandle_with_args_map() test
Arnd Bergmann <arnd(a)arndb.de>
radix tree: remove unused variable
Helge Deller <deller(a)gmx.de>
lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Hold rtnl lock during MTU update via netlink
Remi Pommarel <repk(a)triplefau.lt>
batman-adv: Fix batadv_v_ogm_aggr_send memory leak
Remi Pommarel <repk(a)triplefau.lt>
batman-adv: Fix TT global entry leak when client roamed back
Remi Pommarel <repk(a)triplefau.lt>
batman-adv: Do not get eth header before batadv_check_management_packet
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Don't increase MTU when set by user
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Trigger events for auto adjusted MTU
Christian Göttsche <cgzones(a)googlemail.com>
selinux: set next pointer before attaching to list
Benjamin Coddington <bcodding(a)redhat.com>
nfsd: Fix race to FREE_STATEID and cl_revoked
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFS: Fix a use after free in nfs_direct_join_group()
Alexandre Ghiti <alexghiti(a)rivosinc.com>
mm: add a call to flush_cache_vmap() in vmap_pfn()
Takashi Iwai <tiwai(a)suse.de>
ALSA: ymfpci: Fix the missing snd_card_free() call at probe error
Andrey Skvortsov <andrej.skvortzov(a)gmail.com>
clk: Fix slab-out-of-bounds error in devm_clk_release()
Benjamin Coddington <bcodding(a)redhat.com>
NFSv4: Fix dropped lock for racing OPEN and delegation return
Michael Ellerman <mpe(a)ellerman.id.au>
ibmveth: Use dcbf rather than dcbfl
Sean Christopherson <seanjc(a)google.com>
Revert "KVM: x86: enable TDP MMU by default"
Ivan Mikhaylov <fr0st61te(a)gmail.com>
net/ncsi: change from ndo_set_mac_address to dev_set_mac_address
Ivan Mikhaylov <fr0st61te(a)gmail.com>
net/ncsi: make one oem_gma function for all mfr id
Hangbin Liu <liuhangbin(a)gmail.com>
bonding: fix macvlan over alb bond support
Jakub Kicinski <kuba(a)kernel.org>
net: remove bond_slave_has_mac_rcu()
Ido Schimmel <idosch(a)nvidia.com>
rtnetlink: Reject negative ifindexes in RTM_NEWLINK
Florent Fourcot <florent.fourcot(a)wifirst.fr>
rtnetlink: return ENODEV when ifname does not exist and group is given
Florian Westphal <fw(a)strlen.de>
netfilter: nf_tables: fix out of memory error handling
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nf_tables: flush pending destroy work before netlink notifier
Jamal Hadi Salim <jhs(a)mojatatu.com>
net/sched: fix a qdisc modification with ambiguous command request
Sasha Neftin <sasha.neftin(a)intel.com>
igc: Fix the typo in the PTM Control macro
Alessio Igor Bogani <alessio.bogani(a)elettra.eu>
igb: Avoid starting unnecessary workqueues
Jesse Brandeburg <jesse.brandeburg(a)intel.com>
ice: fix receive buffer size miscalculation
Jakub Kicinski <kuba(a)kernel.org>
net: validate veth and vxcan peer ifindexes
Ruan Jinjie <ruanjinjie(a)huawei.com>
net: bcmgenet: Fix return value check for fixed_phy_register()
Ruan Jinjie <ruanjinjie(a)huawei.com>
net: bgmac: Fix return value check for fixed_phy_register()
Lu Wei <luwei32(a)huawei.com>
ipvlan: Fix a reference count leak warning in ipvlan_ns_exit()
Eric Dumazet <edumazet(a)google.com>
dccp: annotate data-races in dccp_poll()
Eric Dumazet <edumazet(a)google.com>
sock: annotate data-races around prot->memory_pressure
Hariprasad Kelam <hkelam(a)marvell.com>
octeontx2-af: SDP: fix receive link config
Zheng Yejian <zhengyejian1(a)huawei.com>
tracing: Fix memleak due to race between current_tracer and trace
Zheng Yejian <zhengyejian1(a)huawei.com>
tracing: Fix cpu buffers unavailable due to 'record_disabled' missed
Eric Dumazet <edumazet(a)google.com>
can: raw: fix lockdep issue in raw_release()
Taimur Hassan <syed.hassan(a)amd.com>
drm/amd/display: check TG is non-null before checking if enabled
Josip Pavic <Josip.Pavic(a)amd.com>
drm/amd/display: do not wait for mpc idle if tg is disabled
Ziyang Xuan <william.xuanziyang(a)huawei.com>
can: raw: fix receiver memory leak
Zhang Yi <yi.zhang(a)huawei.com>
jbd2: fix a race when checking checkpoint buffer busy
Zhang Yi <yi.zhang(a)huawei.com>
jbd2: remove journal_clean_one_cp_list()
Zhang Yi <yi.zhang(a)huawei.com>
jbd2: remove t_checkpoint_io_list
Takashi Iwai <tiwai(a)suse.de>
ALSA: pcm: Fix potential data race at PCM memory allocation helpers
Zhang Shurong <zhang_shurong(a)foxmail.com>
fbdev: fix potential OOB read in fast_imageblit()
Thomas Zimmermann <tzimmermann(a)suse.de>
fbdev: Fix sys_imageblit() for arbitrary image widths
Thomas Zimmermann <tzimmermann(a)suse.de>
fbdev: Improve performance of sys_imageblit()
Jiaxun Yang <jiaxun.yang(a)flygoat.com>
MIPS: cpu-features: Use boot_cpu_type for CPU type based features
Jiaxun Yang <jiaxun.yang(a)flygoat.com>
MIPS: cpu-features: Enable octeon_cache by cpu_type
Alexander Aring <aahringo(a)redhat.com>
fs: dlm: fix mismatch of plock results from userspace
Alexander Aring <aahringo(a)redhat.com>
fs: dlm: use dlm_plock_info for do_unlock_close
Alexander Aring <aahringo(a)redhat.com>
fs: dlm: change plock interrupted message to debug again
Alexander Aring <aahringo(a)redhat.com>
fs: dlm: add pid to debug log
Jakob Koschel <jakobkoschel(a)gmail.com>
dlm: replace usage of found with dedicated list iterator variable
Alexander Aring <aahringo(a)redhat.com>
dlm: improve plock logging if interrupted
Igor Mammedov <imammedo(a)redhat.com>
PCI: acpiphp: Reassign resources on bridge if necessary
Chuck Lever <chuck.lever(a)oracle.com>
xprtrdma: Remap Receive buffers after a reconnect
Fedor Pchelkin <pchelkin(a)ispras.ru>
NFSv4: fix out path in __nfs4_get_acl_uncached
Fedor Pchelkin <pchelkin(a)ispras.ru>
NFSv4.2: fix error handling in nfs42_proc_getxattr
Peter Zijlstra <peterz(a)infradead.org>
objtool/x86: Fix SRSO mess
-------------
Diffstat:
Makefile | 4 +-
arch/mips/include/asm/cpu-features.h | 21 +-
arch/x86/include/asm/fpu/internal.h | 3 +-
arch/x86/kernel/fpu/core.c | 2 +-
arch/x86/kernel/fpu/xstate.c | 7 +
arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
drivers/clk/clk-devres.c | 13 +-
drivers/dma-buf/sw_sync.c | 18 +-
.../drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 4 +-
drivers/gpu/drm/i915/i915_active.c | 99 ++++++---
drivers/gpu/drm/i915/i915_request.c | 2 +
drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 12 ++
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 29 +--
drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c | 2 +
drivers/net/bonding/bond_alb.c | 6 +-
drivers/net/can/vxcan.c | 7 +-
drivers/net/ethernet/broadcom/bgmac.c | 2 +-
drivers/net/ethernet/broadcom/genet/bcmmii.c | 2 +-
drivers/net/ethernet/ibm/ibmveth.c | 2 +-
drivers/net/ethernet/intel/ice/ice_base.c | 3 +-
drivers/net/ethernet/intel/igb/igb_ptp.c | 24 +--
drivers/net/ethernet/intel/igc/igc_defines.h | 2 +-
.../net/ethernet/marvell/octeontx2/af/rvu_nix.c | 3 +-
drivers/net/ipvlan/ipvlan_main.c | 3 +-
drivers/net/veth.c | 5 +-
drivers/of/dynamic.c | 31 +--
drivers/of/kexec.c | 4 +-
drivers/of/unittest.c | 4 +-
drivers/pci/hotplug/acpiphp_glue.c | 9 +-
drivers/pinctrl/renesas/pinctrl-rza2.c | 17 +-
drivers/scsi/raid_class.c | 48 -----
drivers/scsi/snic/snic_disc.c | 3 +-
drivers/video/fbdev/core/sysimgblt.c | 64 +++++-
fs/attr.c | 1 +
fs/dlm/lock.c | 53 +++--
fs/dlm/plock.c | 89 +++++---
fs/dlm/recover.c | 39 ++--
fs/internal.h | 2 -
fs/jbd2/checkpoint.c | 165 ++++++---------
fs/jbd2/commit.c | 3 +-
fs/jbd2/transaction.c | 17 +-
fs/nfs/direct.c | 26 ++-
fs/nfs/inode.c | 4 +-
fs/nfs/nfs42proc.c | 5 +-
fs/nfs/nfs4proc.c | 14 +-
fs/nfsd/nfs4state.c | 2 +-
fs/nfsd/vfs.c | 4 +-
include/drm/drm_dp_helper.h | 2 +-
include/linux/clk.h | 80 +++----
include/linux/cpuset.h | 12 +-
include/linux/fs.h | 2 +
include/linux/jbd2.h | 7 +-
include/linux/raid_class.h | 4 -
include/linux/sched.h | 4 +-
include/net/bonding.h | 25 +--
include/net/rtnetlink.h | 4 +-
include/net/sock.h | 7 +-
include/trace/events/jbd2.h | 12 +-
kernel/cgroup/cgroup.c | 4 +
kernel/cgroup/cpuset.c | 232 ++++++++++++++-------
kernel/sched/core.c | 41 ++--
kernel/sched/deadline.c | 66 ++++--
kernel/sched/sched.h | 2 +-
kernel/torture.c | 2 +-
kernel/trace/trace.c | 15 +-
kernel/trace/trace_irqsoff.c | 3 +-
kernel/trace/trace_sched_wakeup.c | 2 +
lib/clz_ctz.c | 32 +--
lib/radix-tree.c | 1 -
mm/memory-failure.c | 21 +-
mm/vmalloc.c | 4 +
net/batman-adv/bat_v_elp.c | 3 +-
net/batman-adv/bat_v_ogm.c | 7 +-
net/batman-adv/hard-interface.c | 14 +-
net/batman-adv/netlink.c | 3 +
net/batman-adv/soft-interface.c | 3 +
net/batman-adv/translation-table.c | 1 -
net/batman-adv/types.h | 6 +
net/can/raw.c | 76 ++++---
net/core/rtnetlink.c | 43 +++-
net/dccp/proto.c | 20 +-
net/ncsi/ncsi-rsp.c | 93 ++-------
net/netfilter/nf_tables_api.c | 2 +-
net/netfilter/nft_set_pipapo.c | 13 +-
net/sched/sch_api.c | 53 +++--
net/sctp/socket.c | 2 +-
net/sunrpc/xprtrdma/verbs.c | 9 +-
security/selinux/ss/policydb.c | 2 +-
sound/core/pcm_memory.c | 44 +++-
sound/pci/ymfpci/ymfpci.c | 10 +-
tools/objtool/arch/x86/decode.c | 11 +-
tools/objtool/check.c | 22 +-
tools/objtool/include/objtool/arch.h | 1 +
tools/objtool/include/objtool/elf.h | 1 +
94 files changed, 1070 insertions(+), 834 deletions(-)
The following commit has been merged into the smp/urgent branch of tip:
Commit-ID: 2b8272ff4a70b866106ae13c36be7ecbef5d5da2
Gitweb: https://git.kernel.org/tip/2b8272ff4a70b866106ae13c36be7ecbef5d5da2
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Wed, 23 Aug 2023 10:47:02 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Wed, 30 Aug 2023 12:24:22 +02:00
cpu/hotplug: Prevent self deadlock on CPU hot-unplug
Xiongfeng reported and debugged a self deadlock of the task which initiates
and controls a CPU hot-unplug operation vs. the CFS bandwidth timer.
CPU1                                    CPU2

T1 sets cfs_quota
  starts hrtimer cfs_bandwidth 'period_timer'
  T1 is migrated to CPU2

                                        T1 initiates offlining of CPU1

Hotplug operation starts
  ...
'period_timer' expires and is re-enqueued on CPU1
  ...
take_cpu_down()
  CPU1 shuts down and does not handle timers
  anymore. They have to be migrated in the
  post dead hotplug steps by the control task.

                                        T1 runs the post dead offline operation
                                        T1 is scheduled out
                                        T1 waits for 'period_timer' to expire
T1 waits there forever if it is scheduled out before it can execute the hrtimer
offline callback hrtimers_dead_cpu().
Cure this by delegating the hotplug control operation to a worker thread on
an online CPU. This takes the initiating user space task, which might be
affected by the bandwidth timer, completely out of the picture.
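For readers unfamiliar with work_on_cpu(), the shape of the delegation in
the fix below is roughly the following (condensed from the kernel/cpu.c
hunk that follows; not a standalone build):

    static long __cpu_down_maps_locked(void *arg)
    {
            struct cpu_down_work *work = arg;

            /* Runs on a kworker bound to an online CPU. */
            return _cpu_down(work->cpu, 0, work->target);
    }

    /* Pick any online CPU other than the one being offlined ... */
    cpu = cpumask_any_but(cpu_online_mask, cpu);
    if (cpu >= nr_cpu_ids)
            return -EBUSY;
    /* ... and run the hot-unplug control operation there. */
    return work_on_cpu(cpu, __cpu_down_maps_locked, &work);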
Reported-by: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Yu Liao <liaoyu15(a)huawei.com>
Acked-by: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com
Link: https://lore.kernel.org/r/87h6oqdq0i.ffs@tglx
---
kernel/cpu.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index f6811c8..6de7c6b 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1487,8 +1487,22 @@ out:
return ret;
}
+struct cpu_down_work {
+ unsigned int cpu;
+ enum cpuhp_state target;
+};
+
+static long __cpu_down_maps_locked(void *arg)
+{
+ struct cpu_down_work *work = arg;
+
+ return _cpu_down(work->cpu, 0, work->target);
+}
+
static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
{
+ struct cpu_down_work work = { .cpu = cpu, .target = target, };
+
/*
* If the platform does not support hotplug, report it explicitly to
* differentiate it from a transient offlining failure.
@@ -1497,7 +1511,15 @@ static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
return -EOPNOTSUPP;
if (cpu_hotplug_disabled)
return -EBUSY;
- return _cpu_down(cpu, 0, target);
+
+ /*
+ * Ensure that the control task does not run on the to be offlined
+ * CPU to prevent a deadlock against cfs_b->period_timer.
+ */
+ cpu = cpumask_any_but(cpu_online_mask, cpu);
+ if (cpu >= nr_cpu_ids)
+ return -EBUSY;
+ return work_on_cpu(cpu, __cpu_down_maps_locked, &work);
}
static int cpu_down(unsigned int cpu, enum cpuhp_state target)
This is the start of the stable review cycle for the 4.14.324 release.
There are 57 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 30 Aug 2023 10:11:30 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.324-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.324-rc1
Rob Clark <robdclark(a)chromium.org>
dma-buf/sw_sync: Avoid recursive lock during fence signal
Zhu Wang <wangzhu9(a)huawei.com>
scsi: core: raid_class: Remove raid_component_add()
Zhu Wang <wangzhu9(a)huawei.com>
scsi: snic: Fix double free in snic_tgt_create()
Ido Schimmel <idosch(a)nvidia.com>
rtnetlink: Reject negative ifindexes in RTM_NEWLINK
Feng Tang <feng.tang(a)intel.com>
x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
Wei Chen <harperchen1110(a)gmail.com>
media: vcodec: Fix potential array out-of-bounds in encoder queue_setup
Helge Deller <deller(a)gmx.de>
lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels
Remi Pommarel <repk(a)triplefau.lt>
batman-adv: Fix batadv_v_ogm_aggr_send memory leak
Remi Pommarel <repk(a)triplefau.lt>
batman-adv: Fix TT global entry leak when client roamed back
Remi Pommarel <repk(a)triplefau.lt>
batman-adv: Do not get eth header before batadv_check_management_packet
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Trigger events for auto adjusted MTU
Michael Ellerman <mpe(a)ellerman.id.au>
ibmveth: Use dcbf rather than dcbfl
Sishuai Gong <sishuai.system(a)gmail.com>
ipvs: fix racy memcpy in proc_do_sync_threshold
Junwei Hu <hujunwei4(a)huawei.com>
ipvs: Improve robustness to the ipvs sysctl
Alessio Igor Bogani <alessio.bogani(a)elettra.eu>
igb: Avoid starting unnecessary workqueues
Eric Dumazet <edumazet(a)google.com>
sock: annotate data-races around prot->memory_pressure
Zheng Yejian <zhengyejian1(a)huawei.com>
tracing: Fix memleak due to race between current_tracer and trace
Justin Chen <justin.chen(a)broadcom.com>
net: phy: broadcom: stub c45 read/write for 54810
Lin Ma <linma(a)zju.edu.cn>
net: xfrm: Amend XFRMA_SEC_CTX nla_policy structure
Jason Xing <kernelxing(a)tencent.com>
net: fix the RTO timer retransmitting skb every 1ms if linear option is enabled
Kuniyuki Iwashima <kuniyu(a)amazon.com>
af_unix: Fix null-ptr-deref in unix_stream_sendpage().
Zhang Shurong <zhang_shurong(a)foxmail.com>
ASoC: rt5665: add missed regulator_bulk_disable
Xin Long <lucien.xin(a)gmail.com>
netfilter: set default timeout to 3 secs for sctp shutdown send and recv state
Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
test_firmware: prevent race conditions by a correct implementation of locking
Qi Zheng <zhengqi.arch(a)bytedance.com>
binder: fix memory leak in binder_init()
Tony Lindgren <tony(a)atomide.com>
serial: 8250: Fix oops for port->pm on uart_change_pm()
Yang Yingliang <yangyingliang(a)huawei.com>
mmc: wbsd: fix double mmc_free_host() in wbsd_init()
Russell Harmon via samba-technical <samba-technical(a)lists.samba.org>
cifs: Release folio lock on fscache read hit.
dengxiang <dengxiang(a)nfschina.com>
ALSA: usb-audio: Add support for Mythware XA001AU capture and playback interfaces.
Eric Dumazet <edumazet(a)google.com>
net: do not allow gso_size to be set to GSO_BY_FRAGS
Abel Wu <wuyun.abel(a)bytedance.com>
sock: Fix misuse of sk_under_memory_pressure()
Andrii Staikov <andrii.staikov(a)intel.com>
i40e: fix misleading debug logs
Ziyang Xuan <william.xuanziyang(a)huawei.com>
team: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nft_dynset: disallow object maps
Lin Ma <linma(a)zju.edu.cn>
xfrm: add NULL check in xfrm_update_ae_params
Zhengchao Shao <shaozhengchao(a)huawei.com>
ip_vti: fix potential slab-use-after-free in decode_session6
Zhengchao Shao <shaozhengchao(a)huawei.com>
ip6_vti: fix slab-use-after-free in decode_session6
Lin Ma <linma(a)zju.edu.cn>
net: af_key: fix sadb_x_filter validation
Lin Ma <linma(a)zju.edu.cn>
net: xfrm: Fix xfrm_address_filter OOB read
Nathan Lynch <nathanl(a)linux.ibm.com>
powerpc/rtas_flash: allow user copy to flash block cache objects
Yuanjun Gong <ruc_gongyuanjun(a)163.com>
fbdev: mmp: fix value check in mmphw_probe()
shanzhulig <shanzhulig(a)gmail.com>
drm/amdgpu: Fix potential fence use-after-free v2
Zhengping Jiang <jiangzp(a)google.com>
Bluetooth: L2CAP: Fix use-after-free
Armin Wolf <W_Armin(a)gmx.de>
pcmcia: rsrc_nonstatic: Fix memory leak in nonstatic_release_resource_db()
Tuo Li <islituo(a)gmail.com>
gfs2: Fix possible data races in gfs2_show_options()
Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
media: platform: mediatek: vpu: fix NULL ptr dereference
Yunfei Dong <yunfei.dong(a)mediatek.com>
media: v4l2-mem2mem: add lock to protect parameter num_rdy
Immad Mir <mirimmad17(a)gmail.com>
FS: JFS: Check for read-only mounted filesystem in txBegin
Immad Mir <mirimmad17(a)gmail.com>
FS: JFS: Fix null-ptr-deref Read in txBegin
Gustavo A. R. Silva <gustavoars(a)kernel.org>
MIPS: dec: prom: Address -Warray-bounds warning
Yogesh <yogi.kernel(a)gmail.com>
fs: jfs: Fix UBSAN: array-index-out-of-bounds in dbAllocDmapLev
Jan Kara <jack(a)suse.cz>
udf: Fix uninitialized array access for some pathnames
Ye Bin <yebin10(a)huawei.com>
quota: fix warning in dqgrab()
Jan Kara <jack(a)suse.cz>
quota: Properly disable quotas when add_dquot_ref() fails
Oswald Buddenhagen <oswald.buddenhagen(a)gmx.de>
ALSA: emu10k1: roll up loops in DSP setup code for Audigy
hackyzh002 <hackyzh002(a)gmail.com>
drm/radeon: Fix integer overflow in radeon_cs_parser_init
Nathan Chancellor <natechancellor(a)gmail.com>
lib/mpi: Eliminate unused umul_ppmm definitions for MIPS
-------------
Diffstat:
Makefile | 4 +-
arch/mips/include/asm/dec/prom.h | 2 +-
arch/powerpc/kernel/rtas_flash.c | 6 +-
arch/x86/kernel/fpu/xstate.c | 8 ++
drivers/android/binder.c | 1 +
drivers/android/binder_alloc.c | 6 ++
drivers/android/binder_alloc.h | 1 +
drivers/dma-buf/sw_sync.c | 18 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 +
drivers/gpu/drm/radeon/radeon_cs.c | 3 +-
drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c | 2 +
drivers/media/platform/mtk-vpu/mtk_vpu.c | 6 +-
drivers/mmc/host/wbsd.c | 2 -
drivers/net/ethernet/ibm/ibmveth.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_nvm.c | 16 +--
drivers/net/ethernet/intel/igb/igb_ptp.c | 24 ++---
drivers/net/phy/broadcom.c | 13 +++
drivers/net/team/team.c | 4 +-
drivers/pcmcia/rsrc_nonstatic.c | 2 +
drivers/scsi/raid_class.c | 48 ---------
drivers/scsi/snic/snic_disc.c | 3 +-
drivers/tty/serial/8250/8250_port.c | 1 +
drivers/video/fbdev/mmp/hw/mmp_ctrl.c | 4 +-
fs/cifs/file.c | 2 +-
fs/gfs2/super.c | 26 +++--
fs/jfs/jfs_dmap.c | 3 +
fs/jfs/jfs_txnmgr.c | 5 +
fs/jfs/namei.c | 5 +
fs/quota/dquot.c | 5 +-
fs/udf/unicode.c | 2 +-
include/linux/raid_class.h | 4 -
include/linux/virtio_net.h | 4 +
include/media/v4l2-mem2mem.h | 18 +++-
include/net/sock.h | 11 +-
kernel/trace/trace.c | 9 +-
kernel/trace/trace_irqsoff.c | 3 +-
kernel/trace/trace_sched_wakeup.c | 2 +
lib/clz_ctz.c | 32 ++----
lib/mpi/longlong.h | 36 +------
lib/test_firmware.c | 39 +++++--
net/batman-adv/bat_v_elp.c | 3 +-
net/batman-adv/bat_v_ogm.c | 7 +-
net/batman-adv/hard-interface.c | 2 +-
net/batman-adv/translation-table.c | 1 -
net/bluetooth/l2cap_core.c | 5 +
net/core/rtnetlink.c | 5 +-
net/core/sock.c | 2 +-
net/ipv4/ip_vti.c | 4 +-
net/ipv4/tcp_timer.c | 4 +-
net/ipv6/ip6_vti.c | 4 +-
net/key/af_key.c | 4 +-
net/netfilter/ipvs/ip_vs_ctl.c | 74 +++++++-------
net/netfilter/nf_conntrack_proto_sctp.c | 6 +-
net/netfilter/nft_dynset.c | 3 +
net/sctp/socket.c | 2 +-
net/unix/af_unix.c | 9 +-
net/xfrm/xfrm_user.c | 13 ++-
sound/pci/emu10k1/emufx.c | 112 ++-------------------
sound/soc/codecs/rt5665.c | 2 +
sound/usb/quirks-table.h | 29 ++++++
60 files changed, 325 insertions(+), 351 deletions(-)
From: Linus Torvalds <torvalds(a)linux-foundation.org>
[ upstream commit 5ef64cc8987a9211d3f3667331ba3411a94ddc79 ]
Commit 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic") made
the page locking entirely fair, in that if a waiter came in while the
lock was held, the lock would be transferred to the lockers strictly in
order.
That was intended to finally get rid of the long-reported watchdog
failures that involved the page lock under extreme load, where a process
could end up waiting essentially forever, as other page lockers stole
the lock from under it.
It also improved some benchmarks, but it ended up causing huge
performance regressions on others, simply because fair lock behavior
doesn't end up giving out the lock as aggressively, causing better
worst-case latency, but potentially much worse average latencies and
throughput.
Instead of reverting that change entirely, this introduces a controlled
amount of unfairness, with a sysctl knob to tune it if somebody needs
to. But the default value should hopefully be good for any normal load,
allowing a few rounds of lock stealing, but enforcing the strict
ordering before the lock has been stolen too many times.
There is also a hint from Matthieu Baerts that the fair page coloring
may end up exposing an ABBA deadlock that is hidden by the usual
optimistic lock stealing, and while the unfairness doesn't fix the
fundamental issue (and I'm still looking at that), it avoids it in
practice.
The amount of unfairness can be modified by writing a new value to the
'sysctl_page_lock_unfairness' variable (default value of 5, exposed
through /proc/sys/vm/page_lock_unfairness), but that is hopefully
something we'd use mainly for debugging rather than being necessary for
any deep system tuning.
This whole issue has exposed just how critical the page lock can be, and
how contended it gets under certain loads. And the main contention
doesn't really seem to be anything related to IO (which was the origin
of this lock), but for things like just verifying that the page file
mapping is stable while faulting in the page into a page table.
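In condensed form (boiled down from the mm/filemap.c hunks in this
backport, not compilable on its own), the unfairness budget works roughly
like this:

    /* How many times do we accept lock stealing from under a waiter? */
    int sysctl_page_lock_unfairness = 5;

    /* Per attempt, for an exclusive waiter: */
    wait->flags = WQ_FLAG_EXCLUSIVE;
    if (--unfairness < 0)
            /* Budget exhausted: request a fair lock handoff. */
            wait->flags |= WQ_FLAG_CUSTOM;

    /* Woken without WQ_FLAG_DONE: the lock was not handed to us, so try
     * to take it; if it was stolen from under us, go around again. */
    if (unlikely(test_and_set_bit(bit_nr, &page->flags)))
            goto repeat;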
Link: https://lore.kernel.org/linux-fsdevel/ed8442fd-6f54-dd84-cd4a-941e8b7ee603@…
Link: https://www.phoronix.com/scan.php?page=article&item=linux-50-59&num=1
Link: https://lore.kernel.org/linux-fsdevel/c560a38d-8313-51fb-b1ec-e904bd8836bc@…
Reported-and-tested-by: Michael Larabel <Michael(a)michaellarabel.com>
Tested-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Cc: Dave Chinner <david(a)fromorbit.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Chris Mason <clm(a)fb.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Amir Goldstein <amir73il(a)gmail.com>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
CC: <stable(a)vger.kernel.org> # 5.4
[ mheyne: fixed contextual conflict in mm/filemap.c due to missing
commit c7510ab2cf5c ("mm: abstract out wake_page_match() from
wake_page_function()"). Added WQ_FLAG_CUSTOM due to missing commit
7f26482a872c ("locking/percpu-rwsem: Remove the embedded rwsem") ]
Signed-off-by: Maximilian Heyne <mheyne(a)amazon.de>
---
include/linux/mm.h | 2 +
include/linux/wait.h | 2 +
kernel/sysctl.c | 8 +++
mm/filemap.c | 160 ++++++++++++++++++++++++++++++++++---------
4 files changed, 141 insertions(+), 31 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d35c29d322d8..d14aba548ff4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -37,6 +37,8 @@ struct user_struct;
struct writeback_control;
struct bdi_writeback;
+extern int sysctl_page_lock_unfairness;
+
void init_mm_internals(void);
#ifndef CONFIG_NEED_MULTIPLE_NODES /* Don't use mapnrs, do it properly */
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 7d04c1b588c7..03bff85e365f 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -20,6 +20,8 @@ int default_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, int
#define WQ_FLAG_EXCLUSIVE 0x01
#define WQ_FLAG_WOKEN 0x02
#define WQ_FLAG_BOOKMARK 0x04
+#define WQ_FLAG_CUSTOM 0x08
+#define WQ_FLAG_DONE 0x10
/*
* A single wait-queue entry structure:
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index decabf5714c0..4f85f7ed42fc 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1563,6 +1563,14 @@ static struct ctl_table vm_table[] = {
.proc_handler = percpu_pagelist_fraction_sysctl_handler,
.extra1 = SYSCTL_ZERO,
},
+ {
+ .procname = "page_lock_unfairness",
+ .data = &sysctl_page_lock_unfairness,
+ .maxlen = sizeof(sysctl_page_lock_unfairness),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ },
#ifdef CONFIG_MMU
{
.procname = "max_map_count",
diff --git a/mm/filemap.c b/mm/filemap.c
index adc27af737c6..f1ed0400c37c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1044,9 +1044,43 @@ struct wait_page_queue {
wait_queue_entry_t wait;
};
+/*
+ * The page wait code treats the "wait->flags" somewhat unusually, because
+ * we have multiple different kinds of waits, not just he usual "exclusive"
+ * one.
+ *
+ * We have:
+ *
+ * (a) no special bits set:
+ *
+ * We're just waiting for the bit to be released, and when a waker
+ * calls the wakeup function, we set WQ_FLAG_WOKEN and wake it up,
+ * and remove it from the wait queue.
+ *
+ * Simple and straightforward.
+ *
+ * (b) WQ_FLAG_EXCLUSIVE:
+ *
+ * The waiter is waiting to get the lock, and only one waiter should
+ * be woken up to avoid any thundering herd behavior. We'll set the
+ * WQ_FLAG_WOKEN bit, wake it up, and remove it from the wait queue.
+ *
+ * This is the traditional exclusive wait.
+ *
+ * (b) WQ_FLAG_EXCLUSIVE | WQ_FLAG_CUSTOM:
+ *
+ * The waiter is waiting to get the bit, and additionally wants the
+ * lock to be transferred to it for fair lock behavior. If the lock
+ * cannot be taken, we stop walking the wait queue without waking
+ * the waiter.
+ *
+ * This is the "fair lock handoff" case, and in addition to setting
+ * WQ_FLAG_WOKEN, we set WQ_FLAG_DONE to let the waiter easily see
+ * that it now has the lock.
+ */
static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync, void *arg)
{
- int ret;
+ unsigned int flags;
struct wait_page_key *key = arg;
struct wait_page_queue *wait_page
= container_of(wait, struct wait_page_queue, wait);
@@ -1059,35 +1093,44 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
return 0;
/*
- * If it's an exclusive wait, we get the bit for it, and
- * stop walking if we can't.
- *
- * If it's a non-exclusive wait, then the fact that this
- * wake function was called means that the bit already
- * was cleared, and we don't care if somebody then
- * re-took it.
+ * If it's a lock handoff wait, we get the bit for it, and
+ * stop walking (and do not wake it up) if we can't.
*/
- ret = 0;
- if (wait->flags & WQ_FLAG_EXCLUSIVE) {
- if (test_and_set_bit(key->bit_nr, &key->page->flags))
+ flags = wait->flags;
+ if (flags & WQ_FLAG_EXCLUSIVE) {
+ if (test_bit(key->bit_nr, &key->page->flags))
return -1;
- ret = 1;
+ if (flags & WQ_FLAG_CUSTOM) {
+ if (test_and_set_bit(key->bit_nr, &key->page->flags))
+ return -1;
+ flags |= WQ_FLAG_DONE;
+ }
}
- wait->flags |= WQ_FLAG_WOKEN;
+ /*
+ * We are holding the wait-queue lock, but the waiter that
+ * is waiting for this will be checking the flags without
+ * any locking.
+ *
+ * So update the flags atomically, and wake up the waiter
+ * afterwards to avoid any races. This store-release pairs
+ * with the load-acquire in wait_on_page_bit_common().
+ */
+ smp_store_release(&wait->flags, flags | WQ_FLAG_WOKEN);
wake_up_state(wait->private, mode);
/*
* Ok, we have successfully done what we're waiting for,
* and we can unconditionally remove the wait entry.
*
- * Note that this has to be the absolute last thing we do,
- * since after list_del_init(&wait->entry) the wait entry
+ * Note that this pairs with the "finish_wait()" in the
+ * waiter, and has to be the absolute last thing we do.
+ * After this list_del_init(&wait->entry) the wait entry
* might be de-allocated and the process might even have
* exited.
*/
list_del_init_careful(&wait->entry);
- return ret;
+ return (flags & WQ_FLAG_EXCLUSIVE) != 0;
}
static void wake_up_page_bit(struct page *page, int bit_nr)
@@ -1167,8 +1210,8 @@ enum behavior {
};
/*
- * Attempt to check (or get) the page bit, and mark the
- * waiter woken if successful.
+ * Attempt to check (or get) the page bit, and mark us done
+ * if successful.
*/
static inline bool trylock_page_bit_common(struct page *page, int bit_nr,
struct wait_queue_entry *wait)
@@ -1179,13 +1222,17 @@ static inline bool trylock_page_bit_common(struct page *page, int bit_nr,
} else if (test_bit(bit_nr, &page->flags))
return false;
- wait->flags |= WQ_FLAG_WOKEN;
+ wait->flags |= WQ_FLAG_WOKEN | WQ_FLAG_DONE;
return true;
}
+/* How many times do we accept lock stealing from under a waiter? */
+int sysctl_page_lock_unfairness = 5;
+
static inline int wait_on_page_bit_common(wait_queue_head_t *q,
struct page *page, int bit_nr, int state, enum behavior behavior)
{
+ int unfairness = sysctl_page_lock_unfairness;
struct wait_page_queue wait_page;
wait_queue_entry_t *wait = &wait_page.wait;
bool thrashing = false;
@@ -1203,11 +1250,18 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
}
init_wait(wait);
- wait->flags = behavior == EXCLUSIVE ? WQ_FLAG_EXCLUSIVE : 0;
wait->func = wake_page_function;
wait_page.page = page;
wait_page.bit_nr = bit_nr;
+repeat:
+ wait->flags = 0;
+ if (behavior == EXCLUSIVE) {
+ wait->flags = WQ_FLAG_EXCLUSIVE;
+ if (--unfairness < 0)
+ wait->flags |= WQ_FLAG_CUSTOM;
+ }
+
/*
* Do one last check whether we can get the
* page bit synchronously.
@@ -1230,27 +1284,63 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
/*
* From now on, all the logic will be based on
- * the WQ_FLAG_WOKEN flag, and the and the page
- * bit testing (and setting) will be - or has
- * already been - done by the wake function.
+ * the WQ_FLAG_WOKEN and WQ_FLAG_DONE flag, to
+ * see whether the page bit testing has already
+ * been done by the wake function.
*
* We can drop our reference to the page.
*/
if (behavior == DROP)
put_page(page);
+ /*
+ * Note that until the "finish_wait()", or until
+ * we see the WQ_FLAG_WOKEN flag, we need to
+ * be very careful with the 'wait->flags', because
+ * we may race with a waker that sets them.
+ */
for (;;) {
+ unsigned int flags;
+
set_current_state(state);
- if (signal_pending_state(state, current))
+ /* Loop until we've been woken or interrupted */
+ flags = smp_load_acquire(&wait->flags);
+ if (!(flags & WQ_FLAG_WOKEN)) {
+ if (signal_pending_state(state, current))
+ break;
+
+ io_schedule();
+ continue;
+ }
+
+ /* If we were non-exclusive, we're done */
+ if (behavior != EXCLUSIVE)
break;
- if (wait->flags & WQ_FLAG_WOKEN)
+ /* If the waker got the lock for us, we're done */
+ if (flags & WQ_FLAG_DONE)
break;
- io_schedule();
+ /*
+ * Otherwise, if we're getting the lock, we need to
+ * try to get it ourselves.
+ *
+ * And if that fails, we'll have to retry this all.
+ */
+ if (unlikely(test_and_set_bit(bit_nr, &page->flags)))
+ goto repeat;
+
+ wait->flags |= WQ_FLAG_DONE;
+ break;
}
+ /*
+ * If a signal happened, this 'finish_wait()' may remove the last
+ * waiter from the wait-queues, but the PageWaiters bit will remain
+ * set. That's ok. The next wakeup will take care of it, and trying
+ * to do it here would be difficult and prone to races.
+ */
finish_wait(q, wait);
if (thrashing) {
@@ -1260,12 +1350,20 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
}
/*
- * A signal could leave PageWaiters set. Clearing it here if
- * !waitqueue_active would be possible (by open-coding finish_wait),
- * but still fail to catch it in the case of wait hash collision. We
- * already can fail to clear wait hash collision cases, so don't
- * bother with signals either.
+ * NOTE! The wait->flags weren't stable until we've done the
+ * 'finish_wait()', and we could have exited the loop above due
+ * to a signal, and had a wakeup event happen after the signal
+ * test but before the 'finish_wait()'.
+ *
+ * So only after the finish_wait() can we reliably determine
+ * if we got woken up or not, so we can now figure out the final
+ * return value based on that state without races.
+ *
+ * Also note that WQ_FLAG_WOKEN is sufficient for a non-exclusive
+ * waiter, but an exclusive one requires WQ_FLAG_DONE.
*/
+ if (behavior == EXCLUSIVE)
+ return wait->flags & WQ_FLAG_DONE ? 0 : -EINTR;
return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR;
}
--
2.40.1
From: Song Shuai <suagrfillet(a)gmail.com>
The pt_level uses CONFIG_PGTABLE_LEVELS to display page table names.
But if the page table mode is downgraded via the kernel cmdline or
restricted by the hardware in 64BIT, it will give a wrong name.
For example, using no4lvl for sv39, ptdump names the 1G mapping as "PUD"
when it should be "PGD":
0xffffffd840000000-0xffffffd900000000 0x00000000c0000000 3G PUD D A G . . W R V
So select "P4D/PUD" or "PGD" via pgtable_l5/4_enabled to correct it.
Fixes: e8a62cc26ddf ("riscv: Implement sv48 support")
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Signed-off-by: Song Shuai <suagrfillet(a)gmail.com>
Link: https://lore.kernel.org/r/20230712115740.943324-1-suagrfillet@gmail.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
---
arch/riscv/mm/ptdump.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
index 20a9f991a6d7..e9090b38f811 100644
--- a/arch/riscv/mm/ptdump.c
+++ b/arch/riscv/mm/ptdump.c
@@ -384,6 +384,9 @@ static int __init ptdump_init(void)
kernel_ptd_info.base_addr = KERN_VIRT_START;
+ pg_level[1].name = pgtable_l5_enabled ? "P4D" : "PGD";
+ pg_level[2].name = pgtable_l4_enabled ? "PUD" : "PGD";
+
for (i = 0; i < ARRAY_SIZE(pg_level); i++)
for (j = 0; j < ARRAY_SIZE(pte_bits); j++)
pg_level[i].mask |= pte_bits[j].mask;
--
2.41.0
There are instances where rcu_cpu_stall_reset() is called when jiffies
has not had a chance to update for a long time. Before jiffies is
updated, the CPU stall detector can go off, triggering false positives
where a just-started grace period appears to be ages old. In the past,
we disabled stall detection in rcu_cpu_stall_reset(); however, this got
changed [1]. This is resulting in false positives in the KGDB use case [2].
Fix this by deferring the update of jiffies to the third run of the FQS
loop. This is more robust, as, even if rcu_cpu_stall_reset() is called
just before jiffies is read, we would end up pushing out the jiffies
read by 3 more FQS loops. Meanwhile the CPU stall detection will be
delayed and we will not get any false positives.
[1] https://lore.kernel.org/all/20210521155624.174524-2-senozhatsky@chromium.or…
[2] https://lore.kernel.org/all/20230814020045.51950-2-chenhuacai@loongson.cn/
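Condensed from the hunks below (not a standalone build), the deferral
works roughly as follows:

    /* rcu_cpu_stall_reset(): do not trust jiffies yet. */
    WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, 3);
    WRITE_ONCE(rcu_state.jiffies_stall, ULONG_MAX);

    /* rcu_gp_fqs(): count down once per FQS pass; by the final pass,
     * jiffies has had time to update, so re-arm the stall timeout. */
    if (nr_fqs) {
            if (nr_fqs == 1)
                    WRITE_ONCE(rcu_state.jiffies_stall,
                               jiffies + rcu_jiffies_till_stall_check());
            WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, --nr_fqs);
    }

    /* check_cpu_stall(): no stall warnings while the countdown runs. */
    if (READ_ONCE(rcu_state.nr_fqs_jiffies_stall) > 0)
            return;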
Also tested with the rcutorture.cpu_stall option to verify stall behavior
with and without the patch.
Tested-by: Huacai Chen <chenhuacai(a)loongson.cn>
Reported-by: Huacai Chen <chenhuacai(a)loongson.cn>
Closes: https://lore.kernel.org/all/20230814020045.51950-2-chenhuacai@loongson.cn/
Suggested-by: Paul McKenney <paulmck(a)kernel.org>
Cc: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Fixes: a80be428fbc1 ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()")
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
kernel/rcu/tree.c | 12 ++++++++++++
kernel/rcu/tree.h | 4 ++++
kernel/rcu/tree_stall.h | 20 ++++++++++++++++++--
3 files changed, 34 insertions(+), 2 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1449cb69a0e0..b695c0eb515a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1552,10 +1552,22 @@ static bool rcu_gp_fqs_check_wake(int *gfp)
*/
static void rcu_gp_fqs(bool first_time)
{
+ int nr_fqs = READ_ONCE(rcu_state.nr_fqs_jiffies_stall);
struct rcu_node *rnp = rcu_get_root();
WRITE_ONCE(rcu_state.gp_activity, jiffies);
WRITE_ONCE(rcu_state.n_force_qs, rcu_state.n_force_qs + 1);
+
+ WARN_ON_ONCE(nr_fqs > 3);
+ /* Only countdown nr_fqs for stall purposes if jiffies moves. */
+ if (nr_fqs) {
+ if (nr_fqs == 1) {
+ WRITE_ONCE(rcu_state.jiffies_stall,
+ jiffies + rcu_jiffies_till_stall_check());
+ }
+ WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, --nr_fqs);
+ }
+
if (first_time) {
/* Collect dyntick-idle snapshots. */
force_qs_rnp(dyntick_save_progress_counter);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 192536916f9a..e9821a8422db 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -386,6 +386,10 @@ struct rcu_state {
/* in jiffies. */
unsigned long jiffies_stall; /* Time at which to check */
/* for CPU stalls. */
+ int nr_fqs_jiffies_stall; /* Number of fqs loops after
+ * which read jiffies and set
+ * jiffies_stall. Stall
+ * warnings disabled if !0. */
unsigned long jiffies_resched; /* Time at which to resched */
/* a reluctant CPU. */
unsigned long n_force_qs_gpstart; /* Snapshot of n_force_qs at */
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index b10b8349bb2a..a2fa6b22e248 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -149,12 +149,17 @@ static void panic_on_rcu_stall(void)
/**
* rcu_cpu_stall_reset - restart stall-warning timeout for current grace period
*
+ * To perform the reset request from the caller, disable stall detection until
+ * 3 fqs loops have passed. This is required to ensure a fresh jiffies is
+ * loaded. It should be safe to do from the fqs loop as enough timer
+ * interrupts and context switches should have passed.
+ *
* The caller must disable hard irqs.
*/
void rcu_cpu_stall_reset(void)
{
- WRITE_ONCE(rcu_state.jiffies_stall,
- jiffies + rcu_jiffies_till_stall_check());
+ WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, 3);
+ WRITE_ONCE(rcu_state.jiffies_stall, ULONG_MAX);
}
//////////////////////////////////////////////////////////////////////////////
@@ -170,6 +175,7 @@ static void record_gp_stall_check_time(void)
WRITE_ONCE(rcu_state.gp_start, j);
j1 = rcu_jiffies_till_stall_check();
smp_mb(); // ->gp_start before ->jiffies_stall and caller's ->gp_seq.
+ WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, 0);
WRITE_ONCE(rcu_state.jiffies_stall, j + j1);
rcu_state.jiffies_resched = j + j1 / 2;
rcu_state.n_force_qs_gpstart = READ_ONCE(rcu_state.n_force_qs);
@@ -725,6 +731,16 @@ static void check_cpu_stall(struct rcu_data *rdp)
!rcu_gp_in_progress())
return;
rcu_stall_kick_kthreads();
+
+ /*
+ * Check if it was requested (via rcu_cpu_stall_reset()) that the FQS
+ * loop has to set jiffies to ensure a non-stale jiffies value. This
+ * is required to have good jiffies value after coming out of long
+ * breaks of jiffies updates. Not doing so can cause false positives.
+ */
+ if (READ_ONCE(rcu_state.nr_fqs_jiffies_stall) > 0)
+ return;
+
j = jiffies;
/*
--
2.42.0.rc2.253.gd59a3bf2b4-goog
At line 910, the value of m1 may be zero, so there is a possibility of
division by zero; fix it by checking the values before dividing (found
with SVACE). After reviewing the function, we found that lines 906, 908
and 912 have the same problem, so fix them as well.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Rand Deeb <deeb.rand(a)confident.ru>
---
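The change is an instance of the simple "check the divisor before
dividing" pattern; as a stand-alone illustration (the helper name below
is hypothetical and not part of the patch):

    /* Hypothetical illustration of the guarded division used in
     * ssb_calc_clock_rate(): return 0 instead of dividing by zero. */
    static u32 safe_clock_div(u32 clock, u32 divisor)
    {
            if (divisor == 0)
                    return 0;
            return clock / divisor;
    }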
drivers/ssb/main.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/ssb/main.c b/drivers/ssb/main.c
index 0a26984acb2c..e0776a16d04d 100644
--- a/drivers/ssb/main.c
+++ b/drivers/ssb/main.c
@@ -903,13 +903,21 @@ u32 ssb_calc_clock_rate(u32 plltype, u32 n, u32 m)
case SSB_CHIPCO_CLK_MC_BYPASS:
return clock;
case SSB_CHIPCO_CLK_MC_M1:
- return (clock / m1);
+ if (m1 != 0)
+ return (clock / m1);
+ break;
case SSB_CHIPCO_CLK_MC_M1M2:
- return (clock / (m1 * m2));
+ if ((m1 * m2) != 0)
+ return (clock / (m1 * m2));
+ break;
case SSB_CHIPCO_CLK_MC_M1M2M3:
- return (clock / (m1 * m2 * m3));
+ if ((m1 * m2 * m3) != 0)
+ return (clock / (m1 * m2 * m3));
+ break;
case SSB_CHIPCO_CLK_MC_M1M3:
- return (clock / (m1 * m3));
+ if ((m1 * m3) != 0)
+ return (clock / (m1 * m3));
+ break;
}
return 0;
case SSB_PLLTYPE_2:
--
2.34.1
v1 -> v2:
- Address the comment to reduce size of queue pointer from queue size
- Consider the data size during memcpy to avoid an OOB write
- Use hweight_long() to count the set bits representing the supported codecs
v1: https://lore.kernel.org/all/1690432469-14803-1-git-send-email-quic_vgarodia…
This series primarily adds checks at relevant places in the venus driver
where there are possible OOB accesses due to unexpected payloads from the
venus firmware. Each patch describes the specific OOB possibility it
addresses.
Please review and share your feedback.
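As a rough illustration of the kind of sanity check described above (all
names below are hypothetical and not taken from the venus driver):

    /* Hypothetical sketch: bound a firmware-reported codec bitmask so the
     * number of codecs never exceeds what the driver arrays can hold. */
    #define MAX_CODECS 32

    static unsigned int sanitize_codec_count(unsigned long codecs_mask)
    {
            unsigned int count = hweight_long(codecs_mask);

            return min_t(unsigned int, count, MAX_CODECS);
    }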
Vikash Garodia (4):
venus: hfi: add checks to perform sanity on queue pointers
venus: hfi: fix the check to handle session buffer requirement
venus: hfi: add checks to handle capabilities from firmware
venus: hfi_parser: Add check to keep the number of codecs within range
drivers/media/platform/qcom/venus/hfi_msgs.c | 2 +-
drivers/media/platform/qcom/venus/hfi_parser.c | 15 +++++++++++++++
drivers/media/platform/qcom/venus/hfi_venus.c | 10 ++++++++++
3 files changed, 26 insertions(+), 1 deletion(-)
--
2.7.4
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
[ Upstream commit 8183bb7e291b7818f49ea39687c2fafa01a46e27 ]
GPIO_ACTIVE_x flags are not correct in the context of interrupt flags.
These are simple defines so they could be used in DTS but they will not
have the same meaning: GPIO_ACTIVE_HIGH = 0 = IRQ_TYPE_NONE.
Correct the interrupt flags, assuming the author of the code wanted the
same logical behavior behind the name "ACTIVE_xxx", that is:
ACTIVE_HIGH => IRQ_TYPE_LEVEL_HIGH
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Link: https://lore.kernel.org/r/20230707063335.13317-1-krzysztof.kozlowski@linaro…
Signed-off-by: Heiko Stuebner <heiko(a)sntech.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm64/boot/dts/rockchip/rk3399-eaidk-610.dts | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3399-eaidk-610.dts b/arch/arm64/boot/dts/rockchip/rk3399-eaidk-610.dts
index d1f343345f674..6464ef4d113dd 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-eaidk-610.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3399-eaidk-610.dts
@@ -773,7 +773,7 @@ brcmf: wifi@1 {
compatible = "brcm,bcm4329-fmac";
reg = <1>;
interrupt-parent = <&gpio0>;
- interrupts = <RK_PA3 GPIO_ACTIVE_HIGH>;
+ interrupts = <RK_PA3 IRQ_TYPE_LEVEL_HIGH>;
interrupt-names = "host-wake";
pinctrl-names = "default";
pinctrl-0 = <&wifi_host_wake_l>;
--
2.40.1
cros_ec_sensors_push_data() reads some `indio_dev` states (e.g.
iio_buffer_enabled() and `indio_dev->active_scan_mask`) without holding
the `mlock`.
A use-after-free on `indio_dev->active_scan_mask` was observed. The
call trace:
[...]
_find_next_bit
cros_ec_sensors_push_data
cros_ec_sensorhub_event
blocking_notifier_call_chain
cros_ec_irq_thread
It was caused by a race condition: one thread just freed
`active_scan_mask` at [1]; while another thread tried to access the
memory at [2].
Fix it by acquiring the `mlock` before accessing the `indio_dev` states.
[1]: https://elixir.bootlin.com/linux/v6.5/source/drivers/iio/industrialio-buffe…
[2]: https://elixir.bootlin.com/linux/v6.5/source/drivers/iio/common/cros_ec_sen…
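Condensed, the fix in the diff below simply brackets the state accesses
with the mlock (sketch only, not a standalone build):

    mutex_lock(&iio_dev_opaque->mlock);
    if (!iio_buffer_enabled(indio_dev))
            goto exit;
    /* ... push samples while active_scan_mask is guaranteed stable ... */
    exit:
    mutex_unlock(&iio_dev_opaque->mlock);
    return 0;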
Cc: stable(a)vger.kernel.org
Fixes: aa984f1ba4a4 ("iio: cros_ec: Register to cros_ec_sensorhub when EC supports FIFO")
Signed-off-by: Tzung-Bi Shih <tzungbi(a)kernel.org>
---
drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c b/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c
index b72d39fc2434..a514d0dbafc7 100644
--- a/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c
+++ b/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c
@@ -182,17 +182,20 @@ int cros_ec_sensors_push_data(struct iio_dev *indio_dev,
s16 *data,
s64 timestamp)
{
+ struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
struct cros_ec_sensors_core_state *st = iio_priv(indio_dev);
s16 *out;
s64 delta;
unsigned int i;
+ mutex_lock(&iio_dev_opaque->mlock);
+
/*
* Ignore samples if the buffer is not set: it is needed if the ODR is
* set but the buffer is not enabled yet.
*/
if (!iio_buffer_enabled(indio_dev))
- return 0;
+ goto exit;
out = (s16 *)st->samples;
for_each_set_bit(i,
@@ -210,6 +213,8 @@ int cros_ec_sensors_push_data(struct iio_dev *indio_dev,
iio_push_to_buffers_with_timestamp(indio_dev, st->samples,
timestamp + delta);
+exit:
+ mutex_unlock(&iio_dev_opaque->mlock);
return 0;
}
EXPORT_SYMBOL_GPL(cros_ec_sensors_push_data);
--
2.42.0.rc1.204.g551eb34607-goog
From: "Paul E. McKenney" <paulmck(a)kernel.org>
[ Upstream commit 147f04b14adde831eb4a0a1e378667429732f9e8 ]
If an RCU expedited grace period starts just when a CPU is in the process
of going offline, so that the outgoing CPU has completed its pass through
stop-machine but has not yet completed its final dive into the idle loop,
RCU will attempt to enable that CPU's scheduling-clock tick via a call
to tick_dep_set_cpu(). For this to happen, that CPU has to have been
online when the expedited grace period completed its CPU-selection phase.
This is pointless: The outgoing CPU has interrupts disabled, so it cannot
take a scheduling-clock tick anyway. In addition, the tick_dep_set_cpu()
function's eventual call to irq_work_queue_on() will splat as follows:
smpboot: CPU 1 is now offline
WARNING: CPU: 6 PID: 124 at kernel/irq_work.c:95
+irq_work_queue_on+0x57/0x60
Modules linked in:
CPU: 6 PID: 124 Comm: kworker/6:2 Not tainted 5.15.0-rc1+ #3
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
+rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
Workqueue: rcu_gp wait_rcu_exp_gp
RIP: 0010:irq_work_queue_on+0x57/0x60
Code: 8b 05 1d c7 ea 62 a9 00 00 f0 00 75 21 4c 89 ce 44 89 c7 e8
+9b 37 fa ff ba 01 00 00 00 89 d0 c3 4c 89 cf e8 3b ff ff ff eb ee <0f> 0b eb b7
+0f 0b eb db 90 48 c7 c0 98 2a 02 00 65 48 03 05 91
6f
RSP: 0000:ffffb12cc038fe48 EFLAGS: 00010282
RAX: 0000000000000001 RBX: 0000000000005208 RCX: 0000000000000020
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9ad01f45a680
RBP: 000000000004c990 R08: 0000000000000001 R09: ffff9ad01f45a680
R10: ffffb12cc0317db0 R11: 0000000000000001 R12: 00000000fffecee8
R13: 0000000000000001 R14: 0000000000026980 R15: ffffffff9e53ae00
FS: 0000000000000000(0000) GS:ffff9ad01f580000(0000)
+knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000000de0c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tick_nohz_dep_set_cpu+0x59/0x70
rcu_exp_wait_wake+0x54e/0x870
? sync_rcu_exp_select_cpus+0x1fc/0x390
process_one_work+0x1ef/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x28/0x3c0
? process_one_work+0x3c0/0x3c0
kthread+0x115/0x140
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30
---[ end trace c5bf75eb6aa80bc6 ]---
This commit therefore avoids invoking tick_dep_set_cpu() on offlined
CPUs to limit both futility and false-positive splats.
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
kernel/rcu/tree_exp.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index f46c0c1a5eb3..407941a2903b 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -507,7 +507,10 @@ static void synchronize_rcu_expedited_wait(void)
if (rdp->rcu_forced_tick_exp)
continue;
rdp->rcu_forced_tick_exp = true;
- tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
+ preempt_disable();
+ if (cpu_online(cpu))
+ tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
+ preempt_enable();
}
}
j = READ_ONCE(jiffies_till_first_fqs);
--
2.42.0.rc2.253.gd59a3bf2b4-goog
commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
changed pci_bridge_d3_possible() so that any vendor's PCIe ports
from modern machines (>=2015) are allowed to be put into D3.
Iain reports that USB devices can't be used to wake a Lenovo Z13
from suspend. This is because the PCIe root port has been put
into D3 and AMD's platform can't handle USB devices waking in this
case.
This behavior is only reported on Linux. Comparing the behavior
on Windows and Linux, Windows doesn't put the root ports into D3.
To fix the issue without regressing existing Intel systems,
limit the >=2015 check to only apply to Intel PCIe ports.
Cc: stable(a)vger.kernel.org
Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
Reported-by: Iain Lane <iain(a)orangesquash.org.uk>
Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-exte…
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy(a)linux.intel.com>
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
In v14 this series has been split into 3 parts.
part A: Immediate fix for AMD issue.
part B: LPS0 export improvements
part C: Long term solution for all vendors
v13->v14:
* Reword the comment
* add tag
v12->v13:
* New patch
---
drivers/pci/pci.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 60230da957e0c..bfdad2eb36d13 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3037,10 +3037,15 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
return false;
/*
- * It should be safe to put PCIe ports from 2015 or newer
- * to D3.
+ * Allow Intel PCIe ports from 2015 onward to go into D3 to
+ * achieve additional energy conservation on some platforms.
+ *
+ * This is only set for Intel PCIe ports as it causes problems
+ * on both AMD Rembrandt and Phoenix platforms where USB keyboards
+ * can not be used to wake the system from suspend.
*/
- if (dmi_get_bios_year() >= 2015)
+ if (bridge->vendor == PCI_VENDOR_ID_INTEL &&
+ dmi_get_bios_year() >= 2015)
return true;
break;
}
--
2.34.1
[My two cents]
stable-rc linux-6.1.y and linux-6.4.y x86 clang-nightly builds fail with
the following error: the format string "slowpath-%02x:%02x.%02x" expands
to at least 17 characters plus the terminating NUL (18 bytes), which does
not fit in the 16-byte NAME_SIZE buffer.
Build errors:
--------------
drivers/net/ethernet/qlogic/qed/qed_main.c:1227:3: error: 'snprintf'
will always be truncated; specified size is 16, but format string
expands to at least 18 [-Werror,-Wfortify-source]
1227 | snprintf(name, NAME_SIZE, "slowpath-%02x:%02x.%02x",
| ^
1 error generated.
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
--
Linaro LKFT
https://lkft.linaro.org
check_clock doesn't account for vfe_lite, which means that vfe_lite will
never get validated by this routine. Add the clock name to the expected
set to remedy this.
Fixes: 7319cdf189bb ("media: camss: Add support for VFE hardware version Titan 170")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
drivers/media/platform/qcom/camss/camss-vfe.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/media/platform/qcom/camss/camss-vfe.c b/drivers/media/platform/qcom/camss/camss-vfe.c
index 938f373bcd1fd..b021f81cef123 100644
--- a/drivers/media/platform/qcom/camss/camss-vfe.c
+++ b/drivers/media/platform/qcom/camss/camss-vfe.c
@@ -535,7 +535,8 @@ static int vfe_check_clock_rates(struct vfe_device *vfe)
struct camss_clock *clock = &vfe->clock[i];
if (!strcmp(clock->name, "vfe0") ||
- !strcmp(clock->name, "vfe1")) {
+ !strcmp(clock->name, "vfe1") ||
+ !strcmp(clock->name, "vfe_lite")) {
u64 min_rate = 0;
unsigned long rate;
--
2.41.0
There are two problems with the current vfe_disable_output() routine.
Firstly we rightly use a spinlock to protect output->gen2.active_num
everywhere except for in the IDLE timeout path of vfe_disable_output().
Even if that somehow is not racy "in practice", it is so by happenstance,
not by design.
Secondly we do not get consistent behaviour from this routine. On
sc8280xp 50% of the time I get "VFE idle timeout - resetting". In this
case the subsequent capture will succeed. The other 50% of the time, we
don't hit the idle timeout, never do the VFE reset and subsequent
captures stall indefinitely.
Rewrite the vfe_disable_output() routine to:
- Quiesce write masters with vfe_wm_stop()
- Set active_num = 0, remembering to hold the spinlock when we do so
- Reset the VFE
Testing on sc8280xp and sdm845 shows this to be a valid fix.
Fixes: 7319cdf189bb ("media: camss: Add support for VFE hardware version Titan 170")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
.../media/platform/qcom/camss/camss-vfe-170.c | 19 +++----------------
1 file changed, 3 insertions(+), 16 deletions(-)
diff --git a/drivers/media/platform/qcom/camss/camss-vfe-170.c b/drivers/media/platform/qcom/camss/camss-vfe-170.c
index 02494c89da91c..ae9137633c301 100644
--- a/drivers/media/platform/qcom/camss/camss-vfe-170.c
+++ b/drivers/media/platform/qcom/camss/camss-vfe-170.c
@@ -500,28 +500,15 @@ static int vfe_disable_output(struct vfe_line *line)
struct vfe_output *output = &line->output;
unsigned long flags;
unsigned int i;
- bool done;
- int timeout = 0;
-
- do {
- spin_lock_irqsave(&vfe->output_lock, flags);
- done = !output->gen2.active_num;
- spin_unlock_irqrestore(&vfe->output_lock, flags);
- usleep_range(10000, 20000);
-
- if (timeout++ == 100) {
- dev_err(vfe->camss->dev, "VFE idle timeout - resetting\n");
- vfe_reset(vfe);
- output->gen2.active_num = 0;
- return 0;
- }
- } while (!done);
spin_lock_irqsave(&vfe->output_lock, flags);
for (i = 0; i < output->wm_num; i++)
vfe_wm_stop(vfe, output->wm_idx[i]);
+ output->gen2.active_num = 0;
spin_unlock_irqrestore(&vfe->output_lock, flags);
+ vfe_reset(vfe);
+
return 0;
}
--
2.41.0
We need to make sure camss_configure_pd() happens before
camss_register_entities() as the vfe_get() path relies on the pointer
provided by camss_configure_pd().
Fix the ordering sequence in probe to ensure the pointers vfe_get() demands
are present by the time camss_register_entities() runs.
In order to facilitate backporting to stable kernels I've moved the
configure_pd() call fairly early in the probe() function, so that,
irrespective of the existence of the old error-handling jump labels, this
patch should still apply from -next circa Aug 2023 back to v5.13 inclusive.
Fixes: 2f6f8af67203 ("media: camss: Refactor VFE power domain toggling")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
drivers/media/platform/qcom/camss/camss.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/media/platform/qcom/camss/camss.c b/drivers/media/platform/qcom/camss/camss.c
index f11dc59135a5a..75991d849b571 100644
--- a/drivers/media/platform/qcom/camss/camss.c
+++ b/drivers/media/platform/qcom/camss/camss.c
@@ -1619,6 +1619,12 @@ static int camss_probe(struct platform_device *pdev)
if (ret < 0)
goto err_cleanup;
+ ret = camss_configure_pd(camss);
+ if (ret < 0) {
+ dev_err(dev, "Failed to configure power domains: %d\n", ret);
+ goto err_cleanup;
+ }
+
ret = camss_init_subdevices(camss);
if (ret < 0)
goto err_cleanup;
@@ -1678,12 +1684,6 @@ static int camss_probe(struct platform_device *pdev)
}
}
- ret = camss_configure_pd(camss);
- if (ret < 0) {
- dev_err(dev, "Failed to configure power domains: %d\n", ret);
- return ret;
- }
-
pm_runtime_enable(dev);
return 0;
--
2.41.0
Hi,
Looks like we missed a few backports related to MSG_RING, most likely
because they required a bit of hand massaging. Can you queue these up?
Thanks!
--
Jens Axboe
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x f0362a253606e2031f8d61c74195d4d6556e12a4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082642-catfight-gallantly-8b84@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
f0362a253606 ("mm,ima,kexec,of: use memblock_free_late from ima_free_kexec_buffer")
3ecc68349bba ("memblock: rename memblock_free to memblock_phys_free")
fa27717110ae ("memblock: drop memblock_free_early_nid() and memblock_free_early()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f0362a253606e2031f8d61c74195d4d6556e12a4 Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel(a)surriel.com>
Date: Thu, 17 Aug 2023 13:57:59 -0400
Subject: [PATCH] mm,ima,kexec,of: use memblock_free_late from
ima_free_kexec_buffer
The code calling ima_free_kexec_buffer runs long after the memblock
allocator has already been torn down, potentially resulting in a use
after free in memblock_isolate_range.
With KASAN or KFENCE, this use after free will result in a BUG
from the idle task, and a subsequent kernel panic.
Switch ima_free_kexec_buffer over to memblock_free_late to avoid
that issue.
Fixes: fee3ff99bc67 ("powerpc: Move arch independent ima kexec functions to drivers/of/kexec.c")
Cc: stable(a)kernel.org
Signed-off-by: Rik van Riel <riel(a)surriel.com>
Suggested-by: Mike Rappoport <rppt(a)kernel.org>
Link: https://lore.kernel.org/r/20230817135759.0888e5ef@imladris.surriel.com
Signed-off-by: Rob Herring <robh(a)kernel.org>
diff --git a/drivers/of/kexec.c b/drivers/of/kexec.c
index f26d2ba8a371..68278340cecf 100644
--- a/drivers/of/kexec.c
+++ b/drivers/of/kexec.c
@@ -184,7 +184,8 @@ int __init ima_free_kexec_buffer(void)
if (ret)
return ret;
- return memblock_phys_free(addr, size);
+ memblock_free_late(addr, size);
+ return 0;
}
#endif
From: "Paul E. McKenney" <paulmck(a)kernel.org>
[ Upstream commit 10f84c2cfb5045e37d78cb5d4c8e8321e06ae18f ]
Currently, the various torture tests sometimes react to an early-boot
bug by rebooting. This is almost always counterproductive, needlessly
consuming CPU time and bloating the console log. This commit therefore
adds the "-no-reboot" argument to qemu so that reboot requests will
cause qemu to exit.
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
index f4c8055dbf7a..c57be9563214 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
@@ -9,9 +9,10 @@
#
# Usage: kvm-test-1-run.sh config resdir seconds qemu-args boot_args_in
#
-# qemu-args defaults to "-enable-kvm -nographic", along with arguments
-# specifying the number of CPUs and other options
-# generated from the underlying CPU architecture.
+# qemu-args defaults to "-enable-kvm -nographic -no-reboot", along with
+# arguments specifying the number of CPUs and
+# other options generated from the underlying
+# CPU architecture.
# boot_args_in defaults to value returned by the per_version_boot_params
# shell function.
#
@@ -141,7 +142,7 @@ then
fi
# Generate -smp qemu argument.
-qemu_args="-enable-kvm -nographic $qemu_args"
+qemu_args="-enable-kvm -nographic -no-reboot $qemu_args"
cpu_count=`configNR_CPUS.sh $resdir/ConfigFragment`
cpu_count=`configfrag_boot_cpus "$boot_args_in" "$config_template" "$cpu_count"`
if test "$cpu_count" -gt "$TORTURE_ALLOTED_CPUS"
--
2.42.0.rc1.204.g551eb34607-goog
This is the start of the stable review cycle for the 6.1.49 release.
There are 4 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Mon, 28 Aug 2023 15:46:14 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.49-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.1.49-rc1
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "f2fs: fix to do sanity check on direct node in truncate_dnode()"
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "f2fs: fix to set flush_merge opt and show noflush_merge"
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "f2fs: don't reset unchangable mount option in f2fs_remount()"
Peter Zijlstra <peterz(a)infradead.org>
objtool/x86: Fix SRSO mess
-------------
Diffstat:
Makefile | 4 ++--
fs/f2fs/f2fs.h | 1 +
fs/f2fs/file.c | 5 +++++
fs/f2fs/node.c | 14 ++----------
fs/f2fs/super.c | 43 ++++++++++++------------------------
include/linux/f2fs_fs.h | 1 -
tools/objtool/arch/x86/decode.c | 11 +++++----
tools/objtool/check.c | 22 +++++++++++++++++-
tools/objtool/include/objtool/arch.h | 1 +
tools/objtool/include/objtool/elf.h | 1 +
10 files changed, 54 insertions(+), 49 deletions(-)
Hi Greg,
please revert the following two patches as 5.10.192 fails to build with
them:
asoc-intel-sof_sdw-add-quirk-for-lnl-rvp.patch
asoc-intel-sof_sdw-add-quirk-for-mtl-rvp.patch
Error message: error: ‘RT711_JD2_100K’ undeclared here (not in a function)
2023-08-26T17:46:51.3733116Z sound/soc/intel/boards/sof_sdw.c:208:41:
error: ‘RT711_JD2_100K’ undeclared here (not in a function)
2023-08-26T17:46:51.3744338Z 208 | .driver_data =
(void *)(RT711_JD2_100K),
2023-08-26T17:46:51.3745547Z |
^~~~~~~~~~~~~~
2023-08-26T17:46:51.4620173Z make[4]: *** [scripts/Makefile.build:286:
sound/soc/intel/boards/sof_sdw.o] Error 1
2023-08-26T17:46:51.4625055Z make[3]: *** [scripts/Makefile.build:503:
sound/soc/intel/boards] Error 2
2023-08-26T17:46:51.4626370Z make[2]: *** [scripts/Makefile.build:503:
sound/soc/intel] Error 2
This happened before already:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/com…
--
Best, Philip
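For reference, RT711_JD2_100K is, to my understanding, declared in the Realtek
rt711 codec header rather than in sof_sdw.c itself, so any build of sof_sdw.c
that uses the constant needs an include along these lines (the exact file and
relative path are assumptions for illustration, not a quote of the actual fix):

#include "../../codecs/rt711.h"	/* assumed path; declares the RT711_JD* modes */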
This is a backport of the series that fixes the way deadline bandwidth
restoration is done, which was causing a noticeable delay on the resume
path. It also converts the cpuset lock back into a mutex, which some users
on Android want as well. I lack the details, but AFAIU the read/write
semaphore was slower under high contention.
Compile tested against some randconfig for different archs. Only boot tested on
x86 qemu.
Based on v6.4.11
Original series:
https://lore.kernel.org/lkml/20230508075854.17215-1-juri.lelli@redhat.com/
Thanks!
--
Qais Yousef
Dietmar Eggemann (2):
sched/deadline: Create DL BW alloc, free & check overflow interface
cgroup/cpuset: Free DL BW in case can_attach() fails
Juri Lelli (4):
cgroup/cpuset: Rename functions dealing with DEADLINE accounting
sched/cpuset: Bring back cpuset_mutex
sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
cgroup/cpuset: Iterate only if DEADLINE tasks are present
include/linux/cpuset.h | 12 +-
include/linux/sched.h | 4 +-
kernel/cgroup/cgroup.c | 4 +
kernel/cgroup/cpuset.c | 244 ++++++++++++++++++++++++++--------------
kernel/sched/core.c | 41 +++----
kernel/sched/deadline.c | 67 ++++++++---
kernel/sched/sched.h | 2 +-
7 files changed, 246 insertions(+), 128 deletions(-)
--
2.34.1
From: "Paul E. McKenney" <paulmck(a)kernel.org>
[ Upstream commit 10f84c2cfb5045e37d78cb5d4c8e8321e06ae18f ]
Currently, the various torture tests sometimes react to an early-boot
bug by rebooting. This is almost always counterproductive, needlessly
consuming CPU time and bloating the console log. This commit therefore
adds the "-no-reboot" argument to qemu so that reboot requests will
cause qemu to exit.
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
index 6dc2b49b85ea..bdd747dc61f2 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
@@ -9,7 +9,7 @@
#
# Usage: kvm-test-1-run.sh config builddir resdir seconds qemu-args boot_args
#
-# qemu-args defaults to "-enable-kvm -nographic", along with arguments
+# qemu-args defaults to "-enable-kvm -nographic -no-reboot", along with arguments
# specifying the number of CPUs and other options
# generated from the underlying CPU architecture.
# boot_args defaults to value returned by the per_version_boot_params
@@ -132,7 +132,7 @@ then
fi
# Generate -smp qemu argument.
-qemu_args="-enable-kvm -nographic $qemu_args"
+qemu_args="-enable-kvm -nographic -no-reboot $qemu_args"
cpu_count=`configNR_CPUS.sh $resdir/ConfigFragment`
cpu_count=`configfrag_boot_cpus "$boot_args" "$config_template" "$cpu_count"`
if test "$cpu_count" -gt "$TORTURE_ALLOTED_CPUS"
--
2.42.0.rc1.204.g551eb34607-goog
I'm announcing the release of the 6.1.49 kernel.
This upgrade is only for all users of the 6.1 series that use the x86
platform OR the F2FS file system. If that's not you, feel free to
ignore this release.
The updated 6.1.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.1.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 -
fs/f2fs/f2fs.h | 1
fs/f2fs/file.c | 5 ++++
fs/f2fs/node.c | 14 +----------
fs/f2fs/super.c | 43 +++++++++++------------------------
include/linux/f2fs_fs.h | 1
tools/objtool/arch/x86/decode.c | 11 +++++---
tools/objtool/check.c | 22 +++++++++++++++++
tools/objtool/include/objtool/arch.h | 1
tools/objtool/include/objtool/elf.h | 1
10 files changed, 53 insertions(+), 48 deletions(-)
Greg Kroah-Hartman (4):
Revert "f2fs: don't reset unchangable mount option in f2fs_remount()"
Revert "f2fs: fix to set flush_merge opt and show noflush_merge"
Revert "f2fs: fix to do sanity check on direct node in truncate_dnode()"
Linux 6.1.49
Peter Zijlstra (1):
objtool/x86: Fix SRSO mess
Cc: stable(a)vger.kernel.org # v5.11+
Fixes: e7e0545299d8 ("x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections")
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308221542.11UpkVfp-lkp@intel.com/
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
arch/x86/kernel/cpu/sgx/main.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 166692f2d501..388350b8f5e3 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -732,6 +732,10 @@ int arch_memory_failure(unsigned long pfn, int flags)
}
/**
+ * sgx_calc_section_metric() - Calculate an EPC section metric
+ * @low: low 32-bit word from CPUID:0x12:{2, ...}
+ * @high: high 32-bit word from CPUID:0x12:{2, ...}
+ *
* A section metric is concatenated in a way that @low bits 12-31 define the
* bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
* metric.
--
2.39.2
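As a side note, the bit layout described in that kerneldoc comment can be
written out directly. A hedged sketch follows; the function name and mask
values are illustrative, derived from the bit ranges in the comment rather
than copied from the kernel source:

#include <stdint.h>

/* low bits 12-31 -> metric bits 12-31; high bits 0-19 -> metric bits 32-51 */
static uint64_t calc_section_metric(uint64_t low, uint64_t high)
{
	return (low & 0xfffff000ULL) | ((high & 0x000fffffULL) << 32);
}

In other words, the resulting EPC section base/size is a 4KiB-aligned value
occupying up to bit 51.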
Patch series "don't use mapcount() to check large folio sharing", v2.
In madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(),
folio_mapcount() is used to check whether the folio is shared. But this is
not correct, as folio_mapcount() returns the total mapcount of the large
folio. Use folio_estimated_sharers() here, as the estimated number is
enough.
This patchset will fix the cases:
A user space application calls madvise() with MADV_FREE, MADV_COLD or
MADV_PAGEOUT for a specific address range that has THPs mapped into it.
Without the patchset, the THP is skipped. With the patchset, the THP will
be split and handled accordingly.
David reported that the cow selftest skips some cases because MADV_PAGEOUT
skips THP:
https://lore.kernel.org/linux-mm/9e92e42d-488f-47db-ac9d-75b24cd0d037@intel…
and I confirmed this patchset makes it work again.
This patch (of 3):
Commit 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range()
to use folios") replaced the page_mapcount() with folio_mapcount() to
check whether the folio is shared by another mapping.
This is not correct for large folios: folio_mapcount() returns the total
mapcount of the large folio, which is not suitable for detecting whether
the folio is shared.
Use folio_estimated_sharers(), which returns an estimated number of
sharers. That means it's not 100% correct, but it should be OK for the
madvise case here.
The user-visible effect is that the THP is skipped when the user calls
madvise(), while the correct behavior is that the THP should be split and
processed.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-1-fengwei.yin@intel.com
Link: https://lkml.kernel.org/r/20230808020917.2230692-2-fengwei.yin@intel.com
Fixes: 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios")
Signed-off-by: Yin Fengwei <fengwei.yin(a)intel.com>
Reviewed-by: Yu Zhao <yuzhao(a)google.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola(a)gmail.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
(cherry picked from commit 2f406263e3e954aa24c1248edcfa9be0c1bb30fa)
---
mm/madvise.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/madvise.c b/mm/madvise.c
index b5ffbaf616f5..6adee363a9fa 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -375,7 +375,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
folio = pfn_folio(pmd_pfn(orig_pmd));
/* Do not interfere with other mappings of this folio */
- if (folio_mapcount(folio) != 1)
+ if (folio_estimated_sharers(folio) != 1)
goto huge_unlock;
if (pageout_anon_only_filter && !folio_test_anon(folio))
@@ -447,7 +447,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
* are sure it's worth. Split it if we are only owner.
*/
if (folio_test_large(folio)) {
- if (folio_mapcount(folio) != 1)
+ if (folio_estimated_sharers(folio) != 1)
break;
if (pageout_anon_only_filter && !folio_test_anon(folio))
break;
--
2.39.2
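For readers unfamiliar with the helper the above patch switches to, here is a
hedged sketch of how folio_estimated_sharers() is defined in mm headers of
roughly this vintage (illustrative only; check include/linux/mm.h of the
target tree rather than relying on this):

static inline int folio_estimated_sharers(struct folio *folio)
{
	/*
	 * Sample only the first subpage's mapcount instead of summing the
	 * whole large folio; that is cheap and good enough for the
	 * "are we the only mapper?" heuristic used in the madvise paths.
	 */
	return page_mapcount(folio_page(folio, 0));
}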
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 0e0e9bd5f7b9d40fd03b70092367247d52da1db0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082614-choice-mongoose-0731@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0e0e9bd5f7b9d40fd03b70092367247d52da1db0 Mon Sep 17 00:00:00 2001
From: Yin Fengwei <fengwei.yin(a)intel.com>
Date: Tue, 8 Aug 2023 10:09:17 +0800
Subject: [PATCH] madvise:madvise_free_pte_range(): don't use mapcount()
against large folio for sharing check
Commit 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a
folio") replaced the page_mapcount() with folio_mapcount() to check
whether the folio is shared by another mapping.
This is not correct for large folios: folio_mapcount() returns the total
mapcount of the large folio, which is not suitable for detecting whether
the folio is shared.
Use folio_estimated_sharers(), which returns an estimated number of
sharers. That means it's not 100% correct, but it should be OK for the
madvise case here.
The user-visible effect is that the THP is skipped when the user calls
madvise(), while the correct behavior is that the THP should be split and
processed.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-4-fengwei.yin@intel.com
Fixes: 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a folio")
Signed-off-by: Yin Fengwei <fengwei.yin(a)intel.com>
Reviewed-by: Yu Zhao <yuzhao(a)google.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola(a)gmail.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/madvise.c b/mm/madvise.c
index 46802b4cf65a..ec30f48f8f2e 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -680,7 +680,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
if (folio_test_large(folio)) {
int err;
- if (folio_mapcount(folio) != 1)
+ if (folio_estimated_sharers(folio) != 1)
break;
if (!folio_trylock(folio))
break;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 0e0e9bd5f7b9d40fd03b70092367247d52da1db0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082616-velocity-mocha-97c0@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
0e0e9bd5f7b9 ("madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check")
f3cd4ab0aabf ("mm/madvise: clean up pte_offset_map_lock() scans")
07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios")
fd3b1bc3c86e ("mm/madvise: fix madvise_pageout for private file mappings")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0e0e9bd5f7b9d40fd03b70092367247d52da1db0 Mon Sep 17 00:00:00 2001
From: Yin Fengwei <fengwei.yin(a)intel.com>
Date: Tue, 8 Aug 2023 10:09:17 +0800
Subject: [PATCH] madvise:madvise_free_pte_range(): don't use mapcount()
against large folio for sharing check
Commit 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a
folio") replaced the page_mapcount() with folio_mapcount() to check
whether the folio is shared by another mapping.
This is not correct for large folios: folio_mapcount() returns the total
mapcount of the large folio, which is not suitable for detecting whether
the folio is shared.
Use folio_estimated_sharers(), which returns an estimated number of
sharers. That means it's not 100% correct, but it should be OK for the
madvise case here.
The user-visible effect is that the THP is skipped when the user calls
madvise(), while the correct behavior is that the THP should be split and
processed.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-4-fengwei.yin@intel.com
Fixes: 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a folio")
Signed-off-by: Yin Fengwei <fengwei.yin(a)intel.com>
Reviewed-by: Yu Zhao <yuzhao(a)google.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola(a)gmail.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/madvise.c b/mm/madvise.c
index 46802b4cf65a..ec30f48f8f2e 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -680,7 +680,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
if (folio_test_large(folio)) {
int err;
- if (folio_mapcount(folio) != 1)
+ if (folio_estimated_sharers(folio) != 1)
break;
if (!folio_trylock(folio))
break;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 56b930dcd88c2adc261410501c402c790980bdb5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023081222-chummy-aqueduct-85c2@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
56b930dcd88c ("hwmon: (aquacomputer_d5next) Add selective 200ms delay after sending ctrl report")
19692f17cd13 ("hwmon: (aquacomputer_d5next) Add support for Aquacomputer Aquastream XT")
866e630a3b8b ("hwmon: (aquacomputer_d5next) Add temperature offset control for Aquaero")
6c83ccb10c49 ("hwmon: (aquacomputer_d5next) Add infrastructure for Aquaero control reports")
b29090bac935 ("hwmon: (aquacomputer_d5next) Device dependent control report settings")
7505dab78f58 ("hwmon: (aquacomputer_d5next) Add support for Aquacomputer Aquastream Ultimate")
e0f6c370f0ad ("hwmon: (aquacomputer_d5next) Add support for Aquacomputer Poweradjust 3")
3d2e9f582a8e ("hwmon: (aquacomputer_d5next) Add support for reading calculated Aquaero sensors")
2c55211104b4 ("hwmon: (aquacomputer_d5next) Support sensors for Aquacomputer Aquaero")
ad2f0811fbeb ("hwmon: (aquacomputer_d5next) Device dependent serial number and firmware offsets")
249c752110a5 ("hwmon: (aquacomputer_d5next) Add structure for fan layout")
8bcb02bdc638 ("hwmon: (aquacomputer_d5next) Rename AQC_TEMP_SENSOR_SIZE to AQC_SENSOR_SIZE")
6ff838f2877d ("hwmon: (aquacomputer_d5next) Add support for Quadro flow sensor pulses")
d5d896b83822 ("hwmon: (aquacomputer_d5next) Clear up macros and comments")
662d20b3a5af ("hwmon: (aquacomputer_d5next) Add support for temperature sensor offsets")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 56b930dcd88c2adc261410501c402c790980bdb5 Mon Sep 17 00:00:00 2001
From: Aleksa Savic <savicaleksa83(a)gmail.com>
Date: Mon, 7 Aug 2023 19:20:03 +0200
Subject: [PATCH] hwmon: (aquacomputer_d5next) Add selective 200ms delay after
sending ctrl report
Add a 200ms delay after sending a ctrl report to Quadro,
Octo, D5 Next and Aquaero to give them enough time to
process the request and save the data to memory. Otherwise,
under heavier userspace loads where multiple sysfs entries
are usually set in quick succession, a new ctrl report could
be requested from the device while it's still processing the
previous one and fail with -EPIPE. The delay is only applied
if two ctrl report operations are near each other in time.
Reported by a user on Github [1] and tested by both of us.
[1] https://github.com/aleksamagicka/aquacomputer_d5next-hwmon/issues/82
Fixes: 752b927951ea ("hwmon: (aquacomputer_d5next) Add support for Aquacomputer Octo")
Signed-off-by: Aleksa Savic <savicaleksa83(a)gmail.com>
Link: https://lore.kernel.org/r/20230807172004.456968-1-savicaleksa83@gmail.com
Signed-off-by: Guenter Roeck <linux(a)roeck-us.net>
diff --git a/drivers/hwmon/aquacomputer_d5next.c b/drivers/hwmon/aquacomputer_d5next.c
index a997dbcb563f..023807859be7 100644
--- a/drivers/hwmon/aquacomputer_d5next.c
+++ b/drivers/hwmon/aquacomputer_d5next.c
@@ -13,9 +13,11 @@
#include <linux/crc16.h>
#include <linux/debugfs.h>
+#include <linux/delay.h>
#include <linux/hid.h>
#include <linux/hwmon.h>
#include <linux/jiffies.h>
+#include <linux/ktime.h>
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/seq_file.h>
@@ -63,6 +65,8 @@ static const char *const aqc_device_names[] = {
#define CTRL_REPORT_ID 0x03
#define AQUAERO_CTRL_REPORT_ID 0x0b
+#define CTRL_REPORT_DELAY 200 /* ms */
+
/* The HID report that the official software always sends
* after writing values, currently same for all devices
*/
@@ -527,6 +531,9 @@ struct aqc_data {
int secondary_ctrl_report_size;
u8 *secondary_ctrl_report;
+ ktime_t last_ctrl_report_op;
+ int ctrl_report_delay; /* Delay between two ctrl report operations, in ms */
+
int buffer_size;
u8 *buffer;
int checksum_start;
@@ -611,17 +618,35 @@ static int aqc_aquastreamxt_convert_fan_rpm(u16 val)
return 0;
}
+static void aqc_delay_ctrl_report(struct aqc_data *priv)
+{
+ /*
+ * If previous read or write is too close to this one, delay the current operation
+ * to give the device enough time to process the previous one.
+ */
+ if (priv->ctrl_report_delay) {
+ s64 delta = ktime_ms_delta(ktime_get(), priv->last_ctrl_report_op);
+
+ if (delta < priv->ctrl_report_delay)
+ msleep(priv->ctrl_report_delay - delta);
+ }
+}
+
/* Expects the mutex to be locked */
static int aqc_get_ctrl_data(struct aqc_data *priv)
{
int ret;
+ aqc_delay_ctrl_report(priv);
+
memset(priv->buffer, 0x00, priv->buffer_size);
ret = hid_hw_raw_request(priv->hdev, priv->ctrl_report_id, priv->buffer, priv->buffer_size,
HID_FEATURE_REPORT, HID_REQ_GET_REPORT);
if (ret < 0)
ret = -ENODATA;
+ priv->last_ctrl_report_op = ktime_get();
+
return ret;
}
@@ -631,6 +656,8 @@ static int aqc_send_ctrl_data(struct aqc_data *priv)
int ret;
u16 checksum;
+ aqc_delay_ctrl_report(priv);
+
/* Checksum is not needed for Aquaero */
if (priv->kind != aquaero) {
/* Init and xorout value for CRC-16/USB is 0xffff */
@@ -646,12 +673,16 @@ static int aqc_send_ctrl_data(struct aqc_data *priv)
ret = hid_hw_raw_request(priv->hdev, priv->ctrl_report_id, priv->buffer, priv->buffer_size,
HID_FEATURE_REPORT, HID_REQ_SET_REPORT);
if (ret < 0)
- return ret;
+ goto record_access_and_ret;
/* The official software sends this report after every change, so do it here as well */
ret = hid_hw_raw_request(priv->hdev, priv->secondary_ctrl_report_id,
priv->secondary_ctrl_report, priv->secondary_ctrl_report_size,
HID_FEATURE_REPORT, HID_REQ_SET_REPORT);
+
+record_access_and_ret:
+ priv->last_ctrl_report_op = ktime_get();
+
return ret;
}
@@ -1524,6 +1555,7 @@ static int aqc_probe(struct hid_device *hdev, const struct hid_device_id *id)
priv->buffer_size = AQUAERO_CTRL_REPORT_SIZE;
priv->temp_ctrl_offset = AQUAERO_TEMP_CTRL_OFFSET;
+ priv->ctrl_report_delay = CTRL_REPORT_DELAY;
priv->temp_label = label_temp_sensors;
priv->virtual_temp_label = label_virtual_temp_sensors;
@@ -1547,6 +1579,7 @@ static int aqc_probe(struct hid_device *hdev, const struct hid_device_id *id)
priv->temp_ctrl_offset = D5NEXT_TEMP_CTRL_OFFSET;
priv->buffer_size = D5NEXT_CTRL_REPORT_SIZE;
+ priv->ctrl_report_delay = CTRL_REPORT_DELAY;
priv->power_cycle_count_offset = D5NEXT_POWER_CYCLES;
@@ -1597,6 +1630,7 @@ static int aqc_probe(struct hid_device *hdev, const struct hid_device_id *id)
priv->temp_ctrl_offset = OCTO_TEMP_CTRL_OFFSET;
priv->buffer_size = OCTO_CTRL_REPORT_SIZE;
+ priv->ctrl_report_delay = CTRL_REPORT_DELAY;
priv->power_cycle_count_offset = OCTO_POWER_CYCLES;
@@ -1624,6 +1658,7 @@ static int aqc_probe(struct hid_device *hdev, const struct hid_device_id *id)
priv->temp_ctrl_offset = QUADRO_TEMP_CTRL_OFFSET;
priv->buffer_size = QUADRO_CTRL_REPORT_SIZE;
+ priv->ctrl_report_delay = CTRL_REPORT_DELAY;
priv->flow_pulses_ctrl_offset = QUADRO_FLOW_PULSES_CTRL_OFFSET;
priv->power_cycle_count_offset = QUADRO_POWER_CYCLES;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x c275a176e4b69868576e543409927ae75e3a3288
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082737-cavity-bloating-1779@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
c275a176e4b6 ("can: raw: add missing refcount for memory leak fix")
ee8b94c8510c ("can: raw: fix receiver memory leak")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c275a176e4b69868576e543409927ae75e3a3288 Mon Sep 17 00:00:00 2001
From: Oliver Hartkopp <socketcan(a)hartkopp.net>
Date: Mon, 21 Aug 2023 16:45:47 +0200
Subject: [PATCH] can: raw: add missing refcount for memory leak fix
Commit ee8b94c8510c ("can: raw: fix receiver memory leak") introduced
a new reference to the CAN netdevice that has assigned CAN filters.
But this new ro->dev reference did not maintain its own refcount, which
led to another KASAN use-after-free splat found by Eric Dumazet.
This patch ensures a proper refcount for the CAN netdevice.
Fixes: ee8b94c8510c ("can: raw: fix receiver memory leak")
Reported-by: Eric Dumazet <edumazet(a)google.com>
Cc: Ziyang Xuan <william.xuanziyang(a)huawei.com>
Signed-off-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Link: https://lore.kernel.org/r/20230821144547.6658-3-socketcan@hartkopp.net
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/can/raw.c b/net/can/raw.c
index e10f59375659..d50c3f3d892f 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -85,6 +85,7 @@ struct raw_sock {
int bound;
int ifindex;
struct net_device *dev;
+ netdevice_tracker dev_tracker;
struct list_head notifier;
int loopback;
int recv_own_msgs;
@@ -285,8 +286,10 @@ static void raw_notify(struct raw_sock *ro, unsigned long msg,
case NETDEV_UNREGISTER:
lock_sock(sk);
/* remove current filters & unregister */
- if (ro->bound)
+ if (ro->bound) {
raw_disable_allfilters(dev_net(dev), dev, sk);
+ netdev_put(dev, &ro->dev_tracker);
+ }
if (ro->count > 1)
kfree(ro->filter);
@@ -391,10 +394,12 @@ static int raw_release(struct socket *sock)
/* remove current filters & unregister */
if (ro->bound) {
- if (ro->dev)
+ if (ro->dev) {
raw_disable_allfilters(dev_net(ro->dev), ro->dev, sk);
- else
+ netdev_put(ro->dev, &ro->dev_tracker);
+ } else {
raw_disable_allfilters(sock_net(sk), NULL, sk);
+ }
}
if (ro->count > 1)
@@ -445,10 +450,10 @@ static int raw_bind(struct socket *sock, struct sockaddr *uaddr, int len)
goto out;
}
if (dev->type != ARPHRD_CAN) {
- dev_put(dev);
err = -ENODEV;
- goto out;
+ goto out_put_dev;
}
+
if (!(dev->flags & IFF_UP))
notify_enetdown = 1;
@@ -456,7 +461,9 @@ static int raw_bind(struct socket *sock, struct sockaddr *uaddr, int len)
/* filters set by default/setsockopt */
err = raw_enable_allfilters(sock_net(sk), dev, sk);
- dev_put(dev);
+ if (err)
+ goto out_put_dev;
+
} else {
ifindex = 0;
@@ -467,18 +474,28 @@ static int raw_bind(struct socket *sock, struct sockaddr *uaddr, int len)
if (!err) {
if (ro->bound) {
/* unregister old filters */
- if (ro->dev)
+ if (ro->dev) {
raw_disable_allfilters(dev_net(ro->dev),
ro->dev, sk);
- else
+ /* drop reference to old ro->dev */
+ netdev_put(ro->dev, &ro->dev_tracker);
+ } else {
raw_disable_allfilters(sock_net(sk), NULL, sk);
+ }
}
ro->ifindex = ifindex;
ro->bound = 1;
+ /* bind() ok -> hold a reference for new ro->dev */
ro->dev = dev;
+ if (ro->dev)
+ netdev_hold(ro->dev, &ro->dev_tracker, GFP_KERNEL);
}
- out:
+out_put_dev:
+ /* remove potential reference from dev_get_by_index() */
+ if (dev)
+ dev_put(dev);
+out:
release_sock(sk);
rtnl_unlock();
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x dbf46008775516f7f25c95b7760041c286299783
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082724-deflate-drinkable-54a1@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
dbf460087755 ("objtool/x86: Fixup frame-pointer vs rethunk")
c6f5dc28fb3d ("objtool: Union instruction::{call_dest,jump_table}")
0932dbe1f568 ("objtool: Remove instruction::reloc")
8b2de412158e ("objtool: Shrink instruction::{type,visited}")
d54066546121 ("objtool: Make instruction::alts a single-linked list")
3ee88df1b063 ("objtool: Make instruction::stack_ops a single-linked list")
20a554638dd2 ("objtool: Change arch_decode_instruction() signature")
5f6e430f931d ("Merge tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dbf46008775516f7f25c95b7760041c286299783 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Wed, 16 Aug 2023 13:59:21 +0200
Subject: [PATCH] objtool/x86: Fixup frame-pointer vs rethunk
For stack-validation of a frame-pointer build, objtool validates that
every CALL instruction is preceded by a frame-setup. The new SRSO
return thunks violate this with their RSB stuffing trickery.
Extend the __fentry__ exception to also cover the embedded_insn case
used for this. This cures:
vmlinux.o: warning: objtool: srso_untrain_ret+0xd: call without frame pointer save/setup
Fixes: 4ae68b26c3ab ("objtool/x86: Fix SRSO mess")
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Acked-by: Josh Poimboeuf <jpoimboe(a)kernel.org>
Link: https://lore.kernel.org/r/20230816115921.GH980931@hirez.programming.kicks-a…
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 7a9aaf400873..1384090530db 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2650,12 +2650,17 @@ static int decode_sections(struct objtool_file *file)
return 0;
}
-static bool is_fentry_call(struct instruction *insn)
+static bool is_special_call(struct instruction *insn)
{
- if (insn->type == INSN_CALL &&
- insn_call_dest(insn) &&
- insn_call_dest(insn)->fentry)
- return true;
+ if (insn->type == INSN_CALL) {
+ struct symbol *dest = insn_call_dest(insn);
+
+ if (!dest)
+ return false;
+
+ if (dest->fentry || dest->embedded_insn)
+ return true;
+ }
return false;
}
@@ -3656,7 +3661,7 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
if (ret)
return ret;
- if (opts.stackval && func && !is_fentry_call(insn) &&
+ if (opts.stackval && func && !is_special_call(insn) &&
!has_valid_stack_frame(&state)) {
WARN_INSN(insn, "call without frame pointer save/setup");
return 1;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x dbf46008775516f7f25c95b7760041c286299783
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082722-rocking-unbaked-1baf@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
dbf460087755 ("objtool/x86: Fixup frame-pointer vs rethunk")
c6f5dc28fb3d ("objtool: Union instruction::{call_dest,jump_table}")
0932dbe1f568 ("objtool: Remove instruction::reloc")
8b2de412158e ("objtool: Shrink instruction::{type,visited}")
d54066546121 ("objtool: Make instruction::alts a single-linked list")
3ee88df1b063 ("objtool: Make instruction::stack_ops a single-linked list")
20a554638dd2 ("objtool: Change arch_decode_instruction() signature")
5f6e430f931d ("Merge tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dbf46008775516f7f25c95b7760041c286299783 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Wed, 16 Aug 2023 13:59:21 +0200
Subject: [PATCH] objtool/x86: Fixup frame-pointer vs rethunk
For stack-validation of a frame-pointer build, objtool validates that
every CALL instruction is preceded by a frame-setup. The new SRSO
return thunks violate this with their RSB stuffing trickery.
Extend the __fentry__ exception to also cover the embedded_insn case
used for this. This cures:
vmlinux.o: warning: objtool: srso_untrain_ret+0xd: call without frame pointer save/setup
Fixes: 4ae68b26c3ab ("objtool/x86: Fix SRSO mess")
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Acked-by: Josh Poimboeuf <jpoimboe(a)kernel.org>
Link: https://lore.kernel.org/r/20230816115921.GH980931@hirez.programming.kicks-a…
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 7a9aaf400873..1384090530db 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2650,12 +2650,17 @@ static int decode_sections(struct objtool_file *file)
return 0;
}
-static bool is_fentry_call(struct instruction *insn)
+static bool is_special_call(struct instruction *insn)
{
- if (insn->type == INSN_CALL &&
- insn_call_dest(insn) &&
- insn_call_dest(insn)->fentry)
- return true;
+ if (insn->type == INSN_CALL) {
+ struct symbol *dest = insn_call_dest(insn);
+
+ if (!dest)
+ return false;
+
+ if (dest->fentry || dest->embedded_insn)
+ return true;
+ }
return false;
}
@@ -3656,7 +3661,7 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
if (ret)
return ret;
- if (opts.stackval && func && !is_fentry_call(insn) &&
+ if (opts.stackval && func && !is_special_call(insn) &&
!has_valid_stack_frame(&state)) {
WARN_INSN(insn, "call without frame pointer save/setup");
return 1;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x dbf46008775516f7f25c95b7760041c286299783
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082723-bribe-sporty-3c8c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
dbf460087755 ("objtool/x86: Fixup frame-pointer vs rethunk")
c6f5dc28fb3d ("objtool: Union instruction::{call_dest,jump_table}")
0932dbe1f568 ("objtool: Remove instruction::reloc")
8b2de412158e ("objtool: Shrink instruction::{type,visited}")
d54066546121 ("objtool: Make instruction::alts a single-linked list")
3ee88df1b063 ("objtool: Make instruction::stack_ops a single-linked list")
20a554638dd2 ("objtool: Change arch_decode_instruction() signature")
5f6e430f931d ("Merge tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dbf46008775516f7f25c95b7760041c286299783 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Wed, 16 Aug 2023 13:59:21 +0200
Subject: [PATCH] objtool/x86: Fixup frame-pointer vs rethunk
For stack-validation of a frame-pointer build, objtool validates that
every CALL instruction is preceded by a frame-setup. The new SRSO
return thunks violate this with their RSB stuffing trickery.
Extend the __fentry__ exception to also cover the embedded_insn case
used for this. This cures:
vmlinux.o: warning: objtool: srso_untrain_ret+0xd: call without frame pointer save/setup
Fixes: 4ae68b26c3ab ("objtool/x86: Fix SRSO mess")
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Acked-by: Josh Poimboeuf <jpoimboe(a)kernel.org>
Link: https://lore.kernel.org/r/20230816115921.GH980931@hirez.programming.kicks-a…
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 7a9aaf400873..1384090530db 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2650,12 +2650,17 @@ static int decode_sections(struct objtool_file *file)
return 0;
}
-static bool is_fentry_call(struct instruction *insn)
+static bool is_special_call(struct instruction *insn)
{
- if (insn->type == INSN_CALL &&
- insn_call_dest(insn) &&
- insn_call_dest(insn)->fentry)
- return true;
+ if (insn->type == INSN_CALL) {
+ struct symbol *dest = insn_call_dest(insn);
+
+ if (!dest)
+ return false;
+
+ if (dest->fentry || dest->embedded_insn)
+ return true;
+ }
return false;
}
@@ -3656,7 +3661,7 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
if (ret)
return ret;
- if (opts.stackval && func && !is_fentry_call(insn) &&
+ if (opts.stackval && func && !is_special_call(insn) &&
!has_valid_stack_frame(&state)) {
WARN_INSN(insn, "call without frame pointer save/setup");
return 1;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x c4d6b5438116c184027b2e911c0f2c7c406fb47c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082711-crown-acuteness-453a@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
c4d6b5438116 ("tracing/synthetic: Allocate one additional element for size")
fc1a9dc10129 ("tracing/histogram: Don't use strlen to find length of stacktrace variables")
288709c9f3b0 ("tracing: Allow stacktraces to be saved as histogram variables")
5f2e094ed259 ("tracing: Allow multiple hitcount values in histograms")
0934ae9977c2 ("tracing: Fix reading strings from synthetic events")
86087383ec0a ("tracing/hist: Call hist functions directly via a switch statement")
05770dd0ad11 ("tracing: Support __rel_loc relative dynamic data location attribute")
938aa33f1465 ("tracing: Add length protection to histogram string copies")
63f84ae6b82b ("tracing/histogram: Do not copy the fixed-size char array field over the field size")
8b5d46fd7a38 ("tracing/histogram: Optimize division by constants")
f47716b7a955 ("tracing/histogram: Covert expr to const if both operands are constants")
c5eac6ee8bc5 ("tracing/histogram: Simplify handling of .sym-offset in expressions")
9710b2f341a0 ("tracing: Fix operator precedence for hist triggers expression")
bcef04415032 ("tracing: Add division and multiplication support for hist triggers")
52cfb373536a ("tracing: Add support for creating hist trigger variables from literal")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c4d6b5438116c184027b2e911c0f2c7c406fb47c Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)linux.ibm.com>
Date: Wed, 16 Aug 2023 17:49:28 +0200
Subject: [PATCH] tracing/synthetic: Allocate one additional element for size
While debugging another issue I noticed that the stack trace contains one
invalid entry at the end:
<idle>-0 [008] d..4. 26.484201: wake_lat: pid=0 delta=2629976084 000000009cc24024 stack=STACK:
=> __schedule+0xac6/0x1a98
=> schedule+0x126/0x2c0
=> schedule_timeout+0x150/0x2c0
=> kcompactd+0x9ca/0xc20
=> kthread+0x2f6/0x3d8
=> __ret_from_fork+0x8a/0xe8
=> 0x6b6b6b6b6b6b6b6b
This is because the code failed to add the one element containing the
number of entries to field_size.
Link: https://lkml.kernel.org/r/20230816154928.4171614-4-svens@linux.ibm.com
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Sven Schnelle <svens(a)linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 80a2a832f857..9897d0bfcab7 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -528,7 +528,8 @@ static notrace void trace_event_raw_event_synth(void *__data,
str_val = (char *)(long)var_ref_vals[val_idx];
if (event->dynamic_fields[i]->is_stack) {
- len = *((unsigned long *)str_val);
+ /* reserve one extra element for size */
+ len = *((unsigned long *)str_val) + 1;
len *= sizeof(unsigned long);
} else {
len = fetch_store_strlen((unsigned long)str_val);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x c4d6b5438116c184027b2e911c0f2c7c406fb47c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082710-twisty-automatic-966e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
c4d6b5438116 ("tracing/synthetic: Allocate one additional element for size")
fc1a9dc10129 ("tracing/histogram: Don't use strlen to find length of stacktrace variables")
288709c9f3b0 ("tracing: Allow stacktraces to be saved as histogram variables")
5f2e094ed259 ("tracing: Allow multiple hitcount values in histograms")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c4d6b5438116c184027b2e911c0f2c7c406fb47c Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)linux.ibm.com>
Date: Wed, 16 Aug 2023 17:49:28 +0200
Subject: [PATCH] tracing/synthetic: Allocate one additional element for size
While debugging another issue I noticed that the stack trace contains one
invalid entry at the end:
<idle>-0 [008] d..4. 26.484201: wake_lat: pid=0 delta=2629976084 000000009cc24024 stack=STACK:
=> __schedule+0xac6/0x1a98
=> schedule+0x126/0x2c0
=> schedule_timeout+0x150/0x2c0
=> kcompactd+0x9ca/0xc20
=> kthread+0x2f6/0x3d8
=> __ret_from_fork+0x8a/0xe8
=> 0x6b6b6b6b6b6b6b6b
This is because the code failed to add the one element containing the
number of entries to field_size.
Link: https://lkml.kernel.org/r/20230816154928.4171614-4-svens@linux.ibm.com
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Sven Schnelle <svens(a)linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 80a2a832f857..9897d0bfcab7 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -528,7 +528,8 @@ static notrace void trace_event_raw_event_synth(void *__data,
str_val = (char *)(long)var_ref_vals[val_idx];
if (event->dynamic_fields[i]->is_stack) {
- len = *((unsigned long *)str_val);
+ /* reserve one extra element for size */
+ len = *((unsigned long *)str_val) + 1;
len *= sizeof(unsigned long);
} else {
len = fetch_store_strlen((unsigned long)str_val);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 887f92e09ef34a949745ad26ce82be69e2dabcf6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082700-clamp-coerce-0153@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
887f92e09ef3 ("tracing/synthetic: Skip first entry for stack traces")
ddeea494a16f ("tracing/synthetic: Use union instead of casts")
116b41162f8b ("Merge tag 'probes-v6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 887f92e09ef34a949745ad26ce82be69e2dabcf6 Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)linux.ibm.com>
Date: Wed, 16 Aug 2023 17:49:27 +0200
Subject: [PATCH] tracing/synthetic: Skip first entry for stack traces
While debugging another issue I noticed that the stack trace output
contains the number of entries on top:
<idle>-0 [000] d..4. 203.322502: wake_lat: pid=0 delta=2268270616 stack=STACK:
=> 0x10
=> __schedule+0xac6/0x1a98
=> schedule+0x126/0x2c0
=> schedule_timeout+0x242/0x2c0
=> __wait_for_common+0x434/0x680
=> __wait_rcu_gp+0x198/0x3e0
=> synchronize_rcu+0x112/0x138
=> ring_buffer_reset_online_cpus+0x140/0x2e0
=> tracing_reset_online_cpus+0x15c/0x1d0
=> tracing_set_clock+0x180/0x1d8
=> hist_register_trigger+0x486/0x670
=> event_hist_trigger_parse+0x494/0x1318
=> trigger_process_regex+0x1d4/0x258
=> event_trigger_write+0xb4/0x170
=> vfs_write+0x210/0xad0
=> ksys_write+0x122/0x208
Fix this by skipping the first element. Also replace the pointer
logic with an index variable which is easier to read.
Link: https://lkml.kernel.org/r/20230816154928.4171614-3-svens@linux.ibm.com
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Sven Schnelle <svens(a)linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 7fff8235075f..80a2a832f857 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -350,7 +350,7 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
struct trace_seq *s = &iter->seq;
struct synth_trace_event *entry;
struct synth_event *se;
- unsigned int i, n_u64;
+ unsigned int i, j, n_u64;
char print_fmt[32];
const char *fmt;
@@ -389,18 +389,13 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
}
} else if (se->fields[i]->is_stack) {
- unsigned long *p, *end;
union trace_synth_field *data = &entry->fields[n_u64];
-
- p = (void *)entry + data->as_dynamic.offset;
- end = (void *)p + data->as_dynamic.len - (sizeof(long) - 1);
+ unsigned long *p = (void *)entry + data->as_dynamic.offset;
trace_seq_printf(s, "%s=STACK:\n", se->fields[i]->name);
-
- for (; *p && p < end; p++)
- trace_seq_printf(s, "=> %pS\n", (void *)*p);
+ for (j = 1; j < data->as_dynamic.len / sizeof(long); j++)
+ trace_seq_printf(s, "=> %pS\n", (void *)p[j]);
n_u64++;
-
} else {
struct trace_print_flags __flags[] = {
__def_gfpflag_names, {-1, NULL} };
@@ -490,10 +485,6 @@ static unsigned int trace_stack(struct synth_trace_event *entry,
break;
}
- /* Include the zero'd element if it fits */
- if (len < HIST_STACKTRACE_DEPTH)
- len++;
-
len *= sizeof(long);
/* Find the dynamic section to copy the stack into. */
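For readers skimming the change: a hedged userspace sketch of the layout the fix assumes, where the first long of the synthetic event's dynamic stack area holds the entry count, so printing starts at index 1 just like the loop added by the patch. The array contents below are made up.

/* Hedged userspace sketch, not kernel code: the dynamic stack area is
 * assumed to start with a count word followed by return addresses, so
 * printing begins at index 1, mirroring the patch. Values are made up. */
#include <stdio.h>

int main(void)
{
        unsigned long stack[] = { 3, 0xffff0001, 0xffff0002, 0xffff0003 };
        unsigned long len = sizeof(stack);      /* length in bytes, as in the patch */

        for (unsigned long j = 1; j < len / sizeof(long); j++)
                printf("=> %#lx\n", stack[j]);
        return 0;
}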
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 887f92e09ef34a949745ad26ce82be69e2dabcf6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082759-esquire-online-0814@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
887f92e09ef3 ("tracing/synthetic: Skip first entry for stack traces")
ddeea494a16f ("tracing/synthetic: Use union instead of casts")
116b41162f8b ("Merge tag 'probes-v6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 887f92e09ef34a949745ad26ce82be69e2dabcf6 Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)linux.ibm.com>
Date: Wed, 16 Aug 2023 17:49:27 +0200
Subject: [PATCH] tracing/synthetic: Skip first entry for stack traces
While debugging another issue I noticed that the stack trace output
contains the number of entries on top:
<idle>-0 [000] d..4. 203.322502: wake_lat: pid=0 delta=2268270616 stack=STACK:
=> 0x10
=> __schedule+0xac6/0x1a98
=> schedule+0x126/0x2c0
=> schedule_timeout+0x242/0x2c0
=> __wait_for_common+0x434/0x680
=> __wait_rcu_gp+0x198/0x3e0
=> synchronize_rcu+0x112/0x138
=> ring_buffer_reset_online_cpus+0x140/0x2e0
=> tracing_reset_online_cpus+0x15c/0x1d0
=> tracing_set_clock+0x180/0x1d8
=> hist_register_trigger+0x486/0x670
=> event_hist_trigger_parse+0x494/0x1318
=> trigger_process_regex+0x1d4/0x258
=> event_trigger_write+0xb4/0x170
=> vfs_write+0x210/0xad0
=> ksys_write+0x122/0x208
Fix this by skipping the first element. Also replace the pointer
logic with an index variable which is easier to read.
Link: https://lkml.kernel.org/r/20230816154928.4171614-3-svens@linux.ibm.com
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Sven Schnelle <svens(a)linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 7fff8235075f..80a2a832f857 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -350,7 +350,7 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
struct trace_seq *s = &iter->seq;
struct synth_trace_event *entry;
struct synth_event *se;
- unsigned int i, n_u64;
+ unsigned int i, j, n_u64;
char print_fmt[32];
const char *fmt;
@@ -389,18 +389,13 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
}
} else if (se->fields[i]->is_stack) {
- unsigned long *p, *end;
union trace_synth_field *data = &entry->fields[n_u64];
-
- p = (void *)entry + data->as_dynamic.offset;
- end = (void *)p + data->as_dynamic.len - (sizeof(long) - 1);
+ unsigned long *p = (void *)entry + data->as_dynamic.offset;
trace_seq_printf(s, "%s=STACK:\n", se->fields[i]->name);
-
- for (; *p && p < end; p++)
- trace_seq_printf(s, "=> %pS\n", (void *)*p);
+ for (j = 1; j < data->as_dynamic.len / sizeof(long); j++)
+ trace_seq_printf(s, "=> %pS\n", (void *)p[j]);
n_u64++;
-
} else {
struct trace_print_flags __flags[] = {
__def_gfpflag_names, {-1, NULL} };
@@ -490,10 +485,6 @@ static unsigned int trace_stack(struct synth_trace_event *entry,
break;
}
- /* Include the zero'd element if it fits */
- if (len < HIST_STACKTRACE_DEPTH)
- len++;
-
len *= sizeof(long);
/* Find the dynamic section to copy the stack into. */
From: Pietro Borrello <borrello(a)diag.uniroma1.it>
commit 7c4a5b89a0b5a57a64b601775b296abf77a9fe97 upstream.
Commit 326587b84078 ("sched: fix goto retry in pick_next_task_rt()")
removed any path which could make pick_next_rt_entity() return NULL.
However, BUG_ON(!rt_se) in _pick_next_task_rt() (the only caller of
pick_next_rt_entity()) still checks the error condition, which can
never happen, since list_entry() never returns NULL.
Remove the BUG_ON check, and instead emit a warning in the only
possible error condition here: the queue being empty, which should
never happen.
Fixes: 326587b84078 ("sched: fix goto retry in pick_next_task_rt()")
Signed-off-by: Pietro Borrello <borrello(a)diag.uniroma1.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reviewed-by: Phil Auld <pauld(a)redhat.com>
Reviewed-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Link: https://lore.kernel.org/r/20230128-list-entry-null-check-sched-v3-1-b1a71bd…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Srish: Fixes CVE-2023-1077: sched/rt: pick_next_rt_entity(): check list_entry
An insufficient list-empty check in pick_next_rt_entity().
_pick_next_task_rt() checks whether pick_next_rt_entity() returned
NULL, but pick_next_rt_entity() never returns NULL.
So, even if the list is empty, _pick_next_task_rt() continues
processing.]
Signed-off-by: Srish Srinivasan <ssrish(a)vmware.com>
---
kernel/sched/rt.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 9c6c3572b..394c66442 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1522,6 +1522,8 @@ static struct sched_rt_entity *pick_next_rt_entity(struct rq *rq,
BUG_ON(idx >= MAX_RT_PRIO);
queue = array->queue + idx;
+ if (SCHED_WARN_ON(list_empty(queue)))
+ return NULL;
next = list_entry(queue->next, struct sched_rt_entity, run_list);
return next;
@@ -1535,7 +1537,8 @@ static struct task_struct *_pick_next_task_rt(struct rq *rq)
do {
rt_se = pick_next_rt_entity(rq, rt_rq);
- BUG_ON(!rt_se);
+ if (unlikely(!rt_se))
+ return NULL;
rt_rq = group_rt_rq(rt_se);
} while (rt_rq);
--
2.35.6
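As a side note on the reasoning above: list_entry() is only container_of() pointer arithmetic, so it cannot return NULL for a non-NULL node pointer, even when the queue is empty and queue->next points back at the list head. A minimal userspace sketch, with stand-in struct names rather than the kernel's:

/* Minimal userspace sketch: list_entry() is container_of() arithmetic and
 * never yields NULL from a valid node pointer, which is why the old
 * BUG_ON(!rt_se) was dead code and an explicit empty check is needed. */
#include <stddef.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };
struct sched_rt_entity_demo { int prio; struct list_head run_list; };

#define list_entry(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

int main(void)
{
        struct list_head queue = { &queue, &queue };    /* empty queue */
        struct sched_rt_entity_demo *next =
                list_entry(queue.next, struct sched_rt_entity_demo, run_list);

        /* next is a bogus pointer derived from &queue, but it is not NULL */
        printf("next = %p (never NULL)\n", (void *)next);
        return 0;
}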
commit 9c7c4bc986932218fd0df9d2a100509772028fb1 upstream
sizeof(struct ublksrv_io_cmd) is 16 bytes, which can be held in a 64-byte SQE,
so it is not necessary to check IO_URING_F_SQE128.
With this change, we get the chance to save half of the SQ ring memory.
Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver")
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
Link: https://lore.kernel.org/r/20230220041413.1524335-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
drivers/block/ublk_drv.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index f48d213fb65e..09d29fa53939 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1271,9 +1271,6 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
__func__, cmd->cmd_op, ub_cmd->q_id, tag,
ub_cmd->result);
- if (!(issue_flags & IO_URING_F_SQE128))
- goto out;
-
if (ub_cmd->q_id >= ub->dev_info.nr_hw_queues)
goto out;
--
2.40.1
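A hedged sketch of the size argument above, assuming the usual 16-byte layout of struct ublksrv_io_cmd and that a regular 64-byte SQE carries 16 bytes of inline command data (80 bytes only with IORING_SETUP_SQE128). The struct below is a stand-in, not the real UAPI header:

/* Hedged userspace sketch: the assumed 16-byte command fits in the inline
 * cmd area of a regular 64-byte SQE, so SQE128 is not required. */
#include <stdint.h>
#include <assert.h>

struct ublksrv_io_cmd_demo {            /* stand-in for struct ublksrv_io_cmd */
        uint16_t q_id;
        uint16_t tag;
        int32_t  result;
        uint64_t addr;
};

int main(void)
{
        static_assert(sizeof(struct ublksrv_io_cmd_demo) == 16,
                      "fits in the assumed 16-byte inline cmd area of a 64-byte SQE");
        return 0;
}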
From: Sanjay R Mehta <sanju.mehta(a)amd.com>
Previously, on unplug events, the TMU mode was disabled first
followed by the Time Synchronization Handshake, irrespective of
whether the tb_switch_tmu_rate_write() API was successful or not.
However, this caused a problem with Thunderbolt 3 (TBT3)
devices, as the TSPacketInterval bits were always enabled by default,
leading the host router to assume that the device router's TMU was
already enabled and preventing it from initiating the Time
Synchronization Handshake. As a result, TBT3 monitors experienced
display flickering from the second hot plug onwards.
To address this issue, we have modified the code to only disable the
Time Synchronization Handshake during TMU disable if the
tb_switch_tmu_rate_write() function is successful. This ensures that
the TBT3 devices function correctly and eliminates the display
flickering issue.
Co-developed-by: Sanath S <Sanath.S(a)amd.com>
Signed-off-by: Sanath S <Sanath.S(a)amd.com>
Signed-off-by: Sanjay R Mehta <sanju.mehta(a)amd.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
(cherry picked from commit 583893a66d731f5da010a3fa38a0460e05f0149b)
USB4v2 introduced support for uni-directional TMU mode as part of
d49b4f043d63 ("thunderbolt: Add support for enhanced uni-directional TMU mode")
This is not a stable candidate commit, so adjust the code for backport.
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/thunderbolt/tmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/thunderbolt/tmu.c b/drivers/thunderbolt/tmu.c
index 626aca3124b1..d9544600b386 100644
--- a/drivers/thunderbolt/tmu.c
+++ b/drivers/thunderbolt/tmu.c
@@ -415,7 +415,8 @@ int tb_switch_tmu_disable(struct tb_switch *sw)
* uni-directional mode and we don't want to change it's TMU
* mode.
*/
- tb_switch_tmu_rate_write(sw, TB_SWITCH_TMU_RATE_OFF);
+ ret = tb_switch_tmu_rate_write(sw, TB_SWITCH_TMU_RATE_OFF);
+ return ret;
tb_port_tmu_time_sync_disable(up);
ret = tb_port_tmu_time_sync_disable(down);
--
2.34.1
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 9d3de7ee192a6a253f475197fe4d2e2af10a731f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072413-glamorous-unjustly-bb12@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9d3de7ee192a6a253f475197fe4d2e2af10a731f Mon Sep 17 00:00:00 2001
From: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Date: Sat, 22 Jul 2023 22:45:24 +0530
Subject: [PATCH] ext4: fix rbtree traversal bug in ext4_mb_use_preallocated
During allocations, while looking for preallocations (PA) in the per-inode
rbtree, we can't do a direct traversal of the tree because
ext4_mb_discard_group_preallocation() can concurrently mark the PA deleted
and that can cause the direct traversal to skip some entries. This was
leading to a BUG_ON() being hit [1] when we missed a PA that could satisfy
our request and ultimately tried to create a new PA that would overlap
with the missed one.
To make sure we handle that case while still keeping the performance of
the rbtree, we make use of the fact that the only PA that could possibly
overlap the original goal start is the one that satisfies the below
conditions:
1. It must have its logical start immediately to the left of
(i.e. less than) the original logical start.
2. It must not be deleted
To find this pa we use the following traversal method:
1. Descend into the rbtree normally to find the immediate neighboring
PA. Here we keep descending irrespective of if the PA is deleted or if
it overlaps with our request etc. The goal is to find an immediately
adjacent PA.
2. If the found PA is on right of original goal, use rb_prev() to find
the left adjacent PA.
3. Check if this PA is deleted and keep moving left with rb_prev() until
a non deleted PA is found.
4. This is the PA we are looking for. Now we can check if it can satisfy
the original request and proceed accordingly.
This approach also takes care of having deleted PAs in the tree.
(While we are at it, also fix a possible overflow bug in calculating the
end of a PA)
[1] https://lore.kernel.org/linux-ext4/CA+G9fYv2FRpLqBZf34ZinR8bU2_ZRAUOjKAD3+t…
Cc: stable(a)kernel.org # 6.4
Fixes: 3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
Signed-off-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Reported-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Reviewed-by: Ritesh Harjani (IBM) ritesh.list(a)gmail.com
Tested-by: Ritesh Harjani (IBM) ritesh.list(a)gmail.com
Link: https://lore.kernel.org/r/edd2efda6a83e6343c5ace9deea44813e71dbe20.16900459…
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 456150ef6111..21b903fe546e 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4765,8 +4765,8 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
int order, i;
struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
struct ext4_locality_group *lg;
- struct ext4_prealloc_space *tmp_pa, *cpa = NULL;
- ext4_lblk_t tmp_pa_start, tmp_pa_end;
+ struct ext4_prealloc_space *tmp_pa = NULL, *cpa = NULL;
+ loff_t tmp_pa_end;
struct rb_node *iter;
ext4_fsblk_t goal_block;
@@ -4774,47 +4774,151 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
if (!(ac->ac_flags & EXT4_MB_HINT_DATA))
return false;
- /* first, try per-file preallocation */
+ /*
+ * first, try per-file preallocation by searching the inode pa rbtree.
+ *
+ * Here, we can't do a direct traversal of the tree because
+ * ext4_mb_discard_group_preallocation() can paralelly mark the pa
+ * deleted and that can cause direct traversal to skip some entries.
+ */
read_lock(&ei->i_prealloc_lock);
+
+ if (RB_EMPTY_ROOT(&ei->i_prealloc_node)) {
+ goto try_group_pa;
+ }
+
+ /*
+ * Step 1: Find a pa with logical start immediately adjacent to the
+ * original logical start. This could be on the left or right.
+ *
+ * (tmp_pa->pa_lstart never changes so we can skip locking for it).
+ */
for (iter = ei->i_prealloc_node.rb_node; iter;
iter = ext4_mb_pa_rb_next_iter(ac->ac_o_ex.fe_logical,
- tmp_pa_start, iter)) {
+ tmp_pa->pa_lstart, iter)) {
tmp_pa = rb_entry(iter, struct ext4_prealloc_space,
pa_node.inode_node);
+ }
- /* all fields in this condition don't change,
- * so we can skip locking for them */
- tmp_pa_start = tmp_pa->pa_lstart;
- tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
+ /*
+ * Step 2: The adjacent pa might be to the right of logical start, find
+ * the left adjacent pa. After this step we'd have a valid tmp_pa whose
+ * logical start is towards the left of original request's logical start
+ */
+ if (tmp_pa->pa_lstart > ac->ac_o_ex.fe_logical) {
+ struct rb_node *tmp;
+ tmp = rb_prev(&tmp_pa->pa_node.inode_node);
- /* original request start doesn't lie in this PA */
- if (ac->ac_o_ex.fe_logical < tmp_pa_start ||
- ac->ac_o_ex.fe_logical >= tmp_pa_end)
- continue;
-
- /* non-extent files can't have physical blocks past 2^32 */
- if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)) &&
- (tmp_pa->pa_pstart + EXT4_C2B(sbi, tmp_pa->pa_len) >
- EXT4_MAX_BLOCK_FILE_PHYS)) {
+ if (tmp) {
+ tmp_pa = rb_entry(tmp, struct ext4_prealloc_space,
+ pa_node.inode_node);
+ } else {
/*
- * Since PAs don't overlap, we won't find any
- * other PA to satisfy this.
+ * If there is no adjacent pa to the left then finding
+ * an overlapping pa is not possible hence stop searching
+ * inode pa tree
+ */
+ goto try_group_pa;
+ }
+ }
+
+ BUG_ON(!(tmp_pa && tmp_pa->pa_lstart <= ac->ac_o_ex.fe_logical));
+
+ /*
+ * Step 3: If the left adjacent pa is deleted, keep moving left to find
+ * the first non deleted adjacent pa. After this step we should have a
+ * valid tmp_pa which is guaranteed to be non deleted.
+ */
+ for (iter = &tmp_pa->pa_node.inode_node;; iter = rb_prev(iter)) {
+ if (!iter) {
+ /*
+ * no non deleted left adjacent pa, so stop searching
+ * inode pa tree
+ */
+ goto try_group_pa;
+ }
+ tmp_pa = rb_entry(iter, struct ext4_prealloc_space,
+ pa_node.inode_node);
+ spin_lock(&tmp_pa->pa_lock);
+ if (tmp_pa->pa_deleted == 0) {
+ /*
+ * We will keep holding the pa_lock from
+ * this point on because we don't want group discard
+ * to delete this pa underneath us. Since group
+ * discard is anyways an ENOSPC operation it
+ * should be okay for it to wait a few more cycles.
*/
break;
- }
-
- /* found preallocated blocks, use them */
- spin_lock(&tmp_pa->pa_lock);
- if (tmp_pa->pa_deleted == 0 && tmp_pa->pa_free &&
- likely(ext4_mb_pa_goal_check(ac, tmp_pa))) {
- atomic_inc(&tmp_pa->pa_count);
- ext4_mb_use_inode_pa(ac, tmp_pa);
+ } else {
spin_unlock(&tmp_pa->pa_lock);
- read_unlock(&ei->i_prealloc_lock);
- return true;
}
- spin_unlock(&tmp_pa->pa_lock);
}
+
+ BUG_ON(!(tmp_pa && tmp_pa->pa_lstart <= ac->ac_o_ex.fe_logical));
+ BUG_ON(tmp_pa->pa_deleted == 1);
+
+ /*
+ * Step 4: We now have the non deleted left adjacent pa. Only this
+ * pa can possibly satisfy the request hence check if it overlaps
+ * original logical start and stop searching if it doesn't.
+ */
+ tmp_pa_end = (loff_t)tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
+
+ if (ac->ac_o_ex.fe_logical >= tmp_pa_end) {
+ spin_unlock(&tmp_pa->pa_lock);
+ goto try_group_pa;
+ }
+
+ /* non-extent files can't have physical blocks past 2^32 */
+ if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)) &&
+ (tmp_pa->pa_pstart + EXT4_C2B(sbi, tmp_pa->pa_len) >
+ EXT4_MAX_BLOCK_FILE_PHYS)) {
+ /*
+ * Since PAs don't overlap, we won't find any other PA to
+ * satisfy this.
+ */
+ spin_unlock(&tmp_pa->pa_lock);
+ goto try_group_pa;
+ }
+
+ if (tmp_pa->pa_free && likely(ext4_mb_pa_goal_check(ac, tmp_pa))) {
+ atomic_inc(&tmp_pa->pa_count);
+ ext4_mb_use_inode_pa(ac, tmp_pa);
+ spin_unlock(&tmp_pa->pa_lock);
+ read_unlock(&ei->i_prealloc_lock);
+ return true;
+ } else {
+ /*
+ * We found a valid overlapping pa but couldn't use it because
+ * it had no free blocks. This should ideally never happen
+ * because:
+ *
+ * 1. When a new inode pa is added to rbtree it must have
+ * pa_free > 0 since otherwise we won't actually need
+ * preallocation.
+ *
+ * 2. An inode pa that is in the rbtree can only have it's
+ * pa_free become zero when another thread calls:
+ * ext4_mb_new_blocks
+ * ext4_mb_use_preallocated
+ * ext4_mb_use_inode_pa
+ *
+ * 3. Further, after the above calls make pa_free == 0, we will
+ * immediately remove it from the rbtree in:
+ * ext4_mb_new_blocks
+ * ext4_mb_release_context
+ * ext4_mb_put_pa
+ *
+ * 4. Since the pa_free becoming 0 and pa_free getting removed
+ * from tree both happen in ext4_mb_new_blocks, which is always
+ * called with i_data_sem held for data allocations, we can be
+ * sure that another process will never see a pa in rbtree with
+ * pa_free == 0.
+ */
+ WARN_ON_ONCE(tmp_pa->pa_free == 0);
+ }
+ spin_unlock(&tmp_pa->pa_lock);
+try_group_pa:
read_unlock(&ei->i_prealloc_lock);
/* can we use group allocation? */
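Regarding the overflow fix mentioned in passing above: computing a PA's end as a 32-bit logical block number can wrap, which is why the patch widens the arithmetic to loff_t. A minimal userspace illustration with made-up values:

/* Hedged sketch of the overflow: a 32-bit end calculation can wrap, while
 * widening to a 64-bit type first (as the patch does with loff_t) cannot.
 * The values below are hypothetical. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint32_t pa_lstart = 0xFFFFFF00u;       /* hypothetical logical start */
        uint32_t pa_len_blocks = 0x200u;        /* hypothetical length in blocks */

        uint32_t end32 = pa_lstart + pa_len_blocks;             /* wraps */
        uint64_t end64 = (uint64_t)pa_lstart + pa_len_blocks;   /* correct */

        printf("32-bit end: 0x%x (wrapped)\n", end32);
        printf("64-bit end: 0x%llx\n", (unsigned long long)end64);
        return 0;
}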
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072456-starting-gauging-768c@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8 Mon Sep 17 00:00:00 2001
From: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Date: Fri, 9 Jun 2023 16:04:03 +0530
Subject: [PATCH] ext4: fix off by one issue in
ext4_mb_choose_next_group_best_avail()
In ext4_mb_choose_next_group_best_avail(), we want the start order to be
1 less than goal length and the min_order to be, at max, 1 more than the
original length. This commit fixes an off by one issue that arose due to
the fact that 1 << fls(n) > (n).
After all the processing:
order = 1 order below goal len
min_order = maximum of the three:-
- order - trim_order
- 1 order below B2C(s_stripe)
- 1 order above original len
Cc: stable(a)kernel.org
Fixes: 33122aa930 ("ext4: Add allocation criteria 1.5 (CR1_5)")
Signed-off-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Link: https://lore.kernel.org/r/20230609103403.112807-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a2475b8c9fb5..456150ef6111 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1006,14 +1006,11 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* fls() instead since we need to know the actual length while modifying
* goal length.
*/
- order = fls(ac->ac_g_ex.fe_len);
+ order = fls(ac->ac_g_ex.fe_len) - 1;
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
- if (1 << min_order < ac->ac_o_ex.fe_len)
- min_order = fls(ac->ac_o_ex.fe_len) + 1;
-
if (sbi->s_stripe > 0) {
/*
* We are assuming that stripe size is always a multiple of
@@ -1021,9 +1018,16 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
*/
num_stripe_clusters = EXT4_NUM_B2C(sbi, sbi->s_stripe);
if (1 << min_order < num_stripe_clusters)
- min_order = fls(num_stripe_clusters);
+ /*
+ * We consider 1 order less because later we round
+ * up the goal len to num_stripe_clusters
+ */
+ min_order = fls(num_stripe_clusters) - 1;
}
+ if (1 << min_order < ac->ac_o_ex.fe_len)
+ min_order = fls(ac->ac_o_ex.fe_len);
+
for (i = order; i >= min_order; i--) {
int frag_order;
/*
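A small worked example of the fls() off-by-one described in the changelog, using a userspace stand-in for the kernel's fls() helper:

/* Hedged illustration: 1 << fls(n) > n, so fls(n) is one order too high
 * when n is already a power of two; fls(n) - 1 matches the goal length. */
#include <stdio.h>

static int fls_demo(unsigned int n)     /* stand-in for the kernel's fls() */
{
        int bit = 0;

        while (n) { bit++; n >>= 1; }
        return bit;
}

int main(void)
{
        unsigned int goal_len = 8;                      /* hypothetical goal length in clusters */
        int order_old = fls_demo(goal_len);             /* 4: 1 << 4 = 16, overshoots */
        int order_new = fls_demo(goal_len) - 1;         /* 3: 1 << 3 = 8, matches goal */

        printf("goal=%u old order=%d (covers %u) new order=%d (covers %u)\n",
               goal_len, order_old, 1u << order_old, order_new, 1u << order_new);
        return 0;
}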
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 4b430d4ac99750ee2ae2f893f1055c7af1ec3dc5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082118-donation-clench-604d@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
4b430d4ac997 ("mmc: block: Fix in_flight[issue_type] value error")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4b430d4ac99750ee2ae2f893f1055c7af1ec3dc5 Mon Sep 17 00:00:00 2001
From: Yibin Ding <yibin.ding(a)unisoc.com>
Date: Wed, 2 Aug 2023 10:30:23 +0800
Subject: [PATCH] mmc: block: Fix in_flight[issue_type] value error
For a completed request, after the mmc_blk_mq_complete_rq(mq, req)
function is executed, the bitmap_tags corresponding to the
request will be cleared, that is, the request will be regarded as
idle. If the request is acquired by a different type of process at
this time, the issue_type of the request may change. This in turn
causes the value of mq->in_flight[issue_type] to become inconsistent,
and a large number of requests can no longer be sent.
p1:                                     p2:
mmc_blk_mq_complete_rq
  blk_mq_free_request
                                        blk_mq_get_request
                                          blk_mq_rq_ctx_init
mmc_blk_mq_dec_in_flight
  mmc_issue_type(mq, req)
This strategy can ensure the consistency of issue_type
before and after executing mmc_blk_mq_complete_rq.
Fixes: 81196976ed94 ("mmc: block: Add blk-mq support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Yibin Ding <yibin.ding(a)unisoc.com>
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Link: https://lore.kernel.org/r/20230802023023.1318134-1-yunlong.xing@unisoc.com
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index f701efb1fa78..b6f4be25b31b 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -2097,14 +2097,14 @@ static void mmc_blk_mq_poll_completion(struct mmc_queue *mq,
mmc_blk_urgent_bkops(mq, mqrq);
}
-static void mmc_blk_mq_dec_in_flight(struct mmc_queue *mq, struct request *req)
+static void mmc_blk_mq_dec_in_flight(struct mmc_queue *mq, enum mmc_issue_type issue_type)
{
unsigned long flags;
bool put_card;
spin_lock_irqsave(&mq->lock, flags);
- mq->in_flight[mmc_issue_type(mq, req)] -= 1;
+ mq->in_flight[issue_type] -= 1;
put_card = (mmc_tot_in_flight(mq) == 0);
@@ -2117,6 +2117,7 @@ static void mmc_blk_mq_dec_in_flight(struct mmc_queue *mq, struct request *req)
static void mmc_blk_mq_post_req(struct mmc_queue *mq, struct request *req,
bool can_sleep)
{
+ enum mmc_issue_type issue_type = mmc_issue_type(mq, req);
struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
struct mmc_request *mrq = &mqrq->brq.mrq;
struct mmc_host *host = mq->card->host;
@@ -2136,7 +2137,7 @@ static void mmc_blk_mq_post_req(struct mmc_queue *mq, struct request *req,
blk_mq_complete_request(req);
}
- mmc_blk_mq_dec_in_flight(mq, req);
+ mmc_blk_mq_dec_in_flight(mq, issue_type);
}
void mmc_blk_mq_recovery(struct mmc_queue *mq)
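A hedged userspace sketch of the pattern the fix relies on: snapshot issue_type before the request can be completed and recycled, then use the snapshot for the in_flight accounting. The structures below are illustrative stand-ins, not the real mmc types:

/* Hedged sketch: cache the value derived from the request before the
 * completion step can hand the request to another issuer, then account
 * using the cached value. */
#include <stdio.h>

enum issue_type_demo { ISSUE_SYNC, ISSUE_ASYNC };

struct req_demo { enum issue_type_demo type; };

static int in_flight[2];

static void post_req(struct req_demo *req)
{
        enum issue_type_demo issue_type = req->type;    /* snapshot first */

        /* After "completion" the request may be re-initialised by another
         * issuer; simulate that by flipping its type. */
        req->type = ISSUE_ASYNC;

        in_flight[issue_type]--;                        /* uses the snapshot */
}

int main(void)
{
        struct req_demo r = { ISSUE_SYNC };

        in_flight[ISSUE_SYNC] = 1;
        post_req(&r);
        printf("in_flight[sync]=%d in_flight[async]=%d\n",
               in_flight[ISSUE_SYNC], in_flight[ISSUE_ASYNC]);
        return 0;
}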
We observed a 35% regression running phoronix pts/ramspeed and also a 16%
regression with unixbench. The regression is caused by the following commit:
dd0f194cfeb5 | mm: rewrite wait_on_page_bit_common() logic
Backporting this fixes the regression (this is already in 5.9+):
- 5ef64cc8987a mm: allow a controlled amount of unfairness in the page lock
Linus Torvalds (1):
mm: allow a controlled amount of unfairness in the page lock
include/linux/mm.h | 2 +
include/linux/wait.h | 2 +
kernel/sysctl.c | 8 +++
mm/filemap.c | 160 ++++++++++++++++++++++++++++++++++---------
4 files changed, 141 insertions(+), 31 deletions(-)
--
2.41.0
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x a337b64f0d5717248a0c894e2618e658e6a9de9f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023080742-ion-implement-ceb1@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
a337b64f0d57 ("drm/i915: Fix premature release of request's reusable memory")
506006055769 ("drm/i915/active: Fix misuse of non-idle barriers as fence trackers")
ad5c99e02047 ("drm/i915: Remove unused bits of i915_vma/active api")
f6c466b84cfa ("drm/i915: Add support for moving fence waiting")
544460c33821 ("drm/i915: Multi-BB execbuf")
5851387a422c ("drm/i915/guc: Implement no mid batch preemption for multi-lrc")
e5e32171a2cf ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface")
d38a9294491d ("drm/i915/guc: Update debugfs for GuC multi-lrc")
bc955204919e ("drm/i915/guc: Insert submit fences between requests in parent-child relationship")
6b540bf6f143 ("drm/i915/guc: Implement multi-lrc submission")
99b47aaddfa9 ("drm/i915/guc: Implement parallel context pin / unpin functions")
c2aa552ff09d ("drm/i915/guc: Add multi-lrc context registration")
3897df4c0187 ("drm/i915/guc: Introduce context parent-child relationship")
4f3059dc2dbb ("drm/i915: Add logical engine mapping")
1a52faed3131 ("drm/i915/guc: Take GT PM ref when deregistering context")
0ea92ace8b95 ("drm/i915/guc: Move GuC guc_id allocation under submission state sub-struct")
0d8ee5ba8db4 ("drm/i915: Don't back up pinned LMEM context images and rings during suspend")
c56ce9565374 ("drm/i915 Implement LMEM backup and restore for suspend / resume")
0d9388635a22 ("drm/i915/ttm: Implement a function to copy the contents of two TTM-based objects")
68c03c0e985e ("drm/i915/debugfs: Do not report currently active engine when describing objects")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a337b64f0d5717248a0c894e2618e658e6a9de9f Mon Sep 17 00:00:00 2001
From: Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>
Date: Thu, 20 Jul 2023 11:35:44 +0200
Subject: [PATCH] drm/i915: Fix premature release of request's reusable memory
Infinite waits for completion of GPU activity have been observed in CI,
mostly inside __i915_active_wait(), triggered by igt@gem_barrier_race or
igt@perf@stress-open-close. Root cause analysis, based on ftrace dumps
generated with a lot of extra trace_printk() calls added to the code,
revealed loops of request dependencies being accidentally built,
preventing the requests from being processed, each waiting for completion
of another one's activity.
After we substitute a new request for a last active one tracked on a
timeline, we set up a dependency of our new request to wait on completion
of current activity of that previous one. While doing that, we must take
care of keeping the old request still in memory until we use its
attributes for setting up that await dependency, or we can happen to set
up the await dependency on an unrelated request that already reuses the
memory previously allocated to the old one, already released. Combined
with perf adding consecutive kernel context remote requests to different
user context timelines, unresolvable loops of await dependencies can be
built, leading to infinite waits.
We obtain a pointer to the previous request to wait upon when we
substitute it with a pointer to our new request in an active tracker,
e.g. in intel_timeline.last_request. In some processing paths we protect
that old request from being freed before we use it by getting a reference
to it under RCU protection, but in others, e.g. __i915_request_commit()
-> __i915_request_add_to_timeline() -> __i915_request_ensure_ordering(),
we don't. But anyway, since the requests' memory is SLAB_TYPESAFE_BY_RCU,
that RCU protection is not sufficient against reuse of memory.
We could protect i915_request's memory from being prematurely reused by
calling its release function via call_rcu() and using rcu_read_lock()
consequently, as proposed in v1. However, that approach leads to
significant (up to 10 times) increase of SLAB utilization by i915_request
SLAB cache. Another potential approach is to take a reference to the
previous active fence.
When updating an active fence tracker, we first lock the new fence,
substitute a pointer of the current active fence with the new one, then we
lock the substituted fence. With this approach, there is a time window
after the substitution and before the lock when the request can be
concurrently released by an interrupt handler and its memory reused, then
we may happen to lock and return a new, unrelated request.
Always get a reference to the current active fence first, before
replacing it with a new one. Having it protected from premature release
and reuse, lock it and then replace with the new one but only if not
yet signalled via a potential concurrent interrupt nor replaced with
another one by a potential concurrent thread, otherwise retry, starting
from getting a reference to the new current one. Adjust users to not
get a reference to the previous active fence themselves and always put the
reference got by __i915_active_fence_set() when no longer needed.
v3: Fix lockdep splat reports and other issues caused by incorrect use of
try_cmpxchg() (use (cmpxchg() != prev) instead)
v2: Protect request's memory by getting a reference to it in favor of
delegating its release to call_rcu() (Chris)
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8211
Fixes: df9f85d8582e ("drm/i915: Serialise i915_active_fence_set() with itself")
Suggested-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v5.6+
Reviewed-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230720093543.832147-2-janus…
(cherry picked from commit 946e047a3d88d46d15b5c5af0414098e12b243f7)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 8ef93889061a..5ec293011d99 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -449,8 +449,11 @@ int i915_active_add_request(struct i915_active *ref, struct i915_request *rq)
}
} while (unlikely(is_barrier(active)));
- if (!__i915_active_fence_set(active, fence))
+ fence = __i915_active_fence_set(active, fence);
+ if (!fence)
__i915_active_acquire(ref);
+ else
+ dma_fence_put(fence);
out:
i915_active_release(ref);
@@ -469,13 +472,9 @@ __i915_active_set_fence(struct i915_active *ref,
return NULL;
}
- rcu_read_lock();
prev = __i915_active_fence_set(active, fence);
- if (prev)
- prev = dma_fence_get_rcu(prev);
- else
+ if (!prev)
__i915_active_acquire(ref);
- rcu_read_unlock();
return prev;
}
@@ -1019,10 +1018,11 @@ void i915_request_add_active_barriers(struct i915_request *rq)
*
* Records the new @fence as the last active fence along its timeline in
* this active tracker, moving the tracking callbacks from the previous
- * fence onto this one. Returns the previous fence (if not already completed),
- * which the caller must ensure is executed before the new fence. To ensure
- * that the order of fences within the timeline of the i915_active_fence is
- * understood, it should be locked by the caller.
+ * fence onto this one. Gets and returns a reference to the previous fence
+ * (if not already completed), which the caller must put after making sure
+ * that it is executed before the new fence. To ensure that the order of
+ * fences within the timeline of the i915_active_fence is understood, it
+ * should be locked by the caller.
*/
struct dma_fence *
__i915_active_fence_set(struct i915_active_fence *active,
@@ -1031,7 +1031,23 @@ __i915_active_fence_set(struct i915_active_fence *active,
struct dma_fence *prev;
unsigned long flags;
- if (fence == rcu_access_pointer(active->fence))
+ /*
+ * In case of fences embedded in i915_requests, their memory is
+ * SLAB_FAILSAFE_BY_RCU, then it can be reused right after release
+ * by new requests. Then, there is a risk of passing back a pointer
+ * to a new, completely unrelated fence that reuses the same memory
+ * while tracked under a different active tracker. Combined with i915
+ * perf open/close operations that build await dependencies between
+ * engine kernel context requests and user requests from different
+ * timelines, this can lead to dependency loops and infinite waits.
+ *
+ * As a countermeasure, we try to get a reference to the active->fence
+ * first, so if we succeed and pass it back to our user then it is not
+ * released and potentially reused by an unrelated request before the
+ * user has a chance to set up an await dependency on it.
+ */
+ prev = i915_active_fence_get(active);
+ if (fence == prev)
return fence;
GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags));
@@ -1040,27 +1056,56 @@ __i915_active_fence_set(struct i915_active_fence *active,
* Consider that we have two threads arriving (A and B), with
* C already resident as the active->fence.
*
- * A does the xchg first, and so it sees C or NULL depending
- * on the timing of the interrupt handler. If it is NULL, the
- * previous fence must have been signaled and we know that
- * we are first on the timeline. If it is still present,
- * we acquire the lock on that fence and serialise with the interrupt
- * handler, in the process removing it from any future interrupt
- * callback. A will then wait on C before executing (if present).
- *
- * As B is second, it sees A as the previous fence and so waits for
- * it to complete its transition and takes over the occupancy for
- * itself -- remembering that it needs to wait on A before executing.
+ * Both A and B have got a reference to C or NULL, depending on the
+ * timing of the interrupt handler. Let's assume that if A has got C
+ * then it has locked C first (before B).
*
* Note the strong ordering of the timeline also provides consistent
* nesting rules for the fence->lock; the inner lock is always the
* older lock.
*/
spin_lock_irqsave(fence->lock, flags);
- prev = xchg(__active_fence_slot(active), fence);
- if (prev) {
- GEM_BUG_ON(prev == fence);
+ if (prev)
spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
+
+ /*
+ * A does the cmpxchg first, and so it sees C or NULL, as before, or
+ * something else, depending on the timing of other threads and/or
+ * interrupt handler. If not the same as before then A unlocks C if
+ * applicable and retries, starting from an attempt to get a new
+ * active->fence. Meanwhile, B follows the same path as A.
+ * Once A succeeds with cmpxch, B fails again, retires, gets A from
+ * active->fence, locks it as soon as A completes, and possibly
+ * succeeds with cmpxchg.
+ */
+ while (cmpxchg(__active_fence_slot(active), prev, fence) != prev) {
+ if (prev) {
+ spin_unlock(prev->lock);
+ dma_fence_put(prev);
+ }
+ spin_unlock_irqrestore(fence->lock, flags);
+
+ prev = i915_active_fence_get(active);
+ GEM_BUG_ON(prev == fence);
+
+ spin_lock_irqsave(fence->lock, flags);
+ if (prev)
+ spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
+ }
+
+ /*
+ * If prev is NULL then the previous fence must have been signaled
+ * and we know that we are first on the timeline. If it is still
+ * present then, having the lock on that fence already acquired, we
+ * serialise with the interrupt handler, in the process of removing it
+ * from any future interrupt callback. A will then wait on C before
+ * executing (if present).
+ *
+ * As B is second, it sees A as the previous fence and so waits for
+ * it to complete its transition and takes over the occupancy for
+ * itself -- remembering that it needs to wait on A before executing.
+ */
+ if (prev) {
__list_del_entry(&active->cb.node);
spin_unlock(prev->lock); /* serialise with prev->cb_list */
}
@@ -1077,11 +1122,7 @@ int i915_active_fence_set(struct i915_active_fence *active,
int err = 0;
/* Must maintain timeline ordering wrt previous active requests */
- rcu_read_lock();
fence = __i915_active_fence_set(active, &rq->fence);
- if (fence) /* but the previous fence may not belong to that timeline! */
- fence = dma_fence_get_rcu(fence);
- rcu_read_unlock();
if (fence) {
err = i915_request_await_dma_fence(rq, fence);
dma_fence_put(fence);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 894068bb37b6..833b73edefdb 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1661,6 +1661,11 @@ __i915_request_ensure_parallel_ordering(struct i915_request *rq,
request_to_parent(rq)->parallel.last_rq = i915_request_get(rq);
+ /*
+ * Users have to put a reference potentially got by
+ * __i915_active_fence_set() to the returned request
+ * when no longer needed
+ */
return to_request(__i915_active_fence_set(&timeline->last_request,
&rq->fence));
}
@@ -1707,6 +1712,10 @@ __i915_request_ensure_ordering(struct i915_request *rq,
0);
}
+ /*
+ * Users have to put the reference to prev potentially got
+ * by __i915_active_fence_set() when no longer needed
+ */
return prev;
}
@@ -1760,6 +1769,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
prev = __i915_request_ensure_ordering(rq, timeline);
else
prev = __i915_request_ensure_parallel_ordering(rq, timeline);
+ if (prev)
+ i915_request_put(prev);
/*
* Make sure that no request gazumped us - if it was allocated after
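A simplified, single-threaded userspace sketch of the replacement protocol the changelog describes: pin the current entry with a reference, install the new one only if the slot still holds what was pinned, otherwise drop the reference and retry. It deliberately glosses over the load-versus-get window that the kernel code closes via the type-safe slab and the fence spinlocks; all names are made up, not the i915 API:

/* Hedged sketch of "get a reference first, then cmpxchg, else retry". */
#include <stdatomic.h>
#include <stdio.h>

struct obj { atomic_int refs; int id; };

static struct obj *obj_get(struct obj *o)
{
        if (o)
                atomic_fetch_add(&o->refs, 1);
        return o;
}

static void obj_put(struct obj *o)
{
        if (o)
                atomic_fetch_sub(&o->refs, 1);
}

/* Returns the previous entry with a reference held; the caller must put it. */
static struct obj *slot_replace(struct obj *_Atomic *slot, struct obj *new_entry)
{
        struct obj *prev;

        for (;;) {
                prev = obj_get(atomic_load(slot));      /* pin the current entry */
                struct obj *expected = prev;

                if (atomic_compare_exchange_strong(slot, &expected, new_entry))
                        return prev;
                obj_put(prev);                          /* raced: retry */
        }
}

int main(void)
{
        struct obj a = { 1, 1 }, b = { 1, 2 };
        struct obj *_Atomic slot = &a;

        struct obj *prev = slot_replace(&slot, &b);
        printf("replaced id=%d with id=%d\n", prev ? prev->id : 0, b.id);
        obj_put(prev);
        return 0;
}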
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 656f9aec07dba7c61d469727494a5d1b18d0bef4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082705-predator-enjoyable-15fb@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 656f9aec07dba7c61d469727494a5d1b18d0bef4 Mon Sep 17 00:00:00 2001
From: Huacai Chen <chenhuacai(a)kernel.org>
Date: Sat, 26 Aug 2023 22:21:57 +0800
Subject: [PATCH] LoongArch: Ensure FP/SIMD registers in the core dump file is
up to date
This is a port of commit 379eb01c21795edb4c ("riscv: Ensure the value
of FP registers in the core dump file is up to date").
The values of FP/SIMD registers in the core dump file come from the
thread.fpu. However, the kernel saves the FP/SIMD registers only before
scheduling out the process. If no process switch happens during the
exception handling, the kernel will not have a chance to save the latest
values of the FP/SIMD registers, so their values in the core
dump file may be incorrect. To solve this problem, force fpr_get()/simd_get()
to save the FP/SIMD registers into the thread.fpu if the target task
equals the current task.
Cc: stable(a)vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
diff --git a/arch/loongarch/include/asm/fpu.h b/arch/loongarch/include/asm/fpu.h
index b541f6248837..c2d8962fda00 100644
--- a/arch/loongarch/include/asm/fpu.h
+++ b/arch/loongarch/include/asm/fpu.h
@@ -173,16 +173,30 @@ static inline void restore_fp(struct task_struct *tsk)
_restore_fp(&tsk->thread.fpu);
}
-static inline union fpureg *get_fpu_regs(struct task_struct *tsk)
+static inline void save_fpu_regs(struct task_struct *tsk)
{
+ unsigned int euen;
+
if (tsk == current) {
preempt_disable();
- if (is_fpu_owner())
+
+ euen = csr_read32(LOONGARCH_CSR_EUEN);
+
+#ifdef CONFIG_CPU_HAS_LASX
+ if (euen & CSR_EUEN_LASXEN)
+ _save_lasx(¤t->thread.fpu);
+ else
+#endif
+#ifdef CONFIG_CPU_HAS_LSX
+ if (euen & CSR_EUEN_LSXEN)
+ _save_lsx(¤t->thread.fpu);
+ else
+#endif
+ if (euen & CSR_EUEN_FPEN)
_save_fp(¤t->thread.fpu);
+
preempt_enable();
}
-
- return tsk->thread.fpu.fpr;
}
static inline int is_simd_owner(void)
diff --git a/arch/loongarch/kernel/ptrace.c b/arch/loongarch/kernel/ptrace.c
index a0767c3a0f0a..f72adbf530c6 100644
--- a/arch/loongarch/kernel/ptrace.c
+++ b/arch/loongarch/kernel/ptrace.c
@@ -147,6 +147,8 @@ static int fpr_get(struct task_struct *target,
{
int r;
+ save_fpu_regs(target);
+
if (sizeof(target->thread.fpu.fpr[0]) == sizeof(elf_fpreg_t))
r = gfpr_get(target, &to);
else
@@ -278,6 +280,8 @@ static int simd_get(struct task_struct *target,
{
const unsigned int wr_size = NUM_FPU_REGS * regset->size;
+ save_fpu_regs(target);
+
if (!tsk_used_math(target)) {
/* The task hasn't used FP or LSX, fill with 0xff */
copy_pad_fprs(target, regset, &to, 0);
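A minimal userspace analogy of the pattern used above: the cached copy in thread.fpu is only trustworthy after the live registers have been flushed into it, so the regset readers flush first when the target is the current task. Names are illustrative only, not the LoongArch API:

/* Hedged sketch: flush live state into the cached copy before reading it. */
#include <stdio.h>

struct task_demo { int live_fpr; int saved_fpr; };

static void save_fpu_regs_demo(struct task_demo *t, struct task_demo *cur)
{
        if (t == cur)                   /* only the current task has live state */
                t->saved_fpr = t->live_fpr;
}

static int fpr_get_demo(struct task_demo *t, struct task_demo *cur)
{
        save_fpu_regs_demo(t, cur);     /* flush before reading, as the patch does */
        return t->saved_fpr;
}

int main(void)
{
        struct task_demo me = { .live_fpr = 42, .saved_fpr = 7 /* stale */ };

        printf("core dump sees fpr=%d\n", fpr_get_demo(&me, &me));
        return 0;
}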
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x a337b64f0d5717248a0c894e2618e658e6a9de9f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023080739-trinity-overfed-f523@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
a337b64f0d57 ("drm/i915: Fix premature release of request's reusable memory")
506006055769 ("drm/i915/active: Fix misuse of non-idle barriers as fence trackers")
ad5c99e02047 ("drm/i915: Remove unused bits of i915_vma/active api")
f6c466b84cfa ("drm/i915: Add support for moving fence waiting")
544460c33821 ("drm/i915: Multi-BB execbuf")
5851387a422c ("drm/i915/guc: Implement no mid batch preemption for multi-lrc")
e5e32171a2cf ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface")
d38a9294491d ("drm/i915/guc: Update debugfs for GuC multi-lrc")
bc955204919e ("drm/i915/guc: Insert submit fences between requests in parent-child relationship")
6b540bf6f143 ("drm/i915/guc: Implement multi-lrc submission")
99b47aaddfa9 ("drm/i915/guc: Implement parallel context pin / unpin functions")
c2aa552ff09d ("drm/i915/guc: Add multi-lrc context registration")
3897df4c0187 ("drm/i915/guc: Introduce context parent-child relationship")
4f3059dc2dbb ("drm/i915: Add logical engine mapping")
1a52faed3131 ("drm/i915/guc: Take GT PM ref when deregistering context")
0ea92ace8b95 ("drm/i915/guc: Move GuC guc_id allocation under submission state sub-struct")
0d8ee5ba8db4 ("drm/i915: Don't back up pinned LMEM context images and rings during suspend")
c56ce9565374 ("drm/i915 Implement LMEM backup and restore for suspend / resume")
0d9388635a22 ("drm/i915/ttm: Implement a function to copy the contents of two TTM-based objects")
68c03c0e985e ("drm/i915/debugfs: Do not report currently active engine when describing objects")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a337b64f0d5717248a0c894e2618e658e6a9de9f Mon Sep 17 00:00:00 2001
From: Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>
Date: Thu, 20 Jul 2023 11:35:44 +0200
Subject: [PATCH] drm/i915: Fix premature release of request's reusable memory
Infinite waits for completion of GPU activity have been observed in CI,
mostly inside __i915_active_wait(), triggered by igt@gem_barrier_race or
igt@perf@stress-open-close. Root cause analysis, based on ftrace dumps
generated with a lot of extra trace_printk() calls added to the code,
revealed loops of request dependencies being accidentally built,
preventing the requests from being processed, each waiting for completion
of another one's activity.
After we substitute a new request for a last active one tracked on a
timeline, we set up a dependency of our new request to wait on completion
of current activity of that previous one. While doing that, we must take
care of keeping the old request still in memory until we use its
attributes for setting up that await dependency, or we can happen to set
up the await dependency on an unrelated request that already reuses the
memory previously allocated to the old one, already released. Combined
with perf adding consecutive kernel context remote requests to different
user context timelines, unresolvable loops of await dependencies can be
built, leading to infinite waits.
We obtain a pointer to the previous request to wait upon when we
substitute it with a pointer to our new request in an active tracker,
e.g. in intel_timeline.last_request. In some processing paths we protect
that old request from being freed before we use it by getting a reference
to it under RCU protection, but in others, e.g. __i915_request_commit()
-> __i915_request_add_to_timeline() -> __i915_request_ensure_ordering(),
we don't. But anyway, since the requests' memory is SLAB_TYPESAFE_BY_RCU,
that RCU protection is not sufficient against reuse of memory.
We could protect i915_request's memory from being prematurely reused by
calling its release function via call_rcu() and using rcu_read_lock()
consequently, as proposed in v1. However, that approach leads to
significant (up to 10 times) increase of SLAB utilization by i915_request
SLAB cache. Another potential approach is to take a reference to the
previous active fence.
When updating an active fence tracker, we first lock the new fence,
substitute a pointer of the current active fence with the new one, then we
lock the substituted fence. With this approach, there is a time window
after the substitution and before the lock when the request can be
concurrently released by an interrupt handler and its memory reused, then
we may happen to lock and return a new, unrelated request.
Always get a reference to the current active fence first, before
replacing it with a new one. Having it protected from premature release
and reuse, lock it and then replace with the new one but only if not
yet signalled via a potential concurrent interrupt nor replaced with
another one by a potential concurrent thread, otherwise retry, starting
from getting a reference to the new current one. Adjust users to not
get a reference to the previous active fence themselves and always put the
reference got by __i915_active_fence_set() when no longer needed.
v3: Fix lockdep splat reports and other issues caused by incorrect use of
try_cmpxchg() (use (cmpxchg() != prev) instead)
v2: Protect request's memory by getting a reference to it in favor of
delegating its release to call_rcu() (Chris)
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8211
Fixes: df9f85d8582e ("drm/i915: Serialise i915_active_fence_set() with itself")
Suggested-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v5.6+
Reviewed-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230720093543.832147-2-janus…
(cherry picked from commit 946e047a3d88d46d15b5c5af0414098e12b243f7)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 8ef93889061a..5ec293011d99 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -449,8 +449,11 @@ int i915_active_add_request(struct i915_active *ref, struct i915_request *rq)
}
} while (unlikely(is_barrier(active)));
- if (!__i915_active_fence_set(active, fence))
+ fence = __i915_active_fence_set(active, fence);
+ if (!fence)
__i915_active_acquire(ref);
+ else
+ dma_fence_put(fence);
out:
i915_active_release(ref);
@@ -469,13 +472,9 @@ __i915_active_set_fence(struct i915_active *ref,
return NULL;
}
- rcu_read_lock();
prev = __i915_active_fence_set(active, fence);
- if (prev)
- prev = dma_fence_get_rcu(prev);
- else
+ if (!prev)
__i915_active_acquire(ref);
- rcu_read_unlock();
return prev;
}
@@ -1019,10 +1018,11 @@ void i915_request_add_active_barriers(struct i915_request *rq)
*
* Records the new @fence as the last active fence along its timeline in
* this active tracker, moving the tracking callbacks from the previous
- * fence onto this one. Returns the previous fence (if not already completed),
- * which the caller must ensure is executed before the new fence. To ensure
- * that the order of fences within the timeline of the i915_active_fence is
- * understood, it should be locked by the caller.
+ * fence onto this one. Gets and returns a reference to the previous fence
+ * (if not already completed), which the caller must put after making sure
+ * that it is executed before the new fence. To ensure that the order of
+ * fences within the timeline of the i915_active_fence is understood, it
+ * should be locked by the caller.
*/
struct dma_fence *
__i915_active_fence_set(struct i915_active_fence *active,
@@ -1031,7 +1031,23 @@ __i915_active_fence_set(struct i915_active_fence *active,
struct dma_fence *prev;
unsigned long flags;
- if (fence == rcu_access_pointer(active->fence))
+ /*
+ * In case of fences embedded in i915_requests, their memory is
+ * SLAB_FAILSAFE_BY_RCU, then it can be reused right after release
+ * by new requests. Then, there is a risk of passing back a pointer
+ * to a new, completely unrelated fence that reuses the same memory
+ * while tracked under a different active tracker. Combined with i915
+ * perf open/close operations that build await dependencies between
+ * engine kernel context requests and user requests from different
+ * timelines, this can lead to dependency loops and infinite waits.
+ *
+ * As a countermeasure, we try to get a reference to the active->fence
+ * first, so if we succeed and pass it back to our user then it is not
+ * released and potentially reused by an unrelated request before the
+ * user has a chance to set up an await dependency on it.
+ */
+ prev = i915_active_fence_get(active);
+ if (fence == prev)
return fence;
GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags));
@@ -1040,27 +1056,56 @@ __i915_active_fence_set(struct i915_active_fence *active,
* Consider that we have two threads arriving (A and B), with
* C already resident as the active->fence.
*
- * A does the xchg first, and so it sees C or NULL depending
- * on the timing of the interrupt handler. If it is NULL, the
- * previous fence must have been signaled and we know that
- * we are first on the timeline. If it is still present,
- * we acquire the lock on that fence and serialise with the interrupt
- * handler, in the process removing it from any future interrupt
- * callback. A will then wait on C before executing (if present).
- *
- * As B is second, it sees A as the previous fence and so waits for
- * it to complete its transition and takes over the occupancy for
- * itself -- remembering that it needs to wait on A before executing.
+ * Both A and B have got a reference to C or NULL, depending on the
+ * timing of the interrupt handler. Let's assume that if A has got C
+ * then it has locked C first (before B).
*
* Note the strong ordering of the timeline also provides consistent
* nesting rules for the fence->lock; the inner lock is always the
* older lock.
*/
spin_lock_irqsave(fence->lock, flags);
- prev = xchg(__active_fence_slot(active), fence);
- if (prev) {
- GEM_BUG_ON(prev == fence);
+ if (prev)
spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
+
+ /*
+ * A does the cmpxchg first, and so it sees C or NULL, as before, or
+ * something else, depending on the timing of other threads and/or
+ * interrupt handler. If not the same as before then A unlocks C if
+ * applicable and retries, starting from an attempt to get a new
+ * active->fence. Meanwhile, B follows the same path as A.
+ * Once A succeeds with cmpxch, B fails again, retires, gets A from
+ * active->fence, locks it as soon as A completes, and possibly
+ * succeeds with cmpxchg.
+ */
+ while (cmpxchg(__active_fence_slot(active), prev, fence) != prev) {
+ if (prev) {
+ spin_unlock(prev->lock);
+ dma_fence_put(prev);
+ }
+ spin_unlock_irqrestore(fence->lock, flags);
+
+ prev = i915_active_fence_get(active);
+ GEM_BUG_ON(prev == fence);
+
+ spin_lock_irqsave(fence->lock, flags);
+ if (prev)
+ spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
+ }
+
+ /*
+ * If prev is NULL then the previous fence must have been signaled
+ * and we know that we are first on the timeline. If it is still
+ * present then, having the lock on that fence already acquired, we
+ * serialise with the interrupt handler, in the process of removing it
+ * from any future interrupt callback. A will then wait on C before
+ * executing (if present).
+ *
+ * As B is second, it sees A as the previous fence and so waits for
+ * it to complete its transition and takes over the occupancy for
+ * itself -- remembering that it needs to wait on A before executing.
+ */
+ if (prev) {
__list_del_entry(&active->cb.node);
spin_unlock(prev->lock); /* serialise with prev->cb_list */
}
@@ -1077,11 +1122,7 @@ int i915_active_fence_set(struct i915_active_fence *active,
int err = 0;
/* Must maintain timeline ordering wrt previous active requests */
- rcu_read_lock();
fence = __i915_active_fence_set(active, &rq->fence);
- if (fence) /* but the previous fence may not belong to that timeline! */
- fence = dma_fence_get_rcu(fence);
- rcu_read_unlock();
if (fence) {
err = i915_request_await_dma_fence(rq, fence);
dma_fence_put(fence);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 894068bb37b6..833b73edefdb 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1661,6 +1661,11 @@ __i915_request_ensure_parallel_ordering(struct i915_request *rq,
request_to_parent(rq)->parallel.last_rq = i915_request_get(rq);
+ /*
+ * Users have to put a reference potentially got by
+ * __i915_active_fence_set() to the returned request
+ * when no longer needed
+ */
return to_request(__i915_active_fence_set(&timeline->last_request,
&rq->fence));
}
@@ -1707,6 +1712,10 @@ __i915_request_ensure_ordering(struct i915_request *rq,
0);
}
+ /*
+ * Users have to put the reference to prev potentially got
+ * by __i915_active_fence_set() when no longer needed
+ */
return prev;
}
@@ -1760,6 +1769,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
prev = __i915_request_ensure_ordering(rq, timeline);
else
prev = __i915_request_ensure_parallel_ordering(rq, timeline);
+ if (prev)
+ i915_request_put(prev);
/*
* Make sure that no request gazumped us - if it was allocated after
From: Joel Fernandes <joel(a)joelfernandes.org>
[ Upstream commit d52d3a2bf408ff86f3a79560b5cce80efb340239 ]
During shutdown of rcutorture, the shutdown thread in
rcu_torture_cleanup() calls torture_cleanup_begin() which sets fullstop
to FULLSTOP_RMMOD. This is enough to cause the rcutorture threads for
readers and fakewriters to breakout of their main while loop and start
shutting down.
Once out of their main loop, they then call torture_kthread_stopping(),
which in turn waits for kthread_stop() to be called. However,
rcu_torture_cleanup() has not even called kthread_stop() on those
threads yet; it does that a bit later. Before it gets a chance to do so,
torture_kthread_stopping() calls schedule_timeout_interruptible(1) in a
tight loop. Tracing confirmed this makes the timer softirq constantly
execute timer callbacks, never returning to the softirq exit path, so it
is essentially "locked up" because of that. If the softirq preempts the
shutdown thread, kthread_stop() may never be called.
This commit improves the situation dramatically by increasing the
timeout passed to schedule_timeout_interruptible() to 1/20th of a
second. This stops the timer softirq from locking up a CPU and
everything works fine.
Testing has shown 100 runs of TREE07 passing reliably, which was not the
case before because of RCU stalls.
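[Editorial note: not part of the original posting. A quick stand-alone illustration of how much longer each wakeup sleeps after this change; HZ/20 jiffies is roughly 50 ms for common HZ settings, versus a single jiffy before.]
#include <stdio.h>

int main(void)
{
	/* Illustration only: the sleep grows from 1 jiffy to HZ/20 jiffies. */
	const int hz_values[] = { 100, 250, 1000 };

	for (unsigned int i = 0; i < sizeof(hz_values) / sizeof(hz_values[0]); i++) {
		int hz = hz_values[i];

		printf("HZ=%4d: old wait %6.1f ms/loop, new wait %6.1f ms/loop\n",
		       hz, 1000.0 / hz, (hz / 20) * (1000.0 / hz));
	}
	return 0;
}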
Cc: Paul McKenney <paulmck(a)kernel.org>
Cc: Frederic Weisbecker <fweisbec(a)gmail.com>
Cc: Zhouyi Zhou <zhouzhouyi(a)gmail.com>
Cc: <stable(a)vger.kernel.org> # 6.0.x
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
Reviewed-by: Davidlohr Bueso <dave(a)stgolabs.net>
Tested-by: Zhouyi Zhou <zhouzhouyi(a)gmail.com>
---
kernel/torture.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/torture.c b/kernel/torture.c
index 1061492f14bd..477d9b601438 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -788,7 +788,7 @@ void torture_kthread_stopping(char *title)
VERBOSE_TOROUT_STRING(buf);
while (!kthread_should_stop()) {
torture_shutdown_absorb(title);
- schedule_timeout_uninterruptible(1);
+ schedule_timeout_uninterruptible(HZ/20);
}
}
EXPORT_SYMBOL_GPL(torture_kthread_stopping);
--
2.41.0.640.ga95def55d0-goog
These two are backports for 6.1.y. Conflict resolution was done in
both patches.
I have tested the LTP-nfs fchown02 and chown02 tests on 6.1.y with the
patches below applied. The tests passed.
I would like to have a review as I am not familiar with this code.
Thanks to Vegard for helping me with this.
Thanks,
Harshit
Christian Brauner (2):
nfs: use vfs setgid helper
nfsd: use vfs setgid helper
fs/attr.c | 1 +
fs/internal.h | 2 --
fs/nfs/inode.c | 4 +---
fs/nfsd/vfs.c | 4 +++-
include/linux/fs.h | 2 ++
5 files changed, 7 insertions(+), 6 deletions(-)
--
2.34.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x f9e96bf1905479f18e83a3a4c314a8dfa56ede2c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082736-canal-swimsuit-6b7b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
f9e96bf19054 ("drm/vmwgfx: Fix possible invalid drm gem put calls")
9ef8d83e8e25 ("drm/vmwgfx: Do not drop the reference to the handle too soon")
668b206601c5 ("drm/vmwgfx: Stop using raw ttm_buffer_object's")
39985eea5a6d ("drm/vmwgfx: Abstract placement selection")
e0029da927fa ("drm/vmwgfx: Rename dummy to is_iomem")
cb8097a45da1 ("drm/vmwgfx: Cleanup the vmw bo usage in the cursor paths")
6703e28f976d ("drm/vmwgfx: Simplify fb pinning")
09881d2940bb ("drm/vmwgfx: Rename vmw_buffer_object to vmw_bo")
6b2e8aa45126 ("drm/vmwgfx: Remove the duplicate bo_free function")
f87c1f0b7b79 ("drm/ttm: prevent moving of pinned BOs")
aebd8f0c6f82 ("Merge v6.2-rc6 into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f9e96bf1905479f18e83a3a4c314a8dfa56ede2c Mon Sep 17 00:00:00 2001
From: Zack Rusin <zackr(a)vmware.com>
Date: Fri, 18 Aug 2023 00:13:01 -0400
Subject: [PATCH] drm/vmwgfx: Fix possible invalid drm gem put calls
vmw_bo_unreference() sets the input pointer to NULL on exit, resulting
in NULL pointer dereferences on the subsequent drm gem put calls.
This went unnoticed because only very old userspace would be exercising
those paths, but it wouldn't be hard to hit on old distros with
brand-new kernels.
Introduce a new function that abstracts unrefing of user BOs to make
the code cleaner and more explicit.
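[Editorial note: not part of the quoted patch. A self-contained userspace sketch of the bug pattern described above: an unref helper that NULLs the caller's pointer, followed by a later use of that pointer. The struct and function names are illustrative, not the vmwgfx ones.]
#include <stdio.h>
#include <stdlib.h>

struct obj { int payload; };

/* Mimics vmw_bo_unreference(): drops the object and NULLs the caller's pointer. */
static void obj_unreference(struct obj **p)
{
	free(*p);
	*p = NULL;
}

int main(void)
{
	struct obj *o = malloc(sizeof(*o));

	o->payload = 42;
	obj_unreference(&o);
	/* Equivalent of the old drm_gem_object_put(&vbo->tbo.base) call:
	 * 'o' is already NULL here, so this dereference crashes.
	 */
	printf("%d\n", o->payload);
	return 0;
}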
Signed-off-by: Zack Rusin <zackr(a)vmware.com>
Reported-by: Ian Forbes <iforbes(a)vmware.com>
Fixes: 9ef8d83e8e25 ("drm/vmwgfx: Do not drop the reference to the handle too soon")
Cc: <stable(a)vger.kernel.org> # v6.4+
Reviewed-by: Maaz Mombasawala <mombasawalam(a)vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230818041301.407636-1-zack@…
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index 82094c137855..c43853597776 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -497,10 +497,9 @@ static int vmw_user_bo_synccpu_release(struct drm_file *filp,
if (!(flags & drm_vmw_synccpu_allow_cs)) {
atomic_dec(&vmw_bo->cpu_writers);
}
- ttm_bo_put(&vmw_bo->tbo);
+ vmw_user_bo_unref(vmw_bo);
}
- drm_gem_object_put(&vmw_bo->tbo.base);
return ret;
}
@@ -540,8 +539,7 @@ int vmw_user_bo_synccpu_ioctl(struct drm_device *dev, void *data,
return ret;
ret = vmw_user_bo_synccpu_grab(vbo, arg->flags);
- vmw_bo_unreference(&vbo);
- drm_gem_object_put(&vbo->tbo.base);
+ vmw_user_bo_unref(vbo);
if (unlikely(ret != 0)) {
if (ret == -ERESTARTSYS || ret == -EBUSY)
return -EBUSY;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
index 50a836e70994..1d433fceed3d 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
@@ -195,6 +195,14 @@ static inline struct vmw_bo *vmw_bo_reference(struct vmw_bo *buf)
return buf;
}
+static inline void vmw_user_bo_unref(struct vmw_bo *vbo)
+{
+ if (vbo) {
+ ttm_bo_put(&vbo->tbo);
+ drm_gem_object_put(&vbo->tbo.base);
+ }
+}
+
static inline struct vmw_bo *to_vmw_bo(struct drm_gem_object *gobj)
{
return container_of((gobj), struct vmw_bo, tbo.base);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index d30c0e3d3ab7..98e0723ca6f5 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -1164,8 +1164,7 @@ static int vmw_translate_mob_ptr(struct vmw_private *dev_priv,
}
vmw_bo_placement_set(vmw_bo, VMW_BO_DOMAIN_MOB, VMW_BO_DOMAIN_MOB);
ret = vmw_validation_add_bo(sw_context->ctx, vmw_bo);
- ttm_bo_put(&vmw_bo->tbo);
- drm_gem_object_put(&vmw_bo->tbo.base);
+ vmw_user_bo_unref(vmw_bo);
if (unlikely(ret != 0))
return ret;
@@ -1221,8 +1220,7 @@ static int vmw_translate_guest_ptr(struct vmw_private *dev_priv,
vmw_bo_placement_set(vmw_bo, VMW_BO_DOMAIN_GMR | VMW_BO_DOMAIN_VRAM,
VMW_BO_DOMAIN_GMR | VMW_BO_DOMAIN_VRAM);
ret = vmw_validation_add_bo(sw_context->ctx, vmw_bo);
- ttm_bo_put(&vmw_bo->tbo);
- drm_gem_object_put(&vmw_bo->tbo.base);
+ vmw_user_bo_unref(vmw_bo);
if (unlikely(ret != 0))
return ret;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index b62207be3363..1489ad73c103 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -1665,10 +1665,8 @@ static struct drm_framebuffer *vmw_kms_fb_create(struct drm_device *dev,
err_out:
/* vmw_user_lookup_handle takes one ref so does new_fb */
- if (bo) {
- vmw_bo_unreference(&bo);
- drm_gem_object_put(&bo->tbo.base);
- }
+ if (bo)
+ vmw_user_bo_unref(bo);
if (surface)
vmw_surface_unreference(&surface);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c b/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c
index 7e112319a23c..fb85f244c3d0 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c
@@ -451,8 +451,7 @@ int vmw_overlay_ioctl(struct drm_device *dev, void *data,
ret = vmw_overlay_update_stream(dev_priv, buf, arg, true);
- vmw_bo_unreference(&buf);
- drm_gem_object_put(&buf->tbo.base);
+ vmw_user_bo_unref(buf);
out_unlock:
mutex_unlock(&overlay->mutex);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c b/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
index e7226db8b242..1e81ff2422cf 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
@@ -809,8 +809,7 @@ static int vmw_shader_define(struct drm_device *dev, struct drm_file *file_priv,
shader_type, num_input_sig,
num_output_sig, tfile, shader_handle);
out_bad_arg:
- vmw_bo_unreference(&buffer);
- drm_gem_object_put(&buffer->tbo.base);
+ vmw_user_bo_unref(buffer);
return ret;
}
With commit 44b1fbc0f5f3 ("m68k/q40: Replace q40ide driver
with pata_falcon and falconide"), the Q40 IDE driver was
replaced by pata_falcon.c.
Both IO and memory resources were defined for the Q40 IDE
platform device, but definition of the IDE register addresses
was modeled after the Falcon case, both in use of the memory
resources and in including register shift and byte vs. word
offset in the address.
This was correct for the Falcon case, which does not apply
any address translation to the register addresses. In the
Q40 case, the device base address, byte access offset
and register shift are all included in the platform-specific
ISA access translation (in asm/mm_io.h).
As a consequence, such address translation gets applied
twice, and register addresses are mangled.
Use the device base address from the platform IO resource
for Q40 (the IO address translation will then add the correct
ISA window base address and byte access offset), with register
shift 1. Use MMIO base address and register shift 2 as before
for Falcon.
Encode PIO_OFFSET into IO port addresses for all registers
for Q40 except the data transfer register. Encode the MMIO
offset there (pata_falcon_data_xfer() directly uses raw IO
with no address translation).
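[Editorial note: not part of the original posting. A small stand-alone sketch of the register address arithmetic described above, base + io_offset + (reg << reg_shift), using the Falcon and Q40 parameters from the quoted diff. The ISA window translation on Q40 is applied later by the platform IO accessors and is not modelled here.]
#include <stdio.h>

/* register address = base + io_offset + (reg << reg_shift) */
static unsigned long reg_addr(unsigned long base, unsigned long io_offset,
			      unsigned int reg, unsigned int reg_shift)
{
	return base + io_offset + ((unsigned long)reg << reg_shift);
}

int main(void)
{
	/* Falcon: byte offset 1, registers spaced 4 bytes apart (shift 2). */
	printf("Falcon cmd reg: base + 0x%lx\n", reg_addr(0, 1, 7, 2));
	/* Q40 (per the diff): IO resource, offset 0x10000, no register shift. */
	printf("Q40 cmd reg:    base + 0x%lx\n", reg_addr(0, 0x10000, 7, 0));
	return 0;
}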
Reported-by: William R Sowerbutts <will(a)sowerbutts.com>
Closes: https://lore.kernel.org/r/CAMuHMdUU62jjunJh9cqSqHT87B0H0A4udOOPs=WN7WZKpcag…
Link: https://lore.kernel.org/r/CAMuHMdUU62jjunJh9cqSqHT87B0H0A4udOOPs=WN7WZKpcag…
Fixes: 44b1fbc0f5f3 ("m68k/q40: Replace q40ide driver with pata_falcon and falconide")
Cc: stable(a)vger.kernel.org
Cc: Finn Thain <fthain(a)linux-m68k.org>
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
Tested-by: William R Sowerbutts <will(a)sowerbutts.com>
Signed-off-by: Michael Schmitz <schmitzmic(a)gmail.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov(a)omp.ru>
Reviewed-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
---
Changes from v4:
Geert Uytterhoeven:
- use %px for ap->ioaddr.data_addr
Changes from v3:
Sergey Shtylyov:
- change use of reg_scale to reg_shift
Geert Uytterhoeven:
- factor out ata_port_desc() from platform specific code
Changes from v2:
Finn Thain:
- add back stable Cc:
Changes from v1:
Damien Le Moal:
- change patch title
- drop stable backport tag
Changes from RFC v3:
- split off byte swap option into separate patch
Geert Uytterhoeven:
- review comments
Changes from RFC v2:
- add driver parameter 'data_swap' as bit mask for drives to swap
Changes from RFC v1:
Finn Thain:
- take care to supply IO address suitable for ioread8/iowrite8
- use MMIO address for data transfer
---
drivers/ata/pata_falcon.c | 50 +++++++++++++++++++++++----------------
1 file changed, 29 insertions(+), 21 deletions(-)
diff --git a/drivers/ata/pata_falcon.c b/drivers/ata/pata_falcon.c
index 996516e64f13..616064b02de6 100644
--- a/drivers/ata/pata_falcon.c
+++ b/drivers/ata/pata_falcon.c
@@ -123,8 +123,8 @@ static int __init pata_falcon_init_one(struct platform_device *pdev)
struct resource *base_res, *ctl_res, *irq_res;
struct ata_host *host;
struct ata_port *ap;
- void __iomem *base;
- int irq = 0;
+ void __iomem *base, *ctl_base;
+ int irq = 0, io_offset = 1, reg_shift = 2; /* Falcon defaults */
dev_info(&pdev->dev, "Atari Falcon and Q40/Q60 PATA controller\n");
@@ -165,26 +165,34 @@ static int __init pata_falcon_init_one(struct platform_device *pdev)
ap->pio_mask = ATA_PIO4;
ap->flags |= ATA_FLAG_SLAVE_POSS | ATA_FLAG_NO_IORDY;
- base = (void __iomem *)base_mem_res->start;
/* N.B. this assumes data_addr will be used for word-sized I/O only */
- ap->ioaddr.data_addr = base + 0 + 0 * 4;
- ap->ioaddr.error_addr = base + 1 + 1 * 4;
- ap->ioaddr.feature_addr = base + 1 + 1 * 4;
- ap->ioaddr.nsect_addr = base + 1 + 2 * 4;
- ap->ioaddr.lbal_addr = base + 1 + 3 * 4;
- ap->ioaddr.lbam_addr = base + 1 + 4 * 4;
- ap->ioaddr.lbah_addr = base + 1 + 5 * 4;
- ap->ioaddr.device_addr = base + 1 + 6 * 4;
- ap->ioaddr.status_addr = base + 1 + 7 * 4;
- ap->ioaddr.command_addr = base + 1 + 7 * 4;
-
- base = (void __iomem *)ctl_mem_res->start;
- ap->ioaddr.altstatus_addr = base + 1;
- ap->ioaddr.ctl_addr = base + 1;
-
- ata_port_desc(ap, "cmd 0x%lx ctl 0x%lx",
- (unsigned long)base_mem_res->start,
- (unsigned long)ctl_mem_res->start);
+ ap->ioaddr.data_addr = (void __iomem *)base_mem_res->start;
+
+ if (base_res) { /* only Q40 has IO resources */
+ io_offset = 0x10000;
+ reg_shift = 0;
+ base = (void __iomem *)base_res->start;
+ ctl_base = (void __iomem *)ctl_res->start;
+ } else {
+ base = (void __iomem *)base_mem_res->start;
+ ctl_base = (void __iomem *)ctl_mem_res->start;
+ }
+
+ ap->ioaddr.error_addr = base + io_offset + (1 << reg_shift);
+ ap->ioaddr.feature_addr = base + io_offset + (1 << reg_shift);
+ ap->ioaddr.nsect_addr = base + io_offset + (2 << reg_shift);
+ ap->ioaddr.lbal_addr = base + io_offset + (3 << reg_shift);
+ ap->ioaddr.lbam_addr = base + io_offset + (4 << reg_shift);
+ ap->ioaddr.lbah_addr = base + io_offset + (5 << reg_shift);
+ ap->ioaddr.device_addr = base + io_offset + (6 << reg_shift);
+ ap->ioaddr.status_addr = base + io_offset + (7 << reg_shift);
+ ap->ioaddr.command_addr = base + io_offset + (7 << reg_shift);
+
+ ap->ioaddr.altstatus_addr = ctl_base + io_offset;
+ ap->ioaddr.ctl_addr = ctl_base + io_offset;
+
+ ata_port_desc(ap, "cmd %px ctl %px data %px",
+ base, ctl_base, ap->ioaddr.data_addr);
irq_res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
if (irq_res && irq_res->start > 0) {
--
2.17.1
Since commit
a855724dc08b ("pinctrl: amd: Fix mistake in handling clearing pins at startup"),
pressing the power button after boot is detected just once and ignored
afterwards.
product: IdeaPad 5 14ALC05
cpu: AMD Ryzen 5 5500U with Radeon Graphics
bios version: G5CN16WW(V1.04)
distro: Arch Linux
desktop environment: KDE Plasma 5.27.7
steps to reproduce:
boot the computer
log in
run sudo evtest
select event2
(on my computer the power button is always represented by
/dev/input/event2,
I don't know if it's the same on others)
press the power button multiple times
(might have to close the log out dialog depending on the DE)
expected behavior:
all the power button presses are recorded
observed behavior:
only the first power button press is recorded
I also have a desktop computer with a Ryzen 5 2600X processor, but that
isn't affected.
#regzbot introduced: a855724dc08b
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 2f406263e3e954aa24c1248edcfa9be0c1bb30fa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082620-overact-feisty-c309@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2f406263e3e954aa24c1248edcfa9be0c1bb30fa Mon Sep 17 00:00:00 2001
From: Yin Fengwei <fengwei.yin(a)intel.com>
Date: Tue, 8 Aug 2023 10:09:15 +0800
Subject: [PATCH] madvise:madvise_cold_or_pageout_pte_range(): don't use
mapcount() against large folio for sharing check
Patch series "don't use mapcount() to check large folio sharing", v2.
In madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(),
folio_mapcount() is used to check whether the folio is shared. But that's
not correct, as folio_mapcount() returns the total mapcount of a large folio.
Use folio_estimated_sharers() here, as the estimated number is enough.
This patchset will fix the cases:
User space application call madvise() with MADV_FREE, MADV_COLD and
MADV_PAGEOUT for specific address range. There are THP mapped to the
range. Without the patchset, the THP is skipped. With the patch, the
THP will be split and handled accordingly.
David reported the cow self test skip some cases because of MADV_PAGEOUT
skip THP:
https://lore.kernel.org/linux-mm/9e92e42d-488f-47db-ac9d-75b24cd0d037@intel…
and I confirmed this patchset make it work again.
This patch (of 3):
Commit 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range()
to use folios") replaced the page_mapcount() with folio_mapcount() to
check whether the folio is shared by other mapping.
That's not correct for a large folio: folio_mapcount() returns the total
mapcount of the large folio, which is not suitable for detecting whether
the folio is shared.
Use folio_estimated_sharers(), which returns an estimated number of sharers.
That means it's not 100% correct, but it should be OK for the madvise case here.
The user-visible effect is that the THP is skipped when the user calls madvise,
but the correct behavior is that the THP should be split and processed instead.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
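[Editorial note: not part of the quoted patch. A simplified stand-alone model, not kernel code, of why a total mapcount over-reports sharing for a PTE-mapped THP owned by a single process, while sampling a single page's mapcount, which is how folio_estimated_sharers() is understood to work, does not. The struct and numbers are illustrative assumptions.]
#include <stdio.h>

struct folio_model {
	int nr_pages;
	int per_page_mapcount;	/* assume uniform mapping for the sketch */
};

static int total_mapcount(const struct folio_model *f)
{
	return f->nr_pages * f->per_page_mapcount;
}

static int estimated_sharers(const struct folio_model *f)
{
	return f->per_page_mapcount;	/* sample the first page only */
}

int main(void)
{
	/* One 2M THP, PTE-mapped exactly once by a single process. */
	struct folio_model thp = { .nr_pages = 512, .per_page_mapcount = 1 };

	printf("total mapcount:    %d (!= 1, wrongly looks shared)\n",
	       total_mapcount(&thp));
	printf("estimated sharers: %d (treated as exclusive)\n",
	       estimated_sharers(&thp));
	return 0;
}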
Link: https://lkml.kernel.org/r/20230808020917.2230692-1-fengwei.yin@intel.com
Link: https://lkml.kernel.org/r/20230808020917.2230692-2-fengwei.yin@intel.com
Fixes: 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios")
Signed-off-by: Yin Fengwei <fengwei.yin(a)intel.com>
Reviewed-by: Yu Zhao <yuzhao(a)google.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola(a)gmail.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/madvise.c b/mm/madvise.c
index bfe0e06427bd..46802b4cf65a 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -384,7 +384,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
folio = pfn_folio(pmd_pfn(orig_pmd));
/* Do not interfere with other mappings of this folio */
- if (folio_mapcount(folio) != 1)
+ if (folio_estimated_sharers(folio) != 1)
goto huge_unlock;
if (pageout_anon_only_filter && !folio_test_anon(folio))
@@ -458,7 +458,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
if (folio_test_large(folio)) {
int err;
- if (folio_mapcount(folio) != 1)
+ if (folio_estimated_sharers(folio) != 1)
break;
if (pageout_anon_only_filter && !folio_test_anon(folio))
break;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x cfeb6ae8bcb96ccf674724f223661bbcef7b0d0b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082644-dimmed-purse-07c2@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
cfeb6ae8bcb9 ("maple_tree: disable mas_wr_append() when other readers are possible")
2e1da329b424 ("maple_tree: add comments and some minor cleanups to mas_wr_append()")
c6fc9e4a5c50 ("maple_tree: add mas_wr_new_end() to calculate new_end accurately")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cfeb6ae8bcb96ccf674724f223661bbcef7b0d0b Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Date: Fri, 18 Aug 2023 20:43:55 -0400
Subject: [PATCH] maple_tree: disable mas_wr_append() when other readers are
possible
The current implementation of append may cause duplicate data and/or
incorrect ranges to be returned to a reader during an update. Although
this has not been reported or seen, disable the append write operation
while the tree is in rcu mode out of an abundance of caution.
During the analysis of mas_next_slot(), the following race was
artificially created by separating the writer and reader code:
Writer:                                 reader:
mas_wr_append
    set end pivot
    updates end metadata
    Detects write to last slot
    last slot write is to start of slot
    store current contents in slot
    overwrite old end pivot
                                        mas_next_slot():
                                                read end metadata
                                                read old end pivot
                                                return with incorrect range
    store new value
Alternatively:
Writer:                                 reader:
mas_wr_append
    set end pivot
    updates end metadata
    Detects write to last slot
    last slot write is to end of slot
    store value
                                        mas_next_slot():
                                                read end metadata
                                                read old end pivot
                                                read new end pivot
                                                return with incorrect range
    set old end pivot
There may be other accesses that are not safe since we are now updating
both metadata and pointers, so disabling append if there could be rcu
readers is the safest action.
Link: https://lkml.kernel.org/r/20230819004356.1454718-2-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 4dd73cf936a6..f723024e1426 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -4265,6 +4265,10 @@ static inline unsigned char mas_wr_new_end(struct ma_wr_state *wr_mas)
* mas_wr_append: Attempt to append
* @wr_mas: the maple write state
*
+ * This is currently unsafe in rcu mode since the end of the node may be cached
+ * by readers while the node contents may be updated which could result in
+ * inaccurate information.
+ *
* Return: True if appended, false otherwise
*/
static inline bool mas_wr_append(struct ma_wr_state *wr_mas)
@@ -4274,6 +4278,9 @@ static inline bool mas_wr_append(struct ma_wr_state *wr_mas)
struct ma_state *mas = wr_mas->mas;
unsigned char node_pivots = mt_pivots[wr_mas->type];
+ if (mt_in_rcu(mas->tree))
+ return false;
+
if (mas->offset != wr_mas->node_end)
return false;
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x cfeb6ae8bcb96ccf674724f223661bbcef7b0d0b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082642-energetic-playpen-0bea@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cfeb6ae8bcb96ccf674724f223661bbcef7b0d0b Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Date: Fri, 18 Aug 2023 20:43:55 -0400
Subject: [PATCH] maple_tree: disable mas_wr_append() when other readers are
possible
The current implementation of append may cause duplicate data and/or
incorrect ranges to be returned to a reader during an update. Although
this has not been reported or seen, disable the append write operation
while the tree is in rcu mode out of an abundance of caution.
During the analysis of mas_next_slot(), the following race was
artificially created by separating the writer and reader code:
Writer:                                 reader:
mas_wr_append
    set end pivot
    updates end metadata
    Detects write to last slot
    last slot write is to start of slot
    store current contents in slot
    overwrite old end pivot
                                        mas_next_slot():
                                                read end metadata
                                                read old end pivot
                                                return with incorrect range
    store new value
Alternatively:
Writer:                                 reader:
mas_wr_append
    set end pivot
    updates end metadata
    Detects write to last slot
    last slot write is to end of slot
    store value
                                        mas_next_slot():
                                                read end metadata
                                                read old end pivot
                                                read new end pivot
                                                return with incorrect range
    set old end pivot
There may be other accesses that are not safe since we are now updating
both metadata and pointers, so disabling append if there could be rcu
readers is the safest action.
Link: https://lkml.kernel.org/r/20230819004356.1454718-2-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 4dd73cf936a6..f723024e1426 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -4265,6 +4265,10 @@ static inline unsigned char mas_wr_new_end(struct ma_wr_state *wr_mas)
* mas_wr_append: Attempt to append
* @wr_mas: the maple write state
*
+ * This is currently unsafe in rcu mode since the end of the node may be cached
+ * by readers while the node contents may be updated which could result in
+ * inaccurate information.
+ *
* Return: True if appended, false otherwise
*/
static inline bool mas_wr_append(struct ma_wr_state *wr_mas)
@@ -4274,6 +4278,9 @@ static inline bool mas_wr_append(struct ma_wr_state *wr_mas)
struct ma_state *mas = wr_mas->mas;
unsigned char node_pivots = mt_pivots[wr_mas->type];
+ if (mt_in_rcu(mas->tree))
+ return false;
+
if (mas->offset != wr_mas->node_end)
return false;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 987aae75fc1041072941ffb622b45ce2359a99b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082645-unicycle-diaphragm-272c@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
987aae75fc10 ("batman-adv: Hold rtnl lock during MTU update via netlink")
3e15b06eb7e4 ("batman-adv: Add fragmentation mesh genl configuration")
a1c8de803296 ("batman-adv: Add distributed_arp_table mesh genl configuration")
43ff6105a527 ("batman-adv: Add bridge_loop_avoidance mesh genl configuration")
d7e52506b680 ("batman-adv: Add bonding mesh genl configuration")
e43d16b87dc2 ("batman-adv: Add ap_isolation mesh/vlan genl configuration")
9ab4cee5ced9 ("batman-adv: Add aggregated_ogms mesh genl configuration")
49e7e37cd981 ("batman-adv: Prepare framework for vlan genl config")
5c55a40fa801 ("batman-adv: Prepare framework for hardif genl config")
600405135360 ("batman-adv: Prepare framework for mesh genl config")
c4a7a8d9bb8f ("batman-adv: Move common genl doit code pre/post hooks")
fb69be697916 ("batman-adv: Add inconsistent hardif netlink dump detection")
53dd9a68ba68 ("batman-adv: add multicast flags netlink support")
41aeefcc38a2 ("batman-adv: add DAT cache netlink support")
fec149f5d323 ("batman-adv: Convert packet.h to uapi header")
7e9a8c2ce7c5 ("batman-adv: Use parentheses in function kernel-doc")
7db7d9f369a4 ("batman-adv: Add SPDX license identifier above copyright header")
40b16b9be577 ("batman-adv: use inline kernel-doc for uapi constants")
706cc9f51d9a ("batman-adv: Add argument names for function ptr definitions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 987aae75fc1041072941ffb622b45ce2359a99b9 Mon Sep 17 00:00:00 2001
From: Sven Eckelmann <sven(a)narfation.org>
Date: Mon, 21 Aug 2023 21:48:48 +0200
Subject: [PATCH] batman-adv: Hold rtnl lock during MTU update via netlink
The automatic recalculation of the maximum allowed MTU is usually triggered
by code sections which are already rtnl lock protected by callers outside
of batman-adv. But when the fragmentation setting is changed via
batman-adv's own batadv genl family, then the rtnl lock is not yet taken.
But dev_set_mtu requires that the caller holds the rtnl lock because it
uses netdevice notifiers. And this code will then fail the check for this
lock:
RTNL: assertion failed at net/core/dev.c (1953)
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+f8812454d9b3ac00d282(a)syzkaller.appspotmail.com
Fixes: c6a953cce8d0 ("batman-adv: Trigger events for auto adjusted MTU")
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://lore.kernel.org/r/20230821-batadv-missing-mtu-rtnl-lock-v1-1-1c5a7b…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/batman-adv/netlink.c b/net/batman-adv/netlink.c
index ad5714f737be..6efbc9275aec 100644
--- a/net/batman-adv/netlink.c
+++ b/net/batman-adv/netlink.c
@@ -495,7 +495,10 @@ static int batadv_netlink_set_mesh(struct sk_buff *skb, struct genl_info *info)
attr = info->attrs[BATADV_ATTR_FRAGMENTATION_ENABLED];
atomic_set(&bat_priv->fragmentation, !!nla_get_u8(attr));
+
+ rtnl_lock();
batadv_update_min_mtu(bat_priv->soft_iface);
+ rtnl_unlock();
}
if (info->attrs[BATADV_ATTR_GW_BANDWIDTH_DOWN]) {
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 987aae75fc1041072941ffb622b45ce2359a99b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082644-placard-bullfight-dbc3@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
987aae75fc10 ("batman-adv: Hold rtnl lock during MTU update via netlink")
3e15b06eb7e4 ("batman-adv: Add fragmentation mesh genl configuration")
a1c8de803296 ("batman-adv: Add distributed_arp_table mesh genl configuration")
43ff6105a527 ("batman-adv: Add bridge_loop_avoidance mesh genl configuration")
d7e52506b680 ("batman-adv: Add bonding mesh genl configuration")
e43d16b87dc2 ("batman-adv: Add ap_isolation mesh/vlan genl configuration")
9ab4cee5ced9 ("batman-adv: Add aggregated_ogms mesh genl configuration")
49e7e37cd981 ("batman-adv: Prepare framework for vlan genl config")
5c55a40fa801 ("batman-adv: Prepare framework for hardif genl config")
600405135360 ("batman-adv: Prepare framework for mesh genl config")
c4a7a8d9bb8f ("batman-adv: Move common genl doit code pre/post hooks")
fb69be697916 ("batman-adv: Add inconsistent hardif netlink dump detection")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 987aae75fc1041072941ffb622b45ce2359a99b9 Mon Sep 17 00:00:00 2001
From: Sven Eckelmann <sven(a)narfation.org>
Date: Mon, 21 Aug 2023 21:48:48 +0200
Subject: [PATCH] batman-adv: Hold rtnl lock during MTU update via netlink
The automatic recalculation of the maximum allowed MTU is usually triggered
by code sections which are already rtnl lock protected by callers outside
of batman-adv. But when the fragmentation setting is changed via
batman-adv's own batadv genl family, then the rtnl lock is not yet taken.
But dev_set_mtu requires that the caller holds the rtnl lock because it
uses netdevice notifiers. And this code will then fail the check for this
lock:
RTNL: assertion failed at net/core/dev.c (1953)
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+f8812454d9b3ac00d282(a)syzkaller.appspotmail.com
Fixes: c6a953cce8d0 ("batman-adv: Trigger events for auto adjusted MTU")
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://lore.kernel.org/r/20230821-batadv-missing-mtu-rtnl-lock-v1-1-1c5a7b…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/batman-adv/netlink.c b/net/batman-adv/netlink.c
index ad5714f737be..6efbc9275aec 100644
--- a/net/batman-adv/netlink.c
+++ b/net/batman-adv/netlink.c
@@ -495,7 +495,10 @@ static int batadv_netlink_set_mesh(struct sk_buff *skb, struct genl_info *info)
attr = info->attrs[BATADV_ATTR_FRAGMENTATION_ENABLED];
atomic_set(&bat_priv->fragmentation, !!nla_get_u8(attr));
+
+ rtnl_lock();
batadv_update_min_mtu(bat_priv->soft_iface);
+ rtnl_unlock();
}
if (info->attrs[BATADV_ATTR_GW_BANDWIDTH_DOWN]) {
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x d8e42a2b0addf238be8b3b37dcd9795a5c1be459
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082656-sadly-harness-6601@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
d8e42a2b0add ("batman-adv: Don't increase MTU when set by user")
c6a953cce8d0 ("batman-adv: Trigger events for auto adjusted MTU")
8b84cc4fb556 ("batman-adv: Use inline kernel-doc for enum/struct")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d8e42a2b0addf238be8b3b37dcd9795a5c1be459 Mon Sep 17 00:00:00 2001
From: Sven Eckelmann <sven(a)narfation.org>
Date: Wed, 19 Jul 2023 10:01:15 +0200
Subject: [PATCH] batman-adv: Don't increase MTU when set by user
If the user set an MTU value, it usually means that there are special
requirements for the MTU. But when an interface got activated, the MTU was
always recalculated and the user-set value was overwritten.
The only reason why this user-set value has to be overwritten is when the
MTU has to be decreased because batman-adv is not able to transfer packets
of the user-specified size.
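[Editorial note: not part of the quoted patch. A stand-alone sketch mirroring the min(mtu, limit_mtu) clamping added below; the helper name and example values are illustrative.]
#include <stdio.h>

#define ETH_DATA_LEN 1500

/* Never raise the MTU above what the user configured (or ETH_DATA_LEN if
 * no MTU was set), but still lower it when the mesh requires it.
 */
static int clamp_mtu(int hardif_min_mtu, int mtu_set_by_user)
{
	int limit_mtu = mtu_set_by_user ? mtu_set_by_user : ETH_DATA_LEN;

	return hardif_min_mtu < limit_mtu ? hardif_min_mtu : limit_mtu;
}

int main(void)
{
	printf("%d\n", clamp_mtu(1500, 1400)); /* user set 1400 -> stays 1400 */
	printf("%d\n", clamp_mtu(1280, 1400)); /* mesh needs 1280 -> lowered  */
	printf("%d\n", clamp_mtu(1528, 0));    /* no user MTU -> capped at 1500 */
	return 0;
}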
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
Signed-off-by: Simon Wunderlich <sw(a)simonwunderlich.de>
diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
index ae5762af0146..24c9c0c3f316 100644
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -630,7 +630,19 @@ int batadv_hardif_min_mtu(struct net_device *soft_iface)
*/
void batadv_update_min_mtu(struct net_device *soft_iface)
{
- dev_set_mtu(soft_iface, batadv_hardif_min_mtu(soft_iface));
+ struct batadv_priv *bat_priv = netdev_priv(soft_iface);
+ int limit_mtu;
+ int mtu;
+
+ mtu = batadv_hardif_min_mtu(soft_iface);
+
+ if (bat_priv->mtu_set_by_user)
+ limit_mtu = bat_priv->mtu_set_by_user;
+ else
+ limit_mtu = ETH_DATA_LEN;
+
+ mtu = min(mtu, limit_mtu);
+ dev_set_mtu(soft_iface, mtu);
/* Check if the local translate table should be cleaned up to match a
* new (and smaller) MTU.
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index d3fdf82282af..85d00dc9ce32 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -153,11 +153,14 @@ static int batadv_interface_set_mac_addr(struct net_device *dev, void *p)
static int batadv_interface_change_mtu(struct net_device *dev, int new_mtu)
{
+ struct batadv_priv *bat_priv = netdev_priv(dev);
+
/* check ranges */
if (new_mtu < 68 || new_mtu > batadv_hardif_min_mtu(dev))
return -EINVAL;
dev->mtu = new_mtu;
+ bat_priv->mtu_set_by_user = new_mtu;
return 0;
}
diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
index ca9449ec9836..cf1a0eafe3ab 100644
--- a/net/batman-adv/types.h
+++ b/net/batman-adv/types.h
@@ -1546,6 +1546,12 @@ struct batadv_priv {
/** @soft_iface: net device which holds this struct as private data */
struct net_device *soft_iface;
+ /**
+ * @mtu_set_by_user: MTU was set once by user
+ * protected by rtnl_lock
+ */
+ int mtu_set_by_user;
+
/**
* @bat_counters: mesh internal traffic statistic counters (see
* batadv_counters)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x e2c1ab070fdc81010ec44634838d24fce9ff9e53
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082631-preacher-spousal-e0e5@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
e2c1ab070fdc ("mm: memory-failure: fix unexpected return value in soft_offline_page()")
7adb45887c8a ("mm: memory-failure: kill soft_offline_free_page()")
2a57d83c78f8 ("mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e2c1ab070fdc81010ec44634838d24fce9ff9e53 Mon Sep 17 00:00:00 2001
From: Miaohe Lin <linmiaohe(a)huawei.com>
Date: Tue, 27 Jun 2023 19:28:08 +0800
Subject: [PATCH] mm: memory-failure: fix unexpected return value in
soft_offline_page()
When page_handle_poison() fails to handle the hugepage or free page in
the retry path, soft_offline_page() will return 0 while -EBUSY is expected
in this case.
Consequently the user will think soft_offline_page() succeeded while it in
fact failed, and so will not try again later.
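[Editorial note: not part of the quoted patch. A simplified stand-alone model of the retry path in the scenario where page_handle_poison() fails both times: before the fix the function still returns 0, after the fix it returns -EBUSY.]
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Model only: assume both page_handle_poison() attempts fail. */
static int soft_offline_model(bool fixed)
{
	bool try_again = true;
	int ret = 0;

retry:
	if (try_again) {
		try_again = false;
		goto retry;
	}
	if (fixed)
		ret = -EBUSY;
	return ret;
}

int main(void)
{
	printf("before the fix: %d (caller believes the page was offlined)\n",
	       soft_offline_model(false));
	printf("after the fix:  %d (-EBUSY, caller knows to retry later)\n",
	       soft_offline_model(true));
	return 0;
}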
Link: https://lkml.kernel.org/r/20230627112808.1275241-1-linmiaohe@huawei.com
Fixes: b94e02822deb ("mm,hwpoison: try to narrow window race for free pages")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 139b31fdb678..fe121fdb05f7 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2741,10 +2741,13 @@ int soft_offline_page(unsigned long pfn, int flags)
if (ret > 0) {
ret = soft_offline_in_use_page(page);
} else if (ret == 0) {
- if (!page_handle_poison(page, true, false) && try_again) {
- try_again = false;
- flags &= ~MF_COUNT_INCREASED;
- goto retry;
+ if (!page_handle_poison(page, true, false)) {
+ if (try_again) {
+ try_again = false;
+ flags &= ~MF_COUNT_INCREASED;
+ goto retry;
+ }
+ ret = -EBUSY;
}
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x e2c1ab070fdc81010ec44634838d24fce9ff9e53
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082619-rocky-strobe-5ad9@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
e2c1ab070fdc ("mm: memory-failure: fix unexpected return value in soft_offline_page()")
7adb45887c8a ("mm: memory-failure: kill soft_offline_free_page()")
2a57d83c78f8 ("mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page()")
dad4e5b39086 ("mm: fix page reference leak in soft_offline_page()")
8295d535e2aa ("mm,hwpoison: refactor get_any_page")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e2c1ab070fdc81010ec44634838d24fce9ff9e53 Mon Sep 17 00:00:00 2001
From: Miaohe Lin <linmiaohe(a)huawei.com>
Date: Tue, 27 Jun 2023 19:28:08 +0800
Subject: [PATCH] mm: memory-failure: fix unexpected return value in
soft_offline_page()
When page_handle_poison() fails to handle the hugepage or free page in
the retry path, soft_offline_page() will return 0 while -EBUSY is expected
in this case.
Consequently the user will think soft_offline_page() succeeded while it in
fact failed, and so will not try again later.
Link: https://lkml.kernel.org/r/20230627112808.1275241-1-linmiaohe@huawei.com
Fixes: b94e02822deb ("mm,hwpoison: try to narrow window race for free pages")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 139b31fdb678..fe121fdb05f7 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2741,10 +2741,13 @@ int soft_offline_page(unsigned long pfn, int flags)
if (ret > 0) {
ret = soft_offline_in_use_page(page);
} else if (ret == 0) {
- if (!page_handle_poison(page, true, false) && try_again) {
- try_again = false;
- flags &= ~MF_COUNT_INCREASED;
- goto retry;
+ if (!page_handle_poison(page, true, false)) {
+ if (try_again) {
+ try_again = false;
+ flags &= ~MF_COUNT_INCREASED;
+ goto retry;
+ }
+ ret = -EBUSY;
}
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x d74943a2f3cdade34e471b36f55f7979be656867
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082639-crushed-impart-a42d@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
d74943a2f3cd ("mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT")
2c2241081f7d ("mm/gup: move private gup FOLL_ flags to internal.h")
63b605128655 ("mm/gup: move gup_must_unshare() to mm/internal.h")
f04740f54594 ("mm/gup: add FOLL_UNLOCKABLE")
d64e2dbc33a1 ("mm/gup: simplify the external interface functions and consolidate invariants")
afa3c33e2684 ("mm/gup: don't call __gup_longterm_locked() if FOLL_LONGTERM cannot be set")
b2a72dff85fa ("mm/gup: have internal functions get the mmap_read_lock()")
b5054174ac7c ("mm: move FOLL_* defs to mm_types.h")
8fa590bf3448 ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d74943a2f3cdade34e471b36f55f7979be656867 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Thu, 3 Aug 2023 16:32:02 +0200
Subject: [PATCH] mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
Unfortunately commit 474098edac26 ("mm/gup: replace FOLL_NUMA by
gup_can_follow_protnone()") missed that follow_page() and
follow_trans_huge_pmd() never implicitly set FOLL_NUMA because they really
don't want to fail on PROT_NONE-mapped pages -- either due to NUMA hinting
or due to inaccessible (PROT_NONE) VMAs.
As spelled out in commit 0b9d705297b2 ("mm: numa: Support NUMA hinting
page faults from gup/gup_fast"): "Other follow_page callers like KSM
should not use FOLL_NUMA, or they would fail to get the pages if they use
follow_page instead of get_user_pages."
liubo reported [1] that smaps_rollup results are imprecise, because they
miss accounting of pages that are mapped PROT_NONE. Further, it's easy to
reproduce that KSM no longer works on inaccessible VMAs on x86-64, because
pte_protnone()/pmd_protnone() also indicate "true" in inaccessible VMAs,
and follow_page() refuses to return such pages right now.
As KVM really depends on these NUMA hinting faults, removing the
pte_protnone()/pmd_protnone() handling in GUP code completely is not
really an option.
To fix the issues at hand, let's revive FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
to restore the original behavior for now and add better comments.
Set FOLL_HONOR_NUMA_FAULT independent of FOLL_FORCE in
is_valid_gup_args(), to add that flag for all external GUP users.
Note that there are three GUP-internal __get_user_pages() users that don't
end up calling is_valid_gup_args() and consequently won't get
FOLL_HONOR_NUMA_FAULT set.
1) get_dump_page(): we really don't want to handle NUMA hinting
faults. It specifies FOLL_FORCE and wouldn't have honored NUMA
hinting faults already.
2) populate_vma_page_range(): we really don't want to handle NUMA hinting
faults. It specifies FOLL_FORCE on accessible VMAs, so it wouldn't have
honored NUMA hinting faults already.
3) faultin_vma_page_range(): we similarly don't want to handle NUMA
hinting faults.
To make the combination of FOLL_FORCE and FOLL_HONOR_NUMA_FAULT work in
inaccessible VMAs properly, we have to perform VMA accessibility checks in
gup_can_follow_protnone().
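[Editorial note: not part of the quoted patch. A stand-alone model of the reworked gup_can_follow_protnone() decision as described above; the helper below only mirrors the two checks from the quoted diff.]
#include <stdbool.h>
#include <stdio.h>

/* A PROT_NONE-mapped page may be followed without a fault only if the
 * caller does not honor NUMA hinting faults, or if the VMA itself is
 * inaccessible (then the PROT_NONE protection cannot be a NUMA hint).
 */
static bool can_follow_protnone(bool honor_numa_fault, bool vma_accessible)
{
	if (!honor_numa_fault)
		return true;
	return !vma_accessible;
}

int main(void)
{
	printf("FOLL_HONOR_NUMA_FAULT clear, accessible VMA: %d\n",
	       can_follow_protnone(false, true));
	printf("FOLL_HONOR_NUMA_FAULT clear, PROT_NONE VMA:  %d\n",
	       can_follow_protnone(false, false));
	printf("FOLL_HONOR_NUMA_FAULT set,   accessible VMA: %d\n",
	       can_follow_protnone(true, true));	/* must fault: NUMA hint */
	printf("FOLL_HONOR_NUMA_FAULT set,   PROT_NONE VMA:  %d\n",
	       can_follow_protnone(true, false));	/* no fault: plain PROT_NONE */
	return 0;
}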
As GUP-fast should reject such pages either way in
pte_access_permitted()/pmd_access_permitted() -- for example on x86-64 and
arm64 that both implement pte_protnone() -- let's just always fallback to
ordinary GUP when stumbling over pte_protnone()/pmd_protnone().
As Linus notes [2], honoring NUMA faults might only make sense for
selected GUP users.
So we should really see if we can instead let relevant GUP callers specify
it manually, and not trigger NUMA hinting faults from GUP as default.
Prepare for that by making FOLL_HONOR_NUMA_FAULT an external GUP flag and
adding appropriate documentation.
While at it, remove a stale comment from follow_trans_huge_pmd(): That
comment for pmd_protnone() was added in commit 2b4847e73004 ("mm: numa:
serialise parallel get_user_page against THP migration"), which noted:
THP does not unmap pages due to a lack of support for migration
entries at a PMD level. This allows races with get_user_pages
Nowadays, we do have PMD migration entries, so the comment no longer
applies. Let's drop it.
[1] https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com
[2] https://lore.kernel.org/r/CAHk-=wgRiP_9X0rRdZKT8nhemZGNateMtb366t37d8-x7VRs…
Link: https://lkml.kernel.org/r/20230803143208.383663-2-david@redhat.com
Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: liubo <liubo254(a)huawei.com>
Closes: https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com
Reported-by: Peter Xu <peterx(a)redhat.com>
Closes: https://lore.kernel.org/all/ZMKJjDaqZ7FW0jfe@x1n/
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Acked-by: Peter Xu <peterx(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 406ab9ea818f..34f9dba17c1a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3421,15 +3421,24 @@ static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
* Indicates whether GUP can follow a PROT_NONE mapped page, or whether
* a (NUMA hinting) fault is required.
*/
-static inline bool gup_can_follow_protnone(unsigned int flags)
+static inline bool gup_can_follow_protnone(struct vm_area_struct *vma,
+ unsigned int flags)
{
/*
- * FOLL_FORCE has to be able to make progress even if the VMA is
- * inaccessible. Further, FOLL_FORCE access usually does not represent
- * application behaviour and we should avoid triggering NUMA hinting
- * faults.
+ * If callers don't want to honor NUMA hinting faults, no need to
+ * determine if we would actually have to trigger a NUMA hinting fault.
*/
- return flags & FOLL_FORCE;
+ if (!(flags & FOLL_HONOR_NUMA_FAULT))
+ return true;
+
+ /*
+ * NUMA hinting faults don't apply in inaccessible (PROT_NONE) VMAs.
+ *
+ * Requiring a fault here even for inaccessible VMAs would mean that
+ * FOLL_FORCE cannot make any progress, because handle_mm_fault()
+ * refuses to process NUMA hinting faults in inaccessible VMAs.
+ */
+ return !vma_is_accessible(vma);
}
typedef int (*pte_fn_t)(pte_t *pte, unsigned long addr, void *data);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5e74ce4a28cd..7d30dc4ff0ff 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1286,6 +1286,15 @@ enum {
FOLL_PCI_P2PDMA = 1 << 10,
/* allow interrupts from generic signals */
FOLL_INTERRUPTIBLE = 1 << 11,
+ /*
+ * Always honor (trigger) NUMA hinting faults.
+ *
+ * FOLL_WRITE implicitly honors NUMA hinting faults because a
+ * PROT_NONE-mapped page is not writable (exceptions with FOLL_FORCE
+ * apply). get_user_pages_fast_only() always implicitly honors NUMA
+ * hinting faults.
+ */
+ FOLL_HONOR_NUMA_FAULT = 1 << 12,
/* See also internal only FOLL flags in mm/internal.h */
};
diff --git a/mm/gup.c b/mm/gup.c
index 76d222ccc3ff..6e2f9e9d6537 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -597,7 +597,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
pte = ptep_get(ptep);
if (!pte_present(pte))
goto no_page;
- if (pte_protnone(pte) && !gup_can_follow_protnone(flags))
+ if (pte_protnone(pte) && !gup_can_follow_protnone(vma, flags))
goto no_page;
page = vm_normal_page(vma, address, pte);
@@ -714,7 +714,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
if (likely(!pmd_trans_huge(pmdval)))
return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
- if (pmd_protnone(pmdval) && !gup_can_follow_protnone(flags))
+ if (pmd_protnone(pmdval) && !gup_can_follow_protnone(vma, flags))
return no_page_table(vma, flags);
ptl = pmd_lock(mm, pmd);
@@ -851,6 +851,10 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
return NULL;
+ /*
+ * We never set FOLL_HONOR_NUMA_FAULT because callers don't expect
+ * to fail on PROT_NONE-mapped pages.
+ */
page = follow_page_mask(vma, address, foll_flags, &ctx);
if (ctx.pgmap)
put_dev_pagemap(ctx.pgmap);
@@ -2227,6 +2231,13 @@ static bool is_valid_gup_args(struct page **pages, int *locked,
gup_flags |= FOLL_UNLOCKABLE;
}
+ /*
+ * For now, always trigger NUMA hinting faults. Some GUP users like
+ * KVM require the hint to be as the calling context of GUP is
+ * functionally similar to a memory reference from task context.
+ */
+ gup_flags |= FOLL_HONOR_NUMA_FAULT;
+
/* FOLL_GET and FOLL_PIN are mutually exclusive. */
if (WARN_ON_ONCE((gup_flags & (FOLL_PIN | FOLL_GET)) ==
(FOLL_PIN | FOLL_GET)))
@@ -2551,7 +2562,14 @@ static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
struct page *page;
struct folio *folio;
- if (pte_protnone(pte) && !gup_can_follow_protnone(flags))
+ /*
+ * Always fallback to ordinary GUP on PROT_NONE-mapped pages:
+ * pte_access_permitted() better should reject these pages
+ * either way: otherwise, GUP-fast might succeed in
+ * cases where ordinary GUP would fail due to VMA access
+ * permissions.
+ */
+ if (pte_protnone(pte))
goto pte_unmap;
if (!pte_access_permitted(pte, flags & FOLL_WRITE))
@@ -2970,8 +2988,8 @@ static int gup_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr, unsigned lo
if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd) ||
pmd_devmap(pmd))) {
- if (pmd_protnone(pmd) &&
- !gup_can_follow_protnone(flags))
+ /* See gup_pte_range() */
+ if (pmd_protnone(pmd))
return 0;
if (!gup_huge_pmd(pmd, pmdp, addr, next, flags,
@@ -3151,7 +3169,7 @@ static int internal_get_user_pages_fast(unsigned long start,
if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
FOLL_FORCE | FOLL_PIN | FOLL_GET |
FOLL_FAST_ONLY | FOLL_NOFAULT |
- FOLL_PCI_P2PDMA)))
+ FOLL_PCI_P2PDMA | FOLL_HONOR_NUMA_FAULT)))
return -EINVAL;
if (gup_flags & FOLL_PIN)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index eb3678360b97..f15d557e5708 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1467,8 +1467,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
return ERR_PTR(-EFAULT);
- /* Full NUMA hinting faults to serialise migration in fault paths */
- if (pmd_protnone(*pmd) && !gup_can_follow_protnone(flags))
+ if (pmd_protnone(*pmd) && !gup_can_follow_protnone(vma, flags))
return NULL;
if (!pmd_write(*pmd) && gup_must_unshare(vma, flags, page))
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 1cbc11aaa01f80577b67ae02c73ee781112125fd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082613-lilac-overpower-64a2@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
1cbc11aaa01f ("NFSv4: Fix dropped lock for racing OPEN and delegation return")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1cbc11aaa01f80577b67ae02c73ee781112125fd Mon Sep 17 00:00:00 2001
From: Benjamin Coddington <bcodding(a)redhat.com>
Date: Fri, 30 Jun 2023 09:18:13 -0400
Subject: [PATCH] NFSv4: Fix dropped lock for racing OPEN and delegation return
Commit f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during delegation
return") attempted to solve this problem by using nfs4's generic async error
handling, but introduced a regression where v4.0 lock recovery would hang.
The additional complexity introduced by overloading that error handling is
not necessary for this case. This patch expects that commit to be
reverted.
The problem as originally explained in the above commit is:
There's a small window where a LOCK sent during a delegation return can
race with another OPEN on client, but the open stateid has not yet been
updated. In this case, the client doesn't handle the OLD_STATEID error
from the server and will lose this lock, emitting:
"NFS: nfs4_handle_delegation_recall_error: unhandled error -10024".
Fix this by using the old_stateid refresh helpers if the server replies
with OLD_STATEID.
Suggested-by: Trond Myklebust <trondmy(a)hammerspace.com>
Signed-off-by: Benjamin Coddington <bcodding(a)redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index e1a886b58354..4604e9f3d1b0 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7181,8 +7181,15 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
} else if (!nfs4_update_lock_stateid(lsp, &data->res.stateid))
goto out_restart;
break;
- case -NFS4ERR_BAD_STATEID:
case -NFS4ERR_OLD_STATEID:
+ if (data->arg.new_lock_owner != 0 &&
+ nfs4_refresh_open_old_stateid(&data->arg.open_stateid,
+ lsp->ls_state))
+ goto out_restart;
+ if (nfs4_refresh_lock_old_stateid(&data->arg.lock_stateid, lsp))
+ goto out_restart;
+ fallthrough;
+ case -NFS4ERR_BAD_STATEID:
case -NFS4ERR_STALE_STATEID:
case -NFS4ERR_EXPIRED:
if (data->arg.new_lock_owner != 0) {
Hi Sasha
I just saw that you queued up mm-disable-config_per_vma_lock-until-its-fixed.patch for 6.4.
The problems that this patch tried to prevent were fixed before it actually made it into a
release, and Linus un-did the commit in his merge (at the bottom):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/m…
Since the fixes for PER_VMA_LOCK have been in 6.4 releases for a while, this patch
should not go in.
thanks
Holger
Take the vCPU's mmu_seq snapshot as an "unsigned long" instead of an "int"
when checking to see if a page fault is stale, as the sequence count is
stored as an "unsigned long" everywhere else in KVM. This fixes a bug
where KVM will effectively hang vCPUs due to always thinking page faults
are stale, which results in KVM refusing to "fix" faults.
mmu_invalidate_seq (née mmu_notifier_seq) is a sequence counter used when
KVM is handling page faults to detect if userspace mappings relevant to
the guest were invalidated between snapshotting the counter and acquiring
mmu_lock, i.e. to ensure that the userspace mapping KVM is using to
resolve the page fault is fresh. If KVM sees that the counter has
changed, KVM simply resumes the guest without fixing the fault.
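(For illustration only: the check described above boils down to the pattern in
the toy userspace sketch below, with stand-in names and control flow; this is
not the actual KVM code.)

#include <stdbool.h>
#include <stdio.h>

static unsigned long mmu_invalidate_seq;	/* bumped for every relevant invalidation */

/* Simulates the window between snapshotting the counter and taking mmu_lock. */
static void resolve_mapping(bool concurrent_invalidation)
{
	if (concurrent_invalidation)
		mmu_invalidate_seq++;
}

static void handle_page_fault(bool concurrent_invalidation)
{
	unsigned long snapshot = mmu_invalidate_seq;	/* snapshot before resolving the pfn */

	resolve_mapping(concurrent_invalidation);

	/* In KVM this comparison happens under mmu_lock. */
	if (mmu_invalidate_seq != snapshot) {
		printf("stale: resume the guest and let it retry the fault\n");
		return;
	}
	printf("fresh: install the translation\n");
}

int main(void)
{
	handle_page_fault(false);	/* counter stable  -> fault is fixed */
	handle_page_fault(true);	/* counter changed -> fault is retried, as intended */
	return 0;
}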
What _should_ happen is that the source of the mmu_notifier invalidations
eventually goes away, mmu_invalidate_seq becomes stable, and KVM can once
again fix guest page fault(s).
But for a long-lived VM and/or a VM that the host just doesn't particularly
like, it's possible for a VM to be on the receiving end of 2 billion (with
a B) mmu_notifier invalidations. When that happens, bit 31 will be set in
mmu_invalidate_seq. This causes the value to be turned into a 32-bit
negative value when implicitly cast to an "int" by is_page_fault_stale(),
and then sign-extended into a 64-bit unsigned when the signed "int" is
implicitly cast back to an "unsigned long" on the call to
mmu_invalidate_retry_hva().
As a result of the casting and sign-extension, given a sequence counter of
e.g. 0x8002dc25, mmu_invalidate_retry_hva() ends up doing
if (0x8002dc25 != 0xffffffff8002dc25)
and signals that the page fault is stale and needs to be retried even
though the sequence counter is stable, and KVM effectively hangs any vCPU
that takes a page fault (EPT violation or #NPF when TDP is enabled).
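(The implicit conversions are easy to reproduce in isolation. The sketch below
uses hypothetical userspace stand-ins for the two functions and assumes a
64-bit unsigned long, as on x86-64; it is not the KVM code itself.)

#include <stdio.h>

/* Stand-in for mmu_invalidate_retry_hva(): both sides are unsigned long. */
static int seq_changed(unsigned long cur, unsigned long snapshot)
{
	return cur != snapshot;
}

/*
 * Stand-in for the buggy is_page_fault_stale(): the snapshot arrives as an
 * "int", so a counter with bit 31 set turns negative here and is then
 * sign-extended back to unsigned long in the call below.
 */
static int is_stale(unsigned long cur, int mmu_seq)
{
	return seq_changed(cur, mmu_seq);
}

int main(void)
{
	unsigned long seq = 0x8002dc25;	/* ~2 billion invalidations later */

	/*
	 * Prints "stale=1": the comparison is 0x8002dc25 != 0xffffffff8002dc25,
	 * so every fault looks stale even though the counter never changed.
	 */
	printf("stale=%d\n", is_stale(seq, (int)seq));
	return 0;
}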
Note, upstream commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq
in kvm_faultin_pfn()") unknowingly fixed the bug in v6.3 when refactoring
how KVM tracks the sequence counter snapshot.
Reported-by: Brian Rak <brak(a)vultr.com>
Reported-by: Amaan Cheval <amaan.cheval(a)gmail.com>
Reported-by: Eric Wheeler <kvm(a)lists.ewheeler.net>
Closes: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameserver…
Fixes: a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
---
arch/x86/kvm/mmu/mmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 230108a90cf3..beca03556379 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4212,7 +4212,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
* root was invalidated by a memslot update or a relevant mmu_notifier fired.
*/
static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
- struct kvm_page_fault *fault, int mmu_seq)
+ struct kvm_page_fault *fault,
+ unsigned long mmu_seq)
{
struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
base-commit: 802aacbbffe2512dce9f8f33ad99d01cfec435de
--
2.42.0.rc2.253.gd59a3bf2b4-goog
Upstream commit edbdb43fc96b11b3bfa531be306a1993d9fe89ec.
Preserve TDP MMU roots until they are explicitly invalidated by gifting
the TDP MMU itself a reference to a root when it is allocated. Keeping a
reference in the TDP MMU fixes a flaw where the TDP MMU exhibits terrible
performance, and can potentially even soft-hang a vCPU, if a vCPU
frequently unloads its roots, e.g. when KVM is emulating SMI+RSM.
When KVM emulates something that invalidates _all_ TLB entries, e.g. SMI
and RSM, KVM unloads all of the vCPU's roots (KVM keeps a small per-vCPU
cache of previous roots). Unloading roots is a simple way to ensure KVM
flushes and synchronizes all roots for the vCPU, as KVM flushes and syncs
when allocating a "new" root (from the vCPU's perspective).
In the shadow MMU, KVM keeps track of all shadow pages, roots included, in
a per-VM hash table. Unloading a shadow MMU root just wipes it from the
per-vCPU cache; the root is still tracked in the per-VM hash table. When
KVM loads a "new" root for the vCPU, KVM will find the old, unloaded root
in the per-VM hash table.
Unlike the shadow MMU, the TDP MMU doesn't track "inactive" roots in a
per-VM structure, where "active" in this case means a root is either
in-use or cached as a previous root by at least one vCPU. When a TDP MMU
root becomes inactive, i.e. the last vCPU reference to the root is put,
KVM immediately frees the root (asterisk on "immediately" as the actual
freeing may be done by a worker, but for all intents and purposes the root
is gone).
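(Illustratively, the fix amounts to the difference sketched below: a toy
refcount with plain ints standing in for refcount_t and struct kvm_mmu_page,
not the actual KVM code.)

#include <stdbool.h>
#include <stdio.h>

struct root {
	int refcount;
};

/* Before the fix: only the vCPU holds a reference at allocation time. */
static void root_alloc_old(struct root *r) { r->refcount = 1; }

/*
 * After the fix: the TDP MMU gifts itself a second reference, put only when
 * the root is explicitly invalidated (memslot update or VM destruction).
 */
static void root_alloc_new(struct root *r) { r->refcount = 2; }

/* Returns true when the last reference is dropped and the root must be freed. */
static bool root_put(struct root *r) { return --r->refcount == 0; }

int main(void)
{
	struct root old_root, new_root;

	root_alloc_old(&old_root);
	root_alloc_new(&new_root);

	/* A lone vCPU unloads its root, e.g. while emulating SMI+RSM. */
	printf("old behaviour: root freed = %d\n", root_put(&old_root)); /* 1: root is gone */
	printf("new behaviour: root freed = %d\n", root_put(&new_root)); /* 0: root preserved */
	return 0;
}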
The TDP MMU behavior is especially problematic for 1-vCPU setups, as
unloading all roots effectively frees all roots. The issue is mitigated
to some degree in multi-vCPU setups as a different vCPU usually holds a
reference to an unloaded root and thus keeps the root alive, allowing the
vCPU to reuse its old root after unloading (with a flush+sync).
The TDP MMU flaw has been known for some time, as until very recently,
KVM's handling of CR0.WP also triggered unloading of all roots. The
CR0.WP toggling scenario was eventually addressed by not unloading roots
when _only_ CR0.WP is toggled, but such an approach doesn't Just Work
for emulating SMM as KVM must emulate a full TLB flush on entry and exit
to/from SMM. Given that the shadow MMU plays nice with unloading roots
at will, teaching the TDP MMU to do the same is far less complex than
modifying KVM to track which roots need to be flushed before reuse.
Note, preserving all possible TDP MMU roots is not a concern with respect
to memory consumption. Now that the role for direct MMUs doesn't include
information about the guest, e.g. CR0.PG, CR0.WP, CR4.SMEP, etc., there
are _at most_ six possible roots (where "guest_mode" here means L2):
1. 4-level !SMM !guest_mode
2. 4-level SMM !guest_mode
3. 5-level !SMM !guest_mode
4. 5-level SMM !guest_mode
5. 4-level !SMM guest_mode
6. 5-level !SMM guest_mode
And because each vCPU can track 4 valid roots, a VM can already have all
6 root combinations live at any given time. Not to mention that, in
practice, no sane VMM will advertise different guest.MAXPHYADDR values
across vCPUs, i.e. KVM won't ever use both 4-level and 5-level roots for
a single VM. Furthermore, the vast majority of modern hypervisors will
utilize EPT/NPT when available, thus the guest_mode=%true cases are also
unlikely to be utilized.
[6.1 backport notes: conflicts with
09732d2b4dc5 ("KVM: x86/mmu: Move TDP MMU VM init/uninit behind tdp_mmu_enabled")
1f98f2bd8ec4 ("KVM: x86/mmu: Change tdp_mmu to a read-only parameter")
de0322f575be ("KVM: x86/mmu: Replace open coded usage of tdp_mmu_page with is_tdp_mmu_page()")
prevented a clean cherry-pick. The first two were resolved by keeping 6.1's
check on kvm->arch.tdp_mmu_enabled; the last one was resolved by taking the
upstream change, i.e. by opportunistically switching to is_tdp_mmu_page()]
Reported-by: Jeremi Piotrowski <jpiotrowski(a)linux.microsoft.com>
Link: https://lore.kernel.org/all/959c5bce-beb5-b463-7158-33fc4a4f910c@linux.micr…
Link: https://lkml.kernel.org/r/20220209170020.1775368-1-pbonzini%40redhat.com
Link: https://lore.kernel.org/all/20230322013731.102955-1-minipli@grsecurity.net
Link: https://lore.kernel.org/all/000000000000a0bc2b05f9dd7fab@google.com
Link: https://lore.kernel.org/all/000000000000eca0b905fa0f7756@google.com
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230426220323.3079789-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
---
arch/x86/kvm/mmu/tdp_mmu.c | 121 +++++++++++++++++--------------------
1 file changed, 56 insertions(+), 65 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 672f0432d777..70945f00ec41 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -51,7 +51,17 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
if (!kvm->arch.tdp_mmu_enabled)
return;
- /* Also waits for any queued work items. */
+ /*
+ * Invalidate all roots, which besides the obvious, schedules all roots
+ * for zapping and thus puts the TDP MMU's reference to each root, i.e.
+ * ultimately frees all roots.
+ */
+ kvm_tdp_mmu_invalidate_all_roots(kvm);
+
+ /*
+ * Destroying a workqueue also first flushes the workqueue, i.e. no
+ * need to invoke kvm_tdp_mmu_zap_invalidated_roots().
+ */
destroy_workqueue(kvm->arch.tdp_mmu_zap_wq);
WARN_ON(!list_empty(&kvm->arch.tdp_mmu_pages));
@@ -127,16 +137,6 @@ static void tdp_mmu_schedule_zap_root(struct kvm *kvm, struct kvm_mmu_page *root
queue_work(kvm->arch.tdp_mmu_zap_wq, &root->tdp_mmu_async_work);
}
-static inline bool kvm_tdp_root_mark_invalid(struct kvm_mmu_page *page)
-{
- union kvm_mmu_page_role role = page->role;
- role.invalid = true;
-
- /* No need to use cmpxchg, only the invalid bit can change. */
- role.word = xchg(&page->role.word, role.word);
- return role.invalid;
-}
-
void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
bool shared)
{
@@ -145,45 +145,12 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
return;
- WARN_ON(!root->tdp_mmu_page);
-
/*
- * The root now has refcount=0. It is valid, but readers already
- * cannot acquire a reference to it because kvm_tdp_mmu_get_root()
- * rejects it. This remains true for the rest of the execution
- * of this function, because readers visit valid roots only
- * (except for tdp_mmu_zap_root_work(), which however
- * does not acquire any reference itself).
- *
- * Even though there are flows that need to visit all roots for
- * correctness, they all take mmu_lock for write, so they cannot yet
- * run concurrently. The same is true after kvm_tdp_root_mark_invalid,
- * since the root still has refcount=0.
- *
- * However, tdp_mmu_zap_root can yield, and writers do not expect to
- * see refcount=0 (see for example kvm_tdp_mmu_invalidate_all_roots()).
- * So the root temporarily gets an extra reference, going to refcount=1
- * while staying invalid. Readers still cannot acquire any reference;
- * but writers are now allowed to run if tdp_mmu_zap_root yields and
- * they might take an extra reference if they themselves yield.
- * Therefore, when the reference is given back by the worker,
- * there is no guarantee that the refcount is still 1. If not, whoever
- * puts the last reference will free the page, but they will not have to
- * zap the root because a root cannot go from invalid to valid.
+ * The TDP MMU itself holds a reference to each root until the root is
+ * explicitly invalidated, i.e. the final reference should never be
+ * put for a valid root.
*/
- if (!kvm_tdp_root_mark_invalid(root)) {
- refcount_set(&root->tdp_mmu_root_count, 1);
-
- /*
- * Zapping the root in a worker is not just "nice to have";
- * it is required because kvm_tdp_mmu_invalidate_all_roots()
- * skips already-invalid roots. If kvm_tdp_mmu_put_root() did
- * not add the root to the workqueue, kvm_tdp_mmu_zap_all_fast()
- * might return with some roots not zapped yet.
- */
- tdp_mmu_schedule_zap_root(kvm, root);
- return;
- }
+ KVM_BUG_ON(!is_tdp_mmu_page(root) || !root->role.invalid, kvm);
spin_lock(&kvm->arch.tdp_mmu_pages_lock);
list_del_rcu(&root->link);
@@ -329,7 +296,14 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
root = tdp_mmu_alloc_sp(vcpu);
tdp_mmu_init_sp(root, NULL, 0, role);
- refcount_set(&root->tdp_mmu_root_count, 1);
+ /*
+ * TDP MMU roots are kept until they are explicitly invalidated, either
+ * by a memslot update or by the destruction of the VM. Initialize the
+ * refcount to two; one reference for the vCPU, and one reference for
+ * the TDP MMU itself, which is held until the root is invalidated and
+ * is ultimately put by tdp_mmu_zap_root_work().
+ */
+ refcount_set(&root->tdp_mmu_root_count, 2);
spin_lock(&kvm->arch.tdp_mmu_pages_lock);
list_add_rcu(&root->link, &kvm->arch.tdp_mmu_roots);
@@ -1027,32 +1001,49 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
/*
* Mark each TDP MMU root as invalid to prevent vCPUs from reusing a root that
* is about to be zapped, e.g. in response to a memslots update. The actual
- * zapping is performed asynchronously, so a reference is taken on all roots.
- * Using a separate workqueue makes it easy to ensure that the destruction is
- * performed before the "fast zap" completes, without keeping a separate list
- * of invalidated roots; the list is effectively the list of work items in
- * the workqueue.
+ * zapping is performed asynchronously. Using a separate workqueue makes it
+ * easy to ensure that the destruction is performed before the "fast zap"
+ * completes, without keeping a separate list of invalidated roots; the list is
+ * effectively the list of work items in the workqueue.
*
- * Get a reference even if the root is already invalid, the asynchronous worker
- * assumes it was gifted a reference to the root it processes. Because mmu_lock
- * is held for write, it should be impossible to observe a root with zero refcount,
- * i.e. the list of roots cannot be stale.
- *
- * This has essentially the same effect for the TDP MMU
- * as updating mmu_valid_gen does for the shadow MMU.
+ * Note, the asynchronous worker is gifted the TDP MMU's reference.
+ * See kvm_tdp_mmu_get_vcpu_root_hpa().
*/
void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)
{
struct kvm_mmu_page *root;
- lockdep_assert_held_write(&kvm->mmu_lock);
- list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
- if (!root->role.invalid &&
- !WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root))) {
+ /*
+ * mmu_lock must be held for write to ensure that a root doesn't become
+ * invalid while there are active readers (invalidating a root while
+ * there are active readers may or may not be problematic in practice,
+ * but it's uncharted territory and not supported).
+ *
+ * Waive the assertion if there are no users of @kvm, i.e. the VM is
+ * being destroyed after all references have been put, or if no vCPUs
+ * have been created (which means there are no roots), i.e. the VM is
+ * being destroyed in an error path of KVM_CREATE_VM.
+ */
+ if (IS_ENABLED(CONFIG_PROVE_LOCKING) &&
+ refcount_read(&kvm->users_count) && kvm->created_vcpus)
+ lockdep_assert_held_write(&kvm->mmu_lock);
+
+ /*
+ * As above, mmu_lock isn't held when destroying the VM! There can't
+ * be other references to @kvm, i.e. nothing else can invalidate roots
+ * or be consuming roots, but walking the list of roots does need to be
+ * guarded against roots being deleted by the asynchronous zap worker.
+ */
+ rcu_read_lock();
+
+ list_for_each_entry_rcu(root, &kvm->arch.tdp_mmu_roots, link) {
+ if (!root->role.invalid) {
root->role.invalid = true;
tdp_mmu_schedule_zap_root(kvm, root);
}
}
+
+ rcu_read_unlock();
}
/*
base-commit: 802aacbbffe2512dce9f8f33ad99d01cfec435de
--
2.42.0.rc2.253.gd59a3bf2b4-goog