This is the start of the stable review cycle for the 5.15.129 release.
There are 89 patches in this series, and all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 30 Aug 2023 10:11:30 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.129-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.15.129-rc1
Rik van Riel <riel(a)surriel.com>
mm,ima,kexec,of: use memblock_free_late from ima_free_kexec_buffer
Miaohe Lin <linmiaohe(a)huawei.com>
mm: memory-failure: fix unexpected return value in soft_offline_page()
Kefeng Wang <wangkefeng.wang(a)huawei.com>
mm: memory-failure: kill soft_offline_free_page()
Rob Clark <robdclark(a)chromium.org>
dma-buf/sw_sync: Avoid recursive lock during fence signal
Biju Das <biju.das.jz(a)bp.renesas.com>
pinctrl: renesas: rza2: Add lock around pinctrl_generic{{add,remove}_group,{add,remove}_function}
Biju Das <biju.das.jz(a)bp.renesas.com>
clk: Fix undefined reference to `clk_rate_exclusive_{get,put}'
Zhu Wang <wangzhu9(a)huawei.com>
scsi: core: raid_class: Remove raid_component_add()
Zhu Wang <wangzhu9(a)huawei.com>
scsi: snic: Fix double free in snic_tgt_create()
Oliver Hartkopp <socketcan(a)hartkopp.net>
can: raw: add missing refcount for memory leak fix
Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>
drm/i915: Fix premature release of request's reusable memory
Dietmar Eggemann <dietmar.eggemann(a)arm.com>
cgroup/cpuset: Free DL BW in case can_attach() fails
Dietmar Eggemann <dietmar.eggemann(a)arm.com>
sched/deadline: Create DL BW alloc, free & check overflow interface
Juri Lelli <juri.lelli(a)redhat.com>
cgroup/cpuset: Iterate only if DEADLINE tasks are present
Juri Lelli <juri.lelli(a)redhat.com>
sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
Juri Lelli <juri.lelli(a)redhat.com>
sched/cpuset: Bring back cpuset_mutex
Juri Lelli <juri.lelli(a)redhat.com>
cgroup/cpuset: Rename functions dealing with DEADLINE accounting
Joel Fernandes (Google) <joel(a)joelfernandes.org>
torture: Fix hang during kthread shutdown phase
Christian Brauner <brauner(a)kernel.org>
nfsd: use vfs setgid helper
Christian Brauner <brauner(a)kernel.org>
nfs: use vfs setgid helper
Feng Tang <feng.tang(a)intel.com>
x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
Rick Edgecombe <rick.p.edgecombe(a)intel.com>
x86/fpu: Invalidate FPU state correctly on exec()
Ankit Nautiyal <ankit.k.nautiyal(a)intel.com>
drm/display/dp: Fix the DP DSC Receiver cap size
Zack Rusin <zackr(a)vmware.com>
drm/vmwgfx: Fix shader stage validation
Igor Mammedov <imammedo(a)redhat.com>
PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus
Wei Chen <harperchen1110(a)gmail.com>
media: vcodec: Fix potential array out-of-bounds in encoder queue_setup
Rob Herring <robh(a)kernel.org>
of: dynamic: Refactor action prints to not use "%pOF" inside devtree_lock
Rob Herring <robh(a)kernel.org>
of: unittest: Fix EXPECT for parse_phandle_with_args_map() test
Arnd Bergmann <arnd(a)arndb.de>
radix tree: remove unused variable
Helge Deller <deller(a)gmx.de>
lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Hold rtnl lock during MTU update via netlink
Remi Pommarel <repk(a)triplefau.lt>
batman-adv: Fix batadv_v_ogm_aggr_send memory leak
Remi Pommarel <repk(a)triplefau.lt>
batman-adv: Fix TT global entry leak when client roamed back
Remi Pommarel <repk(a)triplefau.lt>
batman-adv: Do not get eth header before batadv_check_management_packet
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Don't increase MTU when set by user
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Trigger events for auto adjusted MTU
Christian Göttsche <cgzones(a)googlemail.com>
selinux: set next pointer before attaching to list
Benjamin Coddington <bcodding(a)redhat.com>
nfsd: Fix race to FREE_STATEID and cl_revoked
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFS: Fix a use after free in nfs_direct_join_group()
Alexandre Ghiti <alexghiti(a)rivosinc.com>
mm: add a call to flush_cache_vmap() in vmap_pfn()
Takashi Iwai <tiwai(a)suse.de>
ALSA: ymfpci: Fix the missing snd_card_free() call at probe error
Andrey Skvortsov <andrej.skvortzov(a)gmail.com>
clk: Fix slab-out-of-bounds error in devm_clk_release()
Benjamin Coddington <bcodding(a)redhat.com>
NFSv4: Fix dropped lock for racing OPEN and delegation return
Michael Ellerman <mpe(a)ellerman.id.au>
ibmveth: Use dcbf rather than dcbfl
Sean Christopherson <seanjc(a)google.com>
Revert "KVM: x86: enable TDP MMU by default"
Ivan Mikhaylov <fr0st61te(a)gmail.com>
net/ncsi: change from ndo_set_mac_address to dev_set_mac_address
Ivan Mikhaylov <fr0st61te(a)gmail.com>
net/ncsi: make one oem_gma function for all mfr id
Hangbin Liu <liuhangbin(a)gmail.com>
bonding: fix macvlan over alb bond support
Jakub Kicinski <kuba(a)kernel.org>
net: remove bond_slave_has_mac_rcu()
Ido Schimmel <idosch(a)nvidia.com>
rtnetlink: Reject negative ifindexes in RTM_NEWLINK
Florent Fourcot <florent.fourcot(a)wifirst.fr>
rtnetlink: return ENODEV when ifname does not exist and group is given
Florian Westphal <fw(a)strlen.de>
netfilter: nf_tables: fix out of memory error handling
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nf_tables: flush pending destroy work before netlink notifier
Jamal Hadi Salim <jhs(a)mojatatu.com>
net/sched: fix a qdisc modification with ambiguous command request
Sasha Neftin <sasha.neftin(a)intel.com>
igc: Fix the typo in the PTM Control macro
Alessio Igor Bogani <alessio.bogani(a)elettra.eu>
igb: Avoid starting unnecessary workqueues
Jesse Brandeburg <jesse.brandeburg(a)intel.com>
ice: fix receive buffer size miscalculation
Jakub Kicinski <kuba(a)kernel.org>
net: validate veth and vxcan peer ifindexes
Ruan Jinjie <ruanjinjie(a)huawei.com>
net: bcmgenet: Fix return value check for fixed_phy_register()
Ruan Jinjie <ruanjinjie(a)huawei.com>
net: bgmac: Fix return value check for fixed_phy_register()
Lu Wei <luwei32(a)huawei.com>
ipvlan: Fix a reference count leak warning in ipvlan_ns_exit()
Eric Dumazet <edumazet(a)google.com>
dccp: annotate data-races in dccp_poll()
Eric Dumazet <edumazet(a)google.com>
sock: annotate data-races around prot->memory_pressure
Hariprasad Kelam <hkelam(a)marvell.com>
octeontx2-af: SDP: fix receive link config
Zheng Yejian <zhengyejian1(a)huawei.com>
tracing: Fix memleak due to race between current_tracer and trace
Zheng Yejian <zhengyejian1(a)huawei.com>
tracing: Fix cpu buffers unavailable due to 'record_disabled' missed
Eric Dumazet <edumazet(a)google.com>
can: raw: fix lockdep issue in raw_release()
Taimur Hassan <syed.hassan(a)amd.com>
drm/amd/display: check TG is non-null before checking if enabled
Josip Pavic <Josip.Pavic(a)amd.com>
drm/amd/display: do not wait for mpc idle if tg is disabled
Ziyang Xuan <william.xuanziyang(a)huawei.com>
can: raw: fix receiver memory leak
Zhang Yi <yi.zhang(a)huawei.com>
jbd2: fix a race when checking checkpoint buffer busy
Zhang Yi <yi.zhang(a)huawei.com>
jbd2: remove journal_clean_one_cp_list()
Zhang Yi <yi.zhang(a)huawei.com>
jbd2: remove t_checkpoint_io_list
Takashi Iwai <tiwai(a)suse.de>
ALSA: pcm: Fix potential data race at PCM memory allocation helpers
Zhang Shurong <zhang_shurong(a)foxmail.com>
fbdev: fix potential OOB read in fast_imageblit()
Thomas Zimmermann <tzimmermann(a)suse.de>
fbdev: Fix sys_imageblit() for arbitrary image widths
Thomas Zimmermann <tzimmermann(a)suse.de>
fbdev: Improve performance of sys_imageblit()
Jiaxun Yang <jiaxun.yang(a)flygoat.com>
MIPS: cpu-features: Use boot_cpu_type for CPU type based features
Jiaxun Yang <jiaxun.yang(a)flygoat.com>
MIPS: cpu-features: Enable octeon_cache by cpu_type
Alexander Aring <aahringo(a)redhat.com>
fs: dlm: fix mismatch of plock results from userspace
Alexander Aring <aahringo(a)redhat.com>
fs: dlm: use dlm_plock_info for do_unlock_close
Alexander Aring <aahringo(a)redhat.com>
fs: dlm: change plock interrupted message to debug again
Alexander Aring <aahringo(a)redhat.com>
fs: dlm: add pid to debug log
Jakob Koschel <jakobkoschel(a)gmail.com>
dlm: replace usage of found with dedicated list iterator variable
Alexander Aring <aahringo(a)redhat.com>
dlm: improve plock logging if interrupted
Igor Mammedov <imammedo(a)redhat.com>
PCI: acpiphp: Reassign resources on bridge if necessary
Chuck Lever <chuck.lever(a)oracle.com>
xprtrdma: Remap Receive buffers after a reconnect
Fedor Pchelkin <pchelkin(a)ispras.ru>
NFSv4: fix out path in __nfs4_get_acl_uncached
Fedor Pchelkin <pchelkin(a)ispras.ru>
NFSv4.2: fix error handling in nfs42_proc_getxattr
Peter Zijlstra <peterz(a)infradead.org>
objtool/x86: Fix SRSO mess
-------------
Diffstat:
Makefile | 4 +-
arch/mips/include/asm/cpu-features.h | 21 +-
arch/x86/include/asm/fpu/internal.h | 3 +-
arch/x86/kernel/fpu/core.c | 2 +-
arch/x86/kernel/fpu/xstate.c | 7 +
arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
drivers/clk/clk-devres.c | 13 +-
drivers/dma-buf/sw_sync.c | 18 +-
.../drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 4 +-
drivers/gpu/drm/i915/i915_active.c | 99 ++++++---
drivers/gpu/drm/i915/i915_request.c | 2 +
drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 12 ++
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 29 +--
drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c | 2 +
drivers/net/bonding/bond_alb.c | 6 +-
drivers/net/can/vxcan.c | 7 +-
drivers/net/ethernet/broadcom/bgmac.c | 2 +-
drivers/net/ethernet/broadcom/genet/bcmmii.c | 2 +-
drivers/net/ethernet/ibm/ibmveth.c | 2 +-
drivers/net/ethernet/intel/ice/ice_base.c | 3 +-
drivers/net/ethernet/intel/igb/igb_ptp.c | 24 +--
drivers/net/ethernet/intel/igc/igc_defines.h | 2 +-
.../net/ethernet/marvell/octeontx2/af/rvu_nix.c | 3 +-
drivers/net/ipvlan/ipvlan_main.c | 3 +-
drivers/net/veth.c | 5 +-
drivers/of/dynamic.c | 31 +--
drivers/of/kexec.c | 4 +-
drivers/of/unittest.c | 4 +-
drivers/pci/hotplug/acpiphp_glue.c | 9 +-
drivers/pinctrl/renesas/pinctrl-rza2.c | 17 +-
drivers/scsi/raid_class.c | 48 -----
drivers/scsi/snic/snic_disc.c | 3 +-
drivers/video/fbdev/core/sysimgblt.c | 64 +++++-
fs/attr.c | 1 +
fs/dlm/lock.c | 53 +++--
fs/dlm/plock.c | 89 +++++---
fs/dlm/recover.c | 39 ++--
fs/internal.h | 2 -
fs/jbd2/checkpoint.c | 165 ++++++---------
fs/jbd2/commit.c | 3 +-
fs/jbd2/transaction.c | 17 +-
fs/nfs/direct.c | 26 ++-
fs/nfs/inode.c | 4 +-
fs/nfs/nfs42proc.c | 5 +-
fs/nfs/nfs4proc.c | 14 +-
fs/nfsd/nfs4state.c | 2 +-
fs/nfsd/vfs.c | 4 +-
include/drm/drm_dp_helper.h | 2 +-
include/linux/clk.h | 80 +++----
include/linux/cpuset.h | 12 +-
include/linux/fs.h | 2 +
include/linux/jbd2.h | 7 +-
include/linux/raid_class.h | 4 -
include/linux/sched.h | 4 +-
include/net/bonding.h | 25 +--
include/net/rtnetlink.h | 4 +-
include/net/sock.h | 7 +-
include/trace/events/jbd2.h | 12 +-
kernel/cgroup/cgroup.c | 4 +
kernel/cgroup/cpuset.c | 232 ++++++++++++++-------
kernel/sched/core.c | 41 ++--
kernel/sched/deadline.c | 66 ++++--
kernel/sched/sched.h | 2 +-
kernel/torture.c | 2 +-
kernel/trace/trace.c | 15 +-
kernel/trace/trace_irqsoff.c | 3 +-
kernel/trace/trace_sched_wakeup.c | 2 +
lib/clz_ctz.c | 32 +--
lib/radix-tree.c | 1 -
mm/memory-failure.c | 21 +-
mm/vmalloc.c | 4 +
net/batman-adv/bat_v_elp.c | 3 +-
net/batman-adv/bat_v_ogm.c | 7 +-
net/batman-adv/hard-interface.c | 14 +-
net/batman-adv/netlink.c | 3 +
net/batman-adv/soft-interface.c | 3 +
net/batman-adv/translation-table.c | 1 -
net/batman-adv/types.h | 6 +
net/can/raw.c | 76 ++++---
net/core/rtnetlink.c | 43 +++-
net/dccp/proto.c | 20 +-
net/ncsi/ncsi-rsp.c | 93 ++-------
net/netfilter/nf_tables_api.c | 2 +-
net/netfilter/nft_set_pipapo.c | 13 +-
net/sched/sch_api.c | 53 +++--
net/sctp/socket.c | 2 +-
net/sunrpc/xprtrdma/verbs.c | 9 +-
security/selinux/ss/policydb.c | 2 +-
sound/core/pcm_memory.c | 44 +++-
sound/pci/ymfpci/ymfpci.c | 10 +-
tools/objtool/arch/x86/decode.c | 11 +-
tools/objtool/check.c | 22 +-
tools/objtool/include/objtool/arch.h | 1 +
tools/objtool/include/objtool/elf.h | 1 +
94 files changed, 1070 insertions(+), 834 deletions(-)
The following commit has been merged into the smp/urgent branch of tip:
Commit-ID: 2b8272ff4a70b866106ae13c36be7ecbef5d5da2
Gitweb: https://git.kernel.org/tip/2b8272ff4a70b866106ae13c36be7ecbef5d5da2
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Wed, 23 Aug 2023 10:47:02 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Wed, 30 Aug 2023 12:24:22 +02:00
cpu/hotplug: Prevent self deadlock on CPU hot-unplug
Xiongfeng reported and debugged a self deadlock of the task which initiates
and controls a CPU hot-unplug operation vs. the CFS bandwidth timer.
CPU1                                                  CPU2
T1 sets cfs_quota
   starts hrtimer cfs_bandwidth 'period_timer'
T1 is migrated to CPU2
                                                      T1 initiates offlining of CPU1
Hotplug operation starts
   ...
'period_timer' expires and is re-enqueued on CPU1
   ...
take_cpu_down()
   CPU1 shuts down and does not handle timers
   anymore. They have to be migrated in the
   post dead hotplug steps by the control task.

                                                      T1 runs the post dead offline operation
                                                      T1 is scheduled out
                                                      T1 waits for 'period_timer' to expire
T1 waits there forever if it is scheduled out before it can execute the hrtimer
offline callback hrtimers_dead_cpu().
Cure this by delegating the hotplug control operation to a worker thread on
an online CPU. This takes the initiating user space task, which might be
affected by the bandwidth timer, completely out of the picture.
Reported-by: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Yu Liao <liaoyu15(a)huawei.com>
Acked-by: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com
Link: https://lore.kernel.org/r/87h6oqdq0i.ffs@tglx
---
kernel/cpu.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index f6811c8..6de7c6b 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1487,8 +1487,22 @@ out:
return ret;
}
+struct cpu_down_work {
+ unsigned int cpu;
+ enum cpuhp_state target;
+};
+
+static long __cpu_down_maps_locked(void *arg)
+{
+ struct cpu_down_work *work = arg;
+
+ return _cpu_down(work->cpu, 0, work->target);
+}
+
static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
{
+ struct cpu_down_work work = { .cpu = cpu, .target = target, };
+
/*
* If the platform does not support hotplug, report it explicitly to
* differentiate it from a transient offlining failure.
@@ -1497,7 +1511,15 @@ static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
return -EOPNOTSUPP;
if (cpu_hotplug_disabled)
return -EBUSY;
- return _cpu_down(cpu, 0, target);
+
+ /*
+ * Ensure that the control task does not run on the to be offlined
+ * CPU to prevent a deadlock against cfs_b->period_timer.
+ */
+ cpu = cpumask_any_but(cpu_online_mask, cpu);
+ if (cpu >= nr_cpu_ids)
+ return -EBUSY;
+ return work_on_cpu(cpu, __cpu_down_maps_locked, &work);
}
static int cpu_down(unsigned int cpu, enum cpuhp_state target)
This is the start of the stable review cycle for the 4.14.324 release.
There are 57 patches in this series, and all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 30 Aug 2023 10:11:30 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.324-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.324-rc1
Rob Clark <robdclark(a)chromium.org>
dma-buf/sw_sync: Avoid recursive lock during fence signal
Zhu Wang <wangzhu9(a)huawei.com>
scsi: core: raid_class: Remove raid_component_add()
Zhu Wang <wangzhu9(a)huawei.com>
scsi: snic: Fix double free in snic_tgt_create()
Ido Schimmel <idosch(a)nvidia.com>
rtnetlink: Reject negative ifindexes in RTM_NEWLINK
Feng Tang <feng.tang(a)intel.com>
x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
Wei Chen <harperchen1110(a)gmail.com>
media: vcodec: Fix potential array out-of-bounds in encoder queue_setup
Helge Deller <deller(a)gmx.de>
lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels
Remi Pommarel <repk(a)triplefau.lt>
batman-adv: Fix batadv_v_ogm_aggr_send memory leak
Remi Pommarel <repk(a)triplefau.lt>
batman-adv: Fix TT global entry leak when client roamed back
Remi Pommarel <repk(a)triplefau.lt>
batman-adv: Do not get eth header before batadv_check_management_packet
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Trigger events for auto adjusted MTU
Michael Ellerman <mpe(a)ellerman.id.au>
ibmveth: Use dcbf rather than dcbfl
Sishuai Gong <sishuai.system(a)gmail.com>
ipvs: fix racy memcpy in proc_do_sync_threshold
Junwei Hu <hujunwei4(a)huawei.com>
ipvs: Improve robustness to the ipvs sysctl
Alessio Igor Bogani <alessio.bogani(a)elettra.eu>
igb: Avoid starting unnecessary workqueues
Eric Dumazet <edumazet(a)google.com>
sock: annotate data-races around prot->memory_pressure
Zheng Yejian <zhengyejian1(a)huawei.com>
tracing: Fix memleak due to race between current_tracer and trace
Justin Chen <justin.chen(a)broadcom.com>
net: phy: broadcom: stub c45 read/write for 54810
Lin Ma <linma(a)zju.edu.cn>
net: xfrm: Amend XFRMA_SEC_CTX nla_policy structure
Jason Xing <kernelxing(a)tencent.com>
net: fix the RTO timer retransmitting skb every 1ms if linear option is enabled
Kuniyuki Iwashima <kuniyu(a)amazon.com>
af_unix: Fix null-ptr-deref in unix_stream_sendpage().
Zhang Shurong <zhang_shurong(a)foxmail.com>
ASoC: rt5665: add missed regulator_bulk_disable
Xin Long <lucien.xin(a)gmail.com>
netfilter: set default timeout to 3 secs for sctp shutdown send and recv state
Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
test_firmware: prevent race conditions by a correct implementation of locking
Qi Zheng <zhengqi.arch(a)bytedance.com>
binder: fix memory leak in binder_init()
Tony Lindgren <tony(a)atomide.com>
serial: 8250: Fix oops for port->pm on uart_change_pm()
Yang Yingliang <yangyingliang(a)huawei.com>
mmc: wbsd: fix double mmc_free_host() in wbsd_init()
Russell Harmon via samba-technical <samba-technical(a)lists.samba.org>
cifs: Release folio lock on fscache read hit.
dengxiang <dengxiang(a)nfschina.com>
ALSA: usb-audio: Add support for Mythware XA001AU capture and playback interfaces.
Eric Dumazet <edumazet(a)google.com>
net: do not allow gso_size to be set to GSO_BY_FRAGS
Abel Wu <wuyun.abel(a)bytedance.com>
sock: Fix misuse of sk_under_memory_pressure()
Andrii Staikov <andrii.staikov(a)intel.com>
i40e: fix misleading debug logs
Ziyang Xuan <william.xuanziyang(a)huawei.com>
team: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nft_dynset: disallow object maps
Lin Ma <linma(a)zju.edu.cn>
xfrm: add NULL check in xfrm_update_ae_params
Zhengchao Shao <shaozhengchao(a)huawei.com>
ip_vti: fix potential slab-use-after-free in decode_session6
Zhengchao Shao <shaozhengchao(a)huawei.com>
ip6_vti: fix slab-use-after-free in decode_session6
Lin Ma <linma(a)zju.edu.cn>
net: af_key: fix sadb_x_filter validation
Lin Ma <linma(a)zju.edu.cn>
net: xfrm: Fix xfrm_address_filter OOB read
Nathan Lynch <nathanl(a)linux.ibm.com>
powerpc/rtas_flash: allow user copy to flash block cache objects
Yuanjun Gong <ruc_gongyuanjun(a)163.com>
fbdev: mmp: fix value check in mmphw_probe()
shanzhulig <shanzhulig(a)gmail.com>
drm/amdgpu: Fix potential fence use-after-free v2
Zhengping Jiang <jiangzp(a)google.com>
Bluetooth: L2CAP: Fix use-after-free
Armin Wolf <W_Armin(a)gmx.de>
pcmcia: rsrc_nonstatic: Fix memory leak in nonstatic_release_resource_db()
Tuo Li <islituo(a)gmail.com>
gfs2: Fix possible data races in gfs2_show_options()
Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
media: platform: mediatek: vpu: fix NULL ptr dereference
Yunfei Dong <yunfei.dong(a)mediatek.com>
media: v4l2-mem2mem: add lock to protect parameter num_rdy
Immad Mir <mirimmad17(a)gmail.com>
FS: JFS: Check for read-only mounted filesystem in txBegin
Immad Mir <mirimmad17(a)gmail.com>
FS: JFS: Fix null-ptr-deref Read in txBegin
Gustavo A. R. Silva <gustavoars(a)kernel.org>
MIPS: dec: prom: Address -Warray-bounds warning
Yogesh <yogi.kernel(a)gmail.com>
fs: jfs: Fix UBSAN: array-index-out-of-bounds in dbAllocDmapLev
Jan Kara <jack(a)suse.cz>
udf: Fix uninitialized array access for some pathnames
Ye Bin <yebin10(a)huawei.com>
quota: fix warning in dqgrab()
Jan Kara <jack(a)suse.cz>
quota: Properly disable quotas when add_dquot_ref() fails
Oswald Buddenhagen <oswald.buddenhagen(a)gmx.de>
ALSA: emu10k1: roll up loops in DSP setup code for Audigy
hackyzh002 <hackyzh002(a)gmail.com>
drm/radeon: Fix integer overflow in radeon_cs_parser_init
Nathan Chancellor <natechancellor(a)gmail.com>
lib/mpi: Eliminate unused umul_ppmm definitions for MIPS
-------------
Diffstat:
Makefile | 4 +-
arch/mips/include/asm/dec/prom.h | 2 +-
arch/powerpc/kernel/rtas_flash.c | 6 +-
arch/x86/kernel/fpu/xstate.c | 8 ++
drivers/android/binder.c | 1 +
drivers/android/binder_alloc.c | 6 ++
drivers/android/binder_alloc.h | 1 +
drivers/dma-buf/sw_sync.c | 18 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 +
drivers/gpu/drm/radeon/radeon_cs.c | 3 +-
drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c | 2 +
drivers/media/platform/mtk-vpu/mtk_vpu.c | 6 +-
drivers/mmc/host/wbsd.c | 2 -
drivers/net/ethernet/ibm/ibmveth.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_nvm.c | 16 +--
drivers/net/ethernet/intel/igb/igb_ptp.c | 24 ++---
drivers/net/phy/broadcom.c | 13 +++
drivers/net/team/team.c | 4 +-
drivers/pcmcia/rsrc_nonstatic.c | 2 +
drivers/scsi/raid_class.c | 48 ---------
drivers/scsi/snic/snic_disc.c | 3 +-
drivers/tty/serial/8250/8250_port.c | 1 +
drivers/video/fbdev/mmp/hw/mmp_ctrl.c | 4 +-
fs/cifs/file.c | 2 +-
fs/gfs2/super.c | 26 +++--
fs/jfs/jfs_dmap.c | 3 +
fs/jfs/jfs_txnmgr.c | 5 +
fs/jfs/namei.c | 5 +
fs/quota/dquot.c | 5 +-
fs/udf/unicode.c | 2 +-
include/linux/raid_class.h | 4 -
include/linux/virtio_net.h | 4 +
include/media/v4l2-mem2mem.h | 18 +++-
include/net/sock.h | 11 +-
kernel/trace/trace.c | 9 +-
kernel/trace/trace_irqsoff.c | 3 +-
kernel/trace/trace_sched_wakeup.c | 2 +
lib/clz_ctz.c | 32 ++----
lib/mpi/longlong.h | 36 +------
lib/test_firmware.c | 39 +++++--
net/batman-adv/bat_v_elp.c | 3 +-
net/batman-adv/bat_v_ogm.c | 7 +-
net/batman-adv/hard-interface.c | 2 +-
net/batman-adv/translation-table.c | 1 -
net/bluetooth/l2cap_core.c | 5 +
net/core/rtnetlink.c | 5 +-
net/core/sock.c | 2 +-
net/ipv4/ip_vti.c | 4 +-
net/ipv4/tcp_timer.c | 4 +-
net/ipv6/ip6_vti.c | 4 +-
net/key/af_key.c | 4 +-
net/netfilter/ipvs/ip_vs_ctl.c | 74 +++++++-------
net/netfilter/nf_conntrack_proto_sctp.c | 6 +-
net/netfilter/nft_dynset.c | 3 +
net/sctp/socket.c | 2 +-
net/unix/af_unix.c | 9 +-
net/xfrm/xfrm_user.c | 13 ++-
sound/pci/emu10k1/emufx.c | 112 ++-------------------
sound/soc/codecs/rt5665.c | 2 +
sound/usb/quirks-table.h | 29 ++++++
60 files changed, 325 insertions(+), 351 deletions(-)
From: Linus Torvalds <torvalds(a)linux-foundation.org>
[ upstream commit 5ef64cc8987a9211d3f3667331ba3411a94ddc79 ]
Commit 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic") made
the page locking entirely fair, in that if a waiter came in while the
lock was held, the lock would be transferred to the lockers strictly in
order.
That was intended to finally get rid of the long-reported watchdog
failures that involved the page lock under extreme load, where a process
could end up waiting essentially forever, as other page lockers stole
the lock from under it.
It also improved some benchmarks, but it ended up causing huge
performance regressions on others, simply because fair lock behavior
doesn't end up giving out the lock as aggressively, causing better
worst-case latency, but potentially much worse average latencies and
throughput.
Instead of reverting that change entirely, this introduces a controlled
amount of unfairness, with a sysctl knob to tune it if somebody needs
to. But the default value should hopefully be good for any normal load,
allowing a few rounds of lock stealing, but enforcing the strict
ordering before the lock has been stolen too many times.
There is also a hint from Matthieu Baerts that the fair page locking
may end up exposing an ABBA deadlock that is hidden by the usual
optimistic lock stealing, and while the unfairness doesn't fix the
fundamental issue (and I'm still looking at that), it avoids it in
practice.
The amount of unfairness can be modified by writing a new value to the
'sysctl_page_lock_unfairness' variable (default value of 5, exposed
through /proc/sys/vm/page_lock_unfairness), but that is hopefully
something we'd use mainly for debugging rather than being necessary for
any deep system tuning.
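Reading and changing the knob is ordinary file I/O on procfs. The sketch below assumes the /proc/sys/vm/page_lock_unfairness path named above; read_sysctl_int() and write_sysctl_int() are hypothetical helpers. Writing the real file needs root, and because the sysctl entry uses proc_dointvec_minmax with extra1 set to SYSCTL_ZERO, negative values are rejected. With stdio buffering, that error may only surface when the stream is flushed, which is why the write helper checks fclose().

```c
#include <assert.h>
#include <stdio.h>

/* Path documented above; present only on kernels carrying this change. */
#define PAGE_LOCK_UNFAIRNESS_PATH "/proc/sys/vm/page_lock_unfairness"

/* Read a single integer from a sysctl-style file. Returns 0 on success. */
static int read_sysctl_int(const char *path, int *value)
{
	FILE *f;
	int ok;

	assert(path != NULL && value != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%d", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Write a single integer. The data may only reach the kernel when the
 * stream is flushed, so a rejected value shows up as an fclose() error.
 */
static int write_sysctl_int(const char *path, int value)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, read_sysctl_int(PAGE_LOCK_UNFAIRNESS_PATH, &v) should yield 5 on a kernel with this patch and the default setting.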
This whole issue has exposed just how critical the page lock can be, and
how contended it gets under certain loads. And the main contention
doesn't really seem to be anything related to IO (which was the origin
of this lock), but rather things like just verifying that the page file
mapping is stable while faulting the page into a page table.
Link: https://lore.kernel.org/linux-fsdevel/ed8442fd-6f54-dd84-cd4a-941e8b7ee603@…
Link: https://www.phoronix.com/scan.php?page=article&item=linux-50-59&num=1
Link: https://lore.kernel.org/linux-fsdevel/c560a38d-8313-51fb-b1ec-e904bd8836bc@…
Reported-and-tested-by: Michael Larabel <Michael(a)michaellarabel.com>
Tested-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Cc: Dave Chinner <david(a)fromorbit.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Chris Mason <clm(a)fb.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Amir Goldstein <amir73il(a)gmail.com>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
CC: <stable(a)vger.kernel.org> # 5.4
[ mheyne: fixed contextual conflict in mm/filemap.c due to missing
commit c7510ab2cf5c ("mm: abstract out wake_page_match() from
wake_page_function()"). Added WQ_FLAG_CUSTOM due to missing commit
7f26482a872c ("locking/percpu-rwsem: Remove the embedded rwsem") ]
Signed-off-by: Maximilian Heyne <mheyne(a)amazon.de>
---
include/linux/mm.h | 2 +
include/linux/wait.h | 2 +
kernel/sysctl.c | 8 +++
mm/filemap.c | 160 ++++++++++++++++++++++++++++++++++---------
4 files changed, 141 insertions(+), 31 deletions(-)
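The smp_store_release() in wake_page_function() (paired, per its comment, with a load-acquire in wait_on_page_bit_common()) publishes wait->flags to a waiter that reads them without holding the wait-queue lock. A userspace analogue with C11 atomics is sketched below. It assumes memory_order_release and memory_order_acquire as rough stand-ins for the kernel primitives, a hypothetical payload field to show what the pairing makes visible, and a spin loop where the kernel would actually sleep.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define WQ_FLAG_WOKEN 0x02
#define WQ_FLAG_DONE  0x10

struct waiter {
	int payload;			/* hypothetical data written before the wakeup */
	_Atomic unsigned int flags;	/* models wait->flags */
};

/* Waker: write the data, then publish the flags with release semantics. */
static void *waker(void *arg)
{
	struct waiter *w = arg;

	w->payload = 42;
	atomic_store_explicit(&w->flags, WQ_FLAG_WOKEN | WQ_FLAG_DONE,
			      memory_order_release);
	return NULL;
}

/* Waiter: once WOKEN is seen via an acquire load, the payload is visible. */
static int wait_for_wakeup(struct waiter *w)
{
	unsigned int flags;

	assert(w != NULL);
	do {
		flags = atomic_load_explicit(&w->flags, memory_order_acquire);
	} while (!(flags & WQ_FLAG_WOKEN));
	return (flags & WQ_FLAG_DONE) ? w->payload : -1;
}

static int run_demo(void)
{
	struct waiter w = { .payload = 0, .flags = 0 };
	pthread_t t;
	int result;

	if (pthread_create(&t, NULL, waker, &w) != 0)
		return -1;
	result = wait_for_wakeup(&w);
	pthread_join(t, NULL);
	return result;
}
```

Because the release store happens after the payload write and the acquire load precedes the payload read, run_demo() always observes 42 once it sees WQ_FLAG_WOKEN.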
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d35c29d322d8..d14aba548ff4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -37,6 +37,8 @@ struct user_struct;
struct writeback_control;
struct bdi_writeback;
+extern int sysctl_page_lock_unfairness;
+
void init_mm_internals(void);
#ifndef CONFIG_NEED_MULTIPLE_NODES /* Don't use mapnrs, do it properly */
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 7d04c1b588c7..03bff85e365f 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -20,6 +20,8 @@ int default_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, int
#define WQ_FLAG_EXCLUSIVE 0x01
#define WQ_FLAG_WOKEN 0x02
#define WQ_FLAG_BOOKMARK 0x04
+#define WQ_FLAG_CUSTOM 0x08
+#define WQ_FLAG_DONE 0x10
/*
* A single wait-queue entry structure:
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index decabf5714c0..4f85f7ed42fc 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1563,6 +1563,14 @@ static struct ctl_table vm_table[] = {
.proc_handler = percpu_pagelist_fraction_sysctl_handler,
.extra1 = SYSCTL_ZERO,
},
+ {
+ .procname = "page_lock_unfairness",
+ .data = &sysctl_page_lock_unfairness,
+ .maxlen = sizeof(sysctl_page_lock_unfairness),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ },
#ifdef CONFIG_MMU
{
.procname = "max_map_count",
diff --git a/mm/filemap.c b/mm/filemap.c
index adc27af737c6..f1ed0400c37c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1044,9 +1044,43 @@ struct wait_page_queue {
wait_queue_entry_t wait;
};
+/*
+ * The page wait code treats the "wait->flags" somewhat unusually, because
+ * we have multiple different kinds of waits, not just the usual "exclusive"
+ * one.
+ *
+ * We have:
+ *
+ * (a) no special bits set:
+ *
+ * We're just waiting for the bit to be released, and when a waker
+ * calls the wakeup function, we set WQ_FLAG_WOKEN and wake it up,
+ * and remove it from the wait queue.
+ *
+ * Simple and straightforward.
+ *
+ * (b) WQ_FLAG_EXCLUSIVE:
+ *
+ * The waiter is waiting to get the lock, and only one waiter should
+ * be woken up to avoid any thundering herd behavior. We'll set the
+ * WQ_FLAG_WOKEN bit, wake it up, and remove it from the wait queue.
+ *
+ * This is the traditional exclusive wait.
+ *
+ * (c) WQ_FLAG_EXCLUSIVE | WQ_FLAG_CUSTOM:
+ *
+ * The waiter is waiting to get the bit, and additionally wants the
+ * lock to be transferred to it for fair lock behavior. If the lock
+ * cannot be taken, we stop walking the wait queue without waking
+ * the waiter.
+ *
+ * This is the "fair lock handoff" case, and in addition to setting
+ * WQ_FLAG_WOKEN, we set WQ_FLAG_DONE to let the waiter easily see
+ * that it now has the lock.
+ */
static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync, void *arg)
{
- int ret;
+ unsigned int flags;
struct wait_page_key *key = arg;
struct wait_page_queue *wait_page
= container_of(wait, struct wait_page_queue, wait);
@@ -1059,35 +1093,44 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
return 0;
/*
- * If it's an exclusive wait, we get the bit for it, and
- * stop walking if we can't.
- *
- * If it's a non-exclusive wait, then the fact that this
- * wake function was called means that the bit already
- * was cleared, and we don't care if somebody then
- * re-took it.
+ * If it's a lock handoff wait, we get the bit for it, and
+ * stop walking (and do not wake it up) if we can't.
*/
- ret = 0;
- if (wait->flags & WQ_FLAG_EXCLUSIVE) {
- if (test_and_set_bit(key->bit_nr, &key->page->flags))
+ flags = wait->flags;
+ if (flags & WQ_FLAG_EXCLUSIVE) {
+ if (test_bit(key->bit_nr, &key->page->flags))
return -1;
- ret = 1;
+ if (flags & WQ_FLAG_CUSTOM) {
+ if (test_and_set_bit(key->bit_nr, &key->page->flags))
+ return -1;
+ flags |= WQ_FLAG_DONE;
+ }
}
- wait->flags |= WQ_FLAG_WOKEN;
+ /*
+ * We are holding the wait-queue lock, but the waiter that
+ * is waiting for this will be checking the flags without
+ * any locking.
+ *
+ * So update the flags atomically, and wake up the waiter
+ * afterwards to avoid any races. This store-release pairs
+ * with the load-acquire in wait_on_page_bit_common().
+ */
+ smp_store_release(&wait->flags, flags | WQ_FLAG_WOKEN);
wake_up_state(wait->private, mode);
/*
* Ok, we have successfully done what we're waiting for,
* and we can unconditionally remove the wait entry.
*
- * Note that this has to be the absolute last thing we do,
- * since after list_del_init(&wait->entry) the wait entry
+ * Note that this pairs with the "finish_wait()" in the
+ * waiter, and has to be the absolute last thing we do.
+ * After this list_del_init(&wait->entry) the wait entry
* might be de-allocated and the process might even have
* exited.
*/
list_del_init_careful(&wait->entry);
- return ret;
+ return (flags & WQ_FLAG_EXCLUSIVE) != 0;
}
static void wake_up_page_bit(struct page *page, int bit_nr)
@@ -1167,8 +1210,8 @@ enum behavior {
};
/*
- * Attempt to check (or get) the page bit, and mark the
- * waiter woken if successful.
+ * Attempt to check (or get) the page bit, and mark us done
+ * if successful.
*/
static inline bool trylock_page_bit_common(struct page *page, int bit_nr,
struct wait_queue_entry *wait)
@@ -1179,13 +1222,17 @@ static inline bool trylock_page_bit_common(struct page *page, int bit_nr,
} else if (test_bit(bit_nr, &page->flags))
return false;
- wait->flags |= WQ_FLAG_WOKEN;
+ wait->flags |= WQ_FLAG_WOKEN | WQ_FLAG_DONE;
return true;
}
+/* How many times do we accept lock stealing from under a waiter? */
+int sysctl_page_lock_unfairness = 5;
+
static inline int wait_on_page_bit_common(wait_queue_head_t *q,
struct page *page, int bit_nr, int state, enum behavior behavior)
{
+ int unfairness = sysctl_page_lock_unfairness;
struct wait_page_queue wait_page;
wait_queue_entry_t *wait = &wait_page.wait;
bool thrashing = false;
@@ -1203,11 +1250,18 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
}
init_wait(wait);
- wait->flags = behavior == EXCLUSIVE ? WQ_FLAG_EXCLUSIVE : 0;
wait->func = wake_page_function;
wait_page.page = page;
wait_page.bit_nr = bit_nr;
+repeat:
+ wait->flags = 0;
+ if (behavior == EXCLUSIVE) {
+ wait->flags = WQ_FLAG_EXCLUSIVE;
+ if (--unfairness < 0)
+ wait->flags |= WQ_FLAG_CUSTOM;
+ }
+
/*
* Do one last check whether we can get the
* page bit synchronously.
@@ -1230,27 +1284,63 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
/*
* From now on, all the logic will be based on
- * the WQ_FLAG_WOKEN flag, and the and the page
- * bit testing (and setting) will be - or has
- * already been - done by the wake function.
+ * the WQ_FLAG_WOKEN and WQ_FLAG_DONE flag, to
+ * see whether the page bit testing has already
+ * been done by the wake function.
*
* We can drop our reference to the page.
*/
if (behavior == DROP)
put_page(page);
+ /*
+ * Note that until the "finish_wait()", or until
+ * we see the WQ_FLAG_WOKEN flag, we need to
+ * be very careful with the 'wait->flags', because
+ * we may race with a waker that sets them.
+ */
for (;;) {
+ unsigned int flags;
+
set_current_state(state);
- if (signal_pending_state(state, current))
+ /* Loop until we've been woken or interrupted */
+ flags = smp_load_acquire(&wait->flags);
+ if (!(flags & WQ_FLAG_WOKEN)) {
+ if (signal_pending_state(state, current))
+ break;
+
+ io_schedule();
+ continue;
+ }
+
+ /* If we were non-exclusive, we're done */
+ if (behavior != EXCLUSIVE)
break;
- if (wait->flags & WQ_FLAG_WOKEN)
+ /* If the waker got the lock for us, we're done */
+ if (flags & WQ_FLAG_DONE)
break;
- io_schedule();
+ /*
+ * Otherwise, if we're getting the lock, we need to
+ * try to get it ourselves.
+ *
+ * And if that fails, we'll have to retry this all.
+ */
+ if (unlikely(test_and_set_bit(bit_nr, &page->flags)))
+ goto repeat;
+
+ wait->flags |= WQ_FLAG_DONE;
+ break;
}
+ /*
+ * If a signal happened, this 'finish_wait()' may remove the last
+ * waiter from the wait-queues, but the PageWaiters bit will remain
+ * set. That's ok. The next wakeup will take care of it, and trying
+ * to do it here would be difficult and prone to races.
+ */
finish_wait(q, wait);
if (thrashing) {
@@ -1260,12 +1350,20 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
}
/*
- * A signal could leave PageWaiters set. Clearing it here if
- * !waitqueue_active would be possible (by open-coding finish_wait),
- * but still fail to catch it in the case of wait hash collision. We
- * already can fail to clear wait hash collision cases, so don't
- * bother with signals either.
+ * NOTE! The wait->flags weren't stable until we've done the
+ * 'finish_wait()', and we could have exited the loop above due
+ * to a signal, and had a wakeup event happen after the signal
+ * test but before the 'finish_wait()'.
+ *
+ * So only after the finish_wait() can we reliably determine
+ * if we got woken up or not, so we can now figure out the final
+ * return value based on that state without races.
+ *
+ * Also note that WQ_FLAG_WOKEN is sufficient for a non-exclusive
+ * waiter, but an exclusive one requires WQ_FLAG_DONE.
*/
+ if (behavior == EXCLUSIVE)
+ return wait->flags & WQ_FLAG_DONE ? 0 : -EINTR;
return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR;
}
--
2.40.1
From: Song Shuai <suagrfillet(a)gmail.com>
pg_level[] uses CONFIG_PGTABLE_LEVELS to choose the page table level names
at build time. But on 64-bit, if the paging mode is downgraded from the
kernel command line or restricted by the hardware, ptdump prints the wrong
name. For example, with no4lvl (sv39), ptdump labels a 1G mapping as "PUD"
when it should be "PGD":
0xffffffd840000000-0xffffffd900000000 0x00000000c0000000 3G PUD D A G . . W R V
Fix this by choosing "P4D"/"PUD" or "PGD" at runtime based on
pgtable_l5_enabled and pgtable_l4_enabled.
Fixes: e8a62cc26ddf ("riscv: Implement sv48 support")
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Signed-off-by: Song Shuai <suagrfillet(a)gmail.com>
Link: https://lore.kernel.org/r/20230712115740.943324-1-suagrfillet@gmail.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
---
arch/riscv/mm/ptdump.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
index 20a9f991a6d7..e9090b38f811 100644
--- a/arch/riscv/mm/ptdump.c
+++ b/arch/riscv/mm/ptdump.c
@@ -384,6 +384,9 @@ static int __init ptdump_init(void)
kernel_ptd_info.base_addr = KERN_VIRT_START;
+ pg_level[1].name = pgtable_l5_enabled ? "P4D" : "PGD";
+ pg_level[2].name = pgtable_l4_enabled ? "PUD" : "PGD";
+
for (i = 0; i < ARRAY_SIZE(pg_level); i++)
for (j = 0; j < ARRAY_SIZE(pte_bits); j++)
pg_level[i].mask |= pte_bits[j].mask;
--
2.41.0