This is the start of the stable review cycle for the 6.1.106 release. There are 38 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat, 17 Aug 2024 13:18:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.106-rc1...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 6.1.106-rc1

Will Deacon <will@kernel.org>
    KVM: arm64: Don't pass a TLBI level hint when zapping table entries

Eric Dumazet <edumazet@google.com>
    wifi: cfg80211: restrict NL80211_ATTR_TXQ_QUANTUM values

Waiman Long <longman@redhat.com>
    cgroup: Move rcu_head up near the top of cgroup_root

Kees Cook <kees@kernel.org>
    binfmt_flat: Fix corruption when not offsetting data start

Andi Shyti <andi.shyti@linux.intel.com>
    drm/i915/gem: Adjust vma offset for framebuffer mmap offset

Dan Carpenter <dan.carpenter@linaro.org>
    drm/i915: Fix a NULL vs IS_ERR() bug

Nirmoy Das <nirmoy.das@intel.com>
    drm/i915: Add a function to mmap framebuffer obj

Yafang Shao <laoar.shao@gmail.com>
    cgroup: Make operations on the cgroup root_list RCU safe

Andi Shyti <andi.shyti@linux.intel.com>
    drm/i915/gem: Fix Virtual Memory mapping boundaries calculation

Matthieu Baerts (NGI0) <matttbe@kernel.org>
    mptcp: fully established after ADD_ADDR echo on MPJ

WangYuli <wangyuli@uniontech.com>
    nvme/pci: Add APST quirk for Lenovo N60z laptop

Josef Bacik <josef@toxicpanda.com>
    nfsd: make svc_stat per-network namespace instead of global

Josef Bacik <josef@toxicpanda.com>
    nfsd: remove nfsd_stats, make th_cnt a global counter

Josef Bacik <josef@toxicpanda.com>
    nfsd: make all of the nfsd stats per-network namespace

Josef Bacik <josef@toxicpanda.com>
    nfsd: expose /proc/net/sunrpc/nfsd in net namespaces

Josef Bacik <josef@toxicpanda.com>
    nfsd: rename NFSD_NET_* to NFSD_STATS_*

Josef Bacik <josef@toxicpanda.com>
    sunrpc: use the struct net as the svc proc private

Josef Bacik <josef@toxicpanda.com>
    sunrpc: remove ->pg_stats from svc_program

Josef Bacik <josef@toxicpanda.com>
    sunrpc: pass in the sv_stats struct through svc_create_pooled

Josef Bacik <josef@toxicpanda.com>
    nfsd: stop setting ->pg_stats for unused stats

Josef Bacik <josef@toxicpanda.com>
    sunrpc: don't change ->sv_stats if it doesn't exist

Chuck Lever <chuck.lever@oracle.com>
    NFSD: Fix frame size warning in svc_export_parse()

Chuck Lever <chuck.lever@oracle.com>
    NFSD: Rewrite synopsis of nfsd_percpu_counters_init()

Chuck Lever <chuck.lever@oracle.com>
    NFSD: Refactor the duplicate reply cache shrinker

Chuck Lever <chuck.lever@oracle.com>
    NFSD: Replace nfsd_prune_bucket()

Chuck Lever <chuck.lever@oracle.com>
    NFSD: Rename nfsd_reply_cache_alloc()

Chuck Lever <chuck.lever@oracle.com>
    NFSD: Refactor nfsd_reply_cache_free_locked()

Jeff Layton <jlayton@kernel.org>
    nfsd: move init of percpu reply_cache_stats counters back to nfsd_init_net

Jeff Layton <jlayton@kernel.org>
    nfsd: move reply cache initialization into nfsd startup

Huacai Chen <chenhuacai@kernel.org>
    LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h

Kees Cook <kees@kernel.org>
    exec: Fix ToCToU between perm check and set-uid/gid usage

Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
    ASoC: topology: Fix route memory corruption

Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
    ASoC: topology: Clean up route loading

Matthieu Baerts (NGI0) <matttbe@kernel.org>
    selftests: mptcp: join: test both signal & subflow

Matthieu Baerts (NGI0) <matttbe@kernel.org>
    mptcp: pm: do not ignore 'subflow' if 'signal' flag is also set

Matthieu Baerts (NGI0) <matttbe@kernel.org>
    mptcp: pm: don't try to create sf if alloc failed

Matthieu Baerts (NGI0) <matttbe@kernel.org>
    mptcp: pm: reduce indentation blocks

Geliang Tang <geliang.tang@suse.com>
    mptcp: pass addr to mptcp_pm_alloc_anno_list
-------------
Diffstat:
 Makefile                                         |   4 +-
 arch/arm64/kvm/hyp/pgtable.c                     |  10 +-
 arch/loongarch/include/uapi/asm/unistd.h         |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_mman.c         | 192 ++++++++++++++++------
 drivers/gpu/drm/i915/gem/i915_gem_mman.h         |   2 +-
 drivers/nvme/host/pci.c                          |   7 +
 fs/binfmt_flat.c                                 |   4 +-
 fs/exec.c                                        |   8 +-
 fs/lockd/svc.c                                   |   3 -
 fs/nfs/callback.c                                |   3 -
 fs/nfsd/export.c                                 |  32 ++--
 fs/nfsd/export.h                                 |   4 +-
 fs/nfsd/netns.h                                  |  25 ++-
 fs/nfsd/nfs4proc.c                               |   6 +-
 fs/nfsd/nfscache.c                               | 201 ++++++++++++++----------
 fs/nfsd/nfsctl.c                                 |  24 ++-
 fs/nfsd/nfsd.h                                   |   1 +
 fs/nfsd/nfsfh.c                                  |   3 +-
 fs/nfsd/nfssvc.c                                 |  24 +--
 fs/nfsd/stats.c                                  |  52 +++---
 fs/nfsd/stats.h                                  |  83 ++++------
 fs/nfsd/trace.h                                  |  22 +++
 fs/nfsd/vfs.c                                    |   6 +-
 include/linux/cgroup-defs.h                      |   7 +-
 include/linux/sunrpc/svc.h                       |   5 +-
 kernel/cgroup/cgroup-internal.h                  |   3 +-
 kernel/cgroup/cgroup.c                           |  23 ++-
 net/mptcp/options.c                              |   3 +-
 net/mptcp/pm_netlink.c                           |  49 +++---
 net/mptcp/pm_userspace.c                         |   2 +-
 net/mptcp/protocol.h                             |   2 +-
 net/sunrpc/stats.c                               |   2 +-
 net/sunrpc/svc.c                                 |  36 +++--
 net/wireless/nl80211.c                           |   6 +-
 sound/soc/soc-topology.c                         |  32 +---
 tools/testing/selftests/net/mptcp/mptcp_join.sh  |  14 ++
 36 files changed, 555 insertions(+), 346 deletions(-)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geliang Tang <geliang.tang@suse.com>
commit 528cb5f2a1e859522f36f091f29f5c81ec6d4a4c upstream.
Pass an addr parameter to mptcp_pm_alloc_anno_list() instead of an entry. This reduces the scope of the argument: mptcp_pm_alloc_anno_list() only accesses "entry->addr", so it can be restricted to a pointer to "addr".
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: c95eb32ced82 ("mptcp: pm: reduce indentation blocks")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/mptcp/pm_netlink.c   | 8 ++++----
 net/mptcp/pm_userspace.c | 2 +-
 net/mptcp/protocol.h     | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -352,7 +352,7 @@ mptcp_pm_del_add_timer(struct mptcp_sock
 }
 
 bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
-                              const struct mptcp_pm_addr_entry *entry)
+                              const struct mptcp_addr_info *addr)
 {
        struct mptcp_pm_add_entry *add_entry = NULL;
        struct sock *sk = (struct sock *)msk;
@@ -360,7 +360,7 @@ bool mptcp_pm_alloc_anno_list(struct mpt
 
        lockdep_assert_held(&msk->pm.lock);
 
-       add_entry = mptcp_lookup_anno_list_by_saddr(msk, &entry->addr);
+       add_entry = mptcp_lookup_anno_list_by_saddr(msk, addr);
 
        if (add_entry) {
                if (mptcp_pm_is_kernel(msk))
@@ -377,7 +377,7 @@ bool mptcp_pm_alloc_anno_list(struct mpt
 
        list_add(&add_entry->list, &msk->pm.anno_list);
 
-       add_entry->addr = entry->addr;
+       add_entry->addr = *addr;
        add_entry->sock = msk;
        add_entry->retrans_times = 0;
 
@@ -580,7 +580,7 @@ static void mptcp_pm_create_subflow_or_s
                return;
 
        if (local) {
-               if (mptcp_pm_alloc_anno_list(msk, local)) {
+               if (mptcp_pm_alloc_anno_list(msk, &local->addr)) {
                        __clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
                        msk->pm.add_addr_signaled++;
                        mptcp_pm_announce_addr(msk, &local->addr, false);
--- a/net/mptcp/pm_userspace.c
+++ b/net/mptcp/pm_userspace.c
@@ -225,7 +225,7 @@ int mptcp_nl_cmd_announce(struct sk_buff
        lock_sock((struct sock *)msk);
        spin_lock_bh(&msk->pm.lock);
 
-       if (mptcp_pm_alloc_anno_list(msk, &addr_val)) {
+       if (mptcp_pm_alloc_anno_list(msk, &addr_val.addr)) {
                msk->pm.add_addr_signaled++;
                mptcp_pm_announce_addr(msk, &addr_val.addr, false);
                mptcp_pm_nl_addr_send_ack(msk);
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -812,7 +812,7 @@ int mptcp_pm_nl_mp_prio_send_ack(struct
                         struct mptcp_addr_info *rem, u8 bkup);
 bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
-                             const struct mptcp_pm_addr_entry *entry);
+                             const struct mptcp_addr_info *addr);
 void mptcp_pm_free_anno_list(struct mptcp_sock *msk);
 bool mptcp_pm_sport_in_anno_list(struct mptcp_sock *msk, const struct sock *sk);
 struct mptcp_pm_add_entry *
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Matthieu Baerts (NGI0)" <matttbe@kernel.org>
commit c95eb32ced823a00be62202b43966b07b2f20b7f upstream.
That will simplify the following commits.
No functional changes intended.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240731-upstream-net-20240731-mptcp-endp-subflow-s...
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: cd7c957f936f ("mptcp: pm: don't try to create sf if alloc failed")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/mptcp/pm_netlink.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -579,16 +579,19 @@ static void mptcp_pm_create_subflow_or_s
                if (msk->pm.addr_signal & BIT(MPTCP_ADD_ADDR_SIGNAL))
                        return;
 
-               if (local) {
-                       if (mptcp_pm_alloc_anno_list(msk, &local->addr)) {
-                               __clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
-                               msk->pm.add_addr_signaled++;
-                               mptcp_pm_announce_addr(msk, &local->addr, false);
-                               mptcp_pm_nl_addr_send_ack(msk);
-                       }
-               }
+               if (!local)
+                       goto subflow;
+
+               if (!mptcp_pm_alloc_anno_list(msk, &local->addr))
+                       goto subflow;
+
+               __clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
+               msk->pm.add_addr_signaled++;
+               mptcp_pm_announce_addr(msk, &local->addr, false);
+               mptcp_pm_nl_addr_send_ack(msk);
        }
 
+subflow:
        /* check if should create a new subflow */
        while (msk->pm.local_addr_used < local_addr_max &&
               msk->pm.subflows < subflows_max) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Matthieu Baerts (NGI0)" <matttbe@kernel.org>
commit cd7c957f936f8cb80d03e5152f4013aae65bd986 upstream.
It sounds better to avoid wasting cycles and/or putting extreme memory pressure on the system by trying to create new subflows when it was not possible to add a new item to the announce list.
While at it, a warning is now printed if the entry was already in the list as it should not happen with the in-kernel path-manager. With this PM, mptcp_pm_alloc_anno_list() should only fail in case of memory pressure.
Fixes: b6c08380860b ("mptcp: remove addr and subflow in PM netlink")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240731-upstream-net-20240731-mptcp-endp-subflow-s...
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/mptcp/pm_netlink.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -363,7 +363,7 @@ bool mptcp_pm_alloc_anno_list(struct mpt
        add_entry = mptcp_lookup_anno_list_by_saddr(msk, addr);
 
        if (add_entry) {
-               if (mptcp_pm_is_kernel(msk))
+               if (WARN_ON_ONCE(mptcp_pm_is_kernel(msk)))
                        return false;
 
                sk_reset_timer(sk, &add_entry->add_timer,
@@ -567,8 +567,6 @@ static void mptcp_pm_create_subflow_or_s
 
        /* check first for announce */
        if (msk->pm.add_addr_signaled < add_addr_signal_max) {
-               local = select_signal_address(pernet, msk);
-
                /* due to racing events on both ends we can reach here while
                 * previous add address is still running: if we invoke now
                 * mptcp_pm_announce_addr(), that will fail and the
@@ -579,11 +577,15 @@ static void mptcp_pm_create_subflow_or_s
                if (msk->pm.addr_signal & BIT(MPTCP_ADD_ADDR_SIGNAL))
                        return;
 
+               local = select_signal_address(pernet, msk);
                if (!local)
                        goto subflow;
 
+               /* If the alloc fails, we are on memory pressure, not worth
+                * continuing, and trying to create subflows.
+                */
                if (!mptcp_pm_alloc_anno_list(msk, &local->addr))
-                       goto subflow;
+                       return;
 
                __clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
                msk->pm.add_addr_signaled++;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Matthieu Baerts (NGI0)" <matttbe@kernel.org>
commit 85df533a787bf07bf4367ce2a02b822ff1fba1a3 upstream.
Up to the 'Fixes' commit, having an endpoint with both the 'signal' and 'subflow' flags, resulted in the creation of a subflow and an address announcement using the address linked to this endpoint. After this commit, only the address announcement was done, ignoring the 'subflow' flag.
That's because the same bitmap is used for the two flags. It is OK to keep this single bitmap: the already selected local endpoint simply has to be re-used, but without going through select_local_address(), so that the just-modified bitmap is not consulted again.
Note that it is unusual to set the two flags together: creating a new subflow using a new local address will implicitly advertise it to the other peer. So in theory, no need to advertise it explicitly as well. Maybe there are use-cases -- the subflow might not reach the other peer that way, we can ask the other peer to try initiating the new subflow without delay -- or very likely the user is confused, and put both flags "just to be sure at least the right one is set". Still, if it is allowed, the kernel should do what has been asked: using this endpoint to announce the address and to create a new subflow from it.
An alternative would be to forbid the use of the two flags together, but that's probably too late: there may be use-cases, and it was working before. This patch avoids people complaining that subflows are not created using the endpoint they added with both the 'subflow' and 'signal' flags.
Note that with the current patch, the subflow might not be created in some corner cases, e.g. if the 'subflows' limit was reached when sending the ADD_ADDR, but changed later on. It is probably not worth splitting id_avail_bitmap per target ('signal', 'subflow'), which will add another large field to the msk "just" to track (again) endpoints. Anyway, currently when the limits are changed, the kernel doesn't check if new subflows can be created or removed, because we would need to keep track of the received ADD_ADDR, and more. It sounds OK to assume that the limits should be properly configured before establishing new connections.
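For illustration, the dual-flag endpoint discussed above would be configured with something like the following iproute2 commands (a hedged sketch, not taken from this series; the address, device and limit values are hypothetical, and the exact iproute2 syntax may vary by version):

  # announce 10.0.3.2 to the peer *and* use it to create an extra subflow
  ip mptcp endpoint add 10.0.3.2 dev eth1 signal subflow
  # allow the additional subflow on this side
  ip mptcp limits set subflow 2 add_addr_accepted 2

With the fix, the kernel both sends the ADD_ADDR for that address and initiates a subflow from it, instead of only announcing the address.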
Fixes: 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240731-upstream-net-20240731-mptcp-endp-subflow-s...
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/mptcp/pm_netlink.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -524,8 +524,8 @@ __lookup_addr(struct pm_nl_pernet *perne
 
 static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 {
+       struct mptcp_pm_addr_entry *local, *signal_and_subflow = NULL;
        struct sock *sk = (struct sock *)msk;
-       struct mptcp_pm_addr_entry *local;
        unsigned int add_addr_signal_max;
        unsigned int local_addr_max;
        struct pm_nl_pernet *pernet;
@@ -591,6 +591,9 @@ static void mptcp_pm_create_subflow_or_s
                msk->pm.add_addr_signaled++;
                mptcp_pm_announce_addr(msk, &local->addr, false);
                mptcp_pm_nl_addr_send_ack(msk);
+
+               if (local->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)
+                       signal_and_subflow = local;
        }
 
 subflow:
@@ -601,9 +604,14 @@ subflow:
                bool fullmesh;
                int i, nr;
 
-               local = select_local_address(pernet, msk);
-               if (!local)
-                       break;
+               if (signal_and_subflow) {
+                       local = signal_and_subflow;
+                       signal_and_subflow = NULL;
+               } else {
+                       local = select_local_address(pernet, msk);
+                       if (!local)
+                               break;
+               }
 
                fullmesh = !!(local->flags & MPTCP_PM_ADDR_FLAG_FULLMESH);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Matthieu Baerts (NGI0)" <matttbe@kernel.org>
commit 4d2868b5d191c74262f7407972d68d1bf3245d6a upstream.
It should be quite uncommon to set both the subflow and the signal flags: the initiator of the connection is typically the one creating new subflows, not the other peer, so there is usually no need to announce additional local addresses and use them to create subflows.
But some people might be confused about the flags, and set both "just to be sure at least the right one is set". To verify the previous fix, and avoid future regressions, this specific case is now validated: the client announces a new address, and initiates a new subflow from the same address.
While working on this, another bug was noticed, where the client reset the new subflow because an ADD_ADDR echo got received as the 3rd ACK: this new test also explicitly checks that no RSTs have been sent by either the client or the server.
The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it validates the previous fix for an issue introduced by this commit ID.
Fixes: 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240731-upstream-net-20240731-mptcp-endp-subflow-s...
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ No conflicts, but not using 'chk_add_nr 1 1 0 invert': in this version,
  'chk_add_nr' cannot be used with 'invert': d73bb9d3957b ("selftests:
  mptcp: join: ability to invert ADD_ADDR check") is not in this version,
  and backporting it causes a lot of conflicts. That's fine, checking that
  there is an additional subflow should be enough. ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -2090,6 +2090,20 @@ signal_address_tests()
                chk_add_nr 1 1
        fi
 
+       # uncommon: subflow and signal flags on the same endpoint
+       # or because the user wrongly picked both, but still expects the client
+       # to create additional subflows
+       if reset "subflow and signal together"; then
+               pm_nl_set_limits $ns1 0 2
+               pm_nl_set_limits $ns2 0 2
+               pm_nl_add_endpoint $ns2 10.0.3.2 flags signal,subflow
+               run_tests $ns1 $ns2 10.0.1.1
+               chk_join_nr 1 1 1
+               chk_add_nr 0 0 0        # none initiated by ns1
+               chk_rst_nr 0 0 invert   # no RST sent by the client
+               chk_rst_nr 0 0          # no RST sent by the server
+       fi
+
        # accept and use add_addr with additional subflows
        if reset "multiple subflows and signal"; then
                pm_nl_set_limits $ns1 0 3
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
commit e0e7bc2cbee93778c4ad7d9a792d425ffb5af6f7 upstream.
Instead of using a very long macro name, assign it to a shorter variable and use that instead. While doing so, multiple if checks using this define can be reduced to one.
Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Signed-off-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
Link: https://lore.kernel.org/r/20240603102818.36165-5-amadeuszx.slawinski@linux.i...
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 sound/soc/soc-topology.c | 26 ++++++++------------------
 1 file changed, 8 insertions(+), 18 deletions(-)
--- a/sound/soc/soc-topology.c
+++ b/sound/soc/soc-topology.c
@@ -1062,6 +1062,7 @@ static int soc_tplg_dapm_graph_elems_loa
                                       struct snd_soc_tplg_hdr *hdr)
 {
        struct snd_soc_dapm_context *dapm = &tplg->comp->dapm;
+       const size_t maxlen = SNDRV_CTL_ELEM_ID_NAME_MAXLEN;
        struct snd_soc_tplg_dapm_graph_elem *elem;
        struct snd_soc_dapm_route *route;
        int count, i;
@@ -1085,38 +1086,27 @@ static int soc_tplg_dapm_graph_elems_loa
                tplg->pos += sizeof(struct snd_soc_tplg_dapm_graph_elem);
 
                /* validate routes */
-               if (strnlen(elem->source, SNDRV_CTL_ELEM_ID_NAME_MAXLEN) ==
-                   SNDRV_CTL_ELEM_ID_NAME_MAXLEN) {
-                       ret = -EINVAL;
-                       break;
-               }
-               if (strnlen(elem->sink, SNDRV_CTL_ELEM_ID_NAME_MAXLEN) ==
-                   SNDRV_CTL_ELEM_ID_NAME_MAXLEN) {
-                       ret = -EINVAL;
-                       break;
-               }
-               if (strnlen(elem->control, SNDRV_CTL_ELEM_ID_NAME_MAXLEN) ==
-                   SNDRV_CTL_ELEM_ID_NAME_MAXLEN) {
+               if ((strnlen(elem->source, maxlen) == maxlen) ||
+                   (strnlen(elem->sink, maxlen) == maxlen) ||
+                   (strnlen(elem->control, maxlen) == maxlen)) {
                        ret = -EINVAL;
                        break;
                }
 
                route->source = devm_kmemdup(tplg->dev, elem->source,
-                                            min(strlen(elem->source),
-                                                SNDRV_CTL_ELEM_ID_NAME_MAXLEN),
+                                            min(strlen(elem->source), maxlen),
                                             GFP_KERNEL);
                route->sink = devm_kmemdup(tplg->dev, elem->sink,
-                                          min(strlen(elem->sink), SNDRV_CTL_ELEM_ID_NAME_MAXLEN),
+                                          min(strlen(elem->sink), maxlen),
                                           GFP_KERNEL);
                if (!route->source || !route->sink) {
                        ret = -ENOMEM;
                        break;
                }
 
-               if (strnlen(elem->control, SNDRV_CTL_ELEM_ID_NAME_MAXLEN) != 0) {
+               if (strnlen(elem->control, maxlen) != 0) {
                        route->control = devm_kmemdup(tplg->dev, elem->control,
-                                                     min(strlen(elem->control),
-                                                         SNDRV_CTL_ELEM_ID_NAME_MAXLEN),
+                                                     min(strlen(elem->control), maxlen),
                                                      GFP_KERNEL);
                        if (!route->control) {
                                ret = -ENOMEM;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
commit 0298f51652be47b79780833e0b63194e1231fa34 upstream.
It was reported that the recent fix for memory corruption during topology load causes corruption in other cases. Instead of being overeager with checking the topology, assume that it is properly formatted and just duplicate the strings.
Reported-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Closes: https://lore.kernel.org/linux-sound/171812236450.201359.3019210915105428447....
Suggested-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Signed-off-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
Link: https://lore.kernel.org/r/20240613090126.841189-1-amadeuszx.slawinski@linux....
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 sound/soc/soc-topology.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)
--- a/sound/soc/soc-topology.c
+++ b/sound/soc/soc-topology.c
@@ -1093,21 +1093,15 @@ static int soc_tplg_dapm_graph_elems_loa
                        ret = -EINVAL;
                        break;
                }
 
-               route->source = devm_kmemdup(tplg->dev, elem->source,
-                                            min(strlen(elem->source), maxlen),
-                                            GFP_KERNEL);
-               route->sink = devm_kmemdup(tplg->dev, elem->sink,
-                                          min(strlen(elem->sink), maxlen),
-                                          GFP_KERNEL);
+               route->source = devm_kstrdup(tplg->dev, elem->source, GFP_KERNEL);
+               route->sink = devm_kstrdup(tplg->dev, elem->sink, GFP_KERNEL);
                if (!route->source || !route->sink) {
                        ret = -ENOMEM;
                        break;
                }
 
                if (strnlen(elem->control, maxlen) != 0) {
-                       route->control = devm_kmemdup(tplg->dev, elem->control,
-                                                     min(strlen(elem->control), maxlen),
-                                                     GFP_KERNEL);
+                       route->control = devm_kstrdup(tplg->dev, elem->control, GFP_KERNEL);
                        if (!route->control) {
                                ret = -ENOMEM;
                                break;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook <kees@kernel.org>
commit f50733b45d865f91db90919f8311e2127ce5a0cb upstream.
When opening a file for exec via do_filp_open(), permission checking is done against the file's metadata at that moment, and on success, a file pointer is passed back. Much later in the execve() code path, the file metadata (specifically mode, uid, and gid) is used to determine if/how to set the uid and gid. However, those values may have changed since the permissions check, meaning the execution may gain unintended privileges.
For example, if a file could change permissions from executable and not set-id:
---------x 1 root root 16048 Aug 7 13:16 target
to set-id and non-executable:
---S------ 1 root root 16048 Aug 7 13:16 target
it is possible to gain root privileges when execution should have been disallowed.
While this race condition is rare in real-world scenarios, it has been observed (and proven exploitable) when package managers are updating the setuid bits of installed programs. Such files start with being world-executable but then are adjusted to be group-exec with a set-uid bit. For example, "chmod o-x,u+s target" makes "target" executable only by uid "root" and gid "cdrom", while also becoming setuid-root:
-rwxr-xr-x 1 root cdrom 16048 Aug 7 13:16 target
becomes:
-rwsr-xr-- 1 root cdrom 16048 Aug 7 13:16 target
But racing the chmod means users without group "cdrom" membership can get permission to execute "target" just before the chmod, and when the chmod finishes, the exec reaches bprm_fill_uid() and performs the setuid to root, violating the expressed authorization of "only cdrom group members can setuid to root".
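The race window can be visualized with a sketch like this (illustrative only, not the reporter's reproducer; "target" is the hypothetical binary from the listings above):

  # terminal 1: the package-manager-style update described above
  chmod o-x,u+s target        # -rwxr-xr-x -> -rwsr-xr--  (root:cdrom)
  # terminal 2: unprivileged user *not* in group "cdrom", racing the chmod
  while true; do ./target; done
  # an exec that passed the world-execute permission check just before the
  # chmod may then reload the now-setuid mode in bprm_fill_uid() and run
  # the program as root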
Re-check that we still have execute permissions in case the metadata has changed. It would be better to keep a copy from the perm-check time, but until we can do that refactoring, the least-bad option is to do a full inode_permission() call (under inode lock). It is understood that this is safe against deadlocks, but hardly optimal.
Reported-by: Marco Vanotti <mvanotti@google.com>
Tested-by: Marco Vanotti <mvanotti@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/exec.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1603,6 +1603,7 @@ static void bprm_fill_uid(struct linux_b
        unsigned int mode;
        kuid_t uid;
        kgid_t gid;
+       int err;
 
        if (!mnt_may_suid(file->f_path.mnt))
                return;
@@ -1619,12 +1620,17 @@ static void bprm_fill_uid(struct linux_b
        /* Be careful if suid/sgid is set */
        inode_lock(inode);
 
-       /* reload atomically mode/uid/gid now that lock held */
+       /* Atomically reload and check mode/uid/gid now that lock held. */
        mode = inode->i_mode;
        uid = i_uid_into_mnt(mnt_userns, inode);
        gid = i_gid_into_mnt(mnt_userns, inode);
+       err = inode_permission(mnt_userns, inode, MAY_EXEC);
        inode_unlock(inode);
 
+       /* Did the exec bit vanish out from under us? Give up. */
+       if (err)
+               return;
+
        /* We ignore suid/sgid if there are no mappings for them in the ns */
        if (!kuid_has_mapping(bprm->cred->user_ns, uid) ||
            !kgid_has_mapping(bprm->cred->user_ns, gid))
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Huacai Chen <chenhuacai@loongson.cn>
commit 7697a0fe0154468f5df35c23ebd7aa48994c2cdc upstream.
The Chromium sandbox apparently wants to deny statx [1] so it can properly inspect arguments after the sandboxed process later falls back to fstat. Because there is currently no "fd-only" version of statx, the sandbox has no way to ensure the path argument is empty without being able to peek into the sandboxed process's memory. For architectures able to do newfstatat though, glibc falls back to newfstatat after getting -ENOSYS for statx, and the respective SIGSYS handler [2] takes care of inspecting the path argument, transforming allowed newfstatat calls into fstat instead, which is allowed and has the same type of return value.
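Under strace, the fallback sequence described above looks roughly like this on an architecture that has newfstatat (a hedged sketch; the traced program and the exact arguments are hypothetical):

  strace -f -e trace=statx,newfstatat ./sandboxed-app
  # statx(AT_FDCWD, "/etc/passwd", ...)         = -1 ENOSYS (denied by seccomp)
  # newfstatat(AT_FDCWD, "/etc/passwd", ..., 0) = 0
  #     (glibc fallback; the SIGSYS handler can inspect the path argument)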
But, as LoongArch is the first architecture to not have fstat nor newfstatat, the LoongArch glibc does not attempt falling back at all when it gets -ENOSYS for statx -- and you see the problem there!
Actually, back when the LoongArch port was under review, people were aware of the same problem with sandboxing clone3 [3], so clone was eventually kept. Unfortunately it seemed at that time no one had noticed statx, so besides restoring fstat/newfstatat to LoongArch uapi (and postponing the problem further), it seems inevitable that we would need to tackle seccomp deep argument inspection.
However, this is obviously a decision that shouldn't be taken lightly, so we just restore fstat/newfstatat by defining __ARCH_WANT_NEW_STAT in unistd.h. This is the simplest solution for now, and so we hope the community will tackle the long-standing problem of seccomp deep argument inspection in the future [4][5].
Also add "newstat" to syscall_abis_64 in Makefile.syscalls due to upstream asm-generic changes.
For more information, please read this thread [6].
[1] https://chromium-review.googlesource.com/c/chromium/src/+/2823150
[2] https://chromium.googlesource.com/chromium/src/sandbox/+/c085b51940bd/linux/...
[3] https://lore.kernel.org/linux-arch/20220511211231.GG7074@brightrain.aerifal....
[4] https://lwn.net/Articles/799557/
[5] https://lpc.events/event/4/contributions/560/attachments/397/640/deep-arg-in...
[6] https://lore.kernel.org/loongarch/20240226-granit-seilschaft-eccc2433014d@br...
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/loongarch/include/uapi/asm/unistd.h | 1 +
 1 file changed, 1 insertion(+)
--- a/arch/loongarch/include/uapi/asm/unistd.h
+++ b/arch/loongarch/include/uapi/asm/unistd.h
@@ -1,4 +1,5 @@
 /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#define __ARCH_WANT_NEW_STAT
 #define __ARCH_WANT_SYS_CLONE
 #define __ARCH_WANT_SYS_CLONE3
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton <jlayton@kernel.org>
[ Upstream commit f5f9d4a314da88c0a5faa6d168bf69081b7a25ae ]
There's no need to start the reply cache before nfsd is up and running, and doing so means that we register a shrinker for every net namespace instead of just the ones where nfsd is running.
Move it to the per-net nfsd startup instead.
Reported-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Stable-dep-of: ed9ab7346e90 ("nfsd: move init of percpu reply_cache_stats counters back to nfsd_init_net")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/nfsd/nfsctl.c |  8 --------
 fs/nfsd/nfssvc.c | 10 +++++++++-
 2 files changed, 9 insertions(+), 9 deletions(-)
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1453,16 +1453,11 @@ static __net_init int nfsd_init_net(stru
        nn->nfsd_versions = NULL;
        nn->nfsd4_minorversions = NULL;
        nfsd4_init_leases_net(nn);
-       retval = nfsd_reply_cache_init(nn);
-       if (retval)
-               goto out_cache_error;
        get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key));
        seqlock_init(&nn->writeverf_lock);
 
        return 0;
 
-out_cache_error:
-       nfsd_idmap_shutdown(net);
 out_idmap_error:
        nfsd_export_shutdown(net);
 out_export_error:
@@ -1471,9 +1466,6 @@ out_export_error:
 
 static __net_exit void nfsd_exit_net(struct net *net)
 {
-       struct nfsd_net *nn = net_generic(net, nfsd_net_id);
-
-       nfsd_reply_cache_shutdown(nn);
        nfsd_idmap_shutdown(net);
        nfsd_export_shutdown(net);
        nfsd_netns_free_versions(net_generic(net, nfsd_net_id));
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -427,16 +427,23 @@ static int nfsd_startup_net(struct net *
        ret = nfsd_file_cache_start_net(net);
        if (ret)
                goto out_lockd;
-       ret = nfs4_state_start_net(net);
+
+       ret = nfsd_reply_cache_init(nn);
        if (ret)
                goto out_filecache;
 
+       ret = nfs4_state_start_net(net);
+       if (ret)
+               goto out_reply_cache;
+
 #ifdef CONFIG_NFSD_V4_2_INTER_SSC
        nfsd4_ssc_init_umount_work(nn);
 #endif
        nn->nfsd_net_up = true;
        return 0;
 
+out_reply_cache:
+       nfsd_reply_cache_shutdown(nn);
 out_filecache:
        nfsd_file_cache_shutdown_net(net);
 out_lockd:
@@ -454,6 +461,7 @@ static void nfsd_shutdown_net(struct net
        struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
        nfs4_state_shutdown_net(net);
+       nfsd_reply_cache_shutdown(nn);
        nfsd_file_cache_shutdown_net(net);
        if (nn->lockd_up) {
                lockd_down(net);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton <jlayton@kernel.org>
[ Upstream commit ed9ab7346e908496816cffdecd46932035f66e2e ]
Commit f5f9d4a314da ("nfsd: move reply cache initialization into nfsd startup") moved the initialization of the reply cache into nfsd startup, but didn't account for the stats counters, which can be accessed before nfsd is ever started. The result can be a NULL pointer dereference when someone accesses /proc/fs/nfsd/reply_cache_stats while nfsd is still shut down.
This is a regression and a user-triggerable oops in the right situation:
- non-x86_64 arch
- /proc/fs/nfsd is mounted in the namespace
- nfsd is not started in the namespace
- unprivileged user calls "cat /proc/fs/nfsd/reply_cache_stats"
Although this is easy to trigger on some arches (like aarch64), on x86_64, calling this_cpu_ptr(NULL) evidently returns a pointer to the fixed_percpu_data. That struct looks just enough like a newly initialized percpu var to allow nfsd_reply_cache_stats_show to access it without Oopsing.
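Concretely, the four conditions above reduce to something like this (a hedged sketch of the trigger, not the reporter's exact steps):

  # in a net namespace where nfsd has never been started
  mount -t nfsd nfsd /proc/fs/nfsd
  # any user can then hit the uninitialized percpu counters:
  cat /proc/fs/nfsd/reply_cache_stats   # NULL this_cpu_ptr -> oops on
                                        # e.g. aarch64; silently "works"
                                        # on x86_64 as described above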
Move the initialization of the per-net+per-cpu reply-cache counters back into nfsd_init_net, while leaving the rest of the reply cache allocations to be done at nfsd startup time.
Kudos to Eirik who did most of the legwork to track this down.
Cc: stable@vger.kernel.org # v6.3+
Fixes: f5f9d4a314da ("nfsd: move reply cache initialization into nfsd startup")
Reported-and-tested-by: Eirik Fuller <efuller@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2215429
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Stable-dep-of: 4b14885411f7 ("nfsd: make all of the nfsd stats per-network namespace")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/nfsd/cache.h    |  2 ++
 fs/nfsd/nfscache.c | 25 ++++++++++++++-----------
 fs/nfsd/nfsctl.c   | 10 +++++++++-
 3 files changed, 25 insertions(+), 12 deletions(-)
--- a/fs/nfsd/cache.h
+++ b/fs/nfsd/cache.h
@@ -80,6 +80,8 @@ enum {
 
 int    nfsd_drc_slab_create(void);
 void   nfsd_drc_slab_free(void);
+int    nfsd_net_reply_cache_init(struct nfsd_net *nn);
+void   nfsd_net_reply_cache_destroy(struct nfsd_net *nn);
 int    nfsd_reply_cache_init(struct nfsd_net *);
 void   nfsd_reply_cache_shutdown(struct nfsd_net *);
 int    nfsd_cache_lookup(struct svc_rqst *rqstp, unsigned int start,
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -148,12 +148,23 @@ void nfsd_drc_slab_free(void)
        kmem_cache_destroy(drc_slab);
 }
 
-static int nfsd_reply_cache_stats_init(struct nfsd_net *nn)
+/**
+ * nfsd_net_reply_cache_init - per net namespace reply cache set-up
+ * @nn: nfsd_net being initialized
+ *
+ * Returns zero on succes; otherwise a negative errno is returned.
+ */
+int nfsd_net_reply_cache_init(struct nfsd_net *nn)
 {
        return nfsd_percpu_counters_init(nn->counter, NFSD_NET_COUNTERS_NUM);
 }
 
-static void nfsd_reply_cache_stats_destroy(struct nfsd_net *nn)
+/**
+ * nfsd_net_reply_cache_destroy - per net namespace reply cache tear-down
+ * @nn: nfsd_net being freed
+ *
+ */
+void nfsd_net_reply_cache_destroy(struct nfsd_net *nn)
 {
        nfsd_percpu_counters_destroy(nn->counter, NFSD_NET_COUNTERS_NUM);
 }
@@ -169,17 +180,13 @@ int nfsd_reply_cache_init(struct nfsd_ne
        hashsize = nfsd_hashsize(nn->max_drc_entries);
        nn->maskbits = ilog2(hashsize);
 
-       status = nfsd_reply_cache_stats_init(nn);
-       if (status)
-               goto out_nomem;
-
        nn->nfsd_reply_cache_shrinker.scan_objects = nfsd_reply_cache_scan;
        nn->nfsd_reply_cache_shrinker.count_objects = nfsd_reply_cache_count;
        nn->nfsd_reply_cache_shrinker.seeks = 1;
        status = register_shrinker(&nn->nfsd_reply_cache_shrinker,
                                   "nfsd-reply:%s", nn->nfsd_name);
        if (status)
-               goto out_stats_destroy;
+               return status;
 
        nn->drc_hashtbl = kvzalloc(array_size(hashsize,
                                   sizeof(*nn->drc_hashtbl)), GFP_KERNEL);
@@ -195,9 +202,6 @@ int nfsd_reply_cache_init(struct nfsd_ne
        return 0;
 out_shrinker:
        unregister_shrinker(&nn->nfsd_reply_cache_shrinker);
-out_stats_destroy:
-       nfsd_reply_cache_stats_destroy(nn);
-out_nomem:
        printk(KERN_ERR "nfsd: failed to allocate reply cache\n");
        return -ENOMEM;
 }
@@ -217,7 +221,6 @@ void nfsd_reply_cache_shutdown(struct nf
                                rp, nn);
                }
        }
-       nfsd_reply_cache_stats_destroy(nn);
 
        kvfree(nn->drc_hashtbl);
        nn->drc_hashtbl = NULL;
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1450,6 +1450,9 @@ static __net_init int nfsd_init_net(stru
        retval = nfsd_idmap_init(net);
        if (retval)
                goto out_idmap_error;
+       retval = nfsd_net_reply_cache_init(nn);
+       if (retval)
+               goto out_repcache_error;
        nn->nfsd_versions = NULL;
        nn->nfsd4_minorversions = NULL;
        nfsd4_init_leases_net(nn);
@@ -1458,6 +1461,8 @@ static __net_init int nfsd_init_net(stru
 
        return 0;
 
+out_repcache_error:
+       nfsd_idmap_shutdown(net);
 out_idmap_error:
        nfsd_export_shutdown(net);
 out_export_error:
@@ -1466,9 +1471,12 @@ out_export_error:
 
 static __net_exit void nfsd_exit_net(struct net *net)
 {
+       struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+
+       nfsd_net_reply_cache_destroy(nn);
        nfsd_idmap_shutdown(net);
        nfsd_export_shutdown(net);
-       nfsd_netns_free_versions(net_generic(net, nfsd_net_id));
+       nfsd_netns_free_versions(nn);
 }
 
 static struct pernet_operations nfsd_net_ops = {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever <chuck.lever@oracle.com>
[ Upstream commit 35308e7f0fc3942edc87d9c6dc78c4a096428957 ]
To reduce contention on the bucket locks, we must avoid calling kfree() while each bucket lock is held.
Start by refactoring nfsd_reply_cache_free_locked() into a helper that removes an entry from the bucket (and must therefore run under the lock) and a second helper that frees the entry (which does not need to hold the lock).
For readability, rename the helpers nfsd_cacherep_<verb>.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Stable-dep-of: a9507f6af145 ("NFSD: Replace nfsd_prune_bucket()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/nfsd/nfscache.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -110,21 +110,33 @@ nfsd_reply_cache_alloc(struct svc_rqst *
        return rp;
 }
 
+static void nfsd_cacherep_free(struct svc_cacherep *rp)
+{
+       if (rp->c_type == RC_REPLBUFF)
+               kfree(rp->c_replvec.iov_base);
+       kmem_cache_free(drc_slab, rp);
+}
+
 static void
-nfsd_reply_cache_free_locked(struct nfsd_drc_bucket *b, struct svc_cacherep *rp,
-                               struct nfsd_net *nn)
+nfsd_cacherep_unlink_locked(struct nfsd_net *nn, struct nfsd_drc_bucket *b,
+                           struct svc_cacherep *rp)
 {
-       if (rp->c_type == RC_REPLBUFF && rp->c_replvec.iov_base) {
+       if (rp->c_type == RC_REPLBUFF && rp->c_replvec.iov_base)
                nfsd_stats_drc_mem_usage_sub(nn, rp->c_replvec.iov_len);
-               kfree(rp->c_replvec.iov_base);
-       }
        if (rp->c_state != RC_UNUSED) {
                rb_erase(&rp->c_node, &b->rb_head);
                list_del(&rp->c_lru);
                atomic_dec(&nn->num_drc_entries);
                nfsd_stats_drc_mem_usage_sub(nn, sizeof(*rp));
        }
-       kmem_cache_free(drc_slab, rp);
+}
+
+static void
+nfsd_reply_cache_free_locked(struct nfsd_drc_bucket *b, struct svc_cacherep *rp,
+                               struct nfsd_net *nn)
+{
+       nfsd_cacherep_unlink_locked(nn, b, rp);
+       nfsd_cacherep_free(rp);
 }
 
 static void
@@ -132,8 +144,9 @@ nfsd_reply_cache_free(struct nfsd_drc_bu
                        struct nfsd_net *nn)
 {
        spin_lock(&b->cache_lock);
-       nfsd_reply_cache_free_locked(b, rp, nn);
+       nfsd_cacherep_unlink_locked(nn, b, rp);
        spin_unlock(&b->cache_lock);
+       nfsd_cacherep_free(rp);
 }
 
 int nfsd_drc_slab_create(void)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever <chuck.lever@oracle.com>
[ Upstream commit ff0d169329768c1102b7b07eebe5a9839aa1c143 ]
For readability, rename to match the other helpers.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Stable-dep-of: 4b14885411f7 ("nfsd: make all of the nfsd stats per-network namespace")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/nfsd/nfscache.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -85,8 +85,8 @@ nfsd_hashsize(unsigned int limit)
 }
 
 static struct svc_cacherep *
-nfsd_reply_cache_alloc(struct svc_rqst *rqstp, __wsum csum,
-                      struct nfsd_net *nn)
+nfsd_cacherep_alloc(struct svc_rqst *rqstp, __wsum csum,
+                   struct nfsd_net *nn)
 {
        struct svc_cacherep *rp;
 
@@ -481,7 +481,7 @@ int nfsd_cache_lookup(struct svc_rqst *r
         * preallocate an entry.
         */
        nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
-       rp = nfsd_reply_cache_alloc(rqstp, csum, nn);
+       rp = nfsd_cacherep_alloc(rqstp, csum, nn);
        if (!rp)
                goto out;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever <chuck.lever@oracle.com>
[ Upstream commit a9507f6af1450ed26a4a36d979af518f5bb21e5d ]
Enable nfsd_prune_bucket() to drop the bucket lock while calling kfree(). Use the same pattern that Jeff recently introduced in the NFSD filecache.
A few percpu operations are moved outside the lock since they temporarily disable local IRQs which is expensive and does not need to be done while the lock is held.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Stable-dep-of: c135e1269f34 ("NFSD: Refactor the duplicate reply cache shrinker")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/nfsd/nfscache.c | 78 ++++++++++++++++++++++++++++++++++++++++++-----------
 fs/nfsd/trace.h    | 22 ++++++++++++++
 2 files changed, 85 insertions(+), 15 deletions(-)
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -117,6 +117,21 @@ static void nfsd_cacherep_free(struct sv
        kmem_cache_free(drc_slab, rp);
 }
 
+static unsigned long
+nfsd_cacherep_dispose(struct list_head *dispose)
+{
+       struct svc_cacherep *rp;
+       unsigned long freed = 0;
+
+       while (!list_empty(dispose)) {
+               rp = list_first_entry(dispose, struct svc_cacherep, c_lru);
+               list_del(&rp->c_lru);
+               nfsd_cacherep_free(rp);
+               freed++;
+       }
+       return freed;
+}
+
 static void
 nfsd_cacherep_unlink_locked(struct nfsd_net *nn, struct nfsd_drc_bucket *b,
                            struct svc_cacherep *rp)
@@ -260,6 +275,41 @@ nfsd_cache_bucket_find(__be32 xid, struc
        return &nn->drc_hashtbl[hash];
 }
 
+/*
+ * Remove and return no more than @max expired entries in bucket @b.
+ * If @max is zero, do not limit the number of removed entries.
+ */
+static void
+nfsd_prune_bucket_locked(struct nfsd_net *nn, struct nfsd_drc_bucket *b,
+                        unsigned int max, struct list_head *dispose)
+{
+       unsigned long expiry = jiffies - RC_EXPIRE;
+       struct svc_cacherep *rp, *tmp;
+       unsigned int freed = 0;
+
+       lockdep_assert_held(&b->cache_lock);
+
+       /* The bucket LRU is ordered oldest-first. */
+       list_for_each_entry_safe(rp, tmp, &b->lru_head, c_lru) {
+               /*
+                * Don't free entries attached to calls that are still
+                * in-progress, but do keep scanning the list.
+                */
+               if (rp->c_state == RC_INPROG)
+                       continue;
+
+               if (atomic_read(&nn->num_drc_entries) <= nn->max_drc_entries &&
+                   time_before(expiry, rp->c_timestamp))
+                       break;
+
+               nfsd_cacherep_unlink_locked(nn, b, rp);
+               list_add(&rp->c_lru, dispose);
+
+               if (max && ++freed > max)
+                       break;
+       }
+}
+
 static long prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn,
                         unsigned int max)
 {
@@ -283,11 +333,6 @@ static long prune_bucket(struct nfsd_drc
        return freed;
 }
 
-static long nfsd_prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn)
-{
-       return prune_bucket(b, nn, 3);
-}
-
 /*
  * Walk the LRU list and prune off entries that are older than RC_EXPIRE.
  * Also prune the oldest ones when the total exceeds the max number of entries.
@@ -466,6 +511,8 @@ int nfsd_cache_lookup(struct svc_rqst *r
        __wsum csum;
        struct nfsd_drc_bucket *b;
        int type = rqstp->rq_cachetype;
+       unsigned long freed;
+       LIST_HEAD(dispose);
        int rtn = RC_DOIT;
 
        rqstp->rq_cacherep = NULL;
@@ -490,20 +537,18 @@ int nfsd_cache_lookup(struct svc_rqst *r
        found = nfsd_cache_insert(b, rp, nn);
        if (found != rp)
                goto found_entry;
-
-       nfsd_stats_rc_misses_inc();
        rqstp->rq_cacherep = rp;
        rp->c_state = RC_INPROG;
+       nfsd_prune_bucket_locked(nn, b, 3, &dispose);
+       spin_unlock(&b->cache_lock);
 
+       freed = nfsd_cacherep_dispose(&dispose);
+       trace_nfsd_drc_gc(nn, freed);
+
+       nfsd_stats_rc_misses_inc();
        atomic_inc(&nn->num_drc_entries);
        nfsd_stats_drc_mem_usage_add(nn, sizeof(*rp));
-
-       nfsd_prune_bucket(b, nn);
-
-out_unlock:
-       spin_unlock(&b->cache_lock);
-out:
-       return rtn;
+       goto out;
 
 found_entry:
        /* We found a matching entry which is either in progress or done. */
@@ -541,7 +586,10 @@ found_entry:
 
 out_trace:
        trace_nfsd_drc_found(nn, rqstp, rtn);
-       goto out_unlock;
+out_unlock:
+       spin_unlock(&b->cache_lock);
+out:
+       return rtn;
 }
 
 /**
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -1261,6 +1261,28 @@ TRACE_EVENT(nfsd_drc_mismatch,
                __entry->ingress)
 );
 
+TRACE_EVENT_CONDITION(nfsd_drc_gc,
+       TP_PROTO(
+               const struct nfsd_net *nn,
+               unsigned long freed
+       ),
+       TP_ARGS(nn, freed),
+       TP_CONDITION(freed > 0),
+       TP_STRUCT__entry(
+               __field(unsigned long long, boot_time)
+               __field(unsigned long, freed)
+               __field(int, total)
+       ),
+       TP_fast_assign(
+               __entry->boot_time = nn->boot_time;
+               __entry->freed = freed;
+               __entry->total = atomic_read(&nn->num_drc_entries);
+       ),
+       TP_printk("boot_time=%16llx total=%d freed=%lu",
+               __entry->boot_time, __entry->total, __entry->freed
+       )
+);
+
 TRACE_EVENT(nfsd_cb_args,
        TP_PROTO(
                const struct nfs4_client *clp,
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever <chuck.lever@oracle.com>
[ Upstream commit c135e1269f34dfdea4bd94c11060c83a3c0b3c12 ]
Avoid holding the bucket lock while freeing cache entries. This change also caps the number of entries that are freed when the shrinker calls to reduce the shrinker's impact on the cache's effectiveness.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
[ cel: adjusted to apply to v6.1.y -- this one might not be necessary ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/nfsd/nfscache.c | 83 ++++++++++++++++++++++++-----------------------
 1 file changed, 39 insertions(+), 44 deletions(-)
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -310,67 +310,62 @@ nfsd_prune_bucket_locked(struct nfsd_net
        }
 }
 
-static long prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn,
-                        unsigned int max)
+/**
+ * nfsd_reply_cache_count - count_objects method for the DRC shrinker
+ * @shrink: our registered shrinker context
+ * @sc: garbage collection parameters
+ *
+ * Returns the total number of entries in the duplicate reply cache. To
+ * keep things simple and quick, this is not the number of expired entries
+ * in the cache (ie, the number that would be removed by a call to
+ * nfsd_reply_cache_scan).
+ */
+static unsigned long
+nfsd_reply_cache_count(struct shrinker *shrink, struct shrink_control *sc)
 {
-       struct svc_cacherep *rp, *tmp;
-       long freed = 0;
+       struct nfsd_net *nn = container_of(shrink,
+                               struct nfsd_net, nfsd_reply_cache_shrinker);
 
-       list_for_each_entry_safe(rp, tmp, &b->lru_head, c_lru) {
-               /*
-                * Don't free entries attached to calls that are still
-                * in-progress, but do keep scanning the list.
-                */
-               if (rp->c_state == RC_INPROG)
-                       continue;
-               if (atomic_read(&nn->num_drc_entries) <= nn->max_drc_entries &&
-                   time_before(jiffies, rp->c_timestamp + RC_EXPIRE))
-                       break;
-               nfsd_reply_cache_free_locked(b, rp, nn);
-               if (max && freed++ > max)
-                       break;
-       }
-       return freed;
+       return atomic_read(&nn->num_drc_entries);
 }
 
-/*
- * Walk the LRU list and prune off entries that are older than RC_EXPIRE.
- * Also prune the oldest ones when the total exceeds the max number of entries.
+/**
+ * nfsd_reply_cache_scan - scan_objects method for the DRC shrinker
+ * @shrink: our registered shrinker context
+ * @sc: garbage collection parameters
+ *
+ * Free expired entries on each bucket's LRU list until we've released
+ * nr_to_scan freed objects. Nothing will be released if the cache
+ * has not exceeded it's max_drc_entries limit.
+ *
+ * Returns the number of entries released by this call.
  */
-static long
-prune_cache_entries(struct nfsd_net *nn)
+static unsigned long
+nfsd_reply_cache_scan(struct shrinker *shrink, struct shrink_control *sc)
 {
+       struct nfsd_net *nn = container_of(shrink,
+                               struct nfsd_net, nfsd_reply_cache_shrinker);
+       unsigned long freed = 0;
+       LIST_HEAD(dispose);
        unsigned int i;
-       long freed = 0;
 
        for (i = 0; i < nn->drc_hashsize; i++) {
                struct nfsd_drc_bucket *b = &nn->drc_hashtbl[i];
 
                if (list_empty(&b->lru_head))
                        continue;
+
                spin_lock(&b->cache_lock);
-               freed += prune_bucket(b, nn, 0);
+               nfsd_prune_bucket_locked(nn, b, 0, &dispose);
                spin_unlock(&b->cache_lock);
-       }
-       return freed;
-}
-
-static unsigned long
-nfsd_reply_cache_count(struct shrinker *shrink, struct shrink_control *sc)
-{
-       struct nfsd_net *nn = container_of(shrink,
-                               struct nfsd_net, nfsd_reply_cache_shrinker);
 
-       return atomic_read(&nn->num_drc_entries);
-}
-
-static unsigned long
-nfsd_reply_cache_scan(struct shrinker *shrink, struct shrink_control *sc)
-{
-       struct nfsd_net *nn = container_of(shrink,
-                               struct nfsd_net, nfsd_reply_cache_shrinker);
+               freed += nfsd_cacherep_dispose(&dispose);
+               if (freed > sc->nr_to_scan)
+                       break;
+       }
 
-       return prune_cache_entries(nn);
+       trace_nfsd_drc_gc(nn, freed);
+       return freed;
 }
 
 /**
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever <chuck.lever@oracle.com>
[ Upstream commit 5ec39944f874e1ecc09f624a70dfaa8ac3bf9d08 ]
In function ‘export_stats_init’,
    inlined from ‘svc_export_alloc’ at fs/nfsd/export.c:866:6:
fs/nfsd/export.c:337:16: warning: ‘nfsd_percpu_counters_init’ accessing 40 bytes in a region of size 0 [-Wstringop-overflow=]
  337 |         return nfsd_percpu_counters_init(&stats->counter, EXP_STATS_COUNTERS_NUM);
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/nfsd/export.c:337:16: note: referencing argument 1 of type ‘struct percpu_counter[0]’
fs/nfsd/stats.h: In function ‘svc_export_alloc’:
fs/nfsd/stats.h:40:5: note: in a call to function ‘nfsd_percpu_counters_init’
   40 | int nfsd_percpu_counters_init(struct percpu_counter counters[], int num);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~
Cc: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Stable-dep-of: 93483ac5fec6 ("nfsd: expose /proc/net/sunrpc/nfsd in net namespaces")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/nfsd/stats.c | 2 +-
 fs/nfsd/stats.h | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)
--- a/fs/nfsd/stats.c
+++ b/fs/nfsd/stats.c
@@ -74,7 +74,7 @@ static int nfsd_show(struct seq_file *se
 
 DEFINE_PROC_SHOW_ATTRIBUTE(nfsd);
 
-int nfsd_percpu_counters_init(struct percpu_counter counters[], int num)
+int nfsd_percpu_counters_init(struct percpu_counter *counters, int num)
 {
        int i, err = 0;
 
--- a/fs/nfsd/stats.h
+++ b/fs/nfsd/stats.h
@@ -36,9 +36,9 @@ extern struct nfsd_stats       nfsdstats;
 
 extern struct svc_stat         nfsd_svcstats;
 
-int nfsd_percpu_counters_init(struct percpu_counter counters[], int num);
-void nfsd_percpu_counters_reset(struct percpu_counter counters[], int num);
-void nfsd_percpu_counters_destroy(struct percpu_counter counters[], int num);
+int nfsd_percpu_counters_init(struct percpu_counter *counters, int num);
+void nfsd_percpu_counters_reset(struct percpu_counter *counters, int num);
+void nfsd_percpu_counters_destroy(struct percpu_counter *counters, int num);
 int nfsd_stat_init(void);
 void nfsd_stat_shutdown(void);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever <chuck.lever@oracle.com>
[ Upstream commit 6939ace1f22681fface7841cdbf34d3204cc94b5 ]
fs/nfsd/export.c: In function 'svc_export_parse':
fs/nfsd/export.c:737:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=]
  737 | }
On my systems, svc_export_parse() has a stack frame of over 800 bytes, not 1040, but nonetheless, it could do with some reduction.
When a struct svc_export is on the stack, it's a temporary structure used as an argument, and not visible as an actual exported FS. No need to reserve space for export_stats in such cases.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310012359.YEw5IrK6-lkp@intel.com/
Cc: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Stable-dep-of: 4b14885411f7 ("nfsd: make all of the nfsd stats per-network namespace")
[ cel: adjusted to apply to v6.1.y ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/nfsd/export.c | 32 +++++++++++++++++++++-----------
 fs/nfsd/export.h |  4 ++--
 fs/nfsd/stats.h  | 12 ++++++------
 3 files changed, 31 insertions(+), 17 deletions(-)
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -339,12 +339,16 @@ static int export_stats_init(struct expo
 
 static void export_stats_reset(struct export_stats *stats)
 {
-       nfsd_percpu_counters_reset(stats->counter, EXP_STATS_COUNTERS_NUM);
+       if (stats)
+               nfsd_percpu_counters_reset(stats->counter,
+                                          EXP_STATS_COUNTERS_NUM);
 }
 
 static void export_stats_destroy(struct export_stats *stats)
 {
-       nfsd_percpu_counters_destroy(stats->counter, EXP_STATS_COUNTERS_NUM);
+       if (stats)
+               nfsd_percpu_counters_destroy(stats->counter,
+                                            EXP_STATS_COUNTERS_NUM);
 }
 
 static void svc_export_put(struct kref *ref)
@@ -353,7 +357,8 @@ static void svc_export_put(struct kref *
        path_put(&exp->ex_path);
        auth_domain_put(exp->ex_client);
        nfsd4_fslocs_free(&exp->ex_fslocs);
-       export_stats_destroy(&exp->ex_stats);
+       export_stats_destroy(exp->ex_stats);
+       kfree(exp->ex_stats);
        kfree(exp->ex_uuid);
        kfree_rcu(exp, ex_rcu);
 }
@@ -744,13 +749,15 @@ static int svc_export_show(struct seq_fi
        seq_putc(m, '\t');
        seq_escape(m, exp->ex_client->name, " \t\n\\");
        if (export_stats) {
-               seq_printf(m, "\t%lld\n", exp->ex_stats.start_time);
+               struct percpu_counter *counter = exp->ex_stats->counter;
+
+               seq_printf(m, "\t%lld\n", exp->ex_stats->start_time);
                seq_printf(m, "\tfh_stale: %lld\n",
-                          percpu_counter_sum_positive(&exp->ex_stats.counter[EXP_STATS_FH_STALE]));
+                          percpu_counter_sum_positive(&counter[EXP_STATS_FH_STALE]));
                seq_printf(m, "\tio_read: %lld\n",
-                          percpu_counter_sum_positive(&exp->ex_stats.counter[EXP_STATS_IO_READ]));
+                          percpu_counter_sum_positive(&counter[EXP_STATS_IO_READ]));
                seq_printf(m, "\tio_write: %lld\n",
-                          percpu_counter_sum_positive(&exp->ex_stats.counter[EXP_STATS_IO_WRITE]));
+                          percpu_counter_sum_positive(&counter[EXP_STATS_IO_WRITE]));
                seq_putc(m, '\n');
                return 0;
        }
@@ -796,7 +803,7 @@ static void svc_export_init(struct cache
        new->ex_layout_types = 0;
        new->ex_uuid = NULL;
        new->cd = item->cd;
-       export_stats_reset(&new->ex_stats);
+       export_stats_reset(new->ex_stats);
 }
 
 static void export_update(struct cache_head *cnew, struct cache_head *citem)
@@ -832,7 +839,14 @@ static struct cache_head *svc_export_all
        if (!i)
                return NULL;
 
-       if (export_stats_init(&i->ex_stats)) {
+       i->ex_stats = kmalloc(sizeof(*(i->ex_stats)), GFP_KERNEL);
+       if (!i->ex_stats) {
+               kfree(i);
+               return NULL;
+       }
+
+       if (export_stats_init(i->ex_stats)) {
+               kfree(i->ex_stats);
                kfree(i);
                return NULL;
        }
--- a/fs/nfsd/export.h
+++ b/fs/nfsd/export.h
@@ -64,10 +64,10 @@ struct svc_export {
        struct cache_head       h;
        struct auth_domain *    ex_client;
        int                     ex_flags;
+       int                     ex_fsid;
        struct path             ex_path;
        kuid_t                  ex_anon_uid;
        kgid_t                  ex_anon_gid;
-       int                     ex_fsid;
        unsigned char *         ex_uuid; /* 16 byte fsid */
        struct nfsd4_fs_locations ex_fslocs;
        uint32_t                ex_nflavors;
@@ -76,7 +76,7 @@ struct svc_export {
        struct nfsd4_deviceid_map *ex_devid_map;
        struct cache_detail     *cd;
        struct rcu_head         ex_rcu;
-       struct export_stats     ex_stats;
+       struct export_stats     *ex_stats;
 };
 
 /* an "export key" (expkey) maps a filehandlefragement to an
--- a/fs/nfsd/stats.h
+++ b/fs/nfsd/stats.h
@@ -60,22 +60,22 @@ static inline void nfsd_stats_rc_nocache
 static inline void nfsd_stats_fh_stale_inc(struct svc_export *exp)
 {
        percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_FH_STALE]);
-       if (exp)
-               percpu_counter_inc(&exp->ex_stats.counter[EXP_STATS_FH_STALE]);
+       if (exp && exp->ex_stats)
+               percpu_counter_inc(&exp->ex_stats->counter[EXP_STATS_FH_STALE]);
 }
 
 static inline void nfsd_stats_io_read_add(struct svc_export *exp, s64 amount)
 {
        percpu_counter_add(&nfsdstats.counter[NFSD_STATS_IO_READ], amount);
-       if (exp)
-               percpu_counter_add(&exp->ex_stats.counter[EXP_STATS_IO_READ], amount);
+       if (exp && exp->ex_stats)
+               percpu_counter_add(&exp->ex_stats->counter[EXP_STATS_IO_READ], amount);
 }
 
 static inline void nfsd_stats_io_write_add(struct svc_export *exp, s64 amount)
 {
        percpu_counter_add(&nfsdstats.counter[NFSD_STATS_IO_WRITE], amount);
-       if (exp)
-               percpu_counter_add(&exp->ex_stats.counter[EXP_STATS_IO_WRITE], amount);
+       if (exp && exp->ex_stats)
+               percpu_counter_add(&exp->ex_stats->counter[EXP_STATS_IO_WRITE], amount);
 }
 
 static inline void nfsd_stats_payload_misses_inc(struct nfsd_net *nn)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josef Bacik josef@toxicpanda.com
[ Upstream commit ab42f4d9a26f1723dcfd6c93fcf768032b2bb5e7 ]
We check for the existence of ->sv_stats elsewhere except in the core processing code. It appears that only nfsd actually exports these values anywhere; everybody else just has a write-only copy of sv_stats in their svc_program. Add a check for ->sv_stats before every adjustment to allow us to eliminate the stats struct from all the users who don't report the stats.
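[ Editor's note: in outline the fix is mechanical -- every bump of an sv_stats counter gets wrapped in a NULL check. A minimal sketch of the pattern, using simplified stand-in structs rather than the real sunrpc definitions: ]

	struct svc_stat {
		unsigned int rpccnt;
		unsigned int rpcbadfmt;
	};

	struct svc_serv {
		struct svc_stat *sv_stats;	/* NULL for services that never report stats */
	};

	static void svc_count_rpc(struct svc_serv *serv)
	{
		/* Before the patch this dereferenced sv_stats unconditionally;
		 * now the counter is only touched when a stats struct exists.
		 */
		if (serv->sv_stats)
			serv->sv_stats->rpccnt++;
	}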
Signed-off-by: Josef Bacik josef@toxicpanda.com
Reviewed-by: Jeff Layton jlayton@kernel.org
[ cel: adjusted to apply to v6.1.y ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/sunrpc/svc.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1324,7 +1324,8 @@ svc_process_common(struct svc_rqst *rqst
 		goto err_bad_proc;

 	/* Syntactic check complete */
-	serv->sv_stats->rpccnt++;
+	if (serv->sv_stats)
+		serv->sv_stats->rpccnt++;
 	trace_svc_process(rqstp, progp->pg_name);

 	/* Build the reply header. */
@@ -1377,7 +1378,8 @@ err_short_len:
 	goto close_xprt;

 err_bad_rpc:
-	serv->sv_stats->rpcbadfmt++;
+	if (serv->sv_stats)
+		serv->sv_stats->rpcbadfmt++;
 	svc_putnl(resv, 1);	/* REJECT */
 	svc_putnl(resv, 0);	/* RPC_MISMATCH */
 	svc_putnl(resv, 2);	/* Only RPCv2 supported */
@@ -1387,7 +1389,8 @@ err_bad_rpc:
 err_bad_auth:
 	dprintk("svc: authentication failed (%d)\n",
 		be32_to_cpu(rqstp->rq_auth_stat));
-	serv->sv_stats->rpcbadauth++;
+	if (serv->sv_stats)
+		serv->sv_stats->rpcbadauth++;
 	/* Restore write pointer to location of accept status: */
 	xdr_ressize_check(rqstp, reply_statp);
 	svc_putnl(resv, 1);	/* REJECT */
@@ -1397,7 +1400,8 @@ err_bad_auth:

 err_bad_prog:
 	dprintk("svc: unknown program %d\n", prog);
-	serv->sv_stats->rpcbadfmt++;
+	if (serv->sv_stats)
+		serv->sv_stats->rpcbadfmt++;
 	svc_putnl(resv, RPC_PROG_UNAVAIL);
 	goto sendit;

@@ -1405,7 +1409,8 @@ err_bad_vers:
 	svc_printk(rqstp, "unknown version (%d for prog %d, %s)\n",
 		   rqstp->rq_vers, rqstp->rq_prog, progp->pg_name);

-	serv->sv_stats->rpcbadfmt++;
+	if (serv->sv_stats)
+		serv->sv_stats->rpcbadfmt++;
 	svc_putnl(resv, RPC_PROG_MISMATCH);
 	svc_putnl(resv, process.mismatch.lovers);
 	svc_putnl(resv, process.mismatch.hivers);
@@ -1414,7 +1419,8 @@ err_bad_vers:
 err_bad_proc:
 	svc_printk(rqstp, "unknown procedure (%d)\n", rqstp->rq_proc);

-	serv->sv_stats->rpcbadfmt++;
+	if (serv->sv_stats)
+		serv->sv_stats->rpcbadfmt++;
 	svc_putnl(resv, RPC_PROC_UNAVAIL);
 	goto sendit;

@@ -1423,7 +1429,8 @@ err_garbage:

 	rpc_stat = rpc_garbage_args;
 err_bad:
-	serv->sv_stats->rpcbadfmt++;
+	if (serv->sv_stats)
+		serv->sv_stats->rpcbadfmt++;
 	svc_putnl(resv, ntohl(rpc_stat));
 	goto sendit;
 }
@@ -1469,7 +1476,8 @@ svc_process(struct svc_rqst *rqstp)
 out_baddir:
 	svc_printk(rqstp, "bad direction 0x%08x, dropping request\n",
 		   be32_to_cpu(dir));
-	rqstp->rq_server->sv_stats->rpcbadfmt++;
+	if (rqstp->rq_server->sv_stats)
+		rqstp->rq_server->sv_stats->rpcbadfmt++;
 out_drop:
 	svc_drop(rqstp);
 	return 0;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josef Bacik josef@toxicpanda.com
[ Upstream commit a2214ed588fb3c5b9824a21cff870482510372bb ]
A lot of places are setting a blank svc_stats in ->pg_stats and never utilizing these stats. Remove all of these extra structs, as we're not reporting these stats anywhere.
Signed-off-by: Josef Bacik josef@toxicpanda.com
Reviewed-by: Jeff Layton jlayton@kernel.org
Signed-off-by: Chuck Lever chuck.lever@oracle.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/lockd/svc.c    | 3 ---
 fs/nfs/callback.c | 3 ---
 fs/nfsd/nfssvc.c  | 5 -----
 3 files changed, 11 deletions(-)

--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -759,8 +759,6 @@ static const struct svc_version *nlmsvc_
 #endif
 };

-static struct svc_stat		nlmsvc_stats;
-
 #define NLM_NRVERS	ARRAY_SIZE(nlmsvc_version)
 static struct svc_program	nlmsvc_program = {
 	.pg_prog		= NLM_PROGRAM,		/* program number */
@@ -768,7 +766,6 @@ static struct svc_program	nlmsvc_program
 	.pg_vers		= nlmsvc_version,	/* version table */
 	.pg_name		= "lockd",		/* service name */
 	.pg_class		= "nfsd",		/* share authentication with nfsd */
-	.pg_stats		= &nlmsvc_stats,	/* stats table */
 	.pg_authenticate	= &lockd_authenticate,	/* export authentication */
 	.pg_init_request	= svc_generic_init_request,
 	.pg_rpcbind_set		= svc_generic_rpcbind_set,
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -407,15 +407,12 @@ static const struct svc_version *nfs4_ca
 	[4] = &nfs4_callback_version4,
 };

-static struct svc_stat nfs4_callback_stats;
-
 static struct svc_program nfs4_callback_program = {
 	.pg_prog = NFS4_CALLBACK,			/* RPC service number */
 	.pg_nvers = ARRAY_SIZE(nfs4_callback_version),	/* Number of entries */
 	.pg_vers = nfs4_callback_version,		/* version table */
 	.pg_name = "NFSv4 callback",			/* service name */
 	.pg_class = "nfs",				/* authentication class */
-	.pg_stats = &nfs4_callback_stats,
 	.pg_authenticate = nfs_callback_authenticate,
 	.pg_init_request = svc_generic_init_request,
 	.pg_rpcbind_set = svc_generic_rpcbind_set,
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -89,7 +89,6 @@ unsigned long	nfsd_drc_max_mem;
 unsigned long	nfsd_drc_mem_used;

 #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
-static struct svc_stat	nfsd_acl_svcstats;
 static const struct svc_version *nfsd_acl_version[] = {
 # if defined(CONFIG_NFSD_V2_ACL)
 	[2] = &nfsd_acl_version2,
@@ -108,15 +107,11 @@ static struct svc_program	nfsd_acl_progr
 	.pg_vers		= nfsd_acl_version,
 	.pg_name		= "nfsacl",
 	.pg_class		= "nfsd",
-	.pg_stats		= &nfsd_acl_svcstats,
 	.pg_authenticate	= &svc_set_client,
 	.pg_init_request	= nfsd_acl_init_request,
 	.pg_rpcbind_set		= nfsd_acl_rpcbind_set,
 };

-static struct svc_stat	nfsd_acl_svcstats = {
-	.program	= &nfsd_acl_program,
-};
 #endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */
static const struct svc_version *nfsd_version[] = {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josef Bacik josef@toxicpanda.com
[ Upstream commit f094323867668d50124886ad884b665de7319537 ]
Since only one service actually reports the rpc stats, there's not much of a reason to have a pointer to it in the svc_program struct. Adjust the svc_create_pooled function to take the sv_stats as an argument and pass the struct through there as desired instead of getting it from svc_program->pg_stats.
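[ Editor's note: caller-side, the change has roughly the shape below; a hedged sketch with a hypothetical example_create() wrapper -- the real call site is nfsd_create_serv(), shown in the hunks that follow: ]

	static int example_create(void)
	{
		struct svc_serv *serv;

		/* nfsd, the only stats reporter, hands in its svc_stat... */
		serv = svc_create_pooled(&nfsd_program, &nfsd_svcstats,
					 nfsd_max_blksize, nfsd);
		if (serv == NULL)
			return -ENOMEM;
		/* ...while plain svc_create() callers now implicitly get a
		 * NULL sv_stats from __svc_create().
		 */
		return 0;
	}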
Signed-off-by: Josef Bacik josef@toxicpanda.com
Reviewed-by: Jeff Layton jlayton@kernel.org
[ cel: adjusted to apply to v6.1.y ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/nfsd/nfssvc.c           |  3 ++-
 include/linux/sunrpc/svc.h |  4 +++-
 net/sunrpc/svc.c           | 12 +++++++-----
 3 files changed, 12 insertions(+), 7 deletions(-)

--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -657,7 +657,8 @@ int nfsd_create_serv(struct net *net)
 	if (nfsd_max_blksize == 0)
 		nfsd_max_blksize = nfsd_get_default_max_blksize();
 	nfsd_reset_versions(nn);
-	serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize, nfsd);
+	serv = svc_create_pooled(&nfsd_program, &nfsd_svcstats,
+				 nfsd_max_blksize, nfsd);
 	if (serv == NULL)
 		return -ENOMEM;

--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -493,7 +493,9 @@ void		   svc_rqst_replace_page(struct sv
 					 struct page *page);
 void		   svc_rqst_free(struct svc_rqst *);
 void		   svc_exit_thread(struct svc_rqst *);
-struct svc_serv *  svc_create_pooled(struct svc_program *, unsigned int,
+struct svc_serv *  svc_create_pooled(struct svc_program *prog,
+				     struct svc_stat *stats,
+				     unsigned int bufsize,
 				     int (*threadfn)(void *data));
 int		   svc_set_num_threads(struct svc_serv *, struct svc_pool *, int);
 int		   svc_pool_stats_open(struct svc_serv *serv, struct file *file);
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -453,8 +453,8 @@ __svc_init_bc(struct svc_serv *serv)
  * Create an RPC service
  */
 static struct svc_serv *
-__svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
-	     int (*threadfn)(void *data))
+__svc_create(struct svc_program *prog, struct svc_stat *stats,
+	     unsigned int bufsize, int npools, int (*threadfn)(void *data))
 {
 	struct svc_serv	*serv;
 	unsigned int vers;
@@ -466,7 +466,7 @@ __svc_create(struct svc_program *prog, u
 	serv->sv_name      = prog->pg_name;
 	serv->sv_program   = prog;
 	kref_init(&serv->sv_refcnt);
-	serv->sv_stats     = prog->pg_stats;
+	serv->sv_stats     = stats;
 	if (bufsize > RPCSVC_MAXPAYLOAD)
 		bufsize = RPCSVC_MAXPAYLOAD;
 	serv->sv_max_payload = bufsize? bufsize : 4096;
@@ -528,26 +528,28 @@ __svc_create(struct svc_program *prog, u
 struct svc_serv *svc_create(struct svc_program *prog, unsigned int bufsize,
 			    int (*threadfn)(void *data))
 {
-	return __svc_create(prog, bufsize, 1, threadfn);
+	return __svc_create(prog, NULL, bufsize, 1, threadfn);
 }
 EXPORT_SYMBOL_GPL(svc_create);

 /**
  * svc_create_pooled - Create an RPC service with pooled threads
  * @prog: the RPC program the new service will handle
+ * @stats: the stats struct if desired
  * @bufsize: maximum message size for @prog
  * @threadfn: a function to service RPC requests for @prog
  *
  * Returns an instantiated struct svc_serv object or NULL.
  */
 struct svc_serv *svc_create_pooled(struct svc_program *prog,
+				   struct svc_stat *stats,
 				   unsigned int bufsize,
 				   int (*threadfn)(void *data))
 {
 	struct svc_serv *serv;
 	unsigned int npools = svc_pool_map_get();

-	serv = __svc_create(prog, bufsize, npools, threadfn);
+	serv = __svc_create(prog, stats, bufsize, npools, threadfn);
 	if (!serv)
 		goto out_err;
 	return serv;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josef Bacik josef@toxicpanda.com
[ Upstream commit 3f6ef182f144dcc9a4d942f97b6a8ed969f13c95 ]
Now that this isn't used anywhere, remove it.
Signed-off-by: Josef Bacik josef@toxicpanda.com
Reviewed-by: Jeff Layton jlayton@kernel.org
[ cel: adjusted to apply to v6.1.y ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/nfsd/nfssvc.c           | 1 -
 include/linux/sunrpc/svc.h | 1 -
 2 files changed, 2 deletions(-)

--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -136,7 +136,6 @@ struct svc_program		nfsd_program = {
 	.pg_vers		= nfsd_version,		/* version table */
 	.pg_name		= "nfsd",		/* program name */
 	.pg_class		= "nfsd",		/* authentication class */
-	.pg_stats		= &nfsd_svcstats,	/* version table */
 	.pg_authenticate	= &svc_set_client,	/* export authentication */
 	.pg_init_request	= nfsd_init_request,
 	.pg_rpcbind_set		= nfsd_rpcbind_set,
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -422,7 +422,6 @@ struct svc_program {
 	const struct svc_version **pg_vers;	/* version array */
 	char *			pg_name;	/* service name */
 	char *			pg_class;	/* class name: services sharing authentication */
-	struct svc_stat *	pg_stats;	/* rpc statistics */
 	int			(*pg_authenticate)(struct svc_rqst *);
 	__be32			(*pg_init_request)(struct svc_rqst *,
 						   const struct svc_program *,
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josef Bacik josef@toxicpanda.com
[ Upstream commit 418b9687dece5bd763c09b5c27a801a7e3387be9 ]
nfsd is the only thing using this helper, and it doesn't currently use the proc private data. When we switch to per-network namespace stats we will need the struct net * in order to get to the nfsd_net. Use the net as the proc private so we can utilize this when we make the switch over.
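[ Editor's note: the payoff comes in a later patch in this series. With the struct net stashed as the proc private, a seq_file show routine can recover the namespace, and from it the nfsd_net, without touching any globals. A sketch of such a reader (example_stats_show is a hypothetical name): ]

	static int example_stats_show(struct seq_file *seq, void *v)
	{
		/* pde_data() returns whatever was registered as the proc private */
		struct net *net = pde_data(file_inode(seq->file));
		struct nfsd_net *nn = net_generic(net, nfsd_net_id);

		seq_printf(seq, "drc mem usage: %lld\n",
			   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_DRC_MEM_USAGE]));
		return 0;
	}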
Signed-off-by: Josef Bacik josef@toxicpanda.com
Reviewed-by: Jeff Layton jlayton@kernel.org
Signed-off-by: Chuck Lever chuck.lever@oracle.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/sunrpc/stats.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/sunrpc/stats.c
+++ b/net/sunrpc/stats.c
@@ -309,7 +309,7 @@ EXPORT_SYMBOL_GPL(rpc_proc_unregister);
 struct proc_dir_entry *
 svc_proc_register(struct net *net, struct svc_stat *statp,
 		  const struct proc_ops *proc_ops)
 {
-	return do_register(net, statp->program->pg_name, statp, proc_ops);
+	return do_register(net, statp->program->pg_name, net, proc_ops);
 }
 EXPORT_SYMBOL_GPL(svc_proc_register);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josef Bacik josef@toxicpanda.com
[ Upstream commit d98416cc2154053950610bb6880911e3dcbdf8c5 ]
We're going to merge all of the stats into the per-network namespace struct in subsequent patches; rename these nn counters to be consistent with the rest of the stats.
Signed-off-by: Josef Bacik josef@toxicpanda.com
Reviewed-by: Jeff Layton jlayton@kernel.org
Signed-off-by: Chuck Lever chuck.lever@oracle.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/nfsd/netns.h    | 4 ++--
 fs/nfsd/nfscache.c | 4 ++--
 fs/nfsd/stats.h    | 6 +++---
 3 files changed, 7 insertions(+), 7 deletions(-)

--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -25,9 +25,9 @@ struct nfsd4_client_tracking_ops;

 enum {
 	/* cache misses due only to checksum comparison failures */
-	NFSD_NET_PAYLOAD_MISSES,
+	NFSD_STATS_PAYLOAD_MISSES,
 	/* amount of memory (in bytes) currently consumed by the DRC */
-	NFSD_NET_DRC_MEM_USAGE,
+	NFSD_STATS_DRC_MEM_USAGE,
 	NFSD_NET_COUNTERS_NUM
 };

--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -696,7 +696,7 @@ int nfsd_reply_cache_stats_show(struct s
 		   atomic_read(&nn->num_drc_entries));
 	seq_printf(m, "hash buckets: %u\n", 1 << nn->maskbits);
 	seq_printf(m, "mem usage: %lld\n",
-		   percpu_counter_sum_positive(&nn->counter[NFSD_NET_DRC_MEM_USAGE]));
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_DRC_MEM_USAGE]));
 	seq_printf(m, "cache hits: %lld\n",
 		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_HITS]));
 	seq_printf(m, "cache misses: %lld\n",
@@ -704,7 +704,7 @@ int nfsd_reply_cache_stats_show(struct s
 	seq_printf(m, "not cached: %lld\n",
 		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE]));
 	seq_printf(m, "payload misses: %lld\n",
-		   percpu_counter_sum_positive(&nn->counter[NFSD_NET_PAYLOAD_MISSES]));
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_PAYLOAD_MISSES]));
 	seq_printf(m, "longest chain len: %u\n", nn->longest_chain);
 	seq_printf(m, "cachesize at longest: %u\n", nn->longest_chain_cachesize);
 	return 0;
--- a/fs/nfsd/stats.h
+++ b/fs/nfsd/stats.h
@@ -80,17 +80,17 @@ static inline void nfsd_stats_io_write_a

 static inline void nfsd_stats_payload_misses_inc(struct nfsd_net *nn)
 {
-	percpu_counter_inc(&nn->counter[NFSD_NET_PAYLOAD_MISSES]);
+	percpu_counter_inc(&nn->counter[NFSD_STATS_PAYLOAD_MISSES]);
 }

 static inline void nfsd_stats_drc_mem_usage_add(struct nfsd_net *nn, s64 amount)
 {
-	percpu_counter_add(&nn->counter[NFSD_NET_DRC_MEM_USAGE], amount);
+	percpu_counter_add(&nn->counter[NFSD_STATS_DRC_MEM_USAGE], amount);
 }

 static inline void nfsd_stats_drc_mem_usage_sub(struct nfsd_net *nn, s64 amount)
 {
-	percpu_counter_sub(&nn->counter[NFSD_NET_DRC_MEM_USAGE], amount);
+	percpu_counter_sub(&nn->counter[NFSD_STATS_DRC_MEM_USAGE], amount);
 }

#endif /* _NFSD_STATS_H */
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josef Bacik josef@toxicpanda.com
[ Upstream commit 93483ac5fec62cc1de166051b219d953bb5e4ef4 ]
We are running nfsd servers inside of containers with their own network namespace, and we want to monitor these services using the stats found in /proc. However, these stats are not exposed in /proc inside of the container, so we have to bind-mount the host's /proc into our containers to get at this information.
Separate out the stat counters init and the proc registration, and move the proc registration into the pernet operations entry and exit points so that these stats can be exposed inside of network namespaces.
This is an intermediate step, this just exposes the global counters in the network namespace. Subsequent patches will move these counters into the per-network namespace container.
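[ Editor's note: for reference, the pernet wiring has this general shape -- a sketch with hypothetical example_* names; the real hooks are nfsd_init_net()/nfsd_exit_net() in fs/nfsd/nfsctl.c, which belong to nfsd's existing pernet operations: ]

	static __net_init int example_init_net(struct net *net)
	{
		/* register /proc/net/rpc/nfsd inside this network namespace */
		nfsd_proc_stat_init(net);
		return 0;
	}

	static __net_exit void example_exit_net(struct net *net)
	{
		/* unregistered automatically when the namespace goes away */
		nfsd_proc_stat_shutdown(net);
	}

	static struct pernet_operations example_net_ops = {
		.init = example_init_net,
		.exit = example_exit_net,
	};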
Signed-off-by: Josef Bacik josef@toxicpanda.com
Reviewed-by: Jeff Layton jlayton@kernel.org
Signed-off-by: Chuck Lever chuck.lever@oracle.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/nfsd/nfsctl.c |  8 +++++---
 fs/nfsd/stats.c  | 21 ++++++---------------
 fs/nfsd/stats.h  |  6 ++++--
 3 files changed, 15 insertions(+), 20 deletions(-)

--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1458,6 +1458,7 @@ static __net_init int nfsd_init_net(stru
 	nfsd4_init_leases_net(nn);
 	get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key));
 	seqlock_init(&nn->writeverf_lock);
+	nfsd_proc_stat_init(net);

 	return 0;

@@ -1473,6 +1474,7 @@ static __net_exit void nfsd_exit_net(str
 {
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);

+	nfsd_proc_stat_shutdown(net);
 	nfsd_net_reply_cache_destroy(nn);
 	nfsd_idmap_shutdown(net);
 	nfsd_export_shutdown(net);
@@ -1496,7 +1498,7 @@ static int __init init_nfsd(void)
 	retval = nfsd4_init_pnfs();
 	if (retval)
 		goto out_free_slabs;
-	retval = nfsd_stat_init();	/* Statistics */
+	retval = nfsd_stat_counters_init();	/* Statistics */
 	if (retval)
 		goto out_free_pnfs;
 	retval = nfsd_drc_slab_create();
@@ -1532,7 +1534,7 @@ out_free_lockd:
 	nfsd_lockd_shutdown();
 	nfsd_drc_slab_free();
 out_free_stat:
-	nfsd_stat_shutdown();
+	nfsd_stat_counters_destroy();
 out_free_pnfs:
 	nfsd4_exit_pnfs();
 out_free_slabs:
@@ -1549,7 +1551,7 @@ static void __exit exit_nfsd(void)
 	nfsd_drc_slab_free();
 	remove_proc_entry("fs/nfs/exports", NULL);
 	remove_proc_entry("fs/nfs", NULL);
-	nfsd_stat_shutdown();
+	nfsd_stat_counters_destroy();
 	nfsd_lockd_shutdown();
 	nfsd4_free_slabs();
 	nfsd4_exit_pnfs();
--- a/fs/nfsd/stats.c
+++ b/fs/nfsd/stats.c
@@ -106,31 +106,22 @@ void nfsd_percpu_counters_destroy(struct
 	percpu_counter_destroy(&counters[i]);
 }

-static int nfsd_stat_counters_init(void)
+int nfsd_stat_counters_init(void)
 {
 	return nfsd_percpu_counters_init(nfsdstats.counter, NFSD_STATS_COUNTERS_NUM);
 }

-static void nfsd_stat_counters_destroy(void)
+void nfsd_stat_counters_destroy(void)
 {
 	nfsd_percpu_counters_destroy(nfsdstats.counter, NFSD_STATS_COUNTERS_NUM);
 }

-int nfsd_stat_init(void)
+void nfsd_proc_stat_init(struct net *net)
 {
-	int err;
-
-	err = nfsd_stat_counters_init();
-	if (err)
-		return err;
-
-	svc_proc_register(&init_net, &nfsd_svcstats, &nfsd_proc_ops);
-
-	return 0;
+	svc_proc_register(net, &nfsd_svcstats, &nfsd_proc_ops);
 }

-void nfsd_stat_shutdown(void)
+void nfsd_proc_stat_shutdown(struct net *net)
 {
-	nfsd_stat_counters_destroy();
-	svc_proc_unregister(&init_net, "nfsd");
+	svc_proc_unregister(net, "nfsd");
 }
--- a/fs/nfsd/stats.h
+++ b/fs/nfsd/stats.h
@@ -39,8 +39,10 @@ extern struct svc_stat nfsd_svcstats;
 int nfsd_percpu_counters_init(struct percpu_counter *counters, int num);
 void nfsd_percpu_counters_reset(struct percpu_counter *counters, int num);
 void nfsd_percpu_counters_destroy(struct percpu_counter *counters, int num);
-int nfsd_stat_init(void);
-void nfsd_stat_shutdown(void);
+int nfsd_stat_counters_init(void);
+void nfsd_stat_counters_destroy(void);
+void nfsd_proc_stat_init(struct net *net);
+void nfsd_proc_stat_shutdown(struct net *net);

 static inline void nfsd_stats_rc_hits_inc(void)
 {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josef Bacik josef@toxicpanda.com
[ Upstream commit 4b14885411f74b2b0ce0eb2b39d0fffe54e5ca0d ]
We have a global set of counters that we modify for all of the nfsd operations, but now that we're exposing these stats across all network namespaces we need to make the stats also be per-network namespace. We already have some caching stats that are per-network namespace, so move these definitions into the same counter and then adjust all the helpers and users of these stats to provide the appropriate nfsd_net struct so that the stats are maintained for the per-network namespace objects.
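[ Editor's note: the conversion pattern is the same everywhere -- resolve the nfsd_net from the request's network namespace, then bump the counter in that struct instead of the global one. A minimal sketch (the helper name is hypothetical; the real helpers live in fs/nfsd/stats.h): ]

	static void example_count_stale_fh(struct svc_rqst *rqstp)
	{
		struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);

		/* was: percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_FH_STALE]); */
		percpu_counter_inc(&nn->counter[NFSD_STATS_FH_STALE]);
	}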
Signed-off-by: Josef Bacik josef@toxicpanda.com
Reviewed-by: Jeff Layton jlayton@kernel.org
[ cel: adjusted to apply to v6.1.y ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/nfsd/cache.h    |  2 --
 fs/nfsd/netns.h    | 17 +++++++++++++++--
 fs/nfsd/nfs4proc.c |  6 +++---
 fs/nfsd/nfscache.c | 36 +++++++-----------------------------
 fs/nfsd/nfsctl.c   | 12 +++---------
 fs/nfsd/nfsfh.c    |  3 ++-
 fs/nfsd/stats.c    | 24 +++++++++++++-----------
 fs/nfsd/stats.h    | 49 +++++++++++++++++--------------------------------
 fs/nfsd/vfs.c      |  6 ++++--
 9 files changed, 64 insertions(+), 91 deletions(-)

--- a/fs/nfsd/cache.h
+++ b/fs/nfsd/cache.h
@@ -80,8 +80,6 @@ enum {

 int	nfsd_drc_slab_create(void);
 void	nfsd_drc_slab_free(void);
-int	nfsd_net_reply_cache_init(struct nfsd_net *nn);
-void	nfsd_net_reply_cache_destroy(struct nfsd_net *nn);
 int	nfsd_reply_cache_init(struct nfsd_net *);
 void	nfsd_reply_cache_shutdown(struct nfsd_net *);
 int	nfsd_cache_lookup(struct svc_rqst *rqstp, unsigned int start,
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -10,6 +10,7 @@

 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
+#include <linux/nfs4.h>
 #include <linux/percpu_counter.h>
 #include <linux/siphash.h>

@@ -28,7 +29,19 @@ enum {
 	NFSD_STATS_PAYLOAD_MISSES,
 	/* amount of memory (in bytes) currently consumed by the DRC */
 	NFSD_STATS_DRC_MEM_USAGE,
-	NFSD_NET_COUNTERS_NUM
+	NFSD_STATS_RC_HITS,		/* repcache hits */
+	NFSD_STATS_RC_MISSES,		/* repcache misses */
+	NFSD_STATS_RC_NOCACHE,		/* uncached reqs */
+	NFSD_STATS_FH_STALE,		/* FH stale error */
+	NFSD_STATS_IO_READ,		/* bytes returned to read requests */
+	NFSD_STATS_IO_WRITE,		/* bytes passed in write requests */
+#ifdef CONFIG_NFSD_V4
+	NFSD_STATS_FIRST_NFS4_OP,	/* count of individual nfsv4 operations */
+	NFSD_STATS_LAST_NFS4_OP = NFSD_STATS_FIRST_NFS4_OP + LAST_NFS4_OP,
+#define NFSD_STATS_NFS4_OP(op)	(NFSD_STATS_FIRST_NFS4_OP + (op))
+	NFSD_STATS_WDELEG_GETATTR,	/* count of getattr conflict with wdeleg */
+#endif
+	NFSD_STATS_COUNTERS_NUM
 };

 /*
@@ -168,7 +181,7 @@ struct nfsd_net {
 	atomic_t num_drc_entries;

 	/* Per-netns stats counters */
-	struct percpu_counter counter[NFSD_NET_COUNTERS_NUM];
+	struct percpu_counter counter[NFSD_STATS_COUNTERS_NUM];

 	/* longest hash chain seen */
 	unsigned int longest_chain;
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2430,10 +2430,10 @@ nfsd4_proc_null(struct svc_rqst *rqstp)
 	return rpc_success;
 }

-static inline void nfsd4_increment_op_stats(u32 opnum)
+static inline void nfsd4_increment_op_stats(struct nfsd_net *nn, u32 opnum)
 {
 	if (opnum >= FIRST_NFS4_OP && opnum <= LAST_NFS4_OP)
-		percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_NFS4_OP(opnum)]);
+		percpu_counter_inc(&nn->counter[NFSD_STATS_NFS4_OP(opnum)]);
 }

 static const struct nfsd4_operation nfsd4_ops[];
@@ -2708,7 +2708,7 @@ encode_op:
 			   status, nfsd4_op_name(op->opnum));

 		nfsd4_cstate_clear_replay(cstate);
-		nfsd4_increment_op_stats(op->opnum);
+		nfsd4_increment_op_stats(nn, op->opnum);
 	}

 	fh_put(current_fh);
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -176,27 +176,6 @@ void nfsd_drc_slab_free(void)
 	kmem_cache_destroy(drc_slab);
 }

-/**
- * nfsd_net_reply_cache_init - per net namespace reply cache set-up
- * @nn: nfsd_net being initialized
- *
- * Returns zero on succes; otherwise a negative errno is returned.
- */
-int nfsd_net_reply_cache_init(struct nfsd_net *nn)
-{
-	return nfsd_percpu_counters_init(nn->counter, NFSD_NET_COUNTERS_NUM);
-}
-
-/**
- * nfsd_net_reply_cache_destroy - per net namespace reply cache tear-down
- * @nn: nfsd_net being freed
- *
- */
-void nfsd_net_reply_cache_destroy(struct nfsd_net *nn)
-{
-	nfsd_percpu_counters_destroy(nn->counter, NFSD_NET_COUNTERS_NUM);
-}
-
 int nfsd_reply_cache_init(struct nfsd_net *nn)
 {
 	unsigned int hashsize;
@@ -501,7 +480,7 @@ out:
 int nfsd_cache_lookup(struct svc_rqst *rqstp, unsigned int start,
 		      unsigned int len)
 {
-	struct nfsd_net *nn;
+	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
 	struct svc_cacherep *rp, *found;
 	__wsum csum;
 	struct nfsd_drc_bucket *b;
@@ -512,7 +491,7 @@ int nfsd_cache_lookup(struct svc_rqst *r

 	rqstp->rq_cacherep = NULL;
 	if (type == RC_NOCACHE) {
-		nfsd_stats_rc_nocache_inc();
+		nfsd_stats_rc_nocache_inc(nn);
 		goto out;
 	}

@@ -522,7 +501,6 @@ int nfsd_cache_lookup(struct svc_rqst *r
 	 * Since the common case is a cache miss followed by an insert,
 	 * preallocate an entry.
 	 */
-	nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
 	rp = nfsd_cacherep_alloc(rqstp, csum, nn);
 	if (!rp)
 		goto out;
@@ -540,7 +518,7 @@ int nfsd_cache_lookup(struct svc_rqst *r
 	freed = nfsd_cacherep_dispose(&dispose);
 	trace_nfsd_drc_gc(nn, freed);

-	nfsd_stats_rc_misses_inc();
+	nfsd_stats_rc_misses_inc(nn);
 	atomic_inc(&nn->num_drc_entries);
 	nfsd_stats_drc_mem_usage_add(nn, sizeof(*rp));
 	goto out;
@@ -548,7 +526,7 @@ int nfsd_cache_lookup(struct svc_rqst *r
 found_entry:
 	/* We found a matching entry which is either in progress or done. */
 	nfsd_reply_cache_free_locked(NULL, rp, nn);
-	nfsd_stats_rc_hits_inc();
+	nfsd_stats_rc_hits_inc(nn);
 	rtn = RC_DROPIT;
 	rp = found;

@@ -698,11 +676,11 @@ int nfsd_reply_cache_stats_show(struct s
 	seq_printf(m, "mem usage: %lld\n",
 		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_DRC_MEM_USAGE]));
 	seq_printf(m, "cache hits: %lld\n",
-		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_HITS]));
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_RC_HITS]));
 	seq_printf(m, "cache misses: %lld\n",
-		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_MISSES]));
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_RC_MISSES]));
 	seq_printf(m, "not cached: %lld\n",
-		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE]));
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_RC_NOCACHE]));
 	seq_printf(m, "payload misses: %lld\n",
 		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_PAYLOAD_MISSES]));
 	seq_printf(m, "longest chain len: %u\n", nn->longest_chain);
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1450,7 +1450,7 @@ static __net_init int nfsd_init_net(stru
 	retval = nfsd_idmap_init(net);
 	if (retval)
 		goto out_idmap_error;
-	retval = nfsd_net_reply_cache_init(nn);
+	retval = nfsd_stat_counters_init(nn);
 	if (retval)
 		goto out_repcache_error;
 	nn->nfsd_versions = NULL;
@@ -1475,7 +1475,7 @@ static __net_exit void nfsd_exit_net(str
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);

 	nfsd_proc_stat_shutdown(net);
-	nfsd_net_reply_cache_destroy(nn);
+	nfsd_stat_counters_destroy(nn);
 	nfsd_idmap_shutdown(net);
 	nfsd_export_shutdown(net);
 	nfsd_netns_free_versions(nn);
@@ -1498,12 +1498,9 @@ static int __init init_nfsd(void)
 	retval = nfsd4_init_pnfs();
 	if (retval)
 		goto out_free_slabs;
-	retval = nfsd_stat_counters_init();	/* Statistics */
-	if (retval)
-		goto out_free_pnfs;
 	retval = nfsd_drc_slab_create();
 	if (retval)
-		goto out_free_stat;
+		goto out_free_pnfs;
 	nfsd_lockd_init();	/* lockd->nfsd callbacks */
 	retval = create_proc_exports_entry();
 	if (retval)
@@ -1533,8 +1530,6 @@ out_free_exports:
 out_free_lockd:
 	nfsd_lockd_shutdown();
 	nfsd_drc_slab_free();
-out_free_stat:
-	nfsd_stat_counters_destroy();
 out_free_pnfs:
 	nfsd4_exit_pnfs();
 out_free_slabs:
@@ -1551,7 +1546,6 @@ static void __exit exit_nfsd(void)
 	nfsd_drc_slab_free();
 	remove_proc_entry("fs/nfs/exports", NULL);
 	remove_proc_entry("fs/nfs", NULL);
-	nfsd_stat_counters_destroy();
 	nfsd_lockd_shutdown();
 	nfsd4_free_slabs();
 	nfsd4_exit_pnfs();
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -327,6 +327,7 @@ out:
 __be32
 fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
 {
+	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
 	struct svc_export *exp = NULL;
 	struct dentry	*dentry;
 	__be32		error;
@@ -395,7 +396,7 @@ skip_pseudoflavor_check:
 out:
 	trace_nfsd_fh_verify_err(rqstp, fhp, type, access, error);
 	if (error == nfserr_stale)
-		nfsd_stats_fh_stale_inc(exp);
+		nfsd_stats_fh_stale_inc(nn, exp);
 	return error;
 }

--- a/fs/nfsd/stats.c
+++ b/fs/nfsd/stats.c
@@ -34,15 +34,17 @@ struct svc_stat nfsd_svcstats = {

 static int nfsd_show(struct seq_file *seq, void *v)
 {
+	struct net *net = pde_data(file_inode(seq->file));
+	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 	int i;

 	seq_printf(seq, "rc %lld %lld %lld\nfh %lld 0 0 0 0\nio %lld %lld\n",
-		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_HITS]),
-		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_MISSES]),
-		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE]),
-		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_FH_STALE]),
-		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_IO_READ]),
-		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_IO_WRITE]));
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_RC_HITS]),
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_RC_MISSES]),
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_RC_NOCACHE]),
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_FH_STALE]),
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_IO_READ]),
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_IO_WRITE]));

 	/* thread usage: */
 	seq_printf(seq, "th %u 0", atomic_read(&nfsdstats.th_cnt));
@@ -63,7 +65,7 @@ static int nfsd_show(struct seq_file *se
 	seq_printf(seq,"proc4ops %u", LAST_NFS4_OP + 1);
 	for (i = 0; i <= LAST_NFS4_OP; i++) {
 		seq_printf(seq, " %lld",
-			   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_NFS4_OP(i)]));
+			   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_NFS4_OP(i)]));
 	}

 	seq_putc(seq, '\n');
@@ -106,14 +108,14 @@ void nfsd_percpu_counters_destroy(struct
 	percpu_counter_destroy(&counters[i]);
 }

-int nfsd_stat_counters_init(void)
+int nfsd_stat_counters_init(struct nfsd_net *nn)
 {
-	return nfsd_percpu_counters_init(nfsdstats.counter, NFSD_STATS_COUNTERS_NUM);
+	return nfsd_percpu_counters_init(nn->counter, NFSD_STATS_COUNTERS_NUM);
 }

-void nfsd_stat_counters_destroy(void)
+void nfsd_stat_counters_destroy(struct nfsd_net *nn)
 {
-	nfsd_percpu_counters_destroy(nfsdstats.counter, NFSD_STATS_COUNTERS_NUM);
+	nfsd_percpu_counters_destroy(nn->counter, NFSD_STATS_COUNTERS_NUM);
 }

 void nfsd_proc_stat_init(struct net *net)
--- a/fs/nfsd/stats.h
+++ b/fs/nfsd/stats.h
@@ -10,25 +10,7 @@
 #include <uapi/linux/nfsd/stats.h>
 #include <linux/percpu_counter.h>

-
-enum {
-	NFSD_STATS_RC_HITS,		/* repcache hits */
-	NFSD_STATS_RC_MISSES,		/* repcache misses */
-	NFSD_STATS_RC_NOCACHE,		/* uncached reqs */
-	NFSD_STATS_FH_STALE,		/* FH stale error */
-	NFSD_STATS_IO_READ,		/* bytes returned to read requests */
-	NFSD_STATS_IO_WRITE,		/* bytes passed in write requests */
-#ifdef CONFIG_NFSD_V4
-	NFSD_STATS_FIRST_NFS4_OP,	/* count of individual nfsv4 operations */
-	NFSD_STATS_LAST_NFS4_OP = NFSD_STATS_FIRST_NFS4_OP + LAST_NFS4_OP,
-#define NFSD_STATS_NFS4_OP(op)	(NFSD_STATS_FIRST_NFS4_OP + (op))
-#endif
-	NFSD_STATS_COUNTERS_NUM
-};
-
 struct nfsd_stats {
-	struct percpu_counter counter[NFSD_STATS_COUNTERS_NUM];
-
 	atomic_t	th_cnt;		/* number of available threads */
 };

@@ -39,43 +21,46 @@ extern struct svc_stat nfsd_svcstats;
 int nfsd_percpu_counters_init(struct percpu_counter *counters, int num);
 void nfsd_percpu_counters_reset(struct percpu_counter *counters, int num);
 void nfsd_percpu_counters_destroy(struct percpu_counter *counters, int num);
-int nfsd_stat_counters_init(void);
-void nfsd_stat_counters_destroy(void);
+int nfsd_stat_counters_init(struct nfsd_net *nn);
+void nfsd_stat_counters_destroy(struct nfsd_net *nn);
 void nfsd_proc_stat_init(struct net *net);
 void nfsd_proc_stat_shutdown(struct net *net);

-static inline void nfsd_stats_rc_hits_inc(void)
+static inline void nfsd_stats_rc_hits_inc(struct nfsd_net *nn)
 {
-	percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_RC_HITS]);
+	percpu_counter_inc(&nn->counter[NFSD_STATS_RC_HITS]);
 }

-static inline void nfsd_stats_rc_misses_inc(void)
+static inline void nfsd_stats_rc_misses_inc(struct nfsd_net *nn)
 {
-	percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_RC_MISSES]);
+	percpu_counter_inc(&nn->counter[NFSD_STATS_RC_MISSES]);
 }

-static inline void nfsd_stats_rc_nocache_inc(void)
+static inline void nfsd_stats_rc_nocache_inc(struct nfsd_net *nn)
 {
-	percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE]);
+	percpu_counter_inc(&nn->counter[NFSD_STATS_RC_NOCACHE]);
 }

-static inline void nfsd_stats_fh_stale_inc(struct svc_export *exp)
+static inline void nfsd_stats_fh_stale_inc(struct nfsd_net *nn,
+					   struct svc_export *exp)
 {
-	percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_FH_STALE]);
+	percpu_counter_inc(&nn->counter[NFSD_STATS_FH_STALE]);
 	if (exp && exp->ex_stats)
 		percpu_counter_inc(&exp->ex_stats->counter[EXP_STATS_FH_STALE]);
 }

-static inline void nfsd_stats_io_read_add(struct svc_export *exp, s64 amount)
+static inline void nfsd_stats_io_read_add(struct nfsd_net *nn,
+					  struct svc_export *exp, s64 amount)
 {
-	percpu_counter_add(&nfsdstats.counter[NFSD_STATS_IO_READ], amount);
+	percpu_counter_add(&nn->counter[NFSD_STATS_IO_READ], amount);
 	if (exp && exp->ex_stats)
 		percpu_counter_add(&exp->ex_stats->counter[EXP_STATS_IO_READ], amount);
 }

-static inline void nfsd_stats_io_write_add(struct svc_export *exp, s64 amount)
+static inline void nfsd_stats_io_write_add(struct nfsd_net *nn,
+					   struct svc_export *exp, s64 amount)
 {
-	percpu_counter_add(&nfsdstats.counter[NFSD_STATS_IO_WRITE], amount);
+	percpu_counter_add(&nn->counter[NFSD_STATS_IO_WRITE], amount);
 	if (exp && exp->ex_stats)
 		percpu_counter_add(&exp->ex_stats->counter[EXP_STATS_IO_WRITE], amount);
 }
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -983,7 +983,9 @@ static __be32 nfsd_finish_read(struct sv
 			       unsigned long *count, u32 *eof, ssize_t host_err)
 {
 	if (host_err >= 0) {
-		nfsd_stats_io_read_add(fhp->fh_export, host_err);
+		struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
+
+		nfsd_stats_io_read_add(nn, fhp->fh_export, host_err);
 		*eof = nfsd_eof_on_read(file, offset, host_err, *count);
 		*count = host_err;
 		fsnotify_access(file);
@@ -1126,7 +1128,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, s
 		goto out_nfserr;
 	}
 	*cnt = host_err;
-	nfsd_stats_io_write_add(exp, *cnt);
+	nfsd_stats_io_write_add(nn, exp, *cnt);
 	fsnotify_modify(file);
 	host_err = filemap_check_wb_err(file->f_mapping, since);
 	if (host_err < 0)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josef Bacik josef@toxicpanda.com
[ Upstream commit e41ee44cc6a473b1f414031782c3b4283d7f3e5f ]
This is the last global stat; take it out of the nfsd_stats struct, make it a global part of nfsd, and report it the same as always.
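[ Editor's note: the resulting pattern is just a bare module-global atomic bracketing each thread's lifetime; sketched here with a hypothetical thread function -- the real one is nfsd() in fs/nfsd/nfssvc.c: ]

	atomic_t nfsd_th_cnt = ATOMIC_INIT(0);

	static int example_nfsd_thread(void *vrqstp)
	{
		atomic_inc(&nfsd_th_cnt);	/* thread is now available */
		/* ... main service loop ... */
		atomic_dec(&nfsd_th_cnt);	/* thread is exiting */
		return 0;
	}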
Signed-off-by: Josef Bacik josef@toxicpanda.com
Reviewed-by: Jeff Layton jlayton@kernel.org
Signed-off-by: Chuck Lever chuck.lever@oracle.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/nfsd/nfsd.h   | 1 +
 fs/nfsd/nfssvc.c | 5 +++--
 fs/nfsd/stats.c  | 3 +--
 fs/nfsd/stats.h  | 6 ------
 4 files changed, 5 insertions(+), 10 deletions(-)

--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -69,6 +69,7 @@ extern struct mutex	nfsd_mutex;
 extern spinlock_t	nfsd_drc_lock;
 extern unsigned long	nfsd_drc_max_mem;
 extern unsigned long	nfsd_drc_mem_used;
+extern atomic_t		nfsd_th_cnt;		/* number of available threads */

 extern const struct seq_operations nfs_exports_op;

--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -34,6 +34,7 @@

 #define NFSDDBG_FACILITY	NFSDDBG_SVC

+atomic_t			nfsd_th_cnt = ATOMIC_INIT(0);
 extern struct svc_program	nfsd_program;
 static int			nfsd(void *vrqstp);
 #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
@@ -955,7 +956,7 @@ nfsd(void *vrqstp)

 	current->fs->umask = 0;

-	atomic_inc(&nfsdstats.th_cnt);
+	atomic_inc(&nfsd_th_cnt);

 	set_freezable();

@@ -979,7 +980,7 @@ nfsd(void *vrqstp)
 		validate_process_creds();
 	}

-	atomic_dec(&nfsdstats.th_cnt);
+	atomic_dec(&nfsd_th_cnt);

 out:
 	/* Take an extra ref so that the svc_put in svc_exit_thread()
--- a/fs/nfsd/stats.c
+++ b/fs/nfsd/stats.c
@@ -27,7 +27,6 @@

 #include "nfsd.h"

-struct nfsd_stats	nfsdstats;
 struct svc_stat		nfsd_svcstats = {
 	.program	= &nfsd_program,
 };
@@ -47,7 +46,7 @@ static int nfsd_show(struct seq_file *se
 		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_IO_WRITE]));

 	/* thread usage: */
-	seq_printf(seq, "th %u 0", atomic_read(&nfsdstats.th_cnt));
+	seq_printf(seq, "th %u 0", atomic_read(&nfsd_th_cnt));

 	/* deprecated thread usage histogram stats */
 	for (i = 0; i < 10; i++)
--- a/fs/nfsd/stats.h
+++ b/fs/nfsd/stats.h
@@ -10,12 +10,6 @@
 #include <uapi/linux/nfsd/stats.h>
 #include <linux/percpu_counter.h>

-struct nfsd_stats {
-	atomic_t	th_cnt;		/* number of available threads */
-};
-
-extern struct nfsd_stats	nfsdstats;
-
 extern struct svc_stat		nfsd_svcstats;

 int nfsd_percpu_counters_init(struct percpu_counter *counters, int num);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josef Bacik josef@toxicpanda.com
[ Upstream commit 16fb9808ab2c99979f081987752abcbc5b092eac ]
The final bit of stats that is global is the rpc svc_stat. Move this into the nfsd_net struct and use that everywhere instead of the global struct. Remove the unused global struct.
Signed-off-by: Josef Bacik josef@toxicpanda.com
Reviewed-by: Jeff Layton jlayton@kernel.org
Signed-off-by: Chuck Lever chuck.lever@oracle.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/nfsd/netns.h  |  4 ++++
 fs/nfsd/nfsctl.c |  2 ++
 fs/nfsd/nfssvc.c |  2 +-
 fs/nfsd/stats.c  | 10 ++++------
 fs/nfsd/stats.h  |  2 --
 5 files changed, 11 insertions(+), 9 deletions(-)

--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -13,6 +13,7 @@
 #include <linux/nfs4.h>
 #include <linux/percpu_counter.h>
 #include <linux/siphash.h>
+#include <linux/sunrpc/stats.h>

 /* Hash tables for nfs4_clientid state */
 #define CLIENT_HASH_BITS		4
@@ -183,6 +184,9 @@ struct nfsd_net {
 	/* Per-netns stats counters */
 	struct percpu_counter counter[NFSD_STATS_COUNTERS_NUM];

+	/* sunrpc svc stats */
+	struct svc_stat		nfsd_svcstats;
+
 	/* longest hash chain seen */
 	unsigned int longest_chain;

--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1453,6 +1453,8 @@ static __net_init int nfsd_init_net(stru
 	retval = nfsd_stat_counters_init(nn);
 	if (retval)
 		goto out_repcache_error;
+	memset(&nn->nfsd_svcstats, 0, sizeof(nn->nfsd_svcstats));
+	nn->nfsd_svcstats.program = &nfsd_program;
 	nn->nfsd_versions = NULL;
 	nn->nfsd4_minorversions = NULL;
 	nfsd4_init_leases_net(nn);
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -657,7 +657,7 @@ int nfsd_create_serv(struct net *net)
 	if (nfsd_max_blksize == 0)
 		nfsd_max_blksize = nfsd_get_default_max_blksize();
 	nfsd_reset_versions(nn);
-	serv = svc_create_pooled(&nfsd_program, &nfsd_svcstats,
+	serv = svc_create_pooled(&nfsd_program, &nn->nfsd_svcstats,
 				 nfsd_max_blksize, nfsd);
 	if (serv == NULL)
 		return -ENOMEM;
--- a/fs/nfsd/stats.c
+++ b/fs/nfsd/stats.c
@@ -27,10 +27,6 @@

 #include "nfsd.h"

-struct svc_stat		nfsd_svcstats = {
-	.program	= &nfsd_program,
-};
-
 static int nfsd_show(struct seq_file *seq, void *v)
 {
 	struct net *net = pde_data(file_inode(seq->file));
@@ -56,7 +52,7 @@ static int nfsd_show(struct seq_file *se
 	seq_puts(seq, "\nra 0 0 0 0 0 0 0 0 0 0 0 0\n");

 	/* show my rpc info */
-	svc_seq_show(seq, &nfsd_svcstats);
+	svc_seq_show(seq, &nn->nfsd_svcstats);

 #ifdef CONFIG_NFSD_V4
 	/* Show count for individual nfsv4 operations */
@@ -119,7 +115,9 @@ void nfsd_stat_counters_destroy(struct n

 void nfsd_proc_stat_init(struct net *net)
 {
-	svc_proc_register(net, &nfsd_svcstats, &nfsd_proc_ops);
+	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+
+	svc_proc_register(net, &nn->nfsd_svcstats, &nfsd_proc_ops);
 }

 void nfsd_proc_stat_shutdown(struct net *net)
--- a/fs/nfsd/stats.h
+++ b/fs/nfsd/stats.h
@@ -10,8 +10,6 @@
 #include <uapi/linux/nfsd/stats.h>
 #include <linux/percpu_counter.h>

-extern struct svc_stat		nfsd_svcstats;
-
 int nfsd_percpu_counters_init(struct percpu_counter *counters, int num);
 void nfsd_percpu_counters_reset(struct percpu_counter *counters, int num);
 void nfsd_percpu_counters_destroy(struct percpu_counter *counters, int num);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: WangYuli wangyuli@uniontech.com
commit ab091ec536cb7b271983c0c063b17f62f3591583 upstream.
There is a hardware power-saving problem with the Lenovo N60z board. When the machine is powered on and left idle for 10 hours, there is a 20% chance that the NVMe disk will not wake up until reboot.
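[ Editor's note: the mechanism, in outline -- match the DMI board name at probe time and return a quirk flag that disables APST (autonomous power state transitions) on that platform. A sketch of the shape; the actual hunk follows: ]

	/* Board-specific quirk keyed off DMI data. */
	if (dmi_match(DMI_BOARD_NAME, "LXKT-ZXEG-N6"))
		return NVME_QUIRK_NO_APST;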
Link: https://lore.kernel.org/all/2B5581C46AC6E335+9c7a81f1-05fb-4fd0-9fbb-108757c...
Signed-off-by: hmy huanglin@uniontech.com
Signed-off-by: Wentao Guan guanwentao@uniontech.com
Signed-off-by: WangYuli wangyuli@uniontech.com
Signed-off-by: Keith Busch kbusch@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/nvme/host/pci.c | 7 +++++++
 1 file changed, 7 insertions(+)

--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3109,6 +3109,13 @@ static unsigned long check_vendor_combin
 		return NVME_QUIRK_FORCE_NO_SIMPLE_SUSPEND;
 	}

+	/*
+	 * NVMe SSD drops off the PCIe bus after system idle
+	 * for 10 hours on a Lenovo N60z board.
+	 */
+	if (dmi_match(DMI_BOARD_NAME, "LXKT-ZXEG-N6"))
+		return NVME_QUIRK_NO_APST;
+
 	return 0;
 }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Matthieu Baerts (NGI0)" matttbe@kernel.org
commit d67c5649c1541dc93f202eeffc6f49220a4ed71d upstream.
Before this patch, receiving an ADD_ADDR echo on the just connected MP_JOIN subflow -- initiator side, after the MP_JOIN 3WHS -- was resulting in an MP_RESET. That's because only ACKs with a DSS or ADD_ADDRs without the echo bit were allowed.
Not allowing the ADD_ADDR echo after an MP_CAPABLE 3WHS makes sense, as we are not supposed to send an ADD_ADDR before that point: doing so requires being in fully established mode first. For the MP_JOIN 3WHS, that's different: the ADD_ADDR can be sent on a previous subflow, and the ADD_ADDR echo can be received on the recently created one. The other peer will already be in fully established mode, so it is allowed to send that.
We can then relax the conditions here to accept the ADD_ADDR echo for MPJ subflows.
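[ Editor's note: reduced to its boolean core, the relaxed check accepts an ADD_ADDR either when it is not an echo (as before) or when it arrives on an MP_JOIN subflow. A self-contained sketch of that predicate, with hypothetical parameter names: ]

	#include <stdbool.h>

	static bool example_add_addr_counts(bool has_add_addr, bool echo, bool mp_join)
	{
		/* previously: has_add_addr && !echo */
		return has_add_addr && (!echo || mp_join);
	}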
Fixes: 67b12f792d5e ("mptcp: full fully established support after ADD_ADDR")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau martineau@kernel.org
Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
Link: https://patch.msgid.link/20240731-upstream-net-20240731-mptcp-endp-subflow-s...
Signed-off-by: Jakub Kicinski kuba@kernel.org
[ Conflicts in options.c, because the context has changed in commit
  b3ea6b272d79 ("mptcp: consolidate initial ack seq generation"), which is
  not in this version. This commit is unrelated to this modification. ]
Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/mptcp/options.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -950,7 +950,8 @@ static bool check_fully_established(stru
 	}

 	if (((mp_opt->suboptions & OPTION_MPTCP_DSS) && mp_opt->use_ack) ||
-	    ((mp_opt->suboptions & OPTION_MPTCP_ADD_ADDR) && !mp_opt->echo)) {
+	    ((mp_opt->suboptions & OPTION_MPTCP_ADD_ADDR) &&
+	     (!mp_opt->echo || subflow->mp_join))) {
 		/* subflows are fully established as soon as we get any
 		 * additional ack, including ADD_ADDR.
 		 */
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andi Shyti andi.shyti@linux.intel.com
commit 8bdd9ef7e9b1b2a73e394712b72b22055e0e26c3 upstream.
Calculating the size of the mapped area as the lesser value between the requested size and the actual size does not take the partial mapping offset into account. This can cause page-fault access beyond the intended mapping.
Fix the calculation of the starting and ending addresses, the total size is now deduced from the difference between the end and start addresses.
Additionally, the calculations have been rewritten in a clearer and more understandable form.
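[ Editor's note: a worked example with invented numbers, all in page units, may help make the arithmetic concrete: ]

	/* Hypothetical values, in pages:
	 *   vm_start = 100, vm_end = 110     -- user mmap()ed 10 pages
	 *   obj_offset = 4                   -- mmap offset into the object
	 *   partial.offset = 0, vma_size = 8 -- the partial GGTT view
	 *
	 *   start = vm_start - obj_offset + partial.offset = 96
	 *   end   = start + vma_size                       = 104
	 *   clamped to [vm_start, vm_end): start = 100, end = 104
	 *
	 * Only user pages [100, 104) get remapped -- exactly the part of the
	 * user's VMA this partial view can back, instead of blindly mapping
	 * min(vma->size, vm_end - vm_start) bytes starting at vm_start.
	 */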
Fixes: c58305af1835 ("drm/i915: Use remap_io_mapping() to prefault all PTE in a single pass")
Reported-by: Jann Horn jannh@google.com
Co-developed-by: Chris Wilson chris.p.wilson@linux.intel.com
Signed-off-by: Chris Wilson chris.p.wilson@linux.intel.com
Signed-off-by: Andi Shyti andi.shyti@linux.intel.com
Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com
Cc: Matthew Auld matthew.auld@intel.com
Cc: Rodrigo Vivi rodrigo.vivi@intel.com
Cc: stable@vger.kernel.org # v4.9+
Reviewed-by: Jann Horn jannh@google.com
Reviewed-by: Jonathan Cavitt Jonathan.cavitt@intel.com
[Joonas: Add Requires: tag]
Requires: 60a2066c5005 ("drm/i915/gem: Adjust vma offset for framebuffer mmap offset")
Signed-off-by: Joonas Lahtinen joonas.lahtinen@linux.intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20240802083850.103694-3-andi.s...
(cherry picked from commit 97b6784753da06d9d40232328efc5c5367e53417)
Signed-off-by: Joonas Lahtinen joonas.lahtinen@linux.intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c | 53 +++++++++++++++++++++++++++----
 1 file changed, 47 insertions(+), 6 deletions(-)

--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -290,6 +290,41 @@ out:
 	return i915_error_to_vmf_fault(err);
 }

+static void set_address_limits(struct vm_area_struct *area,
+			       struct i915_vma *vma,
+			       unsigned long obj_offset,
+			       unsigned long *start_vaddr,
+			       unsigned long *end_vaddr)
+{
+	unsigned long vm_start, vm_end, vma_size; /* user's memory parameters */
+	long start, end; /* memory boundaries */
+
+	/*
+	 * Let's move into the ">> PAGE_SHIFT"
+	 * domain to be sure not to lose bits
+	 */
+	vm_start = area->vm_start >> PAGE_SHIFT;
+	vm_end = area->vm_end >> PAGE_SHIFT;
+	vma_size = vma->size >> PAGE_SHIFT;
+
+	/*
+	 * Calculate the memory boundaries by considering the offset
+	 * provided by the user during memory mapping and the offset
+	 * provided for the partial mapping.
+	 */
+	start = vm_start;
+	start -= obj_offset;
+	start += vma->gtt_view.partial.offset;
+	end = start + vma_size;
+
+	start = max_t(long, start, vm_start);
+	end = min_t(long, end, vm_end);
+
+	/* Let's move back into the "<< PAGE_SHIFT" domain */
+	*start_vaddr = (unsigned long)start << PAGE_SHIFT;
+	*end_vaddr = (unsigned long)end << PAGE_SHIFT;
+}
+
 static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
 {
 #define MIN_CHUNK_PAGES (SZ_1M >> PAGE_SHIFT)
@@ -302,14 +337,18 @@ static vm_fault_t vm_fault_gtt(struct vm
 	struct i915_ggtt *ggtt = to_gt(i915)->ggtt;
 	bool write = area->vm_flags & VM_WRITE;
 	struct i915_gem_ww_ctx ww;
+	unsigned long obj_offset;
+	unsigned long start, end; /* memory boundaries */
 	intel_wakeref_t wakeref;
 	struct i915_vma *vma;
 	pgoff_t page_offset;
+	unsigned long pfn;
 	int srcu;
 	int ret;

-	/* We don't use vmf->pgoff since that has the fake offset */
+	obj_offset = area->vm_pgoff - drm_vma_node_start(&mmo->vma_node);
 	page_offset = (vmf->address - area->vm_start) >> PAGE_SHIFT;
+	page_offset += obj_offset;

 	trace_i915_gem_object_fault(obj, page_offset, true, write);

@@ -393,12 +432,14 @@ retry:
 	if (ret)
 		goto err_unpin;

+	set_address_limits(area, vma, obj_offset, &start, &end);
+
+	pfn = (ggtt->gmadr.start + i915_ggtt_offset(vma)) >> PAGE_SHIFT;
+	pfn += (start - area->vm_start) >> PAGE_SHIFT;
+	pfn += obj_offset - vma->gtt_view.partial.offset;
+
 	/* Finally, remap it using the new GTT offset */
-	ret = remap_io_mapping(area,
-			       area->vm_start + (vma->gtt_view.partial.offset << PAGE_SHIFT),
-			       (ggtt->gmadr.start + vma->node.start) >> PAGE_SHIFT,
-			       min_t(u64, vma->size, area->vm_end - area->vm_start),
-			       &ggtt->iomap);
+	ret = remap_io_mapping(area, start, pfn, end - start, &ggtt->iomap);
 	if (ret)
 		goto err_fence;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yafang Shao laoar.shao@gmail.com
commit d23b5c577715892c87533b13923306acc6243f93 upstream.
At present, when we perform operations on the cgroup root_list, we must hold the cgroup_mutex, which is a relatively heavyweight lock. In reality, we can make operations on this list RCU-safe, eliminating the need to hold the cgroup_mutex during traversal. Modifications to the list only occur in the cgroup root setup and destroy paths, which should be infrequent in a production environment. In contrast, traversal may occur frequently. Therefore, making it RCU-safe would be beneficial.
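[ Editor's note: what this buys readers, sketched with a hypothetical walker -- the real iterator is the for_each_root() macro changed below, which also accepts cgroup_mutex via its lockdep condition: ]

	static void example_walk_roots(void)
	{
		struct cgroup_root *root;

		rcu_read_lock();
		list_for_each_entry_rcu(root, &cgroup_roots, root_list) {
			/* inspect root; entries may be unlinked concurrently,
			 * and freeing is deferred via kfree_rcu()
			 */
		}
		rcu_read_unlock();
	}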
Signed-off-by: Yafang Shao laoar.shao@gmail.com
Signed-off-by: Tejun Heo tj@kernel.org
Cc: Michal Koutný mkoutny@suse.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/linux/cgroup-defs.h     |  1 +
 kernel/cgroup/cgroup-internal.h |  3 ++-
 kernel/cgroup/cgroup.c          | 23 ++++++++++++++++-------
 3 files changed, 19 insertions(+), 8 deletions(-)

--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -540,6 +540,7 @@ struct cgroup_root {

 	/* A list running through the active hierarchies */
 	struct list_head root_list;
+	struct rcu_head rcu;

 	/* Hierarchy-specific flags */
 	unsigned int flags;
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -170,7 +170,8 @@ extern struct list_head cgroup_roots;

 /* iterate across the hierarchies */
 #define for_each_root(root)						\
-	list_for_each_entry((root), &cgroup_roots, root_list)
+	list_for_each_entry_rcu((root), &cgroup_roots, root_list,	\
+				lockdep_is_held(&cgroup_mutex))

 /**
  * for_each_subsys - iterate all enabled cgroup subsystems
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1346,7 +1346,7 @@ static void cgroup_exit_root_id(struct c

 void cgroup_free_root(struct cgroup_root *root)
 {
-	kfree(root);
+	kfree_rcu(root, rcu);
 }

 static void cgroup_destroy_root(struct cgroup_root *root)
@@ -1379,7 +1379,7 @@ static void cgroup_destroy_root(struct c
 	spin_unlock_irq(&css_set_lock);

 	if (!list_empty(&root->root_list)) {
-		list_del(&root->root_list);
+		list_del_rcu(&root->root_list);
 		cgroup_root_count--;
 	}

@@ -1419,7 +1419,15 @@ static inline struct cgroup *__cset_cgro
 		}
 	}

-	BUG_ON(!res_cgroup);
+	/*
+	 * If cgroup_mutex is not held, the cgrp_cset_link will be freed
+	 * before we remove the cgroup root from the root_list. Consequently,
+	 * when accessing a cgroup root, the cset_link may have already been
+	 * freed, resulting in a NULL res_cgroup. However, by holding the
+	 * cgroup_mutex, we ensure that res_cgroup can't be NULL.
+	 * If we don't hold cgroup_mutex in the caller, we must do the NULL
+	 * check.
+	 */
 	return res_cgroup;
 }

@@ -1468,7 +1476,6 @@ static struct cgroup *current_cgns_cgrou
 static struct cgroup *cset_cgroup_from_root(struct css_set *cset,
 					    struct cgroup_root *root)
 {
-	lockdep_assert_held(&cgroup_mutex);
 	lockdep_assert_held(&css_set_lock);

 	return __cset_cgroup_from_root(cset, root);
@@ -1476,7 +1483,9 @@ static struct cgroup *cset_cgroup_from_r

 /*
  * Return the cgroup for "task" from the given hierarchy. Must be
- * called with cgroup_mutex and css_set_lock held.
+ * called with css_set_lock held to prevent task's groups from being modified.
+ * Must be called with either cgroup_mutex or rcu read lock to prevent the
+ * cgroup root from being destroyed.
  */
 struct cgroup *task_cgroup_from_root(struct task_struct *task,
 				     struct cgroup_root *root)
@@ -2037,7 +2046,7 @@ void init_cgroup_root(struct cgroup_fs_c
 	struct cgroup_root *root = ctx->root;
 	struct cgroup *cgrp = &root->cgrp;

-	INIT_LIST_HEAD(&root->root_list);
+	INIT_LIST_HEAD_RCU(&root->root_list);
 	atomic_set(&root->nr_cgrps, 1);
 	cgrp->root = root;
 	init_cgroup_housekeeping(cgrp);
@@ -2120,7 +2129,7 @@ int cgroup_setup_root(struct cgroup_root
 	 * care of subsystems' refcounts, which are explicitly dropped in
 	 * the failure exit path.
 	 */
-	list_add(&root->root_list, &cgroup_roots);
+	list_add_rcu(&root->root_list, &cgroup_roots);
 	cgroup_root_count++;
/*
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nirmoy Das nirmoy.das@intel.com
[ Upstream commit eaee1c08586395182e0004b3512a2f83570ea461 ]
Implement i915_gem_fb_mmap() to enable the fb_ops.fb_mmap() callback for i915's framebuffer objects.

v2: add a comment on why i915_gem_object_get() is needed (Andi).
v3: mmap also ttm objects.
Cc: Matthew Auld matthew.auld@intel.com Cc: Andi Shyti andi.shyti@linux.intel.com Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Jani Nikula jani.nikula@intel.com Cc: Imre Deak imre.deak@intel.com Signed-off-by: Nirmoy Das nirmoy.das@intel.com Reviewed-by: Andi Shyti andi.shyti@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230404143100.10452-3-nirmoy.... Stable-dep-of: 1ac5167b3a90 ("drm/i915/gem: Adjust vma offset for framebuffer mmap offset") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 137 +++++++++++++++-------- drivers/gpu/drm/i915/gem/i915_gem_mman.h | 2 +- 2 files changed, 93 insertions(+), 46 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index 1fd704d9cf9a9..180b66f6193cb 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -969,53 +969,15 @@ static struct file *mmap_singleton(struct drm_i915_private *i915) return file; }
-/* - * This overcomes the limitation in drm_gem_mmap's assignment of a - * drm_gem_object as the vma->vm_private_data. Since we need to - * be able to resolve multiple mmap offsets which could be tied - * to a single gem object. - */ -int i915_gem_mmap(struct file *filp, struct vm_area_struct *vma) +static int +i915_gem_object_mmap(struct drm_i915_gem_object *obj, + struct i915_mmap_offset *mmo, + struct vm_area_struct *vma) { - struct drm_vma_offset_node *node; - struct drm_file *priv = filp->private_data; - struct drm_device *dev = priv->minor->dev; - struct drm_i915_gem_object *obj = NULL; - struct i915_mmap_offset *mmo = NULL; + struct drm_i915_private *i915 = to_i915(obj->base.dev); + struct drm_device *dev = &i915->drm; struct file *anon;
-	if (drm_dev_is_unplugged(dev))
-		return -ENODEV;
-
-	rcu_read_lock();
-	drm_vma_offset_lock_lookup(dev->vma_offset_manager);
-	node = drm_vma_offset_exact_lookup_locked(dev->vma_offset_manager,
-						  vma->vm_pgoff,
-						  vma_pages(vma));
-	if (node && drm_vma_node_is_allowed(node, priv)) {
-		/*
-		 * Skip 0-refcnted objects as it is in the process of being
-		 * destroyed and will be invalid when the vma manager lock
-		 * is released.
-		 */
-		if (!node->driver_private) {
-			mmo = container_of(node, struct i915_mmap_offset, vma_node);
-			obj = i915_gem_object_get_rcu(mmo->obj);
-
-			GEM_BUG_ON(obj && obj->ops->mmap_ops);
-		} else {
-			obj = i915_gem_object_get_rcu
-				(container_of(node, struct drm_i915_gem_object,
-					      base.vma_node));
-
-			GEM_BUG_ON(obj && !obj->ops->mmap_ops);
-		}
-	}
-	drm_vma_offset_unlock_lookup(dev->vma_offset_manager);
-	rcu_read_unlock();
-	if (!obj)
-		return node ? -EACCES : -EINVAL;
-
 	if (i915_gem_object_is_readonly(obj)) {
 		if (vma->vm_flags & VM_WRITE) {
 			i915_gem_object_put(obj);
@@ -1047,7 +1009,7 @@ int i915_gem_mmap(struct file *filp, struct vm_area_struct *vma)
 	if (obj->ops->mmap_ops) {
 		vma->vm_page_prot = pgprot_decrypted(vm_get_page_prot(vma->vm_flags));
 		vma->vm_ops = obj->ops->mmap_ops;
-		vma->vm_private_data = node->driver_private;
+		vma->vm_private_data = obj->base.vma_node.driver_private;
 		return 0;
 	}
@@ -1085,6 +1047,91 @@ int i915_gem_mmap(struct file *filp, struct vm_area_struct *vma)
 	return 0;
 }

+/*
+ * This overcomes the limitation in drm_gem_mmap's assignment of a
+ * drm_gem_object as the vma->vm_private_data. Since we need to
+ * be able to resolve multiple mmap offsets which could be tied
+ * to a single gem object.
+ */
+int i915_gem_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+	struct drm_vma_offset_node *node;
+	struct drm_file *priv = filp->private_data;
+	struct drm_device *dev = priv->minor->dev;
+	struct drm_i915_gem_object *obj = NULL;
+	struct i915_mmap_offset *mmo = NULL;
+
+	if (drm_dev_is_unplugged(dev))
+		return -ENODEV;
+
+	rcu_read_lock();
+	drm_vma_offset_lock_lookup(dev->vma_offset_manager);
+	node = drm_vma_offset_exact_lookup_locked(dev->vma_offset_manager,
+						  vma->vm_pgoff,
+						  vma_pages(vma));
+	if (node && drm_vma_node_is_allowed(node, priv)) {
+		/*
+		 * Skip 0-refcnted objects as it is in the process of being
+		 * destroyed and will be invalid when the vma manager lock
+		 * is released.
+		 */
+		if (!node->driver_private) {
+			mmo = container_of(node, struct i915_mmap_offset, vma_node);
+			obj = i915_gem_object_get_rcu(mmo->obj);
+
+			GEM_BUG_ON(obj && obj->ops->mmap_ops);
+		} else {
+			obj = i915_gem_object_get_rcu
+				(container_of(node, struct drm_i915_gem_object,
+					      base.vma_node));
+
+			GEM_BUG_ON(obj && !obj->ops->mmap_ops);
+		}
+	}
+	drm_vma_offset_unlock_lookup(dev->vma_offset_manager);
+	rcu_read_unlock();
+	if (!obj)
+		return node ? -EACCES : -EINVAL;
+
+	return i915_gem_object_mmap(obj, mmo, vma);
+}
+
+int i915_gem_fb_mmap(struct drm_i915_gem_object *obj, struct vm_area_struct *vma)
+{
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct drm_device *dev = &i915->drm;
+	struct i915_mmap_offset *mmo = NULL;
+	enum i915_mmap_type mmap_type;
+	struct i915_ggtt *ggtt = to_gt(i915)->ggtt;
+
+	if (drm_dev_is_unplugged(dev))
+		return -ENODEV;
+
+	/* handle ttm object */
+	if (obj->ops->mmap_ops) {
+		/*
+		 * ttm fault handler, ttm_bo_vm_fault_reserved() uses fake offset
+		 * to calculate page offset so set that up.
+		 */
+		vma->vm_pgoff += drm_vma_node_start(&obj->base.vma_node);
+	} else {
+		/* handle stolen and smem objects */
+		mmap_type = i915_ggtt_has_aperture(ggtt) ? I915_MMAP_TYPE_GTT : I915_MMAP_TYPE_WC;
+		mmo = mmap_offset_attach(obj, mmap_type, NULL);
+		if (!mmo)
+			return -ENODEV;
+	}
+
+	/*
+	 * When we install vm_ops for mmap we are too late for
+	 * the vm_ops->open() which increases the ref_count of
+	 * this obj and then it gets decreased by the vm_ops->close().
+	 * To balance this increase the obj ref_count here.
+	 */
+	obj = i915_gem_object_get(obj);
+	return i915_gem_object_mmap(obj, mmo, vma);
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/i915_gem_mman.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.h b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
index 1fa91b3033b35..196417fd0f5c4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
@@ -29,5 +29,5 @@ void i915_gem_object_release_mmap_gtt(struct drm_i915_gem_object *obj);
 void i915_gem_object_runtime_pm_release_mmap_offset(struct drm_i915_gem_object *obj);
 void i915_gem_object_release_mmap_offset(struct drm_i915_gem_object *obj);
-
+int i915_gem_fb_mmap(struct drm_i915_gem_object *obj, struct vm_area_struct *vma);
 #endif
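A note on the tri-state result of the vma-offset lookup in i915_gem_mmap() above: the errno chosen depends on whether a node existed at all. A minimal sketch of that rule, using a hypothetical lookup_result_to_errno() helper that is not part of the patch:

/* Illustrative only, not part of the patch. */
static int lookup_result_to_errno(struct drm_vma_offset_node *node,
				  struct drm_i915_gem_object *obj)
{
	if (obj)
		return 0;		/* object pinned, proceed to mmap */
	/*
	 * A node with no object means the caller was not allowed to
	 * map it, or the object is mid-destruction; no node at all
	 * means the offset itself was bogus.
	 */
	return node ? -EACCES : -EINVAL;
}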
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit 3a89311387cde27da8e290458b2d037133c1f7b5 ]
The mmap_offset_attach() function returns error pointers; it doesn't return NULL.
Fixes: eaee1c085863 ("drm/i915: Add a function to mmap framebuffer obj")
Signed-off-by: Dan Carpenter dan.carpenter@linaro.org
Reviewed-by: Nirmoy Das nirmoy.das@intel.com
Reviewed-by: Andi Shyti andi.shyti@linux.intel.com
Signed-off-by: Nirmoy Das nirmoy.das@intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/ZH7tHLRZ9oBjedjN@moroto
Stable-dep-of: 1ac5167b3a90 ("drm/i915/gem: Adjust vma offset for framebuffer mmap offset")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 180b66f6193cb..4a291d29c5af5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -1118,8 +1118,8 @@ int i915_gem_fb_mmap(struct drm_i915_gem_object *obj, struct vm_area_struct *vma)
 		/* handle stolen and smem objects */
 		mmap_type = i915_ggtt_has_aperture(ggtt) ? I915_MMAP_TYPE_GTT : I915_MMAP_TYPE_WC;
 		mmo = mmap_offset_attach(obj, mmap_type, NULL);
-		if (!mmo)
-			return -ENODEV;
+		if (IS_ERR(mmo))
+			return PTR_ERR(mmo);
 	}

 	/*
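For readers less familiar with the convention the fix relies on: functions like mmap_offset_attach() report failure through ERR_PTR()-encoded pointers, so a NULL test never fires. A minimal sketch of the idiom, assuming a hypothetical make_foo() allocator:

#include <linux/err.h>
#include <linux/slab.h>

struct foo { int x; };

/* Hypothetical allocator, for illustration only. */
static struct foo *make_foo(void)
{
	struct foo *f = kzalloc(sizeof(*f), GFP_KERNEL);

	if (!f)
		return ERR_PTR(-ENOMEM);	/* errno encoded in the pointer */
	return f;
}

static int use_foo(void)
{
	struct foo *f = make_foo();

	if (IS_ERR(f))		/* a NULL check here would miss the error */
		return PTR_ERR(f);
	kfree(f);
	return 0;
}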
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andi Shyti andi.shyti@linux.intel.com
[ Upstream commit 1ac5167b3a90c9820daa64cc65e319b2d958d686 ]
When mapping a framebuffer object, the virtual memory area (VMA) offset ('vm_pgoff') should be adjusted by the start of the 'vma_node' associated with the object. This ensures that the VMA offset is correctly aligned with the corresponding offset within the GGTT aperture.
Increment vm_pgoff by the start of the vma_node, in addition to the offset provided by the user.
Suggested-by: Chris Wilson chris.p.wilson@linux.intel.com
Signed-off-by: Andi Shyti andi.shyti@linux.intel.com
Reviewed-by: Jonathan Cavitt jonathan.cavitt@intel.com
Reviewed-by: Rodrigo Vivi rodrigo.vivi@intel.com
Cc: stable@vger.kernel.org # v4.9+
[Joonas: Add Cc: stable]
Signed-off-by: Joonas Lahtinen joonas.lahtinen@linux.intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20240802083850.103694-2-andi.s...
(cherry picked from commit 60a2066c50058086510c91f404eb582029650970)
Signed-off-by: Joonas Lahtinen joonas.lahtinen@linux.intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 4a291d29c5af5..7e9310d01dfdd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -1120,6 +1120,8 @@ int i915_gem_fb_mmap(struct drm_i915_gem_object *obj, struct vm_area_struct *vma)
 		mmo = mmap_offset_attach(obj, mmap_type, NULL);
 		if (IS_ERR(mmo))
 			return PTR_ERR(mmo);
+
+		vma->vm_pgoff += drm_vma_node_start(&mmo->vma_node);
 	}

 	/*
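To see why the rebase matters, consider how a fault handler working in fake-offset space derives the object-relative page index. A sketch, assuming the handler follows the usual drm_vma_node_start() arithmetic (fault_page_index() is a hypothetical helper, not the real fault path):

/*
 * Hypothetical helper, for illustration: with vm_pgoff rebased onto
 * the node's fake offset, the object-relative page index comes out
 * right; without the rebase it would be off by drm_vma_node_start().
 */
static pgoff_t fault_page_index(struct vm_area_struct *vma,
				unsigned long address,
				struct drm_vma_offset_node *node)
{
	pgoff_t pgoff = vma->vm_pgoff +
			((address - vma->vm_start) >> PAGE_SHIFT);

	return pgoff - drm_vma_node_start(node);
}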
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook kees@kernel.org
[ Upstream commit 3eb3cd5992f7a0c37edc8d05b4c38c98758d8671 ]
Commit 04d82a6d0881 ("binfmt_flat: allow not offsetting data start") introduced a RISC-V specific variant of the FLAT format which does not allocate any space for the (obsolete) array of shared library pointers. However, it did not disable the code which initializes the array, resulting in the corruption of sizeof(long) bytes before the DATA segment, generally the end of the TEXT segment.
Introduce MAX_SHARED_LIBS_UPDATE which depends on the state of CONFIG_BINFMT_FLAT_NO_DATA_START_OFFSET to guard the initialization of the shared library pointer region so that it will only be initialized if space is reserved for it.
Fixes: 04d82a6d0881 ("binfmt_flat: allow not offsetting data start")
Co-developed-by: Stefan O'Rear sorear@fastmail.com
Signed-off-by: Stefan O'Rear sorear@fastmail.com
Reviewed-by: Damien Le Moal dlemoal@kernel.org
Acked-by: Greg Ungerer gerg@linux-m68k.org
Link: https://lore.kernel.org/r/20240807195119.it.782-kees@kernel.org
Signed-off-by: Kees Cook kees@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/binfmt_flat.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index c26545d71d39a..cd6d5bbb4b9df 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -72,8 +72,10 @@
 #ifdef CONFIG_BINFMT_FLAT_NO_DATA_START_OFFSET
 #define DATA_START_OFFSET_WORDS		(0)
+#define MAX_SHARED_LIBS_UPDATE		(0)
 #else
 #define DATA_START_OFFSET_WORDS		(MAX_SHARED_LIBS)
+#define MAX_SHARED_LIBS_UPDATE		(MAX_SHARED_LIBS)
 #endif

 struct lib_info {
@@ -880,7 +882,7 @@ static int load_flat_binary(struct linux_binprm *bprm)
 		return res;

 	/* Update data segment pointers for all libraries */
-	for (i = 0; i < MAX_SHARED_LIBS; i++) {
+	for (i = 0; i < MAX_SHARED_LIBS_UPDATE; i++) {
 		if (!libinfo.lib_list[i].loaded)
 			continue;
 		for (j = 0; j < MAX_SHARED_LIBS; j++) {
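The fix is the standard pattern of letting a config-dependent bound compile a loop away. A reduced sketch (update_lib_ptrs() and its write pattern are illustrative stand-ins, not the real loader code):

#ifdef CONFIG_BINFMT_FLAT_NO_DATA_START_OFFSET
#define MAX_SHARED_LIBS_UPDATE	(0)	/* no pointer words reserved */
#else
#define MAX_SHARED_LIBS_UPDATE	(MAX_SHARED_LIBS)
#endif

/* Illustrative stand-in for the loader's update loop. */
static void update_lib_ptrs(unsigned long *data_start)
{
	int i;

	/*
	 * Bounding by MAX_SHARED_LIBS_UPDATE means that when zero words
	 * were reserved, the loop body is dead code and nothing lands in
	 * the sizeof(long) bytes below data_start, which would otherwise
	 * clobber the end of the TEXT segment.
	 */
	for (i = 0; i < MAX_SHARED_LIBS_UPDATE; i++)
		data_start[-(i + 1)] = 0;
}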
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Waiman Long longman@redhat.com
commit a7fb0423c201ba12815877a0b5a68a6a1710b23a upstream.
Commit d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU safe") adds a new rcu_head to the cgroup_root structure and uses kvfree_rcu() for freeing the cgroup_root.
The current implementation of kvfree_rcu(), however, has the limitation that the offset of the rcu_head structure within the larger data structure must be less than 4096 or the compilation will fail. See the macro definition of __is_kvfree_rcu_offset() in include/linux/rcupdate.h for more information.
Because the rcu_head sits below the large embedded cgroup structure, any change that makes the cgroup structure larger runs the risk of causing a build failure under certain configurations. Commit 77070eeb8821 ("cgroup: Avoid false cacheline sharing of read mostly rstat_cpu") happens to be the last straw that breaks it. Fix this problem by moving the rcu_head structure up before the cgroup structure.
Fixes: d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU safe")
Reported-by: Stephen Rothwell sfr@canb.auug.org.au
Closes: https://lore.kernel.org/lkml/20231207143806.114e0a74@canb.auug.org.au/
Signed-off-by: Waiman Long longman@redhat.com
Acked-by: Yafang Shao laoar.shao@gmail.com
Reviewed-by: Yosry Ahmed yosryahmed@google.com
Reviewed-by: Michal Koutný mkoutny@suse.com
Signed-off-by: Tejun Heo tj@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/linux/cgroup-defs.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -525,6 +525,10 @@ struct cgroup_root {
 	/* Unique id for this hierarchy. */
 	int hierarchy_id;

+	/* A list running through the active hierarchies */
+	struct list_head root_list;
+	struct rcu_head rcu;	/* Must be near the top */
+
 	/*
 	 * The root cgroup. The containing cgroup_root will be destroyed on its
 	 * release. cgrp->ancestors[0] will be used overflowing into the
@@ -538,10 +542,6 @@ struct cgroup_root {
 	/* Number of cgroups in the hierarchy, used only for /proc/cgroups */
 	atomic_t nr_cgrps;

-	/* A list running through the active hierarchies */
-	struct list_head root_list;
-	struct rcu_head rcu;
-
 	/* Hierarchy-specific flags */
 	unsigned int flags;
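The constraint being worked around can be made explicit with a compile-time check. A sketch under a simplified stand-in layout (example_root and big_state are illustrative, not the real cgroup_root):

#include <linux/build_bug.h>
#include <linux/rcupdate.h>
#include <linux/stddef.h>
#include <linux/types.h>

struct example_root {
	int hierarchy_id;
	struct list_head root_list;
	struct rcu_head rcu;		/* keep within the first 4096 bytes */
	unsigned char big_state[8192];	/* anything large goes after it */
};

/*
 * kvfree_rcu() encodes the rcu_head offset in the callback pointer, so
 * __is_kvfree_rcu_offset() only accepts offsets below 4096; a guard one
 * could add (illustrative):
 */
static_assert(offsetof(struct example_root, rcu) < 4096,
	      "rcu_head too deep for kvfree_rcu()");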
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit d1cba2ea8121e7fdbe1328cea782876b1dd80993 ]
syzbot is able to trigger softlockups by setting NL80211_ATTR_TXQ_QUANTUM to 2^31.
We had a similar issue in sch_fq, fixed with commit d9e15a273306 ("pkt_sched: fq: do not accept silly TCA_FQ_QUANTUM")
watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/1:0:24]
Modules linked in:
irq event stamp: 131135
hardirqs last enabled at (131134): [<ffff80008ae8778c>] __exit_to_kernel_mode arch/arm64/kernel/entry-common.c:85 [inline]
hardirqs last enabled at (131134): [<ffff80008ae8778c>] exit_to_kernel_mode+0xdc/0x10c arch/arm64/kernel/entry-common.c:95
hardirqs last disabled at (131135): [<ffff80008ae85378>] __el1_irq arch/arm64/kernel/entry-common.c:533 [inline]
hardirqs last disabled at (131135): [<ffff80008ae85378>] el1_interrupt+0x24/0x68 arch/arm64/kernel/entry-common.c:551
softirqs last enabled at (125892): [<ffff80008907e82c>] neigh_hh_init net/core/neighbour.c:1538 [inline]
softirqs last enabled at (125892): [<ffff80008907e82c>] neigh_resolve_output+0x268/0x658 net/core/neighbour.c:1553
softirqs last disabled at (125896): [<ffff80008904166c>] local_bh_disable+0x10/0x34 include/linux/bottom_half.h:19
CPU: 1 PID: 24 Comm: kworker/1:0 Not tainted 6.9.0-rc7-syzkaller-gfda5695d692c #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Workqueue: mld mld_ifc_work
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __list_del include/linux/list.h:195 [inline]
pc : __list_del_entry include/linux/list.h:218 [inline]
pc : list_move_tail include/linux/list.h:310 [inline]
pc : fq_tin_dequeue include/net/fq_impl.h:112 [inline]
pc : ieee80211_tx_dequeue+0x6b8/0x3b4c net/mac80211/tx.c:3854
lr : __list_del_entry include/linux/list.h:218 [inline]
lr : list_move_tail include/linux/list.h:310 [inline]
lr : fq_tin_dequeue include/net/fq_impl.h:112 [inline]
lr : ieee80211_tx_dequeue+0x67c/0x3b4c net/mac80211/tx.c:3854
sp : ffff800093d36700
x29: ffff800093d36a60 x28: ffff800093d36960 x27: dfff800000000000
x26: ffff0000d800ad50 x25: ffff0000d800abe0 x24: ffff0000d800abf0
x23: ffff0000e0032468 x22: ffff0000e00324d4 x21: ffff0000d800abf0
x20: ffff0000d800abf8 x19: ffff0000d800abf0 x18: ffff800093d363c0
x17: 000000000000d476 x16: ffff8000805519dc x15: ffff7000127a6cc8
x14: 1ffff000127a6cc8 x13: 0000000000000004 x12: ffffffffffffffff
x11: ffff7000127a6cc8 x10: 0000000000ff0100 x9 : 0000000000000000
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
x5 : ffff80009287aa08 x4 : 0000000000000008 x3 : ffff80008034c7fc
x2 : ffff0000e0032468 x1 : 00000000da0e46b8 x0 : ffff0000e0032470
Call trace:
 __list_del include/linux/list.h:195 [inline]
 __list_del_entry include/linux/list.h:218 [inline]
 list_move_tail include/linux/list.h:310 [inline]
 fq_tin_dequeue include/net/fq_impl.h:112 [inline]
 ieee80211_tx_dequeue+0x6b8/0x3b4c net/mac80211/tx.c:3854
 wake_tx_push_queue net/mac80211/util.c:294 [inline]
 ieee80211_handle_wake_tx_queue+0x118/0x274 net/mac80211/util.c:315
 drv_wake_tx_queue net/mac80211/driver-ops.h:1350 [inline]
 schedule_and_wake_txq net/mac80211/driver-ops.h:1357 [inline]
 ieee80211_queue_skb+0x18e8/0x2244 net/mac80211/tx.c:1664
 ieee80211_tx+0x260/0x400 net/mac80211/tx.c:1966
 ieee80211_xmit+0x278/0x354 net/mac80211/tx.c:2062
 __ieee80211_subif_start_xmit+0xab8/0x122c net/mac80211/tx.c:4338
 ieee80211_subif_start_xmit+0xe0/0x438 net/mac80211/tx.c:4532
 __netdev_start_xmit include/linux/netdevice.h:4903 [inline]
 netdev_start_xmit include/linux/netdevice.h:4917 [inline]
 xmit_one net/core/dev.c:3531 [inline]
 dev_hard_start_xmit+0x27c/0x938 net/core/dev.c:3547
 __dev_queue_xmit+0x1678/0x33fc net/core/dev.c:4341
 dev_queue_xmit include/linux/netdevice.h:3091 [inline]
 neigh_resolve_output+0x558/0x658 net/core/neighbour.c:1563
 neigh_output include/net/neighbour.h:542 [inline]
 ip6_finish_output2+0x104c/0x1ee8 net/ipv6/ip6_output.c:137
 ip6_finish_output+0x428/0x7a0 net/ipv6/ip6_output.c:222
 NF_HOOK_COND include/linux/netfilter.h:303 [inline]
 ip6_output+0x270/0x594 net/ipv6/ip6_output.c:243
 dst_output include/net/dst.h:450 [inline]
 NF_HOOK+0x160/0x4f0 include/linux/netfilter.h:314
 mld_sendpack+0x7b4/0x10f4 net/ipv6/mcast.c:1818
 mld_send_cr net/ipv6/mcast.c:2119 [inline]
 mld_ifc_work+0x840/0xd0c net/ipv6/mcast.c:2650
 process_one_work+0x7b8/0x15d4 kernel/workqueue.c:3267
 process_scheduled_works kernel/workqueue.c:3348 [inline]
 worker_thread+0x938/0xef4 kernel/workqueue.c:3429
 kthread+0x288/0x310 kernel/kthread.c:388
 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
Fixes: 52539ca89f36 ("cfg80211: Expose TXQ stats and parameters to userspace")
Signed-off-by: Eric Dumazet edumazet@google.com
Link: https://patch.msgid.link/20240615160800.250667-1-edumazet@google.com
Signed-off-by: Johannes Berg johannes.berg@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/wireless/nl80211.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -463,6 +463,10 @@ nl80211_sta_wme_policy[NL80211_STA_WME_M
 	[NL80211_STA_WME_MAX_SP] = { .type = NLA_U8 },
 };

+static struct netlink_range_validation q_range = {
+	.max = INT_MAX,
+};
+
 static const struct nla_policy nl80211_policy[NUM_NL80211_ATTR] = {
 	[0] = { .strict_start_type = NL80211_ATTR_HE_OBSS_PD },
 	[NL80211_ATTR_WIPHY] = { .type = NLA_U32 },
@@ -745,7 +749,7 @@ static const struct nla_policy nl80211_p
 	[NL80211_ATTR_TXQ_LIMIT] = { .type = NLA_U32 },
 	[NL80211_ATTR_TXQ_MEMORY_LIMIT] = { .type = NLA_U32 },
-	[NL80211_ATTR_TXQ_QUANTUM] = { .type = NLA_U32 },
+	[NL80211_ATTR_TXQ_QUANTUM] = NLA_POLICY_FULL_RANGE(NLA_U32, &q_range),
 	[NL80211_ATTR_HE_CAPABILITY] =
 		NLA_POLICY_VALIDATE_FN(NLA_BINARY, validate_he_capa,
 				       NL80211_HE_MAX_CAPABILITY_LEN),
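For reference, this is the same bounded-attribute pattern any netlink policy can use to reject oversized values before a handler ever sees them. A sketch with hypothetical names (MY_ATTR_QUANTUM and friends are placeholders, not nl80211 identifiers):

#include <linux/limits.h>
#include <net/netlink.h>

/* Hypothetical attribute set, for illustration only. */
enum { MY_ATTR_UNSPEC, MY_ATTR_QUANTUM, __MY_ATTR_MAX };
#define MY_ATTR_MAX (__MY_ATTR_MAX - 1)

static struct netlink_range_validation my_quantum_range = {
	.max = INT_MAX,	/* values of 2^31 and up never reach the handler */
};

static const struct nla_policy my_policy[MY_ATTR_MAX + 1] = {
	[MY_ATTR_QUANTUM] = NLA_POLICY_FULL_RANGE(NLA_U32,
						  &my_quantum_range),
};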
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Will Deacon will@kernel.org
commit 36e008323926036650299cfbb2dca704c7aba849 upstream.
The TLBI level hints are for leaf entries only, so take care not to pass them incorrectly after clearing a table entry.
Cc: Gavin Shan gshan@redhat.com
Cc: Marc Zyngier maz@kernel.org
Cc: Quentin Perret qperret@google.com
Fixes: 82bb02445de5 ("KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2")
Fixes: 6d9d2115c480 ("KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table")
Reviewed-by: Shaoqin Huang shahuang@redhat.com
Reviewed-by: Marc Zyngier maz@kernel.org
Link: https://lore.kernel.org/r/20240327124853.11206-3-will@kernel.org
Signed-off-by: Oliver Upton oliver.upton@linux.dev
Cc: stable@vger.kernel.org # 6.1.y only
[will@: Use '0' instead of TLBI_TTL_UNKNOWN to indicate "no level". Force level to 0 in stage2_put_pte() if we're clearing a table entry.]
Signed-off-by: Will Deacon will@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/kvm/hyp/pgtable.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -475,7 +475,7 @@ static int hyp_unmap_walker(u64 addr, u6
 		kvm_clear_pte(ptep);
 		dsb(ishst);
-		__tlbi_level(vae2is, __TLBI_VADDR(addr, 0), level);
+		__tlbi_level(vae2is, __TLBI_VADDR(addr, 0), 0);
 	} else {
 		if (end - addr < granule)
 			return -EINVAL;
@@ -699,8 +699,14 @@ static void stage2_put_pte(kvm_pte_t *pt
 	 * Clear the existing PTE, and perform break-before-make with
 	 * TLB maintenance if it was valid.
 	 */
-	if (kvm_pte_valid(*ptep)) {
+	kvm_pte_t pte = *ptep;
+
+	if (kvm_pte_valid(pte)) {
 		kvm_clear_pte(ptep);
+
+		if (kvm_pte_table(pte, level))
+			level = 0;
+
 		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
 	}
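The rule the backport enforces can be read in isolation: a TLBI level hint describes the leaf level being invalidated, so it must be dropped when the cleared entry was a table. A sketch (invalidate_stage2_entry() is an illustrative wrapper, not a real pgtable.c function):

/* Illustrative wrapper, not kernel API. */
static void invalidate_stage2_entry(struct kvm_s2_mmu *mmu, u64 addr,
				    kvm_pte_t old, u32 level)
{
	/*
	 * If the cleared entry was a table rather than a leaf, pass 0
	 * ("no level hint" on this branch) so the TLBI is not
	 * under-scoped to a single leaf level.
	 */
	if (kvm_pte_table(old, level))
		level = 0;

	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
}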
Hi!
This is the start of the stable review cycle for the 6.1.106 release. There are 38 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
CIP testing did not find any problems here:
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-6...
Tested-by: Pavel Machek (CIP) pavel@denx.de
Best regards, Pavel
On 15.08.2024 at 15:25, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.106 release. There are 38 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Builds, boots and works on my 2-socket Ivy Bridge Xeon E5-2697 v2 server. No dmesg oddities or regressions found.
Tested-by: Peter Schneider pschneider1968@googlemail.com
Best regards, Peter Schneider
On 8/15/24 06:25, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.106 release. There are 38 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat, 17 Aug 2024 13:18:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.106-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB, using 32-bit and 64-bit ARM kernels, and build-tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com
On Thu, 15 Aug 2024 at 16:05, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.1.106 release. There are 38 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat, 17 Aug 2024 13:18:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.106-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro's test farm. No regressions on arm64, arm, x86_64 and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build
* kernel: 6.1.106-rc1
* git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
* git commit: 09ce23af4dbba4a348ad90cbee52a46ef1d685cf
* git describe: v6.1.105-39-g09ce23af4dbb
* test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.1.y/build/v6.1.10...
## Test Regressions (compared to v6.1.104-150-g9bd7537a3dd0)
## Metric Regressions (compared to v6.1.104-150-g9bd7537a3dd0)
## Test Fixes (compared to v6.1.104-150-g9bd7537a3dd0)
## Metric Fixes (compared to v6.1.104-150-g9bd7537a3dd0)
## Test result summary
total: 159479, pass: 137501, fail: 2504, skip: 19237, xfail: 237

## Build Summary
* arc: 5 total, 5 passed, 0 failed
* arm: 135 total, 135 passed, 0 failed
* arm64: 41 total, 41 passed, 0 failed
* i386: 28 total, 28 passed, 0 failed
* mips: 26 total, 25 passed, 1 failed
* parisc: 4 total, 4 passed, 0 failed
* powerpc: 36 total, 35 passed, 1 failed
* riscv: 11 total, 11 passed, 0 failed
* s390: 14 total, 14 passed, 0 failed
* sh: 10 total, 10 passed, 0 failed
* sparc: 7 total, 7 passed, 0 failed
* x86_64: 33 total, 33 passed, 0 failed

## Test suites summary
* boot
* commands
* kselftest-arm64
* kselftest-breakpoints
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-efivarfs
* kselftest-exec
* kselftest-filesystems
* kselftest-filesystems-binderfs
* kselftest-filesystems-epoll
* kselftest-firmware
* kselftest-fpu
* kselftest-ftrace
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-kcmp
* kselftest-kvm
* kselftest-livepatch
* kselftest-membarrier
* kselftest-memfd
* kselftest-mincore
* kselftest-mqueue
* kselftest-net
* kselftest-net-mptcp
* kselftest-openat2
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-tc-testing
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user_events
* kselftest-vDSO
* kselftest-watchdog
* kselftest-x86
* kunit
* kvm-unit-tests
* libgpiod
* libhugetlbfs
* log-parser-boot
* log-parser-test
* ltp-commands
* ltp-containers
* ltp-controllers
* ltp-cpuhotplug
* ltp-crypto
* ltp-cve
* ltp-dio
* ltp-fcntl-locktests
* ltp-fs
* ltp-fs_bind
* ltp-fs_perms_simple
* ltp-hugetlb
* ltp-ipc
* ltp-math
* ltp-mm
* ltp-nptl
* ltp-pty
* ltp-sched
* ltp-smoke
* ltp-smoketest
* ltp-syscalls
* ltp-tracing
* perf
* rcutorture

--
Linaro LKFT
https://lkft.linaro.org
On Thu, Aug 15, 2024 at 03:25:34PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.106 release. There are 38 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Tested-by: Mark Brown broonie@kernel.org
On Thu, 15 Aug 2024 15:25:34 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.106 release. There are 38 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat, 17 Aug 2024 13:18:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.106-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v6.1:
    10 builds:	10 pass, 0 fail
    26 boots:	26 pass, 0 fail
   116 tests:	116 pass, 0 fail

Linux version:	6.1.106-rc1-g09ce23af4dbb
Boards tested:	tegra124-jetson-tk1, tegra186-p2771-0000,
		tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000,
		tegra20-ventana, tegra210-p2371-2180,
		tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On 8/15/24 6:25 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.106 release. There are 38 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat, 17 Aug 2024 13:18:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.106-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net