On 3/20/25 6:07 AM, James Thomas wrote:
> Hello all,
>
> I encountered an issue with the CPU affinity of tasks launched by systemd in a
> slice, after updating from systemd 254 to systemd >= 256, on the LTS 5.15.x
> branch (tested on v5.15.179).
>
> Despite the slice file stipulating AllowedCPUs=2 (and confirming this was set in
> /sys/fs/cgroup/test.slice/cpuset.cpus) tasks launched in the slice would have
> the CPU affinity of the system.slice (i.e. all CPUs by default) rather than just CPU 2.
>
> To reproduce:
>
> * Check kernel version and systemd version (I used a Debian testing image)
>
> ```
> # uname -r
> 5.15.179
> # systemctl --version
> systemd 257 (257.4-3)
> ...
> ```
>
> * Create a test.slice with AllowedCPUs=2
>
> ```
> # cat <<EOF > /usr/lib/systemd/system/test.slice
> [Unit]
> Description=Test slice
> Before=slices.target
> [Slice]
> AllowedCPUs=2
> [Install]
> WantedBy=slices.target
> EOF
> # systemctl daemon-reload && systemctl start test.slice
> ```
>
> * Confirm cpuset
>
> ```
> # cat /sys/fs/cgroup/test.slice/cpuset.cpus
> 2
> ```
>
> * Launch task in slice
>
> ```
> # systemd-run --slice test.slice yes
> Running as unit: run-r9187b97c6958498aad5bba213289ac56.service; invocation ID:
> f470f74047ac43b7a60861d03a7ef6f9
> # cat
> /sys/fs/cgroup/test.slice/run-r9187b97c6958498aad5bba213289ac56.service/cgroup.procs
>
> 317
> ```
>
> * Check affinity
>
> ```
> # taskset -pc 317
> pid 317's current affinity list: 0-7
> ```
>
> This issue is fixed by applying upstream commits:
>
> 18f9a4d47527772515ad6cbdac796422566e6440
> cgroup/cpuset: Skip spread flags update on v2
> and
> 42a11bf5c5436e91b040aeb04063be1710bb9f9c
> cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly
>
> With these applied:
>
> ```
> # systemd-run --slice test.slice yes
> Running as unit: run-r442c444559ff49f48c6c2b8325b3b500.service; invocation ID:
> 5211167267154e9292cb6b854585cb91
> # cat /sys/fs/cgroup/test.slice/run-r442c444559ff49f48c6c2b8325b3b500.service/cgroup.procs
> 291
> # taskset -pc 291
> pid 291's current affinity list: 2
> ```
>
> Perhaps these are good candidates for backporting to the 5.15 LTS branch?
>
> Thanks
> James
>
You should also send this email to stable(a)vger.kernel.org for
consideration for inclusion in the 5.15 LTS branch.
Cheers,
Longman
Once a key's reference count has been reduced to 0, the garbage collector
thread may destroy it at any time and so key_put() is not allowed to touch
the key after that point. The most key_put() is normally allowed to do is
to touch key_gc_work as that's a static global variable.
However, in an effort to speed up the reclamation of quota, the quota
update is now done in key_put() once the key's usage count is reduced to
0 - but that means the code is looking at the key after the deadline,
which is forbidden.
Fix this by using a flag to indicate that a key can be gc'd now rather than
looking at the key's refcount in the garbage collector.
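For illustration, the ordering the fix relies on boils down to this
pairing (a condensed sketch of the two sides, drawn from the diff below;
not a complete listing):

```c
/* key_put() side: every touch of *key (the quota update above all)
 * must be visible before the flag is; only then is the GC woken.
 */
smp_mb();					/* key->user before FINAL_PUT */
set_bit(KEY_FLAG_FINAL_PUT, &key->flags);
schedule_work(&key_gc_work);

/* GC side: the barrier pairs with the one in key_put(), so once
 * FINAL_PUT is observed the GC may safely clobber key->user and
 * destroy the key.
 */
if (test_bit(KEY_FLAG_FINAL_PUT, &key->flags)) {
	smp_mb();				/* pairs with key_put() */
	goto found_unreferenced_key;
}
```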
Fixes: 9578e327b2b4 ("keys: update key quotas in key_put()")
Reported-by: syzbot+6105ffc1ded71d194d6d(a)syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells(a)redhat.com>
Tested-by: syzbot+6105ffc1ded71d194d6d(a)syzkaller.appspotmail.com
cc: Jarkko Sakkinen <jarkko(a)kernel.org>
cc: Oleg Nesterov <oleg(a)redhat.com>
cc: Kees Cook <kees(a)kernel.org>
cc: Hillf Danton <hdanton(a)sina.com>
cc: keyrings(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v6.10+
---
include/linux/key.h | 1 +
security/keys/gc.c | 4 +++-
security/keys/key.c | 2 ++
3 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/linux/key.h b/include/linux/key.h
index 074dca3222b9..ba05de8579ec 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -236,6 +236,7 @@ struct key {
#define KEY_FLAG_ROOT_CAN_INVAL 7 /* set if key can be invalidated by root without permission */
#define KEY_FLAG_KEEP 8 /* set if key should not be removed */
#define KEY_FLAG_UID_KEYRING 9 /* set if key is a user or user session keyring */
+#define KEY_FLAG_FINAL_PUT 10 /* set if final put has happened on key */
/* the key type and key description string
* - the desc is used to match a key against search criteria
diff --git a/security/keys/gc.c b/security/keys/gc.c
index 7d687b0962b1..f27223ea4578 100644
--- a/security/keys/gc.c
+++ b/security/keys/gc.c
@@ -218,8 +218,10 @@ static void key_garbage_collector(struct work_struct *work)
key = rb_entry(cursor, struct key, serial_node);
cursor = rb_next(cursor);
- if (refcount_read(&key->usage) == 0)
+ if (test_bit(KEY_FLAG_FINAL_PUT, &key->flags)) {
+ smp_mb(); /* Clobber key->user after FINAL_PUT seen. */
goto found_unreferenced_key;
+ }
if (unlikely(gc_state & KEY_GC_REAPING_DEAD_1)) {
if (key->type == key_gc_dead_keytype) {
diff --git a/security/keys/key.c b/security/keys/key.c
index 3d7d185019d3..7198cd2ac3a3 100644
--- a/security/keys/key.c
+++ b/security/keys/key.c
@@ -658,6 +658,8 @@ void key_put(struct key *key)
key->user->qnbytes -= key->quotalen;
spin_unlock_irqrestore(&key->user->lock, flags);
}
+ smp_mb(); /* key->user before FINAL_PUT set. */
+ set_bit(KEY_FLAG_FINAL_PUT, &key->flags);
schedule_work(&key_gc_work);
}
}
From: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
commit 4a1d3acd6ea86075e77fcc1188c3fc372833ba73 upstream.
The nft_counter code uses two s64 counters for statistics. Those two are
protected by a seqcount to ensure that each 64bit variable is always
seen consistently during updates, even on 32bit architectures where the
store is performed by two writes. A side effect is that the two counters
(bytes and packets) are written and read together in the same window.
This can be replaced with u64_stats_t. write_seqcount_begin()/end() is
replaced with u64_stats_update_begin()/end() and behaves the same way
as with seqcount_t on 32bit architectures. Additionally there is a
preempt_disable on PREEMPT_RT to ensure that a reader does not preempt a
writer.
On 64bit architectures the macros are removed and the reads happen
without any retries. This also means that the reader can observe one
counter (bytes) from before the update and the other counter (packets)
from after the update, but that is okay since there is no requirement to
have both counters from the same update window.
Convert the statistics to u64_stats_t. There is one optimisation:
nft_counter_do_init() and nft_counter_clone() allocate a new per-CPU
counter and assign a value to it. During this assignment preemption is
disabled, which is not needed because the counter is not yet exposed to
the system, so there cannot be another writer or reader. Therefore
disabling preemption is omitted and raw_cpu_ptr() is used to obtain a
pointer to a counter for the assignment.
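For reference, the u64_stats_t pattern the patch converts to looks
roughly like this (an illustrative sketch, not the exact nft_counter
code; the real writer additionally runs under local_bh_disable()):

```c
#include <linux/percpu.h>
#include <linux/u64_stats_sync.h>

struct pcpu_ctr {
	u64_stats_t bytes;
	u64_stats_t packets;
};

static DEFINE_PER_CPU(struct u64_stats_sync, ctr_sync);

/* Writer: on 32bit, begin/end take the per-CPU seqcount (and keep a
 * PREEMPT_RT reader from preempting the writer); on 64bit both calls
 * compile away.
 */
static void ctr_account(struct pcpu_ctr __percpu *ctr, unsigned int len)
{
	struct u64_stats_sync *sync = this_cpu_ptr(&ctr_sync);
	struct pcpu_ctr *c = this_cpu_ptr(ctr);

	u64_stats_update_begin(sync);
	u64_stats_add(&c->bytes, len);
	u64_stats_inc(&c->packets);
	u64_stats_update_end(sync);
}

/* Reader: retries on 32bit if a writer interleaved; on 64bit the loop
 * body runs exactly once and each counter is read atomically.
 */
static void ctr_fetch(struct pcpu_ctr __percpu *ctr, int cpu,
		      u64 *bytes, u64 *packets)
{
	struct u64_stats_sync *sync = per_cpu_ptr(&ctr_sync, cpu);
	struct pcpu_ctr *c = per_cpu_ptr(ctr, cpu);
	unsigned int seq;

	do {
		seq = u64_stats_fetch_begin(sync);
		*bytes = u64_stats_read(&c->bytes);
		*packets = u64_stats_read(&c->packets);
	} while (u64_stats_fetch_retry(sync, seq));
}
```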
Cc: Eric Dumazet <edumazet(a)google.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Felix Moessbauer <felix.moessbauer(a)siemens.com>
---
I propose the backport, as this is a performance improvement. Note
that this is a bugfix on RT kernels.
net/netfilter/nft_counter.c | 90 +++++++++++++++++++------------------
1 file changed, 46 insertions(+), 44 deletions(-)
diff --git a/net/netfilter/nft_counter.c b/net/netfilter/nft_counter.c
index 781d3a26f5df..8d19bd001277 100644
--- a/net/netfilter/nft_counter.c
+++ b/net/netfilter/nft_counter.c
@@ -8,7 +8,7 @@
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
-#include <linux/seqlock.h>
+#include <linux/u64_stats_sync.h>
#include <linux/netlink.h>
#include <linux/netfilter.h>
#include <linux/netfilter/nf_tables.h>
@@ -17,6 +17,11 @@
#include <net/netfilter/nf_tables_offload.h>
struct nft_counter {
+ u64_stats_t bytes;
+ u64_stats_t packets;
+};
+
+struct nft_counter_tot {
s64 bytes;
s64 packets;
};
@@ -25,25 +30,24 @@ struct nft_counter_percpu_priv {
struct nft_counter __percpu *counter;
};
-static DEFINE_PER_CPU(seqcount_t, nft_counter_seq);
+static DEFINE_PER_CPU(struct u64_stats_sync, nft_counter_sync);
static inline void nft_counter_do_eval(struct nft_counter_percpu_priv *priv,
struct nft_regs *regs,
const struct nft_pktinfo *pkt)
{
+ struct u64_stats_sync *nft_sync;
struct nft_counter *this_cpu;
- seqcount_t *myseq;
local_bh_disable();
this_cpu = this_cpu_ptr(priv->counter);
- myseq = this_cpu_ptr(&nft_counter_seq);
-
- write_seqcount_begin(myseq);
+ nft_sync = this_cpu_ptr(&nft_counter_sync);
- this_cpu->bytes += pkt->skb->len;
- this_cpu->packets++;
+ u64_stats_update_begin(nft_sync);
+ u64_stats_add(&this_cpu->bytes, pkt->skb->len);
+ u64_stats_inc(&this_cpu->packets);
+ u64_stats_update_end(nft_sync);
- write_seqcount_end(myseq);
local_bh_enable();
}
@@ -66,17 +70,16 @@ static int nft_counter_do_init(const struct nlattr * const tb[],
if (cpu_stats == NULL)
return -ENOMEM;
- preempt_disable();
- this_cpu = this_cpu_ptr(cpu_stats);
+ this_cpu = raw_cpu_ptr(cpu_stats);
if (tb[NFTA_COUNTER_PACKETS]) {
- this_cpu->packets =
- be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_PACKETS]));
+ u64_stats_set(&this_cpu->packets,
+ be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_PACKETS])));
}
if (tb[NFTA_COUNTER_BYTES]) {
- this_cpu->bytes =
- be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_BYTES]));
+ u64_stats_set(&this_cpu->bytes,
+ be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_BYTES])));
}
- preempt_enable();
+
priv->counter = cpu_stats;
return 0;
}
@@ -104,40 +107,41 @@ static void nft_counter_obj_destroy(const struct nft_ctx *ctx,
}
static void nft_counter_reset(struct nft_counter_percpu_priv *priv,
- struct nft_counter *total)
+ struct nft_counter_tot *total)
{
+ struct u64_stats_sync *nft_sync;
struct nft_counter *this_cpu;
- seqcount_t *myseq;
local_bh_disable();
this_cpu = this_cpu_ptr(priv->counter);
- myseq = this_cpu_ptr(&nft_counter_seq);
+ nft_sync = this_cpu_ptr(&nft_counter_sync);
+
+ u64_stats_update_begin(nft_sync);
+ u64_stats_add(&this_cpu->packets, -total->packets);
+ u64_stats_add(&this_cpu->bytes, -total->bytes);
+ u64_stats_update_end(nft_sync);
- write_seqcount_begin(myseq);
- this_cpu->packets -= total->packets;
- this_cpu->bytes -= total->bytes;
- write_seqcount_end(myseq);
local_bh_enable();
}
static void nft_counter_fetch(struct nft_counter_percpu_priv *priv,
- struct nft_counter *total)
+ struct nft_counter_tot *total)
{
struct nft_counter *this_cpu;
- const seqcount_t *myseq;
u64 bytes, packets;
unsigned int seq;
int cpu;
memset(total, 0, sizeof(*total));
for_each_possible_cpu(cpu) {
- myseq = per_cpu_ptr(&nft_counter_seq, cpu);
+ struct u64_stats_sync *nft_sync = per_cpu_ptr(&nft_counter_sync, cpu);
+
this_cpu = per_cpu_ptr(priv->counter, cpu);
do {
- seq = read_seqcount_begin(myseq);
- bytes = this_cpu->bytes;
- packets = this_cpu->packets;
- } while (read_seqcount_retry(myseq, seq));
+ seq = u64_stats_fetch_begin(nft_sync);
+ bytes = u64_stats_read(&this_cpu->bytes);
+ packets = u64_stats_read(&this_cpu->packets);
+ } while (u64_stats_fetch_retry(nft_sync, seq));
total->bytes += bytes;
total->packets += packets;
@@ -148,7 +152,7 @@ static int nft_counter_do_dump(struct sk_buff *skb,
struct nft_counter_percpu_priv *priv,
bool reset)
{
- struct nft_counter total;
+ struct nft_counter_tot total;
nft_counter_fetch(priv, &total);
@@ -236,7 +240,7 @@ static int nft_counter_clone(struct nft_expr *dst, const struct nft_expr *src, g
struct nft_counter_percpu_priv *priv_clone = nft_expr_priv(dst);
struct nft_counter __percpu *cpu_stats;
struct nft_counter *this_cpu;
- struct nft_counter total;
+ struct nft_counter_tot total;
nft_counter_fetch(priv, &total);
@@ -244,11 +248,9 @@ static int nft_counter_clone(struct nft_expr *dst, const struct nft_expr *src, g
if (cpu_stats == NULL)
return -ENOMEM;
- preempt_disable();
- this_cpu = this_cpu_ptr(cpu_stats);
- this_cpu->packets = total.packets;
- this_cpu->bytes = total.bytes;
- preempt_enable();
+ this_cpu = raw_cpu_ptr(cpu_stats);
+ u64_stats_set(&this_cpu->packets, total.packets);
+ u64_stats_set(&this_cpu->bytes, total.bytes);
priv_clone->counter = cpu_stats;
return 0;
@@ -266,17 +268,17 @@ static void nft_counter_offload_stats(struct nft_expr *expr,
const struct flow_stats *stats)
{
struct nft_counter_percpu_priv *priv = nft_expr_priv(expr);
+ struct u64_stats_sync *nft_sync;
struct nft_counter *this_cpu;
- seqcount_t *myseq;
local_bh_disable();
this_cpu = this_cpu_ptr(priv->counter);
- myseq = this_cpu_ptr(&nft_counter_seq);
+ nft_sync = this_cpu_ptr(&nft_counter_sync);
- write_seqcount_begin(myseq);
- this_cpu->packets += stats->pkts;
- this_cpu->bytes += stats->bytes;
- write_seqcount_end(myseq);
+ u64_stats_update_begin(nft_sync);
+ u64_stats_add(&this_cpu->packets, stats->pkts);
+ u64_stats_add(&this_cpu->bytes, stats->bytes);
+ u64_stats_update_end(nft_sync);
local_bh_enable();
}
@@ -285,7 +287,7 @@ void nft_counter_init_seqcount(void)
int cpu;
for_each_possible_cpu(cpu)
- seqcount_init(per_cpu_ptr(&nft_counter_seq, cpu));
+ u64_stats_init(per_cpu_ptr(&nft_counter_sync, cpu));
}
struct nft_expr_type nft_counter_type;
--
2.49.0
From: Kang Yang <quic_kangyang(a)quicinc.com>
[ Upstream commit 95c38953cb1ecf40399a676a1f85dfe2b5780a9a ]
When running 'rmmod ath10k', ath10k_sdio_remove() will free the SDIO
workqueue via destroy_workqueue(). But if CONFIG_INIT_ON_FREE_DEFAULT_ON
is enabled, a kernel panic will happen:
Call trace:
destroy_workqueue+0x1c/0x258
ath10k_sdio_remove+0x84/0x94
sdio_bus_remove+0x50/0x16c
device_release_driver_internal+0x188/0x25c
device_driver_detach+0x20/0x2c
This is because during 'rmmod ath10k', ath10k_sdio_remove() will call
ath10k_core_destroy() before destroy_workqueue(). wiphy_dev_release()
will finally be called in ath10k_core_destroy(). This function will free
struct cfg80211_registered_device *rdev and all its members, including
wiphy, dev and the pointer to the SDIO workqueue. The pointer to the
SDIO workqueue will then be set to NULL due to
CONFIG_INIT_ON_FREE_DEFAULT_ON. After the device release,
destroy_workqueue() will use that NULL pointer and the kernel panic
happens.
Call trace:
ath10k_sdio_remove
->ath10k_core_unregister
……
->ath10k_core_stop
->ath10k_hif_stop
->ath10k_sdio_irq_disable
->ath10k_hif_power_down
->del_timer_sync(&ar_sdio->sleep_timer)
->ath10k_core_destroy
->ath10k_mac_destroy
->ieee80211_free_hw
->wiphy_free
……
->wiphy_dev_release
->destroy_workqueue
destroy_workqueue() needs to be called before ath10k_core_destroy():
free the workqueue first, then let ath10k_core_destroy() free the
structure that holds its pointer. This order matches the error path
order in ath10k_sdio_probe().
No work will be queued on the SDIO workqueue between the point where it
is destroyed and the call to ath10k_core_destroy(). Based on the call
stack above, the reasons are:
Only ath10k_sdio_sleep_timer_handler(), ath10k_sdio_hif_tx_sg() and
ath10k_sdio_irq_disable() queue work on the SDIO workqueue.
The sleep timer is deleted before ath10k_core_destroy() in
ath10k_hif_power_down().
ath10k_sdio_irq_disable() is only called from ath10k_hif_stop().
ath10k_core_unregister() will call ath10k_hif_power_down() to stop the
hif bus, so ath10k_sdio_hif_tx_sg() won't be called anymore.
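Condensed, the corrected teardown order looks like this (a sketch of
the tail of ath10k_sdio_remove() after the patch; surrounding details
elided):

```c
netif_napi_del(&ar->napi);

/* Flush and free the workqueue while the structure that stores its
 * pointer is still live ...
 */
destroy_workqueue(ar_sdio->workqueue);

/* ... then release the core.  This path ends in wiphy_dev_release(),
 * which frees the containing structures; with
 * CONFIG_INIT_ON_FREE_DEFAULT_ON the freed memory is zeroed, so any
 * later ar_sdio->workqueue access would hand destroy_workqueue() a
 * NULL pointer.
 */
ath10k_core_destroy(ar);
```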
Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00189
Signed-off-by: Kang Yang <quic_kangyang(a)quicinc.com>
Tested-by: David Ruth <druth(a)chromium.org>
Reviewed-by: David Ruth <druth(a)chromium.org>
Link: https://patch.msgid.link/20241008022246.1010-1-quic_kangyang@quicinc.com
Signed-off-by: Jeff Johnson <quic_jjohnson(a)quicinc.com>
Signed-off-by: Alva Lan <alvalan9(a)foxmail.com>
---
drivers/net/wireless/ath/ath10k/sdio.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/ath/ath10k/sdio.c b/drivers/net/wireless/ath/ath10k/sdio.c
index 9d1b0890f310..418e40560f59 100644
--- a/drivers/net/wireless/ath/ath10k/sdio.c
+++ b/drivers/net/wireless/ath/ath10k/sdio.c
@@ -3,6 +3,7 @@
* Copyright (c) 2004-2011 Atheros Communications Inc.
* Copyright (c) 2011-2012,2017 Qualcomm Atheros, Inc.
* Copyright (c) 2016-2017 Erik Stromdahl <erik.stromdahl(a)gmail.com>
+ * Copyright (c) 2022-2024 Qualcomm Innovation Center, Inc. All rights reserved.
*/
#include <linux/module.h>
@@ -2649,9 +2650,9 @@ static void ath10k_sdio_remove(struct sdio_func *func)
netif_napi_del(&ar->napi);
- ath10k_core_destroy(ar);
-
destroy_workqueue(ar_sdio->workqueue);
+
+ ath10k_core_destroy(ar);
}
static const struct sdio_device_id ath10k_sdio_devices[] = {
--
2.34.1
Hi,
Do we have any rule regarding whether a patch that adds a new test case
in tools/testing/selftests/ can be considered for backport?
For example, consider commit 0a5d2efa3827 ("selftests/bpf: Add test case
for the freeing of bpf_timer"), it adds a test case for the issue
addressed in the same series -- commit 58f038e6d209 ("bpf: Cancel the
running bpf_timer through kworker for PREEMPT_RT"). The latter has been
backported to 6.12.y.
Would commit 0a5d2efa3827 be a worthwhile addition to 6.12.y as well?
IMO having such a test case added would be helpful to check whether the
backported fix really works (assuming someone is willing to do the extra
work of finding, testing, and sending such tests); yet it does not seem
to fit the current stable kernel rule set of:
- It or an equivalent fix must already exist in Linux mainline (upstream).
- It must be obviously correct and tested.
- It cannot be bigger than 100 lines, with context.
- It must follow the Documentation/process/submitting-patches.rst rules.
- It must either fix a real bug that bothers people or just add a device ID.
Appreciate any clarification and/or feedback on this matter.
Thanks,
Shung-Hsi Yu
From: Sven Eckelmann <sven(a)narfation.org>
The OGMv1 and OGMv2 packet receive processing were not only limited by the
number of bytes in the received packet but also by the node's maximum
aggregation packet size limit. But this limit is relevant for TX and not
for RX. It must not be enforced by batadv_(i)v_ogm_aggr_packet to avoid
loss of information in case of a different limit for sender and receiver.
This has a minor side effect for B.A.T.M.A.N. IV because
batadv_iv_ogm_aggr_packet is also used in the preprocessing for TX.
But since the aggregation code itself will not allow more than
BATADV_MAX_AGGREGATION_BYTES bytes, this check never triggered (in
this context) prior to its removal.
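For context, the OGMv1 receive path consumes an aggregate one OGM at a
time, with batadv_iv_ogm_aggr_packet() as the loop condition (a
simplified sketch of the RX walk; the variable setup is assumed):

```c
/* Walk every OGM in the aggregated packet. */
while (batadv_iv_ogm_aggr_packet(ogm_offset, skb_headlen(skb),
				 ogm_packet)) {
	batadv_iv_ogm_process(skb, ogm_offset, if_incoming);

	/* Advance past this OGM header and its TVLV payload. */
	ogm_offset += BATADV_OGM_HLEN;
	ogm_offset += ntohs(ogm_packet->tvlv_len);

	ogm_packet = (struct batadv_ogm_packet *)(skb->data + ogm_offset);
}
```

With the old check, a receiver built with a smaller
BATADV_MAX_AGGREGATION_BYTES than the sender would bail out of this loop
mid-aggregate and silently drop the remaining OGMs even though they fit
in the received packet.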
Cc: stable(a)vger.kernel.org
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
Fixes: 9323158ef9f4 ("batman-adv: OGMv2 - implement originators logic")
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
Signed-off-by: Simon Wunderlich <sw(a)simonwunderlich.de>
---
net/batman-adv/bat_iv_ogm.c | 3 +--
net/batman-adv/bat_v_ogm.c | 3 +--
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index 07ae5dd1f150..b12645949ae5 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -325,8 +325,7 @@ batadv_iv_ogm_aggr_packet(int buff_pos, int packet_len,
/* check if there is enough space for the optional TVLV */
next_buff_pos += ntohs(ogm_packet->tvlv_len);
- return (next_buff_pos <= packet_len) &&
- (next_buff_pos <= BATADV_MAX_AGGREGATION_BYTES);
+ return next_buff_pos <= packet_len;
}
/* send a batman ogm to a given interface */
diff --git a/net/batman-adv/bat_v_ogm.c b/net/batman-adv/bat_v_ogm.c
index e503ee0d896b..8f89ffe6020c 100644
--- a/net/batman-adv/bat_v_ogm.c
+++ b/net/batman-adv/bat_v_ogm.c
@@ -839,8 +839,7 @@ batadv_v_ogm_aggr_packet(int buff_pos, int packet_len,
/* check if there is enough space for the optional TVLV */
next_buff_pos += ntohs(ogm2_packet->tvlv_len);
- return (next_buff_pos <= packet_len) &&
- (next_buff_pos <= BATADV_MAX_AGGREGATION_BYTES);
+ return next_buff_pos <= packet_len;
}
/**
--
2.39.5
Hello:
This series was applied to netdev/net.git (main)
by Paolo Abeni <pabeni(a)redhat.com>:
On Fri, 14 Mar 2025 21:11:30 +0100 you wrote:
> Here are 3 unrelated fixes for the net tree.
>
> - Patch 1: fix data stream corruption when ending up not sending an
> ADD_ADDR.
>
> - Patch 2: fix missing getsockopt(IPV6_V6ONLY) support -- the set part
> is supported.
>
> [...]
Here is the summary with links:
- [net,1/3] mptcp: Fix data stream corruption in the address announcement
https://git.kernel.org/netdev/net/c/2c1f97a52cb8
- [net,2/3] mptcp: sockopt: fix getting IPV6_V6ONLY
(no matching commit)
- [net,3/3] mptcp: sockopt: fix getting freebind & transparent
(no matching commit)
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html