---- Greg KH <gregkh@linuxfoundation.org> wrote on Tue, 2025-12-02 19:30:09: ---
On Tue, Dec 02, 2025 at 06:39:00PM +0800, zyc zyc wrote:
Hello,
Resending my last email without HTML.
---- zyc zyc <zyc199902@zohomail.cn> wrote on Sat, 2025-11-29 18:57:01: ---
Hello maintainers,
I would like to report what appears to be a regression in the 6.12.50 kernel release related to netem. It now rejects our configuration with the message:
Error: netem: cannot mix duplicating netems with other netems in tree.
This breaks setups that had worked correctly for many years.
Our team uses multiple netem qdiscs in the same HTB branch, arranged in parallel under a prio fan-out. Each band of the prio qdisc has its own distinct netem instance with different duplication characteristics.
This is used to emulate our production conditions, where a single logical path fans out into two downstream segments, for example:
- two ECMP next hops with different misbehaviour characteristics,
- an HA firewall cluster where only one node is replaying frames, or
- two LAG / ToR paths where one path intermittently duplicates packets.
In our environments, only a subset of flows are affected, and different downstream devices may cause different styles of duplication. This regression breaks existing automated tests, training environments, and network simulation pipelines.
I would be happy to provide our reproducer if needed.
Thank you for your time and for maintaining the Linux kernel.
Can you use 'git bisect' to find the offending commit?

thanks,

greg k-h
Hi Greg,
The error came from this commit:
commit 795cb393e38977aa991e70a9363da0ee734b2114
Author: William Liu <will@willsroot.io>
Date:   Tue Jul 8 16:43:26 2025 +0000
net/sched: Restrict conditions for adding duplicating netems to qdisc tree
[ Upstream commit ec8e0e3d7adef940cdf9475e2352c0680189d14e ]
netem_enqueue's duplication prevention logic breaks when a netem resides in a qdisc tree with other netems - this can lead to a soft lockup and OOM loop in netem_dequeue, as seen in [1]. Ensure that a duplicating netem cannot exist in a tree with other netems.
Previous approaches suggested in discussions in chronological order:
1) Track duplication status or ttl in the sk_buff struct. Considered too specific a use case to extend such a struct, though this would be a resilient fix and address other previous and potential future DOS bugs like the one described in loopy fun [2].
2) Restrict netem_enqueue recursion depth like in act_mirred with a per cpu variable. However, netem_dequeue can call enqueue on its child, and the depth restriction could be bypassed if the child is a netem.
3) Use the same approach as in 2, but add metadata in netem_skb_cb to handle the netem_dequeue case and track a packet's involvement in duplication. This is an overly complex approach, and Jamal notes that the skb cb can be overwritten to circumvent this safeguard.
4) Prevent the addition of a netem to a qdisc tree if its ancestral path contains a netem. However, filters and actions can cause a packet to change paths when re-enqueued to the root from netem duplication, leading us to the current solution: prevent a duplicating netem from inhabiting the same tree as other netems.
[1] https://lore.kernel.org/netdev/8DuRWwfqjoRDLDmBMlIfbrsZg9Gx50DHJc1ilxsEBNe2D...
[2] https://lwn.net/Articles/719297/
Fixes: 0afb51e72855 ("[PKT_SCHED]: netem: reinsert for duplication")
Reported-by: William Liu <will@willsroot.io>
Reported-by: Savino Dicanosa <savy@syst3mfailure.io>
Signed-off-by: William Liu <will@willsroot.io>
Signed-off-by: Savino Dicanosa <savy@syst3mfailure.io>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250708164141.875402-1-will@willsroot.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
zyc