Hello,
Resend my last email without HTML.
---- On Sat, 2025-11-29 18:57:01, zyc zyc zyc199902@zohomail.cn wrote: ---
Hello, maintainer
I would like to report what appears to be a regression in the 6.12.50 kernel release related to netem. It rejects our configuration with the message:
Error: netem: cannot mix duplicating netems with other netems in tree.
This breaks setups that previously worked correctly for many years.
Our team uses multiple netem qdiscs in the same HTB branch, arranged in a parallel fashion using a prio fan-out. Each branch of the prio qdisc has its own distinct netem instance with different duplication characteristics.
This is used to emulate our production conditions where a single logical path fans out into two downstream segments, for example:
two ECMP next hops with different misbehaviour characteristics, or
an HA firewall cluster where only one node is replaying frames, or
two LAG / ToR paths where one path intermittently duplicates packets.
In our environments, only a subset of flows are affected, and different downstream devices may cause different styles of duplication. This regression breaks existing automated tests, training environments, and network simulation pipelines.
I would be happy to provide our reproducer if needed.
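For readers following the thread: the actual reproducer was not posted here, so the following is only a sketch of what a layout of this kind might look like. The device name, handles, rate, and netem parameters are all assumptions chosen for illustration. The helper only builds the tc command strings and does not run them; filters that steer flows into the prio bands are omitted.

```python
# Hypothetical sketch: device name, handles, rate and netem parameters are
# illustrative assumptions, not the reporter's actual configuration.

def build_commands(dev="eth0"):
    """Return tc commands for: HTB class -> prio fan-out -> one netem per band."""
    return [
        # Root HTB with a single class that carries the emulated path.
        f"tc qdisc add dev {dev} root handle 1: htb default 10",
        f"tc class add dev {dev} parent 1: classid 1:10 htb rate 100mbit",
        # prio qdisc (default 3 bands) used as the fan-out point.
        f"tc qdisc add dev {dev} parent 1:10 handle 10: prio",
        # Two parallel netems with different duplication characteristics.
        f"tc qdisc add dev {dev} parent 10:1 handle 101: netem duplicate 5%",
        f"tc qdisc add dev {dev} parent 10:2 handle 102: netem delay 10ms duplicate 1%",
    ]


if __name__ == "__main__":
    for cmd in build_commands():
        print(cmd)
```

On a kernel carrying the restriction, the last command would presumably be the one rejected, since by then a duplicating netem already exists in the tree.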
Thank you for your time and for maintaining the Linux kernel.
Best regards, zyc
---- On Wed, 2025-12-10 11:31:02, William Liu will@willsroot.io wrote: ---
Hi zyc,
The netdev maintainers and I are aware of this issue. This change was made to prevent a local DOS bug in netem that was extremely trivial to trigger, and some maintainers considered your type of configuration as an extremely rare use case (but imo still a very valid one). I provided a summary of this fix as well as other proposed fixes in both the commit log and [1].
There is currently a patch proposed in [2] to resolve this which technically obeys man page semantics but was previously rejected. Basically, the DOS can be easily resolved by enqueuing duplicated packets to the same netem qdisc, but other maintainers presented valid concerns about changing user visible behavior from the past 2 decades and the original commit message mentioned enqueuing from root was necessary to "avoid problems with qlen accounting with nested qdisc."
tc_skb_extensions sounds like a reasonable solution to me that changes no existing behavior and adds no additional restrictions [3], but it has not been further explored or checked for soundness. Hypothetically, it would fix both the DOS bug and allow packets to retain the behavior of enqueuing from root.
You can follow the status of the fix in [2] - I do not plan to further volunteer myself and my free time in this specific fix process due to the consistent pattern of unprofessional and patronizing emails from Cong Wang. I am confident the other netdev maintainers should reach a resolution soon.
Best, Will
[1] https://lore.kernel.org/netdev/PKMd5btHYmJcKSiIJdtxQvZBEfuS4RQkBnE4M-TZkjUq_...
[2] https://lore.kernel.org/netdev/20251126195244.88124-4-xiyou.wangcong@gmail.c...
[3] https://lore.kernel.org/netdev/CAM0EoMmBdZBzfUAms5-0hH5qF5ODvxWfgqrbHaGT6p3-...
Hi Will,
Thanks for the details. We will read the links you provided.
Zhang Yang Chao
---- On Tue, 2025-12-02 19:30:09, Greg KH gregkh@linuxfoundation.org wrote: ---
Can you use 'git bisect' to find the offending commit?
thanks,
greg k-h
Hi Greg,
The error came from this commit:
commit 795cb393e38977aa991e70a9363da0ee734b2114
Author: William Liu will@willsroot.io
Date: Tue Jul 8 16:43:26 2025 +0000
net/sched: Restrict conditions for adding duplicating netems to qdisc tree
[ Upstream commit ec8e0e3d7adef940cdf9475e2352c0680189d14e ]
netem_enqueue's duplication prevention logic breaks when a netem resides in a qdisc tree with other netems - this can lead to a soft lockup and OOM loop in netem_dequeue, as seen in [1]. Ensure that a duplicating netem cannot exist in a tree with other netems.
Previous approaches suggested in discussions in chronological order:
1) Track duplication status or ttl in the sk_buff struct. Considered too specific a use case to extend such a struct, though this would be a resilient fix and address other previous and potential future DOS bugs like the one described in loopy fun [2].
2) Restrict netem_enqueue recursion depth like in act_mirred with a per cpu variable. However, netem_dequeue can call enqueue on its child, and the depth restriction could be bypassed if the child is a netem.
3) Use the same approach as in 2, but add metadata in netem_skb_cb to handle the netem_dequeue case and track a packet's involvement in duplication. This is an overly complex approach, and Jamal notes that the skb cb can be overwritten to circumvent this safeguard.
4) Prevent the addition of a netem to a qdisc tree if its ancestral path contains a netem. However, filters and actions can cause a packet to change paths when re-enqueued to the root from netem duplication, leading us to the current solution: prevent a duplicating netem from inhabiting the same tree as other netems.
[1] https://lore.kernel.org/netdev/8DuRWwfqjoRDLDmBMlIfbrsZg9Gx50DHJc1ilxsEBNe2D...
[2] https://lwn.net/Articles/719297/
Fixes: 0afb51e72855 ("[PKT_SCHED]: netem: reinsert for duplication")
Reported-by: William Liu will@willsroot.io
Reported-by: Savino Dicanosa savy@syst3mfailure.io
Signed-off-by: William Liu will@willsroot.io
Signed-off-by: Savino Dicanosa savy@syst3mfailure.io
Acked-by: Jamal Hadi Salim jhs@mojatatu.com
Link: https://patch.msgid.link/20250708164141.875402-1-will@willsroot.io
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
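The rule stated in the commit message above (a duplicating netem may not share a qdisc tree with any other netem) can be modelled roughly as follows. This is a simplified Python model for illustration, not the kernel implementation; the `Qdisc` class and the `walk` and `tree_allowed` functions are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Qdisc:
    kind: str                      # e.g. "htb", "prio", "netem"
    duplicate: int = 0             # netem duplication percentage; 0 means off
    children: List["Qdisc"] = field(default_factory=list)


def walk(q: Qdisc) -> Iterator[Qdisc]:
    """Yield every qdisc in the tree rooted at q."""
    yield q
    for child in q.children:
        yield from walk(child)


def tree_allowed(root: Qdisc) -> bool:
    """Model of the restriction: if any netem in the tree duplicates,
    it must be the only netem in that tree."""
    netems = [q for q in walk(root) if q.kind == "netem"]
    has_duplicating = any(q.duplicate > 0 for q in netems)
    return not (has_duplicating and len(netems) > 1)


if __name__ == "__main__":
    # A prio fan-out with two duplicating netems, as in the report.
    fanout = Qdisc("htb", children=[
        Qdisc("prio", children=[
            Qdisc("netem", duplicate=5),
            Qdisc("netem", duplicate=1),
        ]),
    ])
    print(tree_allowed(fanout))  # False
```

Under this model the reported fan-out is rejected even though neither netem is an ancestor of the other, which matches the behaviour described in the report.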
zyc
On Wed, Dec 03, 2025 at 07:05:00PM +0800, zyc zyc wrote:
Hi Greg,
The error came from this commit:
commit 795cb393e38977aa991e70a9363da0ee734b2114
Author: William Liu will@willsroot.io
Date: Tue Jul 8 16:43:26 2025 +0000
net/sched: Restrict conditions for adding duplicating netems to qdisc tree
[ Upstream commit ec8e0e3d7adef940cdf9475e2352c0680189d14e ]
So is this also an issue for you in the latest 6.17 release (or 6.18)? If not, what commit fixed this issue? If so, please contact all of the developers involved and they will be glad to work to resolve this regression in the mainline tree first.
thanks,
greg k-h
---- On Wed, 2025-12-03 19:30:42, Greg KH gregkh@linuxfoundation.org wrote: ---
So is this also an issue for you in the latest 6.17 release (or 6.18)? If not, what commit fixed this issue? If so, please contact all of the developers involved and they will be glad to work to resolve this regression in the mainline tree first.
thanks,
greg k-h
Hi Greg,
I can only test 6.12 stable kernels. Let me add Will.
Best, zyc
---- On Fri, 2025-12-05 18:31:00, zyc zyc zyc199902@zohomail.cn wrote: ---
Hi Greg,
I can only test 6.12 stable kernels. Let me add Will.
Best, zyc
Hello Will
Could you help? This breaks our lab simulation.
Best, Zhang Yang Chao
Hi zyc,
The netdev maintainers and I are aware of this issue. This change was made to prevent a local DOS bug in netem that was extremely trivial to trigger, and some maintainers considered your type of configuration as an extremely rare use case (but imo still a very valid one). I provided a summary of this fix as well as other proposed fixes in both the commit log and [1].
There is currently a patch proposed in [2] to resolve this which technically obeys man page semantics but was previously rejected. Basically, the DOS can be easily resolved by enqueuing duplicated packets to the same netem qdisc, but other maintainers presented valid concerns about changing user visible behavior from the past 2 decades and the original commit message mentioned enqueuing from root was necessary to "avoid problems with qlen accounting with nested qdisc."
tc_skb_extensions sounds like a reasonable solution to me that changes no existing behavior and adds no additional restrictions [3], but it has not been further explored or checked for soundness. Hypothetically, it would fix both the DOS bug and allow packets to retain the behavior of enqueuing from root.
You can follow the status of the fix in [2] - I do not plan to further volunteer myself and my free time in this specific fix process due to the consistent pattern of unprofessional and patronizing emails from Cong Wang. I am confident the other netdev maintainers should reach a resolution soon.
Best, Will
[1] https://lore.kernel.org/netdev/PKMd5btHYmJcKSiIJdtxQvZBEfuS4RQkBnE4M-TZkjUq_...
[2] https://lore.kernel.org/netdev/20251126195244.88124-4-xiyou.wangcong@gmail.c...
[3] https://lore.kernel.org/netdev/CAM0EoMmBdZBzfUAms5-0hH5qF5ODvxWfgqrbHaGT6p3-...
On Monday, December 8th, 2025 at 9:59 AM, zyc zyc zyc199902@zohomail.cn wrote:
Hello Will
Could you help? This breaks our lab simulation.
Best, Zhang Yang Chao
linux-stable-mirror@lists.linaro.org