Patch "netem: apply correct delay when rate throttling" has been added to the 4.9-stable tree - Linux-stable-mirror

18 Mar 2018

This is a note to let you know that I've just added the patch titled
netem: apply correct delay when rate throttling
to the 4.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is:
     netem-apply-correct-delay-when-rate-throttling.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let stable@vger.kernel.org know about it.
...
From foo@baz Sun Mar 18 16:55:33 CET 2018
From: Nik Unger njunger@uwaterloo.ca
Date: Mon, 13 Mar 2017 10:16:58 -0700
Subject: netem: apply correct delay when rate throttling
From: Nik Unger njunger@uwaterloo.ca
[ Upstream commit 5080f39e8c72e01cf37e8359023e7018e2a4901e ]
I recently reported on the netem list that iperf network benchmarks
show unexpected results when a bandwidth throttling rate has been
configured for netem. Specifically:
1) The measured link bandwidth *increases* when a higher delay is added
2) The measured link bandwidth appears higher than the specified limit
3) The measured link bandwidth for the same very slow settings varies significantly across
  machines
The issue can be reproduced by using tc to configure netem with a
512kbit rate and various (none, 1us, 50ms, 100ms, 200ms) delays on a
veth pair between network namespaces, and then using iperf (or any
other network benchmarking tool) to test throughput. Complete detailed
instructions are in the original email chain here:
https://lists.linuxfoundation.org/pipermail/netem/2017-February/001672.html
There appear to be two underlying bugs causing these effects:
- The first issue causes long delays when the rate is slow and no
  delay is configured (e.g., "rate 512kbit"). This is because SKBs are
  not orphaned when no delay is configured, so orphaning does not
  occur until *after* the rate-induced delay has been applied. For
  this reason, adding a tiny delay (e.g., "rate 512kbit delay 1us")
  dramatically increases the measured bandwidth.
- The second issue is that rate-induced delays are not correctly
  applied, allowing SKB delays to occur in parallel. The indended
  approach is to compute the delay for an SKB and to add this delay to
  the end of the current queue. However, the code does not detect
  existing SKBs in the queue due to improperly testing sch->q.qlen,
  which is nonzero even when packets exist only in the
  rbtree. Consequently, new SKBs do not wait for the current queue to
  empty. When packet delays vary significantly (e.g., if packet sizes
  are different), then this also causes unintended reordering.
I modified the code to expect a delay (and orphan the SKB) when a rate
is configured. I also added some defensive tests that correctly find
the latest scheduled delivery time, even if it is (unexpectedly) for a
packet in sch->q. I have tested these changes on the latest kernel
(4.11.0-rc1+) and the iperf / ping test results are as expected.
Signed-off-by: Nik Unger njunger@uwaterloo.ca
Signed-off-by: Stephen Hemminger stephen@networkplumber.org
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin alexander.levin@microsoft.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/sched/sch_netem.c |   26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -462,7 +462,7 @@ static int netem_enqueue(struct sk_buff
    /* If a delay is expected, orphan the skb. (orphaning usually takes
     * place at TX completion time, so _before_ the link transit delay)
     */
-	if (q->latency || q->jitter)
+	if (q->latency || q->jitter || q->rate)
    	skb_orphan_partial(skb);
/*
@@ -530,21 +530,31 @@ static int netem_enqueue(struct sk_buff
    	now = psched_get_time();
if (q->rate) {
-			struct sk_buff *last;
+			struct netem_skb_cb *last = NULL;
+
+			if (sch->q.tail)
+				last = netem_skb_cb(sch->q.tail);
+			if (q->t_root.rb_node) {
+				struct sk_buff *t_skb;
+				struct netem_skb_cb *t_last;
+
+				t_skb = netem_rb_to_skb(rb_last(&q->t_root));
+				t_last = netem_skb_cb(t_skb);
+				if (!last ||
+				    t_last->time_to_send > last->time_to_send) {
+					last = t_last;
+				}
+			}
-			if (sch->q.qlen)
-				last = sch->q.tail;
-			else
-				last = netem_rb_to_skb(rb_last(&q->t_root));
    		if (last) {
    			/*
    			 * Last packet in queue is reference point (now),
    			 * calculate this time bonus and subtract
    			 * from delay.
    			 */
-				delay -= netem_skb_cb(last)->time_to_send - now;
+				delay -= last->time_to_send - now;
    			delay = max_t(psched_tdiff_t, 0, delay);
-				now = netem_skb_cb(last)->time_to_send;
+				now = last->time_to_send;
    		}
delay += packet_len_2_sched_time(qdisc_pkt_len(skb), q);
Patches currently in stable-queue which might be from njunger@uwaterloo.ca are
queue-4.9/netem-apply-correct-delay-when-rate-throttling.patch