Hello,
we recently upgraded the Linux kernel from 5.11.21 to 5.12.12 in our video stream receiver appliance and noticed compression artifacts on video streams that previously looked fine. We are receiving UDP multicast MPEG TS streams through an FFmpeg / libav layer, which does the socket and lower-level protocol handling. On affected kernels it floods the log with messages like
[mpegts @ 0x7fa130000900] Packet corrupt (stream = 0, dts = 6870802195).
[mpegts @ 0x7fa11c000900] Packet corrupt (stream = 0, dts = 6870821068).
Bisecting identified commit 18f25dc399901426dff61e676ba603ff52c666f7 as the one introducing the problem in the mainline kernel. It was backported to the 5.12 series in 450687386cd16d081b58cd7a342acff370a96078. Some random observations which may help to understand what's going on:
- the problem exists in Linux 5.13
- reverting that commit on top of 5.13 makes the problem go away
- Linux 5.10.45 is fine
- no relevant output in dmesg
- can be reproduced on different hardware (Intel, AMD, different NICs, ...)
- we do use the bonding driver on the systems (but I did not yet verify that this is related)
- we do not use vxlan (mentioned in the commit message)
- the relevant code in FFmpeg identifying packet corruption is here (see the sketch just below):
  https://github.com/FFmpeg/FFmpeg/blob/master/libavformat/mpegts.c#L2758
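In case you don't want to follow the link, here is a paraphrased sketch of what that check amounts to (not the verbatim FFmpeg source; the helper name and structure are mine):

#include <stdint.h>

/* A 188-byte TS packet is flagged corrupt when the sync byte is wrong,
 * the transport_error_indicator bit is set, or the 4-bit continuity
 * counter fails to advance by one (mod 16) on a packet carrying payload.
 * Error handling and PID bookkeeping omitted. */
static int ts_packet_corrupt(const uint8_t *pkt, int *last_cc)
{
	int cc = pkt[3] & 0x0f;          /* continuity_counter */
	int has_payload = pkt[3] & 0x10; /* adaptation_field_control, payload bit */

	if (pkt[0] != 0x47)              /* sync byte lost */
		return 1;
	if (pkt[1] & 0x80)               /* transport_error_indicator */
		return 1;
	if (has_payload && *last_cc >= 0 && cc != ((*last_cc + 1) & 0x0f))
		return 1;                /* continuity counter jumped: data was lost */

	*last_cc = cc;
	return 0;
}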
And the bonding configuration:
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v5.10.45

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: enp2s0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: enp2s0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 80:ee:73:XX:XX:XX
Slave queue ID: 0

Slave Interface: enp3s0
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: 80:ee:73:XX:XX:XX
Slave queue ID: 0
If there is anything else I can do to help track this down, please let me know.
Regards, -Matthias Treydte
[ adding Paolo, author of 18f25dc39990 ]
On Thu, Jul 1, 2021 at 11:39 AM David Ahern dsahern@gmail.com wrote:
[ adding Paolo, author of 18f25dc39990 ]
That library does not enable UDP_GRO. You do not have any UDP-based tunnel devices (besides vxlan) configured either, right?
Then no socket lookup should take place, so sk is NULL.
It is also unlikely that the device has either of NETIF_F_GRO_FRAGLIST or NETIF_F_GRO_UDP_FWD configured. This can be checked with `ethtool -k $DEV`, shown as "rx-gro-list" and "rx-udp-gro-forwarding", respectively.
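For example:

  ethtool -k $DEV | grep -E 'rx-gro-list|rx-udp-gro-forwarding'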
Then udp_gro_receive_segment is not called.
So this should just return the packet without applying any GRO.
I'm referring to this block of code in udp_gro_receive:
	if (!sk || !udp_sk(sk)->gro_receive) {
		if (skb->dev->features & NETIF_F_GRO_FRAGLIST)
			NAPI_GRO_CB(skb)->is_flist = sk ? !udp_sk(sk)->gro_enabled : 1;

		if ((!sk && (skb->dev->features & NETIF_F_GRO_UDP_FWD)) ||
		    (sk && udp_sk(sk)->gro_enabled) || NAPI_GRO_CB(skb)->is_flist)
			pp = call_gro_receive(udp_gro_receive_segment, head, skb);
		return pp;
	}
I don't see what could be up.
One possible short-term workaround is to disable GRO. If this commit is implicated, that should fix it, at some obvious potential cycle cost.
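For example, on the receiving NIC:

  ethtool -K $DEV gro off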
Hello,
Thank you for the report!
From the above, my wild guess would be that the GRO layer is wrongly/unexpectedly aggregating some UDP packets. But I do agree with Willem: it looks like the code pointed to by the bisection should not do anything evil here, so I'm likely missing something.
Could you please:

- tell how frequent the packet corruption is, even a rough estimate of the frequency
- provide the features exposed by the relevant devices: ethtool -k <nic name>
- check, if possible, how exactly the packets are corrupted. Wrong size? Bad csum? What else?
- ideally, a short pcap trace comprising the problematic packets would be great!
Thanks!
Paolo
Quoting Paolo Abeni pabeni@redhat.com:
Finally got your mail, seems it was delayed, sorry.
- check, if possible, how exactly the packets are corrupted. Wrong size? Bad csum? What else?
- ideally, a short pcap trace comprising the problematic packets would be great!
While trying to produce something I came up with an interesting observation which seems to indicate that the problem lies in the combination of GRO and fragmented UDP packets:
Tuning the sending side (for my testing purposes also FFmpeg) to send UDP packets of at most 1316 bytes makes the problem go away on the receiver. The value must be an exact multiple of 188 (the MPEG TS frame size) to keep FFmpeg from sending fragmented packets at all.
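(For the arithmetic: 1316 = 7 x 188, so each datagram carries exactly seven TS frames; with 20 bytes of IPv4 header and 8 bytes of UDP header that is 1344 bytes on the wire, safely below the 1500-byte Ethernet MTU, so no IP fragmentation occurs.)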
Using this we were able to do the following on our normal desktop machines:
The sending side uses a command like this:
ffmpeg \
  -stream_loop -1 \
  -re -i "$video" \
  -c:v copy -c:a copy \
  -f mpegts "udp://239.12.23.0:1935"
and the receiver (in our case running Linux 5.12.14) uses "mpv udp://239.12.23.0:1935" to view the stream. For our test, $video was just some h264-encoded MKV I had lying around. The receiver sees compression artifacts and the "Packet corrupt" messages in the console. Now there are two ways to improve this situation:
1) The receiver uses ethtool to disable GRO
2) The sender changes the URL to "udp://239.12.23.0:1935?pkt_size=1316"
At this point I assume there are better ways to reproduce this using netcat or the like. But being a video guy, well, here we are. ;-)
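For what it's worth, a minimal netcat-free sender might look something like this (an untested sketch with no error handling; it just blasts 2000-byte datagrams, larger than the 1500-byte MTU so the kernel fragments them, at the group and port used above):

/* Hypothetical minimal reproducer: send UDP datagrams big enough
 * to be fragmented, to the same multicast group as the ffmpeg test. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	struct sockaddr_in dst = {
		.sin_family = AF_INET,
		.sin_port   = htons(1935),
	};
	char buf[2000] = { 0 };	/* 2000 bytes > 1500-byte MTU -> fragmentation */

	inet_pton(AF_INET, "239.12.23.0", &dst.sin_addr);
	for (;;) {
		sendto(fd, buf, sizeof(buf), 0,
		       (struct sockaddr *)&dst, sizeof(dst));
		usleep(10000);	/* ~100 datagrams (~200 fragments) per second */
	}
}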
My knowledge about the inner workings of Linux' IP stack is lacking, but because tcpdump "sees" packets before they are reassembled, and the problem seems to exist only with packets that were fragmented and reassembled (as they are presented to libav), I have the feeling that a pcap file would not be that helpful here, right?
Regards, -Matthias
Quoting Willem de Bruijn willemdebruijn.kernel@gmail.com:
That library does not enable UDP_GRO. You do not have any UDP-based tunnel devices (besides vxlan) configured either, right?
The configuration is really minimal by now, I also took the bonding out of the equation. We have systemd configure "en*" with mDNS and DHCP enabled and that's it. The problem remains.
I also found new hardware on my desk today (some Intel SoC), showing exactly the same symptoms. So it really has nothing to do with the hardware.
It is also unlikely that the device has either of NETIF_F_GRO_FRAGLIST or NETIF_F_GRO_UDP_FWD configured. This can be checked with `ethtool -k $DEV`, shown as "rx-gro-list" and "rx-udp-gro-forwarding", respectively.
The full output of "ethtool -k enp5s0" from that SoC:
Features for enp5s0:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: on
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp-mangleid-segmentation: off
	tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: on
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]
That's the only NIC on this board:
# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:30:d6:24:99:67 brd ff:ff:ff:ff:ff:ff
One possible short-term workaround is to disable GRO.
Indeed, "ethtool -K enp5s0 gro off" fixes the problem, and calling it with "gro on" brings it back.
And to answer Paolo's questions from his mail to the list (@Paolo: I'm not subscribed, please also send to me directly so I don't miss your mail)
Could you please:
- tell how frequent the packet corruption is, even a rough estimate of the frequency
# journalctl --since "5min ago" | grep "Packet corrupt" | wc -l
167
So there are 167 detected failures in 5 minutes, while the system is receiving at a moderate rate of about 900 pkts/s (according to Prometheus' node exporter, at least, but that seems about right).
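At 900 pkts/s that is roughly 270,000 packets over those 5 minutes, so about one detected corruption per ~1,600 packets.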
Next I'll try to capture some broken packets and reply in a separate mail; I'll have to figure out a good way to do this first.
Thanks for your help, -Matthias
Hello,
On Fri, 2021-07-02 at 14:36 +0200, Matthias Treydte wrote:
And to answer Paolo's questions from his mail to the list (@Paolo: I'm not subscribed, please also send to me directly so I don't miss your mail)
(yup, that is what I did ?!?)
Could you please:
- tell how frequent the packet corruption is, even a rough estimate of the frequency
# journalctl --since "5min ago" | grep "Packet corrupt" | wc -l
167
So there are 167 detected failures in 5 minutes, while the system is receiving at a moderate rate of about 900 pkts/s (according to Prometheus' node exporter, at least, but that seems about right).
Interesting. The relevant UDP GRO features are already off, and this happens infrequently. Something is happening on a per-packet basis, but I can't guess what.
It looks like you should be able to collect more info WRT the packet corruption by enabling debug logging at the ffmpeg level, but I guess that would flood the logfile.
If you have the kernel debuginfo and the 'perf' tool available, could you please try:
perf probe -a 'udp_gro_receive sk sk->__sk_common.skc_dport'
perf probe -a 'udp_gro_receive_segment'
# need to wait until at least a pkt corruption happens, 10 seconds
# should be more than enough
perf record -a -e probe:udp_gro_receive -e probe:udp_gro_receive_segment sleep 10
perf script | gzip > perf_script.gz
and share the above? I fear it could be too big for the ML, feel free to send it directly to me.
Next I'll try to capture some broken packets and reply in a separate mail; I'll have to figure out a good way to do this first.
Looks like there is a corrupted packet every ~2K UDP packets. If you capture a few thousand consecutive ones, then wireshark should probably help find the suspicious ones.
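For example, something like this (interface and port as in your reproducer; -c stops the capture after 10,000 packets) should be enough:

  tcpdump -i <nic name> -c 10000 -w trace.pcap udp port 1935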
Thanks!
Paolo
Hi,
On Fri, 2021-07-02 at 16:06 +0200, Paolo Abeni wrote:
On Fri, 2021-07-02 at 14:36 +0200, Matthias Treydte wrote:
And to answer Paolo's questions from his mail to the list (@Paolo: I'm not subscribed, please also send to me directly so I don't miss your mail)
(yup, that is what I did ?!?)
I'm sorry for the high-frequency spamming.
Could you please try the following patch? (only compile-tested)
I fear some packets may hang in the GRO engine for no reason.

---
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 54e06b88af69..458c888337a5 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -526,6 +526,8 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
 		if ((!sk && (skb->dev->features & NETIF_F_GRO_UDP_FWD)) ||
 		    (sk && udp_sk(sk)->gro_enabled) || NAPI_GRO_CB(skb)->is_flist)
 			pp = call_gro_receive(udp_gro_receive_segment, head, skb);
+		else
+			goto out;
 		return pp;
 	}
Quoting Paolo Abeni pabeni@redhat.com:
Impressive! This patch, applied to 5.13, fixes the problem. What I like even more is that it again confirms my suspicion that an "if" without an "else" is always a code smell. :-)
With this and the reproducer in my previous mail, is there still value in doing the "perf" stuff?
Thanks, -Matthias
On Fri, 2021-07-02 at 17:23 +0200, Matthias Treydte wrote:
Impressive! This patch, applied to 5.13, fixes the problem. What I like even more is that it again confirms my suspicion that an "if" without an "else" is always a code smell. :-)
Thank you for the quick feedback! I'll submit formally soon, after more tests. I'll probably change the code to be something hopefully more readable, as follows:
---
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 7a670757f37a..b3aabc886763 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -551,8 +551,10 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,

 		if ((!sk && (skb->dev->features & NETIF_F_GRO_UDP_FWD)) ||
 		    (sk && udp_sk(sk)->gro_enabled) || NAPI_GRO_CB(skb)->is_flist)
-			pp = call_gro_receive(udp_gro_receive_segment, head, skb);
-		return pp;
+			return call_gro_receive(udp_gro_receive_segment, head, skb);
+
+		/* no GRO, be sure flush the current packet */
+		goto out;
 	}
---
With this and the reproducer in my previous mail, is there still value in doing the "perf" stuff?
Not needed, thank you!
It would be great instead if you could give the proposed variant above a spin - not strictly needed, I'm really asking for a few extra miles here ;)
Cheers,
Paolo
Quoting Paolo Abeni pabeni@redhat.com:
It would be great instead if you could give the proposed variant above a spin - not strictly needed, I'm really asking for a few extra miles here ;)
Although this variant lacks the "else", to my surprise I have to admit that it works just as well. :-)
It was a pleasure and thanks for your work, -Matthias