Hi Eric,
A kernel bug report was opened against Ubuntu [0]. It was found that reverting the following commit resolved this bug:
commit b2504a5dbef3305ef41988ad270b0e8ec289331c Author: Eric Dumazet edumazet@google.com Date: Tue Jan 31 10:20:32 2017 -0800
net: reduce skb_warn_bad_offload() noise
The regression was introduced as of v4.11-rc1 and still exists in current mainline. I was hoping to get your feedback, since you are the patch author. Do you think gathering any additional data will help diagnose this issue, or would it be best to submit a revert request? This commit did in fact resolve another bug[1], but in the process introduced this regression. Thanks,
Joe
On Mon, Dec 11, 2017 at 3:35 PM, Joseph Salisbury joseph.salisbury@canonical.com wrote:
Hi Eric,
A kernel bug report was opened against Ubuntu [0]. It was found that reverting the following commit resolved this bug:
The recorded trace in that bug is against 4.10.0 with some backports. Given that commit b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise") is implicated, I guess that that was backported from 4.11-rc1.
The WARN shows
e1000e: caps=(0x00000030002149a9, 0x0000000000000000) len=1701 data_len=1659 gso_size=1480 gso_type=2 ip_summed=0
The numbering changed in 4.14, but for this kernel
SKB_GSO_UDP = 1 << 1,
so this is a UFO packet with CHECKSUM_NONE. The stack shows
kernel: [570943.494549] skb_warn_bad_offload+0xd1/0x120 kernel: [570943.494550] __skb_gso_segment+0x17d/0x190 kernel: [570943.494564] validate_xmit_skb+0x14f/0x2a0 kernel: [570943.494565] validate_xmit_skb_list+0x43/0x70
so if that patch has been backported, then this must trigger in __skb_gso_segment on the return path from skb_mac_gso_segment.
Did you backport
commit 8d63bee643f1fb53e472f0e135cae4eb99d62d19 Author: Willem de Bruijn willemb@google.com Date: Tue Aug 8 14:22:55 2017 -0400
net: avoid skb_warn_bad_offload false positives on UFO
skb_warn_bad_offload triggers a warning when an skb enters the GSO stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL checksum offload set.
Commit b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise") observed that SKB_GSO_DODGY producers can trigger the check and that passing those packets through the GSO handlers will fix it up. But, the software UFO handler will set ip_summed to CHECKSUM_NONE.
When __skb_gso_segment is called from the receive path, this triggers the warning again.
Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On Tx these two are equivalent. On Rx, this better matches the skb state (checksum computed), as CHECKSUM_NONE here means no checksum computed.
See also this thread for context: http://patchwork.ozlabs.org/patch/799015/
Fixes: b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise") Signed-off-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net
Note that UFO was removed in 4.14 and that skb_warn_bad_offload can happen for various types of packets, so there may be multiple independent bug reports. I'm investigating two other non-UFO reports just now.
On Mon, Dec 11, 2017 at 04:25:26PM -0500, Willem de Bruijn wrote:
Note that UFO was removed in 4.14 and that skb_warn_bad_offload can happen for various types of packets, so there may be multiple independent bug reports. I'm investigating two other non-UFO reports just now.
Meta-comment, now that UFO is gone from mainline, I'm wondering if I should just delete it from 4.4 and 4.9 as well. Any objections for that? I'd like to make it easy to maintain these kernels for a while, and having them diverge like this, with all of the issues around UFO, seems like it will just make life harder for myself if I leave it in.
Any opinions?
thanks,
greg k-h
On Mon, Dec 11, 2017 at 4:44 PM, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Mon, Dec 11, 2017 at 04:25:26PM -0500, Willem de Bruijn wrote:
Note that UFO was removed in 4.14 and that skb_warn_bad_offload can happen for various types of packets, so there may be multiple independent bug reports. I'm investigating two other non-UFO reports just now.
Meta-comment, now that UFO is gone from mainline, I'm wondering if I should just delete it from 4.4 and 4.9 as well. Any objections for that? I'd like to make it easy to maintain these kernels for a while, and having them diverge like this, with all of the issues around UFO, seems like it will just make life harder for myself if I leave it in.
Any opinions?
Some of that removal had to be reverted with commit 0c19f846d582 ("net: accept UFO datagrams from tuntap and packet") for VM live migration between kernels.
Any backports probably should squash that in at the least. Just today another thread discussed that that patch may not address all open issues still, so it may be premature to backport at this point. http://lkml.kernel.org/r/d71df64e-e65f-4db4-6f2e-c002c15fcbe4@01019freenet.de
From: Willem de Bruijn willemdebruijn.kernel@gmail.com Date: Mon, 11 Dec 2017 16:56:56 -0500
On Mon, Dec 11, 2017 at 4:44 PM, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Mon, Dec 11, 2017 at 04:25:26PM -0500, Willem de Bruijn wrote:
Note that UFO was removed in 4.14 and that skb_warn_bad_offload can happen for various types of packets, so there may be multiple independent bug reports. I'm investigating two other non-UFO reports just now.
Meta-comment, now that UFO is gone from mainline, I'm wondering if I should just delete it from 4.4 and 4.9 as well. Any objections for that? I'd like to make it easy to maintain these kernels for a while, and having them diverge like this, with all of the issues around UFO, seems like it will just make life harder for myself if I leave it in.
Any opinions?
Some of that removal had to be reverted with commit 0c19f846d582 ("net: accept UFO datagrams from tuntap and packet") for VM live migration between kernels.
Any backports probably should squash that in at the least. Just today another thread discussed that that patch may not address all open issues still, so it may be premature to backport at this point. http://lkml.kernel.org/r/d71df64e-e65f-4db4-6f2e-c002c15fcbe4@01019freenet.de
I would probably discourage backporting the UFO removal, at least for now.
On Tue, Dec 12, 2017 at 09:10:11AM -0500, David Miller wrote:
From: Willem de Bruijn willemdebruijn.kernel@gmail.com Date: Mon, 11 Dec 2017 16:56:56 -0500
On Mon, Dec 11, 2017 at 4:44 PM, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Mon, Dec 11, 2017 at 04:25:26PM -0500, Willem de Bruijn wrote:
Note that UFO was removed in 4.14 and that skb_warn_bad_offload can happen for various types of packets, so there may be multiple independent bug reports. I'm investigating two other non-UFO reports just now.
Meta-comment, now that UFO is gone from mainline, I'm wondering if I should just delete it from 4.4 and 4.9 as well. Any objections for that? I'd like to make it easy to maintain these kernels for a while, and having them diverge like this, with all of the issues around UFO, seems like it will just make life harder for myself if I leave it in.
Any opinions?
Some of that removal had to be reverted with commit 0c19f846d582 ("net: accept UFO datagrams from tuntap and packet") for VM live migration between kernels.
Any backports probably should squash that in at the least. Just today another thread discussed that that patch may not address all open issues still, so it may be premature to backport at this point. http://lkml.kernel.org/r/d71df64e-e65f-4db4-6f2e-c002c15fcbe4@01019freenet.de
I would probably discourage backporting the UFO removal, at least for now.
Ok, thanks for letting me know, I'll ask again in 6 months or so :)
greg k-h
From: Joseph Salisbury joseph.salisbury@canonical.com Date: Mon, 11 Dec 2017 15:35:34 -0500
A kernel bug report was opened against Ubuntu [0]. It was found that reverting the following commit resolved this bug:
commit b2504a5dbef3305ef41988ad270b0e8ec289331c Author: Eric Dumazet edumazet@google.com Date: Tue Jan 31 10:20:32 2017 -0800
net: reduce skb_warn_bad_offload() noise
The regression was introduced as of v4.11-rc1 and still exists in current mainline. I was hoping to get your feedback, since you are the patch author. Do you think gathering any additional data will help diagnose this issue, or would it be best to submit a revert request? This commit did in fact resolve another bug[1], but in the process introduced this regression.
It helps if you can consolidate the information obtained in your bug tracking here in the email so that people on this list can get an idea of what the problem scope might be without having to go to your special bug tracking site.
This is really not about us being snobs about this mailing list, it's about you wanting to get a result. And you'll get a better result faster if you post the details here on the lsit because most developers are not going to go to your bug tracking site to read the bug comments.
Also, this isn't a functional regression, it is just that we are generating warnings that we didn't before. It doesn't mean that Eric's patch is wrong, it could just be that his new check is triggering for a bug that has always been there.
Scanning the bug myself it seems that the critical required component is IPSEC, and IPSEC has it's own way of doing segmentation offload.
Thanks.
linux-stable-mirror@lists.linaro.org