On Thu, Jul 25, 2024 at 10:27 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
On Thu, Jul 25, 2024 at 5:22 AM Denis Arefev <arefev@swemel.ru> wrote:
I checked the patch on three reproducers and all three definitely crashed the kernel.
There are two malfunctions.
- The flag is missing: skb_shinfo(skb)->tx_flags |= SKBFL_SHARED_FRAG;
  If it is not set, __skb_linearize will not be executed in skb_checksum_help. The sk_buff remains fragmented (non-linear), and this is the first warning.
  Either add skb_shinfo(skb)->tx_flags |= SKBFL_SHARED_FRAG, or ask Eric Dumazet (cef401de7be8c): is the if (skb_has_shared_frag(skb)) check in skb_checksum_help so important, or would if (skb_is_nonlinear(skb)) be enough?
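The difference between the two conditions in that question can be sketched in userspace C. This is only a rough model, not kernel code: struct skb_model, its fields, and the linearize_* helpers are hypothetical stand-ins, and the "current" condition is simply the behaviour described in the message above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two sk_buff properties discussed above. */
struct skb_model {
    unsigned int data_len; /* bytes held in page frags; 0 means linear */
    bool shared_frag;      /* stands in for the SKBFL_SHARED_FRAG bit */
};

/* Models skb_is_nonlinear(): true when some payload lives outside the head. */
static bool model_is_nonlinear(const struct skb_model *skb)
{
    return skb->data_len != 0;
}

/* Models skb_has_shared_frag(): non-linear and marked as shared. */
static bool model_has_shared_frag(const struct skb_model *skb)
{
    return model_is_nonlinear(skb) && skb->shared_frag;
}

/* Condition as described above: linearize only when a frag is shared. */
static bool linearize_current(const struct skb_model *skb)
{
    return model_has_shared_frag(skb);
}

/* Alternative raised in the question: linearize any non-linear skb. */
static bool linearize_proposed(const struct skb_model *skb)
{
    return model_is_nonlinear(skb);
}
```

For a non-linear skb without the flag (for example data_len = 109, shared_frag = false), linearize_current() returns false and linearize_proposed() returns true. That is the case where the skb stays fragmented today. The two conditions agree for linear skbs and for non-linear skbs that carry the flag.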
Thanks for sharing the reproducers. Having a look.
Reproduced https://syzkaller.appspot.com/bug?extid=e1db31216c789f552871
That is against a v6.1 kernel, and the syzkaller page reports that it did not fail against a recent upstream commit. Will take a closer look at that.
But on v6.1, at least, the following did catch it:
@@ -72,6 +72,18 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
        if (thlen < sizeof(*th))
                goto out;

+       if (skb->ip_summed == CHECKSUM_PARTIAL &&
+           skb->csum_start != skb->transport_header) {
+               skb_dump(KERN_INFO, skb, false);
+               goto out;
+       }
+
And the geometry of the bad packet at that point:
[ 52.003050][ T8403] skb len=12202 headroom=244 headlen=12093 tailroom=0
[ 52.003050][ T8403] mac=(168,24) mac_len=24 net=(192,52) trans=244
[ 52.003050][ T8403] shinfo(txflags=0 nr_frags=1 gso(size=1552 type=3 segs=0))
[ 52.003050][ T8403] csum(0x60000c7 start=199 offset=1536 ip_summed=3 complete_sw=0 valid=0 level=0)
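To make the mismatch in that dump explicit, here is a minimal userspace sketch that replays the same test on the reported numbers. It is not kernel code: struct skb_geometry, bad_skb, MODEL_CHECKSUM_PARTIAL and the helper names are hypothetical, and the values are copied from the log above. Reading the csum word 0x60000c7 as csum_start in the low 16 bits and csum_offset in the high 16 bits is an assumption about a little-endian machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* ip_summed=3 in the dump; assumed here to be CHECKSUM_PARTIAL. */
#define MODEL_CHECKSUM_PARTIAL 3

/* Hypothetical stand-in holding only the fields printed in the dump. */
struct skb_geometry {
    uint32_t len;              /* total packet length */
    uint32_t headlen;          /* bytes in the linear area */
    uint16_t transport_header; /* offset from skb->head */
    uint16_t csum_start;       /* offset from skb->head */
    uint16_t csum_offset;      /* checksum field offset from csum_start */
    uint8_t ip_summed;
};

/* Values copied from the log lines above. */
static const struct skb_geometry bad_skb = {
    .len = 12202, .headlen = 12093, .transport_header = 244,
    .csum_start = 199, .csum_offset = 1536, .ip_summed = 3,
};

/* Same condition as the test added to tcp_gso_segment in the diff. */
static bool csum_start_mismatch(const struct skb_geometry *g)
{
    return g->ip_summed == MODEL_CHECKSUM_PARTIAL &&
           g->csum_start != g->transport_header;
}

/* Bytes that live in page frags rather than in the linear area. */
static uint32_t frag_bytes(const struct skb_geometry *g)
{
    return g->len - g->headlen;
}

/* Split the 32-bit csum word into its halves (little-endian assumption). */
static uint16_t csum_word_start(uint32_t csum)
{
    return (uint16_t)(csum & 0xffff);
}

static uint16_t csum_word_offset(uint32_t csum)
{
    return (uint16_t)(csum >> 16);
}
```

With these numbers csum_start_mismatch(&bad_skb) is true, because csum_start is 199 while the transport header sits at 244. That puts csum_start inside the network header range net=(192,52). frag_bytes(&bad_skb) gives 109, consistent with nr_frags=1, and splitting 0x60000c7 yields 199 and 1536, matching the start= and offset= fields.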
Sharing sketch patch for any feedback. A few downsides:
The patch adds a branch in the semi-hot path of TCP software segmentation for every packet, including the more common packets generated by the kernel stack. It also needs the same test in two locations in net/ipv4/udp_offload.c, for USO and UFO.
It is tempting to move it to the if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) branch below, as then it is limited to SKB_GSO_DODGY. But that does not catch dodgy packets that need software segmentation. Conversely, we could check in skb_segment before calling skb_checksum_help.
I'll be out for four days over the weekend. May have to delay until next week.
Should we revert that and create a new fix against the original issue?
We can, no strong preference.
On second thought, since this has to go to all the stable trees, let's keep it a single patch rather than a revert plus a new fix.