The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0f6925b3e8da0dbbb52447ca8a8b42b371aac7db Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet(a)google.com>
Date: Fri, 2 Apr 2021 06:26:02 -0700
Subject: [PATCH] virtio_net: Do not pull payload in skb->head
Xuan Zhuo reported that commit 3226b158e67c ("net: avoid 32 x truesize
under-estimation for tiny skbs") brought a ~10% performance drop.
The reason for the performance drop was that GRO was forced
to chain sk_buff (using skb_shinfo(skb)->frag_list), which
uses more memory but also cause packet consumers to go over
a lot of overhead handling all the tiny skbs.
It turns out that virtio_net page_to_skb() has a wrong strategy :
It allocates skbs with GOOD_COPY_LEN (128) bytes in skb->head, then
copies 128 bytes from the page, before feeding the packet to GRO stack.
This was suboptimal before commit 3226b158e67c ("net: avoid 32 x truesize
under-estimation for tiny skbs") because GRO was using 2 frags per MSS,
meaning we were not packing MSS with 100% efficiency.
Fix is to pull only the ethernet header in page_to_skb()
Then, we change virtio_net_hdr_to_skb() to pull the missing
headers, instead of assuming they were already pulled by callers.
This fixes the performance regression, but could also allow virtio_net
to accept packets with more than 128bytes of headers.
Many thanks to Xuan Zhuo for his report, and his tests/help.
Fixes: 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs")
Reported-by: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Link: https://www.spinics.net/lists/netdev/msg731397.html
Co-Developed-by: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Jason Wang <jasowang(a)redhat.com>
Cc: virtualization(a)lists.linux-foundation.org
Acked-by: Jason Wang <jasowang(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 82e520d2cb12..0824e6999e49 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -406,9 +406,13 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
offset += hdr_padded_len;
p += hdr_padded_len;
- copy = len;
- if (copy > skb_tailroom(skb))
- copy = skb_tailroom(skb);
+ /* Copy all frame if it fits skb->head, otherwise
+ * we let virtio_net_hdr_to_skb() and GRO pull headers as needed.
+ */
+ if (len <= skb_tailroom(skb))
+ copy = len;
+ else
+ copy = ETH_HLEN + metasize;
skb_put_data(skb, p, copy);
if (metasize) {
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 98775d7fa696..b465f8f3e554 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -65,14 +65,18 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
skb_reset_mac_header(skb);
if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
- u16 start = __virtio16_to_cpu(little_endian, hdr->csum_start);
- u16 off = __virtio16_to_cpu(little_endian, hdr->csum_offset);
+ u32 start = __virtio16_to_cpu(little_endian, hdr->csum_start);
+ u32 off = __virtio16_to_cpu(little_endian, hdr->csum_offset);
+ u32 needed = start + max_t(u32, thlen, off + sizeof(__sum16));
+
+ if (!pskb_may_pull(skb, needed))
+ return -EINVAL;
if (!skb_partial_csum_set(skb, start, off))
return -EINVAL;
p_off = skb_transport_offset(skb) + thlen;
- if (p_off > skb_headlen(skb))
+ if (!pskb_may_pull(skb, p_off))
return -EINVAL;
} else {
/* gso packets without NEEDS_CSUM do not set transport_offset.
@@ -102,14 +106,14 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
}
p_off = keys.control.thoff + thlen;
- if (p_off > skb_headlen(skb) ||
+ if (!pskb_may_pull(skb, p_off) ||
keys.basic.ip_proto != ip_proto)
return -EINVAL;
skb_set_transport_header(skb, keys.control.thoff);
} else if (gso_type) {
p_off = thlen;
- if (p_off > skb_headlen(skb))
+ if (!pskb_may_pull(skb, p_off))
return -EINVAL;
}
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0f6925b3e8da0dbbb52447ca8a8b42b371aac7db Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet(a)google.com>
Date: Fri, 2 Apr 2021 06:26:02 -0700
Subject: [PATCH] virtio_net: Do not pull payload in skb->head
Xuan Zhuo reported that commit 3226b158e67c ("net: avoid 32 x truesize
under-estimation for tiny skbs") brought a ~10% performance drop.
The reason for the performance drop was that GRO was forced
to chain sk_buff (using skb_shinfo(skb)->frag_list), which
uses more memory but also cause packet consumers to go over
a lot of overhead handling all the tiny skbs.
It turns out that virtio_net page_to_skb() has a wrong strategy :
It allocates skbs with GOOD_COPY_LEN (128) bytes in skb->head, then
copies 128 bytes from the page, before feeding the packet to GRO stack.
This was suboptimal before commit 3226b158e67c ("net: avoid 32 x truesize
under-estimation for tiny skbs") because GRO was using 2 frags per MSS,
meaning we were not packing MSS with 100% efficiency.
Fix is to pull only the ethernet header in page_to_skb()
Then, we change virtio_net_hdr_to_skb() to pull the missing
headers, instead of assuming they were already pulled by callers.
This fixes the performance regression, but could also allow virtio_net
to accept packets with more than 128bytes of headers.
Many thanks to Xuan Zhuo for his report, and his tests/help.
Fixes: 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs")
Reported-by: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Link: https://www.spinics.net/lists/netdev/msg731397.html
Co-Developed-by: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Jason Wang <jasowang(a)redhat.com>
Cc: virtualization(a)lists.linux-foundation.org
Acked-by: Jason Wang <jasowang(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 82e520d2cb12..0824e6999e49 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -406,9 +406,13 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
offset += hdr_padded_len;
p += hdr_padded_len;
- copy = len;
- if (copy > skb_tailroom(skb))
- copy = skb_tailroom(skb);
+ /* Copy all frame if it fits skb->head, otherwise
+ * we let virtio_net_hdr_to_skb() and GRO pull headers as needed.
+ */
+ if (len <= skb_tailroom(skb))
+ copy = len;
+ else
+ copy = ETH_HLEN + metasize;
skb_put_data(skb, p, copy);
if (metasize) {
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 98775d7fa696..b465f8f3e554 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -65,14 +65,18 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
skb_reset_mac_header(skb);
if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
- u16 start = __virtio16_to_cpu(little_endian, hdr->csum_start);
- u16 off = __virtio16_to_cpu(little_endian, hdr->csum_offset);
+ u32 start = __virtio16_to_cpu(little_endian, hdr->csum_start);
+ u32 off = __virtio16_to_cpu(little_endian, hdr->csum_offset);
+ u32 needed = start + max_t(u32, thlen, off + sizeof(__sum16));
+
+ if (!pskb_may_pull(skb, needed))
+ return -EINVAL;
if (!skb_partial_csum_set(skb, start, off))
return -EINVAL;
p_off = skb_transport_offset(skb) + thlen;
- if (p_off > skb_headlen(skb))
+ if (!pskb_may_pull(skb, p_off))
return -EINVAL;
} else {
/* gso packets without NEEDS_CSUM do not set transport_offset.
@@ -102,14 +106,14 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
}
p_off = keys.control.thoff + thlen;
- if (p_off > skb_headlen(skb) ||
+ if (!pskb_may_pull(skb, p_off) ||
keys.basic.ip_proto != ip_proto)
return -EINVAL;
skb_set_transport_header(skb, keys.control.thoff);
} else if (gso_type) {
p_off = thlen;
- if (p_off > skb_headlen(skb))
+ if (!pskb_may_pull(skb, p_off))
return -EINVAL;
}
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0f6925b3e8da0dbbb52447ca8a8b42b371aac7db Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet(a)google.com>
Date: Fri, 2 Apr 2021 06:26:02 -0700
Subject: [PATCH] virtio_net: Do not pull payload in skb->head
Xuan Zhuo reported that commit 3226b158e67c ("net: avoid 32 x truesize
under-estimation for tiny skbs") brought a ~10% performance drop.
The reason for the performance drop was that GRO was forced
to chain sk_buff (using skb_shinfo(skb)->frag_list), which
uses more memory but also cause packet consumers to go over
a lot of overhead handling all the tiny skbs.
It turns out that virtio_net page_to_skb() has a wrong strategy :
It allocates skbs with GOOD_COPY_LEN (128) bytes in skb->head, then
copies 128 bytes from the page, before feeding the packet to GRO stack.
This was suboptimal before commit 3226b158e67c ("net: avoid 32 x truesize
under-estimation for tiny skbs") because GRO was using 2 frags per MSS,
meaning we were not packing MSS with 100% efficiency.
Fix is to pull only the ethernet header in page_to_skb()
Then, we change virtio_net_hdr_to_skb() to pull the missing
headers, instead of assuming they were already pulled by callers.
This fixes the performance regression, but could also allow virtio_net
to accept packets with more than 128bytes of headers.
Many thanks to Xuan Zhuo for his report, and his tests/help.
Fixes: 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs")
Reported-by: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Link: https://www.spinics.net/lists/netdev/msg731397.html
Co-Developed-by: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Jason Wang <jasowang(a)redhat.com>
Cc: virtualization(a)lists.linux-foundation.org
Acked-by: Jason Wang <jasowang(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 82e520d2cb12..0824e6999e49 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -406,9 +406,13 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
offset += hdr_padded_len;
p += hdr_padded_len;
- copy = len;
- if (copy > skb_tailroom(skb))
- copy = skb_tailroom(skb);
+ /* Copy all frame if it fits skb->head, otherwise
+ * we let virtio_net_hdr_to_skb() and GRO pull headers as needed.
+ */
+ if (len <= skb_tailroom(skb))
+ copy = len;
+ else
+ copy = ETH_HLEN + metasize;
skb_put_data(skb, p, copy);
if (metasize) {
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 98775d7fa696..b465f8f3e554 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -65,14 +65,18 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
skb_reset_mac_header(skb);
if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
- u16 start = __virtio16_to_cpu(little_endian, hdr->csum_start);
- u16 off = __virtio16_to_cpu(little_endian, hdr->csum_offset);
+ u32 start = __virtio16_to_cpu(little_endian, hdr->csum_start);
+ u32 off = __virtio16_to_cpu(little_endian, hdr->csum_offset);
+ u32 needed = start + max_t(u32, thlen, off + sizeof(__sum16));
+
+ if (!pskb_may_pull(skb, needed))
+ return -EINVAL;
if (!skb_partial_csum_set(skb, start, off))
return -EINVAL;
p_off = skb_transport_offset(skb) + thlen;
- if (p_off > skb_headlen(skb))
+ if (!pskb_may_pull(skb, p_off))
return -EINVAL;
} else {
/* gso packets without NEEDS_CSUM do not set transport_offset.
@@ -102,14 +106,14 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
}
p_off = keys.control.thoff + thlen;
- if (p_off > skb_headlen(skb) ||
+ if (!pskb_may_pull(skb, p_off) ||
keys.basic.ip_proto != ip_proto)
return -EINVAL;
skb_set_transport_header(skb, keys.control.thoff);
} else if (gso_type) {
p_off = thlen;
- if (p_off > skb_headlen(skb))
+ if (!pskb_may_pull(skb, p_off))
return -EINVAL;
}
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 144748eb0c445091466c9b741ebd0bfcc5914f3d Mon Sep 17 00:00:00 2001
From: John Fastabend <john.fastabend(a)gmail.com>
Date: Thu, 1 Apr 2021 15:00:40 -0700
Subject: [PATCH] bpf, sockmap: Fix incorrect fwd_alloc accounting
Incorrect accounting fwd_alloc can result in a warning when the socket
is torn down,
[18455.319240] WARNING: CPU: 0 PID: 24075 at net/core/stream.c:208 sk_stream_kill_queues+0x21f/0x230
[...]
[18455.319543] Call Trace:
[18455.319556] inet_csk_destroy_sock+0xba/0x1f0
[18455.319577] tcp_rcv_state_process+0x1b4e/0x2380
[18455.319593] ? lock_downgrade+0x3a0/0x3a0
[18455.319617] ? tcp_finish_connect+0x1e0/0x1e0
[18455.319631] ? sk_reset_timer+0x15/0x70
[18455.319646] ? tcp_schedule_loss_probe+0x1b2/0x240
[18455.319663] ? lock_release+0xb2/0x3f0
[18455.319676] ? __release_sock+0x8a/0x1b0
[18455.319690] ? lock_downgrade+0x3a0/0x3a0
[18455.319704] ? lock_release+0x3f0/0x3f0
[18455.319717] ? __tcp_close+0x2c6/0x790
[18455.319736] ? tcp_v4_do_rcv+0x168/0x370
[18455.319750] tcp_v4_do_rcv+0x168/0x370
[18455.319767] __release_sock+0xbc/0x1b0
[18455.319785] __tcp_close+0x2ee/0x790
[18455.319805] tcp_close+0x20/0x80
This currently happens because on redirect case we do skb_set_owner_r()
with the original sock. This increments the fwd_alloc memory accounting
on the original sock. Then on redirect we may push this into the queue
of the psock we are redirecting to. When the skb is flushed from the
queue we give the memory back to the original sock. The problem is if
the original sock is destroyed/closed with skbs on another psocks queue
then the original sock will not have a way to reclaim the memory before
being destroyed. Then above warning will be thrown
sockA sockB
sk_psock_strp_read()
sk_psock_verdict_apply()
-- SK_REDIRECT --
sk_psock_skb_redirect()
skb_queue_tail(psock_other->ingress_skb..)
sk_close()
sock_map_unref()
sk_psock_put()
sk_psock_drop()
sk_psock_zap_ingress()
At this point we have torn down our own psock, but have the outstanding
skb in psock_other. Note that SK_PASS doesn't have this problem because
the sk_psock_drop() logic releases the skb, its still associated with
our psock.
To resolve lets only account for sockets on the ingress queue that are
still associated with the current socket. On the redirect case we will
check memory limits per 6fa9201a89898, but will omit fwd_alloc accounting
until skb is actually enqueued. When the skb is sent via skb_send_sock_locked
or received with sk_psock_skb_ingress memory will be claimed on psock_other.
Fixes: 6fa9201a89898 ("bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to self")
Reported-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: John Fastabend <john.fastabend(a)gmail.com>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Link: https://lore.kernel.org/bpf/161731444013.68884.4021114312848535993.stgit@jo…
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 1261512d6807..5def3a2e85be 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -488,6 +488,7 @@ static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb
if (unlikely(!msg))
return -EAGAIN;
sk_msg_init(msg);
+ skb_set_owner_r(skb, sk);
return sk_psock_skb_ingress_enqueue(skb, psock, sk, msg);
}
@@ -790,7 +791,6 @@ static void sk_psock_tls_verdict_apply(struct sk_buff *skb, struct sock *sk, int
{
switch (verdict) {
case __SK_REDIRECT:
- skb_set_owner_r(skb, sk);
sk_psock_skb_redirect(skb);
break;
case __SK_PASS:
@@ -808,10 +808,6 @@ int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb)
rcu_read_lock();
prog = READ_ONCE(psock->progs.skb_verdict);
if (likely(prog)) {
- /* We skip full set_owner_r here because if we do a SK_PASS
- * or SK_DROP we can skip skb memory accounting and use the
- * TLS context.
- */
skb->sk = psock->sk;
tcp_skb_bpf_redirect_clear(skb);
ret = sk_psock_bpf_run(psock, prog, skb);
@@ -880,12 +876,13 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb)
kfree_skb(skb);
goto out;
}
- skb_set_owner_r(skb, sk);
prog = READ_ONCE(psock->progs.skb_verdict);
if (likely(prog)) {
+ skb->sk = sk;
tcp_skb_bpf_redirect_clear(skb);
ret = sk_psock_bpf_run(psock, prog, skb);
ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));
+ skb->sk = NULL;
}
sk_psock_verdict_apply(psock, skb, ret);
out:
@@ -956,12 +953,13 @@ static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb,
kfree_skb(skb);
goto out;
}
- skb_set_owner_r(skb, sk);
prog = READ_ONCE(psock->progs.skb_verdict);
if (likely(prog)) {
+ skb->sk = sk;
tcp_skb_bpf_redirect_clear(skb);
ret = sk_psock_bpf_run(psock, prog, skb);
ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));
+ skb->sk = NULL;
}
sk_psock_verdict_apply(psock, skb, ret);
out:
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ca7a83e2487ad0bc9a3e0e7a8645354aa1782f13 Mon Sep 17 00:00:00 2001
From: Ciara Loftus <ciara.loftus(a)intel.com>
Date: Wed, 31 Mar 2021 06:12:18 +0000
Subject: [PATCH] libbpf: Only create rx and tx XDP rings when necessary
Prior to this commit xsk_socket__create(_shared) always attempted to create
the rx and tx rings for the socket. However this causes an issue when the
socket being setup is that which shares the fd with the UMEM. If a
previous call to this function failed with this socket after the rings were
set up, a subsequent call would always fail because the rings are not torn
down after the first call and when we try to set them up again we encounter
an error because they already exist. Solve this by remembering whether the
rings were set up by introducing new bools to struct xsk_umem which
represent the ring setup status and using them to determine whether or
not to set up the rings.
Fixes: 1cad07884239 ("libbpf: add support for using AF_XDP sockets")
Signed-off-by: Ciara Loftus <ciara.loftus(a)intel.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Link: https://lore.kernel.org/bpf/20210331061218.1647-4-ciara.loftus@intel.com
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 5098d9e3b55a..d24b5cc720ec 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -59,6 +59,8 @@ struct xsk_umem {
int fd;
int refcount;
struct list_head ctx_list;
+ bool rx_ring_setup_done;
+ bool tx_ring_setup_done;
};
struct xsk_ctx {
@@ -857,6 +859,7 @@ int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
struct xsk_ctx *ctx;
int err, ifindex;
bool unmap = umem->fill_save != fill;
+ bool rx_setup_done = false, tx_setup_done = false;
if (!umem || !xsk_ptr || !(rx || tx))
return -EFAULT;
@@ -884,6 +887,8 @@ int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
}
} else {
xsk->fd = umem->fd;
+ rx_setup_done = umem->rx_ring_setup_done;
+ tx_setup_done = umem->tx_ring_setup_done;
}
ctx = xsk_get_ctx(umem, ifindex, queue_id);
@@ -902,7 +907,7 @@ int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
}
xsk->ctx = ctx;
- if (rx) {
+ if (rx && !rx_setup_done) {
err = setsockopt(xsk->fd, SOL_XDP, XDP_RX_RING,
&xsk->config.rx_size,
sizeof(xsk->config.rx_size));
@@ -910,8 +915,10 @@ int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
err = -errno;
goto out_put_ctx;
}
+ if (xsk->fd == umem->fd)
+ umem->rx_ring_setup_done = true;
}
- if (tx) {
+ if (tx && !tx_setup_done) {
err = setsockopt(xsk->fd, SOL_XDP, XDP_TX_RING,
&xsk->config.tx_size,
sizeof(xsk->config.tx_size));
@@ -919,6 +926,8 @@ int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
err = -errno;
goto out_put_ctx;
}
+ if (xsk->fd == umem->fd)
+ umem->rx_ring_setup_done = true;
}
err = xsk_get_mmap_offsets(xsk->fd, &off);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8a12f8836145ffe37e9c8733dce18c22fb668b66 Mon Sep 17 00:00:00 2001
From: Anirudh Rayabharam <mail(a)anirudhrb.com>
Date: Wed, 7 Apr 2021 22:57:22 +0530
Subject: [PATCH] net: hso: fix null-ptr-deref during tty device unregistration
Multiple ttys try to claim the same the minor number causing a double
unregistration of the same device. The first unregistration succeeds
but the next one results in a null-ptr-deref.
The get_free_serial_index() function returns an available minor number
but doesn't assign it immediately. The assignment is done by the caller
later. But before this assignment, calls to get_free_serial_index()
would return the same minor number.
Fix this by modifying get_free_serial_index to assign the minor number
immediately after one is found to be and rename it to obtain_minor()
to better reflect what it does. Similary, rename set_serial_by_index()
to release_minor() and modify it to free up the minor number of the
given hso_serial. Every obtain_minor() should have corresponding
release_minor() call.
Fixes: 72dc1c096c705 ("HSO: add option hso driver")
Reported-by: syzbot+c49fe6089f295a05e6f8(a)syzkaller.appspotmail.com
Tested-by: syzbot+c49fe6089f295a05e6f8(a)syzkaller.appspotmail.com
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Anirudh Rayabharam <mail(a)anirudhrb.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c
index 31d51346786a..9bc58e64b5b7 100644
--- a/drivers/net/usb/hso.c
+++ b/drivers/net/usb/hso.c
@@ -611,7 +611,7 @@ static struct hso_serial *get_serial_by_index(unsigned index)
return serial;
}
-static int get_free_serial_index(void)
+static int obtain_minor(struct hso_serial *serial)
{
int index;
unsigned long flags;
@@ -619,8 +619,10 @@ static int get_free_serial_index(void)
spin_lock_irqsave(&serial_table_lock, flags);
for (index = 0; index < HSO_SERIAL_TTY_MINORS; index++) {
if (serial_table[index] == NULL) {
+ serial_table[index] = serial->parent;
+ serial->minor = index;
spin_unlock_irqrestore(&serial_table_lock, flags);
- return index;
+ return 0;
}
}
spin_unlock_irqrestore(&serial_table_lock, flags);
@@ -629,15 +631,12 @@ static int get_free_serial_index(void)
return -1;
}
-static void set_serial_by_index(unsigned index, struct hso_serial *serial)
+static void release_minor(struct hso_serial *serial)
{
unsigned long flags;
spin_lock_irqsave(&serial_table_lock, flags);
- if (serial)
- serial_table[index] = serial->parent;
- else
- serial_table[index] = NULL;
+ serial_table[serial->minor] = NULL;
spin_unlock_irqrestore(&serial_table_lock, flags);
}
@@ -2230,6 +2229,7 @@ static int hso_stop_serial_device(struct hso_device *hso_dev)
static void hso_serial_tty_unregister(struct hso_serial *serial)
{
tty_unregister_device(tty_drv, serial->minor);
+ release_minor(serial);
}
static void hso_serial_common_free(struct hso_serial *serial)
@@ -2253,24 +2253,22 @@ static void hso_serial_common_free(struct hso_serial *serial)
static int hso_serial_common_create(struct hso_serial *serial, int num_urbs,
int rx_size, int tx_size)
{
- int minor;
int i;
tty_port_init(&serial->port);
- minor = get_free_serial_index();
- if (minor < 0)
+ if (obtain_minor(serial))
goto exit2;
/* register our minor number */
serial->parent->dev = tty_port_register_device_attr(&serial->port,
- tty_drv, minor, &serial->parent->interface->dev,
+ tty_drv, serial->minor, &serial->parent->interface->dev,
serial->parent, hso_serial_dev_groups);
- if (IS_ERR(serial->parent->dev))
+ if (IS_ERR(serial->parent->dev)) {
+ release_minor(serial);
goto exit2;
+ }
- /* fill in specific data for later use */
- serial->minor = minor;
serial->magic = HSO_SERIAL_MAGIC;
spin_lock_init(&serial->serial_lock);
serial->num_rx_urbs = num_urbs;
@@ -2667,9 +2665,6 @@ static struct hso_device *hso_create_bulk_serial_device(
serial->write_data = hso_std_serial_write_data;
- /* and record this serial */
- set_serial_by_index(serial->minor, serial);
-
/* setup the proc dirs and files if needed */
hso_log_port(hso_dev);
@@ -2726,9 +2721,6 @@ struct hso_device *hso_create_mux_serial_device(struct usb_interface *interface,
serial->shared_int->ref_count++;
mutex_unlock(&serial->shared_int->shared_int_lock);
- /* and record this serial */
- set_serial_by_index(serial->minor, serial);
-
/* setup the proc dirs and files if needed */
hso_log_port(hso_dev);
@@ -3113,7 +3105,6 @@ static void hso_free_interface(struct usb_interface *interface)
cancel_work_sync(&serial_table[i]->async_get_intf);
hso_serial_tty_unregister(serial);
kref_put(&serial_table[i]->ref, hso_serial_ref_free);
- set_serial_by_index(i, NULL);
}
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8a12f8836145ffe37e9c8733dce18c22fb668b66 Mon Sep 17 00:00:00 2001
From: Anirudh Rayabharam <mail(a)anirudhrb.com>
Date: Wed, 7 Apr 2021 22:57:22 +0530
Subject: [PATCH] net: hso: fix null-ptr-deref during tty device unregistration
Multiple ttys try to claim the same the minor number causing a double
unregistration of the same device. The first unregistration succeeds
but the next one results in a null-ptr-deref.
The get_free_serial_index() function returns an available minor number
but doesn't assign it immediately. The assignment is done by the caller
later. But before this assignment, calls to get_free_serial_index()
would return the same minor number.
Fix this by modifying get_free_serial_index to assign the minor number
immediately after one is found to be and rename it to obtain_minor()
to better reflect what it does. Similary, rename set_serial_by_index()
to release_minor() and modify it to free up the minor number of the
given hso_serial. Every obtain_minor() should have corresponding
release_minor() call.
Fixes: 72dc1c096c705 ("HSO: add option hso driver")
Reported-by: syzbot+c49fe6089f295a05e6f8(a)syzkaller.appspotmail.com
Tested-by: syzbot+c49fe6089f295a05e6f8(a)syzkaller.appspotmail.com
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Anirudh Rayabharam <mail(a)anirudhrb.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c
index 31d51346786a..9bc58e64b5b7 100644
--- a/drivers/net/usb/hso.c
+++ b/drivers/net/usb/hso.c
@@ -611,7 +611,7 @@ static struct hso_serial *get_serial_by_index(unsigned index)
return serial;
}
-static int get_free_serial_index(void)
+static int obtain_minor(struct hso_serial *serial)
{
int index;
unsigned long flags;
@@ -619,8 +619,10 @@ static int get_free_serial_index(void)
spin_lock_irqsave(&serial_table_lock, flags);
for (index = 0; index < HSO_SERIAL_TTY_MINORS; index++) {
if (serial_table[index] == NULL) {
+ serial_table[index] = serial->parent;
+ serial->minor = index;
spin_unlock_irqrestore(&serial_table_lock, flags);
- return index;
+ return 0;
}
}
spin_unlock_irqrestore(&serial_table_lock, flags);
@@ -629,15 +631,12 @@ static int get_free_serial_index(void)
return -1;
}
-static void set_serial_by_index(unsigned index, struct hso_serial *serial)
+static void release_minor(struct hso_serial *serial)
{
unsigned long flags;
spin_lock_irqsave(&serial_table_lock, flags);
- if (serial)
- serial_table[index] = serial->parent;
- else
- serial_table[index] = NULL;
+ serial_table[serial->minor] = NULL;
spin_unlock_irqrestore(&serial_table_lock, flags);
}
@@ -2230,6 +2229,7 @@ static int hso_stop_serial_device(struct hso_device *hso_dev)
static void hso_serial_tty_unregister(struct hso_serial *serial)
{
tty_unregister_device(tty_drv, serial->minor);
+ release_minor(serial);
}
static void hso_serial_common_free(struct hso_serial *serial)
@@ -2253,24 +2253,22 @@ static void hso_serial_common_free(struct hso_serial *serial)
static int hso_serial_common_create(struct hso_serial *serial, int num_urbs,
int rx_size, int tx_size)
{
- int minor;
int i;
tty_port_init(&serial->port);
- minor = get_free_serial_index();
- if (minor < 0)
+ if (obtain_minor(serial))
goto exit2;
/* register our minor number */
serial->parent->dev = tty_port_register_device_attr(&serial->port,
- tty_drv, minor, &serial->parent->interface->dev,
+ tty_drv, serial->minor, &serial->parent->interface->dev,
serial->parent, hso_serial_dev_groups);
- if (IS_ERR(serial->parent->dev))
+ if (IS_ERR(serial->parent->dev)) {
+ release_minor(serial);
goto exit2;
+ }
- /* fill in specific data for later use */
- serial->minor = minor;
serial->magic = HSO_SERIAL_MAGIC;
spin_lock_init(&serial->serial_lock);
serial->num_rx_urbs = num_urbs;
@@ -2667,9 +2665,6 @@ static struct hso_device *hso_create_bulk_serial_device(
serial->write_data = hso_std_serial_write_data;
- /* and record this serial */
- set_serial_by_index(serial->minor, serial);
-
/* setup the proc dirs and files if needed */
hso_log_port(hso_dev);
@@ -2726,9 +2721,6 @@ struct hso_device *hso_create_mux_serial_device(struct usb_interface *interface,
serial->shared_int->ref_count++;
mutex_unlock(&serial->shared_int->shared_int_lock);
- /* and record this serial */
- set_serial_by_index(serial->minor, serial);
-
/* setup the proc dirs and files if needed */
hso_log_port(hso_dev);
@@ -3113,7 +3105,6 @@ static void hso_free_interface(struct usb_interface *interface)
cancel_work_sync(&serial_table[i]->async_get_intf);
hso_serial_tty_unregister(serial);
kref_put(&serial_table[i]->ref, hso_serial_ref_free);
- set_serial_by_index(i, NULL);
}
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7a91d3f02b04b2fb18c2dfa8b6c4e5a40a2753f5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jacek=20Bu=C5=82atek?= <jacekx.bulatek(a)intel.com>
Date: Fri, 26 Feb 2021 13:19:29 -0800
Subject: [PATCH] ice: Fix for dereference of NULL pointer
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Add handling of allocation fault for ice_vsi_list_map_info.
Also *fi should not be NULL pointer, it is a reference to raw
data field, so remove this variable and use the reference
directly.
Fixes: 9daf8208dd4d ("ice: Add support for switch filter programming")
Signed-off-by: Jacek Bułatek <jacekx.bulatek(a)intel.com>
Co-developed-by: Haiyue Wang <haiyue.wang(a)intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang(a)intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/ice/ice_switch.c b/drivers/net/ethernet/intel/ice/ice_switch.c
index 67c965a3f5d2..387d3f6cd71e 100644
--- a/drivers/net/ethernet/intel/ice/ice_switch.c
+++ b/drivers/net/ethernet/intel/ice/ice_switch.c
@@ -1238,6 +1238,9 @@ ice_add_update_vsi_list(struct ice_hw *hw,
ice_create_vsi_list_map(hw, &vsi_handle_arr[0], 2,
vsi_list_id);
+ if (!m_entry->vsi_list_info)
+ return ICE_ERR_NO_MEMORY;
+
/* If this entry was large action then the large action needs
* to be updated to point to FWD to VSI list
*/
@@ -2220,6 +2223,7 @@ ice_vsi_uses_fltr(struct ice_fltr_mgmt_list_entry *fm_entry, u16 vsi_handle)
return ((fm_entry->fltr_info.fltr_act == ICE_FWD_TO_VSI &&
fm_entry->fltr_info.vsi_handle == vsi_handle) ||
(fm_entry->fltr_info.fltr_act == ICE_FWD_TO_VSI_LIST &&
+ fm_entry->vsi_list_info &&
(test_bit(vsi_handle, fm_entry->vsi_list_info->vsi_map))));
}
@@ -2292,14 +2296,12 @@ ice_add_to_vsi_fltr_list(struct ice_hw *hw, u16 vsi_handle,
return ICE_ERR_PARAM;
list_for_each_entry(fm_entry, lkup_list_head, list_entry) {
- struct ice_fltr_info *fi;
-
- fi = &fm_entry->fltr_info;
- if (!fi || !ice_vsi_uses_fltr(fm_entry, vsi_handle))
+ if (!ice_vsi_uses_fltr(fm_entry, vsi_handle))
continue;
status = ice_add_entry_to_vsi_fltr_list(hw, vsi_handle,
- vsi_list_head, fi);
+ vsi_list_head,
+ &fm_entry->fltr_info);
if (status)
return status;
}