[PATCH 5.7 00/20] 5.7.12-rc1 review


Greg Kroah-Hartman

30 Jul 2020

8:03 a.m.

This is the start of the stable review cycle for the 5.7.12 release. There are 20 patches in this series; all will be posted as responses to this one. If anyone has any issues with these being applied, please let me know.

Responses should be made by Sat, 01 Aug 2020 07:44:05 +0000. Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.7.12-rc1....
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.7.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman gregkh@linuxfoundation.org
    Linux 5.7.12-rc1

Peng Fan peng.fan@nxp.com
    regmap: debugfs: check count when read regmap file

Jens Axboe axboe@kernel.dk
    io_uring: ensure double poll additions work with both request types

Tung Nguyen tung.q.nguyen@dektech.com.au
    tipc: allow to build NACK message in link timeout function

Kuniyuki Iwashima kuniyu@amazon.co.jp
    udp: Improve load balancing for SO_REUSEPORT.

Kuniyuki Iwashima kuniyu@amazon.co.jp
    udp: Copy has_conns in reuseport_grow().

Xin Long lucien.xin@gmail.com
    sctp: shrink stream outq when fails to do addstream reconf

Xin Long lucien.xin@gmail.com
    sctp: shrink stream outq only when new outcnt < old outcnt

Dan Carpenter dan.carpenter@oracle.com
    AX.25: Prevent integer overflows in connect and sendmsg

Yuchung Cheng ycheng@google.com
    tcp: allow at most one TLP probe per flight

David Howells dhowells@redhat.com
    rxrpc: Fix sendmsg() returning EPIPE due to recvmsg() returning ENODATA

Weilong Chen chenweilong@huawei.com
    rtnetlink: Fix memory(net_device) leak when ->newlink fails

Cong Wang xiyou.wangcong@gmail.com
    qrtr: orphan socket in qrtr_release()

Miaohe Lin linmiaohe@huawei.com
    net: udp: Fix wrong clean up for IS_UDPLITE macro

Xiongfeng Wang wangxiongfeng2@huawei.com
    net-sysfs: add a newline when printing 'tx_timeout' by sysfs

wenxu wenxu@ucloud.cn
    net/sched: act_ct: fix restore the qdisc_skb_cb after defrag

Wei Yongjun weiyongjun1@huawei.com
    ip6_gre: fix null-ptr-deref in ip6gre_init_net()

Xie He xie.he.0141@gmail.com
    drivers/net/wan/x25_asy: Fix to make it work

Subash Abhinov Kasiviswanathan subashab@codeaurora.org
    dev: Defer free of skbs in flush_backlog

Peilin Ye yepeilin.cs@gmail.com
    AX.25: Prevent out-of-bounds read in ax25_sendmsg()

Peilin Ye yepeilin.cs@gmail.com
    AX.25: Fix out-of-bounds read in ax25_connect()

-------------

Diffstat:


Greg Kroah-Hartman

30 Jul

8:03 a.m.

New subject: [PATCH 5.7 01/20] AX.25: Fix out-of-bounds read in ax25_connect()

From: Peilin Ye yepeilin.cs@gmail.com

[ Upstream commit 2f2a7ffad5c6cbf3d438e813cfdc88230e185ba6 ]

Checks on `addr_len` and `fsa->fsa_ax25.sax25_ndigis` are insufficient. ax25_connect() can go out of bounds when `fsa->fsa_ax25.sax25_ndigis` equals 7 or 8. Fix it.

This issue has been reported as a KMSAN uninit-value bug, because in such a case, ax25_connect() reaches into the uninitialized portion of the `struct sockaddr_storage` statically allocated in __sys_connect().

It is safe to remove `fsa->fsa_ax25.sax25_ndigis > AX25_MAX_DIGIS` because `addr_len` is guaranteed to be less than or equal to `sizeof(struct full_sockaddr_ax25)`.

Reported-by: syzbot+c82752228ed975b0a623@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=55ef9d629f3b3d7d70b69558015b63b48d01af6...
Signed-off-by: Peilin Ye yepeilin.cs@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/ax25/af_ax25.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1187,7 +1187,9 @@ static int __must_check ax25_connect(str
 	if (addr_len > sizeof(struct sockaddr_ax25) &&
 	    fsa->fsa_ax25.sax25_ndigis != 0) {
 		/* Valid number of digipeaters ? */
-		if (fsa->fsa_ax25.sax25_ndigis < 1 || fsa->fsa_ax25.sax25_ndigis > AX25_MAX_DIGIS) {
+		if (fsa->fsa_ax25.sax25_ndigis < 1 ||
+		    addr_len < sizeof(struct sockaddr_ax25) +
+		    sizeof(ax25_address) * fsa->fsa_ax25.sax25_ndigis) {
 			err = -EINVAL;
 			goto out_release;
 		}

Greg Kroah-Hartman

8:03 a.m.

New subject: [PATCH 5.7 02/20] AX.25: Prevent out-of-bounds read in ax25_sendmsg()

From: Peilin Ye yepeilin.cs@gmail.com

[ Upstream commit 8885bb0621f01a6c82be60a91e5fc0f6e2f71186 ]

Checks on `addr_len` and `usax->sax25_ndigis` are insufficient. ax25_sendmsg() can go out of bounds when `usax->sax25_ndigis` equals 7 or 8. Fix it.

It is safe to remove `usax->sax25_ndigis > AX25_MAX_DIGIS`, since `addr_len` is guaranteed to be less than or equal to `sizeof(struct full_sockaddr_ax25)`.

Signed-off-by: Peilin Ye yepeilin.cs@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/ax25/af_ax25.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1509,7 +1509,8 @@ static int ax25_sendmsg(struct socket *s
 			struct full_sockaddr_ax25 *fsa = (struct full_sockaddr_ax25 *)usax;
 
 			/* Valid number of digipeaters ? */
-			if (usax->sax25_ndigis < 1 || usax->sax25_ndigis > AX25_MAX_DIGIS) {
+			if (usax->sax25_ndigis < 1 || addr_len < sizeof(struct sockaddr_ax25) +
+			    sizeof(ax25_address) * usax->sax25_ndigis) {
 				err = -EINVAL;
 				goto out;
 			}

Greg Kroah-Hartman

8:03 a.m.

New subject: [PATCH 5.7 03/20] dev: Defer free of skbs in flush_backlog

From: Subash Abhinov Kasiviswanathan subashab@codeaurora.org

[ Upstream commit 7df5cb75cfb8acf96c7f2342530eb41e0c11f4c3 ]

IRQs are disabled when freeing skbs in the input queue. Use the IRQ-safe variant to free skbs here.

Fixes: 145dd5f9c88f ("net: flush the softnet backlog in process context")
Signed-off-by: Subash Abhinov Kasiviswanathan subashab@codeaurora.org
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5504,7 +5504,7 @@ static void flush_backlog(struct work_st
 	skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp) {
 		if (skb->dev->reg_state == NETREG_UNREGISTERING) {
 			__skb_unlink(skb, &sd->input_pkt_queue);
-			kfree_skb(skb);
+			dev_kfree_skb_irq(skb);
 			input_queue_head_incr(sd);
 		}
 	}

Greg Kroah-Hartman

8:03 a.m.

New subject: [PATCH 5.7 04/20] drivers/net/wan/x25_asy: Fix to make it work

From: Xie He xie.he.0141@gmail.com

[ Upstream commit 8fdcabeac39824fe67480fd9508d80161c541854 ]

This driver does not work because of problems in its receiving code. This patch fixes them.

When the driver receives an LAPB frame, it should first pass the frame to the LAPB module to process. After processing, the LAPB module passes the data (the packet) back to the driver. The driver should then add a one-byte pseudo header and pass the data to upper layers.

The changes to the "x25_asy_bump" function and the "x25_asy_data_indication" function are to correctly implement this procedure.

Also, the "x25_asy_unesc" function ignores any frame that is shorter than 3 bytes. However, the shortest frames are 2 bytes long, so we need to change it to allow 2-byte frames to pass.

Cc: Eric Dumazet edumazet@google.com
Cc: Martin Schiller ms@dev.tdt.de
Signed-off-by: Xie He xie.he.0141@gmail.com
Reviewed-by: Martin Schiller ms@dev.tdt.de
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/wan/x25_asy.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

--- a/drivers/net/wan/x25_asy.c
+++ b/drivers/net/wan/x25_asy.c
@@ -183,7 +183,7 @@ static inline void x25_asy_unlock(struct
 	netif_wake_queue(sl->dev);
 }
 
-/* Send one completely decapsulated IP datagram to the IP layer. */
+/* Send an LAPB frame to the LAPB module to process. */
 
 static void x25_asy_bump(struct x25_asy *sl)
 {
@@ -195,13 +195,12 @@ static void x25_asy_bump(struct x25_asy
 	count = sl->rcount;
 	dev->stats.rx_bytes += count;
 
-	skb = dev_alloc_skb(count+1);
+	skb = dev_alloc_skb(count);
 	if (skb == NULL) {
 		netdev_warn(sl->dev, "memory squeeze, dropping packet\n");
 		dev->stats.rx_dropped++;
 		return;
 	}
-	skb_push(skb, 1);	/* LAPB internal control */
 	skb_put_data(skb, sl->rbuff, count);
 	skb->protocol = x25_type_trans(skb, sl->dev);
 	err = lapb_data_received(skb->dev, skb);
@@ -209,7 +208,6 @@ static void x25_asy_bump(struct x25_asy
 		kfree_skb(skb);
 		printk(KERN_DEBUG "x25_asy: data received err - %d\n", err);
 	} else {
-		netif_rx(skb);
 		dev->stats.rx_packets++;
 	}
 }
@@ -356,12 +354,21 @@ static netdev_tx_t x25_asy_xmit(struct s
  */
 
 /*
- *	Called when I frame data arrives. We did the work above - throw it
- *	at the net layer.
+ *	Called when I frame data arrive. We add a pseudo header for upper
+ *	layers and pass it to upper layers.
  */
 
 static int x25_asy_data_indication(struct net_device *dev, struct sk_buff *skb)
 {
+	if (skb_cow(skb, 1)) {
+		kfree_skb(skb);
+		return NET_RX_DROP;
+	}
+	skb_push(skb, 1);
+	skb->data[0] = X25_IFACE_DATA;
+
+	skb->protocol = x25_type_trans(skb, dev);
+
 	return netif_rx(skb);
 }
 
@@ -657,7 +664,7 @@ static void x25_asy_unesc(struct x25_asy
 		switch (s) {
 		case X25_END:
 			if (!test_and_clear_bit(SLF_ERROR, &sl->flags) &&
-			    sl->rcount > 2)
+			    sl->rcount >= 2)
 				x25_asy_bump(sl);
 			clear_bit(SLF_ESCAPE, &sl->flags);
 			sl->rcount = 0;
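For anyone reviewing the receive-path change, here is a small userspace sketch of the two ideas in this patch: prepending a one-byte pseudo header to the data handed to upper layers, and accepting frames of 2 bytes or more. It is not the driver code. The names `add_pseudo_header()` and `frame_len_ok()` are made up for illustration, and `IFACE_DATA` is only assumed to mirror the value of `X25_IFACE_DATA`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define IFACE_DATA 0x00	/* assumption: mirrors X25_IFACE_DATA */

/*
 * Hypothetical helper: returns a newly allocated buffer of len + 1 bytes
 * holding the pseudo header byte followed by the payload, or NULL if the
 * allocation fails. The caller frees the result.
 */
static unsigned char *add_pseudo_header(const unsigned char *payload, size_t len)
{
	unsigned char *out = malloc(len + 1);

	if (out == NULL)
		return NULL;
	out[0] = IFACE_DATA;		/* one-byte pseudo header */
	memcpy(out + 1, payload, len);	/* payload follows unchanged */
	return out;
}

/* Hypothetical helper: the relaxed length test, 2-byte frames now pass. */
static int frame_len_ok(size_t rcount)
{
	return rcount >= 2;
}
```

In the driver itself the header is added in place with skb_cow() and skb_push() rather than by copying, as the hunk above shows.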

Greg Kroah-Hartman

8:03 a.m.

New subject: [PATCH 5.7 05/20] ip6_gre: fix null-ptr-deref in ip6gre_init_net()

From: Wei Yongjun weiyongjun1@huawei.com

[ Upstream commit 46ef5b89ec0ecf290d74c4aee844f063933c4da4 ]

KASAN reports a null-ptr-deref error when register_netdev() fails:

KASAN: null-ptr-deref in range [0x00000000000003c0-0x00000000000003c7]
CPU: 2 PID: 422 Comm: ip Not tainted 5.8.0-rc4+ #12
Call Trace:
 ip6gre_init_net+0x4ab/0x580
 ? ip6gre_tunnel_uninit+0x3f0/0x3f0
 ops_init+0xa8/0x3c0
 setup_net+0x2de/0x7e0
 ? rcu_read_lock_bh_held+0xb0/0xb0
 ? ops_init+0x3c0/0x3c0
 ? kasan_unpoison_shadow+0x33/0x40
 ? __kasan_kmalloc.constprop.0+0xc2/0xd0
 copy_net_ns+0x27d/0x530
 create_new_namespaces+0x382/0xa30
 unshare_nsproxy_namespaces+0xa1/0x1d0
 ksys_unshare+0x39c/0x780
 ? walk_process_tree+0x2a0/0x2a0
 ? trace_hardirqs_on+0x4a/0x1b0
 ? _raw_spin_unlock_irq+0x1f/0x30
 ? syscall_trace_enter+0x1a7/0x330
 ? do_syscall_64+0x1c/0xa0
 __x64_sys_unshare+0x2d/0x40
 do_syscall_64+0x56/0xa0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

ip6gre_tunnel_uninit() has set 'ign->fb_tunnel_dev' to NULL, so a later access to ign->fb_tunnel_dev causes a null-ptr-deref. Fix it by saving 'ign->fb_tunnel_dev' to the local variable ndev.

Fixes: dafabb6590cb ("ip6_gre: fix use-after-free in ip6gre_tunnel_lookup()")
Reported-by: Hulk Robot hulkci@huawei.com
Signed-off-by: Wei Yongjun weiyongjun1@huawei.com
Reviewed-by: Eric Dumazet edumazet@google.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/ipv6/ip6_gre.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1562,17 +1562,18 @@ static void ip6gre_destroy_tunnels(struc
 static int __net_init ip6gre_init_net(struct net *net)
 {
 	struct ip6gre_net *ign = net_generic(net, ip6gre_net_id);
+	struct net_device *ndev;
 	int err;
 
 	if (!net_has_fallback_tunnels(net))
 		return 0;
-	ign->fb_tunnel_dev = alloc_netdev(sizeof(struct ip6_tnl), "ip6gre0",
-					  NET_NAME_UNKNOWN,
-					  ip6gre_tunnel_setup);
-	if (!ign->fb_tunnel_dev) {
+	ndev = alloc_netdev(sizeof(struct ip6_tnl), "ip6gre0",
+			    NET_NAME_UNKNOWN, ip6gre_tunnel_setup);
+	if (!ndev) {
 		err = -ENOMEM;
 		goto err_alloc_dev;
 	}
+	ign->fb_tunnel_dev = ndev;
 	dev_net_set(ign->fb_tunnel_dev, net);
 	/* FB netdevice is special: we have one, and only one per netns.
 	 * Allowing to move it to another netns is clearly unsafe.
@@ -1592,7 +1593,7 @@ static int __net_init ip6gre_init_net(st
 	return 0;
 
 err_reg_dev:
-	free_netdev(ign->fb_tunnel_dev);
+	free_netdev(ndev);
 err_alloc_dev:
 	return err;
 }
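The pattern used by this fix can be shown outside the kernel. The sketch below is a userspace illustration only: `struct dev`, `struct ns_state`, `register_dev_failing()` and `init_ns()` are hypothetical stand-ins, and the failing registration is simulated. It shows why the error path should free through a local pointer when a callee may clear the shared field.

```c
#include <stdlib.h>

struct dev { int id; };				/* hypothetical device */
struct ns_state { struct dev *fb_dev; };	/* stand-in for per-netns state */

/*
 * Simulated registration failure. Its cleanup clears the shared pointer,
 * which is what the commit message describes ip6gre_tunnel_uninit() doing.
 */
static int register_dev_failing(struct ns_state *st)
{
	st->fb_dev = NULL;
	return -1;
}

static int init_ns(struct ns_state *st)
{
	struct dev *ndev = malloc(sizeof(*ndev));

	if (ndev == NULL)
		return -1;
	st->fb_dev = ndev;

	if (register_dev_failing(st) < 0) {
		/* st->fb_dev is NULL by now; the local copy is still valid. */
		free(ndev);
		return -1;
	}
	return 0;
}
```

Freeing through `st->fb_dev` in the error path would pass NULL here and leak the allocation; in the kernel case the commit message reports a null-ptr-deref instead.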

Greg Kroah-Hartman

8:03 a.m.

New subject: [PATCH 5.7 06/20] net/sched: act_ct: fix restore the qdisc_skb_cb after defrag

From: wenxu wenxu@ucloud.cn

[ Upstream commit ae372cb1750f6c95370f92fe5f5620e0954663ba ]

Defragmentation of fragmented packets in tcf_ct_handle_fragments() clears skb->cb, which clears the qdisc_skb_cb as well. So the qdisc_skb_cb should be saved before defrag and restored afterwards. Also update pkt_len once all the fragments have been reassembled into one packet, so that the counters of the following actions are correct.

Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
Signed-off-by: wenxu wenxu@ucloud.cn
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/sched/act_ct.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -671,9 +671,10 @@ static int tcf_ct_ipv6_is_fragment(struc
 }
 
 static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb,
-				   u8 family, u16 zone)
+				   u8 family, u16 zone, bool *defrag)
 {
 	enum ip_conntrack_info ctinfo;
+	struct qdisc_skb_cb cb;
 	struct nf_conn *ct;
 	int err = 0;
 	bool frag;
@@ -691,6 +692,7 @@ static int tcf_ct_handle_fragments(struc
 		return err;
 
 	skb_get(skb);
+	cb = *qdisc_skb_cb(skb);
 
 	if (family == NFPROTO_IPV4) {
 		enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone;
@@ -701,6 +703,9 @@ static int tcf_ct_handle_fragments(struc
 		local_bh_enable();
 		if (err && err != -EINPROGRESS)
 			goto out_free;
+
+		if (!err)
+			*defrag = true;
 	} else { /* NFPROTO_IPV6 */
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
 		enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone;
@@ -709,12 +714,16 @@ static int tcf_ct_handle_fragments(struc
 		err = nf_ct_frag6_gather(net, skb, user);
 		if (err && err != -EINPROGRESS)
 			goto out_free;
+
+		if (!err)
+			*defrag = true;
 #else
 		err = -EOPNOTSUPP;
 		goto out_free;
 #endif
 	}
 
+	*qdisc_skb_cb(skb) = cb;
 	skb_clear_hash(skb);
 	skb->ignore_df = 1;
 	return err;
@@ -912,6 +921,7 @@ static int tcf_ct_act(struct sk_buff *sk
 	int nh_ofs, err, retval;
 	struct tcf_ct_params *p;
 	bool skip_add = false;
+	bool defrag = false;
 	struct nf_conn *ct;
 	u8 family;
 
@@ -942,7 +952,7 @@ static int tcf_ct_act(struct sk_buff *sk
 	 */
 	nh_ofs = skb_network_offset(skb);
 	skb_pull_rcsum(skb, nh_ofs);
-	err = tcf_ct_handle_fragments(net, skb, family, p->zone);
+	err = tcf_ct_handle_fragments(net, skb, family, p->zone, &defrag);
 	if (err == -EINPROGRESS) {
 		retval = TC_ACT_STOLEN;
 		goto out;
@@ -1010,6 +1020,8 @@ out_push:
 
 out:
 	tcf_action_update_bstats(&c->common, skb);
+	if (defrag)
+		qdisc_skb_cb(skb)->pkt_len = skb->len;
 	return retval;
 
 drop:
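The save/restore idea in this patch can be sketched in userspace as follows. Nothing here is kernel API: `struct pkt`, `struct sched_cb`, `fake_defrag()` and `handle_fragments()` are hypothetical stand-ins for the skb, the qdisc_skb_cb, the defrag step and tcf_ct_handle_fragments(), and the `tag` field is invented to have something besides `pkt_len` that must survive.

```c
#include <string.h>

/* Hypothetical stand-in for struct qdisc_skb_cb. */
struct sched_cb {
	unsigned int pkt_len;
	unsigned int tag;	/* arbitrary state that must survive defrag */
};

/* Hypothetical stand-in for struct sk_buff. */
struct pkt {
	unsigned char cb[48];	/* scratch area shared between layers */
	unsigned int len;
};

/* Simulated defrag: wipes the scratch area and yields a longer packet. */
static void fake_defrag(struct pkt *p, unsigned int reassembled_len)
{
	memset(p->cb, 0, sizeof(p->cb));
	p->len = reassembled_len;
}

static void handle_fragments(struct pkt *p, unsigned int reassembled_len)
{
	struct sched_cb saved;

	memcpy(&saved, p->cb, sizeof(saved));	/* save before defrag */
	fake_defrag(p, reassembled_len);
	saved.pkt_len = p->len;			/* account for reassembly */
	memcpy(p->cb, &saved, sizeof(saved));	/* restore afterwards */
}
```

One difference from the patch: the sketch refreshes `pkt_len` unconditionally, while the patch does so only when the `defrag` flag was set, that is, when reassembly actually completed.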

Greg Kroah-Hartman

8:03 a.m.

New subject: [PATCH 5.7 07/20] net-sysfs: add a newline when printing tx_timeout by sysfs

From: Xiongfeng Wang wangxiongfeng2@huawei.com

[ Upstream commit 9bb5fbea59f36a589ef886292549ca4052fe676c ]

When I cat 'tx_timeout' by sysfs, it displays as follows. It's better to add a newline for easy reading.

root@syzkaller:~# cat /sys/devices/virtual/net/lo/queues/tx-0/tx_timeout
0root@syzkaller:~#

Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/core/net-sysfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1077,7 +1077,7 @@ static ssize_t tx_timeout_show(struct ne
 	trans_timeout = queue->trans_timeout;
 	spin_unlock_irq(&queue->_xmit_lock);
 
-	return sprintf(buf, "%lu", trans_timeout);
+	return sprintf(buf, fmt_ulong, trans_timeout);
 }
 
 static unsigned int get_netdev_queue_index(struct netdev_queue *queue)
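A userspace sketch of the effect of this one-line change, assuming `fmt_ulong` expands to "%lu\n" (the diff does not show its definition, so that value is an assumption here). `show_timeout()` is a hypothetical stand-in for the sysfs show callback and uses snprintf() instead of sprintf().

```c
#include <stdio.h>

/* Assumption: same contents as fmt_ulong in net/core/net-sysfs.c. */
static const char fmt_ulong[] = "%lu\n";

/* Hypothetical stand-in for tx_timeout_show(): formats v with a newline. */
static int show_timeout(char *buf, size_t size, unsigned long v)
{
	return snprintf(buf, size, fmt_ulong, v);
}
```

With a value of 0 the buffer now holds the digit followed by a newline, so the shell prompt in the example from the commit message would start on its own line.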

Greg Kroah-Hartman

8:03 a.m.

New subject: [PATCH 5.7 08/20] net: udp: Fix wrong clean up for IS_UDPLITE macro

From: Miaohe Lin linmiaohe@huawei.com

[ Upstream commit b0a422772fec29811e293c7c0e6f991c0fd9241d ]

We can't use IS_UDPLITE to replace udp_sk->pcflag when UDPLITE_RECV_CC is checked.

Fixes: b2bf1e2659b1 ("[UDP]: Clean up for IS_UDPLITE macro")
Signed-off-by: Miaohe Lin linmiaohe@huawei.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/ipv4/udp.c | 2 +-
 net/ipv6/udp.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2048,7 +2048,7 @@ static int udp_queue_rcv_one_skb(struct
 	/*
 	 * UDP-Lite specific tests, ignored on UDP sockets
 	 */
-	if ((is_udplite & UDPLITE_RECV_CC) && UDP_SKB_CB(skb)->partial_cov) {
+	if ((up->pcflag & UDPLITE_RECV_CC) && UDP_SKB_CB(skb)->partial_cov) {
 
 		/*
 		 * MIB statistics other than incrementing the error count are
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -643,7 +643,7 @@ static int udpv6_queue_rcv_one_skb(struc
 	/*
 	 * UDP-Lite specific tests, ignored on UDP sockets (see net/ipv4/udp.c).
 	 */
-	if ((is_udplite & UDPLITE_RECV_CC) && UDP_SKB_CB(skb)->partial_cov) {
+	if ((up->pcflag & UDPLITE_RECV_CC) && UDP_SKB_CB(skb)->partial_cov) {
 
 		if (up->pcrlen == 0) { /* full coverage was set */
 			net_dbg_ratelimited("UDPLITE6: partial coverage %d while full coverage %d requested\n",
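To illustrate why the two expressions are not interchangeable, here is a small sketch. It rests on two assumptions that the diff itself does not confirm: that `is_udplite` holds a truth value (0 or 1) rather than the flag word, and that the receive-coverage flag is the bit 0x4. `RECV_CC` and both helper names are hypothetical.

```c
#include <stdbool.h>

#define RECV_CC 0x4u	/* assumed value of UDPLITE_RECV_CC */

/* Old test: masks a 0/1 truth value, so the 0x4 bit can never be set. */
static bool recv_cc_from_bool(int is_udplite)
{
	return ((unsigned int)is_udplite & RECV_CC) != 0;
}

/* New test: masks the per-socket flag word. */
static bool recv_cc_from_flags(unsigned int pcflag)
{
	return (pcflag & RECV_CC) != 0;
}
```

Under those assumptions `recv_cc_from_bool(1)` is false even for a UDP-Lite socket with receive coverage configured, which would make the partial-coverage checks unreachable.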

Greg Kroah-Hartman

8:03 a.m.

New subject: [PATCH 5.7 09/20] qrtr: orphan socket in qrtr_release()

From: Cong Wang xiyou.wangcong@gmail.com

[ Upstream commit af9f691f0f5bdd1ade65a7b84927639882d7c3e5 ]

We have to detach sock from socket in qrtr_release(), otherwise skb->sk may still reference this socket when the skb is released in tun->queue. In particular, sk->sk_wq still points to &sock->wq, which leads to a UAF.

Reported-and-tested-by: syzbot+6720d64f31c081c2f708@syzkaller.appspotmail.com
Fixes: 28fb4e59a47d ("net: qrtr: Expose tunneling endpoint to user space")
Cc: Bjorn Andersson bjorn.andersson@linaro.org
Cc: Eric Dumazet eric.dumazet@gmail.com
Signed-off-by: Cong Wang xiyou.wangcong@gmail.com
Reviewed-by: Eric Dumazet edumazet@google.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/qrtr/qrtr.c | 1 +
 1 file changed, 1 insertion(+)

--- a/net/qrtr/qrtr.c
+++ b/net/qrtr/qrtr.c
@@ -1180,6 +1180,7 @@ static int qrtr_release(struct socket *s
 	sk->sk_state_change(sk);
 
 	sock_set_flag(sk, SOCK_DEAD);
+	sock_orphan(sk);
 	sock->sk = NULL;
 
 	if (!sock_flag(sk, SOCK_ZAPPED))

Greg Kroah-Hartman

8:04 a.m.

New subject: [PATCH 5.7 10/20] rtnetlink: Fix memory(net_device) leak when ->newlink fails

From: Weilong Chen chenweilong@huawei.com

[ Upstream commit cebb69754f37d68e1355a5e726fdac317bcda302 ]

When vlan_newlink() calls register_vlan_dev() and that fails, it might return an error with dev->reg_state = NETREG_UNREGISTERED. rtnl_newlink() should free the memory, but currently it only frees devices whose state is NETREG_UNINITIALIZED.

BUG: memory leak
unreferenced object 0xffff8881051de000 (size 4096):
  comm "syz-executor139", pid 560, jiffies 4294745346 (age 32.445s)
  hex dump (first 32 bytes):
    76 6c 61 6e 32 00 00 00 00 00 00 00 00 00 00 00  vlan2...........
    00 45 28 03 81 88 ff ff 00 00 00 00 00 00 00 00  .E(.............
  backtrace:
    [<0000000047527e31>] kmalloc_node include/linux/slab.h:578 [inline]
    [<0000000047527e31>] kvmalloc_node+0x33/0xd0 mm/util.c:574
    [<000000002b59e3bc>] kvmalloc include/linux/mm.h:753 [inline]
    [<000000002b59e3bc>] kvzalloc include/linux/mm.h:761 [inline]
    [<000000002b59e3bc>] alloc_netdev_mqs+0x83/0xd90 net/core/dev.c:9929
    [<000000006076752a>] rtnl_create_link+0x2c0/0xa20 net/core/rtnetlink.c:3067
    [<00000000572b3be5>] __rtnl_newlink+0xc9c/0x1330 net/core/rtnetlink.c:3329
    [<00000000e84ea553>] rtnl_newlink+0x66/0x90 net/core/rtnetlink.c:3397
    [<0000000052c7c0a9>] rtnetlink_rcv_msg+0x540/0x990 net/core/rtnetlink.c:5460
    [<000000004b5cb379>] netlink_rcv_skb+0x12b/0x3a0 net/netlink/af_netlink.c:2469
    [<00000000c71c20d3>] netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
    [<00000000c71c20d3>] netlink_unicast+0x4c6/0x690 net/netlink/af_netlink.c:1329
    [<00000000cca72fa9>] netlink_sendmsg+0x735/0xcc0 net/netlink/af_netlink.c:1918
    [<000000009221ebf7>] sock_sendmsg_nosec net/socket.c:652 [inline]
    [<000000009221ebf7>] sock_sendmsg+0x109/0x140 net/socket.c:672
    [<000000001c30ffe4>] ____sys_sendmsg+0x5f5/0x780 net/socket.c:2352
    [<00000000b71ca6f3>] ___sys_sendmsg+0x11d/0x1a0 net/socket.c:2406
    [<0000000007297384>] __sys_sendmsg+0xeb/0x1b0 net/socket.c:2439
    [<000000000eb29b11>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
    [<000000006839b4d0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: cb626bf566eb ("net-sysfs: Fix reference count leak")
Reported-by: Hulk Robot hulkci@huawei.com
Signed-off-by: Weilong Chen chenweilong@huawei.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/core/rtnetlink.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3337,7 +3337,8 @@ replay:
 	 */
 	if (err < 0) {
 		/* If device is not registered at all, free it now */
-		if (dev->reg_state == NETREG_UNINITIALIZED)
+		if (dev->reg_state == NETREG_UNINITIALIZED ||
+		    dev->reg_state == NETREG_UNREGISTERED)
 			free_netdev(dev);
 		goto out;
 	}

Greg Kroah-Hartman

8:04 a.m.

New subject: [PATCH 5.7 11/20] rxrpc: Fix sendmsg() returning EPIPE due to recvmsg() returning ENODATA

From: David Howells dhowells@redhat.com

[ Upstream commit 639f181f0ee20d3249dbc55f740f0167267180f0 ]

rxrpc_sendmsg() returns EPIPE if there's an outstanding error, such as rxrpc_recvmsg() indicating ENODATA when there's nothing for it to read.

Change rxrpc_recvmsg() to return EAGAIN instead if there's nothing to read as this particular error doesn't get stored in ->sk_err by the networking core.

Also change rxrpc_sendmsg() so that it doesn't fail with delayed receive errors (there's no way for it to report which call, if any, the error was caused by).

Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells dhowells@redhat.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/rxrpc/recvmsg.c | 2 +-
 net/rxrpc/sendmsg.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -464,7 +464,7 @@ try_again:
 	    list_empty(&rx->recvmsg_q) &&
 	    rx->sk.sk_state != RXRPC_SERVER_LISTENING) {
 		release_sock(&rx->sk);
-		return -ENODATA;
+		return -EAGAIN;
 	}
 
 	if (list_empty(&rx->recvmsg_q)) {
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -306,7 +306,7 @@ static int rxrpc_send_data(struct rxrpc_
 	/* this should be in poll */
 	sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
 
-	if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
+	if (sk->sk_shutdown & SEND_SHUTDOWN)
 		return -EPIPE;
 
 	more = msg->msg_flags & MSG_MORE;

Greg Kroah-Hartman

8:04 a.m.

New subject: [PATCH 5.7 12/20] tcp: allow at most one TLP probe per flight

From: Yuchung Cheng ycheng@google.com

[ Upstream commit 76be93fc0702322179bb0ea87295d820ee46ad14 ]

Previously TLP could send multiple probes of new data in one flight. This happens when the sender is cwnd limited. After the initial TLP containing new data is sent, the sender receives another ACK that acks part of the data in flight. It may then re-arm another TLP timer to send more if no further ACK returns before the next TLP timeout (PTO) expires. In theory the sender may send a large number of TLPs until the send queue is depleted. This only happens if the sender sees such an irregular, uncommon ACK pattern, but it is generally undesirable behavior, especially during congestion.

The original TLP design restricts the sender to one TLP probe per inflight, as published in "Reducing Web Latency: the Virtue of Gentle Aggression", SIGCOMM 2013. This patch changes TLP to send at most one probe per inflight.

Note that if the sender is app-limited, TLP retransmits old data and does not have this issue.

Signed-off-by: Yuchung Cheng ycheng@google.com
Signed-off-by: Neal Cardwell ncardwell@google.com
Signed-off-by: Eric Dumazet edumazet@google.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/linux/tcp.h | 4 +++-
 net/ipv4/tcp_input.c | 11 ++++++-----
 net/ipv4/tcp_output.c | 13 ++++++++-----
 3 files changed, 17 insertions(+), 11 deletions(-)

--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -217,6 +217,8 @@ struct tcp_sock {
 	} rack;
 	u16	advmss;		/* Advertised MSS */
 	u8	compressed_ack;
+	u8	tlp_retrans:1,	/* TLP is a retransmission */
+		unused:7;
 	u32	chrono_start;	/* Start time in jiffies of a TCP chrono */
 	u32	chrono_stat[3];	/* Time in jiffies for chrono_stat stats */
 	u8	chrono_type:2,	/* current chronograph type */
@@ -239,7 +241,7 @@ struct tcp_sock {
 		save_syn:1,	/* Save headers of SYN packet */
 		is_cwnd_limited:1,/* forward progress limited by snd_cwnd? */
 		syn_smc:1;	/* SYN includes SMC */
-	u32	tlp_high_seq;	/* snd_nxt at the time of TLP retransmit. */
+	u32	tlp_high_seq;	/* snd_nxt at the time of TLP */
 
 	u32	tcp_tx_delay;	/* delay (in usec) added to TX packets */
 	u64	tcp_wstamp_ns;	/* departure time for next sent data packet */
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3506,10 +3506,8 @@ static void tcp_replace_ts_recent(struct
 	}
 }
 
-/* This routine deals with acks during a TLP episode.
- * We mark the end of a TLP episode on receiving TLP dupack or when
- * ack is after tlp_high_seq.
- * Ref: loss detection algorithm in draft-dukkipati-tcpm-tcp-loss-probe.
+/* This routine deals with acks during a TLP episode and ends an episode by
+ * resetting tlp_high_seq. Ref: TLP algorithm in draft-ietf-tcpm-rack
  */
 static void tcp_process_tlp_ack(struct sock *sk, u32 ack, int flag)
 {
@@ -3518,7 +3516,10 @@ static void tcp_process_tlp_ack(struct s
 	if (before(ack, tp->tlp_high_seq))
 		return;
 
-	if (flag & FLAG_DSACKING_ACK) {
+	if (!tp->tlp_retrans) {
+		/* TLP of new data has been acknowledged */
+		tp->tlp_high_seq = 0;
+	} else if (flag & FLAG_DSACKING_ACK) {
 		/* This DSACK means original and TLP probe arrived; no loss */
 		tp->tlp_high_seq = 0;
 	} else if (after(ack, tp->tlp_high_seq)) {
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2625,6 +2625,11 @@ void tcp_send_loss_probe(struct sock *sk
 	int pcount;
 	int mss = tcp_current_mss(sk);
 
+	/* At most one outstanding TLP */
+	if (tp->tlp_high_seq)
+		goto rearm_timer;
+
+	tp->tlp_retrans = 0;
 	skb = tcp_send_head(sk);
 	if (skb && tcp_snd_wnd_test(tp, skb, mss)) {
 		pcount = tp->packets_out;
@@ -2642,10 +2647,6 @@ void tcp_send_loss_probe(struct sock *sk
 		return;
 	}
 
-	/* At most one outstanding TLP retransmission. */
-	if (tp->tlp_high_seq)
-		goto rearm_timer;
-
 	if (skb_still_in_host_queue(sk, skb))
 		goto rearm_timer;
 
@@ -2667,10 +2668,12 @@ void tcp_send_loss_probe(struct sock *sk
 	if (__tcp_retransmit_skb(sk, skb, 1))
 		goto rearm_timer;
 
+	tp->tlp_retrans = 1;
+
+probe_sent:
 	/* Record snd_nxt for loss detection. */
 	tp->tlp_high_seq = tp->snd_nxt;
 
-probe_sent:
 	NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPLOSSPROBES);
 	/* Reset s.t. tcp_rearm_rto will restart timer from now */
 	inet_csk(sk)->icsk_pending = 0;
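The "at most one probe per flight" rule can be modelled with a few lines of userspace C. This is a simplified sketch, not the kernel logic: `struct tlp_state`, `send_probe()` and `process_ack()` are hypothetical, DSACK and dupack handling for retransmitted probes is left out, and sequence numbers are assumed to start above zero so that a nonzero `tlp_high_seq` can mean "probe outstanding".

```c
#include <stdbool.h>
#include <stdint.h>

struct tlp_state {
	uint32_t snd_nxt;	/* next sequence number to send */
	uint32_t tlp_high_seq;	/* nonzero while a probe is outstanding */
	bool tlp_retrans;	/* outstanding probe was a retransmission */
};

/* Try to send one probe. Returns false if one is already outstanding. */
static bool send_probe(struct tlp_state *s, bool have_new_data, uint32_t seg_len)
{
	if (s->tlp_high_seq)
		return false;		/* at most one outstanding TLP */

	s->tlp_retrans = !have_new_data;
	if (have_new_data)
		s->snd_nxt += seg_len;	/* probe carries new data */
	s->tlp_high_seq = s->snd_nxt;	/* recorded for both probe kinds */
	return true;
}

/* End the episode once the ACK covers the probe. */
static void process_ack(struct tlp_state *s, uint32_t ack)
{
	if (!s->tlp_high_seq)
		return;
	if ((int32_t)(ack - s->tlp_high_seq) < 0)
		return;			/* ack is before tlp_high_seq */
	if (!s->tlp_retrans)
		s->tlp_high_seq = 0;	/* new-data probe acknowledged */
	else if ((int32_t)(ack - s->tlp_high_seq) > 0)
		s->tlp_high_seq = 0;	/* simplification: DSACK case omitted */
}
```

In this model a second call to `send_probe()` is refused until an ACK at or beyond `tlp_high_seq` arrives, which matches the behavior the commit message describes for probes of new data.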

Greg Kroah-Hartman

8:04 a.m.

New subject: [PATCH 5.7 13/20] AX.25: Prevent integer overflows in connect and sendmsg

From: Dan Carpenter dan.carpenter@oracle.com

[ Upstream commit 17ad73e941b71f3bec7523ea4e9cbc3752461c2d ]

We recently added some bounds checking in ax25_connect() and ax25_sendmsg(), and so we removed the AX25_MAX_DIGIS checks because they were no longer required.

Unfortunately, I believe they are required to prevent integer overflows so I have added them back.

Fixes: 8885bb0621f0 ("AX.25: Prevent out-of-bounds read in ax25_sendmsg()")
Fixes: 2f2a7ffad5c6 ("AX.25: Fix out-of-bounds read in ax25_connect()")
Signed-off-by: Dan Carpenter dan.carpenter@oracle.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/ax25/af_ax25.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1188,6 +1188,7 @@ static int __must_check ax25_connect(str
 	    fsa->fsa_ax25.sax25_ndigis != 0) {
 		/* Valid number of digipeaters ? */
 		if (fsa->fsa_ax25.sax25_ndigis < 1 ||
+		    fsa->fsa_ax25.sax25_ndigis > AX25_MAX_DIGIS ||
 		    addr_len < sizeof(struct sockaddr_ax25) +
 		    sizeof(ax25_address) * fsa->fsa_ax25.sax25_ndigis) {
 			err = -EINVAL;
@@ -1509,7 +1510,9 @@ static int ax25_sendmsg(struct socket *s
 			struct full_sockaddr_ax25 *fsa = (struct full_sockaddr_ax25 *)usax;
 
 			/* Valid number of digipeaters ? */
-			if (usax->sax25_ndigis < 1 || addr_len < sizeof(struct sockaddr_ax25) +
+			if (usax->sax25_ndigis < 1 ||
+			    usax->sax25_ndigis > AX25_MAX_DIGIS ||
+			    addr_len < sizeof(struct sockaddr_ax25) +
 			    sizeof(ax25_address) * usax->sax25_ndigis) {
 				err = -EINVAL;
 				goto out;
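The combined check that results from patches 01, 02 and this one can be sketched in userspace as below. The three constants are stand-ins chosen for illustration (the kernel uses sizeof(struct sockaddr_ax25), sizeof(ax25_address) and AX25_MAX_DIGIS), and `ndigis_is_valid()` is a hypothetical helper, not a kernel function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in values for illustration only. */
#define BASE_ADDR_LEN	16u	/* stand-in for sizeof(struct sockaddr_ax25) */
#define DIGI_ADDR_LEN	7u	/* stand-in for sizeof(ax25_address) */
#define MAX_DIGIS	8	/* stand-in for AX25_MAX_DIGIS */

/* Returns true when addr_len is large enough to hold ndigis digipeaters. */
static bool ndigis_is_valid(int ndigis, size_t addr_len)
{
	/*
	 * Range check first: it keeps the multiplication below small and
	 * non-negative, so the size computation cannot wrap.
	 */
	if (ndigis < 1 || ndigis > MAX_DIGIS)
		return false;

	/* Length check: every digipeater address must lie inside addr_len. */
	return addr_len >= BASE_ADDR_LEN + DIGI_ADDR_LEN * (size_t)ndigis;
}
```

The order matters in the sketch for the same reason the upper bound was restored here: without it, a very large or negative count feeds straight into the size arithmetic.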

Greg Kroah-Hartman

8:04 a.m.

New subject: [PATCH 5.7 14/20] sctp: shrink stream outq only when new outcnt < old outcnt

From: Xin Long lucien.xin@gmail.com

[ Upstream commit 8f13399db22f909a35735bf8ae2f932e0c8f0e30 ]

It's not necessary to go list_for_each for outq->out_chunk_list when new outcnt >= old outcnt, as no chunk with higher sid than new (outcnt - 1) exists in the outqueue.

While at it, also move the list_for_each code into a new function sctp_stream_shrink_out(), which will be used in the next patch.

Signed-off-by: Xin Long lucien.xin@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/sctp/stream.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

--- a/net/sctp/stream.c
+++ b/net/sctp/stream.c
@@ -22,17 +22,11 @@
 #include <net/sctp/sm.h>
 #include <net/sctp/stream_sched.h>
 
-/* Migrates chunks from stream queues to new stream queues if needed,
- * but not across associations. Also, removes those chunks to streams
- * higher than the new max.
- */
-static void sctp_stream_outq_migrate(struct sctp_stream *stream,
-				     struct sctp_stream *new, __u16 outcnt)
+static void sctp_stream_shrink_out(struct sctp_stream *stream, __u16 outcnt)
 {
 	struct sctp_association *asoc;
 	struct sctp_chunk *ch, *temp;
 	struct sctp_outq *outq;
-	int i;
 
 	asoc = container_of(stream, struct sctp_association, stream);
 	outq = &asoc->outqueue;
@@ -56,6 +50,19 @@ static void sctp_stream_outq_migrate(str
 
 		sctp_chunk_free(ch);
 	}
+}
+
+/* Migrates chunks from stream queues to new stream queues if needed,
+ * but not across associations. Also, removes those chunks to streams
+ * higher than the new max.
+ */
+static void sctp_stream_outq_migrate(struct sctp_stream *stream,
+				     struct sctp_stream *new, __u16 outcnt)
+{
+	int i;
+
+	if (stream->outcnt > outcnt)
+		sctp_stream_shrink_out(stream, outcnt);
 
 	if (new) {
 		/* Here we actually move the old ext stuff into the new
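As a rough model of the refactoring, the sketch below treats the out queue as a plain array of stream ids. `shrink_out()` and `migrate()` are hypothetical stand-ins for sctp_stream_shrink_out() and the guard added to sctp_stream_outq_migrate(); the real code walks outq->out_chunk_list and frees chunks, which is not modelled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Drops queued stream ids >= new_outcnt in place; returns how many remain. */
static size_t shrink_out(uint16_t *sids, size_t n, uint16_t new_outcnt)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++)
		if (sids[i] < new_outcnt)
			sids[kept++] = sids[i];
	return kept;
}

/* Only walk the queue when the stream count actually shrinks. */
static size_t migrate(uint16_t *sids, size_t n,
		      uint16_t old_outcnt, uint16_t new_outcnt)
{
	if (old_outcnt > new_outcnt)
		return shrink_out(sids, n, new_outcnt);
	return n;	/* nothing above the new max can be queued */
}
```

When the count grows or stays the same, the walk is skipped entirely, which is the saving the commit message points out.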

Greg Kroah-Hartman

8:04 a.m.

New subject: [PATCH 5.7 15/20] sctp: shrink stream outq when fails to do addstream reconf

From: Xin Long lucien.xin@gmail.com

[ Upstream commit 3ecdda3e9ad837cf9cb41b6faa11b1af3a5abc0c ]

When adding a stream with stream reconf, the new stream is initially in the CLOSED state, but new out chunks can still be enqueued. Once the confirmation from the peer arrives, the state changes to OPEN.

However, if the peer denies the request, the stream needs to be rolled back. When doing that, only the stream outcnt is set back, and the chunks already in the new stream are not purged. As a result, these chunks can still be dequeued in sctp_outq_dequeue_data().

As its stream is still CLOSED, the chunk will be enqueued to the head again by sctp_outq_head_data(). This chunk will never be sent out, and the chunks after it can never be dequeued. The assoc will be 'hung' in a dead loop of sending this chunk.

To fix it, this patch purges the chunks already in the new stream by calling sctp_stream_shrink_out() when the addstream reconf fails.

Fixes: 11ae76e67a17 ("sctp: implement receiver-side procedures for the Reconf Response Parameter")
Reported-by: Ying Xu yinxu@redhat.com
Signed-off-by: Xin Long lucien.xin@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/sctp/stream.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/net/sctp/stream.c
+++ b/net/sctp/stream.c
@@ -1044,11 +1044,13 @@ struct sctp_chunk *sctp_process_strreset
 		nums = ntohs(addstrm->number_of_streams);
 		number = stream->outcnt - nums;

-		if (result == SCTP_STRRESET_PERFORMED)
+		if (result == SCTP_STRRESET_PERFORMED) {
 			for (i = number; i < stream->outcnt; i++)
 				SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
-		else
+		} else {
+			sctp_stream_shrink_out(stream, number);
 			stream->outcnt = number;
+		}

 		*evp = sctp_ulpevent_make_stream_change_event(asoc, flags,
 			0, nums, GFP_ATOMIC);

Greg Kroah-Hartman

8:04 a.m.

New subject: [PATCH 5.7 16/20] udp: Copy has_conns in reuseport_grow().

From: Kuniyuki Iwashima kuniyu@amazon.co.jp

[ Upstream commit f2b2c55e512879a05456eaf5de4d1ed2f7757509 ]

If an unconnected socket in a UDP reuseport group connect()s, has_conns is set to 1. Then, when a packet is received, udp[46]_lib_lookup2() scans all sockets in udp_hslot looking for the connected socket with the highest score.

However, when the number of sockets bound to the port exceeds max_socks, reuseport_grow() resets has_conns to 0. This can cause udp[46]_lib_lookup2() to return without scanning all sockets, so packets sent to connected sockets may be delivered to unconnected sockets.

Therefore, reuseport_grow() should copy has_conns.
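As a rough userspace illustration of why the flag has to be carried across the reallocation, the sketch below grows a group by allocating a larger one and copying its state. All names here (`struct group`, `group_alloc()`, `group_grow()`, `group_free()`) are hypothetical and much simpler than struct sock_reuseport.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model of a reuseport group. */
struct group {
	unsigned int max_socks;
	unsigned int num_socks;
	unsigned int has_conns;	/* set once any member socket connects */
	int *socks;		/* stand-in for the socket pointers */
};

static struct group *group_alloc(unsigned int max_socks)
{
	struct group *g;

	assert(max_socks > 0);
	g = calloc(1, sizeof(*g));
	if (!g)
		return NULL;
	g->socks = calloc(max_socks, sizeof(*g->socks));
	if (!g->socks) {
		free(g);
		return NULL;
	}
	g->max_socks = max_socks;
	return g;
}

static void group_free(struct group *g)
{
	if (g) {
		free(g->socks);
		free(g);
	}
}

/* Double the capacity. Every per-group field must be copied by hand;
 * a field left out (as has_conns was) silently resets to 0.
 * On success the old group is freed and the new one returned.
 */
static struct group *group_grow(struct group *old)
{
	struct group *more = group_alloc(old->max_socks * 2);

	if (!more)
		return NULL;
	more->num_socks = old->num_socks;
	more->has_conns = old->has_conns;
	memcpy(more->socks, old->socks,
	       old->num_socks * sizeof(*old->socks));
	group_free(old);
	return more;
}
```

Because the new group starts zeroed, dropping the `has_conns` assignment in `group_grow()` would reproduce the behaviour the commit message describes: the grown group would look as if it had no connected sockets.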

Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
CC: Willem de Bruijn willemb@google.com
Reviewed-by: Benjamin Herrenschmidt benh@amazon.com
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.co.jp
Acked-by: Willem de Bruijn willemb@google.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/core/sock_reuseport.c | 1 +
 1 file changed, 1 insertion(+)

--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -101,6 +101,7 @@ static struct sock_reuseport *reuseport_
 	more_reuse->prog = reuse->prog;
 	more_reuse->reuseport_id = reuse->reuseport_id;
 	more_reuse->bind_inany = reuse->bind_inany;
+	more_reuse->has_conns = reuse->has_conns;

 	memcpy(more_reuse->socks, reuse->socks,
 	       reuse->num_socks * sizeof(struct sock *));

Greg Kroah-Hartman

8:04 a.m.

New subject: [PATCH 5.7 17/20] udp: Improve load balancing for SO_REUSEPORT.

From: Kuniyuki Iwashima kuniyu@amazon.co.jp

[ Upstream commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace ]

Currently, SO_REUSEPORT does not work well if connected sockets are in a UDP reuseport group.

In that case reuseport_has_conns() returns true and the result of reuseport_select_sock() is discarded. Also, unconnected sockets all have the same score, so the first unconnected socket in udp_hslot always receives all packets sent to unconnected sockets.

So, the result of reuseport_select_sock() should be used for load balancing.

The noteworthy point is that the unconnected sockets placed after connected sockets in sock_reuseport.socks will receive more packets than others because of the algorithm in reuseport_select_sock().

  index | connected | reciprocal_scale | result
  ---------------------------------------------
  0     | no        | 20%              | 40%
  1     | no        | 20%              | 20%
  2     | yes       | 20%              | 0%
  3     | no        | 20%              | 40%
  4     | yes       | 20%              | 0%

If most of the sockets are connected, this can be a problem, but it still works better than now.
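The skew in the example distribution can be reproduced with a small model. It assumes that the hash picks each slot with equal probability and that the selection then walks forward (wrapping around) to the next unconnected socket, which is consistent with the numbers quoted in this message; `reuseport_hits()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* For each possible starting slot (one per socket, equally likely),
 * walk forward to the next unconnected socket and count a hit there.
 * connected[i] is non-zero if socket i is connected.
 * hits[i] ends up as the number of starting slots that land on i.
 */
static void reuseport_hits(const int *connected, size_t n,
			   unsigned int *hits)
{
	size_t start, i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		hits[i] = 0;

	for (start = 0; start < n; start++) {
		i = start;
		while (connected[i]) {
			i++;
			if (i >= n)
				i = 0;
			if (i == start)
				break;	/* every socket is connected */
		}
		if (!connected[i])
			hits[i]++;
	}
}
```

For the five-socket layout in the example (sockets 2 and 4 connected), the starting slots 0..4 land on sockets 0, 1, 3, 3 and 0, giving hit counts of 2, 1, 0, 2, 0 out of 5, which corresponds to the 40%, 20%, 0%, 40%, 0% split.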

Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
CC: Willem de Bruijn willemb@google.com
Reviewed-by: Benjamin Herrenschmidt benh@amazon.com
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.co.jp
Acked-by: Willem de Bruijn willemb@google.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/ipv4/udp.c | 15 +++++++++------
 net/ipv6/udp.c | 15 +++++++++------
 2 files changed, 18 insertions(+), 12 deletions(-)

--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -413,7 +413,7 @@ static struct sock *udp4_lib_lookup2(str
 				     struct udp_hslot *hslot2,
 				     struct sk_buff *skb)
 {
-	struct sock *sk, *result;
+	struct sock *sk, *result, *reuseport_result;
 	int score, badness;
 	u32 hash = 0;

@@ -423,17 +423,20 @@ static struct sock *udp4_lib_lookup2(str
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
+			reuseport_result = NULL;
+
 			if (sk->sk_reuseport &&
 			    sk->sk_state != TCP_ESTABLISHED) {
 				hash = udp_ehashfn(net, daddr, hnum,
 						   saddr, sport);
-				result = reuseport_select_sock(sk, hash, skb,
-							sizeof(struct udphdr));
-				if (result && !reuseport_has_conns(sk, false))
-					return result;
+				reuseport_result = reuseport_select_sock(sk, hash, skb,
+									 sizeof(struct udphdr));
+				if (reuseport_result && !reuseport_has_conns(sk, false))
+					return reuseport_result;
 			}
+
+			result = reuseport_result ? : sk;
 			badness = score;
-			result = sk;
 		}
 	}
 	return result;
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -148,7 +148,7 @@ static struct sock *udp6_lib_lookup2(str
 		int dif, int sdif, struct udp_hslot *hslot2,
 		struct sk_buff *skb)
 {
-	struct sock *sk, *result;
+	struct sock *sk, *result, *reuseport_result;
 	int score, badness;
 	u32 hash = 0;

@@ -158,17 +158,20 @@ static struct sock *udp6_lib_lookup2(str
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
+			reuseport_result = NULL;
+
 			if (sk->sk_reuseport &&
 			    sk->sk_state != TCP_ESTABLISHED) {
 				hash = udp6_ehashfn(net, daddr, hnum,
 						    saddr, sport);

-				result = reuseport_select_sock(sk, hash, skb,
-							sizeof(struct udphdr));
-				if (result && !reuseport_has_conns(sk, false))
-					return result;
+				reuseport_result = reuseport_select_sock(sk, hash, skb,
+									 sizeof(struct udphdr));
+				if (reuseport_result && !reuseport_has_conns(sk, false))
+					return reuseport_result;
 			}
-			result = sk;
+
+			result = reuseport_result ? : sk;
 			badness = score;
 		}
 	}

Greg Kroah-Hartman

8:04 a.m.

New subject: [PATCH 5.7 18/20] tipc: allow to build NACK message in link timeout function

From: Tung Nguyen tung.q.nguyen@dektech.com.au

[ Upstream commit 6ef9dcb78046b346b5508ca1659848b136a343c2 ]

Commit 02288248b051 ("tipc: eliminate gap indicator from ACK messages") stopped sending the 'gap' indicator in regular ACK messages and only allowed a NACK message to be built when probe/probe_reply is enabled. However, the corresponding correction for building NACK messages was missed in the tipc_link_timeout() function. This leads to significant delay and link resets (due to retransmission failure) in a lossy environment.

This commit fixes it by setting the 'probe' flag to 'true' when the receive deferred queue is not empty. As a result, a NACK message is built and sent back to the peer.

Fixes: 02288248b051 ("tipc: eliminate gap indicator from ACK messages")
Acked-by: Jon Maloy jmaloy@redhat.com
Signed-off-by: Tung Nguyen tung.q.nguyen@dektech.com.au
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/tipc/link.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -813,11 +813,11 @@ int tipc_link_timeout(struct tipc_link *
 		state |= l->bc_rcvlink->rcv_unacked;
 		state |= l->rcv_unacked;
 		state |= !skb_queue_empty(&l->transmq);
-		state |= !skb_queue_empty(&l->deferdq);
 		probe = mstate->probing;
 		probe |= l->silent_intv_cnt;
 		if (probe || mstate->monitoring)
 			l->silent_intv_cnt++;
+		probe |= !skb_queue_empty(&l->deferdq);
 		if (l->snd_nxt == l->checkpoint) {
 			tipc_link_update_cwin(l, 0, 0);
 			probe = true;

Greg Kroah-Hartman

8:04 a.m.

New subject: [PATCH 5.7 19/20] io_uring: ensure double poll additions work with both request types

From: Jens Axboe axboe@kernel.dk

commit 807abcb0883439af5ead73f3308310453b97b624 upstream.

The double poll additions were centered around doing POLL_ADD on file descriptors that use more than one waitqueue (typically one for read, one for write) when being polled. However, it can also end up being triggered for when we use poll triggered retry. For that case, we cannot safely use req->io, as that could be used by the request type itself.

Add a second io_poll_iocb pointer in the structure we allocate for poll based retry, and ensure we use the right one from the two paths.

Fixes: 18bceab101ad ("io_uring: allow POLL_ADD with double poll_wait() users")
Signed-off-by: Jens Axboe axboe@kernel.dk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 fs/io_uring.c | 47 ++++++++++++++++++++++++++---------------------
 1 file changed, 26 insertions(+), 21 deletions(-)

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -581,6 +581,7 @@ enum {

 struct async_poll {
 	struct io_poll_iocb	poll;
+	struct io_poll_iocb	*double_poll;
 	struct io_wq_work	work;
 };

@@ -4220,9 +4221,9 @@ static bool io_poll_rewait(struct io_kio
 	return false;
 }

-static void io_poll_remove_double(struct io_kiocb *req)
+static void io_poll_remove_double(struct io_kiocb *req, void *data)
 {
-	struct io_poll_iocb *poll = (struct io_poll_iocb *) req->io;
+	struct io_poll_iocb *poll = data;

 	lockdep_assert_held(&req->ctx->completion_lock);

@@ -4242,7 +4243,7 @@ static void io_poll_complete(struct io_k
 {
 	struct io_ring_ctx *ctx = req->ctx;

-	io_poll_remove_double(req);
+	io_poll_remove_double(req, req->io);
 	req->poll.done = true;
 	io_cqring_fill_event(req, error ? error : mangle_poll(mask));
 	io_commit_cqring(ctx);
@@ -4285,21 +4286,21 @@ static int io_poll_double_wake(struct wa
 			       int sync, void *key)
 {
 	struct io_kiocb *req = wait->private;
-	struct io_poll_iocb *poll = (struct io_poll_iocb *) req->io;
+	struct io_poll_iocb *poll = req->apoll->double_poll;
 	__poll_t mask = key_to_poll(key);

 	/* for instances that support it check for an event match first: */
 	if (mask && !(mask & poll->events))
 		return 0;

-	if (req->poll.head) {
+	if (poll && poll->head) {
 		bool done;

-		spin_lock(&req->poll.head->lock);
-		done = list_empty(&req->poll.wait.entry);
+		spin_lock(&poll->head->lock);
+		done = list_empty(&poll->wait.entry);
 		if (!done)
-			list_del_init(&req->poll.wait.entry);
-		spin_unlock(&req->poll.head->lock);
+			list_del_init(&poll->wait.entry);
+		spin_unlock(&poll->head->lock);
 		if (!done)
 			__io_async_wake(req, poll, mask, io_poll_task_func);
 	}
@@ -4319,7 +4320,8 @@ static void io_init_poll_iocb(struct io_
 }

 static void __io_queue_proc(struct io_poll_iocb *poll, struct io_poll_table *pt,
-			    struct wait_queue_head *head)
+			    struct wait_queue_head *head,
+			    struct io_poll_iocb **poll_ptr)
 {
 	struct io_kiocb *req = pt->req;

@@ -4330,7 +4332,7 @@ static void __io_queue_proc(struct io_po
 	 */
 	if (unlikely(poll->head)) {
 		/* already have a 2nd entry, fail a third attempt */
-		if (req->io) {
+		if (*poll_ptr) {
 			pt->error = -EINVAL;
 			return;
 		}
@@ -4342,7 +4344,7 @@ static void __io_queue_proc(struct io_po
 		io_init_poll_iocb(poll, req->poll.events, io_poll_double_wake);
 		refcount_inc(&req->refs);
 		poll->wait.private = req;
-		req->io = (void *) poll;
+		*poll_ptr = poll;
 	}

 	pt->error = 0;
@@ -4354,8 +4356,9 @@ static void io_async_queue_proc(struct f
 			       struct poll_table_struct *p)
 {
 	struct io_poll_table *pt = container_of(p, struct io_poll_table, pt);
+	struct async_poll *apoll = pt->req->apoll;

-	__io_queue_proc(&pt->req->apoll->poll, pt, head);
+	__io_queue_proc(&apoll->poll, pt, head, &apoll->double_poll);
 }

 static void io_sq_thread_drop_mm(struct io_ring_ctx *ctx)
@@ -4409,6 +4412,7 @@ static void io_async_task_func(struct ca
 	memcpy(&req->work, &apoll->work, sizeof(req->work));

 	if (canceled) {
+		kfree(apoll->double_poll);
 		kfree(apoll);
 		io_cqring_ev_posted(ctx);
 end_req:
@@ -4426,6 +4430,7 @@ end_req:
 	__io_queue_sqe(req, NULL);
 	mutex_unlock(&ctx->uring_lock);

+	kfree(apoll->double_poll);
 	kfree(apoll);
 }

@@ -4497,7 +4502,6 @@ static bool io_arm_poll_handler(struct i
 	struct async_poll *apoll;
 	struct io_poll_table ipt;
 	__poll_t mask, ret;
-	bool had_io;

 	if (!req->file || !file_can_poll(req->file))
 		return false;
@@ -4509,10 +4513,10 @@ static bool io_arm_poll_handler(struct i
 	apoll = kmalloc(sizeof(*apoll), GFP_ATOMIC);
 	if (unlikely(!apoll))
 		return false;
+	apoll->double_poll = NULL;

 	req->flags |= REQ_F_POLLED;
 	memcpy(&apoll->work, &req->work, sizeof(req->work));
-	had_io = req->io != NULL;

 	get_task_struct(current);
 	req->task = current;
@@ -4531,12 +4535,10 @@ static bool io_arm_poll_handler(struct i
 	ret = __io_arm_poll_handler(req, &apoll->poll, &ipt, mask,
 					io_async_wake);
 	if (ret) {
-		ipt.error = 0;
-		/* only remove double add if we did it here */
-		if (!had_io)
-			io_poll_remove_double(req);
+		io_poll_remove_double(req, apoll->double_poll);
 		spin_unlock_irq(&ctx->completion_lock);
 		memcpy(&req->work, &apoll->work, sizeof(req->work));
+		kfree(apoll->double_poll);
 		kfree(apoll);
 		return false;
 	}
@@ -4567,11 +4569,13 @@ static bool io_poll_remove_one(struct io
 	bool do_complete;

 	if (req->opcode == IORING_OP_POLL_ADD) {
-		io_poll_remove_double(req);
+		io_poll_remove_double(req, req->io);
 		do_complete = __io_poll_remove_one(req, &req->poll);
 	} else {
 		struct async_poll *apoll = req->apoll;

+		io_poll_remove_double(req, apoll->double_poll);
+
 		/* non-poll requests have submit ref still */
 		do_complete = __io_poll_remove_one(req, &apoll->poll);
 		if (do_complete) {
@@ -4582,6 +4586,7 @@ static bool io_poll_remove_one(struct io
 			 * final reference.
 			 */
 			memcpy(&req->work, &apoll->work, sizeof(req->work));
+			kfree(apoll->double_poll);
 			kfree(apoll);
 		}
 	}
@@ -4682,7 +4687,7 @@ static void io_poll_queue_proc(struct fi
 {
 	struct io_poll_table *pt = container_of(p, struct io_poll_table, pt);

-	__io_queue_proc(&pt->req->poll, pt, head);
+	__io_queue_proc(&pt->req->poll, pt, head, (struct io_poll_iocb **) &pt->req->io);
 }

 static int io_poll_add_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)

Greg Kroah-Hartman

8:04 a.m.

New subject: [PATCH 5.7 20/20] regmap: debugfs: check count when read regmap file

From: Peng Fan peng.fan@nxp.com

commit 74edd08a4fbf51d65fd8f4c7d8289cd0f392bd91 upstream.

When executing the following command, we hit a kernel dump:

dmesg -c > /dev/null; cd /sys;
for i in `ls /sys/kernel/debug/regmap/* -d`; do
	echo "Checking regmap in $i";
	cat $i/registers;
done && grep -ri "0x02d0" *;

It is because the count value is too big, and kmalloc fails. So add an upper bound check to allow max size `PAGE_SIZE << (MAX_ORDER - 1)`.
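The bound can be illustrated with a small standalone sketch. The constants below are assumed example values (a 4 KiB page and MAX_ORDER of 11, giving a 4 MiB limit); the real PAGE_SIZE and MAX_ORDER depend on the architecture and kernel configuration, and `clamp_read_count()` is a hypothetical helper rather than a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed example values only; not taken from any particular config. */
#define EX_PAGE_SIZE	4096UL
#define EX_MAX_ORDER	11

/* Clamp a requested read size to the largest buffer the page
 * allocator could satisfy in one piece under the assumed values:
 * EX_PAGE_SIZE << (EX_MAX_ORDER - 1).
 */
static size_t clamp_read_count(size_t count)
{
	const size_t limit = EX_PAGE_SIZE << (EX_MAX_ORDER - 1);

	assert(count > 0);	/* a zero count is rejected before this point */
	return count > limit ? limit : count;
}
```

Under these assumed values a 100-byte request passes through unchanged, while a 1 GiB request is reduced to 4194304 bytes before the buffer is allocated, so the read returns less data per call instead of failing the allocation.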

Signed-off-by: Peng Fan peng.fan@nxp.com
Link: https://lore.kernel.org/r/1584064687-12964-1-git-send-email-peng.fan@nxp.com
Signed-off-by: Mark Brown broonie@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/base/regmap/regmap-debugfs.c | 6 ++++++
 1 file changed, 6 insertions(+)

--- a/drivers/base/regmap/regmap-debugfs.c
+++ b/drivers/base/regmap/regmap-debugfs.c
@@ -227,6 +227,9 @@ static ssize_t regmap_read_debugfs(struc
 	if (*ppos < 0 || !count)
 		return -EINVAL;

+	if (count > (PAGE_SIZE << (MAX_ORDER - 1)))
+		count = PAGE_SIZE << (MAX_ORDER - 1);
+
 	buf = kmalloc(count, GFP_KERNEL);
 	if (!buf)
 		return -ENOMEM;
@@ -371,6 +374,9 @@ static ssize_t regmap_reg_ranges_read_fi
 	if (*ppos < 0 || !count)
 		return -EINVAL;

+	if (count > (PAGE_SIZE << (MAX_ORDER - 1)))
+		count = PAGE_SIZE << (MAX_ORDER - 1);
+
 	buf = kmalloc(count, GFP_KERNEL);
 	if (!buf)
 		return -ENOMEM;

Guenter Roeck

4:48 p.m.

On Thu, Jul 30, 2020 at 10:03:50AM +0200, Greg Kroah-Hartman wrote:

...

This is the start of the stable review cycle for the 5.7.12 release. There are 20 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.

Responses should be made by Sat, 01 Aug 2020 07:44:05 +0000. Anything received after that time might be too late.

Build results:
	total: 155 pass: 155 fail: 0
Qemu test results:
	total: 431 pass: 431 fail: 0

Guenter

Greg Kroah-Hartman

31 Jul

5:15 p.m.

On Thu, Jul 30, 2020 at 09:48:23AM -0700, Guenter Roeck wrote:

...

On Thu, Jul 30, 2020 at 10:03:50AM +0200, Greg Kroah-Hartman wrote:

...
This is the start of the stable review cycle for the 5.7.12 release. There are 20 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.

Responses should be made by Sat, 01 Aug 2020 07:44:05 +0000. Anything received after that time might be too late.

Build results:
	total: 155 pass: 155 fail: 0
Qemu test results:
	total: 431 pass: 431 fail: 0

Thanks for testing all of these and letting me know.

greg k-h

Naresh Kamboju

8:59 a.m.

On Thu, 30 Jul 2020 at 13:35, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:

...

This is the start of the stable review cycle for the 5.7.12 release. There are 20 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.

Responses should be made by Sat, 01 Aug 2020 07:44:05 +0000. Anything received after that time might be too late.

The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.7.12-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.7.y and the diffstat can be found below.

thanks,

greg k-h

Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.

Summary
------------------------------------------------------------------------

kernel: 5.7.12-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-5.7.y
git commit: 3d6db9c814407889db6cd20aba0aabe36e463171
git describe: v5.7.11-21-g3d6db9c81440
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-5.7-oe/build/v5.7.11-21-g...

No regressions (compared to build v5.7.10-199-g3d6db9c81440)

No fixes (compared to build v5.7.10-199-g3d6db9c81440)

Ran 36182 total tests in the following environments and test suites.

Environments
--------------
- dragonboard-410c
- hi6220-hikey
- i386
- juno-r2
- juno-r2-compat
- juno-r2-kasan
- nxp-ls2088
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15
- x86
- x86-kasan

Test Suites
-----------
* build
* install-android-platform-tools-r2600
* install-android-platform-tools-r2800
* kselftest
* kselftest/drivers
* kselftest/filesystems
* kselftest/net
* linux-log-parser
* ltp-commands-tests
* ltp-containers-tests
* ltp-controllers-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* perf
* v4l2-compliance
* kvm-unit-tests
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-sched-tests
* ltp-open-posix-tests
* network-basic-tests
* igt-gpu-tools
* ssuite
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-native/drivers
* kselftest-vsyscall-mode-native/filesystems
* kselftest-vsyscall-mode-native/net
* kselftest-vsyscall-mode-none
* kselftest-vsyscall-mode-none/drivers
* kselftest-vsyscall-mode-none/filesystems
* kselftest-vsyscall-mode-none/net

--
Linaro LKFT
https://lkft.linaro.org

Jon Hunter

12:53 p.m.

On Thu, 30 Jul 2020 10:03:50 +0200, Greg Kroah-Hartman wrote:

...

This is the start of the stable review cycle for the 5.7.12 release. There are 20 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.

Responses should be made by Sat, 01 Aug 2020 07:44:05 +0000. Anything received after that time might be too late.

The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.7.12-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.7.y and the diffstat can be found below.

thanks,

greg k-h

All tests passing for Tegra ...

Test results for stable-v5.7:
    11 builds: 11 pass, 0 fail
    26 boots:  26 pass, 0 fail
    56 tests:  56 pass, 0 fail

Linux version: 5.7.12-rc1-g3d6db9c81440
Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000,
               tegra194-p2972-0000, tegra20-ventana,
               tegra210-p2371-2180, tegra210-p3450-0000,
               tegra30-cardhu-a04

Jon

Greg Kroah-Hartman

5:15 p.m.

On Fri, Jul 31, 2020 at 12:53:22PM +0000, Jon Hunter wrote:

...

On Thu, 30 Jul 2020 10:03:50 +0200, Greg Kroah-Hartman wrote:

...
This is the start of the stable review cycle for the 5.7.12 release. There are 20 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.

Responses should be made by Sat, 01 Aug 2020 07:44:05 +0000. Anything received after that time might be too late.

The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.7.12-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.7.y and the diffstat can be found below.

thanks,

greg k-h

All tests passing for Tegra ...

Test results for stable-v5.7:
    11 builds: 11 pass, 0 fail
    26 boots:  26 pass, 0 fail
    56 tests:  56 pass, 0 fail

Linux version: 5.7.12-rc1-g3d6db9c81440
Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000,
               tegra194-p2972-0000, tegra20-ventana,
               tegra210-p2371-2180, tegra210-p3450-0000,
               tegra30-cardhu-a04

Wonderful, thanks for testing all of these and letting me know.

greg k-h
