This is the start of the stable review cycle for the 4.14.136 release. There are 25 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun 04 Aug 2019 09:19:34 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.136-rc...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 4.14.136-rc1

Yan, Zheng <zyan@redhat.com>
    ceph: hold i_ceph_lock when removing caps for freeing inode

Yoshinori Sato <ysato@users.sourceforge.jp>
    Fix allyesconfig output.

Miroslav Lichvar <mlichvar@redhat.com>
    drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl

Jann Horn <jannh@google.com>
    sched/fair: Don't free p->numa_faults with concurrent readers

Vladis Dronov <vdronov@redhat.com>
    Bluetooth: hci_uart: check for missing tty operations

Sunil Muthuswamy <sunilmut@microsoft.com>
    hv_sock: Add support for delayed close

Joerg Roedel <jroedel@suse.de>
    iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA

Dmitry Safonov <dima@arista.com>
    iommu/vt-d: Don't queue_iova() if there is no flush queue

Luke Nowakowski-Krijger <lnowakow@eng.ucsd.edu>
    media: radio-raremono: change devm_k*alloc to k*alloc

Benjamin Coddington <bcodding@redhat.com>
    NFS: Cleanup if nfs_match_client is interrupted

Andrey Konovalov <andreyknvl@google.com>
    media: pvrusb2: use a different format for warnings

Oliver Neukum <oneukum@suse.com>
    media: cpia2_usb: first wake up, then free in disconnect

Fabio Estevam <festevam@gmail.com>
    ath10k: Change the warning message string

Sean Young <sean@mess.org>
    media: au0828: fix null dereference in error path

Phong Tran <tranmanphong@gmail.com>
    ISDN: hfcsusb: checking idx of ep configuration

Todd Kjos <tkjos@android.com>
    binder: fix possible UAF when freeing buffer

Will Deacon <will.deacon@arm.com>
    arm64: compat: Provide definition for COMPAT_MINSIGSTKSZ

Abhishek Sahu <absahu@codeaurora.org>
    i2c: qup: fixed releasing dma without flush operation completion

allen yan <yanwei@marvell.com>
    arm64: dts: marvell: Fix A37xx UART0 register size

Trond Myklebust <trond.myklebust@hammerspace.com>
    NFSv4: Fix lookup revalidate of regular files

Trond Myklebust <trond.myklebust@hammerspace.com>
    NFS: Refactor nfs_lookup_revalidate()

Trond Myklebust <trond.myklebust@hammerspace.com>
    NFS: Fix dentry revalidation on NFSv4 lookup

Sunil Muthuswamy <sunilmut@microsoft.com>
    vsock: correct removal of socket from the list

Stefan Hajnoczi <stefanha@redhat.com>
    VSOCK: use TCP state constants for sk_state
-------------
Diffstat:
 .../devicetree/bindings/serial/mvebu-uart.txt |   2 +-
 Makefile                                      |   4 +-
 arch/arm64/boot/dts/marvell/armada-37xx.dtsi  |   2 +-
 arch/arm64/include/asm/compat.h               |   1 +
 arch/sh/boards/Kconfig                        |  14 +-
 drivers/android/binder.c                      |  16 +-
 drivers/bluetooth/hci_ath.c                   |   3 +
 drivers/bluetooth/hci_bcm.c                   |   3 +
 drivers/bluetooth/hci_intel.c                 |   3 +
 drivers/bluetooth/hci_ldisc.c                 |  13 +
 drivers/bluetooth/hci_mrvl.c                  |   3 +
 drivers/bluetooth/hci_uart.h                  |   1 +
 drivers/i2c/busses/i2c-qup.c                  |   2 +
 drivers/iommu/intel-iommu.c                   |   2 +-
 drivers/iommu/iova.c                          |  18 +-
 drivers/isdn/hardware/mISDN/hfcsusb.c         |   3 +
 drivers/media/radio/radio-raremono.c          |  30 ++-
 drivers/media/usb/au0828/au0828-core.c        |  12 +-
 drivers/media/usb/cpia2/cpia2_usb.c           |   3 +-
 drivers/media/usb/pvrusb2/pvrusb2-hdw.c       |   4 +-
 drivers/media/usb/pvrusb2/pvrusb2-i2c-core.c  |   6 +-
 drivers/media/usb/pvrusb2/pvrusb2-std.c       |   2 +-
 drivers/net/wireless/ath/ath10k/usb.c         |   2 +-
 drivers/pps/pps.c                             |   8 +
 fs/ceph/caps.c                                |   7 +-
 fs/exec.c                                     |   2 +-
 fs/nfs/client.c                               |   4 +-
 fs/nfs/dir.c                                  | 295 +++++++++++----------
 fs/nfs/nfs4proc.c                             |  15 +-
 include/linux/iova.h                          |   6 +
 include/linux/sched/numa_balancing.h          |   4 +-
 include/net/af_vsock.h                        |   3 -
 kernel/fork.c                                 |   2 +-
 kernel/sched/fair.c                           |  24 +-
 net/vmw_vsock/af_vsock.c                      |  84 +++---
 net/vmw_vsock/hyperv_transport.c              | 118 ++++++---
 net/vmw_vsock/virtio_transport.c              |   2 +-
 net/vmw_vsock/virtio_transport_common.c       |  22 +-
 net/vmw_vsock/vmci_transport.c                |  34 +--
 net/vmw_vsock/vmci_transport_notify.c         |   2 +-
 net/vmw_vsock/vmci_transport_notify_qstate.c  |   2 +-
 41 files changed, 472 insertions(+), 311 deletions(-)
From: Stefan Hajnoczi <stefanha@redhat.com>
commit 3b4477d2dcf2709d0be89e2a8dced3d0f4a017f2 upstream.
There are two state fields: socket->state and sock->sk_state. The socket->state field uses SS_UNCONNECTED, SS_CONNECTED, etc., while the sock->sk_state field typically uses values that match TCP state constants (TCP_CLOSE, TCP_ESTABLISHED). AF_VSOCK does not follow this convention and instead uses SS_* constants for both fields.
The sk_state field will be exposed to userspace through the vsock_diag interface for ss(8), netstat(8), and other programs.
This patch switches sk_state to TCP state constants so that the meaning of this field is consistent with other address families. It is not just AF_INET and AF_INET6 that use the TCP constants; AF_UNIX and others do too.
The following mapping was used to convert the code:
  SS_FREE          -> TCP_CLOSE
  SS_UNCONNECTED   -> TCP_CLOSE
  SS_CONNECTING    -> TCP_SYN_SENT
  SS_CONNECTED     -> TCP_ESTABLISHED
  SS_DISCONNECTING -> TCP_CLOSING
  VSOCK_SS_LISTEN  -> TCP_LISTEN
In __vsock_create() the sk_state initialization was dropped because sock_init_data() already initializes sk_state to TCP_CLOSE.
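Illustrative sketch (not part of the patch; the helper name is invented): with sk_state holding the TCP constants, generic code can test a vsock socket's state the same way it would for any other address family:

    #include <net/sock.h>
    #include <net/tcp_states.h>

    /* Before this patch, AF_VSOCK code would have had to compare
     * against the private SS_CONNECTED value here instead.
     */
    static bool example_sock_is_established(const struct sock *sk)
    {
            return sk->sk_state == TCP_ESTABLISHED;
    }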
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Adjusted net/vmw_vsock/hyperv_transport.c since the commit b4562ca7925a
("hv_sock: add locking in the open/close/release code paths") and the
commit c9d3fe9da094 ("VSOCK: fix outdated sk_state value in
hvs_release()") were backported before 3b4477d2dcf2.]
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/af_vsock.h                       |  3 -
 net/vmw_vsock/af_vsock.c                     | 46 +++++++++++++++------------
 net/vmw_vsock/hyperv_transport.c             | 12 +++----
 net/vmw_vsock/virtio_transport.c             |  2 -
 net/vmw_vsock/virtio_transport_common.c      | 22 ++++++------
 net/vmw_vsock/vmci_transport.c               | 34 +++++++++----------
 net/vmw_vsock/vmci_transport_notify.c        |  2 -
 net/vmw_vsock/vmci_transport_notify_qstate.c |  2 -
 8 files changed, 64 insertions(+), 59 deletions(-)
--- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -22,9 +22,6 @@
#include "vsock_addr.h"
-/* vsock-specific sock->sk_state constants */ -#define VSOCK_SS_LISTEN 255 - #define LAST_RESERVED_PORT 1023
#define vsock_sk(__sk) ((struct vsock_sock *)__sk) --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -36,7 +36,7 @@ * not support simultaneous connects (two "client" sockets connecting). * * - "Server" sockets are referred to as listener sockets throughout this - * implementation because they are in the VSOCK_SS_LISTEN state. When a + * implementation because they are in the TCP_LISTEN state. When a * connection request is received (the second kind of socket mentioned above), * we create a new socket and refer to it as a pending socket. These pending * sockets are placed on the pending connection list of the listener socket. @@ -82,6 +82,15 @@ * argument, we must ensure the reference count is increased to ensure the * socket isn't freed before the function is run; the deferred function will * then drop the reference. + * + * - sk->sk_state uses the TCP state constants because they are widely used by + * other address families and exposed to userspace tools like ss(8): + * + * TCP_CLOSE - unconnected + * TCP_SYN_SENT - connecting + * TCP_ESTABLISHED - connected + * TCP_CLOSING - disconnecting + * TCP_LISTEN - listening */
#include <linux/types.h> @@ -485,7 +494,7 @@ static void vsock_pending_work(struct wo if (vsock_in_connected_table(vsk)) vsock_remove_connected(vsk);
- sk->sk_state = SS_FREE; + sk->sk_state = TCP_CLOSE;
out: release_sock(sk); @@ -626,7 +635,6 @@ struct sock *__vsock_create(struct net *
sk->sk_destruct = vsock_sk_destruct; sk->sk_backlog_rcv = vsock_queue_rcv_skb; - sk->sk_state = 0; sock_reset_flag(sk, SOCK_DONE);
INIT_LIST_HEAD(&vsk->bound_table); @@ -902,7 +910,7 @@ static unsigned int vsock_poll(struct fi /* Listening sockets that have connections in their accept * queue can be read. */ - if (sk->sk_state == VSOCK_SS_LISTEN + if (sk->sk_state == TCP_LISTEN && !vsock_is_accept_queue_empty(sk)) mask |= POLLIN | POLLRDNORM;
@@ -931,7 +939,7 @@ static unsigned int vsock_poll(struct fi }
/* Connected sockets that can produce data can be written. */ - if (sk->sk_state == SS_CONNECTED) { + if (sk->sk_state == TCP_ESTABLISHED) { if (!(sk->sk_shutdown & SEND_SHUTDOWN)) { bool space_avail_now = false; int ret = transport->notify_poll_out( @@ -953,7 +961,7 @@ static unsigned int vsock_poll(struct fi * POLLOUT|POLLWRNORM when peer is closed and nothing to read, * but local send is not shutdown. */ - if (sk->sk_state == SS_UNCONNECTED) { + if (sk->sk_state == TCP_CLOSE) { if (!(sk->sk_shutdown & SEND_SHUTDOWN)) mask |= POLLOUT | POLLWRNORM;
@@ -1123,9 +1131,9 @@ static void vsock_connect_timeout(struct sk = sk_vsock(vsk);
lock_sock(sk); - if (sk->sk_state == SS_CONNECTING && + if (sk->sk_state == TCP_SYN_SENT && (sk->sk_shutdown != SHUTDOWN_MASK)) { - sk->sk_state = SS_UNCONNECTED; + sk->sk_state = TCP_CLOSE; sk->sk_err = ETIMEDOUT; sk->sk_error_report(sk); cancel = 1; @@ -1171,7 +1179,7 @@ static int vsock_stream_connect(struct s err = -EALREADY; break; default: - if ((sk->sk_state == VSOCK_SS_LISTEN) || + if ((sk->sk_state == TCP_LISTEN) || vsock_addr_cast(addr, addr_len, &remote_addr) != 0) { err = -EINVAL; goto out; @@ -1194,7 +1202,7 @@ static int vsock_stream_connect(struct s if (err) goto out;
- sk->sk_state = SS_CONNECTING; + sk->sk_state = TCP_SYN_SENT;
err = transport->connect(vsk); if (err < 0) @@ -1214,7 +1222,7 @@ static int vsock_stream_connect(struct s timeout = vsk->connect_timeout; prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
- while (sk->sk_state != SS_CONNECTED && sk->sk_err == 0) { + while (sk->sk_state != TCP_ESTABLISHED && sk->sk_err == 0) { if (flags & O_NONBLOCK) { /* If we're not going to block, we schedule a timeout * function to generate a timeout on the connection @@ -1235,13 +1243,13 @@ static int vsock_stream_connect(struct s
if (signal_pending(current)) { err = sock_intr_errno(timeout); - sk->sk_state = SS_UNCONNECTED; + sk->sk_state = TCP_CLOSE; sock->state = SS_UNCONNECTED; vsock_transport_cancel_pkt(vsk); goto out_wait; } else if (timeout == 0) { err = -ETIMEDOUT; - sk->sk_state = SS_UNCONNECTED; + sk->sk_state = TCP_CLOSE; sock->state = SS_UNCONNECTED; vsock_transport_cancel_pkt(vsk); goto out_wait; @@ -1252,7 +1260,7 @@ static int vsock_stream_connect(struct s
if (sk->sk_err) { err = -sk->sk_err; - sk->sk_state = SS_UNCONNECTED; + sk->sk_state = TCP_CLOSE; sock->state = SS_UNCONNECTED; } else { err = 0; @@ -1285,7 +1293,7 @@ static int vsock_accept(struct socket *s goto out; }
- if (listener->sk_state != VSOCK_SS_LISTEN) { + if (listener->sk_state != TCP_LISTEN) { err = -EINVAL; goto out; } @@ -1375,7 +1383,7 @@ static int vsock_listen(struct socket *s }
sk->sk_max_ack_backlog = backlog; - sk->sk_state = VSOCK_SS_LISTEN; + sk->sk_state = TCP_LISTEN;
err = 0;
@@ -1555,7 +1563,7 @@ static int vsock_stream_sendmsg(struct s
/* Callers should not provide a destination with stream sockets. */ if (msg->msg_namelen) { - err = sk->sk_state == SS_CONNECTED ? -EISCONN : -EOPNOTSUPP; + err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EOPNOTSUPP; goto out; }
@@ -1566,7 +1574,7 @@ static int vsock_stream_sendmsg(struct s goto out; }
- if (sk->sk_state != SS_CONNECTED || + if (sk->sk_state != TCP_ESTABLISHED || !vsock_addr_bound(&vsk->local_addr)) { err = -ENOTCONN; goto out; @@ -1690,7 +1698,7 @@ vsock_stream_recvmsg(struct socket *sock
lock_sock(sk);
- if (sk->sk_state != SS_CONNECTED) { + if (sk->sk_state != TCP_ESTABLISHED) { /* Recvmsg is supposed to return 0 if a peer performs an * orderly shutdown. Differentiate between that case and when a * peer has not connected or a local shutdown occured with the --- a/net/vmw_vsock/hyperv_transport.c +++ b/net/vmw_vsock/hyperv_transport.c @@ -297,7 +297,7 @@ static void hvs_close_connection(struct
lock_sock(sk);
- sk->sk_state = SS_UNCONNECTED; + sk->sk_state = TCP_CLOSE; sock_set_flag(sk, SOCK_DONE); vsk->peer_shutdown |= SEND_SHUTDOWN | RCV_SHUTDOWN;
@@ -336,8 +336,8 @@ static void hvs_open_connection(struct v
lock_sock(sk);
- if ((conn_from_host && sk->sk_state != VSOCK_SS_LISTEN) || - (!conn_from_host && sk->sk_state != SS_CONNECTING)) + if ((conn_from_host && sk->sk_state != TCP_LISTEN) || + (!conn_from_host && sk->sk_state != TCP_SYN_SENT)) goto out;
if (conn_from_host) { @@ -349,7 +349,7 @@ static void hvs_open_connection(struct v if (!new) goto out;
- new->sk_state = SS_CONNECTING; + new->sk_state = TCP_SYN_SENT; vnew = vsock_sk(new); hvs_new = vnew->trans; hvs_new->chan = chan; @@ -383,7 +383,7 @@ static void hvs_open_connection(struct v hvs_set_channel_pending_send_size(chan);
if (conn_from_host) { - new->sk_state = SS_CONNECTED; + new->sk_state = TCP_ESTABLISHED; sk->sk_ack_backlog++;
hvs_addr_init(&vnew->local_addr, if_type); @@ -396,7 +396,7 @@ static void hvs_open_connection(struct v
vsock_enqueue_accept(sk, new); } else { - sk->sk_state = SS_CONNECTED; + sk->sk_state = TCP_ESTABLISHED; sk->sk_socket->state = SS_CONNECTED;
vsock_insert_connected(vsock_sk(sk)); --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -417,7 +417,7 @@ static void virtio_vsock_event_fill(stru static void virtio_vsock_reset_sock(struct sock *sk) { lock_sock(sk); - sk->sk_state = SS_UNCONNECTED; + sk->sk_state = TCP_CLOSE; sk->sk_err = ECONNRESET; sk->sk_error_report(sk); release_sock(sk); --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -716,7 +716,7 @@ static void virtio_transport_do_close(st sock_set_flag(sk, SOCK_DONE); vsk->peer_shutdown = SHUTDOWN_MASK; if (vsock_stream_has_data(vsk) <= 0) - sk->sk_state = SS_DISCONNECTING; + sk->sk_state = TCP_CLOSING; sk->sk_state_change(sk);
if (vsk->close_work_scheduled && @@ -756,8 +756,8 @@ static bool virtio_transport_close(struc { struct sock *sk = &vsk->sk;
- if (!(sk->sk_state == SS_CONNECTED || - sk->sk_state == SS_DISCONNECTING)) + if (!(sk->sk_state == TCP_ESTABLISHED || + sk->sk_state == TCP_CLOSING)) return true;
/* Already received SHUTDOWN from peer, reply with RST */ @@ -816,7 +816,7 @@ virtio_transport_recv_connecting(struct
switch (le16_to_cpu(pkt->hdr.op)) { case VIRTIO_VSOCK_OP_RESPONSE: - sk->sk_state = SS_CONNECTED; + sk->sk_state = TCP_ESTABLISHED; sk->sk_socket->state = SS_CONNECTED; vsock_insert_connected(vsk); sk->sk_state_change(sk); @@ -836,7 +836,7 @@ virtio_transport_recv_connecting(struct
destroy: virtio_transport_reset(vsk, pkt); - sk->sk_state = SS_UNCONNECTED; + sk->sk_state = TCP_CLOSE; sk->sk_err = skerr; sk->sk_error_report(sk); return err; @@ -872,7 +872,7 @@ virtio_transport_recv_connected(struct s vsk->peer_shutdown |= SEND_SHUTDOWN; if (vsk->peer_shutdown == SHUTDOWN_MASK && vsock_stream_has_data(vsk) <= 0) - sk->sk_state = SS_DISCONNECTING; + sk->sk_state = TCP_CLOSING; if (le32_to_cpu(pkt->hdr.flags)) sk->sk_state_change(sk); break; @@ -943,7 +943,7 @@ virtio_transport_recv_listen(struct sock
lock_sock_nested(child, SINGLE_DEPTH_NESTING);
- child->sk_state = SS_CONNECTED; + child->sk_state = TCP_ESTABLISHED;
vchild = vsock_sk(child); vsock_addr_init(&vchild->local_addr, le64_to_cpu(pkt->hdr.dst_cid), @@ -1031,18 +1031,18 @@ void virtio_transport_recv_pkt(struct vi sk->sk_write_space(sk);
switch (sk->sk_state) { - case VSOCK_SS_LISTEN: + case TCP_LISTEN: virtio_transport_recv_listen(sk, pkt); virtio_transport_free_pkt(pkt); break; - case SS_CONNECTING: + case TCP_SYN_SENT: virtio_transport_recv_connecting(sk, pkt); virtio_transport_free_pkt(pkt); break; - case SS_CONNECTED: + case TCP_ESTABLISHED: virtio_transport_recv_connected(sk, pkt); break; - case SS_DISCONNECTING: + case TCP_CLOSING: virtio_transport_recv_disconnecting(sk, pkt); virtio_transport_free_pkt(pkt); break; --- a/net/vmw_vsock/vmci_transport.c +++ b/net/vmw_vsock/vmci_transport.c @@ -776,7 +776,7 @@ static int vmci_transport_recv_stream_cb /* The local context ID may be out of date, update it. */ vsk->local_addr.svm_cid = dst.svm_cid;
- if (sk->sk_state == SS_CONNECTED) + if (sk->sk_state == TCP_ESTABLISHED) vmci_trans(vsk)->notify_ops->handle_notify_pkt( sk, pkt, true, &dst, &src, &bh_process_pkt); @@ -834,7 +834,9 @@ static void vmci_transport_handle_detach * left in our consume queue. */ if (vsock_stream_has_data(vsk) <= 0) { - if (sk->sk_state == SS_CONNECTING) { + sk->sk_state = TCP_CLOSE; + + if (sk->sk_state == TCP_SYN_SENT) { /* The peer may detach from a queue pair while * we are still in the connecting state, i.e., * if the peer VM is killed after attaching to @@ -843,12 +845,10 @@ static void vmci_transport_handle_detach * event like a reset. */
- sk->sk_state = SS_UNCONNECTED; sk->sk_err = ECONNRESET; sk->sk_error_report(sk); return; } - sk->sk_state = SS_UNCONNECTED; } sk->sk_state_change(sk); } @@ -916,17 +916,17 @@ static void vmci_transport_recv_pkt_work vsock_sk(sk)->local_addr.svm_cid = pkt->dg.dst.context;
switch (sk->sk_state) { - case VSOCK_SS_LISTEN: + case TCP_LISTEN: vmci_transport_recv_listen(sk, pkt); break; - case SS_CONNECTING: + case TCP_SYN_SENT: /* Processing of pending connections for servers goes through * the listening socket, so see vmci_transport_recv_listen() * for that path. */ vmci_transport_recv_connecting_client(sk, pkt); break; - case SS_CONNECTED: + case TCP_ESTABLISHED: vmci_transport_recv_connected(sk, pkt); break; default: @@ -975,7 +975,7 @@ static int vmci_transport_recv_listen(st vsock_sk(pending)->local_addr.svm_cid = pkt->dg.dst.context;
switch (pending->sk_state) { - case SS_CONNECTING: + case TCP_SYN_SENT: err = vmci_transport_recv_connecting_server(sk, pending, pkt); @@ -1105,7 +1105,7 @@ static int vmci_transport_recv_listen(st vsock_add_pending(sk, pending); sk->sk_ack_backlog++;
- pending->sk_state = SS_CONNECTING; + pending->sk_state = TCP_SYN_SENT; vmci_trans(vpending)->produce_size = vmci_trans(vpending)->consume_size = qp_size; vmci_trans(vpending)->queue_pair_size = qp_size; @@ -1229,11 +1229,11 @@ vmci_transport_recv_connecting_server(st * the socket will be valid until it is removed from the queue. * * If we fail sending the attach below, we remove the socket from the - * connected list and move the socket to SS_UNCONNECTED before + * connected list and move the socket to TCP_CLOSE before * releasing the lock, so a pending slow path processing of an incoming * packet will not see the socket in the connected state in that case. */ - pending->sk_state = SS_CONNECTED; + pending->sk_state = TCP_ESTABLISHED;
vsock_insert_connected(vpending);
@@ -1264,7 +1264,7 @@ vmci_transport_recv_connecting_server(st
destroy: pending->sk_err = skerr; - pending->sk_state = SS_UNCONNECTED; + pending->sk_state = TCP_CLOSE; /* As long as we drop our reference, all necessary cleanup will handle * when the cleanup function drops its reference and our destruct * implementation is called. Note that since the listen handler will @@ -1302,7 +1302,7 @@ vmci_transport_recv_connecting_client(st * accounting (it can already be found since it's in the bound * table). */ - sk->sk_state = SS_CONNECTED; + sk->sk_state = TCP_ESTABLISHED; sk->sk_socket->state = SS_CONNECTED; vsock_insert_connected(vsk); sk->sk_state_change(sk); @@ -1370,7 +1370,7 @@ vmci_transport_recv_connecting_client(st destroy: vmci_transport_send_reset(sk, pkt);
- sk->sk_state = SS_UNCONNECTED; + sk->sk_state = TCP_CLOSE; sk->sk_err = skerr; sk->sk_error_report(sk); return err; @@ -1558,7 +1558,7 @@ static int vmci_transport_recv_connected sock_set_flag(sk, SOCK_DONE); vsk->peer_shutdown = SHUTDOWN_MASK; if (vsock_stream_has_data(vsk) <= 0) - sk->sk_state = SS_DISCONNECTING; + sk->sk_state = TCP_CLOSING;
sk->sk_state_change(sk); break; @@ -1826,7 +1826,7 @@ static int vmci_transport_connect(struct err = vmci_transport_send_conn_request( sk, vmci_trans(vsk)->queue_pair_size); if (err < 0) { - sk->sk_state = SS_UNCONNECTED; + sk->sk_state = TCP_CLOSE; return err; } } else { @@ -1836,7 +1836,7 @@ static int vmci_transport_connect(struct sk, vmci_trans(vsk)->queue_pair_size, supported_proto_versions); if (err < 0) { - sk->sk_state = SS_UNCONNECTED; + sk->sk_state = TCP_CLOSE; return err; }
--- a/net/vmw_vsock/vmci_transport_notify.c +++ b/net/vmw_vsock/vmci_transport_notify.c @@ -355,7 +355,7 @@ vmci_transport_notify_pkt_poll_in(struct * queue. Ask for notifications when there is something to * read. */ - if (sk->sk_state == SS_CONNECTED) { + if (sk->sk_state == TCP_ESTABLISHED) { if (!send_waiting_read(sk, 1)) return -1;
--- a/net/vmw_vsock/vmci_transport_notify_qstate.c +++ b/net/vmw_vsock/vmci_transport_notify_qstate.c @@ -176,7 +176,7 @@ vmci_transport_notify_pkt_poll_in(struct * queue. Ask for notifications when there is something to * read. */ - if (sk->sk_state == SS_CONNECTED) + if (sk->sk_state == TCP_ESTABLISHED) vsock_block_update_write_window(sk); *data_ready_now = false; }
From: Sunil Muthuswamy <sunilmut@microsoft.com>
commit d5afa82c977ea06f7119058fa0eb8519ea501031 upstream.
The current vsock code for removing a socket from the list is both racy and inefficient. It takes the lock, checks whether the socket is in the list, drops the lock, and, if the socket was on the list, takes the lock again and deletes it. This is racy because once the lock is dropped, the presence check can no longer be relied upon for any decision. It is also inefficient because when the socket is present in the list, the lock is taken twice.
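The difference between the two patterns, as a minimal sketch (hypothetical names; the list and spinlock APIs are the kernel's real ones):

    #include <linux/list.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(table_lock);

    /* Racy and slow: another CPU can change membership between the
     * locked check and the second locked removal, and the lock is
     * taken twice when the entry is on the list.
     */
    static void remove_entry_racy(struct list_head *entry)
    {
            bool on_list;

            spin_lock_bh(&table_lock);
            on_list = !list_empty(entry);
            spin_unlock_bh(&table_lock);

            if (on_list) {
                    spin_lock_bh(&table_lock);
                    list_del_init(entry);
                    spin_unlock_bh(&table_lock);
            }
    }

    /* Fixed: the check and the removal are one critical section. */
    static void remove_entry_safe(struct list_head *entry)
    {
            spin_lock_bh(&table_lock);
            if (!list_empty(entry))
                    list_del_init(entry);
            spin_unlock_bh(&table_lock);
    }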
Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/vmw_vsock/af_vsock.c | 38 +++++++-------------------------------
 1 file changed, 7 insertions(+), 31 deletions(-)
--- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -288,7 +288,8 @@ EXPORT_SYMBOL_GPL(vsock_insert_connected void vsock_remove_bound(struct vsock_sock *vsk) { spin_lock_bh(&vsock_table_lock); - __vsock_remove_bound(vsk); + if (__vsock_in_bound_table(vsk)) + __vsock_remove_bound(vsk); spin_unlock_bh(&vsock_table_lock); } EXPORT_SYMBOL_GPL(vsock_remove_bound); @@ -296,7 +297,8 @@ EXPORT_SYMBOL_GPL(vsock_remove_bound); void vsock_remove_connected(struct vsock_sock *vsk) { spin_lock_bh(&vsock_table_lock); - __vsock_remove_connected(vsk); + if (__vsock_in_connected_table(vsk)) + __vsock_remove_connected(vsk); spin_unlock_bh(&vsock_table_lock); } EXPORT_SYMBOL_GPL(vsock_remove_connected); @@ -332,35 +334,10 @@ struct sock *vsock_find_connected_socket } EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
-static bool vsock_in_bound_table(struct vsock_sock *vsk) -{ - bool ret; - - spin_lock_bh(&vsock_table_lock); - ret = __vsock_in_bound_table(vsk); - spin_unlock_bh(&vsock_table_lock); - - return ret; -} - -static bool vsock_in_connected_table(struct vsock_sock *vsk) -{ - bool ret; - - spin_lock_bh(&vsock_table_lock); - ret = __vsock_in_connected_table(vsk); - spin_unlock_bh(&vsock_table_lock); - - return ret; -} - void vsock_remove_sock(struct vsock_sock *vsk) { - if (vsock_in_bound_table(vsk)) - vsock_remove_bound(vsk); - - if (vsock_in_connected_table(vsk)) - vsock_remove_connected(vsk); + vsock_remove_bound(vsk); + vsock_remove_connected(vsk); } EXPORT_SYMBOL_GPL(vsock_remove_sock);
@@ -491,8 +468,7 @@ static void vsock_pending_work(struct wo * incoming packets can't find this socket, and to reduce the reference * count. */ - if (vsock_in_connected_table(vsk)) - vsock_remove_connected(vsk); + vsock_remove_connected(vsk);
sk->sk_state = TCP_CLOSE;
From: Trond Myklebust <trond.myklebust@hammerspace.com>
commit be189f7e7f03de35887e5a85ddcf39b91b5d7fc1 upstream.
We need to ensure that inode and dentry revalidation occurs correctly on reopen of a file that is already open. Currently, we can end up not revalidating either in the case of NFSv4.0, due to the 'cached open' path. Let's fix that by ensuring that we only do cached open for the special cases of open recovery and delegation return.
Reported-by: Stan Hu <stanhu@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Qian Lu <luqia@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/nfs/nfs4proc.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
--- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -1317,12 +1317,20 @@ static bool nfs4_mode_match_open_stateid return false; }
-static int can_open_cached(struct nfs4_state *state, fmode_t mode, int open_mode) +static int can_open_cached(struct nfs4_state *state, fmode_t mode, + int open_mode, enum open_claim_type4 claim) { int ret = 0;
if (open_mode & (O_EXCL|O_TRUNC)) goto out; + switch (claim) { + case NFS4_OPEN_CLAIM_NULL: + case NFS4_OPEN_CLAIM_FH: + goto out; + default: + break; + } switch (mode & (FMODE_READ|FMODE_WRITE)) { case FMODE_READ: ret |= test_bit(NFS_O_RDONLY_STATE, &state->flags) != 0 @@ -1617,7 +1625,7 @@ static struct nfs4_state *nfs4_try_open_
for (;;) { spin_lock(&state->owner->so_lock); - if (can_open_cached(state, fmode, open_mode)) { + if (can_open_cached(state, fmode, open_mode, claim)) { update_open_stateflags(state, fmode); spin_unlock(&state->owner->so_lock); goto out_return_state; @@ -2141,7 +2149,8 @@ static void nfs4_open_prepare(struct rpc if (data->state != NULL) { struct nfs_delegation *delegation;
- if (can_open_cached(data->state, data->o_arg.fmode, data->o_arg.open_flags)) + if (can_open_cached(data->state, data->o_arg.fmode, + data->o_arg.open_flags, claim)) goto out_no_action; rcu_read_lock(); delegation = rcu_dereference(NFS_I(data->state->inode)->delegation);
From: Trond Myklebust <trond.myklebust@hammerspace.com>
commit 5ceb9d7fdaaf6d8ced6cd7861cf1deb9cd93fa47 upstream.
Refactor the code in nfs_lookup_revalidate() as a stepping stone towards optimising and fixing nfs4_lookup_revalidate().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Qian Lu <luqia@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/nfs/dir.c | 222 +++++++++++++++++++++++++++++++++--------------------
 1 file changed, 126 insertions(+), 96 deletions(-)
--- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1059,6 +1059,100 @@ int nfs_neg_need_reval(struct inode *dir return !nfs_check_verifier(dir, dentry, flags & LOOKUP_RCU); }
+static int +nfs_lookup_revalidate_done(struct inode *dir, struct dentry *dentry, + struct inode *inode, int error) +{ + switch (error) { + case 1: + dfprintk(LOOKUPCACHE, "NFS: %s(%pd2) is valid\n", + __func__, dentry); + return 1; + case 0: + nfs_mark_for_revalidate(dir); + if (inode && S_ISDIR(inode->i_mode)) { + /* Purge readdir caches. */ + nfs_zap_caches(inode); + /* + * We can't d_drop the root of a disconnected tree: + * its d_hash is on the s_anon list and d_drop() would hide + * it from shrink_dcache_for_unmount(), leading to busy + * inodes on unmount and further oopses. + */ + if (IS_ROOT(dentry)) + return 1; + } + dfprintk(LOOKUPCACHE, "NFS: %s(%pd2) is invalid\n", + __func__, dentry); + return 0; + } + dfprintk(LOOKUPCACHE, "NFS: %s(%pd2) lookup returned error %d\n", + __func__, dentry, error); + return error; +} + +static int +nfs_lookup_revalidate_negative(struct inode *dir, struct dentry *dentry, + unsigned int flags) +{ + int ret = 1; + if (nfs_neg_need_reval(dir, dentry, flags)) { + if (flags & LOOKUP_RCU) + return -ECHILD; + ret = 0; + } + return nfs_lookup_revalidate_done(dir, dentry, NULL, ret); +} + +static int +nfs_lookup_revalidate_delegated(struct inode *dir, struct dentry *dentry, + struct inode *inode) +{ + nfs_set_verifier(dentry, nfs_save_change_attribute(dir)); + return nfs_lookup_revalidate_done(dir, dentry, inode, 1); +} + +static int +nfs_lookup_revalidate_dentry(struct inode *dir, struct dentry *dentry, + struct inode *inode) +{ + struct nfs_fh *fhandle; + struct nfs_fattr *fattr; + struct nfs4_label *label; + int ret; + + ret = -ENOMEM; + fhandle = nfs_alloc_fhandle(); + fattr = nfs_alloc_fattr(); + label = nfs4_label_alloc(NFS_SERVER(inode), GFP_KERNEL); + if (fhandle == NULL || fattr == NULL || IS_ERR(label)) + goto out; + + ret = NFS_PROTO(dir)->lookup(dir, &dentry->d_name, fhandle, fattr, label); + if (ret < 0) { + if (ret == -ESTALE || ret == -ENOENT) + ret = 0; + goto out; + } + ret = 0; + if (nfs_compare_fh(NFS_FH(inode), fhandle)) + goto out; + if (nfs_refresh_inode(inode, fattr) < 0) + goto out; + + nfs_setsecurity(inode, fattr, label); + nfs_set_verifier(dentry, nfs_save_change_attribute(dir)); + + /* set a readdirplus hint that we had a cache miss */ + nfs_force_use_readdirplus(dir); + ret = 1; +out: + nfs_free_fattr(fattr); + nfs_free_fhandle(fhandle); + nfs4_label_free(label); + return nfs_lookup_revalidate_done(dir, dentry, inode, ret); +} + /* * This is called every time the dcache has a lookup hit, * and we should check whether we can really trust that @@ -1070,58 +1164,36 @@ int nfs_neg_need_reval(struct inode *dir * If the parent directory is seen to have changed, we throw out the * cached dentry and do a new lookup. */ -static int nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags) +static int +nfs_do_lookup_revalidate(struct inode *dir, struct dentry *dentry, + unsigned int flags) { - struct inode *dir; struct inode *inode; - struct dentry *parent; - struct nfs_fh *fhandle = NULL; - struct nfs_fattr *fattr = NULL; - struct nfs4_label *label = NULL; int error;
- if (flags & LOOKUP_RCU) { - parent = ACCESS_ONCE(dentry->d_parent); - dir = d_inode_rcu(parent); - if (!dir) - return -ECHILD; - } else { - parent = dget_parent(dentry); - dir = d_inode(parent); - } nfs_inc_stats(dir, NFSIOS_DENTRYREVALIDATE); inode = d_inode(dentry);
- if (!inode) { - if (nfs_neg_need_reval(dir, dentry, flags)) { - if (flags & LOOKUP_RCU) - return -ECHILD; - goto out_bad; - } - goto out_valid; - } + if (!inode) + return nfs_lookup_revalidate_negative(dir, dentry, flags);
if (is_bad_inode(inode)) { - if (flags & LOOKUP_RCU) - return -ECHILD; dfprintk(LOOKUPCACHE, "%s: %pd2 has dud inode\n", __func__, dentry); goto out_bad; }
if (NFS_PROTO(dir)->have_delegation(inode, FMODE_READ)) - goto out_set_verifier; + return nfs_lookup_revalidate_delegated(dir, dentry, inode);
/* Force a full look up iff the parent directory has changed */ if (!nfs_is_exclusive_create(dir, flags) && nfs_check_verifier(dir, dentry, flags & LOOKUP_RCU)) { error = nfs_lookup_verify_inode(inode, flags); if (error) { - if (flags & LOOKUP_RCU) - return -ECHILD; if (error == -ESTALE) - goto out_zap_parent; - goto out_error; + nfs_zap_caches(dir); + goto out_bad; } nfs_advise_use_readdirplus(dir); goto out_valid; @@ -1133,81 +1205,39 @@ static int nfs_lookup_revalidate(struct if (NFS_STALE(inode)) goto out_bad;
- error = -ENOMEM; - fhandle = nfs_alloc_fhandle(); - fattr = nfs_alloc_fattr(); - if (fhandle == NULL || fattr == NULL) - goto out_error; - - label = nfs4_label_alloc(NFS_SERVER(inode), GFP_NOWAIT); - if (IS_ERR(label)) - goto out_error; - trace_nfs_lookup_revalidate_enter(dir, dentry, flags); - error = NFS_PROTO(dir)->lookup(dir, &dentry->d_name, fhandle, fattr, label); + error = nfs_lookup_revalidate_dentry(dir, dentry, inode); trace_nfs_lookup_revalidate_exit(dir, dentry, flags, error); - if (error == -ESTALE || error == -ENOENT) - goto out_bad; - if (error) - goto out_error; - if (nfs_compare_fh(NFS_FH(inode), fhandle)) - goto out_bad; - if ((error = nfs_refresh_inode(inode, fattr)) != 0) - goto out_bad; - - nfs_setsecurity(inode, fattr, label); - - nfs_free_fattr(fattr); - nfs_free_fhandle(fhandle); - nfs4_label_free(label); + return error; +out_valid: + return nfs_lookup_revalidate_done(dir, dentry, inode, 1); +out_bad: + if (flags & LOOKUP_RCU) + return -ECHILD; + return nfs_lookup_revalidate_done(dir, dentry, inode, 0); +}
- /* set a readdirplus hint that we had a cache miss */ - nfs_force_use_readdirplus(dir); +static int +nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags) +{ + struct dentry *parent; + struct inode *dir; + int ret;
-out_set_verifier: - nfs_set_verifier(dentry, nfs_save_change_attribute(dir)); - out_valid: if (flags & LOOKUP_RCU) { + parent = ACCESS_ONCE(dentry->d_parent); + dir = d_inode_rcu(parent); + if (!dir) + return -ECHILD; + ret = nfs_do_lookup_revalidate(dir, dentry, flags); if (parent != ACCESS_ONCE(dentry->d_parent)) return -ECHILD; - } else + } else { + parent = dget_parent(dentry); + ret = nfs_do_lookup_revalidate(d_inode(parent), dentry, flags); dput(parent); - dfprintk(LOOKUPCACHE, "NFS: %s(%pd2) is valid\n", - __func__, dentry); - return 1; -out_zap_parent: - nfs_zap_caches(dir); - out_bad: - WARN_ON(flags & LOOKUP_RCU); - nfs_free_fattr(fattr); - nfs_free_fhandle(fhandle); - nfs4_label_free(label); - nfs_mark_for_revalidate(dir); - if (inode && S_ISDIR(inode->i_mode)) { - /* Purge readdir caches. */ - nfs_zap_caches(inode); - /* - * We can't d_drop the root of a disconnected tree: - * its d_hash is on the s_anon list and d_drop() would hide - * it from shrink_dcache_for_unmount(), leading to busy - * inodes on unmount and further oopses. - */ - if (IS_ROOT(dentry)) - goto out_valid; } - dput(parent); - dfprintk(LOOKUPCACHE, "NFS: %s(%pd2) is invalid\n", - __func__, dentry); - return 0; -out_error: - WARN_ON(flags & LOOKUP_RCU); - nfs_free_fattr(fattr); - nfs_free_fhandle(fhandle); - nfs4_label_free(label); - dput(parent); - dfprintk(LOOKUPCACHE, "NFS: %s(%pd2) lookup returned error %d\n", - __func__, dentry, error); - return error; + return ret; }
/*
From: Trond Myklebust <trond.myklebust@hammerspace.com>
commit c7944ebb9ce9461079659e9e6ec5baaf73724b3b upstream.
If we're revalidating an existing dentry in order to open a file, we need to ensure that we check the directory has not changed before we optimise away the lookup.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Qian Lu <luqia@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/nfs/dir.c | 79 +++++++++++++++++++++++++++++------------------------
 1 file changed, 39 insertions(+), 40 deletions(-)
--- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1218,7 +1218,8 @@ out_bad: }
static int -nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags) +__nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags, + int (*reval)(struct inode *, struct dentry *, unsigned int)) { struct dentry *parent; struct inode *dir; @@ -1229,17 +1230,22 @@ nfs_lookup_revalidate(struct dentry *den dir = d_inode_rcu(parent); if (!dir) return -ECHILD; - ret = nfs_do_lookup_revalidate(dir, dentry, flags); + ret = reval(dir, dentry, flags); if (parent != ACCESS_ONCE(dentry->d_parent)) return -ECHILD; } else { parent = dget_parent(dentry); - ret = nfs_do_lookup_revalidate(d_inode(parent), dentry, flags); + ret = reval(d_inode(parent), dentry, flags); dput(parent); } return ret; }
+static int nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags) +{ + return __nfs_lookup_revalidate(dentry, flags, nfs_do_lookup_revalidate); +} + /* * A weaker form of d_revalidate for revalidating just the d_inode(dentry) * when we don't really care about the dentry name. This is called when a @@ -1590,62 +1596,55 @@ no_open: } EXPORT_SYMBOL_GPL(nfs_atomic_open);
-static int nfs4_lookup_revalidate(struct dentry *dentry, unsigned int flags) +static int +nfs4_do_lookup_revalidate(struct inode *dir, struct dentry *dentry, + unsigned int flags) { struct inode *inode; - int ret = 0;
if (!(flags & LOOKUP_OPEN) || (flags & LOOKUP_DIRECTORY)) - goto no_open; + goto full_reval; if (d_mountpoint(dentry)) - goto no_open; - if (NFS_SB(dentry->d_sb)->caps & NFS_CAP_ATOMIC_OPEN_V1) - goto no_open; + goto full_reval;
inode = d_inode(dentry);
/* We can't create new files in nfs_open_revalidate(), so we * optimize away revalidation of negative dentries. */ - if (inode == NULL) { - struct dentry *parent; - struct inode *dir; - - if (flags & LOOKUP_RCU) { - parent = ACCESS_ONCE(dentry->d_parent); - dir = d_inode_rcu(parent); - if (!dir) - return -ECHILD; - } else { - parent = dget_parent(dentry); - dir = d_inode(parent); - } - if (!nfs_neg_need_reval(dir, dentry, flags)) - ret = 1; - else if (flags & LOOKUP_RCU) - ret = -ECHILD; - if (!(flags & LOOKUP_RCU)) - dput(parent); - else if (parent != ACCESS_ONCE(dentry->d_parent)) - return -ECHILD; - goto out; - } + if (inode == NULL) + goto full_reval; + + if (NFS_PROTO(dir)->have_delegation(inode, FMODE_READ)) + return nfs_lookup_revalidate_delegated(dir, dentry, inode);
/* NFS only supports OPEN on regular files */ if (!S_ISREG(inode->i_mode)) - goto no_open; + goto full_reval; + /* We cannot do exclusive creation on a positive dentry */ - if (flags & LOOKUP_EXCL) - goto no_open; + if (flags & (LOOKUP_EXCL | LOOKUP_REVAL)) + goto reval_dentry; + + /* Check if the directory changed */ + if (!nfs_check_verifier(dir, dentry, flags & LOOKUP_RCU)) + goto reval_dentry;
/* Let f_op->open() actually open (and revalidate) the file */ - ret = 1; + return 1; +reval_dentry: + if (flags & LOOKUP_RCU) + return -ECHILD; + return nfs_lookup_revalidate_dentry(dir, dentry, inode);;
-out: - return ret; +full_reval: + return nfs_do_lookup_revalidate(dir, dentry, flags); +}
-no_open: - return nfs_lookup_revalidate(dentry, flags); +static int nfs4_lookup_revalidate(struct dentry *dentry, unsigned int flags) +{ + return __nfs_lookup_revalidate(dentry, flags, + nfs4_do_lookup_revalidate); }
#endif /* CONFIG_NFSV4 */
From: allen yan <yanwei@marvell.com>
commit c737abc193d16e62e23e2fb585b8b7398ab380d8 upstream.
Armada-37xx UART0 registers are 0x200 bytes wide. Right next to them are the UART1 registers that should not be declared in this node.
Update the example in DT bindings document accordingly.
Signed-off-by: allen yan <yanwei@marvell.com>
Signed-off-by: Miquel Raynal <miquel.raynal@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/devicetree/bindings/serial/mvebu-uart.txt | 2 +-
 arch/arm64/boot/dts/marvell/armada-37xx.dtsi            | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/Documentation/devicetree/bindings/serial/mvebu-uart.txt +++ b/Documentation/devicetree/bindings/serial/mvebu-uart.txt @@ -8,6 +8,6 @@ Required properties: Example: serial@12000 { compatible = "marvell,armada-3700-uart"; - reg = <0x12000 0x400>; + reg = <0x12000 0x200>; interrupts = <43>; }; --- a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi +++ b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi @@ -134,7 +134,7 @@
uart0: serial@12000 { compatible = "marvell,armada-3700-uart"; - reg = <0x12000 0x400>; + reg = <0x12000 0x200>; interrupts = <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>; status = "disabled"; };
From: Abhishek Sahu <absahu@codeaurora.org>
commit 7239872fb3400b21a8f5547257f9f86455867bd6 upstream.
The QUP BSLP BAM sometimes generates the following error if the current I2C DMA transfer fails and the flush operation has been scheduled:

“bam-dma-engine 7884000.dma: Cannot free busy channel”

If any I2C error occurs during a BAM DMA transfer, the QUP I2C interrupt will be generated and the flush operation will be carried out to make the I2C controller consume all scheduled DMA transfers. Currently, the completion structure used for the flush is the same one already completed by the BAM transfer, and it is reused without reinitialization. This makes the flush operation's wait_for_completion_timeout() return immediately, and the driver then proceeds to free the DMA resources while the descriptors are still being processed.
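The underlying rule, as a hedged sketch (names invented): a struct completion that has already been completed must be re-armed with reinit_completion() before it is waited on again, otherwise the next wait returns immediately:

    #include <linux/completion.h>
    #include <linux/errno.h>
    #include <linux/jiffies.h>

    static int example_flush_and_wait(struct completion *xfer)
    {
            /* An earlier complete(xfer) left xfer->done non-zero, so
             * a new wait would fall through at once without this.
             */
            reinit_completion(xfer);

            /* ...schedule the flush descriptors here... */

            if (!wait_for_completion_timeout(xfer, msecs_to_jiffies(100)))
                    return -ETIMEDOUT;
            return 0;
    }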
Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
Acked-by: Sricharan R <sricharan@codeaurora.org>
Reviewed-by: Austin Christ <austinwc@codeaurora.org>
Reviewed-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/i2c/busses/i2c-qup.c | 2 ++
 1 file changed, 2 insertions(+)
--- a/drivers/i2c/busses/i2c-qup.c +++ b/drivers/i2c/busses/i2c-qup.c @@ -844,6 +844,8 @@ static int qup_i2c_bam_do_xfer(struct qu }
if (ret || qup->bus_err || qup->qup_err) { + reinit_completion(&qup->xfer); + if (qup_i2c_change_state(qup, QUP_RUN_STATE)) { dev_err(qup->dev, "change to run state timed out"); goto desc_err;
From: Will Deacon <will.deacon@arm.com>
commit 24951465cbd279f60b1fdc2421b3694405bcff42 upstream.
arch/arm/ defines a MINSIGSTKSZ of 2k, so we should use the same value for compat tasks.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Reported-by: Steve McIntyre <steve.mcintyre@arm.com>
Tested-by: Steve McIntyre <93sam@debian.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/arm64/include/asm/compat.h | 1 +
 1 file changed, 1 insertion(+)
--- a/arch/arm64/include/asm/compat.h +++ b/arch/arm64/include/asm/compat.h @@ -234,6 +234,7 @@ static inline compat_uptr_t ptr_to_compa }
#define compat_user_stack_pointer() (user_stack_pointer(task_pt_regs(current))) +#define COMPAT_MINSIGSTKSZ 2048
static inline void __user *arch_compat_alloc_user_space(long len) {
From: Todd Kjos <tkjos@android.com>
commit a370003cc301d4361bae20c9ef615f89bf8d1e8a upstream.
There is a race between the binder driver cleaning up a completed transaction via binder_free_transaction() and a user calling binder_ioctl(BC_FREE_BUFFER) to release a buffer. It doesn't matter which runs first, but they need to be protected against running concurrently, which can otherwise result in a use-after-free.
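Schematically (simplified, hypothetical types): both paths that tear down the buffer<->transaction link must do so under the same lock, so neither can observe or touch a half-freed object:

    #include <linux/slab.h>
    #include <linux/spinlock.h>

    struct example_txn;

    struct example_buf {
            struct example_txn *transaction;
    };

    struct example_txn {
            struct example_buf *buffer;
    };

    static DEFINE_SPINLOCK(inner_lock);

    /* Path 1: the driver frees a completed transaction. */
    static void example_free_txn(struct example_txn *t)
    {
            spin_lock(&inner_lock);
            if (t->buffer)
                    t->buffer->transaction = NULL;
            spin_unlock(&inner_lock);
            kfree(t);
    }

    /* Path 2: userspace releases the buffer (BC_FREE_BUFFER). */
    static void example_free_buf(struct example_buf *b)
    {
            spin_lock(&inner_lock);
            if (b->transaction) {
                    b->transaction->buffer = NULL;
                    b->transaction = NULL;
            }
            spin_unlock(&inner_lock);
            kfree(b);
    }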
Signed-off-by: Todd Kjos <tkjos@google.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/android/binder.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)
--- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -1903,8 +1903,18 @@ static struct binder_thread *binder_get_
static void binder_free_transaction(struct binder_transaction *t) { - if (t->buffer) - t->buffer->transaction = NULL; + struct binder_proc *target_proc = t->to_proc; + + if (target_proc) { + binder_inner_proc_lock(target_proc); + if (t->buffer) + t->buffer->transaction = NULL; + binder_inner_proc_unlock(target_proc); + } + /* + * If the transaction has no target_proc, then + * t->buffer->transaction has already been cleared. + */ kfree(t); binder_stats_deleted(BINDER_STAT_TRANSACTION); } @@ -3426,10 +3436,12 @@ static int binder_thread_write(struct bi buffer->debug_id, buffer->transaction ? "active" : "finished");
+ binder_inner_proc_lock(proc); if (buffer->transaction) { buffer->transaction->buffer = NULL; buffer->transaction = NULL; } + binder_inner_proc_unlock(proc); if (buffer->async_transaction && buffer->target_node) { struct binder_node *buf_node; struct binder_work *w;
From: Phong Tran <tranmanphong@gmail.com>
commit f384e62a82ba5d85408405fdd6aeff89354deaa9 upstream.
A syzbot test with a random endpoint address made idx overflow the table of endpoint configurations.

This adds a bounds check on idx to fix the error reported by syzbot:

KASAN: stack-out-of-bounds Read in hfcsusb_probe [1]

The patch was tested by syzbot [2].
Reported-by: syzbot+8750abbc3a46ef47d509@syzkaller.appspotmail.com
[1]: https://syzkaller.appspot.com/bug?id=30a04378dac680c5d521304a00a86156bb91352...
[2]: https://groups.google.com/d/msg/syzkaller-bugs/_6HBdge8F3E/OJn7wVNpBAAJ
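For reference, the shape of the check as a sketch (the table size of 16 is taken from the driver context; the extra negative guard is illustrative only):

    #include <linux/errno.h>
    #include <linux/types.h>

    static int example_ep_index(u8 ep_addr)
    {
            int idx = ((ep_addr & 0x7f) - 1) * 2; /* two slots per ep */

            if (idx < 0 || idx > 15)    /* table holds 16 entries */
                    return -EIO;
            if (ep_addr & 0x80)         /* IN endpoint: odd slot */
                    idx++;
            return idx;
    }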
Signed-off-by: Phong Tran <tranmanphong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/isdn/hardware/mISDN/hfcsusb.c | 3 +++
 1 file changed, 3 insertions(+)
--- a/drivers/isdn/hardware/mISDN/hfcsusb.c +++ b/drivers/isdn/hardware/mISDN/hfcsusb.c @@ -1963,6 +1963,9 @@ hfcsusb_probe(struct usb_interface *intf
/* get endpoint base */ idx = ((ep_addr & 0x7f) - 1) * 2; + if (idx > 15) + return -EIO; + if (ep_addr & 0x80) idx++; attr = ep->desc.bmAttributes;
From: Sean Young <sean@mess.org>
commit 6d0d1ff9ff21fbb06b867c13a1d41ce8ddcd8230 upstream.
au0828_usb_disconnect() gets the au0828_dev struct via usb_get_intfdata(), so it needs to be set up before any of the error paths can run.
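The general pattern, sketched with invented names: publish the driver state with usb_set_intfdata() before registering anything whose failure can tear the interface down, because disconnect() starts from usb_get_intfdata():

    #include <linux/slab.h>
    #include <linux/usb.h>

    struct example_dev {
            int dummy;
    };

    /* Stand-in for the analog/digital/RC registration that may fail. */
    static int example_register_subdevices(struct example_dev *dev)
    {
            return 0;
    }

    static int example_probe(struct usb_interface *intf,
                             const struct usb_device_id *id)
    {
            struct example_dev *dev = kzalloc(sizeof(*dev), GFP_KERNEL);

            if (!dev)
                    return -ENOMEM;

            /* Set before any step whose error path may run
             * disconnect(), so usb_get_intfdata() is never NULL there.
             */
            usb_set_intfdata(intf, dev);

            return example_register_subdevices(dev);
    }

    static void example_disconnect(struct usb_interface *intf)
    {
            struct example_dev *dev = usb_get_intfdata(intf);

            kfree(dev); /* valid even if probe failed after setting it */
    }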
Reported-by: syzbot+357d86bcb4cca1a2f572@syzkaller.appspotmail.com
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/media/usb/au0828/au0828-core.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
--- a/drivers/media/usb/au0828/au0828-core.c +++ b/drivers/media/usb/au0828/au0828-core.c @@ -623,6 +623,12 @@ static int au0828_usb_probe(struct usb_i /* Setup */ au0828_card_setup(dev);
+ /* + * Store the pointer to the au0828_dev so it can be accessed in + * au0828_usb_disconnect + */ + usb_set_intfdata(interface, dev); + /* Analog TV */ retval = au0828_analog_register(dev, interface); if (retval) { @@ -641,12 +647,6 @@ static int au0828_usb_probe(struct usb_i /* Remote controller */ au0828_rc_register(dev);
- /* - * Store the pointer to the au0828_dev so it can be accessed in - * au0828_usb_disconnect - */ - usb_set_intfdata(interface, dev); - pr_info("Registered device AU0828 [%s]\n", dev->board.name == NULL ? "Unset" : dev->board.name);
From: Fabio Estevam <festevam@gmail.com>
commit 265df32eae5845212ad9f55f5ae6b6dcb68b187b upstream.
The "WARNING" string confuses syzbot, which thinks it found a crash [1].
Change the string to avoid such problem.
[1] https://lkml.org/lkml/2019/5/9/243
Reported-by: syzbot+c1b25598aa60dcd47e78@syzkaller.appspotmail.com
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/wireless/ath/ath10k/usb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/wireless/ath/ath10k/usb.c +++ b/drivers/net/wireless/ath/ath10k/usb.c @@ -1025,7 +1025,7 @@ static int ath10k_usb_probe(struct usb_i }
/* TODO: remove this once USB support is fully implemented */ - ath10k_warn(ar, "WARNING: ath10k USB support is incomplete, don't expect anything to work!\n"); + ath10k_warn(ar, "Warning: ath10k USB support is incomplete, don't expect anything to work!\n");
return 0;
From: Oliver Neukum <oneukum@suse.com>
commit eff73de2b1600ad8230692f00bc0ab49b166512a upstream.
KASAN reported a use-after-free in cpia2_usb_disconnect(): it first freed everything and then woke up those waiting. The reverse order is correct.
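The ordering rule the fix restores, as a small sketch (hypothetical names; example_cam_put() stands in for the real v4l2_device_put()):

    #include <linux/wait.h>

    struct example_cam {
            wait_queue_head_t wq_stream;
            /* ... */
    };

    /* May free cam when the last reference is dropped. */
    static void example_cam_put(struct example_cam *cam);

    static void example_disconnect_tail(struct example_cam *cam)
    {
            /* 1. Wake sleepers while cam is guaranteed to be alive. */
            wake_up_interruptible(&cam->wq_stream);

            /* 2. Only then drop the reference that may free cam. */
            example_cam_put(cam);
    }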
Fixes: 6c493f8b28c67 ("[media] cpia2: major overhaul to get it in a working state again")
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reported-by: syzbot+0c90fc937c84f97d0aa6@syzkaller.appspotmail.com
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/media/usb/cpia2/cpia2_usb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/media/usb/cpia2/cpia2_usb.c +++ b/drivers/media/usb/cpia2/cpia2_usb.c @@ -901,7 +901,6 @@ static void cpia2_usb_disconnect(struct cpia2_unregister_camera(cam); v4l2_device_disconnect(&cam->v4l2_dev); mutex_unlock(&cam->v4l2_lock); - v4l2_device_put(&cam->v4l2_dev);
if(cam->buffers) { DBG("Wakeup waiting processes\n"); @@ -913,6 +912,8 @@ static void cpia2_usb_disconnect(struct DBG("Releasing interface\n"); usb_driver_release_interface(&cpia2_driver, intf);
+ v4l2_device_put(&cam->v4l2_dev); + LOG("CPiA2 camera disconnected.\n"); }
From: Andrey Konovalov <andreyknvl@google.com>
commit 1753c7c4367aa1201e1e5d0a601897ab33444af1 upstream.
When the pvrusb2 driver detects that there's something wrong with the device, it prints a warning message. Right now those messages are printed in two different formats:

1. ***WARNING*** message here
2. WARNING: message here

There's an issue with the second format. Syzkaller recognizes it as a message produced by a WARN_ON(), which is used to indicate a bug in the kernel. However, pvrusb2 prints those warnings to indicate an issue with the device, not a bug in the kernel.
This patch changes the pvrusb2 driver to consistently use the first warning message format. This will unblock syzkaller testing of this driver.
Reported-by: syzbot+af8f8d2ac0d39b0ed3a0@syzkaller.appspotmail.com
Reported-by: syzbot+170a86bf206dd2c6217e@syzkaller.appspotmail.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/media/usb/pvrusb2/pvrusb2-hdw.c      | 4 ++--
 drivers/media/usb/pvrusb2/pvrusb2-i2c-core.c | 6 +++---
 drivers/media/usb/pvrusb2/pvrusb2-std.c      | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)
--- a/drivers/media/usb/pvrusb2/pvrusb2-hdw.c +++ b/drivers/media/usb/pvrusb2/pvrusb2-hdw.c @@ -1680,7 +1680,7 @@ static int pvr2_decoder_enable(struct pv } if (!hdw->flag_decoder_missed) { pvr2_trace(PVR2_TRACE_ERROR_LEGS, - "WARNING: No decoder present"); + "***WARNING*** No decoder present"); hdw->flag_decoder_missed = !0; trace_stbit("flag_decoder_missed", hdw->flag_decoder_missed); @@ -2365,7 +2365,7 @@ struct pvr2_hdw *pvr2_hdw_create(struct if (hdw_desc->flag_is_experimental) { pvr2_trace(PVR2_TRACE_INFO, "**********"); pvr2_trace(PVR2_TRACE_INFO, - "WARNING: Support for this device (%s) is experimental.", + "***WARNING*** Support for this device (%s) is experimental.", hdw_desc->description); pvr2_trace(PVR2_TRACE_INFO, "Important functionality might not be entirely working."); --- a/drivers/media/usb/pvrusb2/pvrusb2-i2c-core.c +++ b/drivers/media/usb/pvrusb2/pvrusb2-i2c-core.c @@ -343,11 +343,11 @@ static int i2c_hack_cx25840(struct pvr2_
if ((ret != 0) || (*rdata == 0x04) || (*rdata == 0x0a)) { pvr2_trace(PVR2_TRACE_ERROR_LEGS, - "WARNING: Detected a wedged cx25840 chip; the device will not work."); + "***WARNING*** Detected a wedged cx25840 chip; the device will not work."); pvr2_trace(PVR2_TRACE_ERROR_LEGS, - "WARNING: Try power cycling the pvrusb2 device."); + "***WARNING*** Try power cycling the pvrusb2 device."); pvr2_trace(PVR2_TRACE_ERROR_LEGS, - "WARNING: Disabling further access to the device to prevent other foul-ups."); + "***WARNING*** Disabling further access to the device to prevent other foul-ups."); // This blocks all further communication with the part. hdw->i2c_func[0x44] = NULL; pvr2_hdw_render_useless(hdw); --- a/drivers/media/usb/pvrusb2/pvrusb2-std.c +++ b/drivers/media/usb/pvrusb2/pvrusb2-std.c @@ -353,7 +353,7 @@ struct v4l2_standard *pvr2_std_create_en bcnt = pvr2_std_id_to_str(buf,sizeof(buf),fmsk); pvr2_trace( PVR2_TRACE_ERROR_LEGS, - "WARNING: Failed to classify the following standard(s): %.*s", + "***WARNING*** Failed to classify the following standard(s): %.*s", bcnt,buf); }
From: Benjamin Coddington <bcodding@redhat.com>
commit 9f7761cf0409465075dadb875d5d4b8ef2f890c8 upstream.
Don't bail out before cleaning up a new allocation if the wait while searching for a matching nfs client is interrupted; otherwise the allocation is leaked.
Reported-by: syzbot+7fe11b49c1cc30e3fce2@syzkaller.appspotmail.com
Fixes: 950a578c6128 ("NFS: make nfs_match_client killable")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/nfs/client.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/nfs/client.c +++ b/fs/nfs/client.c @@ -416,10 +416,10 @@ struct nfs_client *nfs_get_client(const clp = nfs_match_client(cl_init); if (clp) { spin_unlock(&nn->nfs_client_lock); - if (IS_ERR(clp)) - return clp; if (new) new->rpc_ops->free_client(new); + if (IS_ERR(clp)) + return clp; return nfs_found_client(cl_init, clp); } if (new) {
From: Luke Nowakowski-Krijger <lnowakow@eng.ucsd.edu>
commit c666355e60ddb4748ead3bdd983e3f7f2224aaf0 upstream.
Change devm_k*alloc to k*alloc to manually allocate memory.

The manual allocation and freeing of memory is necessary because memory allocated with devm_k*alloc is freed as soon as the USB radio is disconnected. If anything still holds a reference to the radio device at that point, we get use-after-free errors.
This patch fixes this by manually allocating memory, and freeing it in the v4l2.release callback that gets called when the last radio device exits.
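The resulting lifetime pattern, as a sketch (names invented): devm_* memory dies with the USB interface at disconnect, while memory that open file handles may still use has to live until the last v4l2 release callback:

    #include <linux/slab.h>
    #include <linux/usb.h>
    #include <media/v4l2-device.h>

    struct example_radio {
            struct v4l2_device v4l2_dev;
            void *buffer;
    };

    /* Runs when the LAST user is gone, possibly long after disconnect;
     * only now is it safe to free - hence kzalloc/kfree, not devm_*.
     */
    static void example_release(struct v4l2_device *v4l2_dev)
    {
            struct example_radio *radio =
                    container_of(v4l2_dev, struct example_radio, v4l2_dev);

            kfree(radio->buffer);
            kfree(radio);
    }

    static int example_probe(struct usb_interface *intf)
    {
            struct example_radio *radio = kzalloc(sizeof(*radio), GFP_KERNEL);

            if (!radio)
                    return -ENOMEM;
            radio->v4l2_dev.release = example_release;
            /* ...v4l2_device_register(&intf->dev, &radio->v4l2_dev)... */
            return 0;
    }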
Reported-and-tested-by: syzbot+a4387f5b6b799f6becbf@syzkaller.appspotmail.com
Signed-off-by: Luke Nowakowski-Krijger <lnowakow@eng.ucsd.edu>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
[hverkuil-cisco@xs4all.nl: cleaned up two small checkpatch.pl warnings]
[hverkuil-cisco@xs4all.nl: prefix subject with driver name]
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/media/radio/radio-raremono.c | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)
--- a/drivers/media/radio/radio-raremono.c +++ b/drivers/media/radio/radio-raremono.c @@ -283,6 +283,14 @@ static int vidioc_g_frequency(struct fil return 0; }
+static void raremono_device_release(struct v4l2_device *v4l2_dev) +{ + struct raremono_device *radio = to_raremono_dev(v4l2_dev); + + kfree(radio->buffer); + kfree(radio); +} + /* File system interface */ static const struct v4l2_file_operations usb_raremono_fops = { .owner = THIS_MODULE, @@ -307,12 +315,14 @@ static int usb_raremono_probe(struct usb struct raremono_device *radio; int retval = 0;
- radio = devm_kzalloc(&intf->dev, sizeof(struct raremono_device), GFP_KERNEL); - if (radio) - radio->buffer = devm_kmalloc(&intf->dev, BUFFER_LENGTH, GFP_KERNEL); - - if (!radio || !radio->buffer) + radio = kzalloc(sizeof(*radio), GFP_KERNEL); + if (!radio) + return -ENOMEM; + radio->buffer = kmalloc(BUFFER_LENGTH, GFP_KERNEL); + if (!radio->buffer) { + kfree(radio); return -ENOMEM; + }
radio->usbdev = interface_to_usbdev(intf); radio->intf = intf; @@ -336,7 +346,8 @@ static int usb_raremono_probe(struct usb if (retval != 3 || (get_unaligned_be16(&radio->buffer[1]) & 0xfff) == 0x0242) { dev_info(&intf->dev, "this is not Thanko's Raremono.\n"); - return -ENODEV; + retval = -ENODEV; + goto free_mem; }
dev_info(&intf->dev, "Thanko's Raremono connected: (%04X:%04X)\n", @@ -345,7 +356,7 @@ static int usb_raremono_probe(struct usb retval = v4l2_device_register(&intf->dev, &radio->v4l2_dev); if (retval < 0) { dev_err(&intf->dev, "couldn't register v4l2_device\n"); - return retval; + goto free_mem; }
mutex_init(&radio->lock); @@ -357,6 +368,7 @@ static int usb_raremono_probe(struct usb radio->vdev.ioctl_ops = &usb_raremono_ioctl_ops; radio->vdev.lock = &radio->lock; radio->vdev.release = video_device_release_empty; + radio->v4l2_dev.release = raremono_device_release;
usb_set_intfdata(intf, &radio->v4l2_dev);
@@ -372,6 +384,10 @@ static int usb_raremono_probe(struct usb } dev_err(&intf->dev, "could not register video device\n"); v4l2_device_unregister(&radio->v4l2_dev); + +free_mem: + kfree(radio->buffer); + kfree(radio); return retval; }
From: Dmitry Safonov <dima@arista.com>
commit effa467870c7612012885df4e246bdb8ffd8e44c upstream.
The Intel VT-d driver was reworked to use the common deferred flushing implementation. Previously there was one global per-cpu flush queue; afterwards, one per domain.
Before deferring a flush, the queue should be allocated and initialized.
Currently only domains with the IOMMU_DOMAIN_DMA type initialize their flush queue. It's probably worth initializing it for static or unmanaged domains too, but that may be arguable - I'm leaving it to the iommu folks.

Prevent queuing an iova flush if the domain doesn't have a queue. The defensive check seems worth keeping even if the queue were initialized for all kinds of domains, and it is easily backportable.
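Schematically, the publish/check pair looks like this (sketch with invented names; the smp_wmb() mirrors the one added by the patch):

    #include <asm/barrier.h>
    #include <linux/errno.h>
    #include <linux/percpu.h>

    struct example_fq {
            int head, tail;
    };

    struct example_domain {
            struct example_fq __percpu *fq; /* NULL until fully set up */
    };

    static bool example_has_flush_queue(struct example_domain *d)
    {
            return !!d->fq; /* readers defer a flush only if this holds */
    }

    static int example_init_flush_queue(struct example_domain *d)
    {
            struct example_fq __percpu *queue =
                    alloc_percpu(struct example_fq);

            if (!queue)
                    return -ENOMEM;
            /* ...initialize every per-cpu queue here... */

            smp_wmb();     /* order the init above before the publish */
            d->fq = queue; /* who sees fq != NULL sees it initialized */
            return 0;
    }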
On the 4.19.43 stable kernel this has a user-visible effect: previously there were crashes for devices in the si domain, e.g. on sata devices:
BUG: spinlock bad magic on CPU#6, swapper/0/1
 lock: 0xffff88844f582008, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #1
Call Trace:
 <IRQ>
 dump_stack+0x61/0x7e
 spin_bug+0x9d/0xa3
 do_raw_spin_lock+0x22/0x8e
 _raw_spin_lock_irqsave+0x32/0x3a
 queue_iova+0x45/0x115
 intel_unmap+0x107/0x113
 intel_unmap_sg+0x6b/0x76
 __ata_qc_complete+0x7f/0x103
 ata_qc_complete+0x9b/0x26a
 ata_qc_complete_multiple+0xd0/0xe3
 ahci_handle_port_interrupt+0x3ee/0x48a
 ahci_handle_port_intr+0x73/0xa9
 ahci_single_level_irq_intr+0x40/0x60
 __handle_irq_event_percpu+0x7f/0x19a
 handle_irq_event_percpu+0x32/0x72
 handle_irq_event+0x38/0x56
 handle_edge_irq+0x102/0x121
 handle_irq+0x147/0x15c
 do_IRQ+0x66/0xf2
 common_interrupt+0xf/0xf
RIP: 0010:__do_softirq+0x8c/0x2df
The same for usb devices that use ehci-pci:

BUG: spinlock bad magic on CPU#0, swapper/0/1
 lock: 0xffff88844f402008, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #4
Call Trace:
 <IRQ>
 dump_stack+0x61/0x7e
 spin_bug+0x9d/0xa3
 do_raw_spin_lock+0x22/0x8e
 _raw_spin_lock_irqsave+0x32/0x3a
 queue_iova+0x77/0x145
 intel_unmap+0x107/0x113
 intel_unmap_page+0xe/0x10
 usb_hcd_unmap_urb_setup_for_dma+0x53/0x9d
 usb_hcd_unmap_urb_for_dma+0x17/0x100
 unmap_urb_for_dma+0x22/0x24
 __usb_hcd_giveback_urb+0x51/0xc3
 usb_giveback_urb_bh+0x97/0xde
 tasklet_action_common.isra.4+0x5f/0xa1
 tasklet_action+0x2d/0x30
 __do_softirq+0x138/0x2df
 irq_exit+0x7d/0x8b
 smp_apic_timer_interrupt+0x10f/0x151
 apic_timer_interrupt+0xf/0x20
 </IRQ>
RIP: 0010:_raw_spin_unlock_irqrestore+0x17/0x39
Cc: David Woodhouse dwmw2@infradead.org
Cc: Joerg Roedel joro@8bytes.org
Cc: Lu Baolu baolu.lu@linux.intel.com
Cc: iommu@lists.linux-foundation.org
Cc: stable@vger.kernel.org # 4.14+
Fixes: 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing")
Signed-off-by: Dmitry Safonov dima@arista.com
Reviewed-by: Lu Baolu baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel jroedel@suse.de
[v4.14-port notes:
 o minor conflict with untrusted IOMMU devices check under if-condition
 o setup_timer() near one chunk is timer_setup() in v5.3]
Signed-off-by: Dmitry Safonov dima@arista.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/iommu/intel-iommu.c |  2 +-
 drivers/iommu/iova.c        | 18 ++++++++++++++----
 include/linux/iova.h        |  6 ++++++
 3 files changed, 21 insertions(+), 5 deletions(-)
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3702,7 +3702,7 @@ static void intel_unmap(struct device *d

 	freelist = domain_unmap(domain, start_pfn, last_pfn);

-	if (intel_iommu_strict) {
+	if (intel_iommu_strict || !has_iova_flush_queue(&domain->iovad)) {
 		iommu_flush_iotlb_psi(iommu, domain, start_pfn,
 				      nrpages, !freelist, 0);
 		/* free iova */
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -58,9 +58,14 @@ init_iova_domain(struct iova_domain *iov
 }
 EXPORT_SYMBOL_GPL(init_iova_domain);

+bool has_iova_flush_queue(struct iova_domain *iovad)
+{
+	return !!iovad->fq;
+}
+
 static void free_iova_flush_queue(struct iova_domain *iovad)
 {
-	if (!iovad->fq)
+	if (!has_iova_flush_queue(iovad))
 		return;

 	if (timer_pending(&iovad->fq_timer))
@@ -78,13 +83,14 @@ static void free_iova_flush_queue(struct
 int init_iova_flush_queue(struct iova_domain *iovad,
 			  iova_flush_cb flush_cb, iova_entry_dtor entry_dtor)
 {
+	struct iova_fq __percpu *queue;
 	int cpu;

 	atomic64_set(&iovad->fq_flush_start_cnt, 0);
 	atomic64_set(&iovad->fq_flush_finish_cnt, 0);

-	iovad->fq = alloc_percpu(struct iova_fq);
-	if (!iovad->fq)
+	queue = alloc_percpu(struct iova_fq);
+	if (!queue)
 		return -ENOMEM;

 	iovad->flush_cb = flush_cb;
@@ -93,13 +99,17 @@ int init_iova_flush_queue(struct iova_do
 	for_each_possible_cpu(cpu) {
 		struct iova_fq *fq;

-		fq = per_cpu_ptr(iovad->fq, cpu);
+		fq = per_cpu_ptr(queue, cpu);
 		fq->head = 0;
 		fq->tail = 0;

 		spin_lock_init(&fq->lock);
 	}

+	smp_wmb();
+
+	iovad->fq = queue;
+
 	setup_timer(&iovad->fq_timer, fq_flush_timeout, (unsigned long)iovad);
 	atomic_set(&iovad->fq_timer_on, 0);

--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -154,6 +154,7 @@ struct iova *reserve_iova(struct iova_do
 void copy_reserved_iova(struct iova_domain *from, struct iova_domain *to);
 void init_iova_domain(struct iova_domain *iovad, unsigned long granule,
 	unsigned long start_pfn, unsigned long pfn_32bit);
+bool has_iova_flush_queue(struct iova_domain *iovad);
 int init_iova_flush_queue(struct iova_domain *iovad,
 			  iova_flush_cb flush_cb, iova_entry_dtor entry_dtor);
 struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn);
@@ -234,6 +235,11 @@ static inline void init_iova_domain(stru
 {
 }

+bool has_iova_flush_queue(struct iova_domain *iovad)
+{
+	return false;
+}
+
 static inline int init_iova_flush_queue(struct iova_domain *iovad,
 					iova_flush_cb flush_cb,
 					iova_entry_dtor entry_dtor)
From: Joerg Roedel jroedel@suse.de
commit 201c1db90cd643282185a00770f12f95da330eca upstream.
The stub function for !CONFIG_IOMMU_IOVA needs to be 'static inline'.
Fixes: effa467870c76 ('iommu/vt-d: Don't queue_iova() if there is no flush queue')
Signed-off-by: Joerg Roedel jroedel@suse.de
Signed-off-by: Dmitry Safonov dima@arista.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/linux/iova.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -235,7 +235,7 @@ static inline void init_iova_domain(stru
 {
 }

-bool has_iova_flush_queue(struct iova_domain *iovad)
+static inline bool has_iova_flush_queue(struct iova_domain *iovad)
 {
 	return false;
 }
From: Sunil Muthuswamy sunilmut@microsoft.com
commit a9eeb998c28d5506616426bd3a216bd5735a18b8 upstream.
Currently, hvsock does not implement any delayed or background close logic. Whenever the hvsock socket is closed, a FIN is sent to the peer, and the last reference to the socket is dropped, which leads to a call to .destruct, where the socket can hang indefinitely waiting for the peer to close its side. This can cause the user application to hang in the close() call.
This change implements a proper STREAM (TCP) closing handshake by sending the FIN to the peer and then waiting for the peer's FIN to arrive within a given timeout. On timeout, it tries to terminate the connection (i.e. send a RST). This is in line with other socket providers such as virtio.
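Reduced to a sketch, the resulting close lifecycle looks like this (a condensed, comment-only view using the names from the diff below; illustrative, not the actual code):

	/*
	 * close()
	 *   -> hvs_release()
	 *        -> hvs_close_lock_held(): send our FIN via
	 *           hvs_shutdown_lock_held(); if the peer's FIN was already
	 *           seen (SOCK_DONE), remove the socket immediately,
	 *           otherwise schedule_delayed_work(&vsk->close_work,
	 *           HVS_CLOSE_TIMEOUT) and defer the removal
	 *
	 * peer FIN arrives in time
	 *   -> hvs_close_connection() -> hvs_do_close_lock_held():
	 *      cancel the delayed work and remove the socket
	 *
	 * timeout (8 * HZ) expires with no peer FIN
	 *   -> hvs_close_timeout(): complete the close forcibly (RST-like)
	 */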
This change does not address the hang in vmbus_hvsock_device_unregister, where it waits indefinitely for the host to rescind the channel. That should be taken up as a separate fix.
Signed-off-by: Sunil Muthuswamy sunilmut@microsoft.com
Reviewed-by: Dexuan Cui decui@microsoft.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/vmw_vsock/hyperv_transport.c | 110 +++++++++++++++++++++++++++------------
 1 file changed, 78 insertions(+), 32 deletions(-)
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -35,6 +35,9 @@
 /* The MTU is 16KB per the host side's design */
 #define HVS_MTU_SIZE		(1024 * 16)

+/* How long to wait for graceful shutdown of a connection */
+#define HVS_CLOSE_TIMEOUT	(8 * HZ)
+
 struct vmpipe_proto_header {
 	u32 pkt_type;
 	u32 data_size;
@@ -290,19 +293,32 @@ static void hvs_channel_cb(void *ctx)
 		sk->sk_write_space(sk);
 }

-static void hvs_close_connection(struct vmbus_channel *chan)
+static void hvs_do_close_lock_held(struct vsock_sock *vsk,
+				   bool cancel_timeout)
 {
-	struct sock *sk = get_per_channel_state(chan);
-	struct vsock_sock *vsk = vsock_sk(sk);
-
-	lock_sock(sk);
+	struct sock *sk = sk_vsock(vsk);

-	sk->sk_state = TCP_CLOSE;
 	sock_set_flag(sk, SOCK_DONE);
-	vsk->peer_shutdown |= SEND_SHUTDOWN | RCV_SHUTDOWN;
-
+	vsk->peer_shutdown = SHUTDOWN_MASK;
+	if (vsock_stream_has_data(vsk) <= 0)
+		sk->sk_state = TCP_CLOSING;
 	sk->sk_state_change(sk);
+	if (vsk->close_work_scheduled &&
+	    (!cancel_timeout || cancel_delayed_work(&vsk->close_work))) {
+		vsk->close_work_scheduled = false;
+		vsock_remove_sock(vsk);
+
+		/* Release the reference taken while scheduling the timeout */
+		sock_put(sk);
+	}
+}
+
+static void hvs_close_connection(struct vmbus_channel *chan)
+{
+	struct sock *sk = get_per_channel_state(chan);

+	lock_sock(sk);
+	hvs_do_close_lock_held(vsock_sk(sk), true);
 	release_sock(sk);
 }

@@ -446,50 +462,80 @@ static int hvs_connect(struct vsock_sock
 	return vmbus_send_tl_connect_request(&h->vm_srv_id, &h->host_srv_id);
 }

+static void hvs_shutdown_lock_held(struct hvsock *hvs, int mode)
+{
+	struct vmpipe_proto_header hdr;
+
+	if (hvs->fin_sent || !hvs->chan)
+		return;
+
+	/* It can't fail: see hvs_channel_writable_bytes(). */
+	(void)hvs_send_data(hvs->chan, (struct hvs_send_buf *)&hdr, 0);
+	hvs->fin_sent = true;
+}
+
 static int hvs_shutdown(struct vsock_sock *vsk, int mode)
 {
 	struct sock *sk = sk_vsock(vsk);
-	struct vmpipe_proto_header hdr;
-	struct hvs_send_buf *send_buf;
-	struct hvsock *hvs;

 	if (!(mode & SEND_SHUTDOWN))
 		return 0;

 	lock_sock(sk);
-
-	hvs = vsk->trans;
-	if (hvs->fin_sent)
-		goto out;
-
-	send_buf = (struct hvs_send_buf *)&hdr;
-
-	/* It can't fail: see hvs_channel_writable_bytes(). */
-	(void)hvs_send_data(hvs->chan, send_buf, 0);
-
-	hvs->fin_sent = true;
-out:
+	hvs_shutdown_lock_held(vsk->trans, mode);
 	release_sock(sk);
 	return 0;
 }

-static void hvs_release(struct vsock_sock *vsk)
+static void hvs_close_timeout(struct work_struct *work)
 {
+	struct vsock_sock *vsk =
+		container_of(work, struct vsock_sock, close_work.work);
 	struct sock *sk = sk_vsock(vsk);
-	struct hvsock *hvs = vsk->trans;
-	struct vmbus_channel *chan;

+	sock_hold(sk);
 	lock_sock(sk);
+	if (!sock_flag(sk, SOCK_DONE))
+		hvs_do_close_lock_held(vsk, false);

-	sk->sk_state = TCP_CLOSING;
-	vsock_remove_sock(vsk);
-
+	vsk->close_work_scheduled = false;
 	release_sock(sk);
+	sock_put(sk);
+}

-	chan = hvs->chan;
-	if (chan)
-		hvs_shutdown(vsk, RCV_SHUTDOWN | SEND_SHUTDOWN);
+/* Returns true, if it is safe to remove socket; false otherwise */
+static bool hvs_close_lock_held(struct vsock_sock *vsk)
+{
+	struct sock *sk = sk_vsock(vsk);
+
+	if (!(sk->sk_state == TCP_ESTABLISHED ||
+	      sk->sk_state == TCP_CLOSING))
+		return true;
+
+	if ((sk->sk_shutdown & SHUTDOWN_MASK) != SHUTDOWN_MASK)
+		hvs_shutdown_lock_held(vsk->trans, SHUTDOWN_MASK);
+
+	if (sock_flag(sk, SOCK_DONE))
+		return true;
+
+	/* This reference will be dropped by the delayed close routine */
+	sock_hold(sk);
+	INIT_DELAYED_WORK(&vsk->close_work, hvs_close_timeout);
+	vsk->close_work_scheduled = true;
+	schedule_delayed_work(&vsk->close_work, HVS_CLOSE_TIMEOUT);
+	return false;
+}

+static void hvs_release(struct vsock_sock *vsk)
+{
+	struct sock *sk = sk_vsock(vsk);
+	bool remove_sock;
+
+	lock_sock(sk);
+	remove_sock = hvs_close_lock_held(vsk);
+	release_sock(sk);
+	if (remove_sock)
+		vsock_remove_sock(vsk);
 }
static void hvs_destruct(struct vsock_sock *vsk)
From: Vladis Dronov vdronov@redhat.com
commit b36a1552d7319bbfd5cf7f08726c23c5c66d4f73 upstream.
Certain tty operations (pty_unix98_ops) lack the tiocmget() and tiocmset() functions, which are called by certain HCI UART protocols (hci_ath, hci_bcm, hci_intel, hci_mrvl, hci_qca) via hci_uart_set_flow_control() or directly. This leads to execution at NULL and can be triggered by an unprivileged user. Fix this by adding a helper function and a check for the missing tty operations in the protocols' code.
This fixes CVE-2019-10207. The Fixes: lines list commits where calls to tiocm[gs]et() or hci_uart_set_flow_control() were added to the HCI UART protocols.
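For context, the NULL call is reachable with nothing more than a pty. A sketch of the trigger sequence follows; N_HCI is from the tty UAPI, while HCIUARTSETPROTO and the protocol id are driver-internal values copied here as assumptions, so treat this as illustrative rather than the reproducer from the report:

/* sketch: attach the N_HCI line discipline to a pty (whose tty driver has
 * no tiocmget/tiocmset) and select a protocol that sets flow control */
#include <fcntl.h>
#include <sys/ioctl.h>

#define N_HCI		15			/* from <linux/tty.h> */
#define HCIUARTSETPROTO	_IOW('U', 200, int)	/* driver-internal, assumed */
#define HCI_UART_ATH3K	5			/* driver-internal, assumed */

int main(void)
{
	int ldisc = N_HCI;
	int proto = HCI_UART_ATH3K;
	int fd = open("/dev/ptmx", O_RDWR | O_NOCTTY);

	if (fd < 0)
		return 1;
	if (ioctl(fd, TIOCSETD, &ldisc))	/* hci_uart ldisc on a pty */
		return 1;
	/* before the fix: the proto open path reaches
	 * hci_uart_set_flow_control() -> tty->driver->ops->tiocmget == NULL */
	ioctl(fd, HCIUARTSETPROTO, &proto);
	return 0;
}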
Link: https://syzkaller.appspot.com/bug?id=1b42faa2848963564a5b1b7f8c837ea7b55ffa5...
Reported-by: syzbot+79337b501d6aa974d0f6@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org # v2.6.36+
Fixes: b3190df62861 ("Bluetooth: Support for Atheros AR300x serial chip")
Fixes: 118612fb9165 ("Bluetooth: hci_bcm: Add suspend/resume PM functions")
Fixes: ff2895592f0f ("Bluetooth: hci_intel: Add Intel baudrate configuration support")
Fixes: 162f812f23ba ("Bluetooth: hci_uart: Add Marvell support")
Fixes: fa9ad876b8e0 ("Bluetooth: hci_qca: Add support for Qualcomm Bluetooth chip wcn3990")
Signed-off-by: Vladis Dronov vdronov@redhat.com
Signed-off-by: Marcel Holtmann marcel@holtmann.org
Reviewed-by: Yu-Chen, Cho acho@suse.com
Tested-by: Yu-Chen, Cho acho@suse.com
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/bluetooth/hci_ath.c   |  3 +++
 drivers/bluetooth/hci_bcm.c   |  3 +++
 drivers/bluetooth/hci_intel.c |  3 +++
 drivers/bluetooth/hci_ldisc.c | 13 +++++++++++++
 drivers/bluetooth/hci_mrvl.c  |  3 +++
 drivers/bluetooth/hci_uart.h  |  1 +
 6 files changed, 26 insertions(+)
--- a/drivers/bluetooth/hci_ath.c
+++ b/drivers/bluetooth/hci_ath.c
@@ -101,6 +101,9 @@ static int ath_open(struct hci_uart *hu)

 	BT_DBG("hu %p", hu);

+	if (!hci_uart_has_flow_control(hu))
+		return -EOPNOTSUPP;
+
 	ath = kzalloc(sizeof(*ath), GFP_KERNEL);
 	if (!ath)
 		return -ENOMEM;
--- a/drivers/bluetooth/hci_bcm.c
+++ b/drivers/bluetooth/hci_bcm.c
@@ -305,6 +305,9 @@ static int bcm_open(struct hci_uart *hu)

 	bt_dev_dbg(hu->hdev, "hu %p", hu);

+	if (!hci_uart_has_flow_control(hu))
+		return -EOPNOTSUPP;
+
 	bcm = kzalloc(sizeof(*bcm), GFP_KERNEL);
 	if (!bcm)
 		return -ENOMEM;
--- a/drivers/bluetooth/hci_intel.c
+++ b/drivers/bluetooth/hci_intel.c
@@ -406,6 +406,9 @@ static int intel_open(struct hci_uart *h

 	BT_DBG("hu %p", hu);

+	if (!hci_uart_has_flow_control(hu))
+		return -EOPNOTSUPP;
+
 	intel = kzalloc(sizeof(*intel), GFP_KERNEL);
 	if (!intel)
 		return -ENOMEM;
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -297,6 +297,19 @@ static int hci_uart_send_frame(struct hc
 	return 0;
 }

+/* Check the underlying device or tty has flow control support */
+bool hci_uart_has_flow_control(struct hci_uart *hu)
+{
+	/* serdev nodes check if the needed operations are present */
+	if (hu->serdev)
+		return true;
+
+	if (hu->tty->driver->ops->tiocmget && hu->tty->driver->ops->tiocmset)
+		return true;
+
+	return false;
+}
+
 /* Flow control or un-flow control the device */
 void hci_uart_set_flow_control(struct hci_uart *hu, bool enable)
 {
--- a/drivers/bluetooth/hci_mrvl.c
+++ b/drivers/bluetooth/hci_mrvl.c
@@ -66,6 +66,9 @@ static int mrvl_open(struct hci_uart *hu

 	BT_DBG("hu %p", hu);

+	if (!hci_uart_has_flow_control(hu))
+		return -EOPNOTSUPP;
+
 	mrvl = kzalloc(sizeof(*mrvl), GFP_KERNEL);
 	if (!mrvl)
 		return -ENOMEM;
--- a/drivers/bluetooth/hci_uart.h
+++ b/drivers/bluetooth/hci_uart.h
@@ -117,6 +117,7 @@ void hci_uart_unregister_device(struct h
 int hci_uart_tx_wakeup(struct hci_uart *hu);
 int hci_uart_init_ready(struct hci_uart *hu);
 void hci_uart_set_baudrate(struct hci_uart *hu, unsigned int speed);
+bool hci_uart_has_flow_control(struct hci_uart *hu);
 void hci_uart_set_flow_control(struct hci_uart *hu, bool enable);
 void hci_uart_set_speeds(struct hci_uart *hu, unsigned int init_speed,
 			 unsigned int oper_speed);
From: Jann Horn jannh@google.com
commit 16d51a590a8ce3befb1308e0e7ab77f3b661af33 upstream.
When going through execve(), zero out the NUMA fault statistics instead of freeing them.
During execve, the task is reachable through procfs and the scheduler. A concurrent /proc/*/sched reader can read data from a freed ->numa_faults allocation (confirmed by KASAN) and write it back to userspace. I believe that it would also be possible for a use-after-free read to occur through a race between a NUMA fault and execve(): task_numa_fault() can lead to task_numa_compare(), which invokes task_weight() on the currently running task of a different CPU.
Another way to fix this would be to make ->numa_faults RCU-managed or add extra locking, but it seems easier to wipe the NUMA fault statistics on execve.
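Purely for illustration, the race window can be exercised from userspace along these lines: one process execve()s itself in a loop while another repeatedly reads its /proc/<pid>/sched (the file and path are real, and need CONFIG_SCHED_DEBUG plus CONFIG_NUMA_BALANCING; the loop structure is an assumption, not the reproducer from the report):

/* sketch: hammer /proc/<pid>/sched while <pid> keeps calling execve() */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char path[64], buf[4096];
	pid_t pid;

	if (argc > 1 && strcmp(argv[1], "child") == 0) {
		/* re-exec forever: before the fix, each execve() kfree()d
		 * ->numa_faults while it was still reachable via procfs */
		char *args[] = { argv[0], "child", NULL };
		execv(argv[0], args);
		exit(1);
	}

	pid = fork();
	if (pid == 0) {
		char *args[] = { argv[0], "child", NULL };
		execv(argv[0], args);
		exit(1);
	}

	snprintf(path, sizeof(path), "/proc/%d/sched", pid);
	for (;;) {
		/* concurrent reader of the (possibly freed) NUMA stats */
		int fd = open(path, O_RDONLY);

		if (fd < 0)
			break;
		while (read(fd, buf, sizeof(buf)) > 0)
			;
		close(fd);
	}
	return 0;
}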
Signed-off-by: Jann Horn jannh@google.com
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Petr Mladek pmladek@suse.com
Cc: Sergey Senozhatsky sergey.senozhatsky@gmail.com
Cc: Thomas Gleixner tglx@linutronix.de
Cc: Will Deacon will@kernel.org
Fixes: 82727018b0d3 ("sched/numa: Call task_numa_free() from do_execve()")
Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/exec.c                            |  2 +-
 include/linux/sched/numa_balancing.h |  4 ++--
 kernel/fork.c                        |  2 +-
 kernel/sched/fair.c                  | 24 ++++++++++++++++++++----
 4 files changed, 24 insertions(+), 8 deletions(-)
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1808,7 +1808,7 @@ static int do_execveat_common(int fd, st
 	current->in_execve = 0;
 	membarrier_execve(current);
 	acct_update_integrals(current);
-	task_numa_free(current);
+	task_numa_free(current, false);
 	free_bprm(bprm);
 	kfree(pathbuf);
 	putname(filename);
--- a/include/linux/sched/numa_balancing.h
+++ b/include/linux/sched/numa_balancing.h
@@ -19,7 +19,7 @@
 extern void task_numa_fault(int last_node, int node, int pages, int flags);
 extern pid_t task_numa_group_id(struct task_struct *p);
 extern void set_numabalancing_state(bool enabled);
-extern void task_numa_free(struct task_struct *p);
+extern void task_numa_free(struct task_struct *p, bool final);
 extern bool should_numa_migrate_memory(struct task_struct *p, struct page *page,
 					int src_nid, int dst_cpu);
 #else
@@ -34,7 +34,7 @@ static inline pid_t task_numa_group_id(s
 static inline void set_numabalancing_state(bool enabled)
 {
 }
-static inline void task_numa_free(struct task_struct *p)
+static inline void task_numa_free(struct task_struct *p, bool final)
 {
 }
 static inline bool should_numa_migrate_memory(struct task_struct *p,
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -415,7 +415,7 @@ void __put_task_struct(struct task_struc
 	WARN_ON(tsk == current);

 	cgroup_free(tsk);
-	task_numa_free(tsk);
+	task_numa_free(tsk, true);
 	security_task_free(tsk);
 	exit_creds(tsk);
 	delayacct_tsk_free(tsk);
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2358,13 +2358,23 @@ no_join:
 	return;
 }

-void task_numa_free(struct task_struct *p)
+/*
+ * Get rid of NUMA statistics associated with a task (either current or dead).
+ * If @final is set, the task is dead and has reached refcount zero, so we can
+ * safely free all relevant data structures. Otherwise, there might be
+ * concurrent reads from places like load balancing and procfs, and we should
+ * reset the data back to default state without freeing ->numa_faults.
+ */
+void task_numa_free(struct task_struct *p, bool final)
 {
 	struct numa_group *grp = p->numa_group;
-	void *numa_faults = p->numa_faults;
+	unsigned long *numa_faults = p->numa_faults;
 	unsigned long flags;
 	int i;

+	if (!numa_faults)
+		return;
+
 	if (grp) {
 		spin_lock_irqsave(&grp->lock, flags);
 		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
@@ -2377,8 +2387,14 @@ void task_numa_free(struct task_struct *
 		put_numa_group(grp);
 	}

-	p->numa_faults = NULL;
-	kfree(numa_faults);
+	if (final) {
+		p->numa_faults = NULL;
+		kfree(numa_faults);
+	} else {
+		p->total_numa_faults = 0;
+		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
+			numa_faults[i] = 0;
+	}
 }
/*
From: Miroslav Lichvar mlichvar@redhat.com
commit 5515e9a6273b8c02034466bcbd717ac9f53dab99 upstream.
The PPS assert/clear offset corrections are set by the PPS_SETPARAMS ioctl in the pps_ktime structs, which also contain flags. The flags are not initialized by applications (using the timepps.h header) and they are not used by the kernel for anything except returning them back in the PPS_GETPARAMS ioctl.
Set the flags to zero to make it clear they are unused and to avoid leaking uninitialized data of the PPS_SETPARAMS caller to other applications that have read access to the PPS device.
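A minimal sketch of the leak path from userspace, for illustration; the structs and ioctls are from <linux/pps.h>, while the device path is an assumption:

/* sketch: how a timepps.h-style caller reaches PPS_SETPARAMS; before the
 * fix, the never-initialized .flags fields were stored by the kernel and
 * echoed back to any later PPS_GETPARAMS caller */
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/pps.h>

int main(void)
{
	struct pps_kparams params;
	int fd = open("/dev/pps0", O_RDWR);	/* device path is an assumption */

	if (fd < 0)
		return 1;

	memset(&params, 0, sizeof(params));	/* real callers may skip this */
	if (ioctl(fd, PPS_GETPARAMS, &params))
		return 1;

	params.mode |= PPS_OFFSETASSERT;
	params.assert_off_tu.sec = 0;
	params.assert_off_tu.nsec = -200;	/* e.g. compensate a 200 ns delay */
	/* note: assert_off_tu.flags / clear_off_tu.flags left untouched */
	return ioctl(fd, PPS_SETPARAMS, &params);	/* typically needs CAP_SYS_TIME */
}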
Link: http://lkml.kernel.org/r/20190702092251.24303-1-mlichvar@redhat.com
Signed-off-by: Miroslav Lichvar mlichvar@redhat.com
Reviewed-by: Thomas Gleixner tglx@linutronix.de
Acked-by: Rodolfo Giometti giometti@enneenne.com
Cc: Greg KH greg@kroah.com
Cc: Dan Carpenter dan.carpenter@oracle.com
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/pps/pps.c | 8 ++++++++
 1 file changed, 8 insertions(+)
--- a/drivers/pps/pps.c
+++ b/drivers/pps/pps.c
@@ -166,6 +166,14 @@ static long pps_cdev_ioctl(struct file *
 		pps->params.mode |= PPS_CANWAIT;
 		pps->params.api_version = PPS_API_VERS;

+		/*
+		 * Clear unused fields of pps_kparams to avoid leaking
+		 * uninitialized data of the PPS_SETPARAMS caller via
+		 * PPS_GETPARAMS
+		 */
+		pps->params.assert_off_tu.flags = 0;
+		pps->params.clear_off_tu.flags = 0;
+
 		spin_unlock_irq(&pps->lock);

 		break;
From: Yoshinori Sato ysato@users.sourceforge.jp
commit 1b496469d0c020e09124e03e66a81421c21272a7 upstream.
Resolve the allyesconfig Kconfig conflict between the JCore-SoC and SolutionEngine 7619 board selections.
Signed-off-by: Yoshinori Sato ysato@users.sourceforge.jp
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/sh/boards/Kconfig | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)
--- a/arch/sh/boards/Kconfig
+++ b/arch/sh/boards/Kconfig
@@ -8,27 +8,19 @@ config SH_ALPHA_BOARD
 	bool

 config SH_DEVICE_TREE
-	bool "Board Described by Device Tree"
+	bool
 	select OF
 	select OF_EARLY_FLATTREE
 	select TIMER_OF
 	select COMMON_CLK
 	select GENERIC_CALIBRATE_DELAY
-	help
-	  Select Board Described by Device Tree to build a kernel that
-	  does not hard-code any board-specific knowledge but instead uses
-	  a device tree blob provided by the boot-loader. You must enable
-	  drivers for any hardware you want to use separately. At this
-	  time, only boards based on the open-hardware J-Core processors
-	  have sufficient driver coverage to use this option; do not
-	  select it if you are using original SuperH hardware.

 config SH_JCORE_SOC
 	bool "J-Core SoC"
-	depends on SH_DEVICE_TREE && (CPU_SH2 || CPU_J2)
+	select SH_DEVICE_TREE
 	select CLKSRC_JCORE_PIT
 	select JCORE_AIC
-	default y if CPU_J2
+	depends on CPU_J2
 	help
 	  Select this option to include drivers core components of the
 	  J-Core SoC, including interrupt controllers and timers.
From: Yan, Zheng zyan@redhat.com
commit d6e47819721ae2d9d090058ad5570a66f3c42e39 upstream.
ceph_d_revalidate(..., LOOKUP_RCU) may call __ceph_caps_issued_mask() on a freeing inode.
Signed-off-by: "Yan, Zheng" zyan@redhat.com Reviewed-by: Jeff Layton jlayton@redhat.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ceph/caps.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -1119,20 +1119,23 @@ static int send_cap_msg(struct cap_msg_a
 }

 /*
- * Queue cap releases when an inode is dropped from our cache.  Since
- * inode is about to be destroyed, there is no need for i_ceph_lock.
+ * Queue cap releases when an inode is dropped from our cache.
  */
 void ceph_queue_caps_release(struct inode *inode)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct rb_node *p;

+	/* lock i_ceph_lock, because ceph_d_revalidate(..., LOOKUP_RCU)
+	 * may call __ceph_caps_issued_mask() on a freeing inode. */
+	spin_lock(&ci->i_ceph_lock);
 	p = rb_first(&ci->i_caps);
 	while (p) {
 		struct ceph_cap *cap = rb_entry(p, struct ceph_cap, ci_node);
 		p = rb_next(p);
 		__ceph_remove_cap(cap, true);
 	}
+	spin_unlock(&ci->i_ceph_lock);
 }

 /*
From: Jann Horn jannh@google.com
commit cb361d8cdef69990f6b4504dc1fd9a594d983c97 upstream.
The old code used RCU annotations and accessors inconsistently for ->numa_group, which can lead to use-after-frees and NULL dereferences.
Let all accesses to ->numa_group use proper RCU helpers to prevent such issues.
Signed-off-by: Jann Horn jannh@google.com
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Petr Mladek pmladek@suse.com
Cc: Sergey Senozhatsky sergey.senozhatsky@gmail.com
Cc: Thomas Gleixner tglx@linutronix.de
Cc: Will Deacon will@kernel.org
Fixes: 8c8a743c5087 ("sched/numa: Use {cpu, pid} to create task groups for shared faults")
Link: https://lkml.kernel.org/r/20190716152047.14424-3-jannh@google.com
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8dc1811487f5..9f51932bd543 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1092,7 +1092,15 @@ struct task_struct {
 	u64				last_sum_exec_runtime;
 	struct callback_head		numa_work;

-	struct numa_group		*numa_group;
+	/*
+	 * This pointer is only modified for current in syscall and
+	 * pagefault context (and for tasks being destroyed), so it can be read
+	 * from any of the following contexts:
+	 *  - RCU read-side critical section
+	 *  - current->numa_group from everywhere
+	 *  - task's runqueue locked, task not running
+	 */
+	struct numa_group __rcu		*numa_group;

 	/*
 	 * numa_faults is an array split into four regions:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6adb0e0f5feb..bc9cfeaac8bd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1086,6 +1086,21 @@ struct numa_group {
 	unsigned long faults[0];
 };

+/*
+ * For functions that can be called in multiple contexts that permit reading
+ * ->numa_group (see struct task_struct for locking rules).
+ */
+static struct numa_group *deref_task_numa_group(struct task_struct *p)
+{
+	return rcu_dereference_check(p->numa_group, p == current ||
+		(lockdep_is_held(&task_rq(p)->lock) && !READ_ONCE(p->on_cpu)));
+}
+
+static struct numa_group *deref_curr_numa_group(struct task_struct *p)
+{
+	return rcu_dereference_protected(p->numa_group, p == current);
+}
+
 static inline unsigned long group_faults_priv(struct numa_group *ng);
 static inline unsigned long group_faults_shared(struct numa_group *ng);

@@ -1129,10 +1144,12 @@ static unsigned int task_scan_start(struct task_struct *p)
 {
 	unsigned long smin = task_scan_min(p);
 	unsigned long period = smin;
+	struct numa_group *ng;

 	/* Scale the maximum scan period with the amount of shared memory. */
-	if (p->numa_group) {
-		struct numa_group *ng = p->numa_group;
+	rcu_read_lock();
+	ng = rcu_dereference(p->numa_group);
+	if (ng) {
 		unsigned long shared = group_faults_shared(ng);
 		unsigned long private = group_faults_priv(ng);

@@ -1140,6 +1157,7 @@ static unsigned int task_scan_start(struct task_struct *p)
 		period *= shared + 1;
 		period /= private + shared + 1;
 	}
+	rcu_read_unlock();

 	return max(smin, period);
 }
@@ -1148,13 +1166,14 @@ static unsigned int task_scan_max(struct task_struct *p)
 {
 	unsigned long smin = task_scan_min(p);
 	unsigned long smax;
+	struct numa_group *ng;

 	/* Watch for min being lower than max due to floor calculations */
 	smax = sysctl_numa_balancing_scan_period_max / task_nr_scan_windows(p);

 	/* Scale the maximum scan period with the amount of shared memory. */
-	if (p->numa_group) {
-		struct numa_group *ng = p->numa_group;
+	ng = deref_curr_numa_group(p);
+	if (ng) {
 		unsigned long shared = group_faults_shared(ng);
 		unsigned long private = group_faults_priv(ng);
 		unsigned long period = smax;
@@ -1186,7 +1205,7 @@ void init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
 	p->numa_scan_period		= sysctl_numa_balancing_scan_delay;
 	p->numa_work.next		= &p->numa_work;
 	p->numa_faults			= NULL;
-	p->numa_group			= NULL;
+	RCU_INIT_POINTER(p->numa_group, NULL);
 	p->last_task_numa_placement	= 0;
 	p->last_sum_exec_runtime	= 0;

@@ -1233,7 +1252,16 @@ static void account_numa_dequeue(struct rq *rq, struct task_struct *p)

 pid_t task_numa_group_id(struct task_struct *p)
 {
-	return p->numa_group ? p->numa_group->gid : 0;
+	struct numa_group *ng;
+	pid_t gid = 0;
+
+	rcu_read_lock();
+	ng = rcu_dereference(p->numa_group);
+	if (ng)
+		gid = ng->gid;
+	rcu_read_unlock();
+
+	return gid;
 }

 /*
@@ -1258,11 +1286,13 @@ static inline unsigned long task_faults(struct task_struct *p, int nid)

 static inline unsigned long group_faults(struct task_struct *p, int nid)
 {
-	if (!p->numa_group)
+	struct numa_group *ng = deref_task_numa_group(p);
+
+	if (!ng)
 		return 0;

-	return p->numa_group->faults[task_faults_idx(NUMA_MEM, nid, 0)] +
-		p->numa_group->faults[task_faults_idx(NUMA_MEM, nid, 1)];
+	return ng->faults[task_faults_idx(NUMA_MEM, nid, 0)] +
+		ng->faults[task_faults_idx(NUMA_MEM, nid, 1)];
 }

 static inline unsigned long group_faults_cpu(struct numa_group *group, int nid)
@@ -1400,12 +1430,13 @@ static inline unsigned long task_weight(struct task_struct *p, int nid,
 static inline unsigned long group_weight(struct task_struct *p, int nid,
 					 int dist)
 {
+	struct numa_group *ng = deref_task_numa_group(p);
 	unsigned long faults, total_faults;

-	if (!p->numa_group)
+	if (!ng)
 		return 0;

-	total_faults = p->numa_group->total_faults;
+	total_faults = ng->total_faults;

 	if (!total_faults)
 		return 0;
@@ -1419,7 +1450,7 @@ static inline unsigned long group_weight(struct task_struct *p, int nid,
 bool should_numa_migrate_memory(struct task_struct *p, struct page * page,
 				int src_nid, int dst_cpu)
 {
-	struct numa_group *ng = p->numa_group;
+	struct numa_group *ng = deref_curr_numa_group(p);
 	int dst_nid = cpu_to_node(dst_cpu);
 	int last_cpupid, this_cpupid;

@@ -1600,13 +1631,14 @@ static bool load_too_imbalanced(long src_load, long dst_load,
 static void task_numa_compare(struct task_numa_env *env,
 			      long taskimp, long groupimp, bool maymove)
 {
+	struct numa_group *cur_ng, *p_ng = deref_curr_numa_group(env->p);
 	struct rq *dst_rq = cpu_rq(env->dst_cpu);
+	long imp = p_ng ? groupimp : taskimp;
 	struct task_struct *cur;
 	long src_load, dst_load;
-	long load;
-	long imp = env->p->numa_group ? groupimp : taskimp;
-	long moveimp = imp;
 	int dist = env->dist;
+	long moveimp = imp;
+	long load;

 	if (READ_ONCE(dst_rq->numa_migrate_on))
 		return;
@@ -1645,21 +1677,22 @@ static void task_numa_compare(struct task_numa_env *env,
 	 * If dst and source tasks are in the same NUMA group, or not
 	 * in any group then look only at task weights.
 	 */
-	if (cur->numa_group == env->p->numa_group) {
+	cur_ng = rcu_dereference(cur->numa_group);
+	if (cur_ng == p_ng) {
 		imp = taskimp + task_weight(cur, env->src_nid, dist) -
 		      task_weight(cur, env->dst_nid, dist);
 		/*
 		 * Add some hysteresis to prevent swapping the
 		 * tasks within a group over tiny differences.
 		 */
-		if (cur->numa_group)
+		if (cur_ng)
 			imp -= imp / 16;
 	} else {
 		/*
 		 * Compare the group weights. If a task is all by itself
 		 * (not part of a group), use the task weight instead.
 		 */
-		if (cur->numa_group && env->p->numa_group)
+		if (cur_ng && p_ng)
 			imp += group_weight(cur, env->src_nid, dist) -
 			       group_weight(cur, env->dst_nid, dist);
 		else
@@ -1757,11 +1790,12 @@ static int task_numa_migrate(struct task_struct *p)
 		.best_imp = 0,
 		.best_cpu = -1,
 	};
+	unsigned long taskweight, groupweight;
 	struct sched_domain *sd;
+	long taskimp, groupimp;
+	struct numa_group *ng;
 	struct rq *best_rq;
-	unsigned long taskweight, groupweight;
 	int nid, ret, dist;
-	long taskimp, groupimp;

 	/*
 	 * Pick the lowest SD_NUMA domain, as that would have the smallest
@@ -1807,7 +1841,8 @@ static int task_numa_migrate(struct task_struct *p)
 	 * multiple NUMA nodes; in order to better consolidate the group,
 	 * we need to check other locations.
 	 */
-	if (env.best_cpu == -1 || (p->numa_group && p->numa_group->active_nodes > 1)) {
+	ng = deref_curr_numa_group(p);
+	if (env.best_cpu == -1 || (ng && ng->active_nodes > 1)) {
 		for_each_online_node(nid) {
 			if (nid == env.src_nid || nid == p->numa_preferred_nid)
 				continue;
@@ -1840,7 +1875,7 @@ static int task_numa_migrate(struct task_struct *p)
 	 * A task that migrated to a second choice node will be better off
 	 * trying for a better one later. Do not set the preferred node here.
 	 */
-	if (p->numa_group) {
+	if (ng) {
 		if (env.best_cpu == -1)
 			nid = env.src_nid;
 		else
@@ -2135,6 +2170,7 @@ static void task_numa_placement(struct task_struct *p)
 	unsigned long total_faults;
 	u64 runtime, period;
 	spinlock_t *group_lock = NULL;
+	struct numa_group *ng;

 	/*
 	 * The p->mm->numa_scan_seq field gets updated without
@@ -2152,8 +2188,9 @@ static void task_numa_placement(struct task_struct *p)
 	runtime = numa_get_avg_runtime(p, &period);

 	/* If the task is part of a group prevent parallel updates to group stats */
-	if (p->numa_group) {
-		group_lock = &p->numa_group->lock;
+	ng = deref_curr_numa_group(p);
+	if (ng) {
+		group_lock = &ng->lock;
 		spin_lock_irq(group_lock);
 	}

@@ -2194,7 +2231,7 @@ static void task_numa_placement(struct task_struct *p)
 			p->numa_faults[cpu_idx] += f_diff;
 			faults += p->numa_faults[mem_idx];
 			p->total_numa_faults += diff;
-			if (p->numa_group) {
+			if (ng) {
 				/*
 				 * safe because we can only change our own group
 				 *
@@ -2202,14 +2239,14 @@ static void task_numa_placement(struct task_struct *p)
 				 * nid and priv in a specific region because it
 				 * is at the beginning of the numa_faults array.
 				 */
-				p->numa_group->faults[mem_idx] += diff;
-				p->numa_group->faults_cpu[mem_idx] += f_diff;
-				p->numa_group->total_faults += diff;
-				group_faults += p->numa_group->faults[mem_idx];
+				ng->faults[mem_idx] += diff;
+				ng->faults_cpu[mem_idx] += f_diff;
+				ng->total_faults += diff;
+				group_faults += ng->faults[mem_idx];
 			}
 		}

-		if (!p->numa_group) {
+		if (!ng) {
 			if (faults > max_faults) {
 				max_faults = faults;
 				max_nid = nid;
@@ -2220,8 +2257,8 @@ static void task_numa_placement(struct task_struct *p)
 		}
 	}

-	if (p->numa_group) {
-		numa_group_count_active_nodes(p->numa_group);
+	if (ng) {
+		numa_group_count_active_nodes(ng);
 		spin_unlock_irq(group_lock);
 		max_nid = preferred_group_nid(p, max_nid);
 	}
@@ -2255,7 +2292,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
 	int cpu = cpupid_to_cpu(cpupid);
 	int i;

-	if (unlikely(!p->numa_group)) {
+	if (unlikely(!deref_curr_numa_group(p))) {
 		unsigned int size = sizeof(struct numa_group) +
 				    4*nr_node_ids*sizeof(unsigned long);

@@ -2291,7 +2328,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
 	if (!grp)
 		goto no_join;

-	my_grp = p->numa_group;
+	my_grp = deref_curr_numa_group(p);
 	if (grp == my_grp)
 		goto no_join;

@@ -2362,7 +2399,8 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
  */
 void task_numa_free(struct task_struct *p, bool final)
 {
-	struct numa_group *grp = p->numa_group;
+	/* safe: p either is current or is being freed by current */
+	struct numa_group *grp = rcu_dereference_raw(p->numa_group);
 	unsigned long *numa_faults = p->numa_faults;
 	unsigned long flags;
 	int i;
@@ -2442,7 +2480,7 @@ void task_numa_fault(int last_cpupid, int mem_node, int pages, int flags)
 	 * actively using should be counted as local. This allows the
 	 * scan rate to slow down when a workload has settled down.
 	 */
-	ng = p->numa_group;
+	ng = deref_curr_numa_group(p);
 	if (!priv && !local && ng && ng->active_nodes > 1 &&
 				numa_is_active_node(cpu_node, ng) &&
 				numa_is_active_node(mem_node, ng))
@@ -10460,18 +10498,22 @@ void show_numa_stats(struct task_struct *p, struct seq_file *m)
 {
 	int node;
 	unsigned long tsf = 0, tpf = 0, gsf = 0, gpf = 0;
+	struct numa_group *ng;

+	rcu_read_lock();
+	ng = rcu_dereference(p->numa_group);
 	for_each_online_node(node) {
 		if (p->numa_faults) {
 			tsf = p->numa_faults[task_faults_idx(NUMA_MEM, node, 0)];
 			tpf = p->numa_faults[task_faults_idx(NUMA_MEM, node, 1)];
 		}
-		if (p->numa_group) {
-			gsf = p->numa_group->faults[task_faults_idx(NUMA_MEM, node, 0)],
-			gpf = p->numa_group->faults[task_faults_idx(NUMA_MEM, node, 1)];
+		if (ng) {
+			gsf = ng->faults[task_faults_idx(NUMA_MEM, node, 0)],
+			gpf = ng->faults[task_faults_idx(NUMA_MEM, node, 1)];
 		}
 		print_numa_stats(m, node, tsf, tpf, gsf, gpf);
 	}
+	rcu_read_unlock();
 }
 #endif /* CONFIG_NUMA_BALANCING */
 #endif /* CONFIG_SCHED_DEBUG */
On Fri, Aug 02, 2019 at 11:39:57AM +0200, Greg Kroah-Hartman wrote:
Oops, nope, this didn't apply, sorry about that. It needs to be properly backported to 4.14.y and older before it can go here.
thanks,
greg k-h
On Fri, Aug 02, 2019 at 11:39:32AM +0200, Greg Kroah-Hartman wrote:
All tests passing for Tegra...
Test results for stable-v4.14:
  8 builds: 8 pass, 0 fail
  16 boots: 16 pass, 0 fail
  24 tests: 24 pass, 0 fail

Linux version: 4.14.136-rc1-g0931704b8bef
Boards tested: tegra124-jetson-tk1, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Thierry
On Fri, Aug 02, 2019 at 11:39:32AM +0200, Greg Kroah-Hartman wrote:
-rc2 is out to match up with the failure of one patch to apply, and the ip tunnel patch added.
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.136-rc...
thanks,
greg k-h
On Fri, 2 Aug 2019 at 21:20, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary
------------------------------------------------------------------------

kernel: 4.14.136-rc2
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.14.y
git commit: 8c06cc9d417294ee552cee5025f8dad9a982a026
git describe: v4.14.134-319-g8c06cc9d4172
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.134-3...
No regressions (compared to build v4.14.134)
No fixes (compared to build v4.14.134)
Ran 23868 total tests in the following environments and test suites.
Environments
--------------
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64
Test Suites
-----------
* build
* install-android-platform-tools-r2600
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* perf
* spectre-meltdown-checker-test
* v4l2-compliance
* ltp-ipc-tests
* ltp-timers-tests
* network-basic-tests
* ltp-open-posix-tests
* kvm-unit-tests
* ssuite
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none
On 8/2/19 3:39 AM, Greg Kroah-Hartman wrote:
Compiled and booted on my test system. No dmesg regressions.
thanks,

-- Shuah
On 8/2/19 2:39 AM, Greg Kroah-Hartman wrote:
Build results:
	total: 172 pass: 172 fail: 0
Qemu test results:
	total: 346 pass: 346 fail: 0
Guenter