commit 4a204f7895878363ca8211f50ec610408c8c70aa upstream.
Expand KVM's mask for the AVIC host physical ID to the full 12 bits defined
by the architecture. The number of bits consumed by hardware is model
specific, e.g. early CPUs ignored bits 11:8, but there is no way for KVM
to enumerate the "true" size. So, KVM must allow using all bits, else it
risks rejecting completely legal x2APIC IDs on newer CPUs.
This means KVM relies on hardware to not assign x2APIC IDs that exceed the
"true" width of the field, but presumably hardware is smart enough to tie
the width to the max x2APIC ID. KVM also relies on hardware to support at
least 8 bits, as the legacy xAPIC ID is writable by software. But, those
assumptions are unavoidable due to the lack of any way to enumerate the
"true" width.
Note:
This patch has been modified from the upstream commit due to the conflict
caused by the commit 391503528257 ("KVM: x86: SVM: move avic definitions
from AMD's spec to svm.h")
Please apply this patch to the following kernel tree:
* 4.14-stable
* 4.9-stable
* 4.19-stable
* 5.4-stable
* 5.10-stable
* 5.15-stable
* 5.16-stable
Cc: stable(a)vger.kernel.org
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Suggested-by: Sean Christopherson <seanjc(a)google.com>
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Fixes: 44a95dae1d22 ("KVM: x86: Detect and Initialize AVIC support")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
Message-Id: <20220211000851.185799-1-suravee.suthikulpanit(a)amd.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/svm/avic.c | 7 +------
arch/x86/kvm/svm/svm.h | 2 +-
2 files changed, 2 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 212af871ca74..be0bc06797e6 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -943,15 +943,10 @@ avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r)
void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
u64 entry;
- /* ID = 0xff (broadcast), ID > 0xff (reserved) */
int h_physical_id = kvm_cpu_get_apicid(cpu);
struct vcpu_svm *svm = to_svm(vcpu);
- /*
- * Since the host physical APIC id is 8 bits,
- * we can support host APIC ID upto 255.
- */
- if (WARN_ON(h_physical_id > AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK))
+ if (WARN_ON(h_physical_id & ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK))
return;
entry = READ_ONCE(*(svm->avic_physical_id_cache));
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 1b460e539926..47e4fd75caa9 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -508,7 +508,7 @@ extern struct kvm_x86_nested_ops svm_nested_ops;
#define AVIC_LOGICAL_ID_ENTRY_VALID_BIT 31
#define AVIC_LOGICAL_ID_ENTRY_VALID_MASK (1 << 31)
-#define AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK (0xFFULL)
+#define AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK GENMASK_ULL(11, 0)
#define AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK (0xFFFFFFFFFFULL << 12)
#define AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK (1ULL << 62)
#define AVIC_PHYSICAL_ID_ENTRY_VALID_MASK (1ULL << 63)
--
2.17.1
On Fri, Apr 08, 2022 at 07:53:42PM -0400, Joshua Freedman wrote:
> I got the first bisect done, but not sure what do bisect after that.....
What is the result, good or bad? What does 'git bisect log' show?
thanks,
greg k-h
Commit 87ebbb8c612b broke suspend-to-ram on kernel 5.17, trying to resume
with it results in an unusable system. I've tried reverting it from 5.17
and found that resuming works once again.
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3be232f11a3cc9b0ef0795e39fa11bdb8e422a06 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Date: Tue, 26 Oct 2021 18:01:07 -0400
Subject: [PATCH] SUNRPC: Prevent immediate close+reconnect
If we have already set up the socket and are waiting for it to connect,
then don't immediately close and retry.
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 691fe5a682b6..a02de2bddb28 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -767,7 +767,8 @@ EXPORT_SYMBOL_GPL(xprt_disconnect_done);
*/
static void xprt_schedule_autoclose_locked(struct rpc_xprt *xprt)
{
- set_bit(XPRT_CLOSE_WAIT, &xprt->state);
+ if (test_and_set_bit(XPRT_CLOSE_WAIT, &xprt->state))
+ return;
if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0)
queue_work(xprtiod_workqueue, &xprt->task_cleanup);
else if (xprt->snd_task && !test_bit(XPRT_SND_IS_COOKIE, &xprt->state))
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 7fb302e202bc..ae48c9c84ee1 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2314,7 +2314,7 @@ static void xs_connect(struct rpc_xprt *xprt, struct rpc_task *task)
WARN_ON_ONCE(!xprt_lock_connect(xprt, task, transport));
- if (transport->sock != NULL) {
+ if (transport->sock != NULL && !xprt_connecting(xprt)) {
dprintk("RPC: xs_connect delayed xprt %p for %lu "
"seconds\n",
xprt, xprt->reestablish_timeout / HZ);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 89f42494f92f448747bd8a7ab1ae8b5d5520577d Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Date: Wed, 16 Mar 2022 19:10:43 -0400
Subject: [PATCH] SUNRPC: Don't call connect() more than once on a TCP socket
Avoid socket state races due to repeated calls to ->connect() using the
same socket. If connect() returns 0 due to the connection having
completed, but we are in fact in a closing state, then we may leave the
XPRT_CONNECTING flag set on the transport.
Reported-by: Enrico Scholz <enrico.scholz(a)sigma-chemnitz.de>
Fixes: 3be232f11a3c ("SUNRPC: Prevent immediate close+reconnect")
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h
index 8c2a712cb242..689062afdd61 100644
--- a/include/linux/sunrpc/xprtsock.h
+++ b/include/linux/sunrpc/xprtsock.h
@@ -89,5 +89,6 @@ struct sock_xprt {
#define XPRT_SOCK_WAKE_WRITE (5)
#define XPRT_SOCK_WAKE_PENDING (6)
#define XPRT_SOCK_WAKE_DISCONNECT (7)
+#define XPRT_SOCK_CONNECT_SENT (8)
#endif /* _LINUX_SUNRPC_XPRTSOCK_H */
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 7e39f87cde2d..8f8a03c3315a 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2235,10 +2235,15 @@ static void xs_tcp_setup_socket(struct work_struct *work)
if (atomic_read(&xprt->swapper))
current->flags |= PF_MEMALLOC;
- if (!sock) {
- sock = xs_create_sock(xprt, transport,
- xs_addr(xprt)->sa_family, SOCK_STREAM,
- IPPROTO_TCP, true);
+
+ if (xprt_connected(xprt))
+ goto out;
+ if (test_and_clear_bit(XPRT_SOCK_CONNECT_SENT,
+ &transport->sock_state) ||
+ !sock) {
+ xs_reset_transport(transport);
+ sock = xs_create_sock(xprt, transport, xs_addr(xprt)->sa_family,
+ SOCK_STREAM, IPPROTO_TCP, true);
if (IS_ERR(sock)) {
xprt_wake_pending_tasks(xprt, PTR_ERR(sock));
goto out;
@@ -2262,6 +2267,7 @@ static void xs_tcp_setup_socket(struct work_struct *work)
fallthrough;
case -EINPROGRESS:
/* SYN_SENT! */
+ set_bit(XPRT_SOCK_CONNECT_SENT, &transport->sock_state);
if (xprt->reestablish_timeout < XS_TCP_INIT_REEST_TO)
xprt->reestablish_timeout = XS_TCP_INIT_REEST_TO;
fallthrough;
@@ -2323,13 +2329,9 @@ static void xs_connect(struct rpc_xprt *xprt, struct rpc_task *task)
WARN_ON_ONCE(!xprt_lock_connect(xprt, task, transport));
- if (transport->sock != NULL && !xprt_connecting(xprt)) {
+ if (transport->sock != NULL) {
dprintk("RPC: xs_connect delayed xprt %p for %lu "
- "seconds\n",
- xprt, xprt->reestablish_timeout / HZ);
-
- /* Start by resetting any existing state */
- xs_reset_transport(transport);
+ "seconds\n", xprt, xprt->reestablish_timeout / HZ);
delay = xprt_reconnect_delay(xprt);
xprt_reconnect_backoff(xprt, XS_TCP_INIT_REEST_TO);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f00432063db1a0db484e85193eccc6845435b80e Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Date: Sun, 3 Apr 2022 15:58:11 -0400
Subject: [PATCH] SUNRPC: Ensure we flush any closed sockets before
xs_xprt_free()
We must ensure that all sockets are closed before we call xprt_free()
and release the reference to the net namespace. The problem is that
calling fput() will defer closing the socket until delayed_fput() gets
called.
Let's fix the situation by allowing rpciod and the transport teardown
code (which runs on the system wq) to call __fput_sync(), and directly
close the socket.
Reported-by: Felix Fu <foyjog(a)gmail.com>
Acked-by: Al Viro <viro(a)zeniv.linux.org.uk>
Fixes: a73881c96d73 ("SUNRPC: Fix an Oops in udp_poll()")
Cc: stable(a)vger.kernel.org # 5.1.x: 3be232f11a3c: SUNRPC: Prevent immediate close+reconnect
Cc: stable(a)vger.kernel.org # 5.1.x: 89f42494f92f: SUNRPC: Don't call connect() more than once on a TCP socket
Cc: stable(a)vger.kernel.org # 5.1.x
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
diff --git a/fs/file_table.c b/fs/file_table.c
index 7d2e692b66a9..ada8fe814db9 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -412,6 +412,7 @@ void __fput_sync(struct file *file)
}
EXPORT_SYMBOL(fput);
+EXPORT_SYMBOL(__fput_sync);
void __init files_init(void)
{
diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index ac33892da411..a4848c7bab80 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -1004,7 +1004,6 @@ DEFINE_RPC_XPRT_LIFETIME_EVENT(connect);
DEFINE_RPC_XPRT_LIFETIME_EVENT(disconnect_auto);
DEFINE_RPC_XPRT_LIFETIME_EVENT(disconnect_done);
DEFINE_RPC_XPRT_LIFETIME_EVENT(disconnect_force);
-DEFINE_RPC_XPRT_LIFETIME_EVENT(disconnect_cleanup);
DEFINE_RPC_XPRT_LIFETIME_EVENT(destroy);
DECLARE_EVENT_CLASS(rpc_xprt_event,
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 73344ffb2692..ad62eba540a4 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -930,12 +930,7 @@ void xprt_connect(struct rpc_task *task)
if (!xprt_lock_write(xprt, task))
return;
- if (test_and_clear_bit(XPRT_CLOSE_WAIT, &xprt->state)) {
- trace_xprt_disconnect_cleanup(xprt);
- xprt->ops->close(xprt);
- }
-
- if (!xprt_connected(xprt)) {
+ if (!xprt_connected(xprt) && !test_bit(XPRT_CLOSE_WAIT, &xprt->state)) {
task->tk_rqstp->rq_connect_cookie = xprt->connect_cookie;
rpc_sleep_on_timeout(&xprt->pending, task, NULL,
xprt_request_timeout(task->tk_rqstp));
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 9b75891b3cc0..c6a13893e308 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -879,7 +879,7 @@ static int xs_local_send_request(struct rpc_rqst *req)
/* Close the stream if the previous transmission was incomplete */
if (xs_send_request_was_aborted(transport, req)) {
- xs_close(xprt);
+ xprt_force_disconnect(xprt);
return -ENOTCONN;
}
@@ -915,7 +915,7 @@ static int xs_local_send_request(struct rpc_rqst *req)
-status);
fallthrough;
case -EPIPE:
- xs_close(xprt);
+ xprt_force_disconnect(xprt);
status = -ENOTCONN;
}
@@ -1185,6 +1185,16 @@ static void xs_reset_transport(struct sock_xprt *transport)
if (sk == NULL)
return;
+ /*
+ * Make sure we're calling this in a context from which it is safe
+ * to call __fput_sync(). In practice that means rpciod and the
+ * system workqueue.
+ */
+ if (!(current->flags & PF_WQ_WORKER)) {
+ WARN_ON_ONCE(1);
+ set_bit(XPRT_CLOSE_WAIT, &xprt->state);
+ return;
+ }
if (atomic_read(&transport->xprt.swapper))
sk_clear_memalloc(sk);
@@ -1208,7 +1218,7 @@ static void xs_reset_transport(struct sock_xprt *transport)
mutex_unlock(&transport->recv_mutex);
trace_rpc_socket_close(xprt, sock);
- fput(filp);
+ __fput_sync(filp);
xprt_disconnect_done(xprt);
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f00432063db1a0db484e85193eccc6845435b80e Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Date: Sun, 3 Apr 2022 15:58:11 -0400
Subject: [PATCH] SUNRPC: Ensure we flush any closed sockets before
xs_xprt_free()
We must ensure that all sockets are closed before we call xprt_free()
and release the reference to the net namespace. The problem is that
calling fput() will defer closing the socket until delayed_fput() gets
called.
Let's fix the situation by allowing rpciod and the transport teardown
code (which runs on the system wq) to call __fput_sync(), and directly
close the socket.
Reported-by: Felix Fu <foyjog(a)gmail.com>
Acked-by: Al Viro <viro(a)zeniv.linux.org.uk>
Fixes: a73881c96d73 ("SUNRPC: Fix an Oops in udp_poll()")
Cc: stable(a)vger.kernel.org # 5.1.x: 3be232f11a3c: SUNRPC: Prevent immediate close+reconnect
Cc: stable(a)vger.kernel.org # 5.1.x: 89f42494f92f: SUNRPC: Don't call connect() more than once on a TCP socket
Cc: stable(a)vger.kernel.org # 5.1.x
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
diff --git a/fs/file_table.c b/fs/file_table.c
index 7d2e692b66a9..ada8fe814db9 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -412,6 +412,7 @@ void __fput_sync(struct file *file)
}
EXPORT_SYMBOL(fput);
+EXPORT_SYMBOL(__fput_sync);
void __init files_init(void)
{
diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index ac33892da411..a4848c7bab80 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -1004,7 +1004,6 @@ DEFINE_RPC_XPRT_LIFETIME_EVENT(connect);
DEFINE_RPC_XPRT_LIFETIME_EVENT(disconnect_auto);
DEFINE_RPC_XPRT_LIFETIME_EVENT(disconnect_done);
DEFINE_RPC_XPRT_LIFETIME_EVENT(disconnect_force);
-DEFINE_RPC_XPRT_LIFETIME_EVENT(disconnect_cleanup);
DEFINE_RPC_XPRT_LIFETIME_EVENT(destroy);
DECLARE_EVENT_CLASS(rpc_xprt_event,
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 73344ffb2692..ad62eba540a4 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -930,12 +930,7 @@ void xprt_connect(struct rpc_task *task)
if (!xprt_lock_write(xprt, task))
return;
- if (test_and_clear_bit(XPRT_CLOSE_WAIT, &xprt->state)) {
- trace_xprt_disconnect_cleanup(xprt);
- xprt->ops->close(xprt);
- }
-
- if (!xprt_connected(xprt)) {
+ if (!xprt_connected(xprt) && !test_bit(XPRT_CLOSE_WAIT, &xprt->state)) {
task->tk_rqstp->rq_connect_cookie = xprt->connect_cookie;
rpc_sleep_on_timeout(&xprt->pending, task, NULL,
xprt_request_timeout(task->tk_rqstp));
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 9b75891b3cc0..c6a13893e308 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -879,7 +879,7 @@ static int xs_local_send_request(struct rpc_rqst *req)
/* Close the stream if the previous transmission was incomplete */
if (xs_send_request_was_aborted(transport, req)) {
- xs_close(xprt);
+ xprt_force_disconnect(xprt);
return -ENOTCONN;
}
@@ -915,7 +915,7 @@ static int xs_local_send_request(struct rpc_rqst *req)
-status);
fallthrough;
case -EPIPE:
- xs_close(xprt);
+ xprt_force_disconnect(xprt);
status = -ENOTCONN;
}
@@ -1185,6 +1185,16 @@ static void xs_reset_transport(struct sock_xprt *transport)
if (sk == NULL)
return;
+ /*
+ * Make sure we're calling this in a context from which it is safe
+ * to call __fput_sync(). In practice that means rpciod and the
+ * system workqueue.
+ */
+ if (!(current->flags & PF_WQ_WORKER)) {
+ WARN_ON_ONCE(1);
+ set_bit(XPRT_CLOSE_WAIT, &xprt->state);
+ return;
+ }
if (atomic_read(&transport->xprt.swapper))
sk_clear_memalloc(sk);
@@ -1208,7 +1218,7 @@ static void xs_reset_transport(struct sock_xprt *transport)
mutex_unlock(&transport->recv_mutex);
trace_rpc_socket_close(xprt, sock);
- fput(filp);
+ __fput_sync(filp);
xprt_disconnect_done(xprt);
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 89f42494f92f448747bd8a7ab1ae8b5d5520577d Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Date: Wed, 16 Mar 2022 19:10:43 -0400
Subject: [PATCH] SUNRPC: Don't call connect() more than once on a TCP socket
Avoid socket state races due to repeated calls to ->connect() using the
same socket. If connect() returns 0 due to the connection having
completed, but we are in fact in a closing state, then we may leave the
XPRT_CONNECTING flag set on the transport.
Reported-by: Enrico Scholz <enrico.scholz(a)sigma-chemnitz.de>
Fixes: 3be232f11a3c ("SUNRPC: Prevent immediate close+reconnect")
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h
index 8c2a712cb242..689062afdd61 100644
--- a/include/linux/sunrpc/xprtsock.h
+++ b/include/linux/sunrpc/xprtsock.h
@@ -89,5 +89,6 @@ struct sock_xprt {
#define XPRT_SOCK_WAKE_WRITE (5)
#define XPRT_SOCK_WAKE_PENDING (6)
#define XPRT_SOCK_WAKE_DISCONNECT (7)
+#define XPRT_SOCK_CONNECT_SENT (8)
#endif /* _LINUX_SUNRPC_XPRTSOCK_H */
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 7e39f87cde2d..8f8a03c3315a 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2235,10 +2235,15 @@ static void xs_tcp_setup_socket(struct work_struct *work)
if (atomic_read(&xprt->swapper))
current->flags |= PF_MEMALLOC;
- if (!sock) {
- sock = xs_create_sock(xprt, transport,
- xs_addr(xprt)->sa_family, SOCK_STREAM,
- IPPROTO_TCP, true);
+
+ if (xprt_connected(xprt))
+ goto out;
+ if (test_and_clear_bit(XPRT_SOCK_CONNECT_SENT,
+ &transport->sock_state) ||
+ !sock) {
+ xs_reset_transport(transport);
+ sock = xs_create_sock(xprt, transport, xs_addr(xprt)->sa_family,
+ SOCK_STREAM, IPPROTO_TCP, true);
if (IS_ERR(sock)) {
xprt_wake_pending_tasks(xprt, PTR_ERR(sock));
goto out;
@@ -2262,6 +2267,7 @@ static void xs_tcp_setup_socket(struct work_struct *work)
fallthrough;
case -EINPROGRESS:
/* SYN_SENT! */
+ set_bit(XPRT_SOCK_CONNECT_SENT, &transport->sock_state);
if (xprt->reestablish_timeout < XS_TCP_INIT_REEST_TO)
xprt->reestablish_timeout = XS_TCP_INIT_REEST_TO;
fallthrough;
@@ -2323,13 +2329,9 @@ static void xs_connect(struct rpc_xprt *xprt, struct rpc_task *task)
WARN_ON_ONCE(!xprt_lock_connect(xprt, task, transport));
- if (transport->sock != NULL && !xprt_connecting(xprt)) {
+ if (transport->sock != NULL) {
dprintk("RPC: xs_connect delayed xprt %p for %lu "
- "seconds\n",
- xprt, xprt->reestablish_timeout / HZ);
-
- /* Start by resetting any existing state */
- xs_reset_transport(transport);
+ "seconds\n", xprt, xprt->reestablish_timeout / HZ);
delay = xprt_reconnect_delay(xprt);
xprt_reconnect_backoff(xprt, XS_TCP_INIT_REEST_TO);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 89f42494f92f448747bd8a7ab1ae8b5d5520577d Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Date: Wed, 16 Mar 2022 19:10:43 -0400
Subject: [PATCH] SUNRPC: Don't call connect() more than once on a TCP socket
Avoid socket state races due to repeated calls to ->connect() using the
same socket. If connect() returns 0 due to the connection having
completed, but we are in fact in a closing state, then we may leave the
XPRT_CONNECTING flag set on the transport.
Reported-by: Enrico Scholz <enrico.scholz(a)sigma-chemnitz.de>
Fixes: 3be232f11a3c ("SUNRPC: Prevent immediate close+reconnect")
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h
index 8c2a712cb242..689062afdd61 100644
--- a/include/linux/sunrpc/xprtsock.h
+++ b/include/linux/sunrpc/xprtsock.h
@@ -89,5 +89,6 @@ struct sock_xprt {
#define XPRT_SOCK_WAKE_WRITE (5)
#define XPRT_SOCK_WAKE_PENDING (6)
#define XPRT_SOCK_WAKE_DISCONNECT (7)
+#define XPRT_SOCK_CONNECT_SENT (8)
#endif /* _LINUX_SUNRPC_XPRTSOCK_H */
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 7e39f87cde2d..8f8a03c3315a 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2235,10 +2235,15 @@ static void xs_tcp_setup_socket(struct work_struct *work)
if (atomic_read(&xprt->swapper))
current->flags |= PF_MEMALLOC;
- if (!sock) {
- sock = xs_create_sock(xprt, transport,
- xs_addr(xprt)->sa_family, SOCK_STREAM,
- IPPROTO_TCP, true);
+
+ if (xprt_connected(xprt))
+ goto out;
+ if (test_and_clear_bit(XPRT_SOCK_CONNECT_SENT,
+ &transport->sock_state) ||
+ !sock) {
+ xs_reset_transport(transport);
+ sock = xs_create_sock(xprt, transport, xs_addr(xprt)->sa_family,
+ SOCK_STREAM, IPPROTO_TCP, true);
if (IS_ERR(sock)) {
xprt_wake_pending_tasks(xprt, PTR_ERR(sock));
goto out;
@@ -2262,6 +2267,7 @@ static void xs_tcp_setup_socket(struct work_struct *work)
fallthrough;
case -EINPROGRESS:
/* SYN_SENT! */
+ set_bit(XPRT_SOCK_CONNECT_SENT, &transport->sock_state);
if (xprt->reestablish_timeout < XS_TCP_INIT_REEST_TO)
xprt->reestablish_timeout = XS_TCP_INIT_REEST_TO;
fallthrough;
@@ -2323,13 +2329,9 @@ static void xs_connect(struct rpc_xprt *xprt, struct rpc_task *task)
WARN_ON_ONCE(!xprt_lock_connect(xprt, task, transport));
- if (transport->sock != NULL && !xprt_connecting(xprt)) {
+ if (transport->sock != NULL) {
dprintk("RPC: xs_connect delayed xprt %p for %lu "
- "seconds\n",
- xprt, xprt->reestablish_timeout / HZ);
-
- /* Start by resetting any existing state */
- xs_reset_transport(transport);
+ "seconds\n", xprt, xprt->reestablish_timeout / HZ);
delay = xprt_reconnect_delay(xprt);
xprt_reconnect_backoff(xprt, XS_TCP_INIT_REEST_TO);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3be232f11a3cc9b0ef0795e39fa11bdb8e422a06 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Date: Tue, 26 Oct 2021 18:01:07 -0400
Subject: [PATCH] SUNRPC: Prevent immediate close+reconnect
If we have already set up the socket and are waiting for it to connect,
then don't immediately close and retry.
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 691fe5a682b6..a02de2bddb28 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -767,7 +767,8 @@ EXPORT_SYMBOL_GPL(xprt_disconnect_done);
*/
static void xprt_schedule_autoclose_locked(struct rpc_xprt *xprt)
{
- set_bit(XPRT_CLOSE_WAIT, &xprt->state);
+ if (test_and_set_bit(XPRT_CLOSE_WAIT, &xprt->state))
+ return;
if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0)
queue_work(xprtiod_workqueue, &xprt->task_cleanup);
else if (xprt->snd_task && !test_bit(XPRT_SND_IS_COOKIE, &xprt->state))
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 7fb302e202bc..ae48c9c84ee1 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2314,7 +2314,7 @@ static void xs_connect(struct rpc_xprt *xprt, struct rpc_task *task)
WARN_ON_ONCE(!xprt_lock_connect(xprt, task, transport));
- if (transport->sock != NULL) {
+ if (transport->sock != NULL && !xprt_connecting(xprt)) {
dprintk("RPC: xs_connect delayed xprt %p for %lu "
"seconds\n",
xprt, xprt->reestablish_timeout / HZ);