4.20-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit deaa5c96c2f7e8b934088a1e70a0fe8797bd1149 ]
When using Kerberos with v4.20, I've observed frequent connection loss on heavy workloads. I traced it down to the client underrunning the GSS sequence number window -- NFS servers are required to drop the RPC with the low sequence number, and also drop the connection to signal that an RPC was dropped.
Bisected to commit 918f3c1fe83c ("SUNRPC: Improve latency for interactive tasks").
I've got a one-line workaround for this issue, which is easy to backport to v4.20 while a more permanent solution is being derived. Essentially, tk_owner-based sorting is disabled for RPCs that carry a GSS sequence number.
Fixes: 918f3c1fe83c ("SUNRPC: Improve latency for interactive ... ") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sunrpc/xprt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1177,7 +1177,7 @@ xprt_request_enqueue_transmit(struct rpc INIT_LIST_HEAD(&req->rq_xmit2); goto out; } - } else { + } else if (!req->rq_seqno) { list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) { if (pos->rq_task->tk_owner != task->tk_owner) continue;