From: Nikhil Jha njha@janestreet.com
[ Upstream commit fadc0f3bb2de8c570ced6d9c1f97222213d93140 ]
RFC2203 requires that retransmitted messages use a new gss sequence number, but the same XID. This means that if the server is just slow (e.x. overloaded), the client might receive a response using an older seqno than the one it has recorded.
Currently, Linux's client immediately retransmits in this case. However, this leads to a lot of wasted retransmits until the server eventually responds faster than the client can resend.
Client -> SEQ 1 -> Server Client -> SEQ 2 -> Server Client <- SEQ 1 <- Server (misses, expecting seqno = 2) Client -> SEQ 3 -> Server (immediate retransmission on miss) Client <- SEQ 2 <- Server (misses, expecting seqno = 3) Client -> SEQ 4 -> Server (immediate retransmission on miss) ... and so on ...
This commit makes it so that we ignore messages with bad checksums due to seqnum mismatch, and rely on the usual timeout behavior for retransmission instead of doing so immediately.
Signed-off-by: Nikhil Jha njha@janestreet.com Acked-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Anna Schumaker anna.schumaker@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org ---
Based on my analysis of the commit and the similar backported commits as reference, here is my assessment:
**YES**
This commit should be backported to stable kernel trees. Here's my detailed analysis:
## Key Analysis Points:
### 1. **Bug Fix Nature**: This commit addresses a clear performance bug in SUNRPC's RPCSEC_GSS sequence number handling. The current code immediately retransmits when it receives a response with an older sequence number than expected, leading to excessive and wasteful retransmissions when the server is simply slow.
### 2. **Alignment with RFC2203**: The commit correctly implements RFC2203 requirements, which state that retransmitted messages should use new GSS sequence numbers but keep the same XID. The current behavior creates a retransmission storm that violates the spirit of this requirement.
### 3. **Code Changes Analysis**: The modification in `rpc_decode_header()` at the `out_verifier` label is minimal and targeted:
**Before** (lines 2773-2776): ```c case -EACCES: /* Re-encode with a fresh cred */ fallthrough; default: goto out_garbage; ```
**After**: ```c case -EACCES: /* possible RPCSEC_GSS out-of-sequence event (RFC2203), - reset recv state and keep waiting, don't retransmit */ task->tk_rqstp->rq_reply_bytes_recvd = 0; task->tk_status = xprt_request_enqueue_receive(task); task->tk_action = call_transmit_status; return -EBADMSG; default: goto out_garbage; ```
### 4. **Risk Assessment - LOW**: - **Scope**: Changes are confined to a specific error handling path in SUNRPC client code - **Behavioral change**: Instead of immediate retransmission, it now waits for normal timeout-based retransmission - **Fallback**: The normal timeout mechanism still provides retransmission if needed - **Error handling**: Uses existing infrastructure (`xprt_request_enqueue_receive`, `call_transmit_status`)
### 5. **Comparison with Similar Backported Commits**: This commit shares characteristics with successful backports: - **Similar Commit #1**: Also fixed SUNRPC verifier handling by changing `goto out_err` to `goto out_garbage` - **Similar Commit #2**: Fixed incomplete RPC message handling with proper state management - **Similar Commit #5**: Fixed RPCSEC_GSS sequence number limit handling
All these were backported because they fixed protocol compliance issues with minimal risk.
### 6. **User Impact**: - **Performance improvement**: Significantly reduces unnecessary retransmissions under server load - **Network efficiency**: Reduces wasted bandwidth and server load - **No breaking changes**: Maintains compatibility while improving behavior
### 7. **Stability Criteria Compliance**: ✅ **Important bugfix**: Fixes performance degradation and protocol compliance ✅ **Minimal risk**: Small, localized change to error handling path ✅ **No new features**: Pure bugfix, no architectural changes ✅ **Subsystem-contained**: Changes only affect SUNRPC client verifier handling ✅ **Clear benefit**: Reduces retransmission storms in production environments
The commit follows the stable tree rules perfectly: it's a targeted fix for a real-world performance problem with minimal risk of regression, affecting only the specific error case of RPCSEC_GSS sequence number mismatches.
net/sunrpc/clnt.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 6f75862d97820..21426c3049d35 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -2771,8 +2771,13 @@ rpc_decode_header(struct rpc_task *task, struct xdr_stream *xdr) case -EPROTONOSUPPORT: goto out_err; case -EACCES: - /* Re-encode with a fresh cred */ - fallthrough; + /* possible RPCSEC_GSS out-of-sequence event (RFC2203), + * reset recv state and keep waiting, don't retransmit + */ + task->tk_rqstp->rq_reply_bytes_recvd = 0; + task->tk_status = xprt_request_enqueue_receive(task); + task->tk_action = call_transmit_status; + return -EBADMSG; default: goto out_garbage; }