On Sun, Dec 22, 2024 at 5:25 PM Rick Macklem rick.macklem@gmail.com wrote:
On Sun, Dec 22, 2024 at 4:10 PM cel@kernel.org wrote:
From: Chuck Lever chuck.lever@oracle.com
J. David reports an odd corruption of a READDIR reply sent to a FreeBSD client.
xdr_reserve_space() has to do a special trick when the @nbytes value requests more space than there is in the current page of the XDR buffer.
In that case, xdr_reserve_space() returns a pointer to the start of the next page, and then the next call to xdr_reserve_space() invokes __xdr_commit_encode() to copy enough of the data item back into the previous page to make that data item contiguous across the page boundary.
But we need to be careful in the case where buffer space is reserved early for a data item that will be inserted into the buffer later.
One such caller, nfsd4_encode_operation(), reserves 8 bytes in the encoding buffer for each COMPOUND operation. However, a READDIR result can sometimes encode file names so that there are only 4 bytes left at the end of the current XDR buffer page (though plenty of pages are left to handle the remaining encoding tasks).
If a COMPOUND operation follows the READDIR result (say, a GETATTR), then nfsd4_encode_operation() will reserve 8 bytes for the op number (9) and the op status (usually NFS4_OK). In this weird case, xdr_reserve_space() returns a pointer to byte zero of the next buffer page, as it assumes the data item will be copied back into place (in the previous page) on the next call to xdr_reserve_space().
nfsd4_encode_operation() writes the op num into the buffer, then saves the next 4-byte location for the op's status code. The next xdr_reserve_space() call is part of GETATTR encoding, so the op num gets copied back into the previous page, but the saved location for the op status continues to point to the wrong spot in the current XDR buffer page because __xdr_commit_encode() moved that data item.
After GETATTR encoding is complete, nfsd4_encode_operation() writes the op status over the first XDR data item in the GETATTR result. The NFS4_OK status code (0) makes it look like there are zero items in the GETATTR's attribute bitmask.
I can confirm that this patch fixes the test case I have, which is based on J. David's reproducer.
I also think the analysis sounds right. I had gotten to the point where I thought it was some oddball case related to xdr_reserve_space() and saw that the pointer used by GETATTR's nfsd4_encode_bitmap4() was 4 bytes into a page, for the broken case. (As an aside, it appears that "%p" is broken for 32bit arches. I needed to use "%x" with a (unsigned int) cast to printk() pointers. I doubt anyone much cares about 32bits any more, but I might look to see if I can fix it.)
Good work Chuck!
The patch description of commit 2825a7f90753 ("nfsd4: allow encoding across page boundaries") [2014] remarks that NFSD "can't handle a new operation starting close to the end of a page." This behavior appears to be one reason for that remark.
Break up the reservation of the COMPOUND op num and op status data items into two distinct 4-octet reservations. Thanks to XDR data item alignment restrictions, a 4-octet buffer reservation can never straddle a page boundary.
I don't know enough about the code to comment w.r.t. whether or not this is a correct fix, although it seems to work for the test case.
I will note that it is a pretty subtle trap and it would be nice if something could be done to avoid this coding mistake from happening again. All I can think of is defining a new function that must be used instead of xdr_reserve_space() if there will be other encoding calls done before the pointer is used. Something like xdr_reserve_u32_long_term(). (Chuck is much better at naming stuff than I am.) --> If your fix is correct, in general, this function would just call xdr_reserve_space(xdr, XDR_UNIT), but it could also set a flag in the xdr structure so that it can only be done once until the flag is cleared (by a function call when the code is done with the pointer), or something like that?
I take back this latter bit w.r.t. a flag. If the function only allocates 4bytes, it will never straddle pages and cause problems.
My suggestion would just clarify that "long term" pointers can only be used for a 4 byte (XDR_UNIT) allocation.
rick
Anyhow, others probably can suggest better changes that would avoid this trap?
Good work, rick
Reported-by: J David j.david.lists@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com
fs/nfsd/nfs4xdr.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 53fac037611c..8780da884197 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5764,10 +5764,11 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) nfsd4_enc encoder; __be32 *p;
p = xdr_reserve_space(xdr, 8);
if (xdr_stream_encode_u32(xdr, op->opnum) != XDR_UNIT)
goto release;
p = xdr_reserve_space(xdr, XDR_UNIT); if (!p) goto release;
*p++ = cpu_to_be32(op->opnum); post_err_offset = xdr->buf->len; if (op->opnum == OP_ILLEGAL)
-- 2.47.0