From: Sergey Bashirov sergeybashirov@gmail.com
[ Upstream commit d897d81671bc4615c80f4f3bd5e6b218f59df50c ]
When there are too many block extents for a layoutcommit, they may not all fit into the maximum-sized RPC. This patch allows the generic pnfs code to properly handle -ENOSPC returned by the block/scsi layout driver and trigger additional layoutcommits if necessary.
Co-developed-by: Konstantin Evtushenko koevtushenko@yandex.com Signed-off-by: Konstantin Evtushenko koevtushenko@yandex.com Signed-off-by: Sergey Bashirov sergeybashirov@gmail.com Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20250630183537.196479-5-sergeybashirov@gmail.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees. Here's my extensive analysis:
## Bug Fix Nature
This commit fixes a clear bug in the pNFS (parallel NFS) layoutcommit handling where the code didn't properly handle the `-ENOSPC` error returned when there are too many block extents to fit in a single RPC (Remote Procedure Call). Looking at the code changes:
1. **The Problem**: In `ext_tree_encode_commit()` (fs/nfs/blocklayout/extent_tree.c:536-539), when there are too many extents, the function returns `-ENOSPC` but continues counting. The original `pnfs_layoutcommit_inode()` function didn't handle this error properly.
2. **The Fix**: The patch adds specific handling for the `-ENOSPC` error case: - Line 3394 → 3396: Changed to only `put_cred()` if status is NOT `-ENOSPC` - Line 3399 → 3401-3403: Added special handling to check for `-ENOSPC` and set `mark_as_dirty = true` - Line 3409: Changed condition to mark inode dirty if either `status` is set OR `mark_as_dirty` is true
## Impact Analysis
1. **User-Visible Bug**: Without this fix, when users have workloads that generate many block extents (common in database or large file operations), layoutcommits would fail silently, potentially leading to data inconsistency or loss.
2. **Contained Fix**: The changes are minimal and localized to the layoutcommit path: - Only affects the error handling path - Doesn't change the normal operation flow - Doesn't introduce new features or APIs
3. **Clear Regression Risk Assessment**: - Low risk - the patch only adds proper error handling for a specific error code - Doesn't change fundamental data structures or algorithms - The `-ENOSPC` handling triggers additional layoutcommits as needed, which is the correct behavior
## Stable Criteria Compliance
The commit meets stable kernel criteria: - **Fixes a real bug**: Handles RPC size limit overflow that can occur in production - **Small and targeted**: Only ~15 lines of actual logic changes - **No new features**: Pure bug fix, no feature additions - **Tested**: Has review from Christoph Hellwig (a well-known filesystem maintainer) - **Important for users**: Prevents potential data consistency issues in pNFS deployments
## Code Analysis Details
The specific code flow shows: 1. `ext_tree_encode_commit()` returns `-ENOSPC` when buffer is too small (line 538 in extent_tree.c) 2. The old code would incorrectly release credentials and fail the entire operation 3. The new code: - Preserves the credentials when `-ENOSPC` occurs - Sets the inode as dirty to trigger another layoutcommit attempt - Allows the operation to be retried with proper handling
This is a classic case of missing error handling that should be backported to ensure data integrity in stable kernels running pNFS workloads.
fs/nfs/pnfs.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c index 1a7ec68bde15..3fd0971bf16f 100644 --- a/fs/nfs/pnfs.c +++ b/fs/nfs/pnfs.c @@ -3340,6 +3340,7 @@ pnfs_layoutcommit_inode(struct inode *inode, bool sync) struct nfs_inode *nfsi = NFS_I(inode); loff_t end_pos; int status; + bool mark_as_dirty = false;
if (!pnfs_layoutcommit_outstanding(inode)) return 0; @@ -3391,19 +3392,23 @@ pnfs_layoutcommit_inode(struct inode *inode, bool sync) if (ld->prepare_layoutcommit) { status = ld->prepare_layoutcommit(&data->args); if (status) { - put_cred(data->cred); + if (status != -ENOSPC) + put_cred(data->cred); spin_lock(&inode->i_lock); set_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags); if (end_pos > nfsi->layout->plh_lwb) nfsi->layout->plh_lwb = end_pos; - goto out_unlock; + if (status != -ENOSPC) + goto out_unlock; + spin_unlock(&inode->i_lock); + mark_as_dirty = true; } }
status = nfs4_proc_layoutcommit(data, sync); out: - if (status) + if (status || mark_as_dirty) mark_inode_dirty_sync(inode); dprintk("<-- %s status %d\n", __func__, status); return status;