From: Naohiro Aota <naohiro.aota@wdc.com>
[ Upstream commit 0d703963d297964451783e1a0688ebdf74cd6151 ]
The hint block group selection in the extent allocator is wrong in the first place, as it can select the dedicated data relocation block group for the normal data allocation.
Since we separated the normal data space_info and the data relocation space_info, we can easily identify a block group is for data relocation or not. Do not choose it for the normal data allocation.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Based on my comprehensive analysis, here is my assessment:
## Backport Status: YES
### Summary of Analysis
**What the commit fixes:** The commit fixes extent allocator hint selection in btrfs zoned mode, where the allocator could select the dedicated data relocation block group for normal data allocation. This violates the design principle that relocation block groups should only be used for data relocation operations (fs/btrfs/extent-tree.c:4322).
**Code change analysis:**
1. Adds a `space_info` parameter to the `prepare_allocation_zoned()` function (lines 4299-4301)
2. Adds a critical check, `block_group->space_info == space_info`, before selecting a block group as the hint (line 4323)
3. Passes the `space_info` parameter through the call chain (line 4344)
The change is minimal (4 insertions, 2 deletions) and surgically targeted.
**Why this is a bug affecting users:**
After commit f92ee31e031c7 (v6.16-rc1, May 2025) introduced sub-space_info separation, btrfs zoned mode maintains separate space_info structures for:
- Normal data block groups
- Data relocation block groups
Without this fix, `prepare_allocation_zoned()` only checks `block_group_bits(block_group, ffe_ctl->flags)` which verifies the block group is DATA type, but doesn't distinguish between normal data and relocation data since both have the same flags. This can cause:
1. **Incorrect space accounting**: Normal allocations appear to have free space when only relocation space is available
2. **ENOSPC errors**: As noted in commit f92ee31e031c7, users could experience "strange ENOSPC" errors
3. **Write pointer violations**: On zoned devices, mixing relocation extents and regular extents in the same zone can cause WRITE and ZONE APPEND commands to be dispatched simultaneously, breaking the write pointer (see commit 7b2d588572e75)
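The effect of the added comparison can be illustrated with a minimal userspace sketch. This is not the kernel code: the structures are simplified stand-ins, and `BG_DATA`, `pick_hint()` and its `check_space_info` switch are hypothetical names introduced only to contrast the old and new behavior.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel structures (assumptions, not the real layouts). */
struct space_info {
	const char *name;
};

struct block_group {
	uint64_t start;
	uint64_t flags;
	uint64_t zone_capacity;
	uint64_t alloc_offset;
	const struct space_info *space_info;
};

#define BG_DATA (1ULL << 0) /* hypothetical type bit shared by data and data-reloc groups */

/*
 * Return the start of the first block group usable as an allocation hint,
 * or 0 if none qualifies. With check_space_info == false the loop mimics
 * the pre-patch condition (type flags and available bytes only); with
 * true it also requires the block group to belong to the caller's space_info.
 */
static uint64_t pick_hint(const struct block_group *bgs, size_t n,
			  uint64_t flags, const struct space_info *space_info,
			  uint64_t num_bytes, bool check_space_info)
{
	for (size_t i = 0; i < n; i++) {
		const struct block_group *bg = &bgs[i];
		uint64_t avail = bg->zone_capacity - bg->alloc_offset;

		if ((bg->flags & flags) == flags &&
		    (!check_space_info || bg->space_info == space_info) &&
		    avail >= num_bytes)
			return bg->start;
	}
	return 0;
}
```

With a relocation block group listed ahead of a normal data block group, both carrying the same type flag, the flags-only condition returns the relocation group as the hint, while the condition with the `space_info` comparison skips it and returns the normal data group.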
**Risk assessment:**
- **Very low risk**: The change is confined to btrfs zoned extent allocator hint selection
- **Clear correctness**: The fix ensures block group selection respects space_info boundaries
- **Code quality**: Reviewed by Johannes Thumshirn, an expert in btrfs zoned support
**Dependencies:**
- **Hard dependency**: Requires sub-space_info infrastructure from commit f92ee31e031c7 (merged in v6.16-rc1)
- **Cannot be backported to kernels < v6.16**
**Why it meets stable tree criteria:**
1. ✅ Fixes important bug affecting btrfs zoned mode users
2. ✅ Small, contained change with minimal regression risk
3. ✅ No architectural changes
4. ✅ Clear side effects are intentional (preventing wrong block group selection)
5. ✅ Fixes correctness issue introduced by the v6.16 space_info separation
**Recommendation:** This commit should be backported to v6.16+ stable trees to ensure correct extent allocator behavior for btrfs zoned mode users.
 fs/btrfs/extent-tree.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 97d517cdf2df7..682d21a73a67a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4297,7 +4297,8 @@ static int prepare_allocation_clustered(struct btrfs_fs_info *fs_info,
 }
 
 static int prepare_allocation_zoned(struct btrfs_fs_info *fs_info,
-				    struct find_free_extent_ctl *ffe_ctl)
+				    struct find_free_extent_ctl *ffe_ctl,
+				    struct btrfs_space_info *space_info)
 {
 	if (ffe_ctl->for_treelog) {
 		spin_lock(&fs_info->treelog_bg_lock);
@@ -4321,6 +4322,7 @@ static int prepare_allocation_zoned(struct btrfs_fs_info *fs_info,
 			u64 avail = block_group->zone_capacity - block_group->alloc_offset;
 
 			if (block_group_bits(block_group, ffe_ctl->flags) &&
+			    block_group->space_info == space_info &&
 			    avail >= ffe_ctl->num_bytes) {
 				ffe_ctl->hint_byte = block_group->start;
 				break;
@@ -4342,7 +4344,7 @@ static int prepare_allocation(struct btrfs_fs_info *fs_info,
 		return prepare_allocation_clustered(fs_info, ffe_ctl,
 						    space_info, ins);
 	case BTRFS_EXTENT_ALLOC_ZONED:
-		return prepare_allocation_zoned(fs_info, ffe_ctl);
+		return prepare_allocation_zoned(fs_info, ffe_ctl, space_info);
 	default:
 		BUG();
 	}