From: Zhang Yi yi.zhang@huawei.com
[ Upstream commit f22a0ef2231a7d8374bb021eb86404d0e9de5a02 ]
The EXT4_IOC_GET_ES_CACHE and EXT4_IOC_PRECACHE_EXTENTS ioctls currently invoke ext4_ext_precache() to preload the extent cache without holding the inode's i_rwsem. This can result in stale extent cache entries when competing with operations such as ext4_collapse_range(), which calls ext4_ext_remove_space() or ext4_ext_shift_extents().
The problem arises when ext4_ext_remove_space() temporarily releases i_data_sem due to insufficient journal credits. During this interval, a concurrent EXT4_IOC_GET_ES_CACHE or EXT4_IOC_PRECACHE_EXTENTS may cache extent entries that are about to be deleted. As a result, these cached entries become stale and inconsistent with the actual extents.
Loading the extent cache without holding the inode's i_rwsem or the mapping's invalidate_lock is not permitted except during writeback. Fix this by holding the i_rwsem during EXT4_IOC_GET_ES_CACHE and EXT4_IOC_PRECACHE_EXTENTS.
Signed-off-by: Zhang Yi yi.zhang@huawei.com
Link: https://patch.msgid.link/20250423085257.122685-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: Sasha Levin sashal@kernel.org
---
Based on my comprehensive analysis of this commit and comparison with the historical similar commits, here is my assessment:

**YES**

This commit should be backported to stable kernel trees for the following reasons:

## Analysis Summary

### **Type of Bug Fix**

This commit addresses a **race condition** that can lead to **stale extent cache entries** in the ext4 filesystem. The race occurs between:

1. `EXT4_IOC_GET_ES_CACHE` and `EXT4_IOC_PRECACHE_EXTENTS` operations
2. Concurrent `ext4_collapse_range()` operations that call `ext4_ext_remove_space()` or `ext4_ext_shift_extents()`

### **Technical Details of the Race**

The problem stems from the fact that `ext4_ext_remove_space()` temporarily releases `i_data_sem` when journal credits are insufficient. During this critical window:

- The extent precaching operations (`ext4_ext_precache()`) can cache extent entries
- These cached entries are about to be deleted by the ongoing collapse/shift operation
- Result: **stale, inconsistent extent cache entries**

### **Code Changes Analysis**

The fix is **minimal and surgical**:

**In `fs/ext4/extents.c` (ext4_get_es_cache function):**

```c
	if (fieinfo->fi_flags & FIEMAP_FLAG_CACHE) {
+		inode_lock_shared(inode);
		error = ext4_ext_precache(inode);
+		inode_unlock_shared(inode);
		if (error)
			return error;
```

**In `fs/ext4/ioctl.c` (EXT4_IOC_PRECACHE_EXTENTS case):**

```c
	case EXT4_IOC_PRECACHE_EXTENTS:
-		return ext4_ext_precache(inode);
+	{
+		int ret;
+
+		inode_lock_shared(inode);
+		ret = ext4_ext_precache(inode);
+		inode_unlock_shared(inode);
+		return ret;
+	}
```

### **Why This Qualifies for Backporting**

1. **Important Bug Fix**: Stale extent cache entries can lead to filesystem inconsistencies and potential data corruption scenarios.
2. **Minimal Risk**: The changes are extremely small and contained:
   - Only adds proper locking around existing operations
   - Uses shared locking (`inode_lock_shared`) which allows concurrent readers
   - No algorithmic changes or new features
3. **Follows Established Patterns**: Similar commits in the repository (marked with "Backport Status: YES") show this pattern:
   - Similar commit #2: "ext4: fix data races when using cached status extents" - also deals with extent cache consistency
   - Similar commit #3: "ext4: convert to exclusive lock while inserting delalloc extents" - addresses similar race conditions with proper locking
   - Similar commit #4: "ext4: do not polute the extents cache while shifting extents" - prevents extent cache corruption during shift operations
4. **Clear Impact**: The commit message explicitly states this prevents "stale extent cache entries" which can cause inconsistency with actual extents.
5. **Contained to Subsystem**: Changes are limited to ext4 extent management and don't affect other kernel subsystems.
6. **Stable Backport Indicators**:
   - No "Cc: stable@" tag is present, but the nature of the fix (race condition leading to corruption) makes it a good candidate
   - The fix follows ext4's established locking protocol requirements
   - Multiple similar extent cache race condition fixes have been backported

### **Risk Assessment**

- **Very Low Regression Risk**: Only adds locking protection, doesn't change functional behavior
- **Well-understood Fix**: The locking pattern matches established ext4 conventions
- **Limited Scope**: Only affects two specific ioctl code paths

This commit represents exactly the type of fix that stable trees are meant for: a small, safe fix for a real bug that can cause filesystem corruption without introducing new functionality or architectural changes.
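
### **Illustrative Model of the Race (not kernel code)**

The window described in the commit message can be sketched with a small single-threaded userspace model. Everything here is hypothetical and exists only for illustration: `toy_rwsem`, `toy_inode`, `toy_precache()` and `toy_collapse_with_race()` are made-up names, the "try" locks return failure where real kernel code would sleep, and the two-step removal merely stands in for `ext4_ext_remove_space()` dropping `i_data_sem` to obtain more journal credits. It is a sketch under those assumptions, not a reproduction of the ext4 code paths.

```c
#include <stdbool.h>

#define NBLOCKS 8

/* Toy reader/writer lock; "try" variants fail instead of sleeping. */
struct toy_rwsem { int readers; bool writer; };

static bool toy_try_read(struct toy_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}
static void toy_read_unlock(struct toy_rwsem *s) { s->readers--; }

static bool toy_try_write(struct toy_rwsem *s)
{
	if (s->writer || s->readers)
		return false;
	s->writer = true;
	return true;
}
static void toy_write_unlock(struct toy_rwsem *s) { s->writer = false; }

/* Hypothetical inode: two locks, the real extents, and the extent cache. */
struct toy_inode {
	struct toy_rwsem i_rwsem;
	struct toy_rwsem i_data_sem;
	bool on_disk[NBLOCKS];
	bool cached[NBLOCKS];
};

/* Copy existing extents into the cache. With take_rwsem set, this mirrors
 * the patched ioctl paths; returns false if it would have had to block. */
static bool toy_precache(struct toy_inode *in, bool take_rwsem)
{
	if (take_rwsem && !toy_try_read(&in->i_rwsem))
		return false;
	if (!toy_try_read(&in->i_data_sem)) {
		if (take_rwsem)
			toy_read_unlock(&in->i_rwsem);
		return false;
	}
	for (int i = 0; i < NBLOCKS; i++)
		if (in->on_disk[i])
			in->cached[i] = true;
	toy_read_unlock(&in->i_data_sem);
	if (take_rwsem)
		toy_read_unlock(&in->i_rwsem);
	return true;
}

/* Remove blocks 4..7 in two steps, dropping i_data_sem in between while a
 * precache request arrives. Returns the number of stale cache entries. */
static int toy_collapse_with_race(bool patched)
{
	struct toy_inode in = {0};
	int stale = 0;

	for (int i = 0; i < NBLOCKS; i++)
		in.on_disk[i] = true;

	toy_try_write(&in.i_rwsem);        /* held for the whole operation */
	toy_try_write(&in.i_data_sem);
	in.on_disk[4] = in.on_disk[5] = false;
	toy_write_unlock(&in.i_data_sem);  /* out of credits: lock dropped */

	toy_precache(&in, patched);        /* racing ioctl hits the window */

	toy_try_write(&in.i_data_sem);
	in.on_disk[6] = in.on_disk[7] = false;
	toy_write_unlock(&in.i_data_sem);
	toy_write_unlock(&in.i_rwsem);

	for (int i = 0; i < NBLOCKS; i++)
		if (in.cached[i] && !in.on_disk[i])
			stale++;
	return stale;
}
```

In this model, `toy_collapse_with_race(false)` ends with two stale entries (blocks 6 and 7 were cached inside the window and removed afterwards), while `toy_collapse_with_race(true)` ends with none because the precache request cannot get past `i_rwsem`. In the kernel, the patched ioctl would instead sleep in `inode_lock_shared()` until the exclusive holder finishes and would then cache the final extent layout.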
 fs/ext4/extents.c | 2 ++
 fs/ext4/ioctl.c   | 8 +++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index b9bbe4dbedf4f..4c55789d1a558 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -5045,7 +5045,9 @@ int ext4_get_es_cache(struct inode *inode, struct fiemap_extent_info *fieinfo,
 	}
 
 	if (fieinfo->fi_flags & FIEMAP_FLAG_CACHE) {
+		inode_lock_shared(inode);
 		error = ext4_ext_precache(inode);
+		inode_unlock_shared(inode);
 		if (error)
 			return error;
 		fieinfo->fi_flags &= ~FIEMAP_FLAG_CACHE;
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 18002b0a908ce..bd90b454c6213 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -1130,8 +1130,14 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		return 0;
 	}
 	case EXT4_IOC_PRECACHE_EXTENTS:
-		return ext4_ext_precache(inode);
+	{
+		int ret;
 
+		inode_lock_shared(inode);
+		ret = ext4_ext_precache(inode);
+		inode_unlock_shared(inode);
+		return ret;
+	}
 	case FS_IOC_SET_ENCRYPTION_POLICY:
 		if (!ext4_has_feature_encrypt(sb))
 			return -EOPNOTSUPP;