In v5.16 btrfs had a big rework (originally considered as a refactor only) on defrag, which brings quite some regressions.
With those regressions, extra scrutiny is brought to defrag code, and some existing bugs are exposed.
For v5.15, there are those patches need to be backported: (Tracked through github issue: https://github.com/btrfs/linux/issues/422)
- Already upstreamed: "btrfs: don't hold CPU for too long when defragging a file" The first patch.
- In maintainer's tree btrfs: defrag: don't try to merge regular extents with preallocated extents btrfs: defrag: don't defrag extents which are already at max capacity btrfs: defrag: remove an ambiguous condition for rejection
Those will be backported when they get merged upstream.
- One special fix for v5.15 The 2nd patch.
The problem is, that patch has no upstream commit, since v5.16 reworked the defrag code and fix the problem unintentionally. It's not a good idea to bring the whole rework (along with its bugs) to stable kernel.
So here I crafted a small fix for v5.15 only.
Not sure if this is conflicted with any stable policy.
Qu Wenruo (2): btrfs: don't hold CPU for too long when defragging a file btrfs: defrag: use the same cluster size for defrag ioctl and autodefrag
fs/btrfs/ioctl.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-)
commit 2d192fc4c1abeb0d04d1c8cd54405ff4a0b0255b upstream.
There is a user report about "btrfs filesystem defrag" causing 120s timeout problem.
For btrfs_defrag_file() it will iterate all file extents if called from defrag ioctl, thus it can take a long time.
There is no reason not to release the CPU during such a long operation.
Add cond_resched() after defragged one cluster.
CC: stable@vger.kernel.org # 5.15 Link: https://lore.kernel.org/linux-btrfs/10e51417-2203-f0a4-2021-86c8511cc367@gmx... Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com --- fs/btrfs/ioctl.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 6a863b3f6de0..38a1b68c7851 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1581,6 +1581,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, last_len = 0; } } + cond_resched(); }
ret = defrag_count;
On Wed, Feb 16, 2022 at 03:09:07PM +0800, Qu Wenruo wrote:
commit 2d192fc4c1abeb0d04d1c8cd54405ff4a0b0255b upstream.
This commit is already in 5.15.22.
There is a user report about "btrfs filesystem defrag" causing 120s timeout problem.
For btrfs_defrag_file() it will iterate all file extents if called from defrag ioctl, thus it can take a long time.
There is no reason not to release the CPU during such a long operation.
Add cond_resched() after defragged one cluster.
CC: stable@vger.kernel.org # 5.15 Link: https://lore.kernel.org/linux-btrfs/10e51417-2203-f0a4-2021-86c8511cc367@gmx... Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com
fs/btrfs/ioctl.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 6a863b3f6de0..38a1b68c7851 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1581,6 +1581,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, last_len = 0; } }
}cond_resched();
ret = defrag_count; -- 2.35.1
The original commit looks nothing like this commit at all. Are you sure you got this correct?
confused,
greg k-h
On 2022-02-17 20:01, Greg KH wrote:
On Wed, Feb 16, 2022 at 03:09:07PM +0800, Qu Wenruo wrote:
commit 2d192fc4c1abeb0d04d1c8cd54405ff4a0b0255b upstream.
This commit is already in 5.15.22.
It most certainly is not, since it applies cleanly in 5.15.24. The correct upstream commit of this patch is ea0eba69a2a8125229b1b6011644598039bc53aa
cheers, Holger
There is a user report about "btrfs filesystem defrag" causing 120s timeout problem.
For btrfs_defrag_file() it will iterate all file extents if called from defrag ioctl, thus it can take a long time.
There is no reason not to release the CPU during such a long operation.
Add cond_resched() after defragged one cluster.
CC: stable@vger.kernel.org # 5.15 Link: https://lore.kernel.org/linux-btrfs/10e51417-2203-f0a4-2021-86c8511cc367@gmx... Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com
fs/btrfs/ioctl.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 6a863b3f6de0..38a1b68c7851 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1581,6 +1581,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, last_len = 0; } }
}cond_resched();
ret = defrag_count; -- 2.35.1
The original commit looks nothing like this commit at all. Are you sure you got this correct?
confused,
greg k-h
On Thu, Feb 17, 2022 at 08:41:51PM +0100, Holger Hoffstätte wrote:
On 2022-02-17 20:01, Greg KH wrote:
On Wed, Feb 16, 2022 at 03:09:07PM +0800, Qu Wenruo wrote:
commit 2d192fc4c1abeb0d04d1c8cd54405ff4a0b0255b upstream.
This commit is already in 5.15.22.
It most certainly is not
Commit 2d192fc4c1abeb0d04d1c8cd54405ff4a0b0255b is in 5.15.22.
since it applies cleanly in 5.15.24. The correct upstream commit of this patch is ea0eba69a2a8125229b1b6011644598039bc53aa
Ah, no wonder the confusion. For obvious reasons, I can not take this as-is :)
thanks,
greg k-h
On 2022/2/18 03:48, Greg KH wrote:
On Thu, Feb 17, 2022 at 08:41:51PM +0100, Holger Hoffstätte wrote:
On 2022-02-17 20:01, Greg KH wrote:
On Wed, Feb 16, 2022 at 03:09:07PM +0800, Qu Wenruo wrote:
commit 2d192fc4c1abeb0d04d1c8cd54405ff4a0b0255b upstream.
This commit is already in 5.15.22.
It most certainly is not
Commit 2d192fc4c1abeb0d04d1c8cd54405ff4a0b0255b is in 5.15.22.
since it applies cleanly in 5.15.24. The correct upstream commit of this patch is ea0eba69a2a8125229b1b6011644598039bc53aa
Ah, no wonder the confusion. For obvious reasons, I can not take this as-is :)
My bad, wrong commit hash...
thanks,
greg k-h
No upstream commit. Since the bug only exists between v5.11 and v5.15. In v5.16 btrfs reworked defrag and no longer has this bug.
[BUG] Since commit 7f458a3873ae ("btrfs: fix race when defragmenting leads to unnecessary IO") autodefrag no longer works with the following script:
mkfs.btrfs -f $dev mount $dev $mnt -o datacow,autodefrag
# Create a layout where we have fragmented extents at [0, 64k) (sync write in # reserve order), then a hole at [64k, 128k) xfs_io -f -s -c "pwrite 48k 16k" -c "pwrite 32k 16k" \ -c "pwrite 16k 16k" -c "pwrite 0 16k" \ $mnt/foobar truncate -s 128k $mnt/foobar
echo "=== File extent layout before autodefrag ===" xfs_io -c "fiemap -v" "$mnt/foobar"
# Now trigger autodefrag, autodefrag is triggered in the cleaner thread, # which will be woken up by commit thread mount -o remount,commit=1 $mnt sleep 3 sync
echo "=== File extent layout after autodefrag ===" xfs_io -c "fiemap -v" "$mnt/foobar"
The file "foobar" will not be autodefraged even it should.
[CAUSE] Commit 7f458a3873ae ("btrfs: fix race when defragmenting leads to unnecessary IO") fixes the race by rejecting the cluster if there is any hole in the cluster.
But unlike regular defrag ioctl, autodefrag ignores the @defrag_end parameter, and always uses a fixed cluster size 256K. While defrag ioctl uses @defrag_end to skip existing holes.
This hidden autodefrag only behavior prevents autodefrag from working in above script.
[FIX] Remove the special cluster size, and unify the behavior for both autodefrag and defrag ioctl.
This fix is only needed for v5.15 (and maybe v5.10) stable branch, as in v5.16 the whole defrag get reworked in btrfs, which at least solves this particular bug. (although introduced quite some other regressions)
CC: stable@vger.kernel.org # 5.15 Signed-off-by: Qu Wenruo wqu@suse.com --- fs/btrfs/ioctl.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 38a1b68c7851..61f6e77698a2 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1521,13 +1521,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, continue; }
- if (!newer_than) { - cluster = (PAGE_ALIGN(defrag_end) >> - PAGE_SHIFT) - i; - cluster = min(cluster, max_cluster); - } else { - cluster = max_cluster; - } + cluster = (PAGE_ALIGN(defrag_end) >> PAGE_SHIFT) - i; + cluster = min(cluster, max_cluster);
if (i + cluster > ra_index) { ra_index = max(i, ra_index);
On Wed, Feb 16, 2022 at 03:09:08PM +0800, Qu Wenruo wrote:
No upstream commit. Since the bug only exists between v5.11 and v5.15. In v5.16 btrfs reworked defrag and no longer has this bug.
I'm not sure this will work as a stable patch. A backport of an existing upstream patch that is only adapted to older stable code base is fine but what is the counterpart of this patch?
[BUG] Since commit 7f458a3873ae ("btrfs: fix race when defragmenting leads to unnecessary IO") autodefrag no longer works with the following script:
The bug does no seem to be significant, autodefrag is basically a heuristic so if it does not work perfectly in all cases it's still OK.
On 2022/2/16 20:37, David Sterba wrote:
On Wed, Feb 16, 2022 at 03:09:08PM +0800, Qu Wenruo wrote:
No upstream commit. Since the bug only exists between v5.11 and v5.15. In v5.16 btrfs reworked defrag and no longer has this bug.
I'm not sure this will work as a stable patch. A backport of an existing upstream patch that is only adapted to older stable code base is fine but what is the counterpart of this patch?
The whole ill-fated rework on defrag.
[BUG] Since commit 7f458a3873ae ("btrfs: fix race when defragmenting leads to unnecessary IO") autodefrag no longer works with the following script:
The bug does no seem to be significant, autodefrag is basically a heuristic so if it does not work perfectly in all cases it's still OK.
Normally I'd say yes.
But I don't want to surprise end users by suddenly increase their IO for autodefrag in the next LTS.
This bug is really setting a high bar (or low IO expectation) for end users.
And another thing is, I can definitely create a local branch with this fixed to test against fixed autodefrag code, but that won't make much sense.
Thus getting this merged could provide a more realistic baseline for autodefrag.
Finally, one lesssen I learnt from the defrag thing is, if we allow some untested/undefined corner cases, it will bite us eventually.
So I really want autodefrag to behave just like ioctl defrag, with a pre-defined and predictable (at least not under races) behavior.
Thanks, Qu
linux-stable-mirror@lists.linaro.org