On 2020/7/26 下午8:51, Qu Wenruo wrote:
On 2020/7/26 下午8:41, gregkh@linuxfoundation.org wrote:
This is a note to let you know that I've just added the patch titled
btrfs: qgroup: fix data leak caused by race between writeback and truncate
to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is: btrfs-qgroup-fix-data-leak-caused-by-race-between-writeback-and-truncate.patch and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
Please don't merge this patch for any of the stable branches.
This patch needs one unmerged patch ("btrfs: change timing for qgroup reserved space for ordered extents to fix reserved space leak", already in maintainer's tree) as prerequisite.
Also add btrfs mail list to the discssusion.
The behavior without that patch could be problematic.
I should have noticed this earlier.
Thanks, Qu
From fa91e4aa1716004ea8096d5185ec0451e206aea0 Mon Sep 17 00:00:00 2001 From: Qu Wenruo wqu@suse.com Date: Fri, 17 Jul 2020 15:12:05 +0800 Subject: btrfs: qgroup: fix data leak caused by race between writeback and truncate
From: Qu Wenruo wqu@suse.com
commit fa91e4aa1716004ea8096d5185ec0451e206aea0 upstream.
[BUG] When running tests like generic/013 on test device with btrfs quota enabled, it can normally lead to data leak, detected at unmount time:
BTRFS warning (device dm-3): qgroup 0/5 has unreleased space, type 0 rsv 4096 ------------[ cut here ]------------ WARNING: CPU: 11 PID: 16386 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs] RIP: 0010:close_ctree+0x1dc/0x323 [btrfs] Call Trace: btrfs_put_super+0x15/0x17 [btrfs] generic_shutdown_super+0x72/0x110 kill_anon_super+0x18/0x30 btrfs_kill_super+0x17/0x30 [btrfs] deactivate_locked_super+0x3b/0xa0 deactivate_super+0x40/0x50 cleanup_mnt+0x135/0x190 __cleanup_mnt+0x12/0x20 task_work_run+0x64/0xb0 __prepare_exit_to_usermode+0x1bc/0x1c0 __syscall_return_slowpath+0x47/0x230 do_syscall_64+0x64/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ---[ end trace caf08beafeca2392 ]--- BTRFS error (device dm-3): qgroup reserved space leaked
[CAUSE] In the offending case, the offending operations are: 2/6: writev f2X[269 1 0 0 0 0] [1006997,67,288] 0 2/7: truncate f2X[269 1 0 0 48 1026293] 18388 0
The following sequence of events could happen after the writev(): CPU1 (writeback) | CPU2 (truncate)
btrfs_writepages() | |- extent_write_cache_pages() | |- Got page for 1003520 | | 1003520 is Dirty, no writeback | | So (!clear_page_dirty_for_io()) | | gets called for it | |- Now page 1003520 is Clean. | | | btrfs_setattr() | | |- btrfs_setsize() | | |- truncate_setsize() | | New i_size is 18388 |- __extent_writepage() | | |- page_offset() > i_size | |- btrfs_invalidatepage() | |- Page is clean, so no qgroup | callback executed
This means, the qgroup reserved data space is not properly released in btrfs_invalidatepage() as the page is Clean.
[FIX] Instead of checking the dirty bit of a page, call btrfs_qgroup_free_data() unconditionally in btrfs_invalidatepage().
As qgroup rsv are completely bound to the QGROUP_RESERVED bit of io_tree, not bound to page status, thus we won't cause double freeing anyway.
Fixes: 0b34c261e235 ("btrfs: qgroup: Prevent qgroup->reserved from going subzero") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
fs/btrfs/inode.c | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-)
--- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -9197,20 +9197,17 @@ again: /* * Qgroup reserved space handler * Page here will be either
* 1) Already written to disk
* In this case, its reserved space is released from data rsv map
* and will be freed by delayed_ref handler finally.
* So even we call qgroup_free_data(), it won't decrease reserved
* space.
* 2) Not written to disk
* This means the reserved space should be freed here. However,
* if a truncate invalidates the page (by clearing PageDirty)
* and the page is accounted for while allocating extent
* in btrfs_check_data_free_space() we let delayed_ref to
* free the entire extent.
* 1) Already written to disk or ordered extent already submitted
* Then its QGROUP_RESERVED bit in io_tree is already cleaned.
* Qgroup will be handled by its qgroup_record then.
* btrfs_qgroup_free_data() call will do nothing here.
*
* 2) Not written to disk yet
* Then btrfs_qgroup_free_data() call will clear the QGROUP_RESERVED
* bit of its io_tree, and free the qgroup reserved data space.
*/* Since the IO will never happen for this page.
- if (PageDirty(page))
btrfs_qgroup_free_data(inode, NULL, page_start, PAGE_SIZE);
- btrfs_qgroup_free_data(inode, NULL, page_start, PAGE_SIZE); if (!inode_evicting) { clear_extent_bit(tree, page_start, page_end, EXTENT_LOCKED | EXTENT_DIRTY |
Patches currently in stable-queue which might be from wqu@suse.com are
queue-4.14/btrfs-qgroup-fix-data-leak-caused-by-race-between-writeback-and-truncate.patch