On 2020/7/26 下午8:41, gregkh@linuxfoundation.org wrote:
This is a note to let you know that I've just added the patch titled
btrfs: qgroup: fix data leak caused by race between writeback and truncate
to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is: btrfs-qgroup-fix-data-leak-caused-by-race-between-writeback-and-truncate.patch and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
Please don't merge this patch for any of the stable branches.
This patch needs one unmerged patch ("btrfs: change timing for qgroup reserved space for ordered extents to fix reserved space leak", already in maintainer's tree) as prerequisite.
The behavior without that patch could be problematic.
I should have noticed this earlier.
Thanks, Qu
From fa91e4aa1716004ea8096d5185ec0451e206aea0 Mon Sep 17 00:00:00 2001 From: Qu Wenruo wqu@suse.com Date: Fri, 17 Jul 2020 15:12:05 +0800 Subject: btrfs: qgroup: fix data leak caused by race between writeback and truncate
From: Qu Wenruo wqu@suse.com
commit fa91e4aa1716004ea8096d5185ec0451e206aea0 upstream.
[BUG] When running tests like generic/013 on test device with btrfs quota enabled, it can normally lead to data leak, detected at unmount time:
BTRFS warning (device dm-3): qgroup 0/5 has unreleased space, type 0 rsv 4096 ------------[ cut here ]------------ WARNING: CPU: 11 PID: 16386 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs] RIP: 0010:close_ctree+0x1dc/0x323 [btrfs] Call Trace: btrfs_put_super+0x15/0x17 [btrfs] generic_shutdown_super+0x72/0x110 kill_anon_super+0x18/0x30 btrfs_kill_super+0x17/0x30 [btrfs] deactivate_locked_super+0x3b/0xa0 deactivate_super+0x40/0x50 cleanup_mnt+0x135/0x190 __cleanup_mnt+0x12/0x20 task_work_run+0x64/0xb0 __prepare_exit_to_usermode+0x1bc/0x1c0 __syscall_return_slowpath+0x47/0x230 do_syscall_64+0x64/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ---[ end trace caf08beafeca2392 ]--- BTRFS error (device dm-3): qgroup reserved space leaked
[CAUSE] In the offending case, the offending operations are: 2/6: writev f2X[269 1 0 0 0 0] [1006997,67,288] 0 2/7: truncate f2X[269 1 0 0 48 1026293] 18388 0
The following sequence of events could happen after the writev(): CPU1 (writeback) | CPU2 (truncate)
btrfs_writepages() | |- extent_write_cache_pages() | |- Got page for 1003520 | | 1003520 is Dirty, no writeback | | So (!clear_page_dirty_for_io()) | | gets called for it | |- Now page 1003520 is Clean. | | | btrfs_setattr() | | |- btrfs_setsize() | | |- truncate_setsize() | | New i_size is 18388 |- __extent_writepage() | | |- page_offset() > i_size | |- btrfs_invalidatepage() | |- Page is clean, so no qgroup | callback executed
This means, the qgroup reserved data space is not properly released in btrfs_invalidatepage() as the page is Clean.
[FIX] Instead of checking the dirty bit of a page, call btrfs_qgroup_free_data() unconditionally in btrfs_invalidatepage().
As qgroup rsv are completely bound to the QGROUP_RESERVED bit of io_tree, not bound to page status, thus we won't cause double freeing anyway.
Fixes: 0b34c261e235 ("btrfs: qgroup: Prevent qgroup->reserved from going subzero") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
fs/btrfs/inode.c | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-)
--- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -9197,20 +9197,17 @@ again: /* * Qgroup reserved space handler * Page here will be either
* 1) Already written to disk
* In this case, its reserved space is released from data rsv map
* and will be freed by delayed_ref handler finally.
* So even we call qgroup_free_data(), it won't decrease reserved
* space.
* 2) Not written to disk
* This means the reserved space should be freed here. However,
* if a truncate invalidates the page (by clearing PageDirty)
* and the page is accounted for while allocating extent
* in btrfs_check_data_free_space() we let delayed_ref to
* free the entire extent.
* 1) Already written to disk or ordered extent already submitted
* Then its QGROUP_RESERVED bit in io_tree is already cleaned.
* Qgroup will be handled by its qgroup_record then.
* btrfs_qgroup_free_data() call will do nothing here.
*
* 2) Not written to disk yet
* Then btrfs_qgroup_free_data() call will clear the QGROUP_RESERVED
* bit of its io_tree, and free the qgroup reserved data space.
*/* Since the IO will never happen for this page.
- if (PageDirty(page))
btrfs_qgroup_free_data(inode, NULL, page_start, PAGE_SIZE);
- btrfs_qgroup_free_data(inode, NULL, page_start, PAGE_SIZE); if (!inode_evicting) { clear_extent_bit(tree, page_start, page_end, EXTENT_LOCKED | EXTENT_DIRTY |
Patches currently in stable-queue which might be from wqu@suse.com are
queue-4.14/btrfs-qgroup-fix-data-leak-caused-by-race-between-writeback-and-truncate.patch