On 1/13/20 11:16 AM, David Sterba wrote:
From: Qu Wenruo wqu@suse.com
This is what I'm going to commit, but as this has a long discussion behind I'm sending it to the mailinglist.
- there are 2 helpers to avoid using raw barriers in the tests where there are at least 2 places
- each barrier is commented
- subject and changelog have been updated to reflect the changes
https://lore.kernel.org/linux-btrfs/20200108051200.8909-1-wqu@suse.com/
[BUG] There are several different KASAN reports for balance + snapshot workloads. Involved call paths include:
should_ignore_root+0x54/0xb0 [btrfs] build_backref_tree+0x11af/0x2280 [btrfs] relocate_tree_blocks+0x391/0xb80 [btrfs] relocate_block_group+0x3e5/0xa00 [btrfs] btrfs_relocate_block_group+0x240/0x4d0 [btrfs] btrfs_relocate_chunk+0x53/0xf0 [btrfs] btrfs_balance+0xc91/0x1840 [btrfs] btrfs_ioctl_balance+0x416/0x4e0 [btrfs] btrfs_ioctl+0x8af/0x3e60 [btrfs] do_vfs_ioctl+0x831/0xb10 create_reloc_root+0x9f/0x460 [btrfs] btrfs_reloc_post_snapshot+0xff/0x6c0 [btrfs] create_pending_snapshot+0xa9b/0x15f0 [btrfs] create_pending_snapshots+0x111/0x140 [btrfs] btrfs_commit_transaction+0x7a6/0x1360 [btrfs] btrfs_mksubvol+0x915/0x960 [btrfs] btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs] btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs] btrfs_ioctl+0x241b/0x3e60 [btrfs] do_vfs_ioctl+0x831/0xb10 btrfs_reloc_pre_snapshot+0x85/0xc0 [btrfs] create_pending_snapshot+0x209/0x15f0 [btrfs] create_pending_snapshots+0x111/0x140 [btrfs] btrfs_commit_transaction+0x7a6/0x1360 [btrfs] btrfs_mksubvol+0x915/0x960 [btrfs] btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs] btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs] btrfs_ioctl+0x241b/0x3e60 [btrfs] do_vfs_ioctl+0x831/0xb10
[CAUSE] All these call sites are only relying on root->reloc_root, which can undergo btrfs_drop_snapshot(), and since we don't have real refcount based protection to reloc roots, we can reach already dropped reloc root, triggering KASAN.
[FIX] To avoid such access to unstable root->reloc_root, we should check BTRFS_ROOT_DEAD_RELOC_TREE bit first.
This patch introduces wrappers that provide the correct way to check the bit with memory barriers protection.
Most callers don't distinguish merged reloc tree and no reloc tree. The only exception is should_ignore_root(), as merged reloc tree can be ignored, while no reloc tree shouldn't.
[CRITICAL SECTION ANALYSIS] Although test_bit()/set_bit()/clear_bit() doesn't imply a barrier, the DEAD_RELOC_TREE bit has extra help from transaction as a higher level barrier, the lifespan of root::reloc_root and DEAD_RELOC_TREE bit are:
NULL: reloc_root is NULL PTR: reloc_root is not NULL 0: DEAD_RELOC_ROOT bit not set DEAD: DEAD_RELOC_ROOT bit set
(NULL, 0) Initial state __ | /\ Section A btrfs_init_reloc_root() / | __ (PTR, 0) reloc_root initialized /\ | | btrfs_update_reloc_root() | Section B | | (PTR, DEAD) reloc_root has been merged / | __ === btrfs_commit_transaction() ==================== | /\ clean_dirty_subvols() | | | Section C (NULL, DEAD) reloc_root cleanup starts / | __ btrfs_drop_snapshot() /\ | | Section D (NULL, 0) Back to initial state /
Every have_reloc_root() or test_bit(DEAD_RELOC_ROOT) caller holds transaction handle, so none of such caller can cross transaction boundary.
In Section A, every caller just found no DEAD bit, and grab reloc_root.
In the cross section A-B, caller may get no DEAD bit, but since reloc_root is still completely valid thus accessing reloc_root is completely safe.
No test_bit() caller can cross the boundary of Section B and Section C.
In Section C, every caller found the DEAD bit, so no one will access reloc_root.
In the cross section C-D, either caller gets the DEAD bit set, avoiding access reloc_root no matter if it's safe or not. Or caller get the DEAD bit cleared, then access reloc_root, which is already NULL, nothing will be wrong.
The memory write barriers are between the reloc_root updates and bit set/clear, the pairing read side is before test_bit.
Reported-by: Zygo Blaxell ce3g8jdj@umail.furryterror.org Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots") CC: stable@vger.kernel.org # 5.4+ Suggested-by: David Sterba dsterba@suse.com Signed-off-by: Qu Wenruo wqu@suse.com [ barriers ] Signed-off-by: David Sterba dsterba@suse.com
Let's just get this fixed, the root refcounting stuff will solve this so we just need something for now.
Reviewed-by: Josef Bacik josef@toxicpanda.com
Thanks,
Josef