On Wed, Feb 13, 2019 at 02:27:04PM +0800, Qu Wenruo wrote:
From: Josef Bacik josef@toxicpanda.com
Qgroups will do the old roots lookup at delayed ref time, which could be while walking down the extent root while running a delayed ref. This should be fine, except we specifically lock eb's in the backref walking code irrespective of path->skip_locking, which deadlocks the system. Fix up the backref code to honor path->skip_locking, nobody will be modifying the commit_root when we're searching so it's completely safe to do.
This happens Since fb235dc06fac ("btrfs: qgroup: Move half of the qgroup accounting time out of commit trans"), kernel may lockup with quota enabled.
There is one backref trace triggered by snapshot dropping along with write operation in the source subvolume. The example can be reliably reproduced:
btrfs-cleaner D 0 4062 2 0x80000000 Call Trace: schedule+0x32/0x90 btrfs_tree_read_lock+0x93/0x130 [btrfs] find_parent_nodes+0x29b/0x1170 [btrfs] btrfs_find_all_roots_safe+0xa8/0x120 [btrfs] btrfs_find_all_roots+0x57/0x70 [btrfs] btrfs_qgroup_trace_extent_post+0x37/0x70 [btrfs] btrfs_qgroup_trace_leaf_items+0x10b/0x140 [btrfs] btrfs_qgroup_trace_subtree+0xc8/0xe0 [btrfs] do_walk_down+0x541/0x5e3 [btrfs] walk_down_tree+0xab/0xe7 [btrfs] btrfs_drop_snapshot+0x356/0x71a [btrfs] btrfs_clean_one_deleted_snapshot+0xb8/0xf0 [btrfs] cleaner_kthread+0x12b/0x160 [btrfs] kthread+0x112/0x130 ret_from_fork+0x27/0x50
When dropping snapshots with qgroup enabled, we will trigger backref walk.
However such backref walk at that timing is pretty dangerous, as if one of the parent nodes get WRITE locked by other thread, we could cause a dead lock.
For example:
FS 260 FS 261 (Dropped) node A node B / \ / \ node C node D node E / \ / \ / \
leaf F|leaf G|leaf H|leaf I|leaf J|leaf K
The lock sequence would be:
Thread A (cleaner) | Thread B (other writer)
write_lock(B) | write_lock(D) | ^^^ called by walk_down_tree() | | write_lock(A) | write_lock(D) << Stall read_lock(H) << for backref walk | read_lock(D) << lock owner is | the same thread A | so read lock is OK | read_lock(A) << Stall |
So thread A hold write lock D, and needs read lock A to unlock. While thread B holds write lock A, while needs lock D to unlock.
This will cause a deadlock.
This is not only limited to snapshot dropping case. As the backref walk, even only happens on commit trees, is breaking the normal top-down locking order, makes it deadlock prone.
Fixes: fb235dc06fac ("btrfs: qgroup: Move half of the qgroup accounting time out of commit trans") CC: stable@vger.kernel.org # 4.19+ Reported-and-tested-by: David Sterba dsterba@suse.com Reported-by: Filipe Manana fdmanana@suse.com Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Josef Bacik josef@toxicpanda.com [ copy logs and deadlock analysis from Qu's patch ] [ rebase to latest branch and fix lock assert bug ] Signed-off-by: David Sterba dsterba@suse.com
changelog: v2:
- Rebased to latest misc-next branch.
- Fix a lock assert by moving btrfs_set_lock_blocking_read() into the same branch of btrfs_tree_read_lock().
Thanks for finding the fix and sending on Josef's behalf. If this weren't an important bugfix I'd just wait for him to resend. I've added your signed-off-by as 3/4 of the changelog were copied from your patch anyway.