[BUG] When running generic/475 with 64K page size and 4K sector size, it has a very high chance (almost 100%) to hang, with mostly data page locked but no one is going to unlock it.
[CAUSE] With commit 1784b7d502a9 ("btrfs: handle csum lookup errors properly on reads"), if we failed to lookup checksum due to metadata IO error, we will return error for btrfs_submit_data_bio().
This will cause the page to be unlocked twice in btrfs_do_readpage():
btrfs_do_readpage() |- submit_extent_page() | |- submit_one_bio() | |- btrfs_submit_data_bio() | |- if (ret) { | |- bio->bi_status = ret; | |- bio_endio(bio); } | In the endio function, we will call end_page_read() | and unlock_extent() to cleanup the subpage range. | |- if (ret) { |- unlock_extent(); end_page_read() } Here we unlock the extent and cleanup the subpage range again.
For unlock_extent(), it's mostly double unlock safe.
But for end_page_read(), it's not, especially for subpage case, as for subpage case we will call btrfs_subpage_end_reader() to reduce the reader number, and use that to number to determine if we need to unlock the full page.
If double accounted, it can underflow the number and leave the page locked without anyone to unlock it.
[FIX] The commit 1784b7d502a9 ("btrfs: handle csum lookup errors properly on reads") itself is completely fine, it's our existing code not properly handling the error from bio submission hook properly.
Since one and only one of the endio function or the submit_extent_page() caller should do the cleanup, let's just ignore the return value from bio submission hook.
Before the hook, it's callers' responsibility to do cleanup, after the hook (including inside the hook), it's the endio doing the cleanup.
This patch is makes submit_one_bio() always return 0, but still keep the old int return value to make minimal change for backport.
CC: stable@vger.kernel.org # 5.18+ Signed-off-by: Qu Wenruo wqu@suse.com --- fs/btrfs/extent_io.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 53b59944013f..34073b0ed6ca 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -165,24 +165,28 @@ static int add_extent_changeset(struct extent_state *state, u32 bits, return ret; }
-int __must_check submit_one_bio(struct bio *bio, int mirror_num, - unsigned long bio_flags) +int submit_one_bio(struct bio *bio, int mirror_num, unsigned long bio_flags) { - blk_status_t ret = 0; struct extent_io_tree *tree = bio->bi_private;
bio->bi_private = NULL;
/* Caller should ensure the bio has at least some range added */ ASSERT(bio->bi_iter.bi_size); + if (is_data_inode(tree->private_data)) - ret = btrfs_submit_data_bio(tree->private_data, bio, mirror_num, + btrfs_submit_data_bio(tree->private_data, bio, mirror_num, bio_flags); else - ret = btrfs_submit_metadata_bio(tree->private_data, bio, + btrfs_submit_metadata_bio(tree->private_data, bio, mirror_num, bio_flags); - - return blk_status_to_errno(ret); + /* + * Above submission hooks will handle the error by ending the bio, + * which will do the cleanup properly. + * So here we should not return any error, or the caller of + * submit_extent_page() will do cleanup again, causing problems. + */ + return 0; }
/* Cleanup unsubmitted bios */