[BUG] Test case generic/475 have a very high chance (almost 100%) to hit a fs hang, where a data page will never be unlocked and hang all later operations.
[CAUSE] In btrfs_do_readpage(), if we hit an error from submit_extent_page() we will try to do the cleanup for our current io range, and exit.
This works fine for PAGE_SIZE == sectorsize cases, but not for subpage.
For subpage btrfs_do_readpage() will lock the full page first, which can contain several different sectors and extents:
btrfs_do_readpage() |- begin_page_read() | |- btrfs_subpage_start_reader(); | Now the page will hage PAGE_SIZE / sectorsize reader pending, | and the page is locked. | |- end_page_read() for different branches | This function will reduce subpage readers, and when readers | reach 0, it will unlock the page.
But when submit_extent_page() failed, we only cleanup the current io range, while the remaining io range will never be cleaned up, and the page remains locked forever.
[FIX] Update the error handling of submit_extent_page() to cleanup all the remaining subpage range before exiting the loop.
Please note that, now submit_extent_page() can only fail due to sanity check in alloc_new_bio().
Thus regular IO errors are impossible to trigger the error path.
CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Qu Wenruo wqu@suse.com --- fs/btrfs/extent_io.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 163aa6dee987..990d8475ba31 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3774,8 +3774,12 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached, this_bio_flag, force_bio_submit); if (ret) { - unlock_extent(tree, cur, cur + iosize - 1); - end_page_read(page, false, cur, iosize); + /* + * We have to unlock the remaining range, or the page + * will never be unlocked. + */ + unlock_extent(tree, cur, end); + end_page_read(page, false, cur, end + 1 - cur); goto out; } cur = cur + iosize;