The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 537f38f019fa0b762dbb4c0fc95d7fcce9db8e2d Mon Sep 17 00:00:00 2001
From: Nikolay Borisov nborisov@suse.com Date: Thu, 14 Mar 2019 09:52:35 +0200 Subject: [PATCH] btrfs: Correctly free extent buffer in case btree_read_extent_buffer_pages fails
If a an eb fails to be read for whatever reason - it's corrupted on disk and parent transid/key validations fail or IO for eb pages fail then this buffer must be removed from the buffer cache. Currently the code calls free_extent_buffer if an error occurs. Unfortunately this doesn't achieve the desired behavior since btrfs_find_create_tree_block returns with eb->refs == 2.
On the other hand free_extent_buffer will only decrement the refs once leaving it added to the buffer cache radix tree. This enables later code to look up the buffer from the cache and utilize it potentially leading to a crash.
The correct way to free the buffer is call free_extent_buffer_stale. This function will correctly call atomic_dec explicitly for the buffer and subsequently call release_extent_buffer which will decrement the final reference thus correctly remove the invalid buffer from buffer cache. This change affects only newly allocated buffers since they have eb->refs == 2.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202755 Reported-by: Jungyeon jungyeon@gatech.edu CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Nikolay Borisov nborisov@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 46b368d84aa3..ea44cf136131 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1021,13 +1021,18 @@ void readahead_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr) { struct extent_buffer *buf = NULL; struct inode *btree_inode = fs_info->btree_inode; + int ret;
buf = btrfs_find_create_tree_block(fs_info, bytenr); if (IS_ERR(buf)) return; - read_extent_buffer_pages(&BTRFS_I(btree_inode)->io_tree, - buf, WAIT_NONE, 0); - free_extent_buffer(buf); + + ret = read_extent_buffer_pages(&BTRFS_I(btree_inode)->io_tree, buf, + WAIT_NONE, 0); + if (ret < 0) + free_extent_buffer_stale(buf); + else + free_extent_buffer(buf); }
int reada_tree_block_flagged(struct btrfs_fs_info *fs_info, u64 bytenr, @@ -1047,12 +1052,12 @@ int reada_tree_block_flagged(struct btrfs_fs_info *fs_info, u64 bytenr, ret = read_extent_buffer_pages(io_tree, buf, WAIT_PAGE_LOCK, mirror_num); if (ret) { - free_extent_buffer(buf); + free_extent_buffer_stale(buf); return ret; }
if (test_bit(EXTENT_BUFFER_CORRUPT, &buf->bflags)) { - free_extent_buffer(buf); + free_extent_buffer_stale(buf); return -EIO; } else if (extent_buffer_uptodate(buf)) { *eb = buf; @@ -1106,7 +1111,7 @@ struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr, ret = btree_read_extent_buffer_pages(fs_info, buf, parent_transid, level, first_key); if (ret) { - free_extent_buffer(buf); + free_extent_buffer_stale(buf); return ERR_PTR(ret); } return buf;