On 2024-02-11 03:58, Qu Wenruo wrote:
On 2024/2/11 01:04, Celeste Liu wrote:
On 3/2/23 09:54, Qu Wenruo wrote:
[BUG] During my scrub rework, I did a stupid thing like this:
bio->bi_iter.bi_sector = stripe->logical; btrfs_submit_bio(fs_info, bio, stripe->mirror_num);
Above bi_sector assignment is using logical address directly, which lacks ">> SECTOR_SHIFT".
This results a read on a range which has no chunk mapping.
This results the following crash:
BTRFS critical (device dm-1): unable to find logical 11274289152 length 65536 assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387 ------------[ cut here ]------------
Sure this is all my fault, but this shows a possible problem in real world, that some bitflip in file extents/tree block can point to unmapped ranges, and trigger above ASSERT(), or if CONFIG_BTRFS_ASSERT is not configured, cause invalid pointer.
[PROBLEMS] In above call chain, we just don't handle the possible error from btrfs_get_chunk_map() inside __btrfs_map_block().
[FIX] The fix is pretty straightforward, just replace the ASSERT() with proper error handling.
Signed-off-by: Qu Wenruo wqu@suse.com
Changelog: v2:
- Rebased to latest misc-next
The error path in bio.c is already fixed, thus only need to replace the ASSERT() in __btrfs_map_block().
fs/btrfs/volumes.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 4d479ac233a4..93bc45001e68 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6242,7 +6242,8 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, return -EINVAL;
em = btrfs_get_chunk_map(fs_info, logical, *length); - ASSERT(!IS_ERR(em)); + if (IS_ERR(em)) + return PTR_ERR(em);
map = em->map_lookup; data_stripes = nr_data_stripes(map);
This bug affects 6.1.y LTS branch but no backport commit of this fix in 6.1.y branch. Please include it. Thanks.
Just want to make sure, are you hitting the ASSERT()?
No, in non-debug build, ASSERT() is noop. So em will be a pointer whose value is errno and there is no any error handle for it. And then it trigger kernel NULL Pointer Dereference exception in btrfs_get_io_geometry() with em->map_lookup (Of course, it's not 0 but 0x5a (-EINVAL + offsetof(map_lookup)). But it still is a illegal memory access), this kernel thread was killed but the lock hold by this thread are not released. After that, all fs operations was block by the lock.
If so, please provide the calltrace and btrfs-check output to pin down the cause.
It doesn't happen on my machine, I've notified the finder to send a backtrace to the mailing list
Just backporting the patch is only to make it a little more graceful, but won't solve the root problem.
No. In non debug build, ASSERT() is noop so it lead that there is no any error handle code. There is no RAII/SBRM, unexpected exit will lead the resources will not be released correctly. If it has unique holder (e.g. mutex), it will break all other code that need this resource!
Thanks, Qu