Recently we found a bug related to ext4 buffer heads that was fixed upstream by commit 0b73284c564d ("ext4: ext4_read_bh_lock() should submit IO if the buffer isn't uptodate") [1].
The fix has been applied to some longterm kernels, such as 5.10 and 5.15. However, on the 5.4 stable branch we can still easily reproduce the bug by adding a delay after buffer_migrate_lock_buffers() in __buffer_migrate_page() and running fsstress on an ext4 filesystem. dmesg then shows errors like:
EXT4-fs error (device pmem1): __ext4_find_entry:1658: inode #73193: comm fsstress: reading directory lblock 0
EXT4-fs error (device pmem1): __ext4_find_entry:1658: inode #75334: comm fsstress: reading directory lblock 0
I currently have three ideas for fixing this bug in 5.4, but I am not sure which one is best, or whether there is another feasible way to fix it cleanly on the 5.4 stable branch.
The first idea comes from this thread [2]. In __buffer_migrate_page(), pages that are not uptodate can fall back to being migrated the way fallback_migrate_page() does, because a page that has buffers but is not uptodate is likely to be read soon. As [3] points out, this solution is not complete, because there are other places that lock a buffer without doing IO. Still, I think it is a candidate if we want to keep the change small. In my testing, the ext4 filesystem remained stable after a one-week stress test with this patch applied.
The second idea is to backport a series of commits from upstream, such as:
2d069c0889ef ("ext4: use common helpers in all places reading metadata buffers")
0b73284c564d ("ext4: ext4_read_bh_lock() should submit IO if the buffer isn't uptodate")
79f597842069 ("fs/buffer: remove ll_rw_block() helper")
This involves a large amount of code change and must be done carefully, but it looks like the most reasonable solution so far.
The third idea is to replace trylock_buffer() in ll_rw_block() with lock_buffer(), and to make __breadahead_gfp() use trylock_buffer() instead of calling ll_rw_block(). However, this changes the semantics of ll_rw_block() and is not suitable for some readahead cases. Besides, ll_rw_block() is used by many filesystems other than ext4, so I think it is better to limit the fix to ext4 without affecting other filesystems.
Below is a patch based on the first idea. I hope someone can suggest other ways to fix this bug in 5.4. Thanks.
[1] https://lore.kernel.org/linux-mm/20220825080146.2021641-1-chengzhihao1@huawe...
[2] https://lore.kernel.org/all/20220831074629.3755110-1-yi.zhang@huawei.com/T/
[3] https://lore.kernel.org/linux-mm/20220825105704.e46hz6dp6opawsjk@quack3/
Yue Zhao (1):
  mm: migrate: buffer_migrate_page_norefs() fallback migrate not uptodate pages

 mm/migrate.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)