Hi,
Syzbot has reported hitting this bug on the 6.1.18 and 5.15.101 LTS kernels and has provided a reproducer as well.
BUG_ON(ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA));
I've copied the same config and reproduced the bug on 6.1.18, 6.1.44 and next-20230809.
This part of the code hasn't changed since it was introduced in 4e7ea81db53465 ("ext4: restructure writeback path"). I'm not sure why the inline data is being destroyed before it is copied somewhere else.
Please consider this a report.
Regards, Muhammad Usama Anjum
On 3/13/23 11:34 AM, syzbot wrote:
syzbot has found a reproducer for the following issue on:
HEAD commit:    1cc3fcf63192 Linux 6.1.18
git tree:       linux-6.1.y
console output: https://syzkaller.appspot.com/x/log.txt?x=10d4b342c80000
kernel config:  https://syzkaller.appspot.com/x/.config?x=157296d36f92ea19
^ Kernel config
dashboard link: https://syzkaller.appspot.com/bug?extid=a8068dd81edde0186829
compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13512ec6c80000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15ca0ff4c80000
^ reproducers. C reproducer reproduces the bug easily.
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/0e4c0d43698b/disk-1cc3fcf6.raw....
vmlinux: https://storage.googleapis.com/syzbot-assets/a4de39d735de/vmlinux-1cc3fcf6.x...
kernel image: https://storage.googleapis.com/syzbot-assets/82bab928f6e3/Image-1cc3fcf6.gz....
mounted in repro: https://storage.googleapis.com/syzbot-assets/bf2e21b96210/mount_0.gz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+a8068dd81edde0186829@syzkaller.appspotmail.com
------------[ cut here ]------------
kernel BUG at fs/ext4/inode.c:2746!
Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 11 Comm: kworker/u4:1 Not tainted 6.1.18-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
Workqueue: writeback wb_workfn (flush-7:0)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : ext4_writepages+0x35f4/0x35f8 fs/ext4/inode.c:2745
lr : ext4_writepages+0x35f4/0x35f8 fs/ext4/inode.c:2745
sp : ffff800019d16d40
x29: ffff800019d17120 x28: ffff800008e691e4 x27: dfff800000000000
x26: ffff0000de1f3ee0 x25: ffff800019d17590 x24: ffff800019d17020
x23: ffff0000dd616000 x22: ffff800019d16f40 x21: ffff0000de1f4108
x20: 0000008410000000 x19: 0000000000000001 x18: ffff800019d16a20
x17: ffff80001572d000 x16: ffff8000083099b4 x15: 000000000000ba31
x14: 00000000ffffffff x13: dfff800000000000 x12: 0000000000000001
x11: ff80800008e6c7d8 x10: 0000000000000000 x9 : ffff800008e6c7d8
x8 : ffff0000c099b680 x7 : 0000000000000000 x6 : 0000000000000000
x5 : 0000000000000080 x4 : 0000000000000000 x3 : 0000000000000001
x2 : 0000000000000000 x1 : 0000008000000000 x0 : 0000000000000000
Call trace:
 ext4_writepages+0x35f4/0x35f8 fs/ext4/inode.c:2745
 do_writepages+0x2e8/0x56c mm/page-writeback.c:2469
 __writeback_single_inode+0x228/0x1ec8 fs/fs-writeback.c:1587
 writeback_sb_inodes+0x9c0/0x1844 fs/fs-writeback.c:1878
 wb_writeback+0x4f8/0x1580 fs/fs-writeback.c:2052
 wb_do_writeback fs/fs-writeback.c:2195 [inline]
 wb_workfn+0x460/0x11b8 fs/fs-writeback.c:2235
 process_one_work+0x868/0x16f4 kernel/workqueue.c:2289
 worker_thread+0x8e4/0xfec kernel/workqueue.c:2436
 kthread+0x24c/0x2d4 kernel/kthread.c:376
 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
Code: d4210000 97da5cfa d4210000 97da5cf8 (d4210000)
---[ end trace 0000000000000000 ]---
Hello!
On 2023/8/10 18:49, Muhammad Usama Anjum wrote:
Hi,
Syzbot has reported hitting this bug on the 6.1.18 and 5.15.101 LTS kernels and has provided a reproducer as well.
BUG_ON(ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA));
I've copied the same config and reproduced the bug on 6.1.18, 6.1.44 and next-20230809.
This part of the code hasn't changed since it was introduced in 4e7ea81db53465 ("ext4: restructure writeback path"). I'm not sure why the inline data is being destroyed before it is copied somewhere else.
Please consider this a report.
Regards, Muhammad Usama Anjum
We've already noticed this problem, which is caused by the fact that
ext4_convert_inline_data() in ext4_page_mkwrite() is not protected by
an inode_lock, so it can modify the state of the inode while someone
else is holding the lock.
Unfortunately we don't have a good solution for this at the moment,
as adding inode_lock here could easily form an ABBA deadlock with
mmap_lock. For a more detailed discussion see:
https://lkml.org/lkml/2023/5/30/894
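
The ordering problem described above can be modeled in user space. The sketch below is not kernel code: it is a small Python model, loosely in the spirit of lockdep, that records which lock is taken while another is held and flags a reversed pair. The two paths are assumptions made for illustration: a buffered write that takes inode_lock and then needs mmap_lock when it faults on the user buffer, and a hypothetical page_mkwrite variant that takes inode_lock while the fault handler already holds mmap_lock. LockOrderChecker and run_path are hypothetical names.

```python
from collections import defaultdict


class LockOrderChecker:
    """Records 'held -> acquired' edges and reports a reversed pair (ABBA)."""

    def __init__(self):
        # edges[a] is the set of locks that were acquired while a was held
        self.edges = defaultdict(set)
        self.inversions = []

    def run_path(self, name, locks):
        """Simulate one code path acquiring locks in the given order."""
        held = []
        for lock in locks:
            for h in held:
                # An earlier path took `h` while holding `lock`: order reversed.
                if h in self.edges[lock]:
                    self.inversions.append((name, h, lock))
                self.edges[h].add(lock)
            held.append(lock)


checker = LockOrderChecker()
# Assumed path 1: buffered write holds inode_lock, then faults on the user buffer.
checker.run_path("buffered write", ["inode_lock", "mmap_lock"])
# Hypothetical path 2: fault handler holds mmap_lock, page_mkwrite adds inode_lock.
checker.run_path("page_mkwrite with inode_lock", ["mmap_lock", "inode_lock"])

for path, first, second in checker.inversions:
    print(f"{path}: takes {second} while holding {first}, reversing an earlier order")
```

With only the first path recorded, the checker reports nothing; the inversion appears only once the second path adds inode_lock under mmap_lock, which is the ABBA pattern mentioned above.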
On 8/10/23 4:30 PM, Baokun Li wrote:
We've already noticed this problem, which is caused by the fact that
ext4_convert_inline_data() in ext4_page_mkwrite() is not protected by
an inode_lock, so it can modify the state of the inode while someone
else is holding the lock.
Unfortunately we don't have a good solution for this at the moment,
as adding inode_lock here could easily form an ABBA deadlock with
mmap_lock. For a more detailed discussion see:
Thank you so much for the reply, the explanation, and the reference.
The last refactoring of this code was done by 4e7ea81db53465 in 2013, and the code segment in question predates even that, which means the bug has been present for several years. 4.14 is the oldest kernel being maintained today, so it affects all current LTS and mainline kernels. I'll report 4e7ea81db53465 to regzbot for proper tracking, so the bug report will probably get associated with all LTS kernels as well.
#regzbot title: Race condition between buffer write and page_mkwrite
#regzbot introduced: 4e7ea81db53465
#regzbot monitor: https://lore.kernel.org/all/20230530134405.322194-1-libaokun1@huawei.com
On 8/14/23 10:31 AM, Muhammad Usama Anjum wrote:
#regzbot title: ext4: Race condition between buffer write and page_mkwrite
#regzbot introduced: 4e7ea81db53465
#regzbot monitor: https://lore.kernel.org/all/20230530134405.322194-1-libaokun1@huawei.com
On Mon, Aug 14, 2023 at 10:35:57AM +0500, Muhammad Usama Anjum wrote:
The last refactoring of this code was done by 4e7ea81db53465 in 2013, and the code segment in question predates even that, which means the bug has been present for several years. 4.14 is the oldest kernel being maintained today, so it affects all current LTS and mainline kernels. I'll report 4e7ea81db53465 to regzbot for proper tracking, so the bug report will probably get associated with all LTS kernels as well.
#regzbot title: Race condition between buffer write and page_mkwrite
#regzbot title: ext4: Race condition between buffer write and page_mkwrite
If it's a long-standing bug, then it's really not something I consider a regression. That being said, you're assuming that the refactoring is what has introduced the bug; that's not necessarily the case.
*Especially* if it requires a maliciously fuzzed file system, since you have to be root to mount a file system. That's the other thing; the different reports at the console have different reproducers, and at least one of them has a very badly corrupted file system --- and since you need to have root to mount a maliciously fuzzed file system, these are treated with a much lower priority as far as I'm concerned.
(If you think it should be higher priority, and your company is willing to fund such work, patches are gratefully appreciated. :-)
I tried to reproduce this using one of the reproducers on a modern kernel, and it doesn't reproduce there. That being said, it's not entirely clear what the reproducer is doing, since (a) passing -1 as the in_fd and out_fd to sendfile *should* just cause sendfile to return an EBADF error, and (b) when I ran it, it just segfaulted on an mmap() before it executed anything interesting.
Please let me know (a) if you can replicate this on the latest upstream kernel, and (b) if the reproducer doesn't require a maliciously fuzzed kernel, or where the reproducer is scribbling on the file system image while it is mounted.
Cheers,
- Ted
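
Point (a) above can be checked from user space. This is a minimal sketch, assuming Linux and Python's os.sendfile wrapper around sendfile(2); sendfile_with_bad_fds is a hypothetical helper name, and the count of 4096 is arbitrary. It says nothing about what the syzbot reproducer does beyond this one call.

```python
import errno
import os


def sendfile_with_bad_fds():
    """Call sendfile() with -1 for both descriptors and return the errno name."""
    try:
        # Argument order on Linux: out_fd, in_fd, offset, count
        os.sendfile(-1, -1, 0, 4096)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return "no error"


print(sendfile_with_bad_fds())
```

On a Linux system this should report EBADF, matching the expectation that the call fails before touching any file system.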
Thank you for looking at the email.
On 8/15/23 3:05 AM, Theodore Ts'o wrote:
If it's a long-standing bug, then it's really not something I consider a regression. That being said, you're assuming that the refactoring is what has introduced the bug; that's not necessarily the case.
The bug was introduced by the following patch: 9c3569b50f12 ("ext4: add delalloc support for inline data")
https://lore.kernel.org/all/1351047338-4963-7-git-send-email-tm@tao.ma/
The bug is in the inline data feature patches themselves.
Should I remove this from regzbot, marking it as a long-standing bug rather than a regression?
*Especially* if it requires a maliciously fuzzed file system, since you have to be root to mount a file system. That's the other thing; the different reports at the console have different reproducers, and at least one of them has a very badly corrupted file system --- and since you need to have root to mount a maliciously fuzzed file system, these are treated with a much lower priority as far as I'm concerned.
(If you think it should be higher priority, and your company is willing to fund such work, patches are gratefully appreciated. :-)
I tried to reproduce this using one of the reproducers on a modern kernel, and it doesn't reproduce there. That being said, it's not entirely clear what the reproducer is doing, since (a) passing -1 as the in_fd and out_fd to sendfile *should* just cause sendfile to return an EBADF error, and (b) when I ran it, it just segfaulted on an mmap() before it executed anything interesting.
Please let me know (a) if you can replicate this on the latest upstream kernel, and (b) if the reproducer doesn't require a maliciously fuzzed kernel, or where the reproducer is scribbling on the file system image while it is mounted.
I can replicate the bug on next-20230809 with the attached config and reproducer application. Root permissions are required to reproduce the bug, though.
On 15.08.23 18:31, Muhammad Usama Anjum wrote:
The bug was introduced by the following patch: 9c3569b50f12 ("ext4: add delalloc support for inline data")
Which was v3.8-rc1 afaics.
https://lore.kernel.org/all/1351047338-4963-7-git-send-email-tm@tao.ma/
The bug is in the inline data feature patches themselves.
Should I remove this from regzbot, marking it as a long-standing bug rather than a regression?
Let me do that:
#regzbot inconclusive: regression from the 3.8 days, tracking doesn't really gain us anything
To explain: not sure how Linus sees it, but if the culprit was merged that long ago there is not much value in tracking it, as there is no easy way to fix it with a revert or something anyway. Sure, the issue nevertheless should not remain unfixed, but let's trust Ted here that he'll sooner or later take care of it when he sees fit.
Ciao, Thorsten
linux-stable-mirror@lists.linaro.org