There is a long story behind this one. First of all, we (the LXD project folks) started to see issues in our tests on GitHub Actions Ubuntu 22.04 runners after they were updated from a 6.5-based to a 6.8-based kernel, and this was reported on the Launchpad tracker [1] by Wesley Hershberger.
At the same time, Stéphane Graber from the LXC containers project saw the same problem in the Incus test suite (also on GitHub Actions).
Then we had a debugging session together with Stéphane, and he found a fairly minimal reproducer:

    curl https://pkgs.zabbly.com/get/incus-daily | sudo sh
    sudo apt-get install lvm2 --yes
    sudo incus storage create default lvm volume.size=25MiB size=1GiB
    sudo incus image copy images:alpine/edge local: --alias testimage
    sudo incus profile device add default root disk pool=default path=/ size=3GiB
    sudo incus create testimage a1

This produces the following output in the kernel logs:
[ 33.882936] EXT4-fs (dm-5): mounted filesystem 8aaf41b2-6ac0-4fa8-b92b-77d10e1d16ca r/w with ordered data mode. Quota mode: none.
[ 33.888365] EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
[ 33.888740] ------------[ cut here ]------------
[ 33.888742] kernel BUG at fs/ext4/resize.c:324!
[ 33.889075] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 33.889503] CPU: 9 UID: 0 PID: 3576 Comm: resize2fs Not tainted 6.11.0+ #27
[ 33.890039] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 33.890705] RIP: 0010:ext4_resize_fs+0x1212/0x12d0
[ 33.891063] Code: b8 45 31 c0 4c 89 ff 45 31 c9 31 c9 ba 0e 08 00 00 48 c7 c6 68 75 65 b8 e8 2b 79 01 00 41 b8 ea ff ff ff 41 5f e9 8d f1 ff ff <0f> 0b 48 83 bd 70 ff ff ff 00 75 32 45 31 c0 e9 53 f1 ff ff 41 b8
[ 33.892701] RSP: 0018:ffffa97f413f3cc8 EFLAGS: 00010202
[ 33.893081] RAX: 0000000000000018 RBX: 0000000000000001 RCX: 00000000fffffff0
[ 33.893639] RDX: 0000000000000017 RSI: 0000000000000016 RDI: 00000000e8c2c810
[ 33.894197] RBP: ffffa97f413f3d90 R08: 0000000000000000 R09: 0000000000008000
[ 33.894755] R10: ffffa97f413f3cc8 R11: ffffa2c1845bfc80 R12: 0000000000000000
[ 33.895317] R13: ffffa2c1843d6000 R14: 0000000000008000 R15: ffffa2c199963000
[ 33.895877] FS: 00007f46efd17000(0000) GS:ffffa2c89fc40000(0000) knlGS:0000000000000000
[ 33.896524] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 33.896954] CR2: 00005630a4a1cc88 CR3: 000000010532c000 CR4: 0000000000350eb0
[ 33.897516] Call Trace:
[ 33.897638] <TASK>
[ 33.897728] ? show_regs+0x6d/0x80
[ 33.897942] ? die+0x3c/0xa0
[ 33.898106] ? do_trap+0xe5/0x110
[ 33.898311] ? do_error_trap+0x6e/0x90
[ 33.898555] ? ext4_resize_fs+0x1212/0x12d0
[ 33.898844] ? exc_invalid_op+0x57/0x80
[ 33.899101] ? ext4_resize_fs+0x1212/0x12d0
[ 33.899387] ? asm_exc_invalid_op+0x1f/0x30
[ 33.899675] ? ext4_resize_fs+0x1212/0x12d0
[ 33.899961] ? ext4_resize_fs+0x745/0x12d0
[ 33.900239] __ext4_ioctl+0x4e0/0x1800
[ 33.900489] ? srso_alias_return_thunk+0x5/0xfbef5
[ 33.900832] ? putname+0x5b/0x70
[ 33.901028] ? srso_alias_return_thunk+0x5/0xfbef5
[ 33.901374] ? do_sys_openat2+0x87/0xd0
[ 33.901632] ? srso_alias_return_thunk+0x5/0xfbef5
[ 33.901981] ? srso_alias_return_thunk+0x5/0xfbef5
[ 33.902324] ? __x64_sys_openat+0x59/0xa0
[ 33.902595] ext4_ioctl+0x12/0x20
[ 33.902802] ? ext4_ioctl+0x12/0x20
[ 33.903031] __x64_sys_ioctl+0x99/0xd0
[ 33.903277] x64_sys_call+0x1206/0x20d0
[ 33.903534] do_syscall_64+0x72/0x110
[ 33.903771] ? srso_alias_return_thunk+0x5/0xfbef5
[ 33.904115] ? irqentry_exit+0x3f/0x50
[ 33.904362] ? srso_alias_return_thunk+0x5/0xfbef5
[ 33.904707] ? exc_page_fault+0x1aa/0x7b0
[ 33.904979] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 33.905349] RIP: 0033:0x7f46efe3294f
[ 33.905579] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[ 33.907321] RSP: 002b:00007ffe9b8833a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 33.907926] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f46efe3294f
[ 33.908487] RDX: 00007ffe9b8834a0 RSI: 0000000040086610 RDI: 0000000000000004
[ 33.909046] RBP: 00005630a4a0b0e0 R08: 0000000000000000 R09: 00007ffe9b8832d7
[ 33.909605] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
[ 33.910165] R13: 00005630a4a0c580 R14: 00005630a4a10400 R15: 0000000000000000
[ 33.910740] </TASK>
[ 33.910837] Modules linked in:
[ 33.911049] ---[ end trace 0000000000000000 ]---
[ 33.911428] RIP: 0010:ext4_resize_fs+0x1212/0x12d0
[ 33.911810] Code: b8 45 31 c0 4c 89 ff 45 31 c9 31 c9 ba 0e 08 00 00 48 c7 c6 68 75 65 b8 e8 2b 79 01 00 41 b8 ea ff ff ff 41 5f e9 8d f1 ff ff <0f> 0b 48 83 bd 70 ff ff ff 00 75 32 45 31 c0 e9 53 f1 ff ff 41 b8
[ 33.913928] RSP: 0018:ffffa97f413f3cc8 EFLAGS: 00010202
[ 33.914313] RAX: 0000000000000018 RBX: 0000000000000001 RCX: 00000000fffffff0
[ 33.914909] RDX: 0000000000000017 RSI: 0000000000000016 RDI: 00000000e8c2c810
[ 33.915482] RBP: ffffa97f413f3d90 R08: 0000000000000000 R09: 0000000000008000
[ 33.916258] R10: ffffa97f413f3cc8 R11: ffffa2c1845bfc80 R12: 0000000000000000
[ 33.917027] R13: ffffa2c1843d6000 R14: 0000000000008000 R15: ffffa2c199963000
[ 33.917884] FS: 00007f46efd17000(0000) GS:ffffa2c89fc40000(0000) knlGS:0000000000000000
[ 33.918818] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 33.919322] CR2: 00005630a4a1cc88 CR3: 000000010532c000 CR4: 0000000000350eb0
[ 44.072293] ------------[ cut here ]------------
This patch is an attempt to fix it, and I can confirm that after applying it everything works fine. At the same time, I'm not knowledgeable about the ext4 filesystem code, so a careful review is required here.
Kind regards, Alex
Cc: stable@vger.kernel.org # v6.8+
Fixes: 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()")
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2081231 [1]
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Baokun Li <libaokun1@huawei.com>
Cc: Stéphane Graber <stgraber@stgraber.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-ext4@vger.kernel.org
Reported-by: Wesley Hershberger <wesley.hershberger@canonical.com>
Reported-by: Stéphane Graber <stgraber@stgraber.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

Alexander Mikhalitsyn (1):
  ext4: fix BUG at fs/ext4/resize.c

 fs/ext4/resize.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)
Cc: stable@vger.kernel.org # v6.8+
Fixes: 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()")
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Baokun Li <libaokun1@huawei.com>
Cc: Stéphane Graber <stgraber@stgraber.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-ext4@vger.kernel.org
Reported-by: Wesley Hershberger <wesley.hershberger@canonical.com>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2081231
Reported-by: Stéphane Graber <stgraber@stgraber.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
 fs/ext4/resize.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index e04eb08b9060..c057a7867363 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -300,8 +300,7 @@ static void free_flex_gd(struct ext4_new_flex_group_data *flex_gd)
  * block group.
  */
 static int ext4_alloc_group_tables(struct super_block *sb,
-				struct ext4_new_flex_group_data *flex_gd,
-				unsigned int flexbg_size)
+				struct ext4_new_flex_group_data *flex_gd)
 {
 	struct ext4_new_group_data *group_data = flex_gd->groups;
 	ext4_fsblk_t start_blk;
@@ -313,7 +312,7 @@ static int ext4_alloc_group_tables(struct super_block *sb,
 	ext4_group_t group;
 	ext4_group_t last_group;
 	unsigned overhead;
-	__u16 uninit_mask = (flexbg_size > 1) ? ~EXT4_BG_BLOCK_UNINIT : ~0;
+	__u16 uninit_mask = (flex_gd->resize_bg > 1) ? ~EXT4_BG_BLOCK_UNINIT : ~0;
 	int i;
 
 	BUG_ON(flex_gd->count == 0 || group_data == NULL);
@@ -321,8 +320,8 @@ static int ext4_alloc_group_tables(struct super_block *sb,
 	src_group = group_data[0].group;
 	last_group = src_group + flex_gd->count - 1;
 
-	BUG_ON((flexbg_size > 1) && ((src_group & ~(flexbg_size - 1)) !=
-	       (last_group & ~(flexbg_size - 1))));
+	BUG_ON((flex_gd->resize_bg > 1) && ((src_group & ~(flex_gd->resize_bg - 1)) !=
+	       (last_group & ~(flex_gd->resize_bg - 1))));
 next_group:
 	group = group_data[0].group;
 	if (src_group >= group_data[0].group + flex_gd->count)
@@ -403,7 +402,7 @@ static int ext4_alloc_group_tables(struct super_block *sb,
 
 		printk(KERN_DEBUG "EXT4-fs: adding a flex group with "
 		       "%u groups, flexbg size is %u:\n", flex_gd->count,
-		       flexbg_size);
+		       flex_gd->resize_bg);
 
 		for (i = 0; i < flex_gd->count; i++) {
 			ext4_debug(
@@ -2158,7 +2157,7 @@ int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count)
 				     ext4_blocks_count(es));
 			last_update_time = jiffies;
 		}
-		if (ext4_alloc_group_tables(sb, flex_gd, flexbg_size) != 0)
+		if (ext4_alloc_group_tables(sb, flex_gd) != 0)
 			break;
 		err = ext4_flex_group_add(sb, resize_inode, flex_gd);
 		if (unlikely(err))
On Wed 25-09-24 16:33:24, Alexander Mikhalitsyn wrote:
[ 33.882936] EXT4-fs (dm-5): mounted filesystem 8aaf41b2-6ac0-4fa8-b92b-77d10e1d16ca r/w with ordered data mode. Quota mode: none.
[ 33.888365] EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
[ 33.888740] ------------[ cut here ]------------
[ 33.888742] kernel BUG at fs/ext4/resize.c:324!
Ah, I was staring at this for a while before I understood what's going on (it would be great to explain this in the changelog BTW). As far as I understand commit 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()") can actually make flex_gd->resize_bg larger than flexbg_size (for example when ogroup = flexbg_size, ngroup = 2*flexbg_size - 1) which then confuses things. I think that was not really intended and instead of fixing up ext4_alloc_group_tables() we should really change the logic in alloc_flex_gd() to make sure flex_gd->resize_bg never exceeds flexbg size. Baokun?
Honza
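Jan's example (ogroup = flexbg_size, ngroup = 2*flexbg_size - 1) can be checked with a small userspace model. This is only a sketch under stated assumptions: the `max(fls(...), fls(...))` expression is the one quoted in the debugging diff in this thread, while the `last_group` computation and the first branch are assumptions about the surrounding alloc_flex_gd() logic that the thread does not show. Any upper clamp on resize_bg is left out, and `fls_u32` and `model_resize_bg` are hypothetical names.

```c
#include <assert.h>

/* Userspace stand-in for fls(): 1-based index of the highest set bit, 0 for x == 0. */
int fls_u32(unsigned int x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/*
 * Hypothetical model of how resize_bg is sized for a resize from
 * group o_group up to group n_group.
 * ASSUMPTION: last_group and the first branch are not shown in the thread;
 * only the max(fls(), fls()) expression is.
 */
unsigned int model_resize_bg(unsigned int flexbg_size,
			     unsigned int o_group, unsigned int n_group)
{
	unsigned int resize_bg = flexbg_size;
	unsigned int last_group;
	int a, b;

	/* flexbg_size is expected to be a power of two, and we only grow. */
	assert(flexbg_size != 0 && (flexbg_size & (flexbg_size - 1)) == 0);
	assert(n_group >= o_group);

	/* Last group of the flex group that o_group belongs to. */
	last_group = o_group | (resize_bg - 1);

	if (n_group <= last_group) {
		resize_bg = 1u << fls_u32(n_group - o_group + 1);
	} else if (n_group - last_group < resize_bg) {
		a = fls_u32(last_group - o_group + 1);
		b = fls_u32(n_group - last_group);
		resize_bg = 1u << (a > b ? a : b);
	}
	return resize_bg;
}
```

With flexbg_size = 16, o_group = 16 and n_group = 31, the model takes the first branch and computes 1 << fls(16) = 32, which is larger than flexbg_size, in line with what Jan describes.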
On Wed, Sep 25, 2024 at 5:57 PM Jan Kara jack@suse.cz wrote:
On Wed 25-09-24 16:33:24, Alexander Mikhalitsyn wrote:
[ 33.882936] EXT4-fs (dm-5): mounted filesystem 8aaf41b2-6ac0-4fa8-b92b-77d10e1d16ca r/w with ordered data mode. Quota mode: none.
[ 33.888365] EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
[ 33.888740] ------------[ cut here ]------------
[ 33.888742] kernel BUG at fs/ext4/resize.c:324!

Ah, I was staring at this for a while before I understood what's going on (it would be great to explain this in the changelog BTW). As far as I understand commit 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()") can actually make flex_gd->resize_bg larger than flexbg_size (for example when ogroup = flexbg_size, ngroup = 2*flexbg_size - 1) which then confuses things. I think that was not really intended and
Hi Jan,
First of all, thanks for your reaction/review on this one ;-)
You are absolutely right. I have just checked with our reproducer and this modification:
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index e04eb08b9060..530a918f0cab 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -258,6 +258,8 @@ static struct ext4_new_flex_group_data *alloc_flex_gd(unsigned int flexbg_size,
 		flex_gd->resize_bg = 1 << max(fls(last_group - o_group + 1),
 					      fls(n_group - last_group));
 
+	BUG_ON(flex_gd->resize_bg > flexbg_size);
+
 	flex_gd->groups = kmalloc_array(flex_gd->resize_bg,
 					sizeof(struct ext4_new_group_data),
 					GFP_NOFS);
and yes, it crashes on this BUG_ON. So it looks like, instead of making flex_gd->resize_bg smaller than flexbg_size in most cases, the change can actually have the opposite effect here. I guess we really need to fix alloc_flex_gd() too.
instead of fixing up ext4_alloc_group_tables() we should really change the logic in alloc_flex_gd() to make sure flex_gd->resize_bg never exceeds flexbg size. Baokun?
At the same time, if I understand the code right, since flex_gd->resize_bg can differ from flexbg_size after 5d1935ac02ca5a ("ext4: avoid online resizing failures due to oversized flex bg") and 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()"), we should always refer to the flex_gd->resize_bg value, which means that the ext4_alloc_group_tables() fix is needed too. Am I correct in my understanding?
Honza
Kind regards, Alex
-- Jan Kara jack@suse.com SUSE Labs, CR
On 2024/9/26 0:17, Aleksandr Mikhalitsyn wrote:
On Wed, Sep 25, 2024 at 5:57 PM Jan Kara jack@suse.cz wrote:
On Wed 25-09-24 16:33:24, Alexander Mikhalitsyn wrote:
[ 33.882936] EXT4-fs (dm-5): mounted filesystem 8aaf41b2-6ac0-4fa8-b92b-77d10e1d16ca r/w with ordered data mode. Quota mode: none. [ 33.888365] EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks [ 33.888740] ------------[ cut here ]------------ [ 33.888742] kernel BUG at fs/ext4/resize.c:324!
Ah, I was staring at this for a while before I understood what's going on (it would be great to explain this in the changelog BTW). As far as I understand commit 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()") can actually make flex_gd->resize_bg larger than flexbg_size (for example when ogroup = flexbg_size, ngroup = 2*flexbg_size
- which then confuses things. I think that was not really intended and
Hi Jan,
First of all, thanks for your reaction/review on this one ;-)
You are absolutely right, have just checked with our reproducer and this modification:
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index e04eb08b9060..530a918f0cab 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -258,6 +258,8 @@ static struct ext4_new_flex_group_data *alloc_flex_gd(unsigned int flexbg_size,
 		flex_gd->resize_bg = 1 << max(fls(last_group - o_group + 1),
 					      fls(n_group - last_group));
+	BUG_ON(flex_gd->resize_bg > flexbg_size);
 	flex_gd->groups = kmalloc_array(flex_gd->resize_bg,
 					sizeof(struct ext4_new_group_data),
 					GFP_NOFS);
and yes, it crashes on this BUG_ON. So it looks like, instead of making flex_gd->resize_bg smaller than flexbg_size in most cases, we can actually get the opposite effect here. I guess we really need to fix alloc_flex_gd() too.
instead of fixing up ext4_alloc_group_tables() we should really change the logic in alloc_flex_gd() to make sure flex_gd->resize_bg never exceeds flexbg size. Baokun?
At the same time, if I understand the code right, as we can have flex_gd->resize_bg != flexbg_size after 5d1935ac02ca5a ("ext4: avoid online resizing failures due to oversized flex bg") and 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()") we should always refer to flex_gd->resize_bg value which means that ext4_alloc_group_tables() fix is needed too. Am I correct in my understanding?
Hi Alex,
These two are not exactly equivalent.
The flex_gd->resize_bg is only used to determine how many block groups we allocate memory for, i.e., the maximum number of block groups per resize. The flexbg_size, on the other hand, is used for checks on flexible block groups; for example, the BUG_ON triggered in this issue makes sure that src_group and last_group are in the same flexible block group.
Regards, Baokun
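To make the distinction above concrete, the condition behind the BUG_ON can be sketched in userspace C. This is an illustration only, not kernel code: same_flex_group() is a hypothetical helper name, and flexbg_size is assumed to be a power of two, as the mask arithmetic in the quoted BUG_ON implies.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace sketch of the condition asserted by the BUG_ON in
 * ext4_alloc_group_tables(): src_group and last_group must fall in
 * the same flexible block group. same_flex_group() is a hypothetical
 * helper name; flexbg_size is assumed to be a power of two.
 */
static bool same_flex_group(unsigned int src_group, unsigned int last_group,
			    unsigned int flexbg_size)
{
	assert(flexbg_size != 0 && (flexbg_size & (flexbg_size - 1)) == 0);

	if (flexbg_size <= 1)
		return true; /* the BUG_ON is skipped when flexbg_size <= 1 */

	/* Clearing the low bits leaves the first group of the flex group. */
	return (src_group & ~(flexbg_size - 1)) ==
	       (last_group & ~(flexbg_size - 1));
}
```

With flexbg_size 16, groups 33 and 47 pass the check, while groups 33 and 62 do not, because they belong to different flexible block groups.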
On Thu, Sep 26, 2024 at 10:50 AM Baokun Li libaokun1@huawei.com wrote:
Hi Baokun,
Huge thanks for explaining this!
Then I guess it's better if you send a patch with your fix. Feel free to add my Tested-by tag.
Question to you and Jan. Do you guys think that it makes sense to try to create a minimal reproducer for this problem without Incus/LXD involved? (only e2fsprogs, lvm tools, etc)
I guess this test can be put in the xfstests test suite, right?
Kind regards, Alex
On 2024/9/26 17:23, Aleksandr Mikhalitsyn wrote:
Then I guess it's better if you send a patch with your fix. Feel free to add my Tested-by tag.
Okay, I'll send a patch later.
Question to you and Jan. Do you guys think that it makes sense to try to create a minimal reproducer for this problem without Incus/LXD involved? (only e2fsprogs, lvm tools, etc)
I guess this test can be put in the xfstests test suite, right?
Kind regards, Alex
I think it makes sense, and it's good to have more test cases that look into such corners. If you have an idea, go for it.
Regards, Baokun
On Thu, Sep 26, 2024 at 11:40 AM Baokun Li libaokun1@huawei.com wrote:
I think it makes sense, and it's good to have more test cases that look into such corners. If you have an idea, go for it.
Minimal reproducer:
mkdir -p /tmp/ext4_crash/mnt
EXT4_CRASH_IMG="/tmp/ext4_crash/disk.img"
rm -f $EXT4_CRASH_IMG
truncate $EXT4_CRASH_IMG --size 25MiB
EXT4_CRASH_DEV=$(losetup --find --nooverlap --direct-io=on --show $EXT4_CRASH_IMG)
mkfs.ext4 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 $EXT4_CRASH_DEV
mount $EXT4_CRASH_DEV /tmp/ext4_crash/mnt
truncate $EXT4_CRASH_IMG --size 3GiB
losetup -c $EXT4_CRASH_DEV
resize2fs $EXT4_CRASH_DEV
On 2024/9/26 19:32, Aleksandr Mikhalitsyn wrote:
Hi Alex,
This reproducer didn't reproduce the issue in my VM, so I took a deeper look. Reproducing the problem requires the following:

o_group = flexbg_size * 2 * n;
o_size = (o_group + 1) * group_size;
n_group: [o_group + flexbg_size, o_group + flexbg_size * 2)

Take n=1,flexbg_size=16 as an example:

                                            last:47
|----------------|----------------|o---------------|--------------n-|
                                   old:32 >>>                  new:62

Thus the reproducer can be simplified as:

img=test.img
truncate -s 600M $img
mkfs.ext4 -F $img -b 1024 -G 16 264M
dev=`losetup -f --show $img`
mkdir -p /tmp/test
mount $dev /tmp/test
resize2fs $dev 504M
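The reproduction conditions listed above can be written down as a small predicate. This is a userspace sketch under stated assumptions: in_repro_window() is a hypothetical name, o_group is taken to be the last block group before the resize and n_group the last one after it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the reproduction window described in the message above.
 * in_repro_window() is a hypothetical name, not a kernel function.
 */
static bool in_repro_window(unsigned int o_group, unsigned int n_group,
			    unsigned int flexbg_size)
{
	assert(flexbg_size > 1);

	/* o_group = flexbg_size * 2 * n for some integer n */
	if (o_group % (2 * flexbg_size) != 0)
		return false;

	/* n_group in [o_group + flexbg_size, o_group + flexbg_size * 2) */
	return n_group >= o_group + flexbg_size &&
	       n_group < o_group + 2 * flexbg_size;
}
```

For the example in the message (flexbg_size 16, old group 32, new group 62) the predicate holds.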
On Thu, Sep 26, 2024 at 3:58 PM Baokun Li libaokun1@huawei.com wrote:
That's weird. Have just tried once again and it reproduces the issue:
root@ubuntu:/home/ubuntu# mkdir -p /tmp/ext4_crash/mnt
EXT4_CRASH_IMG="/tmp/ext4_crash/disk.img"
rm -f $EXT4_CRASH_IMG
truncate $EXT4_CRASH_IMG --size 25MiB
EXT4_CRASH_DEV=$(losetup --find --nooverlap --direct-io=on --show $EXT4_CRASH_IMG)
mkfs.ext4 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 $EXT4_CRASH_DEV
mount $EXT4_CRASH_DEV /tmp/ext4_crash/mnt
truncate $EXT4_CRASH_IMG --size 3GiB
losetup -c $EXT4_CRASH_DEV
resize2fs $EXT4_CRASH_DEV
mke2fs 1.47.0 (5-Feb-2023)
Creating filesystem with 6400 4k blocks and 6400 inodes

Allocating group tables: done
Writing inode tables: done
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done

resize2fs 1.47.0 (5-Feb-2023)
Filesystem at /dev/loop4 is mounted on /tmp/ext4_crash/mnt; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 1
Segmentation fault
My kernel's commit hash is 684a64bf32b6e488004e0ad7f0d7e922798f65b6
Maybe it somehow depends on the resize2fs version?
Kind regards, Alex
On 2024/9/26 22:19, Aleksandr Mikhalitsyn wrote:
I can see why: on my side, mkfs on a 25M disk uses an ext4 block size of 1024 by default, whereas on your side it's 4096. After setting the block size to 4096, the issue reproduced for me as well.
Thanks for your feedback!
Cheers, Baokun
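Why the block size matters can be seen from a little group arithmetic. The sketch below is a rough userspace approximation, not how the kernel computes it: last_group_index() is a hypothetical name, it assumes the default ext4 geometry of 8 * block_size blocks per group (one bitmap block per group), and it ignores s_first_data_block and the handling of very small trailing groups.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough sketch: index of the last block group for a filesystem of
 * size_bytes, assuming 8 * block_size blocks per group. This is an
 * approximation for illustration; last_group_index() is a
 * hypothetical name.
 */
static uint64_t last_group_index(uint64_t size_bytes, uint32_t block_size)
{
	assert(block_size != 0 && size_bytes >= block_size);

	uint64_t blocks = size_bytes / block_size;
	uint64_t blocks_per_group = 8ULL * block_size;

	/* The group holding the final block is the last group. */
	return (blocks - 1) / blocks_per_group;
}
```

Under these assumptions, growing a 25 MiB image to 3 GiB goes from group 0 to group 23 with 4096-byte blocks, but from group 3 to group 383 with 1024-byte blocks. That is consistent with only the 4096-byte case landing in the narrow window described earlier in the thread.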
On 2024/9/25 23:57, Jan Kara wrote:
Hi Honza,
Your analysis is absolutely correct. It's a bug! Thank you for locating this issue! An extra 1 should not be added when calculating resize_bg in alloc_flex_gd().
Hi Aleksandr,
Could you help test if the following changes work?
Thanks, Baokun
---
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index e04eb08b9060..1f01a7632149 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -253,10 +253,12 @@ static struct ext4_new_flex_group_data *alloc_flex_gd(unsigned int flexbg_size,
 	/* Avoid allocating large 'groups' array if not needed */
 	last_group = o_group | (flex_gd->resize_bg - 1);
 	if (n_group <= last_group)
-		flex_gd->resize_bg = 1 << fls(n_group - o_group + 1);
+		flex_gd->resize_bg = 1 << fls(n_group - o_group);
 	else if (n_group - last_group < flex_gd->resize_bg)
-		flex_gd->resize_bg = 1 << max(fls(last_group - o_group + 1),
+		flex_gd->resize_bg = 1 << max(fls(last_group - o_group),
 					      fls(n_group - last_group));
 	flex_gd->groups = kmalloc_array(flex_gd->resize_bg,
 					sizeof(struct ext4_new_group_data),
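The effect of dropping the extra 1 can be checked outside the kernel. The following is a userspace model, not the kernel code itself: fls_u32(), max_u() and model_resize_bg() are illustrative names, the `extra` parameter selects the old (+1) or patched (+0) arithmetic, flexbg_size is assumed to be a power of two, and any upper-bound clamp on resize_bg is assumed not to apply.

```c
#include <assert.h>

/*
 * Portable stand-in for the kernel's fls(): 1-based index of the
 * most significant set bit, 0 for x == 0.
 */
static unsigned int fls_u32(unsigned int x)
{
	unsigned int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * Userspace model of the resize_bg computation in alloc_flex_gd().
 * extra is 1 for the arithmetic before the patch and 0 after it.
 * model_resize_bg() is a hypothetical name.
 */
static unsigned int model_resize_bg(unsigned int flexbg_size,
				    unsigned int o_group,
				    unsigned int n_group,
				    unsigned int extra)
{
	unsigned int resize_bg = flexbg_size;
	unsigned int last_group;

	assert(flexbg_size != 0 && (flexbg_size & (flexbg_size - 1)) == 0);
	assert(n_group >= o_group);

	last_group = o_group | (resize_bg - 1);
	if (n_group <= last_group)
		resize_bg = 1U << fls_u32(n_group - o_group + extra);
	else if (n_group - last_group < resize_bg)
		resize_bg = 1U << max_u(fls_u32(last_group - o_group + extra),
					fls_u32(n_group - last_group));
	return resize_bg;
}
```

For flexbg_size 16, o_group 32 and n_group 62, the old arithmetic yields 32, which exceeds flexbg_size, while the patched arithmetic yields 16.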
On Thu, Sep 26, 2024 at 10:29 AM Baokun Li libaokun1@huawei.com wrote:
Hi Baokun,
Could you help test if the following changes work?
I can confirm that this patch helps.
Tested-by: Alexander Mikhalitsyn aleksandr.mikhalitsyn@canonical.com
Kind regards, Alex
On 2024/9/26 17:16, Aleksandr Mikhalitsyn wrote:
I can confirm that this patch helps.
Tested-by: Alexander Mikhalitsyn aleksandr.mikhalitsyn@canonical.com
Kind regards, Alex
Thank you for the test!
Cheers, Baokun
On 9/26/24 3:28 AM, Baokun Li wrote:
Could you help test if the following changes work?
I just got an internal bug report for this as well, and I can also confirm that the patch fixes my testcase, thanks! Feel free to add:
Tested-by: Eric Sandeen sandeen@redhat.com
I had been trying to debug this and it felt like an off by one but I didn't get to a solution in time. ;)
Can you explain what the 2 cases under
/* Avoid allocating large 'groups' array if not needed */
are doing? I *think* the first 'if' is trying not to over-allocate for the last batch of block groups that get added during a resize. What is the "else if" case doing?
Thanks, -Eric
On 9/26/24 11:04 AM, Eric Sandeen wrote:
Can you explain what the 2 cases under
/* Avoid allocating large 'groups' array if not needed */
are doing? I *think* the first 'if' is trying not to over-allocate for the last batch of block groups that get added during a resize. What is the "else if" case doing?
(or maybe I had that backwards)
Incidentally, the offending commit that this fixes (665d3e0af4d35ac) got turned into CVE-2023-52622, so it's quite likely that distros have backported the broken commit as part of the CVE game.
So the followup fix looks a bit urgent to me.
-Eric
On 2024/9/27 0:29, Eric Sandeen wrote:
Incidentally, the offending commit that this fixes (665d3e0af4d35ac) got turned into CVE-2023-52622, so it's quite likely that distros have backported the broken commit as part of the CVE game.
The commit to fix CVE-2023-52622 is commit 5d1935ac02ca5a ("ext4: avoid online resizing failures due to oversized flex bg"). This commit does not address the off by one issue above.
So the followup fix looks a bit urgent to me.
-Eric
Okay, I'll send out the fix patch today.
Regards, Baokun
On 9/26/24 8:51 PM, Baokun Li wrote:
The commit to fix CVE-2023-52622 is commit 5d1935ac02ca5a ("ext4: avoid online resizing failures due to oversized flex bg").
I'm sorry - you're right. 665d3e0af4d35ac was part of the original series that included 5d1935ac02ca5a, but it was not the fix.
This commit does not address the off by one issue above.
Agreed.
So the followup fix looks a bit urgent to me.
-Eric
Okay, I'll send out the fix patch today.
thanks :)
-Eric
On 2024/9/27 0:04, Eric Sandeen wrote:
Can you explain what the 2 cases under
/* Avoid allocating large 'groups' array if not needed */
are doing? I *think* the first 'if' is trying not to over-allocate for the last batch of block groups that get added during a resize. What is the "else if" case doing?
Thanks, -Eric
The ext4 online resize started out by allocating the memory needed for an entire flexible block group regardless of the number of block groups to be added.
This triggered a warning in the mm subsystem when userspace set a very large flexbg_size (perhaps to put all the group metadata at the head of the disk), because the allocation exceeded the MAX_ORDER size. Commit 5d1935ac02ca5a ("ext4: avoid online resizing failures due to oversized flex bg") fixes this problem with resize_bg.
Normally, when online resizing, we may not need to allocate the memory required for an entire flexible block group. So if o_group and n_group are in the same flexible block group or if o_group and n_group are in neighbouring flexible block groups, we check if less memory can be allocated. This is what was done in commit 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()").
Regards, Baokun
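The two cases in the explanation above can be told apart with the same comparisons the quoted hunk uses. A small userspace sketch, with hypothetical names (enum resize_case, classify_resize()), assuming flexbg_size is a power of two and not clamped:

```c
#include <assert.h>

enum resize_case {
	/* n_group lies in the same flexible block group as o_group */
	RESIZE_SAME_FLEX_GROUP,
	/*
	 * n_group lies in the following flexible block group; the
	 * comparison used in the hunk excludes that group's final
	 * block group
	 */
	RESIZE_NEIGHBOUR_FLEX_GROUP,
	/* anything further away: keep the full per-resize size */
	RESIZE_FULL_FLEX_GROUP
};

/* classify_resize() is a hypothetical name, not a kernel function. */
static enum resize_case classify_resize(unsigned int flexbg_size,
					unsigned int o_group,
					unsigned int n_group)
{
	unsigned int last_group;

	assert(flexbg_size != 0 && (flexbg_size & (flexbg_size - 1)) == 0);
	assert(n_group >= o_group);

	/* Last block group of the flex group that contains o_group. */
	last_group = o_group | (flexbg_size - 1);

	if (n_group <= last_group)
		return RESIZE_SAME_FLEX_GROUP;
	if (n_group - last_group < flexbg_size)
		return RESIZE_NEIGHBOUR_FLEX_GROUP;
	return RESIZE_FULL_FLEX_GROUP;
}
```

Only the first two cases try to shrink the allocation; in the third, the size stays at the full value.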
linux-stable-mirror@lists.linaro.org