Hi all,
We started seeing a panic during the boot cycle on the 6.12 upstream kernel.
Data points:
* This is reproducible on 6.12.9
* Also reproducible on 6.13 from yesterday
* Not reproducible on 6.11
So I looked at the commits between 6.11 and 6.12, and narrowed it down to a patch series which made changes to md-bitmap.c:
https://lore.kernel.org/all/20240826074452.1490072-1-yukuai1@huaweicloud.com...
Narrowing it down further points to this commit:
ec6bb299c7c3 md/md-bitmap: add 'sync_size' into struct md_bitmap_stats
#regzbot introduced: ec6bb299c7c3
Also, the panic points to the middle line below:
  sb = kmap_local_page(bitmap->storage.sb_page);
* stats->sync_size = le64_to_cpu(sb->sync_size);
  kunmap_local(sb);
Call trace is as follows:
[ 21.427462] Oops: general protection fault, probably for non-canonical address 0x8730d3f80000028: 0000 [#1] PREEMPT SMP NOPTI
[ 21.440104] CPU: 56 UID: 0 PID: 1531 Comm: mdadm Not tainted 6.13.0-master.20250121.ol8.x86_64 #1
[ 21.450019] Hardware name: Oracle Corporation ORACLE SERVER X9-2L/ASM,MTHRBD,2U, BIOS 62110100 07/15/2024
[ 21.460710] RIP: 0010:bitmap_get_stats+0x2b/0xa0
[ 21.465872] Code: 0f 1e fa 0f 1f 44 00 00 48 89 f2 48 85 ff 74 7d 48 8b 4f 50 48 2b 0d dc 9f e5 00 48 8b 35 e5 9f e5 00 48 c1 f9 06 48 c1 e1 0c <48> 8b 4c 31 28 48 89 4a 20 48 8b 4f 18 48 89 4a 10 48 8b 4f 10 48
[ 21.486849] RSP: 0018:ff3e5f658fc3fb18 EFLAGS: 00010206
[ 21.492690] RAX: ffffffff8d17d660 RBX: ff27d0600af69690 RCX: 094b3d0000000000
[ 21.500663] RDX: ff3e5f658fc3fb28 RSI: ff27d03f80000000 RDI: ff27d06008cd9c00
[ 21.507233] mlx5_core 0000:b1:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 21.508629] RBP: ff27d0604a737418 R08: 0000000000000000 R09: 0000000000000000
[ 21.508631] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000012c2000
[ 21.508631] R13: ff27d0604a737018 R14: ff27d0604a737000 R15: ff27d0604a737018
[ 21.508632] FS: 00007f61a01c98c0(0000) GS:ff27d07f7f600000(0000) knlGS:0000000000000000
[ 21.508634] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 21.508635] CR2: 000056503c28f458 CR3: 00000020c000c004 CR4: 0000000000771ef0
[ 21.518772] mlx5_core 0000:b1:00.0: E-Switch: Total vports 27, per vport: max uc(128) max mc(2048)
[ 21.526600] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 21.526601] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 21.526602] PKRU: 55555554
[ 21.526603] Call Trace:
[ 21.526604] <TASK>
[ 21.535111] mlx5_core 0000:b1:00.0: Flow counters bulk query buffer size increased, bulk_query_len(8)
[ 21.542533] ? show_trace_log_lvl+0x1b0/0x300
[ 21.542537] ? show_trace_log_lvl+0x1b0/0x300
[ 21.556126] mlx5_core 0000:b1:00.0: mlx5_pcie_event:301:(pid 529): PCIe slot advertised sufficient power (27W).
[ 21.557983] ? md_seq_show+0x2d2/0x5b0
[ 21.557988] ? __die_body.cold+0x8/0x12
[ 21.641128] ? die_addr+0x3c/0x60
[ 21.645080] ? exc_general_protection+0x17d/0x400
[ 21.650574] ? asm_exc_general_protection+0x26/0x30
[ 21.656267] ? __pfx_bitmap_get_stats+0x10/0x10
[ 21.661568] ? bitmap_get_stats+0x2b/0xa0
[ 21.666277] md_seq_show+0x2d2/0x5b0
[ 21.670507] seq_read_iter+0x2b9/0x470
[ 21.674924] seq_read+0x12f/0x180
[ 21.678853] proc_reg_read+0x57/0xb0
[ 21.683074] vfs_read+0xf6/0x380
[ 21.686902] ? __seccomp_filter+0x30b/0x520
[ 21.691786] ksys_read+0x6c/0xf0
[ 21.695607] do_syscall_64+0x82/0x170
[ 21.699909] ? arch_exit_to_user_mode_prepare.isra.0+0x1e/0xd0
[ 21.706637] ? syscall_exit_to_user_mode+0x37/0x1a0
[ 21.712295] ? __memcg_slab_free_hook+0xf7/0x160
[ 21.717660] ? __x64_sys_close+0x3c/0x80
[ 21.722248] ? kmem_cache_free+0x400/0x460
[ 21.727028] ? syscall_exit_to_user_mode_prepare+0x174/0x1b0
[ 21.733553] ? arch_exit_to_user_mode_prepare.isra.0+0x1e/0xd0
[ 21.740270] ? syscall_exit_to_user_mode+0x37/0x1a0
[ 21.745913] ? do_syscall_64+0x8e/0x170
[ 21.750388] ? do_syscall_64+0x8e/0x170
[ 21.754857] ? clear_bhb_loop+0x45/0xa0
[ 21.759318] ? clear_bhb_loop+0x45/0xa0
[ 21.763772] ? clear_bhb_loop+0x45/0xa0
[ 21.768218] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 21.774014] RIP: 0033:0x7f619f862585
[ 21.778170] Code: fe ff ff 50 48 8d 3d 52 a8 06 00 e8 e5 08 02 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 d5 71 2a 00 8b 00 85 c0 75 0f 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 53 c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89
[ 21.799471] RSP: 002b:00007ffe50c2d3c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 21.808099] RAX: ffffffffffffffda RBX: 000056503c2802a0 RCX: 00007f619f862585
[ 21.816240] RDX: 0000000000000400 RSI: 000056503c28d000 RDI: 0000000000000004
[ 21.824382] RBP: 0000000000000d68 R08: 0000000000000008 R09: 0000000000000001
[ 21.832518] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f619fb00860
[ 21.840654] R13: 00007f619fb013a0 R14: 000056503c280a50 R15: 000056503c281480
[ 21.848789] </TASK>
[ 21.851389] Modules linked in: raid1 mgag200 drm_client_lib drm_shmem_helper drm_kms_helper sd_mod sg raid0 mlx5_core(+) ahci libahci drm crct10dif_pclmul ghash_clmulni_intel mlxfw sha512_ssse3 igb nvme sha256_ssse3 libata tls sha1_ssse3 megaraid_sas nvme_core pci_hyperv_intf psample dca nvme_auth i2c_algo_bit nfit(+) libnvdimm aesni_intel gf128mul crypto_simd cryptd
[ 21.888253] ---[ end trace 0000000000000000 ]---
[ 22.452319] RIP: 0010:bitmap_get_stats+0x2b/0xa0
[ 22.457699] Code: 0f 1e fa 0f 1f 44 00 00 48 89 f2 48 85 ff 74 7d 48 8b 4f 50 48 2b 0d dc 9f e5 00 48 8b 35 e5 9f e5 00 48 c1 f9 06 48 c1 e1 0c <48> 8b 4c 31 28 48 89 4a 20 48 8b 4f 18 48 89 4a 10 48 8b 4f 10 48
[ 22.479037] RSP: 0018:ff3e5f658fc3fb18 EFLAGS: 00010206
[ 22.485067] RAX: ffffffff8d17d660 RBX: ff27d0600af69690 RCX: 094b3d0000000000
[ 22.493217] RDX: ff3e5f658fc3fb28 RSI: ff27d03f80000000 RDI: ff27d06008cd9c00
[ 22.501372] RBP: ff27d0604a737418 R08: 0000000000000000 R09: 0000000000000000
[ 22.509527] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000012c2000
[ 22.517686] R13: ff27d0604a737018 R14: ff27d0604a737000 R15: ff27d0604a737018
[ 22.525845] FS: 00007f61a01c98c0(0000) GS:ff27d07f7f600000(0000) knlGS:0000000000000000
[ 22.535089] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 22.541701] CR2: 000056503c28f458 CR3: 00000020c000c004 CR4: 0000000000771ef0
[ 22.549866] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 22.558040] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 22.566202] PKRU: 55555554
[ 22.569425] Kernel panic - not syncing: Fatal exception
[ 22.576477] Kernel Offset: 0xb600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 22.654941] Rebooting in 60 seconds..
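As a cross-check that the fault really is the read of sb->sync_size, the address arithmetic from the oops can be sketched in C. This is only a sketch under two assumptions of mine: that the marked bytes `<48> 8b 4c 31 28` in the Code: line decode to `mov 0x28(%rcx,%rsi,1),%rcx`, and that the leading fields of the on-disk bitmap superblock are laid out as mirrored below. The struct name bitmap_sb_prefix and the helper faulting_address() are made up for illustration and are not kernel symbols.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the first fields of the on-disk bitmap superblock,
 * as I understand the layout in drivers/md/md-bitmap.h. Fixed-width integers
 * stand in for the kernel's __le32/__le64 types; fields after sync_size are
 * omitted because only its offset matters here.
 */
struct bitmap_sb_prefix {
	uint32_t magic;
	uint32_t version;
	uint8_t  uuid[16];
	uint64_t events;
	uint64_t events_cleared;
	uint64_t sync_size;
};

/*
 * Effective address of the assumed faulting load: base (RCX) + index (RSI)
 * + displacement, where the displacement is the offset of sync_size.
 * Unsigned 64-bit addition wraps, as the hardware address calculation does.
 */
static uint64_t faulting_address(uint64_t rcx, uint64_t rsi)
{
	return rcx + rsi + offsetof(struct bitmap_sb_prefix, sync_size);
}
```

With RCX = 0x094b3d0000000000 and RSI = 0xff27d03f80000000 from the register dump, faulting_address() gives 0x08730d3f80000028, which is the non-canonical address reported in the Oops line, and the 0x28 displacement equals the offset of sync_size in the mirrored layout. That is consistent with the marked source line, although it does not by itself say why the page pointer was bad.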
I would be happy to try any patches.
Thanks,
Harshit