On Mon, 4 Apr 2022 09:12:41 +0200 Thorsten Leemhuis <regressions@leemhuis.info> wrote:
Kernels 5.16.10 do not have the following regression, 5.16.11-16
5.16.11-16 sounds like this is a distro kernel that might or might not be patched. Or is 11-16 just meant as a range? Could you clarify?
Sorry, I meant the problem occurred on 5.16.11, .12 and .16.
do. My machine would freeze completely about once a week, with no oops in the logs, and sysrq wouldn't work either. I managed to log only the following (and only once) with netconsole, while running kernel 5.16.16. I have not been able to reproduce the problem since.
Hmmm. Of course ideally all regressions get fixed, but that being said: 5.16 will likely be EOL in roughly two weeks anyway, and getting to the root of this problem might take some time and effort. That's why I'm not sure myself what's the best way forward here.
I'm aware of this, but given the nature of the problem and how difficult it is to reproduce, I thought it was better to report it. Meanwhile I'm now on 5.17.1: let's say this is on hold until someone has a similar problem with 5.17.x.
Maybe testing 5.17 to see if the problem still shows up would be good; bisection would help, but I guess that will be hard here. There is one thing that could help, though: could you maybe decode the panic you have as described in this document: https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
Thanks, I tried but I'm not sure it's of any help:
----------
0,1493,12767657117,-;traps: PANIC: double fault, error_code: 0x0
4,1494,12767657121,-;double fault: 0000 [#1] PREEMPT SMP NOPTI
4,1496,12767657126,-;Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 4011 04/19/2018
4,1497,12767657127,-;RIP: entry_SYSCALL_64+0x3/0x29
4,1498,12767657133,-;Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 01 f8 <65> 48 89 24 25 14 60 00 00 eb 12 0f 20 dc 0f 1f 44 00 00 48 81 e4
All code
========
   0:   cc                      int3
   1:   cc                      int3
   2:   cc                      int3
   3:   cc                      int3
   4:   cc                      int3
   5:   cc                      int3
   6:   cc                      int3
   7:   cc                      int3
   8:   cc                      int3
   9:   cc                      int3
   a:   cc                      int3
   b:   cc                      int3
   c:   cc                      int3
   d:   cc                      int3
   e:   cc                      int3
   f:   cc                      int3
  10:   cc                      int3
  11:   cc                      int3
  12:   cc                      int3
  13:   cc                      int3
  14:   cc                      int3
  15:   cc                      int3
  16:   cc                      int3
  17:   cc                      int3
  18:   cc                      int3
  19:   cc                      int3
  1a:   cc                      int3
  1b:   cc                      int3
  1c:   cc                      int3
  1d:   cc                      int3
  1e:   cc                      int3
  1f:   cc                      int3
  20:   cc                      int3
  21:   cc                      int3
  22:   cc                      int3
  23:   cc                      int3
  24:   cc                      int3
  25:   cc                      int3
  26:   cc                      int3
  27:   0f 01 f8                swapgs
  2a:*  65 48 89 24 25 14 60    mov %rsp,%gs:0x6014 <-- trapping instruction
  31:   00 00
  33:   eb 12                   jmp 0x47
  35:   0f 20 dc                mov %cr3,%rsp
  38:   0f 1f 44 00 00          nopl 0x0(%rax,%rax,1)
  3d:   48                      rex.W
  3e:   81                      .byte 0x81
  3f:   e4                      .byte 0xe4

Code starting with the faulting instruction
===========================================
   0:   65 48 89 24 25 14 60    mov %rsp,%gs:0x6014
   7:   00 00
   9:   eb 12                   jmp 0x1d
   b:   0f 20 dc                mov %cr3,%rsp
   e:   0f 1f 44 00 00          nopl 0x0(%rax,%rax,1)
  13:   48                      rex.W
  14:   81                      .byte 0x81
  15:   e4                      .byte 0xe4
4,1499,12767657134,-;RSP: 0018:00007f2a8bcbd438 EFLAGS: 00010002
4,1500,12767657136,-;RAX: 00000000000000ca RBX: 000000000000005d RCX: 00007f2aa45e8aab
4,1501,12767657138,-;RDX: 0000000000000002 RSI: 0000000000000080 RDI: 00007f2aa4400018
4,1502,12767657139,-;RBP: 00007f2aa4400018 R08: 0000000000000000 R09: 00007f2a8ed00000
4,1503,12767657140,-;R10: 0000000000000000 R11: 0000000000000282 R12: 00000000000000a8
4,1504,12767657141,-;R13: 0000000000000003 R14: 0000000000000030 R15: 00007f2aa4400000
4,1505,12767657142,-;FS: 00007f2a8bcbe640(0000) GS:ffff8b110ed00000(0000) knlGS:0000000000000000
4,1506,12767657143,-;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
4,1507,12767657144,-;CR2: 00007f2a8bcbd428 CR3: 00000002953f2000 CR4: 00000000003506e0
4,1508,12767657146,-;Call Trace:
4,1509,12767657146,-,ncfrag=0/986;Modules linked in: nfnetlink_queue nfnetlink_log nfnetlink bluetooth ecdh_generic ecc netconsole uas usb_storage snd_seq_dummy snd_hrtimer snd_seq snd_seq_device iptable_filter xt_tcpudp ip_tables x_tables hwmon_vid 8021q garp mrp stp llc ipv6 fuse rt73usb rt2x00usb rt2x00lib mac80211 hid_logitech cfg80211 joydev hid_generic usbhid hid amdgpu intel_rapl_msr iommu_v2 intel_rapl_common gpu_sched eeepc_wmi asus_wmi drm_ttm_helper ttm platform_profile battery drm_kms_helper sparse_keymap edac_mce_amd rfkill drm kvm_amd snd_hda_codec_realtek video snd_hda_codec_generic ledtrig_audio kvm snd_hda_codec_hdmi snd_hda_intel agpgart snd_intel_dspcfg snd_intel_sdw_acpi wmi_bmof snd_hda_codec evdev i2c_algo_bit snd_hda_core fb_sys_fops syscopyarea sysfillrect sysimgblt snd_hwdep mfd_core snd_pcm r8169 irqbypass snd_timer realtek snd xhci_pci xhci_pci_renesas xhci_hcd mdio_devres crct10dif_pclmul crc32_pclmul i2c_piix4 soundcore ccp libphy ghash_clmulni_intel i2c_co
4,1509,12767657146,-,ncfrag=966/986;re rapl k10temp wmi
4,1510,12767657189,c; acpi_cpufreq gpio_amdpt button gpio_generic loop [last unloaded: netconsole]
4,1511,12767657207,-;------------[ cut here ]------------
4,1512,12767657207,-;WARNING: CPU: 4 PID: 16786 at kernel/softirq.c:362 __local_bh_enable_ip+0x43/0x70
4,1513,12767657212,-,ncfrag=0/986;Modules linked in: nfnetlink_queue nfnetlink_log nfnetlink bluetooth ecdh_generic ecc netconsole uas usb_storage snd_seq_dummy snd_hrtimer snd_seq snd_seq_device iptable_filter xt_tcpudp ip_tables x_tables hwmon_vid 8021q garp mrp stp llc ipv6 fuse rt73usb rt2x00usb rt2x00lib mac80211 hid_logitech cfg80211 joydev hid_generic usbhid hid amdgpu intel_rapl_msr iommu_v2 intel_rapl_common gpu_sched eeepc_wmi asus_wmi drm_ttm_helper ttm platform_profile battery drm_kms_helper sparse_keymap edac_mce_amd rfkill drm kvm_amd snd_hda_codec_realtek video snd_hda_codec_generic ledtrig_audio kvm snd_hda_codec_hdmi snd_hda_intel agpgart snd_intel_dspcfg snd_intel_sdw_acpi wmi_bmof snd_hda_codec evdev i2c_algo_bit snd_hda_core fb_sys_fops syscopyarea sysfillrect sysimgblt snd_hwdep mfd_core snd_pcm r8169 irqbypass snd_timer realtek snd xhci_pci xhci_pci_renesas xhci_hcd mdio_devres crct10dif_pclmul crc32_pclmul i2c_piix4 soundcore ccp libphy ghash_clmulni_intel i2c_co
4,1513,12767657212,-,ncfrag=966/986;re rapl k10temp wmi
4,1514,12767657248,c; acpi_cpufreq gpio_amdpt button gpio_generic loop [last unloaded: netconsole]
4,1516,12767657254,-;Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 4011 04/19/2018
4,1517,12767657255,-;RIP: __local_bh_enable_ip+0x43/0x70
4,1518,12767657257,-;Code: 01 35 61 1d f3 7d 65 8b 05 5a 1d f3 7d a9 00 ff ff 00 74 1a bf 01 00 00 00 e8 99 b5 02 00 65 8b 05 42 1d f3 7d 85 c0 74 25 c3 <0f> 0b eb cc 48 c7 c7 d9 53 42 83 e8 4d ec a6 00 65 66 8b 05 25 19
All code
========
   0:   01 35 61 1d f3 7d       add %esi,0x7df31d61(%rip) # 0x7df31d67
   6:   65 8b 05 5a 1d f3 7d    mov %gs:0x7df31d5a(%rip),%eax # 0x7df31d67
   d:   a9 00 ff ff 00          test $0xffff00,%eax
  12:   74 1a                   je 0x2e
  14:   bf 01 00 00 00          mov $0x1,%edi
  19:   e8 99 b5 02 00          call 0x2b5b7
  1e:   65 8b 05 42 1d f3 7d    mov %gs:0x7df31d42(%rip),%eax # 0x7df31d67
  25:   85 c0                   test %eax,%eax
  27:   74 25                   je 0x4e
  29:   c3                      ret
  2a:*  0f 0b                   ud2 <-- trapping instruction
  2c:   eb cc                   jmp 0xfffffffffffffffa
  2e:   48 c7 c7 d9 53 42 83    mov $0xffffffff834253d9,%rdi
  35:   e8 4d ec a6 00          call 0xa6ec87
  3a:   65                      gs
  3b:   66                      data16
  3c:   8b                      .byte 0x8b
  3d:   05                      .byte 0x5
  3e:   25                      .byte 0x25
  3f:   19                      .byte 0x19

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   eb cc                   jmp 0xffffffffffffffd0
   4:   48 c7 c7 d9 53 42 83    mov $0xffffffff834253d9,%rdi
   b:   e8 4d ec a6 00          call 0xa6ec5d
  10:   65                      gs
  11:   66                      data16
  12:   8b                      .byte 0x8b
  13:   05                      .byte 0x5
  14:   25                      .byte 0x25
  15:   19                      .byte 0x19
4,1519,12767657259,-;RSP: 0018:fffffe00000f69a0 EFLAGS: 00010006
4,1520,12767657260,-;RAX: 0000000080110203 RBX: ffff8b0e05bd2000 RCX: ffff8b0e05bd2000
4,1521,12767657261,-;RDX: ffff8b0e0ac28000 RSI: 0000000000000201 RDI: ffffffffc12f12c3
4,1522,12767657262,-;RBP: ffff8b0e0c977a30 R08: fffffe00000f69e8 R09: ffff8b0e0d085000
4,1523,12767657263,-;R10: ffff8b0e03234300 R11: 0000000000000fff R12: ffff8b0e0d0850d0
4,1524,12767657264,-;R13: fffffe00000f69e8 R14: ffff8b0e0ddfc980 R15: ffff8b0e0d085a58
4,1525,12767657265,-;FS: 00007f2a8bcbe640(0000) GS:ffff8b110ed00000(0000) knlGS:0000000000000000
4,1526,12767657266,-;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
----------
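In case it helps anyone post-process the capture above: the records are in netconsole's extended format ("prio,seq,timestamp,flags;text"), and long records such as the module list arrive split into pieces tagged ncfrag=<byte-offset>/<total-bytes>. The following is only a rough sketch of how such a capture could be parsed and the fragments stitched back together. The names parse_record and reassemble are made up for illustration and are not part of any kernel tool, and the sketch assumes each record has already been separated onto its own line.

```python
import re

# Extended netconsole record: "<prio>,<seq>,<timestamp>,<flags>[,key=value...];<text>"
RECORD = re.compile(r"^(\d+),(\d+),(\d+),([^,;]*)((?:,[^;]*)?);(.*)$", re.S)


def parse_record(line):
    """Split one extended-format record into its header fields and text.

    Returns None for lines that do not look like a record (for example
    decodecode output pasted between records).
    """
    m = RECORD.match(line)
    if m is None:
        return None
    prio, seq, ts, flags, extra, text = m.groups()
    # Optional trailing header fields such as "ncfrag=966/986".
    fields = dict(kv.split("=", 1) for kv in extra.split(",") if "=" in kv)
    return {"prio": int(prio), "seq": int(seq), "ts": int(ts),
            "flags": flags, "fields": fields, "text": text}


def reassemble(lines):
    """Return parsed records, joining ncfrag fragments that share a sequence number.

    A fragmented record is emitted once the collected pieces cover the
    announced total length; incomplete records are silently dropped.
    """
    pending = {}  # seq -> {byte offset: text piece}
    out = []
    for line in lines:
        rec = parse_record(line)
        if rec is None:
            continue
        frag = rec["fields"].pop("ncfrag", None)
        if frag is None:
            out.append(rec)
            continue
        offset, total = (int(x) for x in frag.split("/"))
        parts = pending.setdefault(rec["seq"], {})
        parts[offset] = rec["text"]
        if sum(len(p.encode()) for p in parts.values()) >= total:
            rec["text"] = "".join(parts[o] for o in sorted(parts))
            del pending[rec["seq"]]
            out.append(rec)
    return out
```

Feeding it the two pieces of record 1509 would yield a single module list in which the split "i2c_co" / "re" reads "i2c_core" again. Records flagged "c" (continuation lines such as 1510) are left as separate entries in this sketch.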
Thanks, Michele Ballabio