Hi!
If we hit an error during construction of the reloc chain, we need to replace the chain into the next batch with the terminator so that upon flushing the relocations so far, we do not execute a hanging batch.
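To make the description above concrete, here is a minimal userspace sketch of the idea. It is a toy model, not the i915 code: the opcodes, struct toy_batch and every toy_* function are hypothetical names invented for illustration. A batch ends with a jump into the next batch; if building the next batch fails, the jump is overwritten with a terminator so that flushing what was built so far does not run into an unfinished batch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy command stream; not the i915 encodings. */
enum { TOY_RELOC = 1, TOY_CHAIN = 2, TOY_END = 3 };

#define TOY_WORDS 8

struct toy_batch {
	uint32_t cmd[TOY_WORDS];
	size_t len;      /* words written so far */
	size_t chain_at; /* slot holding the jump to the next batch */
};

static void toy_emit(struct toy_batch *b, uint32_t op)
{
	assert(b->len < TOY_WORDS);
	b->cmd[b->len++] = op;
}

/* Close the current batch with a jump into the batch that follows it. */
static void toy_emit_chain(struct toy_batch *b)
{
	b->chain_at = b->len;
	toy_emit(b, TOY_CHAIN);
}

/* Error path: overwrite the jump so execution stops in this batch. */
static void toy_terminate_chain(struct toy_batch *b)
{
	b->cmd[b->chain_at] = TOY_END;
}

/*
 * Simulated executor. Returns the number of relocations performed, or -1
 * if execution leaves a batch without reaching TOY_END (a "hang").
 */
static int toy_run(const struct toy_batch *b, const struct toy_batch *next)
{
	int done = 0;

	for (;;) {
		const struct toy_batch *follow = NULL;
		size_t i;

		for (i = 0; i < b->len; i++) {
			if (b->cmd[i] == TOY_RELOC) {
				done++;
			} else if (b->cmd[i] == TOY_END) {
				return done;
			} else if (b->cmd[i] == TOY_CHAIN) {
				follow = next; /* jump into the next batch */
				next = NULL;
				break;
			}
		}
		if (!follow)
			return -1; /* nothing valid to execute: hang */
		b = follow;
	}
}

/* Build two chained batches, optionally failing while building the second. */
static int toy_build_and_flush(int fail_next, int apply_fix)
{
	struct toy_batch first = { .len = 0 }, second = { .len = 0 };

	toy_emit(&first, TOY_RELOC);
	toy_emit(&first, TOY_RELOC);
	toy_emit_chain(&first);

	if (fail_next) {
		/* The second batch was never completed. */
		if (apply_fix)
			toy_terminate_chain(&first);
	} else {
		toy_emit(&second, TOY_RELOC);
		toy_emit(&second, TOY_END);
	}
	return toy_run(&first, &second);
}
```

In this model, toy_build_and_flush(1, 0) reports a hang, while toy_build_and_flush(1, 1) completes the two relocations that were already queued. The real driver works on GPU command buffers and has more state to unwind, so treat this only as an illustration of the control flow.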
Thanks for the patches. I assume this should fix the problem from the "5.9-rc1: graphics regression moved from -next to mainline" thread.
I have applied them over current -next, and my machine seems to be working so far (but uptime is less than 30 minutes).
If the machine still works tomorrow, I'll assume the problem is solved.
Aye, best wait until we have to start competing with Chromium for memory... The suspicion is that it was the resource allocation failure path.
Yep, my machines are low on memory.
But ... the test did not work that well. I have a dead X and a blinking screen. The machine still works reasonably well over ssh, so I guess that's an improvement.
Best regards, Pavel
[ 5604.909393] ACPI: EC: event unblocked
[ 5604.913590] usb usb2: root hub lost power or was reset
[ 5604.913812] usb usb3: root hub lost power or was reset
[ 5604.914046] usb usb4: root hub lost power or was reset
[ 5604.918812] ata6: port disabled--ignoring
[ 5604.925353] sd 0:0:0:0: [sda] Starting disk
[ 5605.150042] thinkpad_acpi: ACPI backlight control delay disabled
[ 5605.204955] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 5605.205931] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[ 5605.205941] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 5605.205949] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[ 5605.207748] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[ 5605.207757] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 5605.207765] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[ 5605.208227] ata1.00: configured for UDMA/133
[ 5605.281913] usb 5-2: reset full-speed USB device number 3 using uhci_hcd
[ 5605.569752] usb 5-1: reset full-speed USB device number 2 using uhci_hcd
[ 5609.082771] PM: resume devices took 4.192 seconds
[ 5609.083380] OOM killer enabled.
[ 5609.083387] Restarting tasks ... done.
[ 5609.103164] video LNXVIDEO:00: Restoring backlight state
[ 5609.150144] PM: suspend exit
[ 5609.190535] sdhci-pci 0000:15:00.2: Will use DMA mode even though HW doesn't fully claim to support it.
[ 5609.239495] sdhci-pci 0000:15:00.2: Will use DMA mode even though HW doesn't fully claim to support it.
[ 5609.287144] sdhci-pci 0000:15:00.2: Will use DMA mode even though HW doesn't fully claim to support it.
[ 5609.344497] sdhci-pci 0000:15:00.2: Will use DMA mode even though HW doesn't fully claim to support it.
[ 5611.426855] wlan0: authenticate with 5c:f4:ab:10:d2:bb
[ 5611.430609] wlan0: send auth to 5c:f4:ab:10:d2:bb (try 1/3)
[ 5611.432552] wlan0: authenticated
[ 5611.433705] wlan0: associate with 5c:f4:ab:10:d2:bb (try 1/3)
[ 5611.436440] wlan0: RX AssocResp from 5c:f4:ab:10:d2:bb (capab=0x411 status=0 aid=1)
[ 5611.439083] wlan0: associated
[ 7744.718473] BUG: unable to handle page fault for address: f8c00000
[ 7744.718484] #PF: supervisor write access in kernel mode
[ 7744.718487] #PF: error_code(0x0002) - not-present page
[ 7744.718491] *pdpt = 0000000031b0b001 *pde = 0000000000000000
[ 7744.718500] Oops: 0002 [#1] PREEMPT SMP PTI
[ 7744.718506] CPU: 0 PID: 3004 Comm: Xorg Not tainted 5.9.0-rc1-next-20200819+ #134
[ 7744.718509] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
[ 7744.718518] EIP: eb_relocate_vma+0xdbf/0xf20
[ 7744.718523] Code: 48 74 8b 41 08 89 41 0c 8b 85 a4 fd ff ff 89 95 a0 fd ff ff e8 c2 12 6c 00 8b 95 a0 fd ff ff e9 03 fc ff ff 8b 85 d0 fd ff ff <c7> 03 01 00 40 10 89 43 04 8b 85 dc fd ff ff 89 43 08 e9 4a f6 ff
[ 7744.718527] EAX: 01397010 EBX: f8c00000 ECX: 01247000 EDX: 00000000
[ 7744.718531] ESI: f519cd80 EDI: f1ac1cd4 EBP: f1ac1c6c ESP: f1ac1a04
[ 7744.718535] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246
[ 7744.718539] CR0: 80050033 CR2: f8c00000 CR3: 31ac2000 CR4: 000006b0
[ 7744.718543] Call Trace:
[ 7744.718553] ? shmem_read_mapping_page_gfp+0x32/0x70
[ 7744.718560] ? eb_lookup_vmas+0x272/0x9f0
[ 7744.718565] i915_gem_do_execbuffer+0xa7b/0x2730
[ 7744.718573] ? intel_runtime_pm_put_unchecked+0xd/0x10
[ 7744.718578] ? i915_gem_gtt_pwrite_fast+0xf6/0x520
[ 7744.718586] ? __lock_acquire.isra.0+0x223/0x500
[ 7744.718592] ? cache_alloc_debugcheck_after+0x151/0x180
[ 7744.718596] ? kvmalloc_node+0x69/0x80
[ 7744.718600] ? __kmalloc+0x92/0x120
[ 7744.718604] ? kvmalloc_node+0x69/0x80
[ 7744.718608] i915_gem_execbuffer2_ioctl+0xdd/0x350
[ 7744.718613] ? i915_gem_execbuffer_ioctl+0x2a0/0x2a0
[ 7744.718619] drm_ioctl_kernel+0x91/0xe0
[ 7744.718623] ? i915_gem_execbuffer_ioctl+0x2a0/0x2a0
[ 7744.718627] drm_ioctl+0x1fd/0x371
[ 7744.718631] ? i915_gem_execbuffer_ioctl+0x2a0/0x2a0
[ 7744.718639] ? posix_get_monotonic_timespec+0x1d/0x80
[ 7744.718645] ? __sys_recvmsg+0x37/0x80
[ 7744.718649] ? drm_ioctl_kernel+0xe0/0xe0
[ 7744.718654] __ia32_sys_ioctl+0x14b/0x7c6
[ 7744.718661] ? exit_to_user_mode_prepare+0x53/0x100
[ 7744.718667] do_int80_syscall_32+0x2c/0x40
[ 7744.718674] entry_INT80_32+0x111/0x111
[ 7744.718678] EIP: 0xb7fd3092
[ 7744.718683] Code: 00 00 00 e9 90 ff ff ff ff a3 24 00 00 00 68 30 00 00 00 e9 80 ff ff ff ff a3 e8 ff ff ff 66 90 00 00 00 00 00 00 00 00 cd 80 <c3> 8d b4 26 00 00 00 00 8d b6 00 00 00 00 8b 1c 24 c3 8d b4 26 00
[ 7744.718687] EAX: ffffffda EBX: 0000000a ECX: c0406469 EDX: bfe67abc
[ 7744.718691] ESI: b73c1000 EDI: c0406469 EBP: 0000000a ESP: bfe67a34
[ 7744.718695] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00200292
[ 7744.718700] ? asm_exc_nmi+0xcc/0x2bc
[ 7744.718703] Modules linked in:
[ 7744.718709] CR2: 00000000f8c00000
[ 7744.718714] ---[ end trace 121f748dd4d0d6ec ]---
[ 7744.718719] EIP: eb_relocate_vma+0xdbf/0xf20
[ 7744.718723] Code: 48 74 8b 41 08 89 41 0c 8b 85 a4 fd ff ff 89 95 a0 fd ff ff e8 c2 12 6c 00 8b 95 a0 fd ff ff e9 03 fc ff ff 8b 85 d0 fd ff ff <c7> 03 01 00 40 10 89 43 04 8b 85 dc fd ff ff 89 43 08 e9 4a f6 ff
[ 7744.718727] EAX: 01397010 EBX: f8c00000 ECX: 01247000 EDX: 00000000
[ 7744.718731] ESI: f519cd80 EDI: f1ac1cd4 EBP: f1ac1c6c ESP: f1ac1a04
[ 7744.718735] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246
[ 7744.718739] CR0: 80050033 CR2: f8c00000 CR3: 31ac2000 CR4: 000006b0
[ 7744.723687] BUG: unable to handle page fault for address: f8c02038
[ 7744.723695] #PF: supervisor write access in kernel mode
[ 7744.723699] #PF: error_code(0x0002) - not-present page
[ 7744.723702] *pdpt = 0000000031866001 *pde = 0000000000000000
[ 7744.723711] Oops: 0002 [#2] PREEMPT SMP PTI
[ 7744.723717] CPU: 1 PID: 3004 Comm: Xorg Tainted: G D 5.9.0-rc1-next-20200819+ #134
[ 7744.723720] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
[ 7744.723728] EIP: n_tty_open+0x26/0x80
[ 7744.723733] Code: 00 00 00 90 55 89 e5 56 53 89 c3 b8 f0 22 00 00 e8 4f 39 cb ff 85 c0 74 62 89 c6 a1 00 2d 27 c5 b9 e8 2a 77 c5 ba 85 83 12 c5 <89> 46 38 8d 86 58 22 00 00 e8 8c 12 c0 ff 8d 86 a4 22 00 00 b9 e0
[ 7744.723738] EAX: 001c65c0 EBX: f2339000 ECX: c5772ae8 EDX: c5128385
[ 7744.723741] ESI: f8c02000 EDI: 00000000 EBP: f1ac1ee4 ESP: f1ac1edc
[ 7744.723745] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210286
[ 7744.723751] CR0: 80050033 CR2: f8c02038 CR3: 31864000 CR4: 000006b0
[ 7744.723755] Call Trace:
[ 7744.723763] tty_ldisc_open.isra.0+0x23/0x40
[ 7744.723768] tty_ldisc_reinit+0x99/0xe0
[ 7744.723772] tty_ldisc_hangup+0xc4/0x1e0
[ 7744.723776] __tty_hangup.part.0+0x13f/0x250
[ 7744.723781] tty_vhangup_session+0x11/0x20
[ 7744.723786] disassociate_ctty.part.0+0x34/0x230
[ 7744.723790] disassociate_ctty+0x28/0x30
[ 7744.723797] do_exit+0x456/0x960
[ 7744.723803] ? exit_to_user_mode_prepare+0x53/0x100
[ 7744.723808] rewind_stack_do_exit+0x11/0x13
[ 7744.723812] EIP: 0xb7fd3092
[ 7744.723815] Code: Bad RIP value.
[ 7744.723819] EAX: ffffffda EBX: 0000000a ECX: c0406469 EDX: bfe67abc
[ 7744.723823] ESI: b73c1000 EDI: c0406469 EBP: 0000000a ESP: bfe67a34
[ 7744.723827] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00200292
[ 7744.723837] ? asm_exc_nmi+0xcc/0x2bc
[ 7744.723839] Modules linked in:
[ 7744.723845] CR2: 00000000f8c02038
[ 7744.723851] ---[ end trace 121f748dd4d0d6ed ]---
[ 7744.723857] EIP: eb_relocate_vma+0xdbf/0xf20
[ 7744.723861] Code: 48 74 8b 41 08 89 41 0c 8b 85 a4 fd ff ff 89 95 a0 fd ff ff e8 c2 12 6c 00 8b 95 a0 fd ff ff e9 03 fc ff ff 8b 85 d0 fd ff ff <c7> 03 01 00 40 10 89 43 04 8b 85 dc fd ff ff 89 43 08 e9 4a f6 ff
[ 7744.723865] EAX: 01397010 EBX: f8c00000 ECX: 01247000 EDX: 00000000
[ 7744.723869] ESI: f519cd80 EDI: f1ac1cd4 EBP: f1ac1c6c ESP: f1ac1a04
[ 7744.723873] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246
[ 7744.723877] CR0: 80050033 CR2: f8c02038 CR3: 31864000 CR4: 000006b0
[ 7744.723880] Fixing recursive fault but reboot is needed!
[ 7749.589011] i915 0000:00:02.0: [drm] GPU HANG: ecode 3:0:00000000
[ 7749.589024] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 7749.589030] Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
[ 7749.589036] Please see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details.
[ 7749.589041] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 7749.589047] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[ 7749.589053] GPU crash dump saved to /sys/class/drm/card0/error
[ 7749.909841] i915 0000:00:02.0: [drm] Resetting chip for no heartbeat on rcs0
[ 7756.504232] i915 0000:00:02.0: [drm] GPU HANG: ecode 3:0:00000000
[ 7756.817879] i915 0000:00:02.0: [drm] Resetting chip for no heartbeat on rcs0
[ 7763.672921] i915 0000:00:02.0: [drm] GPU HANG: ecode 3:0:00000000
[ 7763.985882] i915 0000:00:02.0: [drm] Resetting chip for no heartbeat on rcs0
[ 7770.580999] i915 0000:00:02.0: [drm] GPU HANG: ecode 3:0:00000000
[ 7770.897884] i915 0000:00:02.0: [drm] Resetting chip for no heartbeat on rcs0
[ 7777.497036] i915 0000:00:02.0: [drm] GPU HANG: ecode 3:0:00000000
[ 7777.825882] i915 0000:00:02.0: [drm] Resetting chip for no heartbeat on rcs0
[ 7784.664999] i915 0000:00:02.0: [drm] GPU HANG: ecode 3:0:00000000