Quoting Pavel Machek (2020-08-19 20:33:26)
Hi!
If we hit an error during construction of the reloc chain, we need to replace the chain into the next batch with the terminator so that upon flushing the relocations so far, we do not execute a hanging batch.
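The failure mode described above can be modelled in userspace. This is a minimal sketch that assumes a simplified chain in which each batch ends in either a jump to the next batch or a terminator. The names (`struct batch`, `build_chain()`, `execute()`, `TAIL_JUMP`, `TAIL_TERMINATE`) are hypothetical and are not the i915 code. If filling the next batch fails, the previous batch's pending jump is rewritten to a terminator. Flushing the work emitted so far then stops cleanly and never runs into an unfinished batch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not the real i915 structures: every batch ends in one
 * tail command, either a jump into the next batch or a terminator. */
enum tail_cmd { TAIL_TERMINATE, TAIL_JUMP };

struct batch {
	enum tail_cmd tail;
	struct batch *next;	/* target of TAIL_JUMP */
	bool complete;		/* false while the batch is still being written */
};

/* Build n chained batches; fill_ok[i] simulates whether filling batch i
 * succeeds. On error, the chain into the failed batch is replaced with a
 * terminator so the batches built so far can still be flushed safely. */
static int build_chain(struct batch *pool, size_t n, const bool *fill_ok)
{
	assert(pool && fill_ok);
	if (n == 0 || !fill_ok[0])
		return -1;
	pool[0] = (struct batch){ TAIL_TERMINATE, NULL, true };

	for (size_t i = 1; i < n; i++) {
		struct batch *prev = &pool[i - 1];
		struct batch *cur = &pool[i];

		/* Chain the previous batch into the one about to be filled. */
		*cur = (struct batch){ TAIL_TERMINATE, NULL, false };
		prev->tail = TAIL_JUMP;
		prev->next = cur;

		if (!fill_ok[i]) {
			/* Error: swap the pending jump for the terminator. */
			prev->tail = TAIL_TERMINATE;
			prev->next = NULL;
			return -1;
		}
		cur->complete = true;
	}
	return 0;
}

/* Walk the chain as the hardware would. Returns the number of batches
 * executed, or -1 on reaching an incomplete batch (the "hang" case). */
static int execute(const struct batch *b)
{
	int count = 0;

	while (b) {
		if (!b->complete)
			return -1;
		count++;
		if (b->tail == TAIL_TERMINATE)
			break;
		b = b->next;
	}
	return count;
}
```

Suppose filling the third of four batches fails. `build_chain()` returns an error, and `execute()` on the first batch runs two batches and stops. It never reaches the half-written one.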
Thanks for the patches. I assume this should fix the problem from the "5.9-rc1: graphics regression moved from -next to mainline" thread.
I have applied them over current -next, and my machine seems to be working so far (but uptime is less than 30 minutes).
If the machine still works tomorrow, I'll assume the problem is solved.
Aye, best wait until we have to start competing with Chromium for memory... The suspicion is that it was the resource allocation failure path.
Yep, my machines are low on memory.
But... the test did not work that well. I have a dead X and a blinking screen. The machine still works reasonably well over ssh, so I guess that's an improvement.
[ 7744.718473] BUG: unable to handle page fault for address: f8c00000
[ 7744.718484] #PF: supervisor write access in kernel mode
[ 7744.718487] #PF: error_code(0x0002) - not-present page
[ 7744.718491] *pdpt = 0000000031b0b001 *pde = 0000000000000000
[ 7744.718500] Oops: 0002 [#1] PREEMPT SMP PTI
[ 7744.718506] CPU: 0 PID: 3004 Comm: Xorg Not tainted 5.9.0-rc1-next-20200819+ #134
[ 7744.718509] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
[ 7744.718518] EIP: eb_relocate_vma+0xdbf/0xf20
To save me guessing, paste the above location into ./scripts/decode_stacktrace.sh ./vmlinux . ./drivers/gpu/drm/i915
The f8c00000 is something running off the end of a kmap, but I didn't spot a path where we would ignore an error and keep on writing. Nevertheless, it must exist. -Chris
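Here is one way to see why the faulting address points at the end of a kmap. 0xf8c00000 is exactly page-aligned, so a write landing there is consistent with a cursor stepping one dword past the page mapped just before it. The sketch below assumes 4 KiB pages and a single-page mapping. `PAGE_BYTES`, `struct page_cursor` and `cursor_emit()` are hypothetical illustrations, not i915 helpers. It shows the bounds check whose error must be honoured by the caller; ignoring it is exactly the "keep on writing" path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u	/* assumed 4 KiB pages, as on x86 */

/* Nonzero if addr is the first byte of a page, i.e. exactly where a write
 * lands after running off the end of the preceding page. */
static int page_aligned(uintptr_t addr)
{
	return (addr & (PAGE_BYTES - 1)) == 0;
}

/* Hypothetical write cursor over one mapped page; pos counts dwords. */
struct page_cursor {
	uint32_t *base;
	size_t pos;
};

/* Emit one dword, refusing to write past the end of the mapped page.
 * The caller must stop on -1; carrying on would touch the next page. */
static int cursor_emit(struct page_cursor *c, uint32_t v)
{
	assert(c && c->base);
	if (c->pos >= PAGE_BYTES / sizeof(uint32_t))
		return -1;
	c->base[c->pos++] = v;
	return 0;
}
```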