Quoting Pavel Machek (2020-08-19 20:33:26)
Hi!
If we hit an error during construction of the reloc chain, we need to replace the chain into the next batch with the terminator so that upon flushing the relocations so far, we do not execute a hanging batch.
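To make the failure mode concrete, here is a toy model of a chained batch. It is a minimal sketch under stated assumptions, not the i915 implementation. `struct batch`, `emit_chain()`, `execute()`, `CMD_CHAIN` and `CMD_END` are all hypothetical stand-ins, and the opcode values are arbitrary. The chain command is written optimistically. If linking the next batch fails, the chain slot is overwritten with the terminator, so that executing what has been flushed so far stops cleanly instead of jumping into a batch that was never set up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes (arbitrary values, not real GPU commands). */
#define CMD_END   0x05000000u /* stop executing here (terminator) */
#define CMD_CHAIN 0x18800000u /* jump to the next batch */

#define BATCH_DWORDS 16

/* Hypothetical batch: a run of dwords plus the target of CMD_CHAIN. */
struct batch {
	uint32_t cmd[BATCH_DWORDS];
	unsigned int len;	/* dwords emitted so far */
	struct batch *next;	/* chain target, NULL if never set up */
};

/* Link cur to next; next == NULL models a failed allocation. */
static int emit_chain(struct batch *cur, struct batch *next)
{
	unsigned int slot;

	assert(cur->len < BATCH_DWORDS);
	slot = cur->len++;
	cur->cmd[slot] = CMD_CHAIN;	/* chain emitted optimistically */
	cur->next = next;
	if (!next) {
		/* Error path: replace the chain with the terminator so that
		 * flushing the relocations written so far ends cleanly. */
		cur->cmd[slot] = CMD_END;
		return -1;
	}
	return 0;
}

/* Walk the chain like the GPU would: 0 = clean stop, -1 = "hang". */
static int execute(const struct batch *b)
{
	unsigned int i = 0;

	while (b && i < b->len) {
		uint32_t dw = b->cmd[i++];

		if (dw == CMD_END)
			return 0;
		if (dw == CMD_CHAIN) {
			b = b->next;	/* follow the chain */
			i = 0;
		}
		/* any other dword is treated as relocation payload */
	}
	return -1;	/* dangling chain or missing terminator */
}
```

Without the overwrite in the error path, the batch would end in `CMD_CHAIN` with no valid target, which `execute()` reports as a hang.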
Thanks for the patches. I assume this should fix the problem from the "5.9-rc1: graphics regression moved from -next to mainline" thread.
I have applied them on top of current -next, and my machine seems to be working so far (but uptime is less than 30 minutes).
If the machine still works tomorrow, I'll assume the problem is solved.
Aye, best wait until we have to start competing with Chromium for memory... The suspicion is that it was the resource allocation failure path.
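One common way to probe a suspected allocation-failure path is fault injection: force the Nth allocation to fail and check that the code unwinds cleanly. The sketch below shows the idea in plain userspace C. It is not the kernel's fault-injection framework. `inject_alloc()`, `build()`, `sweep()` and the `fail_nth` knob are hypothetical names chosen for illustration.

```c
#include <stdlib.h>

static unsigned int fail_nth;		/* fail the Nth call (1-based); 0 = never */
static unsigned int alloc_calls;	/* calls seen in the current run */
static int live_allocs;			/* outstanding allocations, for leak checks */

static void *inject_alloc(size_t size)
{
	void *p;

	alloc_calls++;
	if (fail_nth && alloc_calls == fail_nth)
		return NULL;		/* injected failure */
	p = malloc(size);
	if (p)
		live_allocs++;
	return p;
}

static void inject_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Code under test: allocate n buffers, unwinding on any failure. */
static int build(void **bufs, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++) {
		bufs[i] = inject_alloc(64);
		if (!bufs[i]) {
			while (i--)
				inject_free(bufs[i]);
			return -1;
		}
	}
	return 0;
}

/* Fail each allocation in turn; every run must fail cleanly without
 * leaking. Returns the number of failure points exercised, 0 on a bug. */
static unsigned int sweep(unsigned int n)
{
	void *bufs[8];

	if (n > 8)
		return 0;
	for (unsigned int nth = 1; nth <= n; nth++) {
		fail_nth = nth;
		alloc_calls = 0;
		if (build(bufs, n) != -1 || live_allocs != 0)
			return 0;	/* error path misbehaved */
	}
	fail_nth = 0;
	return n;
}
```

A sweep like this only covers the paths it actually reaches, so a clean result narrows the search rather than proving the code correct.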
Yep, my machines are low on memory.
But... the test did not go that well. I have a dead X and a blinking screen. The machine still works reasonably well over ssh, so I guess that's an improvement.
Well, my last remaining 32-bit gen3 device is currently pushing up the daisies, so could you try removing the attempt to use WC? Something like:
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 44df98d85b38..b26f7de913c3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -955,10 +955,7 @@ static u32 *__reloc_gpu_map(struct reloc_cache *cache,
 {
 	u32 *map;
 
-	map = i915_gem_object_pin_map(pool->obj,
-				      cache->has_llc ?
-				      I915_MAP_FORCE_WB :
-				      I915_MAP_FORCE_WC);
+	map = i915_gem_object_pin_map(pool->obj, I915_MAP_FORCE_WB);
on top of the previous patch. Fault injection didn't turn up anything in eb_relocate_vma, so we need to dig deeper.
-Chris