The GPU is trashing the low pages of its reserved memory upon reset. If we are using this memory for ringbuffers, then we will dutiful resubmit the trashed rings after the reset causing further resets, and worse. We must exclude this range from our own use. The value of 128KiB was found by empirical measurement on gen9.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: stable@vger.kernel.org --- drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index 0be5e8683337..c0cc2a972a11 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -53,8 +53,9 @@ int i915_gem_stolen_insert_node(struct drm_i915_private *i915, struct drm_mm_node *node, u64 size, unsigned alignment) { - return i915_gem_stolen_insert_node_in_range(i915, node, size, - alignment, 0, U64_MAX); + return i915_gem_stolen_insert_node_in_range(i915, node, + size, alignment, + SZ_128K, U64_MAX); }
void i915_gem_stolen_remove_node(struct drm_i915_private *i915,
Quoting Chris Wilson (2020-10-07 16:17:18)
The GPU is trashing the low pages of its reserved memory upon reset. If we are using this memory for ringbuffers, then we will dutiful resubmit the trashed rings after the reset causing further resets, and worse. We must exclude this range from our own use. The value of 128KiB was found by empirical measurement on gen9.
Probably should double it to 256K, just to be on the safe side? -Chris
linux-stable-mirror@lists.linaro.org