Quoting Linus Torvalds (2020-04-07 16:44:35)
On Mon, Apr 6, 2020 at 8:10 PM Andrew Morton akpm@linux-foundation.org wrote:
From: Chris Wilson chris@chris-wilson.co.uk Subject: lib/list: prevent compiler reloads inside 'safe' list iteration
Instruct the compiler to read the next element in the list iteration once, and that it is not allowed to reload the value from the stale element later. This is important as during the course of the safe iteration, the stale element may be poisoned (unbeknownst to the compiler).
Andrew, Chris, this one looks rather questionable to me.
How the heck would the ->next pointer be changed without the compiler being aware of it? That implies a bug to begin with - possibly an inline asm that changes kernel memory without having a memory clobber.
Quite fundamentally, the READ_ONCE() doesn't seem to fix anything. If something else can change the list _concurrently_, it's still completely broken, and hiding the KASAN report is just hiding a bug.
What and where was the actual KASAN issue? The commit log doesn't say...
It'll take some time to reconstruct the original report, but the case in question was in removing the last element of the list of the last list, switch to a global lock over all such lists to park the HW, which in doing so added one more element to the original list. [If we happen to be retiring along the kernel timeline in the first place.]
list->next changed from pointing to list_head, to point to the new element instead. However, we don't have to check the next element yet and want to terminate the list iteration.
For reference, drivers/gpu/drm/i915/gt/intel_engine_pm.c::__engine_park()
Global activity is serialised by engine->wakeref.mutex; every active timeline is required to hold an engine wakeref, but retiring is local to timelines and serialised by their own timeline->mutex.
lock(&timeline->lock) list_for_each_safe(&timeline->requests) -> i915_request_retire [list_del(&timeline->requests)] -> intel_timeline_exit -> lock(&engine->wakeref.mutex) engine_park [list_add_tail(&engine->kernel_context->timeline->requests)]
-Chris