From: Chris Wilson <chris@chris-wilson.co.uk>
Subject: lib/list: prevent compiler reloads inside 'safe' list iteration
Instruct the compiler to read the next element in the list iteration once, and that it is not allowed to reload the value from the stale element later. This is important as during the course of the safe iteration, the stale element may be poisoned (unbeknownst to the compiler).
This helps prevent kcsan warnings over 'unsafe' conduct in releasing the list elements during list_for_each_entry_safe() and friends.
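For illustration only, the pattern in question looks roughly like the sketch below; the 'widget' structure and release_all() helper are made-up names for this example and are not taken from any real caller.

#include <linux/list.h>
#include <linux/slab.h>

struct widget {
	struct list_head link;		/* made-up example type */
};

static void release_all(struct list_head *widgets)
{
	struct widget *pos, *n;

	list_for_each_entry_safe(pos, n, widgets, link) {
		list_del(&pos->link);	/* poisons pos->link.next/prev */
		kfree(pos);		/* pos is now stale */
		/*
		 * 'n' was loaded up front; if the compiler instead
		 * re-read pos->link.next at this point, it would be
		 * reading from the stale, poisoned element.
		 */
	}
}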
Link: http://lkml.kernel.org/r/20200310092119.14965-1-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/list.h |   50 +++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 14 deletions(-)
--- a/include/linux/list.h~list-prevent-compiler-reloads-inside-safe-list-iteration
+++ a/include/linux/list.h
@@ -537,6 +537,17 @@ static inline void list_splice_tail_init
 	list_entry((pos)->member.next, typeof(*(pos)), member)
 
 /**
+ * list_next_entry_safe - get the next element in list [once]
+ * @pos:	the type * to cursor
+ * @member:	the name of the list_head within the struct.
+ *
+ * Like list_next_entry() but prevents the compiler from reloading the
+ * next element.
+ */
+#define list_next_entry_safe(pos, member) \
+	list_entry(READ_ONCE((pos)->member.next), typeof(*(pos)), member)
+
+/**
  * list_prev_entry - get the prev element in list
  * @pos:	the type * to cursor
  * @member:	the name of the list_head within the struct.
@@ -545,6 +556,17 @@ static inline void list_splice_tail_init
 	list_entry((pos)->member.prev, typeof(*(pos)), member)
 
 /**
+ * list_prev_entry_safe - get the prev element in list [once]
+ * @pos:	the type * to cursor
+ * @member:	the name of the list_head within the struct.
+ *
+ * Like list_prev_entry() but prevents the compiler from reloading the
+ * previous element.
+ */
+#define list_prev_entry_safe(pos, member) \
+	list_entry(READ_ONCE((pos)->member.prev), typeof(*(pos)), member)
+
+/**
  * list_for_each - iterate over a list
  * @pos:	the &struct list_head to use as a loop cursor.
  * @head:	the head for your list.
@@ -686,9 +708,9 @@ static inline void list_splice_tail_init
  */
 #define list_for_each_entry_safe(pos, n, head, member)			\
 	for (pos = list_first_entry(head, typeof(*pos), member),	\
-		n = list_next_entry(pos, member);			\
+		n = list_next_entry_safe(pos, member);			\
 	     &pos->member != (head);					\
-	     pos = n, n = list_next_entry(n, member))
+	     pos = n, n = list_next_entry_safe(n, member))
 
 /**
  * list_for_each_entry_safe_continue - continue list iteration safe against removal
@@ -700,11 +722,11 @@ static inline void list_splice_tail_init
  * Iterate over list of given type, continuing after current point,
  * safe against removal of list entry.
  */
-#define list_for_each_entry_safe_continue(pos, n, head, member)	\
-	for (pos = list_next_entry(pos, member),			\
-		n = list_next_entry(pos, member);			\
-	     &pos->member != (head);					\
-	     pos = n, n = list_next_entry(n, member))
+#define list_for_each_entry_safe_continue(pos, n, head, member)	\
+	for (pos = list_next_entry(pos, member),			\
+		n = list_next_entry_safe(pos, member);			\
+	     &pos->member != (head);					\
+	     pos = n, n = list_next_entry_safe(n, member))
 
 /**
  * list_for_each_entry_safe_from - iterate over list from current point safe against removal
@@ -716,10 +738,10 @@ static inline void list_splice_tail_init
  * Iterate over list of given type from current point, safe against
  * removal of list entry.
  */
-#define list_for_each_entry_safe_from(pos, n, head, member)		\
-	for (n = list_next_entry(pos, member);				\
-	     &pos->member != (head);					\
-	     pos = n, n = list_next_entry(n, member))
+#define list_for_each_entry_safe_from(pos, n, head, member)		\
+	for (n = list_next_entry_safe(pos, member);			\
+	     &pos->member != (head);					\
+	     pos = n, n = list_next_entry_safe(n, member))
 
 /**
  * list_for_each_entry_safe_reverse - iterate backwards over list safe against removal
@@ -733,9 +755,9 @@ static inline void list_splice_tail_init
  */
 #define list_for_each_entry_safe_reverse(pos, n, head, member)		\
 	for (pos = list_last_entry(head, typeof(*pos), member),		\
-		n = list_prev_entry(pos, member);			\
+		n = list_prev_entry_safe(pos, member);			\
 	     &pos->member != (head);					\
-	     pos = n, n = list_prev_entry(n, member))
+	     pos = n, n = list_prev_entry_safe(n, member))
 
 /**
  * list_safe_reset_next - reset a stale list_for_each_entry_safe loop
@@ -750,7 +772,7 @@ static inline void list_splice_tail_init
  * completing the current iteration of the loop body.
  */
 #define list_safe_reset_next(pos, n, member)				\
-	n = list_next_entry(pos, member)
+	n = list_next_entry_safe(pos, member)
 
 /*
  * Double linked lists with a single pointer list head.
_
On Mon, Apr 6, 2020 at 8:10 PM Andrew Morton akpm@linux-foundation.org wrote:
From: Chris Wilson chris@chris-wilson.co.uk Subject: lib/list: prevent compiler reloads inside 'safe' list iteration
Instruct the compiler to read the next element in the list iteration once, and that it is not allowed to reload the value from the stale element later. This is important as during the course of the safe iteration, the stale element may be poisoned (unbeknownst to the compiler).
Andrew, Chris, this one looks rather questionable to me.
How the heck would the ->next pointer be changed without the compiler being aware of it? That implies a bug to begin with - possibly an inline asm that changes kernel memory without having a memory clobber.
Quite fundamentally, the READ_ONCE() doesn't seem to fix anything. If something else can change the list _concurrently_, it's still completely broken, and hiding the KASAN report is just hiding a bug.
What and where was the actual KASAN issue? The commit log doesn't say...
Linus
Quoting Linus Torvalds (2020-04-07 16:44:35)
On Mon, Apr 6, 2020 at 8:10 PM Andrew Morton akpm@linux-foundation.org wrote:
From: Chris Wilson chris@chris-wilson.co.uk Subject: lib/list: prevent compiler reloads inside 'safe' list iteration
Instruct the compiler to read the next element in the list iteration once, and that it is not allowed to reload the value from the stale element later. This is important as during the course of the safe iteration, the stale element may be poisoned (unbeknownst to the compiler).
Andrew, Chris, this one looks rather questionable to me.
How the heck would the ->next pointer be changed without the compiler being aware of it? That implies a bug to begin with - possibly an inline asm that changes kernel memory without having a memory clobber.
Quite fundamentally, the READ_ONCE() doesn't seem to fix anything. If something else can change the list _concurrently_, it's still completely broken, and hiding the KASAN report is just hiding a bug.
What and where was the actual KASAN issue? The commit log doesn't say...
It'll take some time to reconstruct the original report, but the case in question was removing the last element of the last list; doing so switches to a global lock over all such lists to park the HW, which in turn adds one more element back onto the original list. [If we happen to be retiring along the kernel timeline in the first place.]
list->next changed from pointing to list_head, to point to the new element instead. However, we don't have to check the next element yet and want to terminate the list iteration.
For reference, drivers/gpu/drm/i915/gt/intel_engine_pm.c::__engine_park()
Global activity is serialised by engine->wakeref.mutex; every active timeline is required to hold an engine wakeref, but retiring is local to timelines and serialised by their own timeline->mutex.
lock(&timeline->lock)
list_for_each_safe(&timeline->requests)
-> i915_request_retire [list_del(&timeline->requests)]
   -> intel_timeline_exit
      -> lock(&engine->wakeref.mutex)
         engine_park [list_add_tail(&engine->kernel_context->timeline->requests)]
-Chris
On Tue, Apr 7, 2020 at 9:04 AM Chris Wilson chris@chris-wilson.co.uk wrote:
It'll take some time to reconstruct the original report, but the case in question was removing the last element of the last list; doing so switches to a global lock over all such lists to park the HW, which in turn adds one more element back onto the original list. [If we happen to be retiring along the kernel timeline in the first place.]
Please point to the real code and the list.
Honestly, what you describe sounds complex enough that I think your locking is simply just buggy.
IOW, this patch seems to really just paper over a locking bug, and the KASAN report tied to it.
Because the fundamental issue is that READ_ONCE() can not fix a bug here. Reading the next pointer once fundamentally cannot matter if it can change concurrently: the code is buggy, and the READ_ONCE() just means that it gets one or the other value randomly, and that the list walking is fundamentally racy.
On the other hand, if the next pointer _cannot_ change concurrently, then READ_ONCE() cannot make a difference.
So as far as I can tell, we have two possibilities, and in both cases changing the code to use READ_ONCE() is not the right thing to do. In one case it hides a bug, and in the other case it's just pointless.
list->next changed from pointing to list_head, to point to the new element instead. However, we don't have to check the next element yet and want to terminate the list iteration.
I'd really like to see the actual code that has that list walking. You say:
For reference, drivers/gpu/drm/i915/gt/intel_engine_pm.c::__engine_park()
.. but that function doesn't have any locking or list-walking. Is it the "call_idle_barriers()" loop? What is it?
I'd really like to see the KASAN report and the discussion about this change.
And if there was no discussion, then the patch just seems like "I changed code randomly and the KASAN report went away, so it's all good".
Global activity is serialised by engine->wakeref.mutex; every active timeline is required to hold an engine wakeref, but retiring is local to timelines and serialised by their own timeline->mutex.
lock(&timeline->lock)
list_for_each_safe(&timeline->requests)
-> i915_request_retire [list_del(&timeline->requests)]
   -> intel_timeline_exit
      -> lock(&engine->wakeref.mutex)
         engine_park [list_add_tail(&engine->kernel_context->timeline->requests)]
in that particular list_for_each_safe() thing, there's no possibility that the 'next' field would be reloaded, since the list_del() in the above will be something the compiler is aware of.
So yes, the beginning of list_for_each_safe() might load it twice (or a hundred times), but by the time that list_del() in i915_request_retire() has been called, a compiler that then reloaded it afterwards would have a major bug, since the reload would come after the point where the value could have been written by the local thread.
So this doesn't explain it to me.
What it *sounds* like is that the "engine" lock that you do *not* hold initially, is not protecting some accessor to that list, so you have a race on the list at the time of that list_del().
And that race may be what KASAN is reporting, and what that patch is _hiding_ from KASAN - but not fixing.
See what I am saying and why I find this patch questionable?
There may be something really subtle going on, but it really smells like "two threads are modifying the same list at the same time".
And there's no way that the READ_ONCE() will fix that bug, it will only make KASAN shut up about it.
Linus
On Tue, Apr 7, 2020 at 10:28 AM Linus Torvalds torvalds@linux-foundation.org wrote:
There may be something really subtle going on, but it really smells like "two threads are modifying the same list at the same time".
And there's no way that the READ_ONCE() will fix that bug, it will only make KASAN shut up about it.
[ And by KASAN I obviously meant KCSAN every time ]
An alternative is that it's just a KCSAN bug or false positive for some other reason. Which is possible. But the more I look at this, the more I think you really have a bug in your list handling.
I'm just not convinced the whole "we have a race where we randomly add a tail object" in another thread is a valid reason for making those "safe" list accessors use READ_ONCE.
The thing is, they were never designed to be safe wrt concurrent accesses. They are safe from the _same_ thread removing the current entry, not from other threads changing the entries - whether it be the last one or not.
Because if _another_ thread removes (or adds) an entry, you have a whole new set of issues. READ_ONCE() isn't sufficient. You need to have memory ordering guarantees etc.
For example, the thread that adds another entry that might - or might not - be visible without locking, would have to fully initialize the entry, and set the ->next pointer on it, before it adds it to the list.
I suspect you could use the RCU list walkers for a use-case like this. So __list_add_rcu() for example uses the proper rcu_assign_pointer() (which uses smp_store_release()) to make sure that the newly added entry is actually _ordered_ wrt the stores to the entry.
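Very roughly, and with entirely made-up names just to show the shape of that pairing, it would look something like this sketch:

#include <linux/rculist.h>
#include <linux/spinlock.h>

/* made-up example types and names */
struct req {
	struct list_head link;
	int seqno;
};

static LIST_HEAD(requests);
static DEFINE_SPINLOCK(requests_lock);

static void publish_req(struct req *rq)
{
	rq->seqno = 42;			/* fully initialise the entry first... */

	spin_lock(&requests_lock);
	/* ...then publish it: __list_add_rcu() uses rcu_assign_pointer() */
	list_add_tail_rcu(&rq->link, &requests);
	spin_unlock(&requests_lock);
}

static void walk_reqs(void)
{
	struct req *rq;

	rcu_read_lock();
	list_for_each_entry_rcu(rq, &requests, link) {
		/* rq is observed fully initialised, even without requests_lock */
	}
	rcu_read_unlock();
}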
But the "safe" list functions simply do not have those kinds of ordering guarantees, and doing concurrent list_add() simply *CANNOT* be right. Adding a READ_ONCE() does not in any way make it right (although it might work in practice on x86).
Linus
Quoting Linus Torvalds (2020-04-07 18:28:34)
On Tue, Apr 7, 2020 at 9:04 AM Chris Wilson chris@chris-wilson.co.uk wrote:
It'll take some time to reconstruct the original report, but the case in question was removing the last element of the last list; doing so switches to a global lock over all such lists to park the HW, which in turn adds one more element back onto the original list. [If we happen to be retiring along the kernel timeline in the first place.]
Please point to the real code and the list.
Honestly, what you describe sounds complex enough that I think your locking is simply just buggy.
IOW, this patch seems to really just paper over a locking bug, and the KASAN report tied to it.
Because the fundamental issue is that READ_ONCE() can not fix a bug here. Reading the next pointer once fundamentally cannot matter if it can change concurrently: the code is buggy, and the READ_ONCE() just means that it gets one or the other value randomly, and that the list walking is fundamentally racy.
On the other hand, if the next pointer _cannot_ change concurrently, then READ_ONCE() cannot make a difference.
So as far as I can tell, we have two possibilities, and in both cases changing the code to use READ_ONCE() is not the right thing to do. In one case it hides a bug, and in the other case it's just pointless.
list->next changed from pointing to list_head, to point to the new element instead. However, we don't have to check the next element yet and want to terminate the list iteration.
I'd really like to see the actual code that has that list walking. You say:
For reference, drivers/gpu/drm/i915/gt/intel_engine_pm.c::__engine_park()
.. but that function doesn't have any locking or list-walking. Is it the "call_idle_barriers()" loop? What is it?
list_for_each_safe @ intel_gt_requests.c::retire_requests
list_del @ i915_requests.c::i915_request_retire
list_add @ i915_requests.c::__i915_request_create
I'd really like to see the KASAN report and the discussion about this change.
And if there was no discussion, then the patch just seems like "I changed code randomly and the KASAN report went away, so it's all good".
Global activity is serialised by engine->wakeref.mutex; every active timeline is required to hold an engine wakeref, but retiring is local to timelines and serialised by their own timeline->mutex.
lock(&timeline->lock)
list_for_each_safe(&timeline->requests)
-> i915_request_retire [list_del(&timeline->requests)]
   -> intel_timeline_exit
      -> lock(&engine->wakeref.mutex)
         engine_park [list_add_tail(&engine->kernel_context->timeline->requests)]
in that particular list_for_each_safe() thing, there's no possibility that the 'next' field would be reloaded, since the list_del() in the above will be something the compiler is aware of.
So yes, the beginning of list_for_each_safe() might load it twice (or a hundred times), but by the time that list_del() in i915_request_retire() has been called, a compiler that then reloaded it afterwards would have a major bug, since the reload would come after the point where the value could have been written by the local thread.
My understanding of how KCSAN works is that it installs a watchpoint after a read and complains if the value changes within X us.
So this doesn't explain it to me.
What it *sounds* like is that the "engine" lock that you do *not* hold initially, is not protecting some accessor to that list, so you have a race on the list at the time of that list_del().
We take an engine wakeref prior to creating a request, and release it upon retiring the request [through the respective context/timelines, across all the engines that might be used for that context]. When the engine wakeref hits zero, that mutex is taken to prevent any other new request being created, and under that mutex while we know that the engine->kernel_context->timeline is empty, we submit a request to switch the GPU to point into our scratch context.
That submission can run concurrently to the list iteration, but only _after_ the final list_del.
And that race may be what KASAN is reporting, and what that patch is _hiding_ from KASAN - but not fixing.
Yes, it was sent as a means to silence KCSAN with a query as to whether or not it could happen in practice with an aggressive compiler.
See what I am saying and why I find this patch questionable?
There may be something really subtle going on, but it really smells like "two threads are modifying the same list at the same time".
In strict succession.
And there's no way that the READ_ONCE() will fix that bug, it will only make KASAN shut up about it.
There's some more shutting up required for KCSAN to bring the noise down to usable levels which I hope has been done so I don't have to argue for it, such as
diff --git a/include/linux/timer.h b/include/linux/timer.h
index 1e6650ed066d..c7c8dd89f279 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -164,7 +164,7 @@ static inline void destroy_timer_on_stack(struct timer_list *timer) { }
  */
 static inline int timer_pending(const struct timer_list * timer)
 {
-	return timer->entry.pprev != NULL;
+	return READ_ONCE(timer->entry.pprev) != NULL;
 }
 
 extern void add_timer_on(struct timer_list *timer, int cpu);
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 5352ce50a97e..7461b3f33629 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -565,8 +565,9 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
 		/*
 		 * Use vcpu_is_preempted to detect lock holder preemption issue.
 		 */
-		if (!owner->on_cpu || need_resched() ||
-		    vcpu_is_preempted(task_cpu(owner))) {
+		if (!READ_ONCE(owner->on_cpu) ||
+		    need_resched() ||
+		    vcpu_is_preempted(task_cpu(owner))) {
 			ret = false;
 			break;
 		}
@@ -602,7 +603,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
 	 * on cpu or its cpu is preempted
 	 */
 	if (owner)
-		retval = owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
+		retval = READ_ONCE(owner->on_cpu) && !vcpu_is_preempted(task_cpu(owner));
 	rcu_read_unlock();
 
 	/*
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 1f7734949ac8..4a81fba4cf70 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -75,7 +75,7 @@ osq_wait_next(struct optimistic_spin_queue *lock,
 		 * wait for either @lock to point to us, through its Step-B, or
 		 * wait for a new @node->next from its Step-C.
 		 */
-		if (node->next) {
+		if (READ_ONCE(node->next)) {
 			next = xchg(&node->next, NULL);
 			if (next)
 				break;
@@ -154,7 +154,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	 */
 
 	for (;;) {
-		if (prev->next == node &&
+		if (READ_ONCE(prev->next) == node &&
 		    cmpxchg(&prev->next, node, NULL) == node)
 			break;
 
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 0d9b6be9ecc8..eef4835cecf2 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -650,7 +650,7 @@ static inline bool owner_on_cpu(struct task_struct *owner)
 	 * As lock holder preemption issue, we both skip spinning if
 	 * task is not on cpu or its cpu is preempted
 	 */
-	return owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
+	return READ_ONCE(owner->on_cpu) && !vcpu_is_preempted(task_cpu(owner));
 }
 
 static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem,
On Tue, Apr 7, 2020 at 11:04 AM Chris Wilson chris@chris-wilson.co.uk wrote:
That submission can run concurrently to the list iteration, but only _after_ the final list_del.
There is no such thing, Chris.
Not without locks and memory ordering.
The "list_del()" is already ordered on the CPU it happens.
And _there_ it's already ordered with the list_for_each_entry_safe() by the compiler.
There may be something really subtle going on, but it really smells like "two threads are modifying the same list at the same time".
In strict succession.
See above.
There is no such thing as strict succession across two CPU's unless you have memory barriers, locks, or things like release/acquire operations.
So a "strict succession from list_del()" only makes sense on the _local_ CPU. Not across threads.
You may be relying on some very subtle consistency guarantee that is true on x86. For example, x86 does guarantee "causality".
Not everybody else does that.
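To make that concrete, cross-CPU "succession" needs something like a release/acquire pair; a rough sketch, with made-up variable names:

static int data;
static int published;	/* made-up names, illustration only */

/* CPU A */
static void writer(void)
{
	data = 1;
	smp_store_release(&published, 1);	/* orders the store to data before this */
}

/* CPU B */
static void reader(void)
{
	if (smp_load_acquire(&published))	/* pairs with the release above */
		WARN_ON(data != 1);		/* guaranteed to observe data == 1 */
}

Without that pairing (or a lock, or a full barrier), CPU B is free to observe the two stores in either order.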
There's some more shutting up required for KCSAN to bring the noise down to usable levels which I hope has been done so I don't have to argue for it, such as
Stop using KCSAN if you can't deal with the problems it introduces.
I do NOT want to see bogus patches in the kernel that are introduced by bad infrastructure.
That READ_ONCE() is _wrong_.
File a bug with the KCSAN people, don't send garbage upstream.
Linus
On Tue, Apr 7, 2020 at 1:04 PM Linus Torvalds torvalds@linux-foundation.org wrote:
You may be relying on some very subtle consistency guarantee that is true on x86. For example, x86 does guarantee "causality".
Not everybody else does that.
It's worth noting that maybe for the i915 driver it makes sense to just assume x86 memory ordering.
But even if that's the case, then it doesn't make sense to change list_prev_entry_safe() anywhere else.
Linus
On Tue, Apr 07, 2020 at 07:04:10PM +0100, Chris Wilson wrote:
Quoting Linus Torvalds (2020-04-07 18:28:34)
On Tue, Apr 7, 2020 at 9:04 AM Chris Wilson chris@chris-wilson.co.uk wrote:
[ . . . ]
There's some more shutting up required for KCSAN to bring the noise down to usable levels which I hope has been done so I don't have to argue for it, such as
diff --git a/include/linux/timer.h b/include/linux/timer.h
index 1e6650ed066d..c7c8dd89f279 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -164,7 +164,7 @@ static inline void destroy_timer_on_stack(struct timer_list *timer) { }
  */
 static inline int timer_pending(const struct timer_list * timer)
 {
-	return timer->entry.pprev != NULL;
+	return READ_ONCE(timer->entry.pprev) != NULL;
 }
This one is in mainline, courtesy of Eric Dumazet, though in a different form.
The rest are still TBD.
Thanx, Paul
On Tue, Apr 07, 2020 at 08:44:35AM -0700, Linus Torvalds wrote:
On Mon, Apr 6, 2020 at 8:10 PM Andrew Morton akpm@linux-foundation.org wrote:
From: Chris Wilson chris@chris-wilson.co.uk Subject: lib/list: prevent compiler reloads inside 'safe' list iteration
Instruct the compiler to read the next element in the list iteration once, and that it is not allowed to reload the value from the stale element later. This is important as during the course of the safe iteration, the stale element may be poisoned (unbeknownst to the compiler).
Andrew, Chris, this one looks rather questionable to me.
How the heck would the ->next pointer be changed without the compiler being aware of it? That implies a bug to begin with - possibly an inline asm that changes kernel memory without having a memory clobber.
Quite fundamentally, the READ_ONCE() doesn't seem to fix anything. If something else can change the list _concurrently_, it's still completely broken, and hiding the KASAN report is just hiding a bug.
What and where was the actual KASAN issue? The commit log doesn't say...
OK, it sounds like we need to specify what needs to be present in this sort of commit log. Please accept my apologies for this specification not already being in place.
How about the following?
o The KCSAN splat, optionally including file/line numbers.
o Any non-default Kconfig options that are required to reproduce the problem, along with any other repeat-by information.
o Detailed description of the problem this splat identifies, for example, the code might fail to acquire a necessary lock, a plain load might be vulnerable to compiler optimizations, and so on.
o If available, references to or excerpts from the comments and documentation defining the design rules that the old code violates.
o If the commit's effect is to silence the warning with no other algorithmic change, an explanation as to why this is the right thing to do. Continuing the plain-load example above, that load might be controlling a loop and the compiler might choose to hoist the load out of the loop, potentially resulting in an infinite loop (see the sketch after this list).
Another example might be diagnostic code where the accesses are not really part of the concurrency design, but where we need KCSAN checking the other code that implements that design.
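As a concrete sketch of that plain-load example (the names are made up):

static bool stop;

static void wait_for_stop(void)
{
	/*
	 * With a plain "while (!stop)" the compiler may hoist the load
	 * out of the loop and spin forever; READ_ONCE() forces a fresh
	 * load on each iteration, which is why this kind of change is
	 * more than just silencing the warning.
	 */
	while (!READ_ONCE(stop))
		cpu_relax();
}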
Thoughts? Additions? Subtractions?
Thanx, Paul
On Wed, Apr 8, 2020 at 1:26 PM Paul E. McKenney paulmck@kernel.org wrote:
OK, it sounds like we need to specify what needs to be present in this sort of commit log.
That would have helped.
But in this case, after having looked more at it, it wasn't that the commit log was not sufficient, it was that the change was wrong.
With a better commit log and a pointer to the KCSAN report, I would have been able to make a better judgment call earlier, and not have had to ask for more information.
But once I figured out what the problem was, it was clear that
(a) what the i915 driver does is at a minimum questionable, and quite likely actively buggy
(b) even if it's not buggy in the i915 driver, changing the list traversal macros to use READ_ONCE() would hide other places where it most definitely is buggy.
o The KCSAN splat, optionally including file/line numbers.
I do think that the KCSAN splat - very preferably simplified and with explanations - should basically always be included if KCSAN was the reason.
Of course, the "simplified and with explanations" may be simplified to the point where none of the original splat even remains. Those splats aren't so legible that anybody should feel like they should be included.
Put another way: an analysis that is so thorough that it makes the raw splat data pointless is a *good* thing, and if that means that no splat is needed, all the better.
o Detailed description of the problem this splat identifies, for example, the code might fail to acquire a necessary lock, a plain load might be vulnerable to compiler optimizations, and so on.
See above: if the description is detailed enough, then the splat itself becomes much less interesting.
But in this case, for example, it's worth pointing out that the "safe" list accessors are used all over the kernel, and they are very much _not_ thread-safe. The "safe" in them is purely about the current thread removing an entry, not some kind of general thread-safeness.
Which is why I decided I really hated that patch - it basically would make KCSAN unable to find _real_ races elsewhere, because it hid a very special race in the i915 driver.
So I started out with the "that can't be right" kind of general feeling of uneasiness about the patch. I ended up with a much more specific "no, that's very much not right" and no amount of commit log should have saved it.
As mentioned, I suspect that the i915 driver could actually use the RCU-safe list walkers. Their particular use is not about RCU per se, but it has some very similar rules about walking a list concurrently with somebody adding an entry to it.
And the RCU list walkers do end up doing special things to load the next pointer.
Whether we want to say - and document - that "ok, the RCU list walkers are also useful for this non-RCU use case" in general might be worth discussing.
It may be that the i915 use case is so special that it's only worth documenting there.
Linus
On Wed, Apr 08, 2020 at 02:14:38PM -0700, Linus Torvalds wrote:
On Wed, Apr 8, 2020 at 1:26 PM Paul E. McKenney paulmck@kernel.org wrote:
OK, it sounds like we need to specify what needs to be present in this sort of commit log.
That would have helped.
But in this case, after having looked more at it, it wasn't that the commit log was not sufficient, it was that the change was wrong.
With a better commit log and a pointer to the KCSAN report, I would have been able to make a better judgment call earlier, and not have had to ask for more information.
But once I figured out what the problem was, it was clear that
(a) what the i915 driver does is at a minimum questionable, and quite likely actively buggy
(b) even if it's not buggy in the i915 driver, changing the list traversal macros to use READ_ONCE() would hide other places where it most definitely is buggy.
Yeah, and I should especially have spotted this last one, given how many times I have run into it while using KCSAN within RCU. ;-/
In any case, we have review comments, discussion, and the occasional controversy for non-KCSAN bug fixes, so it is only reasonable to expect a similar process for KCSAN bug reports, right? My main goal here is to make things easier for people reviewing future KCSAN-inspired fixes.
o The KCSAN splat, optionally including file/line numbers.
I do think that the KCSAN splat - very preferably simplified and with explanations - should basically always be included if KCSAN was the reason.
Of course, the "simplified and with explanations" may be simplified to the point where none of the original splat even remains. Those splats aren't so legible that anybody should feel like they should be included.
Put another way: an analysis that is so thorough that it makes the raw splat data pointless is a *good* thing, and if that means that no splat is needed, all the better.
Fair point. The KCSAN splat has the added advantage of immediately answering the "but does this really happen?" question, but then again that question could also be answered by just stating that the issue was found using KCSAN.
o Detailed description of the problem this splat identifies, for example, the code might fail to acquire a necessary lock, a plain load might be vulnerable to compiler optimizations, and so on.
See above: if the description is detailed enough, then the splat itself becomes much less interesting.
But in this case, for example, it's worth pointing out that the "safe" list accessors are used all over the kernel, and they are very much _not_ thread-safe. The "safe" in them is purely about the current thread removing an entry, not some kind of general thread-safeness.
Which is why I decided I really hated that patch - it basically would make KCSAN unable to find _real_ races elsewhere, because it hid a very special race in the i915 driver.
So I started out with the "that can't be right" kind of general feeling of uneasiness about the patch. I ended up with a much more specific "no, that's very much not right" and no amount of commit log should have saved it.
As mentioned, I suspect that the i915 driver could actually use the RCU-safe list walkers. Their particular use is not about RCU per se, but it has some very similar rules about walking a list concurrently with somebody adding an entry to it.
And the RCU list walkers do end up doing special things to load the next pointer.
Whether we want to say - and document - that "ok, the RCU list walkers are also useful for this non-RCU use case" in general might be worth discussing.
It may be that the i915 use case is so special that it's only worth documenting there.
If documenting is the right approach, KCSAN's data_race() could be thought of as KCSAN-visible documentation.
I will touch base with Chris separately.
Thanx, Paul
On Wed, Apr 08, 2020 at 03:37:57PM -0700, Paul E. McKenney wrote:
On Wed, Apr 08, 2020 at 02:14:38PM -0700, Linus Torvalds wrote:
On Wed, Apr 8, 2020 at 1:26 PM Paul E. McKenney paulmck@kernel.org wrote:
OK, it sounds like we need to specify what needs to be present in this sort of commit log.
That would have helped.
But in this case, after having looked more at it, it wasn't that the commit log was not sufficient, it was that the change was wrong.
With a better commit log and a pointer to the KCSAN report, I would have been able to make a better judgment call earlier, and not have had to ask for more information.
But once I figured out what the problem was, it was clear that
(a) what the i915 driver does is at a minimum questionable, and quite likely actively buggy
(b) even if it's not buggy in the i915 driver, changing the list traversal macros to use READ_ONCE() would hide other places where it most definitely is buggy.
Yeah, and I should especially have spotted this last one, given how many times I have run into it while using KCSAN within RCU. ;-/
In any case, we have review comments, discussion, and the occasional controversy for non-KCSAN bug fixes, so it is only reasonable to expect a similar process for KCSAN bug reports, right? My main goal here is to make things easier for people reviewing future KCSAN-inspired fixes.
o The KCSAN splat, optionally including file/line numbers.
I do think that the KCSAN splat - very preferably simplified and with explanations - should basically always be included if KCSAN was the reason.
Of course, the "simplified and with explanations" may be simplified to the point where none of the original splat even remains. Those splats aren't so legible that anybody should feel like they should be included.
Put another way: an analysis that is so thorough that it makes the raw splat data pointless is a *good* thing, and if that means that no splat is needed, all the better.
Fair point. The KCSAN splat has the added advantage of immediately answering the "but does this really happen?" question, but then again that question could also be answered by just stating that the issue was found using KCSAN.
o Detailed description of the problem this splat identifies, for example, the code might fail to acquire a necessary lock, a plain load might be vulnerable to compiler optimizations, and so on.
See above: if the description is detailed enough, then the splat itself becomes much less interesting.
But in this case, for example, it's worth pointing out that the "safe" list accessors are used all over the kernel, and they are very much _not_ thread-safe. The "safe" in them is purely about the current thread removing an entry, not some kind of general thread-safeness.
Which is why I decided I really hated that patch - it basically would make KCSAN unable to find _real_ races elsewhere, because it hid a very special race in the i915 driver.
So I started out with the "that can't be right" kind of general feeling of uneasiness about the patch. I ended up with a much more specific "no, that's very much not right" and no amount of commit log should have saved it.
As mentioned, I suspect that the i915 driver could actually use the RCU-safe list walkers. Their particular use is not about RCU per se, but it has some very similar rules about walking a list concurrently with somebody adding an entry to it.
And the RCU list walkers do end up doing special things to load the next pointer.
Whether we want to say - and document - that "ok, the RCU list walkers are also useful for this non-RCU use case" in general might be worth discussing.
This might work quite well, given the lockdep annotations that Joel added to list_for_each_entry_rcu(). For example, lockdep_assert_held() would tell it that it didn't need to be in an RCU reader as long as the lock passed to lockdep_assert_held() was held at that point.
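For example (made-up lock and list names, just a sketch of that lockdep-condition form), the optional condition argument would let the walk be legal either under rcu_read_lock() or while holding the lock:

	/* legal inside an RCU reader *or* while holding requests_lock */
	list_for_each_entry_rcu(rq, &requests, link,
				lockdep_is_held(&requests_lock)) {
		/* ... */
	}

The exact annotation would of course depend on the actual i915 locking.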
It may be that the i915 use case is so special that it's only worth documenting there.
If documenting is the right approach, KCSAN's data_race() could be thought of as KCSAN-visible documentation.
I will touch base with Chris separately.
Of these options, what looks best to you? Or would something else work better?
Thanx, Paul
On Wed, Apr 08, 2020 at 02:14:38PM -0700, Linus Torvalds wrote:
(a) what the i915 driver does is at a minimum questionable, and quite likely actively buggy
By the way, this doesn't even seem like the only place in i915 where locking and object lifetime semantics are... special. I found a combined deadlock and use-after-free last week, inside of interwoven recursive call chains, and posted a patch here [1]. Unfortunately, it's been ignored thus far. But the interesting thing is how the bizarre object lifetime semantics there required even more weird things to enact a proper fix. That patch really should be merged (Chris?), but moreover, it'd be nice to see a whole release cycle devoted to just cleaning some of this stuff up. Showstopper bugs and hangs in i915 have really multiplied in recent times.
Sultan
[1] https://lore.kernel.org/lkml/20200407064007.7599-1-sultan@kerneltoast.com