On Tue, Apr 7, 2020 at 10:28 AM Linus Torvalds torvalds@linux-foundation.org wrote:
There may be something really subtle going on, but it really smells like "two threads are modifying the same list at the same time".
And there's no way that the READ_ONCE() will fix that bug, it will only make KASAN shut up about it.
[ And by KASAN I obviously meant KCSAN every time ]
An alternative is that it's just a KCSAN bug or false positive for some other reason. Which is possible. But the more I look at this, the more I think you really have a bug in your list handling.
I'm just not convinced the whole "we have a race where we randomly add a tail object" in another thread is a valid reason for making those "safe" list accessors use READ_ONCE.
The thing is, they were never designed to be safe wrt concurrent accesses. They are safe from the _same_ thread removing the current entry, not from other threads changing the entries - whether it be the last one or not.
Because if _another_ thread removes (or adds) an entry, you have a whole new set of issues. READ_ONCE() isn't sufficient. You need to have memory ordering guarantees etc.
For example, the thread that adds another entry that might - or might not - be visible without locking, would have to fully initialize the entry, and set the ->next pointer on it, before it adds it to the list.
I suspect you could use the RCU list walkers for a use-case like this. So __list_add_rcu() for example uses the proper rcu_assign_pointer() (which uses smp_store_release()) to make sure that the newly added entry is actually _ordered_ wrt the stores to the entry.
But the "safe" list functions simply do not have those kinds of ordering guarantees, and doing concurrent list_add() simply *CANNOT* be right. Adding a READ_ONCE() does not in any way make it right (although it might work in practice on x86).
Linus