On Wed, Apr 8, 2020 at 1:26 PM Paul E. McKenney paulmck@kernel.org wrote:
OK, it sounds like we need to specify what needs to be present in this sort of commit log.
That would have helped.
But in this case, after having looked more at it, it wasn't that the commit log was not sufficient, it was that the change was wrong.
With a better commit log and a pointer to the KCSAN report, I would have been able to make a better judgment call earlier, and not have had to ask for more information.
But once I figured out what the problem was, it was clear that
(a) what the i915 driver does is at a minimum questionable, and quite likely actively buggy
(b) even if it's not buggy in the i915 driver, changing the list traversal macros to use READ_ONCE() would hide other places where it most definitely is buggy.
o The KCSAN splat, optionally including file/line numbers.
I do think that the KCSAN splat - very preferably simplified and with explanations - should basically always be included if KCSAN was the reason.
Of course, the "simplified and with explanations" may be simplified to the point where none of the original splat even remains. Those splats aren't so legible that anybody should feel like they should be included.
Put another way: an analysis that is so thorough that it makes the raw splat data pointless is a *good* thing, and if that means that no splat is needed, all the better.
o Detailed description of the problem this splat identifies, for example, the code might fail to acquire a necessary lock, a plain load might vulnerable to compiler optimizations, and so on.
See above: if the description is detailed enough, then the splat itself becomes much less interesting.
But in this case, for example, it's worth pointing out that the "safe" list accessors are used all over the kernel, and they are very much _not_ thread-safe. The "safe" in them is purely about the current thread removing an entry, not some kind of general thread-safeness.
Which is why I decided I really hated that patch - it basically would make KCSAN unable to find _real_ races elsewhere, because it hid a very special race in the i915 driver.
So I started out with the "that can't be right" kind of general feeling of uneasiness about the patch. I ended up with a much more specific "no, that's very much not right" and no amount of commit log should have saved it.
As mentioned, I suspect that the i915 driver could actually use the RCU-safe list walkers. Their particular use is not about RCU per se, but it has some very similar rules about walking a list concurrently with somebody adding an entry to it.
And the RCU list walkers do end up doing special things to load the next pointer.
Whether we want to say - and document - that "ok, the RCU list walkers are also useful for this non-RCU use case" in general might be worth discussing.
It may be that the i915 use case is so special that it's only worth documenting there.
Linus