On 8/11/22 16:22, Jason Gunthorpe wrote:
On Thu, Aug 11, 2022 at 10:28:27PM +0100, Matthew Wilcox wrote:
On Thu, Aug 11, 2022 at 01:43:09PM -0700, Linus Torvalds wrote:
May I suggest going one step further, and making these WARN_ON_ONCE() instead.
From personal experience, once some scheduler bug (or task struct
corruption) happens, it often *keeps* happening, and the logs just fill up with more and more data, to the point where you lose sight of the original report (and the machine can even become unusable just from the logging).
I've been thinking about magically turning all the WARN_ON_ONCE() into (effectively) WARN_ON_RATELIMIT(). I had some patches in that direction a while ago but never got round to tidying them up for submission.
If you do that, I'd like to suggest that you avoid using magic here, but instead just rename at the call sites.
Because:
First and foremost, something named WARN_ON_ONCE() clearly has a solemn responsibility to warn exactly "once times"! :)
Second, it's not yet clear (or is it?) that WARN_ON_ONCE() is always worse than rate limiting. In my experience it's a trade-off rather than a clear win for either one. The single report from the _ONCE variant can be lost if the kernel log wraps, while the _RATELIMIT variant, on the other hand, may still be excessive.
And finally, if it *is* agreed on here that WARN_ON_RATELIMIT() is always better than WARN_ON_ONCE(), then there is still no harm in spending a patch or two (coccinelle...) to rename WARN_ON_ONCE() --> WARN_ON_RATELIMIT(), so that we end up with accurate names.
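To make the trade-off in the second point concrete, here is a small userspace C model. It is only a sketch. The names model_warn_on_once(), model_warn_on_ratelimit(), struct model_ratelimit and simulate() are hypothetical. The limiter is an assumed fixed-window simplification that allows 10 reports per 100 ticks. None of this is the kernel's WARN_ON_ONCE()/WARN_ON_RATELIMIT() or ratelimit_state code. Like the kernel macros, each helper returns the condition so that a caller can branch on it.

```c
#include <stdbool.h>

/*
 * Userspace model of warn-once vs. rate-limited warnings.
 * Hypothetical helpers for illustration only; these are NOT the
 * kernel's WARN_ON_ONCE()/WARN_ON_RATELIMIT() implementations.
 */

/* Warn-once: report only the first time the condition is true.
 * Returns the condition so callers can write `if (model_warn_on_once(...))`. */
static bool model_warn_on_once(bool cond, bool *warned, int *reports)
{
    if (cond && !*warned) {
        *warned = true;
        (*reports)++;   /* stand-in for printing a warning + backtrace */
    }
    return cond;
}

/* Assumed fixed-window limiter: at most `burst` reports per `interval` ticks. */
struct model_ratelimit {
    long interval;  /* window length, in ticks */
    int burst;      /* reports allowed per window */
    long begin;     /* tick at which the current window started */
    int used;       /* reports already emitted in the current window */
};

static bool model_ratelimit_ok(struct model_ratelimit *rs, long now)
{
    if (now - rs->begin >= rs->interval) {  /* window expired: start a new one */
        rs->begin = now;
        rs->used = 0;
    }
    if (rs->used < rs->burst) {
        rs->used++;
        return true;
    }
    return false;   /* suppressed */
}

/* Rate-limited warning: report while the limiter allows it; still returns cond. */
static bool model_warn_on_ratelimit(bool cond, struct model_ratelimit *rs,
                                    long now, int *reports)
{
    if (cond && model_ratelimit_ok(rs, now))
        (*reports)++;
    return cond;
}

/* Simulate a bug that trips on every tick; count what each policy reports. */
static void simulate(long ticks, int *once_reports, int *ratelimit_reports)
{
    bool warned = false;
    struct model_ratelimit rs = { .interval = 100, .burst = 10, .begin = 0, .used = 0 };

    *once_reports = 0;
    *ratelimit_reports = 0;
    for (long t = 0; t < ticks; t++) {
        model_warn_on_once(true, &warned, once_reports);
        model_warn_on_ratelimit(true, &rs, t, ratelimit_reports);
    }
}
```

With a bug that fires on every one of 1000 ticks, simulate(1000, &once, &rl) yields once == 1 and rl == 100 (10 reports in each 100-tick window). The once policy is easy to lose if the log wraps. The limiter keeps reporting, but it still adds steady volume, which is the trade-off described in the second point.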
I often wonder if we have a justification for WARN_ON to even exist. I see a lot of pressure to turn things into WARN_ON_ONCE, based on the logic that spamming makes it useless.
Agreed. WARN_ON_ONCE() or WARN_ON_RATELIMIT(), take your pick. But not WARN_ON_EVERY_TIME()--that usually causes serious problems in the logs.
thanks,