On Wed, Sep 15, 2021 at 11:41:42AM -0700, Linus Torvalds wrote:
On Wed, Sep 15, 2021 at 11:31 AM Frederic Weisbecker frederic@kernel.org wrote:
Right, this should fix the issue: https://lore.kernel.org/lkml/20210913145332.232023-1-frederic@kernel.org/
Hmm.
Can you explain why the fix isn't just to revert that original commit?
It looks like the only real difference is that now it does *extra work* with all that tick_nohz_dep_set_signal().
Isn't it easier to just leave any old timer ticking, and not do the extra work until it expires and you notice "ok, it's not important"?
IOW, that original commit explicitly broke the only case it changed - the timer being disabled. So why isn't it just reverted? What is it that keeps us wanting to do the extra work for the disabled timer case?
As long as it's fixed, I'm all ok with this, but I'm looking at the commit message for that broken commit, and I'm looking at the commit message for the fix, and I'm not seeing an actual _explanation_ for this churn.
Indeed, the commit failed to correctly explain the actual issue.
When a process-wide POSIX CPU timer (e.g. an itimer) is elapsing, all the threads inside that process contend on their cputime updates (account_group_user_time() and account_group_system_time()).
The overhead only consists of concurrent atomic64_add() calls on every tick, but still... And it can persist for a very long while, until the previous value of the timer expiry is reached.
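
To make the contention concrete, here is a minimal userspace model, not kernel code. It assumes C11 atomics and pthreads; struct group_cputime_model, model_account_user_time(), NR_TICKS and TICK_NSEC are hypothetical names made up for the illustration. Every thread adds its tick's worth of time to one shared counter, which is the access pattern described above:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_THREADS 4
#define NR_TICKS   100000
#define TICK_NSEC  1000000ULL	/* hypothetical 1 ms tick, for illustration only */

/* Hypothetical stand-in for the process-wide cputime sum shared by all threads. */
struct group_cputime_model {
	atomic_uint_fast64_t utime;
};

/* Models the shared update: one atomic add on the same counter per tick. */
static void model_account_user_time(struct group_cputime_model *g, uint64_t ns)
{
	atomic_fetch_add_explicit(&g->utime, ns, memory_order_relaxed);
}

/* Each thread plays the role of one CPU taking NR_TICKS ticks. */
static void *tick_loop(void *arg)
{
	struct group_cputime_model *g = arg;

	for (int i = 0; i < NR_TICKS; i++)
		model_account_user_time(g, TICK_NSEC);
	return NULL;
}

/* Runs NR_THREADS tick loops against one shared counter, returns the sum. */
static uint64_t run_model(void)
{
	struct group_cputime_model g;
	pthread_t threads[NR_THREADS];

	atomic_init(&g.utime, 0);
	for (int i = 0; i < NR_THREADS; i++)
		pthread_create(&threads[i], NULL, tick_loop, &g);
	for (int i = 0; i < NR_THREADS; i++)
		pthread_join(threads[i], NULL);
	return atomic_load(&g.utime);
}
```

The sum always comes out right; the cost is that every add bounces the same cache line between the CPUs involved, and that is the part that keeps going for as long as the stale expiry stays in place.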
The other symptom, more of a corner case for most, is that the CPUs running any thread of that process won't be able to enter nohz_full mode, again until the old timer expiry is reached.
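
For reference, the userspace sequence that leads to this situation can be sketched as follows. This is only an illustration, assuming the standard setitimer()/getitimer() interface; the helper names and the far-away expiry value are hypothetical. It arms a process-wide ITIMER_PROF far in the future and then disables it right away:

```c
#include <string.h>
#include <sys/time.h>

/*
 * Arm a process-wide ITIMER_PROF with an expiry far in the future, then
 * disable it immediately. Returns 0 on success, -1 on failure.
 */
static int arm_then_disarm_itimer(long far_secs)
{
	struct itimerval it;

	memset(&it, 0, sizeof(it));
	it.it_value.tv_sec = far_secs;	/* one-shot, no interval */
	if (setitimer(ITIMER_PROF, &it, NULL) < 0)
		return -1;

	/* A zero it_value disables the timer. */
	memset(&it, 0, sizeof(it));
	return setitimer(ITIMER_PROF, &it, NULL);
}

/* Returns 1 if ITIMER_PROF is currently disabled, 0 if armed, -1 on error. */
static int itimer_is_disarmed(void)
{
	struct itimerval cur;

	if (getitimer(ITIMER_PROF, &cur) < 0)
		return -1;
	return cur.it_value.tv_sec == 0 && cur.it_value.tv_usec == 0;
}
```

From userspace the timer reads back as disabled right after the second setitimer() call. The two symptoms described above are what remains on the kernel side until the old expiry value is reached.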