On Wed, Dec 13, 2017 at 11:11:16AM +0100, Rabin Vincent wrote:
From: Rabin Vincent rabinv@axis.com
softirq time accounting is broken on v4.9.x if ksoftirqd runs.
With CONFIG_IRQ_TIME_ACCOUNTING=y # CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
this test code:
struct tasklet_struct tasklet;
static void delay_tasklet(unsigned long data) { udelay(10); tasklet_schedule(&tasklet); }
tasklet_init(&tasklet, delay_tasklet, 0); tasklet_schedule(&tasklet);
results in:
$ while :; do grep cpu0 /proc/stat; done cpu0 5 0 80 25 16 107 1 0 0 0 cpu0 5 0 80 25 16 107 0 0 0 0 cpu0 5 0 80 25 16 107 0 0 0 0 cpu0 5 0 80 25 16 107 0 0 0 0 cpu0 5 0 81 25 16 107 0 0 0 0 cpu0 5 0 81 25 16 107 1 0 0 0 cpu0 5 0 81 25 16 108 18446744073709551615 0 0 0 cpu0 5 0 81 25 16 108 18446744073709551615 0 0 0 cpu0 5 0 81 25 16 108 18446744073709551615 0 0 0 cpu0 5 0 81 25 16 108 0 0 0 0 cpu0 6 0 81 25 16 108 0 0 0 0 cpu0 6 0 81 25 16 108 0 0 0 0
As can be seen, the softirq numbers are totally bogus.
When ksoftirq is running, irqtime_account_process_tick() increments cpustat[CPUSTAT_SOFTIRQ]. This causes the "nsecs_to_cputime64(irqtime)
- cpustat[CPUSTAT_SOFTIRQ]" calculation in irqtime_account_update() to
underflow the next time a softirq is handled leading to the above values.
The underflow bug was added by 57430218317e5b280 ("sched/cputime: Count actually elapsed irq & softirq time").
But ksoftirqd accounting was wrong even in earlier kernels. In earlier kernels, after a ksoftirq run, the kernel would simply stop accounting softirq time spent outside of ksoftirqd until that (accumulated) time exceeded the time for which ksofirqd previously had run.
Fix both the underflow and the wrong accounting by using a counter specifically for the non-ksoftirqd softirq case.
This code has been fixed in current mainline by a499a5a14db ("sched/cputime: Increment kcpustat directly on irqtime account") [note also the followup 25e2d8c1b9e327e ("sched/cputime: Fix ksoftirqd cputime accounting regression")], but that patch is a part of the many changes for eliminating of cputime_t so it does not seem suitable for backport.
I _really_ only want to take the exact upstream patches, as every time we do something like what you are proposing to do here, we get it wrong. Seriously, our track record is horrible. Like 90% wrong.
Can you try just those two patches instead?
thanks,
greg k-h