Thanks to Joe Jin's reminding, this patch is applicable to mainline linux kernel, although there is no issue due to this kind of bug in mainline kernel.
Therefore, can I first submit this patch to mainline kernel and then backport it to stable linux with more detailed explanation how the issue is reproduced on xen?
This would help synchronize stable with mainline better.
Thank you very much!
Dongli Zhang
On 3/5/19 3:59 PM, Dongli Zhang wrote:
[ Not relevant upstream, therefore no upstream commit. ]
To fix, use jiffies64_to_nsecs() directly instead of deriving the result according to jiffies_to_usecs().
As the return type of jiffies_to_usecs() is 'unsigned int', when the return value is more than the size of 'unsigned int', the leading 32 bits would be discarded.
Suppose USEC_PER_SEC=1000000L and HZ=1000, below are the expected and actual incorrect result of jiffies_to_usecs(0x7770ef70):
- expected : jiffies_to_usecs(0x7770ef70) = 0x000001d291274d80
- incorrect : jiffies_to_usecs(0x7770ef70) = 0x0000000091274d80
The leading 0x000001d200000000 is discarded.
After xen vcpu hotplug and when the new vcpu steal clock is calculated for the first time, the result of this_rq()->prev_steal_time in steal_account_process_tick() would be far smaller than the expected value, due to that jiffies_to_usecs() discards the leading 32 bits.
As a result, the diff between current steal and this_rq()->prev_steal_time is always very large. Steal usage would become 100% when the initial steal clock obtained from xen hypervisor is very large during xen vcpu hotplug, that is, when the guest is already up for a long time.
The bug can be detected by doing the following:
- Boot xen guest with vcpus=2 and maxvcpus=4
- Leave the guest running for a month so that the initial steal clock for the new vcpu would be very large
- Hotplug 2 extra vcpus
- The steal time of new vcpus in /proc/stat would increase abnormally and sometimes steal usage in top can become 100%
This was incidentally fixed in the patch set starting by commit 93825f2ec736 ("jiffies: Reuse TICK_NSEC instead of NSEC_PER_JIFFY") and ended with commit b672592f0221 ("sched/cputime: Remove generic asm headers").
This version applies to the v4.9 series.
Link: https://lkml.org/lkml/2019/2/28/1373 Suggested-by: Juergen Gross jgross@suse.com Signed-off-by: Dongli Zhang dongli.zhang@oracle.com
include/linux/jiffies.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h index 734377a..94aff43 100644 --- a/include/linux/jiffies.h +++ b/include/linux/jiffies.h @@ -287,13 +287,13 @@ extern unsigned long preset_lpj; extern unsigned int jiffies_to_msecs(const unsigned long j); extern unsigned int jiffies_to_usecs(const unsigned long j); +extern u64 jiffies64_to_nsecs(u64 j);
static inline u64 jiffies_to_nsecs(const unsigned long j) {
- return (u64)jiffies_to_usecs(j) * NSEC_PER_USEC;
- return jiffies64_to_nsecs(j);
} -extern u64 jiffies64_to_nsecs(u64 j);
extern unsigned long __msecs_to_jiffies(const unsigned int m); #if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ) /*