On 15/01/2019 11:43, Thomas Gleixner wrote:
On Mon, 14 Jan 2019, Juergen Gross wrote:
Commit f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface") broke Xen guest time handling across migration:
[ 187.249951] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 187.251137] OOM killer disabled.
[ 187.251137] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 187.252299] suspending xenstore...
[ 187.266987] xen:grant_table: Grant tables using version 1 layout
[18446743811.706476] OOM killer enabled.
[18446743811.706478] Restarting tasks ... done.
[18446743811.720505] Setting capacity to 16777216
I see that it's broken, but the changelog could do with an explanation WHY it broke.
This seems to be rather complex.
I believe the mentioned commit simply ignored Xen guests, resulting in a "stable" clock where it shouldn't be one, but maybe I have missed another aspect of this commit which is to blame. I tried to fix that by replacing using_native_sched_clock() with a hypervisor-specific pvops function.
Unfortunately this didn't work, maybe due to other uses of using_native_sched_clock() added by later patches. In the end it was quite clear that updating the Xen clock offset was the right thing to do, so I ended up with this patch.
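To illustrate the idea of updating the clock offset across migration, here is a minimal user-space sketch. It is not the patch itself: struct guest_clock and the helpers guest_sched_clock(), guest_clock_suspend(), guest_clock_resume() and migrate() are hypothetical names, and the model assumes the guest clock is simply the host's raw clock minus an offset, in unsigned 64-bit nanoseconds.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical model, not the kernel's data structures. */
struct guest_clock {
	uint64_t host_ns;   /* raw clock value of the current host */
	uint64_t offset_ns; /* subtracted so the guest clock starts near 0 */
	uint64_t saved_ns;  /* guest clock value recorded at suspend */
};

/* Guest-visible clock; wraps around if host_ns < offset_ns. */
static uint64_t guest_sched_clock(const struct guest_clock *c)
{
	return c->host_ns - c->offset_ns;
}

/* Remember the guest-visible time before the guest is suspended. */
static void guest_clock_suspend(struct guest_clock *c)
{
	c->saved_ns = guest_sched_clock(c);
}

/* Recompute the offset so the guest clock continues from saved_ns. */
static void guest_clock_resume(struct guest_clock *c)
{
	c->offset_ns = c->host_ns - c->saved_ns;
}

/*
 * Simulate a migration to a host whose raw clock reads new_host_ns.
 * Returns the guest clock right after resume, with or without the
 * offset update.
 */
static uint64_t migrate(struct guest_clock *c, uint64_t new_host_ns,
			int update_offset)
{
	guest_clock_suspend(c);
	c->host_ns = new_host_ns;
	if (update_offset)
		guest_clock_resume(c);
	return guest_sched_clock(c);
}
```

With made-up numbers: host_ns = 687 s and offset_ns = 500 s give a guest clock of 187 s. Migrating to a host whose raw clock reads 238 s without updating the offset yields 2^64 ns minus 262 s, i.e. about 18446743811.7 s, which is the same kind of jump as in the log above. With the offset update the guest clock stays at 187 s and continues from there.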
Juergen