Re: [Xen-devel] [PATCH v2] xen: Fix x86 sched_clock() interface for xen

11 Jan 2019


      On 1/11/19 4:57 PM, Hans van Kranenburg wrote:
...
On 1/11/19 3:01 PM, Juergen Gross wrote:
...
On 11/01/2019 14:12, Hans van Kranenburg wrote:
...
Hi,
On 1/11/19 1:08 PM, Juergen Gross wrote:
...
Commit f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable'
sched_clock() interface") broke Xen guest time handling across
migration:
[  187.249951] Freezing user space processes ... (elapsed 0.001 seconds) done.
[  187.251137] OOM killer disabled.
[  187.251137] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[  187.252299] suspending xenstore...
[  187.266987] xen:grant_table: Grant tables using version 1 layout
[18446743811.706476] OOM killer enabled.
[18446743811.706478] Restarting tasks ... done.
[18446743811.720505] Setting capacity to 16777216
Fix that by setting xen_sched_clock_offset at resume time to ensure a
monotonic clock value.
[...]
I'm throwing around a PV domU over a bunch of test servers with live
migrate now, and in between the kernel logging, I'm seeing this:
[Fri Jan 11 13:58:42 2019] Freezing user space processes ... (elapsed
0.002 seconds) done.
[Fri Jan 11 13:58:42 2019] OOM killer disabled.
[Fri Jan 11 13:58:42 2019] Freezing remaining freezable tasks ...
(elapsed 0.000 seconds) done.
[Fri Jan 11 13:58:42 2019] suspending xenstore...
[Fri Jan 11 13:58:42 2019] ------------[ cut here ]------------
[Fri Jan 11 13:58:42 2019] Current state: 1
[Fri Jan 11 13:58:42 2019] WARNING: CPU: 3 PID: 0 at
kernel/time/clockevents.c:133 clockevents_switch_state+0x48/0xe0
[...]
This always happens on every *first* live migrate that I do after
starting the domU.
Yeah, its a WARN_ONCE().
Ok, false alarm. It's there, but not caused by this patch.
I changed the WARN_ONCE to WARN for funs, and now I get it a lot more
already (v2):
https://paste.debian.net/plainh/d535a379
...
...
And you didn't see it with v1 of the patch?
No.
I was wrong. I tried a bit more, and I can also reproduce without v1 or
v2 patch at all, and I can reproduce it with v4.19.9. Just sometimes
needs a dozen times more live migrating it before it shows up.
I cannot make it happen to show up with the Debian 4.19.9 distro kernel,
that's why it was new for me.
So, let's ignore it in this thread now.
...
...
On the first glance this might be another bug just being exposed by
my patch.
I'm investigating further, but this might take some time. Could you
meanwhile verify the same happens with kernel 5.0-rc1? That was the
one I tested with and I didn't spot that WARN.
I have Linux 5.0-rc1 with v2 on top now, which gives me this on live
migrate:
[...]
[   51.871076] BUG: unable to handle kernel NULL pointer dereference at
0000000000000098
[   51.871091] #PF error: [normal kernel read fault]
[   51.871100] PGD 0 P4D 0
[   51.871109] Oops: 0000 [#1] SMP NOPTI
[   51.871117] CPU: 0 PID: 36 Comm: xenwatch Not tainted 5.0.0-rc1 #1
[   51.871132] RIP: e030:blk_mq_map_swqueue+0x103/0x270
[...]
Dunno about all the 5.0-rc1 crashes yet. I can provide more feedback
about that if you want, but not in here I presume.
Hans

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [Xen-devel] [PATCH v2] xen: Fix x86 sched_clock() interface for xen