Commit f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface") broke Xen guest time handling across migration:
[ 187.249951] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 187.251137] OOM killer disabled.
[ 187.251137] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 187.252299] suspending xenstore...
[ 187.266987] xen:grant_table: Grant tables using version 1 layout
[18446743811.706476] OOM killer enabled.
[18446743811.706478] Restarting tasks ... done.
[18446743811.720505] Setting capacity to 16777216
Fix that by setting xen_sched_clock_offset at resume time to ensure a monotonic clock value.
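(The huge timestamps in the log above are sched_clock() values just shy of 2^64 ns: the clock went slightly negative after resume and is printed as an unsigned value.)

As a sketch of the invariant the patch establishes -- illustration only, not part of the patch; the helper names are hypothetical, while the variable and function names mirror the diff below -- sched_clock() on Xen returns xen_clocksource_read() - xen_sched_clock_offset, so saving that difference before suspend and recomputing the offset after resume makes sched_clock() continue exactly where it left off, even though the underlying PV clock restarts on the target host:

	static u64 xen_clock_value_saved;

	/* Hypothetical helper: run before suspend. */
	static void sched_clock_save(void)
	{
		/* Remember the last sched_clock() reading. */
		xen_clock_value_saved = xen_clocksource_read() -
					xen_sched_clock_offset;
	}

	/* Hypothetical helper: run after resume (and after
	 * pvclock_resume(), so xen_clocksource_read() is usable). */
	static void sched_clock_restore(void)
	{
		/*
		 * Choose the new offset so that the next reading,
		 * xen_clocksource_read() - xen_sched_clock_offset,
		 * equals the saved value.
		 */
		xen_sched_clock_offset = xen_clocksource_read() -
					 xen_clock_value_saved;
	}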
Fixes: f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface")
Cc: <stable@vger.kernel.org> # 4.11
Reported-by: Hans van Kranenburg <hans@knorrie.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/xen/time.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 72bf446c3fee..3c1dfdf26284 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -379,12 +379,15 @@ static const struct pv_time_ops xen_time_ops __initconst = {
 };
 
 static struct pvclock_vsyscall_time_info *xen_clock __read_mostly;
+static u64 xen_clock_value_saved;
 
 void xen_save_time_memory_area(void)
 {
 	struct vcpu_register_time_memory_area t;
 	int ret;
 
+	xen_clock_value_saved = xen_clocksource_read() - xen_sched_clock_offset;
+
 	if (!xen_clock)
 		return;
 
@@ -404,7 +407,7 @@ void xen_restore_time_memory_area(void)
 	int ret;
 
 	if (!xen_clock)
-		return;
+		goto out;
 
 	t.addr.v = &xen_clock->pvti;
 
@@ -421,6 +424,11 @@ void xen_restore_time_memory_area(void)
 	if (ret != 0)
 		pr_notice("Cannot restore secondary vcpu_time_info (err %d)",
 			  ret);
+
+out:
+	/* Need pvclock_resume() before using xen_clocksource_read(). */
+	pvclock_resume();
+	xen_sched_clock_offset = xen_clocksource_read() - xen_clock_value_saved;
 }
 
 static void xen_setup_vsyscall_time_info(void)
Hi,
On 1/11/19 1:08 PM, Juergen Gross wrote:
> Commit f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable'
> sched_clock() interface") broke Xen guest time handling across migration:
>
> [...]
>
> Fix that by setting xen_sched_clock_offset at resume time to ensure a
> monotonic clock value.
> [...]
I'm throwing around a PV domU over a bunch of test servers with live migrate now, and in between the kernel logging, I'm seeing this:
[Fri Jan 11 13:58:42 2019] Freezing user space processes ... (elapsed 0.002 seconds) done.
[Fri Jan 11 13:58:42 2019] OOM killer disabled.
[Fri Jan 11 13:58:42 2019] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
[Fri Jan 11 13:58:42 2019] suspending xenstore...
[Fri Jan 11 13:58:42 2019] ------------[ cut here ]------------
[Fri Jan 11 13:58:42 2019] Current state: 1
[Fri Jan 11 13:58:42 2019] WARNING: CPU: 3 PID: 0 at kernel/time/clockevents.c:133 clockevents_switch_state+0x48/0xe0
[Fri Jan 11 13:58:42 2019] Modules linked in:
[Fri Jan 11 13:58:42 2019] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.14+ #1
[Fri Jan 11 13:58:42 2019] RIP: e030:clockevents_switch_state+0x48/0xe0
[Fri Jan 11 13:58:42 2019] Code: 8b 0c cd 40 ee 00 82 e9 d6 5b d1 00 80 3d 8e 8d 43 01 00 75 17 89 c6 48 c7 c7 92 62 1f 82 c6 05 7c 8d 43 01 01 e8 f8 22 f8 ff <0f> 0b 5b 5d f3 c3 83 e2 01 74 f7 48 8b 47 48 48 85 c0 74 69 48 89
[Fri Jan 11 13:58:42 2019] RSP: e02b:ffffc90000787e30 EFLAGS: 00010082
[Fri Jan 11 13:58:42 2019] RAX: 0000000000000000 RBX: ffff88805df94d80 RCX: 0000000000000006
[Fri Jan 11 13:58:42 2019] RDX: 0000000000000007 RSI: 0000000000000001 RDI: ffff88805df963f0
[Fri Jan 11 13:58:42 2019] RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000119
[Fri Jan 11 13:58:42 2019] R10: 0000000000000020 R11: ffffffff82af4e2d R12: ffff88805df9ca40
[Fri Jan 11 13:58:42 2019] R13: 0000000dd28d6ca6 R14: 0000000000000000 R15: 0000000000000000
[Fri Jan 11 13:58:42 2019] FS:  00007f34193ce040(0000) GS:ffff88805df80000(0000) knlGS:0000000000000000
[Fri Jan 11 13:58:42 2019] CS:  e033 DS: 002b ES: 002b CR0: 0000000080050033
[Fri Jan 11 13:58:42 2019] CR2: 00007f6220be50e1 CR3: 000000005ce5c000 CR4: 0000000000002660
[Fri Jan 11 13:58:42 2019] Call Trace:
[Fri Jan 11 13:58:42 2019]  tick_program_event+0x4b/0x70
[Fri Jan 11 13:58:42 2019]  hrtimer_try_to_cancel+0xa8/0x100
[Fri Jan 11 13:58:42 2019]  hrtimer_cancel+0x10/0x20
[Fri Jan 11 13:58:42 2019]  __tick_nohz_idle_restart_tick+0x45/0xd0
[Fri Jan 11 13:58:42 2019]  tick_nohz_idle_exit+0x93/0xa0
[Fri Jan 11 13:58:42 2019]  do_idle+0x149/0x260
[Fri Jan 11 13:58:42 2019]  cpu_startup_entry+0x6a/0x70
[Fri Jan 11 13:58:42 2019] ---[ end trace 519c07d1032908f8 ]---
[Fri Jan 11 13:58:42 2019] xen:grant_table: Grant tables using version 1 layout
[Fri Jan 11 13:58:42 2019] OOM killer enabled.
[Fri Jan 11 13:58:42 2019] Restarting tasks ... done.
[Fri Jan 11 13:58:42 2019] Setting capacity to 6291456
[Fri Jan 11 13:58:42 2019] Setting capacity to 10485760
This always happens on every *first* live migrate that I do after starting the domU.
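(If I read include/linux/clockchips.h right, "Current state: 1" is CLOCK_EVT_STATE_SHUTDOWN, so something is apparently trying to program a clock event device that is still shut down:)

	enum clock_event_state {
		CLOCK_EVT_STATE_DETACHED,
		CLOCK_EVT_STATE_SHUTDOWN,	/* == 1, the state in the WARN */
		CLOCK_EVT_STATE_PERIODIC,
		CLOCK_EVT_STATE_ONESHOT,
		CLOCK_EVT_STATE_ONESHOT_STOPPED,
	};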
OTOH, with v1 I've indeed also observed the 'leftover' time jump on the first line when resuming. It didn't always happen, but it seems to not happen at all anymore now, so that's good.
Hans
On 11/01/2019 14:12, Hans van Kranenburg wrote:
> Hi,
>
> On 1/11/19 1:08 PM, Juergen Gross wrote:
>> Commit f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable'
>> sched_clock() interface") broke Xen guest time handling across migration:
>> [...]
>
> I'm throwing around a PV domU over a bunch of test servers with live
> migrate now, and in between the kernel logging, I'm seeing this:
>
> [...]
> [Fri Jan 11 13:58:42 2019] WARNING: CPU: 3 PID: 0 at kernel/time/clockevents.c:133 clockevents_switch_state+0x48/0xe0
> [...]
>
> This always happens on every *first* live migrate that I do after
> starting the domU.
Yeah, it's a WARN_ONCE().
And you didn't see it with v1 of the patch?
At first glance this might be another bug just being exposed by my patch.
I'm investigating further, but this might take some time. Could you meanwhile verify the same happens with kernel 5.0-rc1? That was the one I tested with and I didn't spot that WARN.
Juergen
On 1/11/19 3:01 PM, Juergen Gross wrote:
> On 11/01/2019 14:12, Hans van Kranenburg wrote:
>> Hi,
>>
>> On 1/11/19 1:08 PM, Juergen Gross wrote:
>>> Commit f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable'
>>> sched_clock() interface") broke Xen guest time handling across migration:
>>> [...]
>>
>> I'm throwing around a PV domU over a bunch of test servers with live
>> migrate now, and in between the kernel logging, I'm seeing this:
>>
>> [...]
>> [Fri Jan 11 13:58:42 2019] WARNING: CPU: 3 PID: 0 at kernel/time/clockevents.c:133 clockevents_switch_state+0x48/0xe0
>> [...]
>>
>> This always happens on every *first* live migrate that I do after
>> starting the domU.
>
> Yeah, it's a WARN_ONCE().
>
> And you didn't see it with v1 of the patch?
No.
> At first glance this might be another bug just being exposed by my
> patch.
>
> I'm investigating further, but this might take some time. Could you
> meanwhile verify the same happens with kernel 5.0-rc1? That was the one
> I tested with and I didn't spot that WARN.
I have Linux 5.0-rc1 with v2 on top now, which gives me this on live migrate:
[ 51.845967] xen:grant_table: Grant tables using version 1 layout
[ 51.871076] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
[ 51.871091] #PF error: [normal kernel read fault]
[ 51.871100] PGD 0 P4D 0
[ 51.871109] Oops: 0000 [#1] SMP NOPTI
[ 51.871117] CPU: 0 PID: 36 Comm: xenwatch Not tainted 5.0.0-rc1 #1
[ 51.871132] RIP: e030:blk_mq_map_swqueue+0x103/0x270
[ 51.871141] Code: 41 39 45 30 76 97 8b 0a 85 c9 74 ed 89 c1 48 c1 e1 04 49 03 8c 24 c0 05 00 00 48 8b 09 42 8b 3c 39 49 8b 4c 24 58 48 8b 0c f9 <4c> 0f a3 b1 98 00 00 00 72 c5 f0 4c 0f ab b1 98 00 00 00 44 0f b7
[ 51.871161] RSP: e02b:ffffc900008afca8 EFLAGS: 00010282
[ 51.871173] RAX: 0000000000000000 RBX: ffffffff82541728 RCX: 0000000000000000
[ 51.871184] RDX: ffff88805d0fae70 RSI: ffff88805deaa940 RDI: 0000000000000001
[ 51.871196] RBP: ffff88805be8b720 R08: 0000000000000001 R09: ffffea0001699900
[ 51.871206] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88805be8b218
[ 51.871217] R13: ffff88805d0fae68 R14: 0000000000000001 R15: 0000000000000004
[ 51.871237] FS:  00007faa50fac040(0000) GS:ffff88805de00000(0000) knlGS:0000000000000000
[ 51.871252] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 51.871261] CR2: 0000000000000098 CR3: 000000005c6e6000 CR4: 0000000000002660
[ 51.871275] Call Trace:
[ 51.871285]  blk_mq_update_nr_hw_queues+0x2fd/0x380
[ 51.871297]  blkfront_resume+0x200/0x3f0
[ 51.871307]  xenbus_dev_resume+0x48/0xf0
[ 51.871317]  ? xenbus_dev_probe+0x120/0x120
[ 51.871326]  dpm_run_callback+0x3c/0x160
[ 51.871336]  device_resume+0xce/0x1d0
[ 51.871344]  dpm_resume+0x115/0x2f0
[ 51.871352]  ? find_watch+0x40/0x40
[ 51.871360]  dpm_resume_end+0x8/0x10
[ 51.871370]  do_suspend+0xef/0x1b0
[ 51.871378]  shutdown_handler+0x123/0x150
[ 51.871387]  xenwatch_thread+0xbb/0x160
[ 51.871397]  ? wait_woken+0x80/0x80
[ 51.871406]  kthread+0xf3/0x130
[ 51.871416]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 51.871427]  ret_from_fork+0x35/0x40
[ 51.871435] Modules linked in:
[ 51.871443] CR2: 0000000000000098
[ 51.871452] ---[ end trace 84a3a6932d70aa71 ]---
[ 51.871461] RIP: e030:blk_mq_map_swqueue+0x103/0x270
[ 51.871471] Code: 41 39 45 30 76 97 8b 0a 85 c9 74 ed 89 c1 48 c1 e1 04 49 03 8c 24 c0 05 00 00 48 8b 09 42 8b 3c 39 49 8b 4c 24 58 48 8b 0c f9 <4c> 0f a3 b1 98 00 00 00 72 c5 f0 4c 0f ab b1 98 00 00 00 44 0f b7
[ 51.871491] RSP: e02b:ffffc900008afca8 EFLAGS: 00010282
[ 51.871501] RAX: 0000000000000000 RBX: ffffffff82541728 RCX: 0000000000000000
[ 51.871512] RDX: ffff88805d0fae70 RSI: ffff88805deaa940 RDI: 0000000000000001
[ 51.871523] RBP: ffff88805be8b720 R08: 0000000000000001 R09: ffffea0001699900
[ 51.871533] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88805be8b218
[ 51.871545] R13: ffff88805d0fae68 R14: 0000000000000001 R15: 0000000000000004
[ 51.871562] FS:  00007faa50fac040(0000) GS:ffff88805de00000(0000) knlGS:0000000000000000
[ 51.871573] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 51.871582] CR2: 0000000000000098 CR3: 000000005c6e6000 CR4: 0000000000002660
When starting it on another test dom0 to see if the direction of movement matters, it mostly fails to boot with:
[Fri Jan 11 16:16:34 2019] BUG: unable to handle kernel paging request at ffff88805c61e9f0
[Fri Jan 11 16:16:34 2019] #PF error: [PROT] [WRITE]
[Fri Jan 11 16:16:34 2019] PGD 2410067 P4D 2410067 PUD 2c00067 PMD 5ff26067 PTE 801000005c61e065
[Fri Jan 11 16:16:34 2019] Oops: 0003 [#1] SMP NOPTI
[Fri Jan 11 16:16:34 2019] CPU: 3 PID: 1943 Comm: apt-get Not tainted 5.0.0-rc1 #1
[Fri Jan 11 16:16:34 2019] RIP: e030:move_page_tables+0x669/0x970
[Fri Jan 11 16:16:34 2019] Code: 8a 00 48 8b 03 31 ff 48 89 44 24 18 e8 c6 ab e7 ff 66 90 48 89 c6 48 89 df e8 c3 cc e7 ff 66 90 48 8b 44 24 18 b9 0c 00 00 00 <48> 89 45 00 48 8b 44 24 08 f6 40 52 40 0f 85 69 02 00 00 48 8b 44
[Fri Jan 11 16:16:34 2019] RSP: e02b:ffffc900008c7d70 EFLAGS: 00010282
[Fri Jan 11 16:16:34 2019] RAX: 0000000cb064b067 RBX: ffff88805c61ea58 RCX: 000000000000000c
[Fri Jan 11 16:16:34 2019] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000201
[Fri Jan 11 16:16:34 2019] RBP: ffff88805c61e9f0 R08: 0000000000000000 R09: 00000000000260a0
[Fri Jan 11 16:16:34 2019] R10: 0000000000007ff0 R11: ffff88805fd23000 R12: ffffea00017187a8
[Fri Jan 11 16:16:34 2019] R13: ffffea00017187a8 R14: 00007f04e9800000 R15: 00007f04e9600000
[Fri Jan 11 16:16:34 2019] FS:  00007f04ef355100(0000) GS:ffff88805df80000(0000) knlGS:0000000000000000
[Fri Jan 11 16:16:34 2019] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Jan 11 16:16:34 2019] CR2: ffff88805c61e9f0 CR3: 000000005c5fc000 CR4: 0000000000002660
[Fri Jan 11 16:16:34 2019] Call Trace:
[Fri Jan 11 16:16:34 2019]  move_vma.isra.34+0xd1/0x2d0
[Fri Jan 11 16:16:34 2019]  __x64_sys_mremap+0x1b3/0x370
[Fri Jan 11 16:16:34 2019]  do_syscall_64+0x49/0x100
[Fri Jan 11 16:16:34 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Fri Jan 11 16:16:34 2019] RIP: 0033:0x7f04ee2e227a
[Fri Jan 11 16:16:34 2019] Code: 73 01 c3 48 8b 0d 1e fc 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 19 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ee fb 2a 00 f7 d8 64 89 01 48
[Fri Jan 11 16:16:34 2019] RSP: 002b:00007fffb3da3e38 EFLAGS: 00000246 ORIG_RAX: 0000000000000019
[Fri Jan 11 16:16:34 2019] RAX: ffffffffffffffda RBX: 000056533fa1bf50 RCX: 00007f04ee2e227a
[Fri Jan 11 16:16:34 2019] RDX: 0000000001a00000 RSI: 0000000001900000 RDI: 00007f04e95ac000
[Fri Jan 11 16:16:34 2019] RBP: 0000000001a00000 R08: 2e8ba2e8ba2e8ba3 R09: 0000000000000040
[Fri Jan 11 16:16:34 2019] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f04e95ac060
[Fri Jan 11 16:16:34 2019] R13: 00007f04e95ac000 R14: 000056533fa45d73 R15: 000056534024bd10
[Fri Jan 11 16:16:34 2019] Modules linked in:
[Fri Jan 11 16:16:34 2019] CR2: ffff88805c61e9f0
[Fri Jan 11 16:16:34 2019] ---[ end trace 443702bd9ba5d6b2 ]---
[Fri Jan 11 16:16:34 2019] RIP: e030:move_page_tables+0x669/0x970
[Fri Jan 11 16:16:34 2019] Code: 8a 00 48 8b 03 31 ff 48 89 44 24 18 e8 c6 ab e7 ff 66 90 48 89 c6 48 89 df e8 c3 cc e7 ff 66 90 48 8b 44 24 18 b9 0c 00 00 00 <48> 89 45 00 48 8b 44 24 08 f6 40 52 40 0f 85 69 02 00 00 48 8b 44
[Fri Jan 11 16:16:34 2019] RSP: e02b:ffffc900008c7d70 EFLAGS: 00010282
[Fri Jan 11 16:16:34 2019] RAX: 0000000cb064b067 RBX: ffff88805c61ea58 RCX: 000000000000000c
[Fri Jan 11 16:16:34 2019] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000201
[Fri Jan 11 16:16:34 2019] RBP: ffff88805c61e9f0 R08: 0000000000000000 R09: 00000000000260a0
[Fri Jan 11 16:16:34 2019] R10: 0000000000007ff0 R11: ffff88805fd23000 R12: ffffea00017187a8
[Fri Jan 11 16:16:34 2019] R13: ffffea00017187a8 R14: 00007f04e9800000 R15: 00007f04e9600000
[Fri Jan 11 16:16:34 2019] FS:  00007f04ef355100(0000) GS:ffff88805df80000(0000) knlGS:0000000000000000
[Fri Jan 11 16:16:34 2019] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Jan 11 16:16:34 2019] CR2: ffff88805c61e9f0 CR3: 000000005c5fc000 CR4: 0000000000002660
I can log in over ssh, but a command like ps afxu hangs. Oh, it seems that 5.0-rc1 is doing this all the time. Next time it's after 500 seconds uptime.
xl destroy and trying again, it boots. The first live migrate is successful (and no clockevents_switch_state complaints); the second one explodes with blk_mq_update_nr_hw_queues again.
Hm ok, as long as I live migrate the 5.0-rc1 domU around between dom0s with Xen 4.11.1-pre from commit 5acdd26fdc (the one we had in Debian until yesterday) and Linux 4.19.9 in the dom0, it works. As soon as I live migrate to the one box running the new Xen 4.11.1 package from Debian unstable, and Linux 4.19.12, I get the blk_mq_update_nr_hw_queues crash.

If I do the same with 4.19 in the domU, I don't get the blk_mq_update_nr_hw_queues crash.
Now, back to 4.19.14 + guard_hole + v2, I can't seem to reproduce the clockevents_switch_state warning any more. I'll take a break and then try to find out if I'm doing anything different than earlier today, when I could reproduce it 100% consistently.
O_o :)
Hans
On 1/11/19 4:57 PM, Hans van Kranenburg wrote:
> On 1/11/19 3:01 PM, Juergen Gross wrote:
>> On 11/01/2019 14:12, Hans van Kranenburg wrote:
>>> Hi,
>>>
>>> On 1/11/19 1:08 PM, Juergen Gross wrote:
>>>> Commit f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable'
>>>> sched_clock() interface") broke Xen guest time handling across migration:
>>>> [...]
>>>
>>> I'm throwing around a PV domU over a bunch of test servers with live
>>> migrate now, and in between the kernel logging, I'm seeing this:
>>>
>>> [...]
>>> [Fri Jan 11 13:58:42 2019] WARNING: CPU: 3 PID: 0 at kernel/time/clockevents.c:133 clockevents_switch_state+0x48/0xe0
>>> [...]
>>>
>>> This always happens on every *first* live migrate that I do after
>>> starting the domU.
>>
>> Yeah, it's a WARN_ONCE().
Ok, false alarm. It's there, but not caused by this patch.
I changed the WARN_ONCE to WARN for fun, and now I already get it a lot more (v2):
https://paste.debian.net/plainh/d535a379
>> And you didn't see it with v1 of the patch?
>
> No.
I was wrong. I tried a bit more, and I can also reproduce it without the v1 or v2 patch at all, and I can reproduce it with v4.19.9. It just sometimes needs a dozen more live migrations before it shows up.

I cannot make it show up with the Debian 4.19.9 distro kernel; that's why it was new to me.
So, let's ignore it in this thread now.
>> At first glance this might be another bug just being exposed by my
>> patch.
>>
>> I'm investigating further, but this might take some time. Could you
>> meanwhile verify the same happens with kernel 5.0-rc1? That was the one
>> I tested with and I didn't spot that WARN.
>
> I have Linux 5.0-rc1 with v2 on top now, which gives me this on live
> migrate:
>
> [...]
> [ 51.871076] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
> [ 51.871091] #PF error: [normal kernel read fault]
> [ 51.871100] PGD 0 P4D 0
> [ 51.871109] Oops: 0000 [#1] SMP NOPTI
> [ 51.871117] CPU: 0 PID: 36 Comm: xenwatch Not tainted 5.0.0-rc1 #1
> [ 51.871132] RIP: e030:blk_mq_map_swqueue+0x103/0x270
> [...]
Dunno about all the 5.0-rc1 crashes yet. I can provide more feedback about that if you want, but not in here I presume.
Hans
On 1/11/19 7:08 AM, Juergen Gross wrote:
> @@ -421,6 +424,11 @@ void xen_restore_time_memory_area(void)
>  	if (ret != 0)
>  		pr_notice("Cannot restore secondary vcpu_time_info (err %d)",
>  			  ret);
> +
> +out:
> +	/* Need pvclock_resume() before using xen_clocksource_read(). */
> +	pvclock_resume();
Do we want to drop it from xen_timer_resume then?
BTW, I ran this code a few times on top of for-linus-4.21 and didn't see any of the problems that Hans described. I didn't do LM, only save/restore, but I'd think it shouldn't make any difference.
-boris
On 11/01/2019 21:35, Boris Ostrovsky wrote:
> On 1/11/19 7:08 AM, Juergen Gross wrote:
>> @@ -421,6 +424,11 @@ void xen_restore_time_memory_area(void)
>>  	if (ret != 0)
>>  		pr_notice("Cannot restore secondary vcpu_time_info (err %d)",
>>  			  ret);
>> +
>> +out:
>> +	/* Need pvclock_resume() before using xen_clocksource_read(). */
>> +	pvclock_resume();
>
> Do we want to drop it from xen_timer_resume then?
Yes. V3 on its way...
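Roughly like this (a sketch only, not the actual v3; the context lines are approximate and assume xen_timer_resume() still opens with the pvclock_resume() call):

--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ ... @@ void xen_timer_resume(void)
 {
 	int cpu;
 
-	pvclock_resume();
-
 	if (xen_clockevent != &xen_vcpuop_clockevent)
 		return;

With xen_restore_time_memory_area() now calling pvclock_resume() itself, and running earlier in the resume path, the second call becomes redundant.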
Juergen