On 11.04.18 at 09:08, jgross@suse.com wrote:
On 14/03/18 09:48, Jan Beulich wrote:
On 26.02.18 at 15:08, jgross@suse.com wrote:
@@ -35,6 +40,9 @@ void xen_arch_post_suspend(int cancelled) static void xen_vcpu_notify_restore(void *data) {
- if (xen_pv_domain() && boot_cpu_has(X86_FEATURE_SPEC_CTRL))
wrmsrl(MSR_IA32_SPEC_CTRL, this_cpu_read(spec_ctrl));
- /* Boot processor notified via generic timekeeping_resume() */ if (smp_processor_id() == 0) return;
@@ -44,7 +52,15 @@ static void xen_vcpu_notify_restore(void *data) static void xen_vcpu_notify_suspend(void *data) {
- u64 tmp;
- tick_suspend_local();
- if (xen_pv_domain() && boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
rdmsrl(MSR_IA32_SPEC_CTRL, tmp);
this_cpu_write(spec_ctrl, tmp);
wrmsrl(MSR_IA32_SPEC_CTRL, 0);
- }
}
While investigating ways how to do something similar on our old, non-pvops kernels I've started wondering if this solution is actually correct in all cases. Of course discussing this is complicated by the fact that the change there might be a conflict with hasn't landed in Linus'es tree yet (see e.g. https://patchwork.kernel.org/patch/10153843/ for an upstream submission; I haven't been able to find any discussion on that patch or why it isn't upstream yet), but we have it in our various branches. The potential problem I'm seeing is with the clearing and re-setting of SPEC_CTRL around CPUs going idle. While the active CPU could have preemption disabled (if that isn't the case already), the passive CPUs are - afaict - neither under full control of drivers/xen/manage.c:do_suspend() nor excluded yet from any further scheduling activity. Hence with code like this (taken from one of our branches)
static void mwait_idle(void) { if (!current_set_polling_and_test()) { trace_cpu_idle_rcuidle(1, smp_processor_id()); if (this_cpu_has(X86_BUG_CLFLUSH_MONITOR)) { smp_mb(); /* quirk */ clflush((void *)¤t_thread_info()->flags); smp_mb(); /* quirk */ }
x86_disable_ibrs(); __monitor((void *)¤t_thread_info()->flags, 0, 0); if (!need_resched()) __sti_mwait(0, 0); else local_irq_enable(); x86_enable_ibrs(); ...
the MSR might get set to non-zero again after having been cleared by the code your patch adds. I therefore think that the only race free solution would be to do the clearing from stop-machine context. But maybe I'm overlooking something.
Currently and with the above mentioned patch there is no problem: Xen pv guests always use default_idle(), so mwait_idle() eventually playing with MSR_IA32_SPEC_CTRL won't affect us.
It's pretty unclear to me why default_idle() doesn't have this - in Xen we do it for all idle routines.
In order to ensure that won't change in future default_idle() should never modify MSR_IA32_SPEC_CTRL. In case something like that would be required we should rather add another idle function doing that.
This looks like a pretty strange/non-obvious requirement, which I don't think anyone would remember.
Additionally, x86 maintainers: is there a particular reason this (or any functionally equivalent patch) isn't upstream yet? As indicated before, I had not been able to find any discussion, and hence I see no reason why this is a patch we effectively carry privately in our distro branches (and likely other distros do so too).
Jan