On Mon, Apr 29, 2019 at 11:53 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, Apr 29, 2019, 11:42 Andy Lutomirski <luto@kernel.org> wrote:
> > I'm less than 100% convinced about this argument. Sure, an NMI right there won't cause a problem. But an NMI followed by an interrupt will kill us if preemption is on. I can think of three solutions:
>
> No, because either the sti shadow disables nmi too (that's the case on some CPUs at least) or the iret from nmi does.
>
> Otherwise you could never trust the whole sti shadow thing - and it very much is part of the architecture.
Is this documented somewhere? And do you actually believe that this is true under KVM, Hyper-V, etc? As I recall, Andrew Cooper dug into the way that VMX dealt with this stuff and concluded that the SDM was blatantly wrong in many cases, which leads me to believe that Xen HVM/PVH is the *only* hypervisor that gets it right.
Steven's point about batched updates is quite valid, though. My personal favorite solution to this whole mess is to rework the whole thing so that the int3 handler simply returns and retries and to replace the sync_core() broadcast with an SMI broadcast. I don't know whether this will actually work on real CPUs and on VMs and whether it's going to crash various BIOSes out there.
On Mon, Apr 29, 2019 at 11:57 AM Andy Lutomirski <luto@kernel.org> wrote:
> > Otherwise you could never trust the whole sti shadow thing - and it very much is part of the architecture.
>
> Is this documented somewhere?
Btw, if you really don't trust the sti shadow despite it going all the way back to the 8086, then you could instead make the irqoff code do
    push %gs:bp_call_return
    push %gs:bp_call_target
    sti
    ret
which just keeps interrupts explicitly disabled over the whole use of the percpu data.
The actual "ret" instruction doesn't matter, it's not going to change in this model (where the code isn't dynamically generated or changed). So I claim that it will still be protected by the sti shadow, but when written that way it doesn't actually matter, and you could reschedule immediately after the sti (getting an interrupt there might make the stack frame look odd, but it doesn't really affect anything else).
Linus
On Mon, Apr 29, 2019 at 03:06:30PM -0700, Linus Torvalds wrote:
> On Mon, Apr 29, 2019 at 11:57 AM Andy Lutomirski <luto@kernel.org> wrote:
> > > Otherwise you could never trust the whole sti shadow thing - and it very much is part of the architecture.
> >
> > Is this documented somewhere?
>
> Btw, if you really don't trust the sti shadow despite it going all the way back to the 8086, then you could instead make the irqoff code do
>
>     push %gs:bp_call_return
>     push %gs:bp_call_target
>     sti
>     ret
This variant cures the RETPOLINE complaint, because there is no longer an actual indirect jump. It also cures the sibling call complaint, but trades it for "return with modified stack frame".
Something like so is clean:
+extern asmlinkage void emulate_call_irqon(void);
+extern asmlinkage void emulate_call_irqoff(void);
+
+asm(
+	".text\n"
+	".global emulate_call_irqoff\n"
+	".type emulate_call_irqoff, @function\n"
+	"emulate_call_irqoff:\n\t"
+	"push %gs:bp_call_return\n\t"
+	"push %gs:bp_call_target\n\t"
+	"sti\n\t"
+	"ret\n"
+	".size emulate_call_irqoff, .-emulate_call_irqoff\n"
+
+	".global emulate_call_irqon\n"
+	".type emulate_call_irqon, @function\n"
+	"emulate_call_irqon:\n\t"
+	"push %gs:bp_call_return\n\t"
+	"push %gs:bp_call_target\n\t"
+	"ret\n"
+	".size emulate_call_irqon, .-emulate_call_irqon\n"
+	".previous\n");
+
+STACK_FRAME_NON_STANDARD(emulate_call_irqoff);
+STACK_FRAME_NON_STANDARD(emulate_call_irqon);
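
For readers following the stack manipulation, here is a minimal user-space model in C of what the push/push/ret sequence does to control flow. It is only a sketch under stated assumptions: model_stack, model_push, model_ret and emulate_call_model are hypothetical names invented for illustration, not kernel code, and the model deliberately ignores interrupts, the sti shadow, and %gs-relative per-CPU addressing.

```c
#include <stdint.h>

/* Hypothetical model of a call stack; not kernel code. */
typedef struct {
	uintptr_t slots[4];
	int sp;			/* index of the next free slot */
} model_stack;

/* Models "push": store a value on top of the stack. */
static inline void model_push(model_stack *s, uintptr_t v)
{
	s->slots[s->sp++] = v;
}

/* Models "ret": pop the top of the stack and "jump" to it. */
static inline uintptr_t model_ret(model_stack *s)
{
	return s->slots[--s->sp];
}

/*
 * Models the trampoline: push the return address, push the target,
 * then ret. out[0] is where the trampoline's ret lands, out[1] is
 * where the target's own ret lands afterwards.
 */
void emulate_call_model(uintptr_t call_target, uintptr_t call_return,
			uintptr_t out[2])
{
	model_stack s = { { 0 }, 0 };

	model_push(&s, call_return);	/* push %gs:bp_call_return */
	model_push(&s, call_target);	/* push %gs:bp_call_target */
	out[0] = model_ret(&s);		/* ret: control goes to the target */
	out[1] = model_ret(&s);		/* target's ret: back to the return address */
}
```

Calling emulate_call_model(0x1000, 0x2000, out) leaves 0x1000 in out[0] and 0x2000 in out[1]: the first ret behaves like a jump to the target, and the target later returns as if it had been reached by a normal call. No indirect jump instruction is involved, which is the property the RETPOLINE remark above relies on.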