On Thu, Feb 18, 2021 at 04:28:36PM -0800, Andy Lutomirski wrote:
On Thu, Feb 18, 2021 at 11:21 AM Joerg Roedel <jroedel@suse.de> wrote:
Can you give me an example, even artificial, in which the linked-list logic is useful?
So here we go. It's artificial, of course, but still:
1. #VC happens, not important where
2. NMI in the #VC prologue before it moved off its IST stack - first VC IST adjustment happening here
3. #VC in the NMI handler
4. #HV in the #VC prologue again - second VC IST adjustment happening here, so the #HV handler can cause its own #VC exceptions.
This can only happen if the #HV handler is allowed to cause #VC exceptions. But even if it's not allowed, it can happen with SNP and a malicious Hypervisor. In that case the only option is to reliably panic.
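To illustrate the sequence above, here is a toy user-space model of the linked-list idea, written as a rough sketch rather than the actual kernel code. It assumes each adjustment moves the #VC IST value below the interrupted stack pointer and stores the previous IST value in the new top slot, so that exits can unwind in reverse order. The names vc_stack, vc_ist, ist_enter and ist_exit are made up for this example, and stack positions are array indices instead of addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 512

/* Simulated #VC IST stack; it grows down from index STACK_WORDS. */
static uint64_t vc_stack[STACK_WORDS];

/* Simulated TSS slot: index the next #VC would start pushing from. */
static size_t vc_ist = STACK_WORDS;

/*
 * Hypothetical adjustment step. If the interrupted context was running on
 * the #VC stack, the new IST goes below its stack pointer; otherwise one
 * slot below the current IST. The old IST value is saved in the new top
 * slot, which chains the adjustments together like a linked list.
 */
static void ist_enter(size_t interrupted_sp, int on_vc_stack)
{
    size_t old_ist = vc_ist;
    size_t new_ist = on_vc_stack ? interrupted_sp : old_ist - 1;

    assert(new_ist >= 1 && new_ist <= STACK_WORDS);
    new_ist -= 1;
    vc_stack[new_ist] = old_ist;   /* link back to the previous IST value */
    vc_ist = new_ist;
}

/* Undo the most recent adjustment by following the saved link. */
static void ist_exit(void)
{
    vc_ist = (size_t)vc_stack[vc_ist];
}
```

In this model, step 2 corresponds to the first ist_enter() call and step 4 to the second; two ist_exit() calls then restore the original IST value.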
Can you explain your reasoning in considering the entry stack unsafe? It's 4k bytes these days.
I wasn't aware that it is 4k in size now. I still thought it was just 64 words large and that one cannot simply execute C code on it.
You forgot about entry_SYSCALL_compat.
Right, thanks for pointing this out.
Your 8-byte alignment is confusing to me. In valid kernel code, SP should be 8-byte-aligned already, and, if you're trying to match architectural behavior, the CPU aligns to 16 bytes.
Yeah, I was just being cautious. The explicit alignment can be removed; Boris also pointed this out.
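As a small illustration of the alignment point, the sketch below rounds a stack pointer down to a power-of-two boundary. The helper name align_down is hypothetical. In 64-bit mode the CPU aligns the stack to 16 bytes when it delivers an exception, so an extra 8-byte mask on an SP that is already 8-byte aligned changes nothing.

```c
#include <stdint.h>

/*
 * Round sp down to a multiple of align.
 * align must be a power of two (for example 8 or 16).
 */
static inline uint64_t align_down(uint64_t sp, uint64_t align)
{
    return sp & ~(align - 1);
}
```

For an SP of 0x1008, the 8-byte mask is a no-op, while the 16-byte mask yields 0x1000.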
We're not robust against #VC, NMI in the #VC prologue before the magic stack switch, and a new #VC in the NMI prologue. Nor do we appear to have any detection of the case where #VC nests directly inside its own prologue. Or did I miss something else here?
No, you didn't miss anything here. At the moment #VC can't happen at those places, so this is not handled yet. With SNP it can happen and needs to be handled in a way that at least allows a reliable panic (because if it really happens, the Hypervisor is messing with us).
If we get NMI and get #VC in the NMI *asm*, the #VC magic stack switch looks like it will merrily run itself in the NMI special-stack-layout section, and that sounds really quite bad.
Yes, I haven't looked at the details yet, but if a #VC happens there, it had probably better not return.
I mean that, IIRC, a malicious hypervisor can inject inappropriate vectors at inappropriate times if the #HV mechanism isn't enabled. For example, it could inject a page fault or an interrupt in a context in which we have the wrong GSBASE loaded.
Yes, a malicious Hypervisor can do that, and without #HV there is no real protection against this besides turning all vectors (even IRQs) into paranoid entries. Maybe even more care is needed, but I think it's not worth caring about this.
But the #DB issue makes this moot. We have to use IST unless we turn off SCE. But I admit I'm leaning toward turning off SCE until we have a solution that seems convincingly robust.
Turning off SCE might be tempting, but I guess doing so would break quite a lot of user-space code, no?
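For reference, SCE is bit 0 of the EFER MSR (0xC0000080); with it cleared, SYSCALL raises #UD, which is why user space that relies on SYSCALL would break. The sketch below only manipulates an EFER value in plain C and does not touch any real MSR. The macro and function names are assumptions made for this example.

```c
#include <stdint.h>

#define EFER_MSR_ADDR 0xC0000080u   /* architectural MSR address of EFER */
#define EFER_SCE_BIT  (1ull << 0)   /* System Call Extensions enable bit */

/* Return the given EFER value with SYSCALL/SYSRET disabled. */
static inline uint64_t efer_clear_sce(uint64_t efer)
{
    return efer & ~EFER_SCE_BIT;
}

/* Report whether SYSCALL/SYSRET are enabled in the given EFER value. */
static inline int efer_sce_enabled(uint64_t efer)
{
    return (efer & EFER_SCE_BIT) != 0;
}
```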
Regards,
Joerg