Virtualization Exceptions (#VE) are delivered to TDX guests due to specific guest actions such as using specific instructions or accessing a specific MSR.
A notable reason for #VE is access to specific guest physical addresses. It requires special security consideration because it is not fully under the control of the guest kernel: the VMM can remove a page from the EPT page table and trigger #VE on access.
The primary use case for #VE on a memory access is MMIO: the VMM removes a page from the EPT to trigger an exception in the guest, which allows the guest to emulate MMIO with hypercalls.
MMIO only happens on shared memory. All conventional kernel memory is private. This includes everything from kernel stacks to kernel text.
Handling exceptions on arbitrary accesses to kernel memory is essentially impossible, as handling #VE may itself require access to memory that also triggers the exception.
The TDX module provides a mechanism to disable #VE delivery on access to private memory. If the SEPT_VE_DISABLE TD attribute is set, a private EPT violation will not be reflected to the guest as #VE, but will trigger an exit to the VMM.
Make sure the attribute is set by the VMM. Panic otherwise.
There's a small window during boot, before the check, where the kernel has an early #VE handler. But the handler only handles port I/O and panics as soon as it sees any other #VE reason.
SEPT_VE_DISABLE makes a SEPT violation unrecoverable, and terminating the TD is the only option.
The kernel has no legitimate use case for #VE on private memory. It is either a guest kernel bug (like access to unaccepted memory) or a malicious/buggy VMM that removes a guest page that is still in use.
In both cases terminating the TD is the right thing to do.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: 9a22bf6debbf ("x86/traps: Add #VE support for TDX guest")
Cc: stable@vger.kernel.org # v5.19
---
 arch/x86/coco/tdx/tdx.c | 49 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 343d60853b71..a376a0c3fddc 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -34,6 +34,9 @@
 #define VE_GET_PORT_NUM(e)	((e) >> 16)
 #define VE_IS_IO_STRING(e)	((e) & BIT(4))
 
+/* TD Attributes */
+#define ATTR_SEPT_VE_DISABLE	BIT(28)
+
 /* Caches GPA width from TDG.VP.INFO TDCALL */
 static unsigned int gpa_width __ro_after_init;
 
@@ -770,6 +773,52 @@ void __init tdx_early_init(void)
 	 */
 	tdx_parse_tdinfo();
 
+	/*
+	 * Do not allow #VE due to EPT violation on the private memory
+	 *
+	 * Virtualization Exceptions (#VE) are delivered to TDX guests due to
+	 * specific guest actions such as using specific instructions or
+	 * accessing a specific MSR.
+	 *
+	 * Notable reason for #VE is access to specific guest physical
+	 * addresses. It requires special security considerations as it is not
+	 * fully in control of the guest kernel. VMM can remove a page from EPT
+	 * page table and trigger #VE on access.
+	 *
+	 * The primary use-case for #VE on a memory access is MMIO: VMM removes
+	 * page from EPT to trigger exception in the guest which allows guest to
+	 * emulate MMIO with hypercalls.
+	 *
+	 * MMIO only happens on shared memory. All conventional kernel memory is
+	 * private. This includes everything from kernel stacks to kernel text.
+	 *
+	 * Handling exceptions on arbitrary accesses to kernel memory is
+	 * essentially impossible as handling #VE may require access to memory
+	 * that also triggers the exception.
+	 *
+	 * TDX module provides mechanism to disable #VE delivery on access to
+	 * private memory. If SEPT_VE_DISABLE TD attribute is set, private EPT
+	 * violation will not be reflected to the guest as #VE, but will trigger
+	 * exit to VMM.
+	 *
+	 * Make sure the attribute is set by VMM. Panic otherwise.
+	 *
+	 * There's small window during the boot before the check where kernel has
+	 * early #VE handler. But the handler is only for port I/O and panic as
+	 * soon as it sees any other #VE reason.
+	 *
+	 * SEPT_VE_DISABLE makes SEPT violation unrecoverable and terminating
+	 * the TD is the only option.
+	 *
+	 * Kernel has no legitimate use-cases for #VE on private memory. It is
+	 * either a guest kernel bug (like access of unaccepted memory) or
+	 * malicious/buggy VMM that removes guest page that is still in use.
+	 *
+	 * In both cases terminating TD is the right thing to do.
+	 */
+	if (!(td_attr & ATTR_SEPT_VE_DISABLE))
+		panic("TD misconfiguration: SEPT_VE_DISABLE attribute must be set.\n");
+
 	setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
 
 	cc_set_vendor(CC_VENDOR_INTEL);
The core of this vulnerability is not directly related to ATTR_SEPT_VE_DISABLE, but to the MMIO processing logic in the #VE handler.
We have encountered similar problems on SEV-ES; here are the fixes in the kernel [1] and OVMF [2].
Instead of only enforcing ATTR_SEPT_VE_DISABLE in the TDX guest kernel, I think the fix should also include the necessary checks on the MMIO path of the #VE routine.
static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
{
	unsigned long *reg, val, vaddr;
	char buffer[MAX_INSN_SIZE];
	struct insn insn = {};
	enum mmio_type mmio;
	int size, extend_size;
	u8 extend_val = 0;

	// Some additional security check about ve->gpa should be introduced here.

	/* Only in-kernel MMIO is supported */
	if (WARN_ON_ONCE(user_mode(regs)))
		return -EFAULT;

	// ...
}
If we don't fix the problem at the point where we found it, but rely on complicated composite logic and long comments in the kernel, I'm confident we'll fall back into the same pit in the near future :).
[1] https://github.com/torvalds/linux/blob/1a2dcbdde82e3a5f1db9b2f4c48aa1aeba534...
[2] OVMF: https://github.com/tianocore/edk2/blob/db2c22633f3c761975d8f469a0e195d8b79e1...
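For illustration only, a check on ve->gpa of the kind suggested above could look roughly like the user-space sketch below. It assumes that the shared bit is the top bit of the guest physical address width (the gpa_width value the patch context says is cached from TDG.VP.INFO), so a GPA with that bit clear is private. The helper names shared_gpa_mask(), gpa_is_private() and check_mmio_gpa() are hypothetical and do not come from the patch or the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: the shared bit is the highest bit of the guest physical
 * address width, e.g. bit 47 for a 48-bit width and bit 51 for a 52-bit
 * width. All helper names in this sketch are hypothetical.
 */
static uint64_t shared_gpa_mask(unsigned int gpa_width)
{
	return 1ULL << (gpa_width - 1);
}

/* A GPA with the shared bit clear refers to private memory. */
static bool gpa_is_private(uint64_t gpa, unsigned int gpa_width)
{
	return (gpa & shared_gpa_mask(gpa_width)) == 0;
}

/*
 * Returns 0 if the faulting GPA is shared and may be emulated as MMIO,
 * -1 if it is private and must not reach the MMIO emulation path.
 */
static int check_mmio_gpa(uint64_t gpa, unsigned int gpa_width)
{
	if (gpa_is_private(gpa, gpa_width))
		return -1;

	return 0;
}
```

In a real handler the private case would more likely end in a panic than in an error return, since a #VE on private memory is never legitimate.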
On Mon, Oct 31, 2022 at 12:07:45PM +0800, Guorui Yu wrote:
> The core of this vulnerability is not directly related to ATTR_SEPT_VE_DISABLE, but to the MMIO processing logic in the #VE handler.
> We have encountered similar problems on SEV-ES; here are the fixes in the kernel [1] and OVMF [2].
> Instead of only enforcing ATTR_SEPT_VE_DISABLE in the TDX guest kernel, I think the fix should also include the necessary checks on the MMIO path of the #VE routine.
Missing SEPT_VE_DISABLE exposes the guest to more security problems than a confused handle_mmio(). A rogue #VE that is timed right can be used to escalate privileges and more. Just adding a check there would address only some of the potential attacks.
> static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
> {
> 	unsigned long *reg, val, vaddr;
> 	char buffer[MAX_INSN_SIZE];
> 	struct insn insn = {};
> 	enum mmio_type mmio;
> 	int size, extend_size;
> 	u8 extend_val = 0;
>
> 	// Some additional security check about ve->gpa should be introduced here.
>
> 	/* Only in-kernel MMIO is supported */
> 	if (WARN_ON_ONCE(user_mode(regs)))
> 		return -EFAULT;
>
> 	// ...
> }
> If we don't fix the problem at the point where we found it, but rely on complicated composite logic and long comments in the kernel, I'm confident we'll fall back into the same pit in the near future :).
The plan is to add the check there along with relaxing SEPT_VE_DISABLE for debug TDs. That is required to debug the guest kernel effectively; otherwise access to unaccepted memory would terminate the TD with zero info on why.
But it is not an urgent fix. It can be submitted separately.
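As a rough sketch of what relaxing SEPT_VE_DISABLE for a debug TD could mean, the decision can be written as a pure function of the TD attributes. ATTR_SEPT_VE_DISABLE is BIT(28) as in the patch; the ATTR_DEBUG bit position, the verdict enum and check_sept_ve_disable() are assumptions made for this illustration and are not part of the posted patch.

```c
#include <stdint.h>

#define ATTR_DEBUG		(1ULL << 0)	/* assumed bit position */
#define ATTR_SEPT_VE_DISABLE	(1ULL << 28)	/* BIT(28), as in the patch */

/* Hypothetical outcome of the boot-time attribute check. */
enum td_attr_verdict {
	TD_ATTR_OK,	/* SEPT_VE_DISABLE is set, nothing to do */
	TD_ATTR_WARN,	/* debug TD without SEPT_VE_DISABLE: warn only */
	TD_ATTR_PANIC,	/* production TD without SEPT_VE_DISABLE */
};

static enum td_attr_verdict check_sept_ve_disable(uint64_t td_attr)
{
	if (td_attr & ATTR_SEPT_VE_DISABLE)
		return TD_ATTR_OK;

	/* Relaxed case: a debug TD is allowed to continue with a warning. */
	if (td_attr & ATTR_DEBUG)
		return TD_ATTR_WARN;

	return TD_ATTR_PANIC;
}
```

A caller in the early init path would then map TD_ATTR_WARN to a warning message and TD_ATTR_PANIC to the existing panic().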
linux-stable-mirror@lists.linaro.org