RE: [PATCHv2 2/3] x86/tdx: Fix race between set_memory_encrypted() and load_unaligned_zeropad()

2 Jun 2023


      From: Tom Lendacky thomas.lendacky@amd.com Sent: Thursday, June 1, 2023 11:19 AM
...
On 5/31/23 15:00, Michael Kelley (LINUX) wrote:
...
From: Sathyanarayanan Kuppuswamy
sathyanarayanan.kuppuswamy@linux.intel.com
...
Sent: Tuesday, May 30, 2023 6:22 AM
...
Hi,
On 5/30/23 5:57 AM, Tom Lendacky wrote:
...
On 5/29/23 19:57, Kirill A. Shutemov wrote:
...
On Fri, May 26, 2023 at 03:10:56PM -0700, Sathyanarayanan Kuppuswamy
wrote:
...
...
...
...
...
On 5/26/23 5:02 AM, Kirill A. Shutemov wrote:
> Touching privately mapped GPA that is not properly converted to private
> with MapGPA and accepted leads to unrecoverable exit to VMM.
>
> load_unaligned_zeropad() can touch memory that is not owned by the
> caller, but just happened to next after the owned memory.
/s/to/to be ?
Yep, my bad.
...
> This load_unaligned_zeropad() behaviour makes it important when kernel
> asks VMM to convert a GPA from shared to private or back. Kernel must
> never have a page mapped into direct mapping (and aliases) as private
> when the GPA is already converted to shared or when GPA is not yet
> converted to private.
I am wondering whether this issue exist in the AMD code?
IMO, you can add some info on the window in set_memory_encrypted()
where this race exists.
I don't think AMD affected by load_unaligned_zeropad() the same way as
Intel does. But I'm not sure.
Tom, do you have any comments?
Right, shouldn't be an issue for SNP.
Thanks for confirming.
Tom -- For my education, could you elaborate on why this problem can't
occur in an SEV-SNP guest?  There's still a window where the direct map
PTE and the RMP as maintained by the hypervisor are out-of-sync.  If
load_unaligned_zeropad() does a read using the direct map PTE during
this out-of-sync window, isn't that going to trap to the hypervisor?  How
is the scenario is handled from there to provide the zeros to
load_unaligned_zeropad()?  I need to make sure Hyper-V is doing whatever
is needed. :-)
Ah, I think I misunderstood this when it was being talked about. The issue
SNP would have would be between setting the c-bit but before the PVALIDATE
is issued. Prior to the RMP being updated, referencing the page will
generate an #NPF and automatically change the RMP over to private (in
KVM). However, after the guest is resumed, the page will not have been
validated resulting in a #VC with error code 0x404 being generated,
causing the guest to terminate itself.
I suppose, when a 0x404 error code is encountered by the #VC handler, it
could call search_exception_tables() and call ex_handler_zeropad() for the
EX_TYPE_ZEROPAD type (ex_handler_zeropad is currently static, though).
Tom -- Does the above sequence *depend* on the hypervisor doing anything
to make it work?  I'm not clear on why KVM would automatically change the
page over to private.  If there's a dependency on the hypervisor doing
something, then it seems like we'll need to standardize that "something"
across hypervisors, lest we end up with per-hypervisor code in Linux to handle
this scenario.  And running SEV-SNP with multiple VMPLs probably makes it
even more complicated.
Kirill -- Same question about TDX.  Does making load_unaligned_zeropad()
work in a TDX VM depend on the hypervisor doing anything?  Or is the
behavior seen by the guest dependent only on architected behavior of
the TDX processor?
Looking at this problem from a slightly higher level, and thinking out loud
a bit, load_unaligned_zeropad() functionality is provided only for certain
architectures:  x86/64, arm, arm64, and PowerPC 64 (little endian).  There are
fallbacks for architectures that don't support it.  With two minor tweaks to
Kconfig files, I've built x86 with load_unaligned_zeropad() disabled. Maybe
with today's processors the performance benefits are past their prime,
and running with it disabled in CoCo VMs is the better solution.  Does
anyone have a sense of whether the perf impact would be measureable?
If doing the load_unaligned_zeropad() enable/disable at build time is too
limiting, maybe it could be runtime based on whether page private/shared
state is being enforced.  I haven't looked at the details.
Thoughts?
Michael

2025

2024

2023

2022

2021

2020

2019

2018

2017

RE: [PATCHv2 2/3] x86/tdx: Fix race between set_memory_encrypted() and load_unaligned_zeropad()