On Thu, Oct 21, 2021 at 05:26:26PM +0200, Paolo Bonzini wrote:
On 06/10/21 01:44, Michael Roth wrote:
SEV guests rely on an encyption bit which resides within the range that current code treats as address bits. Guest code will expect these bits to be set appropriately in their page tables, whereas helpers like addr_gpa2hva() will expect these bits to be masked away prior to translation. Add proper handling for these cases.
This is not what you're doing below in addr_gpa2hva, though---or did I misunderstand?
The confusion is warranted, addr_gpa2hva() *doesn't* expect the C bit to be masked in advance so the wording is pretty confusing.
I think I was referring the fact that internally it doesn't need/want the C-bit, in this case it just masks it away as a convenience to callers, as opposed to the other functions modified in the patch that actually make use of it.
It's convenient because page table walkers/mappers make use of addr_gpa2hva() to do things like silently mask away C-bits via when translating PTEs to host addresses. We easily convert those callers from:
addr_gpa2hva(paddr)
to this:
addr_gpa2hva(addr_raw2gpa(paddr))
but now all new code needs to consider whether it might be dealing with C-bits or not prior to deciding to pass it to addr_gpa2hva() (or not really think about it, and add addr_gpa2raw() "just in case"). So since it's always harmless to mask it away silently addr_gpa2hva(), the logic/code seems to benefit a good deal if we indicate clearly that addr_gpa2hva() can accept a 'raw' GPA, and will ignore it completely.
But not a big deal either way if you prefer to keep that explicit. And commit message still needs to be clarified.
I may be wrong due to not actually having written the code, but I'd prefer if most of these APIs worked only if the C bit has already been stripped. In general it's quite unlikely for host code to deal with C=1 pages, so it's worth pointing out explicitly the cases where it does.
I've tried to indicate functions that expect the C-bit by adding the 'raw_' prefix to the gpa/paddr parameters, but as you pointed out with addr_gpa2hva() it's already a bit inconsistent in that regard, and there's a couple cases like virt_map() where I should use the 'raw_' prefix as well that I've missed here.
So that should be addressed, and maybe some additional comments/assertions might be warranted to guard against cases where the C-bit is passed in unexpectedly.
But I should probably re-assess why the C-bit is being passed around in the first place:
- vm_phy_page[s]_alloc() is the main 'source' for 'raw' GPAs with the C-bit set. it determines this based on vm_memcrypt encryption policy, and updates the encryption bitmask as well. - vm_phy_page[s]_alloc() is callable both in kvm_util lib as well as individual tests. - in theory, encoding the C-bit in the returned vm_paddr_t means that vm_phy_page[s]_alloc() callers can pass that directly into virt_map/virt_pg_map() and this will "just work" for both encrypted/non-encrypted guests. - by masking it away in addr_gpa2hva(), existing tests/code flow mostly "just works" as well.
But taking a closer look, in cases where vm_phy_page[s]_alloc() is called directly by tests, like set_memory_region_test, emulator_error_test, and smm_test, that raw GPA is compared to hardcoded non-raw GPAs, so they'd still end up needing fixups to work with the proposed transparent-SEV-mode stuff. And future code would need to be written to account for this, so it doesn't really "just work" after all..
So it's worth considering the alternative approach of *not* encoding the C-bit into GPAs returned by vm_phy_page[s]_alloc(). That would likely involve introducing something like addr_gpa2raw(), which adds in the C-bit according to the encryption bitmap as-needed. If we do that:
- virt_map()/virt_pg_map() still need to accept 'raw' GPAs, since they need to deal with cases where pages are being mapping that weren't allocated by vm_phy_page[s]_alloc(), and so aren't recorded in the bitmap. in those cases it is up to test code to provide the C-bit when needed (e.g. things like separate linear mappings for pa()-like stuff in guest code).
- for cases where vm_phy_page[s]_alloc() determines whether the page is encrypted, addr_gpa2raw() needs to be used to add back the C-bit prior to passing it to virt_map()/virt_pg_map(), both in the library and the test code. vm_vaddr_* allocations would handle all this under the covers as they do now.
So test code would need to consider cases where addr_gpa2raw() needs to be used to set the C-bit (which is basically only when they want to mix usage of the vm_phy_page[s]_alloc with their own mapping of the guest page tables, which doesn't seem to be done in any existing tests anyway).
The library code would need these addr_gpa2raw() hooks in places where it calls virt_*map() internally. Probably just a handful of places though.
Assuming there's no issues with this alternative approach that I may be missing, I'll look at doing it this way for the next spin.
Even in this alternative approach though, having addr_gpa2hva() silently mask away C-bit still seems useful for the reasons above, but again, no strong feelings one way or the other on that.
Paolo
@@ -1460,9 +1480,10 @@ void virt_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
- address providing the memory to the vm physical address is returned.
- A TEST_ASSERT failure occurs if no region containing gpa exists.
*/ -void *addr_gpa2hva(struct kvm_vm *vm, vm_paddr_t gpa) +void *addr_gpa2hva(struct kvm_vm *vm, vm_paddr_t gpa_raw) { struct userspace_mem_region *region;