On Fri, May 14, 2021, Greg KH wrote:
On Fri, May 14, 2021 at 01:38:53PM +0200, Jack Wang wrote:
From: Sean Christopherson seanjc@google.com
commit c5e2184d1544f9e56140791eff1a351bea2e63b9 upstream
Remove the update_pte() shadow paging logic, which was obsoleted by commit 4731d4c7a077 ("KVM: MMU: out of sync shadow core"), but never removed. As pointed out by Yu, KVM never write protects leaf page tables for the purposes of shadow paging, and instead marks their associated shadow page as unsync so that the guest can write PTEs at will.
The update_pte() path, which predates the unsync logic, optimizes COW scenarios by refreshing leaf SPTEs when they are written, as opposed to zapping the SPTE, restarting the guest, and installing the new SPTE on the subsequent fault. Since KVM no longer write-protects leaf page tables, update_pte() is unreachable and can be dropped.
Reported-by: Yu Zhang yu.c.zhang@intel.com Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20210115004051.4099250-1-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com (jwang: backport to 5.4 to fix a warning on AMD nested Virtualization) Signed-off-by: Jack Wang jinpu.wang@ionos.com
We hit a warning in WARNING: CPU: 62 PID: 29302 at arch/x86/kvm/mmu.c:2250 nonpaging_update_pte+0x5/0x10 [kvm] on AMD Opteron(tm) Processor 6386 SE with kernel 5.4.113, it seems nested L2 is running, I notice a similar bug report on https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1884058.
I did test with kvm-unit-tests on both Intel Broadwell/Skylake, AMD Opteron, no regression, basic VM tests work fine too on 5.4 kernel. the commit c5e2184d1544f9e56140791eff1a351bea2e63b9 can be cherry-picked cleanly to kernel 5.10+.
Ah, drat. The WARN fires because update_pte() comes from the current MMU context, and older kernels are missing a check to ensure the current MMU context is "compatible" with the target shadow page's context.
That bug was inadvertantly fixed by commit a102a674e423 ("KVM: x86/mmu: Don't drop level/direct from MMU role calculation"), but I didn't tag that for stable because I incorrectly thought that adding a check on the "direct" role was a nop. From that changelog:
While doing so, stop ignoring "direct". The field is effectively ignored anyways because kvm_mmu_pte_write() is only reached with an indirect mmu and the loop only walks indirect shadow pages, but double checking "direct" literally costs nothing.
Specifically, the "kvm_mmu_pte_write() is only reached with an indirect mmu" part fails to account for the case where where the VM has one or more indirect MMUs _and_ a direct MMU (the current MMU context).
All that said, ripping out this code works too, and is preferable since the code is dead for its intended purpose, i.e. reaching the update_pte() code can only be a random, unwanted collision.
Now queued up, thanks.
Belated thumbs up, thanks!