On Thu, Jan 13, 2022, David Matlack wrote:
On Thu, Jan 13, 2022 at 9:04 AM David Matlack dmatlack@google.com wrote:
On Wed, Jan 12, 2022 at 4:29 PM Sean Christopherson seanjc@google.com wrote:
Oh interesting. I actually find that confusing because it can easily lead to the MMU-writable bit staying set. Here we are protecting GFNs and we're opting to leave the MMU-writable bit set. It takes a lot of digging to figure out that this is safe because if MMU-writable is set and the SPTE cannot be locklessly be made writable then it implies Host-writable is clear, and Host-writable can't be reset without syncing the all shadow pages reachable by the MMU. Oh and the MMU-writable bit is never consulted on its own (e.g. We never iterate through all SPTEs to find the ones that are !MMU-writable).
Ah, you've missed the other wrinkle: MMU-writable can bet set iff Host-writable is set. In other words, the MMU-writable bit is never left set because it can't be set if spte_can_locklessly_be_made_writable() returns false.
The changed_pte notifier looks like it clears Host-writable without clearing MMU-writable. Specifically the call chain:
kvm_mmu_notifier_change_pte() kvm_set_spte_gfn() kvm_tdp_mmu_set_spte_gfn() set_spte_gfn() kvm_mmu_changed_pte_notifier_make_spte()
Is there some guarantee that old_spte is !MMU-writable at this point?
Ugh, I misread that code, multiple times. There's no guarantee, it was likely just missed when MMU-writable was introduced.
Note, you literally cannot hit that code path in current kernels. See commit c13fda237f08 ("KVM: Assert that notifier count is elevated in .change_pte()"). So whatever you do is effectively untestable.
I really want to rip out .change_pte(), but I also don't want to do any performance testing to justify removing the code instead of fixing it proper, so it's hung around as a zombie...
If not I could easily change kvm_mmu_changed_pte_notifier_make_spte() to also clear MMU-writable and preserve the invariant.
Yes, please.