Re: [PATCH 1/2] KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU

12 Jan 2022

      On Wed, Jan 12, 2022 at 3:14 PM Sean Christopherson seanjc@google.com wrote:
...
On Wed, Jan 12, 2022, David Matlack wrote:
...
When the TDP MMU is write-protection GFNs for page table protection (as
opposed to for dirty logging, or due to the HVA not being writable), it
checks if the SPTE is already write-protected and if so skips modifying
the SPTE and the TLB flush.
This behavior is incorrect because the SPTE may be write-protected for
dirty logging. This implies that the SPTE could be locklessly be made
writable on the next write access, and that vCPUs could still be running
with writable SPTEs cached in their TLB.
Fix this by unconditionally setting the SPTE and only skipping the TLB
flush if the SPTE was already marked !MMU-writable or !Host-writable,
which guarantees the SPTE cannot be locklessly be made writable and no
vCPUs are running the writable SPTEs cached in their TLBs.
Technically it would be safe to skip setting the SPTE as well since:
(a) If MMU-writable is set then Host-writable must be cleared
      and the only way to set Host-writable is to fault the SPTE
      back in entirely (at which point any unsynced shadow pages
      reachable by the new SPTE will be synced and MMU-writable can
      be safetly be set again).
and
(b) MMU-writable is never consulted on its own.
And in fact this is what the shadow MMU does when write-protecting guest
page tables. However setting the SPTE unconditionally is much easier to
reason about and does not require a huge comment explaining why it is safe.
I disagree.  I looked at the code+comment before reading the full changelog and
typed up a response saying the code should be:
            if (!is_writable_pte(iter.old_spte) &&
                !spte_can_locklessly_be_made_writable(spte))
                    break;

Then I went read the changelog and here we are :-)
I find that much more easier to grok, e.g. in plain English: "if the SPTE isn't
writable and can't be made writable, there's nothing to do".
Oh interesting. I actually find that confusing because it can easily
lead to the MMU-writable bit staying set. Here we are protecting GFNs
and we're opting to leave the MMU-writable bit set. It takes a lot of
digging to figure out that this is safe because if MMU-writable is set
and the SPTE cannot be locklessly be made writable then it implies
Host-writable is clear, and Host-writable can't be reset without
syncing the all shadow pages reachable by the MMU. Oh and the
MMU-writable bit is never consulted on its own (e.g. We never iterate
through all SPTEs to find the ones that are !MMU-writable).
Maybe my understanding is horribly off since this all seems
unnecessarily convoluted, and the cost of always clearing MMU-writable
is just an extra bitwise-OR.
The TLB flush is certainly unnecessary if the SPTE is already
!Host-writable, which is what this commit does.
...
Versus "unconditionally clear the writable bits because ???, but only flush if
the write was actually necessary", with a slightly opinionated translation :-)
If MMU-writable is already clear we can definitely break. I had that
in a previous version of the patch by checking if iter.old_spte ==
new_spte but it seemed unnecessary since the guts of
tdp_mmu_spte_set() already optimizes for this.
...
And with that, you don't need to do s/spte_set/flush.  Though I would be in favor
of a separate patch to do s/spte_set/write_protected here and in the caller, to
match kvm_mmu_slot_gfn_write_protect().
I'm not sure write_protected would not be a good variable name because
even if we did not write-protect the SPTE (i.e. PT_WRITABLE_MASK was
already clear) we may still need a TLB flush to ensure no CPUs have a
writable SPTE in their TLB. Perhaps we have different definitions for
"write-protecting"?

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH 1/2] KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU