On Sat, 24 Dec 2022 at 13:19, Marc Zyngier maz@kernel.org wrote:
On Thu, 22 Dec 2022 13:01:55 +0000, Ard Biesheuvel ardb@kernel.org wrote:
On Tue, 20 Dec 2022 at 21:09, Marc Zyngier maz@kernel.org wrote:
A recent development on the EFI front has resulted in guests having their page tables baked in the firmware binary, and mapped into the IPA space as part of a read-only memslot.
Not only is this legitimate, but it also results in added security, so thumbs up. However, this clashes mildly with our handling of a S1PTW as a write to correctly handle AF/DB updates to the S1 PTs, and results in the guest taking an abort it won't recover from (the PTs mapping the vectors will suffer from the same problem...).
So clearly our handling is... wrong.
Instead, switch to a two-pronged approach:
- On S1PTW translation fault, handle the fault as a read
- On S1PTW permission fault, handle the fault as a write
This is of no consequence to SW that *writes* to its PTs (the write will trigger a non-S1PTW fault), and SW that uses RO PTs will not use AF/DB anyway, as that'd be wrong.
Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch") do we end up with two back-to-back faults (page being evicted and faulted back). I don't think this is a case worth optimising for.
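For readers following along, the two-pronged classification described above can be sketched in standalone C. This is only an illustrative model, not the code from the patch: the struct, the enum and the function name s2_fault_is_write() are hypothetical stand-ins for whatever the kernel derives from the fault syndrome.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a stage-2 abort; not the kernel's types. */
enum s2_fault_type {
	S2_FAULT_TRANSLATION,	/* no stage-2 mapping for the faulting IPA */
	S2_FAULT_PERMISSION,	/* mapping exists but forbids the access */
};

struct s2_fault {
	enum s2_fault_type type;
	bool is_s1ptw;	/* abort taken during a stage-1 page table walk */
	bool is_iabt;	/* instruction abort (fetch) */
	bool is_write;	/* write reported for an ordinary data abort */
};

/* Decide whether the fault should be handled as a write (assumed helper). */
static bool s2_fault_is_write(const struct s2_fault *f)
{
	assert(f);

	if (f->is_s1ptw) {
		/*
		 * Translation fault on a S1PTW: handle as a read, so page
		 * tables living in a read-only memslot can be mapped.
		 * Permission fault on a S1PTW: handle as a write, since
		 * the walker wants to update AF/DB in the S1 PTs.
		 */
		return f->type == S2_FAULT_PERMISSION;
	}

	/* An instruction fetch is never a write. */
	if (f->is_iabt)
		return false;

	return f->is_write;
}
```

With this model, a S1PTW translation fault on page tables in a read-only memslot is classified as a read and can be satisfied, while a S1PTW permission fault is still classified as a write.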
Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch")
Signed-off-by: Marc Zyngier maz@kernel.org
Cc: stable@vger.kernel.org
Reviewed-by: Ard Biesheuvel ardb@kernel.org
I have tested this patch on my TX2 with one of the EFI builds in question, and everything works as before (I never observed the issue itself).
If you get the chance, could you try with non-4kB page sizes? Here, I could only reproduce it with 16kB pages. It was firing like clockwork on Cortex-A55 with that.
I'll try on 64k but I don't have access to a 16k capable machine that runs KVM atm (I'm still enjoying working wifi and GPU etc on my M1 Macbook Air)
Regression-tested-by: Ard Biesheuvel ardb@kernel.org
For the record, the EFI build in question targets QEMU/mach-virt and switches to a set of read-only page tables in emulated NOR flash straight out of reset, so it can create and populate the real page tables with MMU and caches enabled. EFI does not use virtual memory or paging so managing access flags or dirty bits in hardware is unlikely to add any value, and it is not being used at the moment. And given that this is emulated NOR flash, any ordinary write to it tears down the r/o memslot altogether, and kicks the NOR flash emulation in QEMU into programming mode, which is fully based on MMIO emulation and does not use a memslot at all. IOW, even if we could figure out what store the PTW was attempting to do, it is always going to be rejected since the r/o page tables can only be modified by 'programming' the NOR flash sector.
Indeed, and this would be a pretty dodgy setup anyway.
Thanks for having had a look,
M.
-- Without deviation from the norm, progress is not possible.