Hi Christian, Janosch, Heiko,
Here is a pair of patches to replace the RFC I recently sent [1].
Patch 1 performs the host/guest access register swap that Christian suggested (instead of a full sync_reg/store_reg process).
Patch 2 provides a selftest patch that exercises this scenario. Applying patch 2 without patch 1 fails in the following way:
[eric@host linux]# ./tools/testing/selftests/kvm/s390x/memop TAP version 13 1..16 ok 1 simple copy ok 2 generic error checks ok 3 copy with storage keys ok 4 cmpxchg with storage keys ok 5 concurrently cmpxchg with storage keys ok 6 copy with key storage protection override ok 7 copy with key fetch protection ok 8 copy with key fetch protection override ==== Test Assertion Failure ==== s390x/memop.c:186: !r pid=5720 tid=5720 errno=4 - Interrupted system call 1 0x00000000010042af: memop_ioctl at memop.c:186 (discriminator 3) 2 0x0000000001006697: test_copy_access_register at memop.c:400 (discriminator 2) 3 0x0000000001002aaf: main at memop.c:1181 4 0x000003ffaec33a5b: ?? ??:0 5 0x000003ffaec33b5d: ?? ??:0 6 0x0000000001002ba9: _start at ??:? KVM_S390_MEM_OP failed, rc: 40 errno: 4 (Interrupted system call)
Thoughts on this approach?
Thanks, Eric
[1] https://lore.kernel.org/r/20240131205832.2179029-1-farman@linux.ibm.com/
Eric Farman (2): KVM: s390: load guest access registers in MEM_OP ioctl KVM: s390: selftests: memop: add a simple AR test
arch/s390/kvm/kvm-s390.c | 6 +++++ tools/testing/selftests/kvm/s390x/memop.c | 28 +++++++++++++++++++++++ 2 files changed, 34 insertions(+)
The routine ar_translation() is called by get_vcpu_asce(), which is called by both the instruction intercept path (where the access registers had been loaded with the guest's values), and the MEM_OP ioctl (which hadn't). This means that any ALET the guest expects to be used would be ignored.
Furthermore, the logic in ar_translation() will store the contents of the access registers back to the KVM_RUN struct. This unexpected change of AR values can lead to problems after invoking the MEM_OP, for example an ALET Specification Exception.
Fix this by swapping the host/guest access registers around the MEM_OP ioctl, in the same way that the KVM_RUN ioctl does with sync_regs()/store_regs(). The full register swap isn't needed here, since only the access registers are used in this interface.
Suggested-by: Christian Borntraeger borntraeger@linux.ibm.com Signed-off-by: Eric Farman farman@linux.ibm.com --- arch/s390/kvm/kvm-s390.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index ea63ac769889..c2dfeea55dcf 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -5391,6 +5391,10 @@ static long kvm_s390_vcpu_mem_op(struct kvm_vcpu *vcpu, return -ENOMEM; }
+ /* Swap host/guest access registers in the event of a MEM_OP with AR specified */ + save_access_regs(vcpu->arch.host_acrs); + restore_access_regs(vcpu->run->s.regs.acrs); + acc_mode = mop->op == KVM_S390_MEMOP_LOGICAL_READ ? GACC_FETCH : GACC_STORE; if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) { r = check_gva_range(vcpu, mop->gaddr, mop->ar, mop->size, @@ -5420,6 +5424,8 @@ static long kvm_s390_vcpu_mem_op(struct kvm_vcpu *vcpu, kvm_s390_inject_prog_irq(vcpu, &vcpu->arch.pgm);
out_free: + save_access_regs(vcpu->run->s.regs.acrs); + restore_access_regs(vcpu->arch.host_acrs); vfree(tmpbuf); return r; }
On Fri, Feb 09, 2024 at 09:45:38PM +0100, Eric Farman wrote:
The routine ar_translation() is called by get_vcpu_asce(), which is called by both the instruction intercept path (where the access registers had been loaded with the guest's values), and the MEM_OP ioctl (which hadn't). This means that any ALET the guest expects to be used would be ignored.
Furthermore, the logic in ar_translation() will store the contents of the access registers back to the KVM_RUN struct. This unexpected change of AR values can lead to problems after invoking the MEM_OP, for example an ALET Specification Exception.
Fix this by swapping the host/guest access registers around the MEM_OP ioctl, in the same way that the KVM_RUN ioctl does with sync_regs()/store_regs(). The full register swap isn't needed here, since only the access registers are used in this interface.
Suggested-by: Christian Borntraeger borntraeger@linux.ibm.com Signed-off-by: Eric Farman farman@linux.ibm.com
arch/s390/kvm/kvm-s390.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index ea63ac769889..c2dfeea55dcf 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -5391,6 +5391,10 @@ static long kvm_s390_vcpu_mem_op(struct kvm_vcpu *vcpu, return -ENOMEM; }
- /* Swap host/guest access registers in the event of a MEM_OP with AR specified */
- save_access_regs(vcpu->arch.host_acrs);
- restore_access_regs(vcpu->run->s.regs.acrs);
- acc_mode = mop->op == KVM_S390_MEMOP_LOGICAL_READ ? GACC_FETCH : GACC_STORE; if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) { r = check_gva_range(vcpu, mop->gaddr, mop->ar, mop->size,
@@ -5420,6 +5424,8 @@ static long kvm_s390_vcpu_mem_op(struct kvm_vcpu *vcpu, kvm_s390_inject_prog_irq(vcpu, &vcpu->arch.pgm); out_free:
- save_access_regs(vcpu->run->s.regs.acrs);
- restore_access_regs(vcpu->arch.host_acrs);
I guess we will end up with more and more of such constructs until nobody knows when which register contents are loaded. I would higly prefer a TIF flag which indicates if the access registers contain the host or guest register contents, and actual users grab the required content from the correct location - or better: always take it from guest save area, and write to the guest save area if the to be invented TIF flag indicates that access registers contain guest registers...
Or maybe a TIF flag with different semantics: "guest save area does not reflect current state - which is within registers".
On Mon, Feb 12, 2024 at 11:21:30AM +0100, Heiko Carstens wrote:
Or maybe a TIF flag with different semantics: "guest save area does not reflect current state - which is within registers".
Something like the below; untested of course. But I guess there must be some arch specific vcpu flags, which can be used to achieve the same?
diff --git a/arch/s390/include/asm/thread_info.h b/arch/s390/include/asm/thread_info.h index a674c7d25da5..b9ff8b125fb8 100644 --- a/arch/s390/include/asm/thread_info.h +++ b/arch/s390/include/asm/thread_info.h @@ -69,6 +69,7 @@ void arch_setup_new_exec(void); #define TIF_PATCH_PENDING 5 /* pending live patching update */ #define TIF_PGSTE 6 /* New mm's will use 4K page tables */ #define TIF_NOTIFY_SIGNAL 7 /* signal notifications exist */ +#define TIF_KVM_ACRS 8 /* access registers contain guest content */ #define TIF_ISOLATE_BP_GUEST 9 /* Run KVM guests with isolated BP */ #define TIF_PER_TRAP 10 /* Need to handle PER trap on exit to usermode */
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c index 5bfcc50c1a68..b0ef242d2371 100644 --- a/arch/s390/kvm/gaccess.c +++ b/arch/s390/kvm/gaccess.c @@ -391,7 +391,8 @@ static int ar_translation(struct kvm_vcpu *vcpu, union asce *asce, u8 ar, if (ar >= NUM_ACRS) return -EINVAL;
- save_access_regs(vcpu->run->s.regs.acrs); + if (test_thread_flag(TIF_KVM_ACRS)) + save_access_regs(vcpu->run->s.regs.acrs); alet.val = vcpu->run->s.regs.acrs[ar];
if (ar == 0 || alet.val == 0) { diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index ea63ac769889..3ee0913639d5 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -4951,6 +4951,7 @@ static void sync_regs(struct kvm_vcpu *vcpu) } save_access_regs(vcpu->arch.host_acrs); restore_access_regs(vcpu->run->s.regs.acrs); + set_thread_flag(TIF_KVM_ACRS); /* save host (userspace) fprs/vrs */ save_fpu_regs(); vcpu->arch.host_fpregs.fpc = current->thread.fpu.fpc; @@ -5020,6 +5021,7 @@ static void store_regs(struct kvm_vcpu *vcpu) kvm_run->s.regs.pfs = vcpu->arch.pfault_select; kvm_run->s.regs.pfc = vcpu->arch.pfault_compare; save_access_regs(vcpu->run->s.regs.acrs); + clear_thread_flag(TIF_KVM_ACRS); restore_access_regs(vcpu->arch.host_acrs); /* Save guest register state */ save_fpu_regs();
On Mon, 2024-02-12 at 12:52 +0100, Heiko Carstens wrote:
On Mon, Feb 12, 2024 at 11:21:30AM +0100, Heiko Carstens wrote:
Or maybe a TIF flag with different semantics: "guest save area does not reflect current state - which is within registers".
Something like the below; untested of course.
Ooops, yeah. Christian suggested something similar in his first response to the RFC which I'd overlooked.
But I guess there must be some arch specific vcpu flags, which can be used to achieve the same?
Agreed. Putting something there probably makes sense to keep it in the KVM sphere
diff --git a/arch/s390/include/asm/thread_info.h b/arch/s390/include/asm/thread_info.h index a674c7d25da5..b9ff8b125fb8 100644 --- a/arch/s390/include/asm/thread_info.h +++ b/arch/s390/include/asm/thread_info.h @@ -69,6 +69,7 @@ void arch_setup_new_exec(void); #define TIF_PATCH_PENDING 5 /* pending live patching update */ #define TIF_PGSTE 6 /* New mm's will use 4K page tables */ #define TIF_NOTIFY_SIGNAL 7 /* signal notifications exist */ +#define TIF_KVM_ACRS 8 /* access registers contain guest content */ #define TIF_ISOLATE_BP_GUEST 9 /* Run KVM guests with isolated BP */ #define TIF_PER_TRAP 10 /* Need to handle PER trap on exit to usermode */ diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c index 5bfcc50c1a68..b0ef242d2371 100644 --- a/arch/s390/kvm/gaccess.c +++ b/arch/s390/kvm/gaccess.c @@ -391,7 +391,8 @@ static int ar_translation(struct kvm_vcpu *vcpu, union asce *asce, u8 ar, if (ar >= NUM_ACRS) return -EINVAL;
- save_access_regs(vcpu->run->s.regs.acrs);
- if (test_thread_flag(TIF_KVM_ACRS))
- save_access_regs(vcpu->run->s.regs.acrs);
...or WARN if not set, so that we know of the missing path. Will send this all as a v2. Thanks.
alet.val = vcpu->run->s.regs.acrs[ar]; if (ar == 0 || alet.val == 0) { diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index ea63ac769889..3ee0913639d5 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -4951,6 +4951,7 @@ static void sync_regs(struct kvm_vcpu *vcpu) } save_access_regs(vcpu->arch.host_acrs); restore_access_regs(vcpu->run->s.regs.acrs);
- set_thread_flag(TIF_KVM_ACRS);
/* save host (userspace) fprs/vrs */ save_fpu_regs(); vcpu->arch.host_fpregs.fpc = current->thread.fpu.fpc; @@ -5020,6 +5021,7 @@ static void store_regs(struct kvm_vcpu *vcpu) kvm_run->s.regs.pfs = vcpu->arch.pfault_select; kvm_run->s.regs.pfc = vcpu->arch.pfault_compare; save_access_regs(vcpu->run->s.regs.acrs);
- clear_thread_flag(TIF_KVM_ACRS);
restore_access_regs(vcpu->arch.host_acrs); /* Save guest register state */ save_fpu_regs();
There is a selftest that checks for an (expected) error when an invalid AR is specified, but not one that exercises the AR path.
Add a simple test that mirrors the vanilla write/read test while providing an AR. An AR that contains zero will direct the CPU to use the primary address space normally used anyway. AR[1] is selected for this test because the host AR[1] is usually non-zero, and KVM needs to correctly swap those values.
Signed-off-by: Eric Farman farman@linux.ibm.com --- tools/testing/selftests/kvm/s390x/memop.c | 28 +++++++++++++++++++++++ 1 file changed, 28 insertions(+)
diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c index bb3ca9a5d731..be20c26ee545 100644 --- a/tools/testing/selftests/kvm/s390x/memop.c +++ b/tools/testing/selftests/kvm/s390x/memop.c @@ -375,6 +375,29 @@ static void test_copy(void) kvm_vm_free(t.kvm_vm); }
+static void test_copy_access_register(void) +{ + struct test_default t = test_default_init(guest_copy); + + HOST_SYNC(t.vcpu, STAGE_INITED); + + prepare_mem12(); + t.run->psw_mask &= ~(3UL << (63 - 17)); + t.run->psw_mask |= 1UL << (63 - 17); /* Enable AR mode */ + + CHECK_N_DO(MOP, t.vcpu, LOGICAL, WRITE, mem1, t.size, + GADDR_V(mem1), AR(1)); + HOST_SYNC(t.vcpu, STAGE_COPIED); + + CHECK_N_DO(MOP, t.vcpu, LOGICAL, READ, mem2, t.size, + GADDR_V(mem2), AR(1)); + ASSERT_MEM_EQ(mem1, mem2, t.size); + + t.run->psw_mask &= ~(3UL << (63 - 17)); /* Disable AR mode */ + + kvm_vm_free(t.kvm_vm); +} + static void set_storage_key_range(void *addr, size_t len, uint8_t key) { uintptr_t _addr, abs, i; @@ -1101,6 +1124,11 @@ int main(int argc, char *argv[]) .test = test_copy_key_fetch_prot_override, .requirements_met = extension_cap > 0, }, + { + .name = "copy with access register mode", + .test = test_copy_access_register, + .requirements_met = true, + }, { .name = "error checks with key", .test = test_errors_key,
linux-kselftest-mirror@lists.linaro.org