The ID_AA64PFR1_EL1.MTE_frac field is currently hidden from KVM. However, when ID_AA64PFR1_EL1.MTE==2, ID_AA64PFR1_EL1.MTE_frac==0 indicates that MTE_ASYNC is supported. On a host with ID_AA64PFR1_EL1.MTE==2 but without MTE_ASYNC support, a guest with the MTE capability enabled will therefore incorrectly see MTE_ASYNC advertised as supported. This series fixes that.

This was found by inspection and the current behaviour is not known to break anything. Linux doesn't check MTE_frac and wrongly assumes MTE async faults can be generated whenever MTE is supported. That is a separate problem and is not addressed here.
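For reference, the decoding described above as a standalone sketch. The field positions follow the Arm ARM; the helper itself is illustrative and not part of this series:

  #include <stdbool.h>
  #include <stdint.h>

  /* ID_AA64PFR1_EL1 field positions, per the Arm ARM. */
  #define MTE_SHIFT       8    /* MTE, bits [11:8] */
  #define MTE_FRAC_SHIFT  40   /* MTE_frac, bits [43:40] */

  /* Illustrative decode of advertised MTE_ASYNC support. */
  static bool mte_async_supported(uint64_t pfr1)
  {
          uint8_t mte = (pfr1 >> MTE_SHIFT) & 0xf;
          uint8_t mte_frac = (pfr1 >> MTE_FRAC_SHIFT) & 0xf;

          if (mte < 2)
                  return false;  /* MTE2 not implemented, no async faults */
          if (mte > 2)
                  return true;   /* MTE3 implies MTE_frac == 0 */
          return mte_frac == 0;  /* MTE2: 0 = async, 0xf = not supported */
  }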
I am looking for feedback on whether this change is valuable or otherwise.
Ben Horgan (3):
  arm64/sysreg: Expose MTE_frac so that it is visible to KVM
  KVM: arm64: Make MTE_frac masking conditional on MTE capability
  KVM: selftests: Confirm exposing MTE_frac does not break migration
 arch/arm64/kernel/cpufeature.c                |  1 +
 arch/arm64/kvm/sys_regs.c                     | 26 ++++++-
 .../testing/selftests/kvm/arm64/set_id_regs.c | 77 ++++++++++++++++++-
 3 files changed, 101 insertions(+), 3 deletions(-)
base-commit: 8ffd015db85fea3e15a77027fda6c02ced4d2444
KVM exposes the sanitised ID registers to guests. Currently these ignore the ID_AA64PFR1_EL1.MTE_frac field, meaning guests always see a value of zero.
This is a problem for platforms without the MTE_ASYNC feature where ID_AA64PFR1_EL1.MTE==0x2 and ID_AA64PFR1_EL1.MTE_frac==0xf. KVM forces MTE_frac to zero, meaning the guest believes MTE_ASYNC is supported, when no async fault will ever occur.
Before KVM can fix this, the cpufeature code first needs to sanitise the MTE_frac field of the ID register across CPUs.

Linux itself does not use the MTE_frac field and just assumes MTE async faults can be generated if MTE is supported.
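For illustration, once the field is sanitised a consumer can query it with the existing helpers. A minimal sketch (the function name is hypothetical; the helpers are the ones used later in this series):

  /* Sketch: query the system-wide sanitised MTE_frac value. */
  static bool host_lacks_mte_async(void)
  {
          u64 pfr1 = read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1);

          return SYS_FIELD_GET(ID_AA64PFR1_EL1, MTE_frac, pfr1) ==
                 ID_AA64PFR1_EL1_MTE_frac_NI;
  }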
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
 arch/arm64/kernel/cpufeature.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9c4d6d552b25..e952f4b07ce1 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -298,6 +298,7 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
 static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = {
 	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_GCS), FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_EL1_GCS_SHIFT, 4, 0),
+	S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_EL1_MTE_frac_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME), FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_EL1_SME_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_EL1_MPAM_frac_SHIFT, 4, 0),
If MTE_frac is masked out unconditionally then the guest will always see ID_AA64PFR1_EL1_MTE_frac as 0. However, a value of 0 when ID_AA64PFR1_EL1_MTE is 2 indicates that MTE_ASYNC is supported. Hence, for a host with ID_AA64PFR1_EL1_MTE==2 and ID_AA64PFR1_EL1_MTE_frac==0xf (MTE_ASYNC unsupported) the guest would see MTE_ASYNC advertised as supported whilst the host does not support it. Instead, expose the sanitised value of MTE_frac to the guest and user-space.

As MTE_frac was previously hidden (always 0), and KVM must accept values it previously provided to user-space, allow user-space to set ID_AA64PFR1_EL1.MTE_frac to 0 when ID_AA64PFR1_EL1.MTE is 2. However, ignore the write to avoid incorrectly claiming hardware support for MTE_ASYNC in the guest.

Note that Linux does not check the value of ID_AA64PFR1_EL1_MTE_frac and wrongly assumes that MTE async faults can be generated even on hardware that does not support them. This issue is not addressed here.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
 arch/arm64/kvm/sys_regs.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 005ad28f7306..9ae647082684 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1600,13 +1600,14 @@ static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu,
 		val = sanitise_id_aa64pfr0_el1(vcpu, val);
 		break;
 	case SYS_ID_AA64PFR1_EL1:
-		if (!kvm_has_mte(vcpu->kvm))
+		if (!kvm_has_mte(vcpu->kvm)) {
 			val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE);
+			val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE_frac);
+		}

 		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_SME);
 		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_RNDR_trap);
 		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_NMI);
-		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE_frac);
 		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_GCS);
 		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_THE);
 		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTEX);

@@ -1953,11 +1954,32 @@ static int set_id_aa64pfr1_el1(struct kvm_vcpu *vcpu,
 {
 	u64 hw_val = read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1);
 	u64 mpam_mask = ID_AA64PFR1_EL1_MPAM_frac_MASK;
+	u8 mte = SYS_FIELD_GET(ID_AA64PFR1_EL1, MTE, hw_val);
+	u8 user_mte_frac = SYS_FIELD_GET(ID_AA64PFR1_EL1, MTE_frac, user_val);

 	/* See set_id_aa64pfr0_el1 for comment about MPAM */
 	if ((hw_val & mpam_mask) == (user_val & mpam_mask))
 		user_val &= ~ID_AA64PFR1_EL1_MPAM_frac_MASK;

+	/*
+	 * Previously MTE_frac was hidden from guest. However, if the
+	 * hardware supports MTE2 but not MTE_ASYM_FAULT then a value
+	 * of 0 for this field indicates that the hardware supports
+	 * MTE_ASYNC. Whereas, 0xf indicates MTE_ASYNC is not supported.
+	 *
+	 * As KVM must accept values it previously provided to
+	 * user-space, allow user-space to set ID_AA64PFR1_EL1.MTE_frac
+	 * to 0 when ID_AA64PFR1_EL1.MTE is 2. However, ignore the write
+	 * to avoid incorrectly claiming hardware support for MTE_ASYNC
+	 * in the guest.
+	 */
+	if (mte == ID_AA64PFR1_EL1_MTE_MTE2 &&
+	    user_mte_frac == ID_AA64PFR1_EL1_MTE_frac_ASYNC) {
+		user_val &= ~ID_AA64PFR1_EL1_MTE_frac_MASK;
+		user_val |= hw_val & ID_AA64PFR1_EL1_MTE_frac_MASK;
+	}
+
 	return set_id_reg(vcpu, rd, user_val);
 }
On Mon, 14 Apr 2025 13:40:58 +0100, Ben Horgan <ben.horgan@arm.com> wrote:
> If MTE_frac is masked out unconditionally then the guest will always
> see ID_AA64PFR1_EL1_MTE_frac as 0. However, a value of 0 when
> ID_AA64PFR1_EL1_MTE is 2 indicates that MTE_ASYNC is supported. Hence,
> for a host with ID_AA64PFR1_EL1_MTE==2 and ID_AA64PFR1_EL1_MTE_frac==0xf
> (MTE_ASYNC unsupported) the guest would see MTE_ASYNC advertised as
> supported whilst the host does not support it. Instead, expose the
> sanitised value of MTE_frac to the guest and user-space.
>
> As MTE_frac was previously hidden (always 0), and KVM must accept
> values it previously provided to user-space, allow user-space to set
> ID_AA64PFR1_EL1.MTE_frac to 0 when ID_AA64PFR1_EL1.MTE is 2. However,
> ignore the write to avoid incorrectly claiming hardware support for
> MTE_ASYNC in the guest.
>
> Note that Linux does not check the value of ID_AA64PFR1_EL1_MTE_frac
> and wrongly assumes that MTE async faults can be generated even on
> hardware that does not support them. This issue is not addressed here.
>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
>  arch/arm64/kvm/sys_regs.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 005ad28f7306..9ae647082684 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1600,13 +1600,14 @@ static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu,
> 		val = sanitise_id_aa64pfr0_el1(vcpu, val);
> 		break;
> 	case SYS_ID_AA64PFR1_EL1:
> -		if (!kvm_has_mte(vcpu->kvm))
> +		if (!kvm_has_mte(vcpu->kvm)) {
> 			val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE);
> +			val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE_frac);
> +		}
>
> 		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_SME);
> 		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_RNDR_trap);
> 		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_NMI);
> -		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE_frac);
> 		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_GCS);
> 		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_THE);
> 		val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTEX);
>
> @@ -1953,11 +1954,32 @@ static int set_id_aa64pfr1_el1(struct kvm_vcpu *vcpu,
> {
> 	u64 hw_val = read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1);
> 	u64 mpam_mask = ID_AA64PFR1_EL1_MPAM_frac_MASK;
> +	u8 mte = SYS_FIELD_GET(ID_AA64PFR1_EL1, MTE, hw_val);
> +	u8 user_mte_frac = SYS_FIELD_GET(ID_AA64PFR1_EL1, MTE_frac, user_val);
>
> 	/* See set_id_aa64pfr0_el1 for comment about MPAM */
> 	if ((hw_val & mpam_mask) == (user_val & mpam_mask))
> 		user_val &= ~ID_AA64PFR1_EL1_MPAM_frac_MASK;
>
> +	/*
> +	 * Previously MTE_frac was hidden from guest. However, if the
> +	 * hardware supports MTE2 but not MTE_ASYM_FAULT then a value
> +	 * of 0 for this field indicates that the hardware supports
> +	 * MTE_ASYNC. Whereas, 0xf indicates MTE_ASYNC is not supported.
> +	 *
> +	 * As KVM must accept values it previously provided to
> +	 * user-space, allow user-space to set ID_AA64PFR1_EL1.MTE_frac
> +	 * to 0 when ID_AA64PFR1_EL1.MTE is 2. However, ignore the write
> +	 * to avoid incorrectly claiming hardware support for MTE_ASYNC
> +	 * in the guest.
> +	 */
> +	if (mte == ID_AA64PFR1_EL1_MTE_MTE2 &&
The spec says that MTE_frac is valid if ID_AA64PFR1_EL1.MTE >= 0b0010. Not strictly equal to 0b0010 (which represents MTE2). Crucially, MTE3 should receive the same treatment.
> +	    user_mte_frac == ID_AA64PFR1_EL1_MTE_frac_ASYNC) {
> +		user_val &= ~ID_AA64PFR1_EL1_MTE_frac_MASK;
> +		user_val |= hw_val & ID_AA64PFR1_EL1_MTE_frac_MASK;
This means you are unconditionally propagating what the HW supports, which feels dodgy, especially considering that we don't know how MTE_frac is going to evolve in the future.
I think you should limit the fix to the exact case we're mitigating here, not blindly overwrite the guest's view with the HW's capability.
Thanks,
M.
Hi Marc,
On 4/27/25 18:24, Marc Zyngier wrote:
> On Mon, 14 Apr 2025 13:40:58 +0100, Ben Horgan <ben.horgan@arm.com> wrote:
[...]
>> +	/*
>> +	 * Previously MTE_frac was hidden from guest. However, if the
>> +	 * hardware supports MTE2 but not MTE_ASYM_FAULT then a value
>> +	 * of 0 for this field indicates that the hardware supports
>> +	 * MTE_ASYNC. Whereas, 0xf indicates MTE_ASYNC is not supported.
>> +	 *
>> +	 * As KVM must accept values it previously provided to
>> +	 * user-space, allow user-space to set ID_AA64PFR1_EL1.MTE_frac
>> +	 * to 0 when ID_AA64PFR1_EL1.MTE is 2. However, ignore the write
>> +	 * to avoid incorrectly claiming hardware support for MTE_ASYNC
>> +	 * in the guest.
>> +	 */
>> +	if (mte == ID_AA64PFR1_EL1_MTE_MTE2 &&
> The spec says that MTE_frac is valid if ID_AA64PFR1_EL1.MTE >= 0b0010. Not strictly equal to 0b0010 (which represents MTE2). Crucially, MTE3 should receive the same treatment.
This is specific to MTE2: when MTE3 is supported, MTE_ASYM_FAULT is also supported, and when MTE_ASYM_FAULT is supported the spec says MTE_frac is 0.
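To spell out my reading of the valid combinations (the MTE3 row is why only the MTE == 2 case needs special handling):

  MTE     MTE_frac  Meaning
  0b0000  -         MTE not implemented
  0b0010  0b0000    MTE2, MTE_ASYNC supported
  0b0010  0b1111    MTE2, MTE_ASYNC not supported
  0b0011  0b0000    MTE3; MTE_ASYM_FAULT implies MTE_ASYNC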
>> +	    user_mte_frac == ID_AA64PFR1_EL1_MTE_frac_ASYNC) {
>> +		user_val &= ~ID_AA64PFR1_EL1_MTE_frac_MASK;
>> +		user_val |= hw_val & ID_AA64PFR1_EL1_MTE_frac_MASK;
> This means you are unconditionally propagating what the HW supports,
> which feels dodgy, especially considering that we don't know how
> MTE_frac is going to evolve in the future.
>
> I think you should limit the fix to the exact case we're mitigating
> here, not blindly overwrite the guest's view with the HW's capability.
Sure, better safe than sorry. I can update the if condition to the below.
	u8 hw_mte_frac = SYS_FIELD_GET(ID_AA64PFR1_EL1, MTE_frac, hw_val);

	...

	if (mte == ID_AA64PFR1_EL1_MTE_MTE2 &&
	    hw_mte_frac == ID_AA64PFR1_EL1_MTE_frac_NI &&
	    user_mte_frac == ID_AA64PFR1_EL1_MTE_frac_ASYNC)
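That way the override only fires on hosts where MTE_ASYNC is genuinely absent; any other MTE_frac value coming from user-space still falls through to set_id_reg() and the usual writable-mask checks.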
> Thanks,
>
> M.
Thanks,
Ben
When MTE is supported but MTE_ASYMM is not (ID_AA64PFR1_EL1.MTE == 2), ID_AA64PFR1_EL1.MTE_frac == 0xF indicates MTE_ASYNC is unsupported and MTE_frac == 0 indicates it is supported.

As MTE_frac was previously unconditionally read as 0 by the guest and user-space, check that using SET_ONE_REG to set it to 0 succeeds but does not change MTE_frac from unsupported (0xF) to supported (0). This is required as ID register values previously provided by KVM must be accepted back from user-space to avoid breaking migration.

Also, to allow this MTE field to be tested, enable KVM_CAP_ARM_MTE for the set_id_regs test. No effect on existing tests is expected.
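For reference, the updated test builds and runs like the other KVM selftests (assuming an arm64 build environment; paths per the kernel's kselftest layout):

  $ make -C tools/testing/selftests TARGETS=kvm
  $ ./tools/testing/selftests/kvm/arm64/set_id_regs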
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
 .../testing/selftests/kvm/arm64/set_id_regs.c | 77 ++++++++++++++++++-
 1 file changed, 76 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/arm64/set_id_regs.c b/tools/testing/selftests/kvm/arm64/set_id_regs.c
index 322b9d3b0125..34f4174e7285 100644
--- a/tools/testing/selftests/kvm/arm64/set_id_regs.c
+++ b/tools/testing/selftests/kvm/arm64/set_id_regs.c
@@ -15,6 +15,8 @@
 #include "test_util.h"
 #include <linux/bitfield.h>

+bool have_cap_arm_mte;
+
 enum ftr_type {
 	FTR_EXACT,		/* Use a predefined safe value */
 	FTR_LOWER_SAFE,		/* Smaller value is safe */

@@ -543,6 +545,70 @@ static void test_user_set_mpam_reg(struct kvm_vcpu *vcpu)
 		ksft_test_result_fail("ID_AA64PFR1_EL1.MPAM_frac value should not be ignored\n");
 }
+#define MTE_IDREG_TEST 1
+static void test_user_set_mte_reg(struct kvm_vcpu *vcpu)
+{
+	uint64_t masks[KVM_ARM_FEATURE_ID_RANGE_SIZE];
+	struct reg_mask_range range = {
+		.addr = (__u64)masks,
+	};
+	uint64_t val;
+	uint64_t mte;
+	uint64_t mte_frac;
+	int idx, err;
+
+	if (!have_cap_arm_mte) {
+		ksft_test_result_skip("MTE capability not supported, nothing to test\n");
+		return;
+	}
+
+	/* Get writable masks for feature ID registers */
+	memset(range.reserved, 0, sizeof(range.reserved));
+	vm_ioctl(vcpu->vm, KVM_ARM_GET_REG_WRITABLE_MASKS, &range);
+
+	idx = encoding_to_range_idx(SYS_ID_AA64PFR1_EL1);
+	if ((masks[idx] & ID_AA64PFR1_EL1_MTE_frac_MASK) == ID_AA64PFR1_EL1_MTE_frac_MASK) {
+		ksft_test_result_skip("ID_AA64PFR1_EL1.MTE_frac is officially writable, nothing to test\n");
+		return;
+	}
+
+	/*
+	 * When MTE is supported but MTE_ASYMM is not (ID_AA64PFR1_EL1.MTE == 2),
+	 * ID_AA64PFR1_EL1.MTE_frac == 0xF indicates MTE_ASYNC is unsupported
+	 * and MTE_frac == 0 indicates it is supported.
+	 *
+	 * As MTE_frac was previously unconditionally read as 0, check
+	 * that the set to 0 succeeds but does not change MTE_frac
+	 * from unsupported (0xF) to supported (0).
+	 */
+	val = vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(SYS_ID_AA64PFR1_EL1));
+
+	mte = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE), val);
+	mte_frac = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE_frac), val);
+	if (mte != ID_AA64PFR1_EL1_MTE_MTE2 ||
+	    mte_frac != ID_AA64PFR1_EL1_MTE_frac_NI) {
+		ksft_test_result_skip("MTE_ASYNC or MTE_ASYMM are supported, nothing to test\n");
+		return;
+	}
+
+	/* Try to set MTE_frac=0. */
+	val &= ~ID_AA64PFR1_EL1_MTE_frac_MASK;
+	val |= FIELD_PREP(ID_AA64PFR1_EL1_MTE_frac_MASK, 0);
+	err = __vcpu_set_reg(vcpu, KVM_ARM64_SYS_REG(SYS_ID_AA64PFR1_EL1), val);
+	if (err) {
+		ksft_test_result_fail("ID_AA64PFR1_EL1.MTE_frac=0 was not accepted\n");
+		return;
+	}
+
+	/* Check the write was ignored: MTE_frac must still read as 0xF. */
+	val = vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(SYS_ID_AA64PFR1_EL1));
+	mte_frac = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE_frac), val);
+	if (mte_frac == ID_AA64PFR1_EL1_MTE_frac_NI)
+		ksft_test_result_pass("ID_AA64PFR1_EL1.MTE_frac=0 accepted and still 0xF\n");
+	else
+		ksft_test_result_fail("ID_AA64PFR1_EL1.MTE_frac no longer 0xF\n");
+}
+
 static void test_guest_reg_read(struct kvm_vcpu *vcpu)
 {
 	bool done = false;

@@ -673,6 +739,14 @@ static void test_reset_preserves_id_regs(struct kvm_vcpu *vcpu)
 	ksft_test_result_pass("%s\n", __func__);
 }
+void kvm_arch_vm_post_create(struct kvm_vm *vm)
+{
+	if (vm_check_cap(vm, KVM_CAP_ARM_MTE)) {
+		vm_enable_cap(vm, KVM_CAP_ARM_MTE, 0);
+		have_cap_arm_mte = true;
+	}
+}
+
 int main(void)
 {
 	struct kvm_vcpu *vcpu;

@@ -701,7 +775,7 @@ int main(void)
 		   ARRAY_SIZE(ftr_id_aa64pfr1_el1) + ARRAY_SIZE(ftr_id_aa64mmfr0_el1) +
 		   ARRAY_SIZE(ftr_id_aa64mmfr1_el1) + ARRAY_SIZE(ftr_id_aa64mmfr2_el1) +
 		   ARRAY_SIZE(ftr_id_aa64zfr0_el1) - ARRAY_SIZE(test_regs) + 3 +
-		   MPAM_IDREG_TEST;
+		   MPAM_IDREG_TEST + MTE_IDREG_TEST;

 	ksft_set_plan(test_cnt);

@@ -709,6 +783,7 @@ int main(void)
 	test_vcpu_ftr_id_regs(vcpu);
 	test_vcpu_non_ftr_id_regs(vcpu);
 	test_user_set_mpam_reg(vcpu);
+	test_user_set_mte_reg(vcpu);

 	test_guest_reg_read(vcpu);