In commit 892f7237b3ff ("arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}") we moved access to ZCR, SMCR and SMIDR later in the boot process in order to ensure that we don't attempt to interact with them if SVE or SME is disabled on the command line. Unfortunately, when initialising the boot CPU in init_cpu_features(), we work on a copy of the boot CPU's struct cpuinfo_arm64 that is used only during boot, not on the percpu copy used by the sysfs code.
Fix this by moving the handling for SMIDR_EL1 for the boot CPU to cpuinfo_store_boot_cpu() so it can operate on the percpu copy of the data. This reduces the potential for error that could come from having both the percpu and boot CPU copies in init_cpu_features().
This issue wasn't apparent when testing on emulated platforms that do not report values in this ID register.
Fixes: 892f7237b3ff ("arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
---
 arch/arm64/kernel/cpufeature.c |  6 ------
 arch/arm64/kernel/cpuinfo.c    | 11 +++++++++++
 2 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6ce71f444ed84f9056196bb21bbfac61c9687e30..b88102fd2c20f77e25af6df513fda09a484e882e 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1167,12 +1167,6 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
 	    id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1))) {
 		unsigned long cpacr = cpacr_save_enable_kernel_sme();
 
-		/*
-		 * We mask out SMPS since even if the hardware
-		 * supports priorities the kernel does not at present
-		 * and we block access to them.
-		 */
-		info->reg_smidr = read_cpuid(SMIDR_EL1) & ~SMIDR_EL1_SMPS;
 		vec_init_vq_map(ARM64_VEC_SME);
 
 		cpacr_restore(cpacr);
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index d79e88fccdfce427507e7a34c5959ce6309cbd12..b7d403da71e5a01ed3943eb37e7a00af238771a2 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -499,4 +499,15 @@ void __init cpuinfo_store_boot_cpu(void)
 
 	boot_cpu_data = *info;
 	init_cpu_features(&boot_cpu_data);
+
+	/* SMIDR_EL1 needs to be stored in the percpu data for sysfs */
+	if (IS_ENABLED(CONFIG_ARM64_SME) &&
+	    id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1))) {
+		/*
+		 * We mask out SMPS since even if the hardware
+		 * supports priorities the kernel does not at present
+		 * and we block access to them.
+		 */
+		info->reg_smidr = read_cpuid(SMIDR_EL1) & ~SMIDR_EL1_SMPS;
+	}
 }
---
base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
change-id: 20241213-arm64-fix-boot-cpu-smidr-386b8db292b2
Best regards,
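To make the failure mode concrete, here is a standalone C mock of the stale-copy problem described in the commit message (simplified names and values, not kernel code; the real percpu data and the sysfs plumbing live in arch/arm64/kernel/cpuinfo.c):

#include <stdio.h>

struct cpuinfo { unsigned long reg_smidr; };

static struct cpuinfo cpu_data[1];	/* stands in for the percpu data */
static struct cpuinfo boot_cpu_data;	/* boot-time snapshot */

static void init_cpu_features(struct cpuinfo *info)
{
	info->reg_smidr = 0x1234;	/* the late register read */
}

int main(void)
{
	boot_cpu_data = cpu_data[0];		/* snapshot taken first */
	init_cpu_features(&boot_cpu_data);	/* write lands in the snapshot only */
	printf("sysfs would report %#lx\n", cpu_data[0].reg_smidr);
	return 0;
}

This prints 0 rather than 0x1234: the late write never reaches the record that the userspace-facing code reads, which is exactly the bug being fixed.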
[+ Mark]

On Sat, 14 Dec 2024 00:52:08 +0000, Mark Brown <broonie@kernel.org> wrote:
In commit 892f7237b3ff ("arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}") we moved access to ZCR, SMCR and SMIDR later in the boot process in order to ensure that we don't attempt to interact with them if SVE or SME is disabled on the command line. Unfortunately, when initialising the boot CPU in init_cpu_features(), we work on a copy of the boot CPU's struct cpuinfo_arm64 that is used only during boot, not on the percpu copy used by the sysfs code.
Fix this by moving the handling for SMIDR_EL1 for the boot CPU to cpuinfo_store_boot_cpu() so it can operate on the percpu copy of the data. This reduces the potential for error that could come from having both the percpu and boot CPU copies in init_cpu_features().
This issue wasn't apparent when testing on emulated platforms that do not report values in this ID register.
Fixes: 892f7237b3ff ("arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org

 arch/arm64/kernel/cpufeature.c |  6 ------
 arch/arm64/kernel/cpuinfo.c    | 11 +++++++++++
 2 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6ce71f444ed84f9056196bb21bbfac61c9687e30..b88102fd2c20f77e25af6df513fda09a484e882e 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1167,12 +1167,6 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
 	    id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1))) {
 		unsigned long cpacr = cpacr_save_enable_kernel_sme();
 
-		/*
-		 * We mask out SMPS since even if the hardware
-		 * supports priorities the kernel does not at present
-		 * and we block access to them.
-		 */
-		info->reg_smidr = read_cpuid(SMIDR_EL1) & ~SMIDR_EL1_SMPS;
 		vec_init_vq_map(ARM64_VEC_SME);
 
 		cpacr_restore(cpacr);
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index d79e88fccdfce427507e7a34c5959ce6309cbd12..b7d403da71e5a01ed3943eb37e7a00af238771a2 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -499,4 +499,15 @@ void __init cpuinfo_store_boot_cpu(void)
 
 	boot_cpu_data = *info;
 	init_cpu_features(&boot_cpu_data);
+
+	/* SMIDR_EL1 needs to be stored in the percpu data for sysfs */
+	if (IS_ENABLED(CONFIG_ARM64_SME) &&
+	    id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1))) {
+		/*
+		 * We mask out SMPS since even if the hardware
+		 * supports priorities the kernel does not at present
+		 * and we block access to them.
+		 */
+		info->reg_smidr = read_cpuid(SMIDR_EL1) & ~SMIDR_EL1_SMPS;
+	}
 }
I don't understand the need to single out SMIDR_EL1. It seems to only make things even more fragile than they already are by adding more synchronisation phases.
Why isn't the following a good enough fix? It makes it plain that boot_cpu_data is only a copy of CPU0's initial boot state.
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index d79e88fccdfce..0cbb42fd48850 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -497,6 +497,6 @@ void __init cpuinfo_store_boot_cpu(void)
 	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, 0);
 
 	__cpuinfo_store_cpu(info);
+	init_cpu_features(info);
 	boot_cpu_data = *info;
-	init_cpu_features(&boot_cpu_data);
 }
Thanks,
M.
On Sat, Dec 14, 2024 at 10:56:13AM +0000, Marc Zyngier wrote:
I don't understand the need to single out SMIDR_EL1. It seems to only make things even more fragile than they already are by adding more synchronisation phases.
Why isn't the following a good enough fix? It makes it plain that boot_cpu_data is only a copy of CPU0's initial boot state.
That would work, but it's not clear to me that that's the intent here. The current ordering seemed like a strange enough decision to be deliberate, though I couldn't identify the reasoning.
On Mon, Dec 16, 2024 at 12:17:54PM +0000, Mark Brown wrote:
On Sat, Dec 14, 2024 at 10:56:13AM +0000, Marc Zyngier wrote:
I don't understand the need to single out SMIDR_EL1. It seems to only make things even more fragile than they already are by adding more synchronisation phases.
Why isn't the following a good enough fix? It makes it plain that boot_cpu_data is only a copy of CPU0's initial boot state.
That would work, but it's not clear to me that that's the intent here. The current ordering seemed like a strange enough decision to be deliberate, though I couldn't identify the reasoning.
The original intent was that __cpuinfo_store_cpu() read *all* of a CPU's implemented ID regs, and init_cpu_features() initialised the expected system features based on the boot CPU's ID regs.
The expectation was that init_cpu_features() would only consume the register values, and would not alter the cpuinfo_arm64 values, so the order of:
	boot_cpu_data = *info;
	init_cpu_features(&boot_cpu_data);
... didn't matter either way, and using '&boot_cpu_data' was intended to make it clear that the features were based on the boot CPU's info, even if you just grepped for that and didn't see the surrounding context.
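(For illustration, a minimal standalone sketch of that intended split, with simplified names, not the actual kernel code; taking the info argument const in the consumer makes the "no writes" expectation explicit, and the snapshot/init order then genuinely doesn't matter:)

#include <stdio.h>

struct cpuinfo { unsigned long reg_id_aa64pfr1; };

static struct cpuinfo cpu_data[1];
static struct cpuinfo boot_cpu_data;

static void store_cpu(struct cpuinfo *info)
{
	info->reg_id_aa64pfr1 = 0x100;	/* all ID register reads happen here */
}

static void init_features(const struct cpuinfo *info)
{
	/* consume only: writing through info would not compile */
	printf("pfr1 = %#lx\n", info->reg_id_aa64pfr1);
}

int main(void)
{
	store_cpu(&cpu_data[0]);
	boot_cpu_data = cpu_data[0];	/* snapshot; order vs. init is now irrelevant */
	init_features(&boot_cpu_data);
	return 0;
}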
I think the real fix here is to move the reading back into __cpuinfo_store_cpu(), but to have an explicit check that SME has been disabled on the commandline, with a comment explaining that this is a bodge for broken FW which traps the SME ID regs.
Mark.
On Mon, Dec 16, 2024 at 12:44:14PM +0000, Mark Rutland wrote:
... didn't matter either way, and using '&boot_cpu_data' was intended to make it clear that the features were based on the boot CPU's info, even if you just grepped for that and didn't see the surrounding context.
Right, that was my best guess as to what was supposed to be going on but it wasn't super clear. The code could use some more comments.
I think the real fix here is to move the reading back into __cpuinfo_store_cpu(), but to have an explicit check that SME has been disabled on the commandline, with a comment explaining that this is a bodge for broken FW which traps the SME ID regs.
That should be doable.
There are a few other similar ID registers (eg, we already read GMID_EL1 and MPAMIDR_EL1) that make me a bit nervous that we might need to generalise it a bit, but we can deal with that if it comes up. Even for SME the disable was added speculatively; the factors that made this come up for SVE are less likely to be an issue with SME.
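(For reference, the pattern those registers already follow in __cpuinfo_store_cpu(); this fragment is quoted from the cpuinfo.c context lines of the patch below:)

	if (id_aa64pfr0_mpam(info->reg_id_aa64pfr0))
		info->reg_mpamidr = read_cpuid(MPAMIDR_EL1);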
On Mon, Dec 16, 2024 at 01:23:55PM +0000, Mark Brown wrote:
On Mon, Dec 16, 2024 at 12:44:14PM +0000, Mark Rutland wrote:
... didn't matter either way, and using '&boot_cpu_data' was intended to make it clear that the features were based on the boot CPU's info, even if you just grepped for that and didn't see the surrounding context.
Right, that was my best guess as to what was supposed to be going on but it wasn't super clear. The code could use some more comments.
I think the real fix here is to move the reading back into __cpuinfo_store_cpu(), but to have an explicit check that SME has been disabled on the commandline, with a comment explaining that this is a bodge for broken FW which traps the SME ID regs.
That should be doable.
There are a few other similar ID registers (eg, we already read GMID_EL1 and MPAMIDR_EL1) that make me a bit nervous that we might need to generalise it a bit, but we can deal with that if it comes up. Even for SME the disable was added speculatively; the factors that made this come up for SVE are less likely to be an issue with SME.
FWIW, I had a quick go (with only the SME case), and I think the shape that we want is roughly as below, which I think is easy to generalise to those other cases.
MarcZ, thoughts?
Mark.
---->8----
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 8b4e5a3cd24c8..f16eb99c10723 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -91,6 +91,16 @@ struct arm64_ftr_override {
 	u64		mask;
 };
 
+static inline u64
+arm64_ftr_override_apply(const struct arm64_ftr_override *override,
+			 u64 val)
+{
+	val &= ~override->mask;
+	val |= override->val & override->mask;
+
+	return val;
+}
+
 /*
  * @arm64_ftr_reg - Feature register
  * @strict_mask		Bits which should match across all CPUs for sanity.
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6ce71f444ed84..faad7d3e4cf5f 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1167,12 +1167,6 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
 	    id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1))) {
 		unsigned long cpacr = cpacr_save_enable_kernel_sme();
 
-		/*
-		 * We mask out SMPS since even if the hardware
-		 * supports priorities the kernel does not at present
-		 * and we block access to them.
-		 */
-		info->reg_smidr = read_cpuid(SMIDR_EL1) & ~SMIDR_EL1_SMPS;
 		vec_init_vq_map(ARM64_VEC_SME);
 
 		cpacr_restore(cpacr);
@@ -1550,10 +1544,8 @@ u64 __read_sysreg_by_encoding(u32 sys_id)
 	}
 
 	regp = get_arm64_ftr_reg(sys_id);
-	if (regp) {
-		val &= ~regp->override->mask;
-		val |= (regp->override->val & regp->override->mask);
-	}
+	if (regp)
+		val = arm64_ftr_override_apply(regp->override, val);
 
 	return val;
 }
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index d79e88fccdfce..1460e3541132f 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -439,6 +439,24 @@ static void __cpuinfo_store_cpu_32bit(struct cpuinfo_32bit *info)
 	info->reg_mvfr2 = read_cpuid(MVFR2_EL1);
 }
 
+static void __cpuinfo_store_cpu_sme(struct cpuinfo_arm64 *info)
+{
+	/*
+	 * TODO: explain that this bodge is to avoid trapping.
+	 */
+	u64 pfr1 = info->reg_id_aa64pfr1;
+	pfr1 = arm64_ftr_override_apply(&id_aa64pfr1_override, pfr1);
+	if (!id_aa64pfr1_sme(pfr1))
+		return;
+
+	/*
+	 * We mask out SMPS since even if the hardware
+	 * supports priorities the kernel does not at present
+	 * and we block access to them.
+	 */
+	info->reg_smidr = read_cpuid(SMIDR_EL1) & ~SMIDR_EL1_SMPS;
+}
+
 static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 {
 	info->reg_cntfrq = arch_timer_get_cntfrq();
@@ -482,6 +500,8 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 	if (id_aa64pfr0_mpam(info->reg_id_aa64pfr0))
 		info->reg_mpamidr = read_cpuid(MPAMIDR_EL1);
 
+	__cpuinfo_store_cpu_sme(info);
+
 	cpuinfo_detect_icache_policy(info);
 }
On Mon, 16 Dec 2024 14:31:47 +0000, Mark Rutland <mark.rutland@arm.com> wrote:
On Mon, Dec 16, 2024 at 01:23:55PM +0000, Mark Brown wrote:
On Mon, Dec 16, 2024 at 12:44:14PM +0000, Mark Rutland wrote:
... didn't matter either way, and using '&boot_cpu_data' was intended to make it clear that the features were based on the boot CPU's info, even if you just grepped for that and didn't see the surrounding context.
Right, that was my best guess as to what was supposed to be going on but it wasn't super clear. The code could use some more comments.
I think the real fix here is to move the reading back into __cpuinfo_store_cpu(), but to have an explicit check that SME has been disabled on the commandline, with a comment explaining that this is a bodge for broken FW which traps the SME ID regs.
That should be doable.
There are a few other similar ID registers (eg, we already read GMID_EL1 and MPAMIDR_EL1) that make me a bit nervous that we might need to generalise it a bit, but we can deal with that if it comes up. Even for SME the disable was added speculatively; the factors that made this come up for SVE are less likely to be an issue with SME.
FWIW, I had a quick go (with only the SME case), and I think the shape that we want is roughly as below, which I think is easy to generalise to those other cases.
MarcZ, thoughts?
Mark.
---->8----
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 8b4e5a3cd24c8..f16eb99c10723 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -91,6 +91,16 @@ struct arm64_ftr_override {
 	u64		mask;
 };
 
+static inline u64
+arm64_ftr_override_apply(const struct arm64_ftr_override *override,
+			 u64 val)
+{
+	val &= ~override->mask;
+	val |= override->val & override->mask;
+
+	return val;
+}
+
 /*
  * @arm64_ftr_reg - Feature register
  * @strict_mask		Bits which should match across all CPUs for sanity.
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6ce71f444ed84..faad7d3e4cf5f 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1167,12 +1167,6 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
 	    id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1))) {
 		unsigned long cpacr = cpacr_save_enable_kernel_sme();
 
-		/*
-		 * We mask out SMPS since even if the hardware
-		 * supports priorities the kernel does not at present
-		 * and we block access to them.
-		 */
-		info->reg_smidr = read_cpuid(SMIDR_EL1) & ~SMIDR_EL1_SMPS;
 		vec_init_vq_map(ARM64_VEC_SME);
 
 		cpacr_restore(cpacr);
@@ -1550,10 +1544,8 @@ u64 __read_sysreg_by_encoding(u32 sys_id)
 	}
 
 	regp = get_arm64_ftr_reg(sys_id);
-	if (regp) {
-		val &= ~regp->override->mask;
-		val |= (regp->override->val & regp->override->mask);
-	}
+	if (regp)
+		val = arm64_ftr_override_apply(regp->override, val);
 
 	return val;
 }
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index d79e88fccdfce..1460e3541132f 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -439,6 +439,24 @@ static void __cpuinfo_store_cpu_32bit(struct cpuinfo_32bit *info)
 	info->reg_mvfr2 = read_cpuid(MVFR2_EL1);
 }
 
+static void __cpuinfo_store_cpu_sme(struct cpuinfo_arm64 *info)
+{
+	/*
+	 * TODO: explain that this bodge is to avoid trapping.
+	 */
+	u64 pfr1 = info->reg_id_aa64pfr1;
+	pfr1 = arm64_ftr_override_apply(&id_aa64pfr1_override, pfr1);
+	if (!id_aa64pfr1_sme(pfr1))
+		return;
I don't think blindly applying the override at this stage is a good thing. Especially not the whole register, and definitely not non-disabling values.

It needs to be done on a per-feature basis, and only to disable things.
See the hack I posted for the things I think need checking.
M.
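(Concretely, "per-feature, and only to disable" means something like the standalone sketch below; the field position and helper name are made up for illustration, and Marc's actual untested hack appears later in the thread:)

#include <stdio.h>
#include <stdint.h>

#define PFR1_SME_MASK	(0xfULL << 24)	/* illustrative field position */

struct ftr_override { uint64_t val, mask; };

/* Honour the override for one field, and only to clear it. */
static uint64_t apply_disable_only(uint64_t reg, uint64_t field,
				   const struct ftr_override *ovr)
{
	if ((ovr->mask & field) && !(ovr->val & field))
		reg &= ~field;
	return reg;
}

int main(void)
{
	struct ftr_override nosme = { .val = 0, .mask = PFR1_SME_MASK };
	uint64_t pfr1 = 1ULL << 24;	/* SME = 1 */

	pfr1 = apply_disable_only(pfr1, PFR1_SME_MASK, &nosme);
	printf("pfr1 = %#llx\n", (unsigned long long)pfr1);	/* prints 0 */
	return 0;
}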
On Mon, Dec 16, 2024 at 02:44:07PM +0000, Marc Zyngier wrote:
On Mon, 16 Dec 2024 14:31:47 +0000, Mark Rutland <mark.rutland@arm.com> wrote:
On Mon, Dec 16, 2024 at 01:23:55PM +0000, Mark Brown wrote:
On Mon, Dec 16, 2024 at 12:44:14PM +0000, Mark Rutland wrote:
... didn't matter either way, and using '&boot_cpu_data' was intended to make it clear that the features were based on the boot CPU's info, even if you just grepped for that and didn't see the surrounding context.
Right, that was my best guess as to what was supposed to be going on but it wasn't super clear. The code could use some more comments.
I think the real fix here is to move the reading back into __cpuinfo_store_cpu(), but to have an explicit check that SME has been disabled on the commandline, with a comment explaining that this is a bodge for broken FW which traps the SME ID regs.
That should be doable.
There are a few other similar ID registers (eg, we already read GMID_EL1 and MPAMIDR_EL1) that make me a bit nervous that we might need to generalise it a bit, but we can deal with that if it comes up. Even for SME the disable was added speculatively; the factors that made this come up for SVE are less likely to be an issue with SME.
FWIW, I had a quick go (with only the SME case), and I think the shape that we want is roughly as below, which I think is easy to generalise to those other cases.
MarcZ, thoughts?
Mark.
[... dodgy patch was here ...]
I don't think blindly applying the override at this stage is a good thing. Especially not the whole register, and definitely not non-disabling values.

It needs to be done on a per-feature basis, and only to disable things.
See the hack I posted for the things I think need checking.
Understood; sorry for the noise -- we raced when replying and I only spotted your reply after sending this. I think I'm more in favour of the revert option now; I've replied with more details at:
https://lore.kernel.org/linux-arm-kernel/Z2BCI61c9QWG7mMB@J2N7QTR9R3.cambrid...
Mark.
On Sat, Dec 14, 2024 at 10:56:13AM +0000, Marc Zyngier wrote:
[+ Mark] On Sat, 14 Dec 2024 00:52:08 +0000, Mark Brown <broonie@kernel.org> wrote:
In commit 892f7237b3ff ("arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}") we moved access to ZCR, SMCR and SMIDR later in the boot process in order to ensure that we don't attempt to interact with them if SVE or SME is disabled on the command line. Unfortunately, when initialising the boot CPU in init_cpu_features(), we work on a copy of the boot CPU's struct cpuinfo_arm64 that is used only during boot, not on the percpu copy used by the sysfs code.
Fix this by moving the handling for SMIDR_EL1 for the boot CPU to cpuinfo_store_boot_cpu() so it can operate on the percpu copy of the data. This reduces the potential for error that could come from having both the percpu and boot CPU copies in init_cpu_features().
This issue wasn't apparent when testing on emulated platforms that do not report values in this ID register.
Fixes: 892f7237b3ff ("arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org

 arch/arm64/kernel/cpufeature.c |  6 ------
 arch/arm64/kernel/cpuinfo.c    | 11 +++++++++++
 2 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6ce71f444ed84f9056196bb21bbfac61c9687e30..b88102fd2c20f77e25af6df513fda09a484e882e 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1167,12 +1167,6 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
 	    id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1))) {
 		unsigned long cpacr = cpacr_save_enable_kernel_sme();
 
-		/*
-		 * We mask out SMPS since even if the hardware
-		 * supports priorities the kernel does not at present
-		 * and we block access to them.
-		 */
-		info->reg_smidr = read_cpuid(SMIDR_EL1) & ~SMIDR_EL1_SMPS;
 		vec_init_vq_map(ARM64_VEC_SME);
 
 		cpacr_restore(cpacr);
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index d79e88fccdfce427507e7a34c5959ce6309cbd12..b7d403da71e5a01ed3943eb37e7a00af238771a2 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -499,4 +499,15 @@ void __init cpuinfo_store_boot_cpu(void)
 
 	boot_cpu_data = *info;
 	init_cpu_features(&boot_cpu_data);
+
+	/* SMIDR_EL1 needs to be stored in the percpu data for sysfs */
+	if (IS_ENABLED(CONFIG_ARM64_SME) &&
+	    id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1))) {
+		/*
+		 * We mask out SMPS since even if the hardware
+		 * supports priorities the kernel does not at present
+		 * and we block access to them.
+		 */
+		info->reg_smidr = read_cpuid(SMIDR_EL1) & ~SMIDR_EL1_SMPS;
+	}
 }
I don't understand the need to single out SMIDR_EL1. It seems to only make things even more fragile than they already are by adding more synchronisation phases.
Why isn't the following a good enough fix? It makes it plain that boot_cpu_data is only a copy of CPU0's initial boot state.
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index d79e88fccdfce..0cbb42fd48850 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -497,6 +497,6 @@ void __init cpuinfo_store_boot_cpu(void)
 	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, 0);
 
 	__cpuinfo_store_cpu(info);
+	init_cpu_features(info);
 	boot_cpu_data = *info;
-	init_cpu_features(&boot_cpu_data);
 }
I think that change in isolation is fine, but I don't think that's the right fix.
I think that what we did in commit:
892f7237b3ff ("arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}")
... introduces an anti-pattern that'd be nice to avoid. That broke the existing split of __cpuinfo_store_cpu() and init_cpu_features(), where the former read the ID regs, and the latter set up the features *without* altering the copy of the ID regs that was read. i.e. init_cpu_features() shouldn't write to its info argument at all.
I understand that we have to do something as a bodge for broken FW which traps SME, but I'd much rather we did that within __cpuinfo_store_cpu().
Can we add something to check whether SME was disabled on the command line, and use that in __cpuinfo_store_cpu(), effectively reverting 892f7237b3ff?
Mark.
On Mon, 16 Dec 2024 12:38:17 +0000, Mark Rutland <mark.rutland@arm.com> wrote:
On Sat, Dec 14, 2024 at 10:56:13AM +0000, Marc Zyngier wrote:
Why isn't the following a good enough fix? It makes it plain that boot_cpu_data is only a copy of CPU0's initial boot state.
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index d79e88fccdfce..0cbb42fd48850 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -497,6 +497,6 @@ void __init cpuinfo_store_boot_cpu(void)
 	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, 0);
 
 	__cpuinfo_store_cpu(info);
+	init_cpu_features(info);
 	boot_cpu_data = *info;
-	init_cpu_features(&boot_cpu_data);
 }
I think that change in isolation is fine, but I don't think that's the right fix.
I think that what we did in commit:
892f7237b3ff ("arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}")
... introduces an anti-pattern that'd be nice to avoid. That broke the existing split of __cpuinfo_store_cpu() and init_cpu_features(), where the former read the ID regs, and the latter set up the features *without* altering the copy of the ID regs that was read. i.e. init_cpu_features() shouldn't write to its info argument at all.
I understand that we have to do something as a bodge for broken FW which traps SME, but I'd much rather we did that within __cpuinfo_store_cpu().
Honestly, I'd rather revert that patch, together with b3000e2133d8 ("arm64: Add the arm64.nosme command line option"). I'm getting tired of the FW nonsense, and we are only allowing vendors to ship untested crap.
Furthermore, given the state of SME in the kernel, I don't think this makes any difference. So maybe this is the right time to reset everything to a sane state.
Can we add something to check whether SME was disabled on the command line, and use that in __cpuinfo_store_cpu(), effectively reverting 892f7237b3ff?
Maybe, but that'd be before any sanitisation of the overrides, so it would have to severely limit its scope. Something like this, which I haven't tested:
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index d79e88fccdfce..9e9295e045009 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -492,10 +492,22 @@ void cpuinfo_store_cpu(void)
 	update_cpu_features(smp_processor_id(), info, &boot_cpu_data);
 }
 
+static void cpuinfo_apply_overrides(struct cpuinfo_arm64 *info)
+{
+	if (FIELD_GET(ID_AA64PFR0_EL1_SVE, id_aa64pfr0_override.mask) &&
+	    !FIELD_GET(ID_AA64PFR0_EL1_SVE, id_aa64pfr0_override.val))
+		info->reg_id_aa64pfr0 &= ~ID_AA64PFR0_EL1_SVE;
+
+	if (FIELD_GET(ID_AA64PFR1_EL1_SME, id_aa64pfr1_override.mask) &&
+	    !FIELD_GET(ID_AA64PFR1_EL1_SME, id_aa64pfr1_override.val))
+		info->reg_id_aa64pfr1 &= ~ID_AA64PFR1_EL1_SME;
+}
+
 void __init cpuinfo_store_boot_cpu(void)
 {
 	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, 0);
 
 	__cpuinfo_store_cpu(info);
+	cpuinfo_apply_overrides(info);
 
 	boot_cpu_data = *info;
 	init_cpu_features(&boot_cpu_data);
But this will have ripple effects on the rest of the override code (the kernel messages are likely to be wrong).
M.
On Mon, Dec 16, 2024 at 02:31:43PM +0000, Marc Zyngier wrote:
Mark Rutland <mark.rutland@arm.com> wrote:
I understand that we have to do something as a bodge for broken FW which traps SME, but I'd much rather we did that within __cpuinfo_store_cpu().
Honestly, I'd rather revert that patch, together with b3000e2133d8 ("arm64: Add the arm64.nosme command line option"). I'm getting tired of the FW nonsense, and we are only allowing vendors to ship untested crap.
I'd certainly be happy to remove the override for SME, the circumstances that lead to the need to override SVE are much less likely to occur with SME. We can add it again later if there's a need for it.
Furthermore, given the state of SME in the kernel, I don't think this makes any difference. So maybe this is the right time to reset everything to a sane state.
I'm not aware of any issues that don't have fixes on the list (the fixes have all been on the list for about a month, apart from this one).
On Mon, Dec 16, 2024 at 02:31:43PM +0000, Marc Zyngier wrote:
On Mon, 16 Dec 2024 12:38:17 +0000, Mark Rutland <mark.rutland@arm.com> wrote:
I think that what we did in commit:
892f7237b3ff ("arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}")
... introduces an anti-pattern that'd be nice to avoid. That broke the existing split of __cpuinfo_store_cpu() and init_cpu_features(), where the former read the ID regs, and the latter set up the features *without* altering the copy of the ID regs that was read. i.e. init_cpu_features() shouldn't write to its info argument at all.
I understand that we have to do something as a bodge for broken FW which traps SME, but I'd much rather we did that within __cpuinfo_store_cpu().
Honestly, I'd rather revert that patch, together with b3000e2133d8 ("arm64: Add the arm64.nosme command line option"). I'm getting tired of the FW nonsense, and we are only allowing vendors to ship untested crap.
Furthermore, given the state of SME in the kernel, I don't think this makes any difference. So maybe this is the right time to reset everything to a sane state.
Looking again, a revert does look to be the best option.
We removed reg_zcr and reg_smcr in v6.7 in commits:
abef0695f9665c3d ("arm64/sve: Remove ZCR pseudo register from cpufeature code")
391208485c3ad50f ("arm64/sve: Remove SMCR pseudo register from cpufeature code")
As of those commits, ZCR and SMCR no longer matter to __cpuinfo_store_cpu(), and only SMIDR_EL1 remains...
Per ARM DDI 0487 L.a, accesses to SMIDR_EL1 never trap to EL3, so we can read that safely as long as ID_AA64PFR1_EL1.SME indicates that SME is implemented.
Which is to say that if we revert the remaining portion of 892f7237b3ff and restore the read of SMIDR, that should be good as far back as v6.7, which sounds good to me.
Mark.
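(Concretely, such a revert boils down to a guarded read in __cpuinfo_store_cpu(), along the lines of the fragment below, assembled from pieces quoted in this thread rather than a tested patch:)

	if (IS_ENABLED(CONFIG_ARM64_SME) &&
	    id_aa64pfr1_sme(info->reg_id_aa64pfr1)) {
		/*
		 * SMIDR_EL1 never traps to EL3 (ARM DDI 0487 L.a), so
		 * this read is safe whenever SME is implemented; SMPS
		 * is masked out since the kernel does not support
		 * priorities.
		 */
		info->reg_smidr = read_cpuid(SMIDR_EL1) & ~SMIDR_EL1_SMPS;
	}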
Can we add something to check whether SME was disabled on the command line, and use that in __cpuinfo_store_cpu(), effectively reverting 892f7237b3ff?
Maybe, but that'd be before any sanitisation of the overrides, so it would have to severely limit its scope. Something like this, which I haven't tested:
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index d79e88fccdfce..9e9295e045009 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -492,10 +492,22 @@ void cpuinfo_store_cpu(void)
 	update_cpu_features(smp_processor_id(), info, &boot_cpu_data);
 }
 
+static void cpuinfo_apply_overrides(struct cpuinfo_arm64 *info)
+{
+	if (FIELD_GET(ID_AA64PFR0_EL1_SVE, id_aa64pfr0_override.mask) &&
+	    !FIELD_GET(ID_AA64PFR0_EL1_SVE, id_aa64pfr0_override.val))
+		info->reg_id_aa64pfr0 &= ~ID_AA64PFR0_EL1_SVE;
+
+	if (FIELD_GET(ID_AA64PFR1_EL1_SME, id_aa64pfr1_override.mask) &&
+	    !FIELD_GET(ID_AA64PFR1_EL1_SME, id_aa64pfr1_override.val))
+		info->reg_id_aa64pfr1 &= ~ID_AA64PFR1_EL1_SME;
+}
+
 void __init cpuinfo_store_boot_cpu(void)
 {
 	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, 0);
 
 	__cpuinfo_store_cpu(info);
+	cpuinfo_apply_overrides(info);
 
 	boot_cpu_data = *info;
 	init_cpu_features(&boot_cpu_data);
But this will have ripple effects on the rest of the override code (the kernel messages are likely to be wrong).
M.
-- Without deviation from the norm, progress is not possible.
On Mon, 16 Dec 2024 15:07:15 +0000, Mark Rutland <mark.rutland@arm.com> wrote:
On Mon, Dec 16, 2024 at 02:31:43PM +0000, Marc Zyngier wrote:
On Mon, 16 Dec 2024 12:38:17 +0000, Mark Rutland <mark.rutland@arm.com> wrote:
I think that what we did in commit:
892f7237b3ff ("arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}")
... introduces an anti-pattern that'd be nice to avoid. That broke the existing split of __cpuinfo_store_cpu() and init_cpu_features(), where the former read the ID regs, and the latter set up the features *without* altering the copy of the ID regs that was read. i.e. init_cpu_features() shouldn't write to its info argument at all.
I understand that we have to do something as a bodge for broken FW which traps SME, but I'd much rather we did that within __cpuinfo_store_cpu().
Honestly, I'd rather revert that patch, together with b3000e2133d8 ("arm64: Add the arm64.nosme command line option"). I'm getting tired of the FW nonsense, and we are only allowing vendors to ship untested crap.
Furthermore, given the state of SME in the kernel, I don't think this makes any difference. So maybe this is the right time to reset everything to a sane state.
Looking again, a revert does look to be the best option.
We removed reg_zcr and reg_smcr in v6.7 in commits:
abef0695f9665c3d ("arm64/sve: Remove ZCR pseudo register from cpufeature code")
391208485c3ad50f ("arm64/sve: Remove SMCR pseudo register from cpufeature code")
As of those commits, ZCR and SMCR no longer matter to __cpuinfo_store_cpu(), and only SMIDR_EL1 remains...
Per ARM DDI 0487 L.a, accesses to SMIDR_EL1 never trap to EL3, so we can read that safely as long as ID_AA64PFR1_EL1.SME indicates that SME is implemented.
Which is to say that if we revert the remaining portion of 892f7237b3ff and restore the read of SMIDR, that should be good as far back as v6.7, which sounds good to me.
Sounds reasonable indeed. I guess *someone* will want it for the previous kernel versions, but they can have fun with the backport on their own.
M.