When running identity mapped and depending on the kernel configuration, it is possible that cc_platform_has() can have compiler generated code that uses jump tables. This causes a boot failure because the jump table uses un-mapped kernel virtual addresses, not identity mapped addresses. This has been seen with CONFIG_RETPOLINE=n.
Similar to sme_encrypt_kernel(), use an open-coded direct check for the status of SNP rather than trying to eliminate the jump table. This preserves any code optimization in cc_platform_has() that can be useful post boot. It also limits the changes to SEV-specific files so that future compiler features won't necessarily require possible build changes just because they are not compatible with running identity mapped.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216385 Link: https://lore.kernel.org/all/YqfabnTRxFSM+LoX@google.com/ Cc: stable@vger.kernel.org # 5.19.x Reported-by: Sean Christopherson seanjc@google.com Suggested-by: Sean Christopherson seanjc@google.com Signed-off-by: Tom Lendacky thomas.lendacky@amd.com --- arch/x86/kernel/sev.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index 63dc626627a0..4f84c3f11af5 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -701,7 +701,13 @@ static void __init early_set_pages_state(unsigned long paddr, unsigned int npage void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned int npages) { - if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) + /* + * This can be invoked in early boot while running identity mapped, so + * use an open coded check for SNP instead of using cc_platform_has(). + * This eliminates worries about jump tables or checking boot_cpu_data + * in the cc_platform_has() function. + */ + if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED)) return;
/* @@ -717,7 +723,13 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages) { - if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) + /* + * This can be invoked in early boot while running identity mapped, so + * use an open coded check for SNP instead of using cc_platform_has(). + * This eliminates worries about jump tables or checking boot_cpu_data + * in the cc_platform_has() function. + */ + if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED)) return;
/* Invalidate the memory pages before they are marked shared in the RMP table. */
On 8/23/22 14:55, Tom Lendacky wrote:
When running identity mapped and depending on the kernel configuration, it is possible that cc_platform_has() can have compiler generated code that uses jump tables. This causes a boot failure because the jump table uses un-mapped kernel virtual addresses, not identity mapped addresses. This has been seen with CONFIG_RETPOLINE=n.
So, we don't have *ANY* control over where the compiler uses jump tables. The kernel just happened to add some code that uses them, fell over, and this adds a hack to get booting again.
Isn't this a bigger problem?
On Wed, Aug 24, 2022 at 11:43:10AM -0700, Dave Hansen wrote:
So, we don't have *ANY* control over where the compiler uses jump tables. The kernel just happened to add some code that uses them, fell over, and this adds a hack to get booting again.
Isn't this a bigger problem?
I had the same question already. Was thinking of maybe disabling the compiler from producing jump tables in the ident-mapped code. Tom's argument is that that might prevent the compiler from doing optimizations but I haven't talked to compiler folks whether those optimizations are even worth the effort.
Regardless, the potential problem is limited:
"# (jump-tables are implicitly disabled by RETPOLINE)"
i.e., only RETPOLINE=n builds for now which should be a minority?
I guess when this explodes somewhere else again, we will have to generalize a fix.
Thx.
On 8/24/22 11:48, Borislav Petkov wrote:
On Wed, Aug 24, 2022 at 11:43:10AM -0700, Dave Hansen wrote:
So, we don't have *ANY* control over where the compiler uses jump tables. The kernel just happened to add some code that uses them, fell over, and this adds a hack to get booting again.
Isn't this a bigger problem?
I had the same question already. Was thinking of maybe disabling the compiler from producing jump tables in the ident-mapped code. Tom's argument is that that might prevent the compiler from doing optimizations but I haven't talked to compiler folks whether those optimizations are even worth the effort.
Regardless, the potential problem is limited:
"# (jump-tables are implicitly disabled by RETPOLINE)"
Ahh, I missed the connection with retpoline. The ubiquity of RETPOLINE=y probably means we'll see more of these issues because people won't find them unless they're building and running weirdo configurations.
i.e., only RETPOLINE=n builds for now which should be a minority?
I guess when this explodes somewhere else again, we will have to generalize a fix.
Yep. It also reminds me to add RETPOLINE=n build to my tests.
linux-stable-mirror@lists.linaro.org