From: Ashwin Dayanand Kamat <ashwin.kamat@broadcom.com>
A kernel crash caused by a page fault was observed while running the CPU hotplug LTP test cases on SEV-ES-enabled systems. The crash occurred during hotplug, after a CPU was offlined and the process was migrated to a different CPU. setup_ghcb() is then called again and tries to update ghcb_version in sev_es_negotiate_protocol(). This variable is meant to be read-only once it has been initialised during boot, so the write results in a page fault.
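The failure mode can be reproduced in miniature in userspace. In the kernel, ghcb_version is marked __ro_after_init, so its page becomes read-only once init completes; the error_code(0x0003) in the log means the page was present (bit 0, a protection violation) and the access was a write (bit 1). The sketch below is only an analogy under stated assumptions: an anonymous mmap() page sealed with mprotect(PROT_READ) stands in for the kernel's read-only-after-init section, and try_write(), simulate_ro_after_init() and struct ro_result are hypothetical helpers, not kernel code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper state: where on_segv() jumps back to. */
static sigjmp_buf fault_env;

static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1); /* abandon the faulting store */
}

/* Hypothetical helper: returns 1 if the store succeeded, 0 if it faulted. */
static int try_write(volatile uint16_t *p, uint16_t v)
{
    struct sigaction sa, old;
    int ok;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    if (sigsetjmp(fault_env, 1) == 0) {
        *p = v;  /* faults if the page is read-only */
        ok = 1;
    } else {
        ok = 0;  /* reached via siglongjmp() from on_segv() */
    }

    sigaction(SIGSEGV, &old, NULL); /* restore the previous handler */
    return ok;
}

struct ro_result {
    int boot_ok;    /* store before the page is sealed */
    int hotplug_ok; /* store after the page is sealed */
};

/* Models "initialise at boot, seal after init, write again on hotplug". */
static struct ro_result simulate_ro_after_init(void)
{
    struct ro_result r = { -1, -1 };
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    void *page = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return r;

    volatile uint16_t *version = page;            /* stand-in for ghcb_version */
    r.boot_ok = try_write(version, 2);            /* boot: page writable */
    if (mprotect(page, pagesz, PROT_READ) == 0)   /* end of init: seal it */
        r.hotplug_ok = try_write(version, 2);     /* hotplug: write faults */

    munmap(page, pagesz);
    return r;
}
```

In a normal process the first store succeeds and the second is caught as SIGSEGV; in the kernel the trace shows the exception-table lookup finding no fixup for this store, so it ends in an oops instead.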
From the logs:
[ 256.447466] BUG: unable to handle page fault for address: ffffffffba556e70
[ 256.447476] #PF: supervisor write access in kernel mode
[ 256.447478] #PF: error_code(0x0003) - permissions violation
[ 256.447479] PGD 8000667c0f067 P4D 8000667c0f067 PUD 8000667c10063 PMD 80080006674001e1
[ 256.447483] Oops: 0003 [#1] PREEMPT SMP NOPTI
[ 256.447487] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.1.45-8.ph5 #1-photon
. . . . .
[ 256.447511] CR2: ffffffffba556e70 CR3: 0008000667c0a004 CR4: 0000000000770ee0
[ 256.447514] PKRU: 55555554
[ 256.447515] Call Trace:
[ 256.447516] <TASK>
[ 256.447519] ? __die_body.cold+0x1a/0x1f
[ 256.447526] ? __die+0x2a/0x35
[ 256.447528] ? page_fault_oops+0x10c/0x270
[ 256.447531] ? setup_ghcb+0x71/0x100
[ 256.447533] ? __x86_return_thunk+0x5/0x6
[ 256.447537] ? search_exception_tables+0x60/0x70
[ 256.447541] ? __x86_return_thunk+0x5/0x6
[ 256.447543] ? fixup_exception+0x27/0x320
[ 256.447546] ? kernelmode_fixup_or_oops+0xa2/0x120
[ 256.447549] ? __bad_area_nosemaphore+0x16a/0x1b0
[ 256.447551] ? kernel_exc_vmm_communication+0x60/0xb0
[ 256.447556] ? bad_area_nosemaphore+0x16/0x20
[ 256.447558] ? do_kern_addr_fault+0x7a/0x90
[ 256.447560] ? exc_page_fault+0xbd/0x160
[ 256.447563] ? asm_exc_page_fault+0x27/0x30
[ 256.447570] ? setup_ghcb+0x71/0x100
[ 256.447572] ? setup_ghcb+0xe/0x100
[ 256.447574] cpu_init_exception_handling+0x1b9/0x1f0
The fix is to call sev_es_negotiate_protocol() only during the BSP boot phase, which is sufficient because the protocol only needs to be negotiated once.
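A simplified model of setup_ghcb() before and after the change shows why moving the call is enough. It assumes that the first call happens on the BSP before the runtime #VC handler is installed, that every later call (here, a CPU coming back online) sees the runtime handler already active, and that ghcb_version becomes read-only once init completes; secondary CPUs brought up during boot are left out. All names (model_setup_ghcb_old(), model_setup_ghcb_fixed(), simulate() and the counters) are hypothetical userspace stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are kernel symbols. */
static bool runtime_vc_handler_active; /* models the initial_vc_handler check */
static bool ro_after_init_sealed;      /* models init having completed */
static uint16_t ghcb_version;          /* models the negotiated version */
static int negotiations;               /* writes that succeeded */
static int ro_violations;              /* writes that would oops in the kernel */

/* Stand-in for sev_es_negotiate_protocol(): it stores ghcb_version. */
static void model_negotiate_protocol(void)
{
    if (ro_after_init_sealed) {
        ro_violations++;  /* the kernel takes a #PF (write, protection) here */
        return;
    }
    ghcb_version = 2;     /* hypothetical negotiated version */
    negotiations++;
}

/* Old ordering: negotiate before checking for the runtime #VC handler. */
static void model_setup_ghcb_old(void)
{
    model_negotiate_protocol();
    if (runtime_vc_handler_active)
        return;           /* per-CPU GHCB path */
    /* boot_ghcb setup would follow here */
}

/* New ordering: only the boot path (runtime handler not yet active) negotiates. */
static void model_setup_ghcb_fixed(void)
{
    if (runtime_vc_handler_active)
        return;           /* per-CPU GHCB path */
    model_negotiate_protocol();
    /* boot_ghcb setup would follow here */
}

/* One BSP boot call, then `hotplugs` calls after init has completed. */
static void simulate(void (*setup)(void), int hotplugs)
{
    runtime_vc_handler_active = false;
    ro_after_init_sealed = false;
    ghcb_version = 0;
    negotiations = 0;
    ro_violations = 0;

    setup();                           /* BSP boot phase */
    runtime_vc_handler_active = true;  /* runtime #VC handler installed */
    ro_after_init_sealed = true;       /* init done, variable read-only */

    for (int i = 0; i < hotplugs; i++)
        setup();                       /* CPU brought back online */
}
```

With the old ordering every post-boot call reaches the negotiation and writes the sealed variable; with the new ordering only the boot call does, and the value negotiated there remains available afterwards.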
Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
Co-developed-by: Bo Gan <bo.gan@broadcom.com>
Signed-off-by: Bo Gan <bo.gan@broadcom.com>
Signed-off-by: Ashwin Dayanand Kamat <ashwin.kamat@broadcom.com>
---
v2:
Changes made in v2, as per the review comments from Tom Lendacky:
- Moved sev_es_negotiate_protocol() after the initial_vc_handler if-check in setup_ghcb()
- Added the co-developer's Signed-off-by
---
 arch/x86/kernel/sev.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 70472eebe719..c67285824e82 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1234,10 +1234,6 @@ void setup_ghcb(void)
 	if (!cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
 		return;
 
-	/* First make sure the hypervisor talks a supported protocol. */
-	if (!sev_es_negotiate_protocol())
-		sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
-
 	/*
 	 * Check whether the runtime #VC exception handler is active. It uses
 	 * the per-CPU GHCB page which is set up by sev_es_init_vc_handling().
@@ -1254,6 +1250,13 @@ void setup_ghcb(void)
 		return;
 	}
 
+	/*
+	 * Make sure the hypervisor talks a supported protocol.
+	 * This gets called only in the BSP boot phase.
+	 */
+	if (!sev_es_negotiate_protocol())
+		sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
+
 	/*
 	 * Clear the boot_ghcb. The first exception comes in before the bss
 	 * section is cleared.