From: Peter Zijlstra peterz@infradead.org Sent: Friday, July 21, 2023 12:59 AM
On Fri, Jul 21, 2023 at 12:41:35AM +0000, Michael Kelley (LINUX) wrote:
Other than that, this seems fairly straight forward. One thing I wondered about; wouldn't it be possible to re-write the indirect hypercall thingies to a direct call? I mean, once we have the hypercall page mapped, the address is known right?
Yes, the address is known. It does not change across things like hibernation. But the indirect call instruction is part of an inline assembly sequence, so the call instructions that need re-writing are scattered throughout the code. There's also the SEV-SNP case from the latest version of Tianyu Lan's patch set [1] where vmmcall may be used instead, based on your recent enhancement for nested ALTERNATIVE. Re-writing seems like that's more complexity than warranted for a mostly interim situation until the Hyper-V patch is available and users install it.
Well, we have a lot of infrastructure for this already. Specifically this is very like the paravirt patching.
Also, direct calls are both faster and have less speculation issues, so it might still be worth looking at.
The way to do something like this would be:
asm volatile (" ANNOTATE_RETPOLINE_SAFE \n\t" "1: call *hv_hypercall_page \n\t" ".pushsection .hv_call_sites \n\t" ".long 1b - . \n\t" ".popsection \n\t");
And then (see alternative.c for many other examples):
patch_hypercalls() { s32 *s;
for (s = __hv_call_sites_begin; s < __hv_call_sites_end; s++) { void *addr = (void *)s + *s; struct insn insn;
ret = insn_decode_kernel(&insn, addr); if (WARN_ON_ONCE(ret < 0)) continue; /* * indirect call: ff 15 disp32 * direct call: 2e e8 disp32 */ if (insn.length == 6 && insn.opcode.bytes[0] == 0xFF && X86_MODRM_REG(insn.modrm.bytes[0]) == 2) { /* verify it was calling hy_hypercall_page */ if (WARN_ON_ONCE(addr + 6 + insn.displacement.value != &hv_hypercall_page)) continue; /* * write a CS padded direct call -- assumes the * hypercall page is in the 2G immediate range * of the kernel text
Probably not true -- the hypercall page has a vmalloc address.
*/ addr[0] = 0x2e; /* CS prefix */ addr[1] = CALL_INSN_OPCODE; (s32 *)&Addr[2] = *hv_hypercall_page - (addr + 6); }
} }
See, easy :-)
OK, worth looking into. This is a corner of the Linux kernel code that I've never looked at before. I appreciate the pointers.
Hypercall sites also exist in loadable modules, so would need to hook into module_finalize() as well. Processing a new section type looks straightforward.
But altogether, this feels like more change than should go as a bug fix to be backported to stable kernels. It's something to look at for a future kernel release.
Michael