Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface") unintendedly broke Xen PV virtual machines by clearing the %rbx register at the end of xen_failsafe_callback before the latter jumps to error_exit. error_exit expects the %rbx register to be a flag indicating whether there should be a return to kernel mode.
This commit makes sure that the %rbx register is not cleared by the PUSH_AND_CLEAR_REGS macro, when the macro in question is instantiated by xen_failsafe_callback, to avoid the issue.
The impact of this bug includes (and most likely is not limited to) kernel BUGs when wine is run in Xen PV virtual machines. One example is included below. This example depicts the bug when wine is used to run TeamViewer version 13's portable version under a Xen PV virtual machine using an up-to-date version of Qubes OS R3.2.
[...........] kernel tried to execute NX-protected page - exploit attempt? (uid: 1000) [ +0.000021] BUG: unable to handle kernel paging request at ffffffff82012480 [ +0.000020] IP: init_task+0x0/0x2ac0 [ +0.000009] PGD 200e067 P4D 200e067 PUD 200f067 PMD 2d5e067 PTE 8010000002012067 [ +0.000019] Oops: 0011 [#1] SMP DEBUG_PAGEALLOC NOPTI [ +0.000015] Modules linked in: fuse bluetooth ecdh_generic rfkill ip6table_filter ip6_tables xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c intel_rapl x86_pkg_temp_thermal crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel xen_netfront intel_rapl_perf pcspkr binfmt_misc dummy_hcd udc_core xen_gntdev xen_gntalloc u2mfn(O) xen_blkback xenfs xen_evtchn xen_privcmd xen_blkfront [ +0.000091] CPU: 0 PID: 1350 Comm: TeamViewer.exe Tainted: G O 4.14.20-11.d10d0bb86d #15 [ +0.000018] task: ffff8800042fda00 task.stack: ffffc900006f8000 [ +0.000015] RIP: e030:init_task+0x0/0x2ac0 [ +0.000009] RSP: e02b:ffffffff82003df8 EFLAGS: 00010002 [ +0.000012] RAX: 0000000000000040 RBX: 0000000000000004 RCX: 0000000000000000 [ +0.000015] RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffffffff810011aa [ +0.000015] RBP: 000000000033eb88 R08: 0000000000000000 R09: 0000000000000000 [ +0.000015] R10: 0000000000000000 R11: ffff88000d772600 R12: 0000000000000000 [ +0.000015] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ +0.000027] FS: 000000007ffd8000(0063) GS:ffff88000f400000(006b) knlGS:0000000000000000 [ +0.000016] CS: e033 DS: 002b ES: 002b CR0: 0000000080050033 [ +0.000014] CR2: ffffffff82012480 CR3: 0000000004880000 CR4: 0000000000042660 [ +0.000025] Call Trace: [ +0.000007] Code: ff ff ff d0 5a 1e 81 ff ff ff ff 00 00 00 00 00 00 00 00 e0 d7 04 82 ff ff ff ff e8 98 c0 0d 00 88 ff ff 01 80 00 00 00 00 00 00 <00> 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ +0.000068] RIP: init_task+0x0/0x2ac0 RSP: ffffffff82003df8 [ +0.000011] CR2: ffffffff82012480 [ +0.000010] ---[ end trace 7fb0075d4247b2a2 ]---
The commit that introduces this bug was found with git-bisect in conjunction with some scripts and Qubes OS's cross-VM RPCs for automation, and the correction was prepared after a manual inspection of the changes introduced by the commit in question.
To the best of my recollection, this issue is also reproducible in an Ubuntu 18.04 LTS dom0 instance with the version of the Xen hypervisor in Ubuntu's package repositories.
Signed-off-by: M. Vefa Bicakci m.v.b@runbox.com Cc: Dominik Brodowski linux@dominikbrodowski.net Cc: Andy Lutomirski luto@kernel.org Cc: Ingo Molnar mingo@redhat.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Boris Ostrovsky boris.ostrovsky@oracle.com Cc: Juergen Gross jgross@suse.com Cc: xen-devel@lists.xenproject.org Cc: x86@kernel.org Cc: stable@vger.kernel.org # for v4.14 and up Fixes: 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface") --- arch/x86/entry/calling.h | 4 +++- arch/x86/entry/entry_64.S | 2 +- 2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h index 20d0885b00fb..71795767da94 100644 --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -97,7 +97,7 @@ For 32-bit we have the following conventions - kernel is built with
#define SIZEOF_PTREGS 21*8
-.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0 +.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0 clear_rbx=1 /* * Push registers and sanitize registers of values that a * speculation attack might otherwise want to exploit. The @@ -127,7 +127,9 @@ For 32-bit we have the following conventions - kernel is built with pushq %r11 /* pt_regs->r11 */ xorl %r11d, %r11d /* nospec r11*/ pushq %rbx /* pt_regs->rbx */ + .if \clear_rbx xorl %ebx, %ebx /* nospec rbx*/ + .endif pushq %rbp /* pt_regs->rbp */ xorl %ebp, %ebp /* nospec rbp*/ pushq %r12 /* pt_regs->r12 */ diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index c7449f377a77..96e8ff34129e 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback) addq $0x30, %rsp UNWIND_HINT_IRET_REGS pushq $-1 /* orig_ax = -1 => not a system call */ - PUSH_AND_CLEAR_REGS + PUSH_AND_CLEAR_REGS clear_rbx=0 ENCODE_FRAME_POINTER jmp error_exit END(xen_failsafe_callback)
Commit d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption") has moved the query and calculation of the "x86_virt_bits" and "x86_phys_bits" fields of the "cpuinfo_x86" struct from the "get_cpu_cap" function to a new function named "get_cpu_address_sizes".
One of the call sites related to Xen PV VMs was unfortunately missed in the aforementioned commit, and this prevents successful boot-up of kernel versions 4.17 and up in Xen PV VMs.
Signed-off-by: M. Vefa Bicakci m.v.b@runbox.com Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Cc: Andy Lutomirski luto@kernel.org Cc: Ingo Molnar mingo@redhat.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Boris Ostrovsky boris.ostrovsky@oracle.com Cc: Juergen Gross jgross@suse.com Cc: xen-devel@lists.xenproject.org Cc: x86@kernel.org Cc: stable@vger.kernel.org # for v4.17 and up Fixes: d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption") --- arch/x86/kernel/cpu/common.c | 2 +- arch/x86/kernel/cpu/cpu.h | 1 + arch/x86/xen/enlighten_pv.c | 1 + 3 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index eb4cb3efd20e..2322d0c4bfd2 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -911,7 +911,7 @@ void get_cpu_cap(struct cpuinfo_x86 *c) apply_forced_caps(c); }
-static void get_cpu_address_sizes(struct cpuinfo_x86 *c) +void get_cpu_address_sizes(struct cpuinfo_x86 *c) { u32 eax, ebx, ecx, edx;
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h index 38216f678fc3..12a5f0cec0b2 100644 --- a/arch/x86/kernel/cpu/cpu.h +++ b/arch/x86/kernel/cpu/cpu.h @@ -46,6 +46,7 @@ extern const struct cpu_dev *const __x86_cpu_dev_start[], *const __x86_cpu_dev_end[];
extern void get_cpu_cap(struct cpuinfo_x86 *c); +extern void get_cpu_address_sizes(struct cpuinfo_x86 *c); extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c); extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c); extern u32 get_scattered_cpuid_leaf(unsigned int level, diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 439a94bf89ad..87afb000142a 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -1257,6 +1257,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
/* Work out if we support NX */ get_cpu_cap(&boot_cpu_data); + get_cpu_address_sizes(&boot_cpu_data); x86_configure_nx();
/* Let's presume PV guests always boot on vCPU with id 0. */
On 07/21/2018 03:49 PM, M. Vefa Bicakci wrote:
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 439a94bf89ad..87afb000142a 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -1257,6 +1257,7 @@ asmlinkage __visible void __init xen_start_kernel(void) /* Work out if we support NX */ get_cpu_cap(&boot_cpu_data);
- get_cpu_address_sizes(&boot_cpu_data); x86_configure_nx();
Have you observed any problems without this call? get_cpu_cap() is only called here to set X86_FEATURE_NX, and is then called again, together with get_cpu_address_sizes(), from early_identify_cpu().
-boris
On 07/21/2018 05:25 PM, Boris Ostrovsky wrote:
On 07/21/2018 03:49 PM, M. Vefa Bicakci wrote:
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 439a94bf89ad..87afb000142a 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -1257,6 +1257,7 @@ asmlinkage __visible void __init xen_start_kernel(void) /* Work out if we support NX */ get_cpu_cap(&boot_cpu_data);
- get_cpu_address_sizes(&boot_cpu_data); x86_configure_nx();
Have you observed any problems without this call? get_cpu_cap() is only called here to set X86_FEATURE_NX, and is then called again, together with get_cpu_address_sizes(), from early_identify_cpu().
Hello Boris,
Thank you for the reviews! Without the call to get_cpu_address_sizes, paravirtualized virtual machines do not boot up kernels with versions 4.17 and up at all; this includes dom0 and domU. No domU logs are generated in dom0's /var/log/xen/console/ directory either, despite having earlyprintk=xen on the kernel command line for my test domU.
(For the record, I am using the patched version of Xen 4.6.6 provided by Qubes OS R3.2.)
Thank you,
Vefa
On 07/21/2018 07:17 PM, M. Vefa Bicakci wrote:
On 07/21/2018 05:25 PM, Boris Ostrovsky wrote:
On 07/21/2018 03:49 PM, M. Vefa Bicakci wrote:
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 439a94bf89ad..87afb000142a 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -1257,6 +1257,7 @@ asmlinkage __visible void __init xen_start_kernel(void) /* Work out if we support NX */ get_cpu_cap(&boot_cpu_data); + get_cpu_address_sizes(&boot_cpu_data); x86_configure_nx();
Have you observed any problems without this call? get_cpu_cap() is only called here to set X86_FEATURE_NX, and is then called again, together with get_cpu_address_sizes(), from early_identify_cpu().
Thank you for the reviews! Without the call to get_cpu_address_sizes, paravirtualized virtual machines do not boot up kernels with versions 4.17 and up at all; this includes dom0 and domU. No domU logs are generated in dom0's /var/log/xen/console/ directory either, despite having earlyprintk=xen on the kernel command line for my test domU.
Hello Boris,
I debugged this further with a debugging version of Xen (so that I can get early kernel print-outs via the "xen_raw_console_write" function), and I found the root cause of the boot up failure.
In summary, the issue is due to the following call path in version 4.17 (and higher, I assume), which the kernel goes through /only/ when CONFIG_DEBUG_VIRTUAL is enabled:
enlighten_pv.c::xen_start_kernel mmu_pv.c::xen_reserve_special_pages page.h::__pa physaddr.c::__phys_addr physaddr.h::phys_addr_valid // uses boot_cpu_data.x86_phys_bits
The return value of phys_addr_valid is used with the VIRTUAL_BUG_ON macro, which evaluates to BUG_ON in case CONFIG_DEBUG_VIRTUAL is enabled.
It looks like the call to get_cpu_address_size is required in the xen_start_kernel function. Perhaps there is a more elegant way to resolve this issue as well.
Another approach could be to check in the phys_addr_valid function whether boot_cpu_data.x86_phys_bits has been initialized or not, I think, but I am not sure about the correctness of this approach.
Thank you,
Vefa
On 07/22/2018 11:57 AM, M. Vefa Bicakci wrote:
On 07/21/2018 07:17 PM, M. Vefa Bicakci wrote:
On 07/21/2018 05:25 PM, Boris Ostrovsky wrote:
On 07/21/2018 03:49 PM, M. Vefa Bicakci wrote:
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 439a94bf89ad..87afb000142a 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -1257,6 +1257,7 @@ asmlinkage __visible void __init xen_start_kernel(void) /* Work out if we support NX */ get_cpu_cap(&boot_cpu_data); + get_cpu_address_sizes(&boot_cpu_data); x86_configure_nx();
Have you observed any problems without this call? get_cpu_cap() is only called here to set X86_FEATURE_NX, and is then called again, together with get_cpu_address_sizes(), from early_identify_cpu().
Thank you for the reviews! Without the call to get_cpu_address_sizes, paravirtualized virtual machines do not boot up kernels with versions 4.17 and up at all; this includes dom0 and domU. No domU logs are generated in dom0's /var/log/xen/console/ directory either, despite having earlyprintk=xen on the kernel command line for my test domU.
Hello Boris,
I debugged this further with a debugging version of Xen (so that I can get early kernel print-outs via the "xen_raw_console_write" function), and I found the root cause of the boot up failure.
In summary, the issue is due to the following call path in version 4.17 (and higher, I assume), which the kernel goes through /only/ when CONFIG_DEBUG_VIRTUAL is enabled:
enlighten_pv.c::xen_start_kernel mmu_pv.c::xen_reserve_special_pages page.h::__pa physaddr.c::__phys_addr physaddr.h::phys_addr_valid // uses boot_cpu_data.x86_phys_bits
The return value of phys_addr_valid is used with the VIRTUAL_BUG_ON macro, which evaluates to BUG_ON in case CONFIG_DEBUG_VIRTUAL is enabled.
Ah, that's why it hasn't been detected.
It looks like the call to get_cpu_address_size is required in the xen_start_kernel function. Perhaps there is a more elegant way to resolve this issue as well.
Another approach could be to check in the phys_addr_valid function whether boot_cpu_data.x86_phys_bits has been initialized or not, I think, but I am not sure about the correctness of this approach.
No, I think your patch is good. The only thing I'd suggest is to move the call a few lines down. The way it is placed now may create impression that we are calling get_cpu_address_sizes() to figure out NX support.
-boris
On 07/23/2018 11:04 AM, Boris Ostrovsky wrote:
On 07/22/2018 11:57 AM, M. Vefa Bicakci wrote:
On 07/21/2018 07:17 PM, M. Vefa Bicakci wrote:
On 07/21/2018 05:25 PM, Boris Ostrovsky wrote:
On 07/21/2018 03:49 PM, M. Vefa Bicakci wrote:
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 439a94bf89ad..87afb000142a 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -1257,6 +1257,7 @@ asmlinkage __visible void __init xen_start_kernel(void) /* Work out if we support NX */ get_cpu_cap(&boot_cpu_data); + get_cpu_address_sizes(&boot_cpu_data); x86_configure_nx();
Have you observed any problems without this call? get_cpu_cap() is only called here to set X86_FEATURE_NX, and is then called again, together with get_cpu_address_sizes(), from early_identify_cpu().
Thank you for the reviews! Without the call to get_cpu_address_sizes, paravirtualized virtual machines do not boot up kernels with versions 4.17 and up at all; this includes dom0 and domU. No domU logs are generated in dom0's /var/log/xen/console/ directory either, despite having earlyprintk=xen on the kernel command line for my test domU.
Hello Boris,
I debugged this further with a debugging version of Xen (so that I can get early kernel print-outs via the "xen_raw_console_write" function), and I found the root cause of the boot up failure.
In summary, the issue is due to the following call path in version 4.17 (and higher, I assume), which the kernel goes through /only/ when CONFIG_DEBUG_VIRTUAL is enabled:
enlighten_pv.c::xen_start_kernel mmu_pv.c::xen_reserve_special_pages page.h::__pa physaddr.c::__phys_addr physaddr.h::phys_addr_valid // uses boot_cpu_data.x86_phys_bits
The return value of phys_addr_valid is used with the VIRTUAL_BUG_ON macro, which evaluates to BUG_ON in case CONFIG_DEBUG_VIRTUAL is enabled.
Ah, that's why it hasn't been detected.
It looks like the call to get_cpu_address_size is required in the xen_start_kernel function. Perhaps there is a more elegant way to resolve this issue as well.
Another approach could be to check in the phys_addr_valid function whether boot_cpu_data.x86_phys_bits has been initialized or not, I think, but I am not sure about the correctness of this approach.
No, I think your patch is good. The only thing I'd suggest is to move the call a few lines down. The way it is placed now may create impression that we are calling get_cpu_address_sizes() to figure out NX support.
Sorry for the delay, and thank you for your comments! I will send an updated version of this patch in a few moments.
Vefa
Commit d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption") has moved the query and calculation of the x86_virt_bits and x86_phys_bits fields of the cpuinfo_x86 struct from the get_cpu_cap function to a new function named get_cpu_address_sizes.
One of the call sites related to Xen PV VMs was unfortunately missed in the aforementioned commit. This prevents successful boot-up of kernel versions 4.17 and up in Xen PV VMs if CONFIG_DEBUG_VIRTUAL is enabled, due to the following code path:
enlighten_pv.c::xen_start_kernel mmu_pv.c::xen_reserve_special_pages page.h::__pa physaddr.c::__phys_addr physaddr.h::phys_addr_valid
phys_addr_valid uses boot_cpu_data.x86_phys_bits to validate physical addresses. boot_cpu_data.x86_phys_bits is no longer populated before the call to xen_reserve_special_pages due to the aforementioned commit though, so the validation performed by phys_addr_valid fails, which causes __phys_addr to trigger a BUG, preventing boot-up.
Signed-off-by: M. Vefa Bicakci m.v.b@runbox.com Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Cc: Andy Lutomirski luto@kernel.org Cc: Ingo Molnar mingo@redhat.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Boris Ostrovsky boris.ostrovsky@oracle.com Cc: Juergen Gross jgross@suse.com Cc: xen-devel@lists.xenproject.org Cc: x86@kernel.org Cc: stable@vger.kernel.org # for v4.17 and up Fixes: d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption")
---
Changes since v1: - Move the call to get_cpu_address_sizes below the call to x86_configure_nx. - Amend the commit message to describe why PV VMs do not boot up successfully when CONFIG_DEBUG_VIRTUAL is enabled. --- arch/x86/kernel/cpu/common.c | 2 +- arch/x86/kernel/cpu/cpu.h | 1 + arch/x86/xen/enlighten_pv.c | 3 +++ 3 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index f73fa6f6d85e..dd282482de09 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -911,7 +911,7 @@ void get_cpu_cap(struct cpuinfo_x86 *c) apply_forced_caps(c); }
-static void get_cpu_address_sizes(struct cpuinfo_x86 *c) +void get_cpu_address_sizes(struct cpuinfo_x86 *c) { u32 eax, ebx, ecx, edx;
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h index 38216f678fc3..12a5f0cec0b2 100644 --- a/arch/x86/kernel/cpu/cpu.h +++ b/arch/x86/kernel/cpu/cpu.h @@ -46,6 +46,7 @@ extern const struct cpu_dev *const __x86_cpu_dev_start[], *const __x86_cpu_dev_end[];
extern void get_cpu_cap(struct cpuinfo_x86 *c); +extern void get_cpu_address_sizes(struct cpuinfo_x86 *c); extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c); extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c); extern u32 get_scattered_cpuid_leaf(unsigned int level, diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 105a57d73701..ee3b00c7acda 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -1256,6 +1256,9 @@ asmlinkage __visible void __init xen_start_kernel(void) get_cpu_cap(&boot_cpu_data); x86_configure_nx();
+ /* Determine virtual and physical address sizes */ + get_cpu_address_sizes(&boot_cpu_data); + /* Let's presume PV guests always boot on vCPU with id 0. */ per_cpu(xen_vcpu_id, 0) = 0;
On 07/24/2018 08:45 AM, M. Vefa Bicakci wrote:
Commit d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption") has moved the query and calculation of the x86_virt_bits and x86_phys_bits fields of the cpuinfo_x86 struct from the get_cpu_cap function to a new function named get_cpu_address_sizes.
One of the call sites related to Xen PV VMs was unfortunately missed in the aforementioned commit. This prevents successful boot-up of kernel versions 4.17 and up in Xen PV VMs if CONFIG_DEBUG_VIRTUAL is enabled, due to the following code path:
enlighten_pv.c::xen_start_kernel mmu_pv.c::xen_reserve_special_pages page.h::__pa physaddr.c::__phys_addr physaddr.h::phys_addr_valid
phys_addr_valid uses boot_cpu_data.x86_phys_bits to validate physical addresses. boot_cpu_data.x86_phys_bits is no longer populated before the call to xen_reserve_special_pages due to the aforementioned commit though, so the validation performed by phys_addr_valid fails, which causes __phys_addr to trigger a BUG, preventing boot-up.
Signed-off-by: M. Vefa Bicakci m.v.b@runbox.com Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Cc: Andy Lutomirski luto@kernel.org Cc: Ingo Molnar mingo@redhat.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Boris Ostrovsky boris.ostrovsky@oracle.com Cc: Juergen Gross jgross@suse.com Cc: xen-devel@lists.xenproject.org Cc: x86@kernel.org Cc: stable@vger.kernel.org # for v4.17 and up Fixes: d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption")
Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com
Commit d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption") has moved the query and calculation of the x86_virt_bits and x86_phys_bits fields of the cpuinfo_x86 struct from the get_cpu_cap function to a new function named get_cpu_address_sizes.
One of the call sites related to Xen PV VMs was unfortunately missed in the aforementioned commit. This prevents successful boot-up of kernel versions 4.17 and up in Xen PV VMs if CONFIG_DEBUG_VIRTUAL is enabled, due to the following code path:
enlighten_pv.c::xen_start_kernel mmu_pv.c::xen_reserve_special_pages page.h::__pa physaddr.c::__phys_addr physaddr.h::phys_addr_valid
phys_addr_valid uses boot_cpu_data.x86_phys_bits to validate physical addresses. boot_cpu_data.x86_phys_bits is no longer populated before the call to xen_reserve_special_pages due to the aforementioned commit though, so the validation performed by phys_addr_valid fails, which causes __phys_addr to trigger a BUG, preventing boot-up.
Signed-off-by: M. Vefa Bicakci m.v.b@runbox.com Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Cc: Andy Lutomirski luto@kernel.org Cc: Ingo Molnar mingo@redhat.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Juergen Gross jgross@suse.com Cc: xen-devel@lists.xenproject.org Cc: x86@kernel.org Cc: stable@vger.kernel.org # for v4.17 and up Fixes: d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption")
---
Changes since v1: - Move the call to get_cpu_address_sizes below the call to x86_configure_nx. - Amend the commit message to describe why PV VMs do not boot up successfully when CONFIG_DEBUG_VIRTUAL is enabled.
Changes since v2: - Add a Reviewed-by tag to note Boris Ostrovsky's code review. - Rebase onto next-20180725. --- arch/x86/kernel/cpu/common.c | 2 +- arch/x86/kernel/cpu/cpu.h | 1 + arch/x86/xen/enlighten_pv.c | 3 +++ 3 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index f73fa6f6d85e..dd282482de09 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -911,7 +911,7 @@ void get_cpu_cap(struct cpuinfo_x86 *c) apply_forced_caps(c); }
-static void get_cpu_address_sizes(struct cpuinfo_x86 *c) +void get_cpu_address_sizes(struct cpuinfo_x86 *c) { u32 eax, ebx, ecx, edx;
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h index 38216f678fc3..12a5f0cec0b2 100644 --- a/arch/x86/kernel/cpu/cpu.h +++ b/arch/x86/kernel/cpu/cpu.h @@ -46,6 +46,7 @@ extern const struct cpu_dev *const __x86_cpu_dev_start[], *const __x86_cpu_dev_end[];
extern void get_cpu_cap(struct cpuinfo_x86 *c); +extern void get_cpu_address_sizes(struct cpuinfo_x86 *c); extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c); extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c); extern u32 get_scattered_cpuid_leaf(unsigned int level, diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 105a57d73701..ee3b00c7acda 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -1256,6 +1256,9 @@ asmlinkage __visible void __init xen_start_kernel(void) get_cpu_cap(&boot_cpu_data); x86_configure_nx();
+ /* Determine virtual and physical address sizes */ + get_cpu_address_sizes(&boot_cpu_data); + /* Let's presume PV guests always boot on vCPU with id 0. */ per_cpu(xen_vcpu_id, 0) = 0;
x86 maintainers, this needs your ack please.
-boris
On 07/24/2018 08:45 AM, M. Vefa Bicakci wrote:
Commit d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption") has moved the query and calculation of the x86_virt_bits and x86_phys_bits fields of the cpuinfo_x86 struct from the get_cpu_cap function to a new function named get_cpu_address_sizes.
One of the call sites related to Xen PV VMs was unfortunately missed in the aforementioned commit. This prevents successful boot-up of kernel versions 4.17 and up in Xen PV VMs if CONFIG_DEBUG_VIRTUAL is enabled, due to the following code path:
enlighten_pv.c::xen_start_kernel mmu_pv.c::xen_reserve_special_pages page.h::__pa physaddr.c::__phys_addr physaddr.h::phys_addr_valid
phys_addr_valid uses boot_cpu_data.x86_phys_bits to validate physical addresses. boot_cpu_data.x86_phys_bits is no longer populated before the call to xen_reserve_special_pages due to the aforementioned commit though, so the validation performed by phys_addr_valid fails, which causes __phys_addr to trigger a BUG, preventing boot-up.
Signed-off-by: M. Vefa Bicakci m.v.b@runbox.com Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Cc: Andy Lutomirski luto@kernel.org Cc: Ingo Molnar mingo@redhat.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Boris Ostrovsky boris.ostrovsky@oracle.com Cc: Juergen Gross jgross@suse.com Cc: xen-devel@lists.xenproject.org Cc: x86@kernel.org Cc: stable@vger.kernel.org # for v4.17 and up Fixes: d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption")
Changes since v1:
- Move the call to get_cpu_address_sizes below the call to x86_configure_nx.
- Amend the commit message to describe why PV VMs do not boot up successfully when CONFIG_DEBUG_VIRTUAL is enabled.
arch/x86/kernel/cpu/common.c | 2 +- arch/x86/kernel/cpu/cpu.h | 1 + arch/x86/xen/enlighten_pv.c | 3 +++ 3 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index f73fa6f6d85e..dd282482de09 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -911,7 +911,7 @@ void get_cpu_cap(struct cpuinfo_x86 *c) apply_forced_caps(c); } -static void get_cpu_address_sizes(struct cpuinfo_x86 *c) +void get_cpu_address_sizes(struct cpuinfo_x86 *c) { u32 eax, ebx, ecx, edx; diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h index 38216f678fc3..12a5f0cec0b2 100644 --- a/arch/x86/kernel/cpu/cpu.h +++ b/arch/x86/kernel/cpu/cpu.h @@ -46,6 +46,7 @@ extern const struct cpu_dev *const __x86_cpu_dev_start[], *const __x86_cpu_dev_end[]; extern void get_cpu_cap(struct cpuinfo_x86 *c); +extern void get_cpu_address_sizes(struct cpuinfo_x86 *c); extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c); extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c); extern u32 get_scattered_cpuid_leaf(unsigned int level, diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 105a57d73701..ee3b00c7acda 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -1256,6 +1256,9 @@ asmlinkage __visible void __init xen_start_kernel(void) get_cpu_cap(&boot_cpu_data); x86_configure_nx();
- /* Determine virtual and physical address sizes */
- get_cpu_address_sizes(&boot_cpu_data);
- /* Let's presume PV guests always boot on vCPU with id 0. */ per_cpu(xen_vcpu_id, 0) = 0;
On Mon, 6 Aug 2018, Boris Ostrovsky wrote:
x86 maintainers, this needs your ack please.
Reviewed-by: Thomas Gleixner tglx@linutronix.de
On 08/06/2018 04:16 PM, Thomas Gleixner wrote:
On Mon, 6 Aug 2018, Boris Ostrovsky wrote:
x86 maintainers, this needs your ack please.
Reviewed-by: Thomas Gleixner tglx@linutronix.de
Thanks.
Applied to for-linus-4.19
* Boris Ostrovsky boris.ostrovsky@oracle.com wrote:
x86 maintainers, this needs your ack please.
LGTM:
Acked-by: Ingo Molnar mingo@kernel.org
Thanks,
Ingo
On 07/21/2018 03:49 PM, M. Vefa Bicakci wrote:
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index c7449f377a77..96e8ff34129e 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback) addq $0x30, %rsp UNWIND_HINT_IRET_REGS pushq $-1 /* orig_ax = -1 => not a system call */
- PUSH_AND_CLEAR_REGS
- PUSH_AND_CLEAR_REGS clear_rbx=0
Do we need this at all? We are returning from the hypervisor here.
-boris
ENCODE_FRAME_POINTER jmp error_exit END(xen_failsafe_callback)
On 07/21/2018 05:19 PM, Boris Ostrovsky wrote:
On 07/21/2018 03:49 PM, M. Vefa Bicakci wrote:
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index c7449f377a77..96e8ff34129e 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback) addq $0x30, %rsp UNWIND_HINT_IRET_REGS pushq $-1 /* orig_ax = -1 => not a system call */
- PUSH_AND_CLEAR_REGS
- PUSH_AND_CLEAR_REGS clear_rbx=0
Do we need this at all? We are returning from the hypervisor here.
-boris
ENCODE_FRAME_POINTER jmp error_exit END(xen_failsafe_callback)
Hello Boris,
If you are referring to the PUSH_AND_CLEAR_REGS macro itself, I am not sure; however, not clearing the RBX register seemed to resolve the issues mentioned in the commit message for me. Given Andy's comment though, I believe that the approach in this patch may not be correct.
Thank you,
Vefa
On 07/21/2018 07:18 PM, M. Vefa Bicakci wrote:
On 07/21/2018 05:19 PM, Boris Ostrovsky wrote:
On 07/21/2018 03:49 PM, M. Vefa Bicakci wrote:
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index c7449f377a77..96e8ff34129e 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback) addq $0x30, %rsp UNWIND_HINT_IRET_REGS pushq $-1 /* orig_ax = -1 => not a system call */ - PUSH_AND_CLEAR_REGS + PUSH_AND_CLEAR_REGS clear_rbx=0
Do we need this at all? We are returning from the hypervisor here.
-boris
ENCODE_FRAME_POINTER jmp error_exit END(xen_failsafe_callback)
Hello Boris,
If you are referring to the PUSH_AND_CLEAR_REGS macro itself, I am not sure; however, not clearing the RBX register seemed to resolve the issues mentioned in the commit message for me. Given Andy's comment though, I believe that the approach in this patch may not be correct.
I was only referring to register clearing part of PUSH_AND_CLEAR_REGS.
-boris
On Sat, Jul 21, 2018 at 12:49 PM, M. Vefa Bicakci m.v.b@runbox.com wrote:
Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface") unintendedly broke Xen PV virtual machines by clearing the %rbx register at the end of xen_failsafe_callback before the latter jumps to error_exit. error_exit expects the %rbx register to be a flag indicating whether there should be a return to kernel mode.
This commit makes sure that the %rbx register is not cleared by the PUSH_AND_CLEAR_REGS macro, when the macro in question is instantiated by xen_failsafe_callback, to avoid the issue.
Seems like a genuine problem, but:
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index c7449f377a77..96e8ff34129e 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback) addq $0x30, %rsp UNWIND_HINT_IRET_REGS pushq $-1 /* orig_ax = -1 => not a system call */
PUSH_AND_CLEAR_REGS
PUSH_AND_CLEAR_REGS clear_rbx=0 ENCODE_FRAME_POINTER jmp error_exit
The old code first set RBX to zero then, if frame pointers are on, sets it to some special non-zero value, then crosses its fingers and hopes for the best. Your patched code just skips the zeroing part, so RBX either contains the ENCODE_FRAME_POINTER result or is uninitialized.
How about actually initializing rbx to something sensible like, say, 1?
On 07/21/2018 05:45 PM, Andy Lutomirski wrote:
On Sat, Jul 21, 2018 at 12:49 PM, M. Vefa Bicakci m.v.b@runbox.com wrote:
Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface") unintendedly broke Xen PV virtual machines by clearing the %rbx register at the end of xen_failsafe_callback before the latter jumps to error_exit. error_exit expects the %rbx register to be a flag indicating whether there should be a return to kernel mode.
This commit makes sure that the %rbx register is not cleared by the PUSH_AND_CLEAR_REGS macro, when the macro in question is instantiated by xen_failsafe_callback, to avoid the issue.
Seems like a genuine problem, but:
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index c7449f377a77..96e8ff34129e 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback) addq $0x30, %rsp UNWIND_HINT_IRET_REGS pushq $-1 /* orig_ax = -1 => not a system call */
PUSH_AND_CLEAR_REGS
PUSH_AND_CLEAR_REGS clear_rbx=0 ENCODE_FRAME_POINTER jmp error_exit
The old code first set RBX to zero then, if frame pointers are on, sets it to some special non-zero value, then crosses its fingers and hopes for the best. Your patched code just skips the zeroing part, so RBX either contains the ENCODE_FRAME_POINTER result or is uninitialized.
How about actually initializing rbx to something sensible like, say, 1?
Hello Andy,
Thank you for the review! Apparently, I have not done my homework fully. I will test your suggestion and report back, most likely in a few hours.
I have been testing with the next/linux-next tree's master branch (dated 20180720), and I noticed that ENCODE_FRAME_POINTER changes the frame pointer (i.e., RBP) register, as opposed to the RBX register, which the patch aims to avoid changing before jumping to error_exit. It is possible that I am missing something though -- I am not sure about the connection between the RBP and RBX registers.
The change introduced by commit 3ac6d8c787b8 is in the excerpt below. Would it be valid to state that the original code had the same issue that you referred to (i.e., leaving the RBX register uninitialized)?
[I also realized that I forgot to include Andi Kleen and Dan Williams, authors of 3ac6d8c787b8, in the discussion; I am copying this e-mail to them as well.]
Thank you,
Vefa
$ git show -W 3ac6d8c787b8 -- arch/x86/entry/entry_64.S ... ENTRY(xen_failsafe_callback) UNWIND_HINT_EMPTY movl %ds, %ecx cmpw %cx, 0x10(%rsp) jne 1f movl %es, %ecx cmpw %cx, 0x18(%rsp) jne 1f movl %fs, %ecx cmpw %cx, 0x20(%rsp) jne 1f movl %gs, %ecx cmpw %cx, 0x28(%rsp) jne 1f /* All segments match their saved values => Category 2 (Bad IRET). */ movq (%rsp), %rcx movq 8(%rsp), %r11 addq $0x30, %rsp pushq $0 /* RIP */ UNWIND_HINT_IRET_REGS offset=8 jmp general_protection 1: /* Segment mismatch => Category 1 (Bad segment). Retry the IRET. */ movq (%rsp), %rcx movq 8(%rsp), %r11 addq $0x30, %rsp UNWIND_HINT_IRET_REGS pushq $-1 /* orig_ax = -1 => not a system call */ ALLOC_PT_GPREGS_ON_STACK SAVE_C_REGS SAVE_EXTRA_REGS + CLEAR_REGS_NOSPEC ENCODE_FRAME_POINTER jmp error_exit END(xen_failsafe_callback) ...
On Sat, Jul 21, 2018 at 4:19 PM, M. Vefa Bicakci m.v.b@runbox.com wrote:
On 07/21/2018 05:45 PM, Andy Lutomirski wrote:
On Sat, Jul 21, 2018 at 12:49 PM, M. Vefa Bicakci m.v.b@runbox.com wrote:
Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface") unintendedly broke Xen PV virtual machines by clearing the %rbx register at the end of xen_failsafe_callback before the latter jumps to error_exit. error_exit expects the %rbx register to be a flag indicating whether there should be a return to kernel mode.
This commit makes sure that the %rbx register is not cleared by the PUSH_AND_CLEAR_REGS macro, when the macro in question is instantiated by xen_failsafe_callback, to avoid the issue.
Seems like a genuine problem, but:
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index c7449f377a77..96e8ff34129e 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback) addq $0x30, %rsp UNWIND_HINT_IRET_REGS pushq $-1 /* orig_ax = -1 => not a system call */
PUSH_AND_CLEAR_REGS
PUSH_AND_CLEAR_REGS clear_rbx=0 ENCODE_FRAME_POINTER jmp error_exit
The old code first set RBX to zero then, if frame pointers are on, sets it to some special non-zero value, then crosses its fingers and hopes for the best. Your patched code just skips the zeroing part, so RBX either contains the ENCODE_FRAME_POINTER result or is uninitialized.
How about actually initializing rbx to something sensible like, say, 1?
Hello Andy,
Thank you for the review! Apparently, I have not done my homework fully. I will test your suggestion and report back, most likely in a few hours.
I have been testing with the next/linux-next tree's master branch (dated 20180720), and I noticed that ENCODE_FRAME_POINTER changes the frame pointer (i.e., RBP) register, as opposed to the RBX register, which the patch aims to avoid changing before jumping to error_exit. It is possible that I am missing something though -- I am not sure about the connection between the RBP and RBX registers.
Sorry, brain fart on my part.
The change introduced by commit 3ac6d8c787b8 is in the excerpt below. Would it be valid to state that the original code had the same issue that you referred to (i.e., leaving the RBX register uninitialized)?
Presumably.
I would propose a rather different fix:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86...
Any chance you could test that and see if it fixes your problem?
On 07/21/2018 07:30 PM, Andy Lutomirski wrote:
On Sat, Jul 21, 2018 at 4:19 PM, M. Vefa Bicakci m.v.b@runbox.com wrote:
On 07/21/2018 05:45 PM, Andy Lutomirski wrote:
On Sat, Jul 21, 2018 at 12:49 PM, M. Vefa Bicakci m.v.b@runbox.com wrote:
Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface") unintendedly broke Xen PV virtual machines by clearing the %rbx register at the end of xen_failsafe_callback before the latter jumps to error_exit. error_exit expects the %rbx register to be a flag indicating whether there should be a return to kernel mode.
This commit makes sure that the %rbx register is not cleared by the PUSH_AND_CLEAR_REGS macro, when the macro in question is instantiated by xen_failsafe_callback, to avoid the issue.
Seems like a genuine problem, but:
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index c7449f377a77..96e8ff34129e 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback) addq $0x30, %rsp UNWIND_HINT_IRET_REGS pushq $-1 /* orig_ax = -1 => not a system call */
PUSH_AND_CLEAR_REGS
PUSH_AND_CLEAR_REGS clear_rbx=0 ENCODE_FRAME_POINTER jmp error_exit
The old code first set RBX to zero then, if frame pointers are on, sets it to some special non-zero value, then crosses its fingers and hopes for the best. Your patched code just skips the zeroing part, so RBX either contains the ENCODE_FRAME_POINTER result or is uninitialized.
How about actually initializing rbx to something sensible like, say, 1?
Hello Andy,
Thank you for the review! Apparently, I have not done my homework fully. I will test your suggestion and report back, most likely in a few hours.
I have been testing with the next/linux-next tree's master branch (dated 20180720), and I noticed that ENCODE_FRAME_POINTER changes the frame pointer (i.e., RBP) register, as opposed to the RBX register, which the patch aims to avoid changing before jumping to error_exit. It is possible that I am missing something though -- I am not sure about the connection between the RBP and RBX registers.
Sorry, brain fart on my part.
No problem! :-)
The change introduced by commit 3ac6d8c787b8 is in the excerpt below. Would it be valid to state that the original code had the same issue that you referred to (i.e., leaving the RBX register uninitialized)?
Presumably.
I would propose a rather different fix:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86...
Any chance you could test that and see if it fixes your problem?
Of course; I will report back with the result in a few hours.
On 07/21/2018 07:37 PM, M. Vefa Bicakci wrote:
On 07/21/2018 07:30 PM, Andy Lutomirski wrote:
On Sat, Jul 21, 2018 at 4:19 PM, M. Vefa Bicakci m.v.b@runbox.com wrote:
On 07/21/2018 05:45 PM, Andy Lutomirski wrote:
On Sat, Jul 21, 2018 at 12:49 PM, M. Vefa Bicakci m.v.b@runbox.com wrote:
Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface") unintendedly broke Xen PV virtual machines by clearing the %rbx register at the end of xen_failsafe_callback before the latter jumps to error_exit. error_exit expects the %rbx register to be a flag indicating whether there should be a return to kernel mode.
This commit makes sure that the %rbx register is not cleared by the PUSH_AND_CLEAR_REGS macro, when the macro in question is instantiated by xen_failsafe_callback, to avoid the issue.
Seems like a genuine problem, but:
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index c7449f377a77..96e8ff34129e 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback) addq $0x30, %rsp UNWIND_HINT_IRET_REGS pushq $-1 /* orig_ax = -1 => not a system call */ - PUSH_AND_CLEAR_REGS + PUSH_AND_CLEAR_REGS clear_rbx=0 ENCODE_FRAME_POINTER jmp error_exit
The old code first set RBX to zero then, if frame pointers are on, sets it to some special non-zero value, then crosses its fingers and hopes for the best. Your patched code just skips the zeroing part, so RBX either contains the ENCODE_FRAME_POINTER result or is uninitialized.
How about actually initializing rbx to something sensible like, say, 1?
Hello Andy,
Thank you for the review! Apparently, I have not done my homework fully. I will test your suggestion and report back, most likely in a few hours.
I have been testing with the next/linux-next tree's master branch (dated 20180720), and I noticed that ENCODE_FRAME_POINTER changes the frame pointer (i.e., RBP) register, as opposed to the RBX register, which the patch aims to avoid changing before jumping to error_exit. It is possible that I am missing something though -- I am not sure about the connection between the RBP and RBX registers.
Sorry, brain fart on my part.
No problem! :-)
The change introduced by commit 3ac6d8c787b8 is in the excerpt below. Would it be valid to state that the original code had the same issue that you referred to (i.e., leaving the RBX register uninitialized)?
Presumably.
I would propose a rather different fix:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86...
Any chance you could test that and see if it fixes your problem?
Of course; I will report back with the result in a few hours.
Hello Andy,
I confirm that the commit at [1] resolves the issue in question as well.
To test, I first reverted my commit, applied your commit and verified that the bug cannot be reproduced. Afterwards, I reverted your commit and verified that the bug is reproducible.
I am not sure about the best way to document the bug I encountered in your commit message, but in case you plan to have your commit merged, please feel free to add a "Reported-and-tested-by: M. Vefa Bicakci m.v.b@runbox.com" tag to the commit message, possibly with a link to this e-mail thread.
Finally, as I had mentioned in my commit message, this bug exists in all kernel versions 4.14 and greater, so it would be nice if you could carbon-copy "stable@vger.kernel.org" in the commit message as well.
Thank you,
Vefa
[1] https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86...
On Sat, Jul 21, 2018 at 6:20 PM, M. Vefa Bicakci m.v.b@runbox.com wrote:
On 07/21/2018 07:37 PM, M. Vefa Bicakci wrote:
On 07/21/2018 07:30 PM, Andy Lutomirski wrote:
On Sat, Jul 21, 2018 at 4:19 PM, M. Vefa Bicakci m.v.b@runbox.com wrote:
On 07/21/2018 05:45 PM, Andy Lutomirski wrote:
On Sat, Jul 21, 2018 at 12:49 PM, M. Vefa Bicakci m.v.b@runbox.com wrote:
Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface") unintendedly broke Xen PV virtual machines by clearing the %rbx register at the end of xen_failsafe_callback before the latter jumps to error_exit. error_exit expects the %rbx register to be a flag indicating whether there should be a return to kernel mode.
This commit makes sure that the %rbx register is not cleared by the PUSH_AND_CLEAR_REGS macro, when the macro in question is instantiated by xen_failsafe_callback, to avoid the issue.
Seems like a genuine problem, but:
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index c7449f377a77..96e8ff34129e 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback) addq $0x30, %rsp UNWIND_HINT_IRET_REGS pushq $-1 /* orig_ax = -1 => not a system call */
PUSH_AND_CLEAR_REGS
PUSH_AND_CLEAR_REGS clear_rbx=0 ENCODE_FRAME_POINTER jmp error_exit
The old code first set RBX to zero then, if frame pointers are on, sets it to some special non-zero value, then crosses its fingers and hopes for the best. Your patched code just skips the zeroing part, so RBX either contains the ENCODE_FRAME_POINTER result or is uninitialized.
How about actually initializing rbx to something sensible like, say, 1?
Hello Andy,
Thank you for the review! Apparently, I have not done my homework fully. I will test your suggestion and report back, most likely in a few hours.
I have been testing with the next/linux-next tree's master branch (dated 20180720), and I noticed that ENCODE_FRAME_POINTER changes the frame pointer (i.e., RBP) register, as opposed to the RBX register, which the patch aims to avoid changing before jumping to error_exit. It is possible that I am missing something though -- I am not sure about the connection between the RBP and RBX registers.
Sorry, brain fart on my part.
No problem! :-)
The change introduced by commit 3ac6d8c787b8 is in the excerpt below. Would it be valid to state that the original code had the same issue that you referred to (i.e., leaving the RBX register uninitialized)?
Presumably.
I would propose a rather different fix:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86...
Any chance you could test that and see if it fixes your problem?
Of course; I will report back with the result in a few hours.
Hello Andy,
I confirm that the commit at [1] resolves the issue in question as well.
To test, I first reverted my commit, applied your commit and verified that the bug cannot be reproduced. Afterwards, I reverted your commit and verified that the bug is reproducible.
I am not sure about the best way to document the bug I encountered in your commit message, but in case you plan to have your commit merged, please feel free to add a "Reported-and-tested-by: M. Vefa Bicakci m.v.b@runbox.com" tag to the commit message, possibly with a link to this e-mail thread.
Finally, as I had mentioned in my commit message, this bug exists in all kernel versions 4.14 and greater, so it would be nice if you could carbon-copy "stable@vger.kernel.org" in the commit message as well.
Thank you,
I'm curious why the sigreturn_64 selftest didn't catch this bug. Do you happen to know why? The xen_failsafe_callback mechanism is a bit mysterious to me.
Vefa
[1] https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86...
On 22/07/18 18:56, Andy Lutomirski wrote:
On Sat, Jul 21, 2018 at 6:20 PM, M. Vefa Bicakci m.v.b@runbox.com wrote:
On 07/21/2018 07:37 PM, M. Vefa Bicakci wrote:
On 07/21/2018 07:30 PM, Andy Lutomirski wrote:
On Sat, Jul 21, 2018 at 4:19 PM, M. Vefa Bicakci m.v.b@runbox.com wrote:
On 07/21/2018 05:45 PM, Andy Lutomirski wrote:
On Sat, Jul 21, 2018 at 12:49 PM, M. Vefa Bicakci m.v.b@runbox.com wrote: > > Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for > exceptions/interrupts, to reduce speculation attack surface") > unintendedly > broke Xen PV virtual machines by clearing the %rbx register at the end > of > xen_failsafe_callback before the latter jumps to error_exit. > error_exit expects the %rbx register to be a flag indicating whether > there should be a return to kernel mode. > > This commit makes sure that the %rbx register is not cleared by > the PUSH_AND_CLEAR_REGS macro, when the macro in question is > instantiated > by xen_failsafe_callback, to avoid the issue.
Seems like a genuine problem, but:
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S > index c7449f377a77..96e8ff34129e 100644 > --- a/arch/x86/entry/entry_64.S > +++ b/arch/x86/entry/entry_64.S > @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback) > addq $0x30, %rsp > UNWIND_HINT_IRET_REGS > pushq $-1 /* orig_ax = -1 => not a system call */ > - PUSH_AND_CLEAR_REGS > + PUSH_AND_CLEAR_REGS clear_rbx=0 > ENCODE_FRAME_POINTER > jmp error_exit
The old code first set RBX to zero then, if frame pointers are on, sets it to some special non-zero value, then crosses its fingers and hopes for the best. Your patched code just skips the zeroing part, so RBX either contains the ENCODE_FRAME_POINTER result or is uninitialized.
How about actually initializing rbx to something sensible like, say, 1?
Hello Andy,
Thank you for the review! Apparently, I have not done my homework fully. I will test your suggestion and report back, most likely in a few hours.
I have been testing with the next/linux-next tree's master branch (dated 20180720), and I noticed that ENCODE_FRAME_POINTER changes the frame pointer (i.e., RBP) register, as opposed to the RBX register, which the patch aims to avoid changing before jumping to error_exit. It is possible that I am missing something though -- I am not sure about the connection between the RBP and RBX registers.
Sorry, brain fart on my part.
No problem! :-)
The change introduced by commit 3ac6d8c787b8 is in the excerpt below. Would it be valid to state that the original code had the same issue that you referred to (i.e., leaving the RBX register uninitialized)?
Presumably.
I would propose a rather different fix:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86...
Any chance you could test that and see if it fixes your problem?
Of course; I will report back with the result in a few hours.
Hello Andy,
I confirm that the commit at [1] resolves the issue in question as well.
To test, I first reverted my commit, applied your commit and verified that the bug cannot be reproduced. Afterwards, I reverted your commit and verified that the bug is reproducible.
I am not sure about the best way to document the bug I encountered in your commit message, but in case you plan to have your commit merged, please feel free to add a "Reported-and-tested-by: M. Vefa Bicakci m.v.b@runbox.com" tag to the commit message, possibly with a link to this e-mail thread.
Finally, as I had mentioned in my commit message, this bug exists in all kernel versions 4.14 and greater, so it would be nice if you could carbon-copy "stable@vger.kernel.org" in the commit message as well.
Thank you,
I'm curious why the sigreturn_64 selftest didn't catch this bug. Do you happen to know why? The xen_failsafe_callback mechanism is a bit mysterious to me.
It is invoked whenever Xen suffers a fault when trying to load a guest segment register.
In practice, it is used far more rarely with 64bit builds of Xen than it used to be with 32bit builds. This is because the only time we reload guest data segments is in the vcpu context switch path.
More recent versions of Xen also don't have a fallback in the iret path, due to what think was actually some overzealous cleanup. As a result, #GP gets delivered pointing at the target of the iret instruction.
Looking at the sigreturn tests, I'd expect the test to complain a lot, not least because Xen doesn't have espfix64.
~Andrew
linux-stable-mirror@lists.linaro.org