From: Andy Lutomirski luto@kernel.org
commit: 9d05041679904b12c12421cbcf9cb5f4860a8d7b upstream
32-bit kernels handle nested NMIs in C. Enable the exact same handling on 64-bit kernels as well. This isn't currently necessary, but it will become necessary once the asm code starts allowing limited nesting.
Signed-off-by: Andy Lutomirski luto@kernel.org Reviewed-by: Steven Rostedt rostedt@goodmis.org Cc: Borislav Petkov bp@suse.de Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Backported-by: Chenggang chenggang.qin@linux.alibaba.com --- 7u/arch/x86/kernel/nmi.c | 122 ++++++++++++++++++++--------------------------- 1 file changed, 51 insertions(+), 71 deletions(-)
diff --git a/7u/arch/x86/kernel/nmi.c b/7u/arch/x86/kernel/nmi.c index 6030805..a735412 100644 --- a/7u/arch/x86/kernel/nmi.c +++ b/7u/arch/x86/kernel/nmi.c @@ -359,15 +359,15 @@ static __kprobes void default_do_nmi(struct pt_regs *regs) }
/* - * NMIs can hit breakpoints which will cause it to lose its - * NMI context with the CPU when the breakpoint does an iret. - */ -#ifdef CONFIG_X86_32 -/* - * For i386, NMIs use the same stack as the kernel, and we can - * add a workaround to the iret problem in C (preventing nested - * NMIs if an NMI takes a trap). Simply have 3 states the NMI - * can be in: + * NMIs can hit breakpoints which will cause it to lose its NMI context + * with the CPU when the breakpoint or page fault does an IRET. + * + * As a result, NMIs can nest if NMIs get unmasked due an IRET during + * NMI processing. On x86_64, the asm glue protects us from nested NMIs + * if the outer NMI came from kernel mode, but we can still nest if the + * outer NMI came from user mode. + * + * To handle these nested NMIs, we have three states: * * 1) not running * 2) executing @@ -381,15 +381,14 @@ static __kprobes void default_do_nmi(struct pt_regs *regs) * (Note, the latch is binary, thus multiple NMIs triggering, * when one is running, are ignored. Only one NMI is restarted.) * - * If an NMI hits a breakpoint that executes an iret, another - * NMI can preempt it. We do not want to allow this new NMI - * to run, but we want to execute it when the first one finishes. - * We set the state to "latched", and the exit of the first NMI will - * perform a dec_return, if the result is zero (NOT_RUNNING), then - * it will simply exit the NMI handler. If not, the dec_return - * would have set the state to NMI_EXECUTING (what we want it to - * be when we are running). In this case, we simply jump back - * to rerun the NMI handler again, and restart the 'latched' NMI. + * If an NMI executes an iret, another NMI can preempt it. We do not + * want to allow this new NMI to run, but we want to execute it when the + * first one finishes. We set the state to "latched", and the exit of + * the first NMI will perform a dec_return, if the result is zero + * (NOT_RUNNING), then it will simply exit the NMI handler. If not, the + * dec_return would have set the state to NMI_EXECUTING (what we want it + * to be when we are running). In this case, we simply jump back to + * rerun the NMI handler again, and restart the 'latched' NMI. * * No trap (breakpoint or page fault) should be hit before nmi_restart, * thus there is no race between the first check of state for NOT_RUNNING @@ -412,49 +411,36 @@ enum nmi_states { static DEFINE_PER_CPU(enum nmi_states, nmi_state); static DEFINE_PER_CPU(unsigned long, nmi_cr2);
-#define nmi_nesting_preprocess(regs) \ - do { \ - if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) { \ - this_cpu_write(nmi_state, NMI_LATCHED); \ - return; \ - } \ - this_cpu_write(nmi_state, NMI_EXECUTING); \ - this_cpu_write(nmi_cr2, read_cr2()); \ - } while (0); \ - nmi_restart: - -#define nmi_nesting_postprocess() \ - do { \ - if (unlikely(this_cpu_read(nmi_cr2) != read_cr2())) \ - write_cr2(this_cpu_read(nmi_cr2)); \ - if (this_cpu_dec_return(nmi_state)) \ - goto nmi_restart; \ - } while (0) -#else /* x86_64 */ +#ifdef CONFIG_X86_64 /* - * In x86_64 things are a bit more difficult. This has the same problem - * where an NMI hitting a breakpoint that calls iret will remove the - * NMI context, allowing a nested NMI to enter. What makes this more - * difficult is that both NMIs and breakpoints have their own stack. - * When a new NMI or breakpoint is executed, the stack is set to a fixed - * point. If an NMI is nested, it will have its stack set at that same - * fixed address that the first NMI had, and will start corrupting the - * stack. This is handled in entry_64.S, but the same problem exists with - * the breakpoint stack. + * In x86_64, we need to handle breakpoint -> NMI -> breakpoint. Without + * some care, the inner breakpoint will clobber the outer breakpoint's + * stack. * - * If a breakpoint is being processed, and the debug stack is being used, - * if an NMI comes in and also hits a breakpoint, the stack pointer - * will be set to the same fixed address as the breakpoint that was - * interrupted, causing that stack to be corrupted. To handle this case, - * check if the stack that was interrupted is the debug stack, and if - * so, change the IDT so that new breakpoints will use the current stack - * and not switch to the fixed address. On return of the NMI, switch back - * to the original IDT. + * If a breakpoint is being processed, and the debug stack is being + * used, if an NMI comes in and also hits a breakpoint, the stack + * pointer will be set to the same fixed address as the breakpoint that + * was interrupted, causing that stack to be corrupted. To handle this + * case, check if the stack that was interrupted is the debug stack, and + * if so, change the IDT so that new breakpoints will use the current + * stack and not switch to the fixed address. On return of the NMI, + * switch back to the original IDT. */ static DEFINE_PER_CPU(int, update_debug_stack); +#endif
-static inline void nmi_nesting_preprocess(struct pt_regs *regs) +dotraplinkage notrace void +do_nmi(struct pt_regs *regs, long error_code) { + if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) { + this_cpu_write(nmi_state, NMI_LATCHED); + return; + } + this_cpu_write(nmi_state, NMI_EXECUTING); + this_cpu_write(nmi_cr2, read_cr2()); +nmi_restart: + +#ifdef CONFIG_X86_64 /* * If we interrupted a breakpoint, it is possible that * the nmi handler will have breakpoints too. We need to @@ -465,22 +451,8 @@ static inline void nmi_nesting_preprocess(struct pt_regs *regs) debug_stack_set_zero(); this_cpu_write(update_debug_stack, 1); } -} - -static inline void nmi_nesting_postprocess(void) -{ - if (unlikely(this_cpu_read(update_debug_stack))) { - debug_stack_reset(); - this_cpu_write(update_debug_stack, 0); - } -} #endif
-dotraplinkage notrace __kprobes void -do_nmi(struct pt_regs *regs, long error_code) -{ - nmi_nesting_preprocess(regs); - nmi_enter();
inc_irq_stat(__nmi_count); @@ -489,9 +461,17 @@ do_nmi(struct pt_regs *regs, long error_code) default_do_nmi(regs);
nmi_exit(); +#ifdef CONFIG_X86_64 + if (unlikely(this_cpu_read(update_debug_stack))) { + debug_stack_reset(); + this_cpu_write(update_debug_stack, 0); + } +#endif
- /* On i386, may loop back to preprocess */ - nmi_nesting_postprocess(); + if (unlikely(this_cpu_read(nmi_cr2) != read_cr2())) + write_cr2(this_cpu_read(nmi_cr2)); + if (this_cpu_dec_return(nmi_state)) + goto nmi_restart; }
void stop_nmi(void)
On 2018/4/12 下午12:58, chenggang.qin@linux.alibaba.com wrote:
From: Andy Lutomirski luto@kernel.org
commit: 9d05041679904b12c12421cbcf9cb5f4860a8d7b upstream
32-bit kernels handle nested NMIs in C. Enable the exact same handling on 64-bit kernels as well. This isn't currently necessary, but it will become necessary once the asm code starts allowing limited nesting.
好奇为什么移植这个 patch?
Linux Nested NMI 的设计很丑陋,过去主要是为了 trace 和 debug 不得已而为之,我们踩中啥坑了?
On 2018/4/12 下午1:22, Oliver Yang wrote:
On 2018/4/12 下午12:58, chenggang.qin@linux.alibaba.com wrote:
From: Andy Lutomirski luto@kernel.org
commit: 9d05041679904b12c12421cbcf9cb5f4860a8d7b upstream
32-bit kernels handle nested NMIs in C. Enable the exact same handling on 64-bit kernels as well. This isn't currently necessary, but it will become necessary once the asm code starts allowing limited nesting.
好奇为什么移植这个 patch?
Linux Nested NMI 的设计很丑陋,过去主要是为了 trace 和 debug 不得已而为之,我们踩中啥坑了?
执行perf,触发了NMI Nesting: https://aone.alibaba-inc.com/issue/14185470
On Thu, Apr 12, 2018 at 01:59:32PM +0800, Chenggang wrote:
On 2018/4/12 下午1:22, Oliver Yang wrote:
On 2018/4/12 下午12:58, chenggang.qin@linux.alibaba.com wrote:
From: Andy Lutomirski luto@kernel.org
commit: 9d05041679904b12c12421cbcf9cb5f4860a8d7b upstream
32-bit kernels handle nested NMIs in C. Enable the exact same handling on 64-bit kernels as well. This isn't currently necessary, but it will become necessary once the asm code starts allowing limited nesting.
好奇为什么移植这个 patch?
Linux Nested NMI 的设计很丑陋,过去主要是为了 trace 和 debug 不得已而为之,我们踩中啥坑了?
执行perf,触发了NMI Nesting: https://aone.alibaba-inc.com/issue/14185470
Could you guys maybe keep private stuff private? :-)
Hi Chenggang,
Have a doubt about the old code.
On 2018/4/12 PM 12:58, chenggang.qin@linux.alibaba.com Wrote:
From: Andy Lutomirski luto@kernel.org
commit: 9d05041679904b12c12421cbcf9cb5f4860a8d7b upstream
32-bit kernels handle nested NMIs in C. Enable the exact same handling on 64-bit kernels as well. This isn't currently necessary, but it will become necessary once the asm code starts allowing limited nesting.
...... -static inline void nmi_nesting_preprocess(struct pt_regs *regs) +dotraplinkage notrace void +do_nmi(struct pt_regs *regs, long error_code) {
- if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) {
this_cpu_write(nmi_state, NMI_LATCHED);
return;
- }
- this_cpu_write(nmi_state, NMI_EXECUTING);
- this_cpu_write(nmi_cr2, read_cr2());
+nmi_restart:
Here if there are more than 2 NMIs nested, but the nmi_state is always NMI_LATCHED.
- /* On i386, may loop back to preprocess */
- nmi_nesting_postprocess();
- if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
write_cr2(this_cpu_read(nmi_cr2));
- if (this_cpu_dec_return(nmi_state))
}goto nmi_restart;
But here at most re-execute 2 NMIs, so some nmi lost ? And cr2 is always the first NMI's cr2. CR2 is wrong for the later NMIs.
void stop_nmi(void)
Thanks & Regards Qixuan Wu.
linux-stable-mirror@lists.linaro.org