Commit f77084d96355 "x86/mm/pat: Disable preemption around __flush_tlb_all()" addressed a case where __flush_tlb_all() is called without preemption being disabled. It also left a warning to catch other cases where preemption is not disabled. That warning triggers for the memory hotplug path which is also used for persistent memory enabling:
WARNING: CPU: 35 PID: 911 at ./arch/x86/include/asm/tlbflush.h:460
RIP: 0010:__flush_tlb_all+0x1b/0x3a
[..]
Call Trace:
 phys_pud_init+0x29c/0x2bb
 kernel_physical_mapping_init+0xfc/0x219
 init_memory_mapping+0x1a5/0x3b0
 arch_add_memory+0x2c/0x50
 devm_memremap_pages+0x3aa/0x610
 pmem_attach_disk+0x585/0x700 [nd_pmem]
Rather than audit all __flush_tlb_all() callers and add preemption disabling at each call site, just do it internally in __flush_tlb_all().
Fixes: f77084d96355 ("x86/mm/pat: Disable preemption around __flush_tlb_all()")
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/tlbflush.h | 8 ++++----
 arch/x86/mm/pageattr.c          | 6 +-----
 2 files changed, 5 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index d760611cfc35..049e0aca0fb5 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -454,11 +454,10 @@ static inline void __native_flush_tlb_one_user(unsigned long addr)
 static inline void __flush_tlb_all(void)
 {
 	/*
-	 * This is to catch users with enabled preemption and the PGE feature
-	 * and don't trigger the warning in __native_flush_tlb().
+	 * Preemption needs to be disabled around __flush_tlb* calls
+	 * due to CR3 reload in __native_flush_tlb().
 	 */
-	VM_WARN_ON_ONCE(preemptible());
-
+	preempt_disable();
 	if (boot_cpu_has(X86_FEATURE_PGE)) {
 		__flush_tlb_global();
 	} else {
@@ -467,6 +466,7 @@ static inline void __flush_tlb_all(void)
 		 */
 		__flush_tlb();
 	}
+	preempt_enable();
 }

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index db7a10082238..f799076e3d57 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -2309,13 +2309,9 @@ void __kernel_map_pages(struct page *page, int numpages, int enable)

 	/*
 	 * We should perform an IPI and flush all tlbs,
-	 * but that can deadlock->flush only current cpu.
-	 * Preemption needs to be disabled around __flush_tlb_all() due to
-	 * CR3 reload in __native_flush_tlb().
+	 * but that can deadlock->flush only current cpu:
 	 */
-	preempt_disable();
 	__flush_tlb_all();
-	preempt_enable();

 	arch_flush_lazy_mmu_mode();
 }
On Nov 9, 2018, at 4:05 PM, Dan Williams dan.j.williams@intel.com wrote:
> Commit f77084d96355 "x86/mm/pat: Disable preemption around __flush_tlb_all()" addressed a case where __flush_tlb_all() is called without preemption being disabled. It also left a warning to catch other cases where preemption is not disabled. That warning triggers for the memory hotplug path which is also used for persistent memory enabling:
I don’t think I agree with the patch. If you call __flush_tlb_all() in a context where you might be *migrated*, then there’s a bug. We could change the code to allow this particular use by checking that we haven’t done SMP init yet, perhaps.
On Fri, Nov 9, 2018 at 4:22 PM Andy Lutomirski luto@amacapital.net wrote:
> On Nov 9, 2018, at 4:05 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> Commit f77084d96355 "x86/mm/pat: Disable preemption around __flush_tlb_all()" addressed a case where __flush_tlb_all() is called without preemption being disabled. It also left a warning to catch other cases where preemption is not disabled. That warning triggers for the memory hotplug path which is also used for persistent memory enabling:
> I don’t think I agree with the patch. If you call __flush_tlb_all() in a context where you might be *migrated*, then there’s a bug. We could change the code to allow this particular use by checking that we haven’t done SMP init yet, perhaps.
Hmm, are you saying the entire kernel_physical_mapping_init() sequence needs to run with preemption disabled?
On Nov 10, 2018, at 3:57 PM, Dan Williams dan.j.williams@intel.com wrote:
> On Fri, Nov 9, 2018 at 4:22 PM Andy Lutomirski <luto@amacapital.net> wrote:
>> On Nov 9, 2018, at 4:05 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>> Commit f77084d96355 "x86/mm/pat: Disable preemption around __flush_tlb_all()" addressed a case where __flush_tlb_all() is called without preemption being disabled. It also left a warning to catch other cases where preemption is not disabled. That warning triggers for the memory hotplug path which is also used for persistent memory enabling:
>> I don’t think I agree with the patch. If you call __flush_tlb_all() in a context where you might be *migrated*, then there’s a bug. We could change the code to allow this particular use by checking that we haven’t done SMP init yet, perhaps.
> Hmm, are you saying the entire kernel_physical_mapping_init() sequence needs to run with preemption disabled?
If it indeed can run late in boot or after boot, then it sure looks buggy. Either the __flush_tlb_all() should be removed or it should be replaced with flush_tlb_kernel_range(). It’s unclear to me why a flush is needed at all, but if it’s needed, surely all CPUs need flushing.
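Concretely, the replacement being suggested would look something like the following in the init path (a sketch only; 'start' and 'end' are hypothetical names for the bounds of the just-mapped region, not variables from the actual code):

```c
/*
 * Sketch: instead of a local-only __flush_tlb_all(), flush the freshly
 * mapped kernel range on all CPUs via IPI. 'start' and 'end' are
 * hypothetical bounds of the virtual range that
 * kernel_physical_mapping_init() just populated.
 */
flush_tlb_kernel_range(start, end);
```

flush_tlb_kernel_range() sends the flush to every CPU, which is the point of the objection: if any CPU could have a stale translation, flushing only the current CPU is insufficient.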
[ added Kirill ]
On Sat, Nov 10, 2018 at 4:19 PM Andy Lutomirski luto@amacapital.net wrote:
> On Nov 10, 2018, at 3:57 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Fri, Nov 9, 2018 at 4:22 PM Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Nov 9, 2018, at 4:05 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>>> Commit f77084d96355 "x86/mm/pat: Disable preemption around __flush_tlb_all()" addressed a case where __flush_tlb_all() is called without preemption being disabled. It also left a warning to catch other cases where preemption is not disabled. That warning triggers for the memory hotplug path which is also used for persistent memory enabling:
>>> I don’t think I agree with the patch. If you call __flush_tlb_all() in a context where you might be *migrated*, then there’s a bug. We could change the code to allow this particular use by checking that we haven’t done SMP init yet, perhaps.
>> Hmm, are you saying the entire kernel_physical_mapping_init() sequence needs to run with preemption disabled?
> If it indeed can run late in boot or after boot, then it sure looks buggy. Either the __flush_tlb_all() should be removed or it should be replaced with flush_tlb_kernel_range(). It’s unclear to me why a flush is needed at all, but if it’s needed, surely all CPUs need flushing.
Yeah, I don't think __flush_tlb_all() is needed at kernel_physical_mapping_init() time, and at kernel_physical_mapping_remove() time we do a full flush_tlb_all().
Kirill?
On 11/10/18 4:31 PM, Dan Williams wrote:
>> If it indeed can run late in boot or after boot, then it sure looks buggy. Either the __flush_tlb_all() should be removed or it should be replaced with flush_tlb_kernel_range(). It’s unclear to me why a flush is needed at all, but if it’s needed, surely all CPUs need flushing.
> Yeah, I don't think __flush_tlb_all() is needed at kernel_physical_mapping_init() time, and at kernel_physical_mapping_remove() time we do a full flush_tlb_all().
It doesn't look strictly necessary to me. I _think_ we're only ever populating previously non-present entries, and those never need TLB flushes. I didn't look too deeply, so I'd appreciate anyone else double-checking me on this.
The __flush_tlb_all() actually appears to predate git and it was originally entirely intended for early-boot-only. It probably lasted this long because it looks really important. :)
It was even next to where we set MMU features in CR4, which is *really* early in boot:
asm volatile("movq %%cr4,%0" : "=r" (mmu_cr4_features));
__flush_tlb_all();
I also totally agree with Andy that if it were needed on the local CPU, this code would be buggy because it doesn't initiate any *remote* TLB flushes.
So, let's remove it, but also add some comments about not being allowed to *change* page table entries, only populate them. We could even add some warnings to keep this enforced.
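A sketch of the kind of enforcement being suggested (hypothetical, not from any posted patch): warn if the physical-mapping helpers would ever overwrite an already-present entry, since only changing a live entry would actually require a TLB flush:

```c
/*
 * Hypothetical check for phys_pud_init() and friends: populating a
 * previously non-present entry needs no TLB flush, but changing a
 * present one would, so warn if that ever happens. 'new_pud' is an
 * illustrative name for the entry about to be written.
 */
if (pud_present(*pud) && pud_val(*pud) != pud_val(new_pud))
	WARN_ONCE(1, "changing a live kernel mapping without a TLB flush\n");
set_pud(pud, new_pud);
```

With a warning like this in place, the populate-only invariant is enforced rather than assumed, and the __flush_tlb_all() can be removed with confidence.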