On Tue, Sep 04, 2018 at 10:12:13AM -0700, Linus Torvalds wrote:
On Mon, Sep 3, 2018 at 11:39 AM Holger Hoffstätte <holger@applied-asynchrony.com> wrote:
Sep 3 20:19:38 ragnarok kernel: tlb_flush_mmu_tlbonly+0x76/0xc0
Sep 3 20:19:38 ragnarok kernel: tlb_table_flush.part.13+0xe/0x30
Sep 3 20:19:38 ragnarok kernel: tlb_flush_mmu_tlbonly+0x54/0xc0
..a few hundred times..
Sep 3 20:19:38 ragnarok kernel: tlb_table_flush.part.13+0xe/0x30
Sep 3 20:19:38 ragnarok kernel: tlb_flush_mmu_tlbonly+0x54/0xc0
Sep 3 20:19:38 ragnarok kernel: arch_tlb_finish_mmu+0x3a/0x70
Sep 3 20:19:38 ragnarok kernel: tlb_finish_mmu+0x1f/0x30
Yeah, so what seems to have happened is that commit db7ddef30112 ("mm: move tlb_table_flush to tlb_flush_mmu_free") wasn't applied to the stable tree (because it wasn't an obvious dependency).
And without that, the backport of d86564a2f085 ("mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE") ends up with recursion from tlb_flush_mmu_tlbonly() calling tlb_table_flush(), which in turn calls tlb_table_invalidate(), which calls back to tlb_flush_mmu_tlbonly().
So you have endless recursion - at least until you run out of stack. Then, if you have VMAP_STACK enabled (x86-64 without KASAN), you get a nice clean kernel stack overflow message like you did.
Or if you have KASAN enabled and no VMAP stack, you just end up with random hangs and huge memory corruption as the recursion stomps all over your memory.
Ok, I will go queue this patch up now. It was in my very long "to-apply" queue, but I didn't catch the dependency here.
thanks,
greg k-h