On Tue, Nov 20, 2018 at 1:03 AM Peter Zijlstra <peterz@infradead.org> wrote:
On Tue, Nov 20, 2018 at 02:59:32AM +0000, Williams, Dan J wrote:
On Mon, 2018-11-19 at 15:43 -0800, Dave Hansen wrote:
On 11/19/18 3:19 PM, Dan Williams wrote:
Andy wondered why a path that can sleep was using __flush_tlb_all() [1], and Dave confirmed that a TLB flush is expected when modifying or invalidating existing PTEs, but not on initial population [2].
I _think_ this is OK.
But, could we sprinkle a few WARN_ON_ONCE(p*_present()) calls in there to help us sleep at night?
Well, I'm having nightmares now because my naive patch to sprinkle some WARN_ON_ONCE() calls is leading to my VM live locking at boot... no backtrace. If I revert the patch below and just go with the __flush_tlb_all() removal it seems fine.
I'm going to set this aside for a bit, but if anyone has any thoughts in the meantime I'd appreciate it.
Have you tried using early_printk?
No, it boots well past printk, and even gets past pivot root. Eventually it livelocks with all cores spinning. It appears to be correlated with the arrival of pmem, and independent of the TLB flushes... I'll dig deeper.
So kernel_physical_mapping_init() has a comment that states the virtual and physical addresses we create mappings for should be PMD aligned, which implies pud/p4d could have overlap between the mappings.
But in that case, I would expect the new and old values to match.
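To make the overlap concrete, here is a small userspace sketch (not kernel code). It assumes x86-64 4-level paging, where a PMD entry covers 2 MiB and a PUD entry covers 1 GiB. The helper names (pmd_aligned, pud_slot, share_pud_entry) and the example addresses are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 4-level paging geometry. */
#define PMD_SHIFT    21                    /* one PMD entry maps 2 MiB */
#define PUD_SHIFT    30                    /* one PUD entry maps 1 GiB */
#define PTRS_PER_PUD 512
#define PMD_SIZE     (1ULL << PMD_SHIFT)

/* Hypothetical helper: is addr on a 2 MiB boundary? */
static int pmd_aligned(uint64_t addr)
{
    return (addr & (PMD_SIZE - 1)) == 0;
}

/* Hypothetical helper: index of the PUD entry that covers addr. */
static unsigned int pud_slot(uint64_t addr)
{
    return (unsigned int)((addr >> PUD_SHIFT) & (PTRS_PER_PUD - 1));
}

/*
 * Hypothetical helper: do two PMD-aligned addresses fall under the
 * same 1 GiB PUD entry? If so, mapping the second range revisits a
 * PUD entry that the first range already populated.
 */
static int share_pud_entry(uint64_t a, uint64_t b)
{
    assert(pmd_aligned(a) && pmd_aligned(b));
    return (a >> PUD_SHIFT) == (b >> PUD_SHIFT);
}
```

With these definitions, 0x100000000 and 0x100200000 are both PMD aligned and both land in PUD slot 4, so a second mapping call starting at 0x100200000 would find that PUD entry already present.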
So maybe you should be checking something like:
WARN_ON_ONCE(pud_present(*pud) && !pud_same(*pud, new));
Yes, that looks better.
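
For reference, the difference between the two checks can be sketched in userspace as below. This is only a mock: mock_pud_t, MOCK_PRESENT, set_pud_checked() and the conflicts counter are stand-ins invented for illustration, not the kernel's pud_t, pud_present(), pud_same() or WARN_ON_ONCE(). The point is that rewriting a present entry with the same value stays quiet, while rewriting it with a different value is flagged.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Mock page-table entry; the real pud_t is arch-specific. */
typedef struct { uint64_t val; } mock_pud_t;

#define MOCK_PRESENT 0x1ULL   /* assumed "present" bit for the mock */

static int conflicts;         /* number of conflicting overwrites seen */

static int mock_pud_present(mock_pud_t p)
{
    return (p.val & MOCK_PRESENT) != 0;
}

static int mock_pud_same(mock_pud_t a, mock_pud_t b)
{
    return a.val == b.val;
}

/*
 * Hypothetical setter: complain only if a present entry is about to be
 * replaced by a different value. Like WARN_ON_ONCE(), the message is
 * printed a single time, but every conflict is still counted.
 */
static void set_pud_checked(mock_pud_t *pud, mock_pud_t new_pud)
{
    static int warned;

    assert(pud != NULL);
    if (mock_pud_present(*pud) && !mock_pud_same(*pud, new_pud)) {
        conflicts++;
        if (!warned) {
            warned = 1;
            fprintf(stderr, "warning: present entry changed\n");
        }
    }
    *pud = new_pud;
}
```

A plain present check would fire on the second, identical write to an overlapping entry, which is the benign case described above; the combined check fires only when the new value actually differs.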