I don't think we can elide the call to __tlb_reset_range() entirely, since I think we do want to clear the freed_pXX bits to ensure that we walk the range with the smallest mapping granule that we have. Otherwise couldn't we have a problem if we hit a PMD that had been cleared, but the TLB invalidation for the PTEs that used to be linked below it was still pending?
That's tlb->cleared_p*, and yes agreed. That is, right until some architecture has level dependent TLBI instructions, at which point we'll need to have them all set instead of cleared.
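For reference, __tlb_reset_range() is the helper that throws that tracking away. A rough paraphrase of the mm/mmu_gather.c code from around this time (details may differ slightly from the exact tree):

static void __tlb_reset_range(struct mmu_gather *tlb)
{
	if (tlb->fullmm) {
		tlb->start = tlb->end = ~0;
	} else {
		tlb->start = TASK_SIZE;
		tlb->end = 0;
	}

	/*
	 * Forget which levels were touched: after this, an architecture
	 * that looks at freed_tables / cleared_p* will think only leaf
	 * entries need invalidating.
	 */
	tlb->freed_tables = 0;
	tlb->cleared_ptes = 0;
	tlb->cleared_pmds = 0;
	tlb->cleared_puds = 0;
	tlb->cleared_p4ds = 0;
}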
Perhaps we should just set fullmm if we see that there's a concurrent unmapper rather than do a worst-case range invalidation. Do you have a feeling for how often mm_tlb_flush_nested() triggers in practice?
Quite a bit for certain workloads I imagine, that was the whole point of doing it.
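(For context, mm_tlb_flush_nested() is just a check on the mm's pending-flush count; roughly, from include/linux/mm_types.h:)

static inline bool mm_tlb_flush_nested(struct mm_struct *mm)
{
	/* More than one concurrent unmap/zap has a TLB flush pending. */
	return atomic_read(&mm->tlb_flush_pending) > 1;
}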
Anyway; am I correct in understanding that the actual problem is that we've cleared freed_tables and the ARM64 tlb_flush() will then not invalidate the cache and badness happens?
That is my understanding: only the last level is flushed, which is not enough.
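To make the failure mode concrete: arm64's tlb_flush() derives its "last level only" hint from tlb->freed_tables, so once that bit has been cleared the walk cache is never invalidated. A simplified sketch along the lines of arch/arm64/include/asm/tlb.h (helper names and details from memory, not exact):

static inline void tlb_flush(struct mmu_gather *tlb)
{
	struct vm_area_struct vma = TLB_FLUSH_VMA(tlb->mm, 0);
	/*
	 * If no page tables were freed, only leaf (last-level) entries
	 * need invalidating. After __tlb_reset_range() this is always
	 * true, so stale walk-cache entries for freed tables survive.
	 */
	bool last_level = !tlb->freed_tables;

	/* The real code computes the stride from the cleared_p* bits. */
	__flush_tlb_range(&vma, tlb->start, tlb->end, PAGE_SIZE, last_level);
}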
Because so far nobody has actually provided a coherent description of the actual problem we're trying to solve. But I'm thinking something like the below ought to do.
I applied it (and fixed small typo: s/tlb->full_mm/tlb->fullmm/). It fixes the problem for me.
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 99740e1dd273..fe768f8d612e 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -244,15 +244,20 @@ void tlb_finish_mmu(struct mmu_gather *tlb,
 		unsigned long start, unsigned long end)
 {
 	/*
-	 * If there are parallel threads are doing PTE changes on same range
-	 * under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB
-	 * flush by batching, a thread has stable TLB entry can fail to flush
-	 * the TLB by observing pte_none|!pte_dirty, for example so flush TLB
-	 * forcefully if we detect parallel PTE batching threads.
+	 * Sensible comment goes here..
 	 */
-	if (mm_tlb_flush_nested(tlb->mm)) {
-		__tlb_reset_range(tlb);
-		__tlb_adjust_range(tlb, start, end - start);
+	if (mm_tlb_flush_nested(tlb->mm) && !tlb->full_mm) {
+		/*
+		 * Since we're can't tell what we actually should have
+		 * flushed flush everything in the given range.
+		 */
+		tlb->start = start;
+		tlb->end = end;
+		tlb->freed_tables = 1;
+		tlb->cleared_ptes = 1;
+		tlb->cleared_pmds = 1;
+		tlb->cleared_puds = 1;
+		tlb->cleared_p4ds = 1;
 	}
 
 	tlb_flush_mmu(tlb);