(I don't understand how the page table can be used for "normal, non-hugetlb". I can only see how it is still used by the remaining hugetlb user, but that's a different question.)
Right, this surely is related only to hugetlb page tables, otherwise the refcount shouldn't be a factor, no?
The example from Jann is scary. But I think it checks out.
How does the fix work when an architecture does not issue IPIs for TLB shootdown? To handle gup-fast on these architectures, we use RCU.
So I'm wondering whether we use RCU somehow.
Presumably you mean whether we _can_ use RCU somehow?
No, whether there is an implied RCU sync before the page table gets reused, see my reply to Jann.
But note that in gup_fast_pte_range(), we are validating whether the PMD changed:
	if (unlikely(pmd_val(pmd) != pmd_val(*pmdp)) ||
	    unlikely(pte_val(pte) != pte_val(ptep_get(ptep)))) {
		gup_put_folio(folio, 1, flags);
		goto pte_unmap;
	}
Right and as per the comment there:
	/* ...
	 * For THP collapse, it's a bit more complicated because GUP-fast may be
	 * walking a pgtable page that is being freed (pte is still valid but pmd
	 * can be cleared already). To avoid race in such condition, we need to
	 * also check pmd here to make sure pmd doesn't change (corresponds to
	 * pmdp_collapse_flush() in the THP collapse code path).
	 * ... */
So if this can correctly handle a cleared PMD entry in the teardown case, surely it can handle it in this case also?
Right.
But see my other mail: on architectures that don't free page tables with RCU we still need the IPI, so that is nasty.
So in case the page table got reused in the meantime, we should just back off and be fine, right?
Yeah seems to be the case to me.
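To make the "back off" argument concrete, here is a minimal user-space sketch of the pattern discussed above: snapshot PMD and PTE, take a reference, re-read both, and drop the reference if either changed. This is not kernel code. All toy_* names, the race_hook parameter and the plain 64-bit "entries" are hypothetical stand-ins for illustration only. It deliberately does not model RCU or IPI synchronization, nor a reused page table that happens to hold identical values.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a page/folio with a reference count. */
struct toy_page {
	atomic_int refcount;
};

/* Hypothetical walk state: slots holding the PMD and PTE values. */
struct toy_walk {
	_Atomic uint64_t *pmdp;
	_Atomic uint64_t *ptep;
	struct toy_page *page;
};

/*
 * Returns true with one extra reference held if both entries were
 * unchanged across the refcount grab; otherwise drops the reference
 * and returns false. race_hook (may be NULL) only exists so a
 * concurrent modification can be simulated deterministically.
 */
static bool toy_gup_fast_one(struct toy_walk *w,
			     void (*race_hook)(struct toy_walk *))
{
	uint64_t pmd = atomic_load(w->pmdp);
	uint64_t pte = atomic_load(w->ptep);

	if (!pmd || !pte)
		return false;

	/* Speculatively take the reference. */
	atomic_fetch_add(&w->page->refcount, 1);

	if (race_hook)
		race_hook(w);

	/* Re-validate both levels; back off if either changed. */
	if (pmd != atomic_load(w->pmdp) || pte != atomic_load(w->ptep)) {
		atomic_fetch_sub(&w->page->refcount, 1);
		return false;
	}
	return true;
}

/* Simulates the PMD being cleared while the walker is in flight. */
static void toy_clear_pmd(struct toy_walk *w)
{
	atomic_store(w->pmdp, 0);
}
```

Calling toy_gup_fast_one() with toy_clear_pmd as the hook returns false and leaves the refcount where it started, which is the "just back off" behaviour. Whether the recheck is sufficient on architectures without the IPI sync is exactly the open question, and the sketch says nothing about that.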
-- Cheers
David / dhildenb
So it seems like you have a proposal here - could you send a patch so we can assess it please? :)
It's a bit tricky, I think I have to discuss with Jann some more first. But right now my understanding is that Jann's fix might not have taken care of architectures without the IPI sync -- I might be wrong.