On 2024/7/26 16:04, David Hildenbrand wrote:
On 26.07.24 04:33, Baolin Wang wrote:
On 2024/7/26 02:39, David Hildenbrand wrote:
We recently made GUP's common page table walking code also walk hugetlb VMAs without most hugetlb special-casing, preparing for the future of having less hugetlb-specific page table walking code in the codebase. Turns out that we missed one page table locking detail: page table locking for hugetlb folios that are not mapped using a single PMD/PUD.
Assume we have a hugetlb folio that spans multiple PTEs (e.g., 64 KiB hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the page tables, will perform a pte_offset_map_lock() to grab the PTE table lock.
However, hugetlb code that concurrently modifies these page tables would actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the locks would differ. Something similar can happen right now with hugetlb folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS.
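To make the mismatch concrete, here is a rough sketch of the two paths (simplified, not the exact kernel code; "huge_ptep" is just a stand-in for the hugetlb PTE being modified):

	/* Path 1: GUP's generic walker takes the split PTE-table lock. */
	spinlock_t *ptl;
	pte_t *ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
	/* ... inspect the PTEs backing the hugetlb folio ... */
	pte_unmap_unlock(ptep, ptl);

	/*
	 * Path 2: hugetlb code modifying the same PTEs. For these folios
	 * huge_pte_lock() resolves to mm->page_table_lock, i.e. with
	 * USE_SPLIT_PTE_PTLOCKS a *different* lock than GUP took above.
	 */
	spinlock_t *hugetlb_ptl = huge_pte_lock(hstate_vma(vma), mm, huge_ptep);
	/* ... modify the PTEs without excluding the GUP walker ... */
	spin_unlock(hugetlb_ptl);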
Let's make huge_pte_lockptr() effectively use the same PT locks as any core-mm page table walker would.
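A minimal sketch of that direction (assuming USE_SPLIT_PTE_PTLOCKS/USE_SPLIT_PMD_PTLOCKS are enabled; the size-based dispatch below is my reading of the intent, not a verbatim copy of the patch):

	/*
	 * Sketch only: derive the lock from the mapping size, so hugetlb
	 * ends up on the same lock that pte_offset_map_lock() /
	 * pmd_lockptr() users take.
	 */
	static inline spinlock_t *huge_pte_lockptr(unsigned long sz,
			struct mm_struct *mm, pte_t *pte)
	{
		if (sz < PMD_SIZE)
			/* cont-PTE hugetlb: split lock of the PTE page */
			return ptlock_ptr(virt_to_ptdesc(pte));
		if (sz < PUD_SIZE)
			/* (cont-)PMD hugetlb: the PMD-table lock */
			return pmd_lockptr(mm, (pmd_t *)pte);
		/* PUD and larger keep using mm->page_table_lock */
		return &mm->page_table_lock;
	}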
Thanks for raising the issue again. I remember fixing this issue 2 years ago in commit fac35ba763ed ("mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page"), but it seems to be broken again.
Ah, right! We fixed it by rerouting to hugetlb code that we then removed :D
Did we have a reproducer back then that would make my life easier?
I don't have any reproducers right now. I remember adding some ugly hack code (adding delay() etc.) to the kernel to analyze this issue, and it was not easy to reproduce. :(