 
On 15/05/25 2:23 pm, David Hildenbrand wrote:
On 15.05.25 10:47, Dev Jain wrote:
On 15/05/25 2:06 pm, David Hildenbrand wrote:
On 15.05.25 10:22, Dev Jain wrote:
On 15/05/25 1:43 pm, David Hildenbrand wrote:
On 15.05.25 08:34, Dev Jain wrote:
Commit 9c006972c3fe removes the pxd_present() checks because the caller checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller only checks pud_present(); pud_free_pmd_page() recurses on each pmd through pmd_free_pte_page(), wherein the pmd may be none.
The commit states: "The core code already has a check for pXd_none()", so I take it that this assumption did not hold in all cases?
Should that one problematic caller then check for pmd_none() instead?
From what I could gather of Will's commit message, my interpretation is that the concerned callers are vmap_try_huge_pud and vmap_try_huge_pmd. These individually check for pxd_present():
	if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
		return 0;
The problem is that vmap_try_huge_pud will also walk the pmd entries beneath the pud. So if the pud is present, then pud_free_pmd_page -> pmd_free_pte_page may encounter a none pmd and trigger a WARN.
Yeah, pud_free_pmd_page()->pmd_free_pte_page() looks shaky.
I assume we should either have an explicit pmd_none() check in pud_free_pmd_page() before calling pmd_free_pte_page(), or one in pmd_free_pte_page().
With your patch, we'd be calling pte_free_kernel() on a NULL pointer, which sounds wrong -- unless I am missing something important.
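To make the NULL concern concrete, here is a minimal user-space sketch. It is not kernel code: the toy_* names and the "bit 0 marks a table" encoding are assumptions invented for illustration and do not match the real arm64 descriptor layout. It only shows that decoding a table pointer out of a none (all-zero) entry yields NULL, which is what a free helper would then be handed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy model (hypothetical): an entry is 0 ("none") or a table pointer tagged with bit 0. */
typedef uintptr_t toy_pmd_t;
#define TOY_PMD_TABLE_BIT ((uintptr_t)1)

static int toy_pmd_none(toy_pmd_t pmd)
{
	return pmd == 0;
}

static int toy_pmd_table(toy_pmd_t pmd)
{
	return (pmd & TOY_PMD_TABLE_BIT) != 0;
}

/* Strip the tag bit to recover the table pointer; a none entry decodes to NULL. */
static void *toy_pte_table_of(toy_pmd_t pmd)
{
	return (void *)(pmd & ~TOY_PMD_TABLE_BIT);
}

/* Allocate a fake pte table and wrap it in a tagged entry; returns 0 on failure. */
static toy_pmd_t toy_pmd_populate(void)
{
	void *table = calloc(512, sizeof(uint64_t));

	return table ? ((uintptr_t)table | TOY_PMD_TABLE_BIT) : 0;
}
```

In this model a caller that skips the none check ends up passing NULL to whatever frees the table, which is the situation described above.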
Ah thanks, you seem to be right. We would be extracting the table from a none pmd. Perhaps we should still bail out for !pxd_present(), but without the warning, which is what the code did before the fix commit.
Right. We just make sure that all callers of pmd_free_pte_page() already check for it.
I'd just do something like:
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 8fcf59ba39db7..e98dd7af147d5 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1274,10 +1274,8 @@ int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)
 	pmd = READ_ONCE(*pmdp);
-	if (!pmd_table(pmd)) {
-		VM_WARN_ON(1);
-		return 1;
-	}
+	VM_WARN_ON(!pmd_present(pmd));
+	VM_WARN_ON(!pmd_table(pmd));
And also return 1? Also we should BUG_ON(!pmd_present(pmd)) to avoid the null dereference?
 	table = pte_offset_kernel(pmdp, addr);
 	pmd_clear(pmdp);
@@ -1305,7 +1303,8 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
 	next = addr;
 	end = addr + PUD_SIZE;
 	do {
-		pmd_free_pte_page(pmdp, next);
+		if (pmd_present(*pmdp))
+			pmd_free_pte_page(pmdp, next);
Ah yes, the "caller" of pmd_free_pte_page() is not only vmap_try_huge_pmd but this also...my mind has been foggy lately... need to solve a math problem or two to sharpen it :)
 	} while (pmdp++, next += PMD_SIZE, next != end);
 	pud_clear(pudp);
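For anyone following along, the caller-side check in the second hunk can be modelled in user space roughly as below. This is a sketch under stated assumptions, not the arm64 implementation: every sim_* name is hypothetical, a NULL pointer stands in for a none pmd, a counter stands in for VM_WARN_ON() firing, and the early return after the warning follows the "also return 1?" suggestion rather than the hunk as posted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SIM_PTRS_PER_PMD 8	/* hypothetical; kept small for the demo */

typedef struct {
	void *pte_table;	/* NULL models a none pmd */
} sim_pmd_t;

static int sim_warn_count;	/* stands in for VM_WARN_ON() firing */

static int sim_pmd_present(sim_pmd_t pmd)
{
	return pmd.pte_table != NULL;
}

/* Callers are expected to have checked sim_pmd_present() first. */
static int sim_pmd_free_pte_page(sim_pmd_t *pmdp)
{
	if (!sim_pmd_present(*pmdp)) {
		sim_warn_count++;	/* warn, then bail out before touching the table */
		return 1;
	}
	free(pmdp->pte_table);
	pmdp->pte_table = NULL;		/* models pmd_clear() */
	return 1;
}

/* Walk every pmd slot and skip none entries; returns how many tables were freed. */
static int sim_pud_free_pmd_page(sim_pmd_t *table)
{
	int freed = 0;

	for (int i = 0; i < SIM_PTRS_PER_PMD; i++) {
		if (sim_pmd_present(table[i])) {
			sim_pmd_free_pte_page(&table[i]);
			freed++;
		}
	}
	return freed;
}
```

With the check in the loop, a sparsely populated table is torn down without the warning path being reached; calling sim_pmd_free_pte_page() directly on a none entry still trips it, which matches the rule that every caller checks first.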