[PATCH] arm64: Check pxd_leaf() instead of !pxd_table() while tearing down page tables

List overview All Threads
Download

newer

older

[PATCH] drm: radeon: ci_dpm: Add...

[PATCH v3] clk: meson-g12a: add...

Dev Jain

15 May 2025 15 May '25

6:34 a.m.

Commit 9c006972c3fe removes the pxd_present() checks because the caller checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller only checks pud_present(); pud_free_pmd_page() recurses on each pmd through pmd_free_pte_page(), wherein the pmd may be none. Thus it is possible to hit a warning in the latter, since pmd_none => !pmd_table(). Thus, enforce these checks again through pxd_leaf(). This problem was found by code inspection. The patch is based on 6.15-rc6.

Fixes: 9c006972c3fe (arm64: mmu: drop pXd_present() checks from pXd_free_pYd_table()) Cc: stable@vger.kernel.org Reported-by: Ryan Roberts ryan.roberts@arm.com Signed-off-by: Dev Jain dev.jain@arm.com --- arch/arm64/mm/mmu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index ea6695d53fb9..3d6789413a9b 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -1255,7 +1255,7 @@ int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)

pmd = READ_ONCE(*pmdp);

- if (!pmd_table(pmd)) { + if (pmd_leaf(pmd)) { VM_WARN_ON(1); return 1; } @@ -1276,7 +1276,7 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)

pud = READ_ONCE(*pudp);

- if (!pud_table(pud)) { + if (pud_leaf(pud)) { VM_WARN_ON(1); return 1; }

-- 2.30.2

Show replies by date

David Hildenbrand

15 May 15 May

8:13 a.m.

On 15.05.25 08:34, Dev Jain wrote:

...

Commit 9c006972c3fe removes the pxd_present() checks because the caller checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller only checks pud_present(); pud_free_pmd_page() recurses on each pmd through pmd_free_pte_page(), wherein the pmd may be none.

The commit states: "The core code already has a check for pXd_none()", so I assume that assumption was not true in all cases?

Should that one problematic caller then check for pmd_none() instead?

If you were able to trigger this WARN, it's always a good idea to include the splat in the commit.

-- Cheers, David / dhildenb

Dev Jain

8:22 a.m.

On 15/05/25 1:43 pm, David Hildenbrand wrote:

...

On 15.05.25 08:34, Dev Jain wrote:

...
Commit 9c006972c3fe removes the pxd_present() checks because the caller checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller only checks pud_present(); pud_free_pmd_page() recurses on each pmd through pmd_free_pte_page(), wherein the pmd may be none.

The commit states: "The core code already has a check for pXd_none()", so I assume that assumption was not true in all cases?

Should that one problematic caller then check for pmd_none() instead?

From what I could gather of Will's commit message, my interpretation is that the concerned callers are vmap_try_huge_pud and vmap_try_huge_pmd. These individually check for pxd_present():

if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr)) return 0;

The problem is that vmap_try_huge_pud will also iterate on pte entries. So if the pud is present, then pud_free_pmd_page -> pmd_free_pte_page may encounter a none pmd and trigger a WARN.

...

If you were able to trigger this WARN, it's always a good idea to include the splat in the commit.

I wasn't able to, it is just an observation from code inspection.

...

David Hildenbrand

8:36 a.m.

On 15.05.25 10:22, Dev Jain wrote:

...

On 15/05/25 1:43 pm, David Hildenbrand wrote:

...
On 15.05.25 08:34, Dev Jain wrote:

...
Commit 9c006972c3fe removes the pxd_present() checks because the caller checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller only checks pud_present(); pud_free_pmd_page() recurses on each pmd through pmd_free_pte_page(), wherein the pmd may be none.

The commit states: "The core code already has a check for pXd_none()", so I assume that assumption was not true in all cases?

Should that one problematic caller then check for pmd_none() instead?

From what I could gather of Will's commit message, my interpretation is that the concerned callers are vmap_try_huge_pud and vmap_try_huge_pmd. These individually check for pxd_present():

if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr)) return 0;

The problem is that vmap_try_huge_pud will also iterate on pte entries. So if the pud is present, then pud_free_pmd_page -> pmd_free_pte_page may encounter a none pmd and trigger a WARN.

Yeah, pud_free_pmd_page()->pmd_free_pte_page() looks shaky.

I assume we should either have an explicit pmd_none() check in pud_free_pmd_page() before calling pmd_free_pte_page(), or one in pmd_free_pte_page().

With your patch, we'd be calling pte_free_kernel() on a NULL pointer, which sounds wrong -- unless I am missing something important.

...

...
If you were able to trigger this WARN, it's always a good idea to include the splat in the commit.

I wasn't able to, it is just an observation from code inspection.

That better be included in the patch description :)

-- Cheers, David / dhildenb

Dev Jain

8:40 a.m.

On 15/05/25 2:06 pm, David Hildenbrand wrote:

...

On 15.05.25 10:22, Dev Jain wrote:

...
On 15/05/25 1:43 pm, David Hildenbrand wrote:

...
On 15.05.25 08:34, Dev Jain wrote:

...
Commit 9c006972c3fe removes the pxd_present() checks because the caller checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller only checks pud_present(); pud_free_pmd_page() recurses on each pmd through pmd_free_pte_page(), wherein the pmd may be none.

The commit states: "The core code already has a check for pXd_none()", so I assume that assumption was not true in all cases?

Should that one problematic caller then check for pmd_none() instead?

From what I could gather of Will's commit message, my interpretation is that the concerned callers are vmap_try_huge_pud and vmap_try_huge_pmd. These individually check for pxd_present():

if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr)) return 0;

The problem is that vmap_try_huge_pud will also iterate on pte entries. So if the pud is present, then pud_free_pmd_page -> pmd_free_pte_page may encounter a none pmd and trigger a WARN.

Yeah, pud_free_pmd_page()->pmd_free_pte_page() looks shaky.

I assume we should either have an explicit pmd_none() check in pud_free_pmd_page() before calling pmd_free_pte_page(), or one in pmd_free_pte_page().

With your patch, we'd be calling pte_free_kernel() on a NULL pointer, which sounds wrong -- unless I am missing something important.

...
...
If you were able to trigger this WARN, it's always a good idea to include the splat in the commit.

I wasn't able to, it is just an observation from code inspection.

That better be included in the patch description :)

I did, actually. My bad for not putting in spaces, I notice now that the description looks horrible to the eye :)

...

David Hildenbrand

8:49 a.m.

On 15.05.25 10:40, Dev Jain wrote:

...

On 15/05/25 2:06 pm, David Hildenbrand wrote:

...
On 15.05.25 10:22, Dev Jain wrote:

...
On 15/05/25 1:43 pm, David Hildenbrand wrote:

...
On 15.05.25 08:34, Dev Jain wrote:

...
Commit 9c006972c3fe removes the pxd_present() checks because the caller checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller only checks pud_present(); pud_free_pmd_page() recurses on each pmd through pmd_free_pte_page(), wherein the pmd may be none.

The commit states: "The core code already has a check for pXd_none()", so I assume that assumption was not true in all cases?

Should that one problematic caller then check for pmd_none() instead?

From what I could gather of Will's commit message, my interpretation is that the concerned callers are vmap_try_huge_pud and vmap_try_huge_pmd. These individually check for pxd_present():

if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr)) return 0;

The problem is that vmap_try_huge_pud will also iterate on pte entries. So if the pud is present, then pud_free_pmd_page -> pmd_free_pte_page may encounter a none pmd and trigger a WARN.

Yeah, pud_free_pmd_page()->pmd_free_pte_page() looks shaky.

I assume we should either have an explicit pmd_none() check in pud_free_pmd_page() before calling pmd_free_pte_page(), or one in pmd_free_pte_page().

With your patch, we'd be calling pte_free_kernel() on a NULL pointer, which sounds wrong -- unless I am missing something important.

...
...
If you were able to trigger this WARN, it's always a good idea to include the splat in the commit.

I wasn't able to, it is just an observation from code inspection.

That better be included in the patch description :)

I did, actually. My bad for not putting in spaces, I notice now that the description looks horrible to the eye :)

Ahh, there it is. Sorry :) Yeah, some empty lines won't hurt!

-- Cheers, David / dhildenb

Dev Jain

8:47 a.m.

On 15/05/25 2:06 pm, David Hildenbrand wrote:

...

On 15.05.25 10:22, Dev Jain wrote:

...
On 15/05/25 1:43 pm, David Hildenbrand wrote:

...
On 15.05.25 08:34, Dev Jain wrote:

...
Commit 9c006972c3fe removes the pxd_present() checks because the caller checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller only checks pud_present(); pud_free_pmd_page() recurses on each pmd through pmd_free_pte_page(), wherein the pmd may be none.

The commit states: "The core code already has a check for pXd_none()", so I assume that assumption was not true in all cases?

Should that one problematic caller then check for pmd_none() instead?

From what I could gather of Will's commit message, my interpretation is that the concerned callers are vmap_try_huge_pud and vmap_try_huge_pmd. These individually check for pxd_present():

if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr)) return 0;

The problem is that vmap_try_huge_pud will also iterate on pte entries. So if the pud is present, then pud_free_pmd_page -> pmd_free_pte_page may encounter a none pmd and trigger a WARN.

Yeah, pud_free_pmd_page()->pmd_free_pte_page() looks shaky.

I assume we should either have an explicit pmd_none() check in pud_free_pmd_page() before calling pmd_free_pte_page(), or one in pmd_free_pte_page().

With your patch, we'd be calling pte_free_kernel() on a NULL pointer, which sounds wrong -- unless I am missing something important.

Ah thanks, you seem to be right. We will be extracting table from a none pmd. Perhaps we should still bail out for !pxd_present() but without the warning, which the fix commit used to do.

...

...
...
If you were able to trigger this WARN, it's always a good idea to include the splat in the commit.

I wasn't able to, it is just an observation from code inspection.

That better be included in the patch description :)

David Hildenbrand

8:53 a.m.

On 15.05.25 10:47, Dev Jain wrote:

...

On 15/05/25 2:06 pm, David Hildenbrand wrote:

...
On 15.05.25 10:22, Dev Jain wrote:

...
On 15/05/25 1:43 pm, David Hildenbrand wrote:

...
On 15.05.25 08:34, Dev Jain wrote:

...
Commit 9c006972c3fe removes the pxd_present() checks because the caller checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller only checks pud_present(); pud_free_pmd_page() recurses on each pmd through pmd_free_pte_page(), wherein the pmd may be none.

The commit states: "The core code already has a check for pXd_none()", so I assume that assumption was not true in all cases?

Should that one problematic caller then check for pmd_none() instead?

From what I could gather of Will's commit message, my interpretation is that the concerned callers are vmap_try_huge_pud and vmap_try_huge_pmd. These individually check for pxd_present():

if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr)) return 0;

The problem is that vmap_try_huge_pud will also iterate on pte entries. So if the pud is present, then pud_free_pmd_page -> pmd_free_pte_page may encounter a none pmd and trigger a WARN.

Yeah, pud_free_pmd_page()->pmd_free_pte_page() looks shaky.

I assume we should either have an explicit pmd_none() check in pud_free_pmd_page() before calling pmd_free_pte_page(), or one in pmd_free_pte_page().

With your patch, we'd be calling pte_free_kernel() on a NULL pointer, which sounds wrong -- unless I am missing something important.

Ah thanks, you seem to be right. We will be extracting table from a none pmd. Perhaps we should still bail out for !pxd_present() but without the warning, which the fix commit used to do.

Right. We just make sure that all callers of pmd_free_pte_page() already check for it.

I'd just do something like:

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 8fcf59ba39db7..e98dd7af147d5 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -1274,10 +1274,8 @@ int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)

pmd = READ_ONCE(*pmdp);

- if (!pmd_table(pmd)) { - VM_WARN_ON(1); - return 1; - } + VM_WARN_ON(!pmd_present(pmd)); + VM_WARN_ON(!pmd_table(pmd));

table = pte_offset_kernel(pmdp, addr); pmd_clear(pmdp); @@ -1305,7 +1303,8 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr) next = addr; end = addr + PUD_SIZE; do { - pmd_free_pte_page(pmdp, next); + if (pmd_present(*pmdp)) + pmd_free_pte_page(pmdp, next); } while (pmdp++, next += PMD_SIZE, next != end);

pud_clear(pudp);

-- Cheers, David / dhildenb

Dev Jain

9:27 a.m.

On 15/05/25 2:23 pm, David Hildenbrand wrote:

...

On 15.05.25 10:47, Dev Jain wrote:

...
On 15/05/25 2:06 pm, David Hildenbrand wrote:

...
On 15.05.25 10:22, Dev Jain wrote:

...
On 15/05/25 1:43 pm, David Hildenbrand wrote:

...
On 15.05.25 08:34, Dev Jain wrote:

...
Commit 9c006972c3fe removes the pxd_present() checks because the caller checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller only checks pud_present(); pud_free_pmd_page() recurses on each pmd through pmd_free_pte_page(), wherein the pmd may be none.

The commit states: "The core code already has a check for pXd_none()", so I assume that assumption was not true in all cases?

Should that one problematic caller then check for pmd_none() instead?

From what I could gather of Will's commit message, my interpretation is that the concerned callers are vmap_try_huge_pud and vmap_try_huge_pmd. These individually check for pxd_present():

if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr)) return 0;

The problem is that vmap_try_huge_pud will also iterate on pte entries. So if the pud is present, then pud_free_pmd_page -> pmd_free_pte_page may encounter a none pmd and trigger a WARN.

Yeah, pud_free_pmd_page()->pmd_free_pte_page() looks shaky.

I assume we should either have an explicit pmd_none() check in pud_free_pmd_page() before calling pmd_free_pte_page(), or one in pmd_free_pte_page().

With your patch, we'd be calling pte_free_kernel() on a NULL pointer, which sounds wrong -- unless I am missing something important.

Ah thanks, you seem to be right. We will be extracting table from a none pmd. Perhaps we should still bail out for !pxd_present() but without the warning, which the fix commit used to do.

Right. We just make sure that all callers of pmd_free_pte_page() already check for it.

I'd just do something like:

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 8fcf59ba39db7..e98dd7af147d5 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -1274,10 +1274,8 @@ int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)

pmd = READ_ONCE(*pmdp);

- if (!pmd_table(pmd)) { - VM_WARN_ON(1); - return 1; - } + VM_WARN_ON(!pmd_present(pmd)); + VM_WARN_ON(!pmd_table(pmd));

And also return 1? Also we should BUG_ON(!pmd_present(pmd)) to avoid the null dereference?

...

table = pte_offset_kernel(pmdp, addr); pmd_clear(pmdp); @@ -1305,7 +1303,8 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr) next = addr; end = addr + PUD_SIZE; do { - pmd_free_pte_page(pmdp, next); + if (pmd_present(*pmdp)) + pmd_free_pte_page(pmdp, next);

Ah yes, the "caller" of pmd_free_pte_page() is not only vmap_try_huge_pmd but this also...my mind has been foggy lately... need to solve a math problem or two to sharpen it :)

...

} while (pmdp++, next += PMD_SIZE, next != end);

pud_clear(pudp);

David Hildenbrand

9:32 a.m.

On 15.05.25 11:27, Dev Jain wrote:

...

On 15/05/25 2:23 pm, David Hildenbrand wrote:

...
On 15.05.25 10:47, Dev Jain wrote:

...
On 15/05/25 2:06 pm, David Hildenbrand wrote:

...
On 15.05.25 10:22, Dev Jain wrote:

...
On 15/05/25 1:43 pm, David Hildenbrand wrote:

...
On 15.05.25 08:34, Dev Jain wrote: > Commit 9c006972c3fe removes the pxd_present() checks because the > caller > checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller > only > checks pud_present(); pud_free_pmd_page() recurses on each pmd > through > pmd_free_pte_page(), wherein the pmd may be none. The commit states: "The core code already has a check for pXd_none()", so I assume that assumption was not true in all cases?

Should that one problematic caller then check for pmd_none() instead?

From what I could gather of Will's commit message, my interpretation is that the concerned callers are vmap_try_huge_pud and vmap_try_huge_pmd. These individually check for pxd_present():

if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr)) return 0;

The problem is that vmap_try_huge_pud will also iterate on pte entries. So if the pud is present, then pud_free_pmd_page -> pmd_free_pte_page may encounter a none pmd and trigger a WARN.

Yeah, pud_free_pmd_page()->pmd_free_pte_page() looks shaky.

I assume we should either have an explicit pmd_none() check in pud_free_pmd_page() before calling pmd_free_pte_page(), or one in pmd_free_pte_page().

With your patch, we'd be calling pte_free_kernel() on a NULL pointer, which sounds wrong -- unless I am missing something important.

Ah thanks, you seem to be right. We will be extracting table from a none pmd. Perhaps we should still bail out for !pxd_present() but without the warning, which the fix commit used to do.

Right. We just make sure that all callers of pmd_free_pte_page() already check for it.

I'd just do something like:

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 8fcf59ba39db7..e98dd7af147d5 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -1274,10 +1274,8 @@ int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)

pmd = READ_ONCE(*pmdp);

- if (!pmd_table(pmd)) { - VM_WARN_ON(1); - return 1; - } + VM_WARN_ON(!pmd_present(pmd)); + VM_WARN_ON(!pmd_table(pmd));

And also return 1?

I'll leave that to Catalin + Will.

I'm not a friend for adding runtime-overhead for soemthing that should not happen and be caught early during testing -> VM_WARN_ON_ONCE().

...

Also we should BUG_ON(!pmd_present(pmd)) to avoid the null dereference?

Definitely no BUG_ON :)

...

...
table = pte_offset_kernel(pmdp, addr); pmd_clear(pmdp); @@ -1305,7 +1303,8 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr) next = addr; end = addr + PUD_SIZE; do { - pmd_free_pte_page(pmdp, next); + if (pmd_present(*pmdp)) + pmd_free_pte_page(pmdp, next);

Ah yes, the "caller" of pmd_free_pte_page() is not only vmap_try_huge_pmd but this also...my mind has been foggy lately... need to solve a math problem or two to sharpen it :)

Haha.

-- Cheers, David / dhildenb

Will Deacon

12:56 p.m.

On Thu, May 15, 2025 at 11:32:22AM +0200, David Hildenbrand wrote:

...

On 15.05.25 11:27, Dev Jain wrote:

...
On 15/05/25 2:23 pm, David Hildenbrand wrote:

...
On 15.05.25 10:47, Dev Jain wrote:

...
On 15/05/25 2:06 pm, David Hildenbrand wrote:

...
On 15.05.25 10:22, Dev Jain wrote:

...
On 15/05/25 1:43 pm, David Hildenbrand wrote: > On 15.05.25 08:34, Dev Jain wrote: > > Commit 9c006972c3fe removes the pxd_present() checks because the > > caller > > checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller > > only > > checks pud_present(); pud_free_pmd_page() recurses on each pmd > > through > > pmd_free_pte_page(), wherein the pmd may be none. > The commit states: "The core code already has a check for pXd_none()", > so I assume that assumption was not true in all cases? > > Should that one problematic caller then check for pmd_none() instead?

From what I could gather of Will's commit message, my interpretation is that the concerned callers are vmap_try_huge_pud and vmap_try_huge_pmd. These individually check for pxd_present():

if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr)) return 0;

The problem is that vmap_try_huge_pud will also iterate on pte entries. So if the pud is present, then pud_free_pmd_page -> pmd_free_pte_page may encounter a none pmd and trigger a WARN.

Yeah, pud_free_pmd_page()->pmd_free_pte_page() looks shaky.

I assume we should either have an explicit pmd_none() check in pud_free_pmd_page() before calling pmd_free_pte_page(), or one in pmd_free_pte_page().

With your patch, we'd be calling pte_free_kernel() on a NULL pointer, which sounds wrong -- unless I am missing something important.

Ah thanks, you seem to be right. We will be extracting table from a none pmd. Perhaps we should still bail out for !pxd_present() but without the warning, which the fix commit used to do.

Right. We just make sure that all callers of pmd_free_pte_page() already check for it.

I'd just do something like:

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 8fcf59ba39db7..e98dd7af147d5 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -1274,10 +1274,8 @@ int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)

pmd = READ_ONCE(*pmdp);

- if (!pmd_table(pmd)) { - VM_WARN_ON(1); - return 1; - } + VM_WARN_ON(!pmd_present(pmd)); + VM_WARN_ON(!pmd_table(pmd));

And also return 1?

I'll leave that to Catalin + Will.

I'm not a friend for adding runtime-overhead for soemthing that should not happen and be caught early during testing -> VM_WARN_ON_ONCE().

I definitely think we should return early if the pmd isn't a table. Otherwise, we could end up descending into God-knows-what!

Will

David Hildenbrand

1:04 p.m.

On 15.05.25 14:56, Will Deacon wrote:

...

On Thu, May 15, 2025 at 11:32:22AM +0200, David Hildenbrand wrote:

...
On 15.05.25 11:27, Dev Jain wrote:

...
On 15/05/25 2:23 pm, David Hildenbrand wrote:

...
On 15.05.25 10:47, Dev Jain wrote:

...
On 15/05/25 2:06 pm, David Hildenbrand wrote:

...
On 15.05.25 10:22, Dev Jain wrote: > > > On 15/05/25 1:43 pm, David Hildenbrand wrote: >> On 15.05.25 08:34, Dev Jain wrote: >>> Commit 9c006972c3fe removes the pxd_present() checks because the >>> caller >>> checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller >>> only >>> checks pud_present(); pud_free_pmd_page() recurses on each pmd >>> through >>> pmd_free_pte_page(), wherein the pmd may be none. >> The commit states: "The core code already has a check for pXd_none()", >> so I assume that assumption was not true in all cases? >> >> Should that one problematic caller then check for pmd_none() instead? > > From what I could gather of Will's commit message, my > interpretation is > that the concerned callers are vmap_try_huge_pud and vmap_try_huge_pmd. > These individually check for pxd_present(): > > if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr)) > return 0; > > The problem is that vmap_try_huge_pud will also iterate on pte entries. > So if the pud is present, then pud_free_pmd_page -> pmd_free_pte_page > may encounter a none pmd and trigger a WARN.

Yeah, pud_free_pmd_page()->pmd_free_pte_page() looks shaky.

I assume we should either have an explicit pmd_none() check in pud_free_pmd_page() before calling pmd_free_pte_page(), or one in pmd_free_pte_page().

With your patch, we'd be calling pte_free_kernel() on a NULL pointer, which sounds wrong -- unless I am missing something important.

Ah thanks, you seem to be right. We will be extracting table from a none pmd. Perhaps we should still bail out for !pxd_present() but without the warning, which the fix commit used to do.

Right. We just make sure that all callers of pmd_free_pte_page() already check for it.

I'd just do something like:

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 8fcf59ba39db7..e98dd7af147d5 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -1274,10 +1274,8 @@ int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)

pmd = READ_ONCE(*pmdp);

- if (!pmd_table(pmd)) { - VM_WARN_ON(1); - return 1; - } + VM_WARN_ON(!pmd_present(pmd)); + VM_WARN_ON(!pmd_table(pmd));

And also return 1?

I'll leave that to Catalin + Will.

I'm not a friend for adding runtime-overhead for soemthing that should not happen and be caught early during testing -> VM_WARN_ON_ONCE().

I definitely think we should return early if the pmd isn't a table. Otherwise, we could end up descending into God-knows-what!

The question is: how did something that is not a table end up here, and why is it valid to check exactly that at runtime. Not strong opinion, it just feels a bit arbitrary to test for exactly that at runtime if it is completely unexpected.

-- Cheers, David / dhildenb

Will Deacon

1:21 p.m.

On Thu, May 15, 2025 at 03:04:50PM +0200, David Hildenbrand wrote:

...

On 15.05.25 14:56, Will Deacon wrote:

...
On Thu, May 15, 2025 at 11:32:22AM +0200, David Hildenbrand wrote:

...
On 15.05.25 11:27, Dev Jain wrote:

...
On 15/05/25 2:23 pm, David Hildenbrand wrote:

...
On 15.05.25 10:47, Dev Jain wrote:

...
On 15/05/25 2:06 pm, David Hildenbrand wrote: > On 15.05.25 10:22, Dev Jain wrote: > > > > > > On 15/05/25 1:43 pm, David Hildenbrand wrote: > > > On 15.05.25 08:34, Dev Jain wrote: > > > > Commit 9c006972c3fe removes the pxd_present() checks because the > > > > caller > > > > checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller > > > > only > > > > checks pud_present(); pud_free_pmd_page() recurses on each pmd > > > > through > > > > pmd_free_pte_page(), wherein the pmd may be none. > > > The commit states: "The core code already has a check for pXd_none()", > > > so I assume that assumption was not true in all cases? > > > > > > Should that one problematic caller then check for pmd_none() instead? > > > > From what I could gather of Will's commit message, my > > interpretation is > > that the concerned callers are vmap_try_huge_pud and vmap_try_huge_pmd. > > These individually check for pxd_present(): > > > > if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr)) > > return 0; > > > > The problem is that vmap_try_huge_pud will also iterate on pte entries. > > So if the pud is present, then pud_free_pmd_page -> pmd_free_pte_page > > may encounter a none pmd and trigger a WARN. > > Yeah, pud_free_pmd_page()->pmd_free_pte_page() looks shaky. > > I assume we should either have an explicit pmd_none() check in > pud_free_pmd_page() before calling pmd_free_pte_page(), or one in > pmd_free_pte_page(). > > With your patch, we'd be calling pte_free_kernel() on a NULL pointer, > which sounds wrong -- unless I am missing something important.

Ah thanks, you seem to be right. We will be extracting table from a none pmd. Perhaps we should still bail out for !pxd_present() but without the warning, which the fix commit used to do.

Right. We just make sure that all callers of pmd_free_pte_page() already check for it.

I'd just do something like:

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 8fcf59ba39db7..e98dd7af147d5 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -1274,10 +1274,8 @@ int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)

pmd = READ_ONCE(*pmdp);

- if (!pmd_table(pmd)) { - VM_WARN_ON(1); - return 1; - } + VM_WARN_ON(!pmd_present(pmd)); + VM_WARN_ON(!pmd_table(pmd));

And also return 1?

I'll leave that to Catalin + Will.

I'm not a friend for adding runtime-overhead for soemthing that should not happen and be caught early during testing -> VM_WARN_ON_ONCE().

I definitely think we should return early if the pmd isn't a table. Otherwise, we could end up descending into God-knows-what!

The question is: how did something that is not a table end up here, and why is it valid to check exactly that at runtime. Not strong opinion, it just feels a bit arbitrary to test for exactly that at runtime if it is completely unexpected.

I see it a little bit like type-checking: we could see an invalid entry, a leaf entry or a table entry and we should only ever dereference the latter. If the VM_WARN_ON() is justified, then I find it jarring to go ahead with the dereference regardless of the type.

That said, maybe the VM_WARN_ON() should either be deleted or moved out to the callers in mm/vmalloc.c?

Will

Ryan Roberts

10:07 a.m.

On 15/05/2025 09:53, David Hildenbrand wrote:

...

On 15.05.25 10:47, Dev Jain wrote:

...
On 15/05/25 2:06 pm, David Hildenbrand wrote:

...
On 15.05.25 10:22, Dev Jain wrote:

...
On 15/05/25 1:43 pm, David Hildenbrand wrote:

...
On 15.05.25 08:34, Dev Jain wrote:

...
Commit 9c006972c3fe removes the pxd_present() checks because the caller checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller only checks pud_present(); pud_free_pmd_page() recurses on each pmd through pmd_free_pte_page(), wherein the pmd may be none.

The commit states: "The core code already has a check for pXd_none()", so I assume that assumption was not true in all cases?

Should that one problematic caller then check for pmd_none() instead?

From what I could gather of Will's commit message, my interpretation is that the concerned callers are vmap_try_huge_pud and vmap_try_huge_pmd. These individually check for pxd_present():

if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr)) return 0;

The problem is that vmap_try_huge_pud will also iterate on pte entries. So if the pud is present, then pud_free_pmd_page -> pmd_free_pte_page may encounter a none pmd and trigger a WARN.

Yeah, pud_free_pmd_page()->pmd_free_pte_page() looks shaky.

I assume we should either have an explicit pmd_none() check in pud_free_pmd_page() before calling pmd_free_pte_page(), or one in pmd_free_pte_page().

With your patch, we'd be calling pte_free_kernel() on a NULL pointer, which sounds wrong -- unless I am missing something important.

Ah thanks, you seem to be right. We will be extracting table from a none pmd. Perhaps we should still bail out for !pxd_present() but without the warning, which the fix commit used to do.

Right. We just make sure that all callers of pmd_free_pte_page() already check for it.

I'd just do something like:

I just reviewed the patch and had the same feedback as David. I agree with the patch below, with some small mods...

...

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 8fcf59ba39db7..e98dd7af147d5 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -1274,10 +1274,8 @@ int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr) pmd = READ_ONCE(*pmdp); - if (!pmd_table(pmd)) { - VM_WARN_ON(1); - return 1; - } + VM_WARN_ON(!pmd_present(pmd)); + VM_WARN_ON(!pmd_table(pmd));

You don't need both of these warnings; pmd_table() is only true if the pmd is present (well actually only if it's _valid_ which is more strict than present), so the second one is sufficient on its own.

...

table = pte_offset_kernel(pmdp, addr); pmd_clear(pmdp); @@ -1305,7 +1303,8 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)

Given you are removing the runtime check and early return in pmd_free_pte_page(), I think you should modify this function to use the same style too.

...

next = addr; end = addr + PUD_SIZE; do { - pmd_free_pte_page(pmdp, next); + if (pmd_present(*pmdp))

question: I wonder if it is better to use !pmd_none() as the condition here? It should either be none or a table at this point, so this allows the warning in pmd_free_pte_page() to catch more error conditions. No strong opinion though.

Thanks, Ryan

...

+ pmd_free_pte_page(pmdp, next); } while (pmdp++, next += PMD_SIZE, next != end); pud_clear(pudp);

David Hildenbrand

1:01 p.m.

On 15.05.25 12:07, Ryan Roberts wrote:

...

On 15/05/2025 09:53, David Hildenbrand wrote:

...
On 15.05.25 10:47, Dev Jain wrote:

...
On 15/05/25 2:06 pm, David Hildenbrand wrote:

...
On 15.05.25 10:22, Dev Jain wrote:

...
On 15/05/25 1:43 pm, David Hildenbrand wrote:

...
On 15.05.25 08:34, Dev Jain wrote: > Commit 9c006972c3fe removes the pxd_present() checks because the caller > checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller > only > checks pud_present(); pud_free_pmd_page() recurses on each pmd through > pmd_free_pte_page(), wherein the pmd may be none. The commit states: "The core code already has a check for pXd_none()", so I assume that assumption was not true in all cases?

Should that one problematic caller then check for pmd_none() instead?

From what I could gather of Will's commit message, my interpretation is that the concerned callers are vmap_try_huge_pud and vmap_try_huge_pmd. These individually check for pxd_present():

if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr)) return 0;

The problem is that vmap_try_huge_pud will also iterate on pte entries. So if the pud is present, then pud_free_pmd_page -> pmd_free_pte_page may encounter a none pmd and trigger a WARN.

Yeah, pud_free_pmd_page()->pmd_free_pte_page() looks shaky.

I assume we should either have an explicit pmd_none() check in pud_free_pmd_page() before calling pmd_free_pte_page(), or one in pmd_free_pte_page().

With your patch, we'd be calling pte_free_kernel() on a NULL pointer, which sounds wrong -- unless I am missing something important.

Ah thanks, you seem to be right. We will be extracting table from a none pmd. Perhaps we should still bail out for !pxd_present() but without the warning, which the fix commit used to do.

Right. We just make sure that all callers of pmd_free_pte_page() already check for it.

I'd just do something like:

I just reviewed the patch and had the same feedback as David. I agree with the patch below, with some small mods...

...
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 8fcf59ba39db7..e98dd7af147d5 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -1274,10 +1274,8 @@ int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr) pmd = READ_ONCE(*pmdp); - if (!pmd_table(pmd)) { - VM_WARN_ON(1); - return 1; - } + VM_WARN_ON(!pmd_present(pmd)); + VM_WARN_ON(!pmd_table(pmd));

You don't need both of these warnings; pmd_table() is only true if the pmd is present (well actually only if it's _valid_ which is more strict than present), so the second one is sufficient on its own.

Ah, right.

...

...
table = pte_offset_kernel(pmdp, addr); pmd_clear(pmdp); @@ -1305,7 +1303,8 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)

Given you are removing the runtime check and early return in pmd_free_pte_page(), I think you should modify this function to use the same style too.

BTW, the "return 1" is weird. But looking at x86, we seem to be making a private copy of the page table first, to defer freeing the page tables after the TLB flush.

I wonder if there isn't a better way (e.g., clear PUDP + flush tlb, then walk over the effectively-disconnected page table). But I'm sure there is a good reason for that.

...

...
next = addr; end = addr + PUD_SIZE; do { - pmd_free_pte_page(pmdp, next); + if (pmd_present(*pmdp))

question: I wonder if it is better to use !pmd_none() as the condition here? It should either be none or a table at this point, so this allows the warning in pmd_free_pte_page() to catch more error conditions. No strong opinion though.

Same here. The existing callers check pmd_present().

-- Cheers, David / dhildenb

Ryan Roberts

1:14 p.m.

On 15/05/2025 14:01, David Hildenbrand wrote:

...

On 15.05.25 12:07, Ryan Roberts wrote:

...
On 15/05/2025 09:53, David Hildenbrand wrote:

...
On 15.05.25 10:47, Dev Jain wrote:

...
On 15/05/25 2:06 pm, David Hildenbrand wrote:

...
On 15.05.25 10:22, Dev Jain wrote:

...
On 15/05/25 1:43 pm, David Hildenbrand wrote: > On 15.05.25 08:34, Dev Jain wrote: >> Commit 9c006972c3fe removes the pxd_present() checks because the caller >> checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller >> only >> checks pud_present(); pud_free_pmd_page() recurses on each pmd through >> pmd_free_pte_page(), wherein the pmd may be none. > The commit states: "The core code already has a check for pXd_none()", > so I assume that assumption was not true in all cases? > > Should that one problematic caller then check for pmd_none() instead?

From what I could gather of Will's commit message, my interpretation is that the concerned callers are vmap_try_huge_pud and vmap_try_huge_pmd. These individually check for pxd_present():

if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))       return 0;

The problem is that vmap_try_huge_pud will also iterate on pte entries. So if the pud is present, then pud_free_pmd_page -> pmd_free_pte_page may encounter a none pmd and trigger a WARN.

Yeah, pud_free_pmd_page()->pmd_free_pte_page() looks shaky.

I assume we should either have an explicit pmd_none() check in pud_free_pmd_page() before calling pmd_free_pte_page(), or one in pmd_free_pte_page().

With your patch, we'd be calling pte_free_kernel() on a NULL pointer, which sounds wrong -- unless I am missing something important.

Ah thanks, you seem to be right. We will be extracting table from a none pmd. Perhaps we should still bail out for !pxd_present() but without the warning, which the fix commit used to do.

Right. We just make sure that all callers of pmd_free_pte_page() already check for it.

I'd just do something like:

I just reviewed the patch and had the same feedback as David. I agree with the patch below, with some small mods...

...
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 8fcf59ba39db7..e98dd7af147d5 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -1274,10 +1274,8 @@ int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)          pmd = READ_ONCE(*pmdp); -       if (!pmd_table(pmd)) { -               VM_WARN_ON(1); -               return 1; -       } +       VM_WARN_ON(!pmd_present(pmd)); +       VM_WARN_ON(!pmd_table(pmd));

You don't need both of these warnings; pmd_table() is only true if the pmd is present (well actually only if it's _valid_ which is more strict than present), so the second one is sufficient on its own.

Ah, right.

...
...
table = pte_offset_kernel(pmdp, addr);          pmd_clear(pmdp); @@ -1305,7 +1303,8 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)

Given you are removing the runtime check and early return in pmd_free_pte_page(), I think you should modify this function to use the same style too.

BTW, the "return 1" is weird. But looking at x86, we seem to be making a private copy of the page table first, to defer freeing the page tables after the TLB flush.

I wonder if there isn't a better way (e.g., clear PUDP + flush tlb, then walk over the effectively-disconnected page table). But I'm sure there is a good reason for that.

As I understand it, the actual TLB entries should have been invalidated when the previous mappings we vfree'd. So the single page __flush_tlb_kernel_pgtable() calls here are to zap any table entries that may be in the walk cache. We could do an all-levels TLBI for the entire range, but for a system that doesn't support the tlbi-range operations, we would end up issuing a tlbi per page across the whole range which I think would be much slower than the one tlbi per pgtable we have here.

Things could be rearranged a bit so that we issue all the tlbis with only a single set of barriers (currently each __flush_tlb_kernel_pgtable() issues it's own barriers), but I'm not sure how important that micro-optimization is given I guess we never even call pud_free_pmd_page() in practice given we have had no reports of the warning tripping.

...

...
...
next = addr; end = addr + PUD_SIZE; do { - pmd_free_pte_page(pmdp, next); + if (pmd_present(*pmdp))

question: I wonder if it is better to use !pmd_none() as the condition here? It should either be none or a table at this point, so this allows the warning in pmd_free_pte_page() to catch more error conditions. No strong opinion though.

Same here. The existing callers check pmd_present().

Yeah fair let's be consistent and use pmd_present().

Thanks, Ryan

225

days inactive

225

days old

linux-stable-mirror@lists.linaro.org

15 comments

participants

tags (0)

participants (4)

David Hildenbrand
Dev Jain
Ryan Roberts
Will Deacon