Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the vmemmap to populate at the PMD section level which was suitable initially since hotplugging granule is always 128M. However, commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug") which added 2M hotplugging granule disrupted the arm64 assumptions.
Considering the vmemmap_free -> unmap_hotplug_pmd_range path, when pmd_sect() is true, the entire PMD section is cleared, even if there is other effective subsection. For example pagemap1 and pagemap2 are part of a single PMD entry and they are hot-added sequentially. Then pagemap1 is removed, vmemmap_free() will clear the entire PMD entry freeing the struct page metadata for the whole section, even though pagemap2 is still active.
To address the issue, we need to prevent PMD/PUD/CONT mappings for both linear and vmemmap for non-boot sections if the size exceeds 2MB (considering sub-section is 2MB). We only permit 2MB blocks in a 4KB page configuration.
Cc: stable@vger.kernel.org # v5.4+ Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug") Signed-off-by: Zhenhua Huang quic_zhenhuah@quicinc.com --- Hi Catalin and Anshuman, Based on your review comments, I concluded below patch and tested with my setup. I have not folded patchset #2 since this patch seems to be enough for backporting.. Please see if you have further suggestions.
arch/arm64/mm/mmu.c | 33 +++++++++++++++++++++++++++++---- 1 file changed, 29 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index e2739b69e11b..2b4d23f01d85 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -42,9 +42,11 @@ #include <asm/pgalloc.h> #include <asm/kfence.h>
-#define NO_BLOCK_MAPPINGS BIT(0) +#define NO_PMD_BLOCK_MAPPINGS BIT(0) #define NO_CONT_MAPPINGS BIT(1) #define NO_EXEC_MAPPINGS BIT(2) /* assumes FEAT_HPDS is not used */ +#define NO_PUD_BLOCK_MAPPINGS BIT(3) /* Hotplug case: do not want block mapping for PUD */ +#define NO_BLOCK_MAPPINGS (NO_PMD_BLOCK_MAPPINGS | NO_PUD_BLOCK_MAPPINGS)
u64 kimage_voffset __ro_after_init; EXPORT_SYMBOL(kimage_voffset); @@ -254,7 +256,7 @@ static void init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
/* try section mapping first */ if (((addr | next | phys) & ~PMD_MASK) == 0 && - (flags & NO_BLOCK_MAPPINGS) == 0) { + (flags & NO_PMD_BLOCK_MAPPINGS) == 0) { pmd_set_huge(pmdp, phys, prot);
/* @@ -356,10 +358,11 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
/* * For 4K granule only, attempt to put down a 1GB block + * Hotplug case: do not attempt 1GB block */ if (pud_sect_supported() && ((addr | next | phys) & ~PUD_MASK) == 0 && - (flags & NO_BLOCK_MAPPINGS) == 0) { + (flags & NO_PUD_BLOCK_MAPPINGS) == 0) { pud_set_huge(pudp, phys, prot);
/* @@ -1175,9 +1178,16 @@ int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node, int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, struct vmem_altmap *altmap) { + unsigned long start_pfn; + struct mem_section *ms; + WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
- if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES)) + start_pfn = page_to_pfn((struct page *)start); + ms = __pfn_to_section(start_pfn); + + /* Hotplugged section not support hugepages */ + if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) || !early_section(ms)) return vmemmap_populate_basepages(start, end, node, altmap); else return vmemmap_populate_hugepages(start, end, node, altmap); @@ -1339,9 +1349,24 @@ int arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *params) { int ret, flags = NO_EXEC_MAPPINGS; + unsigned long start_pfn = page_to_pfn((struct page *)start); + struct mem_section *ms = __pfn_to_section(start_pfn);
VM_BUG_ON(!mhp_range_allowed(start, size, true));
+ /* Should not be invoked by early section */ + WARN_ON(early_section(ms)); + + if (IS_ENABLED(CONFIG_ARM64_4K_PAGES)) + /* + * As per subsection granule is 2M, allow PMD block mapping in + * case 4K PAGES. + * Other cases forbid section mapping. + */ + flags |= NO_PUD_BLOCK_MAPPINGS | NO_CONT_MAPPINGS; + else + flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS; + if (can_set_direct_map()) flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
Hello Zhenhua,
On 1/3/25 14:20, Zhenhua Huang wrote:
Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the vmemmap to populate at the PMD section level which was suitable initially since hotplugging granule is always 128M. However,
A small nit s/hotplugging/hot plug/
Also please do mention that 128M is SECTION_SIZE_BITS == 27 on arm64 platform for 4K base pages which is the page size in context here.
commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug") which added 2M hotplugging granule disrupted the arm64 assumptions.
A small nit s/hotplugging/hot plug/
Also please do mention that 2M is SUB_SECTION_SIZE.
Considering the vmemmap_free -> unmap_hotplug_pmd_range path, when pmd_sect() is true, the entire PMD section is cleared, even if there is other effective subsection. For example pagemap1 and pagemap2 are part of a single PMD entry and they are hot-added sequentially. Then pagemap1 is removed, vmemmap_free() will clear the entire PMD entry freeing the struct page metadata for the whole section, even though pagemap2 is still active.
I guess pagemap1/2 here indicates the struct pages virtual regions for two different vmemmap mapped sub sections covered via a single PMD entry ? But please do update the above paragraph appropriately because pagemap<N> might be confused with /proc/<pid>/pagemap mechanism.
Also please do mention that similar problems exist with linear mapping for 16K (PMD = 32M) and 64K (PMD = 512M) base pages as their block mappings exceed SUBSECTION_SIZE. Hence tearing down the entire PMD mapping too will leave other subsections unmapped in the linear mapping.
To address the issue, we need to prevent PMD/PUD/CONT mappings for both linear and vmemmap for non-boot sections if the size exceeds 2MB
s/if the size/if corresponding size on the given base page/
(considering sub-section is 2MB). We only permit 2MB blocks in a 4KB page configuration.
PMD block in 4K page size config as it's PMD_SIZE matches the SUBSECTION_SIZE but only for linear mapping.
Cc: stable@vger.kernel.org # v5.4+ Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug") Signed-off-by: Zhenhua Huang quic_zhenhuah@quicinc.com
Hi Catalin and Anshuman, Based on your review comments, I concluded below patch and tested with my setup. I have not folded patchset #2 since this patch seems to be enough for backporting.. Please see if you have further suggestions.
arch/arm64/mm/mmu.c | 33 +++++++++++++++++++++++++++++---- 1 file changed, 29 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index e2739b69e11b..2b4d23f01d85 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -42,9 +42,11 @@ #include <asm/pgalloc.h> #include <asm/kfence.h> -#define NO_BLOCK_MAPPINGS BIT(0) +#define NO_PMD_BLOCK_MAPPINGS BIT(0) #define NO_CONT_MAPPINGS BIT(1) #define NO_EXEC_MAPPINGS BIT(2) /* assumes FEAT_HPDS is not used */ +#define NO_PUD_BLOCK_MAPPINGS BIT(3) /* Hotplug case: do not want block mapping for PUD */ +#define NO_BLOCK_MAPPINGS (NO_PMD_BLOCK_MAPPINGS | NO_PUD_BLOCK_MAPPINGS)
Should not NO_PMD_BLOCK_MAPPINGS and NO_PUD_BLOCK_MAPPINGS be adjacent bits for better readability ? But that will also cause some additional churn.
u64 kimage_voffset __ro_after_init; EXPORT_SYMBOL(kimage_voffset); @@ -254,7 +256,7 @@ static void init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end, /* try section mapping first */ if (((addr | next | phys) & ~PMD_MASK) == 0 &&
(flags & NO_BLOCK_MAPPINGS) == 0) {
(flags & NO_PMD_BLOCK_MAPPINGS) == 0) {
Behavior will remain unchanged for all existing users of NO_BLOCK_MAPPINGS as it will now contain NO_PMD_BLOCK_MAPPINGS.
pmd_set_huge(pmdp, phys, prot);
/* @@ -356,10 +358,11 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end, /* * For 4K granule only, attempt to put down a 1GB block
*/ if (pud_sect_supported() && ((addr | next | phys) & ~PUD_MASK) == 0 &&* Hotplug case: do not attempt 1GB block
(flags & NO_BLOCK_MAPPINGS) == 0) {
(flags & NO_PUD_BLOCK_MAPPINGS) == 0) {
Behavior will remain unchanged for all existing users of NO_BLOCK_MAPPINGS as it will now contain NO_PUD_BLOCK_MAPPINGS.
pud_set_huge(pudp, phys, prot);
/* @@ -1175,9 +1178,16 @@ int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node, int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, struct vmem_altmap *altmap) {
- unsigned long start_pfn;
- struct mem_section *ms;
- WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
- if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
- start_pfn = page_to_pfn((struct page *)start);
- ms = __pfn_to_section(start_pfn);
- /* Hotplugged section not support hugepages */
Please update the comment as
/* * Hotplugged section does not support hugepages as * PMD_SIZE (hence PUD_SIZE) section mapping covers * struct page range that exceeds a SUBSECTION_SIZE * i.e 2MB - for all available base page sizes. */
- if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) || !early_section(ms)) return vmemmap_populate_basepages(start, end, node, altmap); else return vmemmap_populate_hugepages(start, end, node, altmap);
@@ -1339,9 +1349,24 @@ int arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *params) { int ret, flags = NO_EXEC_MAPPINGS;
- unsigned long start_pfn = page_to_pfn((struct page *)start);
- struct mem_section *ms = __pfn_to_section(start_pfn);
VM_BUG_ON(!mhp_range_allowed(start, size, true));
- /* Should not be invoked by early section */
- WARN_ON(early_section(ms));
- if (IS_ENABLED(CONFIG_ARM64_4K_PAGES))
- /*
* As per subsection granule is 2M, allow PMD block mapping in
* case 4K PAGES.
* Other cases forbid section mapping.
*/
IIUC subsection size is 2M regardless of the page size. But with 4K pages on arm64, PMD_SIZE happen to be 2M. Hence there is no problem in creating linear mappings at PMD block level, which will not be the case with other page sizes i.e 16K and 64K.
include/linux/mmzone.h
#define SUBSECTION_SHIFT 21 #define SUBSECTION_SIZE (1UL << SUBSECTION_SHIFT)
Please update the comment with following changes but above IS_ENABLED() statement as it talks about all page size configs.
/* * 4K base page's PMD_SIZE matches SUBSECTION_SIZE i.e 2MB. Hence * PMD section mapping can be allowed, but only for 4K base pages. * Where as PMD_SIZE (hence PUD_SIZE) for other page sizes exceed * SUBSECTION_SIZE. */
flags |= NO_PUD_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
- else
flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
- if (can_set_direct_map()) flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
Thanks Anshuman for your detailed review!
On 2025/1/6 23:11, Anshuman Khandual wrote:
Hello Zhenhua,
On 1/3/25 14:20, Zhenhua Huang wrote:
Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the vmemmap to populate at the PMD section level which was suitable initially since hotplugging granule is always 128M. However,
A small nit s/hotplugging/hot plug/
Also please do mention that 128M is SECTION_SIZE_BITS == 27 on arm64 platform for 4K base pages which is the page size in context here.
commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug") which added 2M hotplugging granule disrupted the arm64 assumptions.
A small nit s/hotplugging/hot plug/
Also please do mention that 2M is SUB_SECTION_SIZE.
Considering the vmemmap_free -> unmap_hotplug_pmd_range path, when pmd_sect() is true, the entire PMD section is cleared, even if there is other effective subsection. For example pagemap1 and pagemap2 are part of a single PMD entry and they are hot-added sequentially. Then pagemap1 is removed, vmemmap_free() will clear the entire PMD entry freeing the struct page metadata for the whole section, even though pagemap2 is still active.
I guess pagemap1/2 here indicates the struct pages virtual regions for two different vmemmap mapped sub sections covered via a single PMD entry ? But please do update the above paragraph appropriately because pagemap<N> might be confused with /proc/<pid>/pagemap mechanism.
Also please do mention that similar problems exist with linear mapping for 16K (PMD = 32M) and 64K (PMD = 512M) base pages as their block mappings exceed SUBSECTION_SIZE. Hence tearing down the entire PMD mapping too will leave other subsections unmapped in the linear mapping.
Make sense to me, will update comments.
To address the issue, we need to prevent PMD/PUD/CONT mappings for both linear and vmemmap for non-boot sections if the size exceeds 2MB
s/if the size/if corresponding size on the given base page/
(considering sub-section is 2MB). We only permit 2MB blocks in a 4KB page configuration.
PMD block in 4K page size config as it's PMD_SIZE matches the SUBSECTION_SIZE but only for linear mapping.
Cc: stable@vger.kernel.org # v5.4+ Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug") Signed-off-by: Zhenhua Huang quic_zhenhuah@quicinc.com
Hi Catalin and Anshuman, Based on your review comments, I concluded below patch and tested with my setup. I have not folded patchset #2 since this patch seems to be enough for backporting.. Please see if you have further suggestions.
arch/arm64/mm/mmu.c | 33 +++++++++++++++++++++++++++++---- 1 file changed, 29 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index e2739b69e11b..2b4d23f01d85 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -42,9 +42,11 @@ #include <asm/pgalloc.h> #include <asm/kfence.h> -#define NO_BLOCK_MAPPINGS BIT(0) +#define NO_PMD_BLOCK_MAPPINGS BIT(0) #define NO_CONT_MAPPINGS BIT(1) #define NO_EXEC_MAPPINGS BIT(2) /* assumes FEAT_HPDS is not used */ +#define NO_PUD_BLOCK_MAPPINGS BIT(3) /* Hotplug case: do not want block mapping for PUD */ +#define NO_BLOCK_MAPPINGS (NO_PMD_BLOCK_MAPPINGS | NO_PUD_BLOCK_MAPPINGS)
Should not NO_PMD_BLOCK_MAPPINGS and NO_PUD_BLOCK_MAPPINGS be adjacent bits for better readability ? But that will also cause some additional churn.
Agree, I see these macros are only used in this file, it's safe to update.
u64 kimage_voffset __ro_after_init; EXPORT_SYMBOL(kimage_voffset); @@ -254,7 +256,7 @@ static void init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end, /* try section mapping first */ if (((addr | next | phys) & ~PMD_MASK) == 0 &&
(flags & NO_BLOCK_MAPPINGS) == 0) {
(flags & NO_PMD_BLOCK_MAPPINGS) == 0) {
Behavior will remain unchanged for all existing users of NO_BLOCK_MAPPINGS as it will now contain NO_PMD_BLOCK_MAPPINGS.
Hmm... do you want me to include these in comments? The code appears to be clear as it is.
pmd_set_huge(pmdp, phys, prot);
/* @@ -356,10 +358,11 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end, /* * For 4K granule only, attempt to put down a 1GB block
*/ if (pud_sect_supported() && ((addr | next | phys) & ~PUD_MASK) == 0 &&* Hotplug case: do not attempt 1GB block
(flags & NO_BLOCK_MAPPINGS) == 0) {
(flags & NO_PUD_BLOCK_MAPPINGS) == 0) {
Behavior will remain unchanged for all existing users of NO_BLOCK_MAPPINGS as it will now contain NO_PUD_BLOCK_MAPPINGS.
Ditto.
pud_set_huge(pudp, phys, prot);
/* @@ -1175,9 +1178,16 @@ int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node, int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, struct vmem_altmap *altmap) {
- unsigned long start_pfn;
- struct mem_section *ms;
- WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
- if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
- start_pfn = page_to_pfn((struct page *)start);
- ms = __pfn_to_section(start_pfn);
- /* Hotplugged section not support hugepages */
Please update the comment as
/* * Hotplugged section does not support hugepages as * PMD_SIZE (hence PUD_SIZE) section mapping covers * struct page range that exceeds a SUBSECTION_SIZE * i.e 2MB - for all available base page sizes. */
- if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) || !early_section(ms)) return vmemmap_populate_basepages(start, end, node, altmap); else return vmemmap_populate_hugepages(start, end, node, altmap);
@@ -1339,9 +1349,24 @@ int arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *params) { int ret, flags = NO_EXEC_MAPPINGS;
- unsigned long start_pfn = page_to_pfn((struct page *)start);
- struct mem_section *ms = __pfn_to_section(start_pfn);
VM_BUG_ON(!mhp_range_allowed(start, size, true));
- /* Should not be invoked by early section */
- WARN_ON(early_section(ms));
- if (IS_ENABLED(CONFIG_ARM64_4K_PAGES))
- /*
* As per subsection granule is 2M, allow PMD block mapping in
* case 4K PAGES.
* Other cases forbid section mapping.
*/
IIUC subsection size is 2M regardless of the page size. But with 4K pages on arm64, PMD_SIZE happen to be 2M. Hence there is no problem in creating linear mappings at PMD block level, which will not be the case with other page sizes i.e 16K and 64K.
Yes.
include/linux/mmzone.h
#define SUBSECTION_SHIFT 21 #define SUBSECTION_SIZE (1UL << SUBSECTION_SHIFT)
Please update the comment with following changes but above IS_ENABLED() statement as it talks about all page size configs.
/* * 4K base page's PMD_SIZE matches SUBSECTION_SIZE i.e 2MB. Hence * PMD section mapping can be allowed, but only for 4K base pages. * Where as PMD_SIZE (hence PUD_SIZE) for other page sizes exceed * SUBSECTION_SIZE. */
Nice comments! Thanks.
flags |= NO_PUD_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
- else
flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
- if (can_set_direct_map()) flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
linux-stable-mirror@lists.linaro.org