Patch 1 adds the missing mem_hotplug_lock to memory hotplug operations, ensuring the lock is held in write mode around critical operations like mhp_init_memmap_on_memory() and mhp_deinit_memmap_on_memory().
Patch 2 fixes error handling in add_memory_resource() to take altmap support into account, so that struct pages are deallocated the same way they were allocated.
Patch 3 makes vmem_altmap_free() and vmem_altmap_offset() usable without CONFIG_ZONE_DEVICE by moving them out of mm/memremap.c and providing them as inline helpers in include/linux/mm.h (a condensed sketch follows).
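For reference, a condensed sketch of the two helpers as they become inlines in include/linux/mm.h with patch 3 (the full diff there also adds empty !CONFIG_SPARSEMEM_VMEMMAP stubs):

    /* condensed from patch 3 */
    static inline unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
    {
            /* number of pfns from base where pfn_to_page() is valid */
            if (altmap)
                    return altmap->reserve + altmap->free;
            return 0;
    }

    static inline void vmem_altmap_free(struct vmem_altmap *altmap,
                                        unsigned long nr_pfns)
    {
            altmap->alloc -= nr_pfns;
    }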
v2:
 * Added cc: stable@vger.kernel.org.
 * Made vmem_altmap_offset()/vmem_altmap_free() inline.
 * Added Acked-by for patch 1 and patch 2.
v1: Changes since the patch series "implement "memmap on memory" feature on s390":

Patch 1 (feedback from David):
 * Fixed subject for the memory hotplug lock patch.
 * Added locking comments before memory_block_online()/memory_block_offline().

Patch 2 (feedback from David):
 * Fixed the commit id in the Fixes: tag.

Patch 3 (lkp report):
 * Resolved the "undefined reference to `vmem_altmap_offset'" error seen when
   CONFIG_SPARSEMEM_VMEMMAP is disabled.
 * Requesting Reviewed-by once again due to the change in patch 3.
Thank you
Sumanth Korikkar (3):
  mm/memory_hotplug: add missing mem_hotplug_lock
  mm/memory_hotplug: fix error handling in add_memory_resource()
  mm: use vmem_altmap code without CONFIG_ZONE_DEVICE
 drivers/base/memory.c    | 18 +++++++++++++++---
 include/linux/memremap.h | 12 ------------
 include/linux/mm.h       | 26 ++++++++++++++++++++++++++
 mm/memory_hotplug.c      | 15 +++++++--------
 mm/memremap.c            | 14 +-------------
 5 files changed, 49 insertions(+), 36 deletions(-)
From Documentation/core-api/memory-hotplug.rst: When adding/removing/onlining/offlining memory or adding/removing heterogeneous/device memory, we should always hold the mem_hotplug_lock in write mode to serialise memory hotplug (e.g. access to global/zone variables).
mhp_(de)init_memmap_on_memory() functions can change zone stats and struct page content, but they are currently called w/o the mem_hotplug_lock.
When a memory block is being offlined and kmemleak goes through each populated zone, the following theoretical race condition could occur:

CPU 0:                                         | CPU 1:
memory_offline()                               |
-> offline_pages()                             |
   -> mem_hotplug_begin()                      |
      ...                                      |
   -> mem_hotplug_done()                       |
                                               | kmemleak_scan()
                                               | -> get_online_mems()
                                               |    ...
-> mhp_deinit_memmap_on_memory()               |
   [not protected by mem_hotplug_begin/done()] |
   Marks memory section as offline,            |    Retrieves zone_start_pfn
   poisons vmemmap struct pages and updates    |    and struct page members.
   the zone related data                       |
                                               |    ...
                                               | -> put_online_mems()
Fix this by ensuring mem_hotplug_lock is taken before performing mhp_init_memmap_on_memory(). Also ensure that mhp_deinit_memmap_on_memory() holds the lock.
online/offline_pages() are currently only called from memory_block_online/offline(), so it is safe to move the locking there.
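For clarity, the resulting flow in memory_block_online() looks roughly like this (condensed from the diff below; memory_block_offline() is symmetric):

    static int memory_block_online(struct memory_block *mem)
    {
            ...
            mem_hotplug_begin();
            if (nr_vmemmap_pages) {
                    ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
                    if (ret)
                            goto out;
            }
            ret = online_pages(start_pfn + nr_vmemmap_pages, ...);
            ...
    out:
            mem_hotplug_done();
            return ret;
    }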
Fixes: a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range")
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
---
 drivers/base/memory.c | 18 +++++++++++++++---
 mm/memory_hotplug.c   | 13 ++++++-------
 2 files changed, 21 insertions(+), 10 deletions(-)
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index f3b9a4d0fa3b..8a13babd826c 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -180,6 +180,9 @@ static inline unsigned long memblk_nr_poison(struct memory_block *mem)
 }
 #endif
 
+/*
+ * Must acquire mem_hotplug_lock in write mode.
+ */
 static int memory_block_online(struct memory_block *mem)
 {
         unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
@@ -204,10 +207,11 @@ static int memory_block_online(struct memory_block *mem)
         if (mem->altmap)
                 nr_vmemmap_pages = mem->altmap->free;
 
+        mem_hotplug_begin();
         if (nr_vmemmap_pages) {
                 ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
                 if (ret)
-                        return ret;
+                        goto out;
         }
 
         ret = online_pages(start_pfn + nr_vmemmap_pages,
@@ -215,7 +219,7 @@ static int memory_block_online(struct memory_block *mem)
         if (ret) {
                 if (nr_vmemmap_pages)
                         mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
-                return ret;
+                goto out;
         }
 
         /*
@@ -227,9 +231,14 @@ static int memory_block_online(struct memory_block *mem)
                                           nr_vmemmap_pages);
 
         mem->zone = zone;
+out:
+        mem_hotplug_done();
         return ret;
 }
 
+/*
+ * Must acquire mem_hotplug_lock in write mode.
+ */
 static int memory_block_offline(struct memory_block *mem)
 {
         unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
@@ -247,6 +256,7 @@ static int memory_block_offline(struct memory_block *mem)
         if (mem->altmap)
                 nr_vmemmap_pages = mem->altmap->free;
 
+        mem_hotplug_begin();
         if (nr_vmemmap_pages)
                 adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
                                           -nr_vmemmap_pages);
@@ -258,13 +268,15 @@ static int memory_block_offline(struct memory_block *mem)
                 if (nr_vmemmap_pages)
                         adjust_present_page_count(pfn_to_page(start_pfn),
                                                   mem->group, nr_vmemmap_pages);
-                return ret;
+                goto out;
         }
 
         if (nr_vmemmap_pages)
                 mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
 
         mem->zone = NULL;
+out:
+        mem_hotplug_done();
         return ret;
 }
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1b03f4ec6fd2..c8238fc5edcb 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1129,6 +1129,9 @@ void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages)
         kasan_remove_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages));
 }
 
+/*
+ * Must be called with mem_hotplug_lock in write mode.
+ */
 int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
                        struct zone *zone, struct memory_group *group)
 {
@@ -1149,7 +1152,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
                          !IS_ALIGNED(pfn + nr_pages, PAGES_PER_SECTION)))
                 return -EINVAL;
 
-        mem_hotplug_begin();
 
         /* associate pfn range with the zone */
         move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE);
@@ -1208,7 +1210,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
         writeback_set_ratelimit();
 
         memory_notify(MEM_ONLINE, &arg);
-        mem_hotplug_done();
         return 0;
 
 failed_addition:
@@ -1217,7 +1218,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
                  (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
         memory_notify(MEM_CANCEL_ONLINE, &arg);
         remove_pfn_range_from_zone(zone, pfn, nr_pages);
-        mem_hotplug_done();
         return ret;
 }
 
@@ -1863,6 +1863,9 @@ static int count_system_ram_pages_cb(unsigned long start_pfn,
         return 0;
 }
 
+/*
+ * Must be called with mem_hotplug_lock in write mode.
+ */
 int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
                         struct zone *zone, struct memory_group *group)
 {
@@ -1885,8 +1888,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
                          !IS_ALIGNED(start_pfn + nr_pages, PAGES_PER_SECTION)))
                 return -EINVAL;
 
-        mem_hotplug_begin();
-
         /*
          * Don't allow to offline memory blocks that contain holes.
          * Consequently, memory blocks with holes can never get onlined
@@ -2027,7 +2028,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 
         memory_notify(MEM_OFFLINE, &arg);
         remove_pfn_range_from_zone(zone, start_pfn, nr_pages);
-        mem_hotplug_done();
         return 0;
 
 failed_removal_isolated:
@@ -2042,7 +2042,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
                  (unsigned long long) start_pfn << PAGE_SHIFT,
                  ((unsigned long long) end_pfn << PAGE_SHIFT) - 1,
                  reason);
-        mem_hotplug_done();
         return ret;
 }
On Mon, Nov 20, 2023 at 11:27:32AM +0100, Sumanth Korikkar wrote:
Hi,
This is the friendly patch-bot of Greg Kroah-Hartman. You have sent him a patch that has triggered this response. He used to manually respond to these common problems, but in order to save his sanity (he kept writing the same thing over and over, yet to different people), I was created. Hopefully you will not take offence and will fix the problem in your patch and resubmit it so that it can be accepted into the Linux kernel tree.
You are receiving this message because of the following common error(s) as indicated below:
- You have marked a patch with a "Fixes:" tag for a commit that is in an older released kernel, yet you do not have a cc: stable line in the signed-off-by area at all, which means that the patch will not be applied to any older kernel releases. To properly fix this, please follow the documented rules in the Documentation/process/stable-kernel-rules.rst file for how to resolve this.
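For example, the sign-off area of patch 1 in this series would then look like the following (illustrative; the commit id is taken from the series' own Fixes: tag):

    Fixes: a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>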
If you wish to discuss this problem further, or you have questions about how to resolve this issue, please feel free to respond to this email and Greg will reply once he has dug out from the pending patches received from other developers.
thanks,
greg k-h's patch email bot
In add_memory_resource(), creation of memory block devices occurs after a successful call to arch_add_memory(). However, creation of memory block devices could fail. In that case, arch_remove_memory() is called to perform the necessary cleanup.
Currently, with or without altmap support, arch_remove_memory() is always called with altmap set to NULL during error handling. This leads to freeing of the struct pages via free_pages(), even though the allocation might have been performed with altmap support via altmap_alloc_block_buf().
Fix the error handling by passing the altmap to arch_remove_memory(). This ensures the following:
* When altmap is disabled, the struct pages array is deallocated via free_pages().
* When altmap is enabled, deallocation occurs via vmem_altmap_free().
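Conceptually, the freeing path then picks the deallocator that matches the allocator (illustrative sketch restating the two cases above, not the exact arch code):

    if (altmap)
            vmem_altmap_free(altmap, nr_pfns);  /* pages were carved out of the altmap */
    else
            free_pages(...);                    /* pages came from the page allocator */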
Fixes: a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range")
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
---
 mm/memory_hotplug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c8238fc5edcb..4f476a970e84 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1458,7 +1458,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
         /* create memory block devices after memory was added */
         ret = create_memory_block_devices(start, size, params.altmap, group);
         if (ret) {
-                arch_remove_memory(start, size, NULL);
+                arch_remove_memory(start, size, params.altmap);
                 goto error_free;
         }
Hi,
Thanks for your patch.
FYI: kernel test robot notices the stable kernel rule is not satisfied.
The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#opti...
Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree.
Subject: [PATCH v2 2/3] mm/memory_hotplug: fix error handling in add_memory_resource()
Link: https://lore.kernel.org/stable/20231120102734.2001576-3-sumanthk%40linux.ibm...
vmem_altmap_free() and vmem_altmap_offset() could be utilized without CONFIG_ZONE_DEVICE enabled. For example, mm/memory_hotplug.c:__add_pages() relies on that. The altmap is no longer restricted to ZONE_DEVICE handling, but instead depends on CONFIG_SPARSEMEM_VMEMMAP.
When CONFIG_SPARSEMEM_VMEMMAP is disabled, these functions are defined as inline stubs, ensuring compatibility with configurations that do not use sparsemem vmemmap. Without it, lkp reported the following:
ld: arch/x86/mm/init_64.o: in function `remove_pagetable': init_64.c:(.meminit.text+0xfc7): undefined reference to `vmem_altmap_free'
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202311180545.VeyRXEDq-lkp@intel.com/
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
---
 include/linux/memremap.h | 12 ------------
 include/linux/mm.h       | 26 ++++++++++++++++++++++++++
 mm/memremap.c            | 14 +-------------
 3 files changed, 27 insertions(+), 25 deletions(-)
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 1314d9c5f05b..744c830f4b13 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -196,8 +196,6 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
                 struct dev_pagemap *pgmap);
 bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn);
 
-unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
-void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
 unsigned long memremap_compat_align(void);
 #else
 static inline void *devm_memremap_pages(struct device *dev,
@@ -228,16 +226,6 @@ static inline bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn)
         return false;
 }
 
-static inline unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
-{
-        return 0;
-}
-
-static inline void vmem_altmap_free(struct vmem_altmap *altmap,
-                unsigned long nr_pfns)
-{
-}
-
 /* when memremap_pages() is disabled all archs can remap a single page */
 static inline unsigned long memremap_compat_align(void)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf5d0b1b16f4..f2344fd8acbe 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3786,6 +3786,32 @@ void vmemmap_free(unsigned long start, unsigned long end,
                 struct vmem_altmap *altmap);
 #endif
 
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+static inline unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
+{
+        /* number of pfns from base where pfn_to_page() is valid */
+        if (altmap)
+                return altmap->reserve + altmap->free;
+        return 0;
+}
+
+static inline void vmem_altmap_free(struct vmem_altmap *altmap,
+                                    unsigned long nr_pfns)
+{
+        altmap->alloc -= nr_pfns;
+}
+#else
+static inline unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
+{
+        return 0;
+}
+
+static inline void vmem_altmap_free(struct vmem_altmap *altmap,
+                                    unsigned long nr_pfns)
+{
+}
+#endif
+
 #define VMEMMAP_RESERVE_NR      2
 #ifdef CONFIG_ARCH_WANT_OPTIMIZE_DAX_VMEMMAP
 static inline bool __vmemmap_can_optimize(struct vmem_altmap *altmap,
diff --git a/mm/memremap.c b/mm/memremap.c
index bee85560a243..9531faa92a7c 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -7,6 +7,7 @@
 #include <linux/memremap.h>
 #include <linux/pfn_t.h>
 #include <linux/swap.h>
+#include <linux/mm.h>
 #include <linux/mmzone.h>
 #include <linux/swapops.h>
 #include <linux/types.h>
@@ -422,19 +423,6 @@ void devm_memunmap_pages(struct device *dev, struct dev_pagemap *pgmap)
 }
 EXPORT_SYMBOL_GPL(devm_memunmap_pages);
 
-unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
-{
-        /* number of pfns from base where pfn_to_page() is valid */
-        if (altmap)
-                return altmap->reserve + altmap->free;
-        return 0;
-}
-
-void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns)
-{
-        altmap->alloc -= nr_pfns;
-}
-
 /**
  * get_dev_pagemap() - take a new live reference on the dev_pagemap for @pfn
  * @pfn: page frame number to lookup page_map
On 20.11.23 11:27, Sumanth Korikkar wrote:
Acked-by: David Hildenbrand <david@redhat.com>