Changes since v2 [1]: - Collect some reviewed-by's from David and Oscar
- Rework subsection validity to include pfn_valid() gated by CONFIG_HAVE_ARCH_PFN_VALID (David, Oscar)
- Introduce pgmap_pfn_valid() to validate metadata vs data in a pgmap (David)
! Kill put_ref_page(): the extra "if (ref_page) put_page(ref_page)" still feels more cluttered than adding a tiny helper. (Oscar)
[1]: http://lore.kernel.org/r/161044407603.1482714.16630477578392768273.stgit@dwi...
---
Michal reminds that the discussion about how to ensure pfn-walkers do not get confused by ZONE_DEVICE pages never resolved. A pfn-walker that uses pfn_to_online_page() may inadvertently translate a pfn as online and in the page allocator, when it is offline managed by a ZONE_DEVICE mapping (details in Patch 3: ("mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions")).
The 2 proposals under consideration are teach pfn_to_online_page() to be precise in the presence of mixed-zone sections, or teach the memory-add code to drop the System RAM associated with ZONE_DEVICE collisions. In order to not regress memory capacity by a few 10s to 100s of MiB the approach taken in this set is to add precision to pfn_to_online_page().
In the course of validating pfn_to_online_page() a couple other fixes fell out:
1/ soft_offline_page() fails to drop the reference taken in the madvise(..., MADV_SOFT_OFFLINE) case.
2/ The libnvdimm sysfs attribute visibility code was failing to publish the resource base for memmap=ss!nn defined namespaces. This is needed for the regression test for soft_offline_page().
3/ memory_failure() uses get_dev_pagemap() to lookup ZONE_DEVICE pages, however that mapping may contain data pages and metadata raw pfns. Introduce pgmap_pfn_valid() to delineate the 2 types and fail the handling of raw metadata pfns.
---
Dan Williams (6): mm: Move pfn_to_online_page() out of line mm: Teach pfn_to_online_page() to consider subsection validity mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions mm: Fix page reference leak in soft_offline_page() mm: Fix memory_failure() handling of dax-namespace metadata libnvdimm/namespace: Fix visibility of namespace resource attribute
drivers/nvdimm/namespace_devs.c | 10 +++--- include/linux/memory_hotplug.h | 17 +-------- include/linux/memremap.h | 6 +++ include/linux/mmzone.h | 22 ++++++++---- mm/memory-failure.c | 26 ++++++++++++-- mm/memory_hotplug.c | 70 +++++++++++++++++++++++++++++++++++++++ mm/memremap.c | 15 ++++++++ 7 files changed, 134 insertions(+), 32 deletions(-)
The conversion to move pfn_to_online_page() internal to soft_offline_page() missed that the get_user_pages() reference taken by the madvise() path needs to be dropped when pfn_to_online_page() fails. Note the direct sysfs-path to soft_offline_page() does not perform a get_user_pages() lookup.
When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page() pfn the kernel hangs at dax-device shutdown due to a leaked reference.
Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn") Cc: Andrew Morton akpm@linux-foundation.org Cc: Naoya Horiguchi nao.horiguchi@gmail.com Cc: Michal Hocko mhocko@kernel.org Reviewed-by: David Hildenbrand david@redhat.com Reviewed-by: Oscar Salvador osalvador@suse.de Cc: stable@vger.kernel.org Signed-off-by: Dan Williams dan.j.williams@intel.com --- mm/memory-failure.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 5a38e9eade94..78b173c7190c 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1885,6 +1885,12 @@ static int soft_offline_free_page(struct page *page) return rc; }
+static void put_ref_page(struct page *page) +{ + if (page) + put_page(page); +} + /** * soft_offline_page - Soft offline a page. * @pfn: pfn to soft-offline @@ -1910,20 +1916,26 @@ static int soft_offline_free_page(struct page *page) int soft_offline_page(unsigned long pfn, int flags) { int ret; - struct page *page; bool try_again = true; + struct page *page, *ref_page = NULL; + + WARN_ON_ONCE(!pfn_valid(pfn) && (flags & MF_COUNT_INCREASED));
if (!pfn_valid(pfn)) return -ENXIO; + if (flags & MF_COUNT_INCREASED) + ref_page = pfn_to_page(pfn); + /* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */ page = pfn_to_online_page(pfn); - if (!page) + if (!page) { + put_ref_page(ref_page); return -EIO; + }
if (PageHWPoison(page)) { pr_info("%s: %#lx page already poisoned\n", __func__, pfn); - if (flags & MF_COUNT_INCREASED) - put_page(page); + put_ref_page(ref_page); return 0; }
Legacy pmem namespaces lost support for the "resource" attribute when the code was cleaned up to put the permission visibility in the declaration. Restore this by listing 'resource' in the default attributes.
A new ndctl regression test for pfn_to_online_page() corner cases builds on this fix.
Fixes: bfd2e9140656 ("libnvdimm: Simplify root read-only definition for the 'resource' attribute") Cc: Vishal Verma vishal.l.verma@intel.com Cc: Dave Jiang dave.jiang@intel.com Cc: Ira Weiny ira.weiny@intel.com Cc: stable@vger.kernel.org Signed-off-by: Dan Williams dan.j.williams@intel.com --- drivers/nvdimm/namespace_devs.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c index 6da67f4d641a..2403b71b601e 100644 --- a/drivers/nvdimm/namespace_devs.c +++ b/drivers/nvdimm/namespace_devs.c @@ -1635,11 +1635,11 @@ static umode_t namespace_visible(struct kobject *kobj, return a->mode; }
- if (a == &dev_attr_nstype.attr || a == &dev_attr_size.attr - || a == &dev_attr_holder.attr - || a == &dev_attr_holder_class.attr - || a == &dev_attr_force_raw.attr - || a == &dev_attr_mode.attr) + /* base is_namespace_io() attributes */ + if (a == &dev_attr_nstype.attr || a == &dev_attr_size.attr || + a == &dev_attr_holder.attr || a == &dev_attr_holder_class.attr || + a == &dev_attr_force_raw.attr || a == &dev_attr_mode.attr || + a == &dev_attr_resource.attr) return a->mode;
return 0;
On Tue, Jan 12, 2021 at 11:35:50PM -0800, Dan Williams wrote:
Legacy pmem namespaces lost support for the "resource" attribute when the code was cleaned up to put the permission visibility in the declaration. Restore this by listing 'resource' in the default attributes.
A new ndctl regression test for pfn_to_online_page() corner cases builds on this fix.
Fixes: bfd2e9140656 ("libnvdimm: Simplify root read-only definition for the 'resource' attribute") Cc: Vishal Verma vishal.l.verma@intel.com Cc: Dave Jiang dave.jiang@intel.com Cc: Ira Weiny ira.weiny@intel.com Cc: stable@vger.kernel.org Signed-off-by: Dan Williams dan.j.williams@intel.com
drivers/nvdimm/namespace_devs.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>
linux-stable-mirror@lists.linaro.org