Changes since v1 [1]: - Clarify the failing condition in patch 3 (Michal) - Clarify how subsection collisions manifest in shipping systems (Michal) - Use zone_idx() (Michal) - Move section_taint_zone_device() conditions to move_pfn_range_to_zone() (Michal) - Fix pfn_to_online_page() to account for pfn_valid() vs pfn_section_valid() confusion (David)
[1]: http://lore.kernel.org/r/160990599013.2430134.11556277600719835946.stgit@dwi...
---
Michal reminds that the discussion about how to ensure pfn-walkers do not get confused by ZONE_DEVICE pages never resolved. A pfn-walker that uses pfn_to_online_page() may inadvertently translate a pfn as online and in the page allocator, when it is offline managed by a ZONE_DEVICE mapping (details in Patch 3: ("mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions")).
The 2 proposals under consideration are teach pfn_to_online_page() to be precise in the presence of mixed-zone sections, or teach the memory-add code to drop the System RAM associated with ZONE_DEVICE collisions. In order to not regress memory capacity by a few 10s to 100s of MiB the approach taken in this set is to add precision to pfn_to_online_page().
In the course of validating pfn_to_online_page() a couple other fixes fell out:
1/ soft_offline_page() fails to drop the reference taken in the madvise(..., MADV_SOFT_OFFLINE) case.
2/ The libnvdimm sysfs attribute visibility code was failing to publish the resource base for memmap=ss!nn defined namespaces. This is needed for the regression test for soft_offline_page().
---
Dan Williams (5): mm: Move pfn_to_online_page() out of line mm: Teach pfn_to_online_page() to consider subsection validity mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions mm: Fix page reference leak in soft_offline_page() libnvdimm/namespace: Fix visibility of namespace resource attribute
drivers/nvdimm/namespace_devs.c | 10 +++--- include/linux/memory_hotplug.h | 17 +---------- include/linux/mmzone.h | 22 +++++++++----- mm/memory-failure.c | 20 ++++++++++--- mm/memory_hotplug.c | 62 +++++++++++++++++++++++++++++++++++++++ 5 files changed, 99 insertions(+), 32 deletions(-)
The conversion to move pfn_to_online_page() internal to soft_offline_page() missed that the get_user_pages() reference needs to be dropped when pfn_to_online_page() fails.
When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page() pfn the kernel hangs at dax-device shutdown due to a leaked reference.
Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn") Cc: Andrew Morton akpm@linux-foundation.org Cc: Naoya Horiguchi nao.horiguchi@gmail.com Cc: David Hildenbrand david@redhat.com Cc: Michal Hocko mhocko@kernel.org Cc: Oscar Salvador osalvador@suse.de Cc: stable@vger.kernel.org Signed-off-by: Dan Williams dan.j.williams@intel.com --- mm/memory-failure.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 5a38e9eade94..78b173c7190c 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1885,6 +1885,12 @@ static int soft_offline_free_page(struct page *page) return rc; }
+static void put_ref_page(struct page *page) +{ + if (page) + put_page(page); +} + /** * soft_offline_page - Soft offline a page. * @pfn: pfn to soft-offline @@ -1910,20 +1916,26 @@ static int soft_offline_free_page(struct page *page) int soft_offline_page(unsigned long pfn, int flags) { int ret; - struct page *page; bool try_again = true; + struct page *page, *ref_page = NULL; + + WARN_ON_ONCE(!pfn_valid(pfn) && (flags & MF_COUNT_INCREASED));
if (!pfn_valid(pfn)) return -ENXIO; + if (flags & MF_COUNT_INCREASED) + ref_page = pfn_to_page(pfn); + /* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */ page = pfn_to_online_page(pfn); - if (!page) + if (!page) { + put_ref_page(ref_page); return -EIO; + }
if (PageHWPoison(page)) { pr_info("%s: %#lx page already poisoned\n", __func__, pfn); - if (flags & MF_COUNT_INCREASED) - put_page(page); + put_ref_page(ref_page); return 0; }
On Tue, Jan 12, 2021 at 01:34:58AM -0800, Dan Williams wrote:
The conversion to move pfn_to_online_page() internal to soft_offline_page() missed that the get_user_pages() reference needs to be dropped when pfn_to_online_page() fails.
I would be more specific here wrt. get_user_pages (madvise). soft_offline_page gets called from more places besides madvise_*.
When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page() pfn the kernel hangs at dax-device shutdown due to a leaked reference.
Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn") Cc: Andrew Morton akpm@linux-foundation.org Cc: Naoya Horiguchi nao.horiguchi@gmail.com Cc: David Hildenbrand david@redhat.com Cc: Michal Hocko mhocko@kernel.org Cc: Oscar Salvador osalvador@suse.de Cc: stable@vger.kernel.org Signed-off-by: Dan Williams dan.j.williams@intel.com
LGTM, thanks for catching this:
Reviewed-by: Oscar Salvador osalvador@suse.de
A nit below.
mm/memory-failure.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 5a38e9eade94..78b173c7190c 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1885,6 +1885,12 @@ static int soft_offline_free_page(struct page *page) return rc; } +static void put_ref_page(struct page *page) +{
- if (page)
put_page(page);
+}
I am not sure this warrants a function. I would probably go with "if (ref_page).." in the two corresponding places, but not feeling strong here.
/**
- soft_offline_page - Soft offline a page.
- @pfn: pfn to soft-offline
@@ -1910,20 +1916,26 @@ static int soft_offline_free_page(struct page *page) int soft_offline_page(unsigned long pfn, int flags) { int ret;
- struct page *page; bool try_again = true;
- struct page *page, *ref_page = NULL;
- WARN_ON_ONCE(!pfn_valid(pfn) && (flags & MF_COUNT_INCREASED));
Did you see any scenario where this could happen? I understand that you are adding this because we will leak a reference in case pfn is not valid anymore.
On Tue, Jan 12, 2021 at 1:54 AM Oscar Salvador osalvador@suse.de wrote:
On Tue, Jan 12, 2021 at 01:34:58AM -0800, Dan Williams wrote:
The conversion to move pfn_to_online_page() internal to soft_offline_page() missed that the get_user_pages() reference needs to be dropped when pfn_to_online_page() fails.
I would be more specific here wrt. get_user_pages (madvise). soft_offline_page gets called from more places besides madvise_*.
Sure.
When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page() pfn the kernel hangs at dax-device shutdown due to a leaked reference.
Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn") Cc: Andrew Morton akpm@linux-foundation.org Cc: Naoya Horiguchi nao.horiguchi@gmail.com Cc: David Hildenbrand david@redhat.com Cc: Michal Hocko mhocko@kernel.org Cc: Oscar Salvador osalvador@suse.de Cc: stable@vger.kernel.org Signed-off-by: Dan Williams dan.j.williams@intel.com
LGTM, thanks for catching this:
Reviewed-by: Oscar Salvador osalvador@suse.de
A nit below.
mm/memory-failure.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 5a38e9eade94..78b173c7190c 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1885,6 +1885,12 @@ static int soft_offline_free_page(struct page *page) return rc; }
+static void put_ref_page(struct page *page) +{
if (page)
put_page(page);
+}
I am not sure this warrants a function. I would probably go with "if (ref_page).." in the two corresponding places, but not feeling strong here.
I'll take another look, it felt cluttered...
/**
- soft_offline_page - Soft offline a page.
- @pfn: pfn to soft-offline
@@ -1910,20 +1916,26 @@ static int soft_offline_free_page(struct page *page) int soft_offline_page(unsigned long pfn, int flags) { int ret;
struct page *page; bool try_again = true;
struct page *page, *ref_page = NULL;
WARN_ON_ONCE(!pfn_valid(pfn) && (flags & MF_COUNT_INCREASED));
Did you see any scenario where this could happen? I understand that you are adding this because we will leak a reference in case pfn is not valid anymore.
I did not, more future proofing / documenting against refactoring that fails to consider that case.
On 12.01.21 10:34, Dan Williams wrote:
The conversion to move pfn_to_online_page() internal to soft_offline_page() missed that the get_user_pages() reference needs to be dropped when pfn_to_online_page() fails.
When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page() pfn the kernel hangs at dax-device shutdown due to a leaked reference.
Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn") Cc: Andrew Morton akpm@linux-foundation.org Cc: Naoya Horiguchi nao.horiguchi@gmail.com Cc: David Hildenbrand david@redhat.com Cc: Michal Hocko mhocko@kernel.org Cc: Oscar Salvador osalvador@suse.de Cc: stable@vger.kernel.org Signed-off-by: Dan Williams dan.j.williams@intel.com
mm/memory-failure.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 5a38e9eade94..78b173c7190c 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1885,6 +1885,12 @@ static int soft_offline_free_page(struct page *page) return rc; } +static void put_ref_page(struct page *page) +{
- if (page)
put_page(page);
+}
/**
- soft_offline_page - Soft offline a page.
- @pfn: pfn to soft-offline
@@ -1910,20 +1916,26 @@ static int soft_offline_free_page(struct page *page) int soft_offline_page(unsigned long pfn, int flags) { int ret;
- struct page *page; bool try_again = true;
- struct page *page, *ref_page = NULL;
- WARN_ON_ONCE(!pfn_valid(pfn) && (flags & MF_COUNT_INCREASED));
if (!pfn_valid(pfn)) return -ENXIO;
- if (flags & MF_COUNT_INCREASED)
ref_page = pfn_to_page(pfn);
- /* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */ page = pfn_to_online_page(pfn);
- if (!page)
- if (!page) {
return -EIO;put_ref_page(ref_page);
- }
if (PageHWPoison(page)) { pr_info("%s: %#lx page already poisoned\n", __func__, pfn);
if (flags & MF_COUNT_INCREASED)
put_page(page);
return 0; }put_ref_page(ref_page);
Reviewed-by: David Hildenbrand david@redhat.com
Legacy pmem namespaces lost support for the "resource" attribute when the code was cleaned up to put the permission visibility in the declaration. Restore this by listing 'resource' in the default attributes.
A new ndctl regression test for pfn_to_online_page() corner cases builds on this fix.
Fixes: bfd2e9140656 ("libnvdimm: Simplify root read-only definition for the 'resource' attribute") Cc: Vishal Verma vishal.l.verma@intel.com Cc: Dave Jiang dave.jiang@intel.com Cc: Ira Weiny ira.weiny@intel.com Cc: stable@vger.kernel.org Signed-off-by: Dan Williams dan.j.williams@intel.com --- drivers/nvdimm/namespace_devs.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c index 6da67f4d641a..2403b71b601e 100644 --- a/drivers/nvdimm/namespace_devs.c +++ b/drivers/nvdimm/namespace_devs.c @@ -1635,11 +1635,11 @@ static umode_t namespace_visible(struct kobject *kobj, return a->mode; }
- if (a == &dev_attr_nstype.attr || a == &dev_attr_size.attr - || a == &dev_attr_holder.attr - || a == &dev_attr_holder_class.attr - || a == &dev_attr_force_raw.attr - || a == &dev_attr_mode.attr) + /* base is_namespace_io() attributes */ + if (a == &dev_attr_nstype.attr || a == &dev_attr_size.attr || + a == &dev_attr_holder.attr || a == &dev_attr_holder_class.attr || + a == &dev_attr_force_raw.attr || a == &dev_attr_mode.attr || + a == &dev_attr_resource.attr) return a->mode;
return 0;
linux-stable-mirror@lists.linaro.org