Changes since v3 [1]: - Switch to "if (IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID) && !pfn_valid(pfn))" (David) - Finish collecting reviewed-bys across all patches in the series - Drop the libnvdimm fixup, to be merged through nvdimm.git not -mm
[1]: http://lore.kernel.org/r/161052331545.1805594.2356512831689786960.stgit@dwil...
---
Andrew,
All patches in this series have been reviewed and the kbuild-robot reports a build-success over 172 configs. They pass an updated version of the nvdimm unit tests to exercise corner cases of pfn_to_online_page() and get_dev_pagemap() [2], and apply cleanly to current -next.
Please apply, thanks.
[2]: http://lore.kernel.org/r/161052209289.1804207.11599120961607513911.stgit@dwi...
---
Michal reminds that the discussion about how to ensure pfn-walkers do not get confused by ZONE_DEVICE pages never resolved. A pfn-walker that uses pfn_to_online_page() may inadvertently translate a pfn as online and in the page allocator, when it is offline managed by a ZONE_DEVICE mapping (details in Patch 3: ("mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions")).
The 2 proposals under consideration are teach pfn_to_online_page() to be precise in the presence of mixed-zone sections, or teach the memory-add code to drop the System RAM associated with ZONE_DEVICE collisions. In order to not regress memory capacity by a few 10s to 100s of MiB the approach taken in this set is to add precision to pfn_to_online_page().
In the course of validating pfn_to_online_page() a couple other fixes fell out:
1/ soft_offline_page() fails to drop the reference taken in the madvise(..., MADV_SOFT_OFFLINE) case.
2/ memory_failure() uses get_dev_pagemap() to lookup ZONE_DEVICE pages, however that mapping may contain data pages and metadata raw pfns. Introduce pgmap_pfn_valid() to delineate the 2 types and fail the handling of raw metadata pfns.
---
Dan Williams (5): mm: Move pfn_to_online_page() out of line mm: Teach pfn_to_online_page() to consider subsection validity mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions mm: Fix page reference leak in soft_offline_page() mm: Fix memory_failure() handling of dax-namespace metadata
include/linux/memory_hotplug.h | 17 +--------- include/linux/memremap.h | 6 +++ include/linux/mmzone.h | 22 +++++++++---- mm/memory-failure.c | 26 +++++++++++++-- mm/memory_hotplug.c | 69 ++++++++++++++++++++++++++++++++++++++++ mm/memremap.c | 15 +++++++++ 6 files changed, 128 insertions(+), 27 deletions(-)
The conversion to move pfn_to_online_page() internal to soft_offline_page() missed that the get_user_pages() reference taken by the madvise() path needs to be dropped when pfn_to_online_page() fails. Note the direct sysfs-path to soft_offline_page() does not perform a get_user_pages() lookup.
When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page() pfn the kernel hangs at dax-device shutdown due to a leaked reference.
Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn") Cc: Andrew Morton akpm@linux-foundation.org Cc: Naoya Horiguchi nao.horiguchi@gmail.com Cc: Michal Hocko mhocko@kernel.org Reviewed-by: David Hildenbrand david@redhat.com Reviewed-by: Oscar Salvador osalvador@suse.de Cc: stable@vger.kernel.org Signed-off-by: Dan Williams dan.j.williams@intel.com --- mm/memory-failure.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 5a38e9eade94..78b173c7190c 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1885,6 +1885,12 @@ static int soft_offline_free_page(struct page *page) return rc; }
+static void put_ref_page(struct page *page) +{ + if (page) + put_page(page); +} + /** * soft_offline_page - Soft offline a page. * @pfn: pfn to soft-offline @@ -1910,20 +1916,26 @@ static int soft_offline_free_page(struct page *page) int soft_offline_page(unsigned long pfn, int flags) { int ret; - struct page *page; bool try_again = true; + struct page *page, *ref_page = NULL; + + WARN_ON_ONCE(!pfn_valid(pfn) && (flags & MF_COUNT_INCREASED));
if (!pfn_valid(pfn)) return -ENXIO; + if (flags & MF_COUNT_INCREASED) + ref_page = pfn_to_page(pfn); + /* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */ page = pfn_to_online_page(pfn); - if (!page) + if (!page) { + put_ref_page(ref_page); return -EIO; + }
if (PageHWPoison(page)) { pr_info("%s: %#lx page already poisoned\n", __func__, pfn); - if (flags & MF_COUNT_INCREASED) - put_page(page); + put_ref_page(ref_page); return 0; }
On Wed, Jan 13, 2021 at 04:43:32PM -0800, Dan Williams wrote:
The conversion to move pfn_to_online_page() internal to soft_offline_page() missed that the get_user_pages() reference taken by the madvise() path needs to be dropped when pfn_to_online_page() fails. Note the direct sysfs-path to soft_offline_page() does not perform a get_user_pages() lookup.
When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page() pfn the kernel hangs at dax-device shutdown due to a leaked reference.
Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn") Cc: Andrew Morton akpm@linux-foundation.org Cc: Naoya Horiguchi nao.horiguchi@gmail.com Cc: Michal Hocko mhocko@kernel.org Reviewed-by: David Hildenbrand david@redhat.com Reviewed-by: Oscar Salvador osalvador@suse.de Cc: stable@vger.kernel.org Signed-off-by: Dan Williams dan.j.williams@intel.com
I'm OK if we don't have any other better approach, but the proposed changes make code a little messy, and I feel that get_user_pages() might be the right place to fix. Is get_user_pages() expected to return struct page with holding refcount for offline valid pages? I thought that such pages are only used by drivers for dax-devices, but that might be wrong. Can I ask for a little more explanation from this perspective?
Thanks, Naoya Horiguchi
mm/memory-failure.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 5a38e9eade94..78b173c7190c 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1885,6 +1885,12 @@ static int soft_offline_free_page(struct page *page) return rc; } +static void put_ref_page(struct page *page) +{
- if (page)
put_page(page);
+}
/**
- soft_offline_page - Soft offline a page.
- @pfn: pfn to soft-offline
@@ -1910,20 +1916,26 @@ static int soft_offline_free_page(struct page *page) int soft_offline_page(unsigned long pfn, int flags) { int ret;
- struct page *page; bool try_again = true;
- struct page *page, *ref_page = NULL;
- WARN_ON_ONCE(!pfn_valid(pfn) && (flags & MF_COUNT_INCREASED));
if (!pfn_valid(pfn)) return -ENXIO;
- if (flags & MF_COUNT_INCREASED)
ref_page = pfn_to_page(pfn);
- /* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */ page = pfn_to_online_page(pfn);
- if (!page)
- if (!page) {
return -EIO;put_ref_page(ref_page);
- }
if (PageHWPoison(page)) { pr_info("%s: %#lx page already poisoned\n", __func__, pfn);
if (flags & MF_COUNT_INCREASED)
put_page(page);
return 0; }put_ref_page(ref_page);
On Wed, Jan 13, 2021 at 5:50 PM HORIGUCHI NAOYA(堀口 直也) naoya.horiguchi@nec.com wrote:
On Wed, Jan 13, 2021 at 04:43:32PM -0800, Dan Williams wrote:
The conversion to move pfn_to_online_page() internal to soft_offline_page() missed that the get_user_pages() reference taken by the madvise() path needs to be dropped when pfn_to_online_page() fails. Note the direct sysfs-path to soft_offline_page() does not perform a get_user_pages() lookup.
When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page() pfn the kernel hangs at dax-device shutdown due to a leaked reference.
Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn") Cc: Andrew Morton akpm@linux-foundation.org Cc: Naoya Horiguchi nao.horiguchi@gmail.com Cc: Michal Hocko mhocko@kernel.org Reviewed-by: David Hildenbrand david@redhat.com Reviewed-by: Oscar Salvador osalvador@suse.de Cc: stable@vger.kernel.org Signed-off-by: Dan Williams dan.j.williams@intel.com
I'm OK if we don't have any other better approach, but the proposed changes make code a little messy, and I feel that get_user_pages() might be the right place to fix. Is get_user_pages() expected to return struct page with holding refcount for offline valid pages? I thought that such pages are only used by drivers for dax-devices, but that might be wrong. Can I ask for a little more explanation from this perspective?
The motivation for ZONE_DEVICE is to allow get_user_pages() for "offline" pfns.
soft_offline_page() wants to only operate on "online" pfns.
get_user_pages() on a dax-mapping returns an "offline" ZONE_DEVICE page.
When pfn_to_online_page() fails the get_user_pages() reference needs to be dropped.
To be honest I dislike the entire flags based scheme for communicating the fact that page reference obtained by madvise needs to be dropped later. I'd rather pass a non-NULL 'struct page *' than redo pfn_to_page() conversions in the leaf functions, but that's a much larger change.
On 2021-01-14 07:18, Dan Williams wrote:
To be honest I dislike the entire flags based scheme for communicating the fact that page reference obtained by madvise needs to be dropped later. I'd rather pass a non-NULL 'struct page *' than redo pfn_to_page() conversions in the leaf functions, but that's a much larger change.
We tried to remove that flag in the past but for different reasons. I will have another look to see what can be done.
Thanks
On Wed, Jan 13, 2021 at 10:18:09PM -0800, Dan Williams wrote:
On Wed, Jan 13, 2021 at 5:50 PM HORIGUCHI NAOYA(堀口 直也) naoya.horiguchi@nec.com wrote:
On Wed, Jan 13, 2021 at 04:43:32PM -0800, Dan Williams wrote:
The conversion to move pfn_to_online_page() internal to soft_offline_page() missed that the get_user_pages() reference taken by the madvise() path needs to be dropped when pfn_to_online_page() fails. Note the direct sysfs-path to soft_offline_page() does not perform a get_user_pages() lookup.
When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page() pfn the kernel hangs at dax-device shutdown due to a leaked reference.
Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn") Cc: Andrew Morton akpm@linux-foundation.org Cc: Naoya Horiguchi nao.horiguchi@gmail.com Cc: Michal Hocko mhocko@kernel.org Reviewed-by: David Hildenbrand david@redhat.com Reviewed-by: Oscar Salvador osalvador@suse.de Cc: stable@vger.kernel.org Signed-off-by: Dan Williams dan.j.williams@intel.com
I'm OK if we don't have any other better approach, but the proposed changes make code a little messy, and I feel that get_user_pages() might be the right place to fix. Is get_user_pages() expected to return struct page with holding refcount for offline valid pages? I thought that such pages are only used by drivers for dax-devices, but that might be wrong. Can I ask for a little more explanation from this perspective?
The motivation for ZONE_DEVICE is to allow get_user_pages() for "offline" pfns.
I missed this point, thank you.
soft_offline_page() wants to only operate on "online" pfns.
Right.
get_user_pages() on a dax-mapping returns an "offline" ZONE_DEVICE page.
OK.
When pfn_to_online_page() fails the get_user_pages() reference needs to be dropped.
The background is clear to me, and I agree with this patch.
Reviewed-by: Naoya Horiguchi naoya.horiguchi@nec.com
To be honest I dislike the entire flags based scheme for communicating the fact that page reference obtained by madvise needs to be dropped later. I'd rather pass a non-NULL 'struct page *' than redo pfn_to_page() conversions in the leaf functions, but that's a much larger change.
As Oscar mentions in another email, removing MF_COUNT_INCREASED was recently discussed and rejected. I think that if pfn-page conversion layer is somehow factored out from soft/hard offline handler, code would be more readable/maintainable.
Thanks, Naoya Horiguchi
On Wed, 13 Jan 2021 16:43:32 -0800 Dan Williams dan.j.williams@intel.com wrote:
The conversion to move pfn_to_online_page() internal to soft_offline_page() missed that the get_user_pages() reference taken by the madvise() path needs to be dropped when pfn_to_online_page() fails. Note the direct sysfs-path to soft_offline_page() does not perform a get_user_pages() lookup.
When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page() pfn the kernel hangs at dax-device shutdown due to a leaked reference.
Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn") Cc: Andrew Morton akpm@linux-foundation.org Cc: Naoya Horiguchi nao.horiguchi@gmail.com Cc: Michal Hocko mhocko@kernel.org Reviewed-by: David Hildenbrand david@redhat.com Reviewed-by: Oscar Salvador osalvador@suse.de Cc: stable@vger.kernel.org Signed-off-by: Dan Williams dan.j.williams@intel.com
A cc:stable patch in the middle is awkward. I'll make this a standalone patch for merging into mainline soon (for 5.11) and shall turn the rest into a 4-patch series, OK?
On Sun, Jan 17, 2021 at 2:01 PM Andrew Morton akpm@linux-foundation.org wrote:
On Wed, 13 Jan 2021 16:43:32 -0800 Dan Williams dan.j.williams@intel.com wrote:
The conversion to move pfn_to_online_page() internal to soft_offline_page() missed that the get_user_pages() reference taken by the madvise() path needs to be dropped when pfn_to_online_page() fails. Note the direct sysfs-path to soft_offline_page() does not perform a get_user_pages() lookup.
When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page() pfn the kernel hangs at dax-device shutdown due to a leaked reference.
Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn") Cc: Andrew Morton akpm@linux-foundation.org Cc: Naoya Horiguchi nao.horiguchi@gmail.com Cc: Michal Hocko mhocko@kernel.org Reviewed-by: David Hildenbrand david@redhat.com Reviewed-by: Oscar Salvador osalvador@suse.de Cc: stable@vger.kernel.org Signed-off-by: Dan Williams dan.j.williams@intel.com
A cc:stable patch in the middle is awkward. I'll make this a standalone patch for merging into mainline soon (for 5.11) and shall turn the rest into a 4-patch series, OK?
Sounds good to me.
linux-stable-mirror@lists.linaro.org