From: Juergen Gross <jgross(a)suse.com>
Subject: mm, page_alloc: fix build_zonerefs_node()
Since commit 6aa303defb74 ("mm, vmscan: only allocate and reclaim from
zones with pages managed by the buddy allocator") only zones with free
memory are included in a built zonelist. This is problematic when e.g.
all memory of a zone has been ballooned out when zonelists are being
rebuilt.
The decision whether to rebuild the zonelists when onlining new memory is
done based on populated_zone() returning 0 for the zone the memory will be
added to. The new zone is added to the zonelists only if it has free
memory pages (managed_zone() returns a non-zero value) after the memory
has been onlined. This implies that onlining memory will always free the
added pages to the allocator immediately, but this is not true in all
cases: when e.g. running as a Xen guest, the onlined new memory will be
added only to the ballooned memory list; it will be freed only when the
guest is being ballooned up afterwards.
Another problem with using managed_zone() for the decision whether a zone
is being added to the zonelists is that a zone with all memory used will
in fact be removed from all zonelists in case the zonelists happen to be
rebuilt.
Use populated_zone() when building a zonelist as it has been done before
that commit.
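The difference between the two predicates can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not kernel
code: struct zone_model and the model_* helpers are hypothetical stand-ins
that keep just the two page counters, and the walk from the highest zone
downwards is a simplified imitation of build_zonerefs_node().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct zone: only the two counters used here. */
struct zone_model {
	unsigned long present_pages;	/* pages physically present in the zone */
	unsigned long managed_pages;	/* pages managed by the buddy allocator */
};

/* Models populated_zone(): the zone has present pages. */
static bool model_populated_zone(const struct zone_model *z)
{
	return z->present_pages != 0;
}

/* Models managed_zone(): the buddy allocator manages pages in the zone. */
static bool model_managed_zone(const struct zone_model *z)
{
	return z->managed_pages != 0;
}

typedef bool (*zone_pred)(const struct zone_model *);

/*
 * Walk the zones from highest to lowest and store the index of every
 * zone accepted by pred in out[]. Returns the number of entries stored.
 */
static size_t model_build_zonerefs(const struct zone_model *zones, size_t nr,
				   zone_pred pred, size_t *out)
{
	size_t nr_zones = 0;
	size_t zone_type = nr;

	assert(zones && out);
	while (zone_type > 0) {
		zone_type--;
		if (pred(&zones[zone_type]))
			out[nr_zones++] = zone_type;
	}
	return nr_zones;
}
```

A zone with present_pages != 0 and managed_pages == 0 (for example, fully
ballooned out) is skipped when model_managed_zone is passed as the predicate
and kept when model_populated_zone is passed.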
There was a report that QubesOS (based on Xen) is hitting this problem.
Xen has switched to use the zone device functionality in kernel 5.9
and QubesOS wants to use memory hotplugging for guests in order to be
able to start a guest with minimal memory and expand it as needed.
This was the report leading to the patch.
Link: https://lkml.kernel.org/r/20220407120637.9035-1-jgross@suse.com
Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
Signed-off-by: Juergen Gross <jgross(a)suse.com>
Reported-by: Marek Marczykowski-Górecki <marmarek(a)invisiblethingslab.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Marek Marczykowski-Górecki <marmarek(a)invisiblethingslab.com>
Reviewed-by: Wei Yang <richard.weiyang(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/page_alloc.c~mm-page_alloc-fix-build_zonerefs_node
+++ a/mm/page_alloc.c
@@ -6131,7 +6131,7 @@ static int build_zonerefs_node(pg_data_t
 	do {
 		zone_type--;
 		zone = pgdat->node_zones + zone_type;
-		if (managed_zone(zone)) {
+		if (populated_zone(zone)) {
 			zoneref_set_zone(zone, &zonerefs[nr_zones++]);
 			check_highest_zone(zone_type);
 		}
_
v3:
Fix the patch order and fix the missing symbol compile error when compiled
after each patch is applied.
v2:
Rebase on the latest stable-5.15.33.
Adds the following commits to the v1 patchset as they fix issues in the
merged commit.
ca93e44bfb5f btrfs: fallback to blocking mode when doing async dio over multiple extents
fe673d3f5bf1 mm: gup: make fault_in_safe_writeable() use fixup_user_fault()
And this set drops the following patch as it is already in the
stable-5.15.y.
[PATCH 01/17 stable-5.15.y] powerpc/kvm: Fix kvm_use_magic_page
------- original cover letter --------
This set fixes a process hang issue in btrfs and gfs2 filesystems. When we
do a direct IO read or write and the buffer given by the user is
memory-mapped to the file range we are going to do IO on, we end up
in a deadlock. This is triggered by the test case generic/647 from
fstests.
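The access pattern described above can be sketched in userspace roughly as
follows. This is a minimal illustration, not the fstests source: the helper
name dio_read_into_own_mapping() is hypothetical, and it assumes the file
is at least len bytes long and that len is a multiple of the block size
required for direct IO on the filesystem. On an affected kernel the pread()
call may hang.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: do a direct IO read of the first len bytes of
 * the file at path into a shared mapping of that same file range.
 * Returns the pread() result, or -1 if open() or mmap() fails.
 */
static ssize_t dio_read_into_own_mapping(const char *path, size_t len)
{
	ssize_t ret;
	void *buf;
	int fd;

	assert(path && len > 0);

	fd = open(path, O_RDWR | O_DIRECT);
	if (fd < 0)
		return -1;

	/* The IO buffer is a mapping of the range that is about to be read. */
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (buf == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* Filling buf can fault on pages backed by the range being read. */
	ret = pread(fd, buf, len, 0);

	munmap(buf, len);
	close(fd);
	return ret;
}
```

The write direction would use pwrite() with the same mapping as the source
buffer.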
This fix depends on the iov_iter and iomap changes introduced in the
commit c03098d4b9ad ("Merge tag 'gfs2-v5.15-rc5-mmap-fault' of
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2") and they
are part of this set for stable-5.15.y.
Please note that patch 2/18 (in v2) (was 3/17 in v1) in the patchset
changes the prototype and renames an exported symbol as below. All its
references are updated as well.
-EXPORT_SYMBOL(iov_iter_fault_in_readable);
+EXPORT_SYMBOL(fault_in_iov_iter_readable);
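For callers that need to adapt, the return convention changes along with
the name. As far as the upstream change goes, the old function returned 0
or -EFAULT, while the new one returns the number of bytes that could not be
faulted in. The following is a userspace model of that difference, not
kernel code: the model_* functions and the 'accessible' parameter are
hypothetical and only stand in for the real helpers and a user buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * 'accessible' models how many bytes at the start of the user buffer
 * can be faulted in; 'size' is the number of bytes requested.
 */

/* New convention: return the number of bytes NOT faulted in. */
static size_t model_fault_in_new(size_t accessible, size_t size)
{
	return size > accessible ? size - accessible : 0;
}

/* Old convention: 0 on full success, -EFAULT otherwise. */
static int model_fault_in_old(size_t accessible, size_t size)
{
	return model_fault_in_new(accessible, size) ? -EFAULT : 0;
}

/* Hypothetical caller: how many bytes may be copied after fault-in. */
static size_t model_usable_bytes(size_t accessible, size_t size)
{
	size_t left = model_fault_in_new(accessible, size);

	assert(left <= size);
	return size - left;
}
```

With the new convention a caller can tell a partial fault-in apart from a
complete failure, which the old 0 or -EFAULT result did not allow.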
Andreas Gruenbacher (14):
gup: Turn fault_in_pages_{readable,writeable} into
fault_in_{readable,writeable}
iov_iter: Turn iov_iter_fault_in_readable into
fault_in_iov_iter_readable
iov_iter: Introduce fault_in_iov_iter_writeable
gfs2: Add wrapper for iomap_file_buffered_write
gfs2: Clean up function may_grant
gfs2: Move the inode glock locking to gfs2_file_buffered_write
gfs2: Eliminate ip->i_gh
gfs2: Fix mmap + page fault deadlocks for buffered I/O
iomap: Fix iomap_dio_rw return value for user copies
iomap: Support partial direct I/O on user copy failures
iomap: Add done_before argument to iomap_dio_rw
gup: Introduce FOLL_NOFAULT flag to disable page faults
iov_iter: Introduce nofault flag to disable page faults
gfs2: Fix mmap + page fault deadlocks for direct I/O
Bob Peterson (1):
gfs2: Introduce flag for glock holder auto-demotion
Filipe Manana (2):
btrfs: fix deadlock due to page faults during direct IO reads and
writes
btrfs: fallback to blocking mode when doing async dio over multiple
extents
Linus Torvalds (1):
mm: gup: make fault_in_safe_writeable() use fixup_user_fault()
arch/powerpc/kernel/kvm.c | 3 +-
arch/powerpc/kernel/signal_32.c | 4 +-
arch/powerpc/kernel/signal_64.c | 2 +-
arch/x86/kernel/fpu/signal.c | 7 +-
drivers/gpu/drm/armada/armada_gem.c | 7 +-
fs/btrfs/file.c | 142 ++++++++++--
fs/btrfs/inode.c | 28 +++
fs/btrfs/ioctl.c | 5 +-
fs/erofs/data.c | 2 +-
fs/ext4/file.c | 5 +-
fs/f2fs/file.c | 2 +-
fs/fuse/file.c | 2 +-
fs/gfs2/bmap.c | 60 +----
fs/gfs2/file.c | 252 +++++++++++++++++++--
fs/gfs2/glock.c | 330 +++++++++++++++++++++-------
fs/gfs2/glock.h | 20 ++
fs/gfs2/incore.h | 4 +-
fs/iomap/buffered-io.c | 2 +-
fs/iomap/direct-io.c | 29 ++-
fs/ntfs/file.c | 2 +-
fs/ntfs3/file.c | 2 +-
fs/xfs/xfs_file.c | 6 +-
fs/zonefs/super.c | 4 +-
include/linux/iomap.h | 11 +-
include/linux/mm.h | 3 +-
include/linux/pagemap.h | 58 +----
include/linux/uio.h | 4 +-
lib/iov_iter.c | 98 +++++++--
mm/filemap.c | 4 +-
mm/gup.c | 120 +++++++++-
30 files changed, 920 insertions(+), 298 deletions(-)
--
2.33.1