The patch titled
Subject: mm, swap: fix THP swap out
has been removed from the -mm tree. Its filename was
mm-swap-fix-thp-swap-out.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Huang Ying <ying.huang(a)intel.com>
Subject: mm, swap: fix THP swap out
0-Day test system reported some OOM regressions for several THP
(Transparent Huge Page) swap test cases. These regressions are bisected
to 6861428921b5 ("block: always define BIO_MAX_PAGES as 256"). In the
commit, BIO_MAX_PAGES is set to 256 even when THP swap is enabled. So the
bio_alloc(gfp_flags, 512) in get_swap_bio() may fail when swapping out
THP. That causes the OOM.
As in the patch description of 6861428921b5 ("block: always define
BIO_MAX_PAGES as 256"), THP swap should use multi-page bvec to write THP
to swap space. So the issue is fixed via doing that in get_swap_bio().
BTW: I remember I have checked the THP swap code when 6861428921b5
("block: always define BIO_MAX_PAGES as 256") was merged, and thought the
THP swap code needn't to be changed. But apparently, I was wrong. I
should have done this at that time.
Link: http://lkml.kernel.org/r/20190624075515.31040-1-ying.huang@intel.com
Fixes: 6861428921b5 ("block: always define BIO_MAX_PAGES as 256")
Signed-off-by: "Huang, Ying" <ying.huang(a)intel.com>
Reviewed-by: Ming Lei <ming.lei(a)redhat.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_io.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
--- a/mm/page_io.c~mm-swap-fix-thp-swap-out
+++ a/mm/page_io.c
@@ -29,10 +29,9 @@
static struct bio *get_swap_bio(gfp_t gfp_flags,
struct page *page, bio_end_io_t end_io)
{
- int i, nr = hpage_nr_pages(page);
struct bio *bio;
- bio = bio_alloc(gfp_flags, nr);
+ bio = bio_alloc(gfp_flags, 1);
if (bio) {
struct block_device *bdev;
@@ -41,9 +40,7 @@ static struct bio *get_swap_bio(gfp_t gf
bio->bi_iter.bi_sector <<= PAGE_SHIFT - 9;
bio->bi_end_io = end_io;
- for (i = 0; i < nr; i++)
- bio_add_page(bio, page + i, PAGE_SIZE, 0);
- VM_BUG_ON(bio->bi_iter.bi_size != PAGE_SIZE * nr);
+ bio_add_page(bio, page, PAGE_SIZE * hpage_nr_pages(page), 0);
}
return bio;
}
_
Patches currently in -mm which might be from ying.huang(a)intel.com are
mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch
mm-swap-simplify-total_swapcache_pages-with-get_swap_device.patch
mm-swap-simplify-total_swapcache_pages-with-get_swap_device-fix.patch
mm-fix-race-between-swapoff-and-mincore.patch
The patch titled
Subject: mm-hugetlb-soft-offline-dissolve_free_huge_page-return-zero-on-pagehuge-v3
has been removed from the -mm tree. Its filename was
mm-hugetlb-soft-offline-dissolve_free_huge_page-return-zero-on-pagehuge-v3.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Subject: mm-hugetlb-soft-offline-dissolve_free_huge_page-return-zero-on-pagehuge-v3
- add PageHuge check in dissolve_free_huge_page() outside hugetlb_lock
- update comment on dissolve_free_huge_page() about return value
Link: http://lkml.kernel.org/r/1560761476-4651-3-git-send-email-n-horiguchi@ah.jp…
Reported-by: Chen, Jerry T <jerry.t.chen(a)intel.com>
Tested-by: Chen, Jerry T <jerry.t.chen(a)intel.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Xishi Qiu <xishi.qiuxishi(a)alibaba-inc.com>
Cc: "Chen, Jerry T" <jerry.t.chen(a)intel.com>
Cc: "Zhuo, Qiuxu" <qiuxu.zhuo(a)intel.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-soft-offline-dissolve_free_huge_page-return-zero-on-pagehuge-v3
+++ a/mm/hugetlb.c
@@ -1510,14 +1510,22 @@ static int free_pool_huge_page(struct hs
/*
* Dissolve a given free hugepage into free buddy pages. This function does
- * nothing for in-use (including surplus) hugepages. Returns -EBUSY if the
- * dissolution fails because a give page is not a free hugepage, or because
- * free hugepages are fully reserved.
+ * nothing for in-use hugepages and non-hugepages.
+ * This function returns values like below:
+ *
+ * -EBUSY: failed to dissolved free hugepages or the hugepage is in-use
+ * (allocated or reserved.)
+ * 0: successfully dissolved free hugepages or the page is not a
+ * hugepage (considered as already dissolved)
*/
int dissolve_free_huge_page(struct page *page)
{
int rc = -EBUSY;
+ /* Not to disrupt normal path by vainly holding hugetlb_lock */
+ if (!PageHuge(page))
+ return 0;
+
spin_lock(&hugetlb_lock);
if (!PageHuge(page)) {
rc = 0;
_
Patches currently in -mm which might be from n-horiguchi(a)ah.jp.nec.com are
The patch titled
Subject: mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge
has been removed from the -mm tree. Its filename was
mm-hugetlb-soft-offline-dissolve_free_huge_page-return-zero-on-pagehuge.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Subject: mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge
madvise(MADV_SOFT_OFFLINE) often returns -EBUSY when calling soft offline
for hugepages with overcommitting enabled. That was caused by the
suboptimal code in current soft-offline code. See the following part:
ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
MIGRATE_SYNC, MR_MEMORY_FAILURE);
if (ret) {
...
} else {
/*
* We set PG_hwpoison only when the migration source hugepage
* was successfully dissolved, because otherwise hwpoisoned
* hugepage remains on free hugepage list, then userspace will
* find it as SIGBUS by allocation failure. That's not expected
* in soft-offlining.
*/
ret = dissolve_free_huge_page(page);
if (!ret) {
if (set_hwpoison_free_buddy_page(page))
num_poisoned_pages_inc();
}
}
return ret;
Here dissolve_free_huge_page() returns -EBUSY if the migration source page
was freed into buddy in migrate_pages(), but even in that case we actually
has a chance that set_hwpoison_free_buddy_page() succeeds. So that means
current code gives up offlining too early now.
dissolve_free_huge_page() checks that a given hugepage is suitable for
dissolving, where we should return success for !PageHuge() case because
the given hugepage is considered as already dissolved.
This change also affects other callers of dissolve_free_huge_page(), which
are cleaned up together.
Link: http://lkml.kernel.org/r/1560154686-18497-3-git-send-email-n-horiguchi@ah.j…
Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
Signed-off-by: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Reported-by: Chen, Jerry T <jerry.t.chen(a)intel.com>
Tested-by: Chen, Jerry T <jerry.t.chen(a)intel.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Xishi Qiu <xishi.qiuxishi(a)alibaba-inc.com>
Cc: "Chen, Jerry T" <jerry.t.chen(a)intel.com>
Cc: "Zhuo, Qiuxu" <qiuxu.zhuo(a)intel.com>
Cc: <stable(a)vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 15 +++++++++------
mm/memory-failure.c | 5 +----
2 files changed, 10 insertions(+), 10 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-soft-offline-dissolve_free_huge_page-return-zero-on-pagehuge
+++ a/mm/hugetlb.c
@@ -1519,7 +1519,12 @@ int dissolve_free_huge_page(struct page
int rc = -EBUSY;
spin_lock(&hugetlb_lock);
- if (PageHuge(page) && !page_count(page)) {
+ if (!PageHuge(page)) {
+ rc = 0;
+ goto out;
+ }
+
+ if (!page_count(page)) {
struct page *head = compound_head(page);
struct hstate *h = page_hstate(head);
int nid = page_to_nid(head);
@@ -1564,11 +1569,9 @@ int dissolve_free_huge_pages(unsigned lo
for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
page = pfn_to_page(pfn);
- if (PageHuge(page) && !page_count(page)) {
- rc = dissolve_free_huge_page(page);
- if (rc)
- break;
- }
+ rc = dissolve_free_huge_page(page);
+ if (rc)
+ break;
}
return rc;
--- a/mm/memory-failure.c~mm-hugetlb-soft-offline-dissolve_free_huge_page-return-zero-on-pagehuge
+++ a/mm/memory-failure.c
@@ -1856,11 +1856,8 @@ static int soft_offline_in_use_page(stru
static int soft_offline_free_page(struct page *page)
{
- int rc = 0;
- struct page *head = compound_head(page);
+ int rc = dissolve_free_huge_page(page);
- if (PageHuge(head))
- rc = dissolve_free_huge_page(page);
if (!rc) {
if (set_hwpoison_free_buddy_page(page))
num_poisoned_pages_inc();
_
Patches currently in -mm which might be from n-horiguchi(a)ah.jp.nec.com are
The patch titled
Subject: mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails
has been removed from the -mm tree. Its filename was
mm-soft-offline-return-ebusy-if-set_hwpoison_free_buddy_page-fails.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Subject: mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails
The pass/fail of soft offline should be judged by checking whether the
raw error page was finally contained or not (i.e. the result of
set_hwpoison_free_buddy_page()), but current code do not work like
that. It might lead us to misjudge the test result when
set_hwpoison_free_buddy_page() fails.
Without this fix, there are cases where madvise(MADV_SOFT_OFFLINE) may
not offline the original page and will not return an error.
Link: http://lkml.kernel.org/r/1560154686-18497-2-git-send-email-n-horiguchi@ah.j…
Signed-off-by: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Xishi Qiu <xishi.qiuxishi(a)alibaba-inc.com>
Cc: "Chen, Jerry T" <jerry.t.chen(a)intel.com>
Cc: "Zhuo, Qiuxu" <qiuxu.zhuo(a)intel.com>
Cc: <stable(a)vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/memory-failure.c~mm-soft-offline-return-ebusy-if-set_hwpoison_free_buddy_page-fails
+++ a/mm/memory-failure.c
@@ -1730,6 +1730,8 @@ static int soft_offline_huge_page(struct
if (!ret) {
if (set_hwpoison_free_buddy_page(page))
num_poisoned_pages_inc();
+ else
+ ret = -EBUSY;
}
}
return ret;
_
Patches currently in -mm which might be from n-horiguchi(a)ah.jp.nec.com are
The patch titled
Subject: signal: remove the wrong signal_pending() check in restore_user_sigmask()
has been removed from the -mm tree. Its filename was
signal-remove-the-wrong-signal_pending-check-in-restore_user_sigmask.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Oleg Nesterov <oleg(a)redhat.com>
Subject: signal: remove the wrong signal_pending() check in restore_user_sigmask()
This is the minimal fix for stable, I'll send cleanups later.
854a6ed56839a40f6b5 ("signal: Add restore_user_sigmask()") introduced the
visible change which breaks user-space: a signal temporary unblocked by
set_user_sigmask() can be delivered even if the caller returns success or
timeout.
Change restore_user_sigmask() to accept the additional "interrupted"
argument which should be used instead of signal_pending() check, and
update the callers.
Eric said:
: For clarity. I don't think this is required by posix, or fundamentally to
: remove the races in select. It is what linux has always done and we have
: applications who care so I agree this fix is needed.
:
: Further in any case where the semantic change that this patch rolls back
: (aka where allowing a signal to be delivered and the select like call to
: complete) would be advantage we can do as well if not better by using
: signalfd.
:
: Michael is there any chance we can get this guarantee of the linux
: implementation of pselect and friends clearly documented. The guarantee
: that if the system call completes successfully we are guaranteed that no
: signal that is unblocked by using sigmask will be delivered?
Link: http://lkml.kernel.org/r/20190604134117.GA29963@redhat.com
Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add restore_user_sigmask()")
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Reported-by: Eric Wong <e(a)80x24.org>
Tested-by: Eric Wong <e(a)80x24.org>
Acked-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
Acked-by: Arnd Bergmann <arnd(a)arndb.de>
Acked-by: Deepa Dinamani <deepa.kernel(a)gmail.com>
Cc: Michael Kerrisk <mtk.manpages(a)gmail.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Jason Baron <jbaron(a)akamai.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Al Viro <viro(a)ZenIV.linux.org.uk>
Cc: David Laight <David.Laight(a)ACULAB.COM>
Cc: <stable(a)vger.kernel.org> [5.0+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/aio.c | 28 ++++++++++++++++++++--------
fs/eventpoll.c | 4 ++--
fs/io_uring.c | 7 ++++---
fs/select.c | 18 ++++++------------
include/linux/signal.h | 2 +-
kernel/signal.c | 5 +++--
6 files changed, 36 insertions(+), 28 deletions(-)
--- a/fs/aio.c~signal-remove-the-wrong-signal_pending-check-in-restore_user_sigmask
+++ a/fs/aio.c
@@ -2095,6 +2095,7 @@ SYSCALL_DEFINE6(io_pgetevents,
struct __aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 ts;
+ bool interrupted;
int ret;
if (timeout && unlikely(get_timespec64(&ts, timeout)))
@@ -2108,8 +2109,10 @@ SYSCALL_DEFINE6(io_pgetevents,
return ret;
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
- restore_user_sigmask(ksig.sigmask, &sigsaved);
- if (signal_pending(current) && !ret)
+
+ interrupted = signal_pending(current);
+ restore_user_sigmask(ksig.sigmask, &sigsaved, interrupted);
+ if (interrupted && !ret)
ret = -ERESTARTNOHAND;
return ret;
@@ -2128,6 +2131,7 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
struct __aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 ts;
+ bool interrupted;
int ret;
if (timeout && unlikely(get_old_timespec32(&ts, timeout)))
@@ -2142,8 +2146,10 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
return ret;
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
- restore_user_sigmask(ksig.sigmask, &sigsaved);
- if (signal_pending(current) && !ret)
+
+ interrupted = signal_pending(current);
+ restore_user_sigmask(ksig.sigmask, &sigsaved, interrupted);
+ if (interrupted && !ret)
ret = -ERESTARTNOHAND;
return ret;
@@ -2193,6 +2199,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
struct __compat_aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 t;
+ bool interrupted;
int ret;
if (timeout && get_old_timespec32(&t, timeout))
@@ -2206,8 +2213,10 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
return ret;
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
- restore_user_sigmask(ksig.sigmask, &sigsaved);
- if (signal_pending(current) && !ret)
+
+ interrupted = signal_pending(current);
+ restore_user_sigmask(ksig.sigmask, &sigsaved, interrupted);
+ if (interrupted && !ret)
ret = -ERESTARTNOHAND;
return ret;
@@ -2226,6 +2235,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_tim
struct __compat_aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 t;
+ bool interrupted;
int ret;
if (timeout && get_timespec64(&t, timeout))
@@ -2239,8 +2249,10 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_tim
return ret;
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
- restore_user_sigmask(ksig.sigmask, &sigsaved);
- if (signal_pending(current) && !ret)
+
+ interrupted = signal_pending(current);
+ restore_user_sigmask(ksig.sigmask, &sigsaved, interrupted);
+ if (interrupted && !ret)
ret = -ERESTARTNOHAND;
return ret;
--- a/fs/eventpoll.c~signal-remove-the-wrong-signal_pending-check-in-restore_user_sigmask
+++ a/fs/eventpoll.c
@@ -2325,7 +2325,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd,
error = do_epoll_wait(epfd, events, maxevents, timeout);
- restore_user_sigmask(sigmask, &sigsaved);
+ restore_user_sigmask(sigmask, &sigsaved, error == -EINTR);
return error;
}
@@ -2350,7 +2350,7 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int,
err = do_epoll_wait(epfd, events, maxevents, timeout);
- restore_user_sigmask(sigmask, &sigsaved);
+ restore_user_sigmask(sigmask, &sigsaved, err == -EINTR);
return err;
}
--- a/fs/io_uring.c~signal-remove-the-wrong-signal_pending-check-in-restore_user_sigmask
+++ a/fs/io_uring.c
@@ -2201,11 +2201,12 @@ static int io_cqring_wait(struct io_ring
}
ret = wait_event_interruptible(ctx->wait, io_cqring_events(ring) >= min_events);
- if (ret == -ERESTARTSYS)
- ret = -EINTR;
if (sig)
- restore_user_sigmask(sig, &sigsaved);
+ restore_user_sigmask(sig, &sigsaved, ret == -ERESTARTSYS);
+
+ if (ret == -ERESTARTSYS)
+ ret = -EINTR;
return READ_ONCE(ring->r.head) == READ_ONCE(ring->r.tail) ? ret : 0;
}
--- a/fs/select.c~signal-remove-the-wrong-signal_pending-check-in-restore_user_sigmask
+++ a/fs/select.c
@@ -758,10 +758,9 @@ static long do_pselect(int n, fd_set __u
return ret;
ret = core_sys_select(n, inp, outp, exp, to);
+ restore_user_sigmask(sigmask, &sigsaved, ret == -ERESTARTNOHAND);
ret = poll_select_copy_remaining(&end_time, tsp, type, ret);
- restore_user_sigmask(sigmask, &sigsaved);
-
return ret;
}
@@ -1106,8 +1105,7 @@ SYSCALL_DEFINE5(ppoll, struct pollfd __u
ret = do_sys_poll(ufds, nfds, to);
- restore_user_sigmask(sigmask, &sigsaved);
-
+ restore_user_sigmask(sigmask, &sigsaved, ret == -EINTR);
/* We can restart this syscall, usually */
if (ret == -EINTR)
ret = -ERESTARTNOHAND;
@@ -1142,8 +1140,7 @@ SYSCALL_DEFINE5(ppoll_time32, struct pol
ret = do_sys_poll(ufds, nfds, to);
- restore_user_sigmask(sigmask, &sigsaved);
-
+ restore_user_sigmask(sigmask, &sigsaved, ret == -EINTR);
/* We can restart this syscall, usually */
if (ret == -EINTR)
ret = -ERESTARTNOHAND;
@@ -1350,10 +1347,9 @@ static long do_compat_pselect(int n, com
return ret;
ret = compat_core_sys_select(n, inp, outp, exp, to);
+ restore_user_sigmask(sigmask, &sigsaved, ret == -ERESTARTNOHAND);
ret = poll_select_copy_remaining(&end_time, tsp, type, ret);
- restore_user_sigmask(sigmask, &sigsaved);
-
return ret;
}
@@ -1425,8 +1421,7 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time32, str
ret = do_sys_poll(ufds, nfds, to);
- restore_user_sigmask(sigmask, &sigsaved);
-
+ restore_user_sigmask(sigmask, &sigsaved, ret == -EINTR);
/* We can restart this syscall, usually */
if (ret == -EINTR)
ret = -ERESTARTNOHAND;
@@ -1461,8 +1456,7 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time64, str
ret = do_sys_poll(ufds, nfds, to);
- restore_user_sigmask(sigmask, &sigsaved);
-
+ restore_user_sigmask(sigmask, &sigsaved, ret == -EINTR);
/* We can restart this syscall, usually */
if (ret == -EINTR)
ret = -ERESTARTNOHAND;
--- a/include/linux/signal.h~signal-remove-the-wrong-signal_pending-check-in-restore_user_sigmask
+++ a/include/linux/signal.h
@@ -276,7 +276,7 @@ extern int sigprocmask(int, sigset_t *,
extern int set_user_sigmask(const sigset_t __user *usigmask, sigset_t *set,
sigset_t *oldset, size_t sigsetsize);
extern void restore_user_sigmask(const void __user *usigmask,
- sigset_t *sigsaved);
+ sigset_t *sigsaved, bool interrupted);
extern void set_current_blocked(sigset_t *);
extern void __set_current_blocked(const sigset_t *);
extern int show_unhandled_signals;
--- a/kernel/signal.c~signal-remove-the-wrong-signal_pending-check-in-restore_user_sigmask
+++ a/kernel/signal.c
@@ -2912,7 +2912,8 @@ EXPORT_SYMBOL(set_compat_user_sigmask);
* This is useful for syscalls such as ppoll, pselect, io_pgetevents and
* epoll_pwait where a new sigmask is passed in from userland for the syscalls.
*/
-void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
+void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved,
+ bool interrupted)
{
if (!usigmask)
@@ -2922,7 +2923,7 @@ void restore_user_sigmask(const void __u
* Restoring sigmask here can lead to delivering signals that the above
* syscalls are intended to block because of the sigmask passed in.
*/
- if (signal_pending(current)) {
+ if (interrupted) {
current->saved_sigmask = *sigsaved;
set_restore_sigmask();
return;
_
Patches currently in -mm which might be from oleg(a)redhat.com are
signal-simplify-set_user_sigmask-restore_user_sigmask.patch
select-change-do_poll-to-return-erestartnohand-rather-than-eintr.patch
select-shift-restore_saved_sigmask_unless-into-poll_select_copy_remaining.patch
aio-simplify-read_events.patch
The patch titled
Subject: fs/binfmt_flat.c: make load_flat_shared_library() work
has been removed from the -mm tree. Its filename was
binfmt_flat-make-load_flat_shared_library-work.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: fs/binfmt_flat.c: make load_flat_shared_library() work
load_flat_shared_library() is broken: It only calls load_flat_file() if
prepare_binprm() returns zero, but prepare_binprm() returns the number of
bytes read - so this only happens if the file is empty.
Instead, call into load_flat_file() if the number of bytes read is
non-negative. (Even if the number of bytes is zero - in that case,
load_flat_file() will see nullbytes and return a nice -ENOEXEC.)
In addition, remove the code related to bprm creds and stop using
prepare_binprm() - this code is loading a library, not a main executable,
and it only actually uses the members "buf", "file" and "filename" of the
linux_binprm struct. Instead, call kernel_read() directly.
Link: http://lkml.kernel.org/r/20190524201817.16509-1-jannh@google.com
Fixes: 287980e49ffc ("remove lots of IS_ERR_VALUE abuses")
Signed-off-by: Jann Horn <jannh(a)google.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Nicolas Pitre <nicolas.pitre(a)linaro.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
Cc: Russell King <linux(a)armlinux.org.uk>
Cc: Greg Ungerer <gerg(a)linux-m68k.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/binfmt_flat.c | 23 +++++++----------------
1 file changed, 7 insertions(+), 16 deletions(-)
--- a/fs/binfmt_flat.c~binfmt_flat-make-load_flat_shared_library-work
+++ a/fs/binfmt_flat.c
@@ -856,9 +856,14 @@ err:
static int load_flat_shared_library(int id, struct lib_info *libs)
{
+ /*
+ * This is a fake bprm struct; only the members "buf", "file" and
+ * "filename" are actually used.
+ */
struct linux_binprm bprm;
int res;
char buf[16];
+ loff_t pos = 0;
memset(&bprm, 0, sizeof(bprm));
@@ -872,25 +877,11 @@ static int load_flat_shared_library(int
if (IS_ERR(bprm.file))
return res;
- bprm.cred = prepare_exec_creds();
- res = -ENOMEM;
- if (!bprm.cred)
- goto out;
-
- /* We don't really care about recalculating credentials at this point
- * as we're past the point of no return and are dealing with shared
- * libraries.
- */
- bprm.called_set_creds = 1;
-
- res = prepare_binprm(&bprm);
+ res = kernel_read(bprm.file, bprm.buf, BINPRM_BUF_SIZE, &pos);
- if (!res)
+ if (res >= 0)
res = load_flat_file(&bprm, libs, id, NULL);
- abort_creds(bprm.cred);
-
-out:
allow_write_access(bprm.file);
fput(bprm.file);
_
Patches currently in -mm which might be from jannh(a)google.com are
The patch titled
Subject: mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemask
has been removed from the -mm tree. Its filename was
mm-mempolicy-fix-an-incorrect-rebind-node-in-mpol_rebind_nodemask.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: zhong jiang <zhongjiang(a)huawei.com>
Subject: mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemask
mpol_rebind_nodemask() is called for MPOL_BIND and MPOL_INTERLEAVE
mempoclicies when the tasks's cpuset's mems_allowed changes. For
policies created without MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES,
it works by remapping the policy's allowed nodes (stored in v.nodes)
using the previous value of mems_allowed (stored in
w.cpuset_mems_allowed) as the domain of map and the new mems_allowed
(passed as nodes) as the range of the map (see the comment of
bitmap_remap() for details).
The result of remapping is stored back as policy's nodemask in v.nodes,
and the new value of mems_allowed should be stored in
w.cpuset_mems_allowed to facilitate the next rebind, if it happens.
However, 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies
when updating cpusets") introduced a bug where the result of remapping
is stored in w.cpuset_mems_allowed instead. Thus, a mempolicy's
allowed nodes can evolve in an unexpected way after a series of
rebinding due to cpuset mems_allowed changes, possibly binding to a
wrong node or a smaller number of nodes which may e.g. overload them.
This patch fixes the bug so rebinding again works as intended.
[vbabka(a)suse.cz: new changlog]
Link: http://lkml.kernel.org/r/ef6a69c6-c052-b067-8f2c-9d615c619bb9@suse.cz
Link: http://lkml.kernel.org/r/1558768043-23184-1-git-send-email-zhongjiang@huawe…
Fixes: 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets")
Signed-off-by: zhong jiang <zhongjiang(a)huawei.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Anshuman Khandual <khandual(a)linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mempolicy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/mempolicy.c~mm-mempolicy-fix-an-incorrect-rebind-node-in-mpol_rebind_nodemask
+++ a/mm/mempolicy.c
@@ -306,7 +306,7 @@ static void mpol_rebind_nodemask(struct
else {
nodes_remap(tmp, pol->v.nodes,pol->w.cpuset_mems_allowed,
*nodes);
- pol->w.cpuset_mems_allowed = tmp;
+ pol->w.cpuset_mems_allowed = *nodes;
}
if (nodes_empty(tmp))
_
Patches currently in -mm which might be from zhongjiang(a)huawei.com are
The patch titled
Subject: fs/proc/array.c: allow reporting eip/esp for all coredumping threads
has been removed from the -mm tree. Its filename was
fs-proc-allow-reporting-eip-esp-for-all-coredumping-threads.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: John Ogness <john.ogness(a)linutronix.de>
Subject: fs/proc/array.c: allow reporting eip/esp for all coredumping threads
0a1eb2d474ed ("fs/proc: Stop reporting eip and esp in /proc/PID/stat")
stopped reporting eip/esp and fd7d56270b52 ("fs/proc: Report eip/esp in
/prod/PID/stat for coredumping") reintroduced the feature to fix a
regression with userspace core dump handlers (such as minicoredumper).
Because PF_DUMPCORE is only set for the primary thread, this didn't fix
the original problem for secondary threads. Allow reporting the eip/esp
for all threads by checking for PF_EXITING as well. This is set for all
the other threads when they are killed. coredump_wait() waits for all the
tasks to become inactive before proceeding to invoke a core dumper.
Link: http://lkml.kernel.org/r/87y32p7i7a.fsf@linutronix.de
Link: http://lkml.kernel.org/r/20190522161614.628-1-jlu@pengutronix.de
Fixes: fd7d56270b526ca3 ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping")
Signed-off-by: John Ogness <john.ogness(a)linutronix.de>
Reported-by: Jan Luebbe <jlu(a)pengutronix.de>
Tested-by: Jan Luebbe <jlu(a)pengutronix.de>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/array.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/proc/array.c~fs-proc-allow-reporting-eip-esp-for-all-coredumping-threads
+++ a/fs/proc/array.c
@@ -462,7 +462,7 @@ static int do_task_stat(struct seq_file
* a program is not able to use ptrace(2) in that case. It is
* safe because the task has stopped executing permanently.
*/
- if (permitted && (task->flags & PF_DUMPCORE)) {
+ if (permitted && (task->flags & (PF_EXITING|PF_DUMPCORE))) {
if (try_get_task_stack(task)) {
eip = KSTK_EIP(task);
esp = KSTK_ESP(task);
_
Patches currently in -mm which might be from john.ogness(a)linutronix.de are