The patch titled
Subject: mm,hwpoison: fix race with compound page allocation
has been added to the -mm tree. Its filename is
mmhwpoison-fix-race-with-compound-page-allocation.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mmhwpoison-fix-race-with-compound…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mmhwpoison-fix-race-with-compound…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Subject: mm,hwpoison: fix race with compound page allocation
When hugetlb page fault (under overcommiting situation) and
memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following
race:
CPU0: CPU1:
gather_surplus_pages()
page = alloc_surplus_huge_page()
memory_failure_hugetlb()
get_hwpoison_page(page)
__get_hwpoison_page(page)
get_page_unless_zero(page)
zero = put_page_testzero(page)
VM_BUG_ON_PAGE(!zero, page)
enqueue_huge_page(h, page)
put_page(page)
__get_hwpoison_page() only checks page refcount before taking additional
one for memory error handling, which is wrong because there's time windows
where compound pages have non-zero refcount during initialization.
So make __get_hwpoison_page() check page status a bit more for a few types
of compound pages. PageSlab() check is added because otherwise "non
anonymous thp" path is wrongly chosen.
Link: https://lkml.kernel.org/r/20210511151016.2310627-2-nao.horiguchi@gmail.com
Fixes: ead07f6a867b ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reported-by: Muchun Song <songmuchun(a)bytedance.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Tony Luck <tony.luck(a)intel.com>
Cc: <stable(a)vger.kernel.org> [5.12+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 55 +++++++++++++++++++++++++-----------------
1 file changed, 34 insertions(+), 21 deletions(-)
--- a/mm/memory-failure.c~mmhwpoison-fix-race-with-compound-page-allocation
+++ a/mm/memory-failure.c
@@ -960,30 +960,43 @@ static int __get_hwpoison_page(struct pa
{
struct page *head = compound_head(page);
- if (!PageHuge(head) && PageTransHuge(head)) {
- /*
- * Non anonymous thp exists only in allocation/free time. We
- * can't handle such a case correctly, so let's give it up.
- * This should be better than triggering BUG_ON when kernel
- * tries to touch the "partially handled" page.
- */
- if (!PageAnon(head)) {
- pr_err("Memory failure: %#lx: non anonymous thp\n",
- page_to_pfn(page));
- return 0;
- }
- }
+ if (PageCompound(page)) {
+ if (PageSlab(page)) {
+ return get_page_unless_zero(page);
+ } else if (PageHuge(head)) {
+ int ret = 0;
- if (get_page_unless_zero(head)) {
- if (head == compound_head(page))
- return 1;
-
- pr_info("Memory failure: %#lx cannot catch tail\n",
- page_to_pfn(page));
- put_page(head);
+ spin_lock(&hugetlb_lock);
+ if (!PageHuge(head))
+ ret = -EBUSY;
+ else if (HPageFreed(head) || HPageMigratable(head))
+ ret = get_page_unless_zero(head);
+ spin_unlock(&hugetlb_lock);
+ return ret;
+ } else if (PageTransHuge(head)) {
+ /*
+ * Non anonymous thp exists only in allocation/free time. We
+ * can't handle such a case correctly, so let's give it up.
+ * This should be better than triggering BUG_ON when kernel
+ * tries to touch the "partially handled" page.
+ */
+ if (!PageAnon(head)) {
+ pr_err("Memory failure: %#lx: non anonymous thp\n",
+ page_to_pfn(page));
+ return 0;
+ }
+ if (get_page_unless_zero(head)) {
+ if (head == compound_head(page))
+ return 1;
+ pr_info("Memory failure: %#lx cannot catch tail\n",
+ page_to_pfn(page));
+ put_page(head);
+ }
+ }
+ return 0;
}
- return 0;
+ return get_page_unless_zero(page);
}
/*
_
Patches currently in -mm which might be from naoya.horiguchi(a)nec.com are
mmhwpoison-fix-race-with-compound-page-allocation.patch
mmhwpoison-make-get_hwpoison_page-call-get_any_page.patch
The patch titled
Subject: ipc/mqueue, msg, sem: Avoid relying on a stack reference past its expiry
has been added to the -mm tree. Its filename is
ipc-mqueue-msg-sem-avoid-relying-on-a-stack-reference-past-its-expiry.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/ipc-mqueue-msg-sem-avoid-relying-…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/ipc-mqueue-msg-sem-avoid-relying-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Varad Gautam <varad.gautam(a)suse.com>
Subject: ipc/mqueue, msg, sem: Avoid relying on a stack reference past its expiry
do_mq_timedreceive calls wq_sleep with a stack local address. The sender
(do_mq_timedsend) uses this address to later call pipelined_send.
This leads to a very hard to trigger race where a do_mq_timedreceive call
might return and leave do_mq_timedsend to rely on an invalid address,
causing the following crash:
[ 240.739977] RIP: 0010:wake_q_add_safe+0x13/0x60
[ 240.739991] Call Trace:
[ 240.739999] __x64_sys_mq_timedsend+0x2a9/0x490
[ 240.740003] ? auditd_test_task+0x38/0x40
[ 240.740007] ? auditd_test_task+0x38/0x40
[ 240.740011] do_syscall_64+0x80/0x680
[ 240.740017] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 240.740019] RIP: 0033:0x7f5928e40343
The race occurs as:
1. do_mq_timedreceive calls wq_sleep with the address of `struct
ext_wait_queue` on function stack (aliased as `ewq_addr` here) - it
holds a valid `struct ext_wait_queue *` as long as the stack has not
been overwritten.
2. `ewq_addr` gets added to info->e_wait_q[RECV].list in wq_add, and
do_mq_timedsend receives it via wq_get_first_waiter(info, RECV) to call
__pipelined_op.
3. Sender calls __pipelined_op::smp_store_release(&this->state,
STATE_READY). Here is where the race window begins. (`this` is
`ewq_addr`.)
4. If the receiver wakes up now in do_mq_timedreceive::wq_sleep, it
will see `state == STATE_READY` and break.
5. do_mq_timedreceive returns, and `ewq_addr` is no longer guaranteed
to be a `struct ext_wait_queue *` since it was on do_mq_timedreceive's
stack. (Although the address may not get overwritten until another
function happens to touch it, which means it can persist around for an
indefinite time.)
6. do_mq_timedsend::__pipelined_op() still believes `ewq_addr` is a
`struct ext_wait_queue *`, and uses it to find a task_struct to pass to
the wake_q_add_safe call. In the lucky case where nothing has
overwritten `ewq_addr` yet, `ewq_addr->task` is the right task_struct.
In the unlucky case, __pipelined_op::wake_q_add_safe gets handed a
bogus address as the receiver's task_struct causing the crash.
do_mq_timedsend::__pipelined_op() should not dereference `this` after
setting STATE_READY, as the receiver counterpart is now free to return.
Change __pipelined_op to call wake_q_add_safe on the receiver's
task_struct returned by get_task_struct, instead of dereferencing `this`
which sits on the receiver's stack.
As Manfred pointed out, the race potentially also exists in
ipc/msg.c::expunge_all and ipc/sem.c::wake_up_sem_queue_prepare. Fix
those in the same way.
Link: https://lkml.kernel.org/r/20210510102950.12551-1-varad.gautam@suse.com
Fixes: c5b2cbdbdac563 ("ipc/mqueue.c: update/document memory barriers")
Fixes: 8116b54e7e23ef ("ipc/sem.c: document and update memory barriers")
Fixes: 0d97a82ba830d8 ("ipc/msg.c: update and document memory barriers")
Signed-off-by: Varad Gautam <varad.gautam(a)suse.com>
Reported-by: Matthias von Faber <matthias.vonfaber(a)aox-tech.de>
Cc: Christian Brauner <christian.brauner(a)ubuntu.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Manfred Spraul <manfred(a)colorfullife.com>
Cc: Davidlohr Bueso <dbueso(a)suse.de>
Cc: Manfred Spraul <manfred(a)colorfullife.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
ipc/mqueue.c | 6 ++++--
ipc/msg.c | 6 ++++--
ipc/sem.c | 6 ++++--
3 files changed, 12 insertions(+), 6 deletions(-)
--- a/ipc/mqueue.c~ipc-mqueue-msg-sem-avoid-relying-on-a-stack-reference-past-its-expiry
+++ a/ipc/mqueue.c
@@ -1004,12 +1004,14 @@ static inline void __pipelined_op(struct
struct mqueue_inode_info *info,
struct ext_wait_queue *this)
{
+ struct task_struct *task;
+
list_del(&this->list);
- get_task_struct(this->task);
+ task = get_task_struct(this->task);
/* see MQ_BARRIER for purpose/pairing */
smp_store_release(&this->state, STATE_READY);
- wake_q_add_safe(wake_q, this->task);
+ wake_q_add_safe(wake_q, task);
}
/* pipelined_send() - send a message directly to the task waiting in
--- a/ipc/msg.c~ipc-mqueue-msg-sem-avoid-relying-on-a-stack-reference-past-its-expiry
+++ a/ipc/msg.c
@@ -251,11 +251,13 @@ static void expunge_all(struct msg_queue
struct msg_receiver *msr, *t;
list_for_each_entry_safe(msr, t, &msq->q_receivers, r_list) {
- get_task_struct(msr->r_tsk);
+ struct task_struct *r_tsk;
+
+ r_tsk = get_task_struct(msr->r_tsk);
/* see MSG_BARRIER for purpose/pairing */
smp_store_release(&msr->r_msg, ERR_PTR(res));
- wake_q_add_safe(wake_q, msr->r_tsk);
+ wake_q_add_safe(wake_q, r_tsk);
}
}
--- a/ipc/sem.c~ipc-mqueue-msg-sem-avoid-relying-on-a-stack-reference-past-its-expiry
+++ a/ipc/sem.c
@@ -784,12 +784,14 @@ would_block:
static inline void wake_up_sem_queue_prepare(struct sem_queue *q, int error,
struct wake_q_head *wake_q)
{
- get_task_struct(q->sleeper);
+ struct task_struct *sleeper;
+
+ sleeper = get_task_struct(q->sleeper);
/* see SEM_BARRIER_2 for purpose/pairing */
smp_store_release(&q->status, error);
- wake_q_add_safe(wake_q, q->sleeper);
+ wake_q_add_safe(wake_q, sleeper);
}
static void unlink_queue(struct sem_array *sma, struct sem_queue *q)
_
Patches currently in -mm which might be from varad.gautam(a)suse.com are
ipc-mqueue-msg-sem-avoid-relying-on-a-stack-reference-past-its-expiry.patch
EVM_ALLOW_METADATA_WRITES is an EVM initialization flag that can be set to
temporarily disable metadata verification until all xattrs/attrs necessary
to verify an EVM portable signature are copied to the file. This flag is
cleared when EVM is initialized with an HMAC key, to avoid that the HMAC is
calculated on unverified xattrs/attrs.
Currently EVM unnecessarily denies setting this flag if EVM is initialized
with a public key, which is not a concern as it cannot be used to trust
xattrs/attrs updates. This patch removes this limitation.
Cc: stable(a)vger.kernel.org # 4.16.x
Fixes: ae1ba1676b88e ("EVM: Allow userland to permit modification of EVM-protected metadata")
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
---
Documentation/ABI/testing/evm | 5 +++--
security/integrity/evm/evm_secfs.c | 5 ++---
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/Documentation/ABI/testing/evm b/Documentation/ABI/testing/evm
index 3c477ba48a31..eb6d70fd6fa2 100644
--- a/Documentation/ABI/testing/evm
+++ b/Documentation/ABI/testing/evm
@@ -49,8 +49,9 @@ Description:
modification of EVM-protected metadata and
disable all further modification of policy
- Note that once a key has been loaded, it will no longer be
- possible to enable metadata modification.
+ Note that once an HMAC key has been loaded, it will no longer
+ be possible to enable metadata modification and, if it is
+ already enabled, it will be disabled.
Until key loading has been signaled EVM can not create
or validate the 'security.evm' xattr, but returns
diff --git a/security/integrity/evm/evm_secfs.c b/security/integrity/evm/evm_secfs.c
index bbc85637e18b..860c48b9a0c3 100644
--- a/security/integrity/evm/evm_secfs.c
+++ b/security/integrity/evm/evm_secfs.c
@@ -81,11 +81,10 @@ static ssize_t evm_write_key(struct file *file, const char __user *buf,
return -EINVAL;
/* Don't allow a request to freshly enable metadata writes if
- * keys are loaded.
+ * an HMAC key is loaded.
*/
if ((i & EVM_ALLOW_METADATA_WRITES) &&
- ((evm_initialized & EVM_KEY_MASK) != 0) &&
- !(evm_initialized & EVM_ALLOW_METADATA_WRITES))
+ (evm_initialized & EVM_INIT_HMAC) != 0)
return -EPERM;
if (i & EVM_INIT_HMAC) {
--
2.25.1
The following commit has been merged into the timers/urgent branch of tip:
Commit-ID: e09784a8a751e539dffc94d43bc917b0ac1e934a
Gitweb: https://git.kernel.org/tip/e09784a8a751e539dffc94d43bc917b0ac1e934a
Author: Alexandre Belloni <alexandre.belloni(a)bootlin.com>
AuthorDate: Tue, 11 May 2021 03:45:16 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Tue, 11 May 2021 21:28:04 +02:00
alarmtimer: Check RTC features instead of ops
RTC drivers used to leave .set_alarm() NULL in order to signal the RTC
device doesn't support alarms. The drivers are now clearing the
RTC_FEATURE_ALARM bit for that purpose in order to keep the rtc_class_ops
structure const. So now, .set_alarm() is set unconditionally and this
possibly causes the alarmtimer code to select an RTC device that doesn't
support alarms.
Test RTC_FEATURE_ALARM instead of relying on ops->set_alarm to determine
whether alarms are available.
Fixes: 7ae41220ef58 ("rtc: introduce features bitfield")
Signed-off-by: Alexandre Belloni <alexandre.belloni(a)bootlin.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210511014516.563031-1-alexandre.belloni@bootlin…
---
kernel/time/alarmtimer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index bea9d08..5897828 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -92,7 +92,7 @@ static int alarmtimer_rtc_add_device(struct device *dev,
if (rtcdev)
return -EBUSY;
- if (!rtc->ops->set_alarm)
+ if (!test_bit(RTC_FEATURE_ALARM, rtc->features))
return -1;
if (!device_may_wakeup(rtc->dev.parent))
return -1;