The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 84ff7a09c371bc7417eabfda19bf7f113ec917b6 Mon Sep 17 00:00:00 2001
From: Will Deacon <will.deacon(a)arm.com>
Date: Mon, 8 Apr 2019 12:45:09 +0100
Subject: [PATCH] arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero
result value
Rather embarrassingly, our futex() FUTEX_WAKE_OP implementation doesn't
explicitly set the return value on the non-faulting path and instead
leaves it holding the result of the underlying atomic operation. This
means that any FUTEX_WAKE_OP atomic operation which computes a non-zero
value will be reported as having failed. Regrettably, I wrote the buggy
code back in 2011 and it was upstreamed as part of the initial arm64
support in 2012.
The reasons we appear to get away with this are:
1. FUTEX_WAKE_OP is rarely used and therefore doesn't appear to get
exercised by futex() test applications
2. If the result of the atomic operation is zero, the system call
behaves correctly
3. Prior to version 2.25, the only operation used by GLIBC set the
futex to zero, and therefore worked as expected. From 2.25 onwards,
FUTEX_WAKE_OP is not used by GLIBC at all.
Fix the implementation by ensuring that the return value is either 0
to indicate that the atomic operation completed successfully, or -EFAULT
if we encountered a fault when accessing the user mapping.
Cc: <stable(a)kernel.org>
Fixes: 6170a97460db ("arm64: Atomic operations")
Signed-off-by: Will Deacon <will.deacon(a)arm.com>
diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
index cccb83ad7fa8..e1d95f08f8e1 100644
--- a/arch/arm64/include/asm/futex.h
+++ b/arch/arm64/include/asm/futex.h
@@ -30,8 +30,8 @@ do { \
" prfm pstl1strm, %2\n" \
"1: ldxr %w1, %2\n" \
insn "\n" \
-"2: stlxr %w3, %w0, %2\n" \
-" cbnz %w3, 1b\n" \
+"2: stlxr %w0, %w3, %2\n" \
+" cbnz %w0, 1b\n" \
" dmb ish\n" \
"3:\n" \
" .pushsection .fixup,\"ax\"\n" \
@@ -50,30 +50,30 @@ do { \
static inline int
arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *_uaddr)
{
- int oldval = 0, ret, tmp;
+ int oldval, ret, tmp;
u32 __user *uaddr = __uaccess_mask_ptr(_uaddr);
pagefault_disable();
switch (op) {
case FUTEX_OP_SET:
- __futex_atomic_op("mov %w0, %w4",
+ __futex_atomic_op("mov %w3, %w4",
ret, oldval, uaddr, tmp, oparg);
break;
case FUTEX_OP_ADD:
- __futex_atomic_op("add %w0, %w1, %w4",
+ __futex_atomic_op("add %w3, %w1, %w4",
ret, oldval, uaddr, tmp, oparg);
break;
case FUTEX_OP_OR:
- __futex_atomic_op("orr %w0, %w1, %w4",
+ __futex_atomic_op("orr %w3, %w1, %w4",
ret, oldval, uaddr, tmp, oparg);
break;
case FUTEX_OP_ANDN:
- __futex_atomic_op("and %w0, %w1, %w4",
+ __futex_atomic_op("and %w3, %w1, %w4",
ret, oldval, uaddr, tmp, ~oparg);
break;
case FUTEX_OP_XOR:
- __futex_atomic_op("eor %w0, %w1, %w4",
+ __futex_atomic_op("eor %w3, %w1, %w4",
ret, oldval, uaddr, tmp, oparg);
break;
default:
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6b4f4bc9cb22875f97023984a625386f0c7cc1c0 Mon Sep 17 00:00:00 2001
From: Will Deacon <will.deacon(a)arm.com>
Date: Thu, 28 Feb 2019 11:58:08 +0000
Subject: [PATCH] locking/futex: Allow low-level atomic operations to return
-EAGAIN
Some futex() operations, including FUTEX_WAKE_OP, require the kernel to
perform an atomic read-modify-write of the futex word via the userspace
mapping. These operations are implemented by each architecture in
arch_futex_atomic_op_inuser() and futex_atomic_cmpxchg_inatomic(), which
are called in atomic context with the relevant hash bucket locks held.
Although these routines may return -EFAULT in response to a page fault
generated when accessing userspace, they are expected to succeed (i.e.
return 0) in all other cases. This poses a problem for architectures
that do not provide bounded forward progress guarantees or fairness of
contended atomic operations and can lead to starvation in some cases.
In these problematic scenarios, we must return back to the core futex
code so that we can drop the hash bucket locks and reschedule if
necessary, much like we do in the case of a page fault.
Allow architectures to return -EAGAIN from their implementations of
arch_futex_atomic_op_inuser() and futex_atomic_cmpxchg_inatomic(), which
will cause the core futex code to reschedule if necessary and return
back to the architecture code later on.
Cc: <stable(a)kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Will Deacon <will.deacon(a)arm.com>
diff --git a/kernel/futex.c b/kernel/futex.c
index 9e40cf7be606..6262f1534ac9 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1311,13 +1311,15 @@ static int lookup_pi_state(u32 __user *uaddr, u32 uval,
static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
{
+ int err;
u32 uninitialized_var(curval);
if (unlikely(should_fail_futex(true)))
return -EFAULT;
- if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)))
- return -EFAULT;
+ err = cmpxchg_futex_value_locked(&curval, uaddr, uval, newval);
+ if (unlikely(err))
+ return err;
/* If user space value changed, let the caller retry */
return curval != uval ? -EAGAIN : 0;
@@ -1502,10 +1504,8 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_pi_state *pi_
if (unlikely(should_fail_futex(true)))
ret = -EFAULT;
- if (cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)) {
- ret = -EFAULT;
-
- } else if (curval != uval) {
+ ret = cmpxchg_futex_value_locked(&curval, uaddr, uval, newval);
+ if (!ret && (curval != uval)) {
/*
* If a unconditional UNLOCK_PI operation (user space did not
* try the TID->0 transition) raced with a waiter setting the
@@ -1700,32 +1700,32 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
double_lock_hb(hb1, hb2);
op_ret = futex_atomic_op_inuser(op, uaddr2);
if (unlikely(op_ret < 0)) {
-
double_unlock_hb(hb1, hb2);
-#ifndef CONFIG_MMU
- /*
- * we don't get EFAULT from MMU faults if we don't have an MMU,
- * but we might get them from range checking
- */
- ret = op_ret;
- goto out_put_keys;
-#endif
-
- if (unlikely(op_ret != -EFAULT)) {
+ if (!IS_ENABLED(CONFIG_MMU) ||
+ unlikely(op_ret != -EFAULT && op_ret != -EAGAIN)) {
+ /*
+ * we don't get EFAULT from MMU faults if we don't have
+ * an MMU, but we might get them from range checking
+ */
ret = op_ret;
goto out_put_keys;
}
- ret = fault_in_user_writeable(uaddr2);
- if (ret)
- goto out_put_keys;
+ if (op_ret == -EFAULT) {
+ ret = fault_in_user_writeable(uaddr2);
+ if (ret)
+ goto out_put_keys;
+ }
- if (!(flags & FLAGS_SHARED))
+ if (!(flags & FLAGS_SHARED)) {
+ cond_resched();
goto retry_private;
+ }
put_futex_key(&key2);
put_futex_key(&key1);
+ cond_resched();
goto retry;
}
@@ -2350,7 +2350,7 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
u32 uval, uninitialized_var(curval), newval;
struct task_struct *oldowner, *newowner;
u32 newtid;
- int ret;
+ int ret, err = 0;
lockdep_assert_held(q->lock_ptr);
@@ -2421,14 +2421,17 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
if (!pi_state->owner)
newtid |= FUTEX_OWNER_DIED;
- if (get_futex_value_locked(&uval, uaddr))
- goto handle_fault;
+ err = get_futex_value_locked(&uval, uaddr);
+ if (err)
+ goto handle_err;
for (;;) {
newval = (uval & FUTEX_OWNER_DIED) | newtid;
- if (cmpxchg_futex_value_locked(&curval, uaddr, uval, newval))
- goto handle_fault;
+ err = cmpxchg_futex_value_locked(&curval, uaddr, uval, newval);
+ if (err)
+ goto handle_err;
+
if (curval == uval)
break;
uval = curval;
@@ -2456,23 +2459,37 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
return 0;
/*
- * To handle the page fault we need to drop the locks here. That gives
- * the other task (either the highest priority waiter itself or the
- * task which stole the rtmutex) the chance to try the fixup of the
- * pi_state. So once we are back from handling the fault we need to
- * check the pi_state after reacquiring the locks and before trying to
- * do another fixup. When the fixup has been done already we simply
- * return.
+ * In order to reschedule or handle a page fault, we need to drop the
+ * locks here. In the case of a fault, this gives the other task
+ * (either the highest priority waiter itself or the task which stole
+ * the rtmutex) the chance to try the fixup of the pi_state. So once we
+ * are back from handling the fault we need to check the pi_state after
+ * reacquiring the locks and before trying to do another fixup. When
+ * the fixup has been done already we simply return.
*
* Note: we hold both hb->lock and pi_mutex->wait_lock. We can safely
* drop hb->lock since the caller owns the hb -> futex_q relation.
* Dropping the pi_mutex->wait_lock requires the state revalidate.
*/
-handle_fault:
+handle_err:
raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
spin_unlock(q->lock_ptr);
- ret = fault_in_user_writeable(uaddr);
+ switch (err) {
+ case -EFAULT:
+ ret = fault_in_user_writeable(uaddr);
+ break;
+
+ case -EAGAIN:
+ cond_resched();
+ ret = 0;
+ break;
+
+ default:
+ WARN_ON_ONCE(1);
+ ret = err;
+ break;
+ }
spin_lock(q->lock_ptr);
raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
@@ -3041,10 +3058,8 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
* A unconditional UNLOCK_PI op raced against a waiter
* setting the FUTEX_WAITERS bit. Try again.
*/
- if (ret == -EAGAIN) {
- put_futex_key(&key);
- goto retry;
- }
+ if (ret == -EAGAIN)
+ goto pi_retry;
/*
* wake_futex_pi has detected invalid state. Tell user
* space.
@@ -3059,9 +3074,19 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
* preserve the WAITERS bit not the OWNER_DIED one. We are the
* owner.
*/
- if (cmpxchg_futex_value_locked(&curval, uaddr, uval, 0)) {
+ if ((ret = cmpxchg_futex_value_locked(&curval, uaddr, uval, 0))) {
spin_unlock(&hb->lock);
- goto pi_faulted;
+ switch (ret) {
+ case -EFAULT:
+ goto pi_faulted;
+
+ case -EAGAIN:
+ goto pi_retry;
+
+ default:
+ WARN_ON_ONCE(1);
+ goto out_putkey;
+ }
}
/*
@@ -3075,6 +3100,11 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
put_futex_key(&key);
return ret;
+pi_retry:
+ put_futex_key(&key);
+ cond_resched();
+ goto retry;
+
pi_faulted:
put_futex_key(&key);
@@ -3435,6 +3465,7 @@ SYSCALL_DEFINE3(get_robust_list, int, pid,
static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr, int pi)
{
u32 uval, uninitialized_var(nval), mval;
+ int err;
/* Futex address must be 32bit aligned */
if ((((unsigned long)uaddr) % sizeof(*uaddr)) != 0)
@@ -3444,42 +3475,57 @@ static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr, int p
if (get_user(uval, uaddr))
return -1;
- if ((uval & FUTEX_TID_MASK) == task_pid_vnr(curr)) {
- /*
- * Ok, this dying thread is truly holding a futex
- * of interest. Set the OWNER_DIED bit atomically
- * via cmpxchg, and if the value had FUTEX_WAITERS
- * set, wake up a waiter (if any). (We have to do a
- * futex_wake() even if OWNER_DIED is already set -
- * to handle the rare but possible case of recursive
- * thread-death.) The rest of the cleanup is done in
- * userspace.
- */
- mval = (uval & FUTEX_WAITERS) | FUTEX_OWNER_DIED;
- /*
- * We are not holding a lock here, but we want to have
- * the pagefault_disable/enable() protection because
- * we want to handle the fault gracefully. If the
- * access fails we try to fault in the futex with R/W
- * verification via get_user_pages. get_user() above
- * does not guarantee R/W access. If that fails we
- * give up and leave the futex locked.
- */
- if (cmpxchg_futex_value_locked(&nval, uaddr, uval, mval)) {
+ if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
+ return 0;
+
+ /*
+ * Ok, this dying thread is truly holding a futex
+ * of interest. Set the OWNER_DIED bit atomically
+ * via cmpxchg, and if the value had FUTEX_WAITERS
+ * set, wake up a waiter (if any). (We have to do a
+ * futex_wake() even if OWNER_DIED is already set -
+ * to handle the rare but possible case of recursive
+ * thread-death.) The rest of the cleanup is done in
+ * userspace.
+ */
+ mval = (uval & FUTEX_WAITERS) | FUTEX_OWNER_DIED;
+
+ /*
+ * We are not holding a lock here, but we want to have
+ * the pagefault_disable/enable() protection because
+ * we want to handle the fault gracefully. If the
+ * access fails we try to fault in the futex with R/W
+ * verification via get_user_pages. get_user() above
+ * does not guarantee R/W access. If that fails we
+ * give up and leave the futex locked.
+ */
+ if ((err = cmpxchg_futex_value_locked(&nval, uaddr, uval, mval))) {
+ switch (err) {
+ case -EFAULT:
if (fault_in_user_writeable(uaddr))
return -1;
goto retry;
- }
- if (nval != uval)
+
+ case -EAGAIN:
+ cond_resched();
goto retry;
- /*
- * Wake robust non-PI futexes here. The wakeup of
- * PI futexes happens in exit_pi_state():
- */
- if (!pi && (uval & FUTEX_WAITERS))
- futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
+ default:
+ WARN_ON_ONCE(1);
+ return err;
+ }
}
+
+ if (nval != uval)
+ goto retry;
+
+ /*
+ * Wake robust non-PI futexes here. The wakeup of
+ * PI futexes happens in exit_pi_state():
+ */
+ if (!pi && (uval & FUTEX_WAITERS))
+ futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
+
return 0;
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6b4f4bc9cb22875f97023984a625386f0c7cc1c0 Mon Sep 17 00:00:00 2001
From: Will Deacon <will.deacon(a)arm.com>
Date: Thu, 28 Feb 2019 11:58:08 +0000
Subject: [PATCH] locking/futex: Allow low-level atomic operations to return
-EAGAIN
Some futex() operations, including FUTEX_WAKE_OP, require the kernel to
perform an atomic read-modify-write of the futex word via the userspace
mapping. These operations are implemented by each architecture in
arch_futex_atomic_op_inuser() and futex_atomic_cmpxchg_inatomic(), which
are called in atomic context with the relevant hash bucket locks held.
Although these routines may return -EFAULT in response to a page fault
generated when accessing userspace, they are expected to succeed (i.e.
return 0) in all other cases. This poses a problem for architectures
that do not provide bounded forward progress guarantees or fairness of
contended atomic operations and can lead to starvation in some cases.
In these problematic scenarios, we must return back to the core futex
code so that we can drop the hash bucket locks and reschedule if
necessary, much like we do in the case of a page fault.
Allow architectures to return -EAGAIN from their implementations of
arch_futex_atomic_op_inuser() and futex_atomic_cmpxchg_inatomic(), which
will cause the core futex code to reschedule if necessary and return
back to the architecture code later on.
Cc: <stable(a)kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Will Deacon <will.deacon(a)arm.com>
diff --git a/kernel/futex.c b/kernel/futex.c
index 9e40cf7be606..6262f1534ac9 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1311,13 +1311,15 @@ static int lookup_pi_state(u32 __user *uaddr, u32 uval,
static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
{
+ int err;
u32 uninitialized_var(curval);
if (unlikely(should_fail_futex(true)))
return -EFAULT;
- if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)))
- return -EFAULT;
+ err = cmpxchg_futex_value_locked(&curval, uaddr, uval, newval);
+ if (unlikely(err))
+ return err;
/* If user space value changed, let the caller retry */
return curval != uval ? -EAGAIN : 0;
@@ -1502,10 +1504,8 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_pi_state *pi_
if (unlikely(should_fail_futex(true)))
ret = -EFAULT;
- if (cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)) {
- ret = -EFAULT;
-
- } else if (curval != uval) {
+ ret = cmpxchg_futex_value_locked(&curval, uaddr, uval, newval);
+ if (!ret && (curval != uval)) {
/*
* If a unconditional UNLOCK_PI operation (user space did not
* try the TID->0 transition) raced with a waiter setting the
@@ -1700,32 +1700,32 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
double_lock_hb(hb1, hb2);
op_ret = futex_atomic_op_inuser(op, uaddr2);
if (unlikely(op_ret < 0)) {
-
double_unlock_hb(hb1, hb2);
-#ifndef CONFIG_MMU
- /*
- * we don't get EFAULT from MMU faults if we don't have an MMU,
- * but we might get them from range checking
- */
- ret = op_ret;
- goto out_put_keys;
-#endif
-
- if (unlikely(op_ret != -EFAULT)) {
+ if (!IS_ENABLED(CONFIG_MMU) ||
+ unlikely(op_ret != -EFAULT && op_ret != -EAGAIN)) {
+ /*
+ * we don't get EFAULT from MMU faults if we don't have
+ * an MMU, but we might get them from range checking
+ */
ret = op_ret;
goto out_put_keys;
}
- ret = fault_in_user_writeable(uaddr2);
- if (ret)
- goto out_put_keys;
+ if (op_ret == -EFAULT) {
+ ret = fault_in_user_writeable(uaddr2);
+ if (ret)
+ goto out_put_keys;
+ }
- if (!(flags & FLAGS_SHARED))
+ if (!(flags & FLAGS_SHARED)) {
+ cond_resched();
goto retry_private;
+ }
put_futex_key(&key2);
put_futex_key(&key1);
+ cond_resched();
goto retry;
}
@@ -2350,7 +2350,7 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
u32 uval, uninitialized_var(curval), newval;
struct task_struct *oldowner, *newowner;
u32 newtid;
- int ret;
+ int ret, err = 0;
lockdep_assert_held(q->lock_ptr);
@@ -2421,14 +2421,17 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
if (!pi_state->owner)
newtid |= FUTEX_OWNER_DIED;
- if (get_futex_value_locked(&uval, uaddr))
- goto handle_fault;
+ err = get_futex_value_locked(&uval, uaddr);
+ if (err)
+ goto handle_err;
for (;;) {
newval = (uval & FUTEX_OWNER_DIED) | newtid;
- if (cmpxchg_futex_value_locked(&curval, uaddr, uval, newval))
- goto handle_fault;
+ err = cmpxchg_futex_value_locked(&curval, uaddr, uval, newval);
+ if (err)
+ goto handle_err;
+
if (curval == uval)
break;
uval = curval;
@@ -2456,23 +2459,37 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
return 0;
/*
- * To handle the page fault we need to drop the locks here. That gives
- * the other task (either the highest priority waiter itself or the
- * task which stole the rtmutex) the chance to try the fixup of the
- * pi_state. So once we are back from handling the fault we need to
- * check the pi_state after reacquiring the locks and before trying to
- * do another fixup. When the fixup has been done already we simply
- * return.
+ * In order to reschedule or handle a page fault, we need to drop the
+ * locks here. In the case of a fault, this gives the other task
+ * (either the highest priority waiter itself or the task which stole
+ * the rtmutex) the chance to try the fixup of the pi_state. So once we
+ * are back from handling the fault we need to check the pi_state after
+ * reacquiring the locks and before trying to do another fixup. When
+ * the fixup has been done already we simply return.
*
* Note: we hold both hb->lock and pi_mutex->wait_lock. We can safely
* drop hb->lock since the caller owns the hb -> futex_q relation.
* Dropping the pi_mutex->wait_lock requires the state revalidate.
*/
-handle_fault:
+handle_err:
raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
spin_unlock(q->lock_ptr);
- ret = fault_in_user_writeable(uaddr);
+ switch (err) {
+ case -EFAULT:
+ ret = fault_in_user_writeable(uaddr);
+ break;
+
+ case -EAGAIN:
+ cond_resched();
+ ret = 0;
+ break;
+
+ default:
+ WARN_ON_ONCE(1);
+ ret = err;
+ break;
+ }
spin_lock(q->lock_ptr);
raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
@@ -3041,10 +3058,8 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
* A unconditional UNLOCK_PI op raced against a waiter
* setting the FUTEX_WAITERS bit. Try again.
*/
- if (ret == -EAGAIN) {
- put_futex_key(&key);
- goto retry;
- }
+ if (ret == -EAGAIN)
+ goto pi_retry;
/*
* wake_futex_pi has detected invalid state. Tell user
* space.
@@ -3059,9 +3074,19 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
* preserve the WAITERS bit not the OWNER_DIED one. We are the
* owner.
*/
- if (cmpxchg_futex_value_locked(&curval, uaddr, uval, 0)) {
+ if ((ret = cmpxchg_futex_value_locked(&curval, uaddr, uval, 0))) {
spin_unlock(&hb->lock);
- goto pi_faulted;
+ switch (ret) {
+ case -EFAULT:
+ goto pi_faulted;
+
+ case -EAGAIN:
+ goto pi_retry;
+
+ default:
+ WARN_ON_ONCE(1);
+ goto out_putkey;
+ }
}
/*
@@ -3075,6 +3100,11 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
put_futex_key(&key);
return ret;
+pi_retry:
+ put_futex_key(&key);
+ cond_resched();
+ goto retry;
+
pi_faulted:
put_futex_key(&key);
@@ -3435,6 +3465,7 @@ SYSCALL_DEFINE3(get_robust_list, int, pid,
static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr, int pi)
{
u32 uval, uninitialized_var(nval), mval;
+ int err;
/* Futex address must be 32bit aligned */
if ((((unsigned long)uaddr) % sizeof(*uaddr)) != 0)
@@ -3444,42 +3475,57 @@ static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr, int p
if (get_user(uval, uaddr))
return -1;
- if ((uval & FUTEX_TID_MASK) == task_pid_vnr(curr)) {
- /*
- * Ok, this dying thread is truly holding a futex
- * of interest. Set the OWNER_DIED bit atomically
- * via cmpxchg, and if the value had FUTEX_WAITERS
- * set, wake up a waiter (if any). (We have to do a
- * futex_wake() even if OWNER_DIED is already set -
- * to handle the rare but possible case of recursive
- * thread-death.) The rest of the cleanup is done in
- * userspace.
- */
- mval = (uval & FUTEX_WAITERS) | FUTEX_OWNER_DIED;
- /*
- * We are not holding a lock here, but we want to have
- * the pagefault_disable/enable() protection because
- * we want to handle the fault gracefully. If the
- * access fails we try to fault in the futex with R/W
- * verification via get_user_pages. get_user() above
- * does not guarantee R/W access. If that fails we
- * give up and leave the futex locked.
- */
- if (cmpxchg_futex_value_locked(&nval, uaddr, uval, mval)) {
+ if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
+ return 0;
+
+ /*
+ * Ok, this dying thread is truly holding a futex
+ * of interest. Set the OWNER_DIED bit atomically
+ * via cmpxchg, and if the value had FUTEX_WAITERS
+ * set, wake up a waiter (if any). (We have to do a
+ * futex_wake() even if OWNER_DIED is already set -
+ * to handle the rare but possible case of recursive
+ * thread-death.) The rest of the cleanup is done in
+ * userspace.
+ */
+ mval = (uval & FUTEX_WAITERS) | FUTEX_OWNER_DIED;
+
+ /*
+ * We are not holding a lock here, but we want to have
+ * the pagefault_disable/enable() protection because
+ * we want to handle the fault gracefully. If the
+ * access fails we try to fault in the futex with R/W
+ * verification via get_user_pages. get_user() above
+ * does not guarantee R/W access. If that fails we
+ * give up and leave the futex locked.
+ */
+ if ((err = cmpxchg_futex_value_locked(&nval, uaddr, uval, mval))) {
+ switch (err) {
+ case -EFAULT:
if (fault_in_user_writeable(uaddr))
return -1;
goto retry;
- }
- if (nval != uval)
+
+ case -EAGAIN:
+ cond_resched();
goto retry;
- /*
- * Wake robust non-PI futexes here. The wakeup of
- * PI futexes happens in exit_pi_state():
- */
- if (!pi && (uval & FUTEX_WAITERS))
- futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
+ default:
+ WARN_ON_ONCE(1);
+ return err;
+ }
}
+
+ if (nval != uval)
+ goto retry;
+
+ /*
+ * Wake robust non-PI futexes here. The wakeup of
+ * PI futexes happens in exit_pi_state():
+ */
+ if (!pi && (uval & FUTEX_WAITERS))
+ futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
+
return 0;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6b4f4bc9cb22875f97023984a625386f0c7cc1c0 Mon Sep 17 00:00:00 2001
From: Will Deacon <will.deacon(a)arm.com>
Date: Thu, 28 Feb 2019 11:58:08 +0000
Subject: [PATCH] locking/futex: Allow low-level atomic operations to return
-EAGAIN
Some futex() operations, including FUTEX_WAKE_OP, require the kernel to
perform an atomic read-modify-write of the futex word via the userspace
mapping. These operations are implemented by each architecture in
arch_futex_atomic_op_inuser() and futex_atomic_cmpxchg_inatomic(), which
are called in atomic context with the relevant hash bucket locks held.
Although these routines may return -EFAULT in response to a page fault
generated when accessing userspace, they are expected to succeed (i.e.
return 0) in all other cases. This poses a problem for architectures
that do not provide bounded forward progress guarantees or fairness of
contended atomic operations and can lead to starvation in some cases.
In these problematic scenarios, we must return back to the core futex
code so that we can drop the hash bucket locks and reschedule if
necessary, much like we do in the case of a page fault.
Allow architectures to return -EAGAIN from their implementations of
arch_futex_atomic_op_inuser() and futex_atomic_cmpxchg_inatomic(), which
will cause the core futex code to reschedule if necessary and return
back to the architecture code later on.
Cc: <stable(a)kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Will Deacon <will.deacon(a)arm.com>
diff --git a/kernel/futex.c b/kernel/futex.c
index 9e40cf7be606..6262f1534ac9 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1311,13 +1311,15 @@ static int lookup_pi_state(u32 __user *uaddr, u32 uval,
static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
{
+ int err;
u32 uninitialized_var(curval);
if (unlikely(should_fail_futex(true)))
return -EFAULT;
- if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)))
- return -EFAULT;
+ err = cmpxchg_futex_value_locked(&curval, uaddr, uval, newval);
+ if (unlikely(err))
+ return err;
/* If user space value changed, let the caller retry */
return curval != uval ? -EAGAIN : 0;
@@ -1502,10 +1504,8 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_pi_state *pi_
if (unlikely(should_fail_futex(true)))
ret = -EFAULT;
- if (cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)) {
- ret = -EFAULT;
-
- } else if (curval != uval) {
+ ret = cmpxchg_futex_value_locked(&curval, uaddr, uval, newval);
+ if (!ret && (curval != uval)) {
/*
* If a unconditional UNLOCK_PI operation (user space did not
* try the TID->0 transition) raced with a waiter setting the
@@ -1700,32 +1700,32 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
double_lock_hb(hb1, hb2);
op_ret = futex_atomic_op_inuser(op, uaddr2);
if (unlikely(op_ret < 0)) {
-
double_unlock_hb(hb1, hb2);
-#ifndef CONFIG_MMU
- /*
- * we don't get EFAULT from MMU faults if we don't have an MMU,
- * but we might get them from range checking
- */
- ret = op_ret;
- goto out_put_keys;
-#endif
-
- if (unlikely(op_ret != -EFAULT)) {
+ if (!IS_ENABLED(CONFIG_MMU) ||
+ unlikely(op_ret != -EFAULT && op_ret != -EAGAIN)) {
+ /*
+ * we don't get EFAULT from MMU faults if we don't have
+ * an MMU, but we might get them from range checking
+ */
ret = op_ret;
goto out_put_keys;
}
- ret = fault_in_user_writeable(uaddr2);
- if (ret)
- goto out_put_keys;
+ if (op_ret == -EFAULT) {
+ ret = fault_in_user_writeable(uaddr2);
+ if (ret)
+ goto out_put_keys;
+ }
- if (!(flags & FLAGS_SHARED))
+ if (!(flags & FLAGS_SHARED)) {
+ cond_resched();
goto retry_private;
+ }
put_futex_key(&key2);
put_futex_key(&key1);
+ cond_resched();
goto retry;
}
@@ -2350,7 +2350,7 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
u32 uval, uninitialized_var(curval), newval;
struct task_struct *oldowner, *newowner;
u32 newtid;
- int ret;
+ int ret, err = 0;
lockdep_assert_held(q->lock_ptr);
@@ -2421,14 +2421,17 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
if (!pi_state->owner)
newtid |= FUTEX_OWNER_DIED;
- if (get_futex_value_locked(&uval, uaddr))
- goto handle_fault;
+ err = get_futex_value_locked(&uval, uaddr);
+ if (err)
+ goto handle_err;
for (;;) {
newval = (uval & FUTEX_OWNER_DIED) | newtid;
- if (cmpxchg_futex_value_locked(&curval, uaddr, uval, newval))
- goto handle_fault;
+ err = cmpxchg_futex_value_locked(&curval, uaddr, uval, newval);
+ if (err)
+ goto handle_err;
+
if (curval == uval)
break;
uval = curval;
@@ -2456,23 +2459,37 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
return 0;
/*
- * To handle the page fault we need to drop the locks here. That gives
- * the other task (either the highest priority waiter itself or the
- * task which stole the rtmutex) the chance to try the fixup of the
- * pi_state. So once we are back from handling the fault we need to
- * check the pi_state after reacquiring the locks and before trying to
- * do another fixup. When the fixup has been done already we simply
- * return.
+ * In order to reschedule or handle a page fault, we need to drop the
+ * locks here. In the case of a fault, this gives the other task
+ * (either the highest priority waiter itself or the task which stole
+ * the rtmutex) the chance to try the fixup of the pi_state. So once we
+ * are back from handling the fault we need to check the pi_state after
+ * reacquiring the locks and before trying to do another fixup. When
+ * the fixup has been done already we simply return.
*
* Note: we hold both hb->lock and pi_mutex->wait_lock. We can safely
* drop hb->lock since the caller owns the hb -> futex_q relation.
* Dropping the pi_mutex->wait_lock requires the state revalidate.
*/
-handle_fault:
+handle_err:
raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
spin_unlock(q->lock_ptr);
- ret = fault_in_user_writeable(uaddr);
+ switch (err) {
+ case -EFAULT:
+ ret = fault_in_user_writeable(uaddr);
+ break;
+
+ case -EAGAIN:
+ cond_resched();
+ ret = 0;
+ break;
+
+ default:
+ WARN_ON_ONCE(1);
+ ret = err;
+ break;
+ }
spin_lock(q->lock_ptr);
raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
@@ -3041,10 +3058,8 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
* A unconditional UNLOCK_PI op raced against a waiter
* setting the FUTEX_WAITERS bit. Try again.
*/
- if (ret == -EAGAIN) {
- put_futex_key(&key);
- goto retry;
- }
+ if (ret == -EAGAIN)
+ goto pi_retry;
/*
* wake_futex_pi has detected invalid state. Tell user
* space.
@@ -3059,9 +3074,19 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
* preserve the WAITERS bit not the OWNER_DIED one. We are the
* owner.
*/
- if (cmpxchg_futex_value_locked(&curval, uaddr, uval, 0)) {
+ if ((ret = cmpxchg_futex_value_locked(&curval, uaddr, uval, 0))) {
spin_unlock(&hb->lock);
- goto pi_faulted;
+ switch (ret) {
+ case -EFAULT:
+ goto pi_faulted;
+
+ case -EAGAIN:
+ goto pi_retry;
+
+ default:
+ WARN_ON_ONCE(1);
+ goto out_putkey;
+ }
}
/*
@@ -3075,6 +3100,11 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
put_futex_key(&key);
return ret;
+pi_retry:
+ put_futex_key(&key);
+ cond_resched();
+ goto retry;
+
pi_faulted:
put_futex_key(&key);
@@ -3435,6 +3465,7 @@ SYSCALL_DEFINE3(get_robust_list, int, pid,
static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr, int pi)
{
u32 uval, uninitialized_var(nval), mval;
+ int err;
/* Futex address must be 32bit aligned */
if ((((unsigned long)uaddr) % sizeof(*uaddr)) != 0)
@@ -3444,42 +3475,57 @@ static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr, int p
if (get_user(uval, uaddr))
return -1;
- if ((uval & FUTEX_TID_MASK) == task_pid_vnr(curr)) {
- /*
- * Ok, this dying thread is truly holding a futex
- * of interest. Set the OWNER_DIED bit atomically
- * via cmpxchg, and if the value had FUTEX_WAITERS
- * set, wake up a waiter (if any). (We have to do a
- * futex_wake() even if OWNER_DIED is already set -
- * to handle the rare but possible case of recursive
- * thread-death.) The rest of the cleanup is done in
- * userspace.
- */
- mval = (uval & FUTEX_WAITERS) | FUTEX_OWNER_DIED;
- /*
- * We are not holding a lock here, but we want to have
- * the pagefault_disable/enable() protection because
- * we want to handle the fault gracefully. If the
- * access fails we try to fault in the futex with R/W
- * verification via get_user_pages. get_user() above
- * does not guarantee R/W access. If that fails we
- * give up and leave the futex locked.
- */
- if (cmpxchg_futex_value_locked(&nval, uaddr, uval, mval)) {
+ if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
+ return 0;
+
+ /*
+ * Ok, this dying thread is truly holding a futex
+ * of interest. Set the OWNER_DIED bit atomically
+ * via cmpxchg, and if the value had FUTEX_WAITERS
+ * set, wake up a waiter (if any). (We have to do a
+ * futex_wake() even if OWNER_DIED is already set -
+ * to handle the rare but possible case of recursive
+ * thread-death.) The rest of the cleanup is done in
+ * userspace.
+ */
+ mval = (uval & FUTEX_WAITERS) | FUTEX_OWNER_DIED;
+
+ /*
+ * We are not holding a lock here, but we want to have
+ * the pagefault_disable/enable() protection because
+ * we want to handle the fault gracefully. If the
+ * access fails we try to fault in the futex with R/W
+ * verification via get_user_pages. get_user() above
+ * does not guarantee R/W access. If that fails we
+ * give up and leave the futex locked.
+ */
+ if ((err = cmpxchg_futex_value_locked(&nval, uaddr, uval, mval))) {
+ switch (err) {
+ case -EFAULT:
if (fault_in_user_writeable(uaddr))
return -1;
goto retry;
- }
- if (nval != uval)
+
+ case -EAGAIN:
+ cond_resched();
goto retry;
- /*
- * Wake robust non-PI futexes here. The wakeup of
- * PI futexes happens in exit_pi_state():
- */
- if (!pi && (uval & FUTEX_WAITERS))
- futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
+ default:
+ WARN_ON_ONCE(1);
+ return err;
+ }
}
+
+ if (nval != uval)
+ goto retry;
+
+ /*
+ * Wake robust non-PI futexes here. The wakeup of
+ * PI futexes happens in exit_pi_state():
+ */
+ if (!pi && (uval & FUTEX_WAITERS))
+ futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
+
return 0;
}
The timer_stats facility should filter and translate PIDs if opened
from a non-initial PID namespace, to avoid leaking information about
the wider system. It should also not show kernel virtual addresses.
Unfortunately it has now been removed upstream (as redundant)
instead of being fixed.
For stable, fix the leak by restricting access to root only. A
similar change was already made for the /proc/timer_list file.
Signed-off-by: Ben Hutchings <ben(a)decadent.org.uk>
---
--- a/kernel/time/timer_stats.c
+++ b/kernel/time/timer_stats.c
@@ -417,7 +417,7 @@ static int __init init_tstats_procfs(voi
{
struct proc_dir_entry *pe;
- pe = proc_create("timer_stats", 0644, NULL, &tstats_fops);
+ pe = proc_create("timer_stats", 0600, NULL, &tstats_fops);
if (!pe)
return -ENOMEM;
return 0;
From: Trac Hoang <trac.hoang(a)broadcom.com>
The iproc host eMMC/SD controller hold time does not meet the
specification in the HS50 mode. This problem can be mitigated
by disabling the HISPD bit; thus forcing the controller output
data to be driven on the falling clock edges rather than the
rising clock edges.
Stable tag (v4.12+) chosen to assist stable kernel maintainers so that
the change does not produce merge conflicts backporting to older kernel
versions. In reality, the timing bug existed since the driver was first
introduced but there is no need for this driver to be supported in kernel
versions that old.
Cc: stable(a)vger.kernel.org # v4.12+
Signed-off-by: Trac Hoang <trac.hoang(a)broadcom.com>
Signed-off-by: Scott Branden <scott.branden(a)broadcom.com>
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
---
drivers/mmc/host/sdhci-iproc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/mmc/host/sdhci-iproc.c b/drivers/mmc/host/sdhci-iproc.c
index 9d4071c41c94..2feb4ef32035 100644
--- a/drivers/mmc/host/sdhci-iproc.c
+++ b/drivers/mmc/host/sdhci-iproc.c
@@ -220,7 +220,8 @@ static const struct sdhci_iproc_data iproc_cygnus_data = {
static const struct sdhci_pltfm_data sdhci_iproc_pltfm_data = {
.quirks = SDHCI_QUIRK_DATA_TIMEOUT_USES_SDCLK |
- SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12,
+ SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12 |
+ SDHCI_QUIRK_NO_HISPD_BIT,
.quirks2 = SDHCI_QUIRK2_ACMD23_BROKEN,
.ops = &sdhci_iproc_ops,
};
--
2.17.1