From: Tang Junhui <tang.junhui.linux(a)gmail.com>
Stale && dirty keys can be produced in the following way:
After writeback in write_dirty_finish(), dirty keys k1 will be
replaced by clean keys k2
==>ret = bch_btree_insert(dc->disk.c, &keys, NULL, &w->key);
==>btree_insert_fn(struct btree_op *b_op, struct btree *b)
==>static int bch_btree_insert_node(struct btree *b,
struct btree_op *op,
struct keylist *insert_keys,
atomic_t *journal_ref,
Then two steps:
A) update k1 to k2 in btree node memory;
bch_btree_insert_keys(b, op, insert_keys, replace_key)
B) Write the bset (containing k2) to the cache disk via a 30s delayed work
bch_btree_leaf_dirty(b, journal_ref).
But before the 30s delayed work writes the bset to the cache device,
the following can happen:
A) GC runs and reclaims the bucket that k2 points to;
B) The allocator runs, invalidates the bucket that k2 points to,
increases the gen of the bucket, and places it into the free_inc
fifo;
C) At this point the 30s delayed work still has not run, so on
disk the key is still k1, which is dirty and stale (its gen is
smaller than the gen of the bucket). Then the machine suddenly
loses power;
D) When the machine powers on again, the stale dirty key appears
after the btree reconstruction.
When expensive_debug_checks is off, bch_extent_bad() treats a dirty
key as good even if it is stale, which causes the following problems:
A) In read_dirty() it crashes the machine:
BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
B) Worse, when a read hits a stale dirty key, it returns old,
incorrect data.
This patch tolerates the existence of these stale && dirty keys
and treats them as bad keys in bch_extent_bad().
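The change in behaviour can be modelled in a few lines of userspace C.
This is only a sketch, not the bcache code: every toy_* name is
hypothetical, and the wrapping 8-bit generation comparison is an
assumption about how a pointer gen is compared with a bucket gen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* How far gen a is ahead of gen b, treating 8-bit gens as wrapping. */
static uint8_t toy_gen_after(uint8_t a, uint8_t b)
{
	uint8_t r = (uint8_t)(a - b);

	return r > 128U ? 0 : r;
}

/* Non-zero when the bucket was reused after the key's pointer was made. */
static uint8_t toy_ptr_stale(uint8_t bucket_gen, uint8_t ptr_gen)
{
	return toy_gen_after(bucket_gen, ptr_gen);
}

/*
 * Old check with the expensive debug checks off: a dirty key is
 * reported as good before staleness is ever looked at.
 */
static bool toy_extent_bad_old(uint8_t bucket_gen, uint8_t ptr_gen, bool dirty)
{
	if (dirty)
		return false;
	return toy_ptr_stale(bucket_gen, ptr_gen) != 0;
}

/* New check: a stale key is bad whether or not it is dirty. */
static bool toy_extent_bad_new(uint8_t bucket_gen, uint8_t ptr_gen, bool dirty)
{
	(void)dirty;	/* dirtiness only affects logging in the patch */
	return toy_ptr_stale(bucket_gen, ptr_gen) != 0;
}
```

With a bucket gen of 4 and a dirty key carrying gen 3 (the k1 left on
disk in the scenario above), the old check calls the key good while the
new one rejects it.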
(Coly Li: fix indent format which was modified by sender's email
client)
Signed-off-by: Tang Junhui <tang.junhui.linux(a)gmail.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Coly Li <colyli(a)suse.de>
---
drivers/md/bcache/extents.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/md/bcache/extents.c b/drivers/md/bcache/extents.c
index 956004366699..886710043025 100644
--- a/drivers/md/bcache/extents.c
+++ b/drivers/md/bcache/extents.c
@@ -538,6 +538,7 @@ static bool bch_extent_bad(struct btree_keys *bk, const struct bkey *k)
{
struct btree *b = container_of(bk, struct btree, keys);
unsigned int i, stale;
+ char buf[80];
if (!KEY_PTRS(k) ||
bch_extent_invalid(bk, k))
@@ -547,19 +548,19 @@ static bool bch_extent_bad(struct btree_keys *bk, const struct bkey *k)
if (!ptr_available(b->c, k, i))
return true;
- if (!expensive_debug_checks(b->c) && KEY_DIRTY(k))
- return false;
-
for (i = 0; i < KEY_PTRS(k); i++) {
stale = ptr_stale(b->c, k, i);
+ if (stale && KEY_DIRTY(k)) {
+ bch_extent_to_text(buf, sizeof(buf), k);
+ pr_info("stale dirty pointer, stale %u, key: %s",
+ stale, buf);
+ }
+
btree_bug_on(stale > BUCKET_GC_GEN_MAX, b,
"key too stale: %i, need_gc %u",
stale, b->c->need_gc);
- btree_bug_on(stale && KEY_DIRTY(k) && KEY_SIZE(k),
- b, "stale dirty pointer");
-
if (stale)
return true;
--
2.16.4
On Tue, 2018-12-25 at 15:45 +0000, ? ? wrote:
> Hi, Greg
>
> I found on Debian testing with kernel 4.18.20 fail boot, kernel panic
> on i915. and reported it to Debian bug 917280 [0], with panic log[1].
>
> after revert:
>
> commit 06e562e7f515292ea7721475950f23554214adde
> Author: Chris Wilson <chris(a)chris-wilson.co.uk>
> Date: Mon Nov 5 09:43:05 2018 +0000
>
> drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
>
> System boots to desktop.
>
> [0]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=917280
> [1]:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=917280;filename=dme…
The 4.18 stable branch is no longer maintained.
I suspect this is the same as <https://bugs.debian.org/914495> and
<https://bugs.freedesktop.org/show_bug.cgi?id=108850>, which is fixed
in 4.19 (currently in unstable).
Ben.
--
Ben Hutchings
It is impossible to make anything foolproof
because fools are so ingenious.
Commit f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode
again") makes a change to FIFO clearing code which its commit message
suggests was intended to be specific to use with RS485 mode, however:
1) The change made does not just affect __do_stop_tx_rs485(), it also
affects other uses of serial8250_clear_fifos() including paths for
starting up, shutting down or auto-configuring a port regardless of
whether it's an RS485 port or not.
2) It makes the assumption that resetting the FIFOs is a no-op when
FIFOs are disabled, and as such it checks for this case & explicitly
avoids setting the FIFO reset bits when the FIFO enable bit is
clear. A reading of the PC16550D manual would suggest that this is
OK since the FIFO should automatically be reset if it is later
enabled, but we support many 16550-compatible devices and have never
required this auto-reset behaviour for at least the whole git era.
Starting to rely on it now seems risky, offers no benefit, and
indeed breaks at least the Ingenic JZ4780's UARTs, which read
garbage when the RX FIFO is enabled if we don't explicitly reset it.
3) By only resetting the FIFOs if they're enabled, the behaviour of
serial8250_do_startup() during boot now depends on what the value of
FCR is before the 8250 driver is probed. This in itself seems
questionable and leaves us with FCR=0 & no FIFO reset if the UART
was used by 8250_early, otherwise it depends upon what the
bootloader left behind.
4) Although the naming of serial8250_clear_fifos() may be unclear, it
is clear that callers of it expect that it will disable FIFOs. Both
serial8250_do_startup() & serial8250_do_shutdown() contain comments
to that effect, and other callers explicitly re-enable the FIFOs
after calling serial8250_clear_fifos(). The premise of that patch
that disabling the FIFOs is incorrect therefore seems wrong.
For these reasons, this reverts commit f6aa5beb45be ("serial: 8250: Fix
clearing FIFOs in RS485 mode again").
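The FCR write sequence that the revert restores can be sketched against
a fake UART that merely logs writes. This is an illustration only: the
toy_* names are hypothetical, and the three bit values are the usual
16550 FCR definitions rather than anything taken from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Usual 16550 FIFO control register bits. */
#define TOY_FCR_ENABLE_FIFO 0x01u
#define TOY_FCR_CLEAR_RCVR  0x02u
#define TOY_FCR_CLEAR_XMIT  0x04u

#define TOY_LOG_LEN 8

/* Hypothetical fake UART that records every value written to FCR. */
struct toy_uart {
	uint8_t fcr_writes[TOY_LOG_LEN];
	size_t nwrites;
};

static void toy_fcr_write(struct toy_uart *u, uint8_t val)
{
	assert(u->nwrites < TOY_LOG_LEN);
	u->fcr_writes[u->nwrites++] = val;
}

/* Enable the FIFOs, reset both of them, then leave them disabled. */
static void toy_clear_fifos(struct toy_uart *u)
{
	toy_fcr_write(u, TOY_FCR_ENABLE_FIFO);
	toy_fcr_write(u, TOY_FCR_ENABLE_FIFO | TOY_FCR_CLEAR_RCVR |
			 TOY_FCR_CLEAR_XMIT);
	toy_fcr_write(u, 0);
}

/* Run the sequence on a fresh fake UART and return the n-th write. */
static uint8_t toy_clear_fifos_write(size_t n)
{
	struct toy_uart u = { .nwrites = 0 };

	toy_clear_fifos(&u);
	assert(n < u.nwrites);
	return u.fcr_writes[n];
}
```

The point of the sketch is that the sequence does not depend on what FCR
held beforehand, which is the property items 2) and 3) above argue for.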
Signed-off-by: Paul Burton <paul.burton(a)mips.com>
Fixes: f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again").
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Daniel Jedrychowski <avistel(a)gmail.com>
Cc: Marek Vasut <marex(a)denx.de>
Cc: linux-mips(a)vger.kernel.org
Cc: linux-serial(a)vger.kernel.org
Cc: stable <stable(a)vger.kernel.org> # 4.10+
---
I did suggest an alternative approach which would rename
serial8250_clear_fifos() and split it into 2 variants - one that
disables FIFOs & one that does not, then use the latter in
__do_stop_tx_rs485():
https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
However I have no access to the OMAP3 hardware that Marek's patch was
attempting to fix & have heard nothing back with regards to him testing
that approach, so here's a simple revert that fixes the Ingenic JZ4780.
I've marked for stable back to v4.10 presuming that this is how far the
broken patch may be backported, given that this is where commit
2bed8a8e7072 ("Clearing FIFOs in RS485 emulation mode causes subsequent
transmits to break") that it tried to fix was introduced.
---
drivers/tty/serial/8250/8250_port.c | 29 +++++------------------------
1 file changed, 5 insertions(+), 24 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index f776b3eafb96..3f779d25ec0c 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -552,30 +552,11 @@ static unsigned int serial_icr_read(struct uart_8250_port *up, int offset)
*/
static void serial8250_clear_fifos(struct uart_8250_port *p)
{
- unsigned char fcr;
- unsigned char clr_mask = UART_FCR_CLEAR_RCVR | UART_FCR_CLEAR_XMIT;
-
if (p->capabilities & UART_CAP_FIFO) {
- /*
- * Make sure to avoid changing FCR[7:3] and ENABLE_FIFO bits.
- * In case ENABLE_FIFO is not set, there is nothing to flush
- * so just return. Furthermore, on certain implementations of
- * the 8250 core, the FCR[7:3] bits may only be changed under
- * specific conditions and changing them if those conditions
- * are not met can have nasty side effects. One such core is
- * the 8250-omap present in TI AM335x.
- */
- fcr = serial_in(p, UART_FCR);
-
- /* FIFO is not enabled, there's nothing to clear. */
- if (!(fcr & UART_FCR_ENABLE_FIFO))
- return;
-
- fcr |= clr_mask;
- serial_out(p, UART_FCR, fcr);
-
- fcr &= ~clr_mask;
- serial_out(p, UART_FCR, fcr);
+ serial_out(p, UART_FCR, UART_FCR_ENABLE_FIFO);
+ serial_out(p, UART_FCR, UART_FCR_ENABLE_FIFO |
+ UART_FCR_CLEAR_RCVR | UART_FCR_CLEAR_XMIT);
+ serial_out(p, UART_FCR, 0);
}
}
@@ -1467,7 +1448,7 @@ static void __do_stop_tx_rs485(struct uart_8250_port *p)
* Enable previously disabled RX interrupts.
*/
if (!(p->port.rs485.flags & SER_RS485_RX_DURING_TX)) {
- serial8250_clear_fifos(p);
+ serial8250_clear_and_reinit_fifos(p);
p->ier |= UART_IER_RLSI | UART_IER_RDI;
serial_port_out(&p->port, UART_IER, p->ier);
--
2.20.0
Omer Tripp's analysis of a Spectre V1 gadget in __close_fd():
"1. __close_fd() is reachable via the close() syscall with a
user-controlled fd.
2. If said bounds check is mispredicted, then a user-controlled
address fdt->fd[fd] is obtained then dereferenced, and the value of
a user-controlled address is loaded into the local variable file.
3. file is then passed as an argument to filp_close, where the cache
lines secret + offsetof(f_op) and secret + offsetof(f_mode) are hot
and vulnerable to a timing channel attack."
Address this by using array_index_nospec() to prevent speculation past
the end of current->fdt.
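For readers unfamiliar with the helper, a userspace sketch of the
branch-free clamping idea follows. The toy_* names are hypothetical, the
mask formula is modelled on the kernel's generic fallback, and a plain C
version like this gives no guarantee against compiler reordering, so it
should be read as an illustration of the arithmetic only.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * All ones when index < size, zero otherwise, computed without a branch.
 * Relies on arithmetic right shift of a negative long, which is
 * implementation-defined in ISO C but is what mainstream compilers do.
 * Assumes size <= LONG_MAX.
 */
static unsigned long toy_index_mask(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (TOY_BITS_PER_LONG - 1));
}

/* Force an out-of-bounds index to 0 instead of letting it through. */
static unsigned long toy_index_nospec(unsigned long index, unsigned long size)
{
	return index & toy_index_mask(index, size);
}

/* Bounds check followed by clamping, in the style of the patched code. */
static const char *toy_lookup(const char *const *table, unsigned long n,
			      unsigned long idx)
{
	assert(table != NULL);
	if (idx >= n)
		return NULL;
	idx = toy_index_nospec(idx, n);
	return table[idx];
}
```

Even if the bounds check were mispredicted, the index that reaches the
array access has already been masked into range.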
Reported-by: Omer Tripp <trippo(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Greg Hackmann <ghackmann(a)android.com>
---
v2: include Omer Tripp's analysis in commit message, and update my email
address
fs/file.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/file.c b/fs/file.c
index 7ffd6e9d103d..a80cf82be96b 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -18,6 +18,7 @@
#include <linux/bitops.h>
#include <linux/spinlock.h>
#include <linux/rcupdate.h>
+#include <linux/nospec.h>
unsigned int sysctl_nr_open __read_mostly = 1024*1024;
unsigned int sysctl_nr_open_min = BITS_PER_LONG;
@@ -626,6 +627,7 @@ int __close_fd(struct files_struct *files, unsigned fd)
fdt = files_fdtable(files);
if (fd >= fdt->max_fds)
goto out_unlock;
+ fd = array_index_nospec(fd, fdt->max_fds);
file = fdt->fd[fd];
if (!file)
goto out_unlock;
--
2.19.1
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From da791a667536bf8322042e38ca85d55a78d3c273 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Mon, 10 Dec 2018 14:35:14 +0100
Subject: [PATCH] futex: Cure exit race
Stefan reported that the glibc tst-robustpi4 test case fails
occasionally. That case creates the following race between
sys_exit() and sys_futex_lock_pi():
  CPU0                            CPU1
  sys_exit()                      sys_futex()
   do_exit()                       futex_lock_pi()
    exit_signals(tsk)               No waiters:
     tsk->flags |= PF_EXITING;      *uaddr == 0x00000PID
    mm_release(tsk)                 Set waiter bit
     exit_robust_list(tsk) {        *uaddr = 0x80000PID;
        Set owner died              attach_to_pi_owner() {
      *uaddr = 0xC0000000;           tsk = get_task(PID);
     }                               if (!tsk->flags & PF_EXITING) {
    ...                                attach();
    tsk->flags |= PF_EXITPIDONE;     } else {
                                       if (!(tsk->flags & PF_EXITPIDONE))
                                         return -EAGAIN;
                                       return -ESRCH; <--- FAIL
                                     }
ESRCH is returned all the way to user space, which triggers the glibc test
case assert. Returning ESRCH unconditionally is wrong here because the user
space value has been changed by the exiting task to 0xC0000000, i.e. the
FUTEX_OWNER_DIED bit is set and the futex PID value has been cleared. This
is a valid state and the kernel has to handle it, i.e. taking the futex.
Cure it by rereading the user space value when PF_EXITING and PF_EXITPIDONE
are set in the task which 'owns' the futex. If the value has changed, let
the kernel retry the operation, which includes all regular sanity checks
and correctly handles the FUTEX_OWNER_DIED case.
If it hasn't changed, then return ESRCH as there is no way to distinguish
this case from malfunctioning user space. This happens when the exiting
task did not have a robust list, the robust list was corrupted or the user
space value in the futex was simply bogus.
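The decision described above can be condensed into a small pure
function. This is a hedged model rather than the kernel code: the
toy_handle_exit_race() name and its boolean parameters are hypothetical,
and the -EFAULT path for a failed user space read is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * task_found:   the owner task could still be looked up
 * exit_pi_done: the owner has PF_EXITPIDONE set
 * uval:         futex value the caller acted on
 * uval_now:     futex value re-read from user space
 */
static int toy_handle_exit_race(bool task_found, bool exit_pi_done,
				uint32_t uval, uint32_t uval_now)
{
	/* Owner is still in the middle of exiting: retry later. */
	if (task_found && !exit_pi_done)
		return -EAGAIN;

	/* The exiting task changed the futex word: retry with the new value. */
	if (uval_now != uval)
		return -EAGAIN;

	/* Nothing changed: no robust list, corrupt list, or bogus value. */
	return -ESRCH;
}
```

For example, with a made-up TID of 0x123, a caller that acted on
0x80000123 and re-reads 0xC0000000 gets -EAGAIN and retries, whereas an
unchanged value still ends in -ESRCH.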
Reported-by: Stefan Liebler <stli(a)linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: Peter Zijlstra <peterz(a)infradead.org>
Cc: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Cc: Darren Hart <dvhart(a)infradead.org>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Sasha Levin <sashal(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467
Link: https://lkml.kernel.org/r/20181210152311.986181245@linutronix.de
diff --git a/kernel/futex.c b/kernel/futex.c
index f423f9b6577e..5cc8083a4c89 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1148,11 +1148,65 @@ static int attach_to_pi_state(u32 __user *uaddr, u32 uval,
return ret;
}
+static int handle_exit_race(u32 __user *uaddr, u32 uval,
+ struct task_struct *tsk)
+{
+ u32 uval2;
+
+ /*
+ * If PF_EXITPIDONE is not yet set, then try again.
+ */
+ if (tsk && !(tsk->flags & PF_EXITPIDONE))
+ return -EAGAIN;
+
+ /*
+ * Reread the user space value to handle the following situation:
+ *
+ * CPU0 CPU1
+ *
+ * sys_exit() sys_futex()
+ * do_exit() futex_lock_pi()
+ * futex_lock_pi_atomic()
+ * exit_signals(tsk) No waiters:
+ * tsk->flags |= PF_EXITING; *uaddr == 0x00000PID
+ * mm_release(tsk) Set waiter bit
+ * exit_robust_list(tsk) { *uaddr = 0x80000PID;
+ * Set owner died attach_to_pi_owner() {
+ * *uaddr = 0xC0000000; tsk = get_task(PID);
+ * } if (!tsk->flags & PF_EXITING) {
+ * ... attach();
+ * tsk->flags |= PF_EXITPIDONE; } else {
+ * if (!(tsk->flags & PF_EXITPIDONE))
+ * return -EAGAIN;
+ * return -ESRCH; <--- FAIL
+ * }
+ *
+ * Returning ESRCH unconditionally is wrong here because the
+ * user space value has been changed by the exiting task.
+ *
+ * The same logic applies to the case where the exiting task is
+ * already gone.
+ */
+ if (get_futex_value_locked(&uval2, uaddr))
+ return -EFAULT;
+
+ /* If the user space value has changed, try again. */
+ if (uval2 != uval)
+ return -EAGAIN;
+
+ /*
+ * The exiting task did not have a robust list, the robust list was
+ * corrupted or the user space value in *uaddr is simply bogus.
+ * Give up and tell user space.
+ */
+ return -ESRCH;
+}
+
/*
* Lookup the task for the TID provided from user space and attach to
* it after doing proper sanity checks.
*/
-static int attach_to_pi_owner(u32 uval, union futex_key *key,
+static int attach_to_pi_owner(u32 __user *uaddr, u32 uval, union futex_key *key,
struct futex_pi_state **ps)
{
pid_t pid = uval & FUTEX_TID_MASK;
@@ -1162,12 +1216,15 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key,
/*
* We are the first waiter - try to look up the real owner and attach
* the new pi_state to it, but bail out when TID = 0 [1]
+ *
+ * The !pid check is paranoid. None of the call sites should end up
+ * with pid == 0, but better safe than sorry. Let the caller retry
*/
if (!pid)
- return -ESRCH;
+ return -EAGAIN;
p = find_get_task_by_vpid(pid);
if (!p)
- return -ESRCH;
+ return handle_exit_race(uaddr, uval, NULL);
if (unlikely(p->flags & PF_KTHREAD)) {
put_task_struct(p);
@@ -1187,7 +1244,7 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key,
* set, we know that the task has finished the
* cleanup:
*/
- int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
+ int ret = handle_exit_race(uaddr, uval, p);
raw_spin_unlock_irq(&p->pi_lock);
put_task_struct(p);
@@ -1244,7 +1301,7 @@ static int lookup_pi_state(u32 __user *uaddr, u32 uval,
* We are the first waiter - try to look up the owner based on
* @uval and attach to it.
*/
- return attach_to_pi_owner(uval, key, ps);
+ return attach_to_pi_owner(uaddr, uval, key, ps);
}
static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
@@ -1352,7 +1409,7 @@ static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
* attach to the owner. If that fails, no harm done, we only
* set the FUTEX_WAITERS bit in the user space variable.
*/
- return attach_to_pi_owner(uval, key, ps);
+ return attach_to_pi_owner(uaddr, newval, key, ps);
}
/**
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From da791a667536bf8322042e38ca85d55a78d3c273 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Mon, 10 Dec 2018 14:35:14 +0100
Subject: [PATCH] futex: Cure exit race
Stefan reported that the glibc tst-robustpi4 test case fails
occasionally. That case creates the following race between
sys_exit() and sys_futex_lock_pi():
  CPU0                            CPU1
  sys_exit()                      sys_futex()
   do_exit()                       futex_lock_pi()
    exit_signals(tsk)               No waiters:
     tsk->flags |= PF_EXITING;      *uaddr == 0x00000PID
    mm_release(tsk)                 Set waiter bit
     exit_robust_list(tsk) {        *uaddr = 0x80000PID;
        Set owner died              attach_to_pi_owner() {
      *uaddr = 0xC0000000;           tsk = get_task(PID);
     }                               if (!tsk->flags & PF_EXITING) {
    ...                                attach();
    tsk->flags |= PF_EXITPIDONE;     } else {
                                       if (!(tsk->flags & PF_EXITPIDONE))
                                         return -EAGAIN;
                                       return -ESRCH; <--- FAIL
                                     }
ESRCH is returned all the way to user space, which triggers the glibc test
case assert. Returning ESRCH unconditionally is wrong here because the user
space value has been changed by the exiting task to 0xC0000000, i.e. the
FUTEX_OWNER_DIED bit is set and the futex PID value has been cleared. This
is a valid state and the kernel has to handle it, i.e. taking the futex.
Cure it by rereading the user space value when PF_EXITING and PF_EXITPIDONE
are set in the task which 'owns' the futex. If the value has changed, let
the kernel retry the operation, which includes all regular sanity checks
and correctly handles the FUTEX_OWNER_DIED case.
If it hasn't changed, then return ESRCH as there is no way to distinguish
this case from malfunctioning user space. This happens when the exiting
task did not have a robust list, the robust list was corrupted or the user
space value in the futex was simply bogus.
Reported-by: Stefan Liebler <stli(a)linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: Peter Zijlstra <peterz(a)infradead.org>
Cc: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Cc: Darren Hart <dvhart(a)infradead.org>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Sasha Levin <sashal(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467
Link: https://lkml.kernel.org/r/20181210152311.986181245@linutronix.de
diff --git a/kernel/futex.c b/kernel/futex.c
index f423f9b6577e..5cc8083a4c89 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1148,11 +1148,65 @@ static int attach_to_pi_state(u32 __user *uaddr, u32 uval,
return ret;
}
+static int handle_exit_race(u32 __user *uaddr, u32 uval,
+ struct task_struct *tsk)
+{
+ u32 uval2;
+
+ /*
+ * If PF_EXITPIDONE is not yet set, then try again.
+ */
+ if (tsk && !(tsk->flags & PF_EXITPIDONE))
+ return -EAGAIN;
+
+ /*
+ * Reread the user space value to handle the following situation:
+ *
+ * CPU0 CPU1
+ *
+ * sys_exit() sys_futex()
+ * do_exit() futex_lock_pi()
+ * futex_lock_pi_atomic()
+ * exit_signals(tsk) No waiters:
+ * tsk->flags |= PF_EXITING; *uaddr == 0x00000PID
+ * mm_release(tsk) Set waiter bit
+ * exit_robust_list(tsk) { *uaddr = 0x80000PID;
+ * Set owner died attach_to_pi_owner() {
+ * *uaddr = 0xC0000000; tsk = get_task(PID);
+ * } if (!tsk->flags & PF_EXITING) {
+ * ... attach();
+ * tsk->flags |= PF_EXITPIDONE; } else {
+ * if (!(tsk->flags & PF_EXITPIDONE))
+ * return -EAGAIN;
+ * return -ESRCH; <--- FAIL
+ * }
+ *
+ * Returning ESRCH unconditionally is wrong here because the
+ * user space value has been changed by the exiting task.
+ *
+ * The same logic applies to the case where the exiting task is
+ * already gone.
+ */
+ if (get_futex_value_locked(&uval2, uaddr))
+ return -EFAULT;
+
+ /* If the user space value has changed, try again. */
+ if (uval2 != uval)
+ return -EAGAIN;
+
+ /*
+ * The exiting task did not have a robust list, the robust list was
+ * corrupted or the user space value in *uaddr is simply bogus.
+ * Give up and tell user space.
+ */
+ return -ESRCH;
+}
+
/*
* Lookup the task for the TID provided from user space and attach to
* it after doing proper sanity checks.
*/
-static int attach_to_pi_owner(u32 uval, union futex_key *key,
+static int attach_to_pi_owner(u32 __user *uaddr, u32 uval, union futex_key *key,
struct futex_pi_state **ps)
{
pid_t pid = uval & FUTEX_TID_MASK;
@@ -1162,12 +1216,15 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key,
/*
* We are the first waiter - try to look up the real owner and attach
* the new pi_state to it, but bail out when TID = 0 [1]
+ *
+ * The !pid check is paranoid. None of the call sites should end up
+ * with pid == 0, but better safe than sorry. Let the caller retry
*/
if (!pid)
- return -ESRCH;
+ return -EAGAIN;
p = find_get_task_by_vpid(pid);
if (!p)
- return -ESRCH;
+ return handle_exit_race(uaddr, uval, NULL);
if (unlikely(p->flags & PF_KTHREAD)) {
put_task_struct(p);
@@ -1187,7 +1244,7 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key,
* set, we know that the task has finished the
* cleanup:
*/
- int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
+ int ret = handle_exit_race(uaddr, uval, p);
raw_spin_unlock_irq(&p->pi_lock);
put_task_struct(p);
@@ -1244,7 +1301,7 @@ static int lookup_pi_state(u32 __user *uaddr, u32 uval,
* We are the first waiter - try to look up the owner based on
* @uval and attach to it.
*/
- return attach_to_pi_owner(uval, key, ps);
+ return attach_to_pi_owner(uaddr, uval, key, ps);
}
static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
@@ -1352,7 +1409,7 @@ static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
* attach to the owner. If that fails, no harm done, we only
* set the FUTEX_WAITERS bit in the user space variable.
*/
- return attach_to_pi_owner(uval, key, ps);
+ return attach_to_pi_owner(uaddr, newval, key, ps);
}
/**
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e58725d51fa8da9133f3f1c54170aa2e43056b91 Mon Sep 17 00:00:00 2001
From: Richard Weinberger <richard(a)nod.at>
Date: Wed, 7 Nov 2018 23:04:43 +0100
Subject: [PATCH] ubifs: Handle re-linking of inodes correctly while recovery
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
UBIFS's recovery code strictly assumes that a deleted inode will never
come back, therefore it removes all data which belongs to that inode
as soon as it faces an inode with link count 0 in the replay list.
Before O_TMPFILE this assumption was perfectly fine. With O_TMPFILE
it can lead to data loss upon a power-cut.
Consider a journal with entries like:
0: inode X (nlink = 0) /* O_TMPFILE was created */
1: data for inode X /* Someone writes to the temp file */
2: inode X (nlink = 0) /* inode was changed, xattr, chmod, … */
3: inode X (nlink = 1) /* inode was re-linked via linkat() */
Upon replay of entry #2 UBIFS will drop all data that belongs to inode X,
which leads to an empty file after mounting.
As a solution to this problem, scan the replay list for a re-link entry
before dropping data.
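The scan can be illustrated with a flat array standing in for the replay
list. This is a sketch under stated assumptions: struct toy_entry, the
inode number 42 and the toy_* functions are hypothetical, and unlike the
real code it returns false instead of asserting when no entry matches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened stand-in for a replay entry. */
struct toy_entry {
	unsigned long inum;	/* inode number the entry belongs to */
	bool deletion;		/* inode node with link count 0 */
};

/* The four-entry journal from the commit message, with inode X = 42. */
static const struct toy_entry toy_journal[] = {
	{ .inum = 42, .deletion = true  },	/* 0: O_TMPFILE created */
	{ .inum = 42, .deletion = false },	/* 1: data for inode X */
	{ .inum = 42, .deletion = true  },	/* 2: inode changed, nlink = 0 */
	{ .inum = 42, .deletion = false },	/* 3: re-linked, nlink = 1 */
};

/*
 * Walk the log from newest to oldest and report whether the most
 * recent entry for the inode is something other than a deletion.
 */
static bool toy_inode_still_linked(const struct toy_entry *log, size_t n,
				   unsigned long inum)
{
	assert(log != NULL || n == 0);
	for (size_t i = n; i-- > 0; ) {
		if (log[i].inum == inum)
			return !log[i].deletion;
	}
	return false;
}
```

Replaying entry #2 with all four entries present finds entry #3 first
and keeps the data; with the log cut off after entry #2 the inode is
treated as deleted, as before.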
Fixes: 474b93704f32 ("ubifs: Implement O_TMPFILE")
Cc: stable(a)vger.kernel.org
Cc: Russell Senior <russell(a)personaltelco.net>
Cc: Rafał Miłecki <zajec5(a)gmail.com>
Reported-by: Russell Senior <russell(a)personaltelco.net>
Reported-by: Rafał Miłecki <zajec5(a)gmail.com>
Tested-by: Rafał Miłecki <rafal(a)milecki.pl>
Signed-off-by: Richard Weinberger <richard(a)nod.at>
diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
index a08c5b7030ea..0a0e65c07c6d 100644
--- a/fs/ubifs/replay.c
+++ b/fs/ubifs/replay.c
@@ -212,6 +212,38 @@ static int trun_remove_range(struct ubifs_info *c, struct replay_entry *r)
return ubifs_tnc_remove_range(c, &min_key, &max_key);
}
+/**
+ * inode_still_linked - check whether inode in question will be re-linked.
+ * @c: UBIFS file-system description object
+ * @rino: replay entry to test
+ *
+ * O_TMPFILE files can be re-linked, this means link count goes from 0 to 1.
+ * This case needs special care, otherwise all references to the inode will
+ * be removed upon the first replay entry of an inode with link count 0
+ * is found.
+ */
+static bool inode_still_linked(struct ubifs_info *c, struct replay_entry *rino)
+{
+ struct replay_entry *r;
+
+ ubifs_assert(c, rino->deletion);
+ ubifs_assert(c, key_type(c, &rino->key) == UBIFS_INO_KEY);
+
+ /*
+ * Find the most recent entry for the inode behind @rino and check
+ * whether it is a deletion.
+ */
+ list_for_each_entry_reverse(r, &c->replay_list, list) {
+ ubifs_assert(c, r->sqnum >= rino->sqnum);
+ if (key_inum(c, &r->key) == key_inum(c, &rino->key))
+ return r->deletion == 0;
+
+ }
+
+ ubifs_assert(c, 0);
+ return false;
+}
+
/**
* apply_replay_entry - apply a replay entry to the TNC.
* @c: UBIFS file-system description object
@@ -239,6 +271,11 @@ static int apply_replay_entry(struct ubifs_info *c, struct replay_entry *r)
{
ino_t inum = key_inum(c, &r->key);
+ if (inode_still_linked(c, r)) {
+ err = 0;
+ break;
+ }
+
err = ubifs_tnc_remove_ino(c, inum);
break;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e58725d51fa8da9133f3f1c54170aa2e43056b91 Mon Sep 17 00:00:00 2001
From: Richard Weinberger <richard(a)nod.at>
Date: Wed, 7 Nov 2018 23:04:43 +0100
Subject: [PATCH] ubifs: Handle re-linking of inodes correctly while recovery
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
UBIFS's recovery code strictly assumes that a deleted inode will never
come back, therefore it removes all data which belongs to that inode
as soon as it faces an inode with link count 0 in the replay list.
Before O_TMPFILE this assumption was perfectly fine. With O_TMPFILE
it can lead to data loss upon a power-cut.
Consider a journal with entries like:
0: inode X (nlink = 0) /* O_TMPFILE was created */
1: data for inode X /* Someone writes to the temp file */
2: inode X (nlink = 0) /* inode was changed, xattr, chmod, … */
3: inode X (nlink = 1) /* inode was re-linked via linkat() */
Upon replay of entry #2 UBIFS will drop all data that belongs to inode X,
which leads to an empty file after mounting.
As a solution to this problem, scan the replay list for a re-link entry
before dropping data.
Fixes: 474b93704f32 ("ubifs: Implement O_TMPFILE")
Cc: stable(a)vger.kernel.org
Cc: Russell Senior <russell(a)personaltelco.net>
Cc: Rafał Miłecki <zajec5(a)gmail.com>
Reported-by: Russell Senior <russell(a)personaltelco.net>
Reported-by: Rafał Miłecki <zajec5(a)gmail.com>
Tested-by: Rafał Miłecki <rafal(a)milecki.pl>
Signed-off-by: Richard Weinberger <richard(a)nod.at>
diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
index a08c5b7030ea..0a0e65c07c6d 100644
--- a/fs/ubifs/replay.c
+++ b/fs/ubifs/replay.c
@@ -212,6 +212,38 @@ static int trun_remove_range(struct ubifs_info *c, struct replay_entry *r)
return ubifs_tnc_remove_range(c, &min_key, &max_key);
}
+/**
+ * inode_still_linked - check whether inode in question will be re-linked.
+ * @c: UBIFS file-system description object
+ * @rino: replay entry to test
+ *
+ * O_TMPFILE files can be re-linked, this means link count goes from 0 to 1.
+ * This case needs special care, otherwise all references to the inode will
+ * be removed upon the first replay entry of an inode with link count 0
+ * is found.
+ */
+static bool inode_still_linked(struct ubifs_info *c, struct replay_entry *rino)
+{
+ struct replay_entry *r;
+
+ ubifs_assert(c, rino->deletion);
+ ubifs_assert(c, key_type(c, &rino->key) == UBIFS_INO_KEY);
+
+ /*
+ * Find the most recent entry for the inode behind @rino and check
+ * whether it is a deletion.
+ */
+ list_for_each_entry_reverse(r, &c->replay_list, list) {
+ ubifs_assert(c, r->sqnum >= rino->sqnum);
+ if (key_inum(c, &r->key) == key_inum(c, &rino->key))
+ return r->deletion == 0;
+
+ }
+
+ ubifs_assert(c, 0);
+ return false;
+}
+
/**
* apply_replay_entry - apply a replay entry to the TNC.
* @c: UBIFS file-system description object
@@ -239,6 +271,11 @@ static int apply_replay_entry(struct ubifs_info *c, struct replay_entry *r)
{
ino_t inum = key_inum(c, &r->key);
+ if (inode_still_linked(c, r)) {
+ err = 0;
+ break;
+ }
+
err = ubifs_tnc_remove_ino(c, inum);
break;
}
From: Michal Hocko <mhocko(a)suse.com>
Burt Holzman has noticed that memcg v1 doesn't notify about OOM events
via eventfd anymore. The reason is that 29ef680ae7c2 ("memcg, oom: move
out_of_memory back to the charge path") has moved the oom handling back
to the charge path. While doing so the notification was left behind in
mem_cgroup_oom_synchronize.
Fix the issue by replicating the oom hierarchy locking and the
notification.
Reported-by: Burt Holzman <burt(a)fnal.gov>
Fixes: 29ef680ae7c2 ("memcg, oom: move out_of_memory back to the charge path")
Cc: stable # 4.19+
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
---
Hi Andrew,
I forgot to CC you on the patch sent as a reply to the original bug
report [1] so I am reposting with Ack from Johannes. Burt has confirmed
this is resolving the regression for him [2]. 4.20 is out but I have
marked the patch for stable so it should hit both 4.19 and 4.20.
[1] http://lkml.kernel.org/r/20181221153302.GB6410@dhcp22.suse.cz
[2] http://lkml.kernel.org/r/96D4815C-420F-41B7-B1E9-A741E7523596@services.fnal…
mm/memcontrol.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6e1469b80cb7..7e6bf74ddb1e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1666,6 +1666,9 @@ enum oom_status {
static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
{
+ enum oom_status ret;
+ bool locked;
+
if (order > PAGE_ALLOC_COSTLY_ORDER)
return OOM_SKIPPED;
@@ -1700,10 +1703,23 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
return OOM_ASYNC;
}
+ mem_cgroup_mark_under_oom(memcg);
+
+ locked = mem_cgroup_oom_trylock(memcg);
+
+ if (locked)
+ mem_cgroup_oom_notify(memcg);
+
+ mem_cgroup_unmark_under_oom(memcg);
if (mem_cgroup_out_of_memory(memcg, mask, order))
- return OOM_SUCCESS;
+ ret = OOM_SUCCESS;
+ else
+ ret = OOM_FAILED;
- return OOM_FAILED;
+ if (locked)
+ mem_cgroup_oom_unlock(memcg);
+
+ return ret;
}
/**
--
2.19.2
Big endian machines (at least the one I have access to) cannot mount
f2fs filesystems anymore.
This is with Linux 4.14.89 but I suspect that 4.9.144 (and later) is
affected as well.
commit 0cfe75c5b01199 ("f2fs: enhance sanity_check_raw_super() to avoid
potential overflows") treats the "block_count" from struct
f2fs_super_block as 32-bit little endian value instead of a 64-bit
little endian value.
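To see why only big endian hosts notice, here is a host-independent
sketch of the two decodes. It assumes that applying a 32-bit
little-endian conversion to the 64-bit field amounts to truncating the
raw value and then byte-swapping it on a big endian CPU; the toy_* names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Store v as 8 little-endian bytes, the on-disk layout of block_count. */
static void toy_put_le64(uint64_t v, uint8_t out[8])
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (8 * i));
}

/* Correct, host-independent decode of a 64-bit little-endian field. */
static uint64_t toy_get_le64(const uint8_t b[8])
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | b[i];
	return v;
}

/* The raw integer a big endian CPU sees when it loads the same bytes. */
static uint64_t toy_load_big_endian(const uint8_t b[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | b[i];
	return v;
}

static uint32_t toy_swab32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Assumed failure mode: truncate the raw value to 32 bits, then swap. */
static uint32_t toy_block_count_misread(uint64_t block_count)
{
	uint8_t disk[8];

	toy_put_le64(block_count, disk);
	return toy_swab32((uint32_t)toy_load_big_endian(disk));
}

/* What a 64-bit little-endian conversion yields on any host. */
static uint64_t toy_block_count_correct(uint64_t block_count)
{
	uint8_t disk[8];

	toy_put_le64(block_count, disk);
	return toy_get_le64(disk);
}
```

Under that assumption the misread returns the upper 32 bits of the real
count, which is zero for any filesystem with fewer than 2^32 blocks, so
a validation that compares against block_count would fail.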
I tested this fix on top of Linux 4.14.89 but it seems that all stable
and mainline kernel versions are affected:
- 4.9.144 and later because 0cfe75c5b01199 was backported there
- 4.14.86 and later because 0cfe75c5b01199 was backported there
- 4.19
- 4.20-rcX
Martin Blumenstingl (1):
f2fs: fix validation of the block count in sanity_check_raw_super
fs/f2fs/super.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--
2.20.1
On 23-12-2018 at 01:28, Linus Torvalds wrote:
> On Sat, Dec 22, 2018 at 3:07 PM Christian Brauner
> <christian.brauner(a)canonical.com> wrote:
>>
>> However, for this case should I resend the revert?
>
> Since I was pointed at the original email thread, I just picked it up
> from there directly. It still applied cleanly, nothing had changed in
> that area.
>
> Linus
>
This should also be picked up for 4.19 lts
Greg, it's now upstream as:
From 94f82008ce30e2624537d240d64ce718255e0b80 Mon Sep 17 00:00:00 2001
From: Christian Brauner <christian(a)brauner.io>
Date: Thu, 5 Jul 2018 17:51:20 +0200
Subject: Revert "vfs: Allow userns root to call mknod on owned filesystems."
--
Thomas
Mapping the delay slot emulation page as both writeable & executable
presents a security risk, in that if an exploit can write to & jump into
the page then it can be used as an easy way to execute arbitrary code.
Prevent this by mapping the page read-only for userland, and using
access_process_vm() with the FOLL_FORCE flag to write to it from
mips_dsemul().
This will likely be less efficient due to copy_to_user_page() performing
cache maintenance on a whole page, rather than a single line as in the
previous use of flush_cache_sigtramp(). However this delay slot
emulation code ought not to be running in any performance critical paths
anyway so this isn't really a problem, and we can probably do better in
copy_to_user_page() anyway in future.
A major advantage of this approach is that the fix is small & simple to
backport to stable kernels.
Reported-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Paul Burton <paul.burton(a)mips.com>
Fixes: 432c6bacbd0c ("MIPS: Use per-mm page to execute branch delay slot instructions")
Cc: stable(a)vger.kernel.org # v4.8+
---
arch/mips/kernel/vdso.c | 4 ++--
arch/mips/math-emu/dsemul.c | 38 +++++++++++++++++++------------------
2 files changed, 22 insertions(+), 20 deletions(-)
diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
index 48a9c6b90e07..9df3ebdc7b0f 100644
--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -126,8 +126,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
/* Map delay slot emulation page */
base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
- VM_READ|VM_WRITE|VM_EXEC|
- VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+ VM_READ | VM_EXEC |
+ VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC,
0, NULL);
if (IS_ERR_VALUE(base)) {
ret = base;
diff --git a/arch/mips/math-emu/dsemul.c b/arch/mips/math-emu/dsemul.c
index 5450f4d1c920..e2d46cb93ca9 100644
--- a/arch/mips/math-emu/dsemul.c
+++ b/arch/mips/math-emu/dsemul.c
@@ -214,8 +214,9 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir,
{
int isa16 = get_isa16_mode(regs->cp0_epc);
mips_instruction break_math;
- struct emuframe __user *fr;
- int err, fr_idx;
+ unsigned long fr_uaddr;
+ struct emuframe fr;
+ int fr_idx, ret;
/* NOP is easy */
if (ir == 0)
@@ -250,27 +251,31 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir,
fr_idx = alloc_emuframe();
if (fr_idx == BD_EMUFRAME_NONE)
return SIGBUS;
- fr = &dsemul_page()[fr_idx];
/* Retrieve the appropriately encoded break instruction */
break_math = BREAK_MATH(isa16);
/* Write the instructions to the frame */
if (isa16) {
- err = __put_user(ir >> 16,
- (u16 __user *)(&fr->emul));
- err |= __put_user(ir & 0xffff,
- (u16 __user *)((long)(&fr->emul) + 2));
- err |= __put_user(break_math >> 16,
- (u16 __user *)(&fr->badinst));
- err |= __put_user(break_math & 0xffff,
- (u16 __user *)((long)(&fr->badinst) + 2));
+ union mips_instruction _emul = {
+ .halfword = { ir >> 16, ir }
+ };
+ union mips_instruction _badinst = {
+ .halfword = { break_math >> 16, break_math }
+ };
+
+ fr.emul = _emul.word;
+ fr.badinst = _badinst.word;
} else {
- err = __put_user(ir, &fr->emul);
- err |= __put_user(break_math, &fr->badinst);
+ fr.emul = ir;
+ fr.badinst = break_math;
}
- if (unlikely(err)) {
+ /* Write the frame to user memory */
+ fr_uaddr = (unsigned long)&dsemul_page()[fr_idx];
+ ret = access_process_vm(current, fr_uaddr, &fr, sizeof(fr),
+ FOLL_FORCE | FOLL_WRITE);
+ if (unlikely(ret != sizeof(fr))) {
MIPS_FPU_EMU_INC_STATS(errors);
free_emuframe(fr_idx, current->mm);
return SIGBUS;
@@ -282,10 +287,7 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir,
atomic_set(&current->thread.bd_emu_frame, fr_idx);
/* Change user register context to execute the frame */
- regs->cp0_epc = (unsigned long)&fr->emul | isa16;
-
- /* Ensure the icache observes our newly written frame */
- flush_cache_sigtramp((unsigned long)&fr->emul);
+ regs->cp0_epc = fr_uaddr | isa16;
return 0;
}
--
2.20.0
The AFU Descriptor Template in the PCI config space has a Name Space
field which is a 24 Byte ASCII character string of descriptive name
space for the AFU. The OCXL driver reads the string four characters at
a time with pci_read_config_dword().
This optimization is valid on a little-endian system since this is PCI,
but a big-endian system ends up with each subset of four characters in
reverse order.
This could be fixed by switching to read characters one by one. Another
option is to swap the bytes if we're big-endian.
Go for the latter with le32_to_cpu().
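The effect can be sketched in portable userspace C, assuming a config-space dword arrives with the byte at the lowest offset in its least significant bits. The helpers below are hypothetical models, not OCXL driver code: store_u32_as_be_cpu() mimics a native 32-bit store on a big-endian CPU, and swab32() stands in for what le32_to_cpu() does on such a CPU.

```c
#include <stdint.h>

/* Model of a big-endian CPU storing a 32-bit value: MSB at lowest address. */
static void store_u32_as_be_cpu(uint32_t v, char *dst)
{
	dst[0] = (char)(v >> 24);
	dst[1] = (char)(v >> 16);
	dst[2] = (char)(v >> 8);
	dst[3] = (char)v;
}

/* Reverse the byte order of a 32-bit value. */
static uint32_t swab32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* A dword holding the characters "AFU_": 'A' is the least significant byte. */
static uint32_t example_dword(void)
{
	return (uint32_t)'A' | ((uint32_t)'F' << 8) |
	       ((uint32_t)'U' << 16) | ((uint32_t)'_' << 24);
}
```

Storing example_dword() directly gives the four characters in reverse order, while storing swab32(example_dword()) gives them in the intended order. On a little-endian CPU the swap is not needed, which is why the conversion macro is a no-op there.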
Cc: stable(a)vger.kernel.org # v4.16
Signed-off-by: Greg Kurz <groug(a)kaod.org>
Acked-by: Frederic Barrat <fbarrat(a)linux.ibm.com>
---
v2: - silence sparse with (__force __le32) cast
- new changelog
---
drivers/misc/ocxl/config.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index 57a6bb1fd3c9..8f2c5d8bd2ee 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -318,7 +318,7 @@ static int read_afu_name(struct pci_dev *dev, struct ocxl_fn_config *fn,
if (rc)
return rc;
ptr = (u32 *) &afu->name[i];
- *ptr = val;
+ *ptr = le32_to_cpu((__force __le32) val);
}
afu->name[OCXL_AFU_NAME_SZ - 1] = '\0'; /* play safe */
return 0;
On a signal handler return, the user could set a context with MSR[TS] bits
set, and these bits would be copied to task regs->msr.
At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
several __get_user() are called and then a recheckpoint is executed.
This is a problem since a page fault (in kernel space) could happen when
calling __get_user(). If it happens, the process MSR[TS] bits were
already set, but recheckpoint was not executed, and SPRs are still invalid.
The page fault can cause the current process to be de-scheduled with
MSR[TS] active and without tm_recheckpoint() being called. More
importantly, the TEXASR[FS] bit is not set either.
Since TEXASR might not have the FS bit set, when the process is scheduled
back it will try to reclaim, which is aborted because the CPU is not in
the suspended state, and then recheckpoint. This recheckpoint will restore
thread->texasr into the TEXASR SPR, which might be zero, hitting a BUG_ON().
kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434!
cpu 0xb: Vector: 700 (Program Check) at [c00000041f1576d0]
pc: c000000000054550: restore_gprs+0xb0/0x180
lr: 0000000000000000
sp: c00000041f157950
msr: 8000000100021033
current = 0xc00000041f143000
paca = 0xc00000000fb86300 softe: 0 irq_happened: 0x01
pid = 1021, comm = kworker/11:1
kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434!
Linux version 4.9.0-3-powerpc64le (debian-kernel(a)lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)
enter ? for help
[c00000041f157b30] c00000000001bc3c tm_recheckpoint.part.11+0x6c/0xa0
[c00000041f157b70] c00000000001d184 __switch_to+0x1e4/0x4c0
[c00000041f157bd0] c00000000082eeb8 __schedule+0x2f8/0x990
[c00000041f157cb0] c00000000082f598 schedule+0x48/0xc0
[c00000041f157ce0] c0000000000f0d28 worker_thread+0x148/0x610
[c00000041f157d80] c0000000000f96b0 kthread+0x120/0x140
[c00000041f157e30] c00000000000c0e0 ret_from_kernel_thread+0x5c/0x7c
This patch simply delays the MSR[TS] set, so, if there is any page fault in
the __get_user() section, it does not have regs->msr[TS] set, since the TM
structures are still invalid, thus avoiding doing TM operations for
in-kernel exceptions and possible process reschedule.
With this patch, the MSR[TS] will only be set just before recheckpointing
and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in
invalid state.
In addition, if CONFIG_PREEMPT is set, preemption might occur just after
setting MSR[TS] and before tm_recheckpoint(). This block must therefore be
atomic from a preemption perspective, so it is wrapped in
preempt_disable()/preempt_enable().
It is not possible to move tm_recheckpoint() to happen earlier, because the
checkpointed registers must first be fetched from userspace with
__get_user(). Thus the only way to avoid this undesired behavior is to
delay setting MSR[TS].
The 32-bit signal handler seems to be safe from the current issue, but it
might be exposed to the preemption issue, so preemption is disabled in
that chunk of code as well.
Changes from v2:
* Run the critical section with preempt_disable.
Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
Cc: stable(a)vger.kernel.org (v3.9+)
Signed-off-by: Breno Leitao <leitao(a)debian.org>
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index e6474a45cef5..fd59fef9931b 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -848,7 +848,23 @@ static long restore_tm_user_regs(struct pt_regs *regs,
/* If TM bits are set to the reserved value, it's an invalid context */
if (MSR_TM_RESV(msr_hi))
return 1;
- /* Pull in the MSR TM bits from the user context */
+
+ /*
+ * Disabling preemption, since it is unsafe to be preempted
+ * with MSR[TS] set without recheckpointing.
+ */
+ preempt_disable();
+
+ /*
+ * CAUTION:
+ * After regs->MSR[TS] being updated, make sure that get_user(),
+ * put_user() or similar functions are *not* called. These
+ * functions can generate page faults which will cause the process
+ * to be de-scheduled with MSR[TS] set but without calling
+ * tm_recheckpoint(). This can cause a bug.
+ *
+ * Pull in the MSR TM bits from the user context
+ */
regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr_hi & MSR_TS_MASK);
/* Now, recheckpoint. This loads up all of the checkpointed (older)
* registers, including FP and V[S]Rs. After recheckpointing, the
@@ -873,6 +889,8 @@ static long restore_tm_user_regs(struct pt_regs *regs,
}
#endif
+ preempt_enable();
+
return 0;
}
#endif
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 83d51bf586c7..bbd1c73243d7 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
if (MSR_TM_RESV(msr))
return -EINVAL;
- /* pull in MSR TS bits from user context */
- regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
-
- /*
- * Ensure that TM is enabled in regs->msr before we leave the signal
- * handler. It could be the case that (a) user disabled the TM bit
- * through the manipulation of the MSR bits in uc_mcontext or (b) the
- * TM bit was disabled because a sufficient number of context switches
- * happened whilst in the signal handler and load_tm overflowed,
- * disabling the TM bit. In either case we can end up with an illegal
- * TM state leading to a TM Bad Thing when we return to userspace.
- */
- regs->msr |= MSR_TM;
-
/* pull in MSR LE from user context */
regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
@@ -572,6 +558,34 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
tm_enable();
/* Make sure the transaction is marked as failed */
tsk->thread.tm_texasr |= TEXASR_FS;
+
+ /*
+ * Disabling preemption, since it is unsafe to be preempted
+ * with MSR[TS] set without recheckpointing.
+ */
+ preempt_disable();
+
+ /* pull in MSR TS bits from user context */
+ regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
+
+ /*
+ * Ensure that TM is enabled in regs->msr before we leave the signal
+ * handler. It could be the case that (a) user disabled the TM bit
+ * through the manipulation of the MSR bits in uc_mcontext or (b) the
+ * TM bit was disabled because a sufficient number of context switches
+ * happened whilst in the signal handler and load_tm overflowed,
+ * disabling the TM bit. In either case we can end up with an illegal
+ * TM state leading to a TM Bad Thing when we return to userspace.
+ *
+ * CAUTION:
+ * After regs->MSR[TS] being updated, make sure that get_user(),
+ * put_user() or similar functions are *not* called. These
+ * functions can generate page faults which will cause the process
+ * to be de-scheduled with MSR[TS] set but without calling
+ * tm_recheckpoint(). This can cause a bug.
+ */
+ regs->msr |= MSR_TM;
+
/* This loads the checkpointed FP/VEC state, if used */
tm_recheckpoint(&tsk->thread);
@@ -585,6 +599,8 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
regs->msr |= MSR_VEC;
}
+ preempt_enable();
+
return err;
}
#endif
--
2.19.0
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 747df19747bc9752cd40b9cce761e17a033aa5c2 Mon Sep 17 00:00:00 2001
From: Daniel Mack <daniel(a)zonque.org>
Date: Thu, 11 Oct 2018 20:32:05 +0200
Subject: [PATCH] ASoC: sta32x: set ->component pointer in private struct
The ESD watchdog code in sta32x_watchdog() dereferences the ->component
pointer of the private struct, which is never assigned.
This is a regression from a1be4cead9b950 ("ASoC: sta32x: Convert to direct
regmap API usage.") which went unnoticed since nobody seems to use that ESD
workaround.
Fixes: a1be4cead9b950 ("ASoC: sta32x: Convert to direct regmap API usage.")
Signed-off-by: Daniel Mack <daniel(a)zonque.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
diff --git a/sound/soc/codecs/sta32x.c b/sound/soc/codecs/sta32x.c
index d5035f2f2b2b..ce508b4cc85c 100644
--- a/sound/soc/codecs/sta32x.c
+++ b/sound/soc/codecs/sta32x.c
@@ -879,6 +879,9 @@ static int sta32x_probe(struct snd_soc_component *component)
struct sta32x_priv *sta32x = snd_soc_component_get_drvdata(component);
struct sta32x_platform_data *pdata = sta32x->pdata;
int i, ret = 0, thermal = 0;
+
+ sta32x->component = component;
+
ret = regulator_bulk_enable(ARRAY_SIZE(sta32x->supplies),
sta32x->supplies);
if (ret != 0) {
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2d204ee9d671327915260071c19350d84344e096 Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter(a)oracle.com>
Date: Mon, 10 Sep 2018 14:12:07 +0300
Subject: [PATCH] cifs: integer overflow in in SMB2_ioctl()
The "le32_to_cpu(rsp->OutputOffset) + *plen" addition can overflow and
wrap around to a smaller value which looks like it would lead to an
information leak.
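A minimal sketch of the two forms of the bounds check, assuming all three quantities are 32-bit unsigned values (in the kernel the buffer length is a size_t, so this is a simplification). The function names are hypothetical. The first form adds before comparing and can wrap. The second bounds the length first and then compares by subtraction, which cannot wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Original form: the 32-bit sum can wrap and compare as a small value. */
static bool fits_wrapping(uint32_t buf_len, uint32_t offset, uint32_t len)
{
	return !(buf_len < offset + len);
}

/* Patched form: bound len first, then compare without adding. */
static bool fits_checked(uint32_t buf_len, uint32_t offset, uint32_t len)
{
	if (len > buf_len)
		return false;
	return !(buf_len - len < offset);
}
```

With a 100-byte buffer, an offset of 0xFFFFFFF0 and a length of 0x20, the sum wraps to 0x10, so fits_wrapping() accepts the response while fits_checked() rejects it.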
Fixes: 4a72dafa19ba ("SMB2 FSCTL and IOCTL worker function")
Signed-off-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel(a)suse.com>
CC: Stable <stable(a)vger.kernel.org>
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 6f0e6b42599c..f54d07bda067 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -2459,14 +2459,14 @@ SMB2_ioctl(const unsigned int xid, struct cifs_tcon *tcon, u64 persistent_fid,
/* We check for obvious errors in the output buffer length and offset */
if (*plen == 0)
goto ioctl_exit; /* server returned no data */
- else if (*plen > 0xFF00) {
+ else if (*plen > rsp_iov.iov_len || *plen > 0xFF00) {
cifs_dbg(VFS, "srv returned invalid ioctl length: %d\n", *plen);
*plen = 0;
rc = -EIO;
goto ioctl_exit;
}
- if (rsp_iov.iov_len < le32_to_cpu(rsp->OutputOffset) + *plen) {
+ if (rsp_iov.iov_len - *plen < le32_to_cpu(rsp->OutputOffset)) {
cifs_dbg(VFS, "Malformed ioctl resp: len %d offset %d\n", *plen,
le32_to_cpu(rsp->OutputOffset));
*plen = 0;
hugetlbfs page faults can race with truncate and hole punch operations.
Current code in the page fault path attempts to handle this by 'backing
out' operations if we encounter the race. One obvious omission in the
current code is removing a page newly added to the page cache. This is
pretty straightforward to address, but there is a more subtle and
difficult issue of backing out hugetlb reservations. To handle this
correctly, the 'reservation state' before page allocation needs to be
noted so that it can be properly backed out. There are four distinct
possibilities for reservation state: shared/reserved, shared/no-resv,
private/reserved and private/no-resv. Backing out a reservation may
require memory allocation which could fail so that needs to be taken
into account as well.
Instead of writing the required complicated code for this rare
occurrence, just eliminate the race. i_mmap_rwsem is now held in read
mode for the duration of page fault processing. Hold i_mmap_rwsem
longer in truncation and hole punch code to cover the call to
remove_inode_hugepages.
With this modification, code in remove_inode_hugepages checking for
races becomes 'dead' as it can no longer happen. Remove the dead code
and expand comments to explain reasoning. Similarly, checks for races
with truncation in the page fault path can be simplified and removed.
Cc: <stable(a)vger.kernel.org>
Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
---
fs/hugetlbfs/inode.c | 61 ++++++++++++++++++++------------------------
mm/hugetlb.c | 21 ++++++++-------
2 files changed, 38 insertions(+), 44 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 32920a10100e..a2fcea5f8225 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -383,17 +383,16 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
* truncation is indicated by end of range being LLONG_MAX
* In this case, we first scan the range and release found pages.
* After releasing pages, hugetlb_unreserve_pages cleans up region/reserv
- * maps and global counts. Page faults can not race with truncation
- * in this routine. hugetlb_no_page() prevents page faults in the
- * truncated range. It checks i_size before allocation, and again after
- * with the page table lock for the page held. The same lock must be
- * acquired to unmap a page.
+ * maps and global counts.
* hole punch is indicated if end is not LLONG_MAX
* In the hole punch case we scan the range and release found pages.
* Only when releasing a page is the associated region/reserv map
* deleted. The region/reserv map for ranges without associated
- * pages are not modified. Page faults can race with hole punch.
- * This is indicated if we find a mapped page.
+ * pages are not modified.
+ *
+ * Callers of this routine must hold the i_mmap_rwsem in write mode to prevent
+ * races with page faults.
+ *
* Note: If the passed end of range value is beyond the end of file, but
* not LLONG_MAX this routine still performs a hole punch operation.
*/
@@ -423,32 +422,14 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
for (i = 0; i < pagevec_count(&pvec); ++i) {
struct page *page = pvec.pages[i];
- u32 hash;
index = page->index;
- hash = hugetlb_fault_mutex_hash(h, current->mm,
- &pseudo_vma,
- mapping, index, 0);
- mutex_lock(&hugetlb_fault_mutex_table[hash]);
-
/*
- * If page is mapped, it was faulted in after being
- * unmapped in caller. Unmap (again) now after taking
- * the fault mutex. The mutex will prevent faults
- * until we finish removing the page.
- *
- * This race can only happen in the hole punch case.
- * Getting here in a truncate operation is a bug.
+ * A mapped page is impossible as callers should unmap
+ * all references before calling. And, i_mmap_rwsem
+ * prevents the creation of additional mappings.
*/
- if (unlikely(page_mapped(page))) {
- BUG_ON(truncate_op);
-
- i_mmap_lock_write(mapping);
- hugetlb_vmdelete_list(&mapping->i_mmap,
- index * pages_per_huge_page(h),
- (index + 1) * pages_per_huge_page(h));
- i_mmap_unlock_write(mapping);
- }
+ VM_BUG_ON(page_mapped(page));
lock_page(page);
/*
@@ -470,7 +451,6 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
}
unlock_page(page);
- mutex_unlock(&hugetlb_fault_mutex_table[hash]);
}
huge_pagevec_release(&pvec);
cond_resched();
@@ -482,9 +462,20 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
static void hugetlbfs_evict_inode(struct inode *inode)
{
+ struct address_space *mapping = inode->i_mapping;
struct resv_map *resv_map;
+ /*
+ * The vfs layer guarantees that there are no other users of this
+ * inode. Therefore, it would be safe to call remove_inode_hugepages
+ * without holding i_mmap_rwsem. We acquire and hold here to be
+ * consistent with other callers. Since there will be no contention
+ * on the semaphore, overhead is negligible.
+ */
+ i_mmap_lock_write(mapping);
remove_inode_hugepages(inode, 0, LLONG_MAX);
+ i_mmap_unlock_write(mapping);
+
resv_map = (struct resv_map *)inode->i_mapping->private_data;
/* root inode doesn't have the resv_map, so we should check it */
if (resv_map)
@@ -505,8 +496,8 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset)
i_mmap_lock_write(mapping);
if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
- i_mmap_unlock_write(mapping);
remove_inode_hugepages(inode, offset, LLONG_MAX);
+ i_mmap_unlock_write(mapping);
return 0;
}
@@ -540,8 +531,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
hugetlb_vmdelete_list(&mapping->i_mmap,
hole_start >> PAGE_SHIFT,
hole_end >> PAGE_SHIFT);
- i_mmap_unlock_write(mapping);
remove_inode_hugepages(inode, hole_start, hole_end);
+ i_mmap_unlock_write(mapping);
inode_unlock(inode);
}
@@ -624,7 +615,11 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
/* addr is the offset within the file (zero based) */
addr = index * hpage_size;
- /* mutex taken here, fault path and hole punch */
+ /*
+ * fault mutex taken here, protects against fault path
+ * and hole punch. inode_lock previously taken protects
+ * against truncation.
+ */
hash = hugetlb_fault_mutex_hash(h, mm, &pseudo_vma, mapping,
index, addr);
mutex_lock(&hugetlb_fault_mutex_table[hash]);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2a3162030167..cfd9790b01e3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3757,16 +3757,16 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
}
/*
- * Use page lock to guard against racing truncation
- * before we get page_table_lock.
+ * We can not race with truncation due to holding i_mmap_rwsem.
+ * Check once here for faults beyond end of file.
*/
+ size = i_size_read(mapping->host) >> huge_page_shift(h);
+ if (idx >= size)
+ goto out;
+
retry:
page = find_lock_page(mapping, idx);
if (!page) {
- size = i_size_read(mapping->host) >> huge_page_shift(h);
- if (idx >= size)
- goto out;
-
/*
* Check for page in userfault range
*/
@@ -3856,9 +3856,6 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
}
ptl = huge_pte_lock(h, mm, ptep);
- size = i_size_read(mapping->host) >> huge_page_shift(h);
- if (idx >= size)
- goto backout;
ret = 0;
if (!huge_pte_none(huge_ptep_get(ptep)))
@@ -3961,8 +3958,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
/*
* Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
- * until finished with ptep. This prevents huge_pmd_unshare from
- * being called elsewhere and making the ptep no longer valid.
+ * until finished with ptep. This serves two purposes:
+ * 1) It prevents huge_pmd_unshare from being called elsewhere
+ * and making the ptep no longer valid.
+ * 2) It synchronizes us with file truncation.
*
* ptep could have already be assigned via huge_pte_offset. That
* is OK, as huge_pte_alloc will return the same value unless
--
2.17.2
hugetlbfs page faults can race with truncate and hole punch operations.
Current code in the page fault path attempts to handle this by 'backing
out' operations if we encounter the race. One obvious omission in the
current code is removing a page newly added to the page cache. This is
pretty straightforward to address, but there is a more subtle and
difficult issue of backing out hugetlb reservations. To handle this
correctly, the 'reservation state' before page allocation needs to be
noted so that it can be properly backed out. There are four distinct
possibilities for reservation state: shared/reserved, shared/no-resv,
private/reserved and private/no-resv. Backing out a reservation may
require memory allocation which could fail so that needs to be taken
into account as well.
Instead of writing the required complicated code for this rare
occurrence, just eliminate the race. i_mmap_rwsem is now held in read
mode for the duration of page fault processing. Hold i_mmap_rwsem
longer in truncation and hole punch code to cover the call to
remove_inode_hugepages.
With this modification, code in remove_inode_hugepages checking for
races becomes 'dead' as it can no longer happen. Remove the dead code
and expand comments to explain reasoning. Similarly, checks for races
with truncation in the page fault path can be simplified and removed.
Cc: <stable(a)vger.kernel.org>
Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
---
fs/hugetlbfs/inode.c | 50 +++++++++++++++-----------------------------
mm/hugetlb.c | 21 +++++++++----------
2 files changed, 27 insertions(+), 44 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 32920a10100e..a9c00c6ef80d 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -383,17 +383,16 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
* truncation is indicated by end of range being LLONG_MAX
* In this case, we first scan the range and release found pages.
* After releasing pages, hugetlb_unreserve_pages cleans up region/reserv
- * maps and global counts. Page faults can not race with truncation
- * in this routine. hugetlb_no_page() prevents page faults in the
- * truncated range. It checks i_size before allocation, and again after
- * with the page table lock for the page held. The same lock must be
- * acquired to unmap a page.
+ * maps and global counts.
* hole punch is indicated if end is not LLONG_MAX
* In the hole punch case we scan the range and release found pages.
* Only when releasing a page is the associated region/reserv map
* deleted. The region/reserv map for ranges without associated
- * pages are not modified. Page faults can race with hole punch.
- * This is indicated if we find a mapped page.
+ * pages are not modified.
+ *
+ * Callers of this routine must hold the i_mmap_rwsem in write mode to prevent
+ * races with page faults.
+ *
* Note: If the passed end of range value is beyond the end of file, but
* not LLONG_MAX this routine still performs a hole punch operation.
*/
@@ -423,32 +422,14 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
for (i = 0; i < pagevec_count(&pvec); ++i) {
struct page *page = pvec.pages[i];
- u32 hash;
index = page->index;
- hash = hugetlb_fault_mutex_hash(h, current->mm,
- &pseudo_vma,
- mapping, index, 0);
- mutex_lock(&hugetlb_fault_mutex_table[hash]);
-
/*
- * If page is mapped, it was faulted in after being
- * unmapped in caller. Unmap (again) now after taking
- * the fault mutex. The mutex will prevent faults
- * until we finish removing the page.
- *
- * This race can only happen in the hole punch case.
- * Getting here in a truncate operation is a bug.
+ * A mapped page is impossible as callers should unmap
+ * all references before calling. And, i_mmap_rwsem
+ * prevents the creation of additional mappings.
*/
- if (unlikely(page_mapped(page))) {
- BUG_ON(truncate_op);
-
- i_mmap_lock_write(mapping);
- hugetlb_vmdelete_list(&mapping->i_mmap,
- index * pages_per_huge_page(h),
- (index + 1) * pages_per_huge_page(h));
- i_mmap_unlock_write(mapping);
- }
+ VM_BUG_ON(page_mapped(page));
lock_page(page);
/*
@@ -470,7 +451,6 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
}
unlock_page(page);
- mutex_unlock(&hugetlb_fault_mutex_table[hash]);
}
huge_pagevec_release(&pvec);
cond_resched();
@@ -505,8 +485,8 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset)
i_mmap_lock_write(mapping);
if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
- i_mmap_unlock_write(mapping);
remove_inode_hugepages(inode, offset, LLONG_MAX);
+ i_mmap_unlock_write(mapping);
return 0;
}
@@ -540,8 +520,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
hugetlb_vmdelete_list(&mapping->i_mmap,
hole_start >> PAGE_SHIFT,
hole_end >> PAGE_SHIFT);
- i_mmap_unlock_write(mapping);
remove_inode_hugepages(inode, hole_start, hole_end);
+ i_mmap_unlock_write(mapping);
inode_unlock(inode);
}
@@ -624,7 +604,11 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
/* addr is the offset within the file (zero based) */
addr = index * hpage_size;
- /* mutex taken here, fault path and hole punch */
+ /*
+ * fault mutex taken here, protects against fault path
+ * and hole punch. inode_lock previously taken protects
+ * against truncation.
+ */
hash = hugetlb_fault_mutex_hash(h, mm, &pseudo_vma, mapping,
index, addr);
mutex_lock(&hugetlb_fault_mutex_table[hash]);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ab4c77b8c72c..25a0cd2f8b39 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3760,16 +3760,16 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
}
/*
- * Use page lock to guard against racing truncation
- * before we get page_table_lock.
+ * We can not race with truncation due to holding i_mmap_rwsem.
+ * Check once here for faults beyond end of file.
*/
+ size = i_size_read(mapping->host) >> huge_page_shift(h);
+ if (idx >= size)
+ goto out;
+
retry:
page = find_lock_page(mapping, idx);
if (!page) {
- size = i_size_read(mapping->host) >> huge_page_shift(h);
- if (idx >= size)
- goto out;
-
/*
* Check for page in userfault range
*/
@@ -3859,9 +3859,6 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
}
ptl = huge_pte_lock(h, mm, ptep);
- size = i_size_read(mapping->host) >> huge_page_shift(h);
- if (idx >= size)
- goto backout;
ret = 0;
if (!huge_pte_none(huge_ptep_get(ptep)))
@@ -3964,8 +3961,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
/*
* Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
- * until finished with ptep. This prevents huge_pmd_unshare from
- * being called elsewhere and making the ptep no longer valid.
+ * until finished with ptep. This serves two purposes:
+ * 1) It prevents huge_pmd_unshare from being called elsewhere
+ * and making the ptep no longer valid.
+ * 2) It synchronizes us with file truncation.
*
* ptep could have already be assigned via huge_pte_offset. That
* is OK, as huge_pte_alloc will return the same value unless
--
2.17.2
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 55e56f06ed71d9441f3abd5b1d3c1a870812b3fe Mon Sep 17 00:00:00 2001
From: Matthew Wilcox <willy(a)infradead.org>
Date: Tue, 27 Nov 2018 13:16:34 -0800
Subject: [PATCH] dax: Don't access a freed inode
After we drop the i_pages lock, the inode can be freed at any time.
The get_unlocked_entry() code has no choice but to reacquire the lock,
so it can't be used here. Create a new wait_entry_unlocked() which takes
care not to acquire the lock or dereference the address_space in any way.
Fixes: c2a7d2a11552 ("filesystem-dax: Introduce dax_lock_mapping_entry()")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Matthew Wilcox <willy(a)infradead.org>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/fs/dax.c b/fs/dax.c
index e69fc231833b..3f592dc18d67 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -232,6 +232,34 @@ static void *get_unlocked_entry(struct xa_state *xas)
}
}
+/*
+ * The only thing keeping the address space around is the i_pages lock
+ * (it's cycled in clear_inode() after removing the entries from i_pages)
+ * After we call xas_unlock_irq(), we cannot touch xas->xa.
+ */
+static void wait_entry_unlocked(struct xa_state *xas, void *entry)
+{
+ struct wait_exceptional_entry_queue ewait;
+ wait_queue_head_t *wq;
+
+ init_wait(&ewait.wait);
+ ewait.wait.func = wake_exceptional_entry_func;
+
+ wq = dax_entry_waitqueue(xas, entry, &ewait.key);
+ prepare_to_wait_exclusive(wq, &ewait.wait, TASK_UNINTERRUPTIBLE);
+ xas_unlock_irq(xas);
+ schedule();
+ finish_wait(wq, &ewait.wait);
+
+ /*
+ * Entry lock waits are exclusive. Wake up the next waiter since
+ * we aren't sure we will acquire the entry lock and thus wake
+ * the next waiter up on unlock.
+ */
+ if (waitqueue_active(wq))
+ __wake_up(wq, TASK_NORMAL, 1, &ewait.key);
+}
+
static void put_unlocked_entry(struct xa_state *xas, void *entry)
{
/* If we were the only waiter woken, wake the next one */
@@ -389,9 +417,7 @@ bool dax_lock_mapping_entry(struct page *page)
entry = xas_load(&xas);
if (dax_is_locked(entry)) {
rcu_read_unlock();
- entry = get_unlocked_entry(&xas);
- xas_unlock_irq(&xas);
- put_unlocked_entry(&xas, entry);
+ wait_entry_unlocked(&xas, entry);
rcu_read_lock();
continue;
}
Hi Sasha,
On Thu, Dec 20, 2018 at 07:26:15PM +0000, Sasha Levin wrote:
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: 432c6bacbd0c MIPS: Use per-mm page to execute branch delay slot instructions.
>
> The bot has tested the following trees: v4.19.10, v4.14.89, v4.9.146,
Neat! I like the idea of this automation :)
> v4.19.10: Build OK!
> v4.14.89: Build OK!
> v4.9.146: Failed to apply! Possible dependencies:
> 05ce77249d50 ("userfaultfd: non-cooperative: add madvise() event for MADV_DONTNEED request")
> 163e11bc4f6e ("userfaultfd: hugetlbfs: UFFD_FEATURE_MISSING_HUGETLBFS")
> 67dece7d4c58 ("x86/vdso: Set vDSO pointer only after success")
> 72f87654c696 ("userfaultfd: non-cooperative: add mremap() event")
> 893e26e61d04 ("userfaultfd: non-cooperative: Add fork() event")
> 897ab3e0c49e ("userfaultfd: non-cooperative: add event for memory unmaps")
> 9cd75c3cd4c3 ("userfaultfd: non-cooperative: add ability to report non-PF events from uffd descriptor")
> d811914d8757 ("userfaultfd: non-cooperative: rename *EVENT_MADVDONTNEED to *EVENT_REMOVE")
This list includes the correct soft dependency - commit 897ab3e0c49e
("userfaultfd: non-cooperative: add event for memory unmaps") which
added an extra argument to mmap_region().
> How should we proceed with this patch?
The backport to v4.9 should simply drop the last argument (NULL) in the
call to mmap_region().
Is there some way I can indicate this sort of thing in future patches so
that the automation can spot that I already know it won't apply cleanly
to a particular range of kernel versions? Or even better, that I could
indicate what needs to be changed when backporting to those kernel
versions?
Thanks,
Paul
+cc Greg, stable
Greensky, James J <james.j.greensky(a)intel.com> wrote on Fri, Dec 21, 2018 at 11:48 AM:
>
> Commit d38d272592737ea88a20 ("perf tools: Synthesize GROUP_DESC feature in pipe mode") broke the LT 4.14 branch when using event groups in pipe-mode.
>
> # perf record -e '{cycles,instructions,branches}' -- sleep 4 | perf report
> # To display the perf.data header info, please use --header/--header-only options
> #
> Oxd7c [0x60]: failed to process type: 80
> Error:
> Failed to process sample
>
> Commit a2015516c5c0be932a69 ("perf record: Synthesize features before events in pipe mode") is the fix. Can we get this cherry-picked and applied?
>
If we fail to pin the ggtt vma slot for the ppgtt page tables, we need
to unwind the locals before reporting the error. Or else on subsequent
attempts to bind the page tables into the ggtt, we will already believe
that the vma has been pinned and continue on blithely. If something else
should happen to be at that location, chaos ensues.
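The unwind described above can be sketched in user space. This is only an
illustration, assuming a scheme where the first pin does the real work and
later pins just bump a counter; struct pt_object, backend_pin() and
pt_object_pin() are hypothetical names and not the i915 API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page-table object with a nested pin count. */
struct pt_object {
	int pin_count;      /* number of pins currently believed to be held */
	bool backend_ok;    /* test knob: does the backend pin succeed? */
	int backend_calls;  /* how often the backend was actually asked */
};

/* Hypothetical backend: pretends to reserve an address slot. */
static int backend_pin(struct pt_object *obj)
{
	obj->backend_calls++;
	return obj->backend_ok ? 0 : -1;
}

/*
 * Only the first pin touches the backend. If that fails, reset the
 * local count so the next attempt does not assume the slot is held.
 */
static int pt_object_pin(struct pt_object *obj)
{
	int err;

	assert(obj);

	if (obj->pin_count++)
		return 0;

	err = backend_pin(obj);
	if (err)
		goto unpin;

	return 0;

unpin:
	obj->pin_count = 0;
	return err;
}
```

Without the reset under the unpin label, a second call after a failure would
see a non-zero count and return success without ever asking the backend.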
Fixes: a2bbf7148342 ("drm/i915/gtt: Only keep gen6 page directories pinned while active")
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala(a)linux.intel.com>
Cc: Matthew Auld <matthew.william.auld(a)gmail.com>
Cc: <stable(a)vger.kernel.org> # v4.19+
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 6e31745f6156..4ed2f3e61347 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2073,6 +2073,7 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
int gen6_ppgtt_pin(struct i915_hw_ppgtt *base)
{
struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
+ int err;
/*
* Workaround the limited maximum vma->pin_count and the aliasing_ppgtt
@@ -2088,9 +2089,17 @@ int gen6_ppgtt_pin(struct i915_hw_ppgtt *base)
* allocator works in address space sizes, so it's multiplied by page
* size. We allocate at the top of the GTT to avoid fragmentation.
*/
- return i915_vma_pin(ppgtt->vma,
- 0, GEN6_PD_ALIGN,
- PIN_GLOBAL | PIN_HIGH);
+ err = i915_vma_pin(ppgtt->vma,
+ 0, GEN6_PD_ALIGN,
+ PIN_GLOBAL | PIN_HIGH);
+ if (err)
+ goto unpin;
+
+ return 0;
+
+unpin:
+ ppgtt->pin_count = 0;
+ return err;
}
void gen6_ppgtt_unpin(struct i915_hw_ppgtt *base)
--
2.20.1
Big endian machines (at least the one I have access to) cannot mount
f2fs filesystems anymore.
This is with Linux 4.14.89 but I suspect that 4.9.144 (and later) is
affected as well.
commit 0cfe75c5b01199 ("f2fs: enhance sanity_check_raw_super() to avoid
potential overflows") treats the "block_count" from struct
f2fs_super_block as 32-bit little endian value instead of a 64-bit
little endian value.
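A rough user-space model of why the width matters on a big-endian CPU is
shown here. It assumes the failure mode is "truncate the raw 64-bit word to
32 bits, then byte-swap"; that is an inference from the description above,
and all helper names are hypothetical rather than kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Value a big-endian CPU sees when it loads 8 on-disk bytes as one word. */
static uint64_t load_word_big_endian(const uint8_t b[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | b[i];
	return v;
}

static uint32_t swab32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

static uint64_t swab64(uint64_t x)
{
	return ((uint64_t)swab32((uint32_t)x) << 32) |
	       swab32((uint32_t)(x >> 32));
}

/* Wrong width: the raw word is cut down to 32 bits before swapping. */
static uint64_t block_count_as_le32_on_be(const uint8_t field[8])
{
	return swab32((uint32_t)load_word_big_endian(field));
}

/* Right width: all 8 bytes of the little-endian field are swapped. */
static uint64_t block_count_as_le64_on_be(const uint8_t field[8])
{
	return swab64(load_word_big_endian(field));
}
```

In this model, a block count of 0x12345 stored as 45 23 01 00 00 00 00 00
decodes to 0x12345 with the 64-bit helper and to 0 with the 32-bit one,
which would make any sanity check on the block count fail.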
I tested this fix on top of Linux 4.14.49 but it seems that all stable
and mainline kernel versions are affected:
- 4.9.144 and later because 0cfe75c5b01199 was backported there
- 4.14.86 and later because 0cfe75c5b01199 was backported there
- 4.19
- 4.20-rcX
changes since v1 at [0]:
- change the printk format for block_count from "%u" to "%llu" (thanks
to "kbuild test robot" for spotting this)
- added Chao Yu's reviewed by
[0] https://lore.kernel.org/patchwork/cover/1027285/
Martin Blumenstingl (1):
f2fs: fix validation of the block count in sanity_check_raw_super
fs/f2fs/super.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--
2.20.1
All fields in the PE are big-endian. Use cpu_to_be32() like everywhere
else something is written to the PE. Otherwise a wrong TID will be used
by the NPU. If this TID happens to point to an existing thread sharing
the same mm, it could be woken up by mistake. This is highly improbable
though. The likely outcome of this is the NPU not finding the target
thread and forcing the AFU into sending an interrupt, which userspace
is supposed to handle anyway.
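The effect of a missing conversion can be shown with plain byte arrays. This
is a sketch, not driver code: the 4-byte field, the "device" reader and the
helper names are hypothetical, and the unconverted store models what a
little-endian host would write.

```c
#include <assert.h>
#include <stdint.h>

/* Store v so that byte 0 of the field holds the most significant byte. */
static void store_be32(uint8_t field[4], uint32_t v)
{
	field[0] = (uint8_t)(v >> 24);
	field[1] = (uint8_t)(v >> 16);
	field[2] = (uint8_t)(v >> 8);
	field[3] = (uint8_t)v;
}

/* What a little-endian CPU writes with a plain assignment, no conversion. */
static void store_native_le32(uint8_t field[4], uint32_t v)
{
	field[0] = (uint8_t)v;
	field[1] = (uint8_t)(v >> 8);
	field[2] = (uint8_t)(v >> 16);
	field[3] = (uint8_t)(v >> 24);
}

/* How a big-endian consumer (the device, in this sketch) reads the field. */
static uint32_t device_read_be32(const uint8_t field[4])
{
	return ((uint32_t)field[0] << 24) | ((uint32_t)field[1] << 16) |
	       ((uint32_t)field[2] << 8) | (uint32_t)field[3];
}
```

With a TID of 0x1234, the converted store reads back as 0x1234, while the
unconverted store reads back as 0x34120000, i.e. a different thread ID.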
Fixes: e948e06fc63a ("ocxl: Expose the thread_id needed for wait on POWER9")
Cc: stable(a)vger.kernel.org # v4.18
Signed-off-by: Greg Kurz <groug(a)kaod.org>
---
This bug remained unnoticed so far because the current OCXL test suite
happens to call OCXL_IOCTL_ENABLE_P9_WAIT before attaching a context.
This causes ocxl_link_update_pe() to be called before ocxl_link_add_pe()
which re-writes the TID in the PE with the appropriate endianness.
I have some patches that change the behavior of the OCXL test suite so that
it can catch the issue:
https://github.com/gkurz/libocxl/commits/wake-host-thread-rework
---
drivers/misc/ocxl/link.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 31695a078485..646d16450066 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -566,7 +566,7 @@ int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid)
mutex_lock(&spa->spa_lock);
- pe->tid = tid;
+ pe->tid = cpu_to_be32(tid);
/*
* The barrier makes sure the PE is updated
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 00ee8b60102862f4daf0814d12a2ea2744fc0b9b Mon Sep 17 00:00:00 2001
From: Richard Weinberger <richard(a)nod.at>
Date: Mon, 11 Jun 2018 23:41:09 +0200
Subject: [PATCH] ubifs: Fix directory size calculation for symlinks
We have to account for the length of the symlink's name, not the length
of its target.
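A small sketch of the difference follows. The header size and the 8-byte
rounding are placeholders chosen for illustration, not values taken from
UBIFS, and sketch_dent_size() is a hypothetical stand-in for
CALC_DENT_SIZE().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_DENT_HEADER_SZ 56u  /* hypothetical fixed header size */
#define SKETCH_ALIGN8(x) (((x) + 7u) & ~(size_t)7u)

/* Size of a directory entry whose name is name_len bytes plus a NUL. */
static size_t sketch_dent_size(size_t name_len)
{
	return SKETCH_ALIGN8(SKETCH_DENT_HEADER_SZ + name_len + 1);
}

/*
 * Directory growth caused by creating a symlink: it depends on the
 * entry name stored in the directory, not on where the link points.
 */
static size_t sketch_symlink_dir_growth(const char *name, const char *target)
{
	assert(name && target);
	(void)target;  /* deliberately unused: the target lives in the inode */
	return sketch_dent_size(strlen(name));
}
```

With these placeholder numbers, a one-character link name accounts for 64
bytes no matter how long the target path is, whereas sizing by a 30-byte
target would have accounted for 88.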
Fixes: ca7f85be8d6c ("ubifs: Add support for encrypted symlinks")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Richard Weinberger <richard(a)nod.at>
diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 9da224d4f2da..e8616040bffc 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -1123,8 +1123,7 @@ static int ubifs_symlink(struct inode *dir, struct dentry *dentry,
struct ubifs_inode *ui;
struct ubifs_inode *dir_ui = ubifs_inode(dir);
struct ubifs_info *c = dir->i_sb->s_fs_info;
- int err, len = strlen(symname);
- int sz_change = CALC_DENT_SIZE(len);
+ int err, sz_change, len = strlen(symname);
struct fscrypt_str disk_link;
struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1,
.new_ino_d = ALIGN(len, 8),
@@ -1151,6 +1150,8 @@ static int ubifs_symlink(struct inode *dir, struct dentry *dentry,
if (err)
goto out_budg;
+ sz_change = CALC_DENT_SIZE(fname_len(&nm));
+
inode = ubifs_new_inode(c, dir, S_IFLNK | S_IRWXUGO);
if (IS_ERR(inode)) {
err = PTR_ERR(inode);
From: Dave Chinner <dchinner(a)redhat.com>
This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3.
The reverted commit added page reference counting to iomap page
structures that are used to track block size < page size state. This
was supposed to align the code with page migration page accounting
assumptions, but what it has done instead is break XFS filesystems.
Every fstests run I've done on sub-page block size XFS filesystems
since picking up this commit 2 days ago has failed with bad page
state errors such as:
# ./run_check.sh "-m rmapbt=1,reflink=1 -i sparse=1 -b size=1k" "generic/038"
....
SECTION -- xfs
FSTYP -- xfs (debug)
PLATFORM -- Linux/x86_64 test1 4.20.0-rc6-dgc+
MKFS_OPTIONS -- -f -m rmapbt=1,reflink=1 -i sparse=1 -b size=1k /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /mnt/scratch
generic/038 454s ...
run fstests generic/038 at 2018-12-20 18:43:05
XFS (sdc): Unmounting Filesystem
XFS (sdc): Mounting V5 Filesystem
XFS (sdc): Ending clean mount
BUG: Bad page state in process kswapd0 pfn:3a7fa
page:ffffea0000ccbeb0 count:0 mapcount:0 mapping:ffff88800d9b6360 index:0x1
flags: 0xfffffc0000000()
raw: 000fffffc0000000 dead000000000100 dead000000000200 ffff88800d9b6360
raw: 0000000000000001 0000000000000000 00000000ffffffff
page dumped because: non-NULL mapping
CPU: 0 PID: 676 Comm: kswapd0 Not tainted 4.20.0-rc6-dgc+ #915
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
Call Trace:
dump_stack+0x67/0x90
bad_page.cold.116+0x8a/0xbd
free_pcppages_bulk+0x4bf/0x6a0
free_unref_page_list+0x10f/0x1f0
shrink_page_list+0x49d/0xf50
shrink_inactive_list+0x19d/0x3b0
shrink_node_memcg.constprop.77+0x398/0x690
? shrink_slab.constprop.81+0x278/0x3f0
shrink_node+0x7a/0x2f0
kswapd+0x34b/0x6d0
? node_reclaim+0x240/0x240
kthread+0x11f/0x140
? __kthread_bind_mask+0x60/0x60
ret_from_fork+0x24/0x30
Disabling lock debugging due to kernel taint
....
The failures come from anything that frees pages and empties the
per-cpu page magazines, so the failure is neither predictable nor
easy to debug.
generic/038 is a reliable reproducer of this problem - it has a 9 in
10 failure rate on one of my test machines. Failures on other
machines have been at random points in fstests runs but every run
has ended up tripping this problem. Hence generic/038 was used to
bisect the failure because it was the most reliable failure.
It is too close to the 4.20 release (not to mention holidays) to
try to diagnose, fix and test the underlying cause of the problem,
so reverting the commit is the only option we have right now. The
revert has been tested against a current tot 4.20-rc7+ kernel across
multiple machines running sub-page block size XFS filesystems and
none of the bad page state failures have been seen.
Signed-off-by: Dave Chinner <dchinner(a)redhat.com>
---
fs/iomap.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/fs/iomap.c b/fs/iomap.c
index 5bc172f3dfe8..d6bc98ae8d35 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -116,12 +116,6 @@ iomap_page_create(struct inode *inode, struct page *page)
atomic_set(&iop->read_count, 0);
atomic_set(&iop->write_count, 0);
bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
-
- /*
- * migrate_page_move_mapping() assumes that pages with private data have
- * their count elevated by 1.
- */
- get_page(page);
set_page_private(page, (unsigned long)iop);
SetPagePrivate(page);
return iop;
@@ -138,7 +132,6 @@ iomap_page_release(struct page *page)
WARN_ON_ONCE(atomic_read(&iop->write_count));
ClearPagePrivate(page);
set_page_private(page, 0);
- put_page(page);
kfree(iop);
}
--
2.19.1
From: Peter Xu <peterx(a)redhat.com>
Subject: mm: thp: fix flags for pmd migration when split
When splitting a huge migrating PMD, we'll transfer all the existing PMD
bits and apply them again onto the small PTEs. However we are fetching
the bits unconditionally via pmd_soft_dirty(), pmd_write() or pmd_young()
while actually they don't make sense at all when it's a migration entry.
Fix them up. While at it, drop the ifdef as it is not needed.
Note that if my understanding of the problem is correct, then without
the patch there is a chance of losing some of the dirty bits in the migrating
pmd pages (on x86_64 we're fetching bit 11, which is part of the swap offset,
instead of bit 2), and that could potentially corrupt the memory of a
userspace program which depends on the dirty bit.
Link: http://lkml.kernel.org/r/20181213051510.20306-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Reviewed-by: William Kucharski <william.kucharski(a)oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: Souptick Joarder <jrdr.linux(a)gmail.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Zi Yan <zi.yan(a)cs.rutgers.edu>
Cc: <stable(a)vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
--- a/mm/huge_memory.c~mm-thp-fix-flags-for-pmd-migration-when-split
+++ a/mm/huge_memory.c
@@ -2144,23 +2144,25 @@ static void __split_huge_pmd_locked(stru
*/
old_pmd = pmdp_invalidate(vma, haddr, pmd);
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
pmd_migration = is_pmd_migration_entry(old_pmd);
- if (pmd_migration) {
+ if (unlikely(pmd_migration)) {
swp_entry_t entry;
entry = pmd_to_swp_entry(old_pmd);
page = pfn_to_page(swp_offset(entry));
- } else
-#endif
+ write = is_write_migration_entry(entry);
+ young = false;
+ soft_dirty = pmd_swp_soft_dirty(old_pmd);
+ } else {
page = pmd_page(old_pmd);
+ if (pmd_dirty(old_pmd))
+ SetPageDirty(page);
+ write = pmd_write(old_pmd);
+ young = pmd_young(old_pmd);
+ soft_dirty = pmd_soft_dirty(old_pmd);
+ }
VM_BUG_ON_PAGE(!page_count(page), page);
page_ref_add(page, HPAGE_PMD_NR - 1);
- if (pmd_dirty(old_pmd))
- SetPageDirty(page);
- write = pmd_write(old_pmd);
- young = pmd_young(old_pmd);
- soft_dirty = pmd_soft_dirty(old_pmd);
/*
* Withdraw the table only after we mark the pmd entry invalid.
_
From: Mikhail Zaslonko <zaslonko(a)linux.ibm.com>
Subject: mm, memory_hotplug: initialize struct pages for the full memory section
If memory end is not aligned with the sparse memory section boundary, the
mapping of such a section is only partly initialized. This may lead to
VM_BUG_ON due to uninitialized struct page access from
is_mem_section_removable() or test_pages_in_a_zone() function triggered by
memory_hotplug sysfs handlers:
Here are the panic examples:
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_PGFLAGS=y
kernel parameter mem=2050M
--------------------------
page:000003d082008000 is uninitialized and poisoned
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
Call Trace:
([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
[<00000000008f15c4>] show_valid_zones+0x5c/0x190
[<00000000008cf9c4>] dev_attr_show+0x34/0x70
[<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
[<00000000003e4194>] seq_read+0x204/0x480
[<00000000003b53ea>] __vfs_read+0x32/0x178
[<00000000003b55b2>] vfs_read+0x82/0x138
[<00000000003b5be2>] ksys_read+0x5a/0xb0
[<0000000000b86ba0>] system_call+0xdc/0x2d8
Last Breaking-Event-Address:
[<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
Kernel panic - not syncing: Fatal exception: panic_on_oops
kernel parameter mem=3075M
--------------------------
page:000003d08300c000 is uninitialized and poisoned
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
Call Trace:
([<000000000038596c>] is_mem_section_removable+0xb4/0x190)
[<00000000008f12fa>] show_mem_removable+0x9a/0xd8
[<00000000008cf9c4>] dev_attr_show+0x34/0x70
[<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
[<00000000003e4194>] seq_read+0x204/0x480
[<00000000003b53ea>] __vfs_read+0x32/0x178
[<00000000003b55b2>] vfs_read+0x82/0x138
[<00000000003b5be2>] ksys_read+0x5a/0xb0
[<0000000000b86ba0>] system_call+0xdc/0x2d8
Last Breaking-Event-Address:
[<000000000038596c>] is_mem_section_removable+0xb4/0x190
Kernel panic - not syncing: Fatal exception: panic_on_oops
Fix the problem by initializing the last memory section of each zone in
memmap_init_zone() till the very end, even if it goes beyond the zone end.
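How many extra struct pages this touches can be worked out with a few lines
of arithmetic. The sketch assumes 4 KiB pages and a 256 MiB section
(65536 pages per section); both numbers are assumptions for illustration,
and pfns_to_section_end() is a hypothetical helper.

```c
#include <assert.h>

/*
 * Number of page frames between end_pfn and the next section boundary.
 * Returns 0 when end_pfn is already section aligned.
 */
static unsigned long pfns_to_section_end(unsigned long end_pfn,
					 unsigned long pages_per_section)
{
	unsigned long rem;

	assert(pages_per_section != 0);

	rem = end_pfn % pages_per_section;
	return rem ? pages_per_section - rem : 0;
}
```

Under those assumptions, mem=2050M ends at pfn 524800, which is 512 pages
into a section, so 65024 further struct pages would be initialized;
mem=3075M ends at pfn 787200 and would need 64768.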
Michal said:
: This has always been a problem AFAIU. It just went unnoticed because we
: have zeroed memmaps during allocation before f7f99100d8d9 ("mm: stop
: zeroing memory during allocation in vmemmap") and so the above test
: would simply skip these ranges as belonging to zone 0 or provide
: garbage.
:
: So I guess we do care for post f7f99100d8d9 kernels mostly and
: therefore Fixes: f7f99100d8d9 ("mm: stop zeroing memory during
: allocation in vmemmap")
Link: http://lkml.kernel.org/r/20181212172712.34019-2-zaslonko@linux.ibm.com
Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Mikhail Zaslonko <zaslonko(a)linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer(a)de.ibm.com>
Suggested-by: Michal Hocko <mhocko(a)kernel.org>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Alexander Duyck <alexander.h.duyck(a)linux.intel.com>
Cc: Pasha Tatashin <Pavel.Tatashin(a)microsoft.com>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
--- a/mm/page_alloc.c~mm-memory_hotplug-initialize-struct-pages-for-the-full-memory-section
+++ a/mm/page_alloc.c
@@ -5542,6 +5542,18 @@ void __meminit memmap_init_zone(unsigned
cond_resched();
}
}
+#ifdef CONFIG_SPARSEMEM
+ /*
+ * If the zone does not span the rest of the section then
+ * we should at least initialize those pages. Otherwise we
+ * could blow up on a poisoned page in some paths which depend
+ * on full sections being initialized (e.g. memory hotplug).
+ */
+ while (end_pfn % PAGES_PER_SECTION) {
+ __init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid);
+ end_pfn++;
+ }
+#endif
}
#ifdef CONFIG_ZONE_DEVICE
_
While looking at BUGs associated with invalid huge page map counts,
it was discovered and observed that a huge pte pointer could become
'invalid' and point to another task's page table. Consider the
following:
A task takes a page fault on a shared hugetlbfs file and calls
huge_pte_alloc to get a ptep. Suppose the returned ptep points to a
shared pmd.
Now, another task truncates the hugetlbfs file. As part of truncation,
it unmaps everyone who has the file mapped. If the range being
truncated is covered by a shared pmd, huge_pmd_unshare will be called.
For all but the last user of the shared pmd, huge_pmd_unshare will
clear the pud pointing to the pmd. If the task in the middle of the
page fault is not the last user, the ptep returned by huge_pte_alloc
now points to another task's page table or worse. This leads to bad
things such as incorrect page map/reference counts or invalid memory
references.
To fix, expand the use of i_mmap_rwsem as follows:
- i_mmap_rwsem is held in read mode whenever huge_pmd_share is called.
huge_pmd_share is only called via huge_pte_alloc, so callers of
huge_pte_alloc take i_mmap_rwsem before calling. In addition, callers
of huge_pte_alloc continue to hold the semaphore until finished with
the ptep.
- i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is called.
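The locking rule in the two items above can be illustrated with a toy,
single-threaded model of a reader/writer semaphore. It has no atomics and
never blocks, so it is not a usable lock; struct toy_rwsem and its helpers
are hypothetical and only show who excludes whom.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-threaded model of a reader/writer semaphore. */
struct toy_rwsem {
	int readers;  /* holders in read mode (fault path, share) */
	bool writer;  /* held in write mode (unshare path) */
};

/* Read mode is granted unless a writer holds the semaphore. */
static bool toy_tryread(struct toy_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

static void toy_unread(struct toy_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
}

/* Write mode is granted only when nobody else holds the semaphore. */
static bool toy_trywrite(struct toy_rwsem *s)
{
	if (s->writer || s->readers)
		return false;
	s->writer = true;
	return true;
}

static void toy_unwrite(struct toy_rwsem *s)
{
	assert(s->writer);
	s->writer = false;
}
```

While a faulting task holds read mode and is still using its ptep, an
unshare attempt cannot get write mode, so the shared pmd cannot be torn
down underneath it.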
Cc: <stable(a)vger.kernel.org>
Fixes: 39dde65c9940 ("shared page table for hugetlb page")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
---
mm/hugetlb.c | 70 ++++++++++++++++++++++++++++++++++-----------
mm/memory-failure.c | 14 ++++++++-
mm/migrate.c | 13 ++++++++-
mm/rmap.c | 3 ++
mm/userfaultfd.c | 11 +++++--
5 files changed, 91 insertions(+), 20 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 309fb8c969af..ab4c77b8c72c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3239,6 +3239,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
int cow;
struct hstate *h = hstate_vma(vma);
unsigned long sz = huge_page_size(h);
+ struct address_space *mapping = vma->vm_file->f_mapping;
unsigned long mmun_start; /* For mmu_notifiers */
unsigned long mmun_end; /* For mmu_notifiers */
int ret = 0;
@@ -3252,11 +3253,23 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) {
spinlock_t *src_ptl, *dst_ptl;
+
src_pte = huge_pte_offset(src, addr, sz);
if (!src_pte)
continue;
+
+ /*
+ * i_mmap_rwsem must be held to call huge_pte_alloc.
+ * Continue to hold until finished with dst_pte, otherwise
+ * it could go away if part of a shared pmd.
+ *
+ * Technically, i_mmap_rwsem is only needed in the non-cow
+ * case as cow mappings are not shared.
+ */
+ i_mmap_lock_read(mapping);
dst_pte = huge_pte_alloc(dst, addr, sz);
if (!dst_pte) {
+ i_mmap_unlock_read(mapping);
ret = -ENOMEM;
break;
}
@@ -3271,8 +3284,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
* after taking the lock below.
*/
dst_entry = huge_ptep_get(dst_pte);
- if ((dst_pte == src_pte) || !huge_pte_none(dst_entry))
+ if ((dst_pte == src_pte) || !huge_pte_none(dst_entry)) {
+ i_mmap_unlock_read(mapping);
continue;
+ }
dst_ptl = huge_pte_lock(h, dst, dst_pte);
src_ptl = huge_pte_lockptr(h, src, src_pte);
@@ -3321,6 +3336,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
}
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
+
+ i_mmap_unlock_read(mapping);
}
if (cow)
@@ -3772,14 +3789,18 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
};
/*
- * hugetlb_fault_mutex must be dropped before
- * handling userfault. Reacquire after handling
- * fault to make calling code simpler.
+ * hugetlb_fault_mutex and i_mmap_rwsem must be
+ * dropped before handling userfault. Reacquire
+ * after handling fault to make calling code simpler.
*/
hash = hugetlb_fault_mutex_hash(h, mm, vma, mapping,
idx, haddr);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(mapping);
+
ret = handle_userfault(&vmf, VM_UFFD_MISSING);
+
+ i_mmap_lock_read(mapping);
mutex_lock(&hugetlb_fault_mutex_table[hash]);
goto out;
}
@@ -3927,6 +3948,11 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
if (ptep) {
+ /*
+ * Since we hold no locks, ptep could be stale. That is
+ * OK as we are only making decisions based on content and
+ * not actually modifying content here.
+ */
entry = huge_ptep_get(ptep);
if (unlikely(is_hugetlb_entry_migration(entry))) {
migration_entry_wait_huge(vma, mm, ptep);
@@ -3934,20 +3960,31 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
return VM_FAULT_HWPOISON_LARGE |
VM_FAULT_SET_HINDEX(hstate_index(h));
- } else {
- ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
- if (!ptep)
- return VM_FAULT_OOM;
}
+ /*
+ * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
+ * until finished with ptep. This prevents huge_pmd_unshare from
+ * being called elsewhere and making the ptep no longer valid.
+ *
+ * ptep could have already be assigned via huge_pte_offset. That
+ * is OK, as huge_pte_alloc will return the same value unless
+ * something changed.
+ */
mapping = vma->vm_file->f_mapping;
- idx = vma_hugecache_offset(h, vma, haddr);
+ i_mmap_lock_read(mapping);
+ ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
+ if (!ptep) {
+ i_mmap_unlock_read(mapping);
+ return VM_FAULT_OOM;
+ }
/*
* Serialize hugepage allocation and instantiation, so that we don't
* get spurious allocation failures if two CPUs race to instantiate
* the same page in the page cache.
*/
+ idx = vma_hugecache_offset(h, vma, haddr);
hash = hugetlb_fault_mutex_hash(h, mm, vma, mapping, idx, haddr);
mutex_lock(&hugetlb_fault_mutex_table[hash]);
@@ -4035,6 +4072,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
}
out_mutex:
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(mapping);
/*
* Generally it's safe to hold refcount during waiting page lock. But
* here we just wait to defer the next page fault to avoid busy loop and
@@ -4639,10 +4677,12 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
* Search for a shareable pmd page for hugetlb. In any case calls pmd_alloc()
* and returns the corresponding pte. While this is not necessary for the
* !shared pmd case because we can allocate the pmd later as well, it makes the
- * code much cleaner. pmd allocation is essential for the shared case because
- * pud has to be populated inside the same i_mmap_rwsem section - otherwise
- * racing tasks could either miss the sharing (see huge_pte_offset) or select a
- * bad pmd for sharing.
+ * code much cleaner.
+ *
+ * This routine must be called with i_mmap_rwsem held in at least read mode.
+ * For hugetlbfs, this prevents removal of any page table entries associated
+ * with the address space. This is important as we are setting up sharing
+ * based on existing page table entries (mappings).
*/
pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
{
@@ -4659,7 +4699,6 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
if (!vma_shareable(vma, addr))
return (pte_t *)pmd_alloc(mm, pud, addr);
- i_mmap_lock_write(mapping);
vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
if (svma == vma)
continue;
@@ -4689,7 +4728,6 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
spin_unlock(ptl);
out:
pte = (pte_t *)pmd_alloc(mm, pud, addr);
- i_mmap_unlock_write(mapping);
return pte;
}
@@ -4700,7 +4738,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
* indicated by page_count > 1, unmap is achieved by clearing pud and
* decrementing the ref count. If count == 1, the pte page is not shared.
*
- * called with page table lock held.
+ * Called with page table lock held and i_mmap_rwsem held in write mode.
*
* returns: 1 successfully unmapped a shared pte page
* 0 the underlying pte page is not shared, or it is the last user
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 0cd3de3550f0..b992d1295578 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1028,7 +1028,19 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
if (kill)
collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED);
- unmap_success = try_to_unmap(hpage, ttu);
+ if (!PageHuge(hpage)) {
+ unmap_success = try_to_unmap(hpage, ttu);
+ } else {
+ /*
+ * For hugetlb pages, try_to_unmap could potentially call
+ * huge_pmd_unshare. Because of this, take semaphore in
+ * write mode here and set TTU_RMAP_LOCKED to indicate we
+ * have taken the lock at this higer level.
+ */
+ i_mmap_lock_write(mapping);
+ unmap_success = try_to_unmap(hpage, ttu|TTU_RMAP_LOCKED);
+ i_mmap_unlock_write(mapping);
+ }
if (!unmap_success)
pr_err("Memory failure: %#lx: failed to unmap page (mapcount=%d)\n",
pfn, page_mapcount(hpage));
diff --git a/mm/migrate.c b/mm/migrate.c
index 84381b55b2bd..725edaef238a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1307,8 +1307,19 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
goto put_anon;
if (page_mapped(hpage)) {
+ struct address_space *mapping = page_mapping(hpage);
+
+ /*
+ * try_to_unmap could potentially call huge_pmd_unshare.
+ * Because of this, take semaphore in write mode here and
+ * set TTU_RMAP_LOCKED to let lower levels know we have
+ * taken the lock.
+ */
+ i_mmap_lock_write(mapping);
try_to_unmap(hpage,
- TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
+ TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS|
+ TTU_RMAP_LOCKED);
+ i_mmap_unlock_write(mapping);
page_was_mapped = 1;
}
diff --git a/mm/rmap.c b/mm/rmap.c
index 85b7f9423352..322e656d0225 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1374,6 +1374,9 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
/*
* If sharing is possible, start and end will be adjusted
* accordingly.
+ *
+ * If called for a huge page, caller must hold i_mmap_rwsem
+ * in write mode as it is possible to call huge_pmd_unshare.
*/
adjust_range_if_pmd_sharing_possible(vma, &start, &end);
}
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 458acda96f20..48368589f519 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -267,10 +267,14 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
VM_BUG_ON(dst_addr & ~huge_page_mask(h));
/*
- * Serialize via hugetlb_fault_mutex
+ * Serialize via i_mmap_rwsem and hugetlb_fault_mutex.
+ * i_mmap_rwsem ensures the dst_pte remains valid even
+ * in the case of shared pmds. fault mutex prevents
+ * races with other faulting threads.
*/
- idx = linear_page_index(dst_vma, dst_addr);
mapping = dst_vma->vm_file->f_mapping;
+ i_mmap_lock_read(mapping);
+ idx = linear_page_index(dst_vma, dst_addr);
hash = hugetlb_fault_mutex_hash(h, dst_mm, dst_vma, mapping,
idx, dst_addr);
mutex_lock(&hugetlb_fault_mutex_table[hash]);
@@ -279,6 +283,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
dst_pte = huge_pte_alloc(dst_mm, dst_addr, huge_page_size(h));
if (!dst_pte) {
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(mapping);
goto out_unlock;
}
@@ -286,6 +291,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
dst_pteval = huge_ptep_get(dst_pte);
if (!huge_pte_none(dst_pteval)) {
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(mapping);
goto out_unlock;
}
@@ -293,6 +299,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
dst_addr, src_addr, &page);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(mapping);
vm_alloc_shared = vm_shared;
cond_resched();
--
2.17.2
This is a note to let you know that I've just added the patch titled
USB: serial: pl2303: add ids for Hewlett-Packard HP POS pole displays
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 8d503f206c336677954160ac62f0c7d9c219cd89 Mon Sep 17 00:00:00 2001
From: Scott Chen <scott(a)labau.com.tw>
Date: Thu, 13 Dec 2018 06:01:47 -0500
Subject: USB: serial: pl2303: add ids for Hewlett-Packard HP POS pole displays
Add device ids to pl2303 for the HP POS pole displays:
LM920: 03f0:026b
TD620: 03f0:0956
LD960TA: 03f0:4439
LD220TA: 03f0:4349
LM940: 03f0:5039
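
For readers unfamiliar with how such an ID table is consulted, here is a minimal user-space sketch of the matching idea. It is not the kernel's USB core code: `struct demo_usb_id`, `demo_hp_ids`, and `demo_id_matches()` are hypothetical names invented for illustration, and only the vendor/product values are taken from the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct usb_device_id. */
struct demo_usb_id {
	uint16_t vendor;
	uint16_t product;
};

#define DEMO_HP_VENDOR_ID 0x03f0

/* The five IDs from the commit message; an all-zero entry ends the table. */
static const struct demo_usb_id demo_hp_ids[] = {
	{ DEMO_HP_VENDOR_ID, 0x026b }, /* LM920 */
	{ DEMO_HP_VENDOR_ID, 0x0956 }, /* TD620 */
	{ DEMO_HP_VENDOR_ID, 0x4439 }, /* LD960TA */
	{ DEMO_HP_VENDOR_ID, 0x4349 }, /* LD220TA */
	{ DEMO_HP_VENDOR_ID, 0x5039 }, /* LM940 */
	{ 0, 0 }                       /* terminator */
};

/* Walk the table until the terminator and report whether the pair is listed. */
static bool demo_id_matches(const struct demo_usb_id *table,
			    uint16_t vendor, uint16_t product)
{
	assert(table != NULL);
	for (; table->vendor != 0; table++) {
		if (table->vendor == vendor && table->product == product)
			return true;
	}
	return false;
}
```

Calling `demo_id_matches(demo_hp_ids, 0x03f0, 0x4349)` reports a match for the LD220TA, while an unlisted product ID under the same vendor does not match. A device whose ID pair is missing from the driver's table is simply never bound to the driver, which is why the patch only needs to add table entries.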
Signed-off-by: Scott Chen <scott(a)labau.com.tw>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/usb/serial/pl2303.c | 5 +++++
drivers/usb/serial/pl2303.h | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/drivers/usb/serial/pl2303.c b/drivers/usb/serial/pl2303.c
index a4e0d13fc121..98e7a5df0f6d 100644
--- a/drivers/usb/serial/pl2303.c
+++ b/drivers/usb/serial/pl2303.c
@@ -91,9 +91,14 @@ static const struct usb_device_id id_table[] = {
{ USB_DEVICE(YCCABLE_VENDOR_ID, YCCABLE_PRODUCT_ID) },
{ USB_DEVICE(SUPERIAL_VENDOR_ID, SUPERIAL_PRODUCT_ID) },
{ USB_DEVICE(HP_VENDOR_ID, HP_LD220_PRODUCT_ID) },
+ { USB_DEVICE(HP_VENDOR_ID, HP_LD220TA_PRODUCT_ID) },
{ USB_DEVICE(HP_VENDOR_ID, HP_LD960_PRODUCT_ID) },
+ { USB_DEVICE(HP_VENDOR_ID, HP_LD960TA_PRODUCT_ID) },
{ USB_DEVICE(HP_VENDOR_ID, HP_LCM220_PRODUCT_ID) },
{ USB_DEVICE(HP_VENDOR_ID, HP_LCM960_PRODUCT_ID) },
+ { USB_DEVICE(HP_VENDOR_ID, HP_LM920_PRODUCT_ID) },
+ { USB_DEVICE(HP_VENDOR_ID, HP_LM940_PRODUCT_ID) },
+ { USB_DEVICE(HP_VENDOR_ID, HP_TD620_PRODUCT_ID) },
{ USB_DEVICE(CRESSI_VENDOR_ID, CRESSI_EDY_PRODUCT_ID) },
{ USB_DEVICE(ZEAGLE_VENDOR_ID, ZEAGLE_N2ITION3_PRODUCT_ID) },
{ USB_DEVICE(SONY_VENDOR_ID, SONY_QN3USB_PRODUCT_ID) },
diff --git a/drivers/usb/serial/pl2303.h b/drivers/usb/serial/pl2303.h
index 26965cc23c17..4e2554d55362 100644
--- a/drivers/usb/serial/pl2303.h
+++ b/drivers/usb/serial/pl2303.h
@@ -119,10 +119,15 @@
/* Hewlett-Packard POS Pole Displays */
#define HP_VENDOR_ID 0x03f0
+#define HP_LM920_PRODUCT_ID 0x026b
+#define HP_TD620_PRODUCT_ID 0x0956
#define HP_LD960_PRODUCT_ID 0x0b39
#define HP_LCM220_PRODUCT_ID 0x3139
#define HP_LCM960_PRODUCT_ID 0x3239
#define HP_LD220_PRODUCT_ID 0x3524
+#define HP_LD220TA_PRODUCT_ID 0x4349
+#define HP_LD960TA_PRODUCT_ID 0x4439
+#define HP_LM940_PRODUCT_ID 0x5039
/* Cressi Edy (diving computer) PC interface */
#define CRESSI_VENDOR_ID 0x04b8
--
2.20.1
> From: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> Date: Thu, Dec 20, 2018, 10:39 AM
> Subject: [PATCH 4.14 00/72] 4.14.90-stable review
> To: <linux-kernel(a)vger.kernel.org>
> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>,
> <torvalds(a)linux-foundation.org>, <akpm(a)linux-foundation.org>,
> <linux(a)roeck-us.net>, <shuah(a)kernel.org>, <patches(a)kernelci.org>,
> <ben.hutchings(a)codethink.co.uk>, <lkft-triage(a)lists.linaro.org>,
> <stable(a)vger.kernel.org>
>
>
> This is the start of the stable review cycle for the 4.14.90 release.
> There are 72 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sat Dec 22 08:59:06 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.90-rc…
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> linux-4.14.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
Merged, basic functional tests, no regression found.
Thanks,
--
Jack Wang
Linux Kernel Developer
This is the start of the stable review cycle for the 4.9.147 release.
There are 61 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat Dec 22 08:58:31 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.147-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.147-rc1
Trent Piepho <tpiepho(a)impinj.com>
rtc: snvs: Add timeouts to avoid kernel lockups
Guy Shapiro <guy.shapiro(a)mobi-wize.com>
rtc: snvs: add a missing write sync
Israel Rukshin <israelr(a)mellanox.com>
nvmet-rdma: fix response use after free
Hans de Goede <hdegoede(a)redhat.com>
i2c: scmi: Fix probe error on devices with an empty SMB0001 ACPI device node
Adamski, Krzysztof (Nokia - PL/Wroclaw) <krzysztof.adamski(a)nokia.com>
i2c: axxia: properly handle master timeout
Stefan Hajnoczi <stefanha(a)redhat.com>
vhost/vsock: fix reset orphans race with close timeout
Steve French <stfrench(a)microsoft.com>
cifs: In Kconfig CONFIG_CIFS_POSIX needs depends on legacy (insecure cifs)
Sam Bobroff <sbobroff(a)linux.ibm.com>
drm/ast: Fix connector leak during driver unload
Nicolas Saenz Julienne <nsaenzjulienne(a)suse.de>
ethernet: fman: fix wrong of_node_put() in probe function
Vladimir Murzin <vladimir.murzin(a)arm.com>
ARM: 8815/1: V7M: align v7m_dma_inv_range() with v7 counterpart
Chris Cole <chris(a)sageembedded.com>
ARM: 8814/1: mm: improve/fix ARM v7_dma_inv_range() unaligned address handling
Alexei Starovoitov <ast(a)kernel.org>
bpf: check pending signals while verifying programs
Saeed Mahameed <saeedm(a)mellanox.com>
net/mlx4_en: Fix build break when CONFIG_INET is off
Anderson Luiz Alves <alacn1(a)gmail.com>
mv88e6060: disable hardware level MAC learning
Juha-Matti Tilli <juha-matti.tilli(a)iki.fi>
libata: whitelist all SAMSUNG MZ7KM* solid-state disks
Tony Lindgren <tony(a)atomide.com>
Input: omap-keypad - fix keyboard debounce configuration
Dan Carpenter <dan.carpenter(a)oracle.com>
clk: mmp: Off by one in mmp_clk_add()
Dan Carpenter <dan.carpenter(a)oracle.com>
clk: mvebu: Off by one bugs in cp110_of_clk_get()
Yangtao Li <tiny.windzz(a)gmail.com>
ide: pmac: add of_node_put()
Yangtao Li <tiny.windzz(a)gmail.com>
drivers/tty: add missing of_node_put()
Yangtao Li <tiny.windzz(a)gmail.com>
drivers/sbus/char: add of_node_put()
Yangtao Li <tiny.windzz(a)gmail.com>
sbus: char: add of_node_put()
Trond Myklebust <trond.myklebust(a)hammerspace.com>
SUNRPC: Fix a potential race in xprt_connect()
Dave Kleikamp <dave.kleikamp(a)oracle.com>
nfs: don't dirty kernel pages read by direct-io
Toni Peltonen <peltzi(a)peltzi.fi>
bonding: fix 802.3ad state sent to partner when unbinding slave
Jose Abreu <joabreu(a)synopsys.com>
ARC: io.h: Implement reads{x}()/writes{x}()
Sean Paul <seanpaul(a)chromium.org>
drm/msm: Grab a vblank reference when waiting for commit_done
YiFei Zhu <zhuyifei1999(a)gmail.com>
x86/earlyprintk/efi: Fix infinite loop on some screen widths
Cathy Avery <cavery(a)redhat.com>
scsi: vmw_pscsi: Rearrange code to avoid multiple calls to free_irq during unload
Fred Herard <fred.herard(a)oracle.com>
scsi: libiscsi: Fix NULL pointer dereference in iscsi_eh_session_reset
Alexey Khoroshilov <khoroshilov(a)ispras.ru>
mac80211_hwsim: fix module init error paths for netlink
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
locking/qspinlock: Fix build for anonymous union in older GCC compilers
Peter Zijlstra <peterz(a)infradead.org>
locking/qspinlock, x86: Provide liveness guarantee
Will Deacon <will.deacon(a)arm.com>
locking/qspinlock/x86: Increase _Q_PENDING_LOOPS upper bound
Peter Zijlstra <peterz(a)infradead.org>
locking/qspinlock: Re-order code
Will Deacon <will.deacon(a)arm.com>
locking/qspinlock: Kill cmpxchg() loop when claiming lock from head of queue
Will Deacon <will.deacon(a)arm.com>
locking/qspinlock: Remove duplicate clear_pending() function from PV code
Will Deacon <will.deacon(a)arm.com>
locking/qspinlock: Remove unbounded cmpxchg() loop from locking slowpath
Will Deacon <will.deacon(a)arm.com>
locking/qspinlock: Merge 'struct __qspinlock' into 'struct qspinlock'
Will Deacon <will.deacon(a)arm.com>
locking/qspinlock: Bound spinning on pending->locked transition in slowpath
Will Deacon <will.deacon(a)arm.com>
locking/qspinlock: Ensure node is initialised before updating prev->next
Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>
locking: Remove smp_read_barrier_depends() from queued_spin_lock_slowpath()
Michael J. Ruhl <michael.j.ruhl(a)intel.com>
IB/hfi1: Remove race conditions in user_sdma send path
Ilan Peer <ilan.peer(a)intel.com>
mac80211: Fix condition validating WMM IE
Emmanuel Grumbach <emmanuel.grumbach(a)intel.com>
mac80211: don't WARN on bad WMM parameters from buggy APs
Chris Wilson <chris(a)chris-wilson.co.uk>
drm/i915/execlists: Apply a full mb before execution for Braswell
Brian Norris <briannorris(a)chromium.org>
Revert "drm/rockchip: Allow driver to be shutdown on reboot/kexec"
Radu Rendec <radu.rendec(a)gmail.com>
powerpc/msi: Fix NULL pointer access in teardown code
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
tracing: Fix memory leak of instance function hash filters
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
tracing: Fix memory leak in set_trigger_filter()
Lubomir Rintel <lkundrak(a)v3.sk>
ARM: mmp/mmp2: fix cpu_is_mmp2() on mmp2-dt
Aaro Koskinen <aaro.koskinen(a)iki.fi>
MMC: OMAP: fix broken MMC on OMAP15XX/OMAP5910/OMAP310
Jeff Moyer <jmoyer(a)redhat.com>
aio: fix spectre gadget in lookup_ioctx
Chen-Yu Tsai <wens(a)csie.org>
pinctrl: sunxi: a83t: Fix IRQ offset typo for PH11
Ingo Molnar <mingo(a)kernel.org>
timer/debug: Change /proc/timer_list from 0444 to 0400
Davidlohr Bueso <dave(a)stgolabs.net>
lib/interval_tree_test.c: allow users to limit scope of endpoint
Davidlohr Bueso <dave(a)stgolabs.net>
lib/rbtree-test: lower default params
Davidlohr Bueso <dave(a)stgolabs.net>
lib/rbtree_test.c: make input module parameters
Davidlohr Bueso <dave(a)stgolabs.net>
lib/interval_tree_test.c: allow full tree search
Davidlohr Bueso <dave(a)stgolabs.net>
lib/interval_tree_test.c: make test options module parameters
Will Deacon <will.deacon(a)arm.com>
signal: Introduce COMPAT_SIGMINSTKSZ for use in compat_sys_sigaltstack
-------------
Diffstat:
Makefile | 4 +-
arch/arc/include/asm/io.h | 72 ++++++++++
arch/arm/mach-mmp/cputype.h | 6 +-
arch/arm/mm/cache-v7.S | 8 +-
arch/arm/mm/cache-v7m.S | 14 +-
arch/powerpc/kernel/msi.c | 7 +-
arch/x86/include/asm/qspinlock.h | 25 +++-
arch/x86/include/asm/qspinlock_paravirt.h | 3 +-
arch/x86/platform/efi/early_printk.c | 2 +-
drivers/ata/libata-core.c | 1 +
drivers/clk/mmp/clk.c | 2 +-
drivers/clk/mvebu/cp110-system-controller.c | 4 +-
drivers/gpu/drm/ast/ast_fb.c | 1 +
drivers/gpu/drm/i915/intel_lrc.c | 7 +-
drivers/gpu/drm/msm/msm_atomic.c | 5 +
drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 6 -
drivers/i2c/busses/i2c-axxia.c | 40 ++++--
drivers/i2c/busses/i2c-scmi.c | 10 +-
drivers/ide/pmac.c | 1 +
drivers/infiniband/hw/hfi1/user_sdma.c | 28 ++--
drivers/infiniband/hw/hfi1/user_sdma.h | 7 +-
drivers/input/keyboard/omap4-keypad.c | 18 ++-
drivers/mmc/host/omap.c | 11 +-
drivers/net/bonding/bond_3ad.c | 3 +
drivers/net/dsa/mv88e6060.c | 10 +-
drivers/net/ethernet/freescale/fman/fman.c | 5 +-
drivers/net/ethernet/mellanox/mlx4/Kconfig | 2 +-
drivers/net/wireless/mac80211_hwsim.c | 12 +-
drivers/nvme/target/rdma.c | 3 +-
drivers/pinctrl/sunxi/pinctrl-sun8i-a83t.c | 2 +-
drivers/rtc/rtc-snvs.c | 104 ++++++++++-----
drivers/sbus/char/display7seg.c | 1 +
drivers/sbus/char/envctrl.c | 2 +
drivers/scsi/libiscsi.c | 4 +-
drivers/scsi/vmw_pvscsi.c | 4 +-
drivers/tty/serial/suncore.c | 1 +
drivers/vhost/vsock.c | 22 +++-
fs/aio.c | 2 +
fs/cifs/Kconfig | 2 +-
fs/nfs/direct.c | 9 +-
include/asm-generic/qspinlock_types.h | 32 ++++-
include/linux/compat.h | 3 +
kernel/bpf/verifier.c | 3 +
kernel/locking/qspinlock.c | 195 ++++++++++++++--------------
kernel/locking/qspinlock_paravirt.h | 42 ++----
kernel/signal.c | 17 ++-
kernel/time/timer_list.c | 2 +-
kernel/trace/ftrace.c | 1 +
kernel/trace/trace_events_trigger.c | 6 +-
lib/interval_tree_test.c | 93 ++++++++-----
lib/rbtree_test.c | 55 +++++---
net/mac80211/mlme.c | 3 +-
net/sunrpc/xprt.c | 11 +-
53 files changed, 604 insertions(+), 329 deletions(-)
The current implementation of elan_i2c is known to not support those
2 laptops.
A proper fix is to tweak both elantech and elan_i2c to transmit the
correct information from PS/2, which would make a bad candidate for
stable.
So to give us some time for fixing the root of the problem, disable
elan_i2c for the devices we know are not behaving properly.
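
The resulting decision can be sketched in plain user-space C as follows. This is an illustration under stated assumptions, not the driver code: `demo_blacklist`, `demo_is_blacklisted()`, and `demo_use_smbus()` are hypothetical names, the comparison is a simple exact `strcmp()` (the kernel's `psmouse_matches_pnp_id()` helper may behave differently), and the PNP IDs are the ones added by the patch below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* NULL-terminated list of PNP IDs, mirroring the table in the patch. */
static const char * const demo_blacklist[] = {
	"LEN2131", /* ThinkPad P52 w/ NFC */
	"LEN2132", /* ThinkPad P52 */
	"LEN2133", /* ThinkPad P72 w/ NFC */
	"LEN2134", /* ThinkPad P72 */
	NULL
};

/* Assumption: exact, case-sensitive comparison against each list entry. */
static bool demo_is_blacklisted(const char *pnp_id, const char * const *ids)
{
	assert(pnp_id != NULL && ids != NULL);
	for (; *ids != NULL; ids++) {
		if (strcmp(pnp_id, *ids) == 0)
			return true;
	}
	return false;
}

/*
 * Default policy when the user has not set the module parameter:
 * try SMBus only for new ICs that are not on the blacklist.
 */
static bool demo_use_smbus(bool new_ic, const char *pnp_id)
{
	return new_ic && !demo_is_blacklisted(pnp_id, demo_blacklist);
}
```

With this sketch, a new IC reporting "LEN2132" stays on PS/2, while a new IC with an unlisted ID still defaults to SMBus. Old ICs are unaffected, since they were never switched by default.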
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1803600
Link: https://bugs.archlinux.org/task/59714
Fixes: df077237cf55 ("Input: elantech - detect new ICs and setup Host Notify for them")
Cc: stable(a)vger.kernel.org # v4.18+
Signed-off-by: Benjamin Tissoires <benjamin.tissoires(a)redhat.com>
---
drivers/input/mouse/elantech.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/input/mouse/elantech.c b/drivers/input/mouse/elantech.c
index 2d95e8d93cc7..830ae9f07045 100644
--- a/drivers/input/mouse/elantech.c
+++ b/drivers/input/mouse/elantech.c
@@ -1767,6 +1767,18 @@ static int elantech_smbus = IS_ENABLED(CONFIG_MOUSE_ELAN_I2C_SMBUS) ?
module_param_named(elantech_smbus, elantech_smbus, int, 0644);
MODULE_PARM_DESC(elantech_smbus, "Use a secondary bus for the Elantech device.");
+static const char * const i2c_blacklist_pnp_ids[] = {
+ /*
+ * these are known to not be working properly as bits are missing
+ * in elan_i2c
+ */
+ "LEN2131", /* ThinkPad P52 w/ NFC */
+ "LEN2132", /* ThinkPad P52 */
+ "LEN2133", /* ThinkPad P72 w/ NFC */
+ "LEN2134", /* ThinkPad P72 */
+ NULL
+};
+
static int elantech_create_smbus(struct psmouse *psmouse,
struct elantech_device_info *info,
bool leave_breadcrumbs)
@@ -1802,10 +1814,12 @@ static int elantech_setup_smbus(struct psmouse *psmouse,
if (elantech_smbus == ELANTECH_SMBUS_NOT_SET) {
/*
- * New ICs are enabled by default.
+ * New ICs are enabled by default, unless mentioned in
+ * i2c_blacklist_pnp_ids.
* Old ICs are up to the user to decide.
*/
- if (!ETP_NEW_IC_SMBUS_HOST_NOTIFY(info->fw_version))
+ if (!ETP_NEW_IC_SMBUS_HOST_NOTIFY(info->fw_version) ||
+ psmouse_matches_pnp_id(psmouse, i2c_blacklist_pnp_ids))
return -ENXIO;
}
--
2.19.2
This is a note to let you know that I've just added the patch titled
USB: serial: pl2303: add ids for Hewlett-Packard HP POS pole displays
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the usb-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From 8d503f206c336677954160ac62f0c7d9c219cd89 Mon Sep 17 00:00:00 2001
From: Scott Chen <scott(a)labau.com.tw>
Date: Thu, 13 Dec 2018 06:01:47 -0500
Subject: USB: serial: pl2303: add ids for Hewlett-Packard HP POS pole displays
Add device ids to pl2303 for the HP POS pole displays:
LM920: 03f0:026b
TD620: 03f0:0956
LD960TA: 03f0:4439
LD220TA: 03f0:4349
LM940: 03f0:5039
Signed-off-by: Scott Chen <scott(a)labau.com.tw>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/usb/serial/pl2303.c | 5 +++++
drivers/usb/serial/pl2303.h | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/drivers/usb/serial/pl2303.c b/drivers/usb/serial/pl2303.c
index a4e0d13fc121..98e7a5df0f6d 100644
--- a/drivers/usb/serial/pl2303.c
+++ b/drivers/usb/serial/pl2303.c
@@ -91,9 +91,14 @@ static const struct usb_device_id id_table[] = {
{ USB_DEVICE(YCCABLE_VENDOR_ID, YCCABLE_PRODUCT_ID) },
{ USB_DEVICE(SUPERIAL_VENDOR_ID, SUPERIAL_PRODUCT_ID) },
{ USB_DEVICE(HP_VENDOR_ID, HP_LD220_PRODUCT_ID) },
+ { USB_DEVICE(HP_VENDOR_ID, HP_LD220TA_PRODUCT_ID) },
{ USB_DEVICE(HP_VENDOR_ID, HP_LD960_PRODUCT_ID) },
+ { USB_DEVICE(HP_VENDOR_ID, HP_LD960TA_PRODUCT_ID) },
{ USB_DEVICE(HP_VENDOR_ID, HP_LCM220_PRODUCT_ID) },
{ USB_DEVICE(HP_VENDOR_ID, HP_LCM960_PRODUCT_ID) },
+ { USB_DEVICE(HP_VENDOR_ID, HP_LM920_PRODUCT_ID) },
+ { USB_DEVICE(HP_VENDOR_ID, HP_LM940_PRODUCT_ID) },
+ { USB_DEVICE(HP_VENDOR_ID, HP_TD620_PRODUCT_ID) },
{ USB_DEVICE(CRESSI_VENDOR_ID, CRESSI_EDY_PRODUCT_ID) },
{ USB_DEVICE(ZEAGLE_VENDOR_ID, ZEAGLE_N2ITION3_PRODUCT_ID) },
{ USB_DEVICE(SONY_VENDOR_ID, SONY_QN3USB_PRODUCT_ID) },
diff --git a/drivers/usb/serial/pl2303.h b/drivers/usb/serial/pl2303.h
index 26965cc23c17..4e2554d55362 100644
--- a/drivers/usb/serial/pl2303.h
+++ b/drivers/usb/serial/pl2303.h
@@ -119,10 +119,15 @@
/* Hewlett-Packard POS Pole Displays */
#define HP_VENDOR_ID 0x03f0
+#define HP_LM920_PRODUCT_ID 0x026b
+#define HP_TD620_PRODUCT_ID 0x0956
#define HP_LD960_PRODUCT_ID 0x0b39
#define HP_LCM220_PRODUCT_ID 0x3139
#define HP_LCM960_PRODUCT_ID 0x3239
#define HP_LD220_PRODUCT_ID 0x3524
+#define HP_LD220TA_PRODUCT_ID 0x4349
+#define HP_LD960TA_PRODUCT_ID 0x4439
+#define HP_LM940_PRODUCT_ID 0x5039
/* Cressi Edy (diving computer) PC interface */
#define CRESSI_VENDOR_ID 0x04b8
--
2.20.1
This is a note to let you know that I've just added the patch titled
staging: wilc1000: fix missing read_write setting when reading data
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From c58eef061dda7d843dcc0ad6fea7e597d4c377c0 Mon Sep 17 00:00:00 2001
From: Colin Ian King <colin.king(a)canonical.com>
Date: Wed, 19 Dec 2018 16:30:07 +0000
Subject: staging: wilc1000: fix missing read_write setting when reading data
Currently the cmd.read_write setting is not initialized so it contains
garbage from the stack. Fix this by setting it to 0 to indicate a
read is required.
Detected by CoverityScan, CID#1357925 ("Uninitialized scalar variable")
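
As a general illustration of this bug class, the sketch below builds a command structure with a designated initializer so that no field is left holding stack garbage. It is not the wilc1000 code: `struct demo_sdio_cmd` and `demo_make_read_cmd()` are hypothetical, the field names are borrowed from the diff, and the field types are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's command struct; types are assumed. */
struct demo_sdio_cmd {
	uint32_t read_write; /* 0 = read, as stated in the commit message */
	uint32_t function;
	uint32_t address;
	uint32_t data;
};

/*
 * Build a read command. With a designated initializer, every member that
 * is not named (here: data) is zero-initialized, so no field can be
 * accidentally left uninitialized.
 */
static struct demo_sdio_cmd demo_make_read_cmd(uint32_t function,
					       uint32_t address)
{
	/* Sanity check: SDIO function numbers fit in 3 bits (0..7). */
	assert(function <= 7);

	struct demo_sdio_cmd cmd = {
		.read_write = 0,
		.function = function,
		.address = address,
	};
	return cmd;
}
```

For example, `demo_make_read_cmd(1, 0x04)` yields a command with `read_write`, and `data` both zero. The actual patch takes the smaller step of adding a single assignment, which is easier to backport to stable trees.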
Fixes: c5c77ba18ea6 ("staging: wilc1000: Add SDIO/SPI 802.11 driver")
Signed-off-by: Colin Ian King <colin.king(a)canonical.com>
Cc: stable <stable(a)vger.kernel.org>
Acked-by: Ajay Singh <ajay.kathat(a)microchip.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/wilc1000/wilc_sdio.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/staging/wilc1000/wilc_sdio.c b/drivers/staging/wilc1000/wilc_sdio.c
index 27fdfbdda5c0..e2f739fef21c 100644
--- a/drivers/staging/wilc1000/wilc_sdio.c
+++ b/drivers/staging/wilc1000/wilc_sdio.c
@@ -861,6 +861,7 @@ static int sdio_read_int(struct wilc *wilc, u32 *int_status)
if (!sdio_priv->irq_gpio) {
int i;
+ cmd.read_write = 0;
cmd.function = 1;
cmd.address = 0x04;
cmd.data = 0;
--
2.20.1
This is the start of the stable review cycle for the 4.4.169 release.
There are 40 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat Dec 22 08:58:16 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.169-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.169-rc1
Dan Carpenter <dan.carpenter(a)oracle.com>
ALSA: isa/wavefront: prevent some out of bound writes
Trent Piepho <tpiepho(a)impinj.com>
rtc: snvs: Add timeouts to avoid kernel lockups
Guy Shapiro <guy.shapiro(a)mobi-wize.com>
rtc: snvs: add a missing write sync
Hans de Goede <hdegoede(a)redhat.com>
i2c: scmi: Fix probe error on devices with an empty SMB0001 ACPI device node
Adamski, Krzysztof (Nokia - PL/Wroclaw) <krzysztof.adamski(a)nokia.com>
i2c: axxia: properly handle master timeout
Steve French <stfrench(a)microsoft.com>
cifs: In Kconfig CONFIG_CIFS_POSIX needs depends on legacy (insecure cifs)
Chris Cole <chris(a)sageembedded.com>
ARM: 8814/1: mm: improve/fix ARM v7_dma_inv_range() unaligned address handling
Anderson Luiz Alves <alacn1(a)gmail.com>
mv88e6060: disable hardware level MAC learning
Juha-Matti Tilli <juha-matti.tilli(a)iki.fi>
libata: whitelist all SAMSUNG MZ7KM* solid-state disks
Tony Lindgren <tony(a)atomide.com>
Input: omap-keypad - fix keyboard debounce configuration
Dan Carpenter <dan.carpenter(a)oracle.com>
clk: mmp: Off by one in mmp_clk_add()
Yangtao Li <tiny.windzz(a)gmail.com>
ide: pmac: add of_node_put()
Yangtao Li <tiny.windzz(a)gmail.com>
drivers/tty: add missing of_node_put()
Yangtao Li <tiny.windzz(a)gmail.com>
drivers/sbus/char: add of_node_put()
Yangtao Li <tiny.windzz(a)gmail.com>
sbus: char: add of_node_put()
Trond Myklebust <trond.myklebust(a)hammerspace.com>
SUNRPC: Fix a potential race in xprt_connect()
Toni Peltonen <peltzi(a)peltzi.fi>
bonding: fix 802.3ad state sent to partner when unbinding slave
Jose Abreu <joabreu(a)synopsys.com>
ARC: io.h: Implement reads{x}()/writes{x}()
Sean Paul <seanpaul(a)chromium.org>
drm/msm: Grab a vblank reference when waiting for commit_done
YiFei Zhu <zhuyifei1999(a)gmail.com>
x86/earlyprintk/efi: Fix infinite loop on some screen widths
Cathy Avery <cavery(a)redhat.com>
scsi: vmw_pscsi: Rearrange code to avoid multiple calls to free_irq during unload
Fred Herard <fred.herard(a)oracle.com>
scsi: libiscsi: Fix NULL pointer dereference in iscsi_eh_session_reset
Alexey Khoroshilov <khoroshilov(a)ispras.ru>
mac80211_hwsim: fix module init error paths for netlink
Ilan Peer <ilan.peer(a)intel.com>
mac80211: Fix condition validating WMM IE
Emmanuel Grumbach <emmanuel.grumbach(a)intel.com>
mac80211: don't WARN on bad WMM parameters from buggy APs
Yunlei He <heyunlei(a)huawei.com>
f2fs: fix a panic caused by NULL flush_cmd_control
Brian Norris <briannorris(a)chromium.org>
Revert "drm/rockchip: Allow driver to be shutdown on reboot/kexec"
Radu Rendec <radu.rendec(a)gmail.com>
powerpc/msi: Fix NULL pointer access in teardown code
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
tracing: Fix memory leak of instance function hash filters
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
tracing: Fix memory leak in set_trigger_filter()
Aaro Koskinen <aaro.koskinen(a)iki.fi>
MMC: OMAP: fix broken MMC on OMAP15XX/OMAP5910/OMAP310
Jeff Moyer <jmoyer(a)redhat.com>
aio: fix spectre gadget in lookup_ioctx
Chen-Yu Tsai <wens(a)csie.org>
pinctrl: sunxi: a83t: Fix IRQ offset typo for PH11
Guenter Roeck <linux(a)roeck-us.net>
powerpc/boot: Fix random libfdt related build errors
Ingo Molnar <mingo(a)kernel.org>
timer/debug: Change /proc/timer_list from 0444 to 0400
Davidlohr Bueso <dave(a)stgolabs.net>
lib/interval_tree_test.c: allow users to limit scope of endpoint
Davidlohr Bueso <dave(a)stgolabs.net>
lib/rbtree-test: lower default params
Davidlohr Bueso <dave(a)stgolabs.net>
lib/rbtree_test.c: make input module parameters
Davidlohr Bueso <dave(a)stgolabs.net>
lib/interval_tree_test.c: allow full tree search
Davidlohr Bueso <dave(a)stgolabs.net>
lib/interval_tree_test.c: make test options module parameters
-------------
Diffstat:
Makefile | 4 +-
arch/arc/include/asm/io.h | 72 +++++++++++++++++++
arch/arm/mm/cache-v7.S | 8 ++-
arch/powerpc/boot/Makefile | 3 +-
arch/powerpc/kernel/msi.c | 7 +-
arch/x86/platform/efi/early_printk.c | 2 +-
drivers/ata/libata-core.c | 1 +
drivers/clk/mmp/clk.c | 2 +-
drivers/gpu/drm/msm/msm_atomic.c | 5 ++
drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 6 --
drivers/i2c/busses/i2c-axxia.c | 40 ++++++++---
drivers/i2c/busses/i2c-scmi.c | 10 ++-
drivers/ide/pmac.c | 1 +
drivers/input/keyboard/omap4-keypad.c | 18 +++--
drivers/mmc/host/omap.c | 11 ++-
drivers/net/bonding/bond_3ad.c | 3 +
drivers/net/dsa/mv88e6060.c | 10 +--
drivers/net/wireless/mac80211_hwsim.c | 12 ++--
drivers/pinctrl/sunxi/pinctrl-sun8i-a83t.c | 2 +-
drivers/rtc/rtc-snvs.c | 104 +++++++++++++++++++---------
drivers/sbus/char/display7seg.c | 1 +
drivers/sbus/char/envctrl.c | 2 +
drivers/scsi/libiscsi.c | 4 +-
drivers/scsi/vmw_pvscsi.c | 4 +-
drivers/tty/serial/suncore.c | 1 +
fs/aio.c | 2 +
fs/cifs/Kconfig | 2 +-
fs/f2fs/segment.c | 5 +-
kernel/time/timer_list.c | 2 +-
kernel/trace/ftrace.c | 1 +
kernel/trace/trace_events_trigger.c | 6 +-
lib/interval_tree_test.c | 93 ++++++++++++++++---------
lib/rbtree_test.c | 55 +++++++++------
net/mac80211/mlme.c | 3 +-
net/sunrpc/xprt.c | 11 ++-
sound/isa/wavefront/wavefront_synth.c | 9 +++
36 files changed, 376 insertions(+), 146 deletions(-)
This is the start of the stable review cycle for the 3.18.131 release.
There are 31 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat Dec 22 08:57:30 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.131-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 3.18.131-rc1
Lior David <qca_liord(a)qca.qualcomm.com>
wil6210: missing length check in wmi_set_ie
Kees Cook <keescook(a)chromium.org>
swiotlb: clean up reporting
Jens Axboe <axboe(a)kernel.dk>
sr: pass down correctly sized SCSI sense buffer
Thomas Gleixner <tglx(a)linutronix.de>
posix-timers: Sanitize overrun handling
Takashi Sakamoto <o-takashi(a)sakamocchi.jp>
ALSA: pcm: remove SNDRV_PCM_IOCTL1_INFO internal command
Dan Carpenter <dan.carpenter(a)oracle.com>
ALSA: isa/wavefront: prevent some out of bound writes
Hans de Goede <hdegoede(a)redhat.com>
i2c: scmi: Fix probe error on devices with an empty SMB0001 ACPI device node
Steve French <stfrench(a)microsoft.com>
cifs: In Kconfig CONFIG_CIFS_POSIX needs depends on legacy (insecure cifs)
Chris Cole <chris(a)sageembedded.com>
ARM: 8814/1: mm: improve/fix ARM v7_dma_inv_range() unaligned address handling
Juha-Matti Tilli <juha-matti.tilli(a)iki.fi>
libata: whitelist all SAMSUNG MZ7KM* solid-state disks
Tony Lindgren <tony(a)atomide.com>
Input: omap-keypad - fix keyboard debounce configuration
Yangtao Li <tiny.windzz(a)gmail.com>
ide: pmac: add of_node_put()
Yangtao Li <tiny.windzz(a)gmail.com>
drivers/tty: add missing of_node_put()
Yangtao Li <tiny.windzz(a)gmail.com>
drivers/sbus/char: add of_node_put()
Yangtao Li <tiny.windzz(a)gmail.com>
sbus: char: add of_node_put()
Trond Myklebust <trond.myklebust(a)hammerspace.com>
SUNRPC: Fix a potential race in xprt_connect()
Toni Peltonen <peltzi(a)peltzi.fi>
bonding: fix 802.3ad state sent to partner when unbinding slave
YiFei Zhu <zhuyifei1999(a)gmail.com>
x86/earlyprintk/efi: Fix infinite loop on some screen widths
Cathy Avery <cavery(a)redhat.com>
scsi: vmw_pscsi: Rearrange code to avoid multiple calls to free_irq during unload
Fred Herard <fred.herard(a)oracle.com>
scsi: libiscsi: Fix NULL pointer dereference in iscsi_eh_session_reset
Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
powerpc: Look for "stdout-path" when setting up legacy consoles
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
tracing: Fix memory leak of instance function hash filters
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
tracing: Fix memory leak in set_trigger_filter()
Aaro Koskinen <aaro.koskinen(a)iki.fi>
MMC: OMAP: fix broken MMC on OMAP15XX/OMAP5910/OMAP310
Guenter Roeck <linux(a)roeck-us.net>
powerpc/boot: Fix random libfdt related build errors
Ingo Molnar <mingo(a)kernel.org>
timer/debug: Change /proc/timer_list from 0444 to 0400
Davidlohr Bueso <dave(a)stgolabs.net>
lib/interval_tree_test.c: allow users to limit scope of endpoint
Davidlohr Bueso <dave(a)stgolabs.net>
lib/rbtree-test: lower default params
Davidlohr Bueso <dave(a)stgolabs.net>
lib/rbtree_test.c: make input module parameters
Davidlohr Bueso <dave(a)stgolabs.net>
lib/interval_tree_test.c: allow full tree search
Davidlohr Bueso <dave(a)stgolabs.net>
lib/interval_tree_test.c: make test options module parameters
-------------
Diffstat:
Makefile | 4 +-
arch/arm/mm/cache-v7.S | 8 +--
arch/powerpc/boot/Makefile | 3 +-
arch/powerpc/kernel/legacy_serial.c | 6 ++-
arch/x86/platform/efi/early_printk.c | 2 +-
drivers/ata/libata-core.c | 1 +
drivers/i2c/busses/i2c-scmi.c | 10 ++--
drivers/ide/pmac.c | 1 +
drivers/input/keyboard/omap4-keypad.c | 18 +++++--
drivers/mmc/host/omap.c | 11 +++-
drivers/net/bonding/bond_3ad.c | 3 ++
drivers/net/wireless/ath/wil6210/wmi.c | 7 ++-
drivers/sbus/char/display7seg.c | 1 +
drivers/sbus/char/envctrl.c | 2 +
drivers/scsi/libiscsi.c | 4 +-
drivers/scsi/sr_ioctl.c | 21 +++-----
drivers/scsi/vmw_pvscsi.c | 4 +-
drivers/tty/serial/suncore.c | 1 +
fs/cifs/Kconfig | 2 +-
include/linux/posix-timers.h | 4 +-
include/sound/pcm.h | 2 +-
kernel/time/posix-cpu-timers.c | 2 +-
kernel/time/posix-timers.c | 29 +++++++----
kernel/time/timer_list.c | 2 +-
kernel/trace/ftrace.c | 1 +
kernel/trace/trace_events_trigger.c | 6 ++-
lib/interval_tree_test.c | 93 ++++++++++++++++++++++------------
lib/rbtree_test.c | 55 ++++++++++++--------
lib/swiotlb.c | 18 +++----
net/sunrpc/xprt.c | 11 +++-
sound/core/pcm_lib.c | 2 -
sound/core/pcm_native.c | 6 +--
sound/isa/wavefront/wavefront_synth.c | 9 ++++
33 files changed, 224 insertions(+), 125 deletions(-)
AppArmor recently added the ability for profiles to match extended
attributes, with the intent of targeting "security.ima" and
"security.evm" to differentiate between signed and unsigned files.
The current implementation uses a path glob to match the extended
attribute value. To require the presence of an extended attribute,
profiles supply a wildcard:
# Match any file with the "security.apparmor" attribute
profile test /** xattrs=(security.apparmor="**") {
# ...
}
However, the glob matching implementation is intended for file paths and
doesn't handle null characters correctly. It's currently impossible to
write a profile that targets IMA and EVM attributes, since the
signatures can contain a null byte.
Add the ability for AppArmor to check for the presence of an extended
attribute without examining its value. This fixes profile matching and
allows profiles conditional on EVM and IMA signatures:
profile signed_binary /** xattrs=(security.evm security.ima) {
# ...
}
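The presence-only check described above can be sketched in userspace C.
This is a minimal illustration, not the kernel implementation: struct
xattr_entry, xattr_present() and xattr_keys_match() are hypothetical
names, and the attribute list is a plain in-memory array rather than
something read through vfs_getxattr_alloc(). The point it shows is that
only attribute names are compared, so values containing null bytes never
reach a string or glob matcher.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one extended attribute attached to a file. */
struct xattr_entry {
	const char *name;           /* e.g. "security.ima" */
	const unsigned char *value; /* binary blob, may contain NUL bytes */
	size_t value_len;
};

/* Return true if an attribute named `key` exists; the value is never read. */
static bool xattr_present(const struct xattr_entry *attrs, size_t n_attrs,
			  const char *key)
{
	assert(key != NULL);
	for (size_t i = 0; i < n_attrs; i++)
		if (strcmp(attrs[i].name, key) == 0)
			return true;
	return false;
}

/* A profile matches only when every required key is present on the file. */
static bool xattr_keys_match(const struct xattr_entry *attrs, size_t n_attrs,
			     const char *const *keys, size_t n_keys)
{
	for (size_t i = 0; i < n_keys; i++)
		if (!xattr_present(attrs, n_attrs, keys[i]))
			return false;
	return true;
}
```

An empty key list matches every file in this sketch, which mirrors the
patch skipping the key loop when xattr_keys_count is zero.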
A modified apparmor_parser and associated regression tests to exercise
this fix can be found at:
https://gitlab.com/ericchiang/apparmor/commits/parser-xattrs-keys
Signed-off-by: Eric Chiang <ericchiang(a)google.com>
CC: stable(a)vger.kernel.org
---
security/apparmor/apparmorfs.c | 1 +
security/apparmor/domain.c | 25 +++++++++++++++++++++----
security/apparmor/include/policy.h | 6 ++++++
security/apparmor/policy.c | 3 +++
security/apparmor/policy_unpack.c | 18 ++++++++++++++++++
5 files changed, 49 insertions(+), 4 deletions(-)
diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 8963203319ea..03d9b7f8a2fb 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -2212,6 +2212,7 @@ static struct aa_sfs_entry aa_sfs_entry_signal[] = {
static struct aa_sfs_entry aa_sfs_entry_attach[] = {
AA_SFS_FILE_BOOLEAN("xattr", 1),
+ AA_SFS_FILE_BOOLEAN("xattr_key", 1),
{ }
};
static struct aa_sfs_entry aa_sfs_entry_domain[] = {
diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
index 08c88de0ffda..9f223756b416 100644
--- a/security/apparmor/domain.c
+++ b/security/apparmor/domain.c
@@ -317,16 +317,33 @@ static int aa_xattrs_match(const struct linux_binprm *bprm,
ssize_t size;
struct dentry *d;
char *value = NULL;
- int value_size = 0, ret = profile->xattr_count;
+ int value_size = 0;
+ int ret = profile->xattr_count + profile->xattr_keys_count;
- if (!bprm || !profile->xattr_count)
+ if (!bprm)
return 0;
+ d = bprm->file->f_path.dentry;
+
+ if (profile->xattr_keys_count) {
+ /* validate that these attributes are present, ignore values */
+ for (i = 0; i < profile->xattr_keys_count; i++) {
+ size = vfs_getxattr_alloc(d, profile->xattr_keys[i],
+ &value, value_size,
+ GFP_KERNEL);
+ if (size < 0) {
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+ }
+
+ if (!profile->xattr_count)
+ goto out;
+
/* transition from exec match to xattr set */
state = aa_dfa_null_transition(profile->xmatch, state);
- d = bprm->file->f_path.dentry;
-
for (i = 0; i < profile->xattr_count; i++) {
size = vfs_getxattr_alloc(d, profile->xattrs[i], &value,
value_size, GFP_KERNEL);
diff --git a/security/apparmor/include/policy.h b/security/apparmor/include/policy.h
index 8e6707c837be..8ed1d30de7ce 100644
--- a/security/apparmor/include/policy.h
+++ b/security/apparmor/include/policy.h
@@ -112,6 +112,10 @@ struct aa_data {
* @policy: general match rules governing policy
* @file: The set of rules governing basic file access and domain transitions
* @caps: capabilities for the profile
+ * @xattr_count: number of xattrs values
+ * @xattrs: extended attributes whose values must match the xmatch
+ * @xattr_keys_count: number of xattr keys values
+ * @xattr_keys: extended attributes that must be present to match the profile
* @rlimits: rlimits for the profile
*
* @dents: dentries for the profiles file entries in apparmorfs
@@ -152,6 +156,8 @@ struct aa_profile {
int xattr_count;
char **xattrs;
+ int xattr_keys_count;
+ char **xattr_keys;
struct aa_rlimit rlimits;
diff --git a/security/apparmor/policy.c b/security/apparmor/policy.c
index df9c5890a878..e0f9cf8b8318 100644
--- a/security/apparmor/policy.c
+++ b/security/apparmor/policy.c
@@ -231,6 +231,9 @@ void aa_free_profile(struct aa_profile *profile)
for (i = 0; i < profile->xattr_count; i++)
kzfree(profile->xattrs[i]);
kzfree(profile->xattrs);
+ for (i = 0; i < profile->xattr_keys_count; i++)
+ kzfree(profile->xattr_keys[i]);
+ kzfree(profile->xattr_keys);
for (i = 0; i < profile->secmark_count; i++)
kzfree(profile->secmark[i].label);
kzfree(profile->secmark);
diff --git a/security/apparmor/policy_unpack.c b/security/apparmor/policy_unpack.c
index 379682e2a8d5..d1fd75093260 100644
--- a/security/apparmor/policy_unpack.c
+++ b/security/apparmor/policy_unpack.c
@@ -535,6 +535,24 @@ static bool unpack_xattrs(struct aa_ext *e, struct aa_profile *profile)
goto fail;
}
+ if (unpack_nameX(e, AA_STRUCT, "xattr_keys")) {
+ int i, size;
+
+ size = unpack_array(e, NULL);
+ profile->xattr_keys_count = size;
+ profile->xattr_keys = kcalloc(size, sizeof(char *), GFP_KERNEL);
+ if (!profile->xattr_keys)
+ goto fail;
+ for (i = 0; i < size; i++) {
+ if (!unpack_strdup(e, &profile->xattr_keys[i], NULL))
+ goto fail;
+ }
+ if (!unpack_nameX(e, AA_ARRAYEND, NULL))
+ goto fail;
+ if (!unpack_nameX(e, AA_STRUCTEND, NULL))
+ goto fail;
+ }
+
return 1;
fail:
--
2.20.1.415.g653613c723-goog
The patch titled
Subject: hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
has been added to the -mm tree. Its filename is
hugetlbfs-use-i_mmap_rwsem-to-fix-page-fault-truncate-race.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/hugetlbfs-use-i_mmap_rwsem-to-fix-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/hugetlbfs-use-i_mmap_rwsem-to-fix-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
hugetlbfs page faults can race with truncate and hole punch operations.
Current code in the page fault path attempts to handle this by 'backing
out' operations if we encounter the race. One obvious omission in the
current code is removing a page newly added to the page cache. This is
pretty straightforward to address, but there is a more subtle and
difficult issue of backing out hugetlb reservations. To handle this
correctly, the 'reservation state' before page allocation needs to be
noted so that it can be properly backed out. There are four distinct
possibilities for reservation state: shared/reserved, shared/no-resv,
private/reserved and private/no-resv. Backing out a reservation may
require memory allocation which could fail so that needs to be taken
into account as well.
Instead of writing the required complicated code for this rare
occurrence, just eliminate the race. i_mmap_rwsem is now held in read
mode for the duration of page fault processing. Hold i_mmap_rwsem
longer in truncation and hole punch code to cover the call to
remove_inode_hugepages.
With this modification, code in remove_inode_hugepages checking for
races becomes 'dead' as it can no longer happen. Remove the dead code
and expand comments to explain reasoning. Similarly, checks for races
with truncation in the page fault path can be simplified and removed.
Link: http://lkml.kernel.org/r/20181218223557.5202-3-mike.kravetz@oracle.com
Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Prakash Sangappa <prakash.sangappa(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/hugetlbfs/inode.c | 50 +++++++++++++----------------------------
mm/hugetlb.c | 21 ++++++++---------
2 files changed, 27 insertions(+), 44 deletions(-)
--- a/fs/hugetlbfs/inode.c~hugetlbfs-use-i_mmap_rwsem-to-fix-page-fault-truncate-race
+++ a/fs/hugetlbfs/inode.c
@@ -383,17 +383,16 @@ hugetlb_vmdelete_list(struct rb_root_cac
* truncation is indicated by end of range being LLONG_MAX
* In this case, we first scan the range and release found pages.
* After releasing pages, hugetlb_unreserve_pages cleans up region/reserv
- * maps and global counts. Page faults can not race with truncation
- * in this routine. hugetlb_no_page() prevents page faults in the
- * truncated range. It checks i_size before allocation, and again after
- * with the page table lock for the page held. The same lock must be
- * acquired to unmap a page.
+ * maps and global counts.
* hole punch is indicated if end is not LLONG_MAX
* In the hole punch case we scan the range and release found pages.
* Only when releasing a page is the associated region/reserv map
* deleted. The region/reserv map for ranges without associated
- * pages are not modified. Page faults can race with hole punch.
- * This is indicated if we find a mapped page.
+ * pages are not modified.
+ *
+ * Callers of this routine must hold the i_mmap_rwsem in write mode to prevent
+ * races with page faults.
+ *
* Note: If the passed end of range value is beyond the end of file, but
* not LLONG_MAX this routine still performs a hole punch operation.
*/
@@ -423,32 +422,14 @@ static void remove_inode_hugepages(struc
for (i = 0; i < pagevec_count(&pvec); ++i) {
struct page *page = pvec.pages[i];
- u32 hash;
index = page->index;
- hash = hugetlb_fault_mutex_hash(h, current->mm,
- &pseudo_vma,
- mapping, index, 0);
- mutex_lock(&hugetlb_fault_mutex_table[hash]);
-
/*
- * If page is mapped, it was faulted in after being
- * unmapped in caller. Unmap (again) now after taking
- * the fault mutex. The mutex will prevent faults
- * until we finish removing the page.
- *
- * This race can only happen in the hole punch case.
- * Getting here in a truncate operation is a bug.
+ * A mapped page is impossible as callers should unmap
+ * all references before calling. And, i_mmap_rwsem
+ * prevents the creation of additional mappings.
*/
- if (unlikely(page_mapped(page))) {
- BUG_ON(truncate_op);
-
- i_mmap_lock_write(mapping);
- hugetlb_vmdelete_list(&mapping->i_mmap,
- index * pages_per_huge_page(h),
- (index + 1) * pages_per_huge_page(h));
- i_mmap_unlock_write(mapping);
- }
+ VM_BUG_ON(page_mapped(page));
lock_page(page);
/*
@@ -470,7 +451,6 @@ static void remove_inode_hugepages(struc
}
unlock_page(page);
- mutex_unlock(&hugetlb_fault_mutex_table[hash]);
}
huge_pagevec_release(&pvec);
cond_resched();
@@ -505,8 +485,8 @@ static int hugetlb_vmtruncate(struct ino
i_mmap_lock_write(mapping);
if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
- i_mmap_unlock_write(mapping);
remove_inode_hugepages(inode, offset, LLONG_MAX);
+ i_mmap_unlock_write(mapping);
return 0;
}
@@ -540,8 +520,8 @@ static long hugetlbfs_punch_hole(struct
hugetlb_vmdelete_list(&mapping->i_mmap,
hole_start >> PAGE_SHIFT,
hole_end >> PAGE_SHIFT);
- i_mmap_unlock_write(mapping);
remove_inode_hugepages(inode, hole_start, hole_end);
+ i_mmap_unlock_write(mapping);
inode_unlock(inode);
}
@@ -624,7 +604,11 @@ static long hugetlbfs_fallocate(struct f
/* addr is the offset within the file (zero based) */
addr = index * hpage_size;
- /* mutex taken here, fault path and hole punch */
+ /*
+ * fault mutex taken here, protects against fault path
+ * and hole punch. inode_lock previously taken protects
+ * against truncation.
+ */
hash = hugetlb_fault_mutex_hash(h, mm, &pseudo_vma, mapping,
index, addr);
mutex_lock(&hugetlb_fault_mutex_table[hash]);
--- a/mm/hugetlb.c~hugetlbfs-use-i_mmap_rwsem-to-fix-page-fault-truncate-race
+++ a/mm/hugetlb.c
@@ -3760,16 +3760,16 @@ static vm_fault_t hugetlb_no_page(struct
}
/*
- * Use page lock to guard against racing truncation
- * before we get page_table_lock.
+ * We can not race with truncation due to holding i_mmap_rwsem.
+ * Check once here for faults beyond end of file.
*/
+ size = i_size_read(mapping->host) >> huge_page_shift(h);
+ if (idx >= size)
+ goto out;
+
retry:
page = find_lock_page(mapping, idx);
if (!page) {
- size = i_size_read(mapping->host) >> huge_page_shift(h);
- if (idx >= size)
- goto out;
-
/*
* Check for page in userfault range
*/
@@ -3859,9 +3859,6 @@ retry:
}
ptl = huge_pte_lock(h, mm, ptep);
- size = i_size_read(mapping->host) >> huge_page_shift(h);
- if (idx >= size)
- goto backout;
ret = 0;
if (!huge_pte_none(huge_ptep_get(ptep)))
@@ -3964,8 +3961,10 @@ vm_fault_t hugetlb_fault(struct mm_struc
/*
* Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
- * until finished with ptep. This prevents huge_pmd_unshare from
- * being called elsewhere and making the ptep no longer valid.
+ * until finished with ptep. This serves two purposes:
+ * 1) It prevents huge_pmd_unshare from being called elsewhere
+ * and making the ptep no longer valid.
+ * 2) It synchronizes us with file truncation.
*
* ptep could have already be assigned via huge_pte_offset. That
* is OK, as huge_pte_alloc will return the same value unless
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization.patch
hugetlbfs-use-i_mmap_rwsem-to-fix-page-fault-truncate-race.patch
The patch titled
Subject: hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
has been added to the -mm tree. Its filename is
hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/hugetlbfs-use-i_mmap_rwsem-for-mor…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/hugetlbfs-use-i_mmap_rwsem-for-mor…
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
While looking at BUGs associated with invalid huge page map counts, it was
discovered and observed that a huge pte pointer could become 'invalid' and
point to another task's page table. Consider the following:
A task takes a page fault on a shared hugetlbfs file and calls
huge_pte_alloc to get a ptep. Suppose the returned ptep points to a
shared pmd.
Now, another task truncates the hugetlbfs file. As part of truncation, it
unmaps everyone who has the file mapped. If the range being truncated is
covered by a shared pmd, huge_pmd_unshare will be called. For all but the
last user of the shared pmd, huge_pmd_unshare will clear the pud pointing
to the pmd. If the task in the middle of the page fault is not the last
user, the ptep returned by huge_pte_alloc now points to another task's
page table or worse. This leads to bad things such as incorrect page
map/reference counts or invalid memory references.
To fix, expand the use of i_mmap_rwsem as follows:
- i_mmap_rwsem is held in read mode whenever huge_pmd_share is called.
huge_pmd_share is only called via huge_pte_alloc, so callers of
huge_pte_alloc take i_mmap_rwsem before calling. In addition, callers
of huge_pte_alloc continue to hold the semaphore until finished with
the ptep.
- i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is called.
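The unshare behaviour that makes a cached ptep go stale can be modelled
in a few lines of userspace C. This is a toy sketch, not kernel code:
struct toy_pmd_page, struct toy_pud and the toy_pmd_*() helpers are
hypothetical, and a plain integer replaces the page reference count. It
follows the return convention documented for huge_pmd_unshare: 1 when a
shared pmd page was unmapped by clearing the pud and dropping a
reference, 0 when the page is not shared or the caller is the last user.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a pmd page that several tasks may share. */
struct toy_pmd_page {
	int refcount; /* number of puds currently pointing at this page */
};

/* Toy stand-in for one task's pud entry. */
struct toy_pud {
	struct toy_pmd_page *pmd; /* NULL once the entry has been cleared */
};

/* Point `pud` at an existing pmd page and take a reference on it. */
static void toy_pmd_share(struct toy_pud *pud, struct toy_pmd_page *page)
{
	assert(pud != NULL && page != NULL);
	page->refcount++;
	pud->pmd = page;
}

/*
 * Returns 1 if a shared pmd page was unmapped (pud cleared, reference
 * dropped), 0 if the page is not shared or this is the last user.
 */
static int toy_pmd_unshare(struct toy_pud *pud)
{
	if (pud->pmd == NULL || pud->pmd->refcount <= 1)
		return 0;
	pud->pmd->refcount--;
	pud->pmd = NULL;
	return 1;
}
```

Any pointer a task derived from its pud before toy_pmd_unshare() ran no
longer belongs to that task afterwards, which is why the fix keeps
i_mmap_rwsem held in read mode until the faulting task is finished with
the ptep.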
Link: http://lkml.kernel.org/r/20181218223557.5202-2-mike.kravetz@oracle.com
Fixes: 39dde65c9940 ("shared page table for hugetlb page")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Prakash Sangappa <prakash.sangappa(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 70 ++++++++++++++++++++++++++++++++----------
mm/memory-failure.c | 14 +++++++-
mm/migrate.c | 13 +++++++
mm/rmap.c | 3 +
mm/userfaultfd.c | 11 +++++-
5 files changed, 91 insertions(+), 20 deletions(-)
--- a/mm/hugetlb.c~hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization
+++ a/mm/hugetlb.c
@@ -3238,6 +3238,7 @@ int copy_hugetlb_page_range(struct mm_st
struct page *ptepage;
unsigned long addr;
int cow;
+ struct address_space *mapping = vma->vm_file->f_mapping;
struct hstate *h = hstate_vma(vma);
unsigned long sz = huge_page_size(h);
struct mmu_notifier_range range;
@@ -3253,11 +3254,23 @@ int copy_hugetlb_page_range(struct mm_st
for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) {
spinlock_t *src_ptl, *dst_ptl;
+
src_pte = huge_pte_offset(src, addr, sz);
if (!src_pte)
continue;
+
+ /*
+ * i_mmap_rwsem must be held to call huge_pte_alloc.
+ * Continue to hold until finished with dst_pte, otherwise
+ * it could go away if part of a shared pmd.
+ *
+ * Technically, i_mmap_rwsem is only needed in the non-cow
+ * case as cow mappings are not shared.
+ */
+ i_mmap_lock_read(mapping);
dst_pte = huge_pte_alloc(dst, addr, sz);
if (!dst_pte) {
+ i_mmap_unlock_read(mapping);
ret = -ENOMEM;
break;
}
@@ -3272,8 +3285,10 @@ int copy_hugetlb_page_range(struct mm_st
* after taking the lock below.
*/
dst_entry = huge_ptep_get(dst_pte);
- if ((dst_pte == src_pte) || !huge_pte_none(dst_entry))
+ if ((dst_pte == src_pte) || !huge_pte_none(dst_entry)) {
+ i_mmap_unlock_read(mapping);
continue;
+ }
dst_ptl = huge_pte_lock(h, dst, dst_pte);
src_ptl = huge_pte_lockptr(h, src, src_pte);
@@ -3322,6 +3337,8 @@ int copy_hugetlb_page_range(struct mm_st
}
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
+
+ i_mmap_unlock_read(mapping);
}
if (cow)
@@ -3772,14 +3789,18 @@ retry:
};
/*
- * hugetlb_fault_mutex must be dropped before
- * handling userfault. Reacquire after handling
- * fault to make calling code simpler.
+ * hugetlb_fault_mutex and i_mmap_rwsem must be
+ * dropped before handling userfault. Reacquire
+ * after handling fault to make calling code simpler.
*/
hash = hugetlb_fault_mutex_hash(h, mm, vma, mapping,
idx, haddr);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(mapping);
+
ret = handle_userfault(&vmf, VM_UFFD_MISSING);
+
+ i_mmap_lock_read(mapping);
mutex_lock(&hugetlb_fault_mutex_table[hash]);
goto out;
}
@@ -3927,6 +3948,11 @@ vm_fault_t hugetlb_fault(struct mm_struc
ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
if (ptep) {
+ /*
+ * Since we hold no locks, ptep could be stale. That is
+ * OK as we are only making decisions based on content and
+ * not actually modifying content here.
+ */
entry = huge_ptep_get(ptep);
if (unlikely(is_hugetlb_entry_migration(entry))) {
migration_entry_wait_huge(vma, mm, ptep);
@@ -3934,20 +3960,31 @@ vm_fault_t hugetlb_fault(struct mm_struc
} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
return VM_FAULT_HWPOISON_LARGE |
VM_FAULT_SET_HINDEX(hstate_index(h));
- } else {
- ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
- if (!ptep)
- return VM_FAULT_OOM;
}
+ /*
+ * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
+ * until finished with ptep. This prevents huge_pmd_unshare from
+ * being called elsewhere and making the ptep no longer valid.
+ *
+ * ptep could have already be assigned via huge_pte_offset. That
+ * is OK, as huge_pte_alloc will return the same value unless
+ * something changed.
+ */
mapping = vma->vm_file->f_mapping;
- idx = vma_hugecache_offset(h, vma, haddr);
+ i_mmap_lock_read(mapping);
+ ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
+ if (!ptep) {
+ i_mmap_unlock_read(mapping);
+ return VM_FAULT_OOM;
+ }
/*
* Serialize hugepage allocation and instantiation, so that we don't
* get spurious allocation failures if two CPUs race to instantiate
* the same page in the page cache.
*/
+ idx = vma_hugecache_offset(h, vma, haddr);
hash = hugetlb_fault_mutex_hash(h, mm, vma, mapping, idx, haddr);
mutex_lock(&hugetlb_fault_mutex_table[hash]);
@@ -4035,6 +4072,7 @@ out_ptl:
}
out_mutex:
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(mapping);
/*
* Generally it's safe to hold refcount during waiting page lock. But
* here we just wait to defer the next page fault to avoid busy loop and
@@ -4640,10 +4678,12 @@ void adjust_range_if_pmd_sharing_possibl
* Search for a shareable pmd page for hugetlb. In any case calls pmd_alloc()
* and returns the corresponding pte. While this is not necessary for the
* !shared pmd case because we can allocate the pmd later as well, it makes the
- * code much cleaner. pmd allocation is essential for the shared case because
- * pud has to be populated inside the same i_mmap_rwsem section - otherwise
- * racing tasks could either miss the sharing (see huge_pte_offset) or select a
- * bad pmd for sharing.
+ * code much cleaner.
+ *
+ * This routine must be called with i_mmap_rwsem held in at least read mode.
+ * For hugetlbfs, this prevents removal of any page table entries associated
+ * with the address space. This is important as we are setting up sharing
+ * based on existing page table entries (mappings).
*/
pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
{
@@ -4660,7 +4700,6 @@ pte_t *huge_pmd_share(struct mm_struct *
if (!vma_shareable(vma, addr))
return (pte_t *)pmd_alloc(mm, pud, addr);
- i_mmap_lock_write(mapping);
vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
if (svma == vma)
continue;
@@ -4690,7 +4729,6 @@ pte_t *huge_pmd_share(struct mm_struct *
spin_unlock(ptl);
out:
pte = (pte_t *)pmd_alloc(mm, pud, addr);
- i_mmap_unlock_write(mapping);
return pte;
}
@@ -4701,7 +4739,7 @@ out:
* indicated by page_count > 1, unmap is achieved by clearing pud and
* decrementing the ref count. If count == 1, the pte page is not shared.
*
- * called with page table lock held.
+ * Called with page table lock held and i_mmap_rwsem held in write mode.
*
* returns: 1 successfully unmapped a shared pte page
* 0 the underlying pte page is not shared, or it is the last user
--- a/mm/memory-failure.c~hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization
+++ a/mm/memory-failure.c
@@ -1028,7 +1028,19 @@ static bool hwpoison_user_mappings(struc
if (kill)
collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED);
- unmap_success = try_to_unmap(hpage, ttu);
+ if (!PageHuge(hpage)) {
+ unmap_success = try_to_unmap(hpage, ttu);
+ } else {
+ /*
+ * For hugetlb pages, try_to_unmap could potentially call
+ * huge_pmd_unshare. Because of this, take semaphore in
+ * write mode here and set TTU_RMAP_LOCKED to indicate we
+ * have taken the lock at this higer level.
+ */
+ i_mmap_lock_write(mapping);
+ unmap_success = try_to_unmap(hpage, ttu|TTU_RMAP_LOCKED);
+ i_mmap_unlock_write(mapping);
+ }
if (!unmap_success)
pr_err("Memory failure: %#lx: failed to unmap page (mapcount=%d)\n",
pfn, page_mapcount(hpage));
--- a/mm/migrate.c~hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization
+++ a/mm/migrate.c
@@ -1324,8 +1324,19 @@ static int unmap_and_move_huge_page(new_
goto put_anon;
if (page_mapped(hpage)) {
+ struct address_space *mapping = page_mapping(hpage);
+
+ /*
+ * try_to_unmap could potentially call huge_pmd_unshare.
+ * Because of this, take semaphore in write mode here and
+ * set TTU_RMAP_LOCKED to let lower levels know we have
+ * taken the lock.
+ */
+ i_mmap_lock_write(mapping);
try_to_unmap(hpage,
- TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
+ TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS|
+ TTU_RMAP_LOCKED);
+ i_mmap_unlock_write(mapping);
page_was_mapped = 1;
}
--- a/mm/rmap.c~hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization
+++ a/mm/rmap.c
@@ -1380,6 +1380,9 @@ static bool try_to_unmap_one(struct page
/*
* If sharing is possible, start and end will be adjusted
* accordingly.
+ *
+ * If called for a huge page, caller must hold i_mmap_rwsem
+ * in write mode as it is possible to call huge_pmd_unshare.
*/
adjust_range_if_pmd_sharing_possible(vma, &range.start,
&range.end);
--- a/mm/userfaultfd.c~hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization
+++ a/mm/userfaultfd.c
@@ -267,10 +267,14 @@ retry:
VM_BUG_ON(dst_addr & ~huge_page_mask(h));
/*
- * Serialize via hugetlb_fault_mutex
+ * Serialize via i_mmap_rwsem and hugetlb_fault_mutex.
+ * i_mmap_rwsem ensures the dst_pte remains valid even
+ * in the case of shared pmds. fault mutex prevents
+ * races with other faulting threads.
*/
- idx = linear_page_index(dst_vma, dst_addr);
mapping = dst_vma->vm_file->f_mapping;
+ i_mmap_lock_read(mapping);
+ idx = linear_page_index(dst_vma, dst_addr);
hash = hugetlb_fault_mutex_hash(h, dst_mm, dst_vma, mapping,
idx, dst_addr);
mutex_lock(&hugetlb_fault_mutex_table[hash]);
@@ -279,6 +283,7 @@ retry:
dst_pte = huge_pte_alloc(dst_mm, dst_addr, huge_page_size(h));
if (!dst_pte) {
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(mapping);
goto out_unlock;
}
@@ -286,6 +291,7 @@ retry:
dst_pteval = huge_ptep_get(dst_pte);
if (!huge_pte_none(dst_pteval)) {
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(mapping);
goto out_unlock;
}
@@ -293,6 +299,7 @@ retry:
dst_addr, src_addr, &page);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(mapping);
vm_alloc_shared = vm_shared;
cond_resched();
_
This is a note to let you know that I've just added the patch titled
staging: wilc1000: fix missing read_write setting when reading data
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the staging-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From c58eef061dda7d843dcc0ad6fea7e597d4c377c0 Mon Sep 17 00:00:00 2001
From: Colin Ian King <colin.king(a)canonical.com>
Date: Wed, 19 Dec 2018 16:30:07 +0000
Subject: staging: wilc1000: fix missing read_write setting when reading data
Currently the cmd.read_write setting is not initialized so it contains
garbage from the stack. Fix this by setting it to 0 to indicate a
read is required.
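The class of bug being fixed can be shown with a small standalone
sketch. The struct below is hypothetical and only loosely modelled on
the fields visible in the patch (read_write, function, address, data);
the field widths are assumptions. It shows one defensive pattern:
zero-initialise the whole command and then set each field explicitly, so
no member is left holding stack garbage.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical command layout; field names follow the patch, widths are assumed. */
struct toy_sdio_cmd {
	uint32_t read_write; /* 0 indicates a read, per the commit message */
	uint32_t function;
	uint32_t address;
	uint32_t data;
};

/* Build a read command with every field given a defined value. */
static struct toy_sdio_cmd toy_make_read_cmd(uint32_t address)
{
	struct toy_sdio_cmd cmd = { 0 }; /* zero every field first */

	cmd.read_write = 0;              /* explicit: this is a read */
	cmd.function = 1;
	cmd.address = address;
	cmd.data = 0;
	assert(cmd.read_write == 0);     /* sanity check before use */
	return cmd;
}
```

The actual patch takes the smaller step of assigning cmd.read_write
alone, which is sufficient there because the other fields are already
set before use.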
Detected by CoverityScan, CID#1357925 ("Uninitialized scalar variable")
Fixes: c5c77ba18ea6 ("staging: wilc1000: Add SDIO/SPI 802.11 driver")
Signed-off-by: Colin Ian King <colin.king(a)canonical.com>
Cc: stable <stable(a)vger.kernel.org>
Acked-by: Ajay Singh <ajay.kathat(a)microchip.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/wilc1000/wilc_sdio.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/staging/wilc1000/wilc_sdio.c b/drivers/staging/wilc1000/wilc_sdio.c
index 27fdfbdda5c0..e2f739fef21c 100644
--- a/drivers/staging/wilc1000/wilc_sdio.c
+++ b/drivers/staging/wilc1000/wilc_sdio.c
@@ -861,6 +861,7 @@ static int sdio_read_int(struct wilc *wilc, u32 *int_status)
if (!sdio_priv->irq_gpio) {
int i;
+ cmd.read_write = 0;
cmd.function = 1;
cmd.address = 0x04;
cmd.data = 0;
--
2.20.1
Mediatek Preloader is a proprietary embedded boot loader for loading
Little Kernel and Linux into device DRAM.
This boot loader also handles firmware updates. Mediatek Preloader will be
enumerated as a virtual COM port when the device is connected to a Windows
or Linux OS via the CDC-ACM class driver. When USB enumeration is
done, Mediatek Preloader actively sends the handshake command "READY" to
the PC instead of waiting for a command from the download tool.
Since Linux 4.12, the commit "tty: reset termios state on device
registration" (93857edd9829e144acb6c7e72d593f6e01aead66) causes Mediatek
Preloader to receive an abnormal command such as "READYXX" in response to
what it sent. This is recognized as an incorrect response. The behavior
change also causes the download handshake to fail. This change only affects
subsequent connects if the reconnected device happens to get the same minor
number.
Disabling the ECHO termios flag avoids this problem. However, it
cannot be done from user space when the download tool opens
/dev/ttyACM0, because the device running Mediatek Preloader
sends the handshake command "READY" immediately once the CDC-ACM driver is
ready.
This patch fixes the above problem by introducing a "DISABLE_ECHO"
property in driver_info. When Mediatek Preloader is connected, the
CDC-ACM driver disables the ECHO flag in termios to avoid the problem.
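The quirk test itself is a plain bit operation, which can be sketched in
userspace with the POSIX termios structure. This is an illustration
only: TOY_DISABLE_ECHO and toy_apply_quirks() are hypothetical names,
and the bit position merely mirrors the BIT(9) value used in the patch.
It shows that clearing ECHO in c_lflag leaves every other local-mode
flag untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>

/* Hypothetical quirk bit; the position mirrors BIT(9) from the patch. */
#define TOY_DISABLE_ECHO (1UL << 9)

/* Clear ECHO in the local-mode flags when the quirk bit is set. */
static void toy_apply_quirks(struct termios *t, unsigned long quirks)
{
	assert(t != NULL);
	if (quirks & TOY_DISABLE_ECHO)
		t->c_lflag &= ~ECHO;
}
```

A user-space tool could clear the same flag with tcsetattr() after
opening the port, but as the message explains, the device has already
sent "READY" by then, so the flag has to be cleared by the driver before
the first data arrives.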
Signed-off-by: Macpaul Lin <macpaul.lin(a)mediatek.com>
Cc: stable(a)vger.kernel.org
---
Changes for v2:
- Move quirks testing of DISABLE_ECHO flag into acm_tty_install().
- Change quirks testing into bitwise comparison.
Changes for v3:
- Replace quirks testing from init_termios to tty->termios.
- Remove parenthesis for ECHO flag.
Changes for v4:
- Drop quirks variable to simplify the patch.
- Move termios operation right after the driver_data has been installed.
- Write general style comment for suppressing initial echoing.
Changes for v5:
- Fix: place the termios operation right above the driver_data installation.
- Update commit message to note that this patch affects reconnected devices
which get the same minor number.
Changes for v6:
- Update VID/PID:0x0e8d/0x0003 as Mediatek Inc BROM.
- Update VID/PID:0x0e8d/0x2000 as Mediatek Inc Preloader.
Changes for v7:
- Keep VID/PID:0x0e8d/0x0003 unchanged because 2 different UNION
descriptors are implemented in Mediatek Inc BROM (MT6589/MT6765).
drivers/usb/class/cdc-acm.c | 10 ++++++++++
drivers/usb/class/cdc-acm.h | 1 +
2 files changed, 11 insertions(+)
diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 1b68fed..ed8c62b 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -581,6 +581,13 @@ static int acm_tty_install(struct tty_driver *driver, struct tty_struct *tty)
if (retval)
goto error_init_termios;
+ /*
+ * Suppress initial echoing for some devices which might send data
+ * immediately after acm driver has been installed.
+ */
+ if (acm->quirks & DISABLE_ECHO)
+ tty->termios.c_lflag &= ~ECHO;
+
tty->driver_data = acm;
return 0;
@@ -1657,6 +1664,9 @@ static int acm_pre_reset(struct usb_interface *intf)
{ USB_DEVICE(0x0e8d, 0x0003), /* FIREFLY, MediaTek Inc; andrey.arapov(a)gmail.com */
.driver_info = NO_UNION_NORMAL, /* has no union descriptor */
},
+ { USB_DEVICE(0x0e8d, 0x2000), /* MediaTek Inc Preloader */
+ .driver_info = DISABLE_ECHO, /* DISABLE ECHO in termios flag */
+ },
{ USB_DEVICE(0x0e8d, 0x3329), /* MediaTek Inc GPS */
.driver_info = NO_UNION_NORMAL, /* has no union descriptor */
},
diff --git a/drivers/usb/class/cdc-acm.h b/drivers/usb/class/cdc-acm.h
index ca06b20..515aad0 100644
--- a/drivers/usb/class/cdc-acm.h
+++ b/drivers/usb/class/cdc-acm.h
@@ -140,3 +140,4 @@ struct acm {
#define QUIRK_CONTROL_LINE_STATE BIT(6)
#define CLEAR_HALT_CONDITIONS BIT(7)
#define SEND_ZERO_PACKET BIT(8)
+#define DISABLE_ECHO BIT(9)
--
1.9.1
Mediatek Preloader is a proprietary embedded boot loader for loading
Little Kernel and Linux into device DRAM.
This boot loader also handles firmware updates. Mediatek Preloader is
enumerated as a virtual COM port when the device is connected to a Windows
or Linux host via the CDC-ACM class driver. Once USB enumeration is
done, Mediatek Preloader actively sends the handshake command "READY" to
the PC instead of waiting for a command from the download tool.
Since Linux 4.12, the commit "tty: reset termios state on device
registration" (93857edd9829e144acb6c7e72d593f6e01aead66) causes Mediatek
Preloader to receive abnormal commands such as "READYXX", because the data
it sent is echoed back to it. This is recognized as an incorrect response,
and the download handshake fails. The behavior change only affects
subsequent connects if the reconnected device happens to get the same minor
number.
Disabling the ECHO termios flag avoids this problem. However, this
cannot be done from user space when the download tool opens
/dev/ttyACM0, because the device running Mediatek Preloader sends the
handshake command "READY" immediately once the CDC-ACM driver is
ready.
This patch fixes the above problem by introducing a "DISABLE_ECHO"
property in driver_info. When Mediatek Preloader is connected, the
CDC-ACM driver disables the ECHO flag in termios to avoid the problem.
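The quirk test described above can be sketched in user-space C as
follows. This is only an illustration, not the driver code:
apply_echo_quirk() is a hypothetical helper name, and the example uses the
user-space struct termios from <termios.h> rather than the kernel's
tty->termios. Only the DISABLE_ECHO bit value (BIT(9)) comes from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>

#define BIT(n)        (1UL << (n))
#define DISABLE_ECHO  BIT(9)   /* quirk bit value taken from the patch */

/*
 * Hypothetical helper (not part of the driver): clear ECHO in the
 * local flags when the device's quirk mask carries DISABLE_ECHO.
 * All other flags are left untouched.
 */
void apply_echo_quirk(struct termios *t, unsigned long quirks)
{
	assert(t != NULL);
	if (quirks & DISABLE_ECHO)
		t->c_lflag &= ~(tcflag_t)ECHO;
}
```

Calling apply_echo_quirk(&t, DISABLE_ECHO) on a termios with ECHO | ICANON
set leaves ICANON in place and clears only ECHO; a quirk mask without
BIT(9) leaves the flags unchanged.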
Signed-off-by: Macpaul Lin <macpaul.lin(a)mediatek.com>
Cc: stable(a)vger.kernel.org
---
Changes for v2:
- Move quirks testing of DISABLE_ECHO flag into acm_tty_install().
- Change quirks testing into bitwise comparison.
Changes for v3:
- Replace quirks testing from init_termios to tty->termios.
- Remove parenthesis for ECHO flag.
Changes for v4:
- Drop quirks variable to simplify the patch.
- Move termios operation right after the driver_data has been installed.
- Write general style comment for suppressing initial echoing.
Changes for v5:
- Fix: place the termios operation right above the driver_data installation.
- Update the commit message to note that this patch affects reconnected
devices that get the same minor number.
drivers/usb/class/cdc-acm.c | 12 +++++++++++-
drivers/usb/class/cdc-acm.h | 1 +
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 1b68fed..336cf13 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -581,6 +581,13 @@ static int acm_tty_install(struct tty_driver *driver, struct tty_struct *tty)
if (retval)
goto error_init_termios;
+ /*
+ * Suppress initial echoing for some devices which might send data
+ * immediately after acm driver has been installed.
+ */
+ if (acm->quirks & DISABLE_ECHO)
+ tty->termios.c_lflag &= ~ECHO;
+
tty->driver_data = acm;
return 0;
@@ -1655,7 +1662,10 @@ static int acm_pre_reset(struct usb_interface *intf)
.driver_info = NO_UNION_NORMAL, /* has no union descriptor */
},
{ USB_DEVICE(0x0e8d, 0x0003), /* FIREFLY, MediaTek Inc; andrey.arapov(a)gmail.com */
- .driver_info = NO_UNION_NORMAL, /* has no union descriptor */
+ .driver_info = DISABLE_ECHO, /* DISABLE ECHO in termios flag */
+ },
+ { USB_DEVICE(0x0e8d, 0x2000), /* FIREFLY, MediaTek Inc; Preloader */
+ .driver_info = DISABLE_ECHO, /* DISABLE ECHO in termios flag */
},
{ USB_DEVICE(0x0e8d, 0x3329), /* MediaTek Inc GPS */
.driver_info = NO_UNION_NORMAL, /* has no union descriptor */
diff --git a/drivers/usb/class/cdc-acm.h b/drivers/usb/class/cdc-acm.h
index ca06b20..515aad0 100644
--- a/drivers/usb/class/cdc-acm.h
+++ b/drivers/usb/class/cdc-acm.h
@@ -140,3 +140,4 @@ struct acm {
#define QUIRK_CONTROL_LINE_STATE BIT(6)
#define CLEAR_HALT_CONDITIONS BIT(7)
#define SEND_ZERO_PACKET BIT(8)
+#define DISABLE_ECHO BIT(9)
--
1.9.1
commit 0ae976a11b4fb5704b597e103b5189237641c1a1 upstream.
This is a one-line hw feature backport from 0ae976a11b4f ("mt76x0: init
hw capabilities"), which also adds other features; however,
those are not supported in 4.19.
802.11w is supported by mac80211, and the mt76x0u driver in 4.19 correctly
falls back to software encryption when 802.11w ciphers are used.
Without the patch we fail to associate with WPA3 APs, so this is
considered a fix.
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi(a)redhat.com>
Signed-off-by: Felix Fietkau <nbd(a)nbd.name>
[remove marking non-working features on 4.19, make topic correspond the change]
Signed-off-by: Stanislaw Gruszka <sgruszka(a)redhat.com>
---
drivers/net/wireless/mediatek/mt76/mt76x0/init.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x0/init.c b/drivers/net/wireless/mediatek/mt76/mt76x0/init.c
index 7cdb3e740522..0a3e046d78db 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x0/init.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x0/init.c
@@ -681,6 +681,7 @@ int mt76x0_register_device(struct mt76x0_dev *dev)
ieee80211_hw_set(hw, SUPPORTS_HT_CCK_RATES);
ieee80211_hw_set(hw, AMPDU_AGGREGATION);
ieee80211_hw_set(hw, SUPPORTS_RC_TABLE);
+ ieee80211_hw_set(hw, MFP_CAPABLE);
hw->max_rates = 1;
hw->max_report_rates = 7;
hw->max_rate_tries = 1;
--
2.7.5
Hi,
I didn’t know if you had received my email from last week?
Can you direct me to the person that handles your company marketing and
promo items?
Do you have any upcoming events, tradeshows or promotional needs?
We manufacture ALL custom LOGO and branded products.
The most asked about product that we make, is the custom printed USB flash
drives.
We can print your logo on them and load your digital images, videos and
files.
Here is what we include:
-Any size memory you need: 64MB up to 128GB
-We will print your logo on both sides, just ask!
-Very Low Order Minimums
-Need them quickly? Not a problem, we offer Rush Service
We can make a custom shaped USB drive to look like your Logo or product!
Email over a copy of your logo and we will create a design mock up for you
at no cost!
Our higher memory sizes are a really good option right now.
Ask about the “Double Your Memory” upgrade promotion going on right
now!
Let us know what you need and we will get you a quick quote.
We always offer great rates for schools and nonprofits as well.
Regards,
Lilly Koller
Logo USB Account Manager
This is a note to let you know that I've just added the patch titled
stm class: Fix a module refcount leak in policy creation error path
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From c18614a1a11276837bdd44403d84d207c9951538 Mon Sep 17 00:00:00 2001
From: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Date: Wed, 19 Dec 2018 17:19:20 +0200
Subject: stm class: Fix a module refcount leak in policy creation error path
Commit c7fd62bc69d0 ("stm class: Introduce framing protocol drivers")
adds a bug into the error path of policy creation, that would do a
module_put() on a wrong module, if one tried to create a policy for
an stm device which already has a policy, using a different protocol.
IOW,
| mkdir /config/stp-policy/dummy_stm.0:p_basic.test
| mkdir /config/stp-policy/dummy_stm.0:p_sys-t.test # puts "p_basic"
| mkdir /config/stp-policy/dummy_stm.0:p_sys-t.test # "p_basic" -> -1
throws:
| general protection fault: 0000 [#1] SMP PTI
| CPU: 3 PID: 2887 Comm: mkdir
| RIP: 0010:module_put.part.31+0xe/0x90
| Call Trace:
| module_put+0x13/0x20
| stm_put_protocol+0x11/0x20 [stm_core]
| stp_policy_make+0xf1/0x210 [stm_core]
| ? __kmalloc+0x183/0x220
| ? configfs_mkdir+0x10d/0x4c0
| configfs_mkdir+0x169/0x4c0
| vfs_mkdir+0x108/0x1c0
| do_mkdirat+0xe8/0x110
| __x64_sys_mkdir+0x1b/0x20
| do_syscall_64+0x5a/0x140
| entry_SYSCALL_64_after_hwframe+0x44/0xa9
Correct this sad mistake by calling 'put' on the correct
reference, which happens to match another error path in the same
function, so we consolidate the two at the same time.
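The error-path rule described above can be sketched with plain counters
in user-space C. This is a simplified model, not the stm code: struct
proto, struct device, policy_make() and the get/put helpers are
hypothetical stand-ins, and module references are reduced to integers.
The point it illustrates is that a failing call must drop the reference it
took itself (pdrv), not the one already owned by the device (dev->pdrv).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's objects; only refcounts matter. */
struct proto  { int refs; };
struct device { int refs; struct proto *pdrv; int has_policy; };

void proto_get(struct proto *p)   { p->refs++; }
void proto_put(struct proto *p)   { p->refs--; }
void device_get(struct device *d) { d->refs++; }
void device_put(struct device *d) { d->refs--; }

/* Returns 0 on success, -1 if the device already has a policy. */
int policy_make(struct device *dev, struct proto *pdrv)
{
	int ret = 0;

	assert(dev != NULL && pdrv != NULL);
	device_get(dev);
	proto_get(pdrv);

	if (dev->has_policy) {
		ret = -1;
		goto out_put;
	}

	/* Success: the new policy keeps both references. */
	dev->pdrv = pdrv;
	dev->has_policy = 1;
	return 0;

out_put:
	/* Single error exit: drop what this call took, never dev->pdrv. */
	proto_put(pdrv);
	device_put(dev);
	return ret;
}
```

In this model, a second policy_make() on the same device with a different
protocol fails and leaves the first protocol's count untouched. Putting
dev->pdrv instead would drive that count down on every failed attempt,
which mirrors the "p_basic" -> -1 sequence shown in the commit message.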
Signed-off-by: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Fixes: c7fd62bc69d0 ("stm class: Introduce framing protocol drivers")
Reported-by: Ammy Yi <ammy.yi(a)intel.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/hwtracing/stm/policy.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/hwtracing/stm/policy.c b/drivers/hwtracing/stm/policy.c
index 0910ec807187..4b9e44b227d8 100644
--- a/drivers/hwtracing/stm/policy.c
+++ b/drivers/hwtracing/stm/policy.c
@@ -440,10 +440,8 @@ stp_policy_make(struct config_group *group, const char *name)
stm->policy = kzalloc(sizeof(*stm->policy), GFP_KERNEL);
if (!stm->policy) {
- mutex_unlock(&stm->policy_mutex);
- stm_put_protocol(pdrv);
- stm_put_device(stm);
- return ERR_PTR(-ENOMEM);
+ ret = ERR_PTR(-ENOMEM);
+ goto unlock_policy;
}
config_group_init_type_name(&stm->policy->group, name,
@@ -458,7 +456,11 @@ stp_policy_make(struct config_group *group, const char *name)
mutex_unlock(&stm->policy_mutex);
if (IS_ERR(ret)) {
- stm_put_protocol(stm->pdrv);
+ /*
+ * pdrv and stm->pdrv at this point can be quite different,
+ * and only one of them needs to be 'put'
+ */
+ stm_put_protocol(pdrv);
stm_put_device(stm);
}
--
2.20.1
This is a note to let you know that I've just added the patch titled
serial: uartps: Fix interrupt mask issue to handle the RX interrupts
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 260683137ab5276113fc322fdbbc578024185fee Mon Sep 17 00:00:00 2001
From: Nava kishore Manne <nava.manne(a)xilinx.com>
Date: Tue, 18 Dec 2018 13:18:42 +0100
Subject: serial: uartps: Fix interrupt mask issue to handle the RX interrupts
properly
This patch corrects the RX interrupt mask value so that RX interrupts
are handled properly.
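To make the effect of the mask change concrete, here is a small
user-space sketch that compares the two constants from the diff. The
helper names (bits_dropped(), bits_added(), rx_work_pending()) are
hypothetical and only do bit arithmetic; the sketch does not claim which
hardware interrupt each individual bit stands for.

```c
#include <assert.h>
#include <stdint.h>

#define OLD_IXR_MASK   0x00001FFFu  /* CDNS_UART_IXR_MASK, removed by the patch */
#define NEW_IXR_RXMASK 0x000021e7u  /* CDNS_UART_IXR_RXMASK, added by the patch */

/* Status bits that used to trigger RX handling but no longer do. */
uint32_t bits_dropped(void) { return OLD_IXR_MASK & ~NEW_IXR_RXMASK; }

/* Status bits that trigger RX handling now but did not before. */
uint32_t bits_added(void)   { return NEW_IXR_RXMASK & ~OLD_IXR_MASK; }

/* Same shape as the ISR test: is any bit of the mask set in the status? */
int rx_work_pending(uint32_t isrstatus, uint32_t mask)
{
	assert(mask != 0);
	return (isrstatus & mask) != 0;
}
```

With these values, bits_dropped() is 0x1E18 and bits_added() is 0x2000:
a status word with only 0x10 set passed the old test but not the new one,
while a status word with only 0x2000 set is picked up only by the new mask.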
Fixes: c8dbdc842d30 ("serial: xuartps: Rewrite the interrupt handling logic")
Signed-off-by: Nava kishore Manne <nava.manne(a)xilinx.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Michal Simek <michal.simek(a)xilinx.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/xilinx_uartps.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/serial/xilinx_uartps.c b/drivers/tty/serial/xilinx_uartps.c
index c6d38617d622..094f2958cb2b 100644
--- a/drivers/tty/serial/xilinx_uartps.c
+++ b/drivers/tty/serial/xilinx_uartps.c
@@ -123,7 +123,7 @@ MODULE_PARM_DESC(rx_timeout, "Rx timeout, 1-255");
#define CDNS_UART_IXR_RXTRIG 0x00000001 /* RX FIFO trigger interrupt */
#define CDNS_UART_IXR_RXFULL 0x00000004 /* RX FIFO full interrupt. */
#define CDNS_UART_IXR_RXEMPTY 0x00000002 /* RX FIFO empty interrupt. */
-#define CDNS_UART_IXR_MASK 0x00001FFF /* Valid bit mask */
+#define CDNS_UART_IXR_RXMASK 0x000021e7 /* Valid RX bit mask */
/*
* Do not enable parity error interrupt for the following
@@ -364,7 +364,7 @@ static irqreturn_t cdns_uart_isr(int irq, void *dev_id)
cdns_uart_handle_tx(dev_id);
isrstatus &= ~CDNS_UART_IXR_TXEMPTY;
}
- if (isrstatus & CDNS_UART_IXR_MASK)
+ if (isrstatus & CDNS_UART_IXR_RXMASK)
cdns_uart_handle_rx(dev_id, isrstatus);
spin_unlock(&port->lock);
--
2.20.1
In setup_arch_memory we reserve the memory area in which the kernel
is located. The current implementation may reserve more memory than
is actually required when CONFIG_LINUX_LINK_BASE is not
equal to CONFIG_LINUX_RAM_BASE. This happens because we calculate the
start of the reserved region relative to CONFIG_LINUX_RAM_BASE, while
the end of the region (the end of the kernel image) is relative to
CONFIG_LINUX_LINK_BASE.
For example, on the HSDK board we wasted 256MiB of physical memory:
------------------->8------------------------------
Memory: 770416K/1048576K available (5496K kernel code,
240K rwdata, 1064K rodata, 2200K init, 275K bss,
278160K reserved, 0K cma-reserved)
------------------->8------------------------------
Fix that.
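The arithmetic behind the over-reservation can be sketched as below. The
function names are hypothetical, and the addresses used in the note after
the code (RAM base 0x80000000, link base 0x90000000, kernel end
0x90900000) are assumptions chosen to reproduce a 256MiB gap; they are not
taken from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Size reserved by the old call: from the RAM base to the end of the kernel. */
uint64_t reserved_old(uint64_t ram_base, uint64_t kernel_end_pa)
{
	assert(kernel_end_pa >= ram_base);
	return kernel_end_pa - ram_base;
}

/* Size reserved by the patched call: from the link base to the end of the kernel. */
uint64_t reserved_new(uint64_t link_base, uint64_t kernel_end_pa)
{
	assert(kernel_end_pa >= link_base);
	return kernel_end_pa - link_base;
}
```

With the assumed addresses, reserved_old() returns 0x10900000 and
reserved_new() returns 0x00900000. The difference, 0x10000000 (256MiB), is
exactly link_base - ram_base, regardless of the kernel size.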
Cc: stable(a)vger.kernel.org
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev(a)synopsys.com>
---
arch/arc/mm/init.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arc/mm/init.c b/arch/arc/mm/init.c
index f8fe5668b30f..a56e6a8ed259 100644
--- a/arch/arc/mm/init.c
+++ b/arch/arc/mm/init.c
@@ -137,7 +137,8 @@ void __init setup_arch_memory(void)
*/
memblock_add_node(low_mem_start, low_mem_sz, 0);
- memblock_reserve(low_mem_start, __pa(_end) - low_mem_start);
+ memblock_reserve(CONFIG_LINUX_LINK_BASE,
+ __pa(_end) - CONFIG_LINUX_LINK_BASE);
#ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start)
--
2.14.5
Hi, Sasha,
We should backport commit a9aec5881b9d4aca184b29d33484a6a5 ("lib/iomap_copy.c: add __ioread32_copy()") for linux-4.4, linux-3.18 and linux-3.16.
Huacai
------------------ Original ------------------
From: "Sasha Levin"<sashal(a)kernel.org>;
Date: Wed, Dec 19, 2018 09:47 PM
To: "Sasha Levin"<sashal(a)kernel.org>; "Huacai Chen"<chenhc(a)lemote.com>; "James E . J . Bottomley"<jejb(a)linux.vnet.ibm.com>;
Cc: "Martin K . Petersen"<martin.petersen(a)oracle.com>; "stable"<stable(a)vger.kernel.org>; "stable"<stable(a)vger.kernel.org>;
Subject: Re: [PATCH V2] scsi: lpfc: Switch memcpy_fromio() to __read32_copy()
Hi,
[This is an automated email]
This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all
The bot has tested the following trees: v4.19.10, v4.14.89, v4.9.146, v4.4.168, v3.18.130,
v4.19.10: Build OK!
v4.14.89: Build OK!
v4.9.146: Build OK!
v4.4.168: Build failed! Errors:
drivers/scsi/lpfc/lpfc_compat.h:93:2: error: implicit declaration of function ‘__ioread32_copy’; did you mean ‘__iowrite32_copy’? [-Werror=implicit-function-declaration]
v3.18.130: Build failed! Errors:
drivers/scsi/lpfc/lpfc_compat.h:93:2: error: implicit declaration of function ‘__ioread32_copy’; did you mean ‘__iowrite32_copy’? [-Werror=implicit-function-declaration]
How should we proceed with this patch?
--
Thanks,
Sasha
Hi,
I want to ask for the changes in cd7f3a249dbe (rtc: snvs: Add timeouts
to avoid kernel lockups) to be backported to the stable releases.
The reason is that this patch fixes a real bug that can cause the
kernel to lock up. I can reproduce this lockup reliably with an i.MX6UL,
PREEMPTIVE_RT_FULL enabled and v4.14.89.
Thanks,
Frieder
This is a note to let you know that I've just added the patch titled
stm class: Fix a module refcount leak in policy creation error path
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From c18614a1a11276837bdd44403d84d207c9951538 Mon Sep 17 00:00:00 2001
From: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Date: Wed, 19 Dec 2018 17:19:20 +0200
Subject: stm class: Fix a module refcount leak in policy creation error path
Commit c7fd62bc69d0 ("stm class: Introduce framing protocol drivers")
adds a bug into the error path of policy creation, that would do a
module_put() on a wrong module, if one tried to create a policy for
an stm device which already has a policy, using a different protocol.
IOW,
| mkdir /config/stp-policy/dummy_stm.0:p_basic.test
| mkdir /config/stp-policy/dummy_stm.0:p_sys-t.test # puts "p_basic"
| mkdir /config/stp-policy/dummy_stm.0:p_sys-t.test # "p_basic" -> -1
throws:
| general protection fault: 0000 [#1] SMP PTI
| CPU: 3 PID: 2887 Comm: mkdir
| RIP: 0010:module_put.part.31+0xe/0x90
| Call Trace:
| module_put+0x13/0x20
| stm_put_protocol+0x11/0x20 [stm_core]
| stp_policy_make+0xf1/0x210 [stm_core]
| ? __kmalloc+0x183/0x220
| ? configfs_mkdir+0x10d/0x4c0
| configfs_mkdir+0x169/0x4c0
| vfs_mkdir+0x108/0x1c0
| do_mkdirat+0xe8/0x110
| __x64_sys_mkdir+0x1b/0x20
| do_syscall_64+0x5a/0x140
| entry_SYSCALL_64_after_hwframe+0x44/0xa9
Correct this sad mistake by calling 'put' on the correct
reference, which happens to match another error path in the same
function, so we consolidate the two at the same time.
Signed-off-by: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Fixes: c7fd62bc69d0 ("stm class: Introduce framing protocol drivers")
Reported-by: Ammy Yi <ammy.yi(a)intel.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/hwtracing/stm/policy.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/hwtracing/stm/policy.c b/drivers/hwtracing/stm/policy.c
index 0910ec807187..4b9e44b227d8 100644
--- a/drivers/hwtracing/stm/policy.c
+++ b/drivers/hwtracing/stm/policy.c
@@ -440,10 +440,8 @@ stp_policy_make(struct config_group *group, const char *name)
stm->policy = kzalloc(sizeof(*stm->policy), GFP_KERNEL);
if (!stm->policy) {
- mutex_unlock(&stm->policy_mutex);
- stm_put_protocol(pdrv);
- stm_put_device(stm);
- return ERR_PTR(-ENOMEM);
+ ret = ERR_PTR(-ENOMEM);
+ goto unlock_policy;
}
config_group_init_type_name(&stm->policy->group, name,
@@ -458,7 +456,11 @@ stp_policy_make(struct config_group *group, const char *name)
mutex_unlock(&stm->policy_mutex);
if (IS_ERR(ret)) {
- stm_put_protocol(stm->pdrv);
+ /*
+ * pdrv and stm->pdrv at this point can be quite different,
+ * and only one of them needs to be 'put'
+ */
+ stm_put_protocol(pdrv);
stm_put_device(stm);
}
--
2.20.1