The patch titled
Subject: fork, memcg: fix cached_stacks case
has been removed from the -mm tree. Its filename was
fork-memcg-fix-cached_stacks-case.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Shakeel Butt <shakeelb(a)google.com>
Subject: fork, memcg: fix cached_stacks case
5eed6f1dff87 ("fork,memcg: fix crash in free_thread_stack on memcg charge
fail") fixes a crash caused due to failed memcg charge of the kernel
stack. However the fix misses the cached_stacks case which this patch
fixes. So, the same crash can happen if the memcg charge of a cached
stack is failed.
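For context, here is an abridged sketch of the CONFIG_VMAP_STACK failure path
in kernel/fork.c (not verbatim kernel code; most of dup_task_struct() is
elided) showing why the cached-stack path must set tsk->stack: when the memcg
charge fails, the error path frees the just-allocated stack through tsk->stack.

static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
{
	struct task_struct *tsk;
	unsigned long *stack;

	tsk = alloc_task_struct_node(node);
	if (!tsk)
		return NULL;

	stack = alloc_thread_stack_node(tsk, node);
	if (!stack)
		goto free_tsk;

	/* Charge the new stack to the memcg; on failure the stack is freed. */
	if (memcg_charge_kernel_stack(tsk))
		goto free_stack;

	/* ... the rest of the task setup is elided ... */
	return tsk;

free_stack:
	/*
	 * For vmapped stacks this frees (or re-caches) the stack via
	 * tsk->stack, which is why every branch of alloc_thread_stack_node()
	 * must set it.
	 */
	free_thread_stack(tsk);
free_tsk:
	free_task_struct(tsk);
	return NULL;
}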
Link: http://lkml.kernel.org/r/20190102180145.57406-1-shakeelb@google.com
Fixes: 5eed6f1dff87 ("fork,memcg: fix crash in free_thread_stack on memcg charge fail")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Rik van Riel <riel(a)surriel.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/fork.c | 1 +
1 file changed, 1 insertion(+)
--- a/kernel/fork.c~fork-memcg-fix-cached_stacks-case
+++ a/kernel/fork.c
@@ -217,6 +217,7 @@ static unsigned long *alloc_thread_stack
memset(s->addr, 0, THREAD_SIZE);
tsk->stack_vm_area = s;
+ tsk->stack = s->addr;
return s->addr;
}
_
Patches currently in -mm which might be from shakeelb(a)google.com are
memcg-localize-memcg_kmem_enabled-check.patch
memcg-schedule-high-reclaim-for-remote-memcgs-on-high_work.patch
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e1c3743e1a20647c53b719dbf28b48f45d23f2cd Mon Sep 17 00:00:00 2001
From: Breno Leitao <leitao(a)debian.org>
Date: Wed, 21 Nov 2018 17:21:09 -0200
Subject: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
On a signal handler return, the user could set a context with MSR[TS] bits
set, and these bits would be copied to task regs->msr.
At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
several __get_user() are called and then a recheckpoint is executed.
This is a problem since a page fault (in kernel space) could happen when
calling __get_user(). If it happens, the process MSR[TS] bits were
already set, but recheckpoint was not executed, and SPRs are still invalid.
The page fault can cause the current process to be de-scheduled with MSR[TS]
active but without tm_recheckpoint() being called, and, more importantly,
without the TEXASR[FS] bit being set.
Since TEXASR might not have the FS bit set, when the process is scheduled
back it will try to reclaim, which will be aborted because the CPU is not in
the suspended state, and then recheckpoint.  This recheckpoint will restore
thread->texasr into the TEXASR SPR, which might be zero, hitting a BUG_ON().
kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434!
cpu 0xb: Vector: 700 (Program Check) at [c00000041f1576d0]
pc: c000000000054550: restore_gprs+0xb0/0x180
lr: 0000000000000000
sp: c00000041f157950
msr: 8000000100021033
current = 0xc00000041f143000
paca = 0xc00000000fb86300 softe: 0 irq_happened: 0x01
pid = 1021, comm = kworker/11:1
kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434!
Linux version 4.9.0-3-powerpc64le (debian-kernel(a)lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)
enter ? for help
[c00000041f157b30] c00000000001bc3c tm_recheckpoint.part.11+0x6c/0xa0
[c00000041f157b70] c00000000001d184 __switch_to+0x1e4/0x4c0
[c00000041f157bd0] c00000000082eeb8 __schedule+0x2f8/0x990
[c00000041f157cb0] c00000000082f598 schedule+0x48/0xc0
[c00000041f157ce0] c0000000000f0d28 worker_thread+0x148/0x610
[c00000041f157d80] c0000000000f96b0 kthread+0x120/0x140
[c00000041f157e30] c00000000000c0e0 ret_from_kernel_thread+0x5c/0x7c
This patch simply delays setting MSR[TS]: if a page fault happens in the
__get_user() section, regs->msr[TS] is not yet set while the TM structures
are still invalid, which avoids doing TM operations for in-kernel exceptions
and a possible process reschedule.
With this patch, MSR[TS] is only set just before recheckpointing and setting
TEXASR[FS] = 1, thus avoiding an interrupt with the TM registers in an
invalid state.
Other than that, if CONFIG_PREEMPT is set, there might be a preemption just
after setting MSR[TS] and before tm_recheckpoint(), so this block must be
atomic from a preemption perspective; hence the preempt_disable()/
preempt_enable() calls around this code.
It is not possible to move tm_recheckpoint() earlier, because the
checkpointed registers must first be read from userspace with __get_user();
thus the only way to avoid this undesired behavior is to delay setting
MSR[TS].
The 32-bit signal handler seems to be safe from this particular issue, but it
might be exposed to the preemption issue, so preemption is disabled in that
chunk of code as well.
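Condensed from the 64-bit hunk below, the resulting ordering in
restore_tm_sigcontexts() is roughly the following (a sketch only; error
handling and the FP/VEC restore details are omitted):

	/*
	 * All __get_user() copies of the checkpointed state happen first,
	 * while regs->msr[TS] is still clear, so a page fault and a
	 * reschedule here are harmless.
	 */

	tm_enable();
	/* Make sure the transaction is marked as failed */
	tsk->thread.tm_texasr |= TEXASR_FS;

	preempt_disable();		/* no reschedule until the recheckpoint */

	/* pull in MSR TS bits from user context */
	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
	regs->msr |= MSR_TM;

	/* no get_user()/put_user() may be called from here on */
	tm_recheckpoint(&tsk->thread);	/* SPRs become valid again */

	preempt_enable();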
Changes from v2:
* Run the critical section with preempt_disable.
Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
Cc: stable(a)vger.kernel.org (v3.9+)
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 9d39e0eb03ff..7484f43493d3 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -848,7 +848,23 @@ static long restore_tm_user_regs(struct pt_regs *regs,
/* If TM bits are set to the reserved value, it's an invalid context */
if (MSR_TM_RESV(msr_hi))
return 1;
- /* Pull in the MSR TM bits from the user context */
+
+ /*
+ * Disabling preemption, since it is unsafe to be preempted
+ * with MSR[TS] set without recheckpointing.
+ */
+ preempt_disable();
+
+ /*
+ * CAUTION:
+ * After regs->MSR[TS] being updated, make sure that get_user(),
+ * put_user() or similar functions are *not* called. These
+ * functions can generate page faults which will cause the process
+ * to be de-scheduled with MSR[TS] set but without calling
+ * tm_recheckpoint(). This can cause a bug.
+ *
+ * Pull in the MSR TM bits from the user context
+ */
regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr_hi & MSR_TS_MASK);
/* Now, recheckpoint. This loads up all of the checkpointed (older)
* registers, including FP and V[S]Rs. After recheckpointing, the
@@ -873,6 +889,8 @@ static long restore_tm_user_regs(struct pt_regs *regs,
}
#endif
+ preempt_enable();
+
return 0;
}
#endif
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index e53ad11be385..ba093ec5a21f 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
if (MSR_TM_RESV(msr))
return -EINVAL;
- /* pull in MSR TS bits from user context */
- regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
-
- /*
- * Ensure that TM is enabled in regs->msr before we leave the signal
- * handler. It could be the case that (a) user disabled the TM bit
- * through the manipulation of the MSR bits in uc_mcontext or (b) the
- * TM bit was disabled because a sufficient number of context switches
- * happened whilst in the signal handler and load_tm overflowed,
- * disabling the TM bit. In either case we can end up with an illegal
- * TM state leading to a TM Bad Thing when we return to userspace.
- */
- regs->msr |= MSR_TM;
-
/* pull in MSR LE from user context */
regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
@@ -572,6 +558,34 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
tm_enable();
/* Make sure the transaction is marked as failed */
tsk->thread.tm_texasr |= TEXASR_FS;
+
+ /*
+ * Disabling preemption, since it is unsafe to be preempted
+ * with MSR[TS] set without recheckpointing.
+ */
+ preempt_disable();
+
+ /* pull in MSR TS bits from user context */
+ regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
+
+ /*
+ * Ensure that TM is enabled in regs->msr before we leave the signal
+ * handler. It could be the case that (a) user disabled the TM bit
+ * through the manipulation of the MSR bits in uc_mcontext or (b) the
+ * TM bit was disabled because a sufficient number of context switches
+ * happened whilst in the signal handler and load_tm overflowed,
+ * disabling the TM bit. In either case we can end up with an illegal
+ * TM state leading to a TM Bad Thing when we return to userspace.
+ *
+ * CAUTION:
+ * After regs->MSR[TS] being updated, make sure that get_user(),
+ * put_user() or similar functions are *not* called. These
+ * functions can generate page faults which will cause the process
+ * to be de-scheduled with MSR[TS] set but without calling
+ * tm_recheckpoint(). This can cause a bug.
+ */
+ regs->msr |= MSR_TM;
+
/* This loads the checkpointed FP/VEC state, if used */
tm_recheckpoint(&tsk->thread);
@@ -585,6 +599,8 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
regs->msr |= MSR_VEC;
}
+ preempt_enable();
+
return err;
}
#endif
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e1c3743e1a20647c53b719dbf28b48f45d23f2cd Mon Sep 17 00:00:00 2001
From: Breno Leitao <leitao(a)debian.org>
Date: Wed, 21 Nov 2018 17:21:09 -0200
Subject: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
On a signal handler return, the user could set a context with MSR[TS] bits
set, and these bits would be copied to task regs->msr.
At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
several __get_user() are called and then a recheckpoint is executed.
This is a problem since a page fault (in kernel space) could happen when
calling __get_user(). If it happens, the process MSR[TS] bits were
already set, but recheckpoint was not executed, and SPRs are still invalid.
The page fault can cause the current process to be de-scheduled with MSR[TS]
active but without tm_recheckpoint() being called, and, more importantly,
without the TEXASR[FS] bit being set.
Since TEXASR might not have the FS bit set, when the process is scheduled
back it will try to reclaim, which will be aborted because the CPU is not in
the suspended state, and then recheckpoint.  This recheckpoint will restore
thread->texasr into the TEXASR SPR, which might be zero, hitting a BUG_ON().
kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434!
cpu 0xb: Vector: 700 (Program Check) at [c00000041f1576d0]
pc: c000000000054550: restore_gprs+0xb0/0x180
lr: 0000000000000000
sp: c00000041f157950
msr: 8000000100021033
current = 0xc00000041f143000
paca = 0xc00000000fb86300 softe: 0 irq_happened: 0x01
pid = 1021, comm = kworker/11:1
kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434!
Linux version 4.9.0-3-powerpc64le (debian-kernel(a)lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)
enter ? for help
[c00000041f157b30] c00000000001bc3c tm_recheckpoint.part.11+0x6c/0xa0
[c00000041f157b70] c00000000001d184 __switch_to+0x1e4/0x4c0
[c00000041f157bd0] c00000000082eeb8 __schedule+0x2f8/0x990
[c00000041f157cb0] c00000000082f598 schedule+0x48/0xc0
[c00000041f157ce0] c0000000000f0d28 worker_thread+0x148/0x610
[c00000041f157d80] c0000000000f96b0 kthread+0x120/0x140
[c00000041f157e30] c00000000000c0e0 ret_from_kernel_thread+0x5c/0x7c
This patch simply delays setting MSR[TS]: if a page fault happens in the
__get_user() section, regs->msr[TS] is not yet set while the TM structures
are still invalid, which avoids doing TM operations for in-kernel exceptions
and a possible process reschedule.
With this patch, MSR[TS] is only set just before recheckpointing and setting
TEXASR[FS] = 1, thus avoiding an interrupt with the TM registers in an
invalid state.
Other than that, if CONFIG_PREEMPT is set, there might be a preemption just
after setting MSR[TS] and before tm_recheckpoint(), so this block must be
atomic from a preemption perspective; hence the preempt_disable()/
preempt_enable() calls around this code.
It is not possible to move tm_recheckpoint() earlier, because the
checkpointed registers must first be read from userspace with __get_user();
thus the only way to avoid this undesired behavior is to delay setting
MSR[TS].
The 32-bit signal handler seems to be safe from this particular issue, but it
might be exposed to the preemption issue, so preemption is disabled in that
chunk of code as well.
Changes from v2:
* Run the critical section with preempt_disable.
Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
Cc: stable(a)vger.kernel.org (v3.9+)
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 9d39e0eb03ff..7484f43493d3 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -848,7 +848,23 @@ static long restore_tm_user_regs(struct pt_regs *regs,
/* If TM bits are set to the reserved value, it's an invalid context */
if (MSR_TM_RESV(msr_hi))
return 1;
- /* Pull in the MSR TM bits from the user context */
+
+ /*
+ * Disabling preemption, since it is unsafe to be preempted
+ * with MSR[TS] set without recheckpointing.
+ */
+ preempt_disable();
+
+ /*
+ * CAUTION:
+ * After regs->MSR[TS] being updated, make sure that get_user(),
+ * put_user() or similar functions are *not* called. These
+ * functions can generate page faults which will cause the process
+ * to be de-scheduled with MSR[TS] set but without calling
+ * tm_recheckpoint(). This can cause a bug.
+ *
+ * Pull in the MSR TM bits from the user context
+ */
regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr_hi & MSR_TS_MASK);
/* Now, recheckpoint. This loads up all of the checkpointed (older)
* registers, including FP and V[S]Rs. After recheckpointing, the
@@ -873,6 +889,8 @@ static long restore_tm_user_regs(struct pt_regs *regs,
}
#endif
+ preempt_enable();
+
return 0;
}
#endif
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index e53ad11be385..ba093ec5a21f 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
if (MSR_TM_RESV(msr))
return -EINVAL;
- /* pull in MSR TS bits from user context */
- regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
-
- /*
- * Ensure that TM is enabled in regs->msr before we leave the signal
- * handler. It could be the case that (a) user disabled the TM bit
- * through the manipulation of the MSR bits in uc_mcontext or (b) the
- * TM bit was disabled because a sufficient number of context switches
- * happened whilst in the signal handler and load_tm overflowed,
- * disabling the TM bit. In either case we can end up with an illegal
- * TM state leading to a TM Bad Thing when we return to userspace.
- */
- regs->msr |= MSR_TM;
-
/* pull in MSR LE from user context */
regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
@@ -572,6 +558,34 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
tm_enable();
/* Make sure the transaction is marked as failed */
tsk->thread.tm_texasr |= TEXASR_FS;
+
+ /*
+ * Disabling preemption, since it is unsafe to be preempted
+ * with MSR[TS] set without recheckpointing.
+ */
+ preempt_disable();
+
+ /* pull in MSR TS bits from user context */
+ regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
+
+ /*
+ * Ensure that TM is enabled in regs->msr before we leave the signal
+ * handler. It could be the case that (a) user disabled the TM bit
+ * through the manipulation of the MSR bits in uc_mcontext or (b) the
+ * TM bit was disabled because a sufficient number of context switches
+ * happened whilst in the signal handler and load_tm overflowed,
+ * disabling the TM bit. In either case we can end up with an illegal
+ * TM state leading to a TM Bad Thing when we return to userspace.
+ *
+ * CAUTION:
+ * After regs->MSR[TS] being updated, make sure that get_user(),
+ * put_user() or similar functions are *not* called. These
+ * functions can generate page faults which will cause the process
+ * to be de-scheduled with MSR[TS] set but without calling
+ * tm_recheckpoint(). This can cause a bug.
+ */
+ regs->msr |= MSR_TM;
+
/* This loads the checkpointed FP/VEC state, if used */
tm_recheckpoint(&tsk->thread);
@@ -585,6 +599,8 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
regs->msr |= MSR_VEC;
}
+ preempt_enable();
+
return err;
}
#endif
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e1c3743e1a20647c53b719dbf28b48f45d23f2cd Mon Sep 17 00:00:00 2001
From: Breno Leitao <leitao(a)debian.org>
Date: Wed, 21 Nov 2018 17:21:09 -0200
Subject: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
On a signal handler return, the user could set a context with MSR[TS] bits
set, and these bits would be copied to task regs->msr.
At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
several __get_user() are called and then a recheckpoint is executed.
This is a problem since a page fault (in kernel space) could happen when
calling __get_user(). If it happens, the process MSR[TS] bits were
already set, but recheckpoint was not executed, and SPRs are still invalid.
The page fault can cause the current process to be de-scheduled with MSR[TS]
active but without tm_recheckpoint() being called, and, more importantly,
without the TEXASR[FS] bit being set.
Since TEXASR might not have the FS bit set, when the process is scheduled
back it will try to reclaim, which will be aborted because the CPU is not in
the suspended state, and then recheckpoint.  This recheckpoint will restore
thread->texasr into the TEXASR SPR, which might be zero, hitting a BUG_ON().
kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434!
cpu 0xb: Vector: 700 (Program Check) at [c00000041f1576d0]
pc: c000000000054550: restore_gprs+0xb0/0x180
lr: 0000000000000000
sp: c00000041f157950
msr: 8000000100021033
current = 0xc00000041f143000
paca = 0xc00000000fb86300 softe: 0 irq_happened: 0x01
pid = 1021, comm = kworker/11:1
kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434!
Linux version 4.9.0-3-powerpc64le (debian-kernel(a)lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)
enter ? for help
[c00000041f157b30] c00000000001bc3c tm_recheckpoint.part.11+0x6c/0xa0
[c00000041f157b70] c00000000001d184 __switch_to+0x1e4/0x4c0
[c00000041f157bd0] c00000000082eeb8 __schedule+0x2f8/0x990
[c00000041f157cb0] c00000000082f598 schedule+0x48/0xc0
[c00000041f157ce0] c0000000000f0d28 worker_thread+0x148/0x610
[c00000041f157d80] c0000000000f96b0 kthread+0x120/0x140
[c00000041f157e30] c00000000000c0e0 ret_from_kernel_thread+0x5c/0x7c
This patch simply delays setting MSR[TS]: if a page fault happens in the
__get_user() section, regs->msr[TS] is not yet set while the TM structures
are still invalid, which avoids doing TM operations for in-kernel exceptions
and a possible process reschedule.
With this patch, MSR[TS] is only set just before recheckpointing and setting
TEXASR[FS] = 1, thus avoiding an interrupt with the TM registers in an
invalid state.
Other than that, if CONFIG_PREEMPT is set, there might be a preemption just
after setting MSR[TS] and before tm_recheckpoint(), so this block must be
atomic from a preemption perspective; hence the preempt_disable()/
preempt_enable() calls around this code.
It is not possible to move tm_recheckpoint() earlier, because the
checkpointed registers must first be read from userspace with __get_user();
thus the only way to avoid this undesired behavior is to delay setting
MSR[TS].
The 32-bit signal handler seems to be safe from this particular issue, but it
might be exposed to the preemption issue, so preemption is disabled in that
chunk of code as well.
Changes from v2:
* Run the critical section with preempt_disable.
Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
Cc: stable(a)vger.kernel.org (v3.9+)
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 9d39e0eb03ff..7484f43493d3 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -848,7 +848,23 @@ static long restore_tm_user_regs(struct pt_regs *regs,
/* If TM bits are set to the reserved value, it's an invalid context */
if (MSR_TM_RESV(msr_hi))
return 1;
- /* Pull in the MSR TM bits from the user context */
+
+ /*
+ * Disabling preemption, since it is unsafe to be preempted
+ * with MSR[TS] set without recheckpointing.
+ */
+ preempt_disable();
+
+ /*
+ * CAUTION:
+ * After regs->MSR[TS] being updated, make sure that get_user(),
+ * put_user() or similar functions are *not* called. These
+ * functions can generate page faults which will cause the process
+ * to be de-scheduled with MSR[TS] set but without calling
+ * tm_recheckpoint(). This can cause a bug.
+ *
+ * Pull in the MSR TM bits from the user context
+ */
regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr_hi & MSR_TS_MASK);
/* Now, recheckpoint. This loads up all of the checkpointed (older)
* registers, including FP and V[S]Rs. After recheckpointing, the
@@ -873,6 +889,8 @@ static long restore_tm_user_regs(struct pt_regs *regs,
}
#endif
+ preempt_enable();
+
return 0;
}
#endif
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index e53ad11be385..ba093ec5a21f 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
if (MSR_TM_RESV(msr))
return -EINVAL;
- /* pull in MSR TS bits from user context */
- regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
-
- /*
- * Ensure that TM is enabled in regs->msr before we leave the signal
- * handler. It could be the case that (a) user disabled the TM bit
- * through the manipulation of the MSR bits in uc_mcontext or (b) the
- * TM bit was disabled because a sufficient number of context switches
- * happened whilst in the signal handler and load_tm overflowed,
- * disabling the TM bit. In either case we can end up with an illegal
- * TM state leading to a TM Bad Thing when we return to userspace.
- */
- regs->msr |= MSR_TM;
-
/* pull in MSR LE from user context */
regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
@@ -572,6 +558,34 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
tm_enable();
/* Make sure the transaction is marked as failed */
tsk->thread.tm_texasr |= TEXASR_FS;
+
+ /*
+ * Disabling preemption, since it is unsafe to be preempted
+ * with MSR[TS] set without recheckpointing.
+ */
+ preempt_disable();
+
+ /* pull in MSR TS bits from user context */
+ regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
+
+ /*
+ * Ensure that TM is enabled in regs->msr before we leave the signal
+ * handler. It could be the case that (a) user disabled the TM bit
+ * through the manipulation of the MSR bits in uc_mcontext or (b) the
+ * TM bit was disabled because a sufficient number of context switches
+ * happened whilst in the signal handler and load_tm overflowed,
+ * disabling the TM bit. In either case we can end up with an illegal
+ * TM state leading to a TM Bad Thing when we return to userspace.
+ *
+ * CAUTION:
+ * After regs->MSR[TS] being updated, make sure that get_user(),
+ * put_user() or similar functions are *not* called. These
+ * functions can generate page faults which will cause the process
+ * to be de-scheduled with MSR[TS] set but without calling
+ * tm_recheckpoint(). This can cause a bug.
+ */
+ regs->msr |= MSR_TM;
+
/* This loads the checkpointed FP/VEC state, if used */
tm_recheckpoint(&tsk->thread);
@@ -585,6 +599,8 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
regs->msr |= MSR_VEC;
}
+ preempt_enable();
+
return err;
}
#endif
The patch below does not apply to the 4.20-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e1c3743e1a20647c53b719dbf28b48f45d23f2cd Mon Sep 17 00:00:00 2001
From: Breno Leitao <leitao(a)debian.org>
Date: Wed, 21 Nov 2018 17:21:09 -0200
Subject: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
On a signal handler return, the user could set a context with MSR[TS] bits
set, and these bits would be copied to task regs->msr.
At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
several __get_user() are called and then a recheckpoint is executed.
This is a problem since a page fault (in kernel space) could happen when
calling __get_user(). If it happens, the process MSR[TS] bits were
already set, but recheckpoint was not executed, and SPRs are still invalid.
The page fault can cause the current process to be de-scheduled with MSR[TS]
active but without tm_recheckpoint() being called, and, more importantly,
without the TEXASR[FS] bit being set.
Since TEXASR might not have the FS bit set, when the process is scheduled
back it will try to reclaim, which will be aborted because the CPU is not in
the suspended state, and then recheckpoint.  This recheckpoint will restore
thread->texasr into the TEXASR SPR, which might be zero, hitting a BUG_ON().
kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434!
cpu 0xb: Vector: 700 (Program Check) at [c00000041f1576d0]
pc: c000000000054550: restore_gprs+0xb0/0x180
lr: 0000000000000000
sp: c00000041f157950
msr: 8000000100021033
current = 0xc00000041f143000
paca = 0xc00000000fb86300 softe: 0 irq_happened: 0x01
pid = 1021, comm = kworker/11:1
kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434!
Linux version 4.9.0-3-powerpc64le (debian-kernel(a)lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)
enter ? for help
[c00000041f157b30] c00000000001bc3c tm_recheckpoint.part.11+0x6c/0xa0
[c00000041f157b70] c00000000001d184 __switch_to+0x1e4/0x4c0
[c00000041f157bd0] c00000000082eeb8 __schedule+0x2f8/0x990
[c00000041f157cb0] c00000000082f598 schedule+0x48/0xc0
[c00000041f157ce0] c0000000000f0d28 worker_thread+0x148/0x610
[c00000041f157d80] c0000000000f96b0 kthread+0x120/0x140
[c00000041f157e30] c00000000000c0e0 ret_from_kernel_thread+0x5c/0x7c
This patch simply delays setting MSR[TS]: if a page fault happens in the
__get_user() section, regs->msr[TS] is not yet set while the TM structures
are still invalid, which avoids doing TM operations for in-kernel exceptions
and a possible process reschedule.
With this patch, MSR[TS] is only set just before recheckpointing and setting
TEXASR[FS] = 1, thus avoiding an interrupt with the TM registers in an
invalid state.
Other than that, if CONFIG_PREEMPT is set, there might be a preemption just
after setting MSR[TS] and before tm_recheckpoint(), so this block must be
atomic from a preemption perspective; hence the preempt_disable()/
preempt_enable() calls around this code.
It is not possible to move tm_recheckpoint() earlier, because the
checkpointed registers must first be read from userspace with __get_user();
thus the only way to avoid this undesired behavior is to delay setting
MSR[TS].
The 32-bit signal handler seems to be safe from this particular issue, but it
might be exposed to the preemption issue, so preemption is disabled in that
chunk of code as well.
Changes from v2:
* Run the critical section with preempt_disable.
Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
Cc: stable(a)vger.kernel.org (v3.9+)
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 9d39e0eb03ff..7484f43493d3 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -848,7 +848,23 @@ static long restore_tm_user_regs(struct pt_regs *regs,
/* If TM bits are set to the reserved value, it's an invalid context */
if (MSR_TM_RESV(msr_hi))
return 1;
- /* Pull in the MSR TM bits from the user context */
+
+ /*
+ * Disabling preemption, since it is unsafe to be preempted
+ * with MSR[TS] set without recheckpointing.
+ */
+ preempt_disable();
+
+ /*
+ * CAUTION:
+ * After regs->MSR[TS] being updated, make sure that get_user(),
+ * put_user() or similar functions are *not* called. These
+ * functions can generate page faults which will cause the process
+ * to be de-scheduled with MSR[TS] set but without calling
+ * tm_recheckpoint(). This can cause a bug.
+ *
+ * Pull in the MSR TM bits from the user context
+ */
regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr_hi & MSR_TS_MASK);
/* Now, recheckpoint. This loads up all of the checkpointed (older)
* registers, including FP and V[S]Rs. After recheckpointing, the
@@ -873,6 +889,8 @@ static long restore_tm_user_regs(struct pt_regs *regs,
}
#endif
+ preempt_enable();
+
return 0;
}
#endif
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index e53ad11be385..ba093ec5a21f 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
if (MSR_TM_RESV(msr))
return -EINVAL;
- /* pull in MSR TS bits from user context */
- regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
-
- /*
- * Ensure that TM is enabled in regs->msr before we leave the signal
- * handler. It could be the case that (a) user disabled the TM bit
- * through the manipulation of the MSR bits in uc_mcontext or (b) the
- * TM bit was disabled because a sufficient number of context switches
- * happened whilst in the signal handler and load_tm overflowed,
- * disabling the TM bit. In either case we can end up with an illegal
- * TM state leading to a TM Bad Thing when we return to userspace.
- */
- regs->msr |= MSR_TM;
-
/* pull in MSR LE from user context */
regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
@@ -572,6 +558,34 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
tm_enable();
/* Make sure the transaction is marked as failed */
tsk->thread.tm_texasr |= TEXASR_FS;
+
+ /*
+ * Disabling preemption, since it is unsafe to be preempted
+ * with MSR[TS] set without recheckpointing.
+ */
+ preempt_disable();
+
+ /* pull in MSR TS bits from user context */
+ regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
+
+ /*
+ * Ensure that TM is enabled in regs->msr before we leave the signal
+ * handler. It could be the case that (a) user disabled the TM bit
+ * through the manipulation of the MSR bits in uc_mcontext or (b) the
+ * TM bit was disabled because a sufficient number of context switches
+ * happened whilst in the signal handler and load_tm overflowed,
+ * disabling the TM bit. In either case we can end up with an illegal
+ * TM state leading to a TM Bad Thing when we return to userspace.
+ *
+ * CAUTION:
+ * After regs->MSR[TS] being updated, make sure that get_user(),
+ * put_user() or similar functions are *not* called. These
+ * functions can generate page faults which will cause the process
+ * to be de-scheduled with MSR[TS] set but without calling
+ * tm_recheckpoint(). This can cause a bug.
+ */
+ regs->msr |= MSR_TM;
+
/* This loads the checkpointed FP/VEC state, if used */
tm_recheckpoint(&tsk->thread);
@@ -585,6 +599,8 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
regs->msr |= MSR_VEC;
}
+ preempt_enable();
+
return err;
}
#endif
There is an imbalance between when slab_pre_alloc_hook calls
memcg_kmem_get_cache and when slab_post_alloc_hook calls
memcg_kmem_put_cache.
This can cause a memcg kmem cache to be destroyed right as
an object from that cache is being allocated, which is probably
not good. It could lead to things like a memcg allocating new
kmalloc slabs instead of using freed space in old ones, maybe
memory leaks, and maybe oopses as a memcg kmalloc slab is getting
destroyed on one CPU while another CPU is trying to do an allocation
from that same memcg.
The obvious fix would be to use the same condition for calling
memcg_kmem_put_cache that we also use to decide whether to call
memcg_kmem_get_cache.
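For reference, the condition on the get side in mm/slab.h of that era looks
roughly like this (slightly abridged, so treat it as a sketch rather than an
exact quote):

static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
						     gfp_t flags)
{
	flags &= gfp_allowed_mask;
	/* lockdep, fault-injection and should-fail checks elided */

	if (memcg_kmem_enabled() &&
	    ((flags & __GFP_ACCOUNT) || (s->flags & SLAB_ACCOUNT)))
		return memcg_kmem_get_cache(s);

	return s;
}

The one-liner below makes slab_post_alloc_hook() use the identical test
before calling memcg_kmem_put_cache(), so every get is paired with exactly
one put.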
I am not sure how long this bug has been around, since the last
changeset to touch that code - 452647784b2f ("mm: memcontrol: cleanup
kmem charge functions") - merely moved the bug from one location to
another. I am still tagging that changeset, because the fix should
automatically apply that far back.
Signed-off-by: Rik van Riel <riel(a)surriel.com>
Fixes: 452647784b2f ("mm: memcontrol: cleanup kmem charge functions")
Cc: kernel-team(a)fb.com
Cc: linux-mm(a)kvack.org
Cc: stable(a)vger.kernel.org
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
---
mm/slab.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/slab.h b/mm/slab.h
index 4190c24ef0e9..ab3d95bef8a0 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -444,7 +444,8 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags,
p[i] = kasan_slab_alloc(s, object, flags);
}
- if (memcg_kmem_enabled())
+ if (memcg_kmem_enabled() &&
+ ((flags & __GFP_ACCOUNT) || (s->flags & SLAB_ACCOUNT)))
memcg_kmem_put_cache(s);
}
--
2.17.1
Commit fb544d1ca65a89f7a3895f7531221ceeed74ada7 upstream.
We recently addressed a VMID generation race by introducing a read/write
lock around accesses and updates to the vmid generation values.
However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but
does so without taking the read lock.
As far as I can tell, this can lead to the same kind of race:
VM 0, VCPU 0                          VM 0, VCPU 1
------------                          ------------
update_vttbr (vmid 254)
                                      update_vttbr (vmid 1) // roll over
                                      read_lock(kvm_vmid_lock);
                                      force_vm_exit()
local_irq_disable
need_new_vmid_gen == false //because vmid gen matches
enter_guest (vmid 254)
                                      kvm_arch.vttbr = <PGD>:<VMID 1>
                                      read_unlock(kvm_vmid_lock);
                                      enter_guest (vmid 1)
Which results in running two VCPUs in the same VM with different VMIDs
and (even worse) other VCPUs from other VMs could now allocate clashing
VMID 254 from the new generation as long as VCPU 0 is not exiting.
Attempt to solve this by making sure vttbr is updated before another CPU
can observe the updated VMID generation.
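The core of the fix, condensed from the diff below, is an smp_wmb()/smp_rmb()
pairing so that a lock-free reader which observes the current generation
number is also guaranteed to observe the vttbr that was published with it:

	/* Writer: update_vttbr(), under kvm_vmid_lock */
	kvm->arch.vttbr = pgd_phys | vmid;	/* publish the new VTTBR/VMID first */
	smp_wmb();
	WRITE_ONCE(kvm->arch.vmid_gen, atomic64_read(&kvm_vmid_gen));

	/* Reader: need_new_vmid_gen(), safe to call without the lock */
	u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
	smp_rmb();	/* pairs with the smp_wmb() above */
	return unlikely(READ_ONCE(kvm->arch.vmid_gen) != current_vmid_gen);

If the reader sees the old vmid_gen, need_new_vmid_gen() returns true and
update_vttbr() retries under the lock; if it sees the new one, the barrier
pairing guarantees the matching vttbr is visible as well.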
Change-Id: I40aae6e89a3c8a496e13fcd8ae6bb663d16b057c
Cc: stable(a)vger.kernel.org # v4.14
Fixes: f0cf47d939d0 "KVM: arm/arm64: Close VMID generation race"
Reviewed-by: Julien Thierry <julien.thierry(a)arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall(a)arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier(a)arm.com>
---
virt/kvm/arm/arm.c | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index ed42b8cf6f5b..32aa88c19b8d 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -61,7 +61,7 @@ static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_arm_running_vcpu);
static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
static u32 kvm_next_vmid;
static unsigned int kvm_vmid_bits __read_mostly;
-static DEFINE_RWLOCK(kvm_vmid_lock);
+static DEFINE_SPINLOCK(kvm_vmid_lock);
static bool vgic_present;
@@ -447,7 +447,9 @@ void force_vm_exit(const cpumask_t *mask)
*/
static bool need_new_vmid_gen(struct kvm *kvm)
{
- return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
+ u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
+ smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */
+ return unlikely(READ_ONCE(kvm->arch.vmid_gen) != current_vmid_gen);
}
/**
@@ -462,16 +464,11 @@ static void update_vttbr(struct kvm *kvm)
{
phys_addr_t pgd_phys;
u64 vmid;
- bool new_gen;
- read_lock(&kvm_vmid_lock);
- new_gen = need_new_vmid_gen(kvm);
- read_unlock(&kvm_vmid_lock);
-
- if (!new_gen)
+ if (!need_new_vmid_gen(kvm))
return;
- write_lock(&kvm_vmid_lock);
+ spin_lock(&kvm_vmid_lock);
/*
* We need to re-check the vmid_gen here to ensure that if another vcpu
@@ -479,7 +476,7 @@ static void update_vttbr(struct kvm *kvm)
* use the same vmid.
*/
if (!need_new_vmid_gen(kvm)) {
- write_unlock(&kvm_vmid_lock);
+ spin_unlock(&kvm_vmid_lock);
return;
}
@@ -502,7 +499,6 @@ static void update_vttbr(struct kvm *kvm)
kvm_call_hyp(__kvm_flush_vm_context);
}
- kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
kvm->arch.vmid = kvm_next_vmid;
kvm_next_vmid++;
kvm_next_vmid &= (1 << kvm_vmid_bits) - 1;
@@ -513,7 +509,10 @@ static void update_vttbr(struct kvm *kvm)
vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK(kvm_vmid_bits);
kvm->arch.vttbr = pgd_phys | vmid;
- write_unlock(&kvm_vmid_lock);
+ smp_wmb();
+ WRITE_ONCE(kvm->arch.vmid_gen, atomic64_read(&kvm_vmid_gen));
+
+ spin_unlock(&kvm_vmid_lock);
}
static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
--
2.18.0
This is the start of the stable review cycle for the 4.9.149 release.
There are 71 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed Jan 9 10:53:04 UTC 2019.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.149-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.149-rc1
Tomas Winkler <tomas.winkler(a)intel.com>
tpm: tpm_i2c_nuvoton: use correct command duration for TPM 2.x
Maciej W. Rozycki <macro(a)linux-mips.org>
rtc: m41t80: Correct alarm month range with RTC reads
Will Deacon <will.deacon(a)arm.com>
arm64: KVM: Avoid setting the upper 32 bits of VTCR_EL2 to 1
Vitaly Kuznetsov <vkuznets(a)redhat.com>
x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested
Georgy A Bystrenin <gkot(a)altlinux.org>
CIFS: Fix error mapping for SMB2_LOCK command which caused OFD lock problem
Aaro Koskinen <aaro.koskinen(a)iki.fi>
MIPS: OCTEON: mark RGMII interface disabled on OCTEON III
Huacai Chen <chenhc(a)lemote.com>
MIPS: Align kernel load address to 64KB
Huacai Chen <chenhc(a)lemote.com>
MIPS: Ensure pmd_present() returns false after pmd_mknotpresent()
Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
media: v4l2-tpg: array index could become negative
Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
media: vivid: free bitmap_cap when updating std/timings/etc.
Nava kishore Manne <nava.manne(a)xilinx.com>
serial: uartps: Fix interrupt mask issue to handle the RX interrupts properly
Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
f2fs: fix validation of the block count in sanity_check_raw_super
Breno Leitao <leitao(a)debian.org>
powerpc/tm: Set MSR[TS] just prior to recheckpoint
Josef Bacik <jbacik(a)fb.com>
btrfs: run delayed items before dropping the snapshot
Filipe Manana <fdmanana(a)suse.com>
Btrfs: fix fsync of files with multiple hard links in new directories
Macpaul Lin <macpaul.lin(a)mediatek.com>
cdc-acm: fix abnormal DATA RX issue for Mediatek Preloader.
Johan Jonker <jbx9999(a)hotmail.com>
clk: rockchip: fix typo in rk3188 spdif_frac parent
Lukas Wunner <lukas(a)wunner.de>
spi: bcm2835: Avoid finishing transfer prematurely in IRQ mode
Lukas Wunner <lukas(a)wunner.de>
spi: bcm2835: Fix book-keeping of DMA termination
Lukas Wunner <lukas(a)wunner.de>
spi: bcm2835: Fix race on DMA termination
Theodore Ts'o <tytso(a)mit.edu>
ext4: check for shutdown and r/o file system in ext4_write_inode()
Theodore Ts'o <tytso(a)mit.edu>
ext4: force inode writes when nfsd calls commit_metadata()
Theodore Ts'o <tytso(a)mit.edu>
ext4: include terminating u32 in size of xattr entries when expanding inodes
ruippan (潘睿) <ruippan(a)tencent.com>
ext4: fix EXT4_IOC_GROUP_ADD ioctl
Maurizio Lombardi <mlombard(a)redhat.com>
ext4: missing unlock/put_page() in ext4_try_to_write_inline_data()
Pan Bian <bianpan2016(a)163.com>
ext4: fix possible use after free in ext4_quota_enable
Ben Hutchings <ben(a)decadent.org.uk>
perf pmu: Suppress potential format-truncation warning
Miquel Raynal <miquel.raynal(a)bootlin.com>
platform-msi: Free descriptors in platform_msi_domain_free()
Sean Christopherson <sean.j.christopherson(a)intel.com>
KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup
Patrick Dreyer <Patrick(a)Dreyer.name>
Input: elan_i2c - add ACPI ID for touchpad in ASUS Aspire F5-573G
Bjørn Mork <bjorn(a)mork.no>
qmi_wwan: apply SET_DTR quirk to the SIMCOM shared device ID
Colin Ian King <colin.king(a)canonical.com>
staging: wilc1000: fix missing read_write setting when reading data
Jia-Ju Bai <baijiaju1990(a)gmail.com>
usb: r8a66597: Fix a possible concurrency use-after-free bug in r8a66597_endpoint_disable()
Jörgen Storvist <jorgen.storvist(a)gmail.com>
USB: serial: option: add Fibocom NL678 series
Scott Chen <scott(a)labau.com.tw>
USB: serial: pl2303: add ids for Hewlett-Packard HP POS pole displays
Sameer Pujar <spujar(a)nvidia.com>
ALSA: hda/tegra: clear pending irq handlers
Mantas Mikulėnas <grawity(a)gmail.com>
ALSA: hda: add mute LED support for HP EliteBook 840 G4
Arnd Bergmann <arnd(a)arndb.de>
mtd: atmel-quadspi: disallow building on ebsa110
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
ALSA: emux: Fix potential Spectre v1 vulnerabilities
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
ALSA: pcm: Fix potential Spectre v1 vulnerability
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
ALSA: emu10k1: Fix potential Spectre v1 vulnerabilities
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
ALSA: rme9652: Fix potential Spectre v1 vulnerability
Cong Wang <xiyou.wangcong(a)gmail.com>
ptr_ring: wrap back ->producer in __ptr_ring_swap_queue()
Deepa Dinamani <deepa.kernel(a)gmail.com>
sock: Make sock->sk_stamp thread-safe
Yuval Avnery <yuvalav(a)mellanox.com>
net/mlx5: Typo fix in del_sw_hw_rule
Alaa Hleihel <alaa(a)mellanox.com>
net/mlx5e: Remove the false indication of software timestamping support
Lorenzo Bianconi <lorenzo.bianconi(a)redhat.com>
gro_cell: add napi_disable in gro_cells_destroy
Cong Wang <xiyou.wangcong(a)gmail.com>
tipc: compare remote and local protocols in tipc_udp_enable()
Cong Wang <xiyou.wangcong(a)gmail.com>
tipc: use lock_sock() in tipc_sk_reinit()
Juergen Gross <jgross(a)suse.com>
xen/netfront: tolerate frags with no data
Jorgen Hansen <jhansen(a)vmware.com>
VSOCK: Send reset control packet when socket is partially bound
Jason Wang <jasowang(a)redhat.com>
vhost: make sure used idx is seen before log in vhost_add_used_n()
Cong Wang <xiyou.wangcong(a)gmail.com>
tipc: fix a double kfree_skb()
Xin Long <lucien.xin(a)gmail.com>
sctp: initialize sin6_flowinfo for ipv6 addrs in sctp_inet6addr_event
Willem de Bruijn <willemb(a)google.com>
packet: validate address length if non-zero
Willem de Bruijn <willemb(a)google.com>
packet: validate address length
Cong Wang <xiyou.wangcong(a)gmail.com>
net/wan: fix a double free in x25_asy_open_tty()
Cong Wang <xiyou.wangcong(a)gmail.com>
netrom: fix locking in nr_find_socket()
Kunihiko Hayashi <hayashi.kunihiko(a)socionext.com>
net: phy: Fix the issue that netif always links up after resuming
Michal Kubecek <mkubecek(a)suse.cz>
net: ipv4: do not handle duplicate fragments as overlapping
Eric Dumazet <edumazet(a)google.com>
isdn: fix kernel-infoleak in capi_unlocked_ioctl
Eric Dumazet <edumazet(a)google.com>
ipv6: tunnels: fix two use-after-free
Cong Wang <xiyou.wangcong(a)gmail.com>
ipv6: explicitly initialize udp6_addr in udp_sock_create6()
Willem de Bruijn <willemb(a)google.com>
ieee802154: lowpan_header_create check must check daddr
Tyrel Datwyler <tyreld(a)linux.vnet.ibm.com>
ibmveth: fix DMA unmap error in ibmveth_xmit_start error path
Cong Wang <xiyou.wangcong(a)gmail.com>
ax25: fix a use-after-free in ax25_fillin_cb()
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
phonet: af_phonet: Fix Spectre v1 vulnerability
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
net: core: Fix Spectre v1 vulnerability
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
ipv4: Fix potential Spectre v1 vulnerability
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
ip6mr: Fix potential Spectre v1 vulnerability
Guenter Roeck <linux(a)roeck-us.net>
NFC: nxp-nci: Include unaligned.h instead of access_ok.h
-------------
Diffstat:
Makefile | 4 +-
arch/arm64/include/asm/kvm_arm.h | 2 +-
arch/mips/boot/compressed/calc_vmlinuz_load_addr.c | 7 ++-
arch/mips/cavium-octeon/executive/cvmx-helper.c | 3 +-
arch/mips/include/asm/pgtable-64.h | 5 ++
arch/powerpc/kernel/signal_32.c | 20 ++++++-
arch/powerpc/kernel/signal_64.c | 44 +++++++++-----
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/vmx.c | 19 +++++-
arch/x86/kvm/x86.c | 3 +-
drivers/base/platform-msi.c | 6 +-
drivers/char/tpm/tpm_i2c_nuvoton.c | 11 ++--
drivers/clk/rockchip/clk-rk3188.c | 2 +-
drivers/input/mouse/elan_i2c_core.c | 1 +
drivers/isdn/capi/kcapi.c | 4 +-
drivers/media/common/v4l2-tpg/v4l2-tpg-core.c | 2 +-
drivers/media/platform/vivid/vivid-vid-cap.c | 2 +
drivers/mtd/spi-nor/Kconfig | 2 +-
drivers/net/ethernet/ibm/ibmveth.c | 6 +-
.../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 11 +---
drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +-
drivers/net/phy/phy_device.c | 7 +--
drivers/net/usb/qmi_wwan.c | 2 +-
drivers/net/wan/x25_asy.c | 2 +
drivers/net/xen-netfront.c | 2 +-
drivers/nfc/nxp-nci/firmware.c | 2 +-
drivers/nfc/nxp-nci/i2c.c | 2 +-
drivers/rtc/rtc-m41t80.c | 2 +-
drivers/spi/spi-bcm2835.c | 14 ++---
drivers/staging/wilc1000/wilc_sdio.c | 1 +
drivers/tty/serial/xilinx_uartps.c | 4 +-
drivers/usb/class/cdc-acm.c | 10 ++++
drivers/usb/class/cdc-acm.h | 1 +
drivers/usb/host/r8a66597-hcd.c | 5 +-
drivers/usb/serial/option.c | 4 ++
drivers/usb/serial/pl2303.c | 5 ++
drivers/usb/serial/pl2303.h | 5 ++
drivers/vhost/vhost.c | 2 +
fs/btrfs/btrfs_inode.h | 6 ++
fs/btrfs/extent-tree.c | 4 ++
fs/btrfs/inode.c | 17 ++++++
fs/btrfs/tree-log.c | 16 ++++++
fs/cifs/smb2maperror.c | 4 +-
fs/ext4/inline.c | 5 +-
fs/ext4/inode.c | 9 ++-
fs/ext4/resize.c | 2 +-
fs/ext4/super.c | 13 ++++-
fs/ext4/xattr.c | 2 +-
fs/f2fs/super.c | 6 +-
include/linux/msi.h | 2 +
include/linux/ptr_ring.h | 2 +
include/net/gro_cells.h | 1 +
include/net/sock.h | 36 +++++++++++-
include/trace/events/ext4.h | 20 +++++++
net/ax25/af_ax25.c | 11 +++-
net/ax25/ax25_dev.c | 2 +
net/compat.c | 15 +++--
net/core/filter.c | 2 +
net/core/sock.c | 3 +
net/ieee802154/6lowpan/tx.c | 3 +
net/ipv4/ip_fragment.c | 18 ++++--
net/ipv4/ipmr.c | 3 +
net/ipv6/ip6_tunnel.c | 1 +
net/ipv6/ip6_udp_tunnel.c | 3 +-
net/ipv6/ip6_vti.c | 1 +
net/ipv6/ip6mr.c | 4 ++
net/netrom/af_netrom.c | 15 +++--
net/packet/af_packet.c | 8 ++-
net/phonet/af_phonet.c | 3 +
net/sctp/ipv6.c | 1 +
net/sunrpc/svcsock.c | 2 +-
net/tipc/socket.c | 8 ++-
net/tipc/udp_media.c | 9 ++-
net/vmw_vsock/vmci_transport.c | 67 ++++++++++++++++------
sound/core/pcm.c | 2 +
sound/pci/emu10k1/emufx.c | 5 ++
sound/pci/hda/hda_tegra.c | 2 +
sound/pci/hda/patch_conexant.c | 1 +
sound/pci/rme9652/hdsp.c | 10 ++--
sound/synth/emux/emux_hwdep.c | 7 ++-
tools/perf/util/pmu.c | 8 +--
81 files changed, 451 insertions(+), 136 deletions(-)
From: Jan Stancek <jstancek(a)redhat.com>
Subject: mm: page_mapped: don't assume compound page is huge or THP
LTP proc01 testcase has been observed to rarely trigger crashes
on arm64:
page_mapped+0x78/0xb4
stable_page_flags+0x27c/0x338
kpageflags_read+0xfc/0x164
proc_reg_read+0x7c/0xb8
__vfs_read+0x58/0x178
vfs_read+0x90/0x14c
SyS_read+0x60/0xc0
The issue is that page_mapped() assumes that if a compound page is not huge,
then it must be THP.  But if this is a 'normal' compound page
(COMPOUND_PAGE_DTOR), then the following loop can keep running (for
HPAGE_PMD_NR iterations) until it tries to read from memory that isn't
mapped and triggers a panic:
for (i = 0; i < hpage_nr_pages(page); i++) {
if (atomic_read(&page[i]._mapcount) >= 0)
return true;
}
I could replicate this on x86 (v4.20-rc4-98-g60b548237fed) only
with a custom kernel module [1] which:
- allocates compound page (PAGEC) of order 1
- allocates 2 normal pages (COPY), which are initialized to 0xff
(to satisfy _mapcount >= 0)
- 2 PAGEC page structs are copied to address of first COPY page
- second page of COPY is marked as not present
- call to page_mapped(COPY) now triggers fault on access to 2nd
COPY page at offset 0x30 (_mapcount)
[1] https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_cras…
Fix the loop to iterate for "1 << compound_order" pages.
Kirill said "IIRC, the sound subsystem can produce custom mapped compound
pages".
Link: http://lkml.kernel.org/r/c440d69879e34209feba21e12d236d06bc0a25db.154357715…
Fixes: e1534ae95004 ("mm: differentiate page_mapped() from page_mapcount() for compound pages")
Signed-off-by: Jan Stancek <jstancek(a)redhat.com>
Debugged-by: Laszlo Ersek <lersek(a)redhat.com>
Suggested-by: "Kirill A. Shutemov" <kirill(a)shutemov.name>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/util.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/util.c~mm-page_mapped-dont-assume-compound-page-is-huge-or-thp
+++ a/mm/util.c
@@ -478,7 +478,7 @@ bool page_mapped(struct page *page)
return true;
if (PageHuge(page))
return false;
- for (i = 0; i < hpage_nr_pages(page); i++) {
+ for (i = 0; i < (1 << compound_order(page)); i++) {
if (atomic_read(&page[i]._mapcount) >= 0)
return true;
}
_
From: Michal Hocko <mhocko(a)suse.com>
Subject: mm, memcg: fix reclaim deadlock with writeback
Liu Bo has experienced a deadlock between memcg (legacy) reclaim and ext4
writeback:
task1:
[<ffffffff811aaa52>] wait_on_page_bit+0x82/0xa0
[<ffffffff811c5777>] shrink_page_list+0x907/0x960
[<ffffffff811c6027>] shrink_inactive_list+0x2c7/0x680
[<ffffffff811c6ba4>] shrink_node_memcg+0x404/0x830
[<ffffffff811c70a8>] shrink_node+0xd8/0x300
[<ffffffff811c73dd>] do_try_to_free_pages+0x10d/0x330
[<ffffffff811c7865>] try_to_free_mem_cgroup_pages+0xd5/0x1b0
[<ffffffff8122df2d>] try_charge+0x14d/0x720
[<ffffffff812320cc>] memcg_kmem_charge_memcg+0x3c/0xa0
[<ffffffff812321ae>] memcg_kmem_charge+0x7e/0xd0
[<ffffffff811b68a8>] __alloc_pages_nodemask+0x178/0x260
[<ffffffff8120bff5>] alloc_pages_current+0x95/0x140
[<ffffffff81074247>] pte_alloc_one+0x17/0x40
[<ffffffff811e34de>] __pte_alloc+0x1e/0x110
[<ffffffffa06739de>] alloc_set_pte+0x5fe/0xc20
[<ffffffff811e5d93>] do_fault+0x103/0x970
[<ffffffff811e6e5e>] handle_mm_fault+0x61e/0xd10
[<ffffffff8106ea02>] __do_page_fault+0x252/0x4d0
[<ffffffff8106ecb0>] do_page_fault+0x30/0x80
[<ffffffff8171bce8>] page_fault+0x28/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
task2:
[<ffffffff811aadc6>] __lock_page+0x86/0xa0
[<ffffffffa02f1e47>] mpage_prepare_extent_to_map+0x2e7/0x310 [ext4]
[<ffffffffa08a2689>] ext4_writepages+0x479/0xd60
[<ffffffff811bbede>] do_writepages+0x1e/0x30
[<ffffffff812725e5>] __writeback_single_inode+0x45/0x320
[<ffffffff81272de2>] writeback_sb_inodes+0x272/0x600
[<ffffffff81273202>] __writeback_inodes_wb+0x92/0xc0
[<ffffffff81273568>] wb_writeback+0x268/0x300
[<ffffffff81273d24>] wb_workfn+0xb4/0x390
[<ffffffff810a2f19>] process_one_work+0x189/0x420
[<ffffffff810a31fe>] worker_thread+0x4e/0x4b0
[<ffffffff810a9786>] kthread+0xe6/0x100
[<ffffffff8171a9a1>] ret_from_fork+0x41/0x50
[<ffffffffffffffff>] 0xffffffffffffffff
He adds
: task1 is waiting for the PageWriteback bit of the page that task2 has
: collected in mpd->io_submit->io_bio, and tasks2 is waiting for the LOCKED
: bit the page which tasks1 has locked.
More precisely, task1 is handling a page fault and has a page locked while it
charges a new page table to a memcg.  That in turn hits the memory limit
reclaim, and the memcg reclaim for the legacy controller waits on writeback,
which is never going to finish because the writeback itself is waiting for
the page locked in the #PF path.  So this is essentially an ABBA deadlock:
task1 (#PF path)                     task2 (flusher)
                                     lock_page(A)
                                     SetPageWriteback(A)
                                     unlock_page(A)
lock_page(B)
                                     lock_page(B)
pte_alloc_pne
  shrink_page_list
    wait_on_page_writeback(A)
                                     SetPageWriteback(B)
                                     unlock_page(B)
                                     # flush A, B to clear the writeback
This accumulation of more pages to flush is used by several filesystems to
generate more optimal IO patterns.
Waiting for the writeback in the legacy memcg controller is a workaround for
premature OOM killer invocations, because there is no dirty IO throttling
available for the controller.  There is no easy way around that,
unfortunately.  Therefore, fix this specific issue by pre-allocating the page
table outside of the page lock.  We already have handy infrastructure for
that, so simply reuse the fault-around pattern, which already does this.
There are probably other hidden __GFP_ACCOUNT | GFP_KERNEL allocations from
under a locked fs page, but they should be really rare.  I am not aware of a
better solution, unfortunately.
[akpm(a)linux-foundation.org: fix mm/memory.c:__do_fault()]
[akpm(a)linux-foundation.org: coding-style fixes]
[mhocko(a)kernel.org: enhance comment, per Johannes]
Link: http://lkml.kernel.org/r/20181214084948.GA5624@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20181213092221.27270-1-mhocko@kernel.org
Fixes: c3b94f44fcb0 ("memcg: further prevent OOM with too many dirty pages")
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Reported-by: Liu Bo <bo.liu(a)linux.alibaba.com>
Debugged-by: Liu Bo <bo.liu(a)linux.alibaba.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reviewed-by: Liu Bo <bo.liu(a)linux.alibaba.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Dave Chinner <david(a)fromorbit.com>
Cc: Theodore Ts'o <tytso(a)mit.edu>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
--- a/mm/memory.c~mm-memcg-fix-reclaim-deadlock-with-writeback
+++ a/mm/memory.c
@@ -2994,6 +2994,28 @@ static vm_fault_t __do_fault(struct vm_f
struct vm_area_struct *vma = vmf->vma;
vm_fault_t ret;
+ /*
+ * Preallocate pte before we take page_lock because this might lead to
+ * deadlocks for memcg reclaim which waits for pages under writeback:
+ * lock_page(A)
+ * SetPageWriteback(A)
+ * unlock_page(A)
+ * lock_page(B)
+ * lock_page(B)
+ * pte_alloc_pne
+ * shrink_page_list
+ * wait_on_page_writeback(A)
+ * SetPageWriteback(B)
+ * unlock_page(B)
+ * # flush A, B to clear the writeback
+ */
+ if (pmd_none(*vmf->pmd) && !vmf->prealloc_pte) {
+ vmf->prealloc_pte = pte_alloc_one(vmf->vma->vm_mm);
+ if (!vmf->prealloc_pte)
+ return VM_FAULT_OOM;
+ smp_wmb(); /* See comment in __pte_alloc() */
+ }
+
ret = vma->vm_ops->fault(vmf);
if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
VM_FAULT_DONE_COW)))
_
From: Christoph Lameter <cl(a)linux.com>
Subject: slab: alien caches must not be initialized if the allocation of the alien cache failed
Callers of __alloc_alien() check for NULL. We must do the same check in
__alloc_alien_cache to avoid NULL pointer dereferences on allocation
failures.
Link: http://lkml.kernel.org/r/010001680f42f192-82b4e12e-1565-4ee0-ae1f-1e9897490…
Fixes: 49dfc304ba241 ("slab: use the lock on alien_cache, instead of the lock on array_cache")
Fixes: c8522a3a5832b ("Slab: introduce alloc_alien")
Signed-off-by: Christoph Lameter <cl(a)linux.com>
Reported-by: syzbot+d6ed4ec679652b4fd4e4(a)syzkaller.appspotmail.com
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slab.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/mm/slab.c~slab-alien-caches-must-not-be-initialized-if-the-allocation-of-the-alien-cache-failed
+++ a/mm/slab.c
@@ -666,8 +666,10 @@ static struct alien_cache *__alloc_alien
struct alien_cache *alc = NULL;
alc = kmalloc_node(memsize, gfp, node);
- init_arraycache(&alc->ac, entries, batch);
- spin_lock_init(&alc->lock);
+ if (alc) {
+ init_arraycache(&alc->ac, entries, batch);
+ spin_lock_init(&alc->lock);
+ }
return alc;
}
_