Dear All,
When running DHCP tests using the 4.4.y (and 4.4.y-cip) kernels, I
encountered an issue where the DHCP client in the kernel could not get an
IP address when multiple network devices were enabled. The current
implementation of the DHCP client in the 4.4 kernel appears to send a DHCP
request via device 1 -> wait <1s for a response from the server on device 1 ->
if no response, switch to device 2 -> repeat the process on device 2 ...etc.
When the DHCP server is slow to respond, this makes it impossible to obtain
a DHCP address.
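For illustration, here is a rough sketch of the per-device round-robin
described above; the helper names (ic_first_dev, ic_dhcp_send_request,
ic_wait_for_reply) are made up for this note and are not the actual
net/ipv4/ipconfig.c code:

/* Illustrative sketch of the 4.4-era behaviour, with made-up helpers. */
struct ic_device *d;

for (d = ic_first_dev; d; d = d->next) {
	ic_dhcp_send_request(d);        /* DHCPDISCOVER on this device */
	if (ic_wait_for_reply(d, HZ))   /* wait < ~1s, on this device only */
		return 0;               /* got an offer/lease */
	/* No reply yet: move on to the next device, so a reply that
	 * arrives late on this device is effectively lost. */
}
return -ETIMEDOUT;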
This series, backported from upstream, fixes the issue. Is it possible
to apply it to 4.4.y and/or 4.4.y-cip?
Thanks,
Patryk
Geert Uytterhoeven (1):
net: ipconfig: Fix NULL pointer dereference on RARP/BOOTP/DHCP timeout
Thierry Reding (1):
net: ipconfig: Fix more use after free
Uwe Kleine-König (3):
net: ipconfig: Support using "delayed" DHCP replies
net: ipconfig: drop inter-device timeout
net: ipconfig: fix use after free
net/ipv4/ipconfig.c | 61 ++++++++++++++++++++++++-----------------------------
1 file changed, 28 insertions(+), 33 deletions(-)
--
2.7.4
From: "Leonidas P. Papadakos" <papadakospan(a)gmail.com>
[ Upstream commit 924726888f660b2a86382a5dd051ec9ca1b18190 ]
The rk3328-roc-cc board exhibits tx stability issues with large packets,
as does the rock64 board, which was fixed with this patch
https://patchwork.kernel.org/patch/10178969/
A similar patch was merged for the rk3328-roc-cc here
https://patchwork.kernel.org/patch/10804863/
but it doesn't include the tx/rx_delay tweaks, and I find that they
help with an issue where large transfers would bring the ethernet
link down, causing a link reset regularly.
Signed-off-by: Leonidas P. Papadakos <papadakospan(a)gmail.com>
Signed-off-by: Heiko Stuebner <heiko(a)sntech.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm64/boot/dts/rockchip/rk3328-roc-cc.dts | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3328-roc-cc.dts b/arch/arm64/boot/dts/rockchip/rk3328-roc-cc.dts
index 246c317f6a68..91061d9cf78b 100644
--- a/arch/arm64/boot/dts/rockchip/rk3328-roc-cc.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3328-roc-cc.dts
@@ -94,8 +94,8 @@
snps,reset-gpio = <&gpio1 RK_PC2 GPIO_ACTIVE_LOW>;
snps,reset-active-low;
snps,reset-delays-us = <0 10000 50000>;
- tx_delay = <0x25>;
- rx_delay = <0x11>;
+ tx_delay = <0x24>;
+ rx_delay = <0x18>;
status = "okay";
};
--
2.19.1
Once blk_cleanup_queue() returns, tags shouldn't be used any more,
because blk_mq_free_tag_set() may be called. Commit 45a9c9d909b2
("blk-mq: Fix a use-after-free") fixes this issue exactly.
However, that commit introduces another issue. Before 45a9c9d909b2,
we were allowed to run a queue while cleaning up the queue as long as the
queue's kobj refcount was held. After that commit, the queue can't be run
during queue cleanup, otherwise an oops can be triggered easily because
some fields of the hctx are freed by blk_mq_free_queue() in blk_cleanup_queue().
We have devised ways of addressing this kind of issue before, such as:
8dc765d438f1 ("SCSI: fix queue cleanup race before queue initialization is done")
c2856ae2f315 ("blk-mq: quiesce queue before freeing queue")
But they still can't cover all cases; recently James reported another issue
of this kind:
https://marc.info/?l=linux-scsi&m=155389088124782&w=2
This issue is quite hard to address with the previous approaches, given that
scsi_run_queue() may run requeues for other LUNs.
Fix the above issue by freeing the hctx's resources in its release handler;
this is safe because the tags aren't needed for freeing those hctx resources.
This approach follows the typical design pattern for a kobject's release handler.
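To make that pattern concrete, here is a minimal, illustrative sketch of
deferring resource freeing to a kobject release handler; the struct and
field names are invented for this note and are not the actual blk-mq code:

/* Illustrative only -- hypothetical names, not the real blk-mq structures. */
struct foo {
	struct kobject kobj;
	void *buf;
};

static void foo_kobj_release(struct kobject *kobj)
{
	struct foo *f = container_of(kobj, struct foo, kobj);

	/* Freed only when the last reference is dropped, so anyone still
	 * holding a reference (e.g. a late queue run) remains safe. */
	kfree(f->buf);
	kfree(f);
}

static struct kobj_type foo_ktype = {
	.release = foo_kobj_release,
};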
Cc: Dongli Zhang <dongli.zhang(a)oracle.com>
Cc: James Smart <james.smart(a)broadcom.com>
Cc: Bart Van Assche <bart.vanassche(a)wdc.com>
Cc: linux-scsi(a)vger.kernel.org
Cc: Martin K. Petersen <martin.petersen(a)oracle.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: James E. J. Bottomley <jejb(a)linux.vnet.ibm.com>
Reported-by: James Smart <james.smart(a)broadcom.com>
Fixes: 45a9c9d909b2 ("blk-mq: Fix a use-after-free")
Cc: stable(a)vger.kernel.org
Reviewed-by: Hannes Reinecke <hare(a)suse.com>
Tested-by: James Smart <james.smart(a)broadcom.com>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
---
block/blk-core.c | 2 +-
block/blk-mq-sysfs.c | 6 ++++++
block/blk-mq.c | 8 ++------
block/blk-mq.h | 2 +-
4 files changed, 10 insertions(+), 8 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 93dc588fabe2..2dd94b3e9ece 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -374,7 +374,7 @@ void blk_cleanup_queue(struct request_queue *q)
blk_exit_queue(q);
if (queue_is_mq(q))
- blk_mq_free_queue(q);
+ blk_mq_exit_queue(q);
percpu_ref_exit(&q->q_usage_counter);
diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 3f9c3f4ac44c..4040e62c3737 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -10,6 +10,7 @@
#include <linux/smp.h>
#include <linux/blk-mq.h>
+#include "blk.h"
#include "blk-mq.h"
#include "blk-mq-tag.h"
@@ -33,6 +34,11 @@ static void blk_mq_hw_sysfs_release(struct kobject *kobj)
{
struct blk_mq_hw_ctx *hctx = container_of(kobj, struct blk_mq_hw_ctx,
kobj);
+
+ if (hctx->flags & BLK_MQ_F_BLOCKING)
+ cleanup_srcu_struct(hctx->srcu);
+ blk_free_flush_queue(hctx->fq);
+ sbitmap_free(&hctx->ctx_map);
free_cpumask_var(hctx->cpumask);
kfree(hctx->ctxs);
kfree(hctx);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3e321048b259..862eb41f24f8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2273,12 +2273,7 @@ static void blk_mq_exit_hctx(struct request_queue *q,
if (set->ops->exit_hctx)
set->ops->exit_hctx(hctx, hctx_idx);
- if (hctx->flags & BLK_MQ_F_BLOCKING)
- cleanup_srcu_struct(hctx->srcu);
-
blk_mq_remove_cpuhp(hctx);
- blk_free_flush_queue(hctx->fq);
- sbitmap_free(&hctx->ctx_map);
}
static void blk_mq_exit_hw_queues(struct request_queue *q,
@@ -2913,7 +2908,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
}
EXPORT_SYMBOL(blk_mq_init_allocated_queue);
-void blk_mq_free_queue(struct request_queue *q)
+/* tags can _not_ be used after returning from blk_mq_exit_queue */
+void blk_mq_exit_queue(struct request_queue *q)
{
struct blk_mq_tag_set *set = q->tag_set;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 423ea88ab6fb..633a5a77ee8b 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -37,7 +37,7 @@ struct blk_mq_ctx {
struct kobject kobj;
} ____cacheline_aligned_in_smp;
-void blk_mq_free_queue(struct request_queue *q);
+void blk_mq_exit_queue(struct request_queue *q);
int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
void blk_mq_wake_waiters(struct request_queue *q);
bool blk_mq_dispatch_rq_list(struct request_queue *, struct list_head *, bool);
--
2.9.5
On 04/24, Oleg wrote:
>On 04/24, Christian Brauner wrote:
>>
>> On Wed, Apr 24, 2019 at 08:52:38PM +0800, Zhenliang Wei wrote:
>>
>> > Reviewed-by: Oleg Nesterov <oleg(a)redhat.com>
>
>Yes, but ...
>
>> > Reported-by: kbuild test robot <lkp(a)intel.com>
>
>Hmm, really?
Yes, the kbuild test robot says that if I fix the problem with the third parameter type,
I should add this tag. What is wrong or missing?
Wei.
From: "He, Bo" <bo.he(a)intel.com>
[ Upstream commit cef0d4948cb0a02db37ebfdc320e127c77ab1637 ]
There is a race condition that could happen if hid_debug_rdesc_show()
is running while hdev is in the process of going away (device removal,
system suspend, etc) which could result in NULL pointer dereference:
BUG: unable to handle kernel paging request at 0000000783316040
CPU: 1 PID: 1512 Comm: getevent Tainted: G U O 4.19.20-quilt-2e5dc0ac-00029-gc455a447dd55 #1
RIP: 0010:hid_dump_device+0x9b/0x160
Call Trace:
hid_debug_rdesc_show+0x72/0x1d0
seq_read+0xe0/0x410
full_proxy_read+0x5f/0x90
__vfs_read+0x3a/0x170
vfs_read+0xa0/0x150
ksys_read+0x58/0xc0
__x64_sys_read+0x1a/0x20
do_syscall_64+0x55/0x110
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Grab driver_input_lock to make sure the input device exists throughout the
whole process of dumping the rdesc.
[jkosina(a)suse.cz: update changelog a bit]
Signed-off-by: "he, bo" <bo.he(a)intel.com>
Signed-off-by: "Zhang, Jun" <jun.zhang(a)intel.com>
Signed-off-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/hid/hid-debug.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/hid/hid-debug.c b/drivers/hid/hid-debug.c
index e930627d0c76..71b069bd2a24 100644
--- a/drivers/hid/hid-debug.c
+++ b/drivers/hid/hid-debug.c
@@ -1057,10 +1057,15 @@ static int hid_debug_rdesc_show(struct seq_file *f, void *p)
seq_printf(f, "\n\n");
/* dump parsed data and input mappings */
+ if (down_interruptible(&hdev->driver_input_lock))
+ return 0;
+
hid_dump_device(hdev, f);
seq_printf(f, "\n");
hid_dump_input_mapping(hdev, f);
+ up(&hdev->driver_input_lock);
+
return 0;
}
--
2.19.1
From: "He, Bo" <bo.he(a)intel.com>
[ Upstream commit cef0d4948cb0a02db37ebfdc320e127c77ab1637 ]
There is a race condition that could happen if hid_debug_rdesc_show()
is running while hdev is in the process of going away (device removal,
system suspend, etc) which could result in NULL pointer dereference:
BUG: unable to handle kernel paging request at 0000000783316040
CPU: 1 PID: 1512 Comm: getevent Tainted: G U O 4.19.20-quilt-2e5dc0ac-00029-gc455a447dd55 #1
RIP: 0010:hid_dump_device+0x9b/0x160
Call Trace:
hid_debug_rdesc_show+0x72/0x1d0
seq_read+0xe0/0x410
full_proxy_read+0x5f/0x90
__vfs_read+0x3a/0x170
vfs_read+0xa0/0x150
ksys_read+0x58/0xc0
__x64_sys_read+0x1a/0x20
do_syscall_64+0x55/0x110
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Grab driver_input_lock to make sure the input device exists throughout the
whole process of dumping the rdesc.
[jkosina(a)suse.cz: update changelog a bit]
Signed-off-by: "he, bo" <bo.he(a)intel.com>
Signed-off-by: "Zhang, Jun" <jun.zhang(a)intel.com>
Signed-off-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/hid/hid-debug.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/hid/hid-debug.c b/drivers/hid/hid-debug.c
index d7179dd3c9ef..3cafa1d28fed 100644
--- a/drivers/hid/hid-debug.c
+++ b/drivers/hid/hid-debug.c
@@ -1058,10 +1058,15 @@ static int hid_debug_rdesc_show(struct seq_file *f, void *p)
seq_printf(f, "\n\n");
/* dump parsed data and input mappings */
+ if (down_interruptible(&hdev->driver_input_lock))
+ return 0;
+
hid_dump_device(hdev, f);
seq_printf(f, "\n");
hid_dump_input_mapping(hdev, f);
+ up(&hdev->driver_input_lock);
+
return 0;
}
--
2.19.1
From: "Leonidas P. Papadakos" <papadakospan(a)gmail.com>
[ Upstream commit 924726888f660b2a86382a5dd051ec9ca1b18190 ]
The rk3328-roc-cc board exhibits tx stability issues with large packets,
as does the rock64 board, which was fixed with this patch
https://patchwork.kernel.org/patch/10178969/
A similar patch was merged for the rk3328-roc-cc here
https://patchwork.kernel.org/patch/10804863/
but it doesn't include the tx/rx_delay tweaks, and I find that they
help with an issue where large transfers would bring the ethernet
link down, causing a link reset regularly.
Signed-off-by: Leonidas P. Papadakos <papadakospan(a)gmail.com>
Signed-off-by: Heiko Stuebner <heiko(a)sntech.de>
Signed-off-by: Sasha Levin (Microsoft) <sashal(a)kernel.org>
---
arch/arm64/boot/dts/rockchip/rk3328-roc-cc.dts | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3328-roc-cc.dts b/arch/arm64/boot/dts/rockchip/rk3328-roc-cc.dts
index 99d0d9912950..a91f87df662e 100644
--- a/arch/arm64/boot/dts/rockchip/rk3328-roc-cc.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3328-roc-cc.dts
@@ -107,8 +107,8 @@
snps,reset-gpio = <&gpio1 RK_PC2 GPIO_ACTIVE_LOW>;
snps,reset-active-low;
snps,reset-delays-us = <0 10000 50000>;
- tx_delay = <0x25>;
- rx_delay = <0x11>;
+ tx_delay = <0x24>;
+ rx_delay = <0x18>;
status = "okay";
};
--
2.19.1
Hi,
On Sep 10, 2018 at 4:39 PM Katsuhiro Suzuki <katsuhiro(a)katsuster.net> wrote:
>
> This patch adds SNDRV_PCM_INFO_INTERLEAVED into PCM hardware info.
>
> Signed-off-by: Katsuhiro Suzuki <katsuhiro(a)katsuster.net>
> ---
> sound/soc/rockchip/rockchip_pcm.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
I just tracked down that audio was broken on 4.19 and needs this
change to work. Stable: can you please pick it? Specifically I
believe you'd want to treat it as if it had:
Fixes: 75b31192fe6a ("ASoC: rockchip: add config for rockchip
dmaengine pcm register")
-Doug
From: "Leonidas P. Papadakos" <papadakospan(a)gmail.com>
[ Upstream commit 924726888f660b2a86382a5dd051ec9ca1b18190 ]
The rk3328-roc-cc board exhibits tx stability issues with large packets,
as does the rock64 board, which was fixed with this patch
https://patchwork.kernel.org/patch/10178969/
A similar patch was merged for the rk3328-roc-cc here
https://patchwork.kernel.org/patch/10804863/
but it doesn't include the tx/rx_delay tweaks, and I find that they
help with an issue where large transfers would bring the ethernet
link down, causing a link reset regularly.
Signed-off-by: Leonidas P. Papadakos <papadakospan(a)gmail.com>
Signed-off-by: Heiko Stuebner <heiko(a)sntech.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm64/boot/dts/rockchip/rk3328-roc-cc.dts | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3328-roc-cc.dts b/arch/arm64/boot/dts/rockchip/rk3328-roc-cc.dts
index 246c317f6a68..91061d9cf78b 100644
--- a/arch/arm64/boot/dts/rockchip/rk3328-roc-cc.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3328-roc-cc.dts
@@ -94,8 +94,8 @@
snps,reset-gpio = <&gpio1 RK_PC2 GPIO_ACTIVE_LOW>;
snps,reset-active-low;
snps,reset-delays-us = <0 10000 50000>;
- tx_delay = <0x25>;
- rx_delay = <0x11>;
+ tx_delay = <0x24>;
+ rx_delay = <0x18>;
status = "okay";
};
--
2.19.1
The patch below does not apply to the 5.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2e8e19226398db8265a8e675fcc0118b9e80c9e8 Mon Sep 17 00:00:00 2001
From: Phil Auld <pauld(a)redhat.com>
Date: Tue, 19 Mar 2019 09:00:05 -0400
Subject: [PATCH] sched/fair: Limit sched_cfs_period_timer() loop to avoid hard
lockup
With extremely short cfs_period_us setting on a parent task group with a large
number of children the for loop in sched_cfs_period_timer() can run until the
watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
will ever return 0. The large number of children can make
do_sched_cfs_period_timer() take longer than the period.
NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
RIP: 0010:tg_nop+0x0/0x10
<IRQ>
walk_tg_tree_from+0x29/0xb0
unthrottle_cfs_rq+0xe0/0x1a0
distribute_cfs_runtime+0xd3/0xf0
sched_cfs_period_timer+0xcb/0x160
? sched_cfs_slack_timer+0xd0/0xd0
__hrtimer_run_queues+0xfb/0x270
hrtimer_interrupt+0x122/0x270
smp_apic_timer_interrupt+0x6a/0x140
apic_timer_interrupt+0xf/0x20
</IRQ>
To prevent this we add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the timer
can complete before the next period expires. This preserves the relative runtime
quota while preventing the hard lockup.
A warning is issued reporting this state and the new values.
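As a concrete illustration of the scaling (numbers chosen here for the
example, not taken from the report): with a period of 100 us, one pass
through the scaling branch gives a new period of 100 * 147 / 128 ≈ 115 us,
and the quota is multiplied by the same 147/128 factor, so the configured
quota/period ratio is preserved while each timer invocation gets roughly
15% more headroom.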
Signed-off-by: Phil Auld <pauld(a)redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Cc: Anton Blanchard <anton(a)ozlabs.org>
Cc: Ben Segall <bsegall(a)google.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40bd1e27b1b7..a4d9e14bf138 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
return HRTIMER_NORESTART;
}
+extern const u64 max_cfs_quota_period;
+
static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
{
struct cfs_bandwidth *cfs_b =
@@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
unsigned long flags;
int overrun;
int idle = 0;
+ int count = 0;
raw_spin_lock_irqsave(&cfs_b->lock, flags);
for (;;) {
@@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
if (!overrun)
break;
+ if (++count > 3) {
+ u64 new, old = ktime_to_ns(cfs_b->period);
+
+ new = (old * 147) / 128; /* ~115% */
+ new = min(new, max_cfs_quota_period);
+
+ cfs_b->period = ns_to_ktime(new);
+
+ /* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+ cfs_b->quota *= new;
+ cfs_b->quota = div64_u64(cfs_b->quota, old);
+
+ pr_warn_ratelimited(
+ "cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+ smp_processor_id(),
+ div_u64(new, NSEC_PER_USEC),
+ div_u64(cfs_b->quota, NSEC_PER_USEC));
+
+ /* reset count so we don't come right back in here */
+ count = 0;
+ }
+
idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
}
if (idle)
On 04/24, Christian Brauner wrote:
>On Wed, Apr 24, 2019 at 08:52:38PM +0800, Zhenliang Wei wrote:
>>
>> Acked-by: Christian Brauner <christian(a)brauner.io>
>
>I think we're supposed to use more Reviewed-bys so feel free (or Andrew) to change this to:
>
>Reviewed-by: Christian Brauner <christian(a)brauner.io>
Ok, I will change this in patch v5.
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -2441,6 +2441,8 @@ bool get_signal(struct ksignal *ksig)
>> if (signal_group_exit(signal)) {
>> ksig->info.si_signo = signr = SIGKILL;
>> sigdelset(¤t->pending.signal, SIGKILL);
>> + trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
>> + &sighand->action[signr - 1]);
>
>Hm, sorry for being the really nitpicky person here. Just for the sake of consistency how about we do either:
>
>+ trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
>+ &sighand->action[SIGKILL - 1]);
>
>or
>
>+ trace_signal_deliver(signr, SEND_SIG_NOINFO,
>+ &sighand->action[signr - 1]);
>
>I'm not going to argue about this though. Can just also leave it as is.
Thank you for your comments; I have learned from rigorous reviewers! I will take:
+ trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
+ &sighand->action[SIGKILL - 1]);
Any other suggestions about the patch?
Wei
In the commit referenced by the Fixes: tag, removing SIGKILL from each
thread's signal mask and executing "goto fatal" directly skips the call to
trace_signal_deliver(). At that point, the delivery tracking of the SIGKILL
signal becomes inaccurate.
Therefore, we need to add trace_signal_deliver() before the "goto fatal",
after the sigdelset() call.
Note: SEND_SIG_NOINFO matches the fact that SIGKILL doesn't have any info.
Acked-by: Christian Brauner <christian(a)brauner.io>
Reviewed-by: Oleg Nesterov <oleg(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Fixes: cf43a757fd4944 ("signal: Restore the stop PTRACE_EVENT_EXIT")
Reported-by: kbuild test robot <lkp(a)intel.com>
Signed-off-by: Zhenliang Wei <weizhenliang(a)huawei.com>
---
kernel/signal.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/signal.c b/kernel/signal.c
index 227ba170298e..3edf526db7c6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2441,6 +2441,8 @@ bool get_signal(struct ksignal *ksig)
if (signal_group_exit(signal)) {
ksig->info.si_signo = signr = SIGKILL;
sigdelset(¤t->pending.signal, SIGKILL);
+ trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
+ &sighand->action[signr - 1]);
recalc_sigpending();
goto fatal;
}
--
2.14.1.windows.1
The existing data command CRC error handling on the 4.20 stable branch
is non-standard and does not work with some Intel host controllers.
Specifically, the assumption that the host controller will continue
operating normally after the error interrupt is not valid. We therefore
suggest cherry-picking the 3 patches listed below to the 4.20 stable
branch; they change the driver to handle the error in the same
manner as a data CRC error.
4bf7809: mmc: sdhci: Fix data command CRC error handling
869f8a6: mmc: sdhci: Rename SDHCI_ACMD12_ERR and SDHCI_INT_ACMD12ERR
af849c8: mmc: sdhci: Handle auto-command errors
All of the patches above have landed on kernel 5.0 stable branch.
Starting from 9c225f2655 ("vfs: atomic f_pos accesses as per POSIX"), files
opened even via nonseekable_open() gate read and write via a lock and do not
allow them to run simultaneously. This can create a read vs write
deadlock if a filesystem is trying to implement a socket-like file which
is intended to be used simultaneously for both read and write from the
filesystem client. See 10dce8af3422 ("fs: stream_open - opener for
stream-like files so that read and write can run simultaneously without
deadlock") for details and e.g. 581d21a2d0 ("xenbus: fix deadlock on
writes to /proc/xen/xenbus") for a similar deadlock example on /proc/xen/xenbus.
To avoid such a deadlock it was tempting to adjust fuse_finish_open() to use
stream_open() instead of nonseekable_open() on just the FOPEN_NONSEEKABLE flag,
but grepping through Debian code search shows users of FOPEN_NONSEEKABLE,
and in particular GVFS, which actually uses the offset in its read and write
handlers:
https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfuse…
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfuse…
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfuse…
so if we made such a change it would break a real user.
-> Add another flag (FOPEN_STREAM) for filesystem servers to indicate
that the opened handle has stream-like semantics: it does not use the
file position, and thus the kernel is free to issue simultaneous read and
write requests on the opened file handle.
This patch together with stream_open (10dce8af3422) should be added to
stable kernels starting from v3.14+ (the kernel where 9c225f2655 first
appeared). This will allow patching OSSPD and other FUSE filesystems that
provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE in
their open handler and this way avoid the deadlock on all kernel versions.
This should work because fuse_finish_open() ignores unknown open flags
returned from a filesystem, so passing FOPEN_STREAM to a kernel that is not
aware of this flag cannot hurt. In turn, a kernel that is not aware of
FOPEN_STREAM will be < v3.14, where just FOPEN_NONSEEKABLE is sufficient to
implement streams without the read vs write deadlock.
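For reference, a minimal sketch of how a userspace FUSE server's FUSE_OPEN
reply could request the new semantics; the helper name is hypothetical,
while struct fuse_open_out and the FOPEN_* bits come from the UAPI header
touched by this patch:

#include <linux/fuse.h>
#include <stdint.h>

/* Hypothetical helper in a FUSE server: fill the FUSE_OPEN reply so the
 * kernel treats the handle as a stream (no f_pos, concurrent read/write). */
static void fill_open_reply(struct fuse_open_out *out, uint64_t fh)
{
	out->fh = fh;
	/* Kernels without FOPEN_STREAM ignore the unknown bit, as explained
	 * above, so this stays backward compatible. */
	out->open_flags = FOPEN_STREAM | FOPEN_NONSEEKABLE;
}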
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Michael Kerrisk <mtk.manpages(a)gmail.com>
Cc: Yongzhi Pan <panyongzhi(a)gmail.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: David Vrabel <david.vrabel(a)citrix.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Kirill Tkhai <ktkhai(a)virtuozzo.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Julia Lawall <Julia.Lawall(a)lip6.fr>
Cc: Nikolaus Rath <Nikolaus(a)rath.org>
Cc: Han-Wen Nienhuys <hanwen(a)google.com>
Cc: stable(a)vger.kernel.org # v3.14+
Signed-off-by: Kirill Smelkov <kirr(a)nexedi.com>
---
( resending the same patch with updated description to reference stream_open
that was landed to master as 10dce8af3422; also added cc stable )
fs/fuse/file.c | 4 +++-
include/uapi/linux/fuse.h | 2 ++
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 06096b60f1df..44de96cb7871 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -178,7 +178,9 @@ void fuse_finish_open(struct inode *inode, struct file *file)
if (!(ff->open_flags & FOPEN_KEEP_CACHE))
invalidate_inode_pages2(inode->i_mapping);
- if (ff->open_flags & FOPEN_NONSEEKABLE)
+ if (ff->open_flags & FOPEN_STREAM)
+ stream_open(inode, file);
+ else if (ff->open_flags & FOPEN_NONSEEKABLE)
nonseekable_open(inode, file);
if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC)) {
struct fuse_inode *fi = get_fuse_inode(inode);
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 2ac598614a8f..26abf0a571c7 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -229,11 +229,13 @@ struct fuse_file_lock {
* FOPEN_KEEP_CACHE: don't invalidate the data cache on open
* FOPEN_NONSEEKABLE: the file is not seekable
* FOPEN_CACHE_DIR: allow caching this directory
+ * FOPEN_STREAM: the file is stream-like
*/
#define FOPEN_DIRECT_IO (1 << 0)
#define FOPEN_KEEP_CACHE (1 << 1)
#define FOPEN_NONSEEKABLE (1 << 2)
#define FOPEN_CACHE_DIR (1 << 3)
+#define FOPEN_STREAM (1 << 4)
/**
* INIT request/reply flags
--
2.21.0.765.geec228f530
Hi Greg and Sasha,
Please apply this commit to 4.4 through 5.0 (patches are threaded in
reply to this one), which will prevent Clang from emitting references
to compiler runtime functions and doing some performance-killing
optimization when using CONFIG_CC_OPTIMIZE_FOR_SIZE.
Please let me know if I did something wrong or if there are any
objections.
Cheers,
Nathan
As the comment notes, the return codes for TSYNC and NEW_LISTENER conflict,
because they both return positive values, one in the case of success and
one in the case of error. So, let's disallow both of these flags together.
While this is technically a userspace break, all the users I know of are
still waiting on me to land this feature in libseccomp, so I think it'll be
safe. Also, at present my use case doesn't require TSYNC at all, so this
isn't a big deal to disallow. If someone wanted to support this, a path
forward would be to add a new flag like
TSYNC_AND_LISTENER_YES_I_UNDERSTAND_THAT_TSYNC_WILL_JUST_RETURN_EAGAIN, but
the use cases are so different I don't see it really happening.
Finally, it's worth noting that this does actually fix a UAF issue: at the end
of seccomp_set_mode_filter(), we have:
if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) {
if (ret < 0) {
listener_f->private_data = NULL;
fput(listener_f);
put_unused_fd(listener);
} else {
fd_install(listener, listener_f);
ret = listener;
}
}
out_free:
seccomp_filter_free(prepared);
But if ret > 0 because TSYNC raced, we'll install the listener fd and then free
the filter out from underneath it, causing a UAF when the task closes it or
dies. This patch also switches the condition to be simply if (ret), so that
if someone does add the flag mentioned above, they won't have to remember
to fix this too.
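For illustration, a small userspace snippet (written for this note, not part
of the patch, and assuming kernel headers that already define
SECCOMP_FILTER_FLAG_NEW_LISTENER) showing the flag combination that is now
rejected with EINVAL:

#include <errno.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

int main(void)
{
	struct sock_filter allow = BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);
	struct sock_fprog prog = { .len = 1, .filter = &allow };
	long ret;

	prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
	ret = syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER,
		      SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_NEW_LISTENER,
		      &prog);
	/* With this patch applied: ret == -1 and errno == EINVAL. */
	printf("ret=%ld errno=%d\n", ret, errno);
	return 0;
}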
Signed-off-by: Tycho Andersen <tycho(a)tycho.ws>
Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
CC: stable(a)vger.kernel.org # v5.0+
---
kernel/seccomp.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index d0d355ded2f4..79bada51091b 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -500,7 +500,10 @@ seccomp_prepare_user_filter(const char __user *user_filter)
*
* Caller must be holding current->sighand->siglock lock.
*
- * Returns 0 on success, -ve on error.
+ * Returns 0 on success, -ve on error, or
+ * - in TSYNC mode: the pid of a thread which was either not in the correct
+ * seccomp mode or did not have an ancestral seccomp filter
+ * - in NEW_LISTENER mode: the fd of the new listener
*/
static long seccomp_attach_filter(unsigned int flags,
struct seccomp_filter *filter)
@@ -1256,6 +1259,16 @@ static long seccomp_set_mode_filter(unsigned int flags,
if (flags & ~SECCOMP_FILTER_FLAG_MASK)
return -EINVAL;
+ /*
+ * In the successful case, NEW_LISTENER returns the new listener fd.
+ * But in the failure case, TSYNC returns the thread that died. If you
+ * combine these two flags, there's no way to tell whether something
+ * succeeded or failed. So, let's disallow this combination.
+ */
+ if ((flags & SECCOMP_FILTER_FLAG_TSYNC) &&
+ (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER))
+ return -EINVAL;
+
/* Prepare the new filter before holding any locks. */
prepared = seccomp_prepare_user_filter(filter);
if (IS_ERR(prepared))
@@ -1302,7 +1315,7 @@ static long seccomp_set_mode_filter(unsigned int flags,
mutex_unlock(¤t->signal->cred_guard_mutex);
out_put_fd:
if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) {
- if (ret < 0) {
+ if (ret) {
listener_f->private_data = NULL;
fput(listener_f);
put_unused_fd(listener);
--
2.19.1
Lars Persson <lists(a)bofh.nu> reported that a label was unused in
the 4.14 version of this patchset, and the issue was present in
the 4.19 patchset as well, so I'm sending a v2 that fixes it.
The original 4.19 patchset queued for stable is OK, and
can be used as is, but this v2 is a bit better: it fixes the
unused label issue and handles overlapping fragments better.
Sorry for the mess/v2.
=======================
Currently, 4.19 and earlier stable kernels contain a security fix
that is not fully IPv6 standard compliant.
This patchset backports IPv6 defrag fixes from 5.1-rc that restore
standards compliance.
Original 5.1 patchset: https://patchwork.ozlabs.org/cover/1029418/
v2 changes: handle overlapping fragments the way it is done upstream
Peter Oskolkov (3):
net: IP defrag: encapsulate rbtree defrag code into callable functions
net: IP6 defrag: use rbtrees for IPv6 defrag
net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c
include/net/inet_frag.h | 16 +-
include/net/ipv6_frag.h | 11 +-
net/ipv4/inet_fragment.c | 293 +++++++++++++++++++++++
net/ipv4/ip_fragment.c | 302 +++---------------------
net/ipv6/netfilter/nf_conntrack_reasm.c | 260 ++++++--------------
net/ipv6/reassembly.c | 240 ++++++-------------
6 files changed, 488 insertions(+), 634 deletions(-)
--
2.21.0.593.g511ec345e18-goog
Hello,
We ran automated tests on a patchset that was proposed for merging into this
kernel tree. The patches were applied to:
Kernel repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Commit: e4abcebedac3 - Linux 5.0.9
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
______________________________________________________________________________
Merge testing
-------------
We cloned this repository and checked out a ref:
Repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Ref: e4abcebedac3 - Linux 5.0.9
We then merged the patchset with `git am`:
bonding-fix-event-handling-for-stacked-bonds.patch
failover-allow-name-change-on-iff_up-slave-interfaces.patch
net-atm-fix-potential-spectre-v1-vulnerabilities.patch
net-bridge-fix-per-port-af_packet-sockets.patch
net-bridge-multicast-use-rcu-to-access-port-list-from-br_multicast_start_querier.patch
net-fec-manage-ahb-clock-in-runtime-pm.patch
net-fix-missing-meta-data-in-skb-with-vlan-packet.patch
net-fou-do-not-use-guehdr-after-iptunnel_pull_offloads-in-gue_udp_recv.patch
tcp-tcp_grow_window-needs-to-respect-tcp_space.patch
team-set-slave-to-promisc-if-team-is-already-in-promisc-mode.patch
tipc-missing-entries-in-name-table-of-publications.patch
vhost-reject-zero-size-iova-range.patch
ipv4-recompile-ip-options-in-ipv4_link_failure.patch
ipv4-ensure-rcu_read_lock-in-ipv4_link_failure.patch
mlxsw-spectrum_switchdev-add-mdb-entries-in-prepare-phase.patch
mlxsw-core-do-not-use-wq_mem_reclaim-for-emad-workqueue.patch
mlxsw-core-do-not-use-wq_mem_reclaim-for-mlxsw-ordered-workqueue.patch
mlxsw-core-do-not-use-wq_mem_reclaim-for-mlxsw-workqueue.patch
mlxsw-spectrum_router-do-not-check-vrf-mac-address.patch
net-thunderx-raise-xdp-mtu-to-1508.patch
net-thunderx-don-t-allow-jumbo-frames-with-xdp.patch
net-tls-fix-the-iv-leaks.patch
net-tls-don-t-leak-partially-sent-record-in-device-mode.patch
net-strparser-partially-revert-strparser-call-skb_unclone-conditionally.patch
net-tls-fix-build-without-config_tls_device.patch
net-bridge-fix-netlink-export-of-vlan_stats_per_port-option.patch
net-mlx5e-xdp-avoid-checksum-complete-when-xdp-prog-is-loaded.patch
net-mlx5e-protect-against-non-uplink-representor-for-encap.patch
net-mlx5e-switch-to-toeplitz-rss-hash-by-default.patch
net-mlx5e-rx-fixup-skb-checksum-for-packets-with-tail-padding.patch
net-mlx5e-rx-check-ip-headers-sanity.patch
revert-net-mlx5e-enable-reporting-checksum-unnecessary-also-for-l3-packets.patch
net-mlx5-fpga-tls-hold-rcu-read-lock-a-bit-longer.patch
net-tls-prevent-bad-memory-access-in-tls_is_sk_tx_device_offloaded.patch
net-mlx5-fpga-tls-idr-remove-on-flow-delete.patch
route-avoid-crash-from-dereferencing-null-rt-from.patch
nfp-flower-replace-cfi-with-vlan-present.patch
nfp-flower-remove-vlan-cfi-bit-from-push-vlan-action.patch
sch_cake-use-tc_skb_protocol-helper-for-getting-packet-protocol.patch
sch_cake-make-sure-we-can-write-the-ip-header-before-changing-dscp-bits.patch
nfc-nci-add-some-bounds-checking-in-nci_hci_cmd_received.patch
nfc-nci-potential-off-by-one-in-pipes-array.patch
sch_cake-simplify-logic-in-cake_select_tin.patch
nfit-ars-remove-ars_start_flags.patch
nfit-ars-introduce-scrub_flags.patch
nfit-ars-allow-root-to-busy-poll-the-ars-state-machi.patch
nfit-ars-avoid-stale-ars-results.patch
tpm-tpm_i2c_atmel-return-e2big-when-the-transfer-is-.patch
tpm-fix-the-type-of-the-return-value-in-calc_tpm2_ev.patch
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
build options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
configuration: https://artifacts.cki-project.org/builds/aarch64/kernel-stable_queue-aarch6…
kernel build: https://artifacts.cki-project.org/builds/aarch64/kernel-stable_queue-aarch6…
ppc64le:
build options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
configuration: https://artifacts.cki-project.org/builds/ppc64le/kernel-stable_queue-ppc64l…
kernel build: https://artifacts.cki-project.org/builds/ppc64le/kernel-stable_queue-ppc64l…
s390x:
build options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
configuration: https://artifacts.cki-project.org/builds/s390x/kernel-stable_queue-s390x-e0…
kernel build: https://artifacts.cki-project.org/builds/s390x/kernel-stable_queue-s390x-e0…
x86_64:
build options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
configuration: https://artifacts.cki-project.org/builds/x86_64/kernel-stable_queue-x86_64-…
kernel build: https://artifacts.cki-project.org/builds/x86_64/kernel-stable_queue-x86_64-…
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
✅ Boot test [0]
✅ LTP lite [1]
✅ AMTU (Abstract Machine Test Utility) [2]
✅ httpd: mod_ssl smoke sanity [3]
✅ iotop: sanity [4]
✅ tuned: tune-processes-through-perf [5]
🚧 ✅ Networking route: pmtu [6]
🚧 ✅ audit: audit testsuite test [7]
🚧 ✅ httpd: php sanity [8]
🚧 ✅ stress: stress-ng [9]
ppc64le:
✅ Boot test [0]
✅ LTP lite [1]
✅ AMTU (Abstract Machine Test Utility) [2]
✅ httpd: mod_ssl smoke sanity [3]
✅ iotop: sanity [4]
✅ tuned: tune-processes-through-perf [5]
🚧 ✅ Networking route: pmtu [6]
🚧 ✅ audit: audit testsuite test [7]
🚧 ✅ httpd: php sanity [8]
🚧 ✅ selinux-policy: serge-testsuite [10]
🚧 ✅ stress: stress-ng [9]
s390x:
✅ Boot test [0]
✅ LTP lite [1]
✅ httpd: mod_ssl smoke sanity [3]
✅ iotop: sanity [4]
✅ tuned: tune-processes-through-perf [5]
🚧 ✅ Networking route: pmtu [6]
🚧 ✅ audit: audit testsuite test [7]
🚧 ✅ httpd: php sanity [8]
🚧 ✅ stress: stress-ng [9]
x86_64:
Test source:
[0]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[1]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[2]: https://github.com/CKI-project/tests-beaker/archive/master.zip#misc/amtu
[3]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/htt…
[4]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/iot…
[5]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/tun…
[6]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/networking/…
[7]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/aud…
[8]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/htt…
[9]: https://github.com/CKI-project/tests-beaker/archive/master.zip#stress/stres…
[10]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/packages/se…
Waived tests (marked with 🚧)
-----------------------------
This test run included waived tests. Such tests are executed but their results
are not taken into account. Tests are waived when their results are not
reliable enough, e.g. when they're just introduced or are being fixed.
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2e8e19226398db8265a8e675fcc0118b9e80c9e8 Mon Sep 17 00:00:00 2001
From: Phil Auld <pauld(a)redhat.com>
Date: Tue, 19 Mar 2019 09:00:05 -0400
Subject: [PATCH] sched/fair: Limit sched_cfs_period_timer() loop to avoid hard
lockup
With extremely short cfs_period_us setting on a parent task group with a large
number of children the for loop in sched_cfs_period_timer() can run until the
watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
will ever return 0. The large number of children can make
do_sched_cfs_period_timer() take longer than the period.
NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
RIP: 0010:tg_nop+0x0/0x10
<IRQ>
walk_tg_tree_from+0x29/0xb0
unthrottle_cfs_rq+0xe0/0x1a0
distribute_cfs_runtime+0xd3/0xf0
sched_cfs_period_timer+0xcb/0x160
? sched_cfs_slack_timer+0xd0/0xd0
__hrtimer_run_queues+0xfb/0x270
hrtimer_interrupt+0x122/0x270
smp_apic_timer_interrupt+0x6a/0x140
apic_timer_interrupt+0xf/0x20
</IRQ>
To prevent this we add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the timer
can complete before the next period expires. This preserves the relative runtime
quota while preventing the hard lockup.
A warning is issued reporting this state and the new values.
Signed-off-by: Phil Auld <pauld(a)redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Cc: Anton Blanchard <anton(a)ozlabs.org>
Cc: Ben Segall <bsegall(a)google.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40bd1e27b1b7..a4d9e14bf138 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
return HRTIMER_NORESTART;
}
+extern const u64 max_cfs_quota_period;
+
static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
{
struct cfs_bandwidth *cfs_b =
@@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
unsigned long flags;
int overrun;
int idle = 0;
+ int count = 0;
raw_spin_lock_irqsave(&cfs_b->lock, flags);
for (;;) {
@@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
if (!overrun)
break;
+ if (++count > 3) {
+ u64 new, old = ktime_to_ns(cfs_b->period);
+
+ new = (old * 147) / 128; /* ~115% */
+ new = min(new, max_cfs_quota_period);
+
+ cfs_b->period = ns_to_ktime(new);
+
+ /* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+ cfs_b->quota *= new;
+ cfs_b->quota = div64_u64(cfs_b->quota, old);
+
+ pr_warn_ratelimited(
+ "cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+ smp_processor_id(),
+ div_u64(new, NSEC_PER_USEC),
+ div_u64(cfs_b->quota, NSEC_PER_USEC));
+
+ /* reset count so we don't come right back in here */
+ count = 0;
+ }
+
idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
}
if (idle)
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2e8e19226398db8265a8e675fcc0118b9e80c9e8 Mon Sep 17 00:00:00 2001
From: Phil Auld <pauld(a)redhat.com>
Date: Tue, 19 Mar 2019 09:00:05 -0400
Subject: [PATCH] sched/fair: Limit sched_cfs_period_timer() loop to avoid hard
lockup
With extremely short cfs_period_us setting on a parent task group with a large
number of children the for loop in sched_cfs_period_timer() can run until the
watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
will ever return 0. The large number of children can make
do_sched_cfs_period_timer() take longer than the period.
NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
RIP: 0010:tg_nop+0x0/0x10
<IRQ>
walk_tg_tree_from+0x29/0xb0
unthrottle_cfs_rq+0xe0/0x1a0
distribute_cfs_runtime+0xd3/0xf0
sched_cfs_period_timer+0xcb/0x160
? sched_cfs_slack_timer+0xd0/0xd0
__hrtimer_run_queues+0xfb/0x270
hrtimer_interrupt+0x122/0x270
smp_apic_timer_interrupt+0x6a/0x140
apic_timer_interrupt+0xf/0x20
</IRQ>
To prevent this we add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the timer
can complete before the next period expires. This preserves the relative runtime
quota while preventing the hard lockup.
A warning is issued reporting this state and the new values.
Signed-off-by: Phil Auld <pauld(a)redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Cc: Anton Blanchard <anton(a)ozlabs.org>
Cc: Ben Segall <bsegall(a)google.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40bd1e27b1b7..a4d9e14bf138 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
return HRTIMER_NORESTART;
}
+extern const u64 max_cfs_quota_period;
+
static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
{
struct cfs_bandwidth *cfs_b =
@@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
unsigned long flags;
int overrun;
int idle = 0;
+ int count = 0;
raw_spin_lock_irqsave(&cfs_b->lock, flags);
for (;;) {
@@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
if (!overrun)
break;
+ if (++count > 3) {
+ u64 new, old = ktime_to_ns(cfs_b->period);
+
+ new = (old * 147) / 128; /* ~115% */
+ new = min(new, max_cfs_quota_period);
+
+ cfs_b->period = ns_to_ktime(new);
+
+ /* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+ cfs_b->quota *= new;
+ cfs_b->quota = div64_u64(cfs_b->quota, old);
+
+ pr_warn_ratelimited(
+ "cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+ smp_processor_id(),
+ div_u64(new, NSEC_PER_USEC),
+ div_u64(cfs_b->quota, NSEC_PER_USEC));
+
+ /* reset count so we don't come right back in here */
+ count = 0;
+ }
+
idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
}
if (idle)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2e8e19226398db8265a8e675fcc0118b9e80c9e8 Mon Sep 17 00:00:00 2001
From: Phil Auld <pauld(a)redhat.com>
Date: Tue, 19 Mar 2019 09:00:05 -0400
Subject: [PATCH] sched/fair: Limit sched_cfs_period_timer() loop to avoid hard
lockup
With extremely short cfs_period_us setting on a parent task group with a large
number of children the for loop in sched_cfs_period_timer() can run until the
watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
will ever return 0. The large number of children can make
do_sched_cfs_period_timer() take longer than the period.
NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
RIP: 0010:tg_nop+0x0/0x10
<IRQ>
walk_tg_tree_from+0x29/0xb0
unthrottle_cfs_rq+0xe0/0x1a0
distribute_cfs_runtime+0xd3/0xf0
sched_cfs_period_timer+0xcb/0x160
? sched_cfs_slack_timer+0xd0/0xd0
__hrtimer_run_queues+0xfb/0x270
hrtimer_interrupt+0x122/0x270
smp_apic_timer_interrupt+0x6a/0x140
apic_timer_interrupt+0xf/0x20
</IRQ>
To prevent this we add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the timer
can complete before the next period expires. This preserves the relative runtime
quota while preventing the hard lockup.
A warning is issued reporting this state and the new values.
Signed-off-by: Phil Auld <pauld(a)redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Cc: Anton Blanchard <anton(a)ozlabs.org>
Cc: Ben Segall <bsegall(a)google.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40bd1e27b1b7..a4d9e14bf138 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
return HRTIMER_NORESTART;
}
+extern const u64 max_cfs_quota_period;
+
static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
{
struct cfs_bandwidth *cfs_b =
@@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
unsigned long flags;
int overrun;
int idle = 0;
+ int count = 0;
raw_spin_lock_irqsave(&cfs_b->lock, flags);
for (;;) {
@@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
if (!overrun)
break;
+ if (++count > 3) {
+ u64 new, old = ktime_to_ns(cfs_b->period);
+
+ new = (old * 147) / 128; /* ~115% */
+ new = min(new, max_cfs_quota_period);
+
+ cfs_b->period = ns_to_ktime(new);
+
+ /* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+ cfs_b->quota *= new;
+ cfs_b->quota = div64_u64(cfs_b->quota, old);
+
+ pr_warn_ratelimited(
+ "cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+ smp_processor_id(),
+ div_u64(new, NSEC_PER_USEC),
+ div_u64(cfs_b->quota, NSEC_PER_USEC));
+
+ /* reset count so we don't come right back in here */
+ count = 0;
+ }
+
idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
}
if (idle)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2e8e19226398db8265a8e675fcc0118b9e80c9e8 Mon Sep 17 00:00:00 2001
From: Phil Auld <pauld(a)redhat.com>
Date: Tue, 19 Mar 2019 09:00:05 -0400
Subject: [PATCH] sched/fair: Limit sched_cfs_period_timer() loop to avoid hard
lockup
With extremely short cfs_period_us setting on a parent task group with a large
number of children the for loop in sched_cfs_period_timer() can run until the
watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
will ever return 0. The large number of children can make
do_sched_cfs_period_timer() take longer than the period.
NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
RIP: 0010:tg_nop+0x0/0x10
<IRQ>
walk_tg_tree_from+0x29/0xb0
unthrottle_cfs_rq+0xe0/0x1a0
distribute_cfs_runtime+0xd3/0xf0
sched_cfs_period_timer+0xcb/0x160
? sched_cfs_slack_timer+0xd0/0xd0
__hrtimer_run_queues+0xfb/0x270
hrtimer_interrupt+0x122/0x270
smp_apic_timer_interrupt+0x6a/0x140
apic_timer_interrupt+0xf/0x20
</IRQ>
To prevent this we add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the timer
can complete before the next period expires. This preserves the relative runtime
quota while preventing the hard lockup.
A warning is issued reporting this state and the new values.
Signed-off-by: Phil Auld <pauld(a)redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Cc: Anton Blanchard <anton(a)ozlabs.org>
Cc: Ben Segall <bsegall(a)google.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40bd1e27b1b7..a4d9e14bf138 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
return HRTIMER_NORESTART;
}
+extern const u64 max_cfs_quota_period;
+
static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
{
struct cfs_bandwidth *cfs_b =
@@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
unsigned long flags;
int overrun;
int idle = 0;
+ int count = 0;
raw_spin_lock_irqsave(&cfs_b->lock, flags);
for (;;) {
@@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
if (!overrun)
break;
+ if (++count > 3) {
+ u64 new, old = ktime_to_ns(cfs_b->period);
+
+ new = (old * 147) / 128; /* ~115% */
+ new = min(new, max_cfs_quota_period);
+
+ cfs_b->period = ns_to_ktime(new);
+
+ /* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+ cfs_b->quota *= new;
+ cfs_b->quota = div64_u64(cfs_b->quota, old);
+
+ pr_warn_ratelimited(
+ "cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+ smp_processor_id(),
+ div_u64(new, NSEC_PER_USEC),
+ div_u64(cfs_b->quota, NSEC_PER_USEC));
+
+ /* reset count so we don't come right back in here */
+ count = 0;
+ }
+
idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
}
if (idle)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2f5fb19341883bb6e37da351bc3700489d8506a7 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Sun, 14 Apr 2019 19:51:06 +0200
Subject: [PATCH] x86/speculation: Prevent deadlock on ssb_state::lock
Mikhail reported a lockdep splat related to the AMD specific ssb_state
lock:
CPU0 CPU1
lock(&st->lock);
local_irq_disable();
lock(&(&sighand->siglock)->rlock);
lock(&st->lock);
<Interrupt>
lock(&(&sighand->siglock)->rlock);
*** DEADLOCK ***
The connection between sighand->siglock and st->lock comes through seccomp,
which takes st->lock while holding sighand->siglock.
Make sure interrupts are disabled when __speculation_ctrl_update() is
invoked via prctl() -> speculation_ctrl_update(). Add a lockdep assert to
catch future offenders.
Fixes: 1f50ddb4f418 ("x86/speculation: Handle HT correctly on AMD")
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com>
Cc: Thomas Lendacky <thomas.lendacky(a)amd.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1904141948200.4917@nanos.tec.linu…
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 58ac7be52c7a..957eae13b370 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -426,6 +426,8 @@ static __always_inline void __speculation_ctrl_update(unsigned long tifp,
u64 msr = x86_spec_ctrl_base;
bool updmsr = false;
+ lockdep_assert_irqs_disabled();
+
/*
* If TIF_SSBD is different, select the proper mitigation
* method. Note that if SSBD mitigation is disabled or permanentely
@@ -477,10 +479,12 @@ static unsigned long speculation_ctrl_update_tif(struct task_struct *tsk)
void speculation_ctrl_update(unsigned long tif)
{
+ unsigned long flags;
+
/* Forced update. Make sure all relevant TIF flags are different */
- preempt_disable();
+ local_irq_save(flags);
__speculation_ctrl_update(~tif, tif);
- preempt_enable();
+ local_irq_restore(flags);
}
/* Called from seccomp/prctl update */
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2f5fb19341883bb6e37da351bc3700489d8506a7 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Sun, 14 Apr 2019 19:51:06 +0200
Subject: [PATCH] x86/speculation: Prevent deadlock on ssb_state::lock
Mikhail reported a lockdep splat related to the AMD specific ssb_state
lock:
CPU0 CPU1
lock(&st->lock);
local_irq_disable();
lock(&(&sighand->siglock)->rlock);
lock(&st->lock);
<Interrupt>
lock(&(&sighand->siglock)->rlock);
*** DEADLOCK ***
The connection between sighand->siglock and st->lock comes through seccomp,
which takes st->lock while holding sighand->siglock.
Make sure interrupts are disabled when __speculation_ctrl_update() is
invoked via prctl() -> speculation_ctrl_update(). Add a lockdep assert to
catch future offenders.
Fixes: 1f50ddb4f418 ("x86/speculation: Handle HT correctly on AMD")
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com>
Cc: Thomas Lendacky <thomas.lendacky(a)amd.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1904141948200.4917@nanos.tec.linu…
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 58ac7be52c7a..957eae13b370 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -426,6 +426,8 @@ static __always_inline void __speculation_ctrl_update(unsigned long tifp,
u64 msr = x86_spec_ctrl_base;
bool updmsr = false;
+ lockdep_assert_irqs_disabled();
+
/*
* If TIF_SSBD is different, select the proper mitigation
* method. Note that if SSBD mitigation is disabled or permanentely
@@ -477,10 +479,12 @@ static unsigned long speculation_ctrl_update_tif(struct task_struct *tsk)
void speculation_ctrl_update(unsigned long tif)
{
+ unsigned long flags;
+
/* Forced update. Make sure all relevant TIF flags are different */
- preempt_disable();
+ local_irq_save(flags);
__speculation_ctrl_update(~tif, tif);
- preempt_enable();
+ local_irq_restore(flags);
}
/* Called from seccomp/prctl update */
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b191fa96ea6dc00d331dcc28c1f7db5e075693a0 Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat(a)kernel.org>
Date: Sun, 24 Feb 2019 01:50:49 +0900
Subject: [PATCH] x86/kprobes: Avoid kretprobe recursion bug
Avoid a kretprobe recursion loop bug by setting a dummy kprobe in the
current_kprobe per-CPU variable.
This bug was introduced with the asm-coded trampoline code: previously,
another kprobe hooked the function return placeholder (which only
contains a nop), and the trampoline handler was called from that
kprobe. This change revives that old, lost kprobe.
With this fix, we no longer see the deadlock, and all inner-called
kretprobes are skipped:
  event_1      235       0
  event_2    19375   19612
The 1st column is the recorded count and the 2nd is the missed count.
The above shows (event_1 recorded) + (event_2 recorded) ~= (event_2 missed);
the small difference exists because the counters are racy.
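To make the mechanism concrete, a small userspace sketch of the guard this
patch adds follows; the structure and function names are assumptions of this
sketch, not the kprobes API. The return handler publishes a dummy "current
probe" before doing any work, so a nested probe hit is detected as recursion
and skipped instead of looping:

#include <stdio.h>
#include <stddef.h>

struct probe { const char *name; };

static struct probe dummy_probe = { "kretprobe_trampoline" };
static struct probe *current_probe;      /* per-CPU in the kernel */

/* Generic probe entry point: refuses to nest, like the kprobes core. */
static int probe_hit(struct probe *p)
{
        if (current_probe) {
                printf("skip nested probe '%s' (already inside '%s')\n",
                       p->name, current_probe->name);
                return 1;                /* shows up as a "missed" count */
        }
        current_probe = p;
        printf("run handler for probe '%s'\n", p->name);
        current_probe = NULL;
        return 0;
}

/* A helper the return handler happens to call, which is itself probed. */
static struct probe inner_probe = { "inner_function" };

static void instrumented_helper(void)
{
        probe_hit(&inner_probe);
}

/* Analogue of trampoline_handler(): claim the CPU with the dummy probe
 * first, drop the claim when done. */
static void return_trampoline_handler(void)
{
        current_probe = &dummy_probe;
        instrumented_helper();           /* nested hit is skipped, no loop */
        current_probe = NULL;
}

int main(void)
{
        return_trampoline_handler();
        return 0;
}

Without the dummy assignment, the nested hit would run its own handler from
inside the return path, which is how the recursion loop started; the hits the
guard rejects are what the "missed" column above counts.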
Reported-by: Andrea Righi <righi.andrea(a)gmail.com>
Tested-by: Andrea Righi <righi.andrea(a)gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Acked-by: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Fixes: c9becf58d935 ("[PATCH] kretprobe: kretprobe-booster")
Link: http://lkml.kernel.org/r/155094064889.6137.972160690963039.stgit@devbox
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 18fbe9be2d68..fed46ddb1eef 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -749,11 +749,16 @@ asm(
NOKPROBE_SYMBOL(kretprobe_trampoline);
STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
+static struct kprobe kretprobe_kprobe = {
+ .addr = (void *)kretprobe_trampoline,
+};
+
/*
* Called from kretprobe_trampoline
*/
static __used void *trampoline_handler(struct pt_regs *regs)
{
+ struct kprobe_ctlblk *kcb;
struct kretprobe_instance *ri = NULL;
struct hlist_head *head, empty_rp;
struct hlist_node *tmp;
@@ -763,6 +768,17 @@ static __used void *trampoline_handler(struct pt_regs *regs)
void *frame_pointer;
bool skipped = false;
+ preempt_disable();
+
+ /*
+ * Set a dummy kprobe for avoiding kretprobe recursion.
+ * Since kretprobe never run in kprobe handler, kprobe must not
+ * be running at this point.
+ */
+ kcb = get_kprobe_ctlblk();
+ __this_cpu_write(current_kprobe, &kretprobe_kprobe);
+ kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+
INIT_HLIST_HEAD(&empty_rp);
kretprobe_hash_lock(current, &head, &flags);
/* fixup registers */
@@ -838,10 +854,9 @@ static __used void *trampoline_handler(struct pt_regs *regs)
orig_ret_address = (unsigned long)ri->ret_addr;
if (ri->rp && ri->rp->handler) {
__this_cpu_write(current_kprobe, &ri->rp->kp);
- get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
ri->ret_addr = correct_ret_addr;
ri->rp->handler(ri, regs);
- __this_cpu_write(current_kprobe, NULL);
+ __this_cpu_write(current_kprobe, &kretprobe_kprobe);
}
recycle_rp_inst(ri, &empty_rp);
@@ -857,6 +872,9 @@ static __used void *trampoline_handler(struct pt_regs *regs)
kretprobe_hash_unlock(current, &flags);
+ __this_cpu_write(current_kprobe, NULL);
+ preempt_enable();
+
hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
hlist_del(&ri->hlist);
kfree(ri);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b191fa96ea6dc00d331dcc28c1f7db5e075693a0 Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat(a)kernel.org>
Date: Sun, 24 Feb 2019 01:50:49 +0900
Subject: [PATCH] x86/kprobes: Avoid kretprobe recursion bug
Avoid a kretprobe recursion loop bug by setting a dummy kprobe in the
current_kprobe per-CPU variable.
This bug was introduced with the asm-coded trampoline code: previously,
another kprobe hooked the function return placeholder (which only
contains a nop), and the trampoline handler was called from that
kprobe. This change revives that old, lost kprobe.
With this fix, we no longer see the deadlock, and all inner-called
kretprobes are skipped:
  event_1      235       0
  event_2    19375   19612
The 1st column is the recorded count and the 2nd is the missed count.
The above shows (event_1 recorded) + (event_2 recorded) ~= (event_2 missed);
the small difference exists because the counters are racy.
Reported-by: Andrea Righi <righi.andrea(a)gmail.com>
Tested-by: Andrea Righi <righi.andrea(a)gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Acked-by: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Fixes: c9becf58d935 ("[PATCH] kretprobe: kretprobe-booster")
Link: http://lkml.kernel.org/r/155094064889.6137.972160690963039.stgit@devbox
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 18fbe9be2d68..fed46ddb1eef 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -749,11 +749,16 @@ asm(
NOKPROBE_SYMBOL(kretprobe_trampoline);
STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
+static struct kprobe kretprobe_kprobe = {
+ .addr = (void *)kretprobe_trampoline,
+};
+
/*
* Called from kretprobe_trampoline
*/
static __used void *trampoline_handler(struct pt_regs *regs)
{
+ struct kprobe_ctlblk *kcb;
struct kretprobe_instance *ri = NULL;
struct hlist_head *head, empty_rp;
struct hlist_node *tmp;
@@ -763,6 +768,17 @@ static __used void *trampoline_handler(struct pt_regs *regs)
void *frame_pointer;
bool skipped = false;
+ preempt_disable();
+
+ /*
+ * Set a dummy kprobe for avoiding kretprobe recursion.
+ * Since kretprobe never run in kprobe handler, kprobe must not
+ * be running at this point.
+ */
+ kcb = get_kprobe_ctlblk();
+ __this_cpu_write(current_kprobe, &kretprobe_kprobe);
+ kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+
INIT_HLIST_HEAD(&empty_rp);
kretprobe_hash_lock(current, &head, &flags);
/* fixup registers */
@@ -838,10 +854,9 @@ static __used void *trampoline_handler(struct pt_regs *regs)
orig_ret_address = (unsigned long)ri->ret_addr;
if (ri->rp && ri->rp->handler) {
__this_cpu_write(current_kprobe, &ri->rp->kp);
- get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
ri->ret_addr = correct_ret_addr;
ri->rp->handler(ri, regs);
- __this_cpu_write(current_kprobe, NULL);
+ __this_cpu_write(current_kprobe, &kretprobe_kprobe);
}
recycle_rp_inst(ri, &empty_rp);
@@ -857,6 +872,9 @@ static __used void *trampoline_handler(struct pt_regs *regs)
kretprobe_hash_unlock(current, &flags);
+ __this_cpu_write(current_kprobe, NULL);
+ preempt_enable();
+
hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
hlist_del(&ri->hlist);
kfree(ri);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b191fa96ea6dc00d331dcc28c1f7db5e075693a0 Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat(a)kernel.org>
Date: Sun, 24 Feb 2019 01:50:49 +0900
Subject: [PATCH] x86/kprobes: Avoid kretprobe recursion bug
Avoid a kretprobe recursion loop bug by setting a dummy kprobe in the
current_kprobe per-CPU variable.
This bug was introduced with the asm-coded trampoline code: previously,
another kprobe hooked the function return placeholder (which only
contains a nop), and the trampoline handler was called from that
kprobe. This change revives that old, lost kprobe.
With this fix, we no longer see the deadlock, and all inner-called
kretprobes are skipped:
  event_1      235       0
  event_2    19375   19612
The 1st column is the recorded count and the 2nd is the missed count.
The above shows (event_1 recorded) + (event_2 recorded) ~= (event_2 missed);
the small difference exists because the counters are racy.
Reported-by: Andrea Righi <righi.andrea(a)gmail.com>
Tested-by: Andrea Righi <righi.andrea(a)gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Acked-by: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Fixes: c9becf58d935 ("[PATCH] kretprobe: kretprobe-booster")
Link: http://lkml.kernel.org/r/155094064889.6137.972160690963039.stgit@devbox
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 18fbe9be2d68..fed46ddb1eef 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -749,11 +749,16 @@ asm(
NOKPROBE_SYMBOL(kretprobe_trampoline);
STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
+static struct kprobe kretprobe_kprobe = {
+ .addr = (void *)kretprobe_trampoline,
+};
+
/*
* Called from kretprobe_trampoline
*/
static __used void *trampoline_handler(struct pt_regs *regs)
{
+ struct kprobe_ctlblk *kcb;
struct kretprobe_instance *ri = NULL;
struct hlist_head *head, empty_rp;
struct hlist_node *tmp;
@@ -763,6 +768,17 @@ static __used void *trampoline_handler(struct pt_regs *regs)
void *frame_pointer;
bool skipped = false;
+ preempt_disable();
+
+ /*
+ * Set a dummy kprobe for avoiding kretprobe recursion.
+ * Since kretprobe never run in kprobe handler, kprobe must not
+ * be running at this point.
+ */
+ kcb = get_kprobe_ctlblk();
+ __this_cpu_write(current_kprobe, &kretprobe_kprobe);
+ kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+
INIT_HLIST_HEAD(&empty_rp);
kretprobe_hash_lock(current, &head, &flags);
/* fixup registers */
@@ -838,10 +854,9 @@ static __used void *trampoline_handler(struct pt_regs *regs)
orig_ret_address = (unsigned long)ri->ret_addr;
if (ri->rp && ri->rp->handler) {
__this_cpu_write(current_kprobe, &ri->rp->kp);
- get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
ri->ret_addr = correct_ret_addr;
ri->rp->handler(ri, regs);
- __this_cpu_write(current_kprobe, NULL);
+ __this_cpu_write(current_kprobe, &kretprobe_kprobe);
}
recycle_rp_inst(ri, &empty_rp);
@@ -857,6 +872,9 @@ static __used void *trampoline_handler(struct pt_regs *regs)
kretprobe_hash_unlock(current, &flags);
+ __this_cpu_write(current_kprobe, NULL);
+ preempt_enable();
+
hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
hlist_del(&ri->hlist);
kfree(ri);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b191fa96ea6dc00d331dcc28c1f7db5e075693a0 Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat(a)kernel.org>
Date: Sun, 24 Feb 2019 01:50:49 +0900
Subject: [PATCH] x86/kprobes: Avoid kretprobe recursion bug
Avoid a kretprobe recursion loop bug by setting a dummy kprobe in the
current_kprobe per-CPU variable.
This bug was introduced with the asm-coded trampoline code: previously,
another kprobe hooked the function return placeholder (which only
contains a nop), and the trampoline handler was called from that
kprobe. This change revives that old, lost kprobe.
With this fix, we no longer see the deadlock, and all inner-called
kretprobes are skipped:
  event_1      235       0
  event_2    19375   19612
The 1st column is the recorded count and the 2nd is the missed count.
The above shows (event_1 recorded) + (event_2 recorded) ~= (event_2 missed);
the small difference exists because the counters are racy.
Reported-by: Andrea Righi <righi.andrea(a)gmail.com>
Tested-by: Andrea Righi <righi.andrea(a)gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Acked-by: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Fixes: c9becf58d935 ("[PATCH] kretprobe: kretprobe-booster")
Link: http://lkml.kernel.org/r/155094064889.6137.972160690963039.stgit@devbox
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 18fbe9be2d68..fed46ddb1eef 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -749,11 +749,16 @@ asm(
NOKPROBE_SYMBOL(kretprobe_trampoline);
STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
+static struct kprobe kretprobe_kprobe = {
+ .addr = (void *)kretprobe_trampoline,
+};
+
/*
* Called from kretprobe_trampoline
*/
static __used void *trampoline_handler(struct pt_regs *regs)
{
+ struct kprobe_ctlblk *kcb;
struct kretprobe_instance *ri = NULL;
struct hlist_head *head, empty_rp;
struct hlist_node *tmp;
@@ -763,6 +768,17 @@ static __used void *trampoline_handler(struct pt_regs *regs)
void *frame_pointer;
bool skipped = false;
+ preempt_disable();
+
+ /*
+ * Set a dummy kprobe for avoiding kretprobe recursion.
+ * Since kretprobe never run in kprobe handler, kprobe must not
+ * be running at this point.
+ */
+ kcb = get_kprobe_ctlblk();
+ __this_cpu_write(current_kprobe, &kretprobe_kprobe);
+ kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+
INIT_HLIST_HEAD(&empty_rp);
kretprobe_hash_lock(current, &head, &flags);
/* fixup registers */
@@ -838,10 +854,9 @@ static __used void *trampoline_handler(struct pt_regs *regs)
orig_ret_address = (unsigned long)ri->ret_addr;
if (ri->rp && ri->rp->handler) {
__this_cpu_write(current_kprobe, &ri->rp->kp);
- get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
ri->ret_addr = correct_ret_addr;
ri->rp->handler(ri, regs);
- __this_cpu_write(current_kprobe, NULL);
+ __this_cpu_write(current_kprobe, &kretprobe_kprobe);
}
recycle_rp_inst(ri, &empty_rp);
@@ -857,6 +872,9 @@ static __used void *trampoline_handler(struct pt_regs *regs)
kretprobe_hash_unlock(current, &flags);
+ __this_cpu_write(current_kprobe, NULL);
+ preempt_enable();
+
hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
hlist_del(&ri->hlist);
kfree(ri);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc8a3d8925a8fa09fa550e0da115d95851ce33c6 Mon Sep 17 00:00:00 2001
From: Ben Gardon <bgardon(a)google.com>
Date: Mon, 8 Apr 2019 11:07:30 -0700
Subject: [PATCH] kvm: mmu: Fix overflow on kvm mmu page limit calculation
KVM bases its memory usage limits on the total number of guest pages
across all memslots. However, those limits, and the calculations to
produce them, use 32-bit unsigned integers. This can result in overflow
if a VM has more guest pages than can be represented by a u32. As a
result of this overflow, KVM can use a low limit on the number of MMU
pages it will allocate. This makes KVM unable to map all of guest memory
at once, prompting spurious faults.
Tested: Ran all kvm-unit-tests on an Intel Haswell machine. This patch
introduced no new failures.
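As a standalone illustration of the arithmetic (the 32 TiB guest size is an
assumption of this example; the constants mirror KVM_PERMILLE_MMU_PAGES and
KVM_MIN_ALLOC_MMU_PAGES from the diff below):

#include <stdio.h>
#include <stdint.h>

#define KVM_PERMILLE_MMU_PAGES  20
#define KVM_MIN_ALLOC_MMU_PAGES 64

int main(void)
{
        /* A 32 TiB guest has 32 TiB / 4 KiB = 2^33 guest pages, which
         * no longer fits in a u32. */
        uint64_t nr_pages = (32ULL << 40) / 4096;

        /* Old calculation: 32-bit fields truncate nr_pages to 0, so the
         * limit collapses to the 64-page floor. */
        uint32_t nr_pages32 = (uint32_t)nr_pages;
        uint32_t old_limit = nr_pages32 * KVM_PERMILLE_MMU_PAGES / 1000;
        if (old_limit < KVM_MIN_ALLOC_MMU_PAGES)
                old_limit = KVM_MIN_ALLOC_MMU_PAGES;

        /* New calculation: unsigned long (64-bit here), no truncation. */
        uint64_t new_limit = nr_pages * KVM_PERMILLE_MMU_PAGES / 1000;
        if (new_limit < KVM_MIN_ALLOC_MMU_PAGES)
                new_limit = KVM_MIN_ALLOC_MMU_PAGES;

        printf("guest pages  : %llu\n", (unsigned long long)nr_pages);
        printf("old u32 limit: %u MMU pages\n", (unsigned)old_limit);
        printf("new u64 limit: %llu MMU pages\n", (unsigned long long)new_limit);
        return 0;
}

With 32-bit intermediates the guest above ends up with only 64 MMU pages,
while the widened calculation yields about 172 million, which is why the
patch changes the struct fields, the helper signatures, and the
KVM_MIN_ALLOC_MMU_PAGES literal together.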
Signed-off-by: Ben Gardon <bgardon(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 159b5988292f..9b7b731a0032 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -126,7 +126,7 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
}
#define KVM_PERMILLE_MMU_PAGES 20
-#define KVM_MIN_ALLOC_MMU_PAGES 64
+#define KVM_MIN_ALLOC_MMU_PAGES 64UL
#define KVM_MMU_HASH_SHIFT 12
#define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT)
#define KVM_MIN_FREE_MMU_PAGES 5
@@ -844,9 +844,9 @@ enum kvm_irqchip_mode {
};
struct kvm_arch {
- unsigned int n_used_mmu_pages;
- unsigned int n_requested_mmu_pages;
- unsigned int n_max_mmu_pages;
+ unsigned long n_used_mmu_pages;
+ unsigned long n_requested_mmu_pages;
+ unsigned long n_max_mmu_pages;
unsigned int indirect_shadow_pages;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
/*
@@ -1256,8 +1256,8 @@ void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
gfn_t gfn_offset, unsigned long mask);
void kvm_mmu_zap_all(struct kvm *kvm);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
-unsigned int kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
-void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
+unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
+void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
bool pdptrs_changed(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 85f753728953..e10962dfc203 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2007,7 +2007,7 @@ static int is_empty_shadow_page(u64 *spt)
* aggregate version in order to make the slab shrinker
* faster
*/
-static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
+static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, unsigned long nr)
{
kvm->arch.n_used_mmu_pages += nr;
percpu_counter_add(&kvm_total_used_mmu_pages, nr);
@@ -2763,7 +2763,7 @@ static bool prepare_zap_oldest_mmu_page(struct kvm *kvm,
* Changing the number of mmu pages allocated to the vm
* Note: if goal_nr_mmu_pages is too small, you will get dead lock
*/
-void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages)
+void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long goal_nr_mmu_pages)
{
LIST_HEAD(invalid_list);
@@ -6031,10 +6031,10 @@ int kvm_mmu_module_init(void)
/*
* Calculate mmu pages needed for kvm.
*/
-unsigned int kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
+unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
{
- unsigned int nr_mmu_pages;
- unsigned int nr_pages = 0;
+ unsigned long nr_mmu_pages;
+ unsigned long nr_pages = 0;
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
int i;
@@ -6047,8 +6047,7 @@ unsigned int kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
}
nr_mmu_pages = nr_pages * KVM_PERMILLE_MMU_PAGES / 1000;
- nr_mmu_pages = max(nr_mmu_pages,
- (unsigned int) KVM_MIN_ALLOC_MMU_PAGES);
+ nr_mmu_pages = max(nr_mmu_pages, KVM_MIN_ALLOC_MMU_PAGES);
return nr_mmu_pages;
}
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index bbdc60f2fae8..54c2a377795b 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -64,7 +64,7 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
u64 fault_address, char *insn, int insn_len);
-static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
+static inline unsigned long kvm_mmu_available_pages(struct kvm *kvm)
{
if (kvm->arch.n_max_mmu_pages > kvm->arch.n_used_mmu_pages)
return kvm->arch.n_max_mmu_pages -
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 099b851dabaf..455f156f56ed 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4270,7 +4270,7 @@ static int kvm_vm_ioctl_set_identity_map_addr(struct kvm *kvm,
}
static int kvm_vm_ioctl_set_nr_mmu_pages(struct kvm *kvm,
- u32 kvm_nr_mmu_pages)
+ unsigned long kvm_nr_mmu_pages)
{
if (kvm_nr_mmu_pages < KVM_MIN_ALLOC_MMU_PAGES)
return -EINVAL;
@@ -4284,7 +4284,7 @@ static int kvm_vm_ioctl_set_nr_mmu_pages(struct kvm *kvm,
return 0;
}
-static int kvm_vm_ioctl_get_nr_mmu_pages(struct kvm *kvm)
+static unsigned long kvm_vm_ioctl_get_nr_mmu_pages(struct kvm *kvm)
{
return kvm->arch.n_max_mmu_pages;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc8a3d8925a8fa09fa550e0da115d95851ce33c6 Mon Sep 17 00:00:00 2001
From: Ben Gardon <bgardon(a)google.com>
Date: Mon, 8 Apr 2019 11:07:30 -0700
Subject: [PATCH] kvm: mmu: Fix overflow on kvm mmu page limit calculation
KVM bases its memory usage limits on the total number of guest pages
across all memslots. However, those limits, and the calculations to
produce them, use 32-bit unsigned integers. This can result in overflow
if a VM has more guest pages than can be represented by a u32. As a
result of this overflow, KVM can use a low limit on the number of MMU
pages it will allocate. This makes KVM unable to map all of guest memory
at once, prompting spurious faults.
Tested: Ran all kvm-unit-tests on an Intel Haswell machine. This patch
introduced no new failures.
Signed-off-by: Ben Gardon <bgardon(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 159b5988292f..9b7b731a0032 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -126,7 +126,7 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
}
#define KVM_PERMILLE_MMU_PAGES 20
-#define KVM_MIN_ALLOC_MMU_PAGES 64
+#define KVM_MIN_ALLOC_MMU_PAGES 64UL
#define KVM_MMU_HASH_SHIFT 12
#define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT)
#define KVM_MIN_FREE_MMU_PAGES 5
@@ -844,9 +844,9 @@ enum kvm_irqchip_mode {
};
struct kvm_arch {
- unsigned int n_used_mmu_pages;
- unsigned int n_requested_mmu_pages;
- unsigned int n_max_mmu_pages;
+ unsigned long n_used_mmu_pages;
+ unsigned long n_requested_mmu_pages;
+ unsigned long n_max_mmu_pages;
unsigned int indirect_shadow_pages;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
/*
@@ -1256,8 +1256,8 @@ void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
gfn_t gfn_offset, unsigned long mask);
void kvm_mmu_zap_all(struct kvm *kvm);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
-unsigned int kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
-void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
+unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
+void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
bool pdptrs_changed(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 85f753728953..e10962dfc203 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2007,7 +2007,7 @@ static int is_empty_shadow_page(u64 *spt)
* aggregate version in order to make the slab shrinker
* faster
*/
-static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
+static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, unsigned long nr)
{
kvm->arch.n_used_mmu_pages += nr;
percpu_counter_add(&kvm_total_used_mmu_pages, nr);
@@ -2763,7 +2763,7 @@ static bool prepare_zap_oldest_mmu_page(struct kvm *kvm,
* Changing the number of mmu pages allocated to the vm
* Note: if goal_nr_mmu_pages is too small, you will get dead lock
*/
-void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages)
+void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long goal_nr_mmu_pages)
{
LIST_HEAD(invalid_list);
@@ -6031,10 +6031,10 @@ int kvm_mmu_module_init(void)
/*
* Calculate mmu pages needed for kvm.
*/
-unsigned int kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
+unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
{
- unsigned int nr_mmu_pages;
- unsigned int nr_pages = 0;
+ unsigned long nr_mmu_pages;
+ unsigned long nr_pages = 0;
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
int i;
@@ -6047,8 +6047,7 @@ unsigned int kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
}
nr_mmu_pages = nr_pages * KVM_PERMILLE_MMU_PAGES / 1000;
- nr_mmu_pages = max(nr_mmu_pages,
- (unsigned int) KVM_MIN_ALLOC_MMU_PAGES);
+ nr_mmu_pages = max(nr_mmu_pages, KVM_MIN_ALLOC_MMU_PAGES);
return nr_mmu_pages;
}
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index bbdc60f2fae8..54c2a377795b 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -64,7 +64,7 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
u64 fault_address, char *insn, int insn_len);
-static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
+static inline unsigned long kvm_mmu_available_pages(struct kvm *kvm)
{
if (kvm->arch.n_max_mmu_pages > kvm->arch.n_used_mmu_pages)
return kvm->arch.n_max_mmu_pages -
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 099b851dabaf..455f156f56ed 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4270,7 +4270,7 @@ static int kvm_vm_ioctl_set_identity_map_addr(struct kvm *kvm,
}
static int kvm_vm_ioctl_set_nr_mmu_pages(struct kvm *kvm,
- u32 kvm_nr_mmu_pages)
+ unsigned long kvm_nr_mmu_pages)
{
if (kvm_nr_mmu_pages < KVM_MIN_ALLOC_MMU_PAGES)
return -EINVAL;
@@ -4284,7 +4284,7 @@ static int kvm_vm_ioctl_set_nr_mmu_pages(struct kvm *kvm,
return 0;
}
-static int kvm_vm_ioctl_get_nr_mmu_pages(struct kvm *kvm)
+static unsigned long kvm_vm_ioctl_get_nr_mmu_pages(struct kvm *kvm)
{
return kvm->arch.n_max_mmu_pages;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc8a3d8925a8fa09fa550e0da115d95851ce33c6 Mon Sep 17 00:00:00 2001
From: Ben Gardon <bgardon(a)google.com>
Date: Mon, 8 Apr 2019 11:07:30 -0700
Subject: [PATCH] kvm: mmu: Fix overflow on kvm mmu page limit calculation
KVM bases its memory usage limits on the total number of guest pages
across all memslots. However, those limits, and the calculations to
produce them, use 32-bit unsigned integers. This can result in overflow
if a VM has more guest pages than can be represented by a u32. As a
result of this overflow, KVM can use a low limit on the number of MMU
pages it will allocate. This makes KVM unable to map all of guest memory
at once, prompting spurious faults.
Tested: Ran all kvm-unit-tests on an Intel Haswell machine. This patch
introduced no new failures.
Signed-off-by: Ben Gardon <bgardon(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 159b5988292f..9b7b731a0032 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -126,7 +126,7 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
}
#define KVM_PERMILLE_MMU_PAGES 20
-#define KVM_MIN_ALLOC_MMU_PAGES 64
+#define KVM_MIN_ALLOC_MMU_PAGES 64UL
#define KVM_MMU_HASH_SHIFT 12
#define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT)
#define KVM_MIN_FREE_MMU_PAGES 5
@@ -844,9 +844,9 @@ enum kvm_irqchip_mode {
};
struct kvm_arch {
- unsigned int n_used_mmu_pages;
- unsigned int n_requested_mmu_pages;
- unsigned int n_max_mmu_pages;
+ unsigned long n_used_mmu_pages;
+ unsigned long n_requested_mmu_pages;
+ unsigned long n_max_mmu_pages;
unsigned int indirect_shadow_pages;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
/*
@@ -1256,8 +1256,8 @@ void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
gfn_t gfn_offset, unsigned long mask);
void kvm_mmu_zap_all(struct kvm *kvm);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
-unsigned int kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
-void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
+unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
+void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
bool pdptrs_changed(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 85f753728953..e10962dfc203 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2007,7 +2007,7 @@ static int is_empty_shadow_page(u64 *spt)
* aggregate version in order to make the slab shrinker
* faster
*/
-static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
+static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, unsigned long nr)
{
kvm->arch.n_used_mmu_pages += nr;
percpu_counter_add(&kvm_total_used_mmu_pages, nr);
@@ -2763,7 +2763,7 @@ static bool prepare_zap_oldest_mmu_page(struct kvm *kvm,
* Changing the number of mmu pages allocated to the vm
* Note: if goal_nr_mmu_pages is too small, you will get dead lock
*/
-void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages)
+void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long goal_nr_mmu_pages)
{
LIST_HEAD(invalid_list);
@@ -6031,10 +6031,10 @@ int kvm_mmu_module_init(void)
/*
* Calculate mmu pages needed for kvm.
*/
-unsigned int kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
+unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
{
- unsigned int nr_mmu_pages;
- unsigned int nr_pages = 0;
+ unsigned long nr_mmu_pages;
+ unsigned long nr_pages = 0;
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
int i;
@@ -6047,8 +6047,7 @@ unsigned int kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
}
nr_mmu_pages = nr_pages * KVM_PERMILLE_MMU_PAGES / 1000;
- nr_mmu_pages = max(nr_mmu_pages,
- (unsigned int) KVM_MIN_ALLOC_MMU_PAGES);
+ nr_mmu_pages = max(nr_mmu_pages, KVM_MIN_ALLOC_MMU_PAGES);
return nr_mmu_pages;
}
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index bbdc60f2fae8..54c2a377795b 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -64,7 +64,7 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
u64 fault_address, char *insn, int insn_len);
-static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
+static inline unsigned long kvm_mmu_available_pages(struct kvm *kvm)
{
if (kvm->arch.n_max_mmu_pages > kvm->arch.n_used_mmu_pages)
return kvm->arch.n_max_mmu_pages -
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 099b851dabaf..455f156f56ed 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4270,7 +4270,7 @@ static int kvm_vm_ioctl_set_identity_map_addr(struct kvm *kvm,
}
static int kvm_vm_ioctl_set_nr_mmu_pages(struct kvm *kvm,
- u32 kvm_nr_mmu_pages)
+ unsigned long kvm_nr_mmu_pages)
{
if (kvm_nr_mmu_pages < KVM_MIN_ALLOC_MMU_PAGES)
return -EINVAL;
@@ -4284,7 +4284,7 @@ static int kvm_vm_ioctl_set_nr_mmu_pages(struct kvm *kvm,
return 0;
}
-static int kvm_vm_ioctl_get_nr_mmu_pages(struct kvm *kvm)
+static unsigned long kvm_vm_ioctl_get_nr_mmu_pages(struct kvm *kvm)
{
return kvm->arch.n_max_mmu_pages;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc8a3d8925a8fa09fa550e0da115d95851ce33c6 Mon Sep 17 00:00:00 2001
From: Ben Gardon <bgardon(a)google.com>
Date: Mon, 8 Apr 2019 11:07:30 -0700
Subject: [PATCH] kvm: mmu: Fix overflow on kvm mmu page limit calculation
KVM bases its memory usage limits on the total number of guest pages
across all memslots. However, those limits, and the calculations to
produce them, use 32-bit unsigned integers. This can result in overflow
if a VM has more guest pages than can be represented by a u32. As a
result of this overflow, KVM can use a low limit on the number of MMU
pages it will allocate. This makes KVM unable to map all of guest memory
at once, prompting spurious faults.
Tested: Ran all kvm-unit-tests on an Intel Haswell machine. This patch
introduced no new failures.
Signed-off-by: Ben Gardon <bgardon(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 159b5988292f..9b7b731a0032 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -126,7 +126,7 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
}
#define KVM_PERMILLE_MMU_PAGES 20
-#define KVM_MIN_ALLOC_MMU_PAGES 64
+#define KVM_MIN_ALLOC_MMU_PAGES 64UL
#define KVM_MMU_HASH_SHIFT 12
#define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT)
#define KVM_MIN_FREE_MMU_PAGES 5
@@ -844,9 +844,9 @@ enum kvm_irqchip_mode {
};
struct kvm_arch {
- unsigned int n_used_mmu_pages;
- unsigned int n_requested_mmu_pages;
- unsigned int n_max_mmu_pages;
+ unsigned long n_used_mmu_pages;
+ unsigned long n_requested_mmu_pages;
+ unsigned long n_max_mmu_pages;
unsigned int indirect_shadow_pages;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
/*
@@ -1256,8 +1256,8 @@ void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
gfn_t gfn_offset, unsigned long mask);
void kvm_mmu_zap_all(struct kvm *kvm);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
-unsigned int kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
-void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
+unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
+void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
bool pdptrs_changed(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 85f753728953..e10962dfc203 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2007,7 +2007,7 @@ static int is_empty_shadow_page(u64 *spt)
* aggregate version in order to make the slab shrinker
* faster
*/
-static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
+static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, unsigned long nr)
{
kvm->arch.n_used_mmu_pages += nr;
percpu_counter_add(&kvm_total_used_mmu_pages, nr);
@@ -2763,7 +2763,7 @@ static bool prepare_zap_oldest_mmu_page(struct kvm *kvm,
* Changing the number of mmu pages allocated to the vm
* Note: if goal_nr_mmu_pages is too small, you will get dead lock
*/
-void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages)
+void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long goal_nr_mmu_pages)
{
LIST_HEAD(invalid_list);
@@ -6031,10 +6031,10 @@ int kvm_mmu_module_init(void)
/*
* Calculate mmu pages needed for kvm.
*/
-unsigned int kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
+unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
{
- unsigned int nr_mmu_pages;
- unsigned int nr_pages = 0;
+ unsigned long nr_mmu_pages;
+ unsigned long nr_pages = 0;
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
int i;
@@ -6047,8 +6047,7 @@ unsigned int kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
}
nr_mmu_pages = nr_pages * KVM_PERMILLE_MMU_PAGES / 1000;
- nr_mmu_pages = max(nr_mmu_pages,
- (unsigned int) KVM_MIN_ALLOC_MMU_PAGES);
+ nr_mmu_pages = max(nr_mmu_pages, KVM_MIN_ALLOC_MMU_PAGES);
return nr_mmu_pages;
}
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index bbdc60f2fae8..54c2a377795b 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -64,7 +64,7 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
u64 fault_address, char *insn, int insn_len);
-static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
+static inline unsigned long kvm_mmu_available_pages(struct kvm *kvm)
{
if (kvm->arch.n_max_mmu_pages > kvm->arch.n_used_mmu_pages)
return kvm->arch.n_max_mmu_pages -
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 099b851dabaf..455f156f56ed 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4270,7 +4270,7 @@ static int kvm_vm_ioctl_set_identity_map_addr(struct kvm *kvm,
}
static int kvm_vm_ioctl_set_nr_mmu_pages(struct kvm *kvm,
- u32 kvm_nr_mmu_pages)
+ unsigned long kvm_nr_mmu_pages)
{
if (kvm_nr_mmu_pages < KVM_MIN_ALLOC_MMU_PAGES)
return -EINVAL;
@@ -4284,7 +4284,7 @@ static int kvm_vm_ioctl_set_nr_mmu_pages(struct kvm *kvm,
return 0;
}
-static int kvm_vm_ioctl_get_nr_mmu_pages(struct kvm *kvm)
+static unsigned long kvm_vm_ioctl_get_nr_mmu_pages(struct kvm *kvm)
{
return kvm->arch.n_max_mmu_pages;
}
The patch below does not apply to the 5.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc8a3d8925a8fa09fa550e0da115d95851ce33c6 Mon Sep 17 00:00:00 2001
From: Ben Gardon <bgardon(a)google.com>
Date: Mon, 8 Apr 2019 11:07:30 -0700
Subject: [PATCH] kvm: mmu: Fix overflow on kvm mmu page limit calculation
KVM bases its memory usage limits on the total number of guest pages
across all memslots. However, those limits, and the calculations to
produce them, use 32-bit unsigned integers. This can result in overflow
if a VM has more guest pages than can be represented by a u32. As a
result of this overflow, KVM can use a low limit on the number of MMU
pages it will allocate. This makes KVM unable to map all of guest memory
at once, prompting spurious faults.
Tested: Ran all kvm-unit-tests on an Intel Haswell machine. This patch
introduced no new failures.
Signed-off-by: Ben Gardon <bgardon(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 159b5988292f..9b7b731a0032 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -126,7 +126,7 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
}
#define KVM_PERMILLE_MMU_PAGES 20
-#define KVM_MIN_ALLOC_MMU_PAGES 64
+#define KVM_MIN_ALLOC_MMU_PAGES 64UL
#define KVM_MMU_HASH_SHIFT 12
#define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT)
#define KVM_MIN_FREE_MMU_PAGES 5
@@ -844,9 +844,9 @@ enum kvm_irqchip_mode {
};
struct kvm_arch {
- unsigned int n_used_mmu_pages;
- unsigned int n_requested_mmu_pages;
- unsigned int n_max_mmu_pages;
+ unsigned long n_used_mmu_pages;
+ unsigned long n_requested_mmu_pages;
+ unsigned long n_max_mmu_pages;
unsigned int indirect_shadow_pages;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
/*
@@ -1256,8 +1256,8 @@ void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
gfn_t gfn_offset, unsigned long mask);
void kvm_mmu_zap_all(struct kvm *kvm);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
-unsigned int kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
-void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
+unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
+void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
bool pdptrs_changed(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 85f753728953..e10962dfc203 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2007,7 +2007,7 @@ static int is_empty_shadow_page(u64 *spt)
* aggregate version in order to make the slab shrinker
* faster
*/
-static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
+static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, unsigned long nr)
{
kvm->arch.n_used_mmu_pages += nr;
percpu_counter_add(&kvm_total_used_mmu_pages, nr);
@@ -2763,7 +2763,7 @@ static bool prepare_zap_oldest_mmu_page(struct kvm *kvm,
* Changing the number of mmu pages allocated to the vm
* Note: if goal_nr_mmu_pages is too small, you will get dead lock
*/
-void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages)
+void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long goal_nr_mmu_pages)
{
LIST_HEAD(invalid_list);
@@ -6031,10 +6031,10 @@ int kvm_mmu_module_init(void)
/*
* Calculate mmu pages needed for kvm.
*/
-unsigned int kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
+unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
{
- unsigned int nr_mmu_pages;
- unsigned int nr_pages = 0;
+ unsigned long nr_mmu_pages;
+ unsigned long nr_pages = 0;
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
int i;
@@ -6047,8 +6047,7 @@ unsigned int kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
}
nr_mmu_pages = nr_pages * KVM_PERMILLE_MMU_PAGES / 1000;
- nr_mmu_pages = max(nr_mmu_pages,
- (unsigned int) KVM_MIN_ALLOC_MMU_PAGES);
+ nr_mmu_pages = max(nr_mmu_pages, KVM_MIN_ALLOC_MMU_PAGES);
return nr_mmu_pages;
}
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index bbdc60f2fae8..54c2a377795b 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -64,7 +64,7 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
u64 fault_address, char *insn, int insn_len);
-static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
+static inline unsigned long kvm_mmu_available_pages(struct kvm *kvm)
{
if (kvm->arch.n_max_mmu_pages > kvm->arch.n_used_mmu_pages)
return kvm->arch.n_max_mmu_pages -
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 099b851dabaf..455f156f56ed 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4270,7 +4270,7 @@ static int kvm_vm_ioctl_set_identity_map_addr(struct kvm *kvm,
}
static int kvm_vm_ioctl_set_nr_mmu_pages(struct kvm *kvm,
- u32 kvm_nr_mmu_pages)
+ unsigned long kvm_nr_mmu_pages)
{
if (kvm_nr_mmu_pages < KVM_MIN_ALLOC_MMU_PAGES)
return -EINVAL;
@@ -4284,7 +4284,7 @@ static int kvm_vm_ioctl_set_nr_mmu_pages(struct kvm *kvm,
return 0;
}
-static int kvm_vm_ioctl_get_nr_mmu_pages(struct kvm *kvm)
+static unsigned long kvm_vm_ioctl_get_nr_mmu_pages(struct kvm *kvm)
{
return kvm->arch.n_max_mmu_pages;
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 04f5866e41fb70690e28397487d8bd8eea7d712a Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli <aarcange(a)redhat.com>
Date: Thu, 18 Apr 2019 17:50:52 -0700
Subject: [PATCH] coredump: fix race condition between
mmget_not_zero()/get_task_mm() and core dumping
The core dumping code has always run without holding the mmap_sem for
writing, even though that is the only way to ensure that the entire vma
layout will not change from under it. Only using some signal
serialization on the processes belonging to the mm is not nearly enough.
This was pointed out earlier. For example in Hugh's post from Jul 2017:
https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils
"Not strictly relevant here, but a related note: I was very surprised
to discover, only quite recently, how handle_mm_fault() may be called
without down_read(mmap_sem) - when core dumping. That seems a
misguided optimization to me, which would also be nice to correct"
In particular, because growsdown and growsup can move vm_start/vm_end,
the various loops the core dump does over the vmas will not be
consistent if page faults can happen concurrently.
Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.
Adding mmap_sem for writing around the ->core_dump invocation is a
viable long-term fix, but it requires removing all copy-user and page
faults and replacing them with get_dump_page() for all binary formats,
which is not suitable as a short-term fix.
For the time being this solution manually covers the places that can
confuse the core dump either by altering the vma layout or the vma flags
while it runs. Once ->core_dump runs under mmap_sem for writing the
function mmget_still_valid() can be dropped.
Allowing mmap_sem protected sections to run in parallel with the
coredump provides some minor parallelism advantage to the swapoff code
(which seems to be safe enough by never mangling any vma field and can
keep doing swapins in parallel to the core dumping) and to some other
corner case.
In order to facilitate backporting I added "Fixes: 86039bd3b4e6";
however, the side effect of this same race condition in /proc/pid/mem
should be reproducible since before 2.6.12-rc2, so I couldn't add any
other "Fixes:" because there's no hash beyond the git genesis commit.
Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, by adding the mmget_still_valid() check to it, all other cases
that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm(). The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.
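For callers outside the coredump path, the resulting pattern is roughly the
kernel-style sketch below; the function name and the -EINTR return value are
placeholders for this sketch, not something the patch adds:

static int poke_foreign_mm(struct task_struct *task)
{
        struct mm_struct *mm = get_task_mm(task);

        if (!mm)
                return -ESRCH;

        down_write(&mm->mmap_sem);
        if (!mmget_still_valid(mm)) {
                /* mm->core_state is set: a coredump has started, so do
                 * not touch the vma layout or vm_flags under it. */
                up_write(&mm->mmap_sem);
                mmput(mm);
                return -EINTR;
        }

        /* ... the coredump has not started; safe to modify vmas ... */

        up_write(&mm->mmap_sem);
        mmput(mm);
        return 0;
}

The userfaultfd_release() and uverbs_user_mmap_disassociate() hunks below
follow exactly this shape with their skip_mm labels, while clear_refs_write()
silently returns instead.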
Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
Reported-by: Jann Horn <jannh(a)google.com>
Suggested-by: Oleg Nesterov <oleg(a)redhat.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com>
Reviewed-by: Oleg Nesterov <oleg(a)redhat.com>
Reviewed-by: Jann Horn <jannh(a)google.com>
Acked-by: Jason Gunthorpe <jgg(a)mellanox.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 70b7d80431a9..f2e7ffe6fc54 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -993,6 +993,8 @@ void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
* will only be one mm, so no big deal.
*/
down_write(&mm->mmap_sem);
+ if (!mmget_still_valid(mm))
+ goto skip_mm;
mutex_lock(&ufile->umap_lock);
list_for_each_entry_safe (priv, next_priv, &ufile->umaps,
list) {
@@ -1007,6 +1009,7 @@ void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
vma->vm_flags &= ~(VM_SHARED | VM_MAYSHARE);
}
mutex_unlock(&ufile->umap_lock);
+ skip_mm:
up_write(&mm->mmap_sem);
mmput(mm);
}
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 92a91e7816d8..95ca1fe7283c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1143,6 +1143,24 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
count = -EINTR;
goto out_mm;
}
+ /*
+ * Avoid to modify vma->vm_flags
+ * without locked ops while the
+ * coredump reads the vm_flags.
+ */
+ if (!mmget_still_valid(mm)) {
+ /*
+ * Silently return "count"
+ * like if get_task_mm()
+ * failed. FIXME: should this
+ * function have returned
+ * -ESRCH if get_task_mm()
+ * failed like if
+ * get_proc_task() fails?
+ */
+ up_write(&mm->mmap_sem);
+ goto out_mm;
+ }
for (vma = mm->mmap; vma; vma = vma->vm_next) {
vma->vm_flags &= ~VM_SOFTDIRTY;
vma_set_page_prot(vma);
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 89800fc7dc9d..f5de1e726356 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -629,6 +629,8 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx,
/* the various vma->vm_userfaultfd_ctx still points to it */
down_write(&mm->mmap_sem);
+ /* no task can run (and in turn coredump) yet */
+ VM_WARN_ON(!mmget_still_valid(mm));
for (vma = mm->mmap; vma; vma = vma->vm_next)
if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx) {
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
@@ -883,6 +885,8 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
* taking the mmap_sem for writing.
*/
down_write(&mm->mmap_sem);
+ if (!mmget_still_valid(mm))
+ goto skip_mm;
prev = NULL;
for (vma = mm->mmap; vma; vma = vma->vm_next) {
cond_resched();
@@ -905,6 +909,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
vma->vm_flags = new_flags;
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
}
+skip_mm:
up_write(&mm->mmap_sem);
mmput(mm);
wakeup:
@@ -1333,6 +1338,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
goto out;
down_write(&mm->mmap_sem);
+ if (!mmget_still_valid(mm))
+ goto out_unlock;
vma = find_vma_prev(mm, start, &prev);
if (!vma)
goto out_unlock;
@@ -1520,6 +1527,8 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
goto out;
down_write(&mm->mmap_sem);
+ if (!mmget_still_valid(mm))
+ goto out_unlock;
vma = find_vma_prev(mm, start, &prev);
if (!vma)
goto out_unlock;
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 0cd9f10423fb..a3fda9f024c3 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -49,6 +49,27 @@ static inline void mmdrop(struct mm_struct *mm)
__mmdrop(mm);
}
+/*
+ * This has to be called after a get_task_mm()/mmget_not_zero()
+ * followed by taking the mmap_sem for writing before modifying the
+ * vmas or anything the coredump pretends not to change from under it.
+ *
+ * NOTE: find_extend_vma() called from GUP context is the only place
+ * that can modify the "mm" (notably the vm_start/end) under mmap_sem
+ * for reading and outside the context of the process, so it is also
+ * the only case that holds the mmap_sem for reading that must call
+ * this function. Generally if the mmap_sem is hold for reading
+ * there's no need of this check after get_task_mm()/mmget_not_zero().
+ *
+ * This function can be obsoleted and the check can be removed, after
+ * the coredump code will hold the mmap_sem for writing before
+ * invoking the ->core_dump methods.
+ */
+static inline bool mmget_still_valid(struct mm_struct *mm)
+{
+ return likely(!mm->core_state);
+}
+
/**
* mmget() - Pin the address space associated with a &struct mm_struct.
* @mm: The address space to pin.
diff --git a/mm/mmap.c b/mm/mmap.c
index 41eb48d9b527..bd7b9f293b39 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -45,6 +45,7 @@
#include <linux/moduleparam.h>
#include <linux/pkeys.h>
#include <linux/oom.h>
+#include <linux/sched/mm.h>
#include <linux/uaccess.h>
#include <asm/cacheflush.h>
@@ -2525,7 +2526,8 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr)
vma = find_vma_prev(mm, addr, &prev);
if (vma && (vma->vm_start <= addr))
return vma;
- if (!prev || expand_stack(prev, addr))
+ /* don't alter vm_end if the coredump is running */
+ if (!prev || !mmget_still_valid(mm) || expand_stack(prev, addr))
return NULL;
if (prev->vm_flags & VM_LOCKED)
populate_vma_page_range(prev, addr, prev->vm_end, NULL);
@@ -2551,6 +2553,9 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr)
return vma;
if (!(vma->vm_flags & VM_GROWSDOWN))
return NULL;
+ /* don't alter vm_start if the coredump is running */
+ if (!mmget_still_valid(mm))
+ return NULL;
start = vma->vm_start;
if (expand_stack(vma, addr))
return NULL;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 04f5866e41fb70690e28397487d8bd8eea7d712a Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli <aarcange(a)redhat.com>
Date: Thu, 18 Apr 2019 17:50:52 -0700
Subject: [PATCH] coredump: fix race condition between
mmget_not_zero()/get_task_mm() and core dumping
The core dumping code has always run without holding the mmap_sem for
writing, even though that is the only way to ensure that the entire vma
layout will not change from under it. Only using some signal
serialization on the processes belonging to the mm is not nearly enough.
This was pointed out earlier. For example in Hugh's post from Jul 2017:
https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils
"Not strictly relevant here, but a related note: I was very surprised
to discover, only quite recently, how handle_mm_fault() may be called
without down_read(mmap_sem) - when core dumping. That seems a
misguided optimization to me, which would also be nice to correct"
In particular, because growsdown and growsup can move vm_start/vm_end,
the various loops the core dump does over the vmas will not be
consistent if page faults can happen concurrently.
Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.
Adding mmap_sem for writing around the ->core_dump invocation is a
viable long-term fix, but it requires removing all copy-user and page
faults and replacing them with get_dump_page() for all binary formats,
which is not suitable as a short-term fix.
For the time being this solution manually covers the places that can
confuse the core dump either by altering the vma layout or the vma flags
while it runs. Once ->core_dump runs under mmap_sem for writing the
function mmget_still_valid() can be dropped.
Allowing mmap_sem protected sections to run in parallel with the
coredump provides some minor parallelism advantage to the swapoff code
(which seems to be safe enough by never mangling any vma field and can
keep doing swapins in parallel to the core dumping) and to some other
corner case.
In order to facilitate backporting I added "Fixes: 86039bd3b4e6";
however, the side effect of this same race condition in /proc/pid/mem
should be reproducible since before 2.6.12-rc2, so I couldn't add any
other "Fixes:" because there's no hash beyond the git genesis commit.
Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, by adding the mmget_still_valid() check to it, all other cases
that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm(). The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.
Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
Reported-by: Jann Horn <jannh(a)google.com>
Suggested-by: Oleg Nesterov <oleg(a)redhat.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com>
Reviewed-by: Oleg Nesterov <oleg(a)redhat.com>
Reviewed-by: Jann Horn <jannh(a)google.com>
Acked-by: Jason Gunthorpe <jgg(a)mellanox.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 70b7d80431a9..f2e7ffe6fc54 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -993,6 +993,8 @@ void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
* will only be one mm, so no big deal.
*/
down_write(&mm->mmap_sem);
+ if (!mmget_still_valid(mm))
+ goto skip_mm;
mutex_lock(&ufile->umap_lock);
list_for_each_entry_safe (priv, next_priv, &ufile->umaps,
list) {
@@ -1007,6 +1009,7 @@ void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
vma->vm_flags &= ~(VM_SHARED | VM_MAYSHARE);
}
mutex_unlock(&ufile->umap_lock);
+ skip_mm:
up_write(&mm->mmap_sem);
mmput(mm);
}
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 92a91e7816d8..95ca1fe7283c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1143,6 +1143,24 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
count = -EINTR;
goto out_mm;
}
+ /*
+ * Avoid to modify vma->vm_flags
+ * without locked ops while the
+ * coredump reads the vm_flags.
+ */
+ if (!mmget_still_valid(mm)) {
+ /*
+ * Silently return "count"
+ * like if get_task_mm()
+ * failed. FIXME: should this
+ * function have returned
+ * -ESRCH if get_task_mm()
+ * failed like if
+ * get_proc_task() fails?
+ */
+ up_write(&mm->mmap_sem);
+ goto out_mm;
+ }
for (vma = mm->mmap; vma; vma = vma->vm_next) {
vma->vm_flags &= ~VM_SOFTDIRTY;
vma_set_page_prot(vma);
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 89800fc7dc9d..f5de1e726356 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -629,6 +629,8 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx,
/* the various vma->vm_userfaultfd_ctx still points to it */
down_write(&mm->mmap_sem);
+ /* no task can run (and in turn coredump) yet */
+ VM_WARN_ON(!mmget_still_valid(mm));
for (vma = mm->mmap; vma; vma = vma->vm_next)
if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx) {
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
@@ -883,6 +885,8 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
* taking the mmap_sem for writing.
*/
down_write(&mm->mmap_sem);
+ if (!mmget_still_valid(mm))
+ goto skip_mm;
prev = NULL;
for (vma = mm->mmap; vma; vma = vma->vm_next) {
cond_resched();
@@ -905,6 +909,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
vma->vm_flags = new_flags;
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
}
+skip_mm:
up_write(&mm->mmap_sem);
mmput(mm);
wakeup:
@@ -1333,6 +1338,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
goto out;
down_write(&mm->mmap_sem);
+ if (!mmget_still_valid(mm))
+ goto out_unlock;
vma = find_vma_prev(mm, start, &prev);
if (!vma)
goto out_unlock;
@@ -1520,6 +1527,8 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
goto out;
down_write(&mm->mmap_sem);
+ if (!mmget_still_valid(mm))
+ goto out_unlock;
vma = find_vma_prev(mm, start, &prev);
if (!vma)
goto out_unlock;
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 0cd9f10423fb..a3fda9f024c3 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -49,6 +49,27 @@ static inline void mmdrop(struct mm_struct *mm)
__mmdrop(mm);
}
+/*
+ * This has to be called after a get_task_mm()/mmget_not_zero()
+ * followed by taking the mmap_sem for writing before modifying the
+ * vmas or anything the coredump pretends not to change from under it.
+ *
+ * NOTE: find_extend_vma() called from GUP context is the only place
+ * that can modify the "mm" (notably the vm_start/end) under mmap_sem
+ * for reading and outside the context of the process, so it is also
+ * the only case that holds the mmap_sem for reading that must call
+ * this function. Generally if the mmap_sem is hold for reading
+ * there's no need of this check after get_task_mm()/mmget_not_zero().
+ *
+ * This function can be obsoleted and the check can be removed, after
+ * the coredump code will hold the mmap_sem for writing before
+ * invoking the ->core_dump methods.
+ */
+static inline bool mmget_still_valid(struct mm_struct *mm)
+{
+ return likely(!mm->core_state);
+}
+
/**
* mmget() - Pin the address space associated with a &struct mm_struct.
* @mm: The address space to pin.
diff --git a/mm/mmap.c b/mm/mmap.c
index 41eb48d9b527..bd7b9f293b39 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -45,6 +45,7 @@
#include <linux/moduleparam.h>
#include <linux/pkeys.h>
#include <linux/oom.h>
+#include <linux/sched/mm.h>
#include <linux/uaccess.h>
#include <asm/cacheflush.h>
@@ -2525,7 +2526,8 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr)
vma = find_vma_prev(mm, addr, &prev);
if (vma && (vma->vm_start <= addr))
return vma;
- if (!prev || expand_stack(prev, addr))
+ /* don't alter vm_end if the coredump is running */
+ if (!prev || !mmget_still_valid(mm) || expand_stack(prev, addr))
return NULL;
if (prev->vm_flags & VM_LOCKED)
populate_vma_page_range(prev, addr, prev->vm_end, NULL);
@@ -2551,6 +2553,9 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr)
return vma;
if (!(vma->vm_flags & VM_GROWSDOWN))
return NULL;
+ /* don't alter vm_start if the coredump is running */
+ if (!mmget_still_valid(mm))
+ return NULL;
start = vma->vm_start;
if (expand_stack(vma, addr))
return NULL;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be549d49115422f846b6d96ee8fd7173a5f7ceb0 Mon Sep 17 00:00:00 2001
From: Jaesoo Lee <jalee(a)purestorage.com>
Date: Tue, 9 Apr 2019 17:02:22 -0700
Subject: [PATCH] scsi: core: set result when the command cannot be dispatched
When SCSI blk-mq is enabled, there is a bug in the error handling in
scsi_queue_rq: the result field of scsi_request is not set correctly
when dispatching the command fails. Since the upper-layer code,
including the sg_io ioctl, expects to receive any error status through
the result field of scsi_request, the error is silently ignored and
this can cause data corruption for some applications.
Fixes: d285203cf647 ("scsi: add support for a blk-mq based I/O path.")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Jaesoo Lee <jalee(a)purestorage.com>
Reviewed-by: Hannes Reinecke <hare(a)suse.com>
Reviewed-by: Bart Van Assche <bvanassche(a)acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 601b9f1de267..07dfc17d4824 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1706,8 +1706,12 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
ret = BLK_STS_DEV_RESOURCE;
break;
default:
+ if (unlikely(!scsi_device_online(sdev)))
+ scsi_req(req)->result = DID_NO_CONNECT << 16;
+ else
+ scsi_req(req)->result = DID_ERROR << 16;
/*
- * Make sure to release all allocated ressources when
+ * Make sure to release all allocated resources when
* we hit an error, as we will never see this command
* again.
*/
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be549d49115422f846b6d96ee8fd7173a5f7ceb0 Mon Sep 17 00:00:00 2001
From: Jaesoo Lee <jalee(a)purestorage.com>
Date: Tue, 9 Apr 2019 17:02:22 -0700
Subject: [PATCH] scsi: core: set result when the command cannot be dispatched
When SCSI blk-mq is enabled, there is a bug in the error handling in
scsi_queue_rq: the result field of scsi_request is not set correctly
when dispatching the command fails. Since the upper-layer code,
including the sg_io ioctl, expects to receive any error status through
the result field of scsi_request, the error is silently ignored and
this can cause data corruption for some applications.
Fixes: d285203cf647 ("scsi: add support for a blk-mq based I/O path.")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Jaesoo Lee <jalee(a)purestorage.com>
Reviewed-by: Hannes Reinecke <hare(a)suse.com>
Reviewed-by: Bart Van Assche <bvanassche(a)acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 601b9f1de267..07dfc17d4824 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1706,8 +1706,12 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
ret = BLK_STS_DEV_RESOURCE;
break;
default:
+ if (unlikely(!scsi_device_online(sdev)))
+ scsi_req(req)->result = DID_NO_CONNECT << 16;
+ else
+ scsi_req(req)->result = DID_ERROR << 16;
/*
- * Make sure to release all allocated ressources when
+ * Make sure to release all allocated resources when
* we hit an error, as we will never see this command
* again.
*/
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7f75591fc5a123929a29636834d1bcb8b5c9fee3 Mon Sep 17 00:00:00 2001
From: Fabrice Gasnier <fabrice.gasnier(a)st.com>
Date: Mon, 25 Mar 2019 14:01:23 +0100
Subject: [PATCH] iio: core: fix a possible circular locking dependency
This fixes a possible circular locking dependency detected warning seen
with:
- CONFIG_PROVE_LOCKING=y
- consumer/provider IIO devices (ex: "voltage-divider" consumer of "adc")
When using the IIO consumer interface, e.g. iio_channel_get(), the consumer
device will likely call iio_read_channel_raw() or a similar helper
that relies on the 'info_exist_lock' mutex.
typically:
...
mutex_lock(&chan->indio_dev->info_exist_lock);
if (chan->indio_dev->info == NULL) {
ret = -ENODEV;
goto err_unlock;
}
ret = do_some_ops()
err_unlock:
mutex_unlock(&chan->indio_dev->info_exist_lock);
return ret;
...
The same mutex is also held in iio_device_unregister().
The following deadlock warning happens when:
- the consumer device has called an API like iio_read_channel_raw()
at least once.
- the consumer driver is unregistered, removed (unbind from sysfs)
======================================================
WARNING: possible circular locking dependency detected
4.19.24 #577 Not tainted
------------------------------------------------------
sh/372 is trying to acquire lock:
(kn->count#30){++++}, at: kernfs_remove_by_name_ns+0x3c/0x84
but task is already holding lock:
(&dev->info_exist_lock){+.+.}, at: iio_device_unregister+0x18/0x60
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&dev->info_exist_lock){+.+.}:
__mutex_lock+0x70/0xa3c
mutex_lock_nested+0x1c/0x24
iio_read_channel_raw+0x1c/0x60
iio_read_channel_info+0xa8/0xb0
dev_attr_show+0x1c/0x48
sysfs_kf_seq_show+0x84/0xec
seq_read+0x154/0x528
__vfs_read+0x2c/0x15c
vfs_read+0x8c/0x110
ksys_read+0x4c/0xac
ret_fast_syscall+0x0/0x28
0xbedefb60
-> #0 (kn->count#30){++++}:
lock_acquire+0xd8/0x268
__kernfs_remove+0x288/0x374
kernfs_remove_by_name_ns+0x3c/0x84
remove_files+0x34/0x78
sysfs_remove_group+0x40/0x9c
sysfs_remove_groups+0x24/0x34
device_remove_attrs+0x38/0x64
device_del+0x11c/0x360
cdev_device_del+0x14/0x2c
iio_device_unregister+0x24/0x60
release_nodes+0x1bc/0x200
device_release_driver_internal+0x1a0/0x230
unbind_store+0x80/0x130
kernfs_fop_write+0x100/0x1e4
__vfs_write+0x2c/0x160
vfs_write+0xa4/0x17c
ksys_write+0x4c/0xac
ret_fast_syscall+0x0/0x28
0xbe906840
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&dev->info_exist_lock);
                               lock(kn->count#30);
                               lock(&dev->info_exist_lock);
  lock(kn->count#30);
*** DEADLOCK ***
...
cdev_device_del() can be called without holding the lock. It should be
safe, as info_exist_lock prevents kernel-space consumers from using the
exported routines during/after provider removal. cdev_device_del() is
for userspace.
Help to reproduce:
See example: Documentation/devicetree/bindings/iio/afe/voltage-divider.txt
sysv {
compatible = "voltage-divider";
io-channels = <&adc 0>;
output-ohms = <22>;
full-ohms = <222>;
};
First, go to iio:deviceX for the "voltage-divider", do one read:
$ cd /sys/bus/iio/devices/iio:deviceX
$ cat in_voltage0_raw
Then, unbind the consumer driver. It triggers above deadlock warning.
$ cd /sys/bus/platform/drivers/iio-rescale/
$ echo sysv > unbind
Note: I don't actually expect stable to pick this up all the way back
to when IIO was in staging, but it's probably valid that far back.
Signed-off-by: Fabrice Gasnier <fabrice.gasnier(a)st.com>
Fixes: ac917a81117c ("staging:iio:core set the iio_dev.info pointer to null on unregister")
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index 4700fd5d8c90..9c4d92115504 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -1743,10 +1743,10 @@ EXPORT_SYMBOL(__iio_device_register);
**/
void iio_device_unregister(struct iio_dev *indio_dev)
{
- mutex_lock(&indio_dev->info_exist_lock);
-
cdev_device_del(&indio_dev->chrdev, &indio_dev->dev);
+ mutex_lock(&indio_dev->info_exist_lock);
+
iio_device_unregister_debugfs(indio_dev);
iio_disable_all_buffers(indio_dev);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7f75591fc5a123929a29636834d1bcb8b5c9fee3 Mon Sep 17 00:00:00 2001
From: Fabrice Gasnier <fabrice.gasnier(a)st.com>
Date: Mon, 25 Mar 2019 14:01:23 +0100
Subject: [PATCH] iio: core: fix a possible circular locking dependency
This fixes a possible circular locking dependency detected warning seen
with:
- CONFIG_PROVE_LOCKING=y
- consumer/provider IIO devices (ex: "voltage-divider" consumer of "adc")
When using the IIO consumer interface, e.g. iio_channel_get(), the consumer
device will likely call iio_read_channel_raw() or a similar helper
that relies on the 'info_exist_lock' mutex.
typically:
...
mutex_lock(&chan->indio_dev->info_exist_lock);
if (chan->indio_dev->info == NULL) {
ret = -ENODEV;
goto err_unlock;
}
ret = do_some_ops()
err_unlock:
mutex_unlock(&chan->indio_dev->info_exist_lock);
return ret;
...
The same mutex is also held in iio_device_unregister().
The following deadlock warning happens when:
- the consumer device has called an API like iio_read_channel_raw()
at least once.
- the consumer driver is unregistered, removed (unbind from sysfs)
======================================================
WARNING: possible circular locking dependency detected
4.19.24 #577 Not tainted
------------------------------------------------------
sh/372 is trying to acquire lock:
(kn->count#30){++++}, at: kernfs_remove_by_name_ns+0x3c/0x84
but task is already holding lock:
(&dev->info_exist_lock){+.+.}, at: iio_device_unregister+0x18/0x60
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&dev->info_exist_lock){+.+.}:
__mutex_lock+0x70/0xa3c
mutex_lock_nested+0x1c/0x24
iio_read_channel_raw+0x1c/0x60
iio_read_channel_info+0xa8/0xb0
dev_attr_show+0x1c/0x48
sysfs_kf_seq_show+0x84/0xec
seq_read+0x154/0x528
__vfs_read+0x2c/0x15c
vfs_read+0x8c/0x110
ksys_read+0x4c/0xac
ret_fast_syscall+0x0/0x28
0xbedefb60
-> #0 (kn->count#30){++++}:
lock_acquire+0xd8/0x268
__kernfs_remove+0x288/0x374
kernfs_remove_by_name_ns+0x3c/0x84
remove_files+0x34/0x78
sysfs_remove_group+0x40/0x9c
sysfs_remove_groups+0x24/0x34
device_remove_attrs+0x38/0x64
device_del+0x11c/0x360
cdev_device_del+0x14/0x2c
iio_device_unregister+0x24/0x60
release_nodes+0x1bc/0x200
device_release_driver_internal+0x1a0/0x230
unbind_store+0x80/0x130
kernfs_fop_write+0x100/0x1e4
__vfs_write+0x2c/0x160
vfs_write+0xa4/0x17c
ksys_write+0x4c/0xac
ret_fast_syscall+0x0/0x28
0xbe906840
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&dev->info_exist_lock);
                               lock(kn->count#30);
                               lock(&dev->info_exist_lock);
  lock(kn->count#30);
*** DEADLOCK ***
...
cdev_device_del() can be called without holding the lock. It should be
safe, as info_exist_lock prevents kernel-space consumers from using the
exported routines during/after provider removal. cdev_device_del() is
for userspace.
Help to reproduce:
See example: Documentation/devicetree/bindings/iio/afe/voltage-divider.txt
sysv {
compatible = "voltage-divider";
io-channels = <&adc 0>;
output-ohms = <22>;
full-ohms = <222>;
};
First, go to iio:deviceX for the "voltage-divider", do one read:
$ cd /sys/bus/iio/devices/iio:deviceX
$ cat in_voltage0_raw
Then, unbind the consumer driver. It triggers above deadlock warning.
$ cd /sys/bus/platform/drivers/iio-rescale/
$ echo sysv > unbind
Note: I don't actually expect stable to pick this up all the way back
to when IIO was in staging, but it's probably valid that far back.
Signed-off-by: Fabrice Gasnier <fabrice.gasnier(a)st.com>
Fixes: ac917a81117c ("staging:iio:core set the iio_dev.info pointer to null on unregister")
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index 4700fd5d8c90..9c4d92115504 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -1743,10 +1743,10 @@ EXPORT_SYMBOL(__iio_device_register);
**/
void iio_device_unregister(struct iio_dev *indio_dev)
{
- mutex_lock(&indio_dev->info_exist_lock);
-
cdev_device_del(&indio_dev->chrdev, &indio_dev->dev);
+ mutex_lock(&indio_dev->info_exist_lock);
+
iio_device_unregister_debugfs(indio_dev);
iio_disable_all_buffers(indio_dev);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b68f3cc7d978943fcf85148165b00594c38db776 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 2 Apr 2019 08:10:48 -0700
Subject: [PATCH] KVM: x86: Always use 32-bit SMRAM save state for 32-bit
kernels
Invoking the 64-bit variant on a 32-bit kernel will crash the guest,
trigger a WARN, and/or lead to a buffer overrun in the host, e.g.
rsm_load_state_64() writes r8-r15 unconditionally, but enum kvm_reg and
thus x86_emulate_ctxt._regs only define r8-r15 for CONFIG_X86_64.
KVM allows userspace to report long mode support via CPUID, even though
the guest is all but guaranteed to crash if it actually tries to enable
long mode. But, a pure 32-bit guest that is ignorant of long mode will
happily plod along.
SMM complicates things as 64-bit CPUs use a different SMRAM save state
area. KVM handles this correctly for 64-bit kernels, e.g. uses the
legacy save state map if userspace has hidden long mode from the guest,
but doesn't fare well when userspace reports long mode support on a
32-bit host kernel (32-bit KVM doesn't support 64-bit guests).
Since the alternative is to crash the guest, e.g. by not loading state
or explicitly requesting shutdown, unconditionally use the legacy SMRAM
save state map for 32-bit KVM. If a guest has managed to get far enough
to handle SMIs when running under a weird/buggy userspace hypervisor,
then don't deliberately crash the guest since there are no downsides
(from KVM's perspective) to allowing it to continue running.
Fixes: 660a5d517aaab ("KVM: x86: save/load state on SMM switch")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f3284827c432..d0d5dd44b4f4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2331,12 +2331,16 @@ static int em_lseg(struct x86_emulate_ctxt *ctxt)
static int emulator_has_longmode(struct x86_emulate_ctxt *ctxt)
{
+#ifdef CONFIG_X86_64
u32 eax, ebx, ecx, edx;
eax = 0x80000001;
ecx = 0;
ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, false);
return edx & bit(X86_FEATURE_LM);
+#else
+ return false;
+#endif
}
static void rsm_set_desc_flags(struct desc_struct *desc, u32 flags)
@@ -2372,6 +2376,7 @@ static int rsm_load_seg_32(struct x86_emulate_ctxt *ctxt, const char *smstate,
return X86EMUL_CONTINUE;
}
+#ifdef CONFIG_X86_64
static int rsm_load_seg_64(struct x86_emulate_ctxt *ctxt, const char *smstate,
int n)
{
@@ -2391,6 +2396,7 @@ static int rsm_load_seg_64(struct x86_emulate_ctxt *ctxt, const char *smstate,
ctxt->ops->set_segment(ctxt, selector, &desc, base3, n);
return X86EMUL_CONTINUE;
}
+#endif
static int rsm_enter_protected_mode(struct x86_emulate_ctxt *ctxt,
u64 cr0, u64 cr3, u64 cr4)
@@ -2492,6 +2498,7 @@ static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt,
return rsm_enter_protected_mode(ctxt, cr0, cr3, cr4);
}
+#ifdef CONFIG_X86_64
static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
const char *smstate)
{
@@ -2554,6 +2561,7 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
return X86EMUL_CONTINUE;
}
+#endif
static int em_rsm(struct x86_emulate_ctxt *ctxt)
{
@@ -2621,9 +2629,11 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
if (ctxt->ops->pre_leave_smm(ctxt, buf))
return X86EMUL_UNHANDLEABLE;
+#ifdef CONFIG_X86_64
if (emulator_has_longmode(ctxt))
ret = rsm_load_state_64(ctxt, buf);
else
+#endif
ret = rsm_load_state_32(ctxt, buf);
if (ret != X86EMUL_CONTINUE) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 472bbbbe153a..f10fef561573 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7441,9 +7441,9 @@ static void enter_smm_save_state_32(struct kvm_vcpu *vcpu, char *buf)
put_smstate(u32, buf, 0x7ef8, vcpu->arch.smbase);
}
+#ifdef CONFIG_X86_64
static void enter_smm_save_state_64(struct kvm_vcpu *vcpu, char *buf)
{
-#ifdef CONFIG_X86_64
struct desc_ptr dt;
struct kvm_segment seg;
unsigned long val;
@@ -7493,10 +7493,8 @@ static void enter_smm_save_state_64(struct kvm_vcpu *vcpu, char *buf)
for (i = 0; i < 6; i++)
enter_smm_save_seg_64(vcpu, buf, i);
-#else
- WARN_ON_ONCE(1);
-#endif
}
+#endif
static void enter_smm(struct kvm_vcpu *vcpu)
{
@@ -7507,9 +7505,11 @@ static void enter_smm(struct kvm_vcpu *vcpu)
trace_kvm_enter_smm(vcpu->vcpu_id, vcpu->arch.smbase, true);
memset(buf, 0, 512);
+#ifdef CONFIG_X86_64
if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
enter_smm_save_state_64(vcpu, buf);
else
+#endif
enter_smm_save_state_32(vcpu, buf);
/*
@@ -7567,8 +7567,10 @@ static void enter_smm(struct kvm_vcpu *vcpu)
kvm_set_segment(vcpu, &ds, VCPU_SREG_GS);
kvm_set_segment(vcpu, &ds, VCPU_SREG_SS);
+#ifdef CONFIG_X86_64
if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
kvm_x86_ops->set_efer(vcpu, 0);
+#endif
kvm_update_cpuid(vcpu);
kvm_mmu_reset_context(vcpu);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b68f3cc7d978943fcf85148165b00594c38db776 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 2 Apr 2019 08:10:48 -0700
Subject: [PATCH] KVM: x86: Always use 32-bit SMRAM save state for 32-bit
kernels
Invoking the 64-bit variant on a 32-bit kernel will crash the guest,
trigger a WARN, and/or lead to a buffer overrun in the host, e.g.
rsm_load_state_64() writes r8-r15 unconditionally, but enum kvm_reg and
thus x86_emulate_ctxt._regs only define r8-r15 for CONFIG_X86_64.
KVM allows userspace to report long mode support via CPUID, even though
the guest is all but guaranteed to crash if it actually tries to enable
long mode. But, a pure 32-bit guest that is ignorant of long mode will
happily plod along.
SMM complicates things as 64-bit CPUs use a different SMRAM save state
area. KVM handles this correctly for 64-bit kernels, e.g. uses the
legacy save state map if userspace has hidden long mode from the guest,
but doesn't fare well when userspace reports long mode support on a
32-bit host kernel (32-bit KVM doesn't support 64-bit guests).
Since the alternative is to crash the guest, e.g. by not loading state
or explicitly requesting shutdown, unconditionally use the legacy SMRAM
save state map for 32-bit KVM. If a guest has managed to get far enough
to handle SMIs when running under a weird/buggy userspace hypervisor,
then don't deliberately crash the guest since there are no downsides
(from KVM's perspective) to allowing it to continue running.
Fixes: 660a5d517aaab ("KVM: x86: save/load state on SMM switch")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f3284827c432..d0d5dd44b4f4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2331,12 +2331,16 @@ static int em_lseg(struct x86_emulate_ctxt *ctxt)
static int emulator_has_longmode(struct x86_emulate_ctxt *ctxt)
{
+#ifdef CONFIG_X86_64
u32 eax, ebx, ecx, edx;
eax = 0x80000001;
ecx = 0;
ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, false);
return edx & bit(X86_FEATURE_LM);
+#else
+ return false;
+#endif
}
static void rsm_set_desc_flags(struct desc_struct *desc, u32 flags)
@@ -2372,6 +2376,7 @@ static int rsm_load_seg_32(struct x86_emulate_ctxt *ctxt, const char *smstate,
return X86EMUL_CONTINUE;
}
+#ifdef CONFIG_X86_64
static int rsm_load_seg_64(struct x86_emulate_ctxt *ctxt, const char *smstate,
int n)
{
@@ -2391,6 +2396,7 @@ static int rsm_load_seg_64(struct x86_emulate_ctxt *ctxt, const char *smstate,
ctxt->ops->set_segment(ctxt, selector, &desc, base3, n);
return X86EMUL_CONTINUE;
}
+#endif
static int rsm_enter_protected_mode(struct x86_emulate_ctxt *ctxt,
u64 cr0, u64 cr3, u64 cr4)
@@ -2492,6 +2498,7 @@ static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt,
return rsm_enter_protected_mode(ctxt, cr0, cr3, cr4);
}
+#ifdef CONFIG_X86_64
static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
const char *smstate)
{
@@ -2554,6 +2561,7 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
return X86EMUL_CONTINUE;
}
+#endif
static int em_rsm(struct x86_emulate_ctxt *ctxt)
{
@@ -2621,9 +2629,11 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
if (ctxt->ops->pre_leave_smm(ctxt, buf))
return X86EMUL_UNHANDLEABLE;
+#ifdef CONFIG_X86_64
if (emulator_has_longmode(ctxt))
ret = rsm_load_state_64(ctxt, buf);
else
+#endif
ret = rsm_load_state_32(ctxt, buf);
if (ret != X86EMUL_CONTINUE) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 472bbbbe153a..f10fef561573 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7441,9 +7441,9 @@ static void enter_smm_save_state_32(struct kvm_vcpu *vcpu, char *buf)
put_smstate(u32, buf, 0x7ef8, vcpu->arch.smbase);
}
+#ifdef CONFIG_X86_64
static void enter_smm_save_state_64(struct kvm_vcpu *vcpu, char *buf)
{
-#ifdef CONFIG_X86_64
struct desc_ptr dt;
struct kvm_segment seg;
unsigned long val;
@@ -7493,10 +7493,8 @@ static void enter_smm_save_state_64(struct kvm_vcpu *vcpu, char *buf)
for (i = 0; i < 6; i++)
enter_smm_save_seg_64(vcpu, buf, i);
-#else
- WARN_ON_ONCE(1);
-#endif
}
+#endif
static void enter_smm(struct kvm_vcpu *vcpu)
{
@@ -7507,9 +7505,11 @@ static void enter_smm(struct kvm_vcpu *vcpu)
trace_kvm_enter_smm(vcpu->vcpu_id, vcpu->arch.smbase, true);
memset(buf, 0, 512);
+#ifdef CONFIG_X86_64
if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
enter_smm_save_state_64(vcpu, buf);
else
+#endif
enter_smm_save_state_32(vcpu, buf);
/*
@@ -7567,8 +7567,10 @@ static void enter_smm(struct kvm_vcpu *vcpu)
kvm_set_segment(vcpu, &ds, VCPU_SREG_GS);
kvm_set_segment(vcpu, &ds, VCPU_SREG_SS);
+#ifdef CONFIG_X86_64
if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
kvm_x86_ops->set_efer(vcpu, 0);
+#endif
kvm_update_cpuid(vcpu);
kvm_mmu_reset_context(vcpu);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b68f3cc7d978943fcf85148165b00594c38db776 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 2 Apr 2019 08:10:48 -0700
Subject: [PATCH] KVM: x86: Always use 32-bit SMRAM save state for 32-bit
kernels
Invoking the 64-bit variant on a 32-bit kernel will crash the guest,
trigger a WARN, and/or lead to a buffer overrun in the host, e.g.
rsm_load_state_64() writes r8-r15 unconditionally, but enum kvm_reg and
thus x86_emulate_ctxt._regs only define r8-r15 for CONFIG_X86_64.
KVM allows userspace to report long mode support via CPUID, even though
the guest is all but guaranteed to crash if it actually tries to enable
long mode. But, a pure 32-bit guest that is ignorant of long mode will
happily plod along.
SMM complicates things as 64-bit CPUs use a different SMRAM save state
area. KVM handles this correctly for 64-bit kernels, e.g. uses the
legacy save state map if userspace has hidden long mode from the guest,
but doesn't fare well when userspace reports long mode support on a
32-bit host kernel (32-bit KVM doesn't support 64-bit guests).
Since the alternative is to crash the guest, e.g. by not loading state
or explicitly requesting shutdown, unconditionally use the legacy SMRAM
save state map for 32-bit KVM. If a guest has managed to get far enough
to handle SMIs when running under a weird/buggy userspace hypervisor,
then don't deliberately crash the guest since there are no downsides
(from KVM's perspective) to allowing it to continue running.
Fixes: 660a5d517aaab ("KVM: x86: save/load state on SMM switch")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f3284827c432..d0d5dd44b4f4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2331,12 +2331,16 @@ static int em_lseg(struct x86_emulate_ctxt *ctxt)
static int emulator_has_longmode(struct x86_emulate_ctxt *ctxt)
{
+#ifdef CONFIG_X86_64
u32 eax, ebx, ecx, edx;
eax = 0x80000001;
ecx = 0;
ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, false);
return edx & bit(X86_FEATURE_LM);
+#else
+ return false;
+#endif
}
static void rsm_set_desc_flags(struct desc_struct *desc, u32 flags)
@@ -2372,6 +2376,7 @@ static int rsm_load_seg_32(struct x86_emulate_ctxt *ctxt, const char *smstate,
return X86EMUL_CONTINUE;
}
+#ifdef CONFIG_X86_64
static int rsm_load_seg_64(struct x86_emulate_ctxt *ctxt, const char *smstate,
int n)
{
@@ -2391,6 +2396,7 @@ static int rsm_load_seg_64(struct x86_emulate_ctxt *ctxt, const char *smstate,
ctxt->ops->set_segment(ctxt, selector, &desc, base3, n);
return X86EMUL_CONTINUE;
}
+#endif
static int rsm_enter_protected_mode(struct x86_emulate_ctxt *ctxt,
u64 cr0, u64 cr3, u64 cr4)
@@ -2492,6 +2498,7 @@ static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt,
return rsm_enter_protected_mode(ctxt, cr0, cr3, cr4);
}
+#ifdef CONFIG_X86_64
static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
const char *smstate)
{
@@ -2554,6 +2561,7 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
return X86EMUL_CONTINUE;
}
+#endif
static int em_rsm(struct x86_emulate_ctxt *ctxt)
{
@@ -2621,9 +2629,11 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
if (ctxt->ops->pre_leave_smm(ctxt, buf))
return X86EMUL_UNHANDLEABLE;
+#ifdef CONFIG_X86_64
if (emulator_has_longmode(ctxt))
ret = rsm_load_state_64(ctxt, buf);
else
+#endif
ret = rsm_load_state_32(ctxt, buf);
if (ret != X86EMUL_CONTINUE) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 472bbbbe153a..f10fef561573 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7441,9 +7441,9 @@ static void enter_smm_save_state_32(struct kvm_vcpu *vcpu, char *buf)
put_smstate(u32, buf, 0x7ef8, vcpu->arch.smbase);
}
+#ifdef CONFIG_X86_64
static void enter_smm_save_state_64(struct kvm_vcpu *vcpu, char *buf)
{
-#ifdef CONFIG_X86_64
struct desc_ptr dt;
struct kvm_segment seg;
unsigned long val;
@@ -7493,10 +7493,8 @@ static void enter_smm_save_state_64(struct kvm_vcpu *vcpu, char *buf)
for (i = 0; i < 6; i++)
enter_smm_save_seg_64(vcpu, buf, i);
-#else
- WARN_ON_ONCE(1);
-#endif
}
+#endif
static void enter_smm(struct kvm_vcpu *vcpu)
{
@@ -7507,9 +7505,11 @@ static void enter_smm(struct kvm_vcpu *vcpu)
trace_kvm_enter_smm(vcpu->vcpu_id, vcpu->arch.smbase, true);
memset(buf, 0, 512);
+#ifdef CONFIG_X86_64
if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
enter_smm_save_state_64(vcpu, buf);
else
+#endif
enter_smm_save_state_32(vcpu, buf);
/*
@@ -7567,8 +7567,10 @@ static void enter_smm(struct kvm_vcpu *vcpu)
kvm_set_segment(vcpu, &ds, VCPU_SREG_GS);
kvm_set_segment(vcpu, &ds, VCPU_SREG_SS);
+#ifdef CONFIG_X86_64
if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
kvm_x86_ops->set_efer(vcpu, 0);
+#endif
kvm_update_cpuid(vcpu);
kvm_mmu_reset_context(vcpu);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b68f3cc7d978943fcf85148165b00594c38db776 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 2 Apr 2019 08:10:48 -0700
Subject: [PATCH] KVM: x86: Always use 32-bit SMRAM save state for 32-bit
kernels
Invoking the 64-bit variant on a 32-bit kernel will crash the guest,
trigger a WARN, and/or lead to a buffer overrun in the host, e.g.
rsm_load_state_64() writes r8-r15 unconditionally, but enum kvm_reg and
thus x86_emulate_ctxt._regs only define r8-r15 for CONFIG_X86_64.
KVM allows userspace to report long mode support via CPUID, even though
the guest is all but guaranteed to crash if it actually tries to enable
long mode. But, a pure 32-bit guest that is ignorant of long mode will
happily plod along.
SMM complicates things as 64-bit CPUs use a different SMRAM save state
area. KVM handles this correctly for 64-bit kernels, e.g. uses the
legacy save state map if userspace has hidden long mode from the guest,
but doesn't fare well when userspace reports long mode support on a
32-bit host kernel (32-bit KVM doesn't support 64-bit guests).
Since the alternative is to crash the guest, e.g. by not loading state
or explicitly requesting shutdown, unconditionally use the legacy SMRAM
save state map for 32-bit KVM. If a guest has managed to get far enough
to handle SMIs when running under a weird/buggy userspace hypervisor,
then don't deliberately crash the guest since there are no downsides
(from KVM's perspective) to allowing it to continue running.
Fixes: 660a5d517aaab ("KVM: x86: save/load state on SMM switch")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f3284827c432..d0d5dd44b4f4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2331,12 +2331,16 @@ static int em_lseg(struct x86_emulate_ctxt *ctxt)
static int emulator_has_longmode(struct x86_emulate_ctxt *ctxt)
{
+#ifdef CONFIG_X86_64
u32 eax, ebx, ecx, edx;
eax = 0x80000001;
ecx = 0;
ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, false);
return edx & bit(X86_FEATURE_LM);
+#else
+ return false;
+#endif
}
static void rsm_set_desc_flags(struct desc_struct *desc, u32 flags)
@@ -2372,6 +2376,7 @@ static int rsm_load_seg_32(struct x86_emulate_ctxt *ctxt, const char *smstate,
return X86EMUL_CONTINUE;
}
+#ifdef CONFIG_X86_64
static int rsm_load_seg_64(struct x86_emulate_ctxt *ctxt, const char *smstate,
int n)
{
@@ -2391,6 +2396,7 @@ static int rsm_load_seg_64(struct x86_emulate_ctxt *ctxt, const char *smstate,
ctxt->ops->set_segment(ctxt, selector, &desc, base3, n);
return X86EMUL_CONTINUE;
}
+#endif
static int rsm_enter_protected_mode(struct x86_emulate_ctxt *ctxt,
u64 cr0, u64 cr3, u64 cr4)
@@ -2492,6 +2498,7 @@ static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt,
return rsm_enter_protected_mode(ctxt, cr0, cr3, cr4);
}
+#ifdef CONFIG_X86_64
static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
const char *smstate)
{
@@ -2554,6 +2561,7 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
return X86EMUL_CONTINUE;
}
+#endif
static int em_rsm(struct x86_emulate_ctxt *ctxt)
{
@@ -2621,9 +2629,11 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
if (ctxt->ops->pre_leave_smm(ctxt, buf))
return X86EMUL_UNHANDLEABLE;
+#ifdef CONFIG_X86_64
if (emulator_has_longmode(ctxt))
ret = rsm_load_state_64(ctxt, buf);
else
+#endif
ret = rsm_load_state_32(ctxt, buf);
if (ret != X86EMUL_CONTINUE) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 472bbbbe153a..f10fef561573 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7441,9 +7441,9 @@ static void enter_smm_save_state_32(struct kvm_vcpu *vcpu, char *buf)
put_smstate(u32, buf, 0x7ef8, vcpu->arch.smbase);
}
+#ifdef CONFIG_X86_64
static void enter_smm_save_state_64(struct kvm_vcpu *vcpu, char *buf)
{
-#ifdef CONFIG_X86_64
struct desc_ptr dt;
struct kvm_segment seg;
unsigned long val;
@@ -7493,10 +7493,8 @@ static void enter_smm_save_state_64(struct kvm_vcpu *vcpu, char *buf)
for (i = 0; i < 6; i++)
enter_smm_save_seg_64(vcpu, buf, i);
-#else
- WARN_ON_ONCE(1);
-#endif
}
+#endif
static void enter_smm(struct kvm_vcpu *vcpu)
{
@@ -7507,9 +7505,11 @@ static void enter_smm(struct kvm_vcpu *vcpu)
trace_kvm_enter_smm(vcpu->vcpu_id, vcpu->arch.smbase, true);
memset(buf, 0, 512);
+#ifdef CONFIG_X86_64
if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
enter_smm_save_state_64(vcpu, buf);
else
+#endif
enter_smm_save_state_32(vcpu, buf);
/*
@@ -7567,8 +7567,10 @@ static void enter_smm(struct kvm_vcpu *vcpu)
kvm_set_segment(vcpu, &ds, VCPU_SREG_GS);
kvm_set_segment(vcpu, &ds, VCPU_SREG_SS);
+#ifdef CONFIG_X86_64
if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
kvm_x86_ops->set_efer(vcpu, 0);
+#endif
kvm_update_cpuid(vcpu);
kvm_mmu_reset_context(vcpu);
The patch below does not apply to the 5.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b68f3cc7d978943fcf85148165b00594c38db776 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 2 Apr 2019 08:10:48 -0700
Subject: [PATCH] KVM: x86: Always use 32-bit SMRAM save state for 32-bit
kernels
Invoking the 64-bit variant on a 32-bit kernel will crash the guest,
trigger a WARN, and/or lead to a buffer overrun in the host, e.g.
rsm_load_state_64() writes r8-r15 unconditionally, but enum kvm_reg and
thus x86_emulate_ctxt._regs only define r8-r15 for CONFIG_X86_64.
KVM allows userspace to report long mode support via CPUID, even though
the guest is all but guaranteed to crash if it actually tries to enable
long mode. But, a pure 32-bit guest that is ignorant of long mode will
happily plod along.
SMM complicates things as 64-bit CPUs use a different SMRAM save state
area. KVM handles this correctly for 64-bit kernels, e.g. uses the
legacy save state map if userspace has hidden long mode from the guest,
but doesn't fare well when userspace reports long mode support on a
32-bit host kernel (32-bit KVM doesn't support 64-bit guests).
Since the alternative is to crash the guest, e.g. by not loading state
or explicitly requesting shutdown, unconditionally use the legacy SMRAM
save state map for 32-bit KVM. If a guest has managed to get far enough
to handle SMIs when running under a weird/buggy userspace hypervisor,
then don't deliberately crash the guest since there are no downsides
(from KVM's perspective) to allowing it to continue running.
Fixes: 660a5d517aaab ("KVM: x86: save/load state on SMM switch")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f3284827c432..d0d5dd44b4f4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2331,12 +2331,16 @@ static int em_lseg(struct x86_emulate_ctxt *ctxt)
static int emulator_has_longmode(struct x86_emulate_ctxt *ctxt)
{
+#ifdef CONFIG_X86_64
u32 eax, ebx, ecx, edx;
eax = 0x80000001;
ecx = 0;
ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, false);
return edx & bit(X86_FEATURE_LM);
+#else
+ return false;
+#endif
}
static void rsm_set_desc_flags(struct desc_struct *desc, u32 flags)
@@ -2372,6 +2376,7 @@ static int rsm_load_seg_32(struct x86_emulate_ctxt *ctxt, const char *smstate,
return X86EMUL_CONTINUE;
}
+#ifdef CONFIG_X86_64
static int rsm_load_seg_64(struct x86_emulate_ctxt *ctxt, const char *smstate,
int n)
{
@@ -2391,6 +2396,7 @@ static int rsm_load_seg_64(struct x86_emulate_ctxt *ctxt, const char *smstate,
ctxt->ops->set_segment(ctxt, selector, &desc, base3, n);
return X86EMUL_CONTINUE;
}
+#endif
static int rsm_enter_protected_mode(struct x86_emulate_ctxt *ctxt,
u64 cr0, u64 cr3, u64 cr4)
@@ -2492,6 +2498,7 @@ static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt,
return rsm_enter_protected_mode(ctxt, cr0, cr3, cr4);
}
+#ifdef CONFIG_X86_64
static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
const char *smstate)
{
@@ -2554,6 +2561,7 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
return X86EMUL_CONTINUE;
}
+#endif
static int em_rsm(struct x86_emulate_ctxt *ctxt)
{
@@ -2621,9 +2629,11 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
if (ctxt->ops->pre_leave_smm(ctxt, buf))
return X86EMUL_UNHANDLEABLE;
+#ifdef CONFIG_X86_64
if (emulator_has_longmode(ctxt))
ret = rsm_load_state_64(ctxt, buf);
else
+#endif
ret = rsm_load_state_32(ctxt, buf);
if (ret != X86EMUL_CONTINUE) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 472bbbbe153a..f10fef561573 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7441,9 +7441,9 @@ static void enter_smm_save_state_32(struct kvm_vcpu *vcpu, char *buf)
put_smstate(u32, buf, 0x7ef8, vcpu->arch.smbase);
}
+#ifdef CONFIG_X86_64
static void enter_smm_save_state_64(struct kvm_vcpu *vcpu, char *buf)
{
-#ifdef CONFIG_X86_64
struct desc_ptr dt;
struct kvm_segment seg;
unsigned long val;
@@ -7493,10 +7493,8 @@ static void enter_smm_save_state_64(struct kvm_vcpu *vcpu, char *buf)
for (i = 0; i < 6; i++)
enter_smm_save_seg_64(vcpu, buf, i);
-#else
- WARN_ON_ONCE(1);
-#endif
}
+#endif
static void enter_smm(struct kvm_vcpu *vcpu)
{
@@ -7507,9 +7505,11 @@ static void enter_smm(struct kvm_vcpu *vcpu)
trace_kvm_enter_smm(vcpu->vcpu_id, vcpu->arch.smbase, true);
memset(buf, 0, 512);
+#ifdef CONFIG_X86_64
if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
enter_smm_save_state_64(vcpu, buf);
else
+#endif
enter_smm_save_state_32(vcpu, buf);
/*
@@ -7567,8 +7567,10 @@ static void enter_smm(struct kvm_vcpu *vcpu)
kvm_set_segment(vcpu, &ds, VCPU_SREG_GS);
kvm_set_segment(vcpu, &ds, VCPU_SREG_SS);
+#ifdef CONFIG_X86_64
if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
kvm_x86_ops->set_efer(vcpu, 0);
+#endif
kvm_update_cpuid(vcpu);
kvm_mmu_reset_context(vcpu);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b57a55e2200ede754e4dc9cce4ba9402544b9365 Mon Sep 17 00:00:00 2001
From: ZhangXiaoxu <zhangxiaoxu5(a)huawei.com>
Date: Sat, 6 Apr 2019 15:30:38 +0800
Subject: [PATCH] cifs: Fix lease buffer length error
There is a KASAN slab-out-of-bounds:
BUG: KASAN: slab-out-of-bounds in _copy_from_iter_full+0x783/0xaa0
Read of size 80 at addr ffff88810c35e180 by task mount.cifs/539
CPU: 1 PID: 539 Comm: mount.cifs Not tainted 4.19 #10
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0xdd/0x12a
print_address_description+0xa7/0x540
kasan_report+0x1ff/0x550
check_memory_region+0x2f1/0x310
memcpy+0x2f/0x80
_copy_from_iter_full+0x783/0xaa0
tcp_sendmsg_locked+0x1840/0x4140
tcp_sendmsg+0x37/0x60
inet_sendmsg+0x18c/0x490
sock_sendmsg+0xae/0x130
smb_send_kvec+0x29c/0x520
__smb_send_rqst+0x3ef/0xc60
smb_send_rqst+0x25a/0x2e0
compound_send_recv+0x9e8/0x2af0
cifs_send_recv+0x24/0x30
SMB2_open+0x35e/0x1620
open_shroot+0x27b/0x490
smb2_open_op_close+0x4e1/0x590
smb2_query_path_info+0x2ac/0x650
cifs_get_inode_info+0x1058/0x28f0
cifs_root_iget+0x3bb/0xf80
cifs_smb3_do_mount+0xe00/0x14c0
cifs_do_mount+0x15/0x20
mount_fs+0x5e/0x290
vfs_kern_mount+0x88/0x460
do_mount+0x398/0x31e0
ksys_mount+0xc6/0x150
__x64_sys_mount+0xea/0x190
do_syscall_64+0x122/0x590
entry_SYSCALL_64_after_hwframe+0x44/0xa9
It can be reproduced by the following steps:
1. samba configured with: server max protocol = SMB2_10
2. mount -o vers=default
When parsing the mount version parameter, 'ops' and 'vals' are set to
smb30. If the negotiated dialect is smb21, only 'ops' is updated to
smb21, while 'vals' still points to smb30.
When adding the lease context, iov_base is allocated using the smb21
ops, but iov_len is initialized from the smb30 values. Because iov_len
is longer than the iov_base allocation, sending the message copies
past the end of the array.
We need to keep 'ops' and 'vals' consistent.
Fixes: 9764c02fcbad ("SMB3: Add support for multidialect negotiate (SMB2.1 and later)")
Fixes: d5c7076b772a ("smb3: add smb3.1.1 to default dialect list")
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5(a)huawei.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
CC: Stable <stable(a)vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilov(a)microsoft.com>
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 5d6adc63ad62..b8f7262ac354 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -832,8 +832,11 @@ SMB2_negotiate(const unsigned int xid, struct cifs_ses *ses)
} else if (rsp->DialectRevision == cpu_to_le16(SMB21_PROT_ID)) {
/* ops set to 3.0 by default for default so update */
ses->server->ops = &smb21_operations;
- } else if (rsp->DialectRevision == cpu_to_le16(SMB311_PROT_ID))
+ ses->server->vals = &smb21_values;
+ } else if (rsp->DialectRevision == cpu_to_le16(SMB311_PROT_ID)) {
ses->server->ops = &smb311_operations;
+ ses->server->vals = &smb311_values;
+ }
} else if (le16_to_cpu(rsp->DialectRevision) !=
ses->server->vals->protocol_id) {
/* if requested single dialect ensure returned dialect matched */
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b57a55e2200ede754e4dc9cce4ba9402544b9365 Mon Sep 17 00:00:00 2001
From: ZhangXiaoxu <zhangxiaoxu5(a)huawei.com>
Date: Sat, 6 Apr 2019 15:30:38 +0800
Subject: [PATCH] cifs: Fix lease buffer length error
There is a KASAN slab-out-of-bounds:
BUG: KASAN: slab-out-of-bounds in _copy_from_iter_full+0x783/0xaa0
Read of size 80 at addr ffff88810c35e180 by task mount.cifs/539
CPU: 1 PID: 539 Comm: mount.cifs Not tainted 4.19 #10
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0xdd/0x12a
print_address_description+0xa7/0x540
kasan_report+0x1ff/0x550
check_memory_region+0x2f1/0x310
memcpy+0x2f/0x80
_copy_from_iter_full+0x783/0xaa0
tcp_sendmsg_locked+0x1840/0x4140
tcp_sendmsg+0x37/0x60
inet_sendmsg+0x18c/0x490
sock_sendmsg+0xae/0x130
smb_send_kvec+0x29c/0x520
__smb_send_rqst+0x3ef/0xc60
smb_send_rqst+0x25a/0x2e0
compound_send_recv+0x9e8/0x2af0
cifs_send_recv+0x24/0x30
SMB2_open+0x35e/0x1620
open_shroot+0x27b/0x490
smb2_open_op_close+0x4e1/0x590
smb2_query_path_info+0x2ac/0x650
cifs_get_inode_info+0x1058/0x28f0
cifs_root_iget+0x3bb/0xf80
cifs_smb3_do_mount+0xe00/0x14c0
cifs_do_mount+0x15/0x20
mount_fs+0x5e/0x290
vfs_kern_mount+0x88/0x460
do_mount+0x398/0x31e0
ksys_mount+0xc6/0x150
__x64_sys_mount+0xea/0x190
do_syscall_64+0x122/0x590
entry_SYSCALL_64_after_hwframe+0x44/0xa9
It can be reproduced by the following steps:
1. samba configured with: server max protocol = SMB2_10
2. mount -o vers=default
When parsing the mount version parameter, 'ops' and 'vals' are set to
smb30. If the negotiated dialect is smb21, only 'ops' is updated to
smb21, while 'vals' still points to smb30.
When adding the lease context, iov_base is allocated using the smb21
ops, but iov_len is initialized from the smb30 values. Because iov_len
is longer than the iov_base allocation, sending the message copies
past the end of the array.
We need to keep 'ops' and 'vals' consistent.
Fixes: 9764c02fcbad ("SMB3: Add support for multidialect negotiate (SMB2.1 and later)")
Fixes: d5c7076b772a ("smb3: add smb3.1.1 to default dialect list")
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5(a)huawei.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
CC: Stable <stable(a)vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilov(a)microsoft.com>
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 5d6adc63ad62..b8f7262ac354 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -832,8 +832,11 @@ SMB2_negotiate(const unsigned int xid, struct cifs_ses *ses)
} else if (rsp->DialectRevision == cpu_to_le16(SMB21_PROT_ID)) {
/* ops set to 3.0 by default for default so update */
ses->server->ops = &smb21_operations;
- } else if (rsp->DialectRevision == cpu_to_le16(SMB311_PROT_ID))
+ ses->server->vals = &smb21_values;
+ } else if (rsp->DialectRevision == cpu_to_le16(SMB311_PROT_ID)) {
ses->server->ops = &smb311_operations;
+ ses->server->vals = &smb311_values;
+ }
} else if (le16_to_cpu(rsp->DialectRevision) !=
ses->server->vals->protocol_id) {
/* if requested single dialect ensure returned dialect matched */
On 4/23/19 11:21 AM, Sasha Levin wrote:
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: 2dc702c991e3 HID: input: use the Resolution Multiplier for high-resolution scrolling.
>
> The bot has tested the following trees: v5.0.9.
>
> v5.0.9: Failed to apply! Possible dependencies:
> 724f54bc0063 ("HID: input: make sure the wheel high resolution multiplier is set")
>
>
> How should we proceed with this patch?
>
> --
> Thanks,
> Sasha
>
This patch will only apply *after* the application of the first patch, since they "overlap".
Yes, that is probably not the best circumstance. I think Benjamin wanted to keep the issue in patch 2/2 distinct.
Was the first patch applied before attempting application of the second patch?
James
From: Chen-Yu Tsai <wens(a)csie.org>
The code path for Macs goes through bcm_apple_get_resources(), which
skips over the code that sets up the regulator supplies. As a result,
the call to regulator_bulk_enable() / regulator_bulk_disable() results
in a NULL pointer dereference.
This was reported on the kernel.org Bugzilla, bug 202963.
Unbreak Broadcom Bluetooth support on Intel Macs by checking if the
supplies were set up before enabling or disabling them.
The same does not need to be done for the clocks, as the common clock
framework API checks for NULL pointers.
Fixes: 75d11676dccb ("Bluetooth: hci_bcm: Add support for regulator supplies")
Cc: <stable(a)vger.kernel.org> # 5.0.x
Signed-off-by: Chen-Yu Tsai <wens(a)csie.org>
---
I do not own a Mac, so this needs to be tested by someone else.
---
drivers/bluetooth/hci_bcm.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/drivers/bluetooth/hci_bcm.c b/drivers/bluetooth/hci_bcm.c
index ddbe518c3e5b..b5d31d583d60 100644
--- a/drivers/bluetooth/hci_bcm.c
+++ b/drivers/bluetooth/hci_bcm.c
@@ -228,9 +228,15 @@ static int bcm_gpio_set_power(struct bcm_device *dev, bool powered)
int err;
if (powered && !dev->res_enabled) {
- err = regulator_bulk_enable(BCM_NUM_SUPPLIES, dev->supplies);
- if (err)
- return err;
+ /* Intel Macs use bcm_apple_get_resources() and don't
+ * have regulator supplies configured.
+ */
+ if (dev->supplies[0].supply) {
+ err = regulator_bulk_enable(BCM_NUM_SUPPLIES,
+ dev->supplies);
+ if (err)
+ return err;
+ }
/* LPO clock needs to be 32.768 kHz */
err = clk_set_rate(dev->lpo_clk, 32768);
@@ -259,7 +265,13 @@ static int bcm_gpio_set_power(struct bcm_device *dev, bool powered)
if (!powered && dev->res_enabled) {
clk_disable_unprepare(dev->txco_clk);
clk_disable_unprepare(dev->lpo_clk);
- regulator_bulk_disable(BCM_NUM_SUPPLIES, dev->supplies);
+
+ /* Intel Macs use bcm_apple_get_resources() and don't
+ * have regulator supplies configured.
+ */
+ if (dev->supplies[0].supply)
+ regulator_bulk_disable(BCM_NUM_SUPPLIES,
+ dev->supplies);
}
/* wait for device to power on and come out of reset */
--
2.20.1
On 04/23, Greg KH wrote:
>On Mon, Apr 22, 2019 at 10:59:50PM +0800, Zhenliang Wei wrote:
>> In the fixes commit, removing SIGKILL from each thread signal mask and
>> executing "goto fatal" directly will skip the call to
>> "trace_signal_deliver". At this point, the delivery tracking of the
>> SIGKILL signal will be inaccurate.
>>
>> Therefore, we need to add trace_signal_deliver before "goto fatal"
>> after executing sigdelset.
>>
>> Note: The action[SIGKILL] must be SIG_DFL, and SEND_SIG_NOINFO matches
>> the fact that SIGKILL doesn't have any info.
>>
>> Acked-by: Christian Brauner <christian(a)brauner.io>
>> Fixes: cf43a757fd4944 ("signal: Restore the stop PTRACE_EVENT_EXIT")
>> Signed-off-by: Zhenliang Wei <weizhenliang(a)huawei.com>
>> ---
>> kernel/signal.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 227ba170298e..0f69ada376ef 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -2441,6 +2441,7 @@ bool get_signal(struct ksignal *ksig)
>> if (signal_group_exit(signal)) {
>> ksig->info.si_signo = signr = SIGKILL;
>> sigdelset(&current->pending.signal, SIGKILL);
>> + trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO, SIG_DFL);
>> recalc_sigpending();
>> goto fatal;
>> }
>> --
>> 2.14.1.windows.1
>>
>>
>
><formletter>
>
>This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read:
> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
>for how to do this properly.
>
></formletter>
Thanks for your reply. Does that mean I still need to add the tag
Cc: stable(a)vger.kernel.org
if I want to submit this patch to the stable kernel tree?
Is my understanding correct, or am I missing something?
Wei
On 04/23, Oleg wrote:
>On 04/23, weizhenliang wrote:
>>
>> Last time Oleg suggested using SIG_DFL as the third parameter, but its type is 'void (*)(int)', not the expected 'struct k_sigaction *'.
>
>Yes I misread the signature of TRACE_EVENT(signal_deliver), and I thought you at least compiled the kernel with your patch applied ;)
>
>> How about
>> trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO, &sighand->action[signr
>> - 1]); ?
>
>sure, this should fix the problem.
Sorry about that, I will pay more attention to it in the future.
And thanks for your reply, I will re-adjust the patch later.
Wei.
On Tue, Apr 23, 2019 at 09:41 PM Christian wrote:
>On Tue, Apr 23, 2019 at 01:10:52PM +0000, weizhenliang wrote:
>> On Mon, Apr 22, 2019 at 11:25 PM Oleg Nesterov <oleg(a)redhat.com> wrote:
>> >On 04/22, Zhenliang Wei wrote:
>> >>
>> >> --- a/kernel/signal.c
>> >> +++ b/kernel/signal.c
>> >> @@ -2441,6 +2441,7 @@ bool get_signal(struct ksignal *ksig)
>> >> if (signal_group_exit(signal)) {
>> >> ksig->info.si_signo = signr = SIGKILL;
>> >> sigdelset(&current->pending.signal, SIGKILL);
>> >> + trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO, SIG_DFL);
>> >> recalc_sigpending();
>> >> goto fatal;
>> >> }
>> >
>> >Reviewed-by: Oleg Nesterov <oleg(a)redhat.com>
>>
>> Last time Oleg suggested using SIG_DFL as the third parameter, but its type is 'void (*)(int)', not the expected 'struct k_sigaction *'.
>
>Sigh, I should've caught that in the first commit.
>Although it suggests you didn't even compile your patch...
>
>>
>> How about
>> trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO, &sighand->action[signr
>> - 1]); ?
>
>That should work, yes.
Sorry about that, I was too focused on the functionality and didn't notice the compile warning.
And thanks for your reply, I will pay more attention to it in the future.
Zhenliang Wei
On Mon, Apr 22, 2019 at 11:25 PM Oleg Nesterov <oleg(a)redhat.com> wrote:
>On 04/22, Zhenliang Wei wrote:
>>
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -2441,6 +2441,7 @@ bool get_signal(struct ksignal *ksig)
>> if (signal_group_exit(signal)) {
>> ksig->info.si_signo = signr = SIGKILL;
>> sigdelset(&current->pending.signal, SIGKILL);
>> + trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO, SIG_DFL);
>> recalc_sigpending();
>> goto fatal;
>> }
>
>Reviewed-by: Oleg Nesterov <oleg(a)redhat.com>
Last time Oleg suggested using SIG_DFL as the third parameter, but its type is 'void (*)(int)', not the expected 'struct k_sigaction *'.
How about
trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO, &sighand->action[signr - 1]);
?
Although we know that action[SIGKILL] must be SIG_DFL, to match the parameter type I suggest using the original variable value.
Zhenliang Wei.
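For reference, here is a sketch of how the hunk would look with that
argument, assuming the sighand local that get_signal() already uses;
this only illustrates the suggestion above and is not a tested patch.

	if (signal_group_exit(signal)) {
		ksig->info.si_signo = signr = SIGKILL;
		sigdelset(&current->pending.signal, SIGKILL);
		trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
				     &sighand->action[signr - 1]);
		recalc_sigpending();
		goto fatal;
	}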
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 442601e87a4769a8daba4976ec3afa5222ca211d Mon Sep 17 00:00:00 2001
From: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Date: Fri, 8 Feb 2019 18:30:59 +0200
Subject: [PATCH] tpm/tpm_i2c_atmel: Return -E2BIG when the transfer is
incomplete
Return -E2BIG when the transfer is incomplete. The upper layer does
not retry, so not doing that is incorrect behaviour.
Cc: stable(a)vger.kernel.org
Fixes: a2871c62e186 ("tpm: Add support for Atmel I2C TPMs")
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Reviewed-by: Stefan Berger <stefanb(a)linux.ibm.com>
Reviewed-by: Jerry Snitselaar <jsnitsel(a)redhat.com>
diff --git a/drivers/char/tpm/tpm_i2c_atmel.c b/drivers/char/tpm/tpm_i2c_atmel.c
index 32a8e27c5382..cc4e642d3180 100644
--- a/drivers/char/tpm/tpm_i2c_atmel.c
+++ b/drivers/char/tpm/tpm_i2c_atmel.c
@@ -69,6 +69,10 @@ static int i2c_atmel_send(struct tpm_chip *chip, u8 *buf, size_t len)
if (status < 0)
return status;
+ /* The upper layer does not support incomplete sends. */
+ if (status != len)
+ return -E2BIG;
+
return 0;
}
The patch below does not apply to the 5.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 442601e87a4769a8daba4976ec3afa5222ca211d Mon Sep 17 00:00:00 2001
From: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Date: Fri, 8 Feb 2019 18:30:59 +0200
Subject: [PATCH] tpm/tpm_i2c_atmel: Return -E2BIG when the transfer is
incomplete
Return -E2BIG when the transfer is incomplete. The upper layer does
not retry, so not doing that is incorrect behaviour.
Cc: stable(a)vger.kernel.org
Fixes: a2871c62e186 ("tpm: Add support for Atmel I2C TPMs")
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Reviewed-by: Stefan Berger <stefanb(a)linux.ibm.com>
Reviewed-by: Jerry Snitselaar <jsnitsel(a)redhat.com>
diff --git a/drivers/char/tpm/tpm_i2c_atmel.c b/drivers/char/tpm/tpm_i2c_atmel.c
index 32a8e27c5382..cc4e642d3180 100644
--- a/drivers/char/tpm/tpm_i2c_atmel.c
+++ b/drivers/char/tpm/tpm_i2c_atmel.c
@@ -69,6 +69,10 @@ static int i2c_atmel_send(struct tpm_chip *chip, u8 *buf, size_t len)
if (status < 0)
return status;
+ /* The upper layer does not support incomplete sends. */
+ if (status != len)
+ return -E2BIG;
+
return 0;
}
In the event that the start address of the initrd is not aligned, but
has an aligned size, the base + size will not cover the entire initrd
image and there is a chance that the kernel will corrupt the tail of the
image.
By aligning the end of the initrd up to a page boundary and then
subtracting the page-aligned start address, the memblock reservation
will cover all pages that contain the initrd.
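As a worked example with hypothetical values and 4 KiB pages: if
phys_initrd_start = 0x48000800 and phys_initrd_size = 0x2000, the old
code reserves base = 0x48000000 and size = PAGE_ALIGN(0x2000) = 0x2000,
i.e. only up to 0x48002000, while the image actually ends at 0x48002800.
With this change, size = PAGE_ALIGN(0x48000800 + 0x2000) - 0x48000000 =
0x3000, so the reservation extends to 0x48003000 and covers the whole
image.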
Fixes: c756c592e442 ("arm64: Utilize phys_initrd_start/phys_initrd_size")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bjorn Andersson <bjorn.andersson(a)linaro.org>
---
arch/arm64/mm/init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 6bc135042f5e..7cae155e81a5 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -363,7 +363,7 @@ void __init arm64_memblock_init(void)
* Otherwise, this is a no-op
*/
u64 base = phys_initrd_start & PAGE_MASK;
- u64 size = PAGE_ALIGN(phys_initrd_size);
+ u64 size = PAGE_ALIGN(phys_initrd_start + phys_initrd_size) - base;
/*
* We can only add back the initrd memory if we don't end up
--
2.18.0
From: Emmanuel Grumbach <emmanuel.grumbach(a)intel.com>
We introduced a bug that prevented this old device from
working. The driver would simply not be able to complete
the INIT flow while spewing this warning:
CSR addresses aren't configured
WARNING: CPU: 0 PID: 819 at drivers/net/wireless/intel/iwlwifi/pcie/drv.c:917
iwl_pci_probe+0x160/0x1e0 [iwlwifi]
Cc: stable(a)vger.kernel.org # v4.18+
Fixes: a8cbb46f831d ("iwlwifi: allow different csr flags for different device families")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach(a)intel.com>
Fixes: c8f1b51e506d ("iwlwifi: allow different csr flags for different device families")
Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com>
---
drivers/net/wireless/intel/iwlwifi/cfg/5000.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/cfg/5000.c b/drivers/net/wireless/intel/iwlwifi/cfg/5000.c
index 575a7022d045..3846064d51a5 100644
--- a/drivers/net/wireless/intel/iwlwifi/cfg/5000.c
+++ b/drivers/net/wireless/intel/iwlwifi/cfg/5000.c
@@ -1,7 +1,7 @@
/******************************************************************************
*
* Copyright(c) 2007 - 2014 Intel Corporation. All rights reserved.
- * Copyright(c) 2018 Intel Corporation
+ * Copyright(c) 2018 - 2019 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of version 2 of the GNU General Public License as
@@ -136,6 +136,7 @@ const struct iwl_cfg iwl5350_agn_cfg = {
.ht_params = &iwl5000_ht_params,
.led_mode = IWL_LED_BLINK,
.internal_wimax_coex = true,
+ .csr = &iwl_csr_v1,
};
#define IWL_DEVICE_5150 \
--
2.20.1
Current journal_max_cmp() and journal_min_cmp() assume that a smaller
fifo index indicates an older journal entry, but this is only true
while the fifo index has not wrapped.
The fifo structure journal.pin is implemented as a circular buffer;
when the head index reaches the highest location of the buffer, it
wraps around to 0. Once the wrap happens, a smaller fifo index may be
associated with a newer journal entry, so the btree node with the
oldest journal entry won't be selected by btree_flush_write() to be
flushed out to the cache device. As a result, the oldest journal
entries may never get written to the cache device, and after a reboot
bch_journal_replay() may complain that some journal entries are missing.
This patch handles the fifo index wrap condition properly, so that in
btree_flush_write() the btree node with the oldest journal entry can be
selected from c->flush_btree correctly.
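As a standalone illustration of the wrapped case (a minimal sketch that
mirrors the new comparison logic with made-up index values, not bcache
code itself):

#include <stdbool.h>
#include <stdio.h>

/* true means l refers to an older journal entry than r, given the
 * fifo front/back indices; mirrors the new journal_max_cmp() logic */
static bool cmp_older(int l_idx, int r_idx, int f_idx, int b_idx)
{
	bool ret = (l_idx < r_idx);	/* fine while the fifo has not wrapped */

	if (b_idx < f_idx) {		/* back index wrapped past the buffer end */
		if (l_idx <= b_idx && r_idx >= f_idx)
			ret = false;	/* l sits in the wrapped (newer) part */
		else if (l_idx >= f_idx && r_idx <= b_idx)
			ret = true;	/* r sits in the wrapped (newer) part */
	}
	return ret;
}

int main(void)
{
	/* front = 6, back = 2: age order of valid slots is 6, 7, 0, 1, 2 */
	printf("%d\n", cmp_older(7, 1, 6, 2));	/* 1: slot 7 is older than slot 1 */
	printf("%d\n", cmp_older(1, 7, 6, 2));	/* 0 */
	return 0;
}

Without the wrap handling, the plain l_idx < r_idx test would report
slot 1 as older than slot 7, which is exactly how btree_flush_write()
ends up picking the wrong node.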
Cc: stable(a)vger.kernel.org
Signed-off-by: Coly Li <colyli(a)suse.de>
---
drivers/md/bcache/journal.c | 47 +++++++++++++++++++++++++++++++++++++++------
1 file changed, 41 insertions(+), 6 deletions(-)
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index bdb6f9cefe48..bc0e01151155 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -464,12 +464,47 @@ int bch_journal_replay(struct cache_set *s, struct list_head *list)
}
/* Journalling */
-#define journal_max_cmp(l, r) \
- (fifo_idx(&c->journal.pin, btree_current_write(l)->journal) < \
- fifo_idx(&(c)->journal.pin, btree_current_write(r)->journal))
-#define journal_min_cmp(l, r) \
- (fifo_idx(&c->journal.pin, btree_current_write(l)->journal) > \
- fifo_idx(&(c)->journal.pin, btree_current_write(r)->journal))
+#define journal_max_cmp(l, r) \
+({ \
+ int l_idx, r_idx, f_idx, b_idx; \
+ bool _ret = true; \
+ \
+ l_idx = fifo_idx(&c->journal.pin, btree_current_write(l)->journal); \
+ r_idx = fifo_idx(&c->journal.pin, btree_current_write(r)->journal); \
+ f_idx = c->journal.pin.front; \
+ b_idx = c->journal.pin.back; \
+ \
+ _ret = (l_idx < r_idx); \
+ /* in case fifo back pointer is swapped */ \
+ if (b_idx < f_idx) { \
+ if (l_idx <= b_idx && r_idx >= f_idx) \
+ _ret = false; \
+ else if (l_idx >= f_idx && r_idx <= b_idx) \
+ _ret = true; \
+ } \
+ _ret; \
+})
+
+#define journal_min_cmp(l, r) \
+({ \
+ int l_idx, r_idx, f_idx, b_idx; \
+ bool _ret = true; \
+ \
+ l_idx = fifo_idx(&c->journal.pin, btree_current_write(l)->journal); \
+ r_idx = fifo_idx(&c->journal.pin, btree_current_write(r)->journal); \
+ f_idx = c->journal.pin.front; \
+ b_idx = c->journal.pin.back; \
+ \
+ _ret = (l_idx > r_idx); \
+ /* in case fifo back pointer is swapped */ \
+ if (b_idx < f_idx) { \
+ if (l_idx <= b_idx && r_idx >= f_idx) \
+ _ret = true; \
+ else if (l_idx >= f_idx && r_idx <= b_idx) \
+ _ret = false; \
+ } \
+ _ret; \
+})
static void btree_flush_write(struct cache_set *c)
{
--
2.16.4
When the bcache journal is initialized on a running cache set, the
cache set's journal.blocks_free starts out as 0. Then, during journal
replay, if journal_meta() is called and an empty jset is written to the
cache device, journal_reclaim() is called. If there is a journal bucket
available to reclaim, c->journal.blocks_free is set to the number of
blocks in a journal bucket, which is c->sb.bucket_size >> c->block_bits.
Most of the time the above process works correctly, except when the
journal space is almost full. "Almost full" means there is no free
journal bucket, but there are still free blocks in the last available
bucket, indexed by ja->cur_idx.
If the system crashes or reboots while the journal space is almost
full, a problem arises. When the cache set is reloaded after the
reboot, c->journal.blocks_free is initialized to 0. When the journal
replay process then writes the bcache journal, journal_reclaim() is
called to reclaim an available journal bucket and set
c->journal.blocks_free to c->sb.bucket_size >> c->block_bits. But there
is no fully free bucket to reclaim in journal_reclaim(), so
c->journal.blocks_free stays 0. If the first journal entry processed by
journal_replay() causes a btree split and requires writing to the
journal via journal_meta(), journal_meta() goes into an infinite loop
trying to reclaim a journal bucket, and the whole cache set is blocked
from running.
This buggy situation can be avoided if we do the following before
journal replay starts,
- Recover the value c->journal.blocks_free had in the previous run, and
use it as the initial value of the current c->journal.blocks_free.
- Recover the value ja->cur_idx had in the previous run, and use it as
the initial KEY_PTR of the current c->journal.key.
After c->journal.blocks_free and c->journal.key are recovered, when the
journal space is almost full and the cache set is reloaded, meta
journal entries from journal replay can be written into the free blocks
of the last available journal bucket, and then old journal entries can
be replayed and reclaimed for further journaling requests.
This patch adds bch_journal_key_reload() to recover the journal
blocks_free and key ptr values for the above purpose.
bch_journal_key_reload() is called in bch_journal_read() before the
journal is replayed by bch_journal_replay().
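As a rough numeric illustration with made-up values: if a journal
bucket holds c->sb.bucket_size >> c->block_bits = 128 blocks and
bch_journal_key_reload() finds 100 blocks already used in the bucket at
ja->cur_idx, it recovers c->journal.blocks_free = 128 - 100 = 28, so
journal replay can still write meta journal entries into those 28 free
blocks instead of spinning in journal_reclaim().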
Cc: stable(a)vger.kernel.org
Signed-off-by: Coly Li <colyli(a)suse.de>
---
drivers/md/bcache/journal.c | 87 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 87 insertions(+)
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 5180bed911ef..a6deb16c15c8 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -143,6 +143,89 @@ reread: left = ca->sb.bucket_size - offset;
return ret;
}
+static int bch_journal_key_reload(struct cache_set *c)
+{
+ struct cache *ca;
+ unsigned int iter, n = 0;
+ struct bkey *k = &c->journal.key;
+ int ret = 0;
+
+ for_each_cache(ca, c, iter) {
+ struct journal_device *ja = &ca->journal;
+ struct bio *bio = &ja->bio;
+ struct jset *j, *data = c->journal.w[0].data;
+ struct closure cl;
+ unsigned int len, left;
+ unsigned int offset = 0, used_blocks = 0;
+ sector_t bucket = bucket_to_sector(c, ca->sb.d[ja->cur_idx]);
+
+ closure_init_stack(&cl);
+
+ while (offset < ca->sb.bucket_size) {
+reread: left = ca->sb.bucket_size - offset;
+ len = min_t(unsigned int,
+ left, PAGE_SECTORS << JSET_BITS);
+
+ bio_reset(bio);
+ bio->bi_iter.bi_sector = bucket + offset;
+ bio_set_dev(bio, ca->bdev);
+ bio->bi_iter.bi_size = len << 9;
+
+ bio->bi_end_io = journal_read_endio;
+ bio->bi_private = &cl;
+ bio_set_op_attrs(bio, REQ_OP_READ, 0);
+ bch_bio_map(bio, data);
+
+ closure_bio_submit(c, bio, &cl);
+ closure_sync(&cl);
+
+ j = data;
+ while (len) {
+ size_t blocks, bytes = set_bytes(j);
+
+ if (j->magic != jset_magic(&ca->sb))
+ goto out;
+
+ if (bytes > left << 9 ||
+ bytes > PAGE_SIZE << JSET_BITS) {
+ pr_err("jset may be correpted: too big");
+ ret = -EIO;
+ goto err;
+ }
+
+ if (bytes > len << 9)
+ goto reread;
+
+ if (j->csum != csum_set(j)) {
+ pr_err("jset may be corrupted: bad csum");
+ ret = -EIO;
+ goto err;
+ }
+
+ blocks = set_blocks(j, block_bytes(c));
+ used_blocks += blocks;
+
+ offset += blocks * ca->sb.block_size;
+ len -= blocks * ca->sb.block_size;
+ j = ((void *) j) + blocks * block_bytes(ca);
+ }
+ }
+out:
+ c->journal.blocks_free =
+ (c->sb.bucket_size >> c->block_bits) -
+ used_blocks;
+
+ k->ptr[n++] = MAKE_PTR(0, bucket, ca->sb.nr_this_dev);
+ }
+
+ BUG_ON(n == 0);
+ bkey_init(k);
+ SET_KEY_PTRS(k, n);
+
+err:
+ return ret;
+}
+
int bch_journal_read(struct cache_set *c, struct list_head *list)
{
#define read_bucket(b) \
@@ -268,6 +351,10 @@ int bch_journal_read(struct cache_set *c, struct list_head *list)
struct journal_replay,
list)->j.seq;
+ /* Initial value of c->journal.blocks_free should be 0 */
+ BUG_ON(c->journal.blocks_free != 0);
+ ret = bch_journal_key_reload(c);
+
return ret;
#undef read_bucket
}
--
2.16.4
In journal_reclaim(), ja->cur_idx of each cache is updated to reclaim
available journal buckets. The variable 'int n' counts how many caches
are successfully reclaimed, and then n is set as the number of key
pointers of c->journal.key via SET_KEY_PTRS(). Later, in
journal_write_unlocked(), a for_each_cache() loop writes the jset data
onto each cache.
The problem is that if all journal buckets on every cache are full, the
following code in journal_reclaim(),
529 for_each_cache(ca, c, iter) {
530 struct journal_device *ja = &ca->journal;
531 unsigned int next = (ja->cur_idx + 1) % ca->sb.njournal_buckets;
532
533 /* No space available on this device */
534 if (next == ja->discard_idx)
535 continue;
536
537 ja->cur_idx = next;
538 k->ptr[n++] = MAKE_PTR(0,
539 bucket_to_sector(c, ca->sb.d[ja->cur_idx]),
540 ca->sb.nr_this_dev);
541 }
542
543 bkey_init(k);
544 SET_KEY_PTRS(k, n);
If there is no bucket available to reclaim, the if() condition at line
534 is always true and n remains 0. Then, at line 544, SET_KEY_PTRS()
sets the KEY_PTRS field of c->journal.key to 0.
Setting the KEY_PTRS field of c->journal.key to 0 is wrong, because in
journal_write_unlocked() the journal data is written in the following loop,
649 for (i = 0; i < KEY_PTRS(k); i++) {
650-671 submit journal data to cache device
672 }
If the KEY_PTRS field is set to 0 in journal_reclaim(), the journal
data won't be written to the cache device here. If the system crashes
or reboots before the bkeys of the lost journal entries are written
into btree nodes, data corruption will be reported when bcache is
reloaded after the reboot.
Indeed there is only one cache in a cache set, so there is no need to
set the KEY_PTRS field in journal_reclaim() at all. But in order to
keep the for_each_cache() logic consistent for now, this patch fixes
the above problem by not setting KEY_PTRS of the journal key to 0 when
there is no bucket available to reclaim.
Cc: stable(a)vger.kernel.org
Signed-off-by: Coly Li <colyli(a)suse.de>
---
drivers/md/bcache/journal.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 6e18057d1d82..5180bed911ef 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -541,11 +541,11 @@ static void journal_reclaim(struct cache_set *c)
ca->sb.nr_this_dev);
}
- bkey_init(k);
- SET_KEY_PTRS(k, n);
-
- if (n)
+ if (n) {
+ bkey_init(k);
+ SET_KEY_PTRS(k, n);
c->journal.blocks_free = c->sb.bucket_size >> c->block_bits;
+ }
out:
if (!journal_full(&c->journal))
__closure_wake_up(&c->journal.wait);
@@ -672,6 +672,9 @@ static void journal_write_unlocked(struct closure *cl)
ca->journal.seq[ca->journal.cur_idx] = w->data->seq;
}
+ /* If KEY_PTRS(k) == 0, this jset gets lost in air */
+ BUG_ON(i == 0);
+
atomic_dec_bug(&fifo_back(&c->journal.pin));
bch_journal_next(&c->journal);
journal_reclaim(c);
--
2.16.4
For cyclic DMA, a residue of 0 is not an indication of a completed
transfer. In the cyclic case, make sure that dma_set_residue() is still
called, so that a residue of 0 is forwarded correctly to the caller.
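For context, a minimal sketch of how a dmaengine client typically polls
such a transfer (the chan and cookie variables are hypothetical; this
is not part of the patch):

	struct dma_tx_state state;
	enum dma_status status;

	/*
	 * For a cyclic transfer a residue of 0 does not mean the DMA has
	 * finished, so the driver must keep reporting the residue instead
	 * of returning DMA_COMPLETE.
	 */
	status = dmaengine_tx_status(chan, cookie, &state);
	if (status != DMA_COMPLETE)
		pr_debug("cyclic DMA still active, residue %u\n", state.residue);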
Fixes: 3544d2878817 ("dmaengine: rcar-dmac: use result of updated get_residue in tx_status")
Signed-off-by: Dirk Behme <dirk.behme(a)de.bosch.com>
Signed-off-by: Achim Dahlhoff <Achim.Dahlhoff(a)de.bosch.com>
Signed-off-by: Hiroyuki Yokoyama <hiroyuki.yokoyama.vx(a)renesas.com>
Signed-off-by: Yao Lihua <ylhuajnu(a)outlook.com>
Cc: <stable(a)vger.kernel.org> # v4.8+
---
Note: Patch done against mainline v5.0
Changes in v2: None
Changes in v3: Moved the read of rchan->desc.running under the spin lock protection.
drivers/dma/sh/rcar-dmac.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/dma/sh/rcar-dmac.c b/drivers/dma/sh/rcar-dmac.c
index 2b4f25698169..54810ffd95e2 100644
--- a/drivers/dma/sh/rcar-dmac.c
+++ b/drivers/dma/sh/rcar-dmac.c
@@ -1368,6 +1368,7 @@ static enum dma_status rcar_dmac_tx_status(struct dma_chan *chan,
enum dma_status status;
unsigned long flags;
unsigned int residue;
+ bool cyclic;
status = dma_cookie_status(chan, cookie, txstate);
if (status == DMA_COMPLETE || !txstate)
@@ -1375,10 +1376,11 @@ static enum dma_status rcar_dmac_tx_status(struct dma_chan *chan,
spin_lock_irqsave(&rchan->lock, flags);
residue = rcar_dmac_chan_get_residue(rchan, cookie);
+ cyclic = rchan->desc.running ? rchan->desc.running->cyclic : false;
spin_unlock_irqrestore(&rchan->lock, flags);
/* if there's no residue, the cookie is complete */
- if (!residue)
+ if (!residue && !cyclic)
return DMA_COMPLETE;
dma_set_residue(txstate, residue);
--
2.20.0
In the fixes commit, removing SIGKILL from each thread signal mask
and executing "goto fatal" directly will skip the call to
"trace_signal_deliver". At this point, the delivery tracking of the SIGKILL
signal will be inaccurate.
Therefore, we need to add trace_signal_deliver before "goto fatal"
after executing sigdelset.
Note: The action[SIGKILL] must be SIG_DFL, and SEND_SIG_NOINFO matches
the fact that SIGKILL doesn't have any info.
Acked-by: Christian Brauner <christian(a)brauner.io>
Fixes: cf43a757fd4944 ("signal: Restore the stop PTRACE_EVENT_EXIT")
Signed-off-by: Zhenliang Wei <weizhenliang(a)huawei.com>
---
kernel/signal.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/signal.c b/kernel/signal.c
index 227ba170298e..0f69ada376ef 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2441,6 +2441,7 @@ bool get_signal(struct ksignal *ksig)
if (signal_group_exit(signal)) {
ksig->info.si_signo = signr = SIGKILL;
sigdelset(&current->pending.signal, SIGKILL);
+ trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO, SIG_DFL);
recalc_sigpending();
goto fatal;
}
--
2.14.1.windows.1
The patch titled
Subject: signal: trace_signal_deliver when signal_group_exit
has been removed from the -mm tree. Its filename was
signal-trace_signal_deliver-when-signal_group_exit.patch
This patch was dropped because it had testing failures
------------------------------------------------------
From: Zhenliang Wei <weizhenliang(a)huawei.com>
Subject: signal: trace_signal_deliver when signal_group_exit
In the fixes commit, removing SIGKILL from each thread signal mask and
executing "goto fatal" directly will skip the call to
"trace_signal_deliver". At this point, the delivery tracking of the
SIGKILL signal will be inaccurate.
Therefore, we need to add trace_signal_deliver before "goto fatal" after
executing sigdelset.
Note: The action[SIGKILL] must be SIG_DFL, and SEND_SIG_NOINFO matches the
fact that SIGKILL doesn't have any info.
Link: http://lkml.kernel.org/r/20190422145950.78056-1-weizhenliang@huawei.com
Fixes: cf43a757fd4944 ("signal: Restore the stop PTRACE_EVENT_EXIT")
Signed-off-by: Zhenliang Wei <weizhenliang(a)huawei.com>
Acked-by: Christian Brauner <christian(a)brauner.io>
Reviewed-by: Oleg Nesterov <oleg(a)redhat.com>
Cc: Ivan Delalande <colona(a)arista.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Deepa Dinamani <deepa.kernel(a)gmail.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/signal.c | 1 +
1 file changed, 1 insertion(+)
--- a/kernel/signal.c~signal-trace_signal_deliver-when-signal_group_exit
+++ a/kernel/signal.c
@@ -2441,6 +2441,7 @@ relock:
if (signal_group_exit(signal)) {
ksig->info.si_signo = signr = SIGKILL;
sigdelset(&current->pending.signal, SIGKILL);
+ trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO, SIG_DFL);
recalc_sigpending();
goto fatal;
}
_
Patches currently in -mm which might be from weizhenliang(a)huawei.com are
Currently, in the case of a mixed slow bus topology where all i2c
devices support FM+ speed, the i3c subsystem limits the SCL to FM
speed. Also, in mixed slow bus mode the maximum speed for both i2c and
i3c transfers is FM or FM+.
This patch fixes the definition of the i2c and i3c scl rates based on
the bus topology and LVR[4] when there is no user input.
In mixed slow mode the i3c scl rate is overridden.
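For reference, the defaults selected by the hunk below when the user
provides no rates can be summarized as:

  I3C_BUS_MODE_PURE:        i3c scl = I3C_BUS_TYP_I3C_SCL_RATE
  I3C_BUS_MODE_MIXED_FAST:  i3c scl = I3C_BUS_TYP_I3C_SCL_RATE,
                            i2c scl = I3C_BUS_I2C_FM_PLUS_SCL_RATE, or
                            I3C_BUS_I2C_FM_SCL_RATE if any i2c device's
                            LVR has I3C_LVR_I2C_FM_MODE set
  I3C_BUS_MODE_MIXED_SLOW:  i2c scl chosen as above, i3c scl forced to
                            the i2c rate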
Fixes: 3a379bbcea0a ("i3c: Add core I3C infrastructure")
Signed-off-by: Vitor Soares <vitor.soares(a)synopsys.com>
Cc: Boris Brezillon <bbrezillon(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Cc: <linux-kernel(a)vger.kernel.org>
---
drivers/i3c/master.c | 39 +++++++++++++++++++++++++--------------
1 file changed, 25 insertions(+), 14 deletions(-)
diff --git a/drivers/i3c/master.c b/drivers/i3c/master.c
index 909c2ad..1c4a86a 100644
--- a/drivers/i3c/master.c
+++ b/drivers/i3c/master.c
@@ -564,20 +564,30 @@ static const struct device_type i3c_masterdev_type = {
.groups = i3c_masterdev_groups,
};
-int i3c_bus_set_mode(struct i3c_bus *i3cbus, enum i3c_bus_mode mode)
+int i3c_bus_set_mode(struct i3c_bus *i3cbus, enum i3c_bus_mode mode,
+ unsigned long i2c_scl_rate)
{
i3cbus->mode = mode;
- if (!i3cbus->scl_rate.i3c)
- i3cbus->scl_rate.i3c = I3C_BUS_TYP_I3C_SCL_RATE;
-
- if (!i3cbus->scl_rate.i2c) {
- if (i3cbus->mode == I3C_BUS_MODE_MIXED_SLOW)
- i3cbus->scl_rate.i2c = I3C_BUS_I2C_FM_SCL_RATE;
- else
- i3cbus->scl_rate.i2c = I3C_BUS_I2C_FM_PLUS_SCL_RATE;
+ switch (i3cbus->mode) {
+ case I3C_BUS_MODE_PURE:
+ if (!i3cbus->scl_rate.i3c)
+ i3cbus->scl_rate.i3c = I3C_BUS_TYP_I3C_SCL_RATE;
+ break;
+ case I3C_BUS_MODE_MIXED_FAST:
+ if (!i3cbus->scl_rate.i3c)
+ i3cbus->scl_rate.i3c = I3C_BUS_TYP_I3C_SCL_RATE;
+ if (!i3cbus->scl_rate.i2c)
+ i3cbus->scl_rate.i2c = i2c_scl_rate;
+ break;
+ case I3C_BUS_MODE_MIXED_SLOW:
+ if (!i3cbus->scl_rate.i2c)
+ i3cbus->scl_rate.i2c = i2c_scl_rate;
+ i3cbus->scl_rate.i3c = i3cbus->scl_rate.i2c;
+ break;
+ default:
+ return -EINVAL;
}
-
/*
* I3C/I2C frequency may have been overridden, check that user-provided
* values are not exceeding max possible frequency.
@@ -1980,9 +1990,6 @@ of_i3c_master_add_i2c_boardinfo(struct i3c_master_controller *master,
/* LVR is encoded in reg[2]. */
boardinfo->lvr = reg[2];
- if (boardinfo->lvr & I3C_LVR_I2C_FM_MODE)
- master->bus.scl_rate.i2c = I3C_BUS_I2C_FM_SCL_RATE;
-
list_add_tail(&boardinfo->node, &master->boardinfo.i2c);
of_node_get(node);
@@ -2432,6 +2439,7 @@ int i3c_master_register(struct i3c_master_controller *master,
const struct i3c_master_controller_ops *ops,
bool secondary)
{
+ unsigned long i2c_scl_rate = I3C_BUS_I2C_FM_PLUS_SCL_RATE;
struct i3c_bus *i3cbus = i3c_master_get_bus(master);
enum i3c_bus_mode mode = I3C_BUS_MODE_PURE;
struct i2c_dev_boardinfo *i2cbi;
@@ -2481,9 +2489,12 @@ int i3c_master_register(struct i3c_master_controller *master,
ret = -EINVAL;
goto err_put_dev;
}
+
+ if (i2cbi->lvr & I3C_LVR_I2C_FM_MODE)
+ i2c_scl_rate = I3C_BUS_I2C_FM_SCL_RATE;
}
- ret = i3c_bus_set_mode(i3cbus, mode);
+ ret = i3c_bus_set_mode(i3cbus, mode, i2c_scl_rate);
if (ret)
goto err_put_dev;
--
2.7.4
From: Helen Koike <helen.koike(a)collabora.com>
[ Upstream commit 544e784188f1dd7c797c70b213385e67d92005b6 ]
The Raspberry Pi model B revision 2 board has the hot plug detect gpio
active high (and not low, as it was in the dts).
Fixes: 49ac67e0c39c ("ARM: bcm2835: Add VC4 to the device tree.")
Reviewed-by: Eric Anholt <eric(a)anholt.net>
Signed-off-by: Eric Anholt <eric(a)anholt.net>
Signed-off-by: Sasha Levin (Microsoft) <sashal(a)kernel.org>
---
arch/arm/boot/dts/bcm2835-rpi-b-rev2.dts | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/boot/dts/bcm2835-rpi-b-rev2.dts b/arch/arm/boot/dts/bcm2835-rpi-b-rev2.dts
index 84df85ea6296..7efde03daadd 100644
--- a/arch/arm/boot/dts/bcm2835-rpi-b-rev2.dts
+++ b/arch/arm/boot/dts/bcm2835-rpi-b-rev2.dts
@@ -26,5 +26,5 @@
};
&hdmi {
- hpd-gpios = <&gpio 46 GPIO_ACTIVE_LOW>;
+ hpd-gpios = <&gpio 46 GPIO_ACTIVE_HIGH>;
};
--
2.19.1
The patch titled
Subject: signal: trace_signal_deliver when signal_group_exit
has been added to the -mm tree. Its filename is
signal-trace_signal_deliver-when-signal_group_exit.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/signal-trace_signal_deliver-when-s…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/signal-trace_signal_deliver-when-s…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Zhenliang Wei <weizhenliang(a)huawei.com>
Subject: signal: trace_signal_deliver when signal_group_exit
In the fixes commit, removing SIGKILL from each thread signal mask and
executing "goto fatal" directly will skip the call to
"trace_signal_deliver". At this point, the delivery tracking of the
SIGKILL signal will be inaccurate.
Therefore, we need to add trace_signal_deliver before "goto fatal" after
executing sigdelset.
Note: The action[SIGKILL] must be SIG_DFL, and SEND_SIG_NOINFO matches the
fact that SIGKILL doesn't have any info.
Link: http://lkml.kernel.org/r/20190422145950.78056-1-weizhenliang@huawei.com
Fixes: cf43a757fd4944 ("signal: Restore the stop PTRACE_EVENT_EXIT")
Signed-off-by: Zhenliang Wei <weizhenliang(a)huawei.com>
Acked-by: Christian Brauner <christian(a)brauner.io>
Reviewed-by: Oleg Nesterov <oleg(a)redhat.com>
Cc: Ivan Delalande <colona(a)arista.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Deepa Dinamani <deepa.kernel(a)gmail.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/signal.c | 1 +
1 file changed, 1 insertion(+)
--- a/kernel/signal.c~signal-trace_signal_deliver-when-signal_group_exit
+++ a/kernel/signal.c
@@ -2441,6 +2441,7 @@ relock:
if (signal_group_exit(signal)) {
ksig->info.si_signo = signr = SIGKILL;
sigdelset(&current->pending.signal, SIGKILL);
+ trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO, SIG_DFL);
recalc_sigpending();
goto fatal;
}
_
Patches currently in -mm which might be from weizhenliang(a)huawei.com are
signal-trace_signal_deliver-when-signal_group_exit.patch
This is an automatically generated email to let you know that the following patch was queued:
Subject: media: seco-cec: fix building with RC_CORE=m
Author: Arnd Bergmann <arnd(a)arndb.de>
Date: Wed Mar 13 17:18:07 2019 -0400
I previously added an RC_CORE dependency here, but missed the corner
case of CONFIG_VIDEO_SECO_CEC=y with CONFIG_RC_CORE=m, which still
causes a link error:
drivers/media/platform/seco-cec/seco-cec.o: In function `secocec_probe':
seco-cec.c:(.text+0x1b8): undefined reference to `devm_rc_allocate_device'
seco-cec.c:(.text+0x2e8): undefined reference to `devm_rc_register_device'
drivers/media/platform/seco-cec/seco-cec.o: In function `secocec_irq_handler':
seco-cec.c:(.text+0xa2c): undefined reference to `rc_keydown'
Refine the dependency to disallow building the RC subdriver in this case.
This is the same logic we apply in other drivers like it.
Fixes: f27dd0ad6885 ("media: seco-cec: fix RC_CORE dependency")
Cc: <stable(a)vger.kernel.org> # 5.1
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Sean Young <sean(a)mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung(a)kernel.org>
drivers/media/platform/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
---
diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index 1611ec23d733..9908ecde2585 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -649,7 +649,7 @@ config VIDEO_SECO_CEC
config VIDEO_SECO_RC
bool "SECO Boards IR RC5 support"
depends on VIDEO_SECO_CEC
- depends on RC_CORE
+ depends on RC_CORE=y || RC_CORE = VIDEO_SECO_CEC
help
If you say yes here you will get support for the
SECO Boards Consumer-IR in seco-cec driver.
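Read as a truth table, the refined dependency
(RC_CORE=y || RC_CORE = VIDEO_SECO_CEC) allows VIDEO_SECO_RC only in
these combinations, assuming the usual Kconfig tristate semantics:

  VIDEO_SECO_CEC=y  RC_CORE=y  ->  allowed (everything built in)
  VIDEO_SECO_CEC=y  RC_CORE=m  ->  not allowed (the link-error case above)
  VIDEO_SECO_CEC=m  RC_CORE=m  ->  allowed (rc_core available to the module)
  VIDEO_SECO_CEC=m  RC_CORE=y  ->  allowed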
On 4/22/19, Sasha Levin <sashal(a)kernel.org> wrote:
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v5.0.9, v4.19.36, v4.14.113,
> v4.9.170, v4.4.178, v3.18.138.
>
> v5.0.9: Build OK!
> v4.19.36: Build OK!
> v4.14.113: Failed to apply! Possible dependencies:
[...]
> f2a13e7cba9e ("crypto: crypto4xx - enable AES RFC3686, ECB, CFB and OFB
> offloads")
>
>
> How should we proceed with this patch?
>
Same as with the aes-ctr fix. I guess I should have added
Fixes: f2a13e7cba9e ("crypto: crypto4xx - enable AES RFC3686, ECB, CFB
and OFB offloads")
Regards,
Christian
On 4/22/19, Sasha Levin <sashal(a)kernel.org> wrote:
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v5.0.9, v4.19.36, v4.14.113,
> v4.9.170, v4.4.178, v3.18.138.
>
> v5.0.9: Build OK!
> v4.19.36: Build OK!
> v4.14.113: Failed to apply! Possible dependencies:
> fc340115ffb8 ("crypto: crypto4xx - properly set IV after de- and
> encrypt")
>
> v4.9.170: Failed to apply! Possible dependencies:
> fc340115ffb8 ("crypto: crypto4xx - properly set IV after de- and
> encrypt")
>
> v4.4.178: Failed to apply! Possible dependencies:
> fc340115ffb8 ("crypto: crypto4xx - properly set IV after de- and
> encrypt")
>
> v3.18.138: Failed to apply! Possible dependencies:
> fc340115ffb8 ("crypto: crypto4xx - properly set IV after de- and
> encrypt")
>
>
> How should we proceed with this patch?
I only added AES-CTR support around a year ago.
https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stabl…
(so, this is another dependency)
In my opinion back-porting the patch to 4.14 and older would not do
much since the mode is not available there. But, I'm glad that the
patches got a bit of attention.
Cheers,
Christian