As detailed in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=948519 and
https://wiki.debian.org/BoottimeEntropyStarvation, lack of boot-time entropy
can contribute to multi-minute pauses during system initialization in some
hardware configurations. While userspace workarounds, e.g. haveged, are
documented, the in-kernel jitter entropy collector eliminates the need for such
workarounds.
It cherry-picks cleanly to 4.19.y and 4.14.y. I'm particularly interested
in the former.
Thanks for considering this.
noah
From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Reading the sched_cmdline_ref and sched_tgid_ref initial state within
tracing_start_sched_switch without holding the sched_register_mutex is
racy against concurrent updates, which can lead to tracepoint probes
being registered more than once (and thus trigger warnings within
tracepoint.c).
[ May be the fix for this bug ]
Link: https://lore.kernel.org/r/000000000000ab6f84056c786b93@google.com
Link: http://lkml.kernel.org/r/20190817141208.15226-1-mathieu.desnoyers@efficios.…
Cc: stable(a)vger.kernel.org
CC: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
CC: Joel Fernandes (Google) <joel(a)joelfernandes.org>
CC: Peter Zijlstra <peterz(a)infradead.org>
CC: Thomas Gleixner <tglx(a)linutronix.de>
CC: Paul E. McKenney <paulmck(a)linux.ibm.com>
Reported-by: syzbot+774fddf07b7ab29a1e55(a)syzkaller.appspotmail.com
Fixes: d914ba37d7145 ("tracing: Add support for recording tgid of tasks")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/trace_sched_switch.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
index e288168661e1..e304196d7c28 100644
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -89,8 +89,10 @@ static void tracing_sched_unregister(void)
static void tracing_start_sched_switch(int ops)
{
- bool sched_register = (!sched_cmdline_ref && !sched_tgid_ref);
+ bool sched_register;
+
mutex_lock(&sched_register_mutex);
+ sched_register = (!sched_cmdline_ref && !sched_tgid_ref);
switch (ops) {
case RECORD_CMDLINE:
--
2.24.1
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 07bfd9bdf568a38d9440c607b72342036011f727 Mon Sep 17 00:00:00 2001
From: Herbert Xu <herbert(a)gondor.apana.org.au>
Date: Tue, 19 Nov 2019 17:41:31 +0800
Subject: [PATCH] crypto: pcrypt - Fix user-after-free on module unload
On module unload of pcrypt we must unregister the crypto algorithms
first and then tear down the padata structure. As otherwise the
crypto algorithms are still alive and can be used while the padata
structure is being freed.
Fixes: 5068c7a883d1 ("crypto: pcrypt - Add pcrypt crypto...")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c
index 543792e0ebf0..81bbea7f2ba6 100644
--- a/crypto/pcrypt.c
+++ b/crypto/pcrypt.c
@@ -362,11 +362,12 @@ static int __init pcrypt_init(void)
static void __exit pcrypt_exit(void)
{
+ crypto_unregister_template(&pcrypt_tmpl);
+
pcrypt_fini_padata(pencrypt);
pcrypt_fini_padata(pdecrypt);
kset_unregister(pcrypt_kset);
- crypto_unregister_template(&pcrypt_tmpl);
}
subsys_initcall(pcrypt_init);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 07bfd9bdf568a38d9440c607b72342036011f727 Mon Sep 17 00:00:00 2001
From: Herbert Xu <herbert(a)gondor.apana.org.au>
Date: Tue, 19 Nov 2019 17:41:31 +0800
Subject: [PATCH] crypto: pcrypt - Fix user-after-free on module unload
On module unload of pcrypt we must unregister the crypto algorithms
first and then tear down the padata structure. As otherwise the
crypto algorithms are still alive and can be used while the padata
structure is being freed.
Fixes: 5068c7a883d1 ("crypto: pcrypt - Add pcrypt crypto...")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c
index 543792e0ebf0..81bbea7f2ba6 100644
--- a/crypto/pcrypt.c
+++ b/crypto/pcrypt.c
@@ -362,11 +362,12 @@ static int __init pcrypt_init(void)
static void __exit pcrypt_exit(void)
{
+ crypto_unregister_template(&pcrypt_tmpl);
+
pcrypt_fini_padata(pencrypt);
pcrypt_fini_padata(pdecrypt);
kset_unregister(pcrypt_kset);
- crypto_unregister_template(&pcrypt_tmpl);
}
subsys_initcall(pcrypt_init);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 07bfd9bdf568a38d9440c607b72342036011f727 Mon Sep 17 00:00:00 2001
From: Herbert Xu <herbert(a)gondor.apana.org.au>
Date: Tue, 19 Nov 2019 17:41:31 +0800
Subject: [PATCH] crypto: pcrypt - Fix user-after-free on module unload
On module unload of pcrypt we must unregister the crypto algorithms
first and then tear down the padata structure. As otherwise the
crypto algorithms are still alive and can be used while the padata
structure is being freed.
Fixes: 5068c7a883d1 ("crypto: pcrypt - Add pcrypt crypto...")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c
index 543792e0ebf0..81bbea7f2ba6 100644
--- a/crypto/pcrypt.c
+++ b/crypto/pcrypt.c
@@ -362,11 +362,12 @@ static int __init pcrypt_init(void)
static void __exit pcrypt_exit(void)
{
+ crypto_unregister_template(&pcrypt_tmpl);
+
pcrypt_fini_padata(pencrypt);
pcrypt_fini_padata(pdecrypt);
kset_unregister(pcrypt_kset);
- crypto_unregister_template(&pcrypt_tmpl);
}
subsys_initcall(pcrypt_init);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e93cd35101b61e4c79149be2cfc927c4b28dc60c Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Thu, 28 Nov 2019 18:22:00 +0100
Subject: [PATCH] rsi: fix use-after-free on failed probe and unbind
Make sure to stop both URBs before returning after failed probe as well
as on disconnect to avoid use-after-free in the completion handler.
Reported-by: syzbot+b563b7f8dbe8223a51e8(a)syzkaller.appspotmail.com
Fixes: a4302bff28e2 ("rsi: add bluetooth rx endpoint")
Fixes: dad0d04fa7ba ("rsi: Add RS9113 wireless driver")
Cc: stable <stable(a)vger.kernel.org> # 3.15
Cc: Siva Rebbagondla <siva.rebbagondla(a)redpinesignals.com>
Cc: Prameela Rani Garnepudi <prameela.j04cs(a)gmail.com>
Cc: Amitkumar Karwar <amit.karwar(a)redpinesignals.com>
Cc: Fariya Fatima <fariyaf(a)gmail.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Signed-off-by: Kalle Valo <kvalo(a)codeaurora.org>
diff --git a/drivers/net/wireless/rsi/rsi_91x_usb.c b/drivers/net/wireless/rsi/rsi_91x_usb.c
index 53f41fc2cadf..30bed719486e 100644
--- a/drivers/net/wireless/rsi/rsi_91x_usb.c
+++ b/drivers/net/wireless/rsi/rsi_91x_usb.c
@@ -292,6 +292,15 @@ static void rsi_rx_done_handler(struct urb *urb)
dev_kfree_skb(rx_cb->rx_skb);
}
+static void rsi_rx_urb_kill(struct rsi_hw *adapter, u8 ep_num)
+{
+ struct rsi_91x_usbdev *dev = (struct rsi_91x_usbdev *)adapter->rsi_dev;
+ struct rx_usb_ctrl_block *rx_cb = &dev->rx_cb[ep_num - 1];
+ struct urb *urb = rx_cb->rx_urb;
+
+ usb_kill_urb(urb);
+}
+
/**
* rsi_rx_urb_submit() - This function submits the given URB to the USB stack.
* @adapter: Pointer to the adapter structure.
@@ -823,10 +832,13 @@ static int rsi_probe(struct usb_interface *pfunction,
if (adapter->priv->coex_mode > 1) {
status = rsi_rx_urb_submit(adapter, BT_EP);
if (status)
- goto err1;
+ goto err_kill_wlan_urb;
}
return 0;
+
+err_kill_wlan_urb:
+ rsi_rx_urb_kill(adapter, WLAN_EP);
err1:
rsi_deinit_usb_interface(adapter);
err:
@@ -857,6 +869,10 @@ static void rsi_disconnect(struct usb_interface *pfunction)
adapter->priv->bt_adapter = NULL;
}
+ if (adapter->priv->coex_mode > 1)
+ rsi_rx_urb_kill(adapter, BT_EP);
+ rsi_rx_urb_kill(adapter, WLAN_EP);
+
rsi_reset_card(adapter);
rsi_deinit_usb_interface(adapter);
rsi_91x_deinit(adapter);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e93cd35101b61e4c79149be2cfc927c4b28dc60c Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Thu, 28 Nov 2019 18:22:00 +0100
Subject: [PATCH] rsi: fix use-after-free on failed probe and unbind
Make sure to stop both URBs before returning after failed probe as well
as on disconnect to avoid use-after-free in the completion handler.
Reported-by: syzbot+b563b7f8dbe8223a51e8(a)syzkaller.appspotmail.com
Fixes: a4302bff28e2 ("rsi: add bluetooth rx endpoint")
Fixes: dad0d04fa7ba ("rsi: Add RS9113 wireless driver")
Cc: stable <stable(a)vger.kernel.org> # 3.15
Cc: Siva Rebbagondla <siva.rebbagondla(a)redpinesignals.com>
Cc: Prameela Rani Garnepudi <prameela.j04cs(a)gmail.com>
Cc: Amitkumar Karwar <amit.karwar(a)redpinesignals.com>
Cc: Fariya Fatima <fariyaf(a)gmail.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Signed-off-by: Kalle Valo <kvalo(a)codeaurora.org>
diff --git a/drivers/net/wireless/rsi/rsi_91x_usb.c b/drivers/net/wireless/rsi/rsi_91x_usb.c
index 53f41fc2cadf..30bed719486e 100644
--- a/drivers/net/wireless/rsi/rsi_91x_usb.c
+++ b/drivers/net/wireless/rsi/rsi_91x_usb.c
@@ -292,6 +292,15 @@ static void rsi_rx_done_handler(struct urb *urb)
dev_kfree_skb(rx_cb->rx_skb);
}
+static void rsi_rx_urb_kill(struct rsi_hw *adapter, u8 ep_num)
+{
+ struct rsi_91x_usbdev *dev = (struct rsi_91x_usbdev *)adapter->rsi_dev;
+ struct rx_usb_ctrl_block *rx_cb = &dev->rx_cb[ep_num - 1];
+ struct urb *urb = rx_cb->rx_urb;
+
+ usb_kill_urb(urb);
+}
+
/**
* rsi_rx_urb_submit() - This function submits the given URB to the USB stack.
* @adapter: Pointer to the adapter structure.
@@ -823,10 +832,13 @@ static int rsi_probe(struct usb_interface *pfunction,
if (adapter->priv->coex_mode > 1) {
status = rsi_rx_urb_submit(adapter, BT_EP);
if (status)
- goto err1;
+ goto err_kill_wlan_urb;
}
return 0;
+
+err_kill_wlan_urb:
+ rsi_rx_urb_kill(adapter, WLAN_EP);
err1:
rsi_deinit_usb_interface(adapter);
err:
@@ -857,6 +869,10 @@ static void rsi_disconnect(struct usb_interface *pfunction)
adapter->priv->bt_adapter = NULL;
}
+ if (adapter->priv->coex_mode > 1)
+ rsi_rx_urb_kill(adapter, BT_EP);
+ rsi_rx_urb_kill(adapter, WLAN_EP);
+
rsi_reset_card(adapter);
rsi_deinit_usb_interface(adapter);
rsi_91x_deinit(adapter);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e93cd35101b61e4c79149be2cfc927c4b28dc60c Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Thu, 28 Nov 2019 18:22:00 +0100
Subject: [PATCH] rsi: fix use-after-free on failed probe and unbind
Make sure to stop both URBs before returning after failed probe as well
as on disconnect to avoid use-after-free in the completion handler.
Reported-by: syzbot+b563b7f8dbe8223a51e8(a)syzkaller.appspotmail.com
Fixes: a4302bff28e2 ("rsi: add bluetooth rx endpoint")
Fixes: dad0d04fa7ba ("rsi: Add RS9113 wireless driver")
Cc: stable <stable(a)vger.kernel.org> # 3.15
Cc: Siva Rebbagondla <siva.rebbagondla(a)redpinesignals.com>
Cc: Prameela Rani Garnepudi <prameela.j04cs(a)gmail.com>
Cc: Amitkumar Karwar <amit.karwar(a)redpinesignals.com>
Cc: Fariya Fatima <fariyaf(a)gmail.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Signed-off-by: Kalle Valo <kvalo(a)codeaurora.org>
diff --git a/drivers/net/wireless/rsi/rsi_91x_usb.c b/drivers/net/wireless/rsi/rsi_91x_usb.c
index 53f41fc2cadf..30bed719486e 100644
--- a/drivers/net/wireless/rsi/rsi_91x_usb.c
+++ b/drivers/net/wireless/rsi/rsi_91x_usb.c
@@ -292,6 +292,15 @@ static void rsi_rx_done_handler(struct urb *urb)
dev_kfree_skb(rx_cb->rx_skb);
}
+static void rsi_rx_urb_kill(struct rsi_hw *adapter, u8 ep_num)
+{
+ struct rsi_91x_usbdev *dev = (struct rsi_91x_usbdev *)adapter->rsi_dev;
+ struct rx_usb_ctrl_block *rx_cb = &dev->rx_cb[ep_num - 1];
+ struct urb *urb = rx_cb->rx_urb;
+
+ usb_kill_urb(urb);
+}
+
/**
* rsi_rx_urb_submit() - This function submits the given URB to the USB stack.
* @adapter: Pointer to the adapter structure.
@@ -823,10 +832,13 @@ static int rsi_probe(struct usb_interface *pfunction,
if (adapter->priv->coex_mode > 1) {
status = rsi_rx_urb_submit(adapter, BT_EP);
if (status)
- goto err1;
+ goto err_kill_wlan_urb;
}
return 0;
+
+err_kill_wlan_urb:
+ rsi_rx_urb_kill(adapter, WLAN_EP);
err1:
rsi_deinit_usb_interface(adapter);
err:
@@ -857,6 +869,10 @@ static void rsi_disconnect(struct usb_interface *pfunction)
adapter->priv->bt_adapter = NULL;
}
+ if (adapter->priv->coex_mode > 1)
+ rsi_rx_urb_kill(adapter, BT_EP);
+ rsi_rx_urb_kill(adapter, WLAN_EP);
+
rsi_reset_card(adapter);
rsi_deinit_usb_interface(adapter);
rsi_91x_deinit(adapter);
The f6783319737f appears to fix a reproducible crash that we have
been experiencing while stress-testing creation and deletion of
containers.
The fix seems to apply to 4.19.y more easily if you first apply
also 5d299eabea5a ('sched/fair: Add tmp_alone_branch assertion').
Please consider including into 4.19 upstream commits
ba5d73851e71847ba7f7f4c27a1a6e1f5ab91c79
("block: cleanup __blkdev_issue_discard()")
and
4800bf7bc8c725e955fcbc6191cc872f43f506d3
("block: fix 32 bit overflow in __blkdev_issue_discard()")
Overflow of unsigned long "req_sects" (fixed in second patch)
actually exist here much longer.
And 4.19 commit 744889b7cbb56a64f957e65ade7cb65fe3f35714
("block: don't deal with discard limit in blkdev_issue_discard()")
make it worse by replacing
req_sects = min_t(sector_t, nr_sects, q->limits.max_discard_sectors);
with
unsigned int req_sects = nr_sects;
because now discard length isn't cut by max_discard_sectors it easily overflows.
As a result BLKDISCARD fails unexpectedly:
ioctl(3, BLKDISCARD, [0, 0x20000000000]) = -1 EOPNOTSUPP (Operation not supported)
From: Mark Rutland <mark.rutland(a)arm.com>
Confusingly, there are three SPSR layouts that a kernel may need to deal
with:
(1) An AArch64 SPSR_ELx view of an AArch64 pstate
(2) An AArch64 SPSR_ELx view of an AArch32 pstate
(3) An AArch32 SPSR_* view of an AArch32 pstate
When the KVM AArch32 support code deals with SPSR_{EL2,HYP}, it's either
dealing with #2 or #3 consistently. On arm64 the PSR_AA32_* definitions
match the AArch64 SPSR_ELx view, and on arm the PSR_AA32_* definitions
match the AArch32 SPSR_* view.
However, when we inject an exception into an AArch32 guest, we have to
synthesize the AArch32 SPSR_* that the guest will see. Thus, an AArch64
host needs to synthesize layout #3 from layout #2.
This patch adds a new host_spsr_to_spsr32() helper for this, and makes
use of it in the KVM AArch32 support code. For arm64 we need to shuffle
the DIT bit around, and remove the SS bit, while for arm we can use the
value as-is.
I've open-coded the bit manipulation for now to avoid having to rework
the existing PSR_* definitions into PSR64_AA32_* and PSR32_AA32_*
definitions. I hope to perform a more thorough refactoring in future so
that we can handle pstate view manipulation more consistently across the
kernel tree.
Signed-off-by: Mark Rutland <mark.rutland(a)arm.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Reviewed-by: Alexandru Elisei <alexandru.elisei(a)arm.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20200108134324.46500-4-mark.rutland@arm.com
---
arch/arm/include/asm/kvm_emulate.h | 5 +++++
arch/arm64/include/asm/kvm_emulate.h | 32 ++++++++++++++++++++++++++++
virt/kvm/arm/aarch32.c | 6 +++---
3 files changed, 40 insertions(+), 3 deletions(-)
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index c488c629e6c8..08d9805f613b 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -53,6 +53,11 @@ static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
*__vcpu_spsr(vcpu) = v;
}
+static inline unsigned long host_spsr_to_spsr32(unsigned long spsr)
+{
+ return spsr;
+}
+
static inline unsigned long vcpu_get_reg(struct kvm_vcpu *vcpu,
u8 reg_num)
{
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index f407b6bdad2e..53ea7637b7b2 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -219,6 +219,38 @@ static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1] = v;
}
+/*
+ * The layout of SPSR for an AArch32 state is different when observed from an
+ * AArch64 SPSR_ELx or an AArch32 SPSR_*. This function generates the AArch32
+ * view given an AArch64 view.
+ *
+ * In ARM DDI 0487E.a see:
+ *
+ * - The AArch64 view (SPSR_EL2) in section C5.2.18, page C5-426
+ * - The AArch32 view (SPSR_abt) in section G8.2.126, page G8-6256
+ * - The AArch32 view (SPSR_und) in section G8.2.132, page G8-6280
+ *
+ * Which show the following differences:
+ *
+ * | Bit | AA64 | AA32 | Notes |
+ * +-----+------+------+-----------------------------|
+ * | 24 | DIT | J | J is RES0 in ARMv8 |
+ * | 21 | SS | DIT | SS doesn't exist in AArch32 |
+ *
+ * ... and all other bits are (currently) common.
+ */
+static inline unsigned long host_spsr_to_spsr32(unsigned long spsr)
+{
+ const unsigned long overlap = BIT(24) | BIT(21);
+ unsigned long dit = !!(spsr & PSR_AA32_DIT_BIT);
+
+ spsr &= ~overlap;
+
+ spsr |= dit << 21;
+
+ return spsr;
+}
+
static inline bool vcpu_mode_priv(const struct kvm_vcpu *vcpu)
{
u32 mode;
diff --git a/virt/kvm/arm/aarch32.c b/virt/kvm/arm/aarch32.c
index 773cf1439081..631d397ac81b 100644
--- a/virt/kvm/arm/aarch32.c
+++ b/virt/kvm/arm/aarch32.c
@@ -129,15 +129,15 @@ static unsigned long get_except32_cpsr(struct kvm_vcpu *vcpu, u32 mode)
static void prepare_fault32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
{
- unsigned long new_spsr_value = *vcpu_cpsr(vcpu);
- bool is_thumb = (new_spsr_value & PSR_AA32_T_BIT);
+ unsigned long spsr = *vcpu_cpsr(vcpu);
+ bool is_thumb = (spsr & PSR_AA32_T_BIT);
u32 return_offset = return_offsets[vect_offset >> 2][is_thumb];
u32 sctlr = vcpu_cp15(vcpu, c1_SCTLR);
*vcpu_cpsr(vcpu) = get_except32_cpsr(vcpu, mode);
/* Note: These now point to the banked copies */
- vcpu_write_spsr(vcpu, new_spsr_value);
+ vcpu_write_spsr(vcpu, host_spsr_to_spsr32(spsr));
*vcpu_reg32(vcpu, 14) = *vcpu_pc(vcpu) + return_offset;
/* Branch to exception vector */
--
2.20.1
From: Mark Rutland <mark.rutland(a)arm.com>
When KVM injects an exception into a guest, it generates the CPSR value
from scratch, configuring CPSR.{M,A,I,T,E}, and setting all other
bits to zero.
This isn't correct, as the architecture specifies that some CPSR bits
are (conditionally) cleared or set upon an exception, and others are
unchanged from the original context.
This patch adds logic to match the architectural behaviour. To make this
simple to follow/audit/extend, documentation references are provided,
and bits are configured in order of their layout in SPSR_EL2. This
layout can be seen in the diagram on ARM DDI 0487E.a page C5-426.
Note that this code is used by both arm and arm64, and is intended to
fuction with the SPSR_EL2 and SPSR_HYP layouts.
Signed-off-by: Mark Rutland <mark.rutland(a)arm.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Reviewed-by: Alexandru Elisei <alexandru.elisei(a)arm.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20200108134324.46500-3-mark.rutland@arm.com
---
arch/arm/include/asm/kvm_emulate.h | 12 ++++
arch/arm64/include/asm/ptrace.h | 1 +
virt/kvm/arm/aarch32.c | 111 ++++++++++++++++++++++++++---
3 files changed, 114 insertions(+), 10 deletions(-)
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index fe55d8737a11..c488c629e6c8 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -14,13 +14,25 @@
#include <asm/cputype.h>
/* arm64 compatibility macros */
+#define PSR_AA32_MODE_FIQ FIQ_MODE
+#define PSR_AA32_MODE_SVC SVC_MODE
#define PSR_AA32_MODE_ABT ABT_MODE
#define PSR_AA32_MODE_UND UND_MODE
#define PSR_AA32_T_BIT PSR_T_BIT
+#define PSR_AA32_F_BIT PSR_F_BIT
#define PSR_AA32_I_BIT PSR_I_BIT
#define PSR_AA32_A_BIT PSR_A_BIT
#define PSR_AA32_E_BIT PSR_E_BIT
#define PSR_AA32_IT_MASK PSR_IT_MASK
+#define PSR_AA32_GE_MASK 0x000f0000
+#define PSR_AA32_DIT_BIT 0x00200000
+#define PSR_AA32_PAN_BIT 0x00400000
+#define PSR_AA32_SSBS_BIT 0x00800000
+#define PSR_AA32_Q_BIT PSR_Q_BIT
+#define PSR_AA32_V_BIT PSR_V_BIT
+#define PSR_AA32_C_BIT PSR_C_BIT
+#define PSR_AA32_Z_BIT PSR_Z_BIT
+#define PSR_AA32_N_BIT PSR_N_BIT
unsigned long *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index fbebb411ae20..bf57308fcd63 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -62,6 +62,7 @@
#define PSR_AA32_I_BIT 0x00000080
#define PSR_AA32_A_BIT 0x00000100
#define PSR_AA32_E_BIT 0x00000200
+#define PSR_AA32_PAN_BIT 0x00400000
#define PSR_AA32_SSBS_BIT 0x00800000
#define PSR_AA32_DIT_BIT 0x01000000
#define PSR_AA32_Q_BIT 0x08000000
diff --git a/virt/kvm/arm/aarch32.c b/virt/kvm/arm/aarch32.c
index c4c57ba99e90..773cf1439081 100644
--- a/virt/kvm/arm/aarch32.c
+++ b/virt/kvm/arm/aarch32.c
@@ -10,6 +10,7 @@
* Author: Christoffer Dall <c.dall(a)virtualopensystems.com>
*/
+#include <linux/bits.h>
#include <linux/kvm_host.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_hyp.h>
@@ -28,22 +29,112 @@ static const u8 return_offsets[8][2] = {
[7] = { 4, 4 }, /* FIQ, unused */
};
+/*
+ * When an exception is taken, most CPSR fields are left unchanged in the
+ * handler. However, some are explicitly overridden (e.g. M[4:0]).
+ *
+ * The SPSR/SPSR_ELx layouts differ, and the below is intended to work with
+ * either format. Note: SPSR.J bit doesn't exist in SPSR_ELx, but this bit was
+ * obsoleted by the ARMv7 virtualization extensions and is RES0.
+ *
+ * For the SPSR layout seen from AArch32, see:
+ * - ARM DDI 0406C.d, page B1-1148
+ * - ARM DDI 0487E.a, page G8-6264
+ *
+ * For the SPSR_ELx layout for AArch32 seen from AArch64, see:
+ * - ARM DDI 0487E.a, page C5-426
+ *
+ * Here we manipulate the fields in order of the AArch32 SPSR_ELx layout, from
+ * MSB to LSB.
+ */
+static unsigned long get_except32_cpsr(struct kvm_vcpu *vcpu, u32 mode)
+{
+ u32 sctlr = vcpu_cp15(vcpu, c1_SCTLR);
+ unsigned long old, new;
+
+ old = *vcpu_cpsr(vcpu);
+ new = 0;
+
+ new |= (old & PSR_AA32_N_BIT);
+ new |= (old & PSR_AA32_Z_BIT);
+ new |= (old & PSR_AA32_C_BIT);
+ new |= (old & PSR_AA32_V_BIT);
+ new |= (old & PSR_AA32_Q_BIT);
+
+ // CPSR.IT[7:0] are set to zero upon any exception
+ // See ARM DDI 0487E.a, section G1.12.3
+ // See ARM DDI 0406C.d, section B1.8.3
+
+ new |= (old & PSR_AA32_DIT_BIT);
+
+ // CPSR.SSBS is set to SCTLR.DSSBS upon any exception
+ // See ARM DDI 0487E.a, page G8-6244
+ if (sctlr & BIT(31))
+ new |= PSR_AA32_SSBS_BIT;
+
+ // CPSR.PAN is unchanged unless SCTLR.SPAN == 0b0
+ // SCTLR.SPAN is RES1 when ARMv8.1-PAN is not implemented
+ // See ARM DDI 0487E.a, page G8-6246
+ new |= (old & PSR_AA32_PAN_BIT);
+ if (!(sctlr & BIT(23)))
+ new |= PSR_AA32_PAN_BIT;
+
+ // SS does not exist in AArch32, so ignore
+
+ // CPSR.IL is set to zero upon any exception
+ // See ARM DDI 0487E.a, page G1-5527
+
+ new |= (old & PSR_AA32_GE_MASK);
+
+ // CPSR.IT[7:0] are set to zero upon any exception
+ // See prior comment above
+
+ // CPSR.E is set to SCTLR.EE upon any exception
+ // See ARM DDI 0487E.a, page G8-6245
+ // See ARM DDI 0406C.d, page B4-1701
+ if (sctlr & BIT(25))
+ new |= PSR_AA32_E_BIT;
+
+ // CPSR.A is unchanged upon an exception to Undefined, Supervisor
+ // CPSR.A is set upon an exception to other modes
+ // See ARM DDI 0487E.a, pages G1-5515 to G1-5516
+ // See ARM DDI 0406C.d, page B1-1182
+ new |= (old & PSR_AA32_A_BIT);
+ if (mode != PSR_AA32_MODE_UND && mode != PSR_AA32_MODE_SVC)
+ new |= PSR_AA32_A_BIT;
+
+ // CPSR.I is set upon any exception
+ // See ARM DDI 0487E.a, pages G1-5515 to G1-5516
+ // See ARM DDI 0406C.d, page B1-1182
+ new |= PSR_AA32_I_BIT;
+
+ // CPSR.F is set upon an exception to FIQ
+ // CPSR.F is unchanged upon an exception to other modes
+ // See ARM DDI 0487E.a, pages G1-5515 to G1-5516
+ // See ARM DDI 0406C.d, page B1-1182
+ new |= (old & PSR_AA32_F_BIT);
+ if (mode == PSR_AA32_MODE_FIQ)
+ new |= PSR_AA32_F_BIT;
+
+ // CPSR.T is set to SCTLR.TE upon any exception
+ // See ARM DDI 0487E.a, page G8-5514
+ // See ARM DDI 0406C.d, page B1-1181
+ if (sctlr & BIT(30))
+ new |= PSR_AA32_T_BIT;
+
+ new |= mode;
+
+ return new;
+}
+
static void prepare_fault32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
{
- unsigned long cpsr;
unsigned long new_spsr_value = *vcpu_cpsr(vcpu);
bool is_thumb = (new_spsr_value & PSR_AA32_T_BIT);
u32 return_offset = return_offsets[vect_offset >> 2][is_thumb];
u32 sctlr = vcpu_cp15(vcpu, c1_SCTLR);
- cpsr = mode | PSR_AA32_I_BIT;
-
- if (sctlr & (1 << 30))
- cpsr |= PSR_AA32_T_BIT;
- if (sctlr & (1 << 25))
- cpsr |= PSR_AA32_E_BIT;
-
- *vcpu_cpsr(vcpu) = cpsr;
+ *vcpu_cpsr(vcpu) = get_except32_cpsr(vcpu, mode);
/* Note: These now point to the banked copies */
vcpu_write_spsr(vcpu, new_spsr_value);
@@ -84,7 +175,7 @@ static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
fsr = &vcpu_cp15(vcpu, c5_DFSR);
}
- prepare_fault32(vcpu, PSR_AA32_MODE_ABT | PSR_AA32_A_BIT, vect_offset);
+ prepare_fault32(vcpu, PSR_AA32_MODE_ABT, vect_offset);
*far = addr;
--
2.20.1
From: Mark Rutland <mark.rutland(a)arm.com>
When KVM injects an exception into a guest, it generates the PSTATE
value from scratch, configuring PSTATE.{M[4:0],DAIF}, and setting all
other bits to zero.
This isn't correct, as the architecture specifies that some PSTATE bits
are (conditionally) cleared or set upon an exception, and others are
unchanged from the original context.
This patch adds logic to match the architectural behaviour. To make this
simple to follow/audit/extend, documentation references are provided,
and bits are configured in order of their layout in SPSR_EL2. This
layout can be seen in the diagram on ARM DDI 0487E.a page C5-429.
Signed-off-by: Mark Rutland <mark.rutland(a)arm.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Reviewed-by: Alexandru Elisei <alexandru.elisei(a)arm.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20200108134324.46500-2-mark.rutland@arm.com
---
arch/arm64/include/uapi/asm/ptrace.h | 1 +
arch/arm64/kvm/inject_fault.c | 70 ++++++++++++++++++++++++++--
2 files changed, 66 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index 7ed9294e2004..d1bb5b69f1ce 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -49,6 +49,7 @@
#define PSR_SSBS_BIT 0x00001000
#define PSR_PAN_BIT 0x00400000
#define PSR_UAO_BIT 0x00800000
+#define PSR_DIT_BIT 0x01000000
#define PSR_V_BIT 0x10000000
#define PSR_C_BIT 0x20000000
#define PSR_Z_BIT 0x40000000
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index ccdb6a051ab2..6aafc2825c1c 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -14,9 +14,6 @@
#include <asm/kvm_emulate.h>
#include <asm/esr.h>
-#define PSTATE_FAULT_BITS_64 (PSR_MODE_EL1h | PSR_A_BIT | PSR_F_BIT | \
- PSR_I_BIT | PSR_D_BIT)
-
#define CURRENT_EL_SP_EL0_VECTOR 0x0
#define CURRENT_EL_SP_ELx_VECTOR 0x200
#define LOWER_EL_AArch64_VECTOR 0x400
@@ -50,6 +47,69 @@ static u64 get_except_vector(struct kvm_vcpu *vcpu, enum exception_type type)
return vcpu_read_sys_reg(vcpu, VBAR_EL1) + exc_offset + type;
}
+/*
+ * When an exception is taken, most PSTATE fields are left unchanged in the
+ * handler. However, some are explicitly overridden (e.g. M[4:0]). Luckily all
+ * of the inherited bits have the same position in the AArch64/AArch32 SPSR_ELx
+ * layouts, so we don't need to shuffle these for exceptions from AArch32 EL0.
+ *
+ * For the SPSR_ELx layout for AArch64, see ARM DDI 0487E.a page C5-429.
+ * For the SPSR_ELx layout for AArch32, see ARM DDI 0487E.a page C5-426.
+ *
+ * Here we manipulate the fields in order of the AArch64 SPSR_ELx layout, from
+ * MSB to LSB.
+ */
+static unsigned long get_except64_pstate(struct kvm_vcpu *vcpu)
+{
+ unsigned long sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
+ unsigned long old, new;
+
+ old = *vcpu_cpsr(vcpu);
+ new = 0;
+
+ new |= (old & PSR_N_BIT);
+ new |= (old & PSR_Z_BIT);
+ new |= (old & PSR_C_BIT);
+ new |= (old & PSR_V_BIT);
+
+ // TODO: TCO (if/when ARMv8.5-MemTag is exposed to guests)
+
+ new |= (old & PSR_DIT_BIT);
+
+ // PSTATE.UAO is set to zero upon any exception to AArch64
+ // See ARM DDI 0487E.a, page D5-2579.
+
+ // PSTATE.PAN is unchanged unless SCTLR_ELx.SPAN == 0b0
+ // SCTLR_ELx.SPAN is RES1 when ARMv8.1-PAN is not implemented
+ // See ARM DDI 0487E.a, page D5-2578.
+ new |= (old & PSR_PAN_BIT);
+ if (!(sctlr & SCTLR_EL1_SPAN))
+ new |= PSR_PAN_BIT;
+
+ // PSTATE.SS is set to zero upon any exception to AArch64
+ // See ARM DDI 0487E.a, page D2-2452.
+
+ // PSTATE.IL is set to zero upon any exception to AArch64
+ // See ARM DDI 0487E.a, page D1-2306.
+
+ // PSTATE.SSBS is set to SCTLR_ELx.DSSBS upon any exception to AArch64
+ // See ARM DDI 0487E.a, page D13-3258
+ if (sctlr & SCTLR_ELx_DSSBS)
+ new |= PSR_SSBS_BIT;
+
+ // PSTATE.BTYPE is set to zero upon any exception to AArch64
+ // See ARM DDI 0487E.a, pages D1-2293 to D1-2294.
+
+ new |= PSR_D_BIT;
+ new |= PSR_A_BIT;
+ new |= PSR_I_BIT;
+ new |= PSR_F_BIT;
+
+ new |= PSR_MODE_EL1h;
+
+ return new;
+}
+
static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr)
{
unsigned long cpsr = *vcpu_cpsr(vcpu);
@@ -59,7 +119,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
- *vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
+ *vcpu_cpsr(vcpu) = get_except64_pstate(vcpu);
vcpu_write_spsr(vcpu, cpsr);
vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
@@ -94,7 +154,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
- *vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
+ *vcpu_cpsr(vcpu) = get_except64_pstate(vcpu);
vcpu_write_spsr(vcpu, cpsr);
/*
--
2.20.1
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 09ed259fac621634d51cd986aa8d65f035662658 Mon Sep 17 00:00:00 2001
From: Bin Liu <b-liu(a)ti.com>
Date: Wed, 11 Dec 2019 10:10:03 -0600
Subject: [PATCH] usb: dwc3: turn off VBUS when leaving host mode
VBUS should be turned off when leaving the host mode.
Set GCTL_PRTCAP to device mode in teardown to de-assert DRVVBUS pin to
turn off VBUS power.
Fixes: 5f94adfeed97 ("usb: dwc3: core: refactor mode initialization to its own function")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bin Liu <b-liu(a)ti.com>
Signed-off-by: Felipe Balbi <balbi(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index f561c6c9e8a9..1d85c42b9c67 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1246,6 +1246,9 @@ static void dwc3_core_exit_mode(struct dwc3 *dwc)
/* do nothing */
break;
}
+
+ /* de-assert DRVVBUS for HOST and OTG mode */
+ dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_DEVICE);
}
static void dwc3_get_properties(struct dwc3 *dwc)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f1f27ad74557e39f67a8331a808b860f89254f2d Mon Sep 17 00:00:00 2001
From: Vincent Whitchurch <vincent.whitchurch(a)axis.com>
Date: Thu, 23 Jan 2020 17:09:06 +0100
Subject: [PATCH] CIFS: Fix task struct use-after-free on reconnect
The task which created the MID may be gone by the time cifsd attempts to
call the callbacks on MIDs from cifs_reconnect().
This leads to a use-after-free of the task struct in cifs_wake_up_task:
==================================================================
BUG: KASAN: use-after-free in __lock_acquire+0x31a0/0x3270
Read of size 8 at addr ffff8880103e3a68 by task cifsd/630
CPU: 0 PID: 630 Comm: cifsd Not tainted 5.5.0-rc6+ #119
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
dump_stack+0x8e/0xcb
print_address_description.constprop.5+0x1d3/0x3c0
? __lock_acquire+0x31a0/0x3270
__kasan_report+0x152/0x1aa
? __lock_acquire+0x31a0/0x3270
? __lock_acquire+0x31a0/0x3270
kasan_report+0xe/0x20
__lock_acquire+0x31a0/0x3270
? __wake_up_common+0x1dc/0x630
? _raw_spin_unlock_irqrestore+0x4c/0x60
? mark_held_locks+0xf0/0xf0
? _raw_spin_unlock_irqrestore+0x39/0x60
? __wake_up_common_lock+0xd5/0x130
? __wake_up_common+0x630/0x630
lock_acquire+0x13f/0x330
? try_to_wake_up+0xa3/0x19e0
_raw_spin_lock_irqsave+0x38/0x50
? try_to_wake_up+0xa3/0x19e0
try_to_wake_up+0xa3/0x19e0
? cifs_compound_callback+0x178/0x210
? set_cpus_allowed_ptr+0x10/0x10
cifs_reconnect+0xa1c/0x15d0
? generic_ip_connect+0x1860/0x1860
? rwlock_bug.part.0+0x90/0x90
cifs_readv_from_socket+0x479/0x690
cifs_read_from_socket+0x9d/0xe0
? cifs_readv_from_socket+0x690/0x690
? mempool_resize+0x690/0x690
? rwlock_bug.part.0+0x90/0x90
? memset+0x1f/0x40
? allocate_buffers+0xff/0x340
cifs_demultiplex_thread+0x388/0x2a50
? cifs_handle_standard+0x610/0x610
? rcu_read_lock_held_common+0x120/0x120
? mark_lock+0x11b/0xc00
? __lock_acquire+0x14ed/0x3270
? __kthread_parkme+0x78/0x100
? lockdep_hardirqs_on+0x3e8/0x560
? lock_downgrade+0x6a0/0x6a0
? lockdep_hardirqs_on+0x3e8/0x560
? _raw_spin_unlock_irqrestore+0x39/0x60
? cifs_handle_standard+0x610/0x610
kthread+0x2bb/0x3a0
? kthread_create_worker_on_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x50
Allocated by task 649:
save_stack+0x19/0x70
__kasan_kmalloc.constprop.5+0xa6/0xf0
kmem_cache_alloc+0x107/0x320
copy_process+0x17bc/0x5370
_do_fork+0x103/0xbf0
__x64_sys_clone+0x168/0x1e0
do_syscall_64+0x9b/0xec0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 0:
save_stack+0x19/0x70
__kasan_slab_free+0x11d/0x160
kmem_cache_free+0xb5/0x3d0
rcu_core+0x52f/0x1230
__do_softirq+0x24d/0x962
The buggy address belongs to the object at ffff8880103e32c0
which belongs to the cache task_struct of size 6016
The buggy address is located 1960 bytes inside of
6016-byte region [ffff8880103e32c0, ffff8880103e4a40)
The buggy address belongs to the page:
page:ffffea000040f800 refcount:1 mapcount:0 mapping:ffff8880108da5c0
index:0xffff8880103e4c00 compound_mapcount: 0
raw: 4000000000010200 ffffea00001f2208 ffffea00001e3408 ffff8880108da5c0
raw: ffff8880103e4c00 0000000000050003 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8880103e3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880103e3980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8880103e3a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8880103e3a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880103e3b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
This can be reliably reproduced by adding the below delay to
cifs_reconnect(), running find(1) on the mount, restarting the samba
server while find is running, and killing find during the delay:
spin_unlock(&GlobalMid_Lock);
mutex_unlock(&server->srv_mutex);
+ msleep(10000);
+
cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
list_for_each_safe(tmp, tmp2, &retry_list) {
mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
Fix this by holding a reference to the task struct until the MID is
freed.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch(a)axis.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
CC: Stable <stable(a)vger.kernel.org>
Reviewed-by: Paulo Alcantara (SUSE) <pc(a)cjr.nz>
Reviewed-by: Pavel Shilovsky <pshilov(a)microsoft.com>
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 40705e862451..239338d57086 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1588,6 +1588,7 @@ struct mid_q_entry {
mid_callback_t *callback; /* call completion callback */
mid_handle_t *handle; /* call handle mid callback */
void *callback_data; /* general purpose pointer for callback */
+ struct task_struct *creator;
void *resp_buf; /* pointer to received SMB header */
unsigned int resp_buf_size;
int mid_state; /* wish this were enum but can not pass to wait_event */
diff --git a/fs/cifs/smb2transport.c b/fs/cifs/smb2transport.c
index 387c88704c52..fe6acfce3390 100644
--- a/fs/cifs/smb2transport.c
+++ b/fs/cifs/smb2transport.c
@@ -685,6 +685,8 @@ smb2_mid_entry_alloc(const struct smb2_sync_hdr *shdr,
* The default is for the mid to be synchronous, so the
* default callback just wakes up the current task.
*/
+ get_task_struct(current);
+ temp->creator = current;
temp->callback = cifs_wake_up_task;
temp->callback_data = current;
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 3d2e11f85cba..cb3ee916f527 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -76,6 +76,8 @@ AllocMidQEntry(const struct smb_hdr *smb_buffer, struct TCP_Server_Info *server)
* The default is for the mid to be synchronous, so the
* default callback just wakes up the current task.
*/
+ get_task_struct(current);
+ temp->creator = current;
temp->callback = cifs_wake_up_task;
temp->callback_data = current;
@@ -158,6 +160,7 @@ static void _cifs_mid_q_entry_release(struct kref *refcount)
}
}
#endif
+ put_task_struct(midEntry->creator);
mempool_free(midEntry, cifs_mid_poolp);
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f1f27ad74557e39f67a8331a808b860f89254f2d Mon Sep 17 00:00:00 2001
From: Vincent Whitchurch <vincent.whitchurch(a)axis.com>
Date: Thu, 23 Jan 2020 17:09:06 +0100
Subject: [PATCH] CIFS: Fix task struct use-after-free on reconnect
The task which created the MID may be gone by the time cifsd attempts to
call the callbacks on MIDs from cifs_reconnect().
This leads to a use-after-free of the task struct in cifs_wake_up_task:
==================================================================
BUG: KASAN: use-after-free in __lock_acquire+0x31a0/0x3270
Read of size 8 at addr ffff8880103e3a68 by task cifsd/630
CPU: 0 PID: 630 Comm: cifsd Not tainted 5.5.0-rc6+ #119
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
dump_stack+0x8e/0xcb
print_address_description.constprop.5+0x1d3/0x3c0
? __lock_acquire+0x31a0/0x3270
__kasan_report+0x152/0x1aa
? __lock_acquire+0x31a0/0x3270
? __lock_acquire+0x31a0/0x3270
kasan_report+0xe/0x20
__lock_acquire+0x31a0/0x3270
? __wake_up_common+0x1dc/0x630
? _raw_spin_unlock_irqrestore+0x4c/0x60
? mark_held_locks+0xf0/0xf0
? _raw_spin_unlock_irqrestore+0x39/0x60
? __wake_up_common_lock+0xd5/0x130
? __wake_up_common+0x630/0x630
lock_acquire+0x13f/0x330
? try_to_wake_up+0xa3/0x19e0
_raw_spin_lock_irqsave+0x38/0x50
? try_to_wake_up+0xa3/0x19e0
try_to_wake_up+0xa3/0x19e0
? cifs_compound_callback+0x178/0x210
? set_cpus_allowed_ptr+0x10/0x10
cifs_reconnect+0xa1c/0x15d0
? generic_ip_connect+0x1860/0x1860
? rwlock_bug.part.0+0x90/0x90
cifs_readv_from_socket+0x479/0x690
cifs_read_from_socket+0x9d/0xe0
? cifs_readv_from_socket+0x690/0x690
? mempool_resize+0x690/0x690
? rwlock_bug.part.0+0x90/0x90
? memset+0x1f/0x40
? allocate_buffers+0xff/0x340
cifs_demultiplex_thread+0x388/0x2a50
? cifs_handle_standard+0x610/0x610
? rcu_read_lock_held_common+0x120/0x120
? mark_lock+0x11b/0xc00
? __lock_acquire+0x14ed/0x3270
? __kthread_parkme+0x78/0x100
? lockdep_hardirqs_on+0x3e8/0x560
? lock_downgrade+0x6a0/0x6a0
? lockdep_hardirqs_on+0x3e8/0x560
? _raw_spin_unlock_irqrestore+0x39/0x60
? cifs_handle_standard+0x610/0x610
kthread+0x2bb/0x3a0
? kthread_create_worker_on_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x50
Allocated by task 649:
save_stack+0x19/0x70
__kasan_kmalloc.constprop.5+0xa6/0xf0
kmem_cache_alloc+0x107/0x320
copy_process+0x17bc/0x5370
_do_fork+0x103/0xbf0
__x64_sys_clone+0x168/0x1e0
do_syscall_64+0x9b/0xec0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 0:
save_stack+0x19/0x70
__kasan_slab_free+0x11d/0x160
kmem_cache_free+0xb5/0x3d0
rcu_core+0x52f/0x1230
__do_softirq+0x24d/0x962
The buggy address belongs to the object at ffff8880103e32c0
which belongs to the cache task_struct of size 6016
The buggy address is located 1960 bytes inside of
6016-byte region [ffff8880103e32c0, ffff8880103e4a40)
The buggy address belongs to the page:
page:ffffea000040f800 refcount:1 mapcount:0 mapping:ffff8880108da5c0
index:0xffff8880103e4c00 compound_mapcount: 0
raw: 4000000000010200 ffffea00001f2208 ffffea00001e3408 ffff8880108da5c0
raw: ffff8880103e4c00 0000000000050003 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8880103e3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880103e3980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8880103e3a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8880103e3a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880103e3b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
This can be reliably reproduced by adding the below delay to
cifs_reconnect(), running find(1) on the mount, restarting the samba
server while find is running, and killing find during the delay:
spin_unlock(&GlobalMid_Lock);
mutex_unlock(&server->srv_mutex);
+ msleep(10000);
+
cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
list_for_each_safe(tmp, tmp2, &retry_list) {
mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
Fix this by holding a reference to the task struct until the MID is
freed.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch(a)axis.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
CC: Stable <stable(a)vger.kernel.org>
Reviewed-by: Paulo Alcantara (SUSE) <pc(a)cjr.nz>
Reviewed-by: Pavel Shilovsky <pshilov(a)microsoft.com>
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 40705e862451..239338d57086 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1588,6 +1588,7 @@ struct mid_q_entry {
mid_callback_t *callback; /* call completion callback */
mid_handle_t *handle; /* call handle mid callback */
void *callback_data; /* general purpose pointer for callback */
+ struct task_struct *creator;
void *resp_buf; /* pointer to received SMB header */
unsigned int resp_buf_size;
int mid_state; /* wish this were enum but can not pass to wait_event */
diff --git a/fs/cifs/smb2transport.c b/fs/cifs/smb2transport.c
index 387c88704c52..fe6acfce3390 100644
--- a/fs/cifs/smb2transport.c
+++ b/fs/cifs/smb2transport.c
@@ -685,6 +685,8 @@ smb2_mid_entry_alloc(const struct smb2_sync_hdr *shdr,
* The default is for the mid to be synchronous, so the
* default callback just wakes up the current task.
*/
+ get_task_struct(current);
+ temp->creator = current;
temp->callback = cifs_wake_up_task;
temp->callback_data = current;
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 3d2e11f85cba..cb3ee916f527 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -76,6 +76,8 @@ AllocMidQEntry(const struct smb_hdr *smb_buffer, struct TCP_Server_Info *server)
* The default is for the mid to be synchronous, so the
* default callback just wakes up the current task.
*/
+ get_task_struct(current);
+ temp->creator = current;
temp->callback = cifs_wake_up_task;
temp->callback_data = current;
@@ -158,6 +160,7 @@ static void _cifs_mid_q_entry_release(struct kref *refcount)
}
}
#endif
+ put_task_struct(midEntry->creator);
mempool_free(midEntry, cifs_mid_poolp);
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f1f27ad74557e39f67a8331a808b860f89254f2d Mon Sep 17 00:00:00 2001
From: Vincent Whitchurch <vincent.whitchurch(a)axis.com>
Date: Thu, 23 Jan 2020 17:09:06 +0100
Subject: [PATCH] CIFS: Fix task struct use-after-free on reconnect
The task which created the MID may be gone by the time cifsd attempts to
call the callbacks on MIDs from cifs_reconnect().
This leads to a use-after-free of the task struct in cifs_wake_up_task:
==================================================================
BUG: KASAN: use-after-free in __lock_acquire+0x31a0/0x3270
Read of size 8 at addr ffff8880103e3a68 by task cifsd/630
CPU: 0 PID: 630 Comm: cifsd Not tainted 5.5.0-rc6+ #119
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
dump_stack+0x8e/0xcb
print_address_description.constprop.5+0x1d3/0x3c0
? __lock_acquire+0x31a0/0x3270
__kasan_report+0x152/0x1aa
? __lock_acquire+0x31a0/0x3270
? __lock_acquire+0x31a0/0x3270
kasan_report+0xe/0x20
__lock_acquire+0x31a0/0x3270
? __wake_up_common+0x1dc/0x630
? _raw_spin_unlock_irqrestore+0x4c/0x60
? mark_held_locks+0xf0/0xf0
? _raw_spin_unlock_irqrestore+0x39/0x60
? __wake_up_common_lock+0xd5/0x130
? __wake_up_common+0x630/0x630
lock_acquire+0x13f/0x330
? try_to_wake_up+0xa3/0x19e0
_raw_spin_lock_irqsave+0x38/0x50
? try_to_wake_up+0xa3/0x19e0
try_to_wake_up+0xa3/0x19e0
? cifs_compound_callback+0x178/0x210
? set_cpus_allowed_ptr+0x10/0x10
cifs_reconnect+0xa1c/0x15d0
? generic_ip_connect+0x1860/0x1860
? rwlock_bug.part.0+0x90/0x90
cifs_readv_from_socket+0x479/0x690
cifs_read_from_socket+0x9d/0xe0
? cifs_readv_from_socket+0x690/0x690
? mempool_resize+0x690/0x690
? rwlock_bug.part.0+0x90/0x90
? memset+0x1f/0x40
? allocate_buffers+0xff/0x340
cifs_demultiplex_thread+0x388/0x2a50
? cifs_handle_standard+0x610/0x610
? rcu_read_lock_held_common+0x120/0x120
? mark_lock+0x11b/0xc00
? __lock_acquire+0x14ed/0x3270
? __kthread_parkme+0x78/0x100
? lockdep_hardirqs_on+0x3e8/0x560
? lock_downgrade+0x6a0/0x6a0
? lockdep_hardirqs_on+0x3e8/0x560
? _raw_spin_unlock_irqrestore+0x39/0x60
? cifs_handle_standard+0x610/0x610
kthread+0x2bb/0x3a0
? kthread_create_worker_on_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x50
Allocated by task 649:
save_stack+0x19/0x70
__kasan_kmalloc.constprop.5+0xa6/0xf0
kmem_cache_alloc+0x107/0x320
copy_process+0x17bc/0x5370
_do_fork+0x103/0xbf0
__x64_sys_clone+0x168/0x1e0
do_syscall_64+0x9b/0xec0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 0:
save_stack+0x19/0x70
__kasan_slab_free+0x11d/0x160
kmem_cache_free+0xb5/0x3d0
rcu_core+0x52f/0x1230
__do_softirq+0x24d/0x962
The buggy address belongs to the object at ffff8880103e32c0
which belongs to the cache task_struct of size 6016
The buggy address is located 1960 bytes inside of
6016-byte region [ffff8880103e32c0, ffff8880103e4a40)
The buggy address belongs to the page:
page:ffffea000040f800 refcount:1 mapcount:0 mapping:ffff8880108da5c0
index:0xffff8880103e4c00 compound_mapcount: 0
raw: 4000000000010200 ffffea00001f2208 ffffea00001e3408 ffff8880108da5c0
raw: ffff8880103e4c00 0000000000050003 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8880103e3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880103e3980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8880103e3a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8880103e3a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880103e3b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
This can be reliably reproduced by adding the below delay to
cifs_reconnect(), running find(1) on the mount, restarting the samba
server while find is running, and killing find during the delay:
spin_unlock(&GlobalMid_Lock);
mutex_unlock(&server->srv_mutex);
+ msleep(10000);
+
cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
list_for_each_safe(tmp, tmp2, &retry_list) {
mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
Fix this by holding a reference to the task struct until the MID is
freed.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch(a)axis.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
CC: Stable <stable(a)vger.kernel.org>
Reviewed-by: Paulo Alcantara (SUSE) <pc(a)cjr.nz>
Reviewed-by: Pavel Shilovsky <pshilov(a)microsoft.com>
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 40705e862451..239338d57086 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1588,6 +1588,7 @@ struct mid_q_entry {
mid_callback_t *callback; /* call completion callback */
mid_handle_t *handle; /* call handle mid callback */
void *callback_data; /* general purpose pointer for callback */
+ struct task_struct *creator;
void *resp_buf; /* pointer to received SMB header */
unsigned int resp_buf_size;
int mid_state; /* wish this were enum but can not pass to wait_event */
diff --git a/fs/cifs/smb2transport.c b/fs/cifs/smb2transport.c
index 387c88704c52..fe6acfce3390 100644
--- a/fs/cifs/smb2transport.c
+++ b/fs/cifs/smb2transport.c
@@ -685,6 +685,8 @@ smb2_mid_entry_alloc(const struct smb2_sync_hdr *shdr,
* The default is for the mid to be synchronous, so the
* default callback just wakes up the current task.
*/
+ get_task_struct(current);
+ temp->creator = current;
temp->callback = cifs_wake_up_task;
temp->callback_data = current;
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 3d2e11f85cba..cb3ee916f527 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -76,6 +76,8 @@ AllocMidQEntry(const struct smb_hdr *smb_buffer, struct TCP_Server_Info *server)
* The default is for the mid to be synchronous, so the
* default callback just wakes up the current task.
*/
+ get_task_struct(current);
+ temp->creator = current;
temp->callback = cifs_wake_up_task;
temp->callback_data = current;
@@ -158,6 +160,7 @@ static void _cifs_mid_q_entry_release(struct kref *refcount)
}
}
#endif
+ put_task_struct(midEntry->creator);
mempool_free(midEntry, cifs_mid_poolp);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f1f27ad74557e39f67a8331a808b860f89254f2d Mon Sep 17 00:00:00 2001
From: Vincent Whitchurch <vincent.whitchurch(a)axis.com>
Date: Thu, 23 Jan 2020 17:09:06 +0100
Subject: [PATCH] CIFS: Fix task struct use-after-free on reconnect
The task which created the MID may be gone by the time cifsd attempts to
call the callbacks on MIDs from cifs_reconnect().
This leads to a use-after-free of the task struct in cifs_wake_up_task:
==================================================================
BUG: KASAN: use-after-free in __lock_acquire+0x31a0/0x3270
Read of size 8 at addr ffff8880103e3a68 by task cifsd/630
CPU: 0 PID: 630 Comm: cifsd Not tainted 5.5.0-rc6+ #119
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
dump_stack+0x8e/0xcb
print_address_description.constprop.5+0x1d3/0x3c0
? __lock_acquire+0x31a0/0x3270
__kasan_report+0x152/0x1aa
? __lock_acquire+0x31a0/0x3270
? __lock_acquire+0x31a0/0x3270
kasan_report+0xe/0x20
__lock_acquire+0x31a0/0x3270
? __wake_up_common+0x1dc/0x630
? _raw_spin_unlock_irqrestore+0x4c/0x60
? mark_held_locks+0xf0/0xf0
? _raw_spin_unlock_irqrestore+0x39/0x60
? __wake_up_common_lock+0xd5/0x130
? __wake_up_common+0x630/0x630
lock_acquire+0x13f/0x330
? try_to_wake_up+0xa3/0x19e0
_raw_spin_lock_irqsave+0x38/0x50
? try_to_wake_up+0xa3/0x19e0
try_to_wake_up+0xa3/0x19e0
? cifs_compound_callback+0x178/0x210
? set_cpus_allowed_ptr+0x10/0x10
cifs_reconnect+0xa1c/0x15d0
? generic_ip_connect+0x1860/0x1860
? rwlock_bug.part.0+0x90/0x90
cifs_readv_from_socket+0x479/0x690
cifs_read_from_socket+0x9d/0xe0
? cifs_readv_from_socket+0x690/0x690
? mempool_resize+0x690/0x690
? rwlock_bug.part.0+0x90/0x90
? memset+0x1f/0x40
? allocate_buffers+0xff/0x340
cifs_demultiplex_thread+0x388/0x2a50
? cifs_handle_standard+0x610/0x610
? rcu_read_lock_held_common+0x120/0x120
? mark_lock+0x11b/0xc00
? __lock_acquire+0x14ed/0x3270
? __kthread_parkme+0x78/0x100
? lockdep_hardirqs_on+0x3e8/0x560
? lock_downgrade+0x6a0/0x6a0
? lockdep_hardirqs_on+0x3e8/0x560
? _raw_spin_unlock_irqrestore+0x39/0x60
? cifs_handle_standard+0x610/0x610
kthread+0x2bb/0x3a0
? kthread_create_worker_on_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x50
Allocated by task 649:
save_stack+0x19/0x70
__kasan_kmalloc.constprop.5+0xa6/0xf0
kmem_cache_alloc+0x107/0x320
copy_process+0x17bc/0x5370
_do_fork+0x103/0xbf0
__x64_sys_clone+0x168/0x1e0
do_syscall_64+0x9b/0xec0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 0:
save_stack+0x19/0x70
__kasan_slab_free+0x11d/0x160
kmem_cache_free+0xb5/0x3d0
rcu_core+0x52f/0x1230
__do_softirq+0x24d/0x962
The buggy address belongs to the object at ffff8880103e32c0
which belongs to the cache task_struct of size 6016
The buggy address is located 1960 bytes inside of
6016-byte region [ffff8880103e32c0, ffff8880103e4a40)
The buggy address belongs to the page:
page:ffffea000040f800 refcount:1 mapcount:0 mapping:ffff8880108da5c0
index:0xffff8880103e4c00 compound_mapcount: 0
raw: 4000000000010200 ffffea00001f2208 ffffea00001e3408 ffff8880108da5c0
raw: ffff8880103e4c00 0000000000050003 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8880103e3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880103e3980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8880103e3a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8880103e3a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880103e3b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
This can be reliably reproduced by adding the below delay to
cifs_reconnect(), running find(1) on the mount, restarting the samba
server while find is running, and killing find during the delay:
spin_unlock(&GlobalMid_Lock);
mutex_unlock(&server->srv_mutex);
+ msleep(10000);
+
cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
list_for_each_safe(tmp, tmp2, &retry_list) {
mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
Fix this by holding a reference to the task struct until the MID is
freed.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch(a)axis.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
CC: Stable <stable(a)vger.kernel.org>
Reviewed-by: Paulo Alcantara (SUSE) <pc(a)cjr.nz>
Reviewed-by: Pavel Shilovsky <pshilov(a)microsoft.com>
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 40705e862451..239338d57086 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1588,6 +1588,7 @@ struct mid_q_entry {
mid_callback_t *callback; /* call completion callback */
mid_handle_t *handle; /* call handle mid callback */
void *callback_data; /* general purpose pointer for callback */
+ struct task_struct *creator;
void *resp_buf; /* pointer to received SMB header */
unsigned int resp_buf_size;
int mid_state; /* wish this were enum but can not pass to wait_event */
diff --git a/fs/cifs/smb2transport.c b/fs/cifs/smb2transport.c
index 387c88704c52..fe6acfce3390 100644
--- a/fs/cifs/smb2transport.c
+++ b/fs/cifs/smb2transport.c
@@ -685,6 +685,8 @@ smb2_mid_entry_alloc(const struct smb2_sync_hdr *shdr,
* The default is for the mid to be synchronous, so the
* default callback just wakes up the current task.
*/
+ get_task_struct(current);
+ temp->creator = current;
temp->callback = cifs_wake_up_task;
temp->callback_data = current;
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 3d2e11f85cba..cb3ee916f527 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -76,6 +76,8 @@ AllocMidQEntry(const struct smb_hdr *smb_buffer, struct TCP_Server_Info *server)
* The default is for the mid to be synchronous, so the
* default callback just wakes up the current task.
*/
+ get_task_struct(current);
+ temp->creator = current;
temp->callback = cifs_wake_up_task;
temp->callback_data = current;
@@ -158,6 +160,7 @@ static void _cifs_mid_q_entry_release(struct kref *refcount)
}
}
#endif
+ put_task_struct(midEntry->creator);
mempool_free(midEntry, cifs_mid_poolp);
}
Dear Ben Skeggs,
Please find attached a patch solving a blocking issue I encountered:
https://bugzilla.kernel.org/show_bug.cgi?id=206299
Basically, running at least a RTX2080TI on Xen makes a bad mmio error
which causes having 'mthd' pointer to be NULL in 'channv50.c'. From the
code, it's assumed to be not NULL by accessing directly 'mthd->data[0]'
which is the reason of the kernel panic. I simply check if the pointer
is not NULL before continuing.
Best regards,
Frédéric Pierret
Since commit 49bd4d71637 ("mm, numa: rework do_pages_move"),
the semantic of move_pages() has changed to return the number of
non-migrated pages if they were result of a non-fatal reasons (usually a
busy page). This was an unintentional change that hasn't been noticed
except for LTP tests which checked for the documented behavior.
There are two ways to go around this change. We can even get back to the
original behavior and return -EAGAIN whenever migrate_pages is not able
to migrate pages due to non-fatal reasons. Another option would be to
simply continue with the changed semantic and extend move_pages
documentation to clarify that -errno is returned on an invalid input or
when migration simply cannot succeed (e.g. -ENOMEM, -EBUSY) or the
number of pages that couldn't have been migrated due to ephemeral
reasons (e.g. page is pinned or locked for other reasons).
This patch implements the second option because this behavior is in
place for some time without anybody complaining and possibly new users
depending on it. Also it allows to have a slightly easier error handling
as the caller knows that it is worth to retry when err > 0.
But since the new semantic would be aborted immediately if migration is
failed due to ephemeral reasons, need include the number of non-attempted
pages in the return value too.
Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
Suggested-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Wei Yang <richardw.yang(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> [4.17+]
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
---
v3: Rephrased the commit log per Michal and added Michal's Acked-by
v2: Rebased on top of the latest mainline kernel per Andrew
mm/migrate.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 86873b6..9b8eb5d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1627,8 +1627,18 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
start = i;
} else if (node != current_node) {
err = do_move_pages_to_node(mm, &pagelist, current_node);
- if (err)
+ if (err) {
+ /*
+ * Possitive err means the number of failed
+ * pages to migrate. Since we are going to
+ * abort and return the number of non-migrated
+ * pages, so need incude the rest of the
+ * nr_pages that have not attempted as well.
+ */
+ if (err > 0)
+ err += nr_pages - i - 1;
goto out;
+ }
err = store_status(status, start, current_node, i - start);
if (err)
goto out;
@@ -1659,8 +1669,11 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
goto out_flush;
err = do_move_pages_to_node(mm, &pagelist, current_node);
- if (err)
+ if (err) {
+ if (err > 0)
+ err += nr_pages - i - 1;
goto out;
+ }
if (i > start) {
err = store_status(status, start, current_node, i - start);
if (err)
@@ -1674,6 +1687,13 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
/* Make sure we do not overwrite the existing error */
err1 = do_move_pages_to_node(mm, &pagelist, current_node);
+ /*
+ * Don't have to report non-attempted pages here since:
+ * - If the above loop is done gracefully there is not non-attempted
+ * page.
+ * - If the above loop is aborted to it means more fatal error
+ * happened, should return err.
+ */
if (!err1)
err1 = store_status(status, start, current_node, i - start);
if (!err)
--
1.8.3.1
Make sure to only add the size of the auth tag to the source mapping
for encryption if it is an in-place operation. Failing to do this
previously caused us to try and map auth size len bytes from a NULL
mapping and crashing if both the cryptlen and assoclen are zero.
Reported-by: Geert Uytterhoeven <geert+renesas(a)glider.be>
Tested-by: Geert Uytterhoeven <geert+renesas(a)glider.be>
Signed-off-by: Gilad Ben-Yossef <gilad(a)benyossef.com>
Cc: stable(a)vger.kernel.org # v4.19+
---
drivers/crypto/ccree/cc_buffer_mgr.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/crypto/ccree/cc_buffer_mgr.c b/drivers/crypto/ccree/cc_buffer_mgr.c
index b938ceae7ae7..885347b5b372 100644
--- a/drivers/crypto/ccree/cc_buffer_mgr.c
+++ b/drivers/crypto/ccree/cc_buffer_mgr.c
@@ -1109,9 +1109,11 @@ int cc_map_aead_request(struct cc_drvdata *drvdata, struct aead_request *req)
}
size_to_map = req->cryptlen + areq_ctx->assoclen;
- if (areq_ctx->gen_ctx.op_type == DRV_CRYPTO_DIRECTION_ENCRYPT)
+ /* If we do in-place encryption, we also need the auth tag */
+ if ((areq_ctx->gen_ctx.op_type == DRV_CRYPTO_DIRECTION_ENCRYPT) &&
+ (req->src == req->dst)) {
size_to_map += authsize;
-
+ }
if (is_gcm4543)
size_to_map += crypto_aead_ivsize(tfm);
rc = cc_map_sg(dev, req->src, size_to_map, DMA_BIDIRECTIONAL,
--
2.25.0
To avoid context corruption on some gens, we need to hold forcewake
for long periods of time. This leads to increased energy expenditure
for mostly idle workloads.
To combat the increased power consumption, park GPU more hastily.
As the HW isn't so quick to end up in rc6, this software mechanism
supplements it. So we can apply it, across all gens.
Suggested-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Testcase: igt/i915_pm_rc6_residency/rc6-idle
References: "Add RC6 CTX corruption WA"
Cc: Stable <stable(a)vger.kernel.org> # v4.19+
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala(a)linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 2 +-
drivers/gpu/drm/i915/intel_lrc.c | 5 +++++
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c7d05ac7af3c..6ddc75098b28 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -197,7 +197,7 @@ void i915_gem_park(struct drm_i915_private *i915)
return;
/* Defer the actual call to __i915_gem_park() to prevent ping-pongs */
- mod_delayed_work(i915->wq, &i915->gt.idle_work, msecs_to_jiffies(100));
+ mod_delayed_work(i915->wq, &i915->gt.idle_work, 1);
}
void i915_gem_unpark(struct drm_i915_private *i915)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 13e97faabaa7..284ffdfd8840 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -388,7 +388,12 @@ execlists_user_begin(struct intel_engine_execlists *execlists,
inline void
execlists_user_end(struct intel_engine_execlists *execlists)
{
+ struct intel_engine_cs *engine =
+ container_of(execlists, typeof(*engine), execlists);
+
execlists_clear_active(execlists, EXECLISTS_ACTIVE_USER);
+
+ mod_delayed_work(engine->i915->wq, &engine->i915->gt.retire_work, 0);
}
static inline void
--
2.17.1
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: 003461559ef7a9bd0239bae35a22ad8924d6e9ad
Gitweb: https://git.kernel.org/tip/003461559ef7a9bd0239bae35a22ad8924d6e9ad
Author: Song Liu <songliubraving(a)fb.com>
AuthorDate: Thu, 23 Jan 2020 10:11:46 -08:00
Committer: Ingo Molnar <mingo(a)kernel.org>
CommitterDate: Tue, 28 Jan 2020 21:20:18 +01:00
perf/core: Fix mlock accounting in perf_mmap()
Decreasing sysctl_perf_event_mlock between two consecutive perf_mmap()s of
a perf ring buffer may lead to an integer underflow in locked memory
accounting. This may lead to the undesired behaviors, such as failures in
BPF map creation.
Address this by adjusting the accounting logic to take into account the
possibility that the amount of already locked memory may exceed the
current limit.
Fixes: c4b75479741c ("perf/core: Make the mlock accounting simple again")
Suggested-by: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Signed-off-by: Song Liu <songliubraving(a)fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Acked-by: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Link: https://lkml.kernel.org/r/20200123181146.2238074-1-songliubraving@fb.com
---
kernel/events/core.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2173c23..2d9aeba 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5916,7 +5916,15 @@ accounting:
*/
user_lock_limit *= num_online_cpus();
- user_locked = atomic_long_read(&user->locked_vm) + user_extra;
+ user_locked = atomic_long_read(&user->locked_vm);
+
+ /*
+ * sysctl_perf_event_mlock may have changed, so that
+ * user->locked_vm > user_lock_limit
+ */
+ if (user_locked > user_lock_limit)
+ user_locked = user_lock_limit;
+ user_locked += user_extra;
if (user_locked > user_lock_limit) {
/*
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: 07c5972951f088094776038006a0592a46d14bbc
Gitweb: https://git.kernel.org/tip/07c5972951f088094776038006a0592a46d14bbc
Author: Song Liu <songliubraving(a)fb.com>
AuthorDate: Wed, 22 Jan 2020 11:50:27 -08:00
Committer: Ingo Molnar <mingo(a)kernel.org>
CommitterDate: Tue, 28 Jan 2020 21:20:19 +01:00
perf/cgroups: Install cgroup events to correct cpuctx
cgroup events are always installed in the cpuctx. However, when it is not
installed via IPI, list_update_cgroup_event() adds it to cpuctx of current
CPU, which triggers list corruption:
[] list_add double add: new=ffff888ff7cf0db0, prev=ffff888ff7ce82f0, next=ffff888ff7cf0db0.
To reproduce this, we can simply run:
# perf stat -e cs -a &
# perf stat -e cs -G anycgroup
Fix this by installing it to cpuctx that contains event->ctx, and the
proper cgrp_cpuctx_list.
Fixes: db0503e4f675 ("perf/core: Optimize perf_install_in_event()")
Suggested-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Song Liu <songliubraving(a)fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Link: https://lkml.kernel.org/r/20200122195027.2112449-1-songliubraving@fb.com
---
kernel/events/core.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2d9aeba..fdb7f7e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -951,9 +951,9 @@ list_update_cgroup_event(struct perf_event *event,
/*
* Because cgroup events are always per-cpu events,
- * this will always be called from the right CPU.
+ * @ctx == &cpuctx->ctx.
*/
- cpuctx = __get_cpu_context(ctx);
+ cpuctx = container_of(ctx, struct perf_cpu_context, ctx);
/*
* Since setting cpuctx->cgrp is conditional on the current @cgrp
@@ -979,7 +979,8 @@ list_update_cgroup_event(struct perf_event *event,
cpuctx_entry = &cpuctx->cgrp_cpuctx_entry;
if (add)
- list_add(cpuctx_entry, this_cpu_ptr(&cgrp_cpuctx_list));
+ list_add(cpuctx_entry,
+ per_cpu_ptr(&cgrp_cpuctx_list, event->cpu));
else
list_del(cpuctx_entry);
}
From: Anurag Kumar Vulisha <anurag.kumar.vulisha(a)xilinx.com>
The current code in dwc3_gadget_ep_reclaim_completed_trb() will
check for IOC/LST bit in the event->status and returns if
IOC/LST bit is set. This logic doesn't work if multiple TRBs
are queued per request and the IOC/LST bit is set on the last
TRB of that request.
Consider an example where a queued request has multiple queued
TRBs and IOC/LST bit is set only for the last TRB. In this case,
the core generates XferComplete/XferInProgress events only for
the last TRB (since IOC/LST are set only for the last TRB). As
per the logic in dwc3_gadget_ep_reclaim_completed_trb()
event->status is checked for IOC/LST bit and returns on the
first TRB. This leaves the remaining TRBs left unhandled.
Similarly, if the gadget function enqueues an unaligned request
with sglist already in it, it should fail the same way, since we
will append another TRB to something that already uses more than
one TRB.
To aviod this, this patch changes the code to check for IOC/LST
bits in TRB->ctrl instead.
At a practical level, this patch resolves USB transfer stalls seen
with adb on dwc3 based HiKey960 after functionfs gadget added
scatter-gather support around v4.20.
Cc: Felipe Balbi <balbi(a)kernel.org>
Cc: Yang Fei <fei.yang(a)intel.com>
Cc: Thinh Nguyen <thinhn(a)synopsys.com>
Cc: Tejas Joglekar <tejas.joglekar(a)synopsys.com>
Cc: Andrzej Pietrasiewicz <andrzej.p(a)collabora.com>
Cc: Jack Pham <jackp(a)codeaurora.org>
Cc: Todd Kjos <tkjos(a)google.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Linux USB List <linux-usb(a)vger.kernel.org>
Cc: stable <stable(a)vger.kernel.org>
Tested-by: Tejas Joglekar <tejas.joglekar(a)synopsys.com>
Reviewed-by: Thinh Nguyen <thinhn(a)synopsys.com>
Signed-off-by: Anurag Kumar Vulisha <anurag.kumar.vulisha(a)xilinx.com>
[jstultz: forward ported to mainline, reworded commit log, reworked
to only check trb->ctrl as suggested by Felipe]
Signed-off-by: John Stultz <john.stultz(a)linaro.org>
---
v2:
* Rework to only check trb->ctrl as suggested by Felipe
* Reword the commit message to include more of Felipe's assessment
---
drivers/usb/dwc3/gadget.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 154f3f3e8cff..9a085eee1ae3 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2420,7 +2420,8 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
if (event->status & DEPEVT_STATUS_SHORT && !chain)
return 1;
- if (event->status & DEPEVT_STATUS_IOC)
+ if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
+ (trb->ctrl & DWC3_TRB_CTRL_LST))
return 1;
return 0;
--
2.17.1
[ this is a fix specific to 4.4.y and 4.9.y stable trees;
4.14.y and older already contain the right fix ]
The stable 4.4.y and 4.9.y backports of the upstream commit
add9d56d7b37 ("ALSA: pcm: Avoid possible info leaks from PCM stream
buffers") dropped the check of substream->ops->copy_user as copy_user
is a new member that isn't present in the older kernels.
Although upstream drivers should work without this NULL check, it may
cause a regression with a downstream driver that sets some
inaccessible address to runtime->dma_area, leading to a crash at
worst.
Since such drivers must have ops->copy member on older kernels instead
of ops->copy_user, this patch adds the missing check of ops->copy for
fixing the regression.
Reported-and-tested-by: Andreas Schneider <asn(a)cryptomilk.org>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
---
sound/core/pcm_native.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index b9bfbf394959..59423576b1cc 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -588,7 +588,7 @@ static int snd_pcm_hw_params(struct snd_pcm_substream *substream,
runtime->boundary *= 2;
/* clear the buffer for avoiding possible kernel info leaks */
- if (runtime->dma_area)
+ if (runtime->dma_area && !substream->ops->copy)
memset(runtime->dma_area, 0, runtime->dma_bytes);
snd_pcm_timer_resolution_change(substream);
--
2.16.4
This is the backport of the following fixes for 4.19-stable:
- d84f2f5a7552 ("drivers/base/node.c: simplify
unregister_memory_block_under_nodes()")
-- Turned out to not only be a cleanup but also a fix
- 2c91f8fc6c99 ("mm/memory_hotplug: fix try_offline_node()")
-- Automatic stable backport failed due to missing dependencies.
- feee6b298916 ("mm/memory_hotplug: shrink zones when offlining memory")
-- Was marked as stable 5.0+ due to the backport complexity,, but it's also
relevant for 4.19/4.14. As I have to backport quite some cleanups
already ...
All tackle memory unplug issues, especially when memory was never
onlined (or onlining failed), paired with memory unplug. When trying to
access garbage memmaps we crash the kernel (e.g., because the derviced
pgdat pointer is broken)
To minimize manual code changes, I decided to pull in quite some cleanups.
Still some manual code changes are necessary (indicated in the individual
patches). Especially missing arm64 hot(un)plug, missing sub-section hotadd
support, and missing unification of mm/hmm.c and kernel/memremap.c requires
care.
Due to:
- 4e0d2e7ef14d ("mm, sparse: pass nid instead of pgdat to
sparse_add_one_section()")
I need:
- afe9b36ca890 ("mm/memunmap: don't access uninitialized memmap in
memunmap_pages()")
Please note that:
- 4c4b7f9ba948 ("mm/memory_hotplug: remove memory block devices
before arch_remove_memory()")
Makes big (e.g., 32TB) machines boot up slower (e.g., 2h vs 10m). There is
a performance fix in linux-next, but it does not seem to classify as a
fix for current RC / stable.
I did quite some testing with hot(un)plug, onlining/offlining of memory
blocks and memory-less/CPU-less NUMA nodes under x86_64 - the same set of
tests I run against upstream on a fairly regular basis. I compile-tested
on PowerPC, arm64, s390x, i386 and sh. I did not test any ZONE_DEVICE/HMM
thingies.
The 4.14 backport might take a bit - it would be quite a lot of patches
to backport and it is not that severely broken, so I am thinking about
simpler (less invasive) alternatives.
v2 -> v3:
- Fix inverted author information of two patches
v1 -> v2:
- Fix patch authors
- Dropped "mm/memory_hotplug: make __remove_pages() and
arch_remove_memory() never fail"
-- Only creates a minor conflict in another patch
- "mm/memory_hotplug: make remove_memory() take the device_hotplug_lock"
-- Fix wrong upstream commit id
- "mm/memory_hotplug: shrink zones when offlining memory"
- "mm/memunmap: don't access uninitialized memmap in memunmap_pages()"
-- Fix usage of wrong pfn
CCing only some people to minimize noise.
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Wei Yang <richard.weiyang(a)gmail.com>
Aneesh Kumar K.V (2):
powerpc/mm: Fix section mismatch warning
mm/memunmap: don't access uninitialized memmap in memunmap_pages()
Baoquan He (1):
drivers/base/memory.c: clean up relics in function parameters
Dan Carpenter (1):
mm, memory_hotplug: update a comment in unregister_memory()
Dan Williams (1):
mm/hotplug: kill is_dev_zone() usage in __remove_pages()
David Hildenbrand (15):
mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
mm/memory_hotplug: release memory resource after arch_remove_memory()
mm/memory_hotplug: make unregister_memory_section() never fail
mm/memory_hotplug: make __remove_section() never fail
mm/memory_hotplug: make __remove_pages() and arch_remove_memory()
never fail
s390x/mm: implement arch_remove_memory()
mm/memory_hotplug: allow arch_remove_memory() without
CONFIG_MEMORY_HOTREMOVE
drivers/base/memory: pass a block_id to init_memory_block()
mm/memory_hotplug: create memory block devices after arch_add_memory()
mm/memory_hotplug: remove memory block devices before
arch_remove_memory()
mm/memory_hotplug: make unregister_memory_block_under_nodes() never
fail
mm/memory_hotplug: remove "zone" parameter from
sparse_remove_one_section
drivers/base/node.c: simplify unregister_memory_block_under_nodes()
mm/memory_hotplug: fix try_offline_node()
mm/memory_hotplug: shrink zones when offlining memory
Oscar Salvador (1):
mm, memory_hotplug: add nid parameter to arch_remove_memory
Wei Yang (3):
mm, sparse: drop pgdat_resize_lock in sparse_add/remove_one_section()
mm, sparse: pass nid instead of pgdat to sparse_add_one_section()
drivers/base/memory.c: remove an unnecessary check on NR_MEM_SECTIONS
arch/ia64/mm/init.c | 15 +-
arch/powerpc/mm/mem.c | 25 +--
arch/powerpc/platforms/powernv/memtrace.c | 2 +-
.../platforms/pseries/hotplug-memory.c | 6 +-
arch/s390/mm/init.c | 16 +-
arch/sh/mm/init.c | 15 +-
arch/x86/mm/init_32.c | 9 +-
arch/x86/mm/init_64.c | 17 +-
drivers/acpi/acpi_memhotplug.c | 2 +-
drivers/base/memory.c | 203 +++++++++++-------
drivers/base/node.c | 52 ++---
include/linux/memory.h | 8 +-
include/linux/memory_hotplug.h | 22 +-
include/linux/mmzone.h | 3 +-
include/linux/node.h | 7 +-
kernel/memremap.c | 12 +-
mm/hmm.c | 8 +-
mm/memory_hotplug.c | 166 +++++++-------
mm/sparse.c | 27 +--
19 files changed, 303 insertions(+), 312 deletions(-)
--
2.24.1
Vipul,
vipul kumar <vipulk0511(a)gmail.com> writes:
> On Tue, Jan 21, 2020 at 11:15 PM Thomas Gleixner <tglx(a)linutronix.de> wrote:
> Measurement with the existing code:
> $ echo -n "SystemTime: " ; TZ=UTC date -Iseconds ; echo -n "RTC Time:
> " ; TZ=UTC hwclock -r ; echo -n "Uptime: " ; uptime -p
> SystemTime: 2019-12-05T17:18:37+00:00
> RTC Time: 2019-12-05 17:18:07.255341+0000
> Uptime: up 1 day, 7 minutes
>
> This sample shows a difference of 30 seconds after 1 day.
>
> Measurement with this patch:
> SystemTime: 2019-12-11T12:06:19+00:00
> RTC Time: 2019-12-11 12:06:19.083127+0000
> Uptime: up 1 day, 3 minutes
>
> With this patch, no time drift issue is observed. and tsc clocksource
> get calibration (tsc: Refined TSC clocksource calibration: 1833.333 MHz)
> which is missing
> with the existing implementation.
What's the frequency which is determined from the MSR? Something like
this in dmesg:
tsc: Detected NNN MHz processor
or
tsc: Detected NNN MHz TSC
Also please apply the debug patch below and provide a _full_ dmesg after
boot.
>> > +config X86_FEATURE_TSC_UNKNOWN_FREQ
>> > + bool "Support to skip tsc known frequency flag"
>> > + help
>> > + Include support to skip X86_FEATURE_TSC_KNOWN_FREQ flag
>> > +
>> > + X86_FEATURE_TSC_KNOWN_FREQ flag is causing time-drift on
>> Valleyview/
>> > + Baytrail SoC.
>> > + By selecting this option, user can skip
>> X86_FEATURE_TSC_KNOWN_FREQ
>> > + flag to use refine tsc freq calibration.
>>
>> This is exactly the same problem as before. How does anyone aside of you
>> know whether to enable this or not?
>>
> Go through the Documentation/kbuild/kconfig-language.rst but didn't
> find related to make
> config known to everyone. Could you please point to documentation?
Right. And there is no proper answer to this, which makes it clear that
a config option is not the right tool to use.
>> And if someone enables this option then _ALL_ platforms which utilize
>> cpu_khz_from_msr() are affected. How is that any different from your
>> previous approach? This works on local kernels where you build for a
>> specific platform and you know exactly what you're doing, but not for
>> general consumption. What should a distro do with this option?
>>
>>
> TSC frequency is already calculated in cpu_khz_from_msr() function
> before setting these flags.
Your mail client does some horrible formatting this zig-zag is
unreadable. I'm reformatting the paragraphs below.
> This patch return the same calculated TSC frequency but skipping
> those two flags. On the basis of these flags, we decide whether we
> skip the refined calibration and directly register it as a clocksource
> or use refine tsc freq calibration in init_tsc_clocksource()
> function. By default this config is disabled and if user wants to use
> refine tsc freq calibration() then only user will enable it otherwise
> it will go with existing implementations. So, I don't think so it will
> break for other ATOM SoC.
It does. I explained most of the following to you in an earlier mail
already. Let me try again.
If X86_FEATURE_TSC_RELIABLE is not set, then the TSC requires a watchdog
clocksource. But some of those SoCs do not have anything else than TSC,
so there is no watchdog available. As a consequence the TSC is not
usable for high resolution timers and NOHZ. That breaks existing systems
whether you like it or not.
If X86_FEATURE_TSC_KNOWN_FREQUENCY is not set, then this delays the
usability of the TSC for high res and NOHZ until the refined calibration
figured out that it can't calibrate. And no, we can't know that it does
not work upfront when the early TSC init happens. Clearing this flag
will not break functionality, but it changes the behaviour on boot-time
optimized systems which can obviously be considered breakage.
So no, having a config knob which might be turned on and turning working
systems into trainwrecks is simply not the way to go.
What can be done is to have a command line option which enforces refined
calibration and that option can turn off X86_FEATURE_TSC_KNOWN_FREQUENCY,
but nothing else.
> Check the cpu_khz_from_msr() function.
I know that code.
> In cpu_khz_from_msr() function we are setting
> X86_FEATURE_TSC_KNOWN_FREQ and X86_FEATURE_TSC_RELIABLE for all the
> SoC's but in native_calibrate_tsc(), we check for vendor == INTEL and
> CPUID > 0x15 and then at the end of function, will enable
> X86_FEATURE_TSC_RELIABLE for INTEL_FAM6_ATOM_GOLDMONT SoC.
>
> Do we need to set the same flag in two different functions as it will be
> set in cpu_khz_from_msr() for all SoCs ?
cpu_khz_from_msr() does not handle INTEL_FAM6_ATOM_GOLDMONT or can you
find that in tsc_msr_cpu_ids[]? Making half informed claims is not
solving anything.
Thanks,
tglx
8<------------------
--- a/arch/x86/kernel/tsc_msr.c
+++ b/arch/x86/kernel/tsc_msr.c
@@ -94,16 +94,20 @@ unsigned long cpu_khz_from_msr(void)
if (freq_desc->msr_plat) {
rdmsr(MSR_PLATFORM_INFO, lo, hi);
ratio = (lo >> 8) & 0xff;
+ pr_info("MSR_PINFO: %08x%08x -> %u\n", hi, lo, ratio);
} else {
rdmsr(MSR_IA32_PERF_STATUS, lo, hi);
ratio = (hi >> 8) & 0x1f;
+ pr_info("MSR_PSTAT: %08x%08x -> %u\n", hi, lo, ratio);
}
/* Get FSB FREQ ID */
rdmsr(MSR_FSB_FREQ, lo, hi);
+ pr_info("MSR_FSBF: %08x%08x\n", hi, lo);
/* Map CPU reference clock freq ID(0-7) to CPU reference clock freq(KHz) */
freq = freq_desc->freqs[lo & 0x7];
+ pr_info("REF_CLOCK: %08x\n", freq);
/* TSC frequency = maximum resolved freq * maximum resolved bus ratio */
res = freq * ratio;
Linus,
[ This is not a merge window pull, I was waiting on a tested-by from
the reporter ]
Kprobe events added "ustring" to distinguish reading strings
from kernel space or user space. But the creating of the event format
file only checks for "string" to display string formats. "ustring" must
also be handled.
Please pull the latest trace-v5.5-rc7 tree, which can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
trace-v5.5-rc7
Tag SHA1: 83993af56b45a4c0b34401ee5bdd96c94180d77d
Head SHA1: 20279420ae3a8ef4c5d9fedc360a2c37a1dbdf1b
Steven Rostedt (VMware) (1):
tracing/kprobes: Have uname use __get_str() in print_fmt
----
kernel/trace/trace_probe.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
---------------------------
commit 20279420ae3a8ef4c5d9fedc360a2c37a1dbdf1b
Author: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Date: Fri Jan 24 10:07:42 2020 -0500
tracing/kprobes: Have uname use __get_str() in print_fmt
Thomas Richter reported:
> Test case 66 'Use vfs_getname probe to get syscall args filenames'
> is broken on s390, but works on x86. The test case fails with:
>
> [root@m35lp76 perf]# perf test -F 66
> 66: Use vfs_getname probe to get syscall args filenames
> :Recording open file:
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.004 MB /tmp/__perf_test.perf.data.TCdYj\
> (20 samples) ]
> Looking at perf.data file for vfs_getname records for the file we touched:
> FAILED!
> [root@m35lp76 perf]#
The root cause was the print_fmt of the kprobe event that referenced the
"ustring"
> Setting up the kprobe event using perf command:
>
> # ./perf probe "vfs_getname=getname_flags:72 pathname=filename:ustring"
>
> generates this format file:
> [root@m35lp76 perf]# cat /sys/kernel/debug/tracing/events/probe/\
> vfs_getname/format
> name: vfs_getname
> ID: 1172
> format:
> field:unsigned short common_type; offset:0; size:2; signed:0;
> field:unsigned char common_flags; offset:2; size:1; signed:0;
> field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
> field:int common_pid; offset:4; size:4; signed:1;
>
> field:unsigned long __probe_ip; offset:8; size:8; signed:0;
> field:__data_loc char[] pathname; offset:16; size:4; signed:1;
>
> print fmt: "(%lx) pathname=\"%s\"", REC->__probe_ip, REC->pathname
Instead of using "__get_str(pathname)" it referenced it directly.
Link: http://lkml.kernel.org/r/20200124100742.4050c15e@gandalf.local.home
Cc: stable(a)vger.kernel.org
Fixes: 88903c464321 ("tracing/probe: Add ustring type for user-space string")
Acked-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Reported-by: Thomas Richter <tmricht(a)linux.ibm.com>
Tested-by: Thomas Richter <tmricht(a)linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 9ae87be422f2..ab8b6436d53f 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -876,7 +876,8 @@ static int __set_print_fmt(struct trace_probe *tp, char *buf, int len,
for (i = 0; i < tp->nr_args; i++) {
parg = tp->args + i;
if (parg->count) {
- if (strcmp(parg->type->name, "string") == 0)
+ if ((strcmp(parg->type->name, "string") == 0) ||
+ (strcmp(parg->type->name, "ustring") == 0))
fmt = ", __get_str(%s[%d])";
else
fmt = ", REC->%s[%d]";
@@ -884,7 +885,8 @@ static int __set_print_fmt(struct trace_probe *tp, char *buf, int len,
pos += snprintf(buf + pos, LEN_OR_ZERO,
fmt, parg->name, j);
} else {
- if (strcmp(parg->type->name, "string") == 0)
+ if ((strcmp(parg->type->name, "string") == 0) ||
+ (strcmp(parg->type->name, "ustring") == 0))
fmt = ", __get_str(%s)";
else
fmt = ", REC->%s";
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 24aa6bde1f9e - Linux 5.4.16-rc1
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
All kernel binaries, config files, and logs are available for download here:
https://artifacts.cki-project.org/pipelines/408028
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 3 architectures:
aarch64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ Podman system integration test (as root)
✅ Podman system integration test (as user)
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func: local
✅ Networking route_func: forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns transport
✅ Networking ipsec: basic netns tunnel
✅ audit: audit testsuite test
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ iotop: sanity
🚧 ✅ Usex - version 1.9-29
🚧 ✅ storage: dm/common
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ xfstests: ext4
⚡⚡⚡ xfstests: xfs
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ lvm thinp sanity
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ stress: stress-ng
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ Storage blktests
Host 3:
✅ Boot test
✅ xfstests: ext4
✅ xfstests: xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
🚧 ✅ Storage blktests
ppc64le:
Host 1:
✅ Boot test
✅ Podman system integration test (as root)
✅ Podman system integration test (as user)
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking route: pmtu
✅ Networking route_func: local
✅ Networking route_func: forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns tunnel
✅ audit: audit testsuite test
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ iotop: sanity
🚧 ✅ Usex - version 1.9-29
🚧 ✅ storage: dm/common
Host 2:
✅ Boot test
✅ xfstests: ext4
✅ xfstests: xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
x86_64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ xfstests: ext4
✅ xfstests: xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
✅ stress: stress-ng
🚧 ⚡⚡⚡ IOMMU boot test
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ power-management: cpupower/sanity test
🚧 ⚡⚡⚡ Storage blktests
Host 2:
✅ Boot test
✅ Podman system integration test (as root)
✅ Podman system integration test (as user)
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func: local
✅ Networking route_func: forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns transport
✅ Networking ipsec: basic netns tunnel
✅ audit: audit testsuite test
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ pciutils: sanity smoke test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm test suite
🚧 ✅ Memory function: kaslr
🚧 ❌ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ iotop: sanity
🚧 ✅ Usex - version 1.9-29
🚧 ✅ storage: dm/common
Host 3:
✅ Boot test
✅ Storage SAN device stress - megaraid_sas
Host 4:
✅ Boot test
✅ Storage SAN device stress - mpt3sas driver
Test sources: https://github.com/CKI-project/tests-beaker
💚 Pull requests are welcome for new tests or improvements to existing tests!
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running are marked with ⏱. Reports for non-upstream kernels have
a Beaker recipe linked to next to each host.
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 4acf9f18a8fe - Linux 5.4.16-rc1
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
All kernel binaries, config files, and logs are available for download here:
https://artifacts.cki-project.org/pipelines/408132
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 3 architectures:
aarch64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ Podman system integration test (as root)
✅ Podman system integration test (as user)
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func: local
✅ Networking route_func: forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns transport
✅ Networking ipsec: basic netns tunnel
✅ audit: audit testsuite test
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ iotop: sanity
🚧 ✅ Usex - version 1.9-29
🚧 ✅ storage: dm/common
Host 2:
✅ Boot test
✅ xfstests: ext4
✅ xfstests: xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
🚧 ✅ Storage blktests
ppc64le:
Host 1:
✅ Boot test
✅ Podman system integration test (as root)
✅ Podman system integration test (as user)
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking route: pmtu
✅ Networking route_func: local
✅ Networking route_func: forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns tunnel
✅ audit: audit testsuite test
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ iotop: sanity
🚧 ✅ Usex - version 1.9-29
🚧 ✅ storage: dm/common
Host 2:
✅ Boot test
✅ xfstests: ext4
✅ xfstests: xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
x86_64:
Host 1:
✅ Boot test
✅ Storage SAN device stress - mpt3sas driver
Host 2:
✅ Boot test
✅ Podman system integration test (as root)
✅ Podman system integration test (as user)
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func: local
✅ Networking route_func: forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns transport
✅ Networking ipsec: basic netns tunnel
✅ audit: audit testsuite test
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ pciutils: sanity smoke test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ iotop: sanity
🚧 ✅ Usex - version 1.9-29
🚧 ✅ storage: dm/common
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ xfstests: ext4
⚡⚡⚡ xfstests: xfs
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ lvm thinp sanity
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ stress: stress-ng
🚧 ⚡⚡⚡ IOMMU boot test
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ power-management: cpupower/sanity test
🚧 ⚡⚡⚡ Storage blktests
Host 4:
✅ Boot test
✅ Storage SAN device stress - megaraid_sas
Host 5:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ xfstests: ext4
⚡⚡⚡ xfstests: xfs
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ lvm thinp sanity
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ stress: stress-ng
🚧 ⚡⚡⚡ IOMMU boot test
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ power-management: cpupower/sanity test
🚧 ⚡⚡⚡ Storage blktests
Host 6:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ xfstests: ext4
⚡⚡⚡ xfstests: xfs
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ lvm thinp sanity
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ stress: stress-ng
🚧 ⚡⚡⚡ IOMMU boot test
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ power-management: cpupower/sanity test
🚧 ⚡⚡⚡ Storage blktests
Test sources: https://github.com/CKI-project/tests-beaker
💚 Pull requests are welcome for new tests or improvements to existing tests!
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running are marked with ⏱. Reports for non-upstream kernels have
a Beaker recipe linked to next to each host.
Commit d3eeb1d77c5d ("xen/gntdev: use mmu_interval_notifier_insert")
missed a test for use_ptemod when calling mmu_interval_read_begin(). Fix
that.
Fixes: d3eeb1d77c5d ("xen/gntdev: use mmu_interval_notifier_insert")
CC: stable(a)vger.kernel.org # 5.5
Reported-by: Ilpo Järvinen <ilpo.jarvinen(a)cs.helsinki.fi>
Tested-by: Ilpo Järvinen <ilpo.jarvinen(a)cs.helsinki.fi>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
---
drivers/xen/gntdev.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 4fc83e3f5ad3..0258415ca0b2 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -1006,19 +1006,19 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
}
mutex_unlock(&priv->lock);
- /*
- * gntdev takes the address of the PTE in find_grant_ptes() and passes
- * it to the hypervisor in gntdev_map_grant_pages(). The purpose of
- * the notifier is to prevent the hypervisor pointer to the PTE from
- * going stale.
- *
- * Since this vma's mappings can't be touched without the mmap_sem,
- * and we are holding it now, there is no need for the notifier_range
- * locking pattern.
- */
- mmu_interval_read_begin(&map->notifier);
-
if (use_ptemod) {
+ /*
+ * gntdev takes the address of the PTE in find_grant_ptes() and
+ * passes it to the hypervisor in gntdev_map_grant_pages(). The
+ * purpose of the notifier is to prevent the hypervisor pointer
+ * to the PTE from going stale.
+ *
+ * Since this vma's mappings can't be touched without the
+ * mmap_sem, and we are holding it now, there is no need for
+ * the notifier_range locking pattern.
+ */
+ mmu_interval_read_begin(&map->notifier);
+
map->pages_vm_start = vma->vm_start;
err = apply_to_page_range(vma->vm_mm, vma->vm_start,
vma->vm_end - vma->vm_start,
--
2.17.1
Vipul,
vipul kumar <vipulk0511(a)gmail.com> writes:
Please see https://people.kernel.org/tglx/notes-about-netiquette
and search for Top-posting.
> Please find attached logs with mainline kernel version 5.4.15 with
> patch.
Which patch? I'm not seing any of the debug prints from my patch in that
dmesg.
I assume it's your patch, right?
> [ 5.736689] tsc: Refined TSC clocksource calibration: 1833.333 MHz
Otherwise this would not show up.
Try again please with your patch removed an my debug patch applied.
Thanks,
tglx
Hi,
On 28-01-2020 12:47, vipul kumar wrote:
> Hi Thomas,
>
> Please find attached logs with mainline kernel version 5.4.15 with patch.
So the suggested change seems to not work, that is strange.
Can you double check you are running the correct kernel and
add the following change and gather debug output ? :
--- a/arch/x86/kernel/tsc_msr.c
+++ b/arch/x86/kernel/tsc_msr.c
@@ -102,6 +103,8 @@ unsigned long cpu_khz_from_msr(void)
/* Get FSB FREQ ID */
+ pr_err("tsc msr id match %ld lo 0x%02x\n", id - tsc_msr_cpu_ids, lo);
+
/* Map CPU reference clock freq ID(0-7) to CPU reference clock freq(KHz) */
freq = freq_desc->freqs[lo & 0x7];
Regards,
Hans
> On Fri, Jan 24, 2020 at 9:24 PM Hans de Goede <hdegoede(a)redhat.com <mailto:hdegoede@redhat.com>> wrote:
>
> Hi,
>
> On 1/24/20 12:55 PM, Thomas Gleixner wrote:
> > Hans,
> >
> > Hans de Goede <hdegoede(a)redhat.com <mailto:hdegoede@redhat.com>> writes:
> >> On 1/24/20 9:35 AM, Thomas Gleixner wrote:
> >>> Where does that number come from? Just math?
> >>
> >> Yes just math, but perhaps the Intel folks can see if they can find some
> >> datasheet to back this up ?
> >
> > Can you observe the issue on one of the machines in your zoo as well?
>
> I haven't tried yet. Looking at the thread sofar the problem was noticed on
> a system with a Celeron N2930, I don't have access to one of those, I
> do have access to a system with a closely related N2840 I will give that
> a try as well as see if I can reproduce this on one of the tablet
> oriented Z3735x SoCs.
>
> I'll report back when I have had a chance to test this.
>
> Regards,
>
> Hans
>
stable-rc 4.9 build failed due to these build error,
drivers/md/bitmap.c:1702:13: error: conflicting types for 'bitmap_free'
static void bitmap_free(struct bitmap *bitmap)
^~~~~~~~~~~
include/linux/bitmap.h:94:13: note: previous declaration of
'bitmap_free' was here
extern void bitmap_free(const unsigned long *bitmap);
^~~~~~~~~~~
scripts/Makefile.build:304: recipe for target 'drivers/md/bitmap.o' failed
suspecting this patch causing this build failure on stable-rc 4.9
bitmap: Add bitmap_alloc(), bitmap_zalloc() and bitmap_free()
commit c42b65e363ce97a828f81b59033c3558f8fa7f70 upstream.
A lot of code become ugly because of open coding allocations for bitmaps.
Introduce three helpers to allow users be more clear of intention
and keep their code neat.
Note, due to multiple circular dependencies we may not provide
the helpers as inliners. For now we keep them exported and, perhaps,
at some point in the future we will sort out header inclusion and
inheritance.
Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
--
Linaro LKFT
https://lkft.linaro.org
On Sun, 1 Jul 2018 at 01:49, Andy Shevchenko
<andriy.shevchenko(a)linux.intel.com> wrote:
>
> A lot of code become ugly because of open coding allocations for bitmaps.
>
> Introduce three helpers to allow users be more clear of intention
> and keep their code neat.
>
> Note, due to multiple circular dependencies we may not provide
> the helpers as inliners. For now we keep them exported and, perhaps,
> at some point in the future we will sort out header inclusion and
> inheritance.
>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
> ---
> include/linux/bitmap.h | 8 ++++++++
> lib/bitmap.c | 19 +++++++++++++++++++
> 2 files changed, 27 insertions(+)
>
> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
> index 1ee46f492267..acf5e8df3504 100644
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -104,6 +104,14 @@
> * contain all bit positions from 0 to 'bits' - 1.
> */
>
> +/*
> + * Allocation and deallocation of bitmap.
> + * Provided in lib/bitmap.c to avoid circular dependency.
> + */
> +extern unsigned long *bitmap_alloc(unsigned int nbits, gfp_t flags);
> +extern unsigned long *bitmap_zalloc(unsigned int nbits, gfp_t flags);
> +extern void bitmap_free(const unsigned long *bitmap);
> +
> /*
> * lib/bitmap.c provides these functions:
> */
> diff --git a/lib/bitmap.c b/lib/bitmap.c
> index 33e95cd359a2..09acf2fd6a35 100644
> --- a/lib/bitmap.c
> +++ b/lib/bitmap.c
> @@ -13,6 +13,7 @@
> #include <linux/bitops.h>
> #include <linux/bug.h>
> #include <linux/kernel.h>
> +#include <linux/slab.h>
> #include <linux/string.h>
> #include <linux/uaccess.h>
>
> @@ -1125,6 +1126,24 @@ void bitmap_copy_le(unsigned long *dst, const unsigned long *src, unsigned int n
> EXPORT_SYMBOL(bitmap_copy_le);
> #endif
>
> +unsigned long *bitmap_alloc(unsigned int nbits, gfp_t flags)
> +{
> + return kmalloc_array(BITS_TO_LONGS(nbits), sizeof(unsigned long), flags);
> +}
> +EXPORT_SYMBOL(bitmap_alloc);
> +
> +unsigned long *bitmap_zalloc(unsigned int nbits, gfp_t flags)
> +{
> + return bitmap_alloc(nbits, flags | __GFP_ZERO);
> +}
> +EXPORT_SYMBOL(bitmap_zalloc);
> +
> +void bitmap_free(const unsigned long *bitmap)
> +{
> + kfree(bitmap);
> +}
> +EXPORT_SYMBOL(bitmap_free);
> +
> #if BITS_PER_LONG == 64
> /**
> * bitmap_from_arr32 - copy the contents of u32 array of bits to bitmap
stable-rc 4.14 build failed due to these build error,
lib/bitmap.c: In function 'bitmap_from_u32array':
lib/bitmap.c:1133:1: warning: ISO C90 forbids mixed declarations and
code [-Wdeclaration-after-statement]
unsigned long *bitmap_alloc(unsigned int nbits, gfp_t flags)
^~~~~~~~
In file included from
/srv/oe/build/tmp-lkft-glibc/work-shared/intel-corei7-64/kernel-source/lib/bitmap.c:8:0:
lib/bitmap.c:1138:15: error: non-static declaration of 'bitmap_alloc'
follows static declaration
EXPORT_SYMBOL(bitmap_alloc);
^
include/linux/export.h:65:21: note: in definition of macro '___EXPORT_SYMBOL'
extern typeof(sym) sym; \
^~~
lib/bitmap.c:1138:1: note: in expansion of macro 'EXPORT_SYMBOL'
EXPORT_SYMBOL(bitmap_alloc);
^~~~~~~~~~~~~
lib/bitmap.c:1133:16: note: previous definition of 'bitmap_alloc' was here
unsigned long *bitmap_alloc(unsigned int nbits, gfp_t flags)
^~~~~~~~~~~~
In file included from
/srv/oe/build/tmp-lkft-glibc/work-shared/intel-corei7-64/kernel-source/lib/bitmap.c:8:0:
lib/bitmap.c:1144:15: error: non-static declaration of 'bitmap_zalloc'
follows static declaration
EXPORT_SYMBOL(bitmap_zalloc);
^
include/linux/export.h:65:21: note: in definition of macro '___EXPORT_SYMBOL'
extern typeof(sym) sym; \
^~~
lib/bitmap.c:1144:1: note: in expansion of macro 'EXPORT_SYMBOL'
EXPORT_SYMBOL(bitmap_zalloc);
^~~~~~~~~~~~~
lib/bitmap.c:1140:16: note: previous definition of 'bitmap_zalloc' was here
unsigned long *bitmap_zalloc(unsigned int nbits, gfp_t flags)
^~~~~~~~~~~~~
In file included from
/srv/oe/build/tmp-lkft-glibc/work-shared/intel-corei7-64/kernel-source/lib/bitmap.c:8:0:
lib/bitmap.c:1150:15: error: non-static declaration of 'bitmap_free'
follows static declaration
EXPORT_SYMBOL(bitmap_free);
^
include/linux/export.h:65:21: note: in definition of macro '___EXPORT_SYMBOL'
extern typeof(sym) sym; \
^~~
lib/bitmap.c:1150:1: note: in expansion of macro 'EXPORT_SYMBOL'
EXPORT_SYMBOL(bitmap_free);
^~~~~~~~~~~~~
lib/bitmap.c:1146:6: note: previous definition of 'bitmap_free' was here
void bitmap_free(const unsigned long *bitmap)
^~~~~~~~~~~
CC drivers/char/random.o
scripts/Makefile.build:326: recipe for target 'lib/bitmap.o' failed
make[3]: *** [lib/bitmap.o] Error 1
Makefile:1052: recipe for target 'lib' failed
make[2]: *** [lib] Error 2
stable-rc 4.14 build failed due to these build error,
lib/bitmap.c: In function 'bitmap_from_u32array':
lib/bitmap.c:1133:1: warning: ISO C90 forbids mixed declarations and
code [-Wdeclaration-after-statement]
unsigned long *bitmap_alloc(unsigned int nbits, gfp_t flags)
^~~~~~~~
In file included from
/srv/oe/build/tmp-lkft-glibc/work-shared/intel-corei7-64/kernel-source/lib/bitmap.c:8:0:
lib/bitmap.c:1138:15: error: non-static declaration of 'bitmap_alloc'
follows static declaration
EXPORT_SYMBOL(bitmap_alloc);
^
include/linux/export.h:65:21: note: in definition of macro '___EXPORT_SYMBOL'
extern typeof(sym) sym; \
^~~
lib/bitmap.c:1138:1: note: in expansion of macro 'EXPORT_SYMBOL'
EXPORT_SYMBOL(bitmap_alloc);
^~~~~~~~~~~~~
lib/bitmap.c:1133:16: note: previous definition of 'bitmap_alloc' was here
unsigned long *bitmap_alloc(unsigned int nbits, gfp_t flags)
^~~~~~~~~~~~
In file included from
/srv/oe/build/tmp-lkft-glibc/work-shared/intel-corei7-64/kernel-source/lib/bitmap.c:8:0:
lib/bitmap.c:1144:15: error: non-static declaration of 'bitmap_zalloc'
follows static declaration
EXPORT_SYMBOL(bitmap_zalloc);
^
include/linux/export.h:65:21: note: in definition of macro '___EXPORT_SYMBOL'
extern typeof(sym) sym; \
^~~
lib/bitmap.c:1144:1: note: in expansion of macro 'EXPORT_SYMBOL'
EXPORT_SYMBOL(bitmap_zalloc);
^~~~~~~~~~~~~
lib/bitmap.c:1140:16: note: previous definition of 'bitmap_zalloc' was here
unsigned long *bitmap_zalloc(unsigned int nbits, gfp_t flags)
^~~~~~~~~~~~~
In file included from
/srv/oe/build/tmp-lkft-glibc/work-shared/intel-corei7-64/kernel-source/lib/bitmap.c:8:0:
lib/bitmap.c:1150:15: error: non-static declaration of 'bitmap_free'
follows static declaration
EXPORT_SYMBOL(bitmap_free);
^
include/linux/export.h:65:21: note: in definition of macro '___EXPORT_SYMBOL'
extern typeof(sym) sym; \
^~~
lib/bitmap.c:1150:1: note: in expansion of macro 'EXPORT_SYMBOL'
EXPORT_SYMBOL(bitmap_free);
^~~~~~~~~~~~~
lib/bitmap.c:1146:6: note: previous definition of 'bitmap_free' was here
void bitmap_free(const unsigned long *bitmap)
^~~~~~~~~~~
CC drivers/char/random.o
scripts/Makefile.build:326: recipe for target 'lib/bitmap.o' failed
make[3]: *** [lib/bitmap.o] Error 1
Makefile:1052: recipe for target 'lib' failed
make[2]: *** [lib] Error 2
--
Linaro LKFT
https://lkft.linaro.org
<gregkh(a)linuxfoundation.org> writes:
> This is a note to let you know that I've just added the patch titled
>
> net-sysfs: Fix reference count leak
>
> to the 4.4-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> net-sysfs-fix-reference-count-leak.patch
> and it can be found in the queue-4.4 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
This patch shouldn't be taken into 4.4 or 4.9 stable branches. Memory
leak it's fixing doesn't exist in 4.4 or 4.9. It's introduced by these two
patches which are not merged into 4.4 or 4.9 branches:
commit e331c9066901dfe40bea4647521b86e9fb9901bb
Author: YueHaibing <yuehaibing(a)huawei.com>
Date: Tue Mar 19 10:16:53 2019 +0800
net-sysfs: call dev_hold if kobject_init_and_add success
[ Upstream commit a3e23f719f5c4a38ffb3d30c8d7632a4ed8ccd9e ]
In netdev_queue_add_kobject and rx_queue_add_kobject,
if sysfs_create_group failed, kobject_put will call
netdev_queue_release to decrease dev refcont, however
dev_hold has not be called. So we will see this while
unregistering dev:
unregister_netdevice: waiting for bcsh0 to become free. Usage count = -1
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Fixes: d0d668371679 ("net: don't decrement kobj reference count on init fail
ure")
Signed-off-by: YueHaibing <yuehaibing(a)huawei.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
commit d0d6683716791b2a2761a1bb025c613eb73da6c3
Author: stephen hemminger <stephen(a)networkplumber.org>
Date: Fri Aug 18 13:46:19 2017 -0700
net: don't decrement kobj reference count on init failure
If kobject_init_and_add failed, then the failure path would
decrement the reference count of the queue kobject whose reference
count was already zero.
Fixes: 114cf5802165 ("bql: Byte queue limits")
Signed-off-by: Stephen Hemminger <sthemmin(a)microsoft.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
>
>
> From foo@baz Mon 27 Jan 2020 04:14:17 PM CET
> From: Jouni Hogander <jouni.hogander(a)unikie.com>
> Date: Mon, 20 Jan 2020 09:51:03 +0200
> Subject: net-sysfs: Fix reference count leak
>
> From: Jouni Hogander <jouni.hogander(a)unikie.com>
>
> [ Upstream commit cb626bf566eb4433318d35681286c494f04fedcc ]
>
> Netdev_register_kobject is calling device_initialize. In case of error
> reference taken by device_initialize is not given up.
>
> Drivers are supposed to call free_netdev in case of error. In non-error
> case the last reference is given up there and device release sequence
> is triggered. In error case this reference is kept and the release
> sequence is never started.
>
> Fix this by setting reg_state as NETREG_UNREGISTERED if registering
> fails.
>
> This is the rootcause for couple of memory leaks reported by Syzkaller:
>
> BUG: memory leak unreferenced object 0xffff8880675ca008 (size 256):
> comm "netdev_register", pid 281, jiffies 4294696663 (age 6.808s)
> hex dump (first 32 bytes):
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> backtrace:
> [<0000000058ca4711>] kmem_cache_alloc_trace+0x167/0x280
> [<000000002340019b>] device_add+0x882/0x1750
> [<000000001d588c3a>] netdev_register_kobject+0x128/0x380
> [<0000000011ef5535>] register_netdevice+0xa1b/0xf00
> [<000000007fcf1c99>] __tun_chr_ioctl+0x20d5/0x3dd0
> [<000000006a5b7b2b>] tun_chr_ioctl+0x2f/0x40
> [<00000000f30f834a>] do_vfs_ioctl+0x1c7/0x1510
> [<00000000fba062ea>] ksys_ioctl+0x99/0xb0
> [<00000000b1c1b8d2>] __x64_sys_ioctl+0x78/0xb0
> [<00000000984cabb9>] do_syscall_64+0x16f/0x580
> [<000000000bde033d>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [<00000000e6ca2d9f>] 0xffffffffffffffff
>
> BUG: memory leak
> unreferenced object 0xffff8880668ba588 (size 8):
> comm "kobject_set_nam", pid 286, jiffies 4294725297 (age 9.871s)
> hex dump (first 8 bytes):
> 6e 72 30 00 cc be df 2b nr0....+
> backtrace:
> [<00000000a322332a>] __kmalloc_track_caller+0x16e/0x290
> [<00000000236fd26b>] kstrdup+0x3e/0x70
> [<00000000dd4a2815>] kstrdup_const+0x3e/0x50
> [<0000000049a377fc>] kvasprintf_const+0x10e/0x160
> [<00000000627fc711>] kobject_set_name_vargs+0x5b/0x140
> [<0000000019eeab06>] dev_set_name+0xc0/0xf0
> [<0000000069cb12bc>] netdev_register_kobject+0xc8/0x320
> [<00000000f2e83732>] register_netdevice+0xa1b/0xf00
> [<000000009e1f57cc>] __tun_chr_ioctl+0x20d5/0x3dd0
> [<000000009c560784>] tun_chr_ioctl+0x2f/0x40
> [<000000000d759e02>] do_vfs_ioctl+0x1c7/0x1510
> [<00000000351d7c31>] ksys_ioctl+0x99/0xb0
> [<000000008390040a>] __x64_sys_ioctl+0x78/0xb0
> [<0000000052d196b7>] do_syscall_64+0x16f/0x580
> [<0000000019af9236>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [<00000000bc384531>] 0xffffffffffffffff
>
> v3 -> v4:
> Set reg_state to NETREG_UNREGISTERED if registering fails
>
> v2 -> v3:
> * Replaced BUG_ON with WARN_ON in free_netdev and netdev_release
>
> v1 -> v2:
> * Relying on driver calling free_netdev rather than calling
> put_device directly in error path
>
> Reported-by: syzbot+ad8ca40ecd77896d51e2(a)syzkaller.appspotmail.com
> Cc: David Miller <davem(a)davemloft.net>
> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> Cc: Lukas Bulwahn <lukas.bulwahn(a)gmail.com>
> Signed-off-by: Jouni Hogander <jouni.hogander(a)unikie.com>
> Signed-off-by: David S. Miller <davem(a)davemloft.net>
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> ---
> net/core/dev.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6806,8 +6806,10 @@ int register_netdevice(struct net_device
> goto err_uninit;
>
> ret = netdev_register_kobject(dev);
> - if (ret)
> + if (ret) {
> + dev->reg_state = NETREG_UNREGISTERED;
> goto err_uninit;
> + }
> dev->reg_state = NETREG_REGISTERED;
>
> __netdev_update_features(dev);
>
>
> Patches currently in stable-queue which might be from jouni.hogander(a)unikie.com are
>
> queue-4.4/net-sysfs-fix-reference-count-leak.patch
BR,
Jouni Högander
[ Upstream commit 730766bae3280a25d40ea76a53dc6342e84e6513 ]
During a perf session we try to allocate buffers on the "node" associated
with the CPU the event is bound to. If it is not bound to a CPU, we
use the current CPU node, using smp_processor_id(). However this is unsafe
in a pre-emptible context and could generate the splats as below :
BUG: using smp_processor_id() in preemptible [00000000] code: perf/2544
Use NUMA_NO_NODE hint instead of using the current node for events
not bound to CPUs.
Fixes: 2997aa4063d97fdb39 ("coresight: etb10: implementing AUX API")
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Cc: stable <stable(a)vger.kernel.org> # v4.9 to v4.19
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Link: https://lore.kernel.org/r/20190620221237.3536-5-mathieu.poirier@linaro.org
---
drivers/hwtracing/coresight/coresight-etb10.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c
index 0dad8626bcfb..6cf28b049635 100644
--- a/drivers/hwtracing/coresight/coresight-etb10.c
+++ b/drivers/hwtracing/coresight/coresight-etb10.c
@@ -275,9 +275,7 @@ static void *etb_alloc_buffer(struct coresight_device *csdev, int cpu,
int node;
struct cs_buffers *buf;
- if (cpu == -1)
- cpu = smp_processor_id();
- node = cpu_to_node(cpu);
+ node = (cpu == -1) ? NUMA_NO_NODE : cpu_to_node(cpu);
buf = kzalloc_node(sizeof(struct cs_buffers), GFP_KERNEL, node);
if (!buf)
--
2.24.1
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 883f616530692d81cb70f8a32d85c0d2afc05f69 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Lars=20M=C3=B6llendorf?= <lars.moellendorf(a)plating.de>
Date: Fri, 13 Dec 2019 14:50:55 +0100
Subject: [PATCH] iio: buffer: align the size of scan bytes to size of the
largest element
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Previous versions of `iio_compute_scan_bytes` only aligned each element
to its own length (i.e. its own natural alignment). Because multiple
consecutive sets of scan elements are buffered this does not work in
case the computed scan bytes do not align with the natural alignment of
the first scan element in the set.
This commit fixes this by aligning the scan bytes to the natural
alignment of the largest scan element in the set.
Fixes: 959d2952d124 ("staging:iio: make iio_sw_buffer_preenable much more general.")
Signed-off-by: Lars Möllendorf <lars.moellendorf(a)plating.de>
Reviewed-by: Lars-Peter Clausen <lars(a)metafoo.de>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
index c193d64e5217..112225c0e486 100644
--- a/drivers/iio/industrialio-buffer.c
+++ b/drivers/iio/industrialio-buffer.c
@@ -566,7 +566,7 @@ static int iio_compute_scan_bytes(struct iio_dev *indio_dev,
const unsigned long *mask, bool timestamp)
{
unsigned bytes = 0;
- int length, i;
+ int length, i, largest = 0;
/* How much space will the demuxed element take? */
for_each_set_bit(i, mask,
@@ -574,13 +574,17 @@ static int iio_compute_scan_bytes(struct iio_dev *indio_dev,
length = iio_storage_bytes_for_si(indio_dev, i);
bytes = ALIGN(bytes, length);
bytes += length;
+ largest = max(largest, length);
}
if (timestamp) {
length = iio_storage_bytes_for_timestamp(indio_dev);
bytes = ALIGN(bytes, length);
bytes += length;
+ largest = max(largest, length);
}
+
+ bytes = ALIGN(bytes, largest);
return bytes;
}
From: Will Deacon <will.deacon(a)arm.com>
commit 2a355ec25729053bb9a1a89b6c1d1cdd6c3b3fb1 upstream.
While the CSV3 field of the ID_AA64_PFR0 CPU ID register can be checked
to see if a CPU is susceptible to Meltdown and therefore requires kpti
to be enabled, existing CPUs do not implement this field.
We therefore whitelist all unaffected Cortex-A CPUs that do not implement
the CSV3 field.
Signed-off-by: Will Deacon <will.deacon(a)arm.com>
[florian: adjust whilelist location and table to stable-4.9.y]
Signed-off-by: Florian Fainelli <f.fainelli(a)gmail.com>
---
arch/arm64/kernel/cpufeature.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9a8e45dc36bd..8cf001baee21 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -789,6 +789,11 @@ static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry,
switch (read_cpuid_id() & MIDR_CPU_MODEL_MASK) {
case MIDR_CAVIUM_THUNDERX2:
case MIDR_BRCM_VULCAN:
+ case MIDR_CORTEX_A53:
+ case MIDR_CORTEX_A55:
+ case MIDR_CORTEX_A57:
+ case MIDR_CORTEX_A72:
+ case MIDR_CORTEX_A73:
return false;
}
--
2.17.1
From: Jeremy Linton <jeremy.linton(a)arm.com>
commit de19055564c8f8f9d366f8db3395836da0b2176c upstream
For a while Arm64 has been capable of force enabling
or disabling the kpti mitigations. Lets make sure the
documentation reflects that.
Signed-off-by: Jeremy Linton <jeremy.linton(a)arm.com>
Reviewed-by: Andre Przywara <andre.przywara(a)arm.com>
Signed-off-by: Jonathan Corbet <corbet(a)lwn.net>
[florian: patch the correct file]
Signed-off-by: Florian Fainelli <f.fainelli(a)gmail.com>
---
Documentation/kernel-parameters.txt | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1bc12619bedd..b2d2f4539a3f 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1965,6 +1965,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
kmemcheck=2 (one-shot mode)
Default: 2 (one-shot mode)
+ kpti= [ARM64] Control page table isolation of user
+ and kernel address spaces.
+ Default: enabled on cores which need mitigation.
+ 0: force disabled
+ 1: force enabled
+
kstack=N [X86] Print N words from the kernel stack
in oops dumps.
--
2.17.1
ZBC/ZAC report zones command may return less bytes than requested if the
number of matching zones for the report request is small. However, unlike
read or write commands, the remainder of incomplete report zones commands
cannot be automatically requested by the block layer: the start sector of
the next report cannot be known, and the report reply may not be 512B
aligned for SAS drives (a report zone reply size is always a multiple of
64B). The regular request completion code executing bio_advance() and
restart of the command remainder part currently causes invalid zone
descriptor data to be reported to the caller if the report zone size is
smaller than 512B (a case that can happen easily for a report of the last
zones of a SAS drive for example).
Since blkdev_report_zones() handles report zone command processing in a
loop until completion (no more zones are being reported), we can safely
avoid that the block layer performs an incorrect bio_advance() call and
restart of the remainder of incomplete report zone BIOs. To do so, always
indicate a full completion of REQ_OP_ZONE_REPORT by setting good_bytes to
the request buffer size and by setting the command resid to 0. This does
not affect the post processing of the report zone reply done by
sd_zbc_complete() since the reply header indicates the number of zones
reported.
Fixes: 89d947561077 ("sd: Implement support for ZBC devices")
Cc: <stable(a)vger.kernel.org> # 4.19
Cc: <stable(a)vger.kernel.org> # 4.14
Signed-off-by: Masato Suzuki <masato.suzuki(a)wdc.com>
---
drivers/scsi/sd.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 2955b856e9ec..e8c2afbb82e9 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1981,9 +1981,13 @@ static int sd_done(struct scsi_cmnd *SCpnt)
}
break;
case REQ_OP_ZONE_REPORT:
+ /* To avoid that the block layer performs an incorrect
+ * bio_advance() call and restart of the remainder of
+ * incomplete report zone BIOs, always indicate a full
+ * completion of REQ_OP_ZONE_REPORT.
+ */
if (!result) {
- good_bytes = scsi_bufflen(SCpnt)
- - scsi_get_resid(SCpnt);
+ good_bytes = scsi_bufflen(SCpnt);
scsi_set_resid(SCpnt, 0);
} else {
good_bytes = 0;
--
2.24.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8bcebc77e85f3d7536f96845a0fe94b1dddb6af0 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Mon, 20 Jan 2020 13:07:31 -0500
Subject: [PATCH] tracing: Fix histogram code when expression has same var as
value
While working on a tool to convert SQL syntex into the histogram language of
the kernel, I discovered the following bug:
# echo 'first u64 start_time u64 end_time pid_t pid u64 delta' >> synthetic_events
# echo 'hist:keys=pid:start=common_timestamp' > events/sched/sched_waking/trigger
# echo 'hist:keys=next_pid:delta=common_timestamp-$start,start2=$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger
Would not display any histograms in the sched_switch histogram side.
But if I were to swap the location of
"delta=common_timestamp-$start" with "start2=$start"
Such that the last line had:
# echo 'hist:keys=next_pid:start2=$start,delta=common_timestamp-$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger
The histogram works as expected.
What I found out is that the expressions clear out the value once it is
resolved. As the variables are resolved in the order listed, when
processing:
delta=common_timestamp-$start
The $start is cleared. When it gets to "start2=$start", it errors out with
"unresolved symbol" (which is silent as this happens at the location of the
trace), and the histogram is dropped.
When processing the histogram for variable references, instead of adding a
new reference for a variable used twice, use the same reference. That way,
not only is it more efficient, but the order will no longer matter in
processing of the variables.
>From Tom Zanussi:
"Just to clarify some more about what the problem was is that without
your patch, we would have two separate references to the same variable,
and during resolve_var_refs(), they'd both want to be resolved
separately, so in this case, since the first reference to start wasn't
part of an expression, it wouldn't get the read-once flag set, so would
be read normally, and then the second reference would do the read-once
read and also be read but using read-once. So everything worked and
you didn't see a problem:
from: start2=$start,delta=common_timestamp-$start
In the second case, when you switched them around, the first reference
would be resolved by doing the read-once, and following that the second
reference would try to resolve and see that the variable had already
been read, so failed as unset, which caused it to short-circuit out and
not do the trigger action to generate the synthetic event:
to: delta=common_timestamp-$start,start2=$start
With your patch, we only have the single resolution which happens
correctly the one time it's resolved, so this can't happen."
Link: https://lore.kernel.org/r/20200116154216.58ca08eb@gandalf.local.home
Cc: stable(a)vger.kernel.org
Fixes: 067fe038e70f6 ("tracing: Add variable reference handling to hist triggers")
Reviewed-by: Tom Zanuss <zanussi(a)kernel.org>
Tested-by: Tom Zanussi <zanussi(a)kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index d33b046f985a..6ac35b9e195d 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -116,6 +116,7 @@ struct hist_field {
struct ftrace_event_field *field;
unsigned long flags;
hist_field_fn_t fn;
+ unsigned int ref;
unsigned int size;
unsigned int offset;
unsigned int is_signed;
@@ -2427,8 +2428,16 @@ static int contains_operator(char *str)
return field_op;
}
+static void get_hist_field(struct hist_field *hist_field)
+{
+ hist_field->ref++;
+}
+
static void __destroy_hist_field(struct hist_field *hist_field)
{
+ if (--hist_field->ref > 1)
+ return;
+
kfree(hist_field->var.name);
kfree(hist_field->name);
kfree(hist_field->type);
@@ -2470,6 +2479,8 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
if (!hist_field)
return NULL;
+ hist_field->ref = 1;
+
hist_field->hist_data = hist_data;
if (flags & HIST_FIELD_FL_EXPR || flags & HIST_FIELD_FL_ALIAS)
@@ -2665,6 +2676,17 @@ static struct hist_field *create_var_ref(struct hist_trigger_data *hist_data,
{
unsigned long flags = HIST_FIELD_FL_VAR_REF;
struct hist_field *ref_field;
+ int i;
+
+ /* Check if the variable already exists */
+ for (i = 0; i < hist_data->n_var_refs; i++) {
+ ref_field = hist_data->var_refs[i];
+ if (ref_field->var.idx == var_field->var.idx &&
+ ref_field->var.hist_data == var_field->hist_data) {
+ get_hist_field(ref_field);
+ return ref_field;
+ }
+ }
ref_field = create_hist_field(var_field->hist_data, NULL, flags, NULL);
if (ref_field) {
Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
the semantic of move_pages() has changed to return the number of
non-migrated pages if they were result of a non-fatal reasons (usually a
busy page). This was an unintentional change that hasn't been noticed
except for LTP tests which checked for the documented behavior.
There are two ways to go around this change. We can even get back to the
original behavior and return -EAGAIN whenever migrate_pages is not able
to migrate pages due to non-fatal reasons. Another option would be to
simply continue with the changed semantic and extend move_pages
documentation to clarify that -errno is returned on an invalid input or
when migration simply cannot succeed (e.g. -ENOMEM, -EBUSY) or the
number of pages that couldn't have been migrated due to ephemeral
reasons (e.g. page is pinned or locked for other reasons).
This patch implements the second option because this behavior is in
place for some time without anybody complaining and possibly new users
depending on it. Also it allows to have a slightly easier error handling
as the caller knows that it is worth to retry when err > 0.
But since the new semantic would be aborted immediately if migration is
failed due to ephemeral reasons, need include the number of non-attempted
pages in the return value too.
Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
Suggested-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Wei Yang <richardw.yang(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> [4.17+]
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
---
v4: Fixed some typo and grammar errors caught by Willy
v3: Rephrased the commit log per Michal and added Michal's Acked-by
v2: Rebased on top of the latest mainline kernel per Andrew
mm/migrate.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 86873b6..2530860 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1627,8 +1627,19 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
start = i;
} else if (node != current_node) {
err = do_move_pages_to_node(mm, &pagelist, current_node);
- if (err)
+ if (err) {
+ /*
+ * Positive err means the number of failed
+ * pages to migrate. Since we are going to
+ * abort and return the number of non-migrated
+ * pages, so need to incude the rest of the
+ * nr_pages that have not been attempted as
+ * well.
+ */
+ if (err > 0)
+ err += nr_pages - i - 1;
goto out;
+ }
err = store_status(status, start, current_node, i - start);
if (err)
goto out;
@@ -1659,8 +1670,11 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
goto out_flush;
err = do_move_pages_to_node(mm, &pagelist, current_node);
- if (err)
+ if (err) {
+ if (err > 0)
+ err += nr_pages - i - 1;
goto out;
+ }
if (i > start) {
err = store_status(status, start, current_node, i - start);
if (err)
@@ -1674,6 +1688,13 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
/* Make sure we do not overwrite the existing error */
err1 = do_move_pages_to_node(mm, &pagelist, current_node);
+ /*
+ * Don't have to report non-attempted pages here since:
+ * - If the above loop is done gracefully all pages have been
+ * attempted.
+ * - If the above loop is aborted it means a fatal error
+ * happened, should return ret.
+ */
if (!err1)
err1 = store_status(status, start, current_node, i - start);
if (!err)
--
1.8.3.1
From: Suren Baghdasaryan <surenb(a)google.com>
When ashmem file is mmapped, the resulting vma->vm_file points to the
backing shmem file with the generic fops that do not check ashmem
permissions like fops of ashmem do. If an mremap is done on the ashmem
region, then the permission checks will be skipped. Fix that by disallowing
mapping operation on the backing shmem file.
Reported-by: Jann Horn <jannh(a)google.com>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: stable <stable(a)vger.kernel.org> # 4.4,4.9,4.14,4.18,5.4
Signed-off-by: Todd Kjos <tkjos(a)google.com>
---
drivers/staging/android/ashmem.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
v2: update commit message as suggested by joelaf(a)google.com.
diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index 74d497d39c5a..c6695354b123 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -351,8 +351,23 @@ static inline vm_flags_t calc_vm_may_flags(unsigned long prot)
_calc_vm_trans(prot, PROT_EXEC, VM_MAYEXEC);
}
+static int ashmem_vmfile_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ /* do not allow to mmap ashmem backing shmem file directly */
+ return -EPERM;
+}
+
+static unsigned long
+ashmem_vmfile_get_unmapped_area(struct file *file, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags)
+{
+ return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+}
+
static int ashmem_mmap(struct file *file, struct vm_area_struct *vma)
{
+ static struct file_operations vmfile_fops;
struct ashmem_area *asma = file->private_data;
int ret = 0;
@@ -393,6 +408,19 @@ static int ashmem_mmap(struct file *file, struct vm_area_struct *vma)
}
vmfile->f_mode |= FMODE_LSEEK;
asma->file = vmfile;
+ /*
+ * override mmap operation of the vmfile so that it can't be
+ * remapped which would lead to creation of a new vma with no
+ * asma permission checks. Have to override get_unmapped_area
+ * as well to prevent VM_BUG_ON check for f_ops modification.
+ */
+ if (!vmfile_fops.mmap) {
+ vmfile_fops = *vmfile->f_op;
+ vmfile_fops.mmap = ashmem_vmfile_mmap;
+ vmfile_fops.get_unmapped_area =
+ ashmem_vmfile_get_unmapped_area;
+ }
+ vmfile->f_op = &vmfile_fops;
}
get_file(asma->file);
--
2.25.0.341.g760bfbb309-goog