From: Vineet Gupta <vgupta@synopsys.com>
[ Upstream commit e42404fa10fd11fe72d0a0e149a321d10e577715 ]
To start stack unwinding, SP, PC and BLINK are needed. When the
explicit execution context (pt_regs etc) is not available, the unwinder
assumes the task is sleeping (in __switch_to()) and fetches SP and BLINK
from the task's kernel mode stack.
But this assumption does not always hold, especially on an SMP system:
while top runs on one core, there may be actively running processes on
all the other cores.
So when unwinding non-current tasks, ensure they are NOT running.
And while at it, handle the self-unwinding case explicitly.
This came out of investigating a customer-reported hang with
rcutorture+top.
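
For illustration only, the decision the reworked seed_unwind_frame_info()
makes can be modelled with the self-contained user-space sketch below. It
is not the kernel code: struct task, task_state and seed_frame() are
made-up stand-ins for task_struct, tsk->state and the unwinder helper.

/*
 * Minimal user-space model of the seeding decision described above.
 * All names here are illustrative stand-ins, not kernel structures.
 */
#include <stdio.h>

enum task_state { DEMO_TASK_RUNNING, DEMO_TASK_SLEEPING };

struct task {
	enum task_state state;
	int is_current;		/* stand-in for "tsk == current" */
};

struct regs { unsigned long sp, pc, blink; };

/* Returns 0 if a frame can be seeded, -1 if unwinding must be aborted. */
static int seed_frame(const struct task *tsk, const struct regs *regs)
{
	if (regs == NULL && (tsk == NULL || tsk->is_current)) {
		/* synchronous case: unwind the caller itself */
		return 0;
	} else if (regs == NULL) {
		/*
		 * Asynchronous unwind of another task: only safe if that
		 * task is actually sleeping (parked in __switch_to()),
		 * so bail out if it is running.
		 */
		if (tsk->state == DEMO_TASK_RUNNING)
			return -1;
		return 0;
	}
	/* explicit register context supplied (e.g. exception frame) */
	return 0;
}

int main(void)
{
	struct task running  = { DEMO_TASK_RUNNING, 0 };
	struct task sleeping = { DEMO_TASK_SLEEPING, 0 };

	printf("running task:  %d\n", seed_frame(&running, NULL));	/* -1 */
	printf("sleeping task: %d\n", seed_frame(&sleeping, NULL));	/*  0 */
	return 0;
}
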
Link: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/31
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/arc/kernel/stacktrace.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/arch/arc/kernel/stacktrace.c b/arch/arc/kernel/stacktrace.c
index 5401e2bab3da2..054511f571dfd 100644
--- a/arch/arc/kernel/stacktrace.c
+++ b/arch/arc/kernel/stacktrace.c
@@ -39,15 +39,15 @@
#ifdef CONFIG_ARC_DW2_UNWIND
-static void seed_unwind_frame_info(struct task_struct *tsk,
- struct pt_regs *regs,
- struct unwind_frame_info *frame_info)
+static int
+seed_unwind_frame_info(struct task_struct *tsk, struct pt_regs *regs,
+ struct unwind_frame_info *frame_info)
{
/*
* synchronous unwinding (e.g. dump_stack)
* - uses current values of SP and friends
*/
- if (tsk == NULL && regs == NULL) {
+ if (regs == NULL && (tsk == NULL || tsk == current)) {
unsigned long fp, sp, blink, ret;
frame_info->task = current;
@@ -66,11 +66,15 @@ static void seed_unwind_frame_info(struct task_struct *tsk,
frame_info->call_frame = 0;
} else if (regs == NULL) {
/*
- * Asynchronous unwinding of sleeping task
- * - Gets SP etc from task's pt_regs (saved bottom of kernel
- * mode stack of task)
+ * Asynchronous unwinding of a likely sleeping task
+ * - first ensure it is actually sleeping
+ * - if so, it will be in __switch_to, kernel mode SP of task
+ * is safe-kept and BLINK at a well known location in there
*/
+ if (tsk->state == TASK_RUNNING)
+ return -1;
+
frame_info->task = tsk;
frame_info->regs.r27 = TSK_K_FP(tsk);
@@ -104,6 +108,8 @@ static void seed_unwind_frame_info(struct task_struct *tsk,
frame_info->regs.r63 = regs->ret;
frame_info->call_frame = 0;
}
+
+ return 0;
}
#endif
@@ -117,7 +123,8 @@ arc_unwind_core(struct task_struct *tsk, struct pt_regs *regs,
unsigned int address;
struct unwind_frame_info frame_info;
- seed_unwind_frame_info(tsk, regs, &frame_info);
+ if (seed_unwind_frame_info(tsk, regs, &frame_info))
+ return 0;
while (1) {
address = UNW_PC(&frame_info);
--
2.27.0
From: Johannes Berg <johannes.berg@intel.com>
[ Upstream commit 04516706bb99889986ddfa3a769ed50d2dc7ac13 ]
When we read device memory, we lock a spinlock, write the address we
want to read from the device and then spin in a loop reading the data
in 32-bit quantities from another register.
As the description makes clear, this is rather inefficient, incurring
a PCIe bus transaction for every read. In a typical device today, we
want to read 786k SMEM if it crashes, leading to 192k register reads.
Occasionally, we've seen the whole loop take over 20 seconds and then
trigger the soft lockup detector.
Clearly, it is unreasonable to spin here for such extended periods of
time.
To fix this, break the loop down into an outer and an inner loop, and
break out of the inner loop if more than half a second has elapsed. To
avoid too much overhead, check for that only every 128 reads, though
there's no particular reason for that number. Then, unlock and relock
to obtain NIC access again, reprogram the start address and continue.
This will keep (interrupt) latencies on the CPU down to a reasonable
time.
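
The chunked read with a periodic clock check can be modelled with the
self-contained user-space sketch below. This illustrates only the
pattern, not the driver code: grab_access(), release_access(),
set_read_addr() and read_word() are hypothetical stand-ins for the NIC
access and HBUS register helpers used in the patch.

/*
 * Sketch: read `dwords` 32-bit words in chunks, holding the "lock" for
 * at most ~0.5s at a time and checking the clock only every 128 reads.
 */
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define CHECK_INTERVAL 128	/* look at the clock once per 128 reads */

static bool grab_access(void)            { return true; }
static void release_access(void)         { }
static void set_read_addr(uint32_t addr) { (void)addr; }
static uint32_t read_word(void)          { return 0; }

static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static int read_mem(uint32_t addr, uint32_t *buf, int dwords)
{
	int offs = 0;

	while (offs < dwords) {
		/* limit the time spent under the lock to half a second */
		int64_t timeout = now_ns() + 500 * 1000000LL;

		if (!grab_access())
			return -1;

		/* (re)program the start address for this chunk */
		set_read_addr(addr + 4 * offs);

		while (offs < dwords) {
			buf[offs++] = read_word();

			if (offs % CHECK_INTERVAL == 0 && now_ns() > timeout)
				break;	/* drop the lock, start a new chunk */
		}
		release_access();
	}
	return 0;
}

int main(void)
{
	uint32_t buf[1024];

	return read_mem(0x800000, buf, 1024);
}
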
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20201022165103.45878a7e49aa.I3b9b9c5a1000…
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
.../net/wireless/intel/iwlwifi/pcie/trans.c | 36 ++++++++++++++-----
1 file changed, 27 insertions(+), 9 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index e7b873018dca6..e1287c3421165 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -1904,18 +1904,36 @@ static int iwl_trans_pcie_read_mem(struct iwl_trans *trans, u32 addr,
void *buf, int dwords)
{
unsigned long flags;
- int offs, ret = 0;
+ int offs = 0;
u32 *vals = buf;
- if (iwl_trans_grab_nic_access(trans, &flags)) {
- iwl_write32(trans, HBUS_TARG_MEM_RADDR, addr);
- for (offs = 0; offs < dwords; offs++)
- vals[offs] = iwl_read32(trans, HBUS_TARG_MEM_RDAT);
- iwl_trans_release_nic_access(trans, &flags);
- } else {
- ret = -EBUSY;
+ while (offs < dwords) {
+ /* limit the time we spin here under lock to 1/2s */
+ ktime_t timeout = ktime_add_us(ktime_get(), 500 * USEC_PER_MSEC);
+
+ if (iwl_trans_grab_nic_access(trans, &flags)) {
+ iwl_write32(trans, HBUS_TARG_MEM_RADDR,
+ addr + 4 * offs);
+
+ while (offs < dwords) {
+ vals[offs] = iwl_read32(trans,
+ HBUS_TARG_MEM_RDAT);
+ offs++;
+
+ /* calling ktime_get is expensive so
+ * do it once in 128 reads
+ */
+ if (offs % 128 == 0 && ktime_after(ktime_get(),
+ timeout))
+ break;
+ }
+ iwl_trans_release_nic_access(trans, &flags);
+ } else {
+ return -EBUSY;
+ }
}
- return ret;
+
+ return 0;
}
static int iwl_trans_pcie_write_mem(struct iwl_trans *trans, u32 addr,
--
2.27.0
From: Johannes Berg <johannes.berg@intel.com>
[ Upstream commit 04516706bb99889986ddfa3a769ed50d2dc7ac13 ]
When we read device memory, we lock a spinlock, write the address we
want to read from the device and then spin in a loop reading the data
in 32-bit quantities from another register.
As the description makes clear, this is rather inefficient, incurring
a PCIe bus transaction for every read. In a typical device today, we
want to read 786k SMEM if it crashes, leading to 192k register reads.
Occasionally, we've seen the whole loop take over 20 seconds and then
trigger the soft lockup detector.
Clearly, it is unreasonable to spin here for such extended periods of
time.
To fix this, break the loop down into an outer and an inner loop, and
break out of the inner loop if more than half a second has elapsed. To
avoid too much overhead, check for that only every 128 reads, though
there's no particular reason for that number. Then, unlock and relock
to obtain NIC access again, reprogram the start address and continue.
This will keep (interrupt) latencies on the CPU down to a reasonable
time.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20201022165103.45878a7e49aa.I3b9b9c5a1000…
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
.../net/wireless/intel/iwlwifi/pcie/trans.c | 36 ++++++++++++++-----
1 file changed, 27 insertions(+), 9 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index 8a074a516fb26..910edd034fe3a 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -1927,18 +1927,36 @@ static int iwl_trans_pcie_read_mem(struct iwl_trans *trans, u32 addr,
void *buf, int dwords)
{
unsigned long flags;
- int offs, ret = 0;
+ int offs = 0;
u32 *vals = buf;
- if (iwl_trans_grab_nic_access(trans, &flags)) {
- iwl_write32(trans, HBUS_TARG_MEM_RADDR, addr);
- for (offs = 0; offs < dwords; offs++)
- vals[offs] = iwl_read32(trans, HBUS_TARG_MEM_RDAT);
- iwl_trans_release_nic_access(trans, &flags);
- } else {
- ret = -EBUSY;
+ while (offs < dwords) {
+ /* limit the time we spin here under lock to 1/2s */
+ ktime_t timeout = ktime_add_us(ktime_get(), 500 * USEC_PER_MSEC);
+
+ if (iwl_trans_grab_nic_access(trans, &flags)) {
+ iwl_write32(trans, HBUS_TARG_MEM_RADDR,
+ addr + 4 * offs);
+
+ while (offs < dwords) {
+ vals[offs] = iwl_read32(trans,
+ HBUS_TARG_MEM_RDAT);
+ offs++;
+
+ /* calling ktime_get is expensive so
+ * do it once in 128 reads
+ */
+ if (offs % 128 == 0 && ktime_after(ktime_get(),
+ timeout))
+ break;
+ }
+ iwl_trans_release_nic_access(trans, &flags);
+ } else {
+ return -EBUSY;
+ }
}
- return ret;
+
+ return 0;
}
static int iwl_trans_pcie_write_mem(struct iwl_trans *trans, u32 addr,
--
2.27.0
From: Johannes Berg <johannes.berg@intel.com>
[ Upstream commit 04516706bb99889986ddfa3a769ed50d2dc7ac13 ]
When we read device memory, we lock a spinlock, write the address we
want to read from the device and then spin in a loop reading the data
in 32-bit quantities from another register.
As the description makes clear, this is rather inefficient, incurring
a PCIe bus transaction for every read. In a typical device today, we
want to read 786k SMEM if it crashes, leading to 192k register reads.
Occasionally, we've seen the whole loop take over 20 seconds and then
trigger the soft lockup detector.
Clearly, it is unreasonable to spin here for such extended periods of
time.
To fix this, break the loop down into an outer and an inner loop, and
break out of the inner loop if more than half a second has elapsed. To
avoid too much overhead, check for that only every 128 reads, though
there's no particular reason for that number. Then, unlock and relock
to obtain NIC access again, reprogram the start address and continue.
This will keep (interrupt) latencies on the CPU down to a reasonable
time.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20201022165103.45878a7e49aa.I3b9b9c5a1000…
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
.../net/wireless/intel/iwlwifi/pcie/trans.c | 36 ++++++++++++++-----
1 file changed, 27 insertions(+), 9 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index 24da496151353..f48c7cac122e9 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -2121,18 +2121,36 @@ static int iwl_trans_pcie_read_mem(struct iwl_trans *trans, u32 addr,
void *buf, int dwords)
{
unsigned long flags;
- int offs, ret = 0;
+ int offs = 0;
u32 *vals = buf;
- if (iwl_trans_grab_nic_access(trans, &flags)) {
- iwl_write32(trans, HBUS_TARG_MEM_RADDR, addr);
- for (offs = 0; offs < dwords; offs++)
- vals[offs] = iwl_read32(trans, HBUS_TARG_MEM_RDAT);
- iwl_trans_release_nic_access(trans, &flags);
- } else {
- ret = -EBUSY;
+ while (offs < dwords) {
+ /* limit the time we spin here under lock to 1/2s */
+ ktime_t timeout = ktime_add_us(ktime_get(), 500 * USEC_PER_MSEC);
+
+ if (iwl_trans_grab_nic_access(trans, &flags)) {
+ iwl_write32(trans, HBUS_TARG_MEM_RADDR,
+ addr + 4 * offs);
+
+ while (offs < dwords) {
+ vals[offs] = iwl_read32(trans,
+ HBUS_TARG_MEM_RDAT);
+ offs++;
+
+ /* calling ktime_get is expensive so
+ * do it once in 128 reads
+ */
+ if (offs % 128 == 0 && ktime_after(ktime_get(),
+ timeout))
+ break;
+ }
+ iwl_trans_release_nic_access(trans, &flags);
+ } else {
+ return -EBUSY;
+ }
}
- return ret;
+
+ return 0;
}
static int iwl_trans_pcie_write_mem(struct iwl_trans *trans, u32 addr,
--
2.27.0
From: Johannes Berg <johannes.berg@intel.com>
[ Upstream commit 04516706bb99889986ddfa3a769ed50d2dc7ac13 ]
When we read device memory, we lock a spinlock, write the address we
want to read from the device and then spin in a loop reading the data
in 32-bit quantities from another register.
As the description makes clear, this is rather inefficient, incurring
a PCIe bus transaction for every read. In a typical device today, we
want to read 786k SMEM if it crashes, leading to 192k register reads.
Occasionally, we've seen the whole loop take over 20 seconds and then
trigger the soft lockup detector.
Clearly, it is unreasonable to spin here for such extended periods of
time.
To fix this, break the loop down into an outer and an inner loop, and
break out of the inner loop if more than half a second has elapsed. To
avoid too much overhead, check for that only every 128 reads, though
there's no particular reason for that number. Then, unlock and relock
to obtain NIC access again, reprogram the start address and continue.
This will keep (interrupt) latencies on the CPU down to a reasonable
time.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20201022165103.45878a7e49aa.I3b9b9c5a1000…
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
.../net/wireless/intel/iwlwifi/pcie/trans.c | 36 ++++++++++++++-----
1 file changed, 27 insertions(+), 9 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index c76d26708e659..ef5a8ecabc60a 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -2178,18 +2178,36 @@ static int iwl_trans_pcie_read_mem(struct iwl_trans *trans, u32 addr,
void *buf, int dwords)
{
unsigned long flags;
- int offs, ret = 0;
+ int offs = 0;
u32 *vals = buf;
- if (iwl_trans_grab_nic_access(trans, &flags)) {
- iwl_write32(trans, HBUS_TARG_MEM_RADDR, addr);
- for (offs = 0; offs < dwords; offs++)
- vals[offs] = iwl_read32(trans, HBUS_TARG_MEM_RDAT);
- iwl_trans_release_nic_access(trans, &flags);
- } else {
- ret = -EBUSY;
+ while (offs < dwords) {
+ /* limit the time we spin here under lock to 1/2s */
+ ktime_t timeout = ktime_add_us(ktime_get(), 500 * USEC_PER_MSEC);
+
+ if (iwl_trans_grab_nic_access(trans, &flags)) {
+ iwl_write32(trans, HBUS_TARG_MEM_RADDR,
+ addr + 4 * offs);
+
+ while (offs < dwords) {
+ vals[offs] = iwl_read32(trans,
+ HBUS_TARG_MEM_RDAT);
+ offs++;
+
+ /* calling ktime_get is expensive so
+ * do it once in 128 reads
+ */
+ if (offs % 128 == 0 && ktime_after(ktime_get(),
+ timeout))
+ break;
+ }
+ iwl_trans_release_nic_access(trans, &flags);
+ } else {
+ return -EBUSY;
+ }
}
- return ret;
+
+ return 0;
}
static int iwl_trans_pcie_write_mem(struct iwl_trans *trans, u32 addr,
--
2.27.0