While I had thought I had fixed this issue in:
commit 342406e4fbba ("drm/nouveau/i2c: Disable i2c bus access after
->fini()")
It turns out that while I did fix the error messages I was seeing on my
P50 when trying to access i2c busses with the GPU in runtime suspend, I
accidentally had missed one important detail that was mentioned on the
bug report this commit was supposed to fix: that the CPU would only lock
up when trying to access i2c busses _on connected devices_ _while the
GPU is not in runtime suspend_. Whoops. That definitely explains why I
was not able to get my machine to hang with i2c bus interactions until
now, as plugging my P50 into it's dock with an HDMI monitor connected
allowed me to finally reproduce this locally.
Now that I have managed to reproduce this issue properly, it looks like
the problem is much simpler then it looks. It turns out that some
connected devices, such as MST laptop docks, will actually ACK i2c reads
even if no data was actually read:
[ 275.063043] nouveau 0000:01:00.0: i2c: aux 000a: 1: 0000004c 1
[ 275.063447] nouveau 0000:01:00.0: i2c: aux 000a: 00 01101000 10040000
[ 275.063759] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000001
[ 275.064024] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
[ 275.064285] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
[ 275.064594] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
Because we don't handle the situation of i2c ack without any data, we
end up entering an infinite loop in nvkm_i2c_aux_i2c_xfer() since the
value of cnt always remains at 0. This finally properly explains how
this could result in a CPU hang like the ones observed in the
aforementioned commit.
So, fix this by retrying transactions if no data is written or received,
and give up and fail the transaction if we continue to not write or
receive any data after 32 retries.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c | 24 +++++++++++++------
1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c
index b4e7404fe660..a11637b0f6cc 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c
@@ -40,8 +40,7 @@ nvkm_i2c_aux_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num)
u8 *ptr = msg->buf;
while (remaining) {
- u8 cnt = (remaining > 16) ? 16 : remaining;
- u8 cmd;
+ u8 cnt, retries, cmd;
if (msg->flags & I2C_M_RD)
cmd = 1;
@@ -51,10 +50,19 @@ nvkm_i2c_aux_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num)
if (mcnt || remaining > 16)
cmd |= 4; /* MOT */
- ret = aux->func->xfer(aux, true, cmd, msg->addr, ptr, &cnt);
- if (ret < 0) {
- nvkm_i2c_aux_release(aux);
- return ret;
+ for (retries = 0, cnt = 0;
+ retries < 32 && !cnt;
+ retries++) {
+ cnt = min_t(u8, remaining, 16);
+ ret = aux->func->xfer(aux, true, cmd,
+ msg->addr, ptr, &cnt);
+ if (ret < 0)
+ goto out;
+ }
+ if (!cnt) {
+ AUX_TRACE(aux, "no data after 32 retries");
+ ret = -EIO;
+ goto out;
}
ptr += cnt;
@@ -64,8 +72,10 @@ nvkm_i2c_aux_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num)
msg++;
}
+ ret = num;
+out:
nvkm_i2c_aux_release(aux);
- return num;
+ return ret;
}
static u32
--
2.21.0
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a3bf9fbdad600b1e4335dd90979f8d6072e4f602 Mon Sep 17 00:00:00 2001
From: Greg Kurz <groug(a)kaod.org>
Date: Wed, 15 May 2019 12:05:01 +0200
Subject: [PATCH] powerpc/pseries: Fix xive=off command line
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
On POWER9, if the hypervisor supports XIVE exploitation mode, the
guest OS will unconditionally requests for the XIVE interrupt mode
even if XIVE was deactivated with the kernel command line xive=off.
Later on, when the spapr XIVE init code handles xive=off, it disables
XIVE and tries to fall back on the legacy mode XICS.
This discrepency causes a kernel panic because the hypervisor is
configured to provide the XIVE interrupt mode to the guest :
kernel BUG at arch/powerpc/sysdev/xics/xics-common.c:135!
...
NIP xics_smp_probe+0x38/0x98
LR xics_smp_probe+0x2c/0x98
Call Trace:
xics_smp_probe+0x2c/0x98 (unreliable)
pSeries_smp_probe+0x40/0xa0
smp_prepare_cpus+0x62c/0x6ec
kernel_init_freeable+0x148/0x448
kernel_init+0x2c/0x148
ret_from_kernel_thread+0x5c/0x68
Look for xive=off during prom_init and don't ask for XIVE in this
case. One exception though: if the host only supports XIVE, we still
want to boot so we ignore xive=off.
Similarly, have the spapr XIVE init code to looking at the interrupt
mode negotiated during CAS, and ignore xive=off if the hypervisor only
supports XIVE.
Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
Cc: stable(a)vger.kernel.org # v4.20
Reported-by: Pavithra R. Prakash <pavrampu(a)in.ibm.com>
Signed-off-by: Greg Kurz <groug(a)kaod.org>
Reviewed-by: Cédric Le Goater <clg(a)kaod.org>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index bab79c51ba4f..17f1ae7fae2c 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -172,6 +172,7 @@ static unsigned long __prombss prom_tce_alloc_end;
#ifdef CONFIG_PPC_PSERIES
static bool __prombss prom_radix_disable;
+static bool __prombss prom_xive_disable;
#endif
struct platform_support {
@@ -808,6 +809,12 @@ static void __init early_cmdline_parse(void)
}
if (prom_radix_disable)
prom_debug("Radix disabled from cmdline\n");
+
+ opt = prom_strstr(prom_cmd_line, "xive=off");
+ if (opt) {
+ prom_xive_disable = true;
+ prom_debug("XIVE disabled from cmdline\n");
+ }
#endif /* CONFIG_PPC_PSERIES */
}
@@ -1216,10 +1223,17 @@ static void __init prom_parse_xive_model(u8 val,
switch (val) {
case OV5_FEAT(OV5_XIVE_EITHER): /* Either Available */
prom_debug("XIVE - either mode supported\n");
- support->xive = true;
+ support->xive = !prom_xive_disable;
break;
case OV5_FEAT(OV5_XIVE_EXPLOIT): /* Only Exploitation mode */
prom_debug("XIVE - exploitation mode supported\n");
+ if (prom_xive_disable) {
+ /*
+ * If we __have__ to do XIVE, we're better off ignoring
+ * the command line rather than not booting.
+ */
+ prom_printf("WARNING: Ignoring cmdline option xive=off\n");
+ }
support->xive = true;
break;
case OV5_FEAT(OV5_XIVE_LEGACY): /* Only Legacy mode */
diff --git a/arch/powerpc/sysdev/xive/spapr.c b/arch/powerpc/sysdev/xive/spapr.c
index 575db3b06a6b..2e2d1b8f810f 100644
--- a/arch/powerpc/sysdev/xive/spapr.c
+++ b/arch/powerpc/sysdev/xive/spapr.c
@@ -20,6 +20,7 @@
#include <linux/cpumask.h>
#include <linux/mm.h>
#include <linux/delay.h>
+#include <linux/libfdt.h>
#include <asm/prom.h>
#include <asm/io.h>
@@ -663,6 +664,55 @@ static bool xive_get_max_prio(u8 *max_prio)
return true;
}
+static const u8 *get_vec5_feature(unsigned int index)
+{
+ unsigned long root, chosen;
+ int size;
+ const u8 *vec5;
+
+ root = of_get_flat_dt_root();
+ chosen = of_get_flat_dt_subnode_by_name(root, "chosen");
+ if (chosen == -FDT_ERR_NOTFOUND)
+ return NULL;
+
+ vec5 = of_get_flat_dt_prop(chosen, "ibm,architecture-vec-5", &size);
+ if (!vec5)
+ return NULL;
+
+ if (size <= index)
+ return NULL;
+
+ return vec5 + index;
+}
+
+static bool xive_spapr_disabled(void)
+{
+ const u8 *vec5_xive;
+
+ vec5_xive = get_vec5_feature(OV5_INDX(OV5_XIVE_SUPPORT));
+ if (vec5_xive) {
+ u8 val;
+
+ val = *vec5_xive & OV5_FEAT(OV5_XIVE_SUPPORT);
+ switch (val) {
+ case OV5_FEAT(OV5_XIVE_EITHER):
+ case OV5_FEAT(OV5_XIVE_LEGACY):
+ break;
+ case OV5_FEAT(OV5_XIVE_EXPLOIT):
+ /* Hypervisor only supports XIVE */
+ if (xive_cmdline_disabled)
+ pr_warn("WARNING: Ignoring cmdline option xive=off\n");
+ return false;
+ default:
+ pr_warn("%s: Unknown xive support option: 0x%x\n",
+ __func__, val);
+ break;
+ }
+ }
+
+ return xive_cmdline_disabled;
+}
+
bool __init xive_spapr_init(void)
{
struct device_node *np;
@@ -675,7 +725,7 @@ bool __init xive_spapr_init(void)
const __be32 *reg;
int i;
- if (xive_cmdline_disabled)
+ if (xive_spapr_disabled())
return false;
pr_devel("%s()\n", __func__);
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: hantro: Set DMA max segment size
Author: Francois Buergisser <fbuergisser(a)chromium.org>
Date: Thu Jul 25 10:17:50 2019 -0400
The Hantro codec is typically used in platforms with an IOMMU,
so we need to set a proper DMA segment size. Devices without an
IOMMU will still fallback to default 64KiB segments.
Cc: stable(a)vger.kernel.org
Fixes: 775fec69008d3 ("media: add Rockchip VPU JPEG encoder driver")
Signed-off-by: Francois Buergisser <fbuergisser(a)chromium.org>
Signed-off-by: Ezequiel Garcia <ezequiel(a)collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung(a)kernel.org>
drivers/staging/media/hantro/hantro_drv.c | 1 +
1 file changed, 1 insertion(+)
---
diff --git a/drivers/staging/media/hantro/hantro_drv.c b/drivers/staging/media/hantro/hantro_drv.c
index b71a06e9159e..4eae1dbb1ac8 100644
--- a/drivers/staging/media/hantro/hantro_drv.c
+++ b/drivers/staging/media/hantro/hantro_drv.c
@@ -731,6 +731,7 @@ static int hantro_probe(struct platform_device *pdev)
dev_err(vpu->dev, "Could not set DMA coherent mask.\n");
return ret;
}
+ vb2_dma_contig_set_max_seg_size(&pdev->dev, DMA_BIT_MASK(32));
for (i = 0; i < vpu->variant->num_irqs; i++) {
const char *irq_name = vpu->variant->irqs[i].name;