This series addresses GPU reset issues reported in [1], where running a
long compute job would trigger repeated GPU resets, leading to a UI
freeze.
Patches #1 and #2 prevent the same faulty job from being resubmitted in a
loop, mitigating the first cause of the issue.
However, the issue isn't entirely solved. Even with only a single GPU
reset, the UI still freezes on the Raspberry Pi 5, indicating a GPU hang.
Patches #3 to #5 address this by properly configuring the V3D_SMS
registers, which are required for power management and resets in V3D 7.1.
Patch #6 updates the DT maintainership, replacing Emma with the current
v3d driver maintainer.
[1] https://github.com/raspberrypi/linux/issues/6660
Best Regards,
- Maíra
---
Maíra Canal (6):
drm/v3d: Don't run jobs that have errors flagged in their fence
drm/v3d: Set job pointer to NULL when the job's fence has an error
drm/v3d: Associate a V3D tech revision to all supported devices
dt-bindings: gpu: v3d: Add SMS to the registers' list
drm/v3d: Use V3D_SMS registers for power on/off and reset on V3D 7.x
dt-bindings: gpu: Add V3D driver maintainer as DT maintainer
.../devicetree/bindings/gpu/brcm,bcm-v3d.yaml | 8 +--
drivers/gpu/drm/v3d/v3d_drv.c | 58 ++++++++++++++++++++--
drivers/gpu/drm/v3d/v3d_drv.h | 18 +++++++
drivers/gpu/drm/v3d/v3d_gem.c | 17 +++++++
drivers/gpu/drm/v3d/v3d_regs.h | 26 ++++++++++
drivers/gpu/drm/v3d/v3d_sched.c | 23 +++++++--
6 files changed, 140 insertions(+), 10 deletions(-)
---
base-commit: 099b79f94366f3110783301e20d8136d762247f8
change-id: 20250224-v3d-gpu-reset-fixes-2d21fc70711d
When src->freq_supported is not NULL but src->freq_supported_num is 0,
the freq_supported array is not duplicated and dst->freq_supported is
left equal to src->freq_supported.
In this case, if the subsequent kstrdup() fails, the error path frees
src->freq_supported without setting it to NULL, potentially leading to
a use-after-free or double-free error.
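
For illustration, here is a minimal userspace sketch of the hazard; the
struct and helper below are simplified stand-ins for dpll_pin_prop_dup(),
not the kernel code itself:

#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct props {
        unsigned long *freq_supported;
        size_t freq_supported_num;
        char *board_label;
};

static int prop_dup(const struct props *src, struct props *dst)
{
        memcpy(dst, src, sizeof(*dst));  /* shallow copy: pointers alias */
        if (src->freq_supported && src->freq_supported_num) {
                size_t sz = src->freq_supported_num *
                            sizeof(*src->freq_supported);
                dst->freq_supported = malloc(sz);
                if (!dst->freq_supported)
                        return -ENOMEM;
                memcpy(dst->freq_supported, src->freq_supported, sz);
        }
        if (src->board_label) {
                dst->board_label = strdup(src->board_label);
                if (!dst->board_label)
                        goto err_board_label;
        }
        return 0;

err_board_label:
        /*
         * When freq_supported_num is 0, dst->freq_supported still
         * aliases src's buffer, so freeing it here would free memory
         * the caller still owns. Hence the check before the free.
         */
        if (src->freq_supported_num)
                free(dst->freq_supported);
        return -ENOMEM;
}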
Fixes: 830ead5fb0c5 ("dpll: fix pin dump crash for rebound module")
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
---
drivers/dpll/dpll_core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/dpll/dpll_core.c b/drivers/dpll/dpll_core.c
index 32019dc33cca..7d147adf8455 100644
--- a/drivers/dpll/dpll_core.c
+++ b/drivers/dpll/dpll_core.c
@@ -475,7 +475,8 @@ static int dpll_pin_prop_dup(const struct dpll_pin_properties *src,
err_panel_label:
kfree(dst->board_label);
err_board_label:
- kfree(dst->freq_supported);
+ if (src->freq_supported_num)
+ kfree(dst->freq_supported);
return -ENOMEM;
}
--
2.25.1
Currently we just leave it uninitialised, which at first looks
harmless. However, we also don't zero out the pfn array, and the idea
with pfn_flags_mask is to be able to set individual flags for a given
range of pfns, or to completely ignore them, outside of default_flags.
So here we end up with pfn[i] & pfn_flags_mask, and if both are
uninitialised we might get back an unexpected flags value: for example
asking for read-only with default_flags, but getting write on top,
leading to potentially bogus behaviour.

To fix this, ensure we zero the pfn_flags_mask, such that hmm only
considers the default_flags and not also the initial pfn[i] values.
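
For reference, hmm_range_fault() derives the per-page request flags
roughly as sketched below (paraphrased from mm/hmm.c; the bit values
here are illustrative stand-ins, not the kernel's definitions):

#include <stdint.h>
#include <stdio.h>

#define REQ_FAULT (1ULL << 0)   /* stand-in for HMM_PFN_REQ_FAULT */
#define REQ_WRITE (1ULL << 1)   /* stand-in for HMM_PFN_REQ_WRITE */

int main(void)
{
        uint64_t pfn0 = 0xdeadbeef;          /* stands in for stack garbage */
        uint64_t pfn_flags_mask = ~0ULL;     /* "uninitialised" mask */
        uint64_t default_flags = REQ_FAULT;  /* caller asked read-only */

        /* hmm combines the inputs essentially as: */
        uint64_t req = (pfn0 & pfn_flags_mask) | default_flags;
        printf("write requested: %d\n", !!(req & REQ_WRITE)); /* 1: bogus */

        /* With pfn_flags_mask zeroed, only default_flags survives: */
        req = (pfn0 & 0) | default_flags;
        printf("write requested: %d\n", !!(req & REQ_WRITE)); /* 0 */
        return 0;
}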
Fixes: 81e058a3e7fd ("drm/xe: Introduce helper to populate userptr")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: <stable@vger.kernel.org> # v6.10+
---
drivers/gpu/drm/xe/xe_hmm.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/xe/xe_hmm.c b/drivers/gpu/drm/xe/xe_hmm.c
index 089834467880..8c3cd65fa4b3 100644
--- a/drivers/gpu/drm/xe/xe_hmm.c
+++ b/drivers/gpu/drm/xe/xe_hmm.c
@@ -206,6 +206,7 @@ int xe_hmm_userptr_populate_range(struct xe_userptr_vma *uvma,
goto free_pfns;
}
+ hmm_range.pfn_flags_mask = 0;
hmm_range.default_flags = flags;
hmm_range.hmm_pfns = pfns;
hmm_range.notifier = &userptr->notifier;
--
2.48.1
The return value of rio_add_net() should be checked. If it fails,
put_device() should be called to give up the reference initialized in
rio_add_net() and free the memory.
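
This follows the usual driver-core rule; a sketch using the names from
the diff below: once the embedded struct device has been initialised,
its memory must be released through put_device() and the release
callback, never through a direct kfree().

        net->dev.release = rio_scan_release_dev;   /* frees net */
        if (rio_add_net(net)) {
                /*
                 * Registration failed, but the device already holds a
                 * reference; put_device() drops it and runs the release
                 * callback. A bare kfree(net) would skip the callback.
                 */
                put_device(&net->dev);
                net = NULL;
        }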
Fixes: e6b585ca6e81 ("rapidio: move net allocation into core code")
Cc: stable@vger.kernel.org
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
---
drivers/rapidio/rio-scan.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/rapidio/rio-scan.c b/drivers/rapidio/rio-scan.c
index fdcf742b2adb..b9daacc7f1ec 100644
--- a/drivers/rapidio/rio-scan.c
+++ b/drivers/rapidio/rio-scan.c
@@ -871,7 +871,10 @@ static struct rio_net *rio_scan_alloc_net(struct rio_mport *mport,
dev_set_name(&net->dev, "rnet_%d", net->id);
net->dev.parent = &mport->dev;
net->dev.release = rio_scan_release_dev;
- rio_add_net(net);
+ if (rio_add_net(net)) {
+ put_device(&net->dev);
+ net = NULL;
+ }
}
return net;
--
2.25.1
When a queue is stopped using the ndo queue API, the NAPI instance
associated with its page pool needs to be unlinked before the pool is
destroyed, to avoid warnings.

Handle this by calling page_pool_disable_direct_recycling() when
stopping a queue.
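
For context, a page pool created with a NAPI pointer may have pages
recycled directly from that NAPI's context, so the link must be severed
before the NAPI goes away. A sketch of the lifecycle (the page_pool
calls are the real API; the surrounding variables are illustrative):

        struct page_pool_params pp_params = {
                .napi = napi,   /* enables direct recycling from this NAPI */
                /* ... */
        };
        struct page_pool *pool = page_pool_create(&pp_params);

        /* Teardown must unlink before the NAPI disappears: */
        page_pool_disable_direct_recycling(pool);  /* 1. sever pool->napi */
        netif_napi_del(napi);                      /* 2. remove the NAPI */
        page_pool_destroy(pool);                   /* 3. free the pool */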
Cc: stable@vger.kernel.org
Fixes: ebdfae0d377b ("gve: adopt page pool for DQ RDA mode")
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
---
drivers/net/ethernet/google/gve/gve_rx_dqo.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/google/gve/gve_rx_dqo.c b/drivers/net/ethernet/google/gve/gve_rx_dqo.c
index 8ac0047f1ada..f0674a443567 100644
--- a/drivers/net/ethernet/google/gve/gve_rx_dqo.c
+++ b/drivers/net/ethernet/google/gve/gve_rx_dqo.c
@@ -109,10 +109,12 @@ static void gve_rx_reset_ring_dqo(struct gve_priv *priv, int idx)
void gve_rx_stop_ring_dqo(struct gve_priv *priv, int idx)
{
int ntfy_idx = gve_rx_idx_to_ntfy(priv, idx);
+ struct gve_rx_ring *rx = &priv->rx[idx];
if (!gve_rx_was_added_to_block(priv, idx))
return;
+ page_pool_disable_direct_recycling(rx->dqo.page_pool);
gve_remove_napi(priv, ntfy_idx);
gve_rx_remove_from_block(priv, idx);
gve_rx_reset_ring_dqo(priv, idx);
--
2.48.1.658.g4767266eb4-goog
Hibernation assumes the memory layout after resume is the same as that
before sleep, but CONFIG_RANDOM_KMALLOC_CACHES breaks this assumption.
At least on LoongArch and ARM64 we have observed resume-from-hibernation
failures (on LoongArch non-boot CPUs fail to come up, on ARM64 some
devices become unusable).

software_resume_initcall(), the function which resumes the target
kernel, is itself an initcall. So, move the random_kmalloc_seed
initialisation to after all initcalls have run.
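
For context, the seed decides which of the per-size kmalloc cache
copies an allocation lands in, roughly as follows (paraphrased from the
slab internals; the exact expression may differ):

        #ifdef CONFIG_RANDOM_KMALLOC_CACHES
        /* pick a cache copy by hashing the caller with the boot seed */
        index = KMALLOC_RANDOM_START +
                hash_64(caller ^ random_kmalloc_seed,
                        ilog2(RANDOM_KMALLOC_CACHES_NR + 1));
        #endif

Seeding before software_resume_initcall() runs would therefore give the
boot kernel a slab layout different from the image being resumed;
seeding after all initcalls avoids that.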
Cc: stable@vger.kernel.org
Fixes: 3c6152940584290668 ("Randomized slab caches for kmalloc()")
Reported-by: Yuli Wang <wangyuli@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
init/main.c | 3 +++
mm/slab_common.c | 3 ---
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/init/main.c b/init/main.c
index 2a1757826397..1362957bdbe4 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1458,6 +1458,9 @@ static int __ref kernel_init(void *unused)
/* need to finish all async __init code before freeing the memory */
async_synchronize_full();
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+ random_kmalloc_seed = get_random_u64();
+#endif
system_state = SYSTEM_FREEING_INITMEM;
kprobe_free_init_mem();
ftrace_free_init_mem();
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 4030907b6b7d..23e324aee218 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -971,9 +971,6 @@ void __init create_kmalloc_caches(void)
for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++)
new_kmalloc_cache(i, type);
}
-#ifdef CONFIG_RANDOM_KMALLOC_CACHES
- random_kmalloc_seed = get_random_u64();
-#endif
/* Kmalloc array is now usable */
slab_state = UP;
--
2.47.1