A GEM handle can be released while the GEM buffer object is attached to a DRM framebuffer. This leads to the release of the dma-buf backing the buffer object, if any. [1] Trying to use the framebuffer in further mode-setting operations leads to a segmentation fault. This happens most easily with drivers that use shadow planes for vmap-ing the dma-buf during a page flip. An example is shown below.
[ 156.791968] ------------[ cut here ]------------
[ 156.796830] WARNING: CPU: 2 PID: 2255 at drivers/dma-buf/dma-buf.c:1527 dma_buf_vmap+0x224/0x430
[...]
[ 156.942028] RIP: 0010:dma_buf_vmap+0x224/0x430
[ 157.043420] Call Trace:
[ 157.045898] <TASK>
[ 157.048030] ? show_trace_log_lvl+0x1af/0x2c0
[ 157.052436] ? show_trace_log_lvl+0x1af/0x2c0
[ 157.056836] ? show_trace_log_lvl+0x1af/0x2c0
[ 157.061253] ? drm_gem_shmem_vmap+0x74/0x710
[ 157.065567] ? dma_buf_vmap+0x224/0x430
[ 157.069446] ? __warn.cold+0x58/0xe4
[ 157.073061] ? dma_buf_vmap+0x224/0x430
[ 157.077111] ? report_bug+0x1dd/0x390
[ 157.080842] ? handle_bug+0x5e/0xa0
[ 157.084389] ? exc_invalid_op+0x14/0x50
[ 157.088291] ? asm_exc_invalid_op+0x16/0x20
[ 157.092548] ? dma_buf_vmap+0x224/0x430
[ 157.096663] ? dma_resv_get_singleton+0x6d/0x230
[ 157.101341] ? __pfx_dma_buf_vmap+0x10/0x10
[ 157.105588] ? __pfx_dma_resv_get_singleton+0x10/0x10
[ 157.110697] drm_gem_shmem_vmap+0x74/0x710
[ 157.114866] drm_gem_vmap+0xa9/0x1b0
[ 157.118763] drm_gem_vmap_unlocked+0x46/0xa0
[ 157.123086] drm_gem_fb_vmap+0xab/0x300
[ 157.126979] drm_atomic_helper_prepare_planes.part.0+0x487/0xb10
[ 157.133032] ? lockdep_init_map_type+0x19d/0x880
[ 157.137701] drm_atomic_helper_commit+0x13d/0x2e0
[ 157.142671] ? drm_atomic_nonblocking_commit+0xa0/0x180
[ 157.147988] drm_mode_atomic_ioctl+0x766/0xe40
[...]
[ 157.346424] ---[ end trace 0000000000000000 ]---
Acquiring GEM handles for the framebuffer's GEM buffer objects prevents this from happening. The framebuffer's cleanup later puts the handle references.
Commit 1a148af06000 ("drm/gem-shmem: Use dma_buf from GEM object instance") triggers the segmentation fault easily by using the dma-buf field more widely. The underlying issue with reference counting has been present before.
v2:
- acquire the handle instead of the BO (Christian)
- fix comment style (Christian)
- drop the Fixes tag (Christian)
- rename err_ gotos
- add missing Link tag
Suggested-by: Christian König christian.koenig@amd.com
Signed-off-by: Thomas Zimmermann tzimmermann@suse.de
Link: https://elixir.bootlin.com/linux/v6.15/source/drivers/gpu/drm/drm_gem.c#L241 # [1]
Cc: Thomas Zimmermann tzimmermann@suse.de
Cc: Anusha Srivatsa asrivats@redhat.com
Cc: Christian König christian.koenig@amd.com
Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Cc: Maxime Ripard mripard@kernel.org
Cc: Sumit Semwal sumit.semwal@linaro.org
Cc: "Christian König" christian.koenig@amd.com
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/drm_gem.c                    | 44 ++++++++++++++++++--
 drivers/gpu/drm/drm_gem_framebuffer_helper.c | 16 +++----
 drivers/gpu/drm/drm_internal.h               |  2 +
 3 files changed, 51 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 19d50d254fe6..bc505d938b3e 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -213,6 +213,35 @@ void drm_gem_private_object_fini(struct drm_gem_object *obj)
 }
 EXPORT_SYMBOL(drm_gem_private_object_fini);

+static void drm_gem_object_handle_get(struct drm_gem_object *obj)
+{
+	struct drm_device *dev = obj->dev;
+
+	drm_WARN_ON(dev, !mutex_is_locked(&dev->object_name_lock));
+
+	if (obj->handle_count++ == 0)
+		drm_gem_object_get(obj);
+}
+
+/**
+ * drm_gem_object_handle_get_unlocked - acquire reference on user-space handles
+ * @obj: GEM object
+ *
+ * Acquires a reference on the GEM buffer object's handle. Required
+ * to keep the GEM object alive. Call drm_gem_object_handle_put_unlocked()
+ * to release the reference.
+ */
+void drm_gem_object_handle_get_unlocked(struct drm_gem_object *obj)
+{
+	struct drm_device *dev = obj->dev;
+
+	guard(mutex)(&dev->object_name_lock);
+
+	drm_WARN_ON(dev, !obj->handle_count); /* first ref taken in create-tail helper */
+	drm_gem_object_handle_get(obj);
+}
+EXPORT_SYMBOL(drm_gem_object_handle_get_unlocked);
+
 /**
  * drm_gem_object_handle_free - release resources bound to userspace handles
  * @obj: GEM object to clean up.
@@ -243,8 +272,14 @@ static void drm_gem_object_exported_dma_buf_free(struct drm_gem_object *obj)
 	}
 }

-static void
-drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj)
+/**
+ * drm_gem_object_handle_put_unlocked - releases reference on user-space handles
+ * @obj: GEM object
+ *
+ * Releases a reference on the GEM buffer object's handle. Possibly releases
+ * the GEM buffer object and associated dma-buf objects.
+ */
+void drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj)
 {
 	struct drm_device *dev = obj->dev;
 	bool final = false;
@@ -269,6 +304,7 @@ drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj)
 	if (final)
 		drm_gem_object_put(obj);
 }
+EXPORT_SYMBOL(drm_gem_object_handle_put_unlocked);

 /*
  * Called at device or object close to release the file's
@@ -390,8 +426,8 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
 	int ret;

 	WARN_ON(!mutex_is_locked(&dev->object_name_lock));
-	if (obj->handle_count++ == 0)
-		drm_gem_object_get(obj);
+
+	drm_gem_object_handle_get(obj);

 	/*
 	 * Get the user-visible handle using idr. Preload and perform
diff --git a/drivers/gpu/drm/drm_gem_framebuffer_helper.c b/drivers/gpu/drm/drm_gem_framebuffer_helper.c
index 618ce725cd75..c60d0044d036 100644
--- a/drivers/gpu/drm/drm_gem_framebuffer_helper.c
+++ b/drivers/gpu/drm/drm_gem_framebuffer_helper.c
@@ -100,7 +100,7 @@ void drm_gem_fb_destroy(struct drm_framebuffer *fb)
 	unsigned int i;

 	for (i = 0; i < fb->format->num_planes; i++)
-		drm_gem_object_put(fb->obj[i]);
+		drm_gem_object_handle_put_unlocked(fb->obj[i]);

 	drm_framebuffer_cleanup(fb);
 	kfree(fb);
@@ -183,8 +183,10 @@ int drm_gem_fb_init_with_funcs(struct drm_device *dev,
 		if (!objs[i]) {
 			drm_dbg_kms(dev, "Failed to lookup GEM object\n");
 			ret = -ENOENT;
-			goto err_gem_object_put;
+			goto err_gem_object_handle_put_unlocked;
 		}
+		drm_gem_object_handle_get_unlocked(objs[i]);
+		drm_gem_object_put(objs[i]);

 		min_size = (height - 1) * mode_cmd->pitches[i]
 			 + drm_format_info_min_pitch(info, i, width)
@@ -194,22 +196,22 @@ int drm_gem_fb_init_with_funcs(struct drm_device *dev,
 			drm_dbg_kms(dev,
 				    "GEM object size (%zu) smaller than minimum size (%u) for plane %d\n",
 				    objs[i]->size, min_size, i);
-			drm_gem_object_put(objs[i]);
+			drm_gem_object_handle_put_unlocked(objs[i]);
 			ret = -EINVAL;
-			goto err_gem_object_put;
+			goto err_gem_object_handle_put_unlocked;
 		}
 	}

 	ret = drm_gem_fb_init(dev, fb, mode_cmd, objs, i, funcs);
 	if (ret)
-		goto err_gem_object_put;
+		goto err_gem_object_handle_put_unlocked;

 	return 0;

-err_gem_object_put:
+err_gem_object_handle_put_unlocked:
 	while (i > 0) {
 		--i;
-		drm_gem_object_put(objs[i]);
+		drm_gem_object_handle_put_unlocked(objs[i]);
 	}
 	return ret;
 }
diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 442eb31351dd..f7b414a813ae 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -161,6 +161,8 @@ void drm_sysfs_lease_event(struct drm_device *dev);

 /* drm_gem.c */
 int drm_gem_init(struct drm_device *dev);
+void drm_gem_object_handle_get_unlocked(struct drm_gem_object *obj);
+void drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj);
 int drm_gem_handle_create_tail(struct drm_file *file_priv,
 			       struct drm_gem_object *obj,
 			       u32 *handlep);
On 30.06.25 10:36, Thomas Zimmermann wrote:
Suggested-by: Christian König christian.koenig@amd.com Signed-off-by: Thomas Zimmermann tzimmermann@suse.de
Reviewed-by: Christian König christian.koenig@amd.com
But I strongly suggest letting the different CI systems take a look as well; we already had too much fun with that.
Regards, Christian.
Hi
On 30.06.25 at 10:49, Christian König wrote:
On 30.06.25 10:36, Thomas Zimmermann wrote:
Suggested-by: Christian König christian.koenig@amd.com Signed-off-by: Thomas Zimmermann tzimmermann@suse.de
Reviewed-by: Christian König christian.koenig@amd.com
Thanks a lot
But I strongly suggest letting the different CI systems take a look as well; we already had too much fun with that.
I can wait a bit longer for reports, but the patch fixes a regression in v6.15. I'd rather see it merged soon-ish.
Best regards Thomas
Regards, Christian.
On 30.06.25 13:34, Thomas Zimmermann wrote:
Hi
On 30.06.25 at 10:49, Christian König wrote:
On 30.06.25 10:36, Thomas Zimmermann wrote:
Suggested-by: Christian König christian.koenig@amd.com Signed-off-by: Thomas Zimmermann tzimmermann@suse.de
Reviewed-by: Christian König christian.koenig@amd.com
Thanks a lot
But I strongly suggest letting the different CI systems take a look as well; we already had too much fun with that.
I can wait a bit longer for reports, but the patch fixes a regression in v6.15. I'd rather see it merged soon-ish.
Yeah, agree. I just want to make sure that we don't have a case where we never create a handle for a BO, but still try to have a FB for it.
I'm pretty sure such cases don't exist any more, but who knows?
Anyway feel free to push it to drm-misc-fixes as soon as possible, just keep it in the back of your mind to keep an eye on it.
Regards, Christian.
Best regards Thomas
Regards, Christian.
When booting next-20250703 on my MSI Alpha 15 laptop running Debian sid (last updated 20250703), I get several warnings of the following kind:
[ 8.702999] [ T1628] ------------[ cut here ]------------
[ 8.703001] [ T1628] WARNING: drivers/gpu/drm/drm_gem.c:287 at drm_gem_object_handle_put_unlocked+0xaa/0xe0, CPU#14: Xorg/1628
[ 8.703007] [ T1628] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device rfcomm bnep nls_ascii nls_cp437 vfat fat snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel btusb snd_intel_dspcfg btrtl btintel snd_hda_codec uvcvideo snd_soc_dmic snd_acp3x_pdm_dma btbcm snd_acp3x_rn btmtk snd_hwdep videobuf2_vmalloc snd_soc_core snd_hda_core videobuf2_memops snd_pcm_oss uvc videobuf2_v4l2 bluetooth snd_mixer_oss videodev snd_pcm snd_rn_pci_acp3x videobuf2_common snd_acp_config snd_timer msi_wmi ecdh_generic snd_soc_acpi ecc mc sparse_keymap snd wmi_bmof edac_mce_amd k10temp soundcore snd_pci_acp3x ccp ac battery button joydev hid_sensor_accel_3d hid_sensor_prox hid_sensor_als hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf industrialio hid_sensor_iio_common amd_pmc evdev mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76 mac80211 libarc4 cfg80211 rfkill msr fuse
[ 8.703056] [ T1628] nvme_fabrics efi_pstore configfs efivarfs autofs4 ext4 mbcache jbd2 usbhid amdgpu drm_client_lib i2c_algo_bit drm_ttm_helper ttm drm_panel_backlight_quirks drm_exec drm_suballoc_helper amdxcp drm_buddy xhci_pci gpu_sched xhci_hcd drm_display_helper hid_sensor_hub hid_multitouch mfd_core hid_generic drm_kms_helper psmouse i2c_hid_acpi nvme usbcore amd_sfh i2c_hid hid cec serio_raw nvme_core r8169 crc16 i2c_piix4 usb_common i2c_smbus i2c_designware_platform i2c_designware_core
[ 8.703082] [ T1628] CPU: 14 UID: 1000 PID: 1628 Comm: Xorg Not tainted 6.16.0-rc4-next-20250703-master #127 PREEMPT_{RT,(full)}
[ 8.703085] [ T1628] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.10F 11/11/2024
[ 8.703086] [ T1628] RIP: 0010:drm_gem_object_handle_put_unlocked+0xaa/0xe0
[ 8.703088] [ T1628] Code: c7 f6 8a ff 48 89 ef e8 94 d4 2e 00 eb d8 48 8b 43 08 48 8d b8 d8 06 00 00 e8 52 78 2b 00 c7 83 08 01 00 00 00 00 00 00 eb 98 <0f> 0b 5b 5d e9 98 f6 8a ff 48 8b 83 68 01 00 00 48 8b 00 48 85 c0
[ 8.703089] [ T1628] RSP: 0018:ffffb8e8c7fbfb00 EFLAGS: 00010246
[ 8.703091] [ T1628] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[ 8.703092] [ T1628] RDX: 0000000000000000 RSI: ffff94cdc062b478 RDI: ffff94ce71390448
[ 8.703093] [ T1628] RBP: ffff94ce14780010 R08: ffff94cdc062b618 R09: ffff94ce14780278
[ 8.703094] [ T1628] R10: 0000000000000001 R11: ffff94cdc062b478 R12: ffff94ce14780010
[ 8.703095] [ T1628] R13: 0000000000000007 R14: 0000000000000004 R15: ffff94ce14780010
[ 8.703096] [ T1628] FS: 00007fc164276b00(0000) GS:ffff94dcb49cf000(0000) knlGS:0000000000000000
[ 8.703097] [ T1628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.703098] [ T1628] CR2: 00005647ccd53008 CR3: 000000012533f000 CR4: 0000000000750ef0
[ 8.703099] [ T1628] PKRU: 55555554
[ 8.703100] [ T1628] Call Trace:
[ 8.703101] [ T1628] <TASK>
[ 8.703104] [ T1628] drm_gem_fb_destroy+0x27/0x50 [drm_kms_helper]
[ 8.703113] [ T1628] __drm_atomic_helper_plane_destroy_state+0x1a/0xa0 [drm_kms_helper]
[ 8.703119] [ T1628] drm_atomic_helper_plane_destroy_state+0x10/0x20 [drm_kms_helper]
[ 8.703124] [ T1628] drm_atomic_state_default_clear+0x1c0/0x2e0
[ 8.703127] [ T1628] __drm_atomic_state_free+0x6c/0xb0
[ 8.703129] [ T1628] drm_atomic_helper_disable_plane+0x92/0xe0 [drm_kms_helper]
[ 8.703135] [ T1628] drm_mode_cursor_universal+0xf2/0x2a0
[ 8.703140] [ T1628] drm_mode_cursor_common.part.0+0x9c/0x1e0
[ 8.703144] [ T1628] ? drm_mode_setplane+0x320/0x320
[ 8.703146] [ T1628] drm_mode_cursor_ioctl+0x8a/0xa0
[ 8.703148] [ T1628] drm_ioctl_kernel+0xa1/0xf0
[ 8.703151] [ T1628] drm_ioctl+0x26a/0x510
[ 8.703153] [ T1628] ? drm_mode_setplane+0x320/0x320
[ 8.703155] [ T1628] ? srso_alias_return_thunk+0x5/0xfbef5
[ 8.703157] [ T1628] ? rt_spin_unlock+0x12/0x40
[ 8.703159] [ T1628] ? do_setitimer+0x185/0x1d0
[ 8.703161] [ T1628] ? srso_alias_return_thunk+0x5/0xfbef5
[ 8.703164] [ T1628] amdgpu_drm_ioctl+0x46/0x90 [amdgpu]
[ 8.703283] [ T1628] __x64_sys_ioctl+0x91/0xe0
[ 8.703286] [ T1628] do_syscall_64+0x65/0xfc0
[ 8.703289] [ T1628] entry_SYSCALL_64_after_hwframe+0x55/0x5d
[ 8.703291] [ T1628] RIP: 0033:0x7fc1645f78db
[ 8.703292] [ T1628] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 8.703294] [ T1628] RSP: 002b:00007ffd75bce430 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 8.703295] [ T1628] RAX: ffffffffffffffda RBX: 000056224e896ea0 RCX: 00007fc1645f78db
[ 8.703296] [ T1628] RDX: 00007ffd75bce4c0 RSI: 00000000c01c64a3 RDI: 000000000000000f
[ 8.703297] [ T1628] RBP: 00007ffd75bce4c0 R08: 0000000000000100 R09: 0000562210547ab0
[ 8.703298] [ T1628] R10: 000000000000004c R11: 0000000000000246 R12: 00000000c01c64a3
[ 8.703298] [ T1628] R13: 000000000000000f R14: 0000000000000000 R15: 000056224e5c1cd0
[ 8.703302] [ T1628] </TASK>
[ 8.703303] [ T1628] ---[ end trace 0000000000000000 ]---
As the warnings do not occur in next-20250702, I looked at the commits given by

$ git log --oneline next-20250702..next-20250703 drivers/gpu/drm

to search for a culprit. So I reverted the most likely candidate, commit 582111e630f5 ("drm/gem: Acquire references on GEM handles for framebuffers"), in next-20250703 and the warnings disappeared.

This is the hardware I used:

$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 7
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c3)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c3)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller
04:00.0 Network controller: MEDIATEK Corp. MT7921K (RZ608) Wi-Fi 6E 80MHz
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller (rev 15)
06:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. KC3000/FURY Renegade NVMe SSD [E18] (rev 01)
07:00.0 Non-Volatile memory controller: Micron/Crucial Technology P1 NVMe PCIe SSD[Frampton] (rev 03)
08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c5)
08:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller
08:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
08:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
08:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
08:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] Audio Coprocessor (rev 01)
08:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h/19h/1ah HD Audio Controller
08:00.7 Signal processing controller: Advanced Micro Devices, Inc. [AMD] Sensor Fusion Hub
Bert Karwatzki
Hi
On 03.07.25 at 13:59, Bert Karwatzki wrote:
> When booting next-20250703 on my MSI Alpha 15 laptop running Debian sid (last updated 20250703), I get several warnings of the following kind:
>
> [ 8.702999] [ T1628] ------------[ cut here ]------------
> [ 8.703001] [ T1628] WARNING: drivers/gpu/drm/drm_gem.c:287 at drm_gem_object_handle_put_unlocked+0xaa/0xe0, CPU#14: Xorg/1628
Well, that didn't take long to blow up. Thanks for reporting the bug.
I have an idea how to fix this, but it would likely just trigger the next issue.
Christian, can we revert this patch, and also the other patches that switch from import_attach->dmabuf to ->dma_buf that caused the problem?
Best regards Thomas
On 03.07.25 15:37, Thomas Zimmermann wrote:
> Hi
>
> On 03.07.25 at 13:59, Bert Karwatzki wrote:
>> When booting next-20250703 on my MSI Alpha 15 laptop running Debian sid (last updated 20250703), I get several warnings of the following kind:
>>
>> [ 8.702999] [ T1628] ------------[ cut here ]------------
>> [ 8.703001] [ T1628] WARNING: drivers/gpu/drm/drm_gem.c:287 at drm_gem_object_handle_put_unlocked+0xaa/0xe0, CPU#14: Xorg/1628
>
> Well, that didn't take long to blow up. Thanks for reporting the bug.
>
> I have an idea how to fix this, but it would likely just trigger the next issue.
>
> Christian, can we revert this patch, and also the other patches that switch from import_attach->dmabuf to ->dma_buf that caused the problem?
Sure we can, but I would rather vote for fixing this, at least for now. Those patches are not just cleanup; they fix rarely occurring real-world problems.
If we can't get it working in the next week or so we can still revert back to a working state.
What exactly is the issue? That cursors don't necessarily have GEM handles? If so, how do we grab/drop handle refs when we have a DMA-buf?
Regards, Christian.
Hi
On 03.07.25 at 15:45, Christian König wrote:
> On 03.07.25 15:37, Thomas Zimmermann wrote:
>> Hi
>>
>> On 03.07.25 at 13:59, Bert Karwatzki wrote:
>>> When booting next-20250703 on my MSI Alpha 15 laptop running Debian sid (last updated 20250703), I get several warnings of the following kind:
>>>
>>> [ 8.702999] [ T1628] ------------[ cut here ]------------
>>> [ 8.703001] [ T1628] WARNING: drivers/gpu/drm/drm_gem.c:287 at drm_gem_object_handle_put_unlocked+0xaa/0xe0, CPU#14: Xorg/1628
>>
>> Well, that didn't take long to blow up. Thanks for reporting the bug.
>>
>> I have an idea how to fix this, but it would likely just trigger the next issue.
>>
>> Christian, can we revert this patch, and also the other patches that switch from import_attach->dmabuf to ->dma_buf that caused the problem?
>
> Sure we can, but I would rather vote for fixing this, at least for now. Those patches are not just cleanup; they fix rarely occurring real-world problems.
>
> If we can't get it working in the next week or so we can still revert back to a working state.
>
> What exactly is the issue? That cursors don't necessarily have GEM handles? If so, how do we grab/drop handle refs when we have a DMA-buf?
A dozen drivers apparently use drm_gem_fb_destroy() but not drm_gem_fb_init_with_funcs(). So they don't take the ref on the handle, yet the destroy helper still puts it. That's what we're seeing here.

Fixing this would mean going through all affected drivers and taking the handle refs as needed. The shortcut would be to take the handle refs in drm_framebuffer_init() and put them in drm_framebuffer_cleanup(). Those are the minimal calls for all implementations. But there's the fbdev code of some drivers that does magic hackery on framebuffer and object allocation.

So whatever we do, it's likely not a quick fixup.

Best regards
Thomas
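The imbalance described in this message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: struct toy_gem_object and all toy_* functions are hypothetical, and the object starts with a handle count of zero to mimic the case where userspace no longer holds a handle when the framebuffer is destroyed.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a GEM object; only the handle count is modeled. */
struct toy_gem_object {
	unsigned int handle_count;
};

static void toy_handle_get(struct toy_gem_object *obj)
{
	obj->handle_count++;
}

/* Returns false (and warns) on a put that has no matching get. */
static bool toy_handle_put(struct toy_gem_object *obj)
{
	if (obj->handle_count == 0) {
		fprintf(stderr, "WARNING: handle put without matching get\n");
		return false;
	}
	obj->handle_count--;
	return true;
}

/* Init path that takes the handle reference, like the patched helper. */
static void toy_fb_init_with_funcs(struct toy_gem_object *obj)
{
	toy_handle_get(obj);
}

/* Driver-private init path that was never taught about handle refs. */
static void toy_fb_init_driver_private(struct toy_gem_object *obj)
{
	(void)obj;
}

/* Shared destroy helper: always puts the handle reference. */
static bool toy_fb_destroy(struct toy_gem_object *obj)
{
	return toy_handle_put(obj);
}

/* Runs one create/destroy cycle; returns true if gets and puts balanced. */
static bool toy_fb_lifecycle(bool use_helper_init)
{
	/* Assumption: userspace has already closed its own handle. */
	struct toy_gem_object obj = { .handle_count = 0 };

	if (use_helper_init)
		toy_fb_init_with_funcs(&obj);
	else
		toy_fb_init_driver_private(&obj);

	return toy_fb_destroy(&obj);
}
```

Calling toy_fb_lifecycle(true) models a driver that uses both helpers and stays balanced; toy_fb_lifecycle(false) models a driver that only shares the destroy helper and hits the warning path. Moving the get and put into the two calls every implementation goes through, as the shortcut suggests, would remove the dependence on which init path a driver uses.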
Regards, Christian.
Best regards Thomas
[ 8.703007] [ T1628] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device rfcomm bnep nls_ascii nls_cp437 vfat fat snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel btusb snd_intel_dspcfg btrtl btintel snd_hda_codec uvcvideo snd_soc_dmic snd_acp3x_pdm_dma btbcm snd_acp3x_rn btmtk snd_hwdep videobuf2_vmalloc snd_soc_core snd_hda_core videobuf2_memops snd_pcm_oss uvc videobuf2_v4l2 bluetooth snd_mixer_oss videodev snd_pcm snd_rn_pci_acp3x videobuf2_common snd_acp_config snd_timer msi_wmi ecdh_generic snd_soc_acpi ecc mc sparse_keymap snd wmi_bmof edac_mce_amd k10temp soundcore snd_pci_acp3x ccp ac battery button joydev hid_sensor_accel_3d hid_sensor_prox hid_sensor_als hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf industrialio hid_sensor_iio_common amd_pmc evdev mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76 mac80211 libarc4 cfg80211 rfkill msr fuse
[ 8.703056] [ T1628] nvme_fabrics efi_pstore configfs efivarfs autofs4 ext4 mbcache jbd2 usbhid amdgpu drm_client_lib i2c_algo_bit drm_ttm_helper ttm drm_panel_backlight_quirks drm_exec drm_suballoc_helper amdxcp drm_buddy xhci_pci gpu_sched xhci_hcd drm_display_helper hid_sensor_hub hid_multitouch mfd_core hid_generic drm_kms_helper psmouse i2c_hid_acpi nvme usbcore amd_sfh i2c_hid hid cec serio_raw nvme_core r8169 crc16 i2c_piix4 usb_common i2c_smbus i2c_designware_platform i2c_designware_core
[ 8.703082] [ T1628] CPU: 14 UID: 1000 PID: 1628 Comm: Xorg Not tainted 6.16.0-rc4-next-20250703-master #127 PREEMPT_{RT,(full)}
[ 8.703085] [ T1628] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.10F 11/11/2024
[ 8.703086] [ T1628] RIP: 0010:drm_gem_object_handle_put_unlocked+0xaa/0xe0
[ 8.703088] [ T1628] Code: c7 f6 8a ff 48 89 ef e8 94 d4 2e 00 eb d8 48 8b 43 08 48 8d b8 d8 06 00 00 e8 52 78 2b 00 c7 83 08 01 00 00 00 00 00 00 eb 98 <0f> 0b 5b 5d e9 98 f6 8a ff 48 8b 83 68 01 00 00 48 8b 00 48 85 c0
[ 8.703089] [ T1628] RSP: 0018:ffffb8e8c7fbfb00 EFLAGS: 00010246
[ 8.703091] [ T1628] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[ 8.703092] [ T1628] RDX: 0000000000000000 RSI: ffff94cdc062b478 RDI: ffff94ce71390448
[ 8.703093] [ T1628] RBP: ffff94ce14780010 R08: ffff94cdc062b618 R09: ffff94ce14780278
[ 8.703094] [ T1628] R10: 0000000000000001 R11: ffff94cdc062b478 R12: ffff94ce14780010
[ 8.703095] [ T1628] R13: 0000000000000007 R14: 0000000000000004 R15: ffff94ce14780010
[ 8.703096] [ T1628] FS: 00007fc164276b00(0000) GS:ffff94dcb49cf000(0000) knlGS:0000000000000000
[ 8.703097] [ T1628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.703098] [ T1628] CR2: 00005647ccd53008 CR3: 000000012533f000 CR4: 0000000000750ef0
[ 8.703099] [ T1628] PKRU: 55555554
[ 8.703100] [ T1628] Call Trace:
[ 8.703101] [ T1628] <TASK>
[ 8.703104] [ T1628] drm_gem_fb_destroy+0x27/0x50 [drm_kms_helper]
[ 8.703113] [ T1628] __drm_atomic_helper_plane_destroy_state+0x1a/0xa0 [drm_kms_helper]
[ 8.703119] [ T1628] drm_atomic_helper_plane_destroy_state+0x10/0x20 [drm_kms_helper]
[ 8.703124] [ T1628] drm_atomic_state_default_clear+0x1c0/0x2e0
[ 8.703127] [ T1628] __drm_atomic_state_free+0x6c/0xb0
[ 8.703129] [ T1628] drm_atomic_helper_disable_plane+0x92/0xe0 [drm_kms_helper]
[ 8.703135] [ T1628] drm_mode_cursor_universal+0xf2/0x2a0
[ 8.703140] [ T1628] drm_mode_cursor_common.part.0+0x9c/0x1e0
[ 8.703144] [ T1628] ? drm_mode_setplane+0x320/0x320
[ 8.703146] [ T1628] drm_mode_cursor_ioctl+0x8a/0xa0
[ 8.703148] [ T1628] drm_ioctl_kernel+0xa1/0xf0
[ 8.703151] [ T1628] drm_ioctl+0x26a/0x510
[ 8.703153] [ T1628] ? drm_mode_setplane+0x320/0x320
[ 8.703155] [ T1628] ? srso_alias_return_thunk+0x5/0xfbef5
[ 8.703157] [ T1628] ? rt_spin_unlock+0x12/0x40
[ 8.703159] [ T1628] ? do_setitimer+0x185/0x1d0
[ 8.703161] [ T1628] ? srso_alias_return_thunk+0x5/0xfbef5
[ 8.703164] [ T1628] amdgpu_drm_ioctl+0x46/0x90 [amdgpu]
[ 8.703283] [ T1628] __x64_sys_ioctl+0x91/0xe0
[ 8.703286] [ T1628] do_syscall_64+0x65/0xfc0
[ 8.703289] [ T1628] entry_SYSCALL_64_after_hwframe+0x55/0x5d
[ 8.703291] [ T1628] RIP: 0033:0x7fc1645f78db
[ 8.703292] [ T1628] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 8.703294] [ T1628] RSP: 002b:00007ffd75bce430 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 8.703295] [ T1628] RAX: ffffffffffffffda RBX: 000056224e896ea0 RCX: 00007fc1645f78db
[ 8.703296] [ T1628] RDX: 00007ffd75bce4c0 RSI: 00000000c01c64a3 RDI: 000000000000000f
[ 8.703297] [ T1628] RBP: 00007ffd75bce4c0 R08: 0000000000000100 R09: 0000562210547ab0
[ 8.703298] [ T1628] R10: 000000000000004c R11: 0000000000000246 R12: 00000000c01c64a3
[ 8.703298] [ T1628] R13: 000000000000000f R14: 0000000000000000 R15: 000056224e5c1cd0
[ 8.703302] [ T1628] </TASK>
[ 8.703303] [ T1628] ---[ end trace 0000000000000000 ]---
As the warnings do not occur in next-20250702, I looked at the commits given by
$ git log --oneline next-20250702..next-20250703 drivers/gpu/drm
to search for a culprit. So I reverted the most likely candidate, commit 582111e630f5 ("drm/gem: Acquire references on GEM handles for framebuffers"), in next-20250703 and the warnings disappeared.
This is the hardware I used:
$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 7
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c3)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c3)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller
04:00.0 Network controller: MEDIATEK Corp. MT7921K (RZ608) Wi-Fi 6E 80MHz
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller (rev 15)
06:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. KC3000/FURY Renegade NVMe SSD [E18] (rev 01)
07:00.0 Non-Volatile memory controller: Micron/Crucial Technology P1 NVMe PCIe SSD[Frampton] (rev 03)
08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c5)
08:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller
08:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
08:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
08:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
08:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] Audio Coprocessor (rev 01)
08:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h/19h/1ah HD Audio Controller
08:00.7 Signal processing controller: Advanced Micro Devices, Inc. [AMD] Sensor Fusion Hub
Bert Karwatzki
On 03.07.25 15:54, Thomas Zimmermann wrote:
Hi
On 03.07.25 at 15:45, Christian König wrote:
On 03.07.25 15:37, Thomas Zimmermann wrote:
Hi
On 03.07.25 at 13:59, Bert Karwatzki wrote:
When booting next-20250703 on my MSI Alpha 15 laptop running Debian sid (last updated 20250703), I get several warnings of the following kind:
[ 8.702999] [ T1628] ------------[ cut here ]------------
[ 8.703001] [ T1628] WARNING: drivers/gpu/drm/drm_gem.c:287 at drm_gem_object_handle_put_unlocked+0xaa/0xe0, CPU#14: Xorg/1628
Well, that didn't take long to blow up. Thanks for reporting the bug.
I have an idea how to fix this, but it would likely just trigger the next issue.
Christian, can we revert this patch, and also the other patches that switch from import_attach->dmabuf to ->dma_buf that caused the problem?
Sure we can, but I would rather vote for fixing this, at least for now. Those patches are not just cleanup; they fix rarely occurring real-world problems.
If we can't get it working in the next week or so we can still revert back to a working state.
What exactly is the issue? That cursors don't necessarily have GEM handles? If yes, how do we grab/drop handle refs when we have a DMA-buf?
A dozen drivers apparently use drm_gem_fb_destroy() but not drm_gem_fb_init_with_funcs(), so they don't take the ref on the handle. That's what we're seeing here. Fixing this would mean going through all affected drivers and taking the handle refs as needed. The shortcut would be to take the handle refs in drm_framebuffer_init() and put them in drm_framebuffer_cleanup(). Those are the minimal calls for all implementations. But there's the fbdev code of some drivers that does magic hackery on framebuffer and object allocation. So whatever we do, it's likely not a quick fixup.
Best regards Thomas
Ok that sounds worse than I thought it would be. Feel free to add my Acked-by to a revert for now.
Thanks, Christian.
Regards, Christian.
Best regards Thomas
Hi
On 03.07.25 at 15:56, Christian König wrote:
On 03.07.25 15:54, Thomas Zimmermann wrote:
Hi
On 03.07.25 at 15:45, Christian König wrote:
On 03.07.25 15:37, Thomas Zimmermann wrote:
Hi
On 03.07.25 at 13:59, Bert Karwatzki wrote:
When booting next-20250703 on my MSI Alpha 15 laptop running Debian sid (last updated 20250703), I get several warnings of the following kind:
[ 8.702999] [ T1628] ------------[ cut here ]------------
[ 8.703001] [ T1628] WARNING: drivers/gpu/drm/drm_gem.c:287 at drm_gem_object_handle_put_unlocked+0xaa/0xe0, CPU#14: Xorg/1628
Well, that didn't take long to blow up. Thanks for reporting the bug.
I have an idea how to fix this, but it would likely just trigger the next issue.
Christian, can we revert this patch, and also the other patches that switch from import_attach->dmabuf to ->dma_buf that caused the problem?
Sure we can, but I would rather vote for fixing this, at least for now. Those patches are not just cleanup; they fix rarely occurring real-world problems.
If we can't get it working in the next week or so we can still revert back to a working state.
What exactly is the issue? That cursors don't necessarily have GEM handles? If yes, how do we grab/drop handle refs when we have a DMA-buf?
A dozen drivers apparently use drm_gem_fb_destroy() but not drm_gem_fb_init_with_funcs(), so they don't take the ref on the handle. That's what we're seeing here. Fixing this would mean going through all affected drivers and taking the handle refs as needed. The shortcut would be to take the handle refs in drm_framebuffer_init() and put them in drm_framebuffer_cleanup(). Those are the minimal calls for all implementations. But there's the fbdev code of some drivers that does magic hackery on framebuffer and object allocation. So whatever we do, it's likely not a quick fixup.
Best regards Thomas
Ok that sounds worse than I thought it would be. Feel free to add my Acked-by to a revert for now.
Right now, the problem with ->dma_buf being NULL apparently only happens with gem-shmem, which uses drm_gem_fb_create() correctly.
So an alternative would be to tie the use of drm_gem_fb_destroy() to drm_gem_fb_init_with_funcs() (or the drm_gem_fb_create functions). Any driver that does not use these would also not be allowed to use drm_gem_fb_destroy(). The affected drivers would get their own destroy code that keeps putting objects instead of handles (as before). I guess we would see occasional bug reports about ->dma_buf being NULL, but we could address them one by one. It's a game of whack-a-mole, though.
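To illustrate the pairing problem described above, here is a minimal userspace sketch in C. It is not kernel code: every `toy_*` name is hypothetical, the structs are simplified stand-ins for the DRM types, and the warning condition (a put on a zero handle count) is an assumption based on the report in this thread. It models three combinations: helper init with helper destroy (balanced), driver-private init with helper destroy (the reported imbalance), and driver-private init with a driver-private destroy that puts only the object (the alternative proposed here).

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for a GEM buffer object; NOT the kernel's struct drm_gem_object. */
struct toy_gem_object {
	unsigned int refcount;        /* models the plain object reference count */
	unsigned int handle_count;    /* models the handle reference count */
	unsigned int unbalanced_puts; /* handle puts that found nothing to drop */
};

/* Toy stand-in for a framebuffer with a single backing object. */
struct toy_framebuffer {
	struct toy_gem_object *obj;
};

/* Assumed warning condition: dropping a handle ref that was never taken. */
static void toy_handle_put(struct toy_gem_object *obj)
{
	if (obj->handle_count == 0) {
		obj->unbalanced_puts++;
		fprintf(stderr, "toy WARNING: handle put without matching get\n");
		return;
	}
	obj->handle_count--;
}

/* Helper-based init: takes an object ref and a handle ref. */
void toy_fb_init_helper(struct toy_framebuffer *fb, struct toy_gem_object *obj)
{
	fb->obj = obj;
	obj->refcount++;
	obj->handle_count++;
}

/* Driver-private init: takes only the object ref, no handle ref. */
void toy_fb_init_private(struct toy_framebuffer *fb, struct toy_gem_object *obj)
{
	fb->obj = obj;
	obj->refcount++;
}

/* Helper-based destroy: puts the handle ref and the object ref. */
void toy_fb_destroy_helper(struct toy_framebuffer *fb)
{
	toy_handle_put(fb->obj);
	fb->obj->refcount--;
	fb->obj = NULL;
}

/* Driver-private destroy: puts only the object ref, as before the change. */
void toy_fb_destroy_private(struct toy_framebuffer *fb)
{
	fb->obj->refcount--;
	fb->obj = NULL;
}

/* Runs one init/destroy combination and returns the unbalanced-put count. */
unsigned int toy_run(int helper_init, int helper_destroy)
{
	struct toy_gem_object obj = { 0, 0, 0 };
	struct toy_framebuffer fb = { NULL };

	if (helper_init)
		toy_fb_init_helper(&fb, &obj);
	else
		toy_fb_init_private(&fb, &obj);

	if (helper_destroy)
		toy_fb_destroy_helper(&fb);
	else
		toy_fb_destroy_private(&fb);

	return obj.unbalanced_puts;
}

/* Self-check covering the three combinations discussed in the thread. */
void toy_self_check(void)
{
	assert(toy_run(1, 1) == 0); /* matched helper pair: balanced */
	assert(toy_run(0, 1) == 1); /* private init + helper destroy: warns */
	assert(toy_run(0, 0) == 0); /* matched private pair: balanced */
}
```

In this model, only the mixed combination trips the check, which is why tying the destroy helper to the matching init helper would make the mismatch go away without touching the drivers' own allocation paths.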
Best regards Thomas
Thanks, Christian.
Regards, Christian.
Best regards Thomas
Hi,
before I give up on the issue, could you please test the attached patch?
Best regards Thomas
On 03.07.25 at 13:59, Bert Karwatzki wrote:
When booting next-20250703 on my MSI Alpha 15 laptop running Debian sid (last updated 20250703), I get several warnings of the following kind:
[ 8.702999] [ T1628] ------------[ cut here ]------------
[ 8.703001] [ T1628] WARNING: drivers/gpu/drm/drm_gem.c:287 at drm_gem_object_handle_put_unlocked+0xaa/0xe0, CPU#14: Xorg/1628
[ 8.703007] [ T1628] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device rfcomm bnep nls_ascii nls_cp437 vfat fat snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel btusb snd_intel_dspcfg btrtl btintel snd_hda_codec uvcvideo snd_soc_dmic snd_acp3x_pdm_dma btbcm snd_acp3x_rn btmtk snd_hwdep videobuf2_vmalloc snd_soc_core snd_hda_core videobuf2_memops snd_pcm_oss uvc videobuf2_v4l2 bluetooth snd_mixer_oss videodev snd_pcm snd_rn_pci_acp3x videobuf2_common snd_acp_config snd_timer msi_wmi ecdh_generic snd_soc_acpi ecc mc sparse_keymap snd wmi_bmof edac_mce_amd k10temp soundcore snd_pci_acp3x ccp ac battery button joydev hid_sensor_accel_3d hid_sensor_prox hid_sensor_als hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf industrialio hid_sensor_iio_common amd_pmc evdev mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76 mac80211 libarc4 cfg80211 rfkill msr fuse
[ 8.703056] [ T1628] nvme_fabrics efi_pstore configfs efivarfs autofs4 ext4 mbcache jbd2 usbhid amdgpu drm_client_lib i2c_algo_bit drm_ttm_helper ttm drm_panel_backlight_quirks drm_exec drm_suballoc_helper amdxcp drm_buddy xhci_pci gpu_sched xhci_hcd drm_display_helper hid_sensor_hub hid_multitouch mfd_core hid_generic drm_kms_helper psmouse i2c_hid_acpi nvme usbcore amd_sfh i2c_hid hid cec serio_raw nvme_core r8169 crc16 i2c_piix4 usb_common i2c_smbus i2c_designware_platform i2c_designware_core
[ 8.703082] [ T1628] CPU: 14 UID: 1000 PID: 1628 Comm: Xorg Not tainted 6.16.0-rc4-next-20250703-master #127 PREEMPT_{RT,(full)}
[ 8.703085] [ T1628] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.10F 11/11/2024
[ 8.703086] [ T1628] RIP: 0010:drm_gem_object_handle_put_unlocked+0xaa/0xe0
[ 8.703088] [ T1628] Code: c7 f6 8a ff 48 89 ef e8 94 d4 2e 00 eb d8 48 8b 43 08 48 8d b8 d8 06 00 00 e8 52 78 2b 00 c7 83 08 01 00 00 00 00 00 00 eb 98 <0f> 0b 5b 5d e9 98 f6 8a ff 48 8b 83 68 01 00 00 48 8b 00 48 85 c0
[ 8.703089] [ T1628] RSP: 0018:ffffb8e8c7fbfb00 EFLAGS: 00010246
[ 8.703091] [ T1628] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[ 8.703092] [ T1628] RDX: 0000000000000000 RSI: ffff94cdc062b478 RDI: ffff94ce71390448
[ 8.703093] [ T1628] RBP: ffff94ce14780010 R08: ffff94cdc062b618 R09: ffff94ce14780278
[ 8.703094] [ T1628] R10: 0000000000000001 R11: ffff94cdc062b478 R12: ffff94ce14780010
[ 8.703095] [ T1628] R13: 0000000000000007 R14: 0000000000000004 R15: ffff94ce14780010
[ 8.703096] [ T1628] FS: 00007fc164276b00(0000) GS:ffff94dcb49cf000(0000) knlGS:0000000000000000
[ 8.703097] [ T1628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.703098] [ T1628] CR2: 00005647ccd53008 CR3: 000000012533f000 CR4: 0000000000750ef0
[ 8.703099] [ T1628] PKRU: 55555554
[ 8.703100] [ T1628] Call Trace:
[ 8.703101] [ T1628] <TASK>
[ 8.703104] [ T1628] drm_gem_fb_destroy+0x27/0x50 [drm_kms_helper]
[ 8.703113] [ T1628] __drm_atomic_helper_plane_destroy_state+0x1a/0xa0 [drm_kms_helper]
[ 8.703119] [ T1628] drm_atomic_helper_plane_destroy_state+0x10/0x20 [drm_kms_helper]
[ 8.703124] [ T1628] drm_atomic_state_default_clear+0x1c0/0x2e0
[ 8.703127] [ T1628] __drm_atomic_state_free+0x6c/0xb0
[ 8.703129] [ T1628] drm_atomic_helper_disable_plane+0x92/0xe0 [drm_kms_helper]
[ 8.703135] [ T1628] drm_mode_cursor_universal+0xf2/0x2a0
[ 8.703140] [ T1628] drm_mode_cursor_common.part.0+0x9c/0x1e0
[ 8.703144] [ T1628] ? drm_mode_setplane+0x320/0x320
[ 8.703146] [ T1628] drm_mode_cursor_ioctl+0x8a/0xa0
[ 8.703148] [ T1628] drm_ioctl_kernel+0xa1/0xf0
[ 8.703151] [ T1628] drm_ioctl+0x26a/0x510
[ 8.703153] [ T1628] ? drm_mode_setplane+0x320/0x320
[ 8.703155] [ T1628] ? srso_alias_return_thunk+0x5/0xfbef5
[ 8.703157] [ T1628] ? rt_spin_unlock+0x12/0x40
[ 8.703159] [ T1628] ? do_setitimer+0x185/0x1d0
[ 8.703161] [ T1628] ? srso_alias_return_thunk+0x5/0xfbef5
[ 8.703164] [ T1628] amdgpu_drm_ioctl+0x46/0x90 [amdgpu]
[ 8.703283] [ T1628] __x64_sys_ioctl+0x91/0xe0
[ 8.703286] [ T1628] do_syscall_64+0x65/0xfc0
[ 8.703289] [ T1628] entry_SYSCALL_64_after_hwframe+0x55/0x5d
[ 8.703291] [ T1628] RIP: 0033:0x7fc1645f78db
[ 8.703292] [ T1628] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 8.703294] [ T1628] RSP: 002b:00007ffd75bce430 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 8.703295] [ T1628] RAX: ffffffffffffffda RBX: 000056224e896ea0 RCX: 00007fc1645f78db
[ 8.703296] [ T1628] RDX: 00007ffd75bce4c0 RSI: 00000000c01c64a3 RDI: 000000000000000f
[ 8.703297] [ T1628] RBP: 00007ffd75bce4c0 R08: 0000000000000100 R09: 0000562210547ab0
[ 8.703298] [ T1628] R10: 000000000000004c R11: 0000000000000246 R12: 00000000c01c64a3
[ 8.703298] [ T1628] R13: 000000000000000f R14: 0000000000000000 R15: 000056224e5c1cd0
[ 8.703302] [ T1628] </TASK>
[ 8.703303] [ T1628] ---[ end trace 0000000000000000 ]---
As the warnings do not occur in next-20250702, I looked at the commits given by

$ git log --oneline next-20250702..next-20250703 drivers/gpu/drm

to search for a culprit. So I reverted the most likely candidate, commit 582111e630f5 ("drm/gem: Acquire references on GEM handles for framebuffers"), in next-20250703 and the warnings disappeared.

This is the hardware I used:

$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 7
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c3)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c3)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller
04:00.0 Network controller: MEDIATEK Corp. MT7921K (RZ608) Wi-Fi 6E 80MHz
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller (rev 15)
06:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. KC3000/FURY Renegade NVMe SSD [E18] (rev 01)
07:00.0 Non-Volatile memory controller: Micron/Crucial Technology P1 NVMe PCIe SSD[Frampton] (rev 03)
08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c5)
08:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller
08:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
08:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
08:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
08:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] Audio Coprocessor (rev 01)
08:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h/19h/1ah HD Audio Controller
08:00.7 Signal processing controller: Advanced Micro Devices, Inc. [AMD] Sensor Fusion Hub
Bert Karwatzki
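For readers following the thread, the warning above can be modelled in user space. The sketch below is not kernel code: every name in it is hypothetical, and it assumes (the report itself does not confirm this) that the driver attaches buffer objects to its framebuffer on a private path that skips the newly added handle reference, while still sharing the destroy path that drops one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a GEM buffer object; only the handle count is modelled. */
struct model_gem_object {
	unsigned int handle_count;
};

/* Hypothetical stand-in for a framebuffer backed by a single buffer object. */
struct model_framebuffer {
	struct model_gem_object *obj;
};

static unsigned int model_warnings; /* how often the imbalance check fired */

static void model_handle_get(struct model_gem_object *obj)
{
	obj->handle_count++;
}

static void model_handle_put(struct model_gem_object *obj)
{
	/* Mirrors the idea of the warning: a put with no reference left. */
	if (obj->handle_count == 0) {
		model_warnings++;
		fprintf(stderr, "WARNING: handle put without matching get\n");
		return;
	}
	obj->handle_count--;
}

/* Helper-style init: takes a handle reference on behalf of the framebuffer. */
static void model_helper_fb_init(struct model_framebuffer *fb,
				 struct model_gem_object *obj)
{
	model_handle_get(obj);
	fb->obj = obj;
}

/* Assumed driver-private init: attaches the object without that reference. */
static void model_private_fb_init(struct model_framebuffer *fb,
				  struct model_gem_object *obj)
{
	fb->obj = obj;
}

/* Shared destroy path: always drops one handle reference. */
static void model_fb_destroy(struct model_framebuffer *fb)
{
	assert(fb->obj != NULL);
	model_handle_put(fb->obj);
	fb->obj = NULL;
}

/* Runs one open/attach/close/destroy cycle; returns the warnings raised. */
unsigned int model_run(int use_private_init)
{
	struct model_gem_object obj = { .handle_count = 0 };
	struct model_framebuffer fb = { .obj = NULL };

	model_warnings = 0;
	model_handle_get(&obj);		/* user space opens a handle */
	if (use_private_init)
		model_private_fb_init(&fb, &obj);
	else
		model_helper_fb_init(&fb, &obj);
	model_handle_put(&obj);		/* user space closes its handle */
	model_fb_destroy(&fb);		/* framebuffer drops its reference */
	return model_warnings;
}
```

Calling `model_run(0)` exercises the balanced path; `model_run(1)` exercises the assumed unbalanced one, where the destroy step finds no reference left to drop.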
On Thursday, 03.07.2025 at 18:09 +0200, Thomas Zimmermann wrote:
Hi,
before I give up on the issue, could you please test the attached patch?
Best regards
Thomas

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
I applied the patch on top of next-20250703
$ git log --oneline
18ee3ed3cb60 (HEAD -> drm_gem_object_handle_put) drm/amdgpu: Provide custom framebuffer destroy function
8d6c58332c7a (tag: next-20250703, origin/master, origin/HEAD, master) Add linux-next specific files for 20250703
and it solves the issue for me (i.e. no warnings).
Bert Karwatzki
Hi
Great, thanks for testing. If nothing else, that's the minimal workaround.
Here's another patch, which should solve the problem for all drivers. Could you please revert the old fix and apply the new one and test again?
Best regards Thomas
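One way to picture a fix that covers all drivers is sketched below as a user-space model. It assumes, going only by the title of the follow-up patch named later in the thread ("drm/framebuffer: Acquire internal references on GEM handles"), that the reference is taken in the common framebuffer init that every driver path ends up calling, and dropped in the matching cleanup. All identifiers are hypothetical, and the four-plane limit is illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PLANES 4 /* illustrative upper bound on objects per framebuffer */

/* Hypothetical stand-in for a GEM buffer object. */
struct sketch_gem_object {
	unsigned int handle_count;
};

/* Hypothetical stand-in for a framebuffer with one object per plane. */
struct sketch_framebuffer {
	struct sketch_gem_object *obj[SKETCH_MAX_PLANES];
};

/*
 * Common init, assumed to be reached from every driver path: it takes one
 * handle reference per attached object, so no driver can skip the step.
 */
void sketch_framebuffer_init(struct sketch_framebuffer *fb,
			     struct sketch_gem_object *const objs[],
			     size_t count)
{
	size_t i;

	assert(count <= SKETCH_MAX_PLANES);
	for (i = 0; i < SKETCH_MAX_PLANES; i++) {
		fb->obj[i] = i < count ? objs[i] : NULL;
		if (fb->obj[i])
			fb->obj[i]->handle_count++;
	}
}

/*
 * Common cleanup: drops exactly the references taken by the init above.
 * Returns the number of puts that found no reference left to drop.
 */
unsigned int sketch_framebuffer_cleanup(struct sketch_framebuffer *fb)
{
	unsigned int unbalanced = 0;
	size_t i;

	for (i = 0; i < SKETCH_MAX_PLANES; i++) {
		struct sketch_gem_object *obj = fb->obj[i];

		if (!obj)
			continue;
		if (obj->handle_count == 0)
			unbalanced++;
		else
			obj->handle_count--;
		fb->obj[i] = NULL;
	}
	return unbalanced;
}
```

Because the get and the put live in the same pair of common functions, a driver that builds its framebuffer through its own code still ends up balanced, which is the property a driver-specific destroy function can only provide for that one driver.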
Applied your patch after reverting:
$ git log --oneline
f4e557e3ae37 (HEAD -> drm_gem_object_handle_put) drm/framebuffer: Acquire internal references on GEM handles
49f9aa27dc15 Revert "drm/amdgpu: Provide custom framebuffer destroy function"
18ee3ed3cb60 drm/amdgpu: Provide custom framebuffer destroy function
8d6c58332c7a (tag: next-20250703, origin/master, origin/HEAD, master) Add linux-next specific files for 20250703
Again, everything works without warnings.
Bert Karwatzki
Hi
Thanks again. I'll submit this patch for review then.
Best regards Thomas
linux-stable-mirror@lists.linaro.org