Current dma-buf vmap semantics require that the mapped buffer remains in place until the corresponding vunmap has completed.
For GEM-SHMEM, this used to be guaranteed by a pin operation while creating an S/G table in import. GEM-SHMEM can now import dma-buf objects without creating the S/G table, so the pin is missing. This leads to page-fault errors, such as the one shown below.
[ 102.101726] BUG: unable to handle page fault for address: ffffc90127000000
[...]
[ 102.157102] RIP: 0010:udl_compress_hline16+0x219/0x940 [udl]
[...]
[ 102.243250] Call Trace:
[ 102.245695]  <TASK>
[ 102.247795]  ? validate_chain+0x24e/0x5e0
[ 102.251805]  ? __lock_acquire+0x568/0xae0
[ 102.255807]  udl_render_hline+0x165/0x341 [udl]
[ 102.260338]  ? __pfx_udl_render_hline+0x10/0x10 [udl]
[ 102.265379]  ? local_clock_noinstr+0xb/0x100
[ 102.269642]  ? __lock_release.isra.0+0x16c/0x2e0
[ 102.274246]  ? mark_held_locks+0x40/0x70
[ 102.278177]  udl_primary_plane_helper_atomic_update+0x43e/0x680 [udl]
[ 102.284606]  ? __pfx_udl_primary_plane_helper_atomic_update+0x10/0x10 [udl]
[ 102.291551]  ? lockdep_hardirqs_on_prepare.part.0+0x92/0x170
[ 102.297208]  ? lockdep_hardirqs_on+0x88/0x130
[ 102.301554]  ? _raw_spin_unlock_irq+0x24/0x50
[ 102.305901]  ? wait_for_completion_timeout+0x2bb/0x3a0
[ 102.311028]  ? drm_atomic_helper_calc_timestamping_constants+0x141/0x200
[ 102.317714]  ? drm_atomic_helper_commit_planes+0x3b6/0x1030
[ 102.323279]  drm_atomic_helper_commit_planes+0x3b6/0x1030
[ 102.328664]  drm_atomic_helper_commit_tail+0x41/0xb0
[ 102.333622]  commit_tail+0x204/0x330
[...]
[ 102.529946] ---[ end trace 0000000000000000 ]---
[ 102.651980] RIP: 0010:udl_compress_hline16+0x219/0x940 [udl]
In this stack trace, udl (based on GEM-SHMEM) imported and vmap'ed a dma-buf from amdgpu. Amdgpu relocated the buffer, thereby invalidating the mapping.
Provide a custom dma-buf vmap method in amdgpu that pins the object before mapping the buffer's pages into kernel address space. Do the opposite in vunmap.
Note that dma-buf vmap differs from GEM vmap in how it handles relocation. While dma-buf vmap keeps the buffer in place, GEM vmap requires the caller to keep the buffer in place. Hence, this fix is in amdgpu's dma-buf code instead of its GEM code.
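For illustration, a minimal importer-side sketch of this contract (not part of the patch; dmabuf, dst and len are placeholders, locking elided):

	struct iosys_map map;
	int ret;

	ret = dma_buf_vmap(dmabuf, &map);	/* exporter must now keep the buffer in place */
	if (ret)
		return ret;

	/* without the pin, the exporter could relocate the buffer here
	 * and leave map.vaddr pointing at stale pages */
	memcpy(dst, map.vaddr, len);

	dma_buf_vunmap(dmabuf, &map);		/* exporter may move the buffer again */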
A discussion of various approaches to solving the problem is available at [1].
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 660cd44659a0 ("drm/shmem-helper: Import dmabuf without mapping its sg_table")
Reported-by: Thomas Zimmermann <tzimmermann@suse.de>
Closes: https://lore.kernel.org/dri-devel/ba1bdfb8-dbf7-4372-bdcb-df7e0511c702@suse....
Cc: Shixiong Ou <oushixiong@kylinos.cn>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Link: https://lore.kernel.org/dri-devel/9792c6c3-a2b8-4b2b-b5ba-fba19b153e21@suse.... # [1]
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 36 +++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index 5743ebb2f1b7..5b33776eeece 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -285,6 +285,38 @@ static int amdgpu_dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
 	return ret;
 }
 
+static int amdgpu_dma_buf_vmap(struct dma_buf *dma_buf, struct iosys_map *map)
+{
+	struct drm_gem_object *obj = dma_buf->priv;
+	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
+	int ret;
+
+	/*
+	 * Pin to keep buffer in place while it's vmap'ed. The actual
+	 * location is not important as long as it's mappable.
+	 *
+	 * This code is required for exporting to GEM-SHMEM without S/G table.
+	 * Once GEM-SHMEM supports dynamic imports, it should be dropped.
+	 */
+	ret = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_MASK);
+	if (ret)
+		return ret;
+	ret = drm_gem_dmabuf_vmap(dma_buf, map);
+	if (ret)
+		amdgpu_bo_unpin(bo);
+
+	return ret;
+}
+
+static void amdgpu_dma_buf_vunmap(struct dma_buf *dma_buf, struct iosys_map *map)
+{
+	struct drm_gem_object *obj = dma_buf->priv;
+	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
+
+	drm_gem_dmabuf_vunmap(dma_buf, map);
+	amdgpu_bo_unpin(bo);
+}
+
 const struct dma_buf_ops amdgpu_dmabuf_ops = {
 	.attach = amdgpu_dma_buf_attach,
 	.pin = amdgpu_dma_buf_pin,
@@ -294,8 +326,8 @@ const struct dma_buf_ops amdgpu_dmabuf_ops = {
 	.release = drm_gem_dmabuf_release,
 	.begin_cpu_access = amdgpu_dma_buf_begin_cpu_access,
 	.mmap = drm_gem_dmabuf_mmap,
-	.vmap = drm_gem_dmabuf_vmap,
-	.vunmap = drm_gem_dmabuf_vunmap,
+	.vmap = amdgpu_dma_buf_vmap,
+	.vunmap = amdgpu_dma_buf_vunmap,
 };
 
 /**
On 18.08.25 10:41, Thomas Zimmermann wrote:
[...]
> +static int amdgpu_dma_buf_vmap(struct dma_buf *dma_buf, struct iosys_map *map)
> +{
> +	struct drm_gem_object *obj = dma_buf->priv;
> +	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
> +	int ret;
> +
> +	/*
> +	 * Pin to keep buffer in place while it's vmap'ed. The actual
> +	 * location is not important as long as it's mappable.
Yeah, exactly that won't work here. Most of the locations are not CPU accessible.
You could use AMDGPU_GEM_DOMAIN_GTT, that should most likely work in all cases but isn't necessarily the most optimal solution.
Regards,
Christian.
Hi
On 18.08.25 14:40, Christian König wrote:
[...]
>> +	/*
>> +	 * Pin to keep buffer in place while it's vmap'ed. The actual
>> +	 * location is not important as long as it's mappable.
>
> Yeah, exactly that won't work here. Most of the locations are not CPU accessible.
>
> You could use AMDGPU_GEM_DOMAIN_GTT, that should most likely work in all cases but isn't necessarily the most optimal solution.
No problem about that, but why not a bit more flexibility? When udl copies from the buffer, it is likely pinned to VRAM.
A bit mask of _CPU, _GTT, and _VRAM should work fine. The other domains are probably irrelevant for our use case.
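Something like this, as an untested sketch (domain flags as declared in amdgpu_drm.h):

	/* allow placement in any of the CPU-accessible domains */
	ret = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_CPU |
				AMDGPU_GEM_DOMAIN_GTT |
				AMDGPU_GEM_DOMAIN_VRAM);
	if (ret)
		return ret;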
Best regards
Thomas
On 18.08.25 14:46, Thomas Zimmermann wrote:
> Hi
>
> On 18.08.25 14:40, Christian König wrote:
[...]
>> You could use AMDGPU_GEM_DOMAIN_GTT, that should most likely work in all cases but isn't necessarily the most optimal solution.
>
> No problem about that, but why not a bit more flexibility? When udl copies from the buffer, it is likely pinned to VRAM.
>
> A bit mask of _CPU, _GTT, and _VRAM should work fine. The other domains are probably irrelevant for our use case.
The problem is that as soon as you pin the buffer into one domain, you get an error if you try to pin it into another domain.
So if you try to use the same buffer with udl and amdgpu scanout and pin it into GTT -> error.
If you try to use the same buffer with udl and V4L and pin it into VRAM -> error.
There is no works-for-everybody choice here; we need to guess. Pinning into GTT is simply what is most likely to work.
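I.e. the posted function with only the pin domain changed (untested, minimal sketch):

static int amdgpu_dma_buf_vmap(struct dma_buf *dma_buf, struct iosys_map *map)
{
	struct drm_gem_object *obj = dma_buf->priv;
	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
	int ret;

	/* GTT is the one domain that is CPU-accessible on all chips */
	ret = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT);
	if (ret)
		return ret;

	ret = drm_gem_dmabuf_vmap(dma_buf, map);
	if (ret)
		amdgpu_bo_unpin(bo);

	return ret;
}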
Regards,
Christian.
Hi Christian
On 18.08.25 15:23, Christian König wrote:
> On 18.08.25 14:46, Thomas Zimmermann wrote:
[...]
>> A bit mask of _CPU, _GTT, and _VRAM should work fine. The other domains are probably irrelevant for our use case.
>
> The problem is that as soon as you pin the buffer into one domain, you get an error if you try to pin it into another domain.
>
> So if you try to use the same buffer with udl and amdgpu scanout and pin it into GTT -> error.
>
> If you try to use the same buffer with udl and V4L and pin it into VRAM -> error.
What you're describing is exactly the current situation for udl plus amdgpu. When udl tries to pin, amdgpu has already pinned the buffer to VRAM, so it doesn't work. There need to be at least VRAM and GTT in the list. What does work is to try pinning to the various domains in a preferred order; see the sketch below.
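Untested, minimal sketch of such a fallback inside the vmap path (the exact order is a guess):

	/* try pinning to the domains in preferred order; keep the first hit */
	static const u32 domains[] = {
		AMDGPU_GEM_DOMAIN_VRAM,
		AMDGPU_GEM_DOMAIN_GTT,
		AMDGPU_GEM_DOMAIN_CPU,
	};
	unsigned int i;
	int ret = -EINVAL;

	for (i = 0; i < ARRAY_SIZE(domains); ++i) {
		ret = amdgpu_bo_pin(bo, domains[i]);
		if (!ret)
			break;
	}
	if (ret)
		return ret;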
Best regards
Thomas