This small series adds support for non-coherent video capture buffers on Rockchip ISP V1. Patch 1 fixes cache management for dmabuf's allocated by dma-contig allocator. Patch 2 allows non-coherent allocations on the rkisp1 capture queue. Some timing measurements are provided in the commit message of patch 2.
Signed-off-by: Mikhail Rudenko mike.rudenko@gmail.com --- Changes in v3: - ignore skip_cache_sync_* flags in vb2_dc_dmabuf_ops_{begin,end}_cpu_access - invalidate/flush kernel mappings as appropriate if they exist - use dma_sync_sgtable_* instead of dma_sync_sg_* - Link to v2: https://lore.kernel.org/r/20250115-b4-rkisp-noncoherent-v2-0-0853e1a24012@gm...
Changes in v2: - Fix vb2_dc_dmabuf_ops_{begin,end}_cpu_access() for non-coherent buffers. - Add cache management timing information to patch 2 commit message. - Link to v1: https://lore.kernel.org/r/20250102-b4-rkisp-noncoherent-v1-1-bba164f7132c@gm...
--- Mikhail Rudenko (2): media: videobuf2: Fix dmabuf cache sync/flush in dma-contig media: rkisp1: Allow non-coherent video capture buffers
.../media/common/videobuf2/videobuf2-dma-contig.c | 22 ++++++++++++++++++++++ .../platform/rockchip/rkisp1/rkisp1-capture.c | 1 + 2 files changed, 23 insertions(+) --- base-commit: c4b7779abc6633677e6edb79e2809f4f61fde157 change-id: 20241231-b4-rkisp-noncoherent-ad6e7c7a68ba
Best regards,
When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in commit 129134e5415d ("media: media/v4l2: remove V4L2_FLAG_MEMORY_NON_CONSISTENT flag"), vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was introduced in commit c0acf9cfeee0 ("media: videobuf2: handle V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained no-ops, making cache maintenance for non-coherent dmabufs allocated by dma-contig impossible.
Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and {flush,invalidate}_kernel_vmap_range calls to vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent buffers.
Fixes: c0acf9cfeee0 ("media: videobuf2: handle V4L2_MEMORY_FLAG_NON_COHERENT flag") Cc: stable@vger.kernel.org Signed-off-by: Mikhail Rudenko mike.rudenko@gmail.com --- .../media/common/videobuf2/videobuf2-dma-contig.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c index bb0b7fa67b539aa73ad5ccf3c3bc318e26f8a4cb..146d7997a0da5989fb081a6f28ce0641fe726e63 100644 --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c @@ -427,6 +427,17 @@ static int vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf, enum dma_data_direction direction) { + struct vb2_dc_buf *buf = dbuf->priv; + struct sg_table *sgt = buf->dma_sgt; + + if (!buf->non_coherent_mem) + return 0; + + if (buf->vaddr) + invalidate_kernel_vmap_range(buf->vaddr, buf->size); + + dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir); + return 0; }
@@ -434,6 +445,17 @@ static int vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf, enum dma_data_direction direction) { + struct vb2_dc_buf *buf = dbuf->priv; + struct sg_table *sgt = buf->dma_sgt; + + if (!buf->non_coherent_mem) + return 0; + + if (buf->vaddr) + flush_kernel_vmap_range(buf->vaddr, buf->size); + + dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir); + return 0; }
On Tue, Jan 28, 2025 at 9:36 PM Mikhail Rudenko mike.rudenko@gmail.com wrote:
When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in commit 129134e5415d ("media: media/v4l2: remove V4L2_FLAG_MEMORY_NON_CONSISTENT flag"), vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was introduced in commit c0acf9cfeee0 ("media: videobuf2: handle V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained no-ops, making cache maintenance for non-coherent dmabufs allocated by dma-contig impossible.
Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and {flush,invalidate}_kernel_vmap_range calls to vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent buffers.
Fixes: c0acf9cfeee0 ("media: videobuf2: handle V4L2_MEMORY_FLAG_NON_COHERENT flag") Cc: stable@vger.kernel.org Signed-off-by: Mikhail Rudenko mike.rudenko@gmail.com
.../media/common/videobuf2/videobuf2-dma-contig.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c index bb0b7fa67b539aa73ad5ccf3c3bc318e26f8a4cb..146d7997a0da5989fb081a6f28ce0641fe726e63 100644 --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c @@ -427,6 +427,17 @@ static int vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf, enum dma_data_direction direction) {
struct vb2_dc_buf *buf = dbuf->priv;
struct sg_table *sgt = buf->dma_sgt;
if (!buf->non_coherent_mem)
return 0;
if (buf->vaddr)
invalidate_kernel_vmap_range(buf->vaddr, buf->size);
dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
return 0;
}
@@ -434,6 +445,17 @@ static int vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf, enum dma_data_direction direction) {
struct vb2_dc_buf *buf = dbuf->priv;
struct sg_table *sgt = buf->dma_sgt;
if (!buf->non_coherent_mem)
return 0;
if (buf->vaddr)
flush_kernel_vmap_range(buf->vaddr, buf->size);
dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
return 0;
}
I took some time (over)thinking the kernel vmap range synchronization, because these functions can be called both from the kernel space using respective dma_buf_*() kAPI and also from the user space using the DMA_BUF_SYNC IOCTLs, so we could in theory have the multiple invocations racing with each other, but then I realized that we don't really provide any guarantees for concurrent writes and reads from the CPU, so I believe this should work fine. Sorry for the delay.
Acked-by: Tomasz Figa tfiga@chromium.org
Let me add @Christoph Hellwig and @Robin Murphy just in case I'm wrong on that, though... Hans, let's give them some time to take a look before applying this.
Best regards, Tomasz
Le mardi 28 janvier 2025 à 23:35 +0300, Mikhail Rudenko a écrit :
When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in commit 129134e5415d ("media: media/v4l2: remove V4L2_FLAG_MEMORY_NON_CONSISTENT flag"), vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was introduced in commit c0acf9cfeee0 ("media: videobuf2: handle V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained no-ops, making cache maintenance for non-coherent dmabufs allocated by dma-contig impossible.
Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and {flush,invalidate}_kernel_vmap_range calls to vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent buffers.
Fixes: c0acf9cfeee0 ("media: videobuf2: handle V4L2_MEMORY_FLAG_NON_COHERENT flag") Cc: stable@vger.kernel.org Signed-off-by: Mikhail Rudenko mike.rudenko@gmail.com
.../media/common/videobuf2/videobuf2-dma-contig.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c index bb0b7fa67b539aa73ad5ccf3c3bc318e26f8a4cb..146d7997a0da5989fb081a6f28c e0641fe726e63 100644 --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c @@ -427,6 +427,17 @@ static int vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf, enum dma_data_direction direction) {
- struct vb2_dc_buf *buf = dbuf->priv;
- struct sg_table *sgt = buf->dma_sgt;
- if (!buf->non_coherent_mem)
return 0;
- if (buf->vaddr)
invalidate_kernel_vmap_range(buf->vaddr, buf->size);
Am I correct that this is mostly to prevent the kernel from reading back old data from the cache after an application or other driver did CPU writes ? If so, can't we restrict that to DMA_TO_DEVICE and DMA_BIDIRECTIONAL ?
As for pending kernel writes, they should have been flushed before the buffer is made available for dequeue. And any access while a buffer is queued is concurrent access, which is expected to have undefined behaviour.
- dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
Isn't there a link to make between buf->dma_dir and direcction before calling this ? Also, shouldn't we use direction insead of buf->dma_dir to possibly limit the scope ?
return 0; } @@ -434,6 +445,17 @@ static int vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf, enum dma_data_direction direction) {
- struct vb2_dc_buf *buf = dbuf->priv;
- struct sg_table *sgt = buf->dma_sgt;
- if (!buf->non_coherent_mem)
return 0;
- if (buf->vaddr)
flush_kernel_vmap_range(buf->vaddr, buf->size);
- dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
Similar questions for the end_cpu_access implementation.
Nicolas
return 0; }
On Fri, Feb 28, 2025 at 12:30 PM Nicolas Dufresne nicolas@ndufresne.ca wrote:
Le mardi 28 janvier 2025 à 23:35 +0300, Mikhail Rudenko a écrit :
When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in commit 129134e5415d ("media: media/v4l2: remove V4L2_FLAG_MEMORY_NON_CONSISTENT flag"), vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was introduced in commit c0acf9cfeee0 ("media: videobuf2: handle V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained no-ops, making cache maintenance for non-coherent dmabufs allocated by dma-contig impossible.
Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and {flush,invalidate}_kernel_vmap_range calls to vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent buffers.
Fixes: c0acf9cfeee0 ("media: videobuf2: handle V4L2_MEMORY_FLAG_NON_COHERENT flag") Cc: stable@vger.kernel.org Signed-off-by: Mikhail Rudenko mike.rudenko@gmail.com
.../media/common/videobuf2/videobuf2-dma-contig.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c index bb0b7fa67b539aa73ad5ccf3c3bc318e26f8a4cb..146d7997a0da5989fb081a6f28c e0641fe726e63 100644 --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c @@ -427,6 +427,17 @@ static int vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf, enum dma_data_direction direction) {
struct vb2_dc_buf *buf = dbuf->priv;
struct sg_table *sgt = buf->dma_sgt;
if (!buf->non_coherent_mem)
return 0;
if (buf->vaddr)
invalidate_kernel_vmap_range(buf->vaddr, buf->size);
Am I correct that this is mostly to prevent the kernel from reading back old data from the cache after an application or other driver did CPU writes ? If so, can't we restrict that to DMA_TO_DEVICE and DMA_BIDIRECTIONAL ?
Note that this function must also synchronize between the user-space and kernel mappings, where the DMA direction doesn't really matter. Also it's unlikely for it to be called when not needed - why would one begin a CPU access before the DMA, when the DMA is FROM_DEVICE?
As for pending kernel writes, they should have been flushed before the buffer is made available for dequeue.
There is no implicit flushing for imported DMA-bufs. All the flushing needs to be executed directly by the CPU accessors by surrounding the access with begin and end CPU access, be it in the kernel or userspace.
And any access while a buffer is queued is concurrent access, which is expected to have undefined behaviour.
Correct.
dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
Isn't there a link to make between buf->dma_dir and direcction before calling this ? Also, shouldn't we use direction insead of buf->dma_dir to possibly limit the scope ?
Oh, yes, that's a good catch. It should be |direction| passed here and not |buf->dma_dir|, since the former determines what CPU access will be done.
return 0;
}
@@ -434,6 +445,17 @@ static int vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf, enum dma_data_direction direction) {
struct vb2_dc_buf *buf = dbuf->priv;
struct sg_table *sgt = buf->dma_sgt;
if (!buf->non_coherent_mem)
return 0;
if (buf->vaddr)
flush_kernel_vmap_range(buf->vaddr, buf->size);
dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
Similar questions for the end_cpu_access implementation.
Yeah, same here.
Nicolas
return 0;
}
On 2025-02-28 at 19:25 +09, Tomasz Figa tfiga@chromium.org wrote:
On Fri, Feb 28, 2025 at 12:30 PM Nicolas Dufresne nicolas@ndufresne.ca wrote:
Le mardi 28 janvier 2025 à 23:35 +0300, Mikhail Rudenko a écrit :
When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in commit 129134e5415d ("media: media/v4l2: remove V4L2_FLAG_MEMORY_NON_CONSISTENT flag"), vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was introduced in commit c0acf9cfeee0 ("media: videobuf2: handle V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained no-ops, making cache maintenance for non-coherent dmabufs allocated by dma-contig impossible.
Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and {flush,invalidate}_kernel_vmap_range calls to vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent buffers.
Fixes: c0acf9cfeee0 ("media: videobuf2: handle V4L2_MEMORY_FLAG_NON_COHERENT flag") Cc: stable@vger.kernel.org Signed-off-by: Mikhail Rudenko mike.rudenko@gmail.com
.../media/common/videobuf2/videobuf2-dma-contig.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c index bb0b7fa67b539aa73ad5ccf3c3bc318e26f8a4cb..146d7997a0da5989fb081a6f28c e0641fe726e63 100644 --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c @@ -427,6 +427,17 @@ static int vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf, enum dma_data_direction direction) {
struct vb2_dc_buf *buf = dbuf->priv;
struct sg_table *sgt = buf->dma_sgt;
if (!buf->non_coherent_mem)
return 0;
if (buf->vaddr)
invalidate_kernel_vmap_range(buf->vaddr, buf->size);
Am I correct that this is mostly to prevent the kernel from reading back old data from the cache after an application or other driver did CPU writes ? If so, can't we restrict that to DMA_TO_DEVICE and DMA_BIDIRECTIONAL ?
Note that this function must also synchronize between the user-space and kernel mappings, where the DMA direction doesn't really matter. Also it's unlikely for it to be called when not needed - why would one begin a CPU access before the DMA, when the DMA is FROM_DEVICE?
As for pending kernel writes, they should have been flushed before the buffer is made available for dequeue.
There is no implicit flushing for imported DMA-bufs. All the flushing needs to be executed directly by the CPU accessors by surrounding the access with begin and end CPU access, be it in the kernel or userspace.
And any access while a buffer is queued is concurrent access, which is expected to have undefined behaviour.
Correct.
dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
Isn't there a link to make between buf->dma_dir and direcction before calling this ? Also, shouldn't we use direction insead of buf->dma_dir to possibly limit the scope ?
Oh, yes, that's a good catch.
Indeed, will fix this in v4.
It should be |direction| passed here and not |buf->dma_dir|, since the former determines what CPU access will be done.
return 0;
}
@@ -434,6 +445,17 @@ static int vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf, enum dma_data_direction direction) {
struct vb2_dc_buf *buf = dbuf->priv;
struct sg_table *sgt = buf->dma_sgt;
if (!buf->non_coherent_mem)
return 0;
if (buf->vaddr)
flush_kernel_vmap_range(buf->vaddr, buf->size);
dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
Similar questions for the end_cpu_access implementation.
Yeah, same here.
Noted, will fix.
Nicolas
return 0;
}
-- Best regards, Mikhail Rudenko
linux-stable-mirror@lists.linaro.org