Hi,
This small patchset adds three new IOCTLs that can be used to attach,
detach, or transfer from/to a DMABUF object.
This was surprisingly easy to add, as the functionfs code only uses
scatterlists for transfers and allows specifying the number of bytes to
transfer. The bulk of the code is then for general DMABUF accounting.
The patchset isn't tagged RFC but comments are very welcome, there are
some things I am not 100% sure about: ffs_dma_resv_lock (with no
ww_acquire_ctx), and I'm using pr_debug which feels wrong. Also, I
should probably add documentation? The current IOCTLs for functionfs
were not documented, as far as I can tell.
We use it with DMABUFs created with udmabuf, that we attach to the
functionfs interface and to IIO devices (with a DMABUF interface for
IIO, on its way to upstream too), to transfer samples from high-speed
transceivers to USB in a zero-copy fashion.
Cheers,
-Paul
Paul Cercueil (2):
usb: gadget: Support already-mapped DMA SGs
usb: gadget: functionfs: Add DMABUF import interface
drivers/usb/gadget/function/f_fs.c | 398 ++++++++++++++++++++++++++++
drivers/usb/gadget/udc/core.c | 7 +-
include/linux/usb/gadget.h | 2 +
include/uapi/linux/usb/functionfs.h | 14 +-
4 files changed, 419 insertions(+), 2 deletions(-)
--
2.39.2
'bulk' description taken from another in the same file.
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/ttm/ttm_bo.c:98: warning: Function parameter or member 'bulk' not described in 'ttm_bo_set_bulk_move'
drivers/gpu/drm/ttm/ttm_bo.c:768: warning: Function parameter or member 'placement' not described in 'ttm_bo_mem_space'
drivers/gpu/drm/ttm/ttm_bo.c:768: warning: Excess function parameter 'proposed_placement' description in 'ttm_bo_mem_space'
Cc: Christian Koenig <christian.koenig(a)amd.com>
Cc: Huang Rui <ray.huang(a)amd.com>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: dri-devel(a)lists.freedesktop.org
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
Signed-off-by: Lee Jones <lee(a)kernel.org>
---
drivers/gpu/drm/ttm/ttm_bo.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 459f1b4440daa..d056d28f8758a 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -84,6 +84,7 @@ EXPORT_SYMBOL(ttm_bo_move_to_lru_tail);
* ttm_bo_set_bulk_move - update BOs bulk move object
*
* @bo: The buffer object.
+ * @bulk: bulk move structure
*
* Update the BOs bulk move object, making sure that resources are added/removed
* as well. A bulk move allows to move many resource on the LRU at once,
@@ -748,7 +749,7 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
*
* @bo: Pointer to a struct ttm_buffer_object. the data of which
* we want to allocate space for.
- * @proposed_placement: Proposed new placement for the buffer object.
+ * @placement: Proposed new placement for the buffer object.
* @mem: A struct ttm_resource.
* @ctx: if and how to sleep, lock buffers and alloc memory
*
--
2.40.0.rc1.284.g88254d51c5-goog
From: Rob Clark <robdclark(a)chromium.org>
Inspired by https://lore.kernel.org/dri-devel/20200604081224.863494-10-daniel.vetter@ff…
it seemed like a good idea to get rid of memory allocation in job_run()
by embedding the hw dma_fence in the job/submit struct.
Applies on top of https://patchwork.freedesktop.org/series/93035/ but I
can re-work it to swap the order. I think the first patch would be
useful to amdgpu and perhaps anyone else embedding the hw_fence in the
struct containing drm_sched_job.
Rob Clark (2):
dma-buf/dma-fence: Add dma_fence_init_noref()
drm/msm: Embed the hw_fence in msm_gem_submit
drivers/dma-buf/dma-fence.c | 43 +++++++++++++++++++-------
drivers/gpu/drm/msm/msm_fence.c | 45 +++++++++++-----------------
drivers/gpu/drm/msm/msm_fence.h | 2 +-
drivers/gpu/drm/msm/msm_gem.h | 10 +++----
drivers/gpu/drm/msm/msm_gem_submit.c | 8 ++---
drivers/gpu/drm/msm/msm_gpu.c | 4 +--
drivers/gpu/drm/msm/msm_ringbuffer.c | 4 +--
include/linux/dma-fence.h | 2 ++
8 files changed, 66 insertions(+), 52 deletions(-)
--
2.39.2
From: Rob Clark <robdclark(a)chromium.org>
Inspired by https://lore.kernel.org/dri-devel/20200604081224.863494-10-daniel.vetter@ff…
it seemed like a good idea to get rid of memory allocation in job_run()
and use lockdep annotations to yell at us about anything that could
deadlock against shrinker/reclaim. Anything that can trigger reclaim,
or block on any other thread that has triggered reclaim, can block the
GPU shrinker from releasing memory if it is waiting the job to complete,
causing deadlock.
The first two patches avoid memory allocation for the hw_fence by
embedding it in the already allocated submit object. The next three
decouple various allocations that were done in the hw_init path, but
only the first time, to let lockdep see that they won't happen in the
job_run() path. (The hw_init() path re-initializes the GPU after runpm
resume, etc, which can happen in the job_run() path.)
The remaining patches clean up locking issues in various corners of PM
and interconnect which happen in the runpm path. These fixes can be
picked up independently by the various maintainers. In all cases I've
added lockdep annotations to help keep the runpm resume path deadlock-
free vs reclaim, but I've broken those out into their own patches.. it
is possible that these might find issues in other code-paths not hit on
the hw I have. (It is a bit tricky because of locks held across call-
backs, such as devfreq->lock held across devfreq_dev_profile callbacks.
I've audited these and other callbacks in icc, etc, to look for problems
and fixed one I found in smd-rpm. But that took me through a number of
drivers and subsystems that I am not familiar with so it is entirely
possible that I overlooked some problematic allocations.)
There is one remaining issue to resolve before we can enable the job_run
annotations, but it is entirely self contained in drm/msm/gem. So it
should not block review of these patches. So I figured it best to send
out what I have so far.
Rob Clark (13):
dma-buf/dma-fence: Add dma_fence_init_noref()
drm/msm: Embed the hw_fence in msm_gem_submit
drm/msm/gpu: Move fw loading out of hw_init() path
drm/msm/gpu: Move BO allocation out of hw_init
drm/msm/a6xx: Move ioremap out of hw_init path
PM / devfreq: Drop unneed locking to appease lockdep
PM / devfreq: Teach lockdep about locking order
PM / QoS: Fix constraints alloc vs reclaim locking
PM / QoS: Decouple request alloc from dev_pm_qos_mtx
PM / QoS: Teach lockdep about dev_pm_qos_mtx locking order
soc: qcom: smd-rpm: Use GFP_ATOMIC in write path
interconnect: Fix locking for runpm vs reclaim
interconnect: Teach lockdep about icc_bw_lock order
drivers/base/power/qos.c | 83 ++++++++++++++++------
drivers/devfreq/devfreq.c | 52 +++++++-------
drivers/dma-buf/dma-fence.c | 43 ++++++++---
drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 48 ++++++-------
drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 18 +++--
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 46 ++++++------
drivers/gpu/drm/msm/adreno/adreno_device.c | 6 ++
drivers/gpu/drm/msm/adreno/adreno_gpu.c | 9 +--
drivers/gpu/drm/msm/msm_fence.c | 43 +++++------
drivers/gpu/drm/msm/msm_fence.h | 2 +-
drivers/gpu/drm/msm/msm_gem.h | 10 +--
drivers/gpu/drm/msm/msm_gem_submit.c | 8 +--
drivers/gpu/drm/msm/msm_gpu.c | 4 +-
drivers/gpu/drm/msm/msm_gpu.h | 6 ++
drivers/gpu/drm/msm/msm_ringbuffer.c | 4 +-
drivers/interconnect/core.c | 18 ++++-
drivers/soc/qcom/smd-rpm.c | 2 +-
include/linux/dma-fence.h | 2 +
18 files changed, 237 insertions(+), 167 deletions(-)
--
2.39.2
Although there is usually not such a limitation (and when there is it is
often only because the driver forgot to change the super small default),
it is still correct here to break scatterlist element into chunks of
dma_max_mapping_size().
This might cause some issues for users with misbehaving drivers. If
bisecting has landed you on this commit, make sure your drivers both set
dma_set_max_seg_size() and are checking for contiguousness correctly.
Signed-off-by: Andrew Davis <afd(a)ti.com>
---
Changes from v2:
- Rebase v6.3-rc1
Changes from v1:
- Fixed mixed declarations and code warning
drivers/dma-buf/heaps/cma_heap.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
index 1131fb943992..579261a46fa3 100644
--- a/drivers/dma-buf/heaps/cma_heap.c
+++ b/drivers/dma-buf/heaps/cma_heap.c
@@ -53,16 +53,18 @@ static int cma_heap_attach(struct dma_buf *dmabuf,
{
struct cma_heap_buffer *buffer = dmabuf->priv;
struct dma_heap_attachment *a;
+ size_t max_segment;
int ret;
a = kzalloc(sizeof(*a), GFP_KERNEL);
if (!a)
return -ENOMEM;
- ret = sg_alloc_table_from_pages(&a->table, buffer->pages,
- buffer->pagecount, 0,
- buffer->pagecount << PAGE_SHIFT,
- GFP_KERNEL);
+ max_segment = dma_get_max_seg_size(attachment->dev);
+ ret = sg_alloc_table_from_pages_segment(&a->table, buffer->pages,
+ buffer->pagecount, 0,
+ buffer->pagecount << PAGE_SHIFT,
+ max_segment, GFP_KERNEL);
if (ret) {
kfree(a);
return ret;
--
2.39.2
Set allow_cache_hints to 1 for the vb2_queue capture queue in the
STM32MP15xx DCMI V4L2 driver. This allows us to allocate buffers
with the V4L2_MEMORY_FLAG_NON_COHERENT set. On STM32MP15xx SoCs,
this enables caching for this memory, which improves performance
when being read from CPU.
This change should be safe from race conditions since videobuf2
already invalidates or flushes the appropriate cache lines in
its prepare() and finish() methods.
Tested on a STM32MP157F SoC. Resulted in 4x buffer access speedup.
Signed-off-by: Marek Vasut <marex(a)denx.de>
---
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: Alexandre Torgue <alexandre.torgue(a)foss.st.com>
Cc: Hugues Fruchet <hugues.fruchet(a)foss.st.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Maxime Coquelin <mcoquelin.stm32(a)gmail.com>
Cc: Philipp Zabel <p.zabel(a)pengutronix.de>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: linux-media(a)vger.kernel.org
Cc: linux-stm32(a)st-md-mailman.stormreply.com
---
drivers/media/platform/st/stm32/stm32-dcmi.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/media/platform/st/stm32/stm32-dcmi.c b/drivers/media/platform/st/stm32/stm32-dcmi.c
index ad8e9742e1ae7..2ac508da5ba36 100644
--- a/drivers/media/platform/st/stm32/stm32-dcmi.c
+++ b/drivers/media/platform/st/stm32/stm32-dcmi.c
@@ -2084,6 +2084,7 @@ static int dcmi_probe(struct platform_device *pdev)
q->mem_ops = &vb2_dma_contig_memops;
q->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
q->min_buffers_needed = 2;
+ q->allow_cache_hints = 1;
q->dev = &pdev->dev;
ret = vb2_queue_init(q);
--
2.39.2