On Thu, Aug 27, 2020 at 10:31:41PM +0530, Amit Pundir wrote:
> On Thu, 27 Aug 2020 at 21:34, Greg Kroah-Hartman
> <gregkh(a)linuxfoundation.org> wrote:
> >
> > On Thu, Aug 27, 2020 at 09:31:27AM -0400, Laura Abbott wrote:
> > > On 8/27/20 8:36 AM, Greg Kroah-Hartman wrote:
> > > > The ION android code has long been marked to be removed, now that we
> > > > dma-buf support merged into the real part of the kernel.
> > > >
> > > > It was thought that we could wait to remove the ion kernel at a later
> > > > time, but as the out-of-tree Android fork of the ion code has diverged
> > > > quite a bit, and any Android device using the ion interface uses that
> > > > forked version and not this in-tree version, the in-tree copy of the
> > > > code is abandonded and not used by anyone.
> > > >
> > > > Combine this abandoned codebase with the need to make changes to it in
> > > > order to keep the kernel building properly, which then causes merge
> > > > issues when merging those changes into the out-of-tree Android code, and
> > > > you end up with two different groups of people (the in-kernel-tree
> > > > developers, and the Android kernel developers) who are both annoyed at
> > > > the current situation. Because of this problem, just drop the in-kernel
> > > > copy of the ion code now, as it's not used, and is only causing problems
> > > > for everyone involved.
> > > >
> > > > Cc: "Arve Hjønnevåg" <arve(a)android.com>
> > > > Cc: "Christian König" <christian.koenig(a)amd.com>
> > > > Cc: Christian Brauner <christian(a)brauner.io>
> > > > Cc: Christoph Hellwig <hch(a)infradead.org>
> > > > Cc: Hridya Valsaraju <hridya(a)google.com>
> > > > Cc: Joel Fernandes <joel(a)joelfernandes.org>
> > > > Cc: John Stultz <john.stultz(a)linaro.org>
> > > > Cc: Laura Abbott <laura(a)labbott.name>
> > > > Cc: Martijn Coenen <maco(a)android.com>
> > > > Cc: Shuah Khan <shuah(a)kernel.org>
> > > > Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
> > > > Cc: Suren Baghdasaryan <surenb(a)google.com>
> > > > Cc: Todd Kjos <tkjos(a)android.com>
> > > > Cc: devel(a)driverdev.osuosl.org
> > > > Cc: dri-devel(a)lists.freedesktop.org
> > > > Cc: linaro-mm-sig(a)lists.linaro.org
> > > > Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> > >
> > > We discussed this at the Android MC on Monday and the plan was to
> > > remove it after the next LTS release.
> >
> > I know it was discussed, my point is that it is actually causing
> > problems now (with developers who want to change the internal kernel api
> > hitting issues, and newbies trying to clean up code in ways that isn't
> > exactly optimal wasting maintainer cycles), and that anyone who uses
> > this code, is not actually using this version of the code. Everyone who
> > relies on ion right now, is using the version that is in the Android
> > common kernel tree, which has diverged from this in-kernel way quite a
> > bit now for the reason that we didn't want to take any of those new
> > features in the in-kernel version.
> >
> > So this is a problem that we have caused by just wanting to wait, no one
> > is using this code, combined with it causing problems for the upstream
> > developers.
> >
> > There is nothing "magic" about the last kernel of the year that requires
> > this code to sit here until then. At that point in time, all users
> > will, again, be using the forked Android kernel version, and if we
> > delete this now here, that fork can remain just fine, with the added
> > benifit of it reducing developer workloads here in-kernel.
> >
> > So why wait?
>
> Hi,
>
> I don't know what is the right thing to do here. I just want to
> highlight that AOSP's audio (codec2) HAL depends on the ION system
> heap and it will break AOSP for people who boot mainline on their
> devices, even for just testing purpose like we do in Linaro. Right now
> we need only 1 (Android specific out-of-tree) patch to boot AOSP with
> mainline and Sumit is already trying to upstream that vma naming
> patch. Removal of in-kernel ION, will just add more to that delta.
As AOSP will continue to rely on ION after December of this year, all
you are doing is postponing the inevitable a few more months.
Push back on the Android team to fix up the code to not use ION, they
know this needs to happen.
thanks,
greg k-h
On Thu, Aug 27, 2020 at 09:31:27AM -0400, Laura Abbott wrote:
> On 8/27/20 8:36 AM, Greg Kroah-Hartman wrote:
> > The ION android code has long been marked to be removed, now that we
> > dma-buf support merged into the real part of the kernel.
> >
> > It was thought that we could wait to remove the ion kernel at a later
> > time, but as the out-of-tree Android fork of the ion code has diverged
> > quite a bit, and any Android device using the ion interface uses that
> > forked version and not this in-tree version, the in-tree copy of the
> > code is abandonded and not used by anyone.
> >
> > Combine this abandoned codebase with the need to make changes to it in
> > order to keep the kernel building properly, which then causes merge
> > issues when merging those changes into the out-of-tree Android code, and
> > you end up with two different groups of people (the in-kernel-tree
> > developers, and the Android kernel developers) who are both annoyed at
> > the current situation. Because of this problem, just drop the in-kernel
> > copy of the ion code now, as it's not used, and is only causing problems
> > for everyone involved.
> >
> > Cc: "Arve Hjønnevåg" <arve(a)android.com>
> > Cc: "Christian König" <christian.koenig(a)amd.com>
> > Cc: Christian Brauner <christian(a)brauner.io>
> > Cc: Christoph Hellwig <hch(a)infradead.org>
> > Cc: Hridya Valsaraju <hridya(a)google.com>
> > Cc: Joel Fernandes <joel(a)joelfernandes.org>
> > Cc: John Stultz <john.stultz(a)linaro.org>
> > Cc: Laura Abbott <laura(a)labbott.name>
> > Cc: Martijn Coenen <maco(a)android.com>
> > Cc: Shuah Khan <shuah(a)kernel.org>
> > Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
> > Cc: Suren Baghdasaryan <surenb(a)google.com>
> > Cc: Todd Kjos <tkjos(a)android.com>
> > Cc: devel(a)driverdev.osuosl.org
> > Cc: dri-devel(a)lists.freedesktop.org
> > Cc: linaro-mm-sig(a)lists.linaro.org
> > Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
>
> We discussed this at the Android MC on Monday and the plan was to
> remove it after the next LTS release.
I know it was discussed, my point is that it is actually causing
problems now (with developers who want to change the internal kernel api
hitting issues, and newbies trying to clean up code in ways that isn't
exactly optimal wasting maintainer cycles), and that anyone who uses
this code, is not actually using this version of the code. Everyone who
relies on ion right now, is using the version that is in the Android
common kernel tree, which has diverged from this in-kernel way quite a
bit now for the reason that we didn't want to take any of those new
features in the in-kernel version.
So this is a problem that we have caused by just wanting to wait, no one
is using this code, combined with it causing problems for the upstream
developers.
There is nothing "magic" about the last kernel of the year that requires
this code to sit here until then. At that point in time, all users
will, again, be using the forked Android kernel version, and if we
delete this now here, that fork can remain just fine, with the added
benifit of it reducing developer workloads here in-kernel.
So why wait?
thanks,
greg k-h
On Fri, Aug 21, 2020 at 2:14 AM Kunihiko Hayashi
<hayashi.kunihiko(a)socionext.com> wrote:
>
> On 2020/08/01 4:38, John Stultz wrote:
> > On Fri, Jul 31, 2020 at 2:32 AM Kunihiko Hayashi
> > <hayashi.kunihiko(a)socionext.com> wrote:
> >> On 2020/07/29 4:17, John Stultz wrote:
> >>> Do you have a upstream driver that you plan to make use this new call?
> >>
> >> Unfortunately I don't have an upstream driver using this call.
> >>
> >> This call is called from dma-buf heaps "importer" or "customer",
> >> and I only made an example (do nothing) importer driver
> >> to test the call.
> >>
> >>> We want to have in-tree users of code added.
> >>
> >> I think this is a generic way to use non-default CMA heaps, however,
> >> we need in-tree "importer" drivers to want to use non-default CMA heaps.
> >> I don't find it from now.
> >>
> >
> > Yea, I and again, I do agree this is functionality that will be
> > needed. But we'll need to wait for a user (camera driver, etc which
> > would utilize the reserved cma region) before we can merge it
> > upstream. :( Do let me know if you have an out of tree driver that
> > would make use of it, and we can see what can be done to help upstream
> > things.
>
> Sorry for late.
> Before I prepare or find a user driver as "importer",
> I think something is different in this patch.
>
> This patch makes it possible to treat non-default CMA connected to
> "importer" device with memory-region as dma-buf heaps.
>
> However, the allocated memory from this dma-buf heaps can be used
> for "any" devices, and the "importer" can treat memories from other
> dma-buf heaps.
>
> So, the "importer" and the non-default CMA aren't directly related,
> and I think an "exporter" for the non-default CMA should be enabled.
>
> In paticular, the kernel initializer (as an "exporter") calls
> dma_heap_add_cma() for all CMAs defined in Devicetree, and
> the device files associated with each CMA appear under "/dev/dma_heap/".
> For example:
>
> /dev/dma_heap/linux,cma@10000000
> /dev/dma_heap/linux,cma@11000000
> /dev/dma_heap/linux,cma@12000000
> ...
>
> All of these device files can be fairly allocated to any "importer" device.
>
> Actually I think that the kernel should executes dma_heap_add_cma()
> for ALL defined reserved-memory nodes.
>
> If this idea hasn't been discussed yet and this is reasonable,
> I'll prepare RFC patches.
So yes! An earlier version of the CMA heap I submitted did add all CMA
regions as accessible heaps as you propose here.
However, the concern was that in some cases, those regions are device
specific reserved memory that the driver is probably expecting to have
exclusive access. To allow (sufficiently privileged, or misconfigured)
userland to be able to allocate out of that seemed like it might cause
trouble, and instead we should have CMA regions explicitly exported.
There was some proposal to add a dt property to the reserved memory
section (similar to linux,cma-default) and use that to do the
exporting, but other discussions seemed to prefer having drivers
export it explicitly in a fashion very similar to what your earlier
patch does. The only trouble is we just need an upstream driver to add
such a call in the series before we can merge it.
thanks
-john
These patch series to introduce a new dma heap, chunk heap.
That heap is needed for special HW that requires bulk allocation of
fixed high order pages. For example, 64MB dma-buf pages are made up
to fixed order-4 pages * 1024.
The chunk heap uses alloc_pages_bulk to allocate high order page.
https://lore.kernel.org/linux-mm/20200814173131.2803002-1-minchan@kernel.org
The chunk heap is registered by device tree with alignment and memory node
of contiguous memory allocator(CMA). Alignment defines chunk page size.
For example, alignment 0x1_0000 means chunk page size is 64KB.
The phandle to memory node indicates contiguous memory allocator(CMA).
If device node doesn't have cma, the registration of chunk heap fails.
The patchset includes the following:
- export dma-heap API to register kernel module dma heap.
- add chunk heap implementation.
- document of device tree to register chunk heap
Hyesoo Yu (3):
dma-buf: add missing EXPORT_SYMBOL_GPL() for dma heaps
dma-buf: heaps: add chunk heap to dmabuf heaps
dma-heap: Devicetree binding for chunk heap
.../devicetree/bindings/dma-buf/chunk_heap.yaml | 46 +++++
drivers/dma-buf/dma-heap.c | 2 +
drivers/dma-buf/heaps/Kconfig | 9 +
drivers/dma-buf/heaps/Makefile | 1 +
drivers/dma-buf/heaps/chunk_heap.c | 222 +++++++++++++++++++++
drivers/dma-buf/heaps/heap-helpers.c | 2 +
6 files changed, 282 insertions(+)
create mode 100644 Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml
create mode 100644 drivers/dma-buf/heaps/chunk_heap.c
--
2.7.4
Fix W=1 compile warnings (invalid kerneldoc):
drivers/dma-buf/dma-buf.c:328: warning: Function parameter or member 'dmabuf' not described in 'dma_buf_set_name'
Signed-off-by: Krzysztof Kozlowski <krzk(a)kernel.org>
---
drivers/dma-buf/dma-buf.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 1699a8e309ef..58564d82a3a2 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -316,9 +316,9 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
* name of the dma-buf if the same piece of memory is used for multiple
* purpose between different devices.
*
- * @dmabuf [in] dmabuf buffer that will be renamed.
- * @buf: [in] A piece of userspace memory that contains the name of
- * the dma-buf.
+ * @dmabuf: [in] dmabuf buffer that will be renamed.
+ * @buf: [in] A piece of userspace memory that contains the name of
+ * the dma-buf.
*
* Returns 0 on success. If the dma-buf buffer is already attached to
* devices, return -EBUSY.
--
2.17.1
This patchset implements the current proposal for virtio cross-device
resource sharing [1]. It will be used to import virtio resources into
the virtio-video driver currently under discussion [2]. The patch
under consideration to add support in the virtio-video driver is [3].
It uses the APIs from v3 of this series, but the changes to update it
are relatively minor.
This patchset adds a new flavor of dma-bufs that supports querying the
underlying virtio object UUID, as well as adding support for exporting
resources from virtgpu.
[1] https://markmail.org/thread/2ypjt5cfeu3m6lxu
[2] https://markmail.org/thread/p5d3k566srtdtute
[3] https://markmail.org/thread/j4xlqaaim266qpks
v6 -> v7 changes:
- Fix most strict checkpatch comments
David Stevens (3):
virtio: add dma-buf support for exported objects
virtio-gpu: add VIRTIO_GPU_F_RESOURCE_UUID feature
drm/virtio: Support virtgpu exported resources
drivers/gpu/drm/virtio/virtgpu_drv.c | 3 +
drivers/gpu/drm/virtio/virtgpu_drv.h | 21 ++++++
drivers/gpu/drm/virtio/virtgpu_kms.c | 4 ++
drivers/gpu/drm/virtio/virtgpu_prime.c | 96 +++++++++++++++++++++++++-
drivers/gpu/drm/virtio/virtgpu_vq.c | 55 +++++++++++++++
drivers/virtio/Makefile | 2 +-
drivers/virtio/virtio.c | 6 ++
drivers/virtio/virtio_dma_buf.c | 85 +++++++++++++++++++++++
include/linux/virtio.h | 1 +
include/linux/virtio_dma_buf.h | 37 ++++++++++
include/uapi/linux/virtio_gpu.h | 19 +++++
11 files changed, 325 insertions(+), 4 deletions(-)
create mode 100644 drivers/virtio/virtio_dma_buf.c
create mode 100644 include/linux/virtio_dma_buf.h
--
2.28.0.220.ged08abb693-goog
This patchset implements the current proposal for virtio cross-device
resource sharing [1]. It will be used to import virtio resources into
the virtio-video driver currently under discussion [2]. The patch
under consideration to add support in the virtio-video driver is [3].
It uses the APIs from v3 of this series, but the changes to update it
are relatively minor.
This patchset adds a new flavor of dma-bufs that supports querying the
underlying virtio object UUID, as well as adding support for exporting
resources from virtgpu.
[1] https://markmail.org/thread/2ypjt5cfeu3m6lxu
[2] https://markmail.org/thread/p5d3k566srtdtute
[3] https://markmail.org/thread/j4xlqaaim266qpks
v5 -> v6 changes:
- Fix >80 character comment
David Stevens (3):
virtio: add dma-buf support for exported objects
virtio-gpu: add VIRTIO_GPU_F_RESOURCE_UUID feature
drm/virtio: Support virtgpu exported resources
drivers/gpu/drm/virtio/virtgpu_drv.c | 3 +
drivers/gpu/drm/virtio/virtgpu_drv.h | 20 ++++++
drivers/gpu/drm/virtio/virtgpu_kms.c | 4 ++
drivers/gpu/drm/virtio/virtgpu_prime.c | 96 +++++++++++++++++++++++++-
drivers/gpu/drm/virtio/virtgpu_vq.c | 55 +++++++++++++++
drivers/virtio/Makefile | 2 +-
drivers/virtio/virtio.c | 6 ++
drivers/virtio/virtio_dma_buf.c | 82 ++++++++++++++++++++++
include/linux/virtio.h | 1 +
include/linux/virtio_dma_buf.h | 37 ++++++++++
include/uapi/linux/virtio_gpu.h | 19 +++++
11 files changed, 321 insertions(+), 4 deletions(-)
create mode 100644 drivers/virtio/virtio_dma_buf.c
create mode 100644 include/linux/virtio_dma_buf.h
--
2.28.0.220.ged08abb693-goog
Unless there are any remaining objections to these patches, what are
the next steps towards getting these merged? Sorry, I'm not familiar
with the workflow for contributing patches to Linux.
Thanks,
David
On Tue, Jun 9, 2020 at 6:53 PM Michael S. Tsirkin <mst(a)redhat.com> wrote:
>
> On Tue, Jun 09, 2020 at 10:25:15AM +0900, David Stevens wrote:
> > This patchset implements the current proposal for virtio cross-device
> > resource sharing [1]. It will be used to import virtio resources into
> > the virtio-video driver currently under discussion [2]. The patch
> > under consideration to add support in the virtio-video driver is [3].
> > It uses the APIs from v3 of this series, but the changes to update it
> > are relatively minor.
> >
> > This patchset adds a new flavor of dma-bufs that supports querying the
> > underlying virtio object UUID, as well as adding support for exporting
> > resources from virtgpu.
>
> Gerd, David, if possible, please test this in configuration with
> virtual VTD enabled but with iommu_platform=off
> to make sure we didn't break this config.
>
>
> Besides that, for virtio parts:
>
> Acked-by: Michael S. Tsirkin <mst(a)redhat.com>
>
>
> > [1] https://markmail.org/thread/2ypjt5cfeu3m6lxu
> > [2] https://markmail.org/thread/p5d3k566srtdtute
> > [3] https://markmail.org/thread/j4xlqaaim266qpks
> >
> > v4 -> v5 changes:
> > - Remove virtio_dma_buf_export_info.
> >
> > David Stevens (3):
> > virtio: add dma-buf support for exported objects
> > virtio-gpu: add VIRTIO_GPU_F_RESOURCE_UUID feature
> > drm/virtio: Support virtgpu exported resources
> >
> > drivers/gpu/drm/virtio/virtgpu_drv.c | 3 +
> > drivers/gpu/drm/virtio/virtgpu_drv.h | 20 ++++++
> > drivers/gpu/drm/virtio/virtgpu_kms.c | 4 ++
> > drivers/gpu/drm/virtio/virtgpu_prime.c | 96 +++++++++++++++++++++++++-
> > drivers/gpu/drm/virtio/virtgpu_vq.c | 55 +++++++++++++++
> > drivers/virtio/Makefile | 2 +-
> > drivers/virtio/virtio.c | 6 ++
> > drivers/virtio/virtio_dma_buf.c | 82 ++++++++++++++++++++++
> > include/linux/virtio.h | 1 +
> > include/linux/virtio_dma_buf.h | 37 ++++++++++
> > include/uapi/linux/virtio_gpu.h | 19 +++++
> > 11 files changed, 321 insertions(+), 4 deletions(-)
> > create mode 100644 drivers/virtio/virtio_dma_buf.c
> > create mode 100644 include/linux/virtio_dma_buf.h
> >
> > --
> > 2.27.0.278.ge193c7cf3a9-goog
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe(a)lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help(a)lists.oasis-open.org
>
This is an RFC because I'm still trying to grok the correct behavior.
Consider a dma_fence_array created two two fence and signal_on_any is true.
A reference to dma_fence_array is taken for each waiting fence.
When the client calls dma_fence_wait() only one of the fences is signaled.
The client returns successfully from the wait and puts it's reference to
the array fence but the array fence still remains because of the remaining
un-signaled fence.
Now consider that the unsignaled fence is signaled while the timeline is being
destroyed much later. The timeline destroy calls dma_fence_signal_locked(). The
following sequence occurs:
1) dma_fence_array_cb_func is called
2) array->num_pending is 0 (because it was set to 1 due to signal_on_any) so the
callback function calls dma_fence_put() instead of triggering the irq work
3) The array fence is released which in turn puts the lingering fence which is
then released
4) deadlock with the timeline
I think that we can fix this with the attached patch. Once the fence is
signaled signaling it again in the irq worker shouldn't hurt anything. The only
gotcha might be how the error is propagated - I wasn't quite sure the intent of
clearing it only after getting to the irq worker.
Signed-off-by: Jordan Crouse <jcrouse(a)codeaurora.org>
---
drivers/dma-buf/dma-fence-array.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index d3fbd950be94..b8829b024255 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -46,8 +46,6 @@ static void irq_dma_fence_array_work(struct irq_work *wrk)
{
struct dma_fence_array *array = container_of(wrk, typeof(*array), work);
- dma_fence_array_clear_pending_error(array);
-
dma_fence_signal(&array->base);
dma_fence_put(&array->base);
}
@@ -61,10 +59,10 @@ static void dma_fence_array_cb_func(struct dma_fence *f,
dma_fence_array_set_pending_error(array, f->error);
- if (atomic_dec_and_test(&array->num_pending))
- irq_work_queue(&array->work);
- else
- dma_fence_put(&array->base);
+ if (!atomic_dec_and_test(&array->num_pending))
+ dma_fence_array_set_pending_error(array, f->error);
+
+ irq_work_queue(&array->work);
}
static bool dma_fence_array_enable_signaling(struct dma_fence *fence)
--
2.25.1
With commit 72b6ede73623 ("dma-buf.rst: Document why indefinite fences are
a bad idea"), document generation warns:
Documentation/driver-api/dma-buf.rst:182: \
WARNING: Title underline too short.
Repair length of title underline to remove warning.
Fixes: 72b6ede73623 ("dma-buf.rst: Document why indefinite fences are a bad idea")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn(a)gmail.com>
---
Daniel, please pick this minor non-urgent fix to your new documentation.
Documentation/driver-api/dma-buf.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
index 100bfd227265..13ea0cc0a3fa 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -179,7 +179,7 @@ DMA Fence uABI/Sync File
:internal:
Indefinite DMA Fences
-~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~
At various times &dma_fence with an indefinite time until dma_fence_wait()
finishes have been proposed. Examples include:
--
2.17.1
On Tue, Jul 21, 2020 at 06:16:47PM +0800, Xin He wrote:
> From: Qi Liu <liuqi.16(a)bytedance.com>
>
> We should put the reference count of the fence after calling
> virtio_gpu_cmd_submit(). So add the missing dma_fence_put().
> virtio_gpu_cmd_submit(vgdev, buf, exbuf->size,
> vfpriv->ctx_id, buflist, out_fence);
> + dma_fence_put(&out_fence->f);
> virtio_gpu_notify(vgdev);
Pushed to drm-misc-fixes.
thanks,
Gerd
On Mon, Aug 03, 2020 at 08:30:59PM -0400, Joel Fernandes wrote:
> On Mon, Aug 3, 2020 at 10:47 AM 'Kalesh Singh' via kernel-team
> <kernel-team(a)android.com> wrote:
> >
> > Provides a per process hook for the acquisition of file descriptors,
> > despite the method used to obtain the descriptor.
> >
>
> Hi,
> So apart from all of the comments received, I think it is hard to
> understand what the problem is, what the front-end looks like etc.
> Your commit message is 1 line only.
>
> I do remember some of the challenges discussed before, but it would
> describe the problem in the commit message in detail and then discuss
> why this solution is fit. Please read submitting-patches.rst
> especially "2) Describe your changes".
>
> thanks,
>
> - Joel
Thanks for the advice Joel :)
>
>
> > Signed-off-by: Kalesh Singh <kaleshsingh(a)google.com>
> > ---
> > Documentation/filesystems/vfs.rst | 5 +++++
> > fs/file.c | 3 +++
> > include/linux/fs.h | 1 +
> > 3 files changed, 9 insertions(+)
> >
> > diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
> > index ed17771c212b..95b30142c8d9 100644
> > --- a/Documentation/filesystems/vfs.rst
> > +++ b/Documentation/filesystems/vfs.rst
> > @@ -1123,6 +1123,11 @@ otherwise noted.
> > ``fadvise``
> > possibly called by the fadvise64() system call.
> >
> > +``fd_install``
> > + called by the VFS when a file descriptor is installed in the
> > + process's file descriptor table, regardless how the file descriptor
> > + was acquired -- be it via the open syscall, received over IPC, etc.
> > +
> > Note that the file operations are implemented by the specific
> > filesystem in which the inode resides. When opening a device node
> > (character or block special) most filesystems will call special
> > diff --git a/fs/file.c b/fs/file.c
> > index abb8b7081d7a..f5db8622b851 100644
> > --- a/fs/file.c
> > +++ b/fs/file.c
> > @@ -616,6 +616,9 @@ void __fd_install(struct files_struct *files, unsigned int fd,
> > void fd_install(unsigned int fd, struct file *file)
> > {
> > __fd_install(current->files, fd, file);
> > +
> > + if (file->f_op->fd_install)
> > + file->f_op->fd_install(fd, file);
> > }
> >
> > EXPORT_SYMBOL(fd_install);
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index f5abba86107d..b976fbe8c902 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1864,6 +1864,7 @@ struct file_operations {
> > struct file *file_out, loff_t pos_out,
> > loff_t len, unsigned int remap_flags);
> > int (*fadvise)(struct file *, loff_t, loff_t, int);
> > + void (*fd_install)(int, struct file *);
> > } __randomize_layout;
> >
> > struct inode_operations {
> > --
> > 2.28.0.163.g6104cc2f0b6-goog
> >
> > --
> > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe(a)android.com.
> >
On Thu, Jul 16, 2020 at 6:10 PM Kunihiko Hayashi
<hayashi.kunihiko(a)socionext.com> wrote:
>
> Current dma-buf heaps can handle only default CMA. This introduces
> dma_heap_add_cma() function to attach CMA heaps that belongs to a device.
>
> At first, the driver calls of_reserved_mem_device_init() to set
> memory-region property associated with reserved-memory defined as CMA
> to the device. And when the driver calls this dma_heap_add_cma(),
> the CMA will be added to dma-buf heaps.
>
> For example, prepare CMA node named "linux,cma@10000000" and
> specify the node for memory-region property. After the above calls
> in the driver, a device file "/dev/dma_heap/linux,cma@10000000"
> associated with the CMA become available as dma-buf heaps.
>
> Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko(a)socionext.com>
> ---
> drivers/dma-buf/heaps/cma_heap.c | 12 ++++++++++++
> include/linux/dma-heap.h | 9 +++++++++
> 2 files changed, 21 insertions(+)
Hey! Sorry for the slow response on this! I just realized I never replied!
> diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
> index 626cf7f..5d2442e 100644
> --- a/drivers/dma-buf/heaps/cma_heap.c
> +++ b/drivers/dma-buf/heaps/cma_heap.c
> @@ -162,6 +162,18 @@ static int __add_cma_heap(struct cma *cma, void *data)
> return 0;
> }
>
> +/* add device CMA heap to dmabuf heaps */
> +int dma_heap_add_cma(struct device *dev)
> +{
> + struct cma *cma = dev_get_cma_area(dev);
> +
> + if (!cma)
> + return -ENOMEM;
> +
> + return __add_cma_heap(cma, NULL);
> +}
> +EXPORT_SYMBOL_GPL(dma_heap_add_cma);
> +
> static int add_default_cma_heap(void)
> {
> struct cma *default_cma = dev_get_cma_area(NULL);
> diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h
> index 454e354..16bec7d 100644
> --- a/include/linux/dma-heap.h
> +++ b/include/linux/dma-heap.h
> @@ -56,4 +56,13 @@ void *dma_heap_get_drvdata(struct dma_heap *heap);
> */
> struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info);
>
> +#ifdef CONFIG_DMABUF_HEAPS_CMA
> +/**
> + * dma_heap_add_cma - adds a device CMA heap to dmabuf heaps
> + * @dev: device with a CMA heap to register
> + */
> +int dma_heap_add_cma(struct device *dev);
> +
> +#endif /* CONFIG_DMABUF_HEAPS_CMA */
> +
> #endif /* _DMA_HEAPS_H */
> --
> 2.7.4
Looks sane to me. Being able to expose different multiple CMA heaps
is needed, and I agree this way (as opposed to my earlier dts
appraoch) is probably the best approach. The only bit I'm so-so on is
adding the CMA heap specific call in the dma-heap.h, but at the same
time I can't justify adding a whole new header for a single function.
Do you have a upstream driver that you plan to make use this new call?
We want to have in-tree users of code added.
But if so, feel free to add my:
Acked-by: John Stultz <john.stultz(a)linaro.org>
To this patch when you submit the driver changes.
thanks
-john
Trying to grab dma_resv_lock while in commit_tail before we've done
all the code that leads to the eventual signalling of the vblank event
(which can be a dma_fence) is deadlock-y. Don't do that.
Here the solution is easy because just grabbing locks to read
something races anyway. We don't need to bother, READ_ONCE is
equivalent. And avoids the locking issue.
v2: Also take into account tmz_surface boolean, plus just delete the
old code.
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: linux-rdma(a)vger.kernel.org
Cc: amd-gfx(a)lists.freedesktop.org
Cc: intel-gfx(a)lists.freedesktop.org
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
DC-folks, I think this split out patch from my series here
https://lore.kernel.org/dri-devel/20200707201229.472834-1-daniel.vetter@ffw…
should be ready for review/merging. I fixed it up a bit so that it's not
just a gross hack :-)
Cheers, Daniel
---
.../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 ++++++-------------
1 file changed, 6 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 21ec64fe5527..a20b62b1f2ef 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -6959,20 +6959,13 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
DRM_ERROR("Waiting for fences timed out!");
/*
- * TODO This might fail and hence better not used, wait
- * explicitly on fences instead
- * and in general should be called for
- * blocking commit to as per framework helpers
+ * We cannot reserve buffers here, which means the normal flag
+ * access functions don't work. Paper over this with READ_ONCE,
+ * but maybe the flags are invariant enough that not even that
+ * would be needed.
*/
- r = amdgpu_bo_reserve(abo, true);
- if (unlikely(r != 0))
- DRM_ERROR("failed to reserve buffer before flip\n");
-
- amdgpu_bo_get_tiling_flags(abo, &tiling_flags);
-
- tmz_surface = amdgpu_bo_encrypted(abo);
-
- amdgpu_bo_unreserve(abo);
+ tiling_flags = READ_ONCE(abo->tiling_flags);
+ tmz_surface = READ_ONCE(abo->flags) & AMDGPU_GEM_CREATE_ENCRYPTED;
fill_dc_plane_info_and_addr(
dm->adev, new_plane_state, tiling_flags,
--
2.27.0
Fix W=1 compile warnings (invalid kerneldoc):
drivers/dma-buf/dma-buf.c:328: warning: Function parameter or member 'dmabuf' not described in 'dma_buf_set_name'
Signed-off-by: Krzysztof Kozlowski <krzk(a)kernel.org>
---
drivers/dma-buf/dma-buf.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 1699a8e309ef..58564d82a3a2 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -316,9 +316,9 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
* name of the dma-buf if the same piece of memory is used for multiple
* purpose between different devices.
*
- * @dmabuf [in] dmabuf buffer that will be renamed.
- * @buf: [in] A piece of userspace memory that contains the name of
- * the dma-buf.
+ * @dmabuf: [in] dmabuf buffer that will be renamed.
+ * @buf: [in] A piece of userspace memory that contains the name of
+ * the dma-buf.
*
* Returns 0 on success. If the dma-buf buffer is already attached to
* devices, return -EBUSY.
--
2.17.1
Am 22.07.2020 18:04 schrieb Wei Yongjun <weiyongjun1(a)huawei.com>:
The sparse tool complains as follows:
drivers/dma-buf/dma-fence.c:249:25: warning:
symbol 'dma_fence_lockdep_map' was not declared. Should it be static?
This variable is not used outside of dma-fence.c, so this commit
marks it static.
Fixes: 5fbff813a4a3 ("dma-fence: basic lockdep annotations")
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
---
drivers/dma-buf/dma-fence.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index af1d8ea926b3..43624b4ee13d 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -246,7 +246,7 @@ EXPORT_SYMBOL(dma_fence_context_alloc);
* concerned.
*/
#ifdef CONFIG_LOCKDEP
-struct lockdep_map dma_fence_lockdep_map = {
+static struct lockdep_map dma_fence_lockdep_map = {
.name = "dma_fence_map"
};
Comes up every few years, gets somewhat tedious to discuss, let's
write this down once and for all.
What I'm not sure about is whether the text should be more explicit in
flat out mandating the amdkfd eviction fences for long running compute
workloads or workloads where userspace fencing is allowed.
v2: Now with dot graph!
Cc: Jesse Natalie <jenatali(a)microsoft.com>
Cc: Steve Pronovost <spronovo(a)microsoft.com>
Cc: Jason Ekstrand <jason(a)jlekstrand.net>
Cc: Felix Kuehling <Felix.Kuehling(a)amd.com>
Cc: Mika Kuoppala <mika.kuoppala(a)intel.com>
Cc: Thomas Hellstrom <thomas.hellstrom(a)intel.com>
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: linux-rdma(a)vger.kernel.org
Cc: amd-gfx(a)lists.freedesktop.org
Cc: intel-gfx(a)lists.freedesktop.org
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
Documentation/driver-api/dma-buf.rst | 70 ++++++++++++++++++++++++
drivers/gpu/drm/virtio/virtgpu_display.c | 20 -------
2 files changed, 70 insertions(+), 20 deletions(-)
diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
index f8f6decde359..037ba0078bb4 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -178,3 +178,73 @@ DMA Fence uABI/Sync File
.. kernel-doc:: include/linux/sync_file.h
:internal:
+Idefinite DMA Fences
+~~~~~~~~~~~~~~~~~~~~
+
+At various times &dma_fence with an indefinite time until dma_fence_wait()
+finishes have been proposed. Examples include:
+
+* Future fences, used in HWC1 to signal when a buffer isn't used by the display
+ any longer, and created with the screen update that makes the buffer visible.
+ The time this fence completes is entirely under userspace's control.
+
+* Proxy fences, proposed to handle &drm_syncobj for which the fence has not yet
+ been set. Used to asynchronously delay command submission.
+
+* Userspace fences or gpu futexes, fine-grained locking within a command buffer
+ that userspace uses for synchronization across engines or with the CPU, which
+ are then imported as a DMA fence for integration into existing winsys
+ protocols.
+
+* Long-running compute command buffers, while still using traditional end of
+ batch DMA fences for memory management instead of context preemption DMA
+ fences which get reattached when the compute job is rescheduled.
+
+Common to all these schemes is that userspace controls the dependencies of these
+fences and controls when they fire. Mixing indefinite fences with normal
+in-kernel DMA fences does not work, even when a fallback timeout is included to
+protect against malicious userspace:
+
+* Only the kernel knows about all DMA fence dependencies, userspace is not aware
+ of dependencies injected due to memory management or scheduler decisions.
+
+* Only userspace knows about all dependencies in indefinite fences and when
+ exactly they will complete, the kernel has no visibility.
+
+Furthermore the kernel has to be able to hold up userspace command submission
+for memory management needs, which means we must support indefinite fences being
+dependent upon DMA fences. If the kernel also support indefinite fences in the
+kernel like a DMA fence, like any of the above proposal would, there is the
+potential for deadlocks.
+
+.. kernel-render:: DOT
+ :alt: Indefinite Fencing Dependency Cycle
+ :caption: Indefinite Fencing Dependency Cycle
+
+ digraph "Fencing Cycle" {
+ node [shape=box bgcolor=grey style=filled]
+ kernel [label="Kernel DMA Fences"]
+ userspace [label="userspace controlled fences"]
+ kernel -> userspace [label="memory management"]
+ userspace -> kernel [label="Future fence, fence proxy, ..."]
+
+ { rank=same; kernel userspace }
+ }
+
+This means that the kernel might accidentally create deadlocks
+through memory management dependencies which userspace is unaware of, which
+randomly hangs workloads until the timeout kicks in. Workloads, which from
+userspace's perspective, do not contain a deadlock. In such a mixed fencing
+architecture there is no single entity with knowledge of all dependencies.
+Thefore preventing such deadlocks from within the kernel is not possible.
+
+The only solution to avoid dependencies loops is by not allowing indefinite
+fences in the kernel. This means:
+
+* No future fences, proxy fences or userspace fences imported as DMA fences,
+ with or without a timeout.
+
+* No DMA fences that signal end of batchbuffer for command submission where
+ userspace is allowed to use userspace fencing or long running compute
+ workloads. This also means no implicit fencing for shared buffers in these
+ cases.
diff --git a/drivers/gpu/drm/virtio/virtgpu_display.c b/drivers/gpu/drm/virtio/virtgpu_display.c
index f3ce49c5a34c..af55b334be2f 100644
--- a/drivers/gpu/drm/virtio/virtgpu_display.c
+++ b/drivers/gpu/drm/virtio/virtgpu_display.c
@@ -314,25 +314,6 @@ virtio_gpu_user_framebuffer_create(struct drm_device *dev,
return &virtio_gpu_fb->base;
}
-static void vgdev_atomic_commit_tail(struct drm_atomic_state *state)
-{
- struct drm_device *dev = state->dev;
-
- drm_atomic_helper_commit_modeset_disables(dev, state);
- drm_atomic_helper_commit_modeset_enables(dev, state);
- drm_atomic_helper_commit_planes(dev, state, 0);
-
- drm_atomic_helper_fake_vblank(state);
- drm_atomic_helper_commit_hw_done(state);
-
- drm_atomic_helper_wait_for_vblanks(dev, state);
- drm_atomic_helper_cleanup_planes(dev, state);
-}
-
-static const struct drm_mode_config_helper_funcs virtio_mode_config_helpers = {
- .atomic_commit_tail = vgdev_atomic_commit_tail,
-};
-
static const struct drm_mode_config_funcs virtio_gpu_mode_funcs = {
.fb_create = virtio_gpu_user_framebuffer_create,
.atomic_check = drm_atomic_helper_check,
@@ -346,7 +327,6 @@ void virtio_gpu_modeset_init(struct virtio_gpu_device *vgdev)
drm_mode_config_init(vgdev->ddev);
vgdev->ddev->mode_config.quirk_addfb_prefer_host_byte_order = true;
vgdev->ddev->mode_config.funcs = &virtio_gpu_mode_funcs;
- vgdev->ddev->mode_config.helper_private = &virtio_mode_config_helpers;
/* modes will be validated against the framebuffer size */
vgdev->ddev->mode_config.min_width = XRES_MIN;
--
2.27.0
My dma-fence lockdep annotations caught an inversion because we
allocate memory where we really shouldn't:
kmem_cache_alloc+0x2b/0x6d0
amdgpu_fence_emit+0x30/0x330 [amdgpu]
amdgpu_ib_schedule+0x306/0x550 [amdgpu]
amdgpu_job_run+0x10f/0x260 [amdgpu]
drm_sched_main+0x1b9/0x490 [gpu_sched]
kthread+0x12e/0x150
Trouble right now is that lockdep only validates against GFP_FS, which
would be good enough for shrinkers. But for mmu_notifiers we actually
need !GFP_ATOMIC, since they can be called from any page laundering,
even if GFP_NOFS or GFP_NOIO are set.
I guess we should improve the lockdep annotations for
fs_reclaim_acquire/release.
Ofc real fix is to properly preallocate this fence and stuff it into
the amdgpu job structure. But GFP_ATOMIC gets the lockdep splat out of
the way.
v2: Two more allocations in scheduler paths.
Frist one:
__kmalloc+0x58/0x720
amdgpu_vmid_grab+0x100/0xca0 [amdgpu]
amdgpu_job_dependency+0xf9/0x120 [amdgpu]
drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
drm_sched_main+0xf9/0x490 [gpu_sched]
Second one:
kmem_cache_alloc+0x2b/0x6d0
amdgpu_sync_fence+0x7e/0x110 [amdgpu]
amdgpu_vmid_grab+0x86b/0xca0 [amdgpu]
amdgpu_job_dependency+0xf9/0x120 [amdgpu]
drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
drm_sched_main+0xf9/0x490 [gpu_sched]
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: linux-rdma(a)vger.kernel.org
Cc: amd-gfx(a)lists.freedesktop.org
Cc: intel-gfx(a)lists.freedesktop.org
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 8d84975885cd..a089a827fdfe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -143,7 +143,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
uint32_t seq;
int r;
- fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
+ fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC);
if (fence == NULL)
return -ENOMEM;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
index 267fa45ddb66..a333ca2d4ddd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
@@ -208,7 +208,7 @@ static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm,
if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait))
return amdgpu_sync_fence(sync, ring->vmid_wait);
- fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL);
+ fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC);
if (!fences)
return -ENOMEM;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index 8ea6c49529e7..af22b526cec9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -160,7 +160,7 @@ int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f)
if (amdgpu_sync_add_later(sync, f))
return 0;
- e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL);
+ e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC);
if (!e)
return -ENOMEM;
--
2.27.0
Design is similar to the lockdep annotations for workers, but with
some twists:
- We use a read-lock for the execution/worker/completion side, so that
this explicit annotation can be more liberally sprinkled around.
With read locks lockdep isn't going to complain if the read-side
isn't nested the same way under all circumstances, so ABBA deadlocks
are ok. Which they are, since this is an annotation only.
- We're using non-recursive lockdep read lock mode, since in recursive
read lock mode lockdep does not catch read side hazards. And we
_very_ much want read side hazards to be caught. For full details of
this limitation see
commit e91498589746065e3ae95d9a00b068e525eec34f
Author: Peter Zijlstra <peterz(a)infradead.org>
Date: Wed Aug 23 13:13:11 2017 +0200
locking/lockdep/selftests: Add mixed read-write ABBA tests
- To allow nesting of the read-side explicit annotations we explicitly
keep track of the nesting. lock_is_held() allows us to do that.
- The wait-side annotation is a write lock, and entirely done within
dma_fence_wait() for everyone by default.
- To be able to freely annotate helper functions I want to make it ok
to call dma_fence_begin/end_signalling from soft/hardirq context.
First attempt was using the hardirq locking context for the write
side in lockdep, but this forces all normal spinlocks nested within
dma_fence_begin/end_signalling to be spinlocks. That bollocks.
The approach now is to simple check in_atomic(), and for these cases
entirely rely on the might_sleep() check in dma_fence_wait(). That
will catch any wrong nesting against spinlocks from soft/hardirq
contexts.
The idea here is that every code path that's critical for eventually
signalling a dma_fence should be annotated with
dma_fence_begin/end_signalling. The annotation ideally starts right
after a dma_fence is published (added to a dma_resv, exposed as a
sync_file fd, attached to a drm_syncobj fd, or anything else that
makes the dma_fence visible to other kernel threads), up to and
including the dma_fence_wait(). Examples are irq handlers, the
scheduler rt threads, the tail of execbuf (after the corresponding
fences are visible), any workers that end up signalling dma_fences and
really anything else. Not annotated should be code paths that only
complete fences opportunistically as the gpu progresses, like e.g.
shrinker/eviction code.
The main class of deadlocks this is supposed to catch are:
Thread A:
mutex_lock(A);
mutex_unlock(A);
dma_fence_signal();
Thread B:
mutex_lock(A);
dma_fence_wait();
mutex_unlock(A);
Thread B is blocked on A signalling the fence, but A never gets around
to that because it cannot acquire the lock A.
Note that dma_fence_wait() is allowed to be nested within
dma_fence_begin/end_signalling sections. To allow this to happen the
read lock needs to be upgraded to a write lock, which means that any
other lock is acquired between the dma_fence_begin_signalling() call and
the call to dma_fence_wait(), and still held, this will result in an
immediate lockdep complaint. The only other option would be to not
annotate such calls, defeating the point. Therefore these annotations
cannot be sprinkled over the code entirely mindless to avoid false
positives.
Originally I hope that the cross-release lockdep extensions would
alleviate the need for explicit annotations:
https://lwn.net/Articles/709849/
But there's a few reasons why that's not an option:
- It's not happening in upstream, since it got reverted due to too
many false positives:
commit e966eaeeb623f09975ef362c2866fae6f86844f9
Author: Ingo Molnar <mingo(a)kernel.org>
Date: Tue Dec 12 12:31:16 2017 +0100
locking/lockdep: Remove the cross-release locking checks
This code (CONFIG_LOCKDEP_CROSSRELEASE=y and CONFIG_LOCKDEP_COMPLETIONS=y),
while it found a number of old bugs initially, was also causing too many
false positives that caused people to disable lockdep - which is arguably
a worse overall outcome.
- cross-release uses the complete() call to annotate the end of
critical sections, for dma_fence that would be dma_fence_signal().
But we do not want all dma_fence_signal() calls to be treated as
critical, since many are opportunistic cleanup of gpu requests. If
these get stuck there's still the main completion interrupt and
workers who can unblock everyone. Automatically annotating all
dma_fence_signal() calls would hence cause false positives.
- cross-release had some educated guesses for when a critical section
starts, like fresh syscall or fresh work callback. This would again
cause false positives without explicit annotations, since for
dma_fence the critical sections only starts when we publish a fence.
- Furthermore there can be cases where a thread never does a
dma_fence_signal, but is still critical for reaching completion of
fences. One example would be a scheduler kthread which picks up jobs
and pushes them into hardware, where the interrupt handler or
another completion thread calls dma_fence_signal(). But if the
scheduler thread hangs, then all the fences hang, hence we need to
manually annotate it. cross-release aimed to solve this by chaining
cross-release dependencies, but the dependency from scheduler thread
to the completion interrupt handler goes through hw where
cross-release code can't observe it.
In short, without manual annotations and careful review of the start
and end of critical sections, cross-relese dependency tracking doesn't
work. We need explicit annotations.
v2: handle soft/hardirq ctx better against write side and dont forget
EXPORT_SYMBOL, drivers can't use this otherwise.
v3: Kerneldoc.
v4: Some spelling fixes from Mika
v5: Amend commit message to explain in detail why cross-release isn't
the solution.
v6: Pull out misplaced .rst hunk.
Cc: Felix Kuehling <Felix.Kuehling(a)amd.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom(a)intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala(a)intel.com>
Cc: Thomas Hellstrom <thomas.hellstrom(a)intel.com>
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: linux-rdma(a)vger.kernel.org
Cc: amd-gfx(a)lists.freedesktop.org
Cc: intel-gfx(a)lists.freedesktop.org
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
Documentation/driver-api/dma-buf.rst | 6 +
drivers/dma-buf/dma-fence.c | 161 +++++++++++++++++++++++++++
include/linux/dma-fence.h | 12 ++
3 files changed, 179 insertions(+)
diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
index 7fb7b661febd..05d856131140 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -133,6 +133,12 @@ DMA Fences
.. kernel-doc:: drivers/dma-buf/dma-fence.c
:doc: DMA fences overview
+DMA Fence Signalling Annotations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: drivers/dma-buf/dma-fence.c
+ :doc: fence signalling annotation
+
DMA Fences Functions Reference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 656e9ac2d028..0005bc002529 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -110,6 +110,160 @@ u64 dma_fence_context_alloc(unsigned num)
}
EXPORT_SYMBOL(dma_fence_context_alloc);
+/**
+ * DOC: fence signalling annotation
+ *
+ * Proving correctness of all the kernel code around &dma_fence through code
+ * review and testing is tricky for a few reasons:
+ *
+ * * It is a cross-driver contract, and therefore all drivers must follow the
+ * same rules for lock nesting order, calling contexts for various functions
+ * and anything else significant for in-kernel interfaces. But it is also
+ * impossible to test all drivers in a single machine, hence brute-force N vs.
+ * N testing of all combinations is impossible. Even just limiting to the
+ * possible combinations is infeasible.
+ *
+ * * There is an enormous amount of driver code involved. For render drivers
+ * there's the tail of command submission, after fences are published,
+ * scheduler code, interrupt and workers to process job completion,
+ * and timeout, gpu reset and gpu hang recovery code. Plus for integration
+ * with core mm with have &mmu_notifier, respectively &mmu_interval_notifier,
+ * and &shrinker. For modesetting drivers there's the commit tail functions
+ * between when fences for an atomic modeset are published, and when the
+ * corresponding vblank completes, including any interrupt processing and
+ * related workers. Auditing all that code, across all drivers, is not
+ * feasible.
+ *
+ * * Due to how many other subsystems are involved and the locking hierarchies
+ * this pulls in there is extremely thin wiggle-room for driver-specific
+ * differences. &dma_fence interacts with almost all of the core memory
+ * handling through page fault handlers via &dma_resv, dma_resv_lock() and
+ * dma_resv_unlock(). On the other side it also interacts through all
+ * allocation sites through &mmu_notifier and &shrinker.
+ *
+ * Furthermore lockdep does not handle cross-release dependencies, which means
+ * any deadlocks between dma_fence_wait() and dma_fence_signal() can't be caught
+ * at runtime with some quick testing. The simplest example is one thread
+ * waiting on a &dma_fence while holding a lock::
+ *
+ * lock(A);
+ * dma_fence_wait(B);
+ * unlock(A);
+ *
+ * while the other thread is stuck trying to acquire the same lock, which
+ * prevents it from signalling the fence the previous thread is stuck waiting
+ * on::
+ *
+ * lock(A);
+ * unlock(A);
+ * dma_fence_signal(B);
+ *
+ * By manually annotating all code relevant to signalling a &dma_fence we can
+ * teach lockdep about these dependencies, which also helps with the validation
+ * headache since now lockdep can check all the rules for us::
+ *
+ * cookie = dma_fence_begin_signalling();
+ * lock(A);
+ * unlock(A);
+ * dma_fence_signal(B);
+ * dma_fence_end_signalling(cookie);
+ *
+ * For using dma_fence_begin_signalling() and dma_fence_end_signalling() to
+ * annotate critical sections the following rules need to be observed:
+ *
+ * * All code necessary to complete a &dma_fence must be annotated, from the
+ * point where a fence is accessible to other threads, to the point where
+ * dma_fence_signal() is called. Un-annotated code can contain deadlock issues,
+ * and due to the very strict rules and many corner cases it is infeasible to
+ * catch these just with review or normal stress testing.
+ *
+ * * &struct dma_resv deserves a special note, since the readers are only
+ * protected by rcu. This means the signalling critical section starts as soon
+ * as the new fences are installed, even before dma_resv_unlock() is called.
+ *
+ * * The only exception are fast paths and opportunistic signalling code, which
+ * calls dma_fence_signal() purely as an optimization, but is not required to
+ * guarantee completion of a &dma_fence. The usual example is a wait IOCTL
+ * which calls dma_fence_signal(), while the mandatory completion path goes
+ * through a hardware interrupt and possible job completion worker.
+ *
+ * * To aid composability of code, the annotations can be freely nested, as long
+ * as the overall locking hierarchy is consistent. The annotations also work
+ * both in interrupt and process context. Due to implementation details this
+ * requires that callers pass an opaque cookie from
+ * dma_fence_begin_signalling() to dma_fence_end_signalling().
+ *
+ * * Validation against the cross driver contract is implemented by priming
+ * lockdep with the relevant hierarchy at boot-up. This means even just
+ * testing with a single device is enough to validate a driver, at least as
+ * far as deadlocks with dma_fence_wait() against dma_fence_signal() are
+ * concerned.
+ */
+#ifdef CONFIG_LOCKDEP
+struct lockdep_map dma_fence_lockdep_map = {
+ .name = "dma_fence_map"
+};
+
+/**
+ * dma_fence_begin_signalling - begin a critical DMA fence signalling section
+ *
+ * Drivers should use this to annotate the beginning of any code section
+ * required to eventually complete &dma_fence by calling dma_fence_signal().
+ *
+ * The end of these critical sections are annotated with
+ * dma_fence_end_signalling().
+ *
+ * Returns:
+ *
+ * Opaque cookie needed by the implementation, which needs to be passed to
+ * dma_fence_end_signalling().
+ */
+bool dma_fence_begin_signalling(void)
+{
+ /* explicitly nesting ... */
+ if (lock_is_held_type(&dma_fence_lockdep_map, 1))
+ return true;
+
+ /* rely on might_sleep check for soft/hardirq locks */
+ if (in_atomic())
+ return true;
+
+ /* ... and non-recursive readlock */
+ lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _RET_IP_);
+
+ return false;
+}
+EXPORT_SYMBOL(dma_fence_begin_signalling);
+
+/**
+ * dma_fence_end_signalling - end a critical DMA fence signalling section
+ *
+ * Closes a critical section annotation opened by dma_fence_begin_signalling().
+ */
+void dma_fence_end_signalling(bool cookie)
+{
+ if (cookie)
+ return;
+
+ lock_release(&dma_fence_lockdep_map, _RET_IP_);
+}
+EXPORT_SYMBOL(dma_fence_end_signalling);
+
+void __dma_fence_might_wait(void)
+{
+ bool tmp;
+
+ tmp = lock_is_held_type(&dma_fence_lockdep_map, 1);
+ if (tmp)
+ lock_release(&dma_fence_lockdep_map, _THIS_IP_);
+ lock_map_acquire(&dma_fence_lockdep_map);
+ lock_map_release(&dma_fence_lockdep_map);
+ if (tmp)
+ lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _THIS_IP_);
+}
+#endif
+
+
/**
* dma_fence_signal_locked - signal completion of a fence
* @fence: the fence to signal
@@ -170,14 +324,19 @@ int dma_fence_signal(struct dma_fence *fence)
{
unsigned long flags;
int ret;
+ bool tmp;
if (!fence)
return -EINVAL;
+ tmp = dma_fence_begin_signalling();
+
spin_lock_irqsave(fence->lock, flags);
ret = dma_fence_signal_locked(fence);
spin_unlock_irqrestore(fence->lock, flags);
+ dma_fence_end_signalling(tmp);
+
return ret;
}
EXPORT_SYMBOL(dma_fence_signal);
@@ -210,6 +369,8 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
might_sleep();
+ __dma_fence_might_wait();
+
trace_dma_fence_wait_start(fence);
if (fence->ops->wait)
ret = fence->ops->wait(fence, intr, timeout);
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 3347c54f3a87..3f288f7db2ef 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -357,6 +357,18 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)
} while (1);
}
+#ifdef CONFIG_LOCKDEP
+bool dma_fence_begin_signalling(void);
+void dma_fence_end_signalling(bool cookie);
+#else
+static inline bool dma_fence_begin_signalling(void)
+{
+ return true;
+}
+static inline void dma_fence_end_signalling(bool cookie) {}
+static inline void __dma_fence_might_wait(void) {}
+#endif
+
int dma_fence_signal(struct dma_fence *fence);
int dma_fence_signal_locked(struct dma_fence *fence);
signed long dma_fence_default_wait(struct dma_fence *fence,
--
2.27.0