On Fri, Oct 03, 2025 at 10:46:41AM +0900, Byungchul Park wrote:
> On Thu, Oct 02, 2025 at 12:39:31PM +0100, Mark Brown wrote:
> > On Thu, Oct 02, 2025 at 05:12:09PM +0900, Byungchul Park wrote:
> > > dept needs to notice every entrance from user to kernel mode to treat
> > > every kernel context independently when tracking wait-event dependencies.
> > > Roughly, system calls and user-originated faults are the cases.
> > > Make dept aware of the entrances on arm64 and add support for
> > > CONFIG_ARCH_HAS_DEPT_SUPPORT to arm64.
> > The description of what needs to be tracked probably needs some
> > tightening up here, it's not clear to me for example why exceptions for
> > mops or the vector extensions aren't included here, or what the
> > distinction is with error faults like BTI or GCS not being tracked?
> Thanks for the feedback, but I'm afraid I don't follow. Can you explain
> in more detail with an example?
Your commit log says we need to track every entrance from user mode to
kernel mode but the code only adds tracking to syscalls and some memory
faults. The exception types listed above (and some others) also result
in entries to the kernel from userspace.
> JFYI, pairs of wait and its event need to be tracked to see if each
> event can be prevented from being reachable by other waits like:
>         context X                           context Y
>
>         lock L
>         ...
>         initiate event A                    context start toward event A
>         ...                                 ...
>         wait A  // wait for event A and     lock L  // wait for unlock L and
>                 // prevent unlock L                 // prevent event A
>         ...                                 ...
>         unlock L                            unlock L
>                                             ...
>                                             event A
> I meant things like this need to be tracked.
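>
> For instance, with ordinary kernel primitives the same shape could look
> like the minimal sketch below (made-up names, not taken from any real
> code path):
>
>         #include <linux/completion.h>
>         #include <linux/mutex.h>
>         #include <linux/workqueue.h>
>
>         static DEFINE_MUTEX(l);                 /* lock L  */
>         static DECLARE_COMPLETION(a);           /* event A */
>
>         /* context Y: started toward event A */
>         static void context_y_fn(struct work_struct *w)
>         {
>                 mutex_lock(&l);    /* wait for unlock L, prevents event A below */
>                 mutex_unlock(&l);
>                 complete(&a);      /* event A */
>         }
>         static DECLARE_WORK(context_y_work, context_y_fn);
>
>         /* context X */
>         static void context_x(void)
>         {
>                 mutex_lock(&l);                          /* lock L */
>                 queue_work(system_wq, &context_y_work);  /* initiate event A */
>                 wait_for_completion(&a);   /* wait A, prevents unlock L below */
>                 mutex_unlock(&l);          /* the event context Y waits for */
>         }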
I don't think that's at all clear from the above context, and the
handling for some of the above exception types (eg, the vector
extensions) includes taking locks.
On Thu, Oct 02, 2025 at 05:12:28PM +0900, Byungchul Park wrote:
> This document describes the concept and APIs of dept.
>
> Signed-off-by: Byungchul Park <byungchul(a)sk.com>
> ---
> Documentation/dependency/dept.txt | 735 ++++++++++++++++++++++++++
> Documentation/dependency/dept_api.txt | 117 ++++
> 2 files changed, 852 insertions(+)
> create mode 100644 Documentation/dependency/dept.txt
> create mode 100644 Documentation/dependency/dept_api.txt
What about writing dept docs in reST (like the rest of kernel documentation)?
---- >8 ----
diff --git a/Documentation/dependency/dept.txt b/Documentation/locking/dept.rst
similarity index 92%
rename from Documentation/dependency/dept.txt
rename to Documentation/locking/dept.rst
index 5dd358b96734e6..7b90a0d95f0876 100644
--- a/Documentation/dependency/dept.txt
+++ b/Documentation/locking/dept.rst
@@ -8,7 +8,7 @@ How lockdep works
Lockdep detects a deadlock by checking lock acquisition order. For
example, a graph to track acquisition order built by lockdep might look
-like:
+like::
A -> B -
\
@@ -16,12 +16,12 @@ like:
/
C -> D -
- where 'A -> B' means that acquisition A is prior to acquisition B
- with A still held.
+where 'A -> B' means that acquisition A is prior to acquisition B
+with A still held.
Lockdep keeps adding each new acquisition order into the graph in
runtime. For example, 'E -> C' will be added when the two locks have
-been acquired in the order, E and then C. The graph will look like:
+been acquired in the order, E and then C. The graph will look like::
A -> B -
\
@@ -32,10 +32,10 @@ been acquired in the order, E and then C. The graph will look like:
\ /
------------------
- where 'A -> B' means that acquisition A is prior to acquisition B
- with A still held.
+where 'A -> B' means that acquisition A is prior to acquisition B
+with A still held.
-This graph contains a subgraph that demonstrates a loop like:
+This graph contains a subgraph that demonstrates a loop like::
-> E -
/ \
@@ -67,6 +67,8 @@ mechanisms, lockdep doesn't work.
Can lockdep detect the following deadlock?
+::
+
context X context Y context Z
mutex_lock A
@@ -80,6 +82,8 @@ Can lockdep detect the following deadlock?
No. What about the following?
+::
+
context X context Y
mutex_lock A
@@ -101,7 +105,7 @@ What leads a deadlock
---------------------
A deadlock occurs when one or multi contexts are waiting for events that
-will never happen. For example:
+will never happen. For example::
context X context Y context Z
@@ -121,24 +125,24 @@ We call this *deadlock*.
If an event occurrence is a prerequisite to reaching another event, we
call it *dependency*. In this example:
- Event A occurrence is a prerequisite to reaching event C.
- Event C occurrence is a prerequisite to reaching event B.
- Event B occurrence is a prerequisite to reaching event A.
+ * Event A occurrence is a prerequisite to reaching event C.
+ * Event C occurrence is a prerequisite to reaching event B.
+ * Event B occurrence is a prerequisite to reaching event A.
In terms of dependency:
- Event C depends on event A.
- Event B depends on event C.
- Event A depends on event B.
+ * Event C depends on event A.
+ * Event B depends on event C.
+ * Event A depends on event B.
-Dependency graph reflecting this example will look like:
+Dependency graph reflecting this example will look like::
-> C -> A -> B -
/ \
\ /
----------------
- where 'A -> B' means that event A depends on event B.
+where 'A -> B' means that event A depends on event B.
A circular dependency exists. Such a circular dependency leads a
deadlock since no waiters can have desired events triggered.
@@ -152,7 +156,7 @@ Introduce DEPT
--------------
DEPT(DEPendency Tracker) tracks wait and event instead of lock
-acquisition order so as to recognize the following situation:
+acquisition order so as to recognize the following situation::
context X context Y context Z
@@ -165,18 +169,18 @@ acquisition order so as to recognize the following situation:
event A
and builds up a dependency graph in runtime that is similar to lockdep.
-The graph might look like:
+The graph might look like::
-> C -> A -> B -
/ \
\ /
----------------
- where 'A -> B' means that event A depends on event B.
+where 'A -> B' means that event A depends on event B.
DEPT keeps adding each new dependency into the graph in runtime. For
example, 'B -> D' will be added when event D occurrence is a
-prerequisite to reaching event B like:
+prerequisite to reaching event B like::
|
v
@@ -184,7 +188,7 @@ prerequisite to reaching event B like:
.
event B
-After the addition, the graph will look like:
+After the addition, the graph will look like::
-> D
/
@@ -209,6 +213,8 @@ How DEPT works
Let's take a look how DEPT works with the 1st example in the section
'Limitation of lockdep'.
+::
+
context X context Y context Z
mutex_lock A
@@ -220,7 +226,7 @@ Let's take a look how DEPT works with the 1st example in the section
mutex_unlock A
mutex_unlock A
-Adding comments to describe DEPT's view in terms of wait and event:
+Adding comments to describe DEPT's view in terms of wait and event::
context X context Y context Z
@@ -248,7 +254,7 @@ Adding comments to describe DEPT's view in terms of wait and event:
mutex_unlock A
/* event A */
-Adding more supplementary comments to describe DEPT's view in detail:
+Adding more supplementary comments to describe DEPT's view in detail::
context X context Y context Z
@@ -283,7 +289,7 @@ Adding more supplementary comments to describe DEPT's view in detail:
mutex_unlock A
/* event A that's been valid since 4 */
-Let's build up dependency graph with this example. Firstly, context X:
+Let's build up dependency graph with this example. Firstly, context X::
context X
@@ -292,7 +298,7 @@ Let's build up dependency graph with this example. Firstly, context X:
/* start to take into account event B's context */
/* 2 */
-There are no events to create dependency. Next, context Y:
+There are no events to create dependency. Next, context Y::
context Y
@@ -317,13 +323,13 @@ waits between 3 and the event, event B does not create dependency. For
event A, there is a wait, folio_lock B, between 1 and the event. Which
means event A cannot be triggered if event B does not wake up the wait.
Therefore, we can say event A depends on event B, say, 'A -> B'. The
-graph will look like after adding the dependency:
+graph will look like after adding the dependency::
A -> B
- where 'A -> B' means that event A depends on event B.
+where 'A -> B' means that event A depends on event B.
-Lastly, context Z:
+Lastly, context Z::
context Z
@@ -343,7 +349,7 @@ wait, mutex_lock A, between 2 and the event - remind 2 is at a very
start and before the wait in timeline. Which means event B cannot be
triggered if event A does not wake up the wait. Therefore, we can say
event B depends on event A, say, 'B -> A'. The graph will look like
-after adding the dependency:
+after adding the dependency::
-> A -> B -
/ \
@@ -367,6 +373,8 @@ Interpret DEPT report
The following is the example in the section 'How DEPT works'.
+::
+
context X context Y context Z
mutex_lock A
@@ -402,7 +410,7 @@ The following is the example in the section 'How DEPT works'.
We can Simplify this by replacing each waiting point with [W], each
point where its event's context starts with [S] and each event with [E].
-This example will look like after the replacement:
+This example will look like after the replacement::
context X context Y context Z
@@ -419,6 +427,8 @@ This example will look like after the replacement:
DEPT uses the symbols [W], [S] and [E] in its report as described above.
The following is an example reported by DEPT for a real problem.
+::
+
Link: https://lore.kernel.org/lkml/6383cde5-cf4b-facf-6e07-1378a485657d@I-love.SA…
Link: https://lore.kernel.org/lkml/1674268856-31807-1-git-send-email-byungchul.pa…
@@ -620,6 +630,8 @@ The following is an example reported by DEPT for a real problem.
Let's take a look at the summary that is the most important part.
+::
+
---------------------------------------------------
summary
---------------------------------------------------
@@ -639,7 +651,7 @@ Let's take a look at the summary that is the most important part.
[W]: the wait blocked
[E]: the event not reachable
-The summary shows the following scenario:
+The summary shows the following scenario::
context A context B context ?(unknown)
@@ -652,7 +664,7 @@ The summary shows the following scenario:
[E] unlock(&ni->ni_lock:0)
-Adding supplementary comments to describe DEPT's view in detail:
+Adding supplementary comments to describe DEPT's view in detail::
context A context B context ?(unknown)
@@ -677,7 +689,7 @@ Adding supplementary comments to describe DEPT's view in detail:
[E] unlock(&ni->ni_lock:0)
/* event that's been valid since 2 */
-Let's build up dependency graph with this report. Firstly, context A:
+Let's build up dependency graph with this report. Firstly, context A::
context A
@@ -697,13 +709,13 @@ wait, folio_lock(&f1), between 2 and the event. Which means
unlock(&ni->ni_lock:0) is not reachable if folio_unlock(&f1) does not
wake up the wait. Therefore, we can say unlock(&ni->ni_lock:0) depends
on folio_unlock(&f1), say, 'unlock(&ni->ni_lock:0) -> folio_unlock(&f1)'.
-The graph will look like after adding the dependency:
+The graph will look like after adding the dependency::
unlock(&ni->ni_lock:0) -> folio_unlock(&f1)
- where 'A -> B' means that event A depends on event B.
+where 'A -> B' means that event A depends on event B.
-Secondly, context B:
+Secondly, context B::
context B
@@ -719,14 +731,14 @@ very start and before the wait in timeline. Which means folio_unlock(&f1)
is not reachable if unlock(&ni->ni_lock:0) does not wake up the wait.
Therefore, we can say folio_unlock(&f1) depends on unlock(&ni->ni_lock:0),
say, 'folio_unlock(&f1) -> unlock(&ni->ni_lock:0)'. The graph will look
-like after adding the dependency:
+like after adding the dependency::
-> unlock(&ni->ni_lock:0) -> folio_unlock(&f1) -
/ \
\ /
------------------------------------------------
- where 'A -> B' means that event A depends on event B.
+where 'A -> B' means that event A depends on event B.
A new loop has been created. So DEPT can report it as a deadlock! Cool!
diff --git a/Documentation/dependency/dept_api.txt b/Documentation/locking/dept_api.rst
similarity index 97%
rename from Documentation/dependency/dept_api.txt
rename to Documentation/locking/dept_api.rst
index 8e0d5a118a460e..96c4d65f4a9a2d 100644
--- a/Documentation/dependency/dept_api.txt
+++ b/Documentation/locking/dept_api.rst
@@ -10,6 +10,8 @@ already applied into the existing synchronization primitives e.g.
waitqueue, swait, wait_for_completion(), dma fence and so on. The basic
APIs of SDT are:
+.. code-block:: c
+
/*
* After defining 'struct dept_map map', initialize the instance.
*/
@@ -27,6 +29,8 @@ APIs of SDT are:
The advanced APIs of SDT are:
+.. code-block:: c
+
/*
* After defining 'struct dept_map map', initialize the instance
* using an external key.
@@ -83,6 +87,8 @@ Do not use these APIs directly. These are the wrappers for typical
locks, that have been already applied into major locks internally e.g.
spin lock, mutex, rwlock and so on. The APIs of LDT are:
+.. code-block:: c
+
ldt_init(map, key, sub, name);
ldt_lock(map, sub_local, try, nest, ip);
ldt_rlock(map, sub_local, try, nest, ip, queued);
@@ -96,6 +102,8 @@ Raw APIs
--------
Do not use these APIs directly. The raw APIs of dept are:
+.. code-block:: c
+
dept_free_range(start, size);
dept_map_init(map, key, sub, name);
dept_map_reinit(map, key, sub, name);
diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst
index 6a9ea96c8bcb70..7ec3dce7fee425 100644
--- a/Documentation/locking/index.rst
+++ b/Documentation/locking/index.rst
@@ -24,6 +24,8 @@ Locking
percpu-rw-semaphore
robust-futexes
robust-futex-ABI
+ dept
+ dept_api
.. only:: subproject and html
> +Can lockdep detect the following deadlock?
> +
> + context X context Y context Z
> +
> + mutex_lock A
> + folio_lock B
> + folio_lock B <- DEADLOCK
> + mutex_lock A <- DEADLOCK
> + folio_unlock B
> + folio_unlock B
> + mutex_unlock A
> + mutex_unlock A
> +
> +No. What about the following?
> +
> + context X context Y
> +
> + mutex_lock A
> + mutex_lock A <- DEADLOCK
> + wait_for_complete B <- DEADLOCK
> + complete B
> + mutex_unlock A
> + mutex_unlock A
Can you explain how DEPT detects the deadlock in the second example above
(the way the first one is described in the "How DEPT works" section)?
Confused...
--
An old man doll... just what I always wanted! - Clara
On Thu, Oct 2, 2025 at 12:47 AM Maxime Ripard <mripard(a)redhat.com> wrote:
> On Thu, Sep 11, 2025 at 03:49:43PM +0200, Jens Wiklander wrote:
> > Export the dma-buf heap functions to allow them to be used by the OP-TEE
> > driver. The OP-TEE driver wants to register and manage specific secure
> > DMA heaps with it.
> >
> > Reviewed-by: Sumit Garg <sumit.garg(a)oss.qualcomm.com>
> > Reviewed-by: T.J. Mercier <tjmercier(a)google.com>
> > Acked-by: Sumit Semwal <sumit.semwal(a)linaro.org>
> > Signed-off-by: Jens Wiklander <jens.wiklander(a)linaro.org>
> > ---
> > drivers/dma-buf/dma-heap.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> > index 3cbe87d4a464..8ab49924f8b7 100644
> > --- a/drivers/dma-buf/dma-heap.c
> > +++ b/drivers/dma-buf/dma-heap.c
> > @@ -11,6 +11,7 @@
> > #include <linux/dma-buf.h>
> > #include <linux/dma-heap.h>
> > #include <linux/err.h>
> > +#include <linux/export.h>
> > #include <linux/list.h>
> > #include <linux/nospec.h>
> > #include <linux/syscalls.h>
> > @@ -202,6 +203,7 @@ void *dma_heap_get_drvdata(struct dma_heap *heap)
> > {
> > return heap->priv;
> > }
> > +EXPORT_SYMBOL_NS_GPL(dma_heap_get_drvdata, "DMA_BUF_HEAP");
> >
> > /**
> > * dma_heap_get_name - get heap name
> > @@ -214,6 +216,7 @@ const char *dma_heap_get_name(struct dma_heap *heap)
> > {
> > return heap->name;
> > }
> > +EXPORT_SYMBOL_NS_GPL(dma_heap_get_name, "DMA_BUF_HEAP");
> >
> > /**
> > * dma_heap_add - adds a heap to dmabuf heaps
> > @@ -303,6 +306,7 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
> > kfree(heap);
> > return err_ret;
> > }
> > +EXPORT_SYMBOL_NS_GPL(dma_heap_add, "DMA_BUF_HEAP");
>
> It's not clear to me why we would need to export those symbols.
>
> As far as I know, heaps cannot be removed, and compiling them as module
> means that we would be able to remove them.
>
> Now, if we don't expect the users to be compiled as modules, then we
> don't need to export these symbols at all.
>
> Am I missing something?
For things like distro kernels (or, in Android's case, the GKI), there's
a benefit to modules that can be loaded permanently (i.e. that have no
module_exit hook).
One doesn't have to bloat the base kernel image and memory usage for
everyone, while still avoiding the complications of module unloading.
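As a rough sketch (hypothetical names, the heap's allocate callback
elided), such a permanently-loadable heap module boils down to something
like:

        /* sketch only: names are hypothetical, error paths trimmed */
        #include <linux/dma-heap.h>
        #include <linux/err.h>
        #include <linux/module.h>

        MODULE_IMPORT_NS("DMA_BUF_HEAP"); /* to use the namespaced exports */

        /* a real heap must provide .allocate; elided in this sketch */
        static const struct dma_heap_ops example_heap_ops = {};

        static struct dma_heap *example_heap;

        static int __init example_heap_init(void)
        {
                struct dma_heap_export_info exp_info = {
                        .name = "example",
                        .ops  = &example_heap_ops,
                        .priv = NULL,
                };

                example_heap = dma_heap_add(&exp_info);
                return PTR_ERR_OR_ZERO(example_heap);
        }
        module_init(example_heap_init);
        /* deliberately no module_exit(): once loaded, the module stays loaded */
        MODULE_LICENSE("GPL");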
thanks
-john
Hi,
On Thu, Oct 2, 2025 at 9:54 AM Maxime Ripard <mripard(a)redhat.com> wrote:
>
> On Thu, Sep 11, 2025 at 03:49:44PM +0200, Jens Wiklander wrote:
> > +static const char *heap_id_2_name(enum tee_dma_heap_id id)
> > +{
> > + switch (id) {
> > + case TEE_DMA_HEAP_SECURE_VIDEO_PLAY:
> > + return "protected,secure-video";
> > + case TEE_DMA_HEAP_TRUSTED_UI:
> > + return "protected,trusted-ui";
> > + case TEE_DMA_HEAP_SECURE_VIDEO_RECORD:
> > + return "protected,secure-video-record";
> > + default:
> > + return NULL;
> > + }
> > +}
>
> We've recently agreed on a naming guideline (even though it's not merged yet)
>
> https://lore.kernel.org/r/20250728-dma-buf-heap-names-doc-v4-1-f73f71cf0dfd…
I wasn't aware of that (or had forgotten it), but during the revisions
of this patch set, we changed to use "protected".
>
> Secure and trusted should be defined I guess, because secure and
> protected at least seem redundant to me.
Depending on the use case, the protected buffer is only accessible to
a specific set of devices. This is typically configured by the TEE
firmware based on which heap we're using. To distinguish between the
different heaps, I've simply added the name of the use case after the
comma. So the name of the heap for the Trusted-UI use case is
"protected,trusted-ui". What would a heap called "protected,ui"
represent? Protected buffers for a UI use case? What kind of UI use
case? If the name of the heap is too generic, it might cover more than
one use case with conflicting requirements for which devices should be
able to access the protected memory.
Thanks,
Jens
Le jeudi 02 octobre 2025 à 09:47 +0200, Maxime Ripard a écrit :
> Hi,
>
> On Thu, Sep 11, 2025 at 03:49:43PM +0200, Jens Wiklander wrote:
> > Export the dma-buf heap functions to allow them to be used by the OP-TEE
> > driver. The OP-TEE driver wants to register and manage specific secure
> > DMA heaps with it.
> >
> > Reviewed-by: Sumit Garg <sumit.garg(a)oss.qualcomm.com>
> > Reviewed-by: T.J. Mercier <tjmercier(a)google.com>
> > Acked-by: Sumit Semwal <sumit.semwal(a)linaro.org>
> > Signed-off-by: Jens Wiklander <jens.wiklander(a)linaro.org>
> > ---
> > drivers/dma-buf/dma-heap.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> > index 3cbe87d4a464..8ab49924f8b7 100644
> > --- a/drivers/dma-buf/dma-heap.c
> > +++ b/drivers/dma-buf/dma-heap.c
> > @@ -11,6 +11,7 @@
> > #include <linux/dma-buf.h>
> > #include <linux/dma-heap.h>
> > #include <linux/err.h>
> > +#include <linux/export.h>
> > #include <linux/list.h>
> > #include <linux/nospec.h>
> > #include <linux/syscalls.h>
> > @@ -202,6 +203,7 @@ void *dma_heap_get_drvdata(struct dma_heap *heap)
> > {
> > return heap->priv;
> > }
> > +EXPORT_SYMBOL_NS_GPL(dma_heap_get_drvdata, "DMA_BUF_HEAP");
> >
> > /**
> > * dma_heap_get_name - get heap name
> > @@ -214,6 +216,7 @@ const char *dma_heap_get_name(struct dma_heap *heap)
> > {
> > return heap->name;
> > }
> > +EXPORT_SYMBOL_NS_GPL(dma_heap_get_name, "DMA_BUF_HEAP");
> >
> > /**
> > * dma_heap_add - adds a heap to dmabuf heaps
> > @@ -303,6 +306,7 @@ struct dma_heap *dma_heap_add(const struct
> > dma_heap_export_info *exp_info)
> > kfree(heap);
> > return err_ret;
> > }
> > +EXPORT_SYMBOL_NS_GPL(dma_heap_add, "DMA_BUF_HEAP");
>
> It's not clear to me why we would need to export those symbols.
>
> As far as I know, heaps cannot be removed, and compiling them as module
> means that we would be able to remove them.
>
> Now, if we don't expect the users to be compiled as modules, then we
> don't need to export these symbols at all.
Maybe I'm getting off topic, sorry if that's the case, but making that a hard
rule seems very limiting. Didn't we say that a heap driver could be made to
represent a memory region on a remote device such as an eGPU?
Nicolas
>
> Am I missing something?
>
> Maxime
Hi,
On Thu, Oct 2, 2025 at 9:47 AM Maxime Ripard <mripard(a)redhat.com> wrote:
>
> Hi,
>
> On Thu, Sep 11, 2025 at 03:49:43PM +0200, Jens Wiklander wrote:
> > Export the dma-buf heap functions to allow them to be used by the OP-TEE
> > driver. The OP-TEE driver wants to register and manage specific secure
> > DMA heaps with it.
> >
> > Reviewed-by: Sumit Garg <sumit.garg(a)oss.qualcomm.com>
> > Reviewed-by: T.J. Mercier <tjmercier(a)google.com>
> > Acked-by: Sumit Semwal <sumit.semwal(a)linaro.org>
> > Signed-off-by: Jens Wiklander <jens.wiklander(a)linaro.org>
> > ---
> > drivers/dma-buf/dma-heap.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> > index 3cbe87d4a464..8ab49924f8b7 100644
> > --- a/drivers/dma-buf/dma-heap.c
> > +++ b/drivers/dma-buf/dma-heap.c
> > @@ -11,6 +11,7 @@
> > #include <linux/dma-buf.h>
> > #include <linux/dma-heap.h>
> > #include <linux/err.h>
> > +#include <linux/export.h>
> > #include <linux/list.h>
> > #include <linux/nospec.h>
> > #include <linux/syscalls.h>
> > @@ -202,6 +203,7 @@ void *dma_heap_get_drvdata(struct dma_heap *heap)
> > {
> > return heap->priv;
> > }
> > +EXPORT_SYMBOL_NS_GPL(dma_heap_get_drvdata, "DMA_BUF_HEAP");
> >
> > /**
> > * dma_heap_get_name - get heap name
> > @@ -214,6 +216,7 @@ const char *dma_heap_get_name(struct dma_heap *heap)
> > {
> > return heap->name;
> > }
> > +EXPORT_SYMBOL_NS_GPL(dma_heap_get_name, "DMA_BUF_HEAP");
> >
> > /**
> > * dma_heap_add - adds a heap to dmabuf heaps
> > @@ -303,6 +306,7 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
> > kfree(heap);
> > return err_ret;
> > }
> > +EXPORT_SYMBOL_NS_GPL(dma_heap_add, "DMA_BUF_HEAP");
>
> It's not clear to me why we would need to export those symbols.
>
> As far as I know, heaps cannot be removed, and compiling them as module
> means that we would be able to remove them.
>
> Now, if we don't expect the users to be compiled as modules, then we
> don't need to export these symbols at all.
>
> Am I missing something?
In this case, it's the TEE module that _might_ need to instantiate a
DMA heap. Whether it will be instantiated depends on the TEE backend
driver and the TEE firmware. If a heap is instantiated, then it will
not be possible to unload the TEE module. That might not be perfect,
but in my opinion, it's better than other options, such as always
making the TEE subsystem built-in or disabling DMA-heap support when
compiled as a module.
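One way to express that constraint (a sketch, not necessarily what this
series does) is for the driver to pin itself once a heap has actually been
registered:

        heap = dma_heap_add(&exp_info);
        if (!IS_ERR(heap))
                __module_get(THIS_MODULE); /* heaps can't be removed, so stay loaded */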
Thanks,
Jens
On Thu, Oct 02, 2025 at 05:12:09PM +0900, Byungchul Park wrote:
> dept needs to notice every entrance from user to kernel mode to treat
> every kernel context independently when tracking wait-event dependencies.
> Roughly, system calls and user-originated faults are the cases.
>
> Make dept aware of the entrances on arm64 and add support for
> CONFIG_ARCH_HAS_DEPT_SUPPORT to arm64.
The description of what needs to be tracked probably needs some
tightening up here, it's not clear to me for example why exceptions for
mops or the vector extensions aren't included here, or what the
distinction is with error faults like BTI or GCS not being tracked?
On Thu, Oct 2, 2025 at 12:12 PM Guangbo Cui <2407018371(a)qq.com> wrote:
>
> The DEPT patch series changed `wait_for_completion` into a macro.
Thanks!
In general, it is useful to provide a Link: tag to the relevant patch on
Lore (i.e. context is good), and please clarify in which tree you found the
issue, if any -- I don't see it in linux-next, so I imagine it is not
applied, but "changed" sounds like it was? If it was actually applied,
please also provide a Fixes: tag.
Cheers,
Miguel
Changelog:
v4:
* Split pcim_p2pdma_provider() into two functions, one that initializes
  the array of providers and another that returns the right provider pointer.
v3: https://lore.kernel.org/all/cover.1758804980.git.leon@kernel.org
* Changed pcim_p2pdma_enable() to be pcim_p2pdma_provider().
* Cache provider in vfio_pci_dma_buf struct instead of BAR index.
* Removed misleading comment from pcim_p2pdma_provider().
* Moved MMIO check to be in pcim_p2pdma_provider().
v2: https://lore.kernel.org/all/cover.1757589589.git.leon@kernel.org/
* Added extra patch which adds new CONFIG, so next patches can reuse it.
* Squashed "PCI/P2PDMA: Remove redundant bus_offset from map state"
into the other patch.
* Fixed revoke calls to be aligned with true->false semantics.
* Extended p2pdma_providers to be per-BAR and not global to whole device.
* Fixed possible race between dmabuf states and revoke.
* Moved revoke to PCI BAR zap block.
v1: https://lore.kernel.org/all/cover.1754311439.git.leon@kernel.org
* Changed commit messages.
* Reused DMA_ATTR_MMIO attribute.
* Returned support for multiple DMA ranges per-dMABUF.
v0: https://lore.kernel.org/all/cover.1753274085.git.leonro@nvidia.com
---------------------------------------------------------------------------
Based on "[PATCH v6 00/16] dma-mapping: migrate to physical address-based API"
https://lore.kernel.org/all/cover.1757423202.git.leonro@nvidia.com/ series.
---------------------------------------------------------------------------
This series extends the VFIO PCI subsystem to support exporting MMIO
regions from PCI device BARs as dma-buf objects, enabling safe sharing of
non-struct page memory with controlled lifetime management. This allows RDMA
and other subsystems to import dma-buf FDs and build them into memory regions
for PCI P2P operations.
The series supports a use case for SPDK where an NVMe device is owned by
SPDK through VFIO while interacting with an RDMA device. The RDMA
device may directly access the NVMe CMB or directly manipulate the NVMe
device's doorbell using PCI P2P.
However, as a general mechanism, it can support many other scenarios with
VFIO. This dma-buf approach can also be used by iommufd for generic and
safe P2P mappings.
In addition to the SPDK use-case mentioned above, the capability added
in this patch series can also be useful when a buffer (located in device
memory such as VRAM) needs to be shared between any two dGPU devices or
instances (assuming one of them is bound to VFIO PCI) as long as they
are P2P DMA compatible.
The implementation provides a revocable attachment mechanism using dma-buf
move operations. MMIO regions are normally pinned as BARs don't change
physical addresses, but access is revoked when the VFIO device is closed
or a PCI reset is issued. This ensures kernel self-defense against
potentially hostile userspace.
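A rough sketch of that revoke path (illustrative only, not the literal
code in the series):

        #include <linux/dma-buf.h>
        #include <linux/dma-resv.h>

        /* called on device close or PCI reset to yank importer access */
        static void example_revoke(struct dma_buf *dmabuf)
        {
                dma_resv_lock(dmabuf->resv, NULL);
                /*
                 * Importers receive a move notification and must re-map before
                 * doing any further DMA; the exporter can refuse to map while
                 * the region stays revoked.
                 */
                dma_buf_move_notify(dmabuf);
                dma_resv_unlock(dmabuf->resv);
        }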
The series includes significant refactoring of the PCI P2PDMA subsystem
to separate core P2P functionality from memory allocation features,
making it more modular and suitable for VFIO use cases that don't need
struct page support.
-----------------------------------------------------------------------
The series is based originally on
https://lore.kernel.org/all/20250307052248.405803-1-vivek.kasireddy@intel.c…
but heavily rewritten to be based on DMA physical API.
-----------------------------------------------------------------------
The WIP branch can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=…
Thanks
Leon Romanovsky (8):
PCI/P2PDMA: Separate the mmap() support from the core logic
PCI/P2PDMA: Simplify bus address mapping API
PCI/P2PDMA: Refactor to separate core P2P functionality from memory
allocation
PCI/P2PDMA: Export pci_p2pdma_map_type() function
types: move phys_vec definition to common header
vfio/pci: Add dma-buf export config for MMIO regions
vfio/pci: Enable peer-to-peer DMA transactions by default
vfio/pci: Add dma-buf export support for MMIO regions
Vivek Kasireddy (2):
vfio: Export vfio device get and put registration helpers
vfio/pci: Share the core device pointer while invoking feature
functions
block/blk-mq-dma.c | 7 +-
drivers/iommu/dma-iommu.c | 4 +-
drivers/pci/p2pdma.c | 177 +++++++++----
drivers/vfio/pci/Kconfig | 20 ++
drivers/vfio/pci/Makefile | 2 +
drivers/vfio/pci/vfio_pci_config.c | 22 +-
drivers/vfio/pci/vfio_pci_core.c | 56 ++--
drivers/vfio/pci/vfio_pci_dmabuf.c | 398 +++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_priv.h | 23 ++
drivers/vfio/vfio_main.c | 2 +
include/linux/pci-p2pdma.h | 120 +++++----
include/linux/types.h | 5 +
include/linux/vfio.h | 2 +
include/linux/vfio_pci_core.h | 3 +
include/uapi/linux/vfio.h | 25 ++
kernel/dma/direct.c | 4 +-
mm/hmm.c | 2 +-
17 files changed, 750 insertions(+), 122 deletions(-)
create mode 100644 drivers/vfio/pci/vfio_pci_dmabuf.c
--
2.51.0