Hi,
This series is the follow-up to the discussion that John and I had some
time ago here:
https://lore.kernel.org/all/CANDhNCquJn6bH3KxKf65BWiTYLVqSd9892-xtFDHHqqyrr…
The initial problem we were discussing was that I'm currently working
on a platform whose memory layout has ECC enabled. However, enabling
ECC has a number of drawbacks on that platform: lower performance,
increased memory usage, etc. So for things like framebuffers the
trade-off isn't great, and thus there's a memory region with ECC
disabled to allocate from for such use cases.
After a suggestion from John, I chose to first use heap allocation
flags to allow userspace to ask for a particular ECC setup. This is
then backed by a new heap type that allocates from reserved memory
chunks flagged as such, and by the existing DT properties to specify
the ECC setup.
After further discussion, it was decided that flags were not the right
solution, and that relying on the heap names would be enough to let
userspace know what kind of buffer it is dealing with.
Thus, even though the uAPI part of it has been dropped in this second
version, we still need a driver to create heaps out of carved-out memory
regions. In addition to the original use case, a similar driver can be
found in BSPs from most vendors, so I believe it would be a useful
addition to the kernel.
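For reference, below is a minimal sketch (not part of this series) of
how userspace would then allocate from such a heap purely by name,
using the existing dma-heap uAPI. The heap name "video@42000000" is
only an assumed example:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/dma-heap.h>

/*
 * Allocate a dma-buf from a heap identified purely by its name. The
 * name passed by callers (e.g. "video@42000000") is hypothetical;
 * whatever name the carveout heap registers is all userspace needs.
 */
static int alloc_from_heap(const char *heap, size_t len)
{
	struct dma_heap_allocation_data data = {
		.len = len,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};
	char path[128];
	int heap_fd, ret;

	snprintf(path, sizeof(path), "/dev/dma_heap/%s", heap);
	heap_fd = open(path, O_RDONLY | O_CLOEXEC);
	if (heap_fd < 0)
		return -1;

	ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
	close(heap_fd);
	if (ret < 0)
		return -1;

	/* data.fd is now a dma-buf file descriptor backed by that heap. */
	return data.fd;
}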
Let me know what you think,
Maxime
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Changes in v5:
- Rebased on 6.16-rc2
- Switch from property to dedicated binding
- Link to v4: https://lore.kernel.org/r/20250520-dma-buf-ecc-heap-v4-1-bd2e1f1bb42c@kerne…
Changes in v4:
- Rebased on 6.15-rc7
- Map buffers only when map is actually called, not at allocation time
- Deal with restricted-dma-pool and shared-dma-pool
- Reword Kconfig options
- Properly report dma_map_sgtable failures
- Link to v3: https://lore.kernel.org/r/20250407-dma-buf-ecc-heap-v3-0-97cdd36a5f29@kerne…
Changes in v3:
- Reworked global variable patch
- Link to v2: https://lore.kernel.org/r/20250401-dma-buf-ecc-heap-v2-0-043fd006a1af@kerne…
Changes in v2:
- Add vmap/vunmap operations
- Drop ECC flags uapi
- Rebase on top of 6.14
- Link to v1: https://lore.kernel.org/r/20240515-dma-buf-ecc-heap-v1-0-54cbbd049511@kerne…
---
Maxime Ripard (2):
dt-bindings: reserved-memory: Introduce carved-out memory region binding
dma-buf: heaps: Introduce a new heap for reserved memory
.../bindings/reserved-memory/carved-out.yaml | 49 +++
drivers/dma-buf/heaps/Kconfig | 8 +
drivers/dma-buf/heaps/Makefile | 1 +
drivers/dma-buf/heaps/carveout_heap.c | 362 +++++++++++++++++++++
4 files changed, 420 insertions(+)
---
base-commit: d076bed8cb108ba2236d4d49c92303fda4036893
change-id: 20240515-dma-buf-ecc-heap-28a311d2c94e
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
We've discussed a number of times how some heap names are bad, but
never really what makes a good heap name.
Let's document what we expect the heap names to look like.
Reviewed-by: Bagas Sanjaya <bagasdotme(a)gmail.com>
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Changes in v2:
- Added justifications for each requirement / suggestions
- Added a mention and example of buffer attributes
- Link to v1: https://lore.kernel.org/r/20250520-dma-buf-heap-names-doc-v1-1-ab31f74809ee…
---
Documentation/userspace-api/dma-buf-heaps.rst | 38 +++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/Documentation/userspace-api/dma-buf-heaps.rst b/Documentation/userspace-api/dma-buf-heaps.rst
index 535f49047ce6450796bf4380c989e109355efc05..835ad1c3a65bc07b6f41d387d85c57162909e859 100644
--- a/Documentation/userspace-api/dma-buf-heaps.rst
+++ b/Documentation/userspace-api/dma-buf-heaps.rst
@@ -21,5 +21,43 @@ following heaps:
usually created either through the kernel commandline through the
`cma` parameter, a memory region Device-Tree node with the
`linux,cma-default` property set, or through the `CMA_SIZE_MBYTES` or
`CMA_SIZE_PERCENTAGE` Kconfig options. Depending on the platform, it
might be called ``reserved``, ``linux,cma``, or ``default-pool``.
+
+Naming Convention
+=================
+
+``dma-buf`` heap names should meet a number of constraints:
+
+- That name must be stable, and must not change from one version to the
+ next. Userspace identifies heaps by their name, so if a name ever
+ changes, we are likely to introduce regressions.
+
+- That name must describe the memory region the heap will allocate from,
+ and must uniquely identify it on a given platform. Since userspace
+ applications use the heap name as the discriminant, they must be able
+ to reliably tell which heap to use when there are multiple heaps.
+
+- That name must not mention implementation details, such as the
+ allocator. The heap driver can change over time, and implementation
+ details relevant when it was introduced might not be in the future.
+
+- The name should describe properties of the buffers that would be
+ allocated. Doing so will make heap identification easier for
+ userspace. Such properties are:
+
+ - ``cacheable`` / ``uncacheable`` for buffers with CPU caches enabled
+ or disabled;
+
+ - ``contiguous`` for physically contiguous buffers;
+
+ - ``protected`` for encrypted buffers not accessible by the OS;
+
+- The name may describe intended usage. Doing so will make heap
+ identification easier for userspace applications and users.
+
+For example, consider a platform with a reserved memory region located
+at RAM address 0x42000000, intended for physically contiguous video
+framebuffers and backed by the CMA kernel allocator. Good names would
+be ``memory@42000000-cacheable-contiguous`` or ``video@42000000``, but
+``cma-video`` wouldn't be.
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250520-dma-buf-heap-names-doc-31261aa0cfe6
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
We've discussed a number of times how some heap names are bad, but
never really what makes a good heap name.
Let's document what we expect the heap names to look like.
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Documentation/userspace-api/dma-buf-heaps.rst | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/Documentation/userspace-api/dma-buf-heaps.rst b/Documentation/userspace-api/dma-buf-heaps.rst
index 535f49047ce6450796bf4380c989e109355efc05..b24618e360a9a9ba0bd85135d8c1760776f1a37f 100644
--- a/Documentation/userspace-api/dma-buf-heaps.rst
+++ b/Documentation/userspace-api/dma-buf-heaps.rst
@@ -21,5 +21,24 @@ following heaps:
usually created either through the kernel commandline through the
`cma` parameter, a memory region Device-Tree node with the
`linux,cma-default` property set, or through the `CMA_SIZE_MBYTES` or
`CMA_SIZE_PERCENTAGE` Kconfig options. Depending on the platform, it
might be called ``reserved``, ``linux,cma``, or ``default-pool``.
+
+Naming Convention
+=================
+
+A good heap name is a name that:
+
+- Is stable, and won't change from one version to the next;
+
+- Describes the memory region the heap will allocate from, and will
+ uniquely identify it on a given platform;
+
+- Doesn't use implementation details, such as the allocator;
+
+- Can describe intended usage.
+
+For example, consider a platform with a reserved memory region located
+at RAM address 0x42000000, intended for video framebuffers and backed
+by the CMA kernel allocator. Good names would be `memory@42000000` or
+`video@42000000`, but `cma-video` wouldn't be.
---
base-commit: 92a09c47464d040866cf2b4cd052bc60555185fb
change-id: 20250520-dma-buf-heap-names-doc-31261aa0cfe6
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
On Fri, Jun 13, 2025 at 09:33:55AM +0000, wangtao wrote:
>
> >
> > On Mon, Jun 09, 2025 at 09:32:20AM +0000, wangtao wrote:
> > > Are you suggesting adding an ITER_DMABUF type to iov_iter,
> >
> > Yes.
>
> May I clarify: Do all disk operations require data to pass through
> memory (reading into memory or writing from memory)? In the block layer,
> the bio structure uses bio_iov_iter_get_pages to convert iter_type
> objects into memory-backed bio_vec representations.
> However, some dmabufs are not memory-based, making page-to-bio_vec
> conversion impossible. This suggests adding a callback function in
> dma_buf_ops to handle dmabuf-to-bio_vec conversion.
bios do support PCI P2P transfers. This could be fairly easily extended
to other peer-to-peer transfers if we manage to come up with a coherent
model for them. No need for a callback.
On Fri, Jun 13, 2025 at 09:43:08AM +0000, wangtao wrote:
> Here's my analysis on Linux 6.6 with F2FS/iomap.
Linux 6.6 is almost two years old and completely irrelevant. Please
provide numbers on 6.16 or current Linus' tree.
On 6/10/25 18:42, Tvrtko Ursulin wrote:
> Dma-fence objects currently suffer from a potential use after free problem
> where fences exported to userspace and other drivers can outlive the
> exporting driver, or the associated data structures.
>
> The discussion on how to address this concluded that adding reference
> counting to all the involved objects is not desirable, since it would need
> to be very wide reaching and could cause unloadable drivers if another
> entity would be holding onto a signaled fence reference potentially
> indefinitely.
>
> This patch enables the safe access by introducing and documenting a
> contract between fence exporters and users. It documents a set of
> contraints and adds helpers which a) drivers with potential to suffer from
> the use after free must use and b) users of the dma-fence API must use as
> well.
>
> Premise of the design has multiple sides:
>
> 1. Drivers (fence exporters) MUST ensure a RCU grace period between
> signalling a fence and freeing the driver private data associated with it.
>
> The grace period does not have to follow the signalling immediately but
> HAS to happen before data is freed.
>
> 2. Users of the dma-fence API marked with such requirement MUST contain
> the complete access to the data within a single code block guarded by
> rcu_read_lock() and rcu_read_unlock().
>
> The combination of the two ensures that whoever sees the
> DMA_FENCE_FLAG_SIGNALED_BIT not set is guaranteed to have access to a
> valid fence->lock and valid data potentially accessed by the fence->ops
> virtual functions, until the call to rcu_read_unlock().
>
> 3. Module unload (fence->ops) disappearing is for now explicitly not
> handled. That would required a more complex protection, possibly needing
> SRCU instead of RCU to handle callers such as dma_fence_release() and
> dma_fence_wait_timeout(), where race between
> dma_fence_enable_sw_signaling, signalling, and dereference of
> fence->ops->wait() would need a sleeping SRCU context.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
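For illustration, here is a minimal sketch (not part of the patch) of a
fence user following the documented contract, querying and consuming
both name helpers within a single RCU read-side critical section:

#include <linux/dma-fence.h>
#include <linux/printk.h>
#include <linux/rcupdate.h>

/*
 * Sketch only: both helpers are queried and consumed inside one
 * rcu_read_lock()/rcu_read_unlock() pair, so the returned strings stay
 * valid even if the exporter frees its private data right after
 * signalling the fence.
 */
static void report_fence(struct dma_fence *fence)
{
	const char __rcu *driver;
	const char __rcu *timeline;

	rcu_read_lock();

	driver = dma_fence_driver_name(fence);
	timeline = dma_fence_timeline_name(fence);

	pr_info("fence %s:%s:%llx:%llu\n",
		rcu_dereference(driver), rcu_dereference(timeline),
		fence->context, fence->seqno);

	rcu_read_unlock();
}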
> ---
> drivers/dma-buf/dma-fence.c | 111 ++++++++++++++++++++++++++++---
> include/linux/dma-fence.h | 31 ++++++---
> include/trace/events/dma_fence.h | 38 +++++++++--
> 3 files changed, 157 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 74f9e4b665e3..3f78c56b58dc 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -511,12 +511,20 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>
> dma_fence_enable_sw_signaling(fence);
>
> - trace_dma_fence_wait_start(fence);
> + if (trace_dma_fence_wait_start_enabled()) {
> + rcu_read_lock();
> + trace_dma_fence_wait_start(fence);
> + rcu_read_unlock();
> + }
> if (fence->ops->wait)
> ret = fence->ops->wait(fence, intr, timeout);
> else
> ret = dma_fence_default_wait(fence, intr, timeout);
> - trace_dma_fence_wait_end(fence);
> + if (trace_dma_fence_wait_end_enabled()) {
> + rcu_read_lock();
> + trace_dma_fence_wait_end(fence);
> + rcu_read_unlock();
> + }
> return ret;
> }
> EXPORT_SYMBOL(dma_fence_wait_timeout);
> @@ -533,16 +541,23 @@ void dma_fence_release(struct kref *kref)
> struct dma_fence *fence =
> container_of(kref, struct dma_fence, refcount);
>
> + rcu_read_lock();
> trace_dma_fence_destroy(fence);
>
> - if (WARN(!list_empty(&fence->cb_list) &&
> - !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags),
> - "Fence %s:%s:%llx:%llx released with pending signals!\n",
> - dma_fence_driver_name(fence),
> - dma_fence_timeline_name(fence),
> - fence->context, fence->seqno)) {
> + if (!list_empty(&fence->cb_list) &&
> + !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
> + const char __rcu *timeline;
> + const char __rcu *driver;
> unsigned long flags;
>
> + driver = dma_fence_driver_name(fence);
> + timeline = dma_fence_timeline_name(fence);
> +
> + WARN(1,
> + "Fence %s:%s:%llx:%llx released with pending signals!\n",
> + rcu_dereference(driver), rcu_dereference(timeline),
> + fence->context, fence->seqno);
> +
> /*
> * Failed to signal before release, likely a refcounting issue.
> *
> @@ -556,6 +571,8 @@ void dma_fence_release(struct kref *kref)
> spin_unlock_irqrestore(fence->lock, flags);
> }
>
> + rcu_read_unlock();
> +
> if (fence->ops->release)
> fence->ops->release(fence);
> else
> @@ -982,11 +999,21 @@ EXPORT_SYMBOL(dma_fence_set_deadline);
> */
> void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq)
> {
> + const char __rcu *timeline;
> + const char __rcu *driver;
> +
> + rcu_read_lock();
> +
> + timeline = dma_fence_timeline_name(fence);
> + driver = dma_fence_driver_name(fence);
> +
> seq_printf(seq, "%s %s seq %llu %ssignalled\n",
> - dma_fence_driver_name(fence),
> - dma_fence_timeline_name(fence),
> + rcu_dereference(driver),
> + rcu_dereference(timeline),
> fence->seqno,
> dma_fence_is_signaled(fence) ? "" : "un");
> +
> + rcu_read_unlock();
> }
> EXPORT_SYMBOL(dma_fence_describe);
>
> @@ -1055,3 +1082,67 @@ dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops,
> BIT(DMA_FENCE_FLAG_SEQNO64_BIT));
> }
> EXPORT_SYMBOL(dma_fence_init64);
> +
> +/**
> + * dma_fence_driver_name - Access the driver name
> + * @fence: the fence to query
> + *
> + * Returns a driver name backing the dma-fence implementation.
> + *
> + * IMPORTANT CONSIDERATION:
> + * Dma-fence contract stipulates that access to driver provided data (data not
> + * directly embedded into the object itself), such as the &dma_fence.lock and
> + * memory potentially accessed by the &dma_fence.ops functions, is forbidden
> + * after the fence has been signalled. Drivers are allowed to free that data,
> + * and some do.
> + *
> + * To allow safe access drivers are mandated to guarantee a RCU grace period
> + * between signalling the fence and freeing said data.
> + *
> + * As such access to the driver name is only valid inside a RCU locked section.
> + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded
> + * by the &rcu_read_lock and &rcu_read_unlock pair.
> + */
> +const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
> +{
> + RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> + "RCU protection is required for safe access to returned string");
> +
> + if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> + return fence->ops->get_driver_name(fence);
> + else
> + return "detached-driver";
> +}
> +EXPORT_SYMBOL(dma_fence_driver_name);
> +
> +/**
> + * dma_fence_timeline_name - Access the timeline name
> + * @fence: the fence to query
> + *
> + * Returns a timeline name provided by the dma-fence implementation.
> + *
> + * IMPORTANT CONSIDERATION:
> + * Dma-fence contract stipulates that access to driver provided data (data not
> + * directly embedded into the object itself), such as the &dma_fence.lock and
> + * memory potentially accessed by the &dma_fence.ops functions, is forbidden
> + * after the fence has been signalled. Drivers are allowed to free that data,
> + * and some do.
> + *
> + * To allow safe access drivers are mandated to guarantee a RCU grace period
> + * between signalling the fence and freeing said data.
> + *
> + * As such access to the timeline name is only valid inside a RCU locked section.
> + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded
> + * by the &rcu_read_lock and &rcu_read_unlock pair.
> + */
> +const char __rcu *dma_fence_timeline_name(struct dma_fence *fence)
> +{
> + RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> + "RCU protection is required for safe access to returned string");
> +
> + if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> + return fence->ops->get_timeline_name(fence);
> + else
> + return "signaled-timeline";
> +}
> +EXPORT_SYMBOL(dma_fence_timeline_name);
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 10a849cb2d3f..64639e104110 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -378,15 +378,28 @@ bool dma_fence_remove_callback(struct dma_fence *fence,
> struct dma_fence_cb *cb);
> void dma_fence_enable_sw_signaling(struct dma_fence *fence);
>
> -static inline const char *dma_fence_driver_name(struct dma_fence *fence)
> -{
> - return fence->ops->get_driver_name(fence);
> -}
> -
> -static inline const char *dma_fence_timeline_name(struct dma_fence *fence)
> -{
> - return fence->ops->get_timeline_name(fence);
> -}
> +/**
> + * DOC: Safe external access to driver provided object members
> + *
> + * All data not stored directly in the dma-fence object, such as the
> + * &dma_fence.lock and memory potentially accessed by functions in the
> + * &dma_fence.ops table, MUST NOT be accessed after the fence has been signalled
> + * because after that point drivers are allowed to free it.
> + *
> + * All code accessing that data via the dma-fence API (or directly, which is
> + * discouraged), MUST make sure to contain the complete access within a
> + * &rcu_read_lock and &rcu_read_unlock pair.
> + *
> + * Some dma-fence APIs handle this automatically, while others, such as
> + * &dma_fence_driver_name and &dma_fence_timeline_name, leave that
> + * responsibility to the caller.
> + *
> + * To enable this scheme to work drivers MUST ensure a RCU grace period elapses
> + * between signalling the fence and freeing said data.
> + *
> + */
> +const char __rcu *dma_fence_driver_name(struct dma_fence *fence);
> +const char __rcu *dma_fence_timeline_name(struct dma_fence *fence);
>
> /**
> * dma_fence_is_signaled_locked - Return an indication if the fence
> diff --git a/include/trace/events/dma_fence.h b/include/trace/events/dma_fence.h
> index 84c83074ee81..4814a65b68dc 100644
> --- a/include/trace/events/dma_fence.h
> +++ b/include/trace/events/dma_fence.h
> @@ -34,14 +34,44 @@ DECLARE_EVENT_CLASS(dma_fence,
> __entry->seqno)
> );
>
> -DEFINE_EVENT(dma_fence, dma_fence_emit,
> +/*
> + * Safe only for call sites which are guaranteed to not race with fence
> + * signaling, holding the fence->lock and having checked for not signaled, or the
> + * signaling path itself.
> + */
> +DECLARE_EVENT_CLASS(dma_fence_unsignaled,
> +
> + TP_PROTO(struct dma_fence *fence),
> +
> + TP_ARGS(fence),
> +
> + TP_STRUCT__entry(
> + __string(driver, fence->ops->get_driver_name(fence))
> + __string(timeline, fence->ops->get_timeline_name(fence))
> + __field(unsigned int, context)
> + __field(unsigned int, seqno)
> + ),
> +
> + TP_fast_assign(
> + __assign_str(driver);
> + __assign_str(timeline);
> + __entry->context = fence->context;
> + __entry->seqno = fence->seqno;
> + ),
> +
> + TP_printk("driver=%s timeline=%s context=%u seqno=%u",
> + __get_str(driver), __get_str(timeline), __entry->context,
> + __entry->seqno)
> +);
> +
> +DEFINE_EVENT(dma_fence_unsignaled, dma_fence_emit,
>
> TP_PROTO(struct dma_fence *fence),
>
> TP_ARGS(fence)
> );
>
> -DEFINE_EVENT(dma_fence, dma_fence_init,
> +DEFINE_EVENT(dma_fence_unsignaled, dma_fence_init,
>
> TP_PROTO(struct dma_fence *fence),
>
> @@ -55,14 +85,14 @@ DEFINE_EVENT(dma_fence, dma_fence_destroy,
> TP_ARGS(fence)
> );
>
> -DEFINE_EVENT(dma_fence, dma_fence_enable_signal,
> +DEFINE_EVENT(dma_fence_unsignaled, dma_fence_enable_signal,
>
> TP_PROTO(struct dma_fence *fence),
>
> TP_ARGS(fence)
> );
>
> -DEFINE_EVENT(dma_fence, dma_fence_signaled,
> +DEFINE_EVENT(dma_fence_unsignaled, dma_fence_signaled,
>
> TP_PROTO(struct dma_fence *fence),
>