On Wed, May 7, 2025 at 7:14 AM Alexei Starovoitov
<alexei.starovoitov(a)gmail.com> wrote:
>
> On Tue, May 6, 2025 at 5:10 PM T.J. Mercier <tjmercier(a)google.com> wrote:
> >
> > +/**
> > + * get_first_dmabuf - begin iteration through global list of DMA-bufs
> > + *
> > + * Returns the first buffer in the global list of DMA-bufs that's not in the
> > + * process of being destroyed. Increments that buffer's reference count to
> > + * prevent buffer destruction. Callers must release the reference, either by
> > + * continuing iteration with get_next_dmabuf(), or with dma_buf_put().
> > + *
> > + * Returns NULL If no active buffers are present.
> > + */
>
> kdoc wants to see 'Return:'.
>
> See errors in BPF CI.
>
> And patch 5 shouldn't be using /** for plain comments.
Thanks, I found the CI errors, fixed, and verified with
scripts/kernel-doc. I didn't receive emails about them though, not
sure if I should have.
> pw-bot: cr
Hi,
This patch set allocates the protected DMA-bufs from a DMA-heap
instantiated from the TEE subsystem.
The TEE subsystem handles the DMA-buf allocations since it is the TEE
(OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QTEE) which sets up the
protection for the memory used for the DMA-bufs.
The DMA-heap uses a protected memory pool provided by the backend TEE
driver, allowing it to choose how to allocate the protected physical
memory.
The allocated DMA-bufs must be imported with a new TEE_IOC_SHM_REGISTER_FD
before they can be passed as arguments when requesting services from the
secure world.
Three use-cases (Secure Video Playback, Trusted UI, and Secure Video
Recording) has been identified so far to serve as examples of what can be
expected. The use-cases has predefined DMA-heap names,
"protected,secure-video", "protected,trusted-ui", and
"protected,secure-video-record". The backend driver registers protected
memory pools for the use-cases it supports.
Each use-case has it's own protected memory pool since different use-cases
requires isolation from different parts of the system. A protected memory
pool can be based on a static carveout instantiated while probing the TEE
backend driver, or dynamically allocated from CMA and made protected as
needed by the TEE.
This can be tested on a RockPi 4B+ with the following steps:
repo init -u https://github.com/jenswi-linaro/manifest.git -m rockpi4.xml \
-b prototype/sdp-v8
repo sync -j8
cd build
make toolchains -j$(nproc)
make all -j$(nproc)
# Copy ../out/rockpi4.img to an SD card and boot the RockPi from that
# Connect a monitor to the RockPi
# login and at the prompt:
gst-launch-1.0 videotestsrc ! \
aesenc key=1f9423681beb9a79215820f6bda73d0f \
iv=e9aa8e834d8d70b7e0d254ff670dd718 serialize-iv=true ! \
aesdec key=1f9423681beb9a79215820f6bda73d0f ! \
kmssink
The aesdec module has been hacked to use an OP-TEE TA to decrypt the stream
into protected DMA-bufs which are consumed by the kmssink.
The primitive QEMU tests from previous patch set can be tested on RockPi
in the same way with:
xtest --sdp-basic
The primitive test are tested on QEMU with the following steps:
repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
-b prototype/sdp-v8
repo sync -j8
cd build
make toolchains -j$(nproc)
make SPMC_AT_EL=1 all -j$(nproc)
make SPMC_AT_EL=1 run-only
# login and at the prompt:
xtest --sdp-basic
The SPMC_AT_EL=1 parameter configures the build with FF-A and an SPMC at
S-EL1 inside OP-TEE. The parameter can be changed into SPMC_AT_EL=n to test
without FF-A using the original SMC ABI instead. Please remember to do
%rm -rf ../trusted-firmware-a/build/qemu
for TF-A to be rebuilt properly using the new configuration.
https://optee.readthedocs.io/en/latest/building/prerequisites.html
list dependencies needed to build the above.
The tests are pretty basic, mostly checking that a Trusted Application in
the secure world can access and manipulate the memory. There are also some
negative tests for out of bounds buffers etc.
Thanks,
Jens
Changes since V7:
* Adding "dma-buf: dma-heap: export declared functions",
"cma: export cma_alloc() and cma_release()", and
"dma-contiguous: export dma_contiguous_default_area" to export the symbols
needed to keep the TEE subsystem as a load module.
* Removing CONFIG_TEE_DMABUF_HEAP and CONFIG_TEE_CMA since they aren't
needed any longer.
* Addressing review comments in "optee: sync secure world ABI headers"
* Better align protected memory pool initialization between the smc-abi and
ffa-abi parts of the optee driver.
Changes since V6:
* Restricted memory is now known as protected memory since to use the same
term as https://docs.vulkan.org/guide/latest/protected.html. Update all
patches to consistently use protected memory.
* In "tee: implement protected DMA-heap" add the hidden config option
TEE_DMABUF_HEAP to tell if the DMABUF_HEAPS functions are available
for the TEE subsystem
* Adding "tee: refactor params_from_user()", broken out from the patch
"tee: new ioctl to a register tee_shm from a dmabuf file descriptor"
* For "tee: new ioctl to a register tee_shm from a dmabuf file descriptor":
- Update commit message to mention protected memory
- Remove and open code tee_shm_get_parent_shm() in param_from_user_memref()
* In "tee: add tee_shm_alloc_cma_phys_mem" add the hidden config option
TEE_CMA to tell if the CMA functions are available for the TEE subsystem
* For "tee: tee_device_alloc(): copy dma_mask from parent device" and
"optee: pass parent device to tee_device_alloc", added
Reviewed-by: Sumit Garg <sumit.garg(a)kernel.org>
Changes since V5:
* Removing "tee: add restricted memory allocation" and
"tee: add TEE_IOC_RSTMEM_FD_INFO"
* Adding "tee: implement restricted DMA-heap",
"tee: new ioctl to a register tee_shm from a dmabuf file descriptor",
"tee: add tee_shm_alloc_cma_phys_mem()",
"optee: pass parent device to tee_device_alloc()", and
"tee: tee_device_alloc(): copy dma_mask from parent device"
* The two TEE driver OPs "rstmem_alloc()" and "rstmem_free()" are replaced
with a struct tee_rstmem_pool abstraction.
* Replaced the the TEE_IOC_RSTMEM_ALLOC user space API with the DMA-heap API
Changes since V4:
* Adding the patch "tee: add TEE_IOC_RSTMEM_FD_INFO" needed by the
GStreamer demo
* Removing the dummy CPU access and mmap functions from the dma_buf_ops
* Fixing a compile error in "optee: FF-A: dynamic restricted memory allocation"
reported by kernel test robot <lkp(a)intel.com>
Changes since V3:
* Make the use_case and flags field in struct tee_shm u32's instead of
u16's
* Add more description for TEE_IOC_RSTMEM_ALLOC in the header file
* Import namespace DMA_BUF in module tee, reported by lkp(a)intel.com
* Added a note in the commit message for "optee: account for direction
while converting parameters" why it's needed
* Factor out dynamic restricted memory allocation from
"optee: support restricted memory allocation" into two new commits
"optee: FF-A: dynamic restricted memory allocation" and
"optee: smc abi: dynamic restricted memory allocation"
* Guard CMA usage with #ifdef CONFIG_CMA, effectively disabling dynamic
restricted memory allocate if CMA isn't configured
Changes since the V2 RFC:
* Based on v6.12
* Replaced the flags for SVP and Trusted UID memory with a u32 field with
unique id for each use case
* Added dynamic allocation of restricted memory pools
* Added OP-TEE ABI both with and without FF-A for dynamic restricted memory
* Added support for FF-A with FFA_LEND
Changes since the V1 RFC:
* Based on v6.11
* Complete rewrite, replacing the restricted heap with TEE_IOC_RSTMEM_ALLOC
Changes since Olivier's post [2]:
* Based on Yong Wu's post [1] where much of dma-buf handling is done in
the generic restricted heap
* Simplifications and cleanup
* New commit message for "dma-buf: heaps: add Linaro restricted dmabuf heap
support"
* Replaced the word "secure" with "restricted" where applicable
Etienne Carriere (1):
tee: new ioctl to a register tee_shm from a dmabuf file descriptor
Jens Wiklander (13):
tee: tee_device_alloc(): copy dma_mask from parent device
optee: pass parent device to tee_device_alloc()
optee: account for direction while converting parameters
optee: sync secure world ABI headers
dma-buf: dma-heap: export declared functions
tee: implement protected DMA-heap
tee: refactor params_from_user()
cma: export cma_alloc() and cma_release()
dma-contiguous: export dma_contiguous_default_area
tee: add tee_shm_alloc_cma_phys_mem()
optee: support protected memory allocation
optee: FF-A: dynamic protected memory allocation
optee: smc abi: dynamic protected memory allocation
drivers/dma-buf/dma-heap.c | 3 +
drivers/tee/Makefile | 1 +
drivers/tee/optee/Makefile | 1 +
drivers/tee/optee/call.c | 10 +-
drivers/tee/optee/core.c | 1 +
drivers/tee/optee/ffa_abi.c | 198 ++++++++++++-
drivers/tee/optee/optee_ffa.h | 27 +-
drivers/tee/optee/optee_msg.h | 83 +++++-
drivers/tee/optee/optee_private.h | 55 +++-
drivers/tee/optee/optee_smc.h | 69 ++++-
drivers/tee/optee/protmem.c | 330 +++++++++++++++++++++
drivers/tee/optee/rpc.c | 31 +-
drivers/tee/optee/smc_abi.c | 191 ++++++++++--
drivers/tee/tee_core.c | 157 +++++++---
drivers/tee/tee_heap.c | 470 ++++++++++++++++++++++++++++++
drivers/tee/tee_private.h | 16 +
drivers/tee/tee_shm.c | 164 ++++++++++-
include/linux/tee_core.h | 70 +++++
include/linux/tee_drv.h | 10 +
include/uapi/linux/tee.h | 31 ++
kernel/dma/contiguous.c | 1 +
mm/cma.c | 2 +
22 files changed, 1789 insertions(+), 132 deletions(-)
create mode 100644 drivers/tee/optee/protmem.c
create mode 100644 drivers/tee/tee_heap.c
base-commit: b4432656b36e5cc1d50a1f2dc15357543add530e
--
2.43.0
Dear All,
This patchset fixes the incorrect use of dma_sync_sg_*() calls in
media and related drivers. They are replaced with much safer
dma_sync_sgtable_*() variants, which take care of passing the proper
number of elements for the sync operation.
Best regards
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Change log:
-v2: fixes typos and added cc: stable
Patch summary:
Marek Szyprowski (3):
media: videobuf2: use sgtable-based scatterlist wrappers
udmabuf: use sgtable-based scatterlist wrappers
media: omap3isp: use sgtable-based scatterlist wrappers
drivers/dma-buf/udmabuf.c | 5 ++---
drivers/media/common/videobuf2/videobuf2-dma-sg.c | 4 ++--
drivers/media/platform/ti/omap3isp/ispccdc.c | 8 ++++----
drivers/media/platform/ti/omap3isp/ispstat.c | 6 ++----
4 files changed, 10 insertions(+), 13 deletions(-)
--
2.34.1
Dear All,
This patchset fixes the incorrect use of dma_sync_sg_*() calls in
media and related drivers. They are replaced with much safer
dma_sync_sgtable_*() variants, which take care of passing the proper
number of elements for the sync operation.
Best regards
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Patch summary:
Marek Szyprowski (3):
media: videobuf2: use sgtable-based scatterlist wrappers
udmabuf: use sgtable-based scatterlist wrappers
omap3isp:: use sgtable-based scatterlist wrappers
drivers/dma-buf/udmabuf.c | 5 ++---
drivers/media/common/videobuf2/videobuf2-dma-sg.c | 4 ++--
drivers/media/platform/ti/omap3isp/ispccdc.c | 8 ++++----
drivers/media/platform/ti/omap3isp/ispstat.c | 6 ++----
4 files changed, 10 insertions(+), 13 deletions(-)
--
2.34.1
Hi Jared,
Thanks for working on this
On Tue, Apr 22, 2025 at 12:19:39PM -0700, Jared Kangas wrote:
> The CMA heap's name in devtmpfs can vary depending on how the heap is
> defined. Its name defaults to "reserved", but if a CMA area is defined
> in the devicetree, the heap takes on the devicetree node's name, such as
> "default-pool" or "linux,cma". To simplify naming, just name it
> "default_cma", and keep a legacy node in place backed by the same
> underlying structure for backwards compatibility.
>
> Signed-off-by: Jared Kangas <jkangas(a)redhat.com>
> ---
> Documentation/userspace-api/dma-buf-heaps.rst | 11 +++++++----
> drivers/dma-buf/heaps/Kconfig | 10 ++++++++++
> drivers/dma-buf/heaps/cma_heap.c | 14 +++++++++++++-
> 3 files changed, 30 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/userspace-api/dma-buf-heaps.rst b/Documentation/userspace-api/dma-buf-heaps.rst
> index 535f49047ce64..577de813ba461 100644
> --- a/Documentation/userspace-api/dma-buf-heaps.rst
> +++ b/Documentation/userspace-api/dma-buf-heaps.rst
> @@ -19,7 +19,10 @@ following heaps:
> - The ``cma`` heap allocates physically contiguous, cacheable,
> buffers. Only present if a CMA region is present. Such a region is
> usually created either through the kernel commandline through the
> - `cma` parameter, a memory region Device-Tree node with the
> - `linux,cma-default` property set, or through the `CMA_SIZE_MBYTES` or
> - `CMA_SIZE_PERCENTAGE` Kconfig options. Depending on the platform, it
> - might be called ``reserved``, ``linux,cma``, or ``default-pool``.
> + ``cma`` parameter, a memory region Device-Tree node with the
> + ``linux,cma-default`` property set, or through the ``CMA_SIZE_MBYTES`` or
> + ``CMA_SIZE_PERCENTAGE`` Kconfig options. The heap's name in devtmpfs is
> + ``default_cma``. For backwards compatibility, when the
> + ``DMABUF_HEAPS_CMA_LEGACY`` Kconfig option is set, a duplicate node is
> + created following legacy naming conventions; the legacy name might be
> + ``reserved``, ``linux,cma``, or ``default-pool``.
It looks like, in addition to documenting the new naming, you also
changed all the backticks to double backticks. Why did you do so? It
seems mostly unrelated to that patch, so it would be better in a
separate patch.
> diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig
> index a5eef06c42264..83f3770fa820a 100644
> --- a/drivers/dma-buf/heaps/Kconfig
> +++ b/drivers/dma-buf/heaps/Kconfig
> @@ -12,3 +12,13 @@ config DMABUF_HEAPS_CMA
> Choose this option to enable dma-buf CMA heap. This heap is backed
> by the Contiguous Memory Allocator (CMA). If your system has these
> regions, you should say Y here.
> +
> +config DMABUF_HEAPS_CMA_LEGACY
> + bool "DMA-BUF CMA Heap"
> + default y
> + depends on DMABUF_HEAPS_CMA
> + help
> + Add a duplicate CMA-backed dma-buf heap with legacy naming derived
> + from the CMA area's devicetree node, or "reserved" if the area is not
> + defined in the devicetree. This uses the same underlying allocator as
> + CONFIG_DMABUF_HEAPS_CMA.
> diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
> index e998d8ccd1dc6..cd742c961190d 100644
> --- a/drivers/dma-buf/heaps/cma_heap.c
> +++ b/drivers/dma-buf/heaps/cma_heap.c
> @@ -22,6 +22,7 @@
> #include <linux/slab.h>
> #include <linux/vmalloc.h>
>
> +#define DEFAULT_CMA_NAME "default_cma"
I appreciate this is kind of bikeshed-color territory, but I think "cma"
would be a better option here. There's nothing "default" about it.
> struct cma_heap {
> struct dma_heap *heap;
> @@ -394,15 +395,26 @@ static int __init __add_cma_heap(struct cma *cma, const char *name)
> static int __init add_default_cma_heap(void)
> {
> struct cma *default_cma = dev_get_cma_area(NULL);
> + const char *legacy_cma_name;
> int ret;
>
> if (!default_cma)
> return 0;
>
> - ret = __add_cma_heap(default_cma, cma_get_name(default_cma));
> + ret = __add_cma_heap(default_cma, DEFAULT_CMA_NAME);
> if (ret)
> return ret;
>
> + legacy_cma_name = cma_get_name(default_cma);
> +
> + if (IS_ENABLED(CONFIG_DMABUF_HEAPS_CMA_LEGACY) &&
> + strcmp(legacy_cma_name, DEFAULT_CMA_NAME)) {
> + ret = __add_cma_heap(default_cma, legacy_cma_name);
> + if (ret)
> + pr_warn("cma_heap: failed to add legacy heap: %pe\n",
> + ERR_PTR(-ret));
> + }
> +
It would also simplify this part, since you would always create the legacy heap.
Maxime
Hi
Am 05.05.25 um 09:29 schrieb Boris Brezillon:
> On Wed, 30 Apr 2025 11:07:17 +0200
> Simona Vetter <simona.vetter(a)ffwll.ch> wrote:
>
>> On Wed, Apr 16, 2025 at 11:38:03AM +0200, Simona Vetter wrote:
>>> On Wed, Apr 16, 2025 at 08:57:45AM +0200, Thomas Zimmermann wrote:
>>>> Test struct drm_gem_object.import_attach to detect imported objects.
>>>>
>>>> During object clenanup, the dma_buf field might be NULL. Testing it in
>>>> an object's free callback then incorrectly does a cleanup as for native
>>>> objects. Happens for calls to drm_mode_destroy_dumb_ioctl() that
>>>> clears the dma_buf field in drm_gem_object_exported_dma_buf_free().
>>>>
>>>> v3:
>>>> - only test for import_attach (Boris)
>>>> v2:
>>>> - use import_attach.dmabuf instead of dma_buf (Christian)
>>>>
>>>> Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
>>>> Fixes: b57aa47d39e9 ("drm/gem: Test for imported GEM buffers with helper")
>>>> Reported-by: Andy Yan <andyshrk(a)163.com>
>>>> Closes: https://lore.kernel.org/dri-devel/38d09d34.4354.196379aa560.Coremail.andysh…
>>>> Tested-by: Andy Yan <andyshrk(a)163.com>
>>>> Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
>>>> Cc: Anusha Srivatsa <asrivats(a)redhat.com>
>>>> Cc: Christian König <christian.koenig(a)amd.com>
>>>> Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
>>>> Cc: Maxime Ripard <mripard(a)kernel.org>
>>>> Cc: David Airlie <airlied(a)gmail.com>
>>>> Cc: Simona Vetter <simona(a)ffwll.ch>
>>>> Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
>>>> Cc: "Christian König" <christian.koenig(a)amd.com>
>>>> Cc: dri-devel(a)lists.freedesktop.org
>>>> Cc: linux-media(a)vger.kernel.org
>>>> Cc: linaro-mm-sig(a)lists.linaro.org
>>> Reviewed-by: Simona Vetter <simona.vetter(a)ffwll.ch>
>> Also quick doc request: We do have a bit of overview documentation for
>> prime here about specifically this lifetime fun, and why there's a chain
>> of references and hence a distinction between imported foreign dma-buf and
>> re-imported native dma-buf:
>>
>> https://dri.freedesktop.org/docs/drm/gpu/drm-mm.html#reference-counting-for…
>>
>> I think it would be good to augment this with more links to functions
>> (like this one recently added and fixed in this patch here) and struct
>> members to that overview. And maybe also link from key function and struct
>> functions back to that overview doc. Otherwise I think the next person
>> will get confused by this rather tricky code again and break a corner
>> cases.
> BTW, could we also backmerge 6.15-rc5 into drm-misc-next so the fix is
> also present in drm-misc-next?
drm-misc-next is now at -rc5
Best regards
Thomas
--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
From: Thomas Zimmermann <tzimmermann(a)suse.de>
[ Upstream commit b57aa47d39e94dc47403a745e2024664e544078c ]
Add drm_gem_is_imported() that tests if a GEM object's buffer has
been imported. Update the GEM code accordingly.
GEM code usually tests for imports if import_attach has been set
in struct drm_gem_object. But attaching a dma-buf on import requires
a DMA-capable importer device, which is not the case for many serial
busses like USB or I2C. The new helper tests if a GEM object's dma-buf
has been created from the GEM object.
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Reviewed-by: Anusha Srivatsa <asrivats(a)redhat.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250226172457.217725-2-tzimm…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/drm_gem.c | 4 ++--
include/drm/drm_gem.h | 14 ++++++++++++++
2 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 44a948b80ee14..deb93f78ce344 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -322,7 +322,7 @@ int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
return -ENOENT;
/* Don't allow imported objects to be mapped */
- if (obj->import_attach) {
+ if (drm_gem_is_imported(obj)) {
ret = -EINVAL;
goto out;
}
@@ -1155,7 +1155,7 @@ void drm_gem_print_info(struct drm_printer *p, unsigned int indent,
drm_vma_node_start(&obj->vma_node));
drm_printf_indent(p, indent, "size=%zu\n", obj->size);
drm_printf_indent(p, indent, "imported=%s\n",
- str_yes_no(obj->import_attach));
+ str_yes_no(drm_gem_is_imported(obj)));
if (obj->funcs->print_info)
obj->funcs->print_info(p, indent, obj);
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 7c2ec139c464a..fbfccb96dd17b 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -35,6 +35,7 @@
*/
#include <linux/kref.h>
+#include <linux/dma-buf.h>
#include <linux/dma-resv.h>
#include <linux/list.h>
#include <linux/mutex.h>
@@ -557,6 +558,19 @@ static inline bool drm_gem_object_is_shared_for_memory_stats(struct drm_gem_obje
return (obj->handle_count > 1) || obj->dma_buf;
}
+/**
+ * drm_gem_is_imported() - Tests if GEM object's buffer has been imported
+ * @obj: the GEM object
+ *
+ * Returns:
+ * True if the GEM object's buffer has been imported, false otherwise
+ */
+static inline bool drm_gem_is_imported(const struct drm_gem_object *obj)
+{
+ /* The dma-buf's priv field points to the original GEM object. */
+ return obj->dma_buf && (obj->dma_buf->priv != obj);
+}
+
#ifdef CONFIG_LOCKDEP
/**
* drm_gem_gpuva_set_lock() - Set the lock protecting accesses to the gpuva list.
--
2.39.5
From: Thomas Zimmermann <tzimmermann(a)suse.de>
[ Upstream commit b57aa47d39e94dc47403a745e2024664e544078c ]
Add drm_gem_is_imported() that tests if a GEM object's buffer has
been imported. Update the GEM code accordingly.
GEM code usually tests for imports if import_attach has been set
in struct drm_gem_object. But attaching a dma-buf on import requires
a DMA-capable importer device, which is not the case for many serial
busses like USB or I2C. The new helper tests if a GEM object's dma-buf
has been created from the GEM object.
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Reviewed-by: Anusha Srivatsa <asrivats(a)redhat.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250226172457.217725-2-tzimm…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/drm_gem.c | 4 ++--
include/drm/drm_gem.h | 14 ++++++++++++++
2 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 149b8e25da5bb..426d0867882df 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -322,7 +322,7 @@ int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
return -ENOENT;
/* Don't allow imported objects to be mapped */
- if (obj->import_attach) {
+ if (drm_gem_is_imported(obj)) {
ret = -EINVAL;
goto out;
}
@@ -1152,7 +1152,7 @@ void drm_gem_print_info(struct drm_printer *p, unsigned int indent,
drm_vma_node_start(&obj->vma_node));
drm_printf_indent(p, indent, "size=%zu\n", obj->size);
drm_printf_indent(p, indent, "imported=%s\n",
- str_yes_no(obj->import_attach));
+ str_yes_no(drm_gem_is_imported(obj)));
if (obj->funcs->print_info)
obj->funcs->print_info(p, indent, obj);
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index d8b86df2ec0da..70c0f8c83629d 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -35,6 +35,7 @@
*/
#include <linux/kref.h>
+#include <linux/dma-buf.h>
#include <linux/dma-resv.h>
#include <linux/list.h>
#include <linux/mutex.h>
@@ -570,6 +571,19 @@ static inline bool drm_gem_object_is_shared_for_memory_stats(struct drm_gem_obje
return (obj->handle_count > 1) || obj->dma_buf;
}
+/**
+ * drm_gem_is_imported() - Tests if GEM object's buffer has been imported
+ * @obj: the GEM object
+ *
+ * Returns:
+ * True if the GEM object's buffer has been imported, false otherwise
+ */
+static inline bool drm_gem_is_imported(const struct drm_gem_object *obj)
+{
+ /* The dma-buf's priv field points to the original GEM object. */
+ return obj->dma_buf && (obj->dma_buf->priv != obj);
+}
+
#ifdef CONFIG_LOCKDEP
/**
* drm_gem_gpuva_set_lock() - Set the lock protecting accesses to the gpuva list.
--
2.39.5
From: Thomas Zimmermann <tzimmermann(a)suse.de>
[ Upstream commit 8260731ccad0451207b45844bb66eb161a209218 ]
Test struct drm_gem_object.import_attach to detect imported objects.
During object clenanup, the dma_buf field might be NULL. Testing it in
an object's free callback then incorrectly does a cleanup as for native
objects. Happens for calls to drm_mode_destroy_dumb_ioctl() that
clears the dma_buf field in drm_gem_object_exported_dma_buf_free().
v3:
- only test for import_attach (Boris)
v2:
- use import_attach.dmabuf instead of dma_buf (Christian)
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Fixes: b57aa47d39e9 ("drm/gem: Test for imported GEM buffers with helper")
Reported-by: Andy Yan <andyshrk(a)163.com>
Closes: https://lore.kernel.org/dri-devel/38d09d34.4354.196379aa560.Coremail.andysh…
Tested-by: Andy Yan <andyshrk(a)163.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Anusha Srivatsa <asrivats(a)redhat.com>
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Simona Vetter <simona(a)ffwll.ch>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
Reviewed-by: Boris Brezillon <boris.brezillon(a)collabora.com>
Reviewed-by: Simona Vetter <simona.vetter(a)ffwll.ch>
Link: https://lore.kernel.org/r/20250416065820.26076-1-tzimmermann@suse.de
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
include/drm/drm_gem.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 2bf893eabb4b2..bcd54020d6ba5 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -585,8 +585,7 @@ static inline bool drm_gem_object_is_shared_for_memory_stats(struct drm_gem_obje
*/
static inline bool drm_gem_is_imported(const struct drm_gem_object *obj)
{
- /* The dma-buf's priv field points to the original GEM object. */
- return obj->dma_buf && (obj->dma_buf->priv != obj);
+ return !!obj->import_attach;
}
#ifdef CONFIG_LOCKDEP
--
2.39.5
From: Thomas Zimmermann <tzimmermann(a)suse.de>
[ Upstream commit b57aa47d39e94dc47403a745e2024664e544078c ]
Add drm_gem_is_imported() that tests if a GEM object's buffer has
been imported. Update the GEM code accordingly.
GEM code usually tests for imports if import_attach has been set
in struct drm_gem_object. But attaching a dma-buf on import requires
a DMA-capable importer device, which is not the case for many serial
busses like USB or I2C. The new helper tests if a GEM object's dma-buf
has been created from the GEM object.
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Reviewed-by: Anusha Srivatsa <asrivats(a)redhat.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250226172457.217725-2-tzimm…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/drm_gem.c | 4 ++--
include/drm/drm_gem.h | 14 ++++++++++++++
2 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index ee811764c3df4..c6240bab3fa55 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -348,7 +348,7 @@ int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
return -ENOENT;
/* Don't allow imported objects to be mapped */
- if (obj->import_attach) {
+ if (drm_gem_is_imported(obj)) {
ret = -EINVAL;
goto out;
}
@@ -1178,7 +1178,7 @@ void drm_gem_print_info(struct drm_printer *p, unsigned int indent,
drm_vma_node_start(&obj->vma_node));
drm_printf_indent(p, indent, "size=%zu\n", obj->size);
drm_printf_indent(p, indent, "imported=%s\n",
- str_yes_no(obj->import_attach));
+ str_yes_no(drm_gem_is_imported(obj)));
if (obj->funcs->print_info)
obj->funcs->print_info(p, indent, obj);
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index fdae947682cd0..2bf893eabb4b2 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -35,6 +35,7 @@
*/
#include <linux/kref.h>
+#include <linux/dma-buf.h>
#include <linux/dma-resv.h>
#include <linux/list.h>
#include <linux/mutex.h>
@@ -575,6 +576,19 @@ static inline bool drm_gem_object_is_shared_for_memory_stats(struct drm_gem_obje
return (obj->handle_count > 1) || obj->dma_buf;
}
+/**
+ * drm_gem_is_imported() - Tests if GEM object's buffer has been imported
+ * @obj: the GEM object
+ *
+ * Returns:
+ * True if the GEM object's buffer has been imported, false otherwise
+ */
+static inline bool drm_gem_is_imported(const struct drm_gem_object *obj)
+{
+ /* The dma-buf's priv field points to the original GEM object. */
+ return obj->dma_buf && (obj->dma_buf->priv != obj);
+}
+
#ifdef CONFIG_LOCKDEP
/**
* drm_gem_gpuva_set_lock() - Set the lock protecting accesses to the gpuva list.
--
2.39.5
From: Christian König <christian.koenig(a)amd.com>
[ Upstream commit bd22e44ad415ac22e3a4f9a983d2a085f6cb4427 ]
Limiting the number of available VMIDs to enforce isolation causes some
issues with gang submit and applying certain HW workarounds which
require multiple VMIDs to work correctly.
So instead start to track all submissions to the relevant engines in a
per partition data structure and use the dma_fences of the submissions
to enforce isolation similar to what a VMID limit does.
v2: use ~0l for jobs without isolation to distinct it from kernel
submissions which uses NULL for the owner. Add some warning when we
are OOM.
Signed-off-by: Christian König <christian.koenig(a)amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 13 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 98 +++++++++++++++++++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 43 ++++------
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 16 +++-
drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 19 +++++
drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h | 1 +
6 files changed, 155 insertions(+), 35 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 98f0c12df12bc..9a61f5fe3245a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1187,9 +1187,15 @@ struct amdgpu_device {
bool debug_enable_ras_aca;
bool debug_exp_resets;
- bool enforce_isolation[MAX_XCP];
- /* Added this mutex for cleaner shader isolation between GFX and compute processes */
+ /* Protection for the following isolation structure */
struct mutex enforce_isolation_mutex;
+ bool enforce_isolation[MAX_XCP];
+ struct amdgpu_isolation {
+ void *owner;
+ struct dma_fence *spearhead;
+ struct amdgpu_sync active;
+ struct amdgpu_sync prev;
+ } isolation[MAX_XCP];
struct amdgpu_init_level *init_lvl;
};
@@ -1470,6 +1476,9 @@ void amdgpu_device_pcie_port_wreg(struct amdgpu_device *adev,
struct dma_fence *amdgpu_device_get_gang(struct amdgpu_device *adev);
struct dma_fence *amdgpu_device_switch_gang(struct amdgpu_device *adev,
struct dma_fence *gang);
+struct dma_fence *amdgpu_device_enforce_isolation(struct amdgpu_device *adev,
+ struct amdgpu_ring *ring,
+ struct amdgpu_job *job);
bool amdgpu_device_has_display_hardware(struct amdgpu_device *adev);
ssize_t amdgpu_get_soft_full_reset_mask(struct amdgpu_ring *ring);
ssize_t amdgpu_show_reset_mask(char *buf, uint32_t supported_reset);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 71e8a76180ad6..e298b48488c22 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4232,6 +4232,11 @@ int amdgpu_device_init(struct amdgpu_device *adev,
mutex_init(&adev->gfx.reset_sem_mutex);
/* Initialize the mutex for cleaner shader isolation between GFX and compute processes */
mutex_init(&adev->enforce_isolation_mutex);
+ for (i = 0; i < MAX_XCP; ++i) {
+ adev->isolation[i].spearhead = dma_fence_get_stub();
+ amdgpu_sync_create(&adev->isolation[i].active);
+ amdgpu_sync_create(&adev->isolation[i].prev);
+ }
mutex_init(&adev->gfx.kfd_sch_mutex);
amdgpu_device_init_apu_flags(adev);
@@ -4731,7 +4736,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
void amdgpu_device_fini_sw(struct amdgpu_device *adev)
{
- int idx;
+ int i, idx;
bool px;
amdgpu_device_ip_fini(adev);
@@ -4739,6 +4744,11 @@ void amdgpu_device_fini_sw(struct amdgpu_device *adev)
amdgpu_ucode_release(&adev->firmware.gpu_info_fw);
adev->accel_working = false;
dma_fence_put(rcu_dereference_protected(adev->gang_submit, true));
+ for (i = 0; i < MAX_XCP; ++i) {
+ dma_fence_put(adev->isolation[i].spearhead);
+ amdgpu_sync_free(&adev->isolation[i].active);
+ amdgpu_sync_free(&adev->isolation[i].prev);
+ }
amdgpu_reset_fini(adev);
@@ -6875,6 +6885,92 @@ struct dma_fence *amdgpu_device_switch_gang(struct amdgpu_device *adev,
return NULL;
}
+/**
+ * amdgpu_device_enforce_isolation - enforce HW isolation
+ * @adev: the amdgpu device pointer
+ * @ring: the HW ring the job is supposed to run on
+ * @job: the job which is about to be pushed to the HW ring
+ *
+ * Makes sure that only one client at a time can use the GFX block.
+ * Returns: The dependency to wait on before the job can be pushed to the HW.
+ * The function is called multiple times until NULL is returned.
+ */
+struct dma_fence *amdgpu_device_enforce_isolation(struct amdgpu_device *adev,
+ struct amdgpu_ring *ring,
+ struct amdgpu_job *job)
+{
+ struct amdgpu_isolation *isolation = &adev->isolation[ring->xcp_id];
+ struct drm_sched_fence *f = job->base.s_fence;
+ struct dma_fence *dep;
+ void *owner;
+ int r;
+
+ /*
+ * For now enforce isolation only for the GFX block since we only need
+ * the cleaner shader on those rings.
+ */
+ if (ring->funcs->type != AMDGPU_RING_TYPE_GFX &&
+ ring->funcs->type != AMDGPU_RING_TYPE_COMPUTE)
+ return NULL;
+
+ /*
+ * All submissions where enforce isolation is false are handled as if
+ * they come from a single client. Use ~0l as the owner to distinct it
+ * from kernel submissions where the owner is NULL.
+ */
+ owner = job->enforce_isolation ? f->owner : (void *)~0l;
+
+ mutex_lock(&adev->enforce_isolation_mutex);
+
+ /*
+ * The "spearhead" submission is the first one which changes the
+ * ownership to its client. We always need to wait for it to be
+ * pushed to the HW before proceeding with anything.
+ */
+ if (&f->scheduled != isolation->spearhead &&
+ !dma_fence_is_signaled(isolation->spearhead)) {
+ dep = isolation->spearhead;
+ goto out_grab_ref;
+ }
+
+ if (isolation->owner != owner) {
+
+ /*
+ * Wait for any gang to be assembled before switching to a
+ * different owner or otherwise we could deadlock the
+ * submissions.
+ */
+ if (!job->gang_submit) {
+ dep = amdgpu_device_get_gang(adev);
+ if (!dma_fence_is_signaled(dep))
+ goto out_return_dep;
+ dma_fence_put(dep);
+ }
+
+ dma_fence_put(isolation->spearhead);
+ isolation->spearhead = dma_fence_get(&f->scheduled);
+ amdgpu_sync_move(&isolation->active, &isolation->prev);
+ isolation->owner = owner;
+ }
+
+ /*
+ * Specifying the ring here helps to pipeline submissions even when
+ * isolation is enabled. If that is not desired for testing NULL can be
+ * used instead of the ring to enforce a CPU round trip while switching
+ * between clients.
+ */
+ dep = amdgpu_sync_peek_fence(&isolation->prev, ring);
+ r = amdgpu_sync_fence(&isolation->active, &f->finished, GFP_NOWAIT);
+ if (r)
+ DRM_WARN("OOM tracking isolation\n");
+
+out_grab_ref:
+ dma_fence_get(dep);
+out_return_dep:
+ mutex_unlock(&adev->enforce_isolation_mutex);
+ return dep;
+}
+
bool amdgpu_device_has_display_hardware(struct amdgpu_device *adev)
{
switch (adev->asic_type) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
index 8e712a11aba5d..9008b7388e897 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
@@ -287,40 +287,27 @@ static int amdgpu_vmid_grab_reserved(struct amdgpu_vm *vm,
(*id)->flushed_updates < updates ||
!(*id)->last_flush ||
((*id)->last_flush->context != fence_context &&
- !dma_fence_is_signaled((*id)->last_flush))) {
+ !dma_fence_is_signaled((*id)->last_flush)))
+ needs_flush = true;
+
+ if ((*id)->owner != vm->immediate.fence_context ||
+ (!adev->vm_manager.concurrent_flush && needs_flush)) {
struct dma_fence *tmp;
- /* Wait for the gang to be assembled before using a
- * reserved VMID or otherwise the gang could deadlock.
+ /* Don't use per engine and per process VMID at the
+ * same time
*/
- tmp = amdgpu_device_get_gang(adev);
- if (!dma_fence_is_signaled(tmp) && tmp != job->gang_submit) {
+ if (adev->vm_manager.concurrent_flush)
+ ring = NULL;
+
+ /* to prevent one context starved by another context */
+ (*id)->pd_gpu_addr = 0;
+ tmp = amdgpu_sync_peek_fence(&(*id)->active, ring);
+ if (tmp) {
*id = NULL;
- *fence = tmp;
+ *fence = dma_fence_get(tmp);
return 0;
}
- dma_fence_put(tmp);
-
- /* Make sure the id is owned by the gang before proceeding */
- if (!job->gang_submit ||
- (*id)->owner != vm->immediate.fence_context) {
-
- /* Don't use per engine and per process VMID at the
- * same time
- */
- if (adev->vm_manager.concurrent_flush)
- ring = NULL;
-
- /* to prevent one context starved by another context */
- (*id)->pd_gpu_addr = 0;
- tmp = amdgpu_sync_peek_fence(&(*id)->active, ring);
- if (tmp) {
- *id = NULL;
- *fence = dma_fence_get(tmp);
- return 0;
- }
- }
- needs_flush = true;
}
/* Good we can use this VMID. Remember this submission as
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 100f044759435..685c61a05af85 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -342,17 +342,24 @@ amdgpu_job_prepare_job(struct drm_sched_job *sched_job,
{
struct amdgpu_ring *ring = to_amdgpu_ring(s_entity->rq->sched);
struct amdgpu_job *job = to_amdgpu_job(sched_job);
- struct dma_fence *fence = NULL;
+ struct dma_fence *fence;
int r;
r = drm_sched_entity_error(s_entity);
if (r)
goto error;
- if (job->gang_submit)
+ if (job->gang_submit) {
fence = amdgpu_device_switch_gang(ring->adev, job->gang_submit);
+ if (fence)
+ return fence;
+ }
+
+ fence = amdgpu_device_enforce_isolation(ring->adev, ring, job);
+ if (fence)
+ return fence;
- if (!fence && job->vm && !job->vmid) {
+ if (job->vm && !job->vmid) {
r = amdgpu_vmid_grab(job->vm, ring, job, &fence);
if (r) {
dev_err(ring->adev->dev, "Error getting VM ID (%d)\n", r);
@@ -365,9 +372,10 @@ amdgpu_job_prepare_job(struct drm_sched_job *sched_job,
*/
if (!fence)
job->vm = NULL;
+ return fence;
}
- return fence;
+ return NULL;
error:
dma_fence_set_error(&job->base.s_fence->finished, r);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index c586ab4c911bf..d75715b3f1870 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -399,6 +399,25 @@ int amdgpu_sync_clone(struct amdgpu_sync *source, struct amdgpu_sync *clone)
return 0;
}
+/**
+ * amdgpu_sync_move - move all fences from src to dst
+ *
+ * @src: source of the fences, empty after function
+ * @dst: destination for the fences
+ *
+ * Moves all fences from source to destination. All fences in destination are
+ * freed and source is empty after the function call.
+ */
+void amdgpu_sync_move(struct amdgpu_sync *src, struct amdgpu_sync *dst)
+{
+ unsigned int i;
+
+ amdgpu_sync_free(dst);
+
+ for (i = 0; i < HASH_SIZE(src->fences); ++i)
+ hlist_move_list(&src->fences[i], &dst->fences[i]);
+}
+
/**
* amdgpu_sync_push_to_job - push fences into job
* @sync: sync object to get the fences from
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h
index e3272dce798d7..a91a8eaf808b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h
@@ -56,6 +56,7 @@ struct dma_fence *amdgpu_sync_peek_fence(struct amdgpu_sync *sync,
struct amdgpu_ring *ring);
struct dma_fence *amdgpu_sync_get_fence(struct amdgpu_sync *sync);
int amdgpu_sync_clone(struct amdgpu_sync *source, struct amdgpu_sync *clone);
+void amdgpu_sync_move(struct amdgpu_sync *src, struct amdgpu_sync *dst);
int amdgpu_sync_push_to_job(struct amdgpu_sync *sync, struct amdgpu_job *job);
int amdgpu_sync_wait(struct amdgpu_sync *sync, bool intr);
void amdgpu_sync_free(struct amdgpu_sync *sync);
--
2.39.5
Until CONFIG_DMABUF_SYSFS_STATS was added [1] it was only possible to
perform per-buffer accounting with debugfs which is not suitable for
production environments. Eventually we discovered the overhead with
per-buffer sysfs file creation/removal was significantly impacting
allocation and free times, and exacerbated kernfs lock contention. [2]
dma_buf_stats_setup() is responsible for 39% of single-page buffer
creation duration, or 74% of single-page dma_buf_export() duration when
stressing dmabuf allocations and frees.
I prototyped a change from per-buffer to per-exporter statistics with a
RCU protected list of exporter allocations that accommodates most (but
not all) of our use-cases and avoids almost all of the sysfs overhead.
While that adds less overhead than per-buffer sysfs, and less even than
the maintenance of the dmabuf debugfs_list, it's still *additional*
overhead on top of the debugfs_list and doesn't give us per-buffer info.
This series uses the existing dmabuf debugfs_list to implement a BPF
dmabuf iterator, which adds no overhead to buffer allocation/free and
provides per-buffer info. The list has been moved outside of
CONFIG_DEBUG_FS scope so that it is always populated. The BPF program
loaded by userspace that extracts per-buffer information gets to define
its own interface which avoids the lack of ABI stability with debugfs.
As this is a replacement for our use of CONFIG_DMABUF_SYSFS_STATS, the
last patch is a RFC for removing it from the kernel. Please see my
suggestion there regarding the timeline for that.
[1] https://lore.kernel.org/linux-media/20201210044400.1080308-1-hridya@google.…
[2] https://lore.kernel.org/all/20220516171315.2400578-1-tjmercier@google.com
v1: https://lore.kernel.org/all/20250414225227.3642618-1-tjmercier@google.com
v1 -> v2:
Make the DMA buffer list independent of CONFIG_DEBUG_FS per Christian König
Add CONFIG_DMA_SHARED_BUFFER check to kernel/bpf/Makefile per kernel test robot
Use BTF_ID_LIST_SINGLE instead of BTF_ID_LIST_GLOBAL_SINGLE per Song Liu
Fixup comment style, mixing code/declarations, and use ASSERT_OK_FD in selftest per Song Liu
Add BPF_ITER_RESCHED feature to bpf_dmabuf_reg_info per Alexei Starovoitov
Add open-coded iterator and selftest per Alexei Starovoitov
Add a second test buffer from the system dmabuf heap to selftests
Use the BPF program we'll use in production for selftest per Alexei Starovoitov
https://r.android.com/c/platform/system/bpfprogs/+/3616123/2/dmabufIter.chttps://r.android.com/c/platform/system/memory/libmeminfo/+/3614259/1/libdm…
T.J. Mercier (6):
dma-buf: Rename and expose debugfs symbols
bpf: Add dmabuf iterator
bpf: Add open coded dmabuf iterator
selftests/bpf: Add test for dmabuf_iter
selftests/bpf: Add test for open coded dmabuf_iter
RFC: dma-buf: Remove DMA-BUF statistics
.../ABI/testing/sysfs-kernel-dmabuf-buffers | 24 --
Documentation/driver-api/dma-buf.rst | 5 -
drivers/dma-buf/Kconfig | 15 -
drivers/dma-buf/Makefile | 1 -
drivers/dma-buf/dma-buf-sysfs-stats.c | 202 --------------
drivers/dma-buf/dma-buf-sysfs-stats.h | 35 ---
drivers/dma-buf/dma-buf.c | 58 +---
include/linux/dma-buf.h | 6 +-
kernel/bpf/Makefile | 3 +
kernel/bpf/dmabuf_iter.c | 177 ++++++++++++
kernel/bpf/helpers.c | 5 +
.../testing/selftests/bpf/bpf_experimental.h | 5 +
tools/testing/selftests/bpf/config | 3 +
.../selftests/bpf/prog_tests/dmabuf_iter.c | 258 ++++++++++++++++++
.../testing/selftests/bpf/progs/dmabuf_iter.c | 91 ++++++
15 files changed, 561 insertions(+), 327 deletions(-)
delete mode 100644 Documentation/ABI/testing/sysfs-kernel-dmabuf-buffers
delete mode 100644 drivers/dma-buf/dma-buf-sysfs-stats.c
delete mode 100644 drivers/dma-buf/dma-buf-sysfs-stats.h
create mode 100644 kernel/bpf/dmabuf_iter.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/dmabuf_iter.c
create mode 100644 tools/testing/selftests/bpf/progs/dmabuf_iter.c
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
--
2.49.0.906.g1f30a19c02-goog
From: Rob Clark <robdclark(a)chromium.org>
Conversion to DRM GPU VA Manager[1], and adding support for Vulkan Sparse
Memory[2] in the form of:
1. A new VM_BIND submitqueue type for executing VM MSM_SUBMIT_BO_OP_MAP/
MAP_NULL/UNMAP commands
2. A new VM_BIND ioctl to allow submitting batches of one or more
MAP/MAP_NULL/UNMAP commands to a VM_BIND submitqueue
I did not implement support for synchronous VM_BIND commands. Since
userspace could just immediately wait for the `SUBMIT` to complete, I don't
think we need this extra complexity in the kernel. Synchronous/immediate
VM_BIND operations could be implemented with a 2nd VM_BIND submitqueue.
The corresponding mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32533
Changes in v4:
- Replace selftests_running flag with IO_PGTABLE_QUIRK_NO_WARN_ON [Robin
Murphy]
- Rework msm_gem_vm_sm_step_remap() for cases that orig_vma is evicted
to solve some crashes
- Block when drm_file is closed until pending VM_BIND ops complete, before
tearing down the VM's scheduler, to solve some memory leaks.
- Link to v3: https://lore.kernel.org/all/20250428205619.227835-1-robdclark@gmail.com/
Changes in v3:
- Switched to separate VM_BIND ioctl. This makes the UABI a bit
cleaner, but OTOH the userspace code was cleaner when the end result
of either type of VkQueue lead to the same ioctl. So I'm a bit on
the fence.
- Switched to doing the gpuvm bookkeeping synchronously, and only
deferring the pgtable updates. This avoids needing to hold any resv
locks in the fence signaling path, resolving the last shrinker related
lockdep complaints. OTOH it means userspace can trigger invalid
pgtable updates with multiple VM_BIND queues. In this case, we ensure
that unmaps happen completely (to prevent userspace from using this to
access free'd pages), mark the context as unusable, and move on with
life.
- Link to v2: https://lore.kernel.org/all/20250319145425.51935-1-robdclark@gmail.com/
Changes in v2:
- Dropped Bibek Kumar Patro's arm-smmu patches[3], which have since been
merged.
- Pre-allocate all the things, and drop HACK patch which disabled shrinker.
This includes ensuring that vm_bo objects are allocated up front, pre-
allocating VMA objects, and pre-allocating pages used for pgtable updates.
The latter utilizes io_pgtable_cfg callbacks for pgtable alloc/free, that
were initially added for panthor.
- Add back support for BO dumping for devcoredump.
- Link to v1 (RFC): https://lore.kernel.org/dri-devel/20241207161651.410556-1-robdclark@gmail.c…
[1] https://www.kernel.org/doc/html/next/gpu/drm-mm.html#drm-gpuvm
[2] https://docs.vulkan.org/spec/latest/chapters/sparsemem.html
[3] https://patchwork.kernel.org/project/linux-arm-kernel/list/?series=909700
Rob Clark (33):
drm/gpuvm: Don't require obj lock in destructor path
drm/gpuvm: Allow VAs to hold soft reference to BOs
iommu/io-pgtable-arm: Add quirk to quiet WARN_ON()
drm/msm: Rename msm_file_private -> msm_context
drm/msm: Improve msm_context comments
drm/msm: Rename msm_gem_address_space -> msm_gem_vm
drm/msm: Remove vram carveout support
drm/msm: Collapse vma allocation and initialization
drm/msm: Collapse vma close and delete
drm/msm: Don't close VMAs on purge
drm/msm: drm_gpuvm conversion
drm/msm: Convert vm locking
drm/msm: Use drm_gpuvm types more
drm/msm: Split out helper to get iommu prot flags
drm/msm: Add mmu support for non-zero offset
drm/msm: Add PRR support
drm/msm: Rename msm_gem_vma_purge() -> _unmap()
drm/msm: Lazily create context VM
drm/msm: Add opt-in for VM_BIND
drm/msm: Mark VM as unusable on GPU hangs
drm/msm: Add _NO_SHARE flag
drm/msm: Crashdump prep for sparse mappings
drm/msm: rd dumping prep for sparse mappings
drm/msm: Crashdec support for sparse
drm/msm: rd dumping support for sparse
drm/msm: Extract out syncobj helpers
drm/msm: Use DMA_RESV_USAGE_BOOKKEEP/KERNEL
drm/msm: Add VM_BIND submitqueue
drm/msm: Support IO_PGTABLE_QUIRK_NO_WARN_ON
drm/msm: Support pgtable preallocation
drm/msm: Split out map/unmap ops
drm/msm: Add VM_BIND ioctl
drm/msm: Bump UAPI version
drivers/gpu/drm/drm_gpuvm.c | 15 +-
drivers/gpu/drm/msm/Kconfig | 1 +
drivers/gpu/drm/msm/Makefile | 1 +
drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 25 +-
drivers/gpu/drm/msm/adreno/a2xx_gpummu.c | 5 +-
drivers/gpu/drm/msm/adreno/a3xx_gpu.c | 17 +-
drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 17 +-
drivers/gpu/drm/msm/adreno/a5xx_debugfs.c | 4 +-
drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 22 +-
drivers/gpu/drm/msm/adreno/a5xx_power.c | 2 +-
drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 10 +-
drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 32 +-
drivers/gpu/drm/msm/adreno/a6xx_gmu.h | 2 +-
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 49 +-
drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 6 +-
drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 10 +-
drivers/gpu/drm/msm/adreno/adreno_device.c | 4 -
drivers/gpu/drm/msm/adreno/adreno_gpu.c | 88 +-
drivers/gpu/drm/msm/adreno/adreno_gpu.h | 23 +-
.../drm/msm/disp/dpu1/dpu_encoder_phys_wb.c | 14 +-
drivers/gpu/drm/msm/disp/dpu1/dpu_formats.c | 18 +-
drivers/gpu/drm/msm/disp/dpu1/dpu_formats.h | 2 +-
drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 18 +-
drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c | 14 +-
drivers/gpu/drm/msm/disp/dpu1/dpu_plane.h | 4 +-
drivers/gpu/drm/msm/disp/mdp4/mdp4_crtc.c | 6 +-
drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 28 +-
drivers/gpu/drm/msm/disp/mdp4/mdp4_plane.c | 12 +-
drivers/gpu/drm/msm/disp/mdp5/mdp5_crtc.c | 4 +-
drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 19 +-
drivers/gpu/drm/msm/disp/mdp5/mdp5_plane.c | 12 +-
drivers/gpu/drm/msm/dsi/dsi_host.c | 14 +-
drivers/gpu/drm/msm/msm_drv.c | 183 +--
drivers/gpu/drm/msm/msm_drv.h | 35 +-
drivers/gpu/drm/msm/msm_fb.c | 18 +-
drivers/gpu/drm/msm/msm_fbdev.c | 2 +-
drivers/gpu/drm/msm/msm_gem.c | 489 +++----
drivers/gpu/drm/msm/msm_gem.h | 217 ++-
drivers/gpu/drm/msm/msm_gem_prime.c | 15 +
drivers/gpu/drm/msm/msm_gem_shrinker.c | 4 +-
drivers/gpu/drm/msm/msm_gem_submit.c | 295 ++--
drivers/gpu/drm/msm/msm_gem_vma.c | 1288 +++++++++++++++--
drivers/gpu/drm/msm/msm_gpu.c | 171 ++-
drivers/gpu/drm/msm/msm_gpu.h | 132 +-
drivers/gpu/drm/msm/msm_iommu.c | 298 +++-
drivers/gpu/drm/msm/msm_kms.c | 18 +-
drivers/gpu/drm/msm/msm_kms.h | 2 +-
drivers/gpu/drm/msm/msm_mmu.h | 38 +-
drivers/gpu/drm/msm/msm_rd.c | 62 +-
drivers/gpu/drm/msm/msm_ringbuffer.c | 4 +-
drivers/gpu/drm/msm/msm_submitqueue.c | 96 +-
drivers/gpu/drm/msm/msm_syncobj.c | 172 +++
drivers/gpu/drm/msm/msm_syncobj.h | 37 +
drivers/iommu/io-pgtable-arm.c | 27 +-
include/drm/drm_gpuvm.h | 12 +-
include/linux/io-pgtable.h | 8 +
include/uapi/drm/msm_drm.h | 149 +-
57 files changed, 3043 insertions(+), 1227 deletions(-)
create mode 100644 drivers/gpu/drm/msm/msm_syncobj.c
create mode 100644 drivers/gpu/drm/msm/msm_syncobj.h
--
2.49.0
Hi,
kernel test robot noticed the following build warnings:
[auto build test WARNING on jic23-iio/togreg]
[also build test WARNING on char-misc/char-misc-testing char-misc/char-misc-next char-misc/char-misc-linus usb/usb-testing usb/usb-next usb/usb-linus xen-tip/linux-next linus/master v6.15-rc4]
[cannot apply to tegra/for-next drm-xe/drm-xe-next rmk-arm/drm-armada-devel rmk-arm/drm-armada-fixes next-20250501]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/oushixiong1025-163-com/drm-p…
base: https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio.git togreg
patch link: https://lore.kernel.org/r/20250430085658.540746-2-oushixiong1025%40163.com
patch subject: [PATCH 2/3] drm/prime: Support importing DMA-BUF without sg_table
config: arc-randconfig-002-20250501 (https://download.01.org/0day-ci/archive/20250502/202505022224.FCDQ8TCB-lkp@…)
compiler: arc-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250502/202505022224.FCDQ8TCB-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505022224.FCDQ8TCB-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/gpu/drm/drm_prime.c:925:24: warning: no previous prototype for 'drm_gem_prime_import_dev_skip_map' [-Wmissing-prototypes]
925 | struct drm_gem_object *drm_gem_prime_import_dev_skip_map(struct drm_device *dev,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vim +/drm_gem_prime_import_dev_skip_map +925 drivers/gpu/drm/drm_prime.c
913
914 /**
915 * drm_gem_prime_import_dev_skip_map - core implementation of the import callback
916 * @dev: drm_device to import into
917 * @dma_buf: dma-buf object to import
918 * @attach_dev: struct device to dma_buf attach
919 *
920 * This function exports a dma-buf without get it's scatter/gather table.
921 *
922 * Drivers who need to get an scatter/gather table for objects need to call
923 * drm_gem_prime_import_dev() instead.
924 */
> 925 struct drm_gem_object *drm_gem_prime_import_dev_skip_map(struct drm_device *dev,
926 struct dma_buf *dma_buf,
927 struct device *attach_dev)
928 {
929 struct dma_buf_attachment *attach;
930 struct drm_gem_object *obj;
931 int ret;
932
933 if (dma_buf->ops == &drm_gem_prime_dmabuf_ops) {
934 obj = dma_buf->priv;
935 if (obj->dev == dev) {
936 /*
937 * Importing dmabuf exported from our own gem increases
938 * refcount on gem itself instead of f_count of dmabuf.
939 */
940 drm_gem_object_get(obj);
941 return obj;
942 }
943 }
944
945 attach = dma_buf_attach(dma_buf, attach_dev, true);
946 if (IS_ERR(attach))
947 return ERR_CAST(attach);
948
949 get_dma_buf(dma_buf);
950
951 obj = dev->driver->gem_prime_import_attachment(dev, attach);
952 if (IS_ERR(obj)) {
953 ret = PTR_ERR(obj);
954 goto fail_detach;
955 }
956
957 obj->import_attach = attach;
958 obj->resv = dma_buf->resv;
959
960 return obj;
961
962 fail_detach:
963 dma_buf_detach(dma_buf, attach);
964 dma_buf_put(dma_buf);
965
966 return ERR_PTR(ret);
967 }
968 EXPORT_SYMBOL(drm_gem_prime_import_dev_skip_map);
969
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Hi,
kernel test robot noticed the following build warnings:
[auto build test WARNING on jic23-iio/togreg]
[also build test WARNING on char-misc/char-misc-testing char-misc/char-misc-next char-misc/char-misc-linus usb/usb-testing usb/usb-next usb/usb-linus xen-tip/linux-next linus/master v6.15-rc4]
[cannot apply to tegra/for-next drm-xe/drm-xe-next rmk-arm/drm-armada-devel rmk-arm/drm-armada-fixes next-20250430]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/oushixiong1025-163-com/drm-p…
base: https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio.git togreg
patch link: https://lore.kernel.org/r/20250430085658.540746-2-oushixiong1025%40163.com
patch subject: [PATCH 2/3] drm/prime: Support importing DMA-BUF without sg_table
config: arm64-randconfig-003-20250501 (https://download.01.org/0day-ci/archive/20250501/202505011655.qTmh4UA7-lkp@…)
compiler: clang version 21.0.0git (https://github.com/llvm/llvm-project f819f46284f2a79790038e1f6649172789734ae8)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250501/202505011655.qTmh4UA7-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505011655.qTmh4UA7-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/gpu/drm/drm_prime.c:925:24: warning: no previous prototype for function 'drm_gem_prime_import_dev_skip_map' [-Wmissing-prototypes]
925 | struct drm_gem_object *drm_gem_prime_import_dev_skip_map(struct drm_device *dev,
| ^
drivers/gpu/drm/drm_prime.c:925:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
925 | struct drm_gem_object *drm_gem_prime_import_dev_skip_map(struct drm_device *dev,
| ^
| static
1 warning generated.
vim +/drm_gem_prime_import_dev_skip_map +925 drivers/gpu/drm/drm_prime.c
913
914 /**
915 * drm_gem_prime_import_dev_skip_map - core implementation of the import callback
916 * @dev: drm_device to import into
917 * @dma_buf: dma-buf object to import
918 * @attach_dev: struct device to dma_buf attach
919 *
920 * This function exports a dma-buf without get it's scatter/gather table.
921 *
922 * Drivers who need to get an scatter/gather table for objects need to call
923 * drm_gem_prime_import_dev() instead.
924 */
> 925 struct drm_gem_object *drm_gem_prime_import_dev_skip_map(struct drm_device *dev,
926 struct dma_buf *dma_buf,
927 struct device *attach_dev)
928 {
929 struct dma_buf_attachment *attach;
930 struct drm_gem_object *obj;
931 int ret;
932
933 if (dma_buf->ops == &drm_gem_prime_dmabuf_ops) {
934 obj = dma_buf->priv;
935 if (obj->dev == dev) {
936 /*
937 * Importing dmabuf exported from our own gem increases
938 * refcount on gem itself instead of f_count of dmabuf.
939 */
940 drm_gem_object_get(obj);
941 return obj;
942 }
943 }
944
945 attach = dma_buf_attach(dma_buf, attach_dev, true);
946 if (IS_ERR(attach))
947 return ERR_CAST(attach);
948
949 get_dma_buf(dma_buf);
950
951 obj = dev->driver->gem_prime_import_attachment(dev, attach);
952 if (IS_ERR(obj)) {
953 ret = PTR_ERR(obj);
954 goto fail_detach;
955 }
956
957 obj->import_attach = attach;
958 obj->resv = dma_buf->resv;
959
960 return obj;
961
962 fail_detach:
963 dma_buf_detach(dma_buf, attach);
964 dma_buf_put(dma_buf);
965
966 return ERR_PTR(ret);
967 }
968 EXPORT_SYMBOL(drm_gem_prime_import_dev_skip_map);
969
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki