Hi
Am 03.07.25 um 19:23 schrieb Bert Karwatzki:
> Am Donnerstag, dem 03.07.2025 um 18:09 +0200 schrieb Thomas Zimmermann:
>> Hi,
>>
>> before I give up on the issue, could you please test the attached patch?
>>
>> Best regards
>> Thomas
>>
>>
>> --
>> Thomas Zimmermann
>> Graphics Driver Developer
>> SUSE Software Solutions Germany GmbH
>> Frankenstrasse 146, 90461 Nuernberg, Germany
>> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
>> HRB 36809 (AG Nuernberg)
> I applied the patch on top of next-20250703
>
> $ git log --oneline
> 18ee3ed3cb60 (HEAD -> drm_gem_object_handle_put) drm/amdgpu: Provide custom framebuffer destroy function
> 8d6c58332c7a (tag: next-20250703, origin/master, origin/HEAD, master) Add linux-next specific files for 20250703
>
> and it solves the issue for me (i.e. no warnings).
Great, thanks for testing. If nothing else, that's the minimal workaround.
Here's another patch, which should solve the problem for all drivers.
Could you please revert the old fix and apply the new one and test again?
Best regards
Thomas
>
> Bert Karwatzki
--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
[Note: it would be really useful to Cc all relevant maintainers]
On Fri, Jun 27, 2025 at 04:10:27PM +0100, Pavel Begunkov wrote:
> This series implements it for read/write io_uring requests. The uAPI
> looks similar to normal registered buffers, the user will need to
> register a dmabuf in io_uring first and then use it as any other
> registered buffer. On registration the user also specifies a file
> to map the dmabuf for.
Just commenting from the in-kernel POV here, where the interface
feels wrong.
You can't just expose 'the DMA device' up file operations, because
there can be and often is more than one. Similarly stuffing a
dma_addr_t into an iovec is rather dangerous.
The model that should work much better is to have file operations
to attach to / detach from a dma_buf, and then have an iter that
specifies a dmabuf and offsets into. That way the code behind the
file operations can forward the attachment to all the needed
devices (including more/less while it remains attached to the file)
and can pick the right dma address for each device.
I also remember some discussion that new dma-buf importers should
use the dynamic imported model for long-term imports, but as I'm
everything but an expert in that area I'll let the dma-buf folks
speak.
Am Freitag, 6. Juni 2025, 08:28:23 Mitteleuropäische Sommerzeit schrieb Tomeu Vizoso:
> This uses the SHMEM DRM helpers and we map right away to the CPU and NPU
> sides, as all buffers are expected to be accessed from both.
>
> v2:
> - Sync the IOMMUs for the other cores when mapping and unmapping.
>
> v3:
> - Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
>
> v6:
> - Use mutexes guard (Markus Elfring)
>
> v7:
> - Assign its own IOMMU domain to each client, for isolation (Daniel
> Stone and Robin Murphy)
>
> Reviewed-by: Jeffrey Hugo <quic_jhugo(a)quicinc.com>
> Signed-off-by: Tomeu Vizoso <tomeu(a)tomeuvizoso.net>
> ---
> diff --git a/drivers/accel/rocket/rocket_gem.c b/drivers/accel/rocket/rocket_gem.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..61b7f970a6885aa13784daa1222611a02aa10dee
> --- /dev/null
> +++ b/drivers/accel/rocket/rocket_gem.c
> @@ -0,0 +1,115 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright 2024-2025 Tomeu Vizoso <tomeu(a)tomeuvizoso.net> */
> +
> +#include <drm/drm_device.h>
> +#include <drm/drm_utils.h>
> +#include <drm/rocket_accel.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/iommu.h>
> +
> +#include "rocket_device.h"
> +#include "rocket_drv.h"
> +#include "rocket_gem.h"
> +
> +static void rocket_gem_bo_free(struct drm_gem_object *obj)
> +{
> + struct rocket_device *rdev = to_rocket_device(obj->dev);
> + struct rocket_gem_object *bo = to_rocket_bo(obj);
> + size_t unmapped;
> +
> + drm_WARN_ON(obj->dev, bo->base.pages_use_count > 1);
This should probably be
drm_WARN_ON(obj->dev, refcount_read(&bo->base.pages_use_count) > 1);
as pages_use_count is of type refcount_t since
commit 051b6646d36d ("drm/shmem-helper: Use refcount_t for pages_use_count")
Heiko
On Fri, Jun 6, 2025 at 1:29 AM Tomeu Vizoso <tomeu(a)tomeuvizoso.net> wrote:
>
> Using the DRM GPU scheduler infrastructure, with a scheduler for each
> core.
>
> Userspace can decide for a series of tasks to be executed sequentially
> in the same core, so SRAM locality can be taken advantage of.
>
> The job submission code was initially based on Panfrost.
>
> v2:
> - Remove hardcoded number of cores
> - Misc. style fixes (Jeffrey Hugo)
> - Repack IOCTL struct (Jeffrey Hugo)
>
> v3:
> - Adapt to a split of the register block in the DT bindings (Nicolas
> Frattaroli)
> - Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
> - Use drm_* logging functions (Thomas Zimmermann)
> - Rename reg i/o macros (Thomas Zimmermann)
> - Add padding to ioctls and check for zero (Jeff Hugo)
> - Improve error handling (Nicolas Frattaroli)
>
> v6:
> - Use mutexes guard (Markus Elfring)
> - Use u64_to_user_ptr (Jeff Hugo)
> - Drop rocket_fence (Rob Herring)
>
> v7:
> - Assign its own IOMMU domain to each client, for isolation (Daniel
> Stone and Robin Murphy)
>
> Signed-off-by: Tomeu Vizoso <tomeu(a)tomeuvizoso.net>
> ---
[...]
> --- a/include/uapi/drm/rocket_accel.h
> +++ b/include/uapi/drm/rocket_accel.h
> @@ -12,8 +12,10 @@ extern "C" {
> #endif
>
> #define DRM_ROCKET_CREATE_BO 0x00
> +#define DRM_ROCKET_SUBMIT 0x01
>
> #define DRM_IOCTL_ROCKET_CREATE_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_ROCKET_CREATE_BO, struct drm_rocket_create_bo)
> +#define DRM_IOCTL_ROCKET_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_SUBMIT, struct drm_rocket_submit)
>
> /**
> * struct drm_rocket_create_bo - ioctl argument for creating Rocket BOs.
> @@ -37,6 +39,68 @@ struct drm_rocket_create_bo {
> __u64 offset;
> };
>
> +/**
> + * struct drm_rocket_task - A task to be run on the NPU
> + *
> + * A task is the smallest unit of work that can be run on the NPU.
> + */
> +struct drm_rocket_task {
> + /** Input: DMA address to NPU mapping of register command buffer */
> + __u64 regcmd;
> +
> + /** Input: Number of commands in the register command buffer */
> + __u32 regcmd_count;
> +
> + /** Reserved, must be zero. */
> + __u32 reserved;
> +};
> +
> +/**
> + * struct drm_rocket_job - A job to be run on the NPU
> + *
> + * The kernel will schedule the execution of this job taking into account its
> + * dependencies with other jobs. All tasks in the same job will be executed
> + * sequentially on the same core, to benefit from memory residency in SRAM.
> + */
> +struct drm_rocket_job {
> + /** Input: Pointer to an array of struct drm_rocket_task. */
> + __u64 tasks;
> +
> + /** Input: Pointer to a u32 array of the BOs that are read by the job. */
> + __u64 in_bo_handles;
> +
> + /** Input: Pointer to a u32 array of the BOs that are written to by the job. */
> + __u64 out_bo_handles;
> +
> + /** Input: Number of tasks passed in. */
> + __u32 task_count;
> +
> + /** Input: Number of input BO handles passed in (size is that times 4). */
> + __u32 in_bo_handle_count;
> +
> + /** Input: Number of output BO handles passed in (size is that times 4). */
> + __u32 out_bo_handle_count;
> +
> + /** Reserved, must be zero. */
> + __u32 reserved;
> +};
> +
> +/**
> + * struct drm_rocket_submit - ioctl argument for submitting commands to the NPU.
> + *
> + * The kernel will schedule the execution of these jobs in dependency order.
> + */
> +struct drm_rocket_submit {
> + /** Input: Pointer to an array of struct drm_rocket_job. */
> + __u64 jobs;
> +
> + /** Input: Number of jobs passed in. */
> + __u32 job_count;
Isn't there a problem if you need to expand drm_rocket_job beyond
using the 1 reserved field? You can't add to the struct because then
you don't know the size here. So you have to modify drm_rocket_submit
to modify drm_rocket_job. Maybe better if you plan for that now rather
than later by making the size explicit.
Though etnaviv at least has similar issues.
Rob
> +
> + /** Reserved, must be zero. */
> + __u32 reserved;
> +};