On Thu, Oct 16, 2025 at 11:25:34PM -0700, Matthew Brost wrote:
> On Thu, Oct 16, 2025 at 04:06:05PM -0500, Rob Herring (Arm) wrote:
> > Add a driver for Arm Ethos-U65/U85 NPUs. The Ethos-U NPU has a
> > relatively simple interface with single command stream to describe
> > buffers, operation settings, and network operations. It supports up to 8
> > memory regions (though no h/w bounds on a region). The Ethos NPUs
> > are designed to use an SRAM for scratch memory. Region 2 is reserved
> > for SRAM (like the downstream driver stack and compiler). Userspace
> > doesn't need access to the SRAM.
Thanks for the review.
[...]
> > +static struct dma_fence *ethosu_job_run(struct drm_sched_job *sched_job)
> > +{
> > + struct ethosu_job *job = to_ethosu_job(sched_job);
> > + struct ethosu_device *dev = job->dev;
> > + struct dma_fence *fence = NULL;
> > + int ret;
> > +
> > + if (unlikely(job->base.s_fence->finished.error))
> > + return NULL;
> > +
> > + fence = ethosu_fence_create(dev);
>
> Another reclaim issue: ethosu_fence_create allocates memory using
> GFP_KERNEL. Since we're already in the DMA fence signaling path
> (reclaim), this can lead to a deadlock.
>
> Without too much thought, you likely want to move this allocation to
> ethosu_job_do_push, but before taking dev->sched_lock or calling
> drm_sched_job_arm.
>
> We really should fix the DRM scheduler work queue to be tainted with
> reclaim. If I recall correctly, we'd need to update the work queue
> layer. Let me look into that—I've seen this type of bug several times,
> and lockdep should be able to catch it.
Likely the rocket driver suffers from the same issues...
>
> > + if (IS_ERR(fence))
> > + return fence;
> > +
> > + if (job->done_fence)
> > + dma_fence_put(job->done_fence);
> > + job->done_fence = dma_fence_get(fence);
> > +
> > + ret = pm_runtime_get_sync(dev->base.dev);
>
> I haven't looked at your PM design, but this generally looks quite
> dangerous with respect to reclaim. For example, if your PM resume paths
> allocate memory or take locks that allocate memory underneath, you're
> likely to run into issues.
>
> A better approach would be to attach a PM reference to your job upon
> creation and release it upon job destruction. That would be safer and
> save you headaches in the long run.
Our PM is nothing more than clock enable/disable and register init.
If the runtime PM API doesn't work and needs special driver wrappers,
then I'm inclined to just not use it and manage clocks directly (as
that's all it is doing).
>
> This is what we do in Xe [1] [2].
>
> Also, in general, this driver has been reviewed (RB’d), but it's not
> great that I spotted numerous issues within just five minutes. I suggest
> taking a step back and thoroughly evaluating everything this driver is
> doing.
Well, if it is hard to get simple drivers right, then it's a problem
with the subsystem APIs IMO.
Rob
For retrieving a pointer to the struct dma_resv for a given GEM object. We
also introduce it in a new trait, BaseObjectPrivate, which we automatically
implement for all gem objects and don't expose to users outside of the
crate.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
---
rust/kernel/drm/gem/mod.rs | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/rust/kernel/drm/gem/mod.rs b/rust/kernel/drm/gem/mod.rs
index 981fbb931e952..760fcd61da0b7 100644
--- a/rust/kernel/drm/gem/mod.rs
+++ b/rust/kernel/drm/gem/mod.rs
@@ -199,6 +199,18 @@ fn create_mmap_offset(&self) -> Result<u64> {
impl<T: IntoGEMObject> BaseObject for T {}
+/// Crate-private base operations shared by all GEM object classes.
+#[expect(unused)]
+pub(crate) trait BaseObjectPrivate: IntoGEMObject {
+ /// Return a pointer to this object's dma_resv.
+ fn raw_dma_resv(&self) -> *mut bindings::dma_resv {
+ // SAFETY: `as_gem_obj()` always returns a valid pointer to the base DRM gem object
+ unsafe { (*self.as_raw()).resv }
+ }
+}
+
+impl<T: IntoGEMObject> BaseObjectPrivate for T {}
+
/// A base GEM object.
///
/// Invariants
--
2.51.0
The Arm Ethos-U65/85 NPUs are designed for edge AI inference
applications[0].
The driver works with Mesa Teflon. The Ethos support was merged on
10/15. The UAPI should also be compatible with the downstream (open
source) driver stack[2] and Vela compiler though that has not been
implemented.
Testing so far has been on i.MX93 boards with Ethos-U65 and a FVP model
with Ethos-U85. More work is needed in mesa for handling U85 command
stream differences, but that doesn't affect the UAPI.
A git tree is here[3].
Rob
[0] https://www.arm.com/products/silicon-ip-cpu?families=ethos%20npus
[2] https://gitlab.arm.com/artificial-intelligence/ethos-u/
[3] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git ethos-v5
Signed-off-by: Rob Herring (Arm) <robh(a)kernel.org>
---
Changes in v5:
- Rework Runtime PM init in probe
- Use __free() cleanups where possible
- Use devm_mutex_init()
- Handle U85 NPU_SET_WEIGHT2_BASE and NPU_SET_WEIGHT2_LENGTH
- Link to v4: https://lore.kernel.org/r/20251015-ethos-v4-0-81025a3dcbf3@kernel.org
Changes in v4:
- Use bulk clk API
- Various whitespace fixes mostly due to ethos->ethosu rename
- Drop error check on dma_set_mask_and_coherent()
- Drop unnecessary pm_runtime_mark_last_busy() call
- Move variable declarations out of switch (a riscv/clang build failure)
- Use lowercase hex in all defines
- Drop unused ethosu_device.coherent member
- Add comments on all locks
- Link to v3: https://lore.kernel.org/r/20250926-ethos-v3-0-6bd24373e4f5@kernel.org
Changes in v3:
- Rework and improve job submit validation
- Rename ethos to ethosu. There was an Ethos-Nxx that's unrelated.
- Add missing init for sched_lock mutex
- Drop some prints to debug level
- Fix i.MX93 SRAM accesses (AXI config)
- Add U85 AXI configuration and test on FVP with U85
- Print the current cmd value on timeout
- Link to v2: https://lore.kernel.org/r/20250811-ethos-v2-0-a219fc52a95b@kernel.org
Changes in v2:
- Rebase on v6.17-rc1 adapting to scheduler changes
- scheduler: Drop the reset workqueue. According to the scheduler docs,
we don't need it since we have a single h/w queue.
- scheduler: Rework the timeout handling to continue running if we are
making progress. Fixes timeouts on larger jobs.
- Reset the NPU on resume so it's in a known state
- Add error handling on clk_get() calls
- Fix drm_mm splat on module unload. We were missing a put on the
cmdstream BO in the scheduler clean-up.
- Fix 0-day report needing explicit bitfield.h include
- Link to v1: https://lore.kernel.org/r/20250722-ethos-v1-0-cc1c5a0cbbfb@kernel.org
---
Rob Herring (Arm) (2):
dt-bindings: npu: Add Arm Ethos-U65/U85
accel: Add Arm Ethos-U NPU driver
.../devicetree/bindings/npu/arm,ethos.yaml | 79 +++
MAINTAINERS | 9 +
drivers/accel/Kconfig | 1 +
drivers/accel/Makefile | 1 +
drivers/accel/ethosu/Kconfig | 10 +
drivers/accel/ethosu/Makefile | 4 +
drivers/accel/ethosu/ethosu_device.h | 195 ++++++
drivers/accel/ethosu/ethosu_drv.c | 403 ++++++++++++
drivers/accel/ethosu/ethosu_drv.h | 15 +
drivers/accel/ethosu/ethosu_gem.c | 704 +++++++++++++++++++++
drivers/accel/ethosu/ethosu_gem.h | 46 ++
drivers/accel/ethosu/ethosu_job.c | 540 ++++++++++++++++
drivers/accel/ethosu/ethosu_job.h | 41 ++
include/uapi/drm/ethosu_accel.h | 261 ++++++++
14 files changed, 2309 insertions(+)
---
base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787
change-id: 20250715-ethos-3fdd39ef6f19
Best regards,
--
Rob Herring (Arm) <robh(a)kernel.org>
On Thu, 25 Sep 2025 17:30:33 +0530, Jyothi Kumar Seerapu wrote:
> The I2C driver gets an interrupt upon transfer completion.
> When handling multiple messages in a single transfer, this
> results in N interrupts for N messages, leading to significant
> software interrupt latency.
>
> To mitigate this latency, utilize Block Event Interrupt (BEI)
> mechanism. Enabling BEI instructs the hardware to prevent interrupt
> generation and BEI is disabled when an interrupt is necessary.
>
> [...]
Applied, thanks!
[1/2] dmaengine: qcom: gpi: Add GPI Block event interrupt support
commit: 4e8331317e73902e8b2663352c8766227e633901
[2/2] i2c: i2c-qcom-geni: Add Block event interrupt support
commit: 398035178503bf662281bbffb4bebce1460a4bc5
Best regards,
--
~Vinod
On Wed, Oct 15, 2025 at 4:02 PM Frank Li <Frank.li(a)nxp.com> wrote:
>
> On Wed, Oct 15, 2025 at 03:36:05PM -0500, Rob Herring wrote:
> > On Wed, Oct 15, 2025 at 2:39 PM Frank Li <Frank.li(a)nxp.com> wrote:
> > >
> > > On Wed, Oct 15, 2025 at 12:47:40PM -0500, Rob Herring (Arm) wrote:
> > > > Add a driver for Arm Ethos-U65/U85 NPUs. The Ethos-U NPU has a
> > > > relatively simple interface with single command stream to describe
> > > > buffers, operation settings, and network operations. It supports up to 8
> > > > memory regions (though no h/w bounds on a region). The Ethos NPUs
> > > > are designed to use an SRAM for scratch memory. Region 2 is reserved
> > > > for SRAM (like the downstream driver stack and compiler). Userspace
> > > > doesn't need access to the SRAM.
> > > > +static int ethosu_init(struct ethosu_device *ethosudev)
> > > > +{
> > > > + int ret;
> > > > + u32 id, config;
> > > > +
> > > > + ret = devm_pm_runtime_enable(ethosudev->base.dev);
> > > > + if (ret)
> > > > + return ret;
> > > > +
> > > > + ret = pm_runtime_resume_and_get(ethosudev->base.dev);
> > > > + if (ret)
> > > > + return ret;
> > > > +
> > > > + pm_runtime_set_autosuspend_delay(ethosudev->base.dev, 50);
> > > > + pm_runtime_use_autosuspend(ethosudev->base.dev);
> > > > +
> > > > + /* If PM is disabled, we need to call ethosu_device_resume() manually. */
> > > > + if (!IS_ENABLED(CONFIG_PM)) {
> > > > + ret = ethosu_device_resume(ethosudev->base.dev);
> > > > + if (ret)
> > > > + return ret;
> > > > + }
> > >
> > > I think it should call ethosu_device_resume() unconditional before
> > > devm_pm_runtime_enable();
> > >
> > > ethosu_device_resume();
> > > pm_runtime_set_active();
> > > pm_runtime_set_autosuspend_delay(ethosudev->base.dev, 50);
> > > devm_pm_runtime_enable();
> >
> > Why do you think this? Does this do a get?
> >
> > I don't think it is good to call the resume hook on our own, but we
> > have no choice with !CONFIG_PM. With CONFIG_PM, we should only use the
> > pm_runtime API.
>
> Enable clock and do some init work at probe() is quite common. But I never
> seen IS_ENABLED(CONFIG_PM) check. It is quite weird and not necessary to
> check CONFIG_PM flags. The most CONFIG_PM is enabled, so the branch !CONFIG_PM
> almost never tested.
Okay, I get what you meant.
>
> probe()
> {
> devm_clk_bulk_get_all_enabled();
>
> ... did some init work
>
> pm_runtime_set_active();
> devm_pm_runtime_enable();
>
> ...
> pm_runtime_put_autosuspend(ethosudev->base.dev);
> }
I think we still need a pm_runtime_get_noresume() in here since we do
a put later on. Here's what I have now:
ret = ethosu_device_resume(ethosudev->base.dev);
if (ret)
return ret;
pm_runtime_set_autosuspend_delay(ethosudev->base.dev, 50);
pm_runtime_use_autosuspend(ethosudev->base.dev);
ret = devm_pm_runtime_set_active_enabled(ethosudev->base.dev);
if (ret)
return ret;
pm_runtime_get_noresume(ethosudev->base.dev);
Rob
On 03-10-25, 20:50, Andi Shyti wrote:
> On Thu, Sep 25, 2025 at 05:30:35PM +0530, Jyothi Kumar Seerapu wrote:
> > From: Jyothi Kumar Seerapu <quic_jseerapu(a)quicinc.com>
> >
> > The I2C driver gets an interrupt upon transfer completion.
> > When handling multiple messages in a single transfer, this
> > results in N interrupts for N messages, leading to significant
> > software interrupt latency.
> >
> > To mitigate this latency, utilize Block Event Interrupt (BEI)
> > mechanism. Enabling BEI instructs the hardware to prevent interrupt
> > generation and BEI is disabled when an interrupt is necessary.
> >
> > Large I2C transfer can be divided into chunks of messages internally.
> > Interrupts are not expected for the messages for which BEI bit set,
> > only the last message triggers an interrupt, indicating the completion of
> > N messages. This BEI mechanism enhances overall transfer efficiency.
> >
> > BEI optimizations are currently implemented for I2C write transfers only,
> > as there is no use case for multiple I2C read messages in a single transfer
> > at this time.
> >
> > Signed-off-by: Jyothi Kumar Seerapu <quic_jseerapu(a)quicinc.com>
>
> Because this series is touching multiple subsystems, I'm going to
> ack it:
>
> Acked-by: Andi Shyti <andi.shyti(a)kernel.org>
>
> We are waiting for someone from DMA to ack it (Vinod or Sinan).
Thanks, I will pick it with your ack
--
~Vinod
On Thu, Oct 16, 2025 at 02:09:12AM +0000, Kriish Sharma wrote:
> diff --git a/Documentation/userspace-api/dma-buf-heaps.rst b/Documentation/userspace-api/dma-buf-heaps.rst
> index a0979440d2a4..c0035dc257e0 100644
> --- a/Documentation/userspace-api/dma-buf-heaps.rst
> +++ b/Documentation/userspace-api/dma-buf-heaps.rst
> @@ -26,6 +26,7 @@ following heaps:
> ``DMABUF_HEAPS_CMA_LEGACY`` Kconfig option is set, a duplicate node is
> created following legacy naming conventions; the legacy name might be
> ``reserved``, ``linux,cma``, or ``default-pool``.
> +
> Naming Convention
> =================
>
LGTM, thanks!
Reviewed-by: Bagas Sanjaya <bagasdotme(a)gmail.com>
--
An old man doll... just what I always wanted! - Clara
On Wed, Oct 15, 2025 at 09:09:53PM -0700, Nicolin Chen wrote:
> Hi Leon,
>
> On Mon, Oct 13, 2025 at 06:26:10PM +0300, Leon Romanovsky wrote:
> > @@ -2090,6 +2092,9 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
> > INIT_LIST_HEAD(&vdev->dummy_resources_list);
> > INIT_LIST_HEAD(&vdev->ioeventfds_list);
> > INIT_LIST_HEAD(&vdev->sriov_pfs_item);
> > + ret = pcim_p2pdma_init(vdev->pdev);
> > + if (ret != -EOPNOTSUPP)
> > + return ret;
> > init_rwsem(&vdev->memory_lock);
> > xa_init(&vdev->ctx);
>
> I think this should be:
> if (ret && ret != -EOPNOTSUPP)
> return ret;
>
> Otherwise, init_rwsem() and xa_init() would be missed if ret==0.
You absolutely right.
Thanks
>
> Thanks
> Nicolin
>
On Wed, Oct 15, 2025 at 06:34:02PM +0000, Kriish Sharma wrote:
> Fixes: 1fdbb3ff1233 ("Add linux-next specific files for 20251015")
The correct blamed fixes should've been:
Fixes: 507211e3c7a1 ("Documentation: dma-buf: heaps: Add naming guidelines")
Thanks.
--
An old man doll... just what I always wanted! - Clara