On Fri, 2025-10-17 at 14:28 -0700, Matthew Brost wrote:
> On Fri, Oct 17, 2025 at 11:31:47AM +0200, Philipp Stanner wrote:
> > It seems that DMA_FENCE_FLAG_SEQNO64_BIT has no real effects anymore,
> > since seqno is a u64 everywhere.
> >
> > Remove the unneeded flag.
> >
> > Signed-off-by: Philipp Stanner <phasta(a)kernel.org>
> > ---
> > Seems to me that this flag doesn't really do anything anymore?
> >
> > I *suspect* that it could be that some drivers pass a u32 to
> > dma_fence_init()? I guess they could be ported, couldn't they.
> >
>
> Xe uses 32-bit hardware fence sequence numbers—see [1] and [2]. We could
> switch to 64-bit hardware fence sequence numbers, but that would require
> changes on the driver side. If you sent this to our CI, I’m fairly
> certain we’d see a bunch of failures. I suspect this would also break
> several other drivers.
What exactly breaks? Help me out here; if you pass a u32 for a u64,
doesn't the C standard guarantee that the higher, unused 32 bits will
be 0?
Because the only thing the flag still controls is the lower_32_bits()
comparison in __dma_fence_is_later().
P.
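For illustration, a standalone sketch (userspace C, values made up) of what
happens once a 32-bit hardware seqno wraps: the plain 64-bit compare stops
reporting new fences as later, while the masked signed-difference check that
drivers without the flag get today keeps working as long as the two fences
are less than 2^31 apart.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t completed = 0xffffffffULL; /* last signaled seqno, right before the wrap */
	uint64_t waiting   = 0x0ULL;        /* seqno emitted just after the wrap */

	/* Plain 64-bit compare, what the patch would leave behind: false,
	 * the post-wrap fence is never considered "later". */
	bool later_u64 = waiting > completed;

	/* Masked signed difference, what __dma_fence_is_later() does today
	 * for fences without the SEQNO64 flag: (int32_t)(0 - 0xffffffff) == 1,
	 * so the wrap is handled. */
	bool later_u32 = (int32_t)((uint32_t)waiting - (uint32_t)completed) > 0;

	printf("u64 compare: %d, masked 32-bit compare: %d\n", later_u64, later_u32);
	return 0;
}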
>
> As I mentioned, all Xe-supported platforms could be updated since their
> rings support 64-bit store instructions. However, I suspect that very
> old i915 platforms don’t support such instructions in the ring. I agree
> this is a legacy issue, and we should probably use 64-bit sequence
> numbers in Xe. But again, platforms and drivers that are decades old
> might break as a result.
>
> Matt
>
> [1] https://elixir.bootlin.com/linux/v6.17.1/source/drivers/gpu/drm/xe/xe_hw_fe…
> [2] https://elixir.bootlin.com/linux/v6.17.1/source/drivers/gpu/drm/xe/xe_hw_fe…
>
> > P.
> > ---
> > drivers/dma-buf/dma-fence.c | 3 +--
> > include/linux/dma-fence.h | 10 +---------
> > 2 files changed, 2 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > index 3f78c56b58dc..24794c027813 100644
> > --- a/drivers/dma-buf/dma-fence.c
> > +++ b/drivers/dma-buf/dma-fence.c
> > @@ -1078,8 +1078,7 @@ void
> > dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops,
> > spinlock_t *lock, u64 context, u64 seqno)
> > {
> > - __dma_fence_init(fence, ops, lock, context, seqno,
> > - BIT(DMA_FENCE_FLAG_SEQNO64_BIT));
> > + __dma_fence_init(fence, ops, lock, context, seqno, 0);
> > }
> > EXPORT_SYMBOL(dma_fence_init64);
> >
> > diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> > index 64639e104110..4eca2db28625 100644
> > --- a/include/linux/dma-fence.h
> > +++ b/include/linux/dma-fence.h
> > @@ -98,7 +98,6 @@ struct dma_fence {
> > };
> >
> > enum dma_fence_flag_bits {
> > - DMA_FENCE_FLAG_SEQNO64_BIT,
> > DMA_FENCE_FLAG_SIGNALED_BIT,
> > DMA_FENCE_FLAG_TIMESTAMP_BIT,
> > DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> > @@ -470,14 +469,7 @@ dma_fence_is_signaled(struct dma_fence *fence)
> > */
> > static inline bool __dma_fence_is_later(struct dma_fence *fence, u64 f1, u64 f2)
> > {
> > - /* This is for backward compatibility with drivers which can only handle
> > - * 32bit sequence numbers. Use a 64bit compare when the driver says to
> > - * do so.
> > - */
> > - if (test_bit(DMA_FENCE_FLAG_SEQNO64_BIT, &fence->flags))
> > - return f1 > f2;
> > -
> > - return (int)(lower_32_bits(f1) - lower_32_bits(f2)) > 0;
> > + return f1 > f2;
> > }
> >
> > /**
> > --
> > 2.49.0
> >
From: Matthew Auld <matthew.auld(a)intel.com>
[ Upstream commit edb1745fc618ba8ef63a45ce3ae60de1bdf29231 ]
Since the dma-resv is shared we don't need to reserve a fence slot and add
the fence twice, and there is no need to walk the same dependencies twice.
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt(a)intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Link: https://lore.kernel.org/r/20250829164715.720735-2-matthew.auld@intel.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES
Explanation
- What it fixes
- Removes redundant dma-resv operations when a backup BO shares the
same reservation object as the original BO, preventing the same
fence from being reserved/added twice to the same `dma_resv`.
- Avoids scanning the same dependency set twice when source and
destination BOs share the same `dma_resv`.
- Why the change is correct
- The backup object is created to share the parent’s reservation
object, so a single reserve/add is sufficient:
- The backup BO is initialized with the parent’s resv:
`drivers/gpu/drm/xe/xe_bo.c:1309` (`xe_bo_init_locked(...,
bo->ttm.base.resv, ...)`), ensuring `bo->ttm.base.resv ==
backup->ttm.base.resv`.
- The patch adds an explicit invariant check to document and enforce
this: `drivers/gpu/drm/xe/xe_bo.c:1225` (`xe_assert(xe,
bo->ttm.base.resv == backup->ttm.base.resv)`).
- With shared `dma_resv`, adding the same fence twice is at best
redundant (wasting fence slots and memory) and at worst error-prone.
Reserving fence slots only once and adding the fence once is the
correct behavior.
- Specific code changes and effects
- Evict path (GPU migration copy case):
- Before: reserves and adds fence on both `bo->ttm.base.resv` and
`backup->ttm.base.resv`.
- After: reserves and adds exactly once, guarded by the shared-resv
assertion.
- See single reserve and add: `drivers/gpu/drm/xe/xe_bo.c:1226`
(reserve) and `drivers/gpu/drm/xe/xe_bo.c:1237` (add fence). This
is the core fix; the removed second reserve/add on the backup is
the redundant part eliminated.
- Restore path (migration copy back):
- Same simplification: reserve once, add once on the shared
`dma_resv`.
- See single reserve and add: `drivers/gpu/drm/xe/xe_bo.c:1375`
(reserve) and `drivers/gpu/drm/xe/xe_bo.c:1387` (add fence).
- Dependency handling in migrate:
- Before: added deps for both src and dst based only on `src_bo !=
dst_bo`.
- After: only add dst deps if the resv objects differ, avoiding
double-walking the same `dma_resv`.
- See updated condition: `drivers/gpu/drm/xe/xe_migrate.c:932`
(`src_bo->ttm.base.resv != dst_bo->ttm.base.resv`).
- User-visible impact without the patch
- Duplicate `dma_resv_add_fence()` calls on the same reservation
object can:
- Consume extra shared-fence slots and memory.
- Inflate dependency lists, causing unnecessary scheduler waits and
overhead.
- Increase failure likelihood of `dma_resv_reserve_fences()` under
memory pressure.
- These paths are exercised during suspend/resume flows of pinned VRAM
BOs (evict/restore), so reliability and performance in power
transitions can be affected.
- Scope and risk
- Small, focused changes localized to the Intel Xe driver
migration/evict/restore paths:
- Files: `drivers/gpu/drm/xe/xe_bo.c`,
`drivers/gpu/drm/xe/xe_migrate.c`.
- No API changes or architectural refactors; logic strictly reduces
redundant operations.
- The `xe_assert` acts as a safety net to catch unexpected non-shared
`resv` usage; normal runtime behavior is unchanged when the
invariant holds.
- The CPU copy fallback paths are untouched.
- Stable backport considerations
- This is a clear correctness and robustness fix, not a feature.
- Low regression risk if the stable branch also creates the backup BO
with the parent’s `dma_resv` (as shown by the use of
`xe_bo_init_locked(..., bo->ttm.base.resv, ...)` in
`drivers/gpu/drm/xe/xe_bo.c:1309`).
- If a stable branch diverges and the backup BO does not share the
resv, this patch would need adjustment (i.e., keep dual reserve/add
in that case). The added `xe_assert` helps surface such mismatches
during testing.
Conclusion: This commit fixes a real bug (duplicate fence reserve/add
and duplicate dependency scanning on a shared `dma_resv`) with a
minimal, well-scoped change. It aligns with stable rules (important
bugfix, low risk, contained), so it should be backported.
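For orientation, the patched evict path condenses to roughly the following (a
sketch assembled from the hunks below, not the literal function body):

	/* bo and backup were created to share one reservation object. */
	xe_assert(xe, bo->ttm.base.resv == backup->ttm.base.resv);

	/* One slot is enough: the single fence added below covers both BOs. */
	ret = dma_resv_reserve_fences(bo->ttm.base.resv, 1);
	if (ret)
		goto out_backup;

	fence = xe_migrate_copy(migrate, bo, backup, bo->ttm.resource,
				backup->ttm.resource, false);
	if (IS_ERR(fence)) {
		ret = PTR_ERR(fence);
		goto out_backup;
	}

	/* Previously duplicated on backup->ttm.base.resv, now added once. */
	dma_resv_add_fence(bo->ttm.base.resv, fence, DMA_RESV_USAGE_KERNEL);
	dma_fence_put(fence);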
drivers/gpu/drm/xe/xe_bo.c | 13 +------------
drivers/gpu/drm/xe/xe_migrate.c | 2 +-
2 files changed, 2 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index d07e23eb1a54d..5a61441d68af5 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1242,14 +1242,11 @@ int xe_bo_evict_pinned(struct xe_bo *bo)
else
migrate = mem_type_to_migrate(xe, bo->ttm.resource->mem_type);
+ xe_assert(xe, bo->ttm.base.resv == backup->ttm.base.resv);
ret = dma_resv_reserve_fences(bo->ttm.base.resv, 1);
if (ret)
goto out_backup;
- ret = dma_resv_reserve_fences(backup->ttm.base.resv, 1);
- if (ret)
- goto out_backup;
-
fence = xe_migrate_copy(migrate, bo, backup, bo->ttm.resource,
backup->ttm.resource, false);
if (IS_ERR(fence)) {
@@ -1259,8 +1256,6 @@ int xe_bo_evict_pinned(struct xe_bo *bo)
dma_resv_add_fence(bo->ttm.base.resv, fence,
DMA_RESV_USAGE_KERNEL);
- dma_resv_add_fence(backup->ttm.base.resv, fence,
- DMA_RESV_USAGE_KERNEL);
dma_fence_put(fence);
} else {
ret = xe_bo_vmap(backup);
@@ -1338,10 +1333,6 @@ int xe_bo_restore_pinned(struct xe_bo *bo)
if (ret)
goto out_unlock_bo;
- ret = dma_resv_reserve_fences(backup->ttm.base.resv, 1);
- if (ret)
- goto out_unlock_bo;
-
fence = xe_migrate_copy(migrate, backup, bo,
backup->ttm.resource, bo->ttm.resource,
false);
@@ -1352,8 +1343,6 @@ int xe_bo_restore_pinned(struct xe_bo *bo)
dma_resv_add_fence(bo->ttm.base.resv, fence,
DMA_RESV_USAGE_KERNEL);
- dma_resv_add_fence(backup->ttm.base.resv, fence,
- DMA_RESV_USAGE_KERNEL);
dma_fence_put(fence);
} else {
ret = xe_bo_vmap(backup);
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 2a627ed64b8f8..ba9b8590eccb2 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -901,7 +901,7 @@ struct dma_fence *xe_migrate_copy(struct xe_migrate *m,
if (!fence) {
err = xe_sched_job_add_deps(job, src_bo->ttm.base.resv,
DMA_RESV_USAGE_BOOKKEEP);
- if (!err && src_bo != dst_bo)
+ if (!err && src_bo->ttm.base.resv != dst_bo->ttm.base.resv)
err = xe_sched_job_add_deps(job, dst_bo->ttm.base.resv,
DMA_RESV_USAGE_BOOKKEEP);
if (err)
--
2.51.0
Add a helper for retrieving a pointer to the struct dma_resv for a given GEM
object. We introduce it in a new trait, BaseObjectPrivate, which we implement
automatically for all GEM objects and do not expose to users outside of the
crate.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
---
rust/kernel/drm/gem/mod.rs | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/rust/kernel/drm/gem/mod.rs b/rust/kernel/drm/gem/mod.rs
index 32bff2e8463f4..67813cfb0db42 100644
--- a/rust/kernel/drm/gem/mod.rs
+++ b/rust/kernel/drm/gem/mod.rs
@@ -200,6 +200,18 @@ fn create_mmap_offset(&self) -> Result<u64> {
impl<T: IntoGEMObject> BaseObject for T {}
+/// Crate-private base operations shared by all GEM object classes.
+#[expect(unused)]
+pub(crate) trait BaseObjectPrivate: IntoGEMObject {
+ /// Return a pointer to this object's dma_resv.
+ fn raw_dma_resv(&self) -> *mut bindings::dma_resv {
+ // SAFETY: `as_raw()` always returns a valid pointer to the base DRM gem object
+ unsafe { (*self.as_raw()).resv }
+ }
+}
+
+impl<T: IntoGEMObject> BaseObjectPrivate for T {}
+
/// A base GEM object.
///
/// Invariants
--
2.51.0
On Tue, 14 Oct 2025 16:26:06 +0530 Meghana Malladi wrote:
> This series adds AF_XDP zero copy support to the icssg driver.
>
> Tests were performed on AM64x-EVM with xdpsock application [1].
>
> A clear improvement is seen in transmit (txonly) and receive (rxdrop)
> for 64-byte packets. The 1500-byte test seems to be limited by line
> rate (1G link), so no improvement is seen there in packet rate.
>
> I am having some issue with l2fwd, as the benchmarking numbers show 0
> for 64-byte packets after forwarding the first batch of packets, and I am
> currently looking into it.
This series stopped applying, could you please respin?
--
pw-bot: cr
On Sat, Oct 18, 2025 at 12:42:30AM -0700, Matthew Brost wrote:
> On Fri, Oct 17, 2025 at 11:43:51PM -0700, Matthew Brost wrote:
> > On Fri, Oct 17, 2025 at 10:37:46AM -0500, Rob Herring wrote:
> > > On Thu, Oct 16, 2025 at 11:25:34PM -0700, Matthew Brost wrote:
> > > > On Thu, Oct 16, 2025 at 04:06:05PM -0500, Rob Herring (Arm) wrote:
> > > > > Add a driver for Arm Ethos-U65/U85 NPUs. The Ethos-U NPU has a
> > > > > relatively simple interface with a single command stream to describe
> > > > > buffers, operation settings, and network operations. It supports up to 8
> > > > > memory regions (though no h/w bounds on a region). The Ethos NPUs
> > > > > are designed to use an SRAM for scratch memory. Region 2 is reserved
> > > > > for SRAM (like the downstream driver stack and compiler). Userspace
> > > > > doesn't need access to the SRAM.
> > >
> > > Thanks for the review.
> > >
> > > [...]
> > >
> > > > > +static struct dma_fence *ethosu_job_run(struct drm_sched_job *sched_job)
> > > > > +{
> > > > > + struct ethosu_job *job = to_ethosu_job(sched_job);
> > > > > + struct ethosu_device *dev = job->dev;
> > > > > + struct dma_fence *fence = NULL;
> > > > > + int ret;
> > > > > +
> > > > > + if (unlikely(job->base.s_fence->finished.error))
> > > > > + return NULL;
> > > > > +
> > > > > + fence = ethosu_fence_create(dev);
> > > >
> > > > Another reclaim issue: ethosu_fence_create allocates memory using
> > > > GFP_KERNEL. Since we're already in the DMA fence signaling path
> > > > (reclaim), this can lead to a deadlock.
> > > >
> > > > Without too much thought, you likely want to move this allocation to
> > > > ethosu_job_do_push, but before taking dev->sched_lock or calling
> > > > drm_sched_job_arm.
> > > >
> > > > We really should fix the DRM scheduler work queue to be tainted with
> > > > reclaim. If I recall correctly, we'd need to update the work queue
> > > > layer. Let me look into that—I've seen this type of bug several times,
> > > > and lockdep should be able to catch it.
> > >
> > > Likely the rocket driver suffers from the same issues...
> > >
> >
> > I am not surprised by this statement.
> >
> > > >
> > > > > + if (IS_ERR(fence))
> > > > > + return fence;
> > > > > +
> > > > > + if (job->done_fence)
> > > > > + dma_fence_put(job->done_fence);
> > > > > + job->done_fence = dma_fence_get(fence);
> > > > > +
> > > > > + ret = pm_runtime_get_sync(dev->base.dev);
> > > >
> > > > I haven't looked at your PM design, but this generally looks quite
> > > > dangerous with respect to reclaim. For example, if your PM resume paths
> > > > allocate memory or take locks that allocate memory underneath, you're
> > > > likely to run into issues.
> > > >
> > > > A better approach would be to attach a PM reference to your job upon
> > > > creation and release it upon job destruction. That would be safer and
> > > > save you headaches in the long run.
> > >
> > > Our PM is nothing more than clock enable/disable and register init.
> > >
> > > If the runtime PM API doesn't work and needs special driver wrappers,
> > > then I'm inclined to just not use it and manage clocks directly (as
> > > that's all it is doing).
> > >
> >
> > Yes, then you’re probably fine. More complex drivers can do all sorts of
> > things during a PM wake, which is why PM wakes should generally be the
> > outermost layer. I still suggest, to future-proof your code, that you
> > move the PM reference to an outer layer.
> >
>
> Also, taking a PM reference in a function call — as opposed to tying it
> to an object's lifetime — is risky. It can quickly lead to imbalances in
> PM references if things go sideways or function calls become unbalanced.
> Depending on how your driver uses the DRM scheduler, this seems like a
> real possibility.
>
> Matt
>
> > > >
> > > > This is what we do in Xe [1] [2].
> > > >
> > > > Also, in general, this driver has been reviewed (RB’d), but it's not
> > > > great that I spotted numerous issues within just five minutes. I suggest
> > > > taking a step back and thoroughly evaluating everything this driver is
> > > > doing.
> > >
> > > Well, if it is hard to get simple drivers right, then it's a problem
> > > with the subsystem APIs IMO.
> > >
> >
> > Yes, agreed. We should have assertions and lockdep annotations in place
> > to catch driver-side misuses. This is the second driver I’ve randomly
> > looked at over the past year that has broken DMA fencing and reclaim
> > rules. I’ll take an action item to fix this in the DRM scheduler, but
> > I’m afraid I’ll likely break multiple drivers in the process, as the
> > misuse checks / lockdep will complain.
I've posted a series [1] for the DRM scheduler which will complain about the
things I've pointed out here.
Matt
[1] https://patchwork.freedesktop.org/series/156284/
> >
> > Matt
> >
> > > Rob
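To make the suggestions above concrete, a rough sketch of that structure
follows. The ethosu_* names are stand-ins modeled on the snippets quoted
above (ethosu_start_job_on_hw() and ethosu_job_free() in particular are made
up), not the actual driver code: the fence allocation and the runtime-PM
reference move to the push path, before drm_sched_job_arm(), and the PM
reference is dropped when the job is destroyed, so run_job() neither
allocates memory nor wakes the device from the fence-signalling path.

static int ethosu_job_do_push(struct ethosu_job *job)
{
	int ret;

	/* Allocate the hw fence up front, while sleeping allocations are
	 * still allowed: we are not yet in the fence-signalling path. */
	job->done_fence = ethosu_fence_create(job->dev);
	if (IS_ERR(job->done_fence))
		return PTR_ERR(job->done_fence);

	/* Tie the PM reference to the job's lifetime instead of run_job(). */
	ret = pm_runtime_resume_and_get(job->dev->base.dev);
	if (ret < 0) {
		dma_fence_put(job->done_fence);
		return ret;
	}

	drm_sched_job_arm(&job->base);
	drm_sched_entity_push_job(&job->base);
	return 0;
}

static struct dma_fence *ethosu_job_run(struct drm_sched_job *sched_job)
{
	struct ethosu_job *job = to_ethosu_job(sched_job);

	/* Fence-signalling path: no allocations, no PM wake. Just start the
	 * hardware and hand the scheduler a reference to the fence that was
	 * allocated at push time. */
	ethosu_start_job_on_hw(job);
	return dma_fence_get(job->done_fence);
}

static void ethosu_job_free(struct ethosu_job *job)
{
	/* Balanced with the get in ethosu_job_do_push(). */
	pm_runtime_put(job->dev->base.dev);
	dma_fence_put(job->done_fence);
}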
dma-buf: improve dma_buf_show_fdinfo output
Improve the readability of /proc/<pid>/fdinfo output for DMA-BUF by
including file flags and ensuring consistent format specifiers for size
and other fields.
This patch also fixes incorrect format specifiers and removes references
to obsolete struct members (num_attachments and num_mappings) that no
longer exist in the DMA-BUF framework.
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202510220802.svbgdYsJ-lkp@intel.com/
---
drivers/dma-buf/dma-buf.c | 26 ++++++++++++--------------
1 file changed, 12 insertions(+), 14 deletions(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 1c0035601c4f..4541f8ec5d62 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -571,24 +571,22 @@ static long dma_buf_ioctl(struct file *file,
}
}
-static void dma_buf_show_fdinfo(struct seq_file *s, struct file *f)
+static void dma_buf_show_fdinfo(struct seq_file *s, struct file *file)
{
- struct dma_buf *dmabuf = f->private_data;
+ struct dma_buf *dmabuf;
- if (!dmabuf)
- return;
+ dmabuf = file->private_data;
+ if (!dmabuf)
+ return;
- seq_printf(s, "flags:\t%lu\n", f->f_flags);
- seq_printf(s, "size:\t%llu\n", dmabuf->size);
- seq_printf(s, "count:\t%ld\n", file_count(dmabuf->file) - 1);
- seq_printf(s, "attachments:\t%d\n", atomic_read(&dmabuf->num_attachments));
- seq_printf(s, "mappings:\t%d\n", atomic_read(&dmabuf->num_mappings));
- seq_printf(s, "exp_name:\t%s\n", dmabuf->exp_name ? dmabuf->exp_name : "N/A");
+ seq_printf(s, "size:\t%zu\n", dmabuf->size);
+ seq_printf(s, "count:\t%ld\n", file_count(dmabuf->file) - 1);
+ seq_printf(s, "exp_name:\t%s\n", dmabuf->exp_name ? dmabuf->exp_name : "N/A");
- spin_lock(&dmabuf->name_lock);
- if (dmabuf->name)
- seq_printf(s, "name:\t%s\n", dmabuf->name);
- spin_unlock(&dmabuf->name_lock);
+ spin_lock(&dmabuf->name_lock);
+ if (dmabuf->name)
+ seq_printf(s, "name:\t%s\n", dmabuf->name);
+ spin_unlock(&dmabuf->name_lock);
}
--
2.43.0