On Sun, 09 Nov 2025 23:37:57 +0200 Roger Quadros wrote:
> In am65_cpsw_nuss_rx_poll() there is a possibility that the irq_disabled
> flag is cleared but the IRQ is not enabled.
>
> This patch fixes that by clearing the irq_disabled flag right when
> enabling the IRQ.
>
> Fixes: da70d184a8c3 ("net: ethernet: ti: am65-cpsw: Introduce multi queue Rx")
> Signed-off-by: Roger Quadros <rogerq(a)kernel.org>
This looks independent from the series; it needs to go to net.
--
pw-bot: cr
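For context, a minimal sketch of the ordering the commit message above describes: the irq_disabled flag is cleared only at the point the IRQ is actually re-enabled, so the two cannot get out of sync. The struct layout and the rx_flow_process() helper are placeholders, not the actual am65-cpsw code:

#include <linux/interrupt.h>
#include <linux/netdevice.h>

/* Placeholder type; the real driver uses struct am65_cpsw_rx_flow etc. */
struct rx_flow {
        struct napi_struct napi;
        unsigned int irq;
        bool irq_disabled;
};

static int rx_poll_sketch(struct napi_struct *napi, int budget)
{
        struct rx_flow *flow = container_of(napi, struct rx_flow, napi);
        int done = rx_flow_process(flow, budget);  /* hypothetical rx work */

        if (done < budget && napi_complete_done(napi, done)) {
                /*
                 * Clear the flag together with enabling the IRQ, instead of
                 * clearing it earlier and possibly leaving the IRQ disabled.
                 */
                if (flow->irq_disabled) {
                        flow->irq_disabled = false;
                        enable_irq(flow->irq);
                }
        }
        return done;
}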
On Sun, 09 Nov 2025 23:37:50 +0200 Roger Quadros wrote:
> This series adds AF_XDP zero copy support to the am65-cpsw driver.
>
> Tests were performed on AM62x-sk with xdpsock application [1].
>
> A clear improvement is seen in 64 byte packets on Transmit (txonly)
> and receive (rxdrop).
> The 1500 byte test seems to be limited by line rate (1G link) so no
> improvement is seen there in packet rate. A test on a higher speed link
> (or a PHY-less setup) might be worthwhile.
>
> There is some issue during l2fwd with 64 byte packets and the benchmark
> results show 0. This issue needs to be debugged further.
> A 512 byte l2fwd test result has been added for comparison instead.
It appears that the drivers/net/ethernet/ti/am65-* files do not fall
under any MAINTAINERS entry. Please add one or extend the existing CPSW
entry as the first patch of the series.
On 11/10/25 21:42, Alex Williamson wrote:
> On Thu, 6 Nov 2025 16:16:45 +0200
> Leon Romanovsky <leon(a)kernel.org> wrote:
>
>> Changelog:
>> v7:
>> * Dropped restore_revoke flag and added vfio_pci_dma_buf_move
>> to reverse loop.
>> * Fixed spelling errors in documentation patch.
>> * Rebased on top of v6.18-rc3.
>> * Added include to stddef.h to vfio.h, to keep uapi header file independent.
>
> I think we're winding down on review comments. It'd be great to get
> p2pdma and dma-buf acks on this series. Otherwise it's been posted
> enough that we'll assume no objections. Thanks,
Already have it on my TODO list to take a closer look, but no idea when that will be.
This patch set is at place 4 or 5 on a rather long list of stuff to review/finish.
Christian.
>
> Alex
On (25/11/10 19:40), Andy Shevchenko wrote:
[..]
> + dev_dbg(smi_info->io.dev, "**%s: %ptSp\n", msg, &t);
Strictly speaking, this is not exactly equivalent to %lld.%9.9ld
or %lld.%6.6ld but I don't know if that's of any importance.
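For illustration, the kind of conversion being discussed, with the open-coded format on top and the %ptSp specifier proposed in Andy's series below it. The open-coded variant explicitly zero-pads tv_nsec to nine digits, which is the equivalence being questioned above; the ipmi_si context is abbreviated and ktime_get_ts64() is only used here to have a value to print:

        struct timespec64 t;

        ktime_get_ts64(&t);

        /* Open-coded variant being replaced. */
        dev_dbg(smi_info->io.dev, "**%s: %lld.%9.9ld\n", msg,
                (long long)t.tv_sec, t.tv_nsec);

        /* Proposed %ptSp specifier. */
        dev_dbg(smi_info->io.dev, "**%s: %ptSp\n", msg, &t);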
On Mon, Nov 10, 2025 at 01:05:34PM -0700, Alex Williamson wrote:
> On Thu, 6 Nov 2025 16:16:56 +0200
> Leon Romanovsky <leon(a)kernel.org> wrote:
>
> > From: Jason Gunthorpe <jgg(a)nvidia.com>
> >
> > Call vfio_pci_core_fill_phys_vec() with the proper physical ranges for the
> > synthetic BAR 2 and BAR 4 regions. Otherwise use the normal flow based on
> > the PCI bar.
> >
> > This demonstrates a DMABUF that follows the region info report to only
> > allow mapping parts of the region that are mmapable. Since the BAR is
> > power of two sized and the "CXL" region is just page aligned, there can
> > be a padding region at the end that is not mmapped or passed into the
> > DMABUF.
> >
> > The "CXL" ranges that are remapped into BAR 2 and BAR 4 areas are not PCI
> > MMIO, they actually run over the CXL-like coherent interconnect and for
> > the purposes of DMA behave identically to DRAM. We don't try to model this
> > distinction between true PCI BAR memory that takes a real PCI path and the
> > "CXL" memory that takes a different path in the p2p framework for now.
> >
> > Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
> > Tested-by: Alex Mastro <amastro(a)fb.com>
> > Tested-by: Nicolin Chen <nicolinc(a)nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
> > ---
> > drivers/vfio/pci/nvgrace-gpu/main.c | 56 +++++++++++++++++++++++++++++++++++++
> > 1 file changed, 56 insertions(+)
> >
> > diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> > index e346392b72f6..7d7ab2c84018 100644
> > --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> > +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
> > @@ -7,6 +7,7 @@
> > #include <linux/vfio_pci_core.h>
> > #include <linux/delay.h>
> > #include <linux/jiffies.h>
> > +#include <linux/pci-p2pdma.h>
> >
> > /*
> > * The device memory usable to the workloads running in the VM is cached
> > @@ -683,6 +684,54 @@ nvgrace_gpu_write(struct vfio_device *core_vdev,
> > return vfio_pci_core_write(core_vdev, buf, count, ppos);
> > }
> >
> > +static int nvgrace_get_dmabuf_phys(struct vfio_pci_core_device *core_vdev,
> > + struct p2pdma_provider **provider,
> > + unsigned int region_index,
> > + struct dma_buf_phys_vec *phys_vec,
> > + struct vfio_region_dma_range *dma_ranges,
> > + size_t nr_ranges)
> > +{
> > + struct nvgrace_gpu_pci_core_device *nvdev = container_of(
> > + core_vdev, struct nvgrace_gpu_pci_core_device, core_device);
> > + struct pci_dev *pdev = core_vdev->pdev;
> > +
> > + if (nvdev->resmem.memlength && region_index == RESMEM_REGION_INDEX) {
> > + /*
> > + * The P2P properties of the non-BAR memory is the same as the
> > + * BAR memory, so just use the provider for index 0. Someday
> > + * when CXL gets P2P support we could create CXLish providers
> > + * for the non-BAR memory.
> > + */
> > + *provider = pcim_p2pdma_provider(pdev, 0);
> > + if (!*provider)
> > + return -EINVAL;
> > + return vfio_pci_core_fill_phys_vec(phys_vec, dma_ranges,
> > + nr_ranges,
> > + nvdev->resmem.memphys,
> > + nvdev->resmem.memlength);
> > + } else if (region_index == USEMEM_REGION_INDEX) {
> > + /*
> > + * This is actually cachable memory and isn't treated as P2P in
> > + * the chip. For now we have no way to push cachable memory
> > + * through everything and the Grace HW doesn't care what caching
> > + * attribute is programmed into the SMMU. So use BAR 0.
> > + */
> > + *provider = pcim_p2pdma_provider(pdev, 0);
> > + if (!*provider)
> > + return -EINVAL;
> > + return vfio_pci_core_fill_phys_vec(phys_vec, dma_ranges,
> > + nr_ranges,
> > + nvdev->usemem.memphys,
> > + nvdev->usemem.memlength);
> > + }
> > + return vfio_pci_core_get_dmabuf_phys(core_vdev, provider, region_index,
> > + phys_vec, dma_ranges, nr_ranges);
> > +}
>
>
> Unless my eyes deceive, we could reduce the redundancy a bit:
>
> struct mem_region *mem_region = NULL;
>
> if (nvdev->resmem.memlength && region_index == RESMEM_REGION_INDEX) {
> /*
> * The P2P properties of the non-BAR memory is the same as the
> * BAR memory, so just use the provider for index 0. Someday
> * when CXL gets P2P support we could create CXLish providers
> * for the non-BAR memory.
> */
> mem_region = &nvdev->resmem;
> } else if (region_index == USEMEM_REGION_INDEX) {
> /*
> * This is actually cachable memory and isn't treated as P2P in
> * the chip. For now we have no way to push cachable memory
> * through everything and the Grace HW doesn't care what caching
> * attribute is programmed into the SMMU. So use BAR 0.
> */
> mem_region = &nvdev->usemem;
> }
>
> if (mem_region) {
> *provider = pcim_p2pdma_provider(pdev, 0);
> if (!*provider)
> return -EINVAL;
> return vfio_pci_core_fill_phys_vec(phys_vec, dma_ranges,
> nr_ranges,
> mem_region->memphys,
> mem_region->memlength);
> }
>
> return vfio_pci_core_get_dmabuf_phys(core_vdev, provider, region_index,
> phys_vec, dma_ranges, nr_ranges);
Yes, this will work too.
Thanks
>
> Thanks,
> Alex
On Mon, 10 Nov 2025 19:40:42 +0100
Andy Shevchenko <andriy.shevchenko(a)linux.intel.com> wrote:
> Use %ptSp instead of open coded variants to print content of
> struct timespec64 in human readable format.
>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
> ---
> kernel/trace/trace_output.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
Acked-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
-- Steve
Changelog:
v7:
* Dropped restore_revoke flag and added vfio_pci_dma_buf_move
to reverse loop.
* Fixed spelling errors in documentation patch.
* Rebased on top of v6.18-rc3.
* Added include to stddef.h to vfio.h, to keep uapi header file independent.
v6: https://patch.msgid.link/20251102-dmabuf-vfio-v6-0-d773cff0db9f@nvidia.com
* Fixed wrong error check from pcim_p2pdma_init().
* Documented pcim_p2pdma_provider() function.
* Improved commit messages.
* Added VFIO DMA-BUF selftest, not sent yet.
* Added __counted_by(nr_ranges) annotation to struct vfio_device_feature_dma_buf.
* Fixed error unwind when dma_buf_fd() fails.
* Document latest changes to p2pmem.
* Removed EXPORT_SYMBOL_GPL from pci_p2pdma_map_type.
* Moved DMA mapping logic to DMA-BUF.
* Removed types patch to avoid dependencies between subsystems.
* Moved vfio_pci_dma_buf_move() in err_undo block.
* Added nvgrace patch.
v5: https://lore.kernel.org/all/cover.1760368250.git.leon@kernel.org
* Rebased on top of v6.18-rc1.
* Added more validation logic to make sure that DMA-BUF length doesn't
overflow in various scenarios.
* Hide kernel config from the users.
* Fixed type conversion issue. DMA ranges are exposed with u64 length,
but DMA-BUF uses "unsigned int" as a length for SG entries.
* Added check to prevent VFIO drivers which report a BAR size different
  from PCI from using the DMA-BUF functionality.
v4: https://lore.kernel.org/all/cover.1759070796.git.leon@kernel.org
* Split pcim_p2pdma_provider() to two functions, one that initializes
array of providers and another to return right provider pointer.
v3: https://lore.kernel.org/all/cover.1758804980.git.leon@kernel.org
* Changed pcim_p2pdma_enable() to be pcim_p2pdma_provider().
* Cache provider in vfio_pci_dma_buf struct instead of BAR index.
* Removed misleading comment from pcim_p2pdma_provider().
* Moved MMIO check to be in pcim_p2pdma_provider().
v2: https://lore.kernel.org/all/cover.1757589589.git.leon@kernel.org/
* Added extra patch which adds new CONFIG, so next patches can reuse it.
* Squashed "PCI/P2PDMA: Remove redundant bus_offset from map state"
into the other patch.
* Fixed revoke calls to be aligned with true->false semantics.
* Extended p2pdma_providers to be per-BAR and not global to the whole device.
* Fixed possible race between dmabuf states and revoke.
* Moved revoke to PCI BAR zap block.
v1: https://lore.kernel.org/all/cover.1754311439.git.leon@kernel.org
* Changed commit messages.
* Reused DMA_ATTR_MMIO attribute.
* Returned support for multiple DMA ranges per-DMABUF.
v0: https://lore.kernel.org/all/cover.1753274085.git.leonro@nvidia.com
---------------------------------------------------------------------------
Based on "[PATCH v6 00/16] dma-mapping: migrate to physical address-based API"
https://lore.kernel.org/all/cover.1757423202.git.leonro@nvidia.com/ series.
---------------------------------------------------------------------------
This series extends the VFIO PCI subsystem to support exporting MMIO
regions from PCI device BARs as dma-buf objects, enabling safe sharing of
non-struct page memory with controlled lifetime management. This allows RDMA
and other subsystems to import dma-buf FDs and build them into memory regions
for PCI P2P operations.
The series supports a use case for SPDK where an NVMe device will be
owned by SPDK through VFIO while interacting with an RDMA device. The RDMA
device may directly access the NVMe CMB or directly manipulate the NVMe
device's doorbell using PCI P2P.
However, as a general mechanism, it can support many other scenarios with
VFIO. This dmabuf approach can be usable by iommufd as well for generic
and safe P2P mappings.
In addition to the SPDK use-case mentioned above, the capability added
in this patch series can also be useful when a buffer (located in device
memory such as VRAM) needs to be shared between any two dGPU devices or
instances (assuming one of them is bound to VFIO PCI) as long as they
are P2P DMA compatible.
The implementation provides a revocable attachment mechanism using dma-buf
move operations. MMIO regions are normally pinned as BARs don't change
physical addresses, but access is revoked when the VFIO device is closed
or a PCI reset is issued. This ensures kernel self-defense against
potentially hostile userspace.
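On the importer side, that revocation arrives as a dma-buf move notification. A minimal sketch of a dynamic importer wired up for it, using the generic dma-buf attach API; struct my_importer and my_stop_dma() are hypothetical and error handling is reduced to the minimum:

#include <linux/dma-buf.h>
#include <linux/err.h>

struct my_importer {
        struct dma_buf_attachment *attach;
        bool mapping_valid;
};

/* Called by the exporter (here: VFIO on device close or PCI reset). */
static void my_move_notify(struct dma_buf_attachment *attach)
{
        struct my_importer *imp = attach->importer_priv;

        my_stop_dma(imp);               /* hypothetical: quiesce DMA users */
        imp->mapping_valid = false;     /* must be re-established later */
}

static const struct dma_buf_attach_ops my_attach_ops = {
        .allow_peer2peer = true,
        .move_notify = my_move_notify,
};

/* Attach as a dynamic importer so move_notify is actually delivered. */
static int my_attach(struct my_importer *imp, struct dma_buf *dmabuf,
                     struct device *dev)
{
        imp->attach = dma_buf_dynamic_attach(dmabuf, dev, &my_attach_ops, imp);
        return PTR_ERR_OR_ZERO(imp->attach);
}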
The series includes significant refactoring of the PCI P2PDMA subsystem
to separate core P2P functionality from memory allocation features,
making it more modular and suitable for VFIO use cases that don't need
struct page support.
-----------------------------------------------------------------------
The series is based originally on
https://lore.kernel.org/all/20250307052248.405803-1-vivek.kasireddy@intel.c…
but heavily rewritten to be based on DMA physical API.
-----------------------------------------------------------------------
The WIP branch can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=…
Thanks
---
Jason Gunthorpe (2):
PCI/P2PDMA: Document DMABUF model
vfio/nvgrace: Support get_dmabuf_phys
Leon Romanovsky (7):
PCI/P2PDMA: Separate the mmap() support from the core logic
PCI/P2PDMA: Simplify bus address mapping API
PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation
PCI/P2PDMA: Provide an access to pci_p2pdma_map_type() function
dma-buf: provide phys_vec to scatter-gather mapping routine
vfio/pci: Enable peer-to-peer DMA transactions by default
vfio/pci: Add dma-buf export support for MMIO regions
Vivek Kasireddy (2):
vfio: Export vfio device get and put registration helpers
vfio/pci: Share the core device pointer while invoking feature functions
Documentation/driver-api/pci/p2pdma.rst | 95 +++++++---
block/blk-mq-dma.c | 2 +-
drivers/dma-buf/dma-buf.c | 235 ++++++++++++++++++++++++
drivers/iommu/dma-iommu.c | 4 +-
drivers/pci/p2pdma.c | 182 +++++++++++++-----
drivers/vfio/pci/Kconfig | 3 +
drivers/vfio/pci/Makefile | 1 +
drivers/vfio/pci/nvgrace-gpu/main.c | 56 ++++++
drivers/vfio/pci/vfio_pci.c | 5 +
drivers/vfio/pci/vfio_pci_config.c | 22 ++-
drivers/vfio/pci/vfio_pci_core.c | 53 ++++--
drivers/vfio/pci/vfio_pci_dmabuf.c | 315 ++++++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_priv.h | 23 +++
drivers/vfio/vfio_main.c | 2 +
include/linux/dma-buf.h | 18 ++
include/linux/pci-p2pdma.h | 120 +++++++-----
include/linux/vfio.h | 2 +
include/linux/vfio_pci_core.h | 42 +++++
include/uapi/linux/vfio.h | 28 +++
kernel/dma/direct.c | 4 +-
mm/hmm.c | 2 +-
21 files changed, 1074 insertions(+), 140 deletions(-)
---
base-commit: dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa
change-id: 20251016-dmabuf-vfio-6cef732adf5a
Best regards,
--
Leon Romanovsky <leonro(a)nvidia.com>
This series is the start of adding full DMABUF support to
iommufd. Currently it is limited to only work with VFIO's DMABUF exporter.
It sits on top of Leon's series to add a DMABUF exporter to VFIO:
https://lore.kernel.org/r/20251106-dmabuf-vfio-v7-0-2503bf390699@nvidia.com
The existing IOMMU_IOAS_MAP_FILE is enhanced to detect DMABUF fds, but
otherwise works the same as it does today for a memfd. The user can select
a slice of the FD to map into the ioas and, if the underlying alignment
requirements are met, it will be placed in the iommu_domain.
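From userspace the flow looks much like the existing memfd path, only with a dma-buf fd (as obtained from the VFIO exporter) passed in. A sketch, assuming the current struct iommu_ioas_map_file layout; the fixed IOVA and the access flags are just an example:

#include <linux/iommufd.h>
#include <sys/ioctl.h>

/* Map a slice [offset, offset + length) of a dma-buf into an IOAS at iova. */
static int map_dmabuf_slice(int iommufd, __u32 ioas_id, int dmabuf_fd,
                            __u64 offset, __u64 length, __u64 iova)
{
        struct iommu_ioas_map_file cmd = {
                .size = sizeof(cmd),
                .flags = IOMMU_IOAS_MAP_FIXED_IOVA | IOMMU_IOAS_MAP_READABLE |
                         IOMMU_IOAS_MAP_WRITEABLE,
                .ioas_id = ioas_id,
                .fd = dmabuf_fd,
                .start = offset,
                .length = length,
                .iova = iova,
        };

        return ioctl(iommufd, IOMMU_IOAS_MAP_FILE, &cmd);
}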
Though limited, it is enough to allow a VMM like QEMU to connect MMIO BAR
memory from VFIO to an iommu_domain controlled by iommufd. This is used
for PCI Peer to Peer support in VMs, and is the last feature that the VFIO
type 1 container has that iommufd couldn't do.
The VFIO type1 version extracts raw PFNs from VMAs, which has no lifetime
control and is a use-after-free security problem.
Instead iommufd relies on revocable DMABUFs. Whenever VFIO thinks there
should be no access to the MMIO it can shoot down the mapping in iommufd,
which will unmap it from the iommu_domain. There is no automatic remap;
this is a safety protocol so the kernel doesn't get stuck. Userspace is
expected to know it is doing something that will revoke the dmabuf and to
unmap/remap it around the activity. Eg when QEMU goes to issue FLR it
should do the unmap/remap in iommufd around it.
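In sequence form, the protocol userspace is expected to follow around an operation that revokes, using the map_dmabuf_slice() helper sketched earlier; a reset via VFIO_DEVICE_RESET stands in for whatever actually triggers the revoke:

#include <linux/iommufd.h>
#include <linux/vfio.h>
#include <sys/ioctl.h>

static int reset_with_remap(int iommufd, int device_fd, __u32 ioas_id,
                            int dmabuf_fd, __u64 offset, __u64 length,
                            __u64 iova)
{
        struct iommu_ioas_unmap unmap = {
                .size = sizeof(unmap),
                .ioas_id = ioas_id,
                .iova = iova,
                .length = length,
        };
        int ret;

        /* 1. Drop the MMIO mapping before the operation that revokes. */
        ret = ioctl(iommufd, IOMMU_IOAS_UNMAP, &unmap);
        if (ret)
                return ret;

        /* 2. The revoking operation itself (FLR, hot reset, ...). */
        ret = ioctl(device_fd, VFIO_DEVICE_RESET);
        if (ret)
                return ret;

        /* 3. Re-establish the mapping; there is no automatic remap. */
        return map_dmabuf_slice(iommufd, ioas_id, dmabuf_fd, offset, length,
                                iova);
}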
Since DMABUF is missing some key general features for this use case it
relies on a "private interconnect" between VFIO and iommufd via the
vfio_pci_dma_buf_iommufd_map() call.
The call confirms the DMABUF has revoke semantics and delivers a phys_addr
for the memory suitable for use with iommu_map().
Medium term there is a desire to expand the supported DMABUFs to include
GPU drivers to support DPDK/SPDK type use cases, so future series will work
to add a general concept of revoke and a general negotiation of
interconnect to remove vfio_pci_dma_buf_iommufd_map().
I also plan another series to modify iommufd's vfio_compat to
transparently pull a dmabuf out of a VFIO VMA to emulate more of the uAPI
of type1.
The latest series for interconnect negotiation to exchange a phys_addr is:
https://lore.kernel.org/r/20251027044712.1676175-1-vivek.kasireddy@intel.com
And the discussion for design of revoke is here:
https://lore.kernel.org/dri-devel/20250114173103.GE5556@nvidia.com/
This is on github: https://github.com/jgunthorpe/linux/commits/iommufd_dmabuf
v2:
- Rebase on Leon's v7
- Fix mislocking in an iopt_fill_domain() error path
v1: https://patch.msgid.link/r/0-v1-64bed2430cdb+31b-iommufd_dmabuf_jgg@nvidia.…
Jason Gunthorpe (9):
vfio/pci: Add vfio_pci_dma_buf_iommufd_map()
iommufd: Add DMABUF to iopt_pages
iommufd: Do not map/unmap revoked DMABUFs
iommufd: Allow a DMABUF to be revoked
iommufd: Allow MMIO pages in a batch
iommufd: Have pfn_reader process DMABUF iopt_pages
iommufd: Have iopt_map_file_pages convert the fd to a file
iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE
iommufd/selftest: Add some tests for the dmabuf flow
drivers/iommu/iommufd/io_pagetable.c | 78 +++-
drivers/iommu/iommufd/io_pagetable.h | 53 ++-
drivers/iommu/iommufd/ioas.c | 8 +-
drivers/iommu/iommufd/iommufd_private.h | 14 +-
drivers/iommu/iommufd/iommufd_test.h | 10 +
drivers/iommu/iommufd/main.c | 10 +
drivers/iommu/iommufd/pages.c | 407 ++++++++++++++++--
drivers/iommu/iommufd/selftest.c | 142 ++++++
drivers/vfio/pci/vfio_pci_dmabuf.c | 34 ++
include/linux/vfio_pci_core.h | 4 +
tools/testing/selftests/iommu/iommufd.c | 43 ++
tools/testing/selftests/iommu/iommufd_utils.h | 44 ++
12 files changed, 781 insertions(+), 66 deletions(-)
base-commit: bb04e92c86b44b3e36532099b68de1e889acfee7
--
2.43.0