Hi Jyothi,
kernel test robot noticed the following build warnings:
[auto build test WARNING on vkoul-dmaengine/next]
[also build test WARNING on andi-shyti/i2c/i2c-host linus/master v6.17-rc4 next-20250905]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Jyothi-Kumar-Seerapu/dmaengi…
base: https://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git next
patch link: https://lore.kernel.org/r/20250903073059.2151837-3-quic_jseerapu%40quicinc.…
patch subject: [PATCH v7 2/2] i2c: i2c-qcom-geni: Add Block event interrupt support
config: i386-allmodconfig (https://download.01.org/0day-ci/archive/20250906/202509062008.lSOdhd4U-lkp@…)
compiler: gcc-13 (Debian 13.3.0-16) 13.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250906/202509062008.lSOdhd4U-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509062008.lSOdhd4U-lkp@intel.com/
All warnings (new ones prefixed by >>):
drivers/i2c/busses/i2c-qcom-geni.c: In function 'geni_i2c_gpi_multi_desc_unmap':
>> drivers/i2c/busses/i2c-qcom-geni.c:576:42: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
576 | NULL, (dma_addr_t)NULL);
| ^
vim +576 drivers/i2c/busses/i2c-qcom-geni.c
555
556 /**
557 * geni_i2c_gpi_multi_desc_unmap() - Unmaps DMA buffers post multi message TX transfers
558 * @gi2c: I2C dev handle
559 * @msgs: Array of I2C messages
560 * @peripheral: Pointer to gpi_i2c_config
561 */
562 static void geni_i2c_gpi_multi_desc_unmap(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[],
563 struct gpi_i2c_config *peripheral)
564 {
565 u32 msg_xfer_cnt, wr_idx = 0;
566 struct geni_i2c_gpi_multi_desc_xfer *tx_multi_xfer = &gi2c->i2c_multi_desc_config;
567
568 msg_xfer_cnt = gi2c->err ? tx_multi_xfer->msg_idx_cnt : tx_multi_xfer->irq_cnt;
569
570 /* Unmap the processed DMA buffers based on the received interrupt count */
571 for (; tx_multi_xfer->unmap_msg_cnt < msg_xfer_cnt; tx_multi_xfer->unmap_msg_cnt++) {
572 wr_idx = tx_multi_xfer->unmap_msg_cnt;
573 geni_i2c_gpi_unmap(gi2c, &msgs[wr_idx],
574 tx_multi_xfer->dma_buf[wr_idx],
575 tx_multi_xfer->dma_addr[wr_idx],
> 576 NULL, (dma_addr_t)NULL);
577
578 if (tx_multi_xfer->unmap_msg_cnt == gi2c->num_msgs - 1) {
579 kfree(tx_multi_xfer->dma_buf);
580 kfree(tx_multi_xfer->dma_addr);
581 break;
582 }
583 }
584 }
585
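The cast trips -Wpointer-to-int-cast because dma_addr_t is a 64-bit
integer on i386 PAE builds while pointers are only 32 bits wide. Since
dma_addr_t is an integer type, not a pointer, the usual fix is to pass a
plain 0 instead of casting NULL; a minimal sketch of line 576 (assuming
the last two arguments are the unused RX buffer and its DMA address):

	geni_i2c_gpi_unmap(gi2c, &msgs[wr_idx],
			   tx_multi_xfer->dma_buf[wr_idx],
			   tx_multi_xfer->dma_addr[wr_idx],
			   NULL, 0);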
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
From: Thierry Reding <treding(a)nvidia.com>
Hi,
This series adds support for the video protection region (VPR) used on
Tegra SoC devices. It's a special region of memory that is protected
from accesses by the CPU and used to store DRM protected content (both
decrypted stream data as well as decoded video frames).
Patches 1 and 2 add DT binding documentation for the VPR and add the VPR
to the list of memory-region items for display and host1x.
Patch 3 introduces new APIs needed by the Tegra VPR implementation that
allow CMA areas to be dynamically created at runtime rather than using
the fixed, system-wide list. This is used here specifically because the
driver can use an arbitrary number of these areas (though they are
currently limited to 4).
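For a feel of the new API, here is a minimal usage sketch (signatures as
proposed in patch 3 and quoted later in this thread; the function and the
allocation size are hypothetical):

	#include <linux/cma.h>
	#include <linux/err.h>

	static int example_vpr_init(phys_addr_t base, phys_addr_t size)
	{
		struct cma *cma;
		struct page *page;

		/* Create a CMA area over a reserved range at runtime. */
		cma = cma_create(base, size, 0, "tegra-vpr");
		if (IS_ERR(cma))
			return PTR_ERR(cma);

		/* Pages are allocated and released from it as usual. */
		page = cma_alloc(cma, 16, 0, false);
		if (page)
			cma_release(cma, page, 16);

		/* Tear the area down when it is no longer needed. */
		cma_free(cma);
		return 0;
	}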
Patch 4 adds some infrastructure for DMA heap implementations to provide
information through debugfs.
The Tegra VPR implementation is added in patch 5. See its commit message
for more details about the specifics of this implementation.
Finally, patches 6-9 add the VPR placeholder node on Tegra234 and hook
it up to the host1x and GPU nodes so that they can make use of this
region.
Thierry
Thierry Reding (9):
dt-bindings: reserved-memory: Document Tegra VPR
dt-bindings: display: tegra: Document memory regions
mm/cma: Allow dynamically creating CMA areas
dma-buf: heaps: Add debugfs support
dma-buf: heaps: Add support for Tegra VPR
arm64: tegra: Add VPR placeholder node on Tegra234
arm64: tegra: Add GPU node on Tegra234
arm64: tegra: Hook up VPR to host1x
arm64: tegra: Hook up VPR to the GPU
.../display/tegra/nvidia,tegra186-dc.yaml | 10 +
.../display/tegra/nvidia,tegra20-dc.yaml | 10 +-
.../display/tegra/nvidia,tegra20-host1x.yaml | 7 +
.../nvidia,tegra-video-protection-region.yaml | 55 ++
arch/arm64/boot/dts/nvidia/tegra234.dtsi | 57 ++
drivers/dma-buf/dma-heap.c | 56 ++
drivers/dma-buf/heaps/Kconfig | 7 +
drivers/dma-buf/heaps/Makefile | 1 +
drivers/dma-buf/heaps/tegra-vpr.c | 831 ++++++++++++++++++
include/linux/cma.h | 16 +
include/linux/dma-heap.h | 2 +
include/trace/events/tegra_vpr.h | 57 ++
mm/cma.c | 89 +-
13 files changed, 1175 insertions(+), 23 deletions(-)
create mode 100644 Documentation/devicetree/bindings/reserved-memory/nvidia,tegra-video-protection-region.yaml
create mode 100644 drivers/dma-buf/heaps/tegra-vpr.c
create mode 100644 include/trace/events/tegra_vpr.h
--
2.50.0
On Wed, Sep 03, 2025 at 09:41:18AM -0700, Frank van der Linden wrote:
> On Wed, Sep 3, 2025 at 9:05 AM Thierry Reding <thierry.reding(a)gmail.com> wrote:
> >
> > On Tue, Sep 02, 2025 at 10:27:01AM -0700, Frank van der Linden wrote:
> > > On Tue, Sep 2, 2025 at 8:46 AM Thierry Reding <thierry.reding(a)gmail.com> wrote:
> > > >
> > > > From: Thierry Reding <treding(a)nvidia.com>
> > > >
> > > > There is no technical reason why there should be a limited number of CMA
> > > > regions, so extract some code into helpers and use them to create extra
> > > > functions (cma_create() and cma_free()) that allow creating and freeing,
> > > > respectively, CMA regions dynamically at runtime.
> > > >
> > > > Note that these dynamically created CMA areas are treated specially and
> > > > do not contribute to the number of total CMA pages so that this count
> > > > still only applies to the fixed number of CMA areas.
> > > >
> > > > Signed-off-by: Thierry Reding <treding(a)nvidia.com>
> > > > ---
> > > > include/linux/cma.h | 16 ++++++++
> > > > mm/cma.c | 89 ++++++++++++++++++++++++++++++++++-----------
> > > > 2 files changed, 83 insertions(+), 22 deletions(-)
> > [...]
> > > I agree that supporting dynamic CMA areas would be good. However, by
> > > doing it like this, these CMA areas are invisible to the rest of the
> > > system. E.g. cma_for_each_area() does not know about them. It seems a
> > > bit inconsistent that there will now be some areas that are globally
> > > known, and some that are not.
> >
> > That was kind of the point of this experiment. When I started on this I
> > ran into the case where I was running out of predefined CMA areas and as
> > I went looking for ways on how to fix this, I realized that there's not
> > much reason to keep a global list of these areas. And even less reason
> > to limit the number of CMA areas to this predefined list. Very little
> > code outside of the core CMA code even uses this.
> >
> > There's one instance of cma_for_each_area() that I don't grok. There's
> > another early MMU fixup for CMA areas in 32-bit ARM. Other than
> > that there's a few places where the total CMA page count is shown for
> > informational purposes and I don't know how useful that really is
> > because totalcma_pages doesn't really track how many pages are used for
> > CMA, but rather how many could potentially be used for CMA.
> >
> > And that's about it.
> >
> > It seems like there are cases where we might really need to globally
> > know about some of these areas, specifically ones that are allocated
> > very early during boot and then used for very specific purposes.
> >
> > However, it seems to me like CMA is more universally useful than just
> > for these cases and I don't see the usefulness of tracking these more
> > generic uses.
> >
> > > I am being somewhat selfish here, as I have some WIP code that needs
> > > the global list :-) But I think the inconsistency is a more general
> > > point than just what I want (and the s390 code does use
> > > cma_for_each_area()). Maybe you could keep maintaining a global
> > > structure containing all areas?
> >
> > If it's really useful to be able to access all CMA areas, then we could
> > easily just add them all to a global linked list upon activation (we may
> > still want/need to keep the predefined list around for all those early
> > allocation cases). That way we'd get the best of both worlds.
> >
> > > What do you think are the chances of running out of the global count
> > > of areas?
> >
> > Well, I did run out of CMA areas during the early VPR testing because I
> > was initially testing with 16 areas and a different allocation scheme
> > that turned out to cause too many resizes in common cases.
> >
> > However, given that the default is 8 on normal systems (20 on NUMA) and
> > is configurable, even restricting this to 4 for VPR
> > doesn't always guarantee that all 4 are available. Again, yes, we could
> > keep bumping that number, but why not turn this into something a bit
> > more robust where nobody has to know or care about how many there are?
> >
> > > Also, you say that "these are treated specially and do not contribute
> > > to the number of total CMA pages". But, if I'm reading this right, you
> > > do call cma_activate_area(), which will do
> > > init_cma_reserved_pageblock() for each pageblock in it. Which adjusts
> > > the CMA counters for the zone they are in. But your change does not
> > > adjust totalcma_pages for dynamically created areas. That seems
> > > inconsistent, too.
> >
> > I was referring just to totalcma_pages, which isn't impacted by these
> > dynamically allocated regions. This is, again, because I don't see why
> > that information would be useful. It's a fairly easy change to update
> > that value, so if people prefer that, I can add that.
> >
> > I don't see an immediate connection between totalcma_pages and
> > init_cma_reserved_pageblock(). I thought the latter was primarily useful
> > for making sure that the CMA pages can be migrated, which is still
> > critical for this use-case.
>
> My comment was about statistics, they would be inconsistent after your
> change. E.g. currently, totalcma_pages is equal to the sum of CMA
> pages in each zone. But that would no longer be true, and applications
> / administrators looking at those statistics might see the
> inconsistency (between meminfo and vmstat) and wonder what's going on.
> It seems best to keep those numbers in sync.
>
> In general, I think it's fine to support dynamic allocation, and I
> agree with your arguments that it doesn't seem right to set the number
> of CMA areas via a config option. I would just like there to be a
> canonical way to find all CMA areas.
Okay, so judging by your and David's feedback, it sounds like I should
add a bit of code to track dynamically allocated areas within a global
list, along with the existing predefined regions and keep totalcma_pages
updated so that the global view is consistent.
I'll look into that. Thanks for the feedback.
Thierry
On Mon, 1 Sep 2025 15:32:22 +0530 Meghana Malladi wrote:
> if (!emac->xdpi.prog && !prog)
> return 0;
>
> - WRITE_ONCE(emac->xdp_prog, prog);
> + if (netif_running(emac->ndev)) {
> + prueth_destroy_txq(emac);
> + prueth_destroy_rxq(emac);
> + }
> +
> + old_prog = xchg(&emac->xdp_prog, prog);
> + if (old_prog)
> + bpf_prog_put(old_prog);
> +
> + if (netif_running(emac->ndev)) {
> + ret = prueth_create_rxq(emac);
Shutting the device down and freeing all Rx memory for reconfig is not
okay. If the system is low on memory, the Rx buffer allocations may fail
and the system may drop off the network. You must either pre-allocate or
avoid freeing the memory, and just restart the queues (one possible shape
is sketched after the quoted hunk below).
> + if (ret) {
> + netdev_err(emac->ndev, "Failed to create RX queue: %d\n", ret);
> + return ret;
> + }
> +
> + ret = prueth_create_txq(emac);
> + if (ret) {
> + netdev_err(emac->ndev, "Failed to create TX queue: %d\n", ret);
> + prueth_destroy_rxq(emac);
> + emac->xdp_prog = NULL;
> + return ret;
> + }
> + }
>
> xdp_attachment_setup(&emac->xdpi, bpf);
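A shape that satisfies this, sketched below with hypothetical helper
names (these are not existing prueth functions), is to allocate the
replacement resources before tearing anything down, so that a failed
allocation leaves the old queues running:

	/* Pre-allocate first; bail out with the old queues intact. */
	new_rx = prueth_rxq_alloc_resources(emac);	/* hypothetical */
	if (IS_ERR(new_rx))
		return PTR_ERR(new_rx);

	if (netif_running(emac->ndev))
		prueth_stop_queues(emac);		/* stop, don't free */

	old_rx = prueth_rxq_swap_resources(emac, new_rx);
	old_prog = xchg(&emac->xdp_prog, prog);

	if (netif_running(emac->ndev))
		prueth_start_queues(emac);

	prueth_rxq_free_resources(old_rx);	/* free only after the swap */
	if (old_prog)
		bpf_prog_put(old_prog);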
On Sat, Aug 30, 2025 at 4:58 PM Barry Song <21cnbao(a)gmail.com> wrote:
>
> From: Barry Song <v-songbaohua(a)oppo.com>
>
> We can allocate high-order pages, but mapping them one by
> one is inefficient. This patch changes the code to map
> as large a chunk as possible. The code looks somewhat
> complicated mainly because supporting mmap with a
> non-zero offset is a bit tricky.
>
> Using the micro-benchmark below, we see that mmap becomes
> 3.5X faster:
...
It's been a while since I've done mm things, so take it with a pinch of
salt, but this seems reasonable to me.
Though, one thought below...
> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> index bbe7881f1360..4c782fe33fd4 100644
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -186,20 +186,35 @@ static int system_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
> struct system_heap_buffer *buffer = dmabuf->priv;
> struct sg_table *table = &buffer->sg_table;
> unsigned long addr = vma->vm_start;
> - struct sg_page_iter piter;
> - int ret;
> + unsigned long pgoff = vma->vm_pgoff;
> + struct scatterlist *sg;
> + int i, ret;
> +
> + for_each_sgtable_sg(table, sg, i) {
> + unsigned long n = sg->length >> PAGE_SHIFT;
>
> - for_each_sgtable_page(table, &piter, vma->vm_pgoff) {
> - struct page *page = sg_page_iter_page(&piter);
> + if (pgoff < n)
> + break;
> + pgoff -= n;
> + }
> +
> + for (; sg && addr < vma->vm_end; sg = sg_next(sg)) {
> + unsigned long n = (sg->length >> PAGE_SHIFT) - pgoff;
> + struct page *page = sg_page(sg) + pgoff;
> + unsigned long size = n << PAGE_SHIFT;
> +
> + if (addr + size > vma->vm_end)
> + size = vma->vm_end - addr;
>
> - ret = remap_pfn_range(vma, addr, page_to_pfn(page), PAGE_SIZE,
> - vma->vm_page_prot);
> + ret = remap_pfn_range(vma, addr, page_to_pfn(page),
> + size, vma->vm_page_prot);
It feels like this sort of mapping loop for higher order pages
wouldn't be a unique pattern to just this code. Would this be better
worked into a helper so it would be more generally usable?
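For illustration, such a helper might look roughly like the following (a
sketch only; the helper name is made up, and it simply factors out the
loop from the patch, including resetting pgoff after the first chunk and
advancing addr):

	/* Map an sg_table's pages into a VMA, honouring vma->vm_pgoff. */
	static int sgtable_remap_vma(struct sg_table *table,
				     struct vm_area_struct *vma)
	{
		unsigned long addr = vma->vm_start;
		unsigned long pgoff = vma->vm_pgoff;
		struct scatterlist *sg;
		int i, ret;

		/* Skip whole entries until the one containing pgoff. */
		for_each_sgtable_sg(table, sg, i) {
			unsigned long n = sg->length >> PAGE_SHIFT;

			if (pgoff < n)
				break;
			pgoff -= n;
		}

		for (; sg && addr < vma->vm_end; sg = sg_next(sg)) {
			unsigned long n = (sg->length >> PAGE_SHIFT) - pgoff;
			unsigned long size = n << PAGE_SHIFT;

			if (addr + size > vma->vm_end)
				size = vma->vm_end - addr;

			ret = remap_pfn_range(vma, addr,
					      page_to_pfn(sg_page(sg) + pgoff),
					      size, vma->vm_page_prot);
			if (ret)
				return ret;

			addr += size;
			pgoff = 0;	/* only the first chunk starts mid-entry */
		}

		return 0;
	}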
Otherwise,
Acked-by: John Stultz <jstultz(a)google.com>
thanks
-john
>> +
>> +struct cma *__init cma_create(phys_addr_t base, phys_addr_t size,
>> + unsigned int order_per_bit, const char *name)
>> +{
>> + struct cma *cma;
>> + int ret;
>> +
>> + ret = cma_check_memory(base, size);
>> + if (ret < 0)
>> + return ERR_PTR(ret);
>> +
>> + cma = kzalloc(sizeof(*cma), GFP_KERNEL);
>> + if (!cma)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + cma_init_area(cma, name, size, order_per_bit);
>> + cma->ranges[0].base_pfn = PFN_DOWN(base);
>> + cma->ranges[0].early_pfn = PFN_DOWN(base);
>> + cma->ranges[0].count = cma->count;
>> + cma->nranges = 1;
>> +
>> + cma_activate_area(cma);
>> +
>> + return cma;
>> +}
>> +
>> +void cma_free(struct cma *cma)
>> +{
>> + kfree(cma);
>> +}
>> --
>> 2.50.0
>
>
> I agree that supporting dynamic CMA areas would be good. However, by
> doing it like this, these CMA areas are invisible to the rest of the
> system. E.g. cma_for_each_area() does not know about them. It seems a
> bit inconsistent that there will now be some areas that are globally
> known, and some that are not.
Yeah, I'm not a fan of that.
What is the big problem we are trying to solve here? Why do they have to
be dynamic, why do they even have to support freeing?
--
Cheers
David / dhildenb
On Tue, Sep 02, 2025 at 10:27:01AM -0700, Frank van der Linden wrote:
> On Tue, Sep 2, 2025 at 8:46 AM Thierry Reding <thierry.reding(a)gmail.com> wrote:
> >
> > From: Thierry Reding <treding(a)nvidia.com>
> >
> > There is no technical reason why there should be a limited number of CMA
> > regions, so extract some code into helpers and use them to create extra
> > functions (cma_create() and cma_free()) that allow creating and freeing,
> > respectively, CMA regions dynamically at runtime.
> >
> > Note that these dynamically created CMA areas are treated specially and
> > do not contribute to the number of total CMA pages so that this count
> > still only applies to the fixed number of CMA areas.
> >
> > Signed-off-by: Thierry Reding <treding(a)nvidia.com>
> > ---
> > include/linux/cma.h | 16 ++++++++
> > mm/cma.c | 89 ++++++++++++++++++++++++++++++++++-----------
> > 2 files changed, 83 insertions(+), 22 deletions(-)
[...]
> I agree that supporting dynamic CMA areas would be good. However, by
> doing it like this, these CMA areas are invisible to the rest of the
> system. E.g. cma_for_each_area() does not know about them. It seems a
> bit inconsistent that there will now be some areas that are globally
> known, and some that are not.
That was kind of the point of this experiment. When I started on this I
ran into the case where I was running out of predefined CMA areas and as
I went looking for ways on how to fix this, I realized that there's not
much reason to keep a global list of these areas. And even less reason
to limit the number of CMA areas to this predefined list. Very little
code outside of the core CMA code even uses this.
There's one instance of cma_for_each_area() that I don't grok. There's
another early MMU fixup for CMA areas in 32-bit ARM. Other than
that there's a few places where the total CMA page count is shown for
informational purposes and I don't know how useful that really is
because totalcma_pages doesn't really track how many pages are used for
CMA, but rather how many could potentially be used for CMA.
And that's about it.
It seems like there are cases where we might really need to globally
know about some of these areas, specifically ones that are allocated
very early during boot and then used for very specific purposes.
However, it seems to me like CMA is more universally useful than just
for these cases and I don't see the usefulness of tracking these more
generic uses.
> I am being somewhat selfish here, as I have some WIP code that needs
> the global list :-) But I think the inconsistency is a more general
> point than just what I want (and the s390 code does use
> cma_for_each_area()). Maybe you could keep maintaining a global
> structure containing all areas?
If it's really useful to be able to access all CMA areas, then we could
easily just add them all to a global linked list upon activation (we may
still want/need to keep the predefined list around for all those early
allocation cases). That way we'd get the best of both worlds.
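As a sketch of that idea (it assumes a new list_head member is added to
struct cma; all names here are made up):

	static LIST_HEAD(cma_all_areas);
	static DEFINE_MUTEX(cma_all_areas_lock);

	static void cma_register_area(struct cma *cma)
	{
		mutex_lock(&cma_all_areas_lock);
		list_add_tail(&cma->node, &cma_all_areas);	/* 'node' is new */
		mutex_unlock(&cma_all_areas_lock);
	}

cma_for_each_area() could then walk this list instead of the fixed
cma_areas[] array, covering predefined and dynamic areas alike.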
> What do you think are the chances of running out of the global count
> of areas?
Well, I did run out of CMA areas during the early VPR testing because I
was initially testing with 16 areas and a different allocation scheme
that turned out to cause too many resizes in common cases.
However, given that the default is 8 on normal systems (20 on NUMA) and
is configurable, even restricting this to 4 for VPR
doesn't always guarantee that all 4 are available. Again, yes, we could
keep bumping that number, but why not turn this into something a bit
more robust where nobody has to know or care about how many there are?
> Also, you say that "these are treated specially and do not contribute
> to the number of total CMA pages". But, if I'm reading this right, you
> do call cma_activate_area(), which will do
> init_cma_reserved_pageblock() for each pageblock in it. Which adjusts
> the CMA counters for the zone they are in. But your change does not
> adjust totalcma_pages for dynamically created areas. That seems
> inconsistent, too.
I was referring just to totalcma_pages, which isn't impacted by these
dynamically allocated regions. This is, again, because I don't see why
that information would be useful. It's a fairly easy change to update
that value, so if people prefer that, I can add that.
I don't see an immediate connection between totalcma_pages and
init_cma_reserved_pageblock(). I thought the latter was primarily useful
for making sure that the CMA pages can be migrated, which is still
critical for this use-case.
Thierry
Hi Amirreza,
kernel test robot noticed the following build warnings:
[auto build test WARNING on 33bcf93b9a6b028758105680f8b538a31bc563cf]
url: https://github.com/intel-lab-lkp/linux/commits/Amirreza-Zarrabi/tee-allow-a…
base: 33bcf93b9a6b028758105680f8b538a31bc563cf
patch link: https://lore.kernel.org/r/20250901-qcom-tee-using-tee-ss-without-mem-obj-v9…
patch subject: [PATCH v9 06/11] firmware: qcom: scm: add support for object invocation
config: i386-buildonly-randconfig-001-20250903 (https://download.01.org/0day-ci/archive/20250903/202509030554.WR3MNpCE-lkp@…)
compiler: gcc-12 (Debian 12.2.0-14+deb12u1) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250903/202509030554.WR3MNpCE-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509030554.WR3MNpCE-lkp@intel.com/
All warnings (new ones prefixed by >>):
In file included from include/linux/device.h:15,
from include/linux/dma-mapping.h:5,
from drivers/firmware/qcom/qcom_scm.c:13:
drivers/firmware/qcom/qcom_scm.c: In function 'qcom_scm_qtee_init':
>> drivers/firmware/qcom/qcom_scm.c:2208:35: warning: format '%d' expects argument of type 'int', but argument 3 has type 'long int' [-Wformat=]
2208 | dev_err(scm->dev, "qcomtee: register failed: %d\n",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/dev_printk.h:110:30: note: in definition of macro 'dev_printk_index_wrap'
110 | _p_func(dev, fmt, ##__VA_ARGS__); \
| ^~~
include/linux/dev_printk.h:154:56: note: in expansion of macro 'dev_fmt'
154 | dev_printk_index_wrap(_dev_err, KERN_ERR, dev, dev_fmt(fmt), ##__VA_ARGS__)
| ^~~~~~~
drivers/firmware/qcom/qcom_scm.c:2208:17: note: in expansion of macro 'dev_err'
2208 | dev_err(scm->dev, "qcomtee: register failed: %d\n",
| ^~~~~~~
drivers/firmware/qcom/qcom_scm.c:2208:63: note: format string is defined here
2208 | dev_err(scm->dev, "qcomtee: register failed: %d\n",
| ~^
| |
| int
| %ld
vim +2208 drivers/firmware/qcom/qcom_scm.c
2188
2189 static void qcom_scm_qtee_init(struct qcom_scm *scm)
2190 {
2191 struct platform_device *qtee_dev;
2192 u64 result, response_type;
2193 int ret;
2194
2195 /*
2196 * Probe for smcinvoke support. This will fail due to invalid buffers,
2197 * but first, it checks whether the call is supported in QTEE syscall
2198 * handler. If it is not supported, -EIO is returned.
2199 */
2200 ret = qcom_scm_qtee_invoke_smc(0, 0, 0, 0, &result, &response_type);
2201 if (ret == -EIO)
2202 return;
2203
2204 /* Setup QTEE interface device. */
2205 qtee_dev = platform_device_register_data(scm->dev, "qcomtee",
2206 PLATFORM_DEVID_NONE, NULL, 0);
2207 if (IS_ERR(qtee_dev)) {
> 2208 dev_err(scm->dev, "qcomtee: register failed: %d\n",
2209 PTR_ERR(qtee_dev));
2210 return;
2211 }
2212
2213 ret = devm_add_action_or_reset(scm->dev, qcom_scm_qtee_free, qtee_dev);
2214 if (ret)
2215 dev_err(scm->dev, "qcomtee: add action failed: %d\n", ret);
2216 }
2217
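PTR_ERR() returns a long, so the specifier at line 2208 should be '%ld';
alternatively the kernel's '%pe' extension prints the error symbolically.
Either of the following would silence the warning:

	dev_err(scm->dev, "qcomtee: register failed: %ld\n",
		PTR_ERR(qtee_dev));

	/* or, printing the error by name (e.g. -ENOMEM): */
	dev_err(scm->dev, "qcomtee: register failed: %pe\n", qtee_dev);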
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Changelog:
v1:
* Changed commit messages.
* Reused DMA_ATTR_MMIO attribute.
* Restored support for multiple DMA ranges per dma-buf.
v0: https://lore.kernel.org/all/cover.1753274085.git.leonro@nvidia.com
---------------------------------------------------------------------------
Based on "[PATCH v1 00/16] dma-mapping: migrate to physical address-based API"
https://lore.kernel.org/all/cover.1754292567.git.leon@kernel.org series.
---------------------------------------------------------------------------
This series extends the VFIO PCI subsystem to support exporting MMIO regions
from PCI device BARs as dma-buf objects, enabling safe sharing of non-struct
page memory with controlled lifetime management. This allows RDMA and other
subsystems to import dma-buf FDs and build them into memory regions for PCI
P2P operations.
The series supports a use case for SPDK where an NVMe device will be owned
by SPDK through VFIO while interacting with an RDMA device. The RDMA device
may directly access the NVMe CMB or directly manipulate the NVMe device's
doorbell using PCI P2P.
However, as a general mechanism, it can support many other scenarios with
VFIO. This dma-buf approach is also usable by iommufd for generic and safe
P2P mappings.
In addition to the SPDK use-case mentioned above, the capability added
in this patch series can also be useful when a buffer (located in device
memory such as VRAM) needs to be shared between any two dGPU devices or
instances (assuming one of them is bound to VFIO PCI) as long as they
are P2P DMA compatible.
The implementation provides a revocable attachment mechanism using dma-buf
move operations. MMIO regions are normally pinned as BARs don't change
physical addresses, but access is revoked when the VFIO device is closed
or a PCI reset is issued. This ensures kernel self-defense against
potentially hostile userspace.
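In dma-buf terms, that revocation amounts to the exporter taking the
reservation lock and calling dma_buf_move_notify() so importers drop
their mappings; roughly (a sketch only: the priv structure, revoked flag
and function name are hypothetical):

	static void vfio_pci_dmabuf_revoke(struct dma_buf *dmabuf, bool revoked)
	{
		struct vfio_pci_dmabuf_priv *priv = dmabuf->priv;	/* hypothetical */

		dma_resv_lock(dmabuf->resv, NULL);
		priv->revoked = revoked;	/* checked again in map_dma_buf */
		dma_buf_move_notify(dmabuf);	/* importers must unmap */
		dma_resv_unlock(dmabuf->resv);
	}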
The series includes significant refactoring of the PCI P2PDMA subsystem
to separate core P2P functionality from memory allocation features,
making it more modular and suitable for VFIO use cases that don't need
struct page support.
-----------------------------------------------------------------------
The series is based originally on
https://lore.kernel.org/all/20250307052248.405803-1-vivek.kasireddy@intel.c…
but heavily rewritten to be based on DMA physical API.
-----------------------------------------------------------------------
The WIP branch can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=…
Thanks
Leon Romanovsky (8):
PCI/P2PDMA: Remove redundant bus_offset from map state
PCI/P2PDMA: Separate the mmap() support from the core logic
PCI/P2PDMA: Simplify bus address mapping API
PCI/P2PDMA: Refactor to separate core P2P functionality from memory
allocation
PCI/P2PDMA: Export pci_p2pdma_map_type() function
types: move phys_vec definition to common header
vfio/pci: Enable peer-to-peer DMA transactions by default
vfio/pci: Add dma-buf export support for MMIO regions
Vivek Kasireddy (2):
vfio: Export vfio device get and put registration helpers
vfio/pci: Share the core device pointer while invoking feature
functions
block/blk-mq-dma.c | 7 +-
drivers/iommu/dma-iommu.c | 4 +-
drivers/pci/p2pdma.c | 154 ++++++++----
drivers/vfio/pci/Kconfig | 20 ++
drivers/vfio/pci/Makefile | 2 +
drivers/vfio/pci/vfio_pci_config.c | 22 +-
drivers/vfio/pci/vfio_pci_core.c | 59 +++--
drivers/vfio/pci/vfio_pci_dmabuf.c | 390 +++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_priv.h | 23 ++
drivers/vfio/vfio_main.c | 2 +
include/linux/dma-buf.h | 1 +
include/linux/pci-p2pdma.h | 114 +++++----
include/linux/types.h | 5 +
include/linux/vfio.h | 2 +
include/linux/vfio_pci_core.h | 4 +
include/uapi/linux/vfio.h | 25 ++
kernel/dma/direct.c | 4 +-
mm/hmm.c | 2 +-
18 files changed, 715 insertions(+), 125 deletions(-)
create mode 100644 drivers/vfio/pci/vfio_pci_dmabuf.c
--
2.50.1