On Thu, Nov 20, 2025 at 08:04:37AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg(a)nvidia.com>
> > Sent: Saturday, November 8, 2025 12:50 AM
> > +
> > +static int pfn_reader_fill_dmabuf(struct pfn_reader_dmabuf *dmabuf,
> > + struct pfn_batch *batch,
> > + unsigned long start_index,
> > + unsigned long last_index)
> > +{
> > + unsigned long start = dmabuf->start_offset + start_index * PAGE_SIZE;
> > +
> > + /*
> > + * This works in PAGE_SIZE indexes, if the dmabuf is sliced and
> > + * starts/ends at a sub page offset then the batch to domain code will
> > + * adjust it.
> > + */
>
> dmabuf->start_offset comes from pages->dmabuf.start, which is initialized as:
>
> pages->dmabuf.start = start - start_byte;
>
> so it's always page-aligned. Where is the sub-page offset coming from?
I need to go over this again to check it; this sub-page stuff is
a bit convoluted. start_offset should include the sub-page offset
here.
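As a reference for the arithmetic in question, here is how a sub-page
slice would decompose if start_offset carried the sub-page part
(illustrative numbers only, not taken from the patch):

	/*
	 * slice begins at byte 0x1234 into the dmabuf, PAGE_SIZE == 0x1000
	 *
	 *   start_byte   = 0x1234 % PAGE_SIZE  = 0x234   sub-page offset
	 *   aligned base = 0x1234 - start_byte = 0x1000  page-aligned start
	 *
	 * With start = start_offset + start_index * PAGE_SIZE, a page-aligned
	 * start_offset always yields a page-aligned start, so the 0x234 part
	 * would have to be re-applied by the batch-to-domain code; if
	 * start_offset instead keeps the sub-page part, start already points
	 * at the exact byte. Which of the two the code intends is what needs
	 * re-checking.
	 */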
> > @@ -1687,6 +1737,12 @@ static void __iopt_area_unfill_domain(struct
> > iopt_area *area,
> >
> > lockdep_assert_held(&pages->mutex);
> >
> > + if (iopt_is_dmabuf(pages)) {
> > + iopt_area_unmap_domain_range(area, domain, start_index,
> > + last_index);
> > + return;
> > + }
> > +
>
> this belongs to patch3?
This is part of programming the domain with the dmabuf. Patch 3 was
about revoke, which is a slightly different topic, though the two are
similar.
Thanks,
Jason
On Thu, Nov 20, 2025 at 07:55:04AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg(a)nvidia.com>
> > Sent: Saturday, November 8, 2025 12:50 AM
> >
> >
> > @@ -2031,7 +2155,10 @@ int iopt_pages_rw_access(struct iopt_pages
> > *pages, unsigned long start_byte,
> > if ((flags & IOMMUFD_ACCESS_RW_WRITE) && !pages->writable)
> > return -EPERM;
> >
> > - if (pages->type == IOPT_ADDRESS_FILE)
> > + if (iopt_is_dmabuf(pages))
> > + return -EINVAL;
> > +
>
> probably also add helpers for other types, e.g.:
>
> iopt_is_user()
> iopt_is_memfd()
The helper was to integrate the IS_ENABLED() check for DMABUF. There
are not many other uses, so I'd leave it as is to avoid bloating the patch.
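For reference, the dmabuf helper is roughly of this shape (the exact
config symbol and enum value here are my assumptions):

static inline bool iopt_is_dmabuf(struct iopt_pages *pages)
{
	if (!IS_ENABLED(CONFIG_DMA_SHARED_BUFFER))
		return false;
	return pages->type == IOPT_ADDRESS_DMABUF;
}

The IS_ENABLED() lets the compiler discard the dmabuf paths entirely when
dma-buf support is not built in; a plain pages->type comparison for
USER/FILE/MEMFD doesn't need that, so those gain little from helpers.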
> > + if (pages->type != IOPT_ADDRESS_USER)
> > return iopt_pages_rw_slow(pages, start_index, last_index,
> > start_byte % PAGE_SIZE, data,
> > length,
> > flags);
> > --
>
> then the following WARN_ON() becomes useless:
>
> if (IS_ENABLED(CONFIG_IOMMUFD_TEST) &&
> WARN_ON(pages->type != IOPT_ADDRESS_USER))
> return -EINVAL;
Yep
Thanks,
Jason
On Thu, Nov 20, 2025 at 05:04:13PM -0700, Alex Williamson wrote:
> On Thu, 20 Nov 2025 11:28:29 +0200
> Leon Romanovsky <leon(a)kernel.org> wrote:
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index 142b84b3f225..51a3bcc26f8b 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> ...
> > @@ -2487,8 +2500,11 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> >
> > err_undo:
> > list_for_each_entry_from_reverse(vdev, &dev_set->device_list,
> > - vdev.dev_set_list)
> > + vdev.dev_set_list) {
> > + if (__vfio_pci_memory_enabled(vdev))
> > + vfio_pci_dma_buf_move(vdev, false);
> > up_write(&vdev->memory_lock);
> > + }
>
> I ran into a bug here. In the hot reset path we can have dev_sets
> where one or more devices are not opened by the user. The vconfig
> buffer for the device is established on open. However:
>
> bool __vfio_pci_memory_enabled(struct vfio_pci_core_device *vdev)
> {
> struct pci_dev *pdev = vdev->pdev;
> u16 cmd = le16_to_cpu(*(__le16 *)&vdev->vconfig[PCI_COMMAND]);
> ...
>
> Leads to a NULL pointer dereference.
>
> I think the most straightforward fix is simply to test the open_count
> on the vfio_device, which is also protected by the dev_set->lock that
> we already hold here:
>
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -2501,7 +2501,7 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> err_undo:
> list_for_each_entry_from_reverse(vdev, &dev_set->device_list,
> vdev.dev_set_list) {
> - if (__vfio_pci_memory_enabled(vdev))
> + if (vdev->vdev.open_count && __vfio_pci_memory_enabled(vdev))
> vfio_pci_dma_buf_move(vdev, false);
> up_write(&vdev->memory_lock);
> }
>
> Any other suggestions? This should be the only reset path with this
> nuance of affecting non-opened devices. Thanks,
It seems right to me.
Thanks
>
> Alex
Hi Barry,
On Fri, 21 Nov 2025 at 06:54, Barry Song <21cnbao(a)gmail.com> wrote:
>
> Hi Sumit,
>
> >
> > Using the micro-benchmark below, we see that mmap becomes
> > 3.5X faster:
>
>
> Marcin pointed out to me off-tree that it is actually 35x faster,
> not 3.5x faster. Sorry for my poor math. I assume you can fix this
> when merging it?
Sure, I corrected this, and it is now merged to drm-misc-next.
Thanks,
Sumit.
>
> >
> > W/ patch:
> >
> > ~ # ./a.out
> > mmap 512MB took 200266.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 198151.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 197069.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 196781.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 198102.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 195552.000 us, verify OK
> >
> > W/o patch:
> >
> > ~ # ./a.out
> > mmap 512MB took 6987470.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 6970739.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 6984383.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 6971311.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 6991680.000 us, verify OK
>
>
> Thanks
> Barry
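For reference, the micro-benchmark itself is not included in this part of
the thread; a rough sketch of what such a program might look like (the
heap name, fill pattern, and exact timing window are assumptions):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <unistd.h>
#include <linux/dma-heap.h>

#define SZ (512UL << 20)

int main(void)
{
	struct dma_heap_allocation_data alloc = {
		.len = SZ,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};
	struct timeval a, b;
	unsigned char *map;
	int heap, ok = 1;
	size_t i;

	/* allocate a 512MB buffer from the system dma-buf heap */
	heap = open("/dev/dma_heap/system", O_RDWR | O_CLOEXEC);
	if (heap < 0 || ioctl(heap, DMA_HEAP_IOCTL_ALLOC, &alloc) < 0) {
		perror("dma-heap alloc");
		return 1;
	}

	/* time only the mmap() of the dma-buf fd */
	gettimeofday(&a, NULL);
	map = mmap(NULL, SZ, PROT_READ | PROT_WRITE, MAP_SHARED, alloc.fd, 0);
	gettimeofday(&b, NULL);
	if (map == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* touch and verify the whole mapping */
	memset(map, 0x5a, SZ);
	for (i = 0; i < SZ; i += 4096)
		ok &= (map[i] == 0x5a);

	printf("mmap 512MB took %.3f us, verify %s\n",
	       (b.tv_sec - a.tv_sec) * 1e6 + (b.tv_usec - a.tv_usec),
	       ok ? "OK" : "FAIL");
	return 0;
}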
On Thu, Nov 20, 2025 at 05:04:13PM -0700, Alex Williamson wrote:
> @@ -2501,7 +2501,7 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> err_undo:
> list_for_each_entry_from_reverse(vdev, &dev_set->device_list,
> vdev.dev_set_list) {
> - if (__vfio_pci_memory_enabled(vdev))
> + if (vdev->vdev.open_count && __vfio_pci_memory_enabled(vdev))
> vfio_pci_dma_buf_move(vdev, false);
> up_write(&vdev->memory_lock);
> }
>
> Any other suggestions? This should be the only reset path with this
> nuance of affecting non-opened devices. Thanks,
Seems reasonable, but should it be in __vfio_pci_memory_enabled() just
to be robust?
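Something along these lines, keeping the rest of the body elided as in
the quote above (a sketch of the alternative, not a tested patch):

 bool __vfio_pci_memory_enabled(struct vfio_pci_core_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
-	u16 cmd = le16_to_cpu(*(__le16 *)&vdev->vconfig[PCI_COMMAND]);
+	u16 cmd;
+
+	/* vconfig is only allocated once the device has been opened */
+	if (!vdev->vdev.open_count)
+		return false;
+
+	cmd = le16_to_cpu(*(__le16 *)&vdev->vconfig[PCI_COMMAND]);
 	...

That would cover any future reset path that reaches here for a device the
user never opened, instead of relying on each caller to check.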
Jason
On Thu, Nov 20, 2025 at 07:59:19AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg(a)nvidia.com>
> > Sent: Saturday, November 8, 2025 12:50 AM
> >
> > +enum batch_kind {
> > + BATCH_CPU_MEMORY = 0,
> > + BATCH_MMIO,
> > +};
>
> with 'CPU_MEMORY' (instead of plain 'MEMORY') implies future
> support of 'DEV_MEMORY'?
Maybe, but I don't have an immediate thought on this. CXL "MMIO" that
is cacheable is a thing, but we can also label it as CPU_MEMORY.
We might have something for CC shared/protected memory down the road.
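Purely as illustration of the naming question (nothing here is planned),
the enum could later grow along these lines:

enum batch_kind {
	BATCH_CPU_MEMORY = 0,	/* struct page backed, cacheable CPU memory */
	BATCH_MMIO,		/* device BARs / P2P ranges */
	/* hypothetical future kinds, e.g. cacheable device or CC memory */
};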
Thanks,
Jason