On Thu, Nov 20, 2025 at 05:04:13PM -0700, Alex Williamson wrote:
> On Thu, 20 Nov 2025 11:28:29 +0200
> Leon Romanovsky <leon(a)kernel.org> wrote:
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index 142b84b3f225..51a3bcc26f8b 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> ...
> > @@ -2487,8 +2500,11 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> >
> > err_undo:
> > list_for_each_entry_from_reverse(vdev, &dev_set->device_list,
> > - vdev.dev_set_list)
> > + vdev.dev_set_list) {
> > + if (__vfio_pci_memory_enabled(vdev))
> > + vfio_pci_dma_buf_move(vdev, false);
> > up_write(&vdev->memory_lock);
> > + }
>
> I ran into a bug here. In the hot reset path we can have dev_sets
> where one or more devices are not opened by the user. The vconfig
> buffer for the device is established on open. However:
>
> bool __vfio_pci_memory_enabled(struct vfio_pci_core_device *vdev)
> {
> struct pci_dev *pdev = vdev->pdev;
> u16 cmd = le16_to_cpu(*(__le16 *)&vdev->vconfig[PCI_COMMAND]);
> ...
>
> Leads to a NULL pointer dereference.
>
> I think the most straightforward fix is simply to test the open_count
> on the vfio_device, which is also protected by the dev_set->lock that
> we already hold here:
>
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -2501,7 +2501,7 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> err_undo:
> list_for_each_entry_from_reverse(vdev, &dev_set->device_list,
> vdev.dev_set_list) {
> - if (__vfio_pci_memory_enabled(vdev))
> + if (vdev->vdev.open_count && __vfio_pci_memory_enabled(vdev))
> vfio_pci_dma_buf_move(vdev, false);
> up_write(&vdev->memory_lock);
> }
>
> Any other suggestions? This should be the only reset path with this
> nuance of affecting non-opened devices. Thanks,
It seems right to me.
Thanks
>
> Alex
Hi Barry,
On Fri, 21 Nov 2025 at 06:54, Barry Song <21cnbao(a)gmail.com> wrote:
>
> Hi Sumit,
>
> >
> > Using the micro-benchmark below, we see that mmap becomes
> > 3.5X faster:
>
>
> Marcin pointed out to me off-list that it is actually 35x faster,
> not 3.5x faster. Sorry for my poor math. I assume you can fix this
> when merging it?
Sure, I corrected this, and it is now merged to drm-misc-next.
Thanks,
Sumit.
>
> >
> > W/ patch:
> >
> > ~ # ./a.out
> > mmap 512MB took 200266.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 198151.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 197069.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 196781.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 198102.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 195552.000 us, verify OK
> >
> > W/o patch:
> >
> > ~ # ./a.out
> > mmap 512MB took 6987470.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 6970739.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 6984383.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 6971311.000 us, verify OK
> > ~ # ./a.out
> > mmap 512MB took 6991680.000 us, verify OK
>
>
> Thanks
> Barry
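
[Not the original a.out, but for anyone wanting to reproduce a similar
measurement, a minimal sketch along these lines should do.  The heap
path, the exact interval being timed (mmap plus touching the pages),
and the verify pattern are guesses, not taken from the posted
benchmark.]

#include <fcntl.h>
#include <linux/dma-heap.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/time.h>

#define SZ	(512UL << 20)	/* 512MB, matching the numbers above */

int main(void)
{
	struct dma_heap_allocation_data data = {
		.len = SZ,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};
	struct timeval t0, t1;
	unsigned char *p;
	double us;
	size_t i;
	int heap;

	heap = open("/dev/dma_heap/system", O_RDONLY | O_CLOEXEC);
	if (heap < 0 || ioctl(heap, DMA_HEAP_IOCTL_ALLOC, &data) < 0) {
		perror("dma-heap alloc");
		return 1;
	}

	gettimeofday(&t0, NULL);
	p = mmap(NULL, SZ, PROT_READ | PROT_WRITE, MAP_SHARED, data.fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(p, 0xa5, SZ);		/* fault in / populate the mapping */
	gettimeofday(&t1, NULL);

	/* Read back one byte per page as a cheap "verify" step. */
	for (i = 0; i < SZ; i += 4096)
		if (p[i] != 0xa5)
			return 1;

	us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
	printf("mmap %luMB took %.3f us, verify OK\n", SZ >> 20, us);
	return 0;
}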
On Thu, Nov 20, 2025 at 05:04:13PM -0700, Alex Williamson wrote:
> @@ -2501,7 +2501,7 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> err_undo:
> list_for_each_entry_from_reverse(vdev, &dev_set->device_list,
> vdev.dev_set_list) {
> - if (__vfio_pci_memory_enabled(vdev))
> + if (vdev->vdev.open_count && __vfio_pci_memory_enabled(vdev))
> vfio_pci_dma_buf_move(vdev, false);
> up_write(&vdev->memory_lock);
> }
>
> Any other suggestions? This should be the only reset path with this
> nuance of affecting non-opened devices. Thanks,
Seems reasonable, but should it be in __vfio_pci_memory_enabled() just
to be robust?
Jason
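
For reference, a rough sketch of what folding the check into the helper
itself might look like (illustrative only; the rest of the function body
is abbreviated and the exact form is an assumption):

bool __vfio_pci_memory_enabled(struct vfio_pci_core_device *vdev)
{
	struct pci_dev *pdev = vdev->pdev;
	u16 cmd;

	/*
	 * vconfig is only allocated when the device is opened; an
	 * unopened device has no emulated command register to consult,
	 * so report memory as disabled.  Only the hot reset err_undo
	 * path can get here with an unopened device, and there
	 * open_count is stable under the dev_set->lock already held.
	 */
	if (!vdev->vdev.open_count)
		return false;

	cmd = le16_to_cpu(*(__le16 *)&vdev->vconfig[PCI_COMMAND]);
	...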
On Thu, Nov 20, 2025 at 07:59:19AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg(a)nvidia.com>
> > Sent: Saturday, November 8, 2025 12:50 AM
> >
> > +enum batch_kind {
> > + BATCH_CPU_MEMORY = 0,
> > + BATCH_MMIO,
> > +};
>
> does 'CPU_MEMORY' (instead of plain 'MEMORY') imply future
> support of 'DEV_MEMORY'?
Maybe, but I don't have an immediate thought on this. CXL "MMIO" that
is cacheable is a thing, but we can also label it as CPU_MEMORY.
We might have something for CC shared/protected memory down the road.
Thanks,
Jason
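
Purely as illustration, the name leaves room to grow the enum later
without renaming the existing kinds, e.g. (hypothetical, not part of
the patch):

enum batch_kind {
	BATCH_CPU_MEMORY = 0,
	BATCH_MMIO,
	/* speculative future kinds along the lines discussed above */
	BATCH_DEV_MEMORY,	/* e.g. cacheable CXL or CC shared/protected memory */
};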
On 11/18/25 17:03, Tvrtko Ursulin wrote:
>>>> @@ -448,13 +465,19 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>>>> static inline bool
>>>> dma_fence_is_signaled(struct dma_fence *fence)
>>>> {
>>>> + const struct dma_fence_ops *ops;
>>>> +
>>>> if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>>> return true;
>>>> - if (fence->ops->signaled && fence->ops->signaled(fence)) {
>>>> + rcu_read_lock();
>>>> + ops = rcu_dereference(fence->ops);
>>>> + if (ops->signaled && ops->signaled(fence)) {
>>>> + rcu_read_unlock();
>>>
>>> With the unlocked version two threads could race and one could make the fence->lock go away just around here, before the dma_fence_signal() below takes it. It seems it is only safe to rcu_read_unlock() before signaling if using the embedded fence (later in the series). Can you think of a downside to holding the RCU read lock until after signaling? That would make it safe, I think.
>>
>> Well it's good to talk about it but I think that it is not necessary to protect the lock in this particular case.
>>
>> See the RCU protection is only for the fence->ops pointer, but the lock can be taken way after the fence is already signaled.
>>
>> That's why I came up with the patch to move the lock into the fence in the first place.
>
> Right. And you think there is nothing to gain from keeping the rcu_read_unlock() until after signalling? I.e. why not plug a potential race if we can, at no negative effect.
I thought quite a bit about that, but at least off hand I can't come up with a reason why we should do this. The signaling path doesn't need the RCU read-side lock as far as I can see.
Regards,
Christian.
>
> Regards,
>
> Tvrtko
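
For clarity, the alternative ordering Tvrtko is asking about would look
roughly like this (a sketch based on the hunk quoted above; the rest of
the function is assumed from current mainline, and this is not a
recommendation):

static inline bool
dma_fence_is_signaled(struct dma_fence *fence)
{
	const struct dma_fence_ops *ops;

	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
		return true;

	rcu_read_lock();
	ops = rcu_dereference(fence->ops);
	if (ops->signaled && ops->signaled(fence)) {
		/*
		 * Keep the RCU read-side lock until after signalling so
		 * ops (and the lock it points at) cannot be torn down by
		 * a concurrent thread in between.
		 */
		dma_fence_signal(fence);
		rcu_read_unlock();
		return true;
	}
	rcu_read_unlock();

	return false;
}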
WW mutexes, and the dma-resv objects which embed them, typically have a
number of locks belonging to the same lock class. However, code using
them typically wants to verify locking at object granularity, not
lock-class granularity.
This series adds ww_mutex functions to facilitate that (patch 1) and
uses these functions in the dma-resv lock checks.
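
Roughly, the distinction is the following (illustrative only; the macro
names below are made up and the real helpers are the ones introduced in
patch 1):

/*
 * Class granularity: with a ww_acquire_ctx, lockdep can only confirm
 * that some ww_mutex of the dma-resv lock class is held, not that this
 * particular object's lock is.
 */
#define resv_assert_held_class(obj) \
	lockdep_assert_held(&(obj)->lock.base)

/*
 * Object granularity: ask about this specific lock instance.  A crude
 * approximation using existing primitives; the helpers in patch 1 are
 * presumably more precise about ownership.
 */
#define resv_assert_held_obj(obj) \
	WARN_ON_ONCE(!ww_mutex_is_locked(&(obj)->lock))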
Thomas Hellström (2):
kernel/locking/ww_mutex: Add per-lock lock-check helpers
dma-buf/dma-resv: Improve the dma-resv lockdep checks
include/linux/dma-resv.h | 7 +++++--
include/linux/ww_mutex.h | 18 ++++++++++++++++++
kernel/locking/mutex.c | 10 ++++++++++
3 files changed, 33 insertions(+), 2 deletions(-)
--
2.51.1