dma-buf has become a way to safely acquire a handle to non-struct-page
memory whose lifetime is still controlled by the exporter. Notably, RDMA
can now import dma-buf FDs and build them into MRs, which allows for PCI
P2P operations. Extend this to allow vfio-pci to export MMIO memory from
PCI device BARs.
This series supports a use case for SPDK where an NVMe device will be owned
by SPDK through VFIO while interacting with an RDMA device. The RDMA device
may directly access the NVMe CMB or directly manipulate the NVMe device's
doorbell using PCI P2P.
However, as a general mechanism, it can support many other scenarios with
VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
generic and safe P2P mappings.
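As a rough illustration of the intended flow from userspace, the sketch
below asks vfio-pci for a dma-buf covering part of a BAR and hands the
resulting FD to the RDMA stack. The VFIO request layout and feature bit
are placeholders standing in for the uapi this series adds; only
ibv_reg_dmabuf_mr() is existing rdma-core API.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>
#include <infiniband/verbs.h>

/* Placeholder layout: the real struct and feature bit come from this
 * series' uapi additions and may differ. */
struct dma_buf_request {
	uint32_t argsz;		/* vfio_device_feature style header */
	uint32_t flags;		/* VFIO_DEVICE_FEATURE_GET | <new feature bit> */
	uint32_t region_index;	/* which BAR to export */
	uint32_t open_flags;
	uint64_t offset;	/* slice of the BAR */
	uint64_t length;
};

static struct ibv_mr *mr_from_vfio_bar(struct ibv_pd *pd, int device_fd,
				       uint32_t bar, uint64_t offset,
				       uint64_t length)
{
	struct dma_buf_request req = {
		.argsz = sizeof(req),
		.flags = VFIO_DEVICE_FEATURE_GET,	/* plus the new feature bit */
		.region_index = bar,
		.offset = offset,
		.length = length,
	};
	int dmabuf_fd;

	/* Ask vfio-pci to wrap the MMIO slice in a dma-buf (new uapi). */
	dmabuf_fd = ioctl(device_fd, VFIO_DEVICE_FEATURE, &req);
	if (dmabuf_fd < 0)
		return NULL;

	/* Existing rdma-core verb: import the dma-buf and build an MR. */
	return ibv_reg_dmabuf_mr(pd, 0, length, 0 /* iova */, dmabuf_fd,
				 IBV_ACCESS_LOCAL_WRITE |
				 IBV_ACCESS_REMOTE_READ |
				 IBV_ACCESS_REMOTE_WRITE);
}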
This series goes after the "Break up ioctl dispatch functions to one
function per ioctl" series.
This is on github: https://github.com/jgunthorpe/linux/commits/vfio_dma_buf
v2:
- Name the new file dma_buf.c
- Restore orig_nents before freeing
- Fix reversed logic around priv->revoked
- Set priv->index
- Rebased on v2 "Break up ioctl dispatch functions"
v1: https://lore.kernel.org/r/0-v1-9e6e1739ed95+5fa-vfio_dma_buf_jgg@nvidia.com
Cc: linux-rdma@vger.kernel.org
Cc: Oded Gabbay <ogabbay@kernel.org>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Maor Gottlieb <maorg@nvidia.com>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Jason Gunthorpe (4):
dma-buf: Add dma_buf_try_get()
vfio: Add vfio_device_get()
vfio_pci: Do not open code pci_try_reset_function()
vfio/pci: Allow MMIO regions to be exported through dma-buf
drivers/vfio/pci/Makefile | 1 +
drivers/vfio/pci/dma_buf.c | 269 +++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_config.c | 22 ++-
drivers/vfio/pci/vfio_pci_core.c | 33 +++-
drivers/vfio/pci/vfio_pci_priv.h | 24 +++
drivers/vfio/vfio_main.c | 3 +-
include/linux/dma-buf.h | 13 ++
include/linux/vfio.h | 6 +
include/linux/vfio_pci_core.h | 1 +
include/uapi/linux/vfio.h | 18 ++
10 files changed, 368 insertions(+), 22 deletions(-)
create mode 100644 drivers/vfio/pci/dma_buf.c
base-commit: 285fef0ff7f1a97d8acd380971c061985d8dafb5
--
2.37.2
Callers in TTM, GEM, DRM and the core DMA-buf framework need to enable
software signaling before a fence is signaled, but framework code can
forget to call dma_fence_enable_sw_signaling() before calling
dma_fence_is_signaled(). To catch this scenario on a debug kernel, check
the DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT bit status before checking the
DMA_FENCE_FLAG_SIGNALED_BIT bit status, to confirm that software
signaling is enabled.
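As a minimal sketch of the kind of check being added (illustrative only;
the wrapper name and the debug gate below are assumptions, not the actual
patch):

#include <linux/dma-fence.h>

/* Sketch: on a debug build, warn when a fence's signaled state is
 * queried before software signaling was enabled for it and before it
 * has actually signaled. */
static inline bool dma_fence_is_signaled_checked(struct dma_fence *fence)
{
#ifdef CONFIG_DEBUG_FS	/* stand-in for whichever debug option gates the check */
	WARN_ON_ONCE(!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags) &&
		     !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags));
#endif
	return dma_fence_is_signaled(fence);
}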
Arvind Yadav (4):
dma-buf: Check status of enable-signaling bit on debug
drm/sched: Add callback and enable signaling on debug
dma-buf: Add callback and enable signaling on debug
dma-buf: Add callback and enable signaling on debug
drivers/dma-buf/dma-fence.c | 17 ++++++++
drivers/dma-buf/st-dma-fence-chain.c | 17 ++++++++
drivers/dma-buf/st-dma-fence-unwrap.c | 54 +++++++++++++++++++++++++
drivers/dma-buf/st-dma-fence.c | 34 +++++++++++++++-
drivers/dma-buf/st-dma-resv.c | 30 ++++++++++++++
drivers/gpu/drm/scheduler/sched_fence.c | 12 ++++++
drivers/gpu/drm/scheduler/sched_main.c | 4 +-
include/linux/dma-fence.h | 5 +++
8 files changed, 171 insertions(+), 2 deletions(-)
--
2.25.1
Hi Daniel Vetter,
The patch https://patchwork.freedesktop.org/patch/414455/:
"dma-buf: Add debug option" from Jan. 15, 2021, leads to the following expection:
Backtrace:
[<ffffffc0081a2258>] atomic_notifier_call_chain+0x9c/0xe8
[<ffffffc0081a2d54>] notify_die+0x114/0x19c
[<ffffffc0080348d8>] __die+0xec/0x468
[<ffffffc008034648>] die+0x54/0x1f8
[<ffffffc0080631e8>] die_kernel_fault+0x80/0xbc
[<ffffffc0080630fc>] __do_kernel_fault+0x268/0x2d4
[<ffffffc008062c4c>] do_bad_area+0x68/0x148
[<ffffffc00a6dab34>] do_translation_fault+0xbc/0x108
[<ffffffc0080619f8>] do_mem_abort+0x6c/0x1e8
[<ffffffc00a68f5cc>] el1_abort+0x3c/0x64
[<ffffffc00a68f54c>] el1h_64_sync_handler+0x5c/0xa0
[<ffffffc008011ae4>] el1h_64_sync+0x78/0x80
[<ffffffc008063b9c>] dcache_inval_poc+0x40/0x58
[<ffffffc009236104>] iommu_dma_sync_sg_for_cpu+0x144/0x280
[<ffffffc0082b4870>] dma_sync_sg_for_cpu+0xbc/0x110
[<ffffffc002c7538c>] system_heap_dma_buf_begin_cpu_access+0x144/0x1e0 [system_heap]
[<ffffffc0094154e4>] dma_buf_begin_cpu_access+0xa4/0x10c
[<ffffffc004888df4>] isp71_allocate_working_buffer+0x3b0/0xe8c [mtk_hcp]
[<ffffffc004884a20>] mtk_hcp_allocate_working_buffer+0xc0/0x108 [mtk_hcp]
CONFIG_DMABUF_DEBUG is enabled by default when DMA_API_DEBUG is enabled.
On devices that are not dma-coherent, the main reason users call
dma_buf_begin_cpu_access() and dma_buf_end_cpu_access() is to do cache
maintenance on a buffer mapped between dma_buf_map_attachment() and
dma_buf_unmap_attachment(). With the debug option enabled, the physical
address obtained from the sgtable via sg_phys(sg) is wrong, which leads
to the exception above.
1. dma_buf_map_attachment()
   -> mangle_sg_table(sg)  // "sg->page_link ^= ~0xffUL" scrambles the PA (this patch)
2. dma_buf_begin_cpu_access()
   -> system_heap_dma_buf_begin_cpu_access() in system_heap.c  // cache sync on the attachment mapped in step 1
      -> iommu_dma_sync_sg_for_cpu() in dma-iommu.c
         -> arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir)  // wrong PA because page_link is still mangled
3. dma_buf_end_cpu_access() follows the same path as dma_buf_begin_cpu_access().
4. dma_buf_unmap_attachment()
   -> mangle_sg_table(sg)  // undoes the mangling ("sg->page_link ^= ~0xffUL" again)
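For reference, sg_phys() computes the physical address from the struct
page pointer stored in sg->page_link (paraphrased from
include/linux/scatterlist.h), which is why any address derived from the
sg_table is garbage while it is mangled:

static inline dma_addr_t sg_phys(struct scatterlist *sg)
{
	return page_to_phys(sg_page(sg)) + sg->offset;
}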
drivers/dma-buf/Kconfig:
config DMABUF_DEBUG
	bool "DMA-BUF debug checks"
	default y if DMA_API_DEBUG
drivers/dma-buf/dma-buf.c:
static void mangle_sg_table(struct sg_table *sg_table)
{
#ifdef CONFIG_DMABUF_DEBUG
	int i;
	struct scatterlist *sg;

	/* To catch abuse of the underlying struct page by importers mix
	 * up the bits, but take care to preserve the low SG_ bits to
	 * not corrupt the sgt. The mixing is undone in __unmap_dma_buf
	 * before passing the sgt back to the exporter. */
	for_each_sgtable_sg(sg_table, sg, i)
		sg->page_link ^= ~0xffUL;
#endif
}
drivers/iommu/dma-iommu.c:
static void iommu_dma_sync_sg_for_cpu(struct device *dev,
		struct scatterlist *sgl, int nelems,
		enum dma_data_direction dir)
{
	struct scatterlist *sg;
	int i;

	if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
		return;

	for_each_sg(sgl, sg, nelems, i) {
		if (!dev_is_dma_coherent(dev))
			arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);

		if (is_swiotlb_buffer(sg_phys(sg)))
			swiotlb_tbl_sync_single(dev, sg_phys(sg), sg->length,
						dir, SYNC_FOR_CPU);
	}
}
Thanks,
Yunfei.