Currently this has no practical relevance, I think, because there aren't
many who can pull off a setup with panfrost and another gpu in the
same system. But the rules are that if you're setting an exclusive
fence, indicating a gpu write access in the implicit fencing system,
then you need to wait for all fences, not just the previous exclusive
fence.

panfrost against itself has no problem, because it always sets the
exclusive fence (but that's probably something that will need to be
fixed for vulkan and/or multi-engine gpus, or you'll suffer badly).
Also no problem with that against display.

With the prep work done to switch over to the dependency helpers this
is now a one-liner.
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Rob Herring <robh(a)kernel.org>
Cc: Tomeu Vizoso <tomeu.vizoso(a)collabora.com>
Cc: Steven Price <steven.price(a)arm.com>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig(a)collabora.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
drivers/gpu/drm/panfrost/panfrost_job.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 71cd43fa1b36..ef004d587dc4 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -203,9 +203,8 @@ static int panfrost_acquire_object_fences(struct drm_gem_object **bos,
int i, ret;
for (i = 0; i < bo_count; i++) {
- struct dma_fence *fence = dma_resv_get_excl_unlocked(bos[i]->resv);
-
- ret = drm_gem_fence_array_add(deps, fence);
+ /* panfrost always uses write mode in its current uapi */
+ ret = drm_gem_fence_array_add_implicit(deps, bos[i], true);
if (ret)
return ret;
}
--
2.32.0.rc2
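For illustration, a conceptual sketch of the rule described above, using only
the helpers that appear in the patch (dma_resv_get_excl_unlocked,
drm_gem_fence_array_add, drm_gem_fence_array_add_implicit); the wrapper
function itself is made up and not part of the patch:

#include <drm/drm_gem.h>
#include <linux/dma-resv.h>

static int sketch_add_implicit_deps(struct xarray *deps,
                                    struct drm_gem_object *obj,
                                    bool write)
{
        if (!write) {
                /* Readers only need to wait for the previous writer. */
                struct dma_fence *fence =
                        dma_resv_get_excl_unlocked(obj->resv);

                return drm_gem_fence_array_add(deps, fence);
        }

        /*
         * Writers must wait for all fences, shared and exclusive,
         * which is exactly what the one-liner in the patch delegates to.
         */
        return drm_gem_fence_array_add_implicit(deps, obj, true);
}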
On 22.06.21 at 17:40, Oded Gabbay wrote:
> On Tue, Jun 22, 2021 at 6:31 PM Christian König
> <christian.koenig(a)amd.com> wrote:
>>
>>
>> On 22.06.21 at 17:28, Jason Gunthorpe wrote:
>>> On Tue, Jun 22, 2021 at 05:24:08PM +0200, Christian König wrote:
>>>
>>>>>> I will take two GAUDI devices and use one as an exporter and one as an
>>>>>> importer. I want to see that the solution works end-to-end, with real
>>>>>> device DMA from importer to exporter.
>>>>> I can tell you it doesn't. Stuffing physical addresses directly into
>>>>> the sg list doesn't involve any of the IOMMU code so any configuration
>>>>> that requires IOMMU page table setup will not work.
>>>> Sure it does. See amdgpu_vram_mgr_alloc_sgt:
>>>>
>>>> amdgpu_res_first(res, offset, length, &cursor);
>>> ^^^^^^^^^^
>>>
>>> I'm not talking about the AMD driver, I'm talking about this patch.
>>>
>>> + bar_address = hdev->dram_pci_bar_start +
>>> + (pages[cur_page] - prop->dram_base_address);
>>> + sg_dma_address(sg) = bar_address;
>> Yeah, that is indeed not working.
>>
>> Oded you need to use dma_map_resource() for this.
>>
>> Christian.
> Yes, of course.
> But will it be enough ?
> Jason said that supporting IOMMU isn't nice when we don't have struct pages.
> I fail to understand the connection, I need to dig into this.
The question is what you want to do with this?
A struct page is always needed if you want to do stuff like HMM with it;
if you only want P2P between devices I'd actually recommend avoiding it.
Christian.
>
> Oded
>
>>
>>
>>> Jason
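To make the dma_map_resource() suggestion concrete, a minimal sketch of that
approach, assuming an importer-side struct device; importer_dev, bar_address
and size are illustrative names, not the actual habanalabs code:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static int sketch_map_bar_chunk(struct device *importer_dev,
                                struct scatterlist *sg,
                                phys_addr_t bar_address, size_t size,
                                enum dma_data_direction dir)
{
        dma_addr_t addr;

        /* Program the importer's IOMMU (if any) for the MMIO range. */
        addr = dma_map_resource(importer_dev, bar_address, size, dir,
                                DMA_ATTR_SKIP_CPU_SYNC);
        if (dma_mapping_error(importer_dev, addr))
                return -ENOMEM;

        sg_dma_address(sg) = addr;
        sg_dma_len(sg) = size;
        return 0;
}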
On Tue, Jun 22, 2021 at 06:24:28PM +0300, Oded Gabbay wrote:
> On Tue, Jun 22, 2021 at 6:11 PM Jason Gunthorpe <jgg(a)ziepe.ca> wrote:
> >
> > On Tue, Jun 22, 2021 at 04:12:26PM +0300, Oded Gabbay wrote:
> >
> > > > 1) Setting sg_page to NULL
> > > > 2) 'mapping' pages for P2P DMA without going through the iommu
> > > > 3) Allowing P2P DMA without using the p2p dma API to validate that it
> > > > can work at all in the first place.
> > > >
> > > > All of these result in functional bugs in certain system
> > > > configurations.
> > > >
> > > > Jason
> > >
> > > Hi Jason,
> > > Thanks for the feedback.
> > > Regarding point 1, why is that a problem if we disable the option to
> > > mmap the dma-buf from user-space ?
> >
> > Userspace has nothing to do with needing struct pages or not
> >
> > Points 1 and 2 mostly go together: supporting the iommu is not nice
> > if you don't have struct pages.
> >
> > You should study Logan's patches I pointed you at as they are solving
> > exactly this problem.
> Yes, I do need to study them. I agree with you here. It appears I
> have a hole in my understanding. I'm missing the connection between
> iommu support (which I must have of course) and struct pages.
Christian explained what the AMD driver is doing by calling
dma_map_resource().
Which is a hacky and slow way of achieving what Logan's series is
doing.
> > No, the design of the dmabuf requires the exporter to do the dma maps
> > and so it is only the exporter that is wrong to omit all the iommu and
> > p2p logic.
> >
> > RDMA is OK today only because nobody has implemented dma buf support
> > in rxe/si - mainly because the only implementations of exporters don't
>
> Can you please educate me, what is rxe/si ?
Sorry, rxe/siw - these are the all-software implementations of RDMA
and they require the struct page to do a SW memory copy. They can't
implement dmabuf without it.
> ok...
> so how come that patch-set was merged into 5.12 if it's buggy ?
We only implemented true dma devices for RDMA DMABUF support, so it
isn't buggy right now.
> Yes, that's what I expect to see. But I want to see it with my own
> eyes and then figure out how to solve this.
It might be tricky to test because you have to ensure the iommu is
turned on and has a non-identity page table. Basically, if it doesn't
trigger an IOMMU failure then the IOMMU isn't set up properly.
Jason
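As a rough sketch of that test precondition, assuming the generic iommu API
(the helper name here is made up):

#include <linux/iommu.h>

/*
 * The P2P test only proves anything if the importing device sits behind
 * an IOMMU with a real (non-identity) page table; otherwise raw bus
 * addresses happen to work by accident.
 */
static bool sketch_iommu_translates(struct device *importer_dev)
{
        struct iommu_domain *domain = iommu_get_domain_for_dev(importer_dev);

        return domain && domain->type != IOMMU_DOMAIN_IDENTITY;
}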
On Tue, Jun 22, 2021 at 04:12:26PM +0300, Oded Gabbay wrote:
> > 1) Setting sg_page to NULL
> > 2) 'mapping' pages for P2P DMA without going through the iommu
> > 3) Allowing P2P DMA without using the p2p dma API to validate that it
> > can work at all in the first place.
> >
> > All of these result in functional bugs in certain system
> > configurations.
> >
> > Jason
>
> Hi Jason,
> Thanks for the feedback.
> Regarding point 1, why is that a problem if we disable the option to
> mmap the dma-buf from user-space ?
Userspace has nothing to do with needing struct pages or not
Points 1 and 2 mostly go together: supporting the iommu is not nice
if you don't have struct pages.
You should study Logan's patches I pointed you at as they are solving
exactly this problem.
> In addition, I didn't see any problem with sg_page being NULL in the
> RDMA p2p dma-buf code. Did I miss something here ?
No, the design of the dmabuf requires the exporter to do the dma maps
and so it is only the exporter that is wrong to omit all the iommu and
p2p logic.
RDMA is OK today only because nobody has implemented dma buf support
in rxe/si - mainly because the only implementations of exporters don't
set the struct page and are thus buggy.
> I will take two GAUDI devices and use one as an exporter and one as an
> importer. I want to see that the solution works end-to-end, with real
> device DMA from importer to exporter.
I can tell you it doesn't. Stuffing physical addresses directly into
the sg list doesn't involve any of the IOMMU code so any configuration
that requires IOMMU page table setup will not work.
Jason
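For point 3, a hedged sketch of what asking the p2pdma API looks like;
pci_p2pdma_distance_many() is the upstream helper, the surrounding function
and names are illustrative rather than code from this patch set:

#include <linux/kernel.h>
#include <linux/pci.h>
#include <linux/pci-p2pdma.h>

static bool sketch_p2p_possible(struct pci_dev *exporter,
                                struct device *importer_dev)
{
        struct device *clients[] = { importer_dev };

        /* A negative distance means P2P DMA between these devices won't work. */
        return pci_p2pdma_distance_many(exporter, clients,
                                        ARRAY_SIZE(clients), true) >= 0;
}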
On Tue, Jun 22, 2021 at 03:04:30PM +0300, Oded Gabbay wrote:
> On Tue, Jun 22, 2021 at 3:01 PM Jason Gunthorpe <jgg(a)ziepe.ca> wrote:
> >
> > On Tue, Jun 22, 2021 at 11:42:27AM +0300, Oded Gabbay wrote:
> > > On Tue, Jun 22, 2021 at 9:37 AM Christian König
> > > <ckoenig.leichtzumerken(a)gmail.com> wrote:
> > > >
> > > > > On 22.06.21 at 01:29, Jason Gunthorpe wrote:
> > > > > On Mon, Jun 21, 2021 at 10:24:16PM +0300, Oded Gabbay wrote:
> > > > >
> > > > >> Another thing I want to emphasize is that we are doing p2p only
> > > > >> through the export/import of the FD. We do *not* allow the user to
> > > > >> mmap the dma-buf as we do not support direct IO. So there is no access
> > > > >> to these pages through the userspace.
> > > > > Arguably mmaping the memory is a better choice, and is the direction
> > > > > that Logan's series goes in. Here the use of DMABUF was specifically
> > > > > designed to allow hitless revocation of the memory, which this isn't
> > > > > even using.
> > > >
> > > > The major problem with this approach is that DMA-buf is also used for
> > > > memory which isn't CPU accessible.
> >
> > That isn't an issue here because the memory is only intended to be
> > used with P2P transfers so it must be CPU accessible.
> >
> > > > That was one of the reasons we didn't even consider using the mapping
> > > > memory approach for GPUs.
> >
> > Well, now we have DEVICE_PRIVATE memory that can meet this need
> > too.. Just nobody has wired it up to hmm_range_fault()
> >
> > > > > So you are taking the hit of very limited hardware support and reduced
> > > > > performance just to squeeze into DMABUF..
> > >
> > > Thanks Jason for the clarification, but I honestly prefer to use
> > > DMA-BUF at the moment.
> > > It gives us just what we need (even more than what we need as you
> > > pointed out), it is *already* integrated and tested in the RDMA
> > > subsystem, and I'm feeling comfortable using it as I'm somewhat
> > > familiar with it from my AMD days.
> >
> > You still have the issue that this patch is doing all of this P2P
> > stuff wrong - following the already NAK'd AMD approach.
>
> Could you please point me exactly to the lines of code that are wrong
> in your opinion ?
1) Setting sg_page to NULL
2) 'mapping' pages for P2P DMA without going through the iommu
3) Allowing P2P DMA without using the p2p dma API to validate that it
can work at all in the first place.
All of these result in functional bugs in certain system
configurations.
Jason
On Mon, Jun 21, 2021 at 10:24:16PM +0300, Oded Gabbay wrote:
> Another thing I want to emphasize is that we are doing p2p only
> through the export/import of the FD. We do *not* allow the user to
> mmap the dma-buf as we do not support direct IO. So there is no access
> to these pages through the userspace.
Arguably mmaping the memory is a better choice, and is the direction
that Logan's series goes in. Here the use of DMABUF was specifically
designed to allow hitless revocation of the memory, which this isn't
even using.
So you are taking the hit of very limited hardware support and reduced
performance just to squeeze into DMABUF..
Jason
On Mon, Jun 21, 2021 at 07:26:14PM +0300, Oded Gabbay wrote:
> On Mon, Jun 21, 2021 at 5:12 PM Jason Gunthorpe <jgg(a)ziepe.ca> wrote:
> >
> > On Mon, Jun 21, 2021 at 03:02:10PM +0200, Greg KH wrote:
> > > On Mon, Jun 21, 2021 at 02:28:48PM +0200, Daniel Vetter wrote:
> >
> > > > Also I'm wondering which is the other driver that we share buffers
> > > > with. The gaudi stuff doesn't have real struct pages as backing
> > > > storage, it only fills out the dma_addr_t. That tends to blow up with
> > > > other drivers, and the only place where this is guaranteed to work is
> > > > if you have a dynamic importer which sets the allow_peer2peer flag.
> > > > Adding maintainers from other subsystems who might want to chime in
> > > > here. So even aside from the big question, as-is this is broken.
> > >
> > > From what I can tell this driver is sending the buffers to other
> > > instances of the same hardware,
> >
> > A dmabuf is consumed by something else in the kernel calling
> > dma_buf_map_attachment() on the FD.
> >
> > What is the other side of this? I don't see any
> > dma_buf_map_attachment() calls in drivers/misc, or added in this patch
> > set.
>
> This patch-set is only to enable the support for the exporter side.
> The "other side" is any generic RDMA networking device that will want
> to perform p2p communication over PCIe with our GAUDI accelerator.
> An example is indeed the mlx5 card which has already integrated
> support for being an "importer".
It raises the question of how you are testing this if you aren't using
it with the only in-tree driver: mlx5.
Jason
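For context, the importer-side flow being referred to is roughly the
following (a sketch of the in-kernel dma-buf importer API; not mlx5 code,
and error handling is abbreviated):

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static struct sg_table *sketch_import(struct device *dev, int fd,
                                      struct dma_buf_attachment **out_attach)
{
        struct dma_buf *dmabuf;
        struct dma_buf_attachment *attach;
        struct sg_table *sgt;

        dmabuf = dma_buf_get(fd);               /* FD from the exporter */
        if (IS_ERR(dmabuf))
                return ERR_CAST(dmabuf);

        attach = dma_buf_attach(dmabuf, dev);   /* attach the importing device */
        if (IS_ERR(attach)) {
                dma_buf_put(dmabuf);
                return ERR_CAST(attach);
        }

        /* The exporter builds and dma-maps the sg_table for us. */
        sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
        if (IS_ERR(sgt)) {
                dma_buf_detach(dmabuf, attach);
                dma_buf_put(dmabuf);
                return ERR_CAST(sgt);
        }

        *out_attach = attach;
        return sgt;
}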
On Fri, Jun 18, 2021 at 2:36 PM Oded Gabbay <ogabbay(a)kernel.org> wrote:
> User process might want to share the device memory with another
> driver/device, and to allow it to access it over PCIe (P2P).
>
> To enable this, we utilize the dma-buf mechanism and add a dma-buf
> exporter support, so the other driver can import the device memory and
> access it.
>
> The device memory is allocated using our existing allocation uAPI,
> where the user will get a handle that represents the allocation.
>
> The user will then need to call the new
> uAPI (HL_MEM_OP_EXPORT_DMABUF_FD) and give the handle as a parameter.
>
> The driver will return a FD that represents the DMA-BUF object that
> was created to match that allocation.
>
> Signed-off-by: Oded Gabbay <ogabbay(a)kernel.org>
> Reviewed-by: Tomer Tayar <ttayar(a)habana.ai>
Mission accomplished, we've gone full circle, and the totally-not-a-gpu
driver is now trying to use gpu infrastructure. And seems to have
gained vram meanwhile too. Next up is going to be synchronization
using dma_fence so you can pass buffers back&forth without stalls
among drivers.
Bonus points for this being at v3 before it shows up on dri-devel and
cc's dma-buf folks properly (not quite all, I added the missing
people).
I think we roughly have two options here
a) Greg continues to piss off dri-devel folks while trying to look
cute&cuddly and steadfastly claiming that this accelerator doesn't work
like any of the other accelerator drivers we have in drivers/gpu/drm.
All while the driver ever more looks like one of these other accel
drivers.
b) We finally do what we should have done years back and treat this as
a proper driver submission and review it on dri-devel instead of
sneaking it in through other channels because the merge criteria
dri-devel has are too onerous and people who don't have experience
with accel stacks for the past 20 years or so don't like them.
"But this probably means a new driver and big disruption!"
Not my problem, I'm not the dude who has to come up with an excuse for
this because I didn't merge the driver in the first place. I do get to
throw a "we all told you so" in though, but that's not helping.
Also I'm wondering which is the other driver that we share buffers
with. The gaudi stuff doesn't have real struct pages as backing
storage, it only fills out the dma_addr_t. That tends to blow up with
other drivers, and the only place where this is guaranteed to work is
if you have a dynamic importer which sets the allow_peer2peer flag.
Adding maintainers from other subsystems who might want to chime in
here. So even aside from the big question, as-is this is broken.
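For reference, a minimal sketch of what "a dynamic importer which sets the
allow_peer2peer flag" means in code (the callback body and names are
illustrative):

#include <linux/dma-buf.h>

static void sketch_move_notify(struct dma_buf_attachment *attach)
{
        /* The exporter is moving/revoking the buffer; drop our mapping here. */
}

static const struct dma_buf_attach_ops sketch_attach_ops = {
        .allow_peer2peer = true,        /* may accept non-struct-page memory */
        .move_notify = sketch_move_notify,
};

static struct dma_buf_attachment *sketch_dynamic_attach(struct dma_buf *buf,
                                                        struct device *dev)
{
        return dma_buf_dynamic_attach(buf, dev, &sketch_attach_ops, NULL);
}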
Currently only 2 drivers set allow_peer2peer, so those are the only
ones who can consume these buffers from device memory. Pinging those
folks specifically.
Doug/Jason from infiniband: Should we add linux-rdma to the dma-buf
wildcard match so that you can catch these next time around too? At
least when people use scripts/get_maintainers.pl correctly. All the
other subsystems using dma-buf are on there already (dri-devel,
linux-media and linaro-mm-sig for android/arm embedded stuff).
Cheers, Daniel
> ---
> include/uapi/misc/habanalabs.h | 28 +++++++++++++++++++++++++++-
> 1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h
> index a47a731e4527..aa3d8e0ba060 100644
> --- a/include/uapi/misc/habanalabs.h
> +++ b/include/uapi/misc/habanalabs.h
> @@ -808,6 +808,10 @@ union hl_wait_cs_args {
> #define HL_MEM_OP_UNMAP 3
> /* Opcode to map a hw block */
> #define HL_MEM_OP_MAP_BLOCK 4
> +/* Opcode to create DMA-BUF object for an existing device memory allocation
> + * and to export an FD of that DMA-BUF back to the caller
> + */
> +#define HL_MEM_OP_EXPORT_DMABUF_FD 5
>
> /* Memory flags */
> #define HL_MEM_CONTIGUOUS 0x1
> @@ -878,11 +882,26 @@ struct hl_mem_in {
> /* Virtual address returned from HL_MEM_OP_MAP */
> __u64 device_virt_addr;
> } unmap;
> +
> + /* HL_MEM_OP_EXPORT_DMABUF_FD */
> + struct {
> + /* Handle returned from HL_MEM_OP_ALLOC. In Gaudi,
> + * where we don't have MMU for the device memory, the
> + * driver expects a physical address (instead of
> + * a handle) in the device memory space.
> + */
> + __u64 handle;
> + /* Size of memory allocation. Relevant only for GAUDI */
> + __u64 mem_size;
> + } export_dmabuf_fd;
> };
>
> /* HL_MEM_OP_* */
> __u32 op;
> - /* HL_MEM_* flags */
> + /* HL_MEM_* flags.
> + * For the HL_MEM_OP_EXPORT_DMABUF_FD opcode, this field holds the
> + * DMA-BUF file/FD flags.
> + */
> __u32 flags;
> /* Context ID - Currently not in use */
> __u32 ctx_id;
> @@ -919,6 +938,13 @@ struct hl_mem_out {
>
> __u32 pad;
> };
> +
> + /* Returned in HL_MEM_OP_EXPORT_DMABUF_FD. Represents the
> + * DMA-BUF object that was created to describe a memory
> + * allocation on the device's memory space. The FD should be
> + * passed to the importer driver
> + */
> + __u64 fd;
> };
> };
>
> --
> 2.25.1
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
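For completeness, a rough userspace sketch of how the new opcode would be
exercised, based only on the uapi hunk quoted above; HL_IOCTL_MEMORY and
union hl_mem_args are taken from the existing habanalabs uapi header, the
helper is hypothetical and error handling is omitted:

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <misc/habanalabs.h>

static int export_dmabuf_fd(int hl_dev_fd, __u64 handle, __u64 size)
{
        union hl_mem_args args;

        memset(&args, 0, sizeof(args));
        args.in.op = HL_MEM_OP_EXPORT_DMABUF_FD;
        args.in.flags = O_RDWR | O_CLOEXEC;             /* dma-buf file flags */
        args.in.export_dmabuf_fd.handle = handle;       /* physical address on GAUDI */
        args.in.export_dmabuf_fd.mem_size = size;

        if (ioctl(hl_dev_fd, HL_IOCTL_MEMORY, &args))
                return -1;

        return (int)args.out.fd;        /* pass this FD to the importer driver */
}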