Most of the other cross-driver gfx infrastructure (dma_buf, dma_fence)
also gets cross-posted to all the relevant gfx/memory lists. Doing the
same for ION means people won't miss relevant patches.
Cc: Laura Abbott <labbott(a)redhat.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: devel(a)driverdev.osuosl.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org (moderated for non-subscribers)
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
MAINTAINERS | 2 ++
1 file changed, 2 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 555db72d4eb7..d43cdfca3eb5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -902,6 +902,8 @@ ANDROID ION DRIVER
M: Laura Abbott <labbott(a)redhat.com>
M: Sumit Semwal <sumit.semwal(a)linaro.org>
L: devel(a)driverdev.osuosl.org
+L: dri-devel(a)lists.freedesktop.org
+L: linaro-mm-sig(a)lists.linaro.org (moderated for non-subscribers)
S: Supported
F: drivers/staging/android/ion
F: drivers/staging/android/uapi/ion.h
--
2.16.2
On Thu, Mar 29, 2018 at 09:58:54PM -0400, Jerome Glisse wrote:
> dma_map_resource() is the right API (though its current implementation
> is filled with x86 assumptions). So I would argue that the arch can decide to
> implement it or simply return a DMA error address, which triggers the fallback
> path in the caller (at least for GPU drivers). An SG variant can be added
> on top.
It isn't in general. It doesn't integrate with scatterlists (see my
comment to page one), and it doesn't integrate with all the subsystems
that also need a kernel virtual address.
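(For reference, a minimal sketch of the map-and-fall-back pattern under
discussion. dma_map_resource(), dma_mapping_error() and
dma_unmap_resource() are the existing DMA API; the caller and the bounce
path it would fall back to are assumed.)

#include <linux/dma-mapping.h>

/*
 * Sketch only: try to map a peer's BAR region for DMA and report
 * failure so the caller can fall back to bouncing through system
 * memory instead.
 */
static int map_peer_bar(struct device *dma_dev, phys_addr_t bar_phys,
                        size_t size, dma_addr_t *dma_addr)
{
        dma_addr_t addr;

        addr = dma_map_resource(dma_dev, bar_phys, size,
                                DMA_BIDIRECTIONAL, 0);
        if (dma_mapping_error(dma_dev, addr))
                return -EIO;    /* arch/IOMMU said no: take the fallback path */

        *dma_addr = addr;
        return 0;
}

/* Tear down with dma_unmap_resource(dma_dev, addr, size, DMA_BIDIRECTIONAL, 0). */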
On 29.03.2018 at 18:25, Logan Gunthorpe wrote:
>
> On 29/03/18 10:10 AM, Christian König wrote:
>> Why not? I mean the dma_map_resource() function is for P2P while other
>> dma_map_* functions are only for system memory.
> Oh, hmm, I wasn't aware dma_map_resource was exclusively for mapping
> P2P. Though it's a bit odd seeing as we've been working under the
> assumption that PCI P2P is different as it has to translate the PCI bus
> address, whereas P2P for devices on other buses is a big unknown.
Yeah, completely agree. On my TODO list (but rather far down) is
actually supporting P2P with USB devices.
And no, I don't have the slightest idea how to do this at the moment.
>>> And this is necessary to
>>> check if the DMA ops in use support it or not. We can't have the
>>> dma_map_X() functions do the wrong thing because they don't support it yet.
>> Well, that sounds like we should just return an error from
>> dma_map_resource() when an architecture doesn't support P2P yet, as Alex
>> suggested.
> Yes, well except in our patch-set we can't easily use
> dma_map_resource() as we either have SGLs to deal with or we need to
> create whole new interfaces for a number of subsystems.
Agree as well. I was also clearly in favor of extending the SGLs to have a
flag for this instead of the dma_map_resource() interface, but for some
reason that didn't make it into the kernel.
>> You don't seem to understand the implications: The devices do have a
>> common upstream bridge! In other words your code would currently claim
>> that P2P is supported, but in practice it doesn't work.
> Do they? They don't on any of the Intel machines I'm looking at. The
> previous version of the patchset not only required a common upstream
> bridge but two layers of upstream bridges on both devices which would
> effectively limit transfers to PCIe switches only. But Bjorn did not
> like this.
At least to me that sounds like a good idea; it would at least disable
the (incorrect) auto-detection of P2P for such devices.
>> You need to include both drivers which participate in the P2P
>> transaction to make sure that both support this and give them the
>> opportunity to chicken out, and in the case of AMD APUs even redirect the
>> request to another location (e.g. participate in the DMA translation).
> I don't think it's the driver's responsibility to reject P2P. The
> topology is what governs support or not. The discussions we had with
> Bjorn settled on: if the devices are all behind the same bridge, they can
> communicate with each other. This is essentially guaranteed by the PCI spec.
Well, it is not only about rejecting P2P. See, the devices I need to worry
about are essentially part of the CPU. Their resources look like a PCI BAR to
the BIOS and OS, but are actually backed by stolen system memory.
So, as crazy as it sounds, what you get is an operation which starts as
P2P, but then the GPU driver sees it and says: hey, please don't write
that to my PCIe BAR, but rather to system memory location X.
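In DMA-buf terms such a redirect would sit in the exporter's map
callback; a rough sketch, not actual amdgpu code (example_bo, its flag
and the two map helpers are made-up names):

#include <linux/dma-buf.h>

/* Made-up buffer object, for illustration only. */
struct example_bo {
        bool backed_by_stolen_sysmem;
};

struct sg_table *example_map_system_memory(struct example_bo *bo,
                                           struct device *dev,
                                           enum dma_data_direction dir);
struct sg_table *example_map_pci_bar(struct example_bo *bo,
                                     struct device *dev,
                                     enum dma_data_direction dir);

static struct sg_table *
example_map_dma_buf(struct dma_buf_attachment *attach,
                    enum dma_data_direction dir)
{
        struct example_bo *bo = attach->dmabuf->priv;

        /* APU case: the BAR is backed by stolen system memory, redirect. */
        if (bo->backed_by_stolen_sysmem)
                return example_map_system_memory(bo, attach->dev, dir);

        return example_map_pci_bar(bo, attach->dev, dir);
}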
>> DMA-buf fortunately seems to handle all this already, that's why we
>> choose it as base for our implementation.
> Well, unfortunately DMA-buf doesn't help for the drivers we are working
> with as neither the block layer nor the RDMA subsystem have any
> interfaces for it.
A fact that gives me quite some sleepless nights as well. I think we
sooner or later need to extend those interfaces to work with DMA-bufs as
well.
I will try to give your patch set a review when I'm back from vacation
and rebase my DMA-buf work on top of that.
Regards,
Christian.
>
> Logan
On 29.03.2018 at 17:45, Logan Gunthorpe wrote:
>
> On 29/03/18 05:44 AM, Christian König wrote:
>> On 28.03.2018 at 21:53, Logan Gunthorpe wrote:
>>> On 28/03/18 01:44 PM, Christian König wrote:
>>>> Well, isn't that exactly what dma_map_resource() is good for? As far as
>>>> I can see it makes sure IOMMU is aware of the access route and
>>>> translates a CPU address into a PCI Bus address.
>>>> I'm using that with the AMD IOMMU driver and at least there it works
>>>> perfectly fine.
>>> Yes, it would be nice, but no arch has implemented this yet. We are just
>>> lucky in the x86 case because that arch is simple and doesn't need to do
>>> anything for P2P (partially due to the Bus and CPU addresses being the
>>> same). But in the general case, you can't rely on it.
>> Well, the fact that an arch hasn't implemented it yet doesn't mean that
>> we don't have the right interface to do it.
> Yes, but right now we don't have a performant way to check if we are
> doing P2P or not in the dma_map_X() wrappers.
Why not? I mean the dma_map_resource() function is for P2P while other
dma_map_* functions are only for system memory.
> And this is necessary to
> check if the DMA ops in use support it or not. We can't have the
> dma_map_X() functions do the wrong thing because they don't support it yet.
Well, that sounds like we should just return an error from
dma_map_resource() when an architecture doesn't support P2P yet, as Alex
suggested.
>> Devices integrated in the CPU usually only "claim" to be PCIe devices.
>> In reality their memory request paths go directly through the integrated
>> north bridge. The reason for this is simply better throughput/latency.
> These are just more reasons why our patchset restricts P2P to devices behind
> a switch. And more mess for someone to deal with if they need to relax
> that restriction.
You don't seem to understand the implications: The devices do have a
common upstream bridge! In other words your code would currently claim
that P2P is supported, but in practice it doesn't work.
You need to include both drivers which participate in the P2P
transaction to make sure that both support this and give them the
opportunity to chicken out, and in the case of AMD APUs even redirect the
request to another location (e.g. participate in the DMA translation).
DMA-buf fortunately seems to handle all this already, that's why we
choose it as base for our implementation.
Regards,
Christian.
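(For context, the importer-side flow that pulls both drivers into the
transaction looks roughly like this. The dma_buf_* calls are the
existing DMA-buf API; the wrapper itself is only a sketch.)

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>

/*
 * The importer attaches with its own struct device, which gives the
 * exporter the chance to refuse the attachment or to pick a different
 * backing location when the mapping is created.
 */
static struct sg_table *import_peer_buffer(struct dma_buf *dmabuf,
                                           struct device *importer,
                                           struct dma_buf_attachment **attach_out)
{
        struct dma_buf_attachment *attach;
        struct sg_table *sgt;

        attach = dma_buf_attach(dmabuf, importer);
        if (IS_ERR(attach))
                return ERR_CAST(attach);        /* exporter rejected this device */

        sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
        if (IS_ERR(sgt)) {
                dma_buf_detach(dmabuf, attach);
                return sgt;
        }

        *attach_out = attach;
        return sgt;
}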
Sorry, didn't mean to drop the lists here. Re-adding.
On Wed, Mar 28, 2018 at 4:05 PM, Alex Deucher <alexdeucher(a)gmail.com> wrote:
> On Wed, Mar 28, 2018 at 3:53 PM, Logan Gunthorpe <logang(a)deltatee.com> wrote:
>>
>>
>> On 28/03/18 01:44 PM, Christian König wrote:
>>> Well, isn't that exactly what dma_map_resource() is good for? As far as
>>> I can see it makes sure IOMMU is aware of the access route and
>>> translates a CPU address into a PCI Bus address.
>>
>>> I'm using that with the AMD IOMMU driver and at least there it works
>>> perfectly fine.
>>
>> Yes, it would be nice, but no arch has implemented this yet. We are just
>> lucky in the x86 case because that arch is simple and doesn't need to do
>> anything for P2P (partially due to the Bus and CPU addresses being the
>> same). But in the general case, you can't rely on it.
>
> Could we do something for the arches where it works? I feel like peer
> to peer has dragged out for years because everyone is trying to boil
> the ocean for all arches. There are a huge number of use cases for
> peer to peer on these "simple" architectures which actually represent
> a good deal of the users that want this.
>
> Alex
>
>>
>>>>> Yeah, but not for ours. See, if you want to do real peer 2 peer you need
>>>>> to take both the operation as well as the direction into account.
>>>> Not sure what you are saying here... I'm pretty sure we are doing "real"
>>>> peer 2 peer...
>>>>
>>>>> For example when you can do writes between A and B that doesn't mean
>>>>> that writes between B and A work. And reads are generally less likely to
>>>>> work than writes. etc...
>>>> If both devices are behind a switch then the PCI spec guarantees that A
>>>> can both read and write B and vice versa.
>>>
>>> Sorry to say that, but I know a whole bunch of PCI devices which
>>> horribly ignore that.
>>
>> Can you elaborate? As far as the device is concerned it shouldn't know
>> whether a request comes from a peer or from the host. If it does do
>> crazy stuff like that it's well out of spec. It's up to the switch (or
>> root complex if good support exists) to route the request to the device
>> and it's the root complex that tends to be what drops the load requests
>> which causes the asymmetries.
>>
>> Logan
On 28.03.2018 at 21:53, Logan Gunthorpe wrote:
>
> On 28/03/18 01:44 PM, Christian König wrote:
>> Well, isn't that exactly what dma_map_resource() is good for? As far as
>> I can see it makes sure IOMMU is aware of the access route and
>> translates a CPU address into a PCI Bus address.
>> I'm using that with the AMD IOMMU driver and at least there it works
>> perfectly fine.
> Yes, it would be nice, but no arch has implemented this yet. We are just
> lucky in the x86 case because that arch is simple and doesn't need to do
> anything for P2P (partially due to the Bus and CPU addresses being the
> same). But in the general case, you can't rely on it.
Well, the fact that an arch hasn't implemented it yet doesn't mean that
we don't have the right interface to do it.
>>>> Yeah, but not for ours. See, if you want to do real peer 2 peer you need
>>>> to take both the operation as well as the direction into account.
>>> Not sure what you are saying here... I'm pretty sure we are doing "real"
>>> peer 2 peer...
>>>
>>>> For example when you can do writes between A and B that doesn't mean
>>>> that writes between B and A work. And reads are generally less likely to
>>>> work than writes. etc...
>>> If both devices are behind a switch then the PCI spec guarantees that A
>>> can both read and write B and vice versa.
>> Sorry to say that, but I know a whole bunch of PCI devices which
>> horribly ignore that.
> Can you elaborate? As far as the device is concerned it shouldn't know
> whether a request comes from a peer or from the host. If it does do
> crazy stuff like that it's well out of spec. It's up to the switch (or
> root complex if good support exists) to route the request to the device
> and it's the root complex that tends to be what drops the load requests
> which causes the asymmetries.
Devices integrated in the CPU usually only "claim" to be PCIe devices.
In reality their memory request paths go directly through the integrated
north bridge. The reason for this is simply better throughput/latency.
That is hidden from the software; for example, the BIOS just allocates
address space for the BARs as if it were a normal PCIe device.
The only crux is that when you then do peer2peer, your requests simply go into
nirvana and are not handled by anything, because the BARs are only
visible from the CPU side of the northbridge.
Regards,
Christian.
>
> Logan
On 28.03.2018 at 20:57, Logan Gunthorpe wrote:
>
> On 28/03/18 12:28 PM, Christian König wrote:
>> I'm just using amdgpu as blueprint because I'm the co-maintainer of it
>> and know it mostly inside out.
> Ah, I see.
>
>> The resource addresses are translated using dma_map_resource(). As far
>> as I know that should be sufficient to offload all the architecture
>> specific stuff to the DMA subsystem.
> It's not. The dma_map infrastructure currently has no concept of
> peer-to-peer mappings and is designed for system memory only. No
> architecture I'm aware of will translate PCI CPU addresses into PCI Bus
> addresses which is necessary for any transfer that doesn't go through
> the root complex (though on arches like x86 the CPU and Bus address
> happen to be the same). There's a lot of people that would like to see
> this change but it's likely going to be a long road before it does.
Well, isn't that exactly what dma_map_resource() is good for? As far as
I can see it makes sure IOMMU is aware of the access route and
translates a CPU address into a PCI Bus address.
> Furthermore, one of the reasons our patch-set avoids going through the
> root complex at all is that IOMMU drivers will need to be made aware
> that it is operating on P2P memory and do arch-specific things
> accordingly. There will also need to be flags that indicate whether a
> given IOMMU driver supports this. None of this work is done or easy.
I'm using that with the AMD IOMMU driver and at least there it works
perfectly fine.
>> Yeah, but not for ours. See, if you want to do real peer 2 peer you need
>> to take both the operation as well as the direction into account.
> Not sure what you are saying here... I'm pretty sure we are doing "real"
> peer 2 peer...
>
>> For example when you can do writes between A and B that doesn't mean
>> that writes between B and A work. And reads are generally less likely to
>> work than writes. etc...
> If both devices are behind a switch then the PCI spec guarantees that A
> can both read and write B and vice versa.
Sorry to say that, but I know a whole bunch of PCI devices which
horribly ignore that.
For example all AMD APUs fall under that category...
> Only once you involve root
> complexes do you have this problem. Ie. you have unknown support which
> may be no support, or partial support (stores but not loads); or
> sometimes bad performance; or a combination of both... and you need some
> way to figure out all this mess and that is hard. Whoever tries to
> implement a whitelist will have to sort all this out.
Yes, exactly and unfortunately it looks like I'm the poor guy who needs
to do this :)
Regards,
Christian.
>
> Logan
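(One way such a whitelist could look, as a sketch only; the table entry
is a placeholder, not a validated host bridge.)

#include <linux/pci.h>

/*
 * Only allow P2P through a root complex when the host bridge is on a
 * known-good list.
 */
static const struct pci_device_id p2p_root_whitelist[] = {
        { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x2f00) },    /* placeholder entry */
        { }
};

static bool p2p_host_bridge_whitelisted(struct pci_dev *host_bridge)
{
        return pci_match_id(p2p_root_whitelist, host_bridge) != NULL;
}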
On 28.03.2018 at 18:25, Logan Gunthorpe wrote:
>
> On 28/03/18 10:02 AM, Christian König wrote:
>> Yeah, that looks very similar to what I picked up from the older
>> patches, going to read up on that after my vacation.
> Yeah, I was just reading through your patchset and there are a lot of
> similarities. Though, I'm not sure what you're trying to accomplish as I
> could not find a cover letter and it seems to only enable one driver.
Yeah, it was the last day before my Easter vacation and I wanted it out
the door.
> Is it meant to enable DMA transactions only between two AMD GPUs?
Not really. DMA-buf is a general framework for sharing buffers between
device drivers.
It is widely used in the GFX stack on laptops with Intel+AMD,
Intel+NVIDIA or AMD+AMD graphics devices.
In addition to that, ARM uses it quite heavily for their GFX stacks
because they have the rendering and display devices separated.
I'm just using amdgpu as blueprint because I'm the co-maintainer of it
and know it mostly inside out.
> I also don't see where you've taken into account the PCI bus address. On
> some architectures this is not the same as the CPU physical address.
The resource addresses are translated using dma_map_resource(). As far
as I know that should be sufficient to offload all the architecture
specific stuff to the DMA subsystem.
>
>> Just in general why are you interested in the "distance" of the devices?
> We've taken a general approach where some drivers may provide p2p memory
> (ie. an NVMe card or an RDMA NIC) and other drivers make use of it (ie.
> the NVMe-of driver). The orchestrator driver needs to find the most
> applicable provider device for a transaction in a situation that may
> have multiple providers and multiple clients. So the most applicable
> provider is the one that's closest ("distance"-wise) to all the clients
> for the P2P transaction.
That seems to make sense.
>
>> And BTW: At least for writes, it is actually more common that Peer 2 Peer
>> transactions between different root complexes work than the other way
>> around.
> Maybe on x86 with hardware made in the last few years. But on PowerPC,
> ARM64, and likely a lot more the chance of support is *much* less. Also,
> hardware that only supports P2P stores is hardly full support and is
> insufficient for our needs.
Yeah, but not for ours. See, if you want to do real peer 2 peer you need
to take both the operation as well as the direction into account.
For example when you can do writes between A and B that doesn't mean
that writes between B and A work. And reads are generally less likely to
work than writes. etc...
Since the use case I'm targeting is GFX or GFX+V4L (or GFX+NIC in
the future), I really need to handle all such use cases as well.
>
>> So I'm a bit torn between using a blacklist or a whitelist. A whitelist
>> is certainly the more conservative approach, but that could get a bit long.
> I think a whitelist approach is correct. Given old hardware and other
> architectures, a blacklist is going to be too long and too difficult to
> comprehensively populate.
Yeah, it would certainly be better if we had something in the root
complex capabilities. But you're right that a whitelist sounds like the less
painful way.
Regards,
Christian.
>
> Logan
On 28.03.2018 at 17:47, Logan Gunthorpe wrote:
>
> On 28/03/18 09:07 AM, Christian König wrote:
>> On 28.03.2018 at 14:38, Christoph Hellwig wrote:
>>> On Sun, Mar 25, 2018 at 12:59:54PM +0200, Christian König wrote:
>>>> From: "wdavis(a)nvidia.com" <wdavis(a)nvidia.com>
>>>>
>>>> Add an interface to find the first device which is upstream of both
>>>> devices.
>>> Please work with Logan and base this on top of the outstanding peer
>>> to peer patchset.
>> Can you point me to that? The last code I could find about that was from
>> 2015.
> The latest posted series is here:
>
> https://lkml.org/lkml/2018/3/12/830
>
> However, we've made some significant changes to the area that's similar
> to what you are doing. You can find the latest un-posted version here:
>
> https://github.com/sbates130272/linux-p2pmem/tree/pci-p2p-v4-pre2
>
> Specifically this function would be of interest to you:
>
> https://github.com/sbates130272/linux-p2pmem/blob/0e9468ae2a5a5198513dd1299…
>
> However, the difference between what we are doing and what you are doing
> is that we are interested in the distance through the common upstream device,
> while you appear to be finding the actual common device.
Yeah, that looks very similar to what I picked up from the older
patches, going to read up on that after my vacation.
Just in general why are you interested in the "distance" of the devices?
And BTW: At least for writes, it is actually more common that Peer 2 Peer
transactions between different root complexes work than the other way
around.
So I'm a bit torn between using a blacklist or a whitelist. A whitelist
is certainly the more conservative approach, but that could get a bit long.
Thanks,
Christian.
>
> Thanks,
>
> Logan
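(A rough sketch of the "first device which is upstream of both devices"
idea quoted near the top of this mail, built only on
pci_upstream_bridge(); the p2pmem series cares about the distance
through that common bridge rather than the bridge itself, and this
sketch takes no reference on the returned device.)

#include <linux/pci.h>

/*
 * Walk the upstream bridge chain of @a and, for each hop, check
 * whether it also appears in the chain of @b.  The first match is the
 * lowest common upstream bridge; NULL means the devices only meet at
 * the host bridge / root complex.
 */
static struct pci_dev *find_common_upstream_bridge(struct pci_dev *a,
                                                   struct pci_dev *b)
{
        struct pci_dev *up_a, *up_b;

        for (up_a = pci_upstream_bridge(a); up_a;
             up_a = pci_upstream_bridge(up_a)) {
                for (up_b = pci_upstream_bridge(b); up_b;
                     up_b = pci_upstream_bridge(up_b)) {
                        if (up_a == up_b)
                                return up_a;
                }
        }

        return NULL;
}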