On 08.01.25 at 20:22, Xu Yilun wrote:
On Wed, Jan 08, 2025 at 07:44:54PM +0100, Simona Vetter wrote:
On Wed, Jan 08, 2025 at 12:22:27PM -0400, Jason Gunthorpe wrote:
On Wed, Jan 08, 2025 at 04:25:54PM +0100, Christian König wrote:
Am 08.01.25 um 15:58 schrieb Jason Gunthorpe:
I have imagined a staged approach where DMABUF gets a new API that works with the new DMA API to do importer mapping with "P2P source information" and a gradual conversion.
To make it clear as maintainer of that subsystem I would reject such a step with all I have.
This is unexpected, so you want to just leave dmabuf broken? Do you have any plan to fix it, to fix the misuse of the DMA API, and all the problems I listed below? This is a big deal, it is causing real problems today.
If it's going to be like this I think we will stop trying to use dmabuf and do something simpler for vfio/kvm/iommufd :(
As the gal who helped edit the og dma-buf spec 13 years ago, I think adding pfn isn't a terrible idea. By design, dma-buf is the "everything is optional" interface. And in the beginning, even consistent locking was optional, but we've managed to fix that by now :-/
Well you were also the person who mangled the struct page pointers in the scatterlist because people were abusing this and getting a bloody nose :)
Where I do agree with Christian is that stuffing pfn support into the dma_buf_attachment interfaces feels a bit wrong.
So it could be a dmabuf interface like mmap()/vmap()? I was also wondering about that. But in the end I started using the dma_buf_attachment interface to leverage the existing buffer pin and move_notify.
Exactly that's the point, sharing pfn doesn't work with the pin and move_notify interfaces because of the MMU notifier approach Sima mentioned.
We have already gone down that road and it didn't work at all and was a really big pain to pull people back from it.
Nobody has really seriously tried to improve the DMA API before, so I don't think this is true at all.
Aside, I really hope this finally happens!
Sorry my fault. I was not talking about the DMA API, but rather that people tried to look behind the curtain of DMA-buf backing stores.
In other words, all the fun we had with scatterlists and people trying to modify the struct pages inside of them.
Improving the DMA API is something I really really hope for as well.
- Importing devices need to know if they are working with PCI P2P addresses during mapping because they need to do things like turn on ATS on their DMA. As for multi-path we have the same hacks inside mlx5 today that assume DMABUFs are always P2P because we cannot determine if things are P2P or not after being DMA mapped.
Why would you need ATS on PCI P2P and not for system memory accesses?
ATS has a significant performance cost. It is mandatory for PCI P2P, but ideally should be avoided for CPU memory.
Huh, I didn't know that. And yeah kinda means we've butchered the pci p2p stuff a bit I guess ...
Hui? Why should ATS be mandatory for PCI P2P?
We have tons of production systems using PCI P2P without ATS. And it's the first time I hear that.
- iommufd and kvm are both using CPU addresses without DMA. No exporter mapping is possible
We have customers using both KVM and XEN with DMA-buf, so I can clearly confirm that this isn't true.
Today they are mmapping the dma-buf into a VMA and then using KVM's follow_pfn() flow to extract the CPU pfn from the PTE. Any mmapable dma-buf must have a CPU PFN.
Here Xu implements basically the same path, except without the VMA indirection, and it is suddenly not OK? Illogical.
So the big difference is that for follow_pfn() you need mmu_notifier since the mmap might move around, whereas with pfn smashed into dma_buf_attachment you need dma_resv_lock rules, and the move_notify callback if you go dynamic.
So I guess my first question is, which locking rules do you want here for pfn importers?
follow_pfn() is unwanted for private MMIO, so dma_resv_lock.
As Sima explained you either have follow_pfn() and mmu_notifier or you have DMA addresses and dma_resv lock / dma_fence.
Just giving out PFNs without some lifetime associated with them is one of the major problems we faced before and really not something you can do.
If mmu notifiers are fine, then I think the current approach of follow_pfn should be ok. But if you instead want dma_resv_lock rules (or the cpu mmap somehow is an issue itself), then I think the clean design is to create a new
cpu mmap() is an issue, this series aims to eliminate userspace mapping for private MMIO resources.
Why?
separate access mechanism just for that. It would be the 5th or so (kernel vmap, userspace mmap, dma_buf_attach and driver private stuff like virtio_dma_buf.c where you access your buffer with a uuid), so really not a big deal.
OK, will think more about that.
Please note that we have follow_pfn() + mmu_notifier working for KVM/XEN with MMIO mappings and P2P. And that required exactly zero DMA-buf changes :)
I don't fully understand your use case, but I think it's quite likely that we already have that working.
Regards, Christian.
Thanks, Yilun
And for non-contrived exporters we might be able to implement the other access methods in terms of the pfn method generically, so this wouldn't even be a terrible maintenance burden going forward. And meanwhile all the contrived exporters just keep working as-is.
The other part is that cpu mmap is optional, and there's plenty of strange exporters who don't implement it. But you can dma map the attachment into plenty of devices. This tends to mostly be a thing on SoC devices with some very funky memory. But I guess you don't care about these use-cases, so should be ok.
I couldn't come up with a good name for these pfn users, maybe dma_buf_pfn_attachment? This does _not_ have a struct device, but maybe carries some of these new p2p source specifiers (or a list of those which are allowed, no idea how this would need to fit into the new dma api).
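As a purely illustrative sketch of that idea (none of these names exist today, and the p2p_source type is a placeholder for the "p2p source specifier" mentioned above), such an attachment object might carry something like:

    /* hypothetical sketch following Sima's naming; nothing here exists today */
    struct dma_buf_pfn_attachment {
            struct dma_buf *dmabuf;
            void *importer_priv;

            /* no struct device: instead a description of which "p2p sources"
             * (interconnects) the importer is able to accept */
            const struct p2p_source *allowed_sources;
            unsigned int num_allowed_sources;

            /* lifetime would still follow dma_resv rules, with a move_notify
             * analogue for revocation */
            void (*move_notify)(struct dma_buf_pfn_attachment *attach);
    };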
Cheers, Sima
Simona Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Answering on my reply once more as pure text mail.
I love AMD's mail servers.
Cheers, Christian.
On Thu, Jan 09, 2025 at 09:09:46AM +0100, Christian König wrote:
Answering on my reply once more as pure text mail.
It is hard to do anything with your HTML mails :\
Well you were also the person who mangled the struct page pointers in the scatterlist because people were abusing this and getting a bloody nose :)
But a lot of this is because scatterlist is too limited, you actually can't correctly describe anything except struct page backed CPU memory in a scatterlist.
As soon as we can correctly describe everything in a datastructure these issues go away - or at least turn into a compatibility exchange problem.
Where I do agree with Christian is that stuffing pfn support into the dma_buf_attachment interfaces feels a bit wrong.
So it could be a dmabuf interface like mmap()/vmap()? I was also wondering about that. But in the end I started using the dma_buf_attachment interface to leverage the existing buffer pin and move_notify.
Exactly that's the point, sharing pfn doesn't work with the pin and move_notify interfaces because of the MMU notifier approach Sima mentioned.
Huh?
mmu notifiers are for tracking changes to VMAs
pin/move_notify are for tracking changes to the underlying memory of a DMABUF.
How does sharing the PFN vs the DMA address affect the pin/move_notify lifetime rules at all?
3) Importing devices need to know if they are working with PCI P2P addresses during mapping because they need to do things like turn on ATS on their DMA. As for multi-path we have the same hacks inside mlx5 today that assume DMABUFs are always P2P because we cannot determine if things are P2P or not after being DMA mapped.
Why would you need ATS on PCI P2P and not for system memory accesses?
ATS has a significant performance cost. It is mandatory for PCI P2P, but ideally should be avoided for CPU memory.
Huh, I didn't know that. And yeah kinda means we've butchered the pci p2p stuff a bit I guess ...
Hui? Why should ATS be mandatory for PCI P2P?
I should say "mandatory on some configurations"
If you need the iommu turned on, and you have a PCI switch in your path, then ATS allows you to have full P2P bandwidth and retain full IOMMU security.
We have tons of production systems using PCI P2P without ATS. And it's the first time I hear that.
It is situational and topologically dependent. We have very large number of deployed systems now that rely on ATS for PCI P2P.
As Sima explained you either have follow_pfn() and mmu_notifier or you have DMA addresses and dma_resv lock / dma_fence.
Just giving out PFNs without some lifetime associated with them is one of the major problems we faced before and really not something you can do.
Certainly I never imagined there would be no lifetime, I expect anything coming out of the dmabuf interface to use the dma_resv lock, fence and move_notify for lifetime management, regardless of how the target memory is described.
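For reference, that is the lifetime model a dynamic importer already follows today. A minimal sketch with the existing API (only the idea of fetching a PFN/physical-address list instead of an sg_table is hypothetical, everything else is the current dma-buf interface):

    #include <linux/dma-buf.h>
    #include <linux/dma-resv.h>

    struct my_importer {
            struct dma_buf_attachment *attach;
            bool invalid;
    };

    /* called by the exporter with attach->dmabuf->resv already held */
    static void my_move_notify(struct dma_buf_attachment *attach)
    {
            struct my_importer *imp = attach->importer_priv;

            imp->invalid = true;    /* stop using old addresses, re-map lazily */
    }

    static const struct dma_buf_attach_ops my_attach_ops = {
            .allow_peer2peer = true,
            .move_notify = my_move_notify,
    };

    static int my_import(struct my_importer *imp, struct dma_buf *dmabuf,
                         struct device *dev)
    {
            struct sg_table *sgt;

            imp->attach = dma_buf_dynamic_attach(dmabuf, dev, &my_attach_ops, imp);
            if (IS_ERR(imp->attach))
                    return PTR_ERR(imp->attach);

            dma_resv_lock(dmabuf->resv, NULL);
            sgt = dma_buf_map_attachment(imp->attach, DMA_BIDIRECTIONAL);
            /* a PFN/physical-address based attachment would fetch its address
             * list here instead, under the same lock, and drop it again when
             * move_notify fires */
            dma_resv_unlock(dmabuf->resv);

            return PTR_ERR_OR_ZERO(sgt);
    }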
separate access mechanism just for that. It would be the 5th or so (kernel vmap, userspace mmap, dma_buf_attach and driver private stuff like virtio_dma_buf.c where you access your buffer with a uuid), so really not a big deal.
OK, will think more about that.
Please note that we have follow_pfn() + mmu_notifier working for KVM/XEN with MMIO mappings and P2P. And that required exactly zero DMA-buf changes :)
I don't fully understand your use case, but I think it's quite likely that we already have that working.
In Intel CC systems you cannot mmap secure memory or the system will take a machine check.
You have to convey secure memory inside a FD entirely within the kernel so that only an importer that understands how to handle secure memory (such as KVM) is using it to avoid machine checking.
The patch series here should be thought of as the first part of this, allowing PFNs to flow without VMAs. IMHO the second part of preventing machine checks is not complete.
In the approach I have been talking about the secure memory would be represented by a p2p_provider structure that is incompatible with everything else. For instance importers that can only do DMA would simply cleanly fail when presented with this memory.
Jason
On 10.01.25 at 21:54, Jason Gunthorpe wrote:
[SNIP]
I don't fully understand your use case, but I think it's quite likely that we already have that working.
In Intel CC systems you cannot mmap secure memory or the system will take a machine check.
You have to convey secure memory inside a FD entirely within the kernel so that only an importer that understands how to handle secure memory (such as KVM) is using it to avoid machine checking.
The patch series here should be thought of as the first part of this, allowing PFNs to flow without VMAs. IMHO the second part of preventing machine checks is not complete.
In the approach I have been talking about the secure memory would be represented by a p2p_provider structure that is incompatible with everything else. For instance importers that can only do DMA would simply cleanly fail when presented with this memory.
That's a rather interesting use case, but not something I consider fitting for the DMA-buf interface.
See, DMA-buf is meant to be used between drivers to allow DMA access on shared buffers.
What you try to do here instead is to give memory in the form of a file descriptor to a client VM to do things like CPU mapping and giving it to drivers to do DMA etc...
As far as I can see this memory is secured by some kind of MMU which makes sure that even the host CPU can't access it without causing a machine check exception.
That sounds more like something for the TEE driver instead of anything DMA-buf should be dealing with.
Regards, Christian.
Jason
On Wed, Jan 15, 2025 at 10:38:00AM +0100, Christian König wrote:
On 10.01.25 at 21:54, Jason Gunthorpe wrote:
[SNIP]
I don't fully understand your use case, but I think it's quite likely that we already have that working.
In Intel CC systems you cannot mmap secure memory or the system will take a machine check.
You have to convey secure memory inside a FD entirely within the kernel so that only an importer that understands how to handle secure memory (such as KVM) is using it to avoid machine checking.
The patch series here should be thought of as the first part of this, allowing PFNs to flow without VMAs. IMHO the second part of preventing machine checks is not complete.
In the approach I have been talking about the secure memory would be represented by a p2p_provider structure that is incompatible with everything else. For instance importers that can only do DMA would simply cleanly fail when presented with this memory.
That's a rather interesting use case, but not something I consider fitting for the DMA-buf interface.
To recast the problem statement, it is basically the same as your device private interconnects. There are certain devices that understand how to use this memory, and if they work together they can access it.
See, DMA-buf is meant to be used between drivers to allow DMA access on shared buffers.
They are shared, just not with everyone :)
What you try to do here instead is to give memory in the form of a file descriptor to a client VM to do things like CPU mapping and giving it to drivers to do DMA etc...
How is this paragraph different from the first? It is a shared buffer that we want real DMA and CPU "DMA" access to. It is "private" so things that don't understand the interconnect rules cannot access it.
That sounds more like something for the TEE driver instead of anything DMA-buf should be dealing with.
Has nothing to do with TEE.
Jason
On 15.01.25 at 14:38, Jason Gunthorpe wrote:
On Wed, Jan 15, 2025 at 10:38:00AM +0100, Christian König wrote:
On 10.01.25 at 21:54, Jason Gunthorpe wrote:
[SNIP]
I don't fully understand your use case, but I think it's quite likely that we already have that working.
In Intel CC systems you cannot mmap secure memory or the system will take a machine check.
You have to convey secure memory inside a FD entirely within the kernel so that only an importer that understands how to handle secure memory (such as KVM) is using it to avoid machine checking.
The patch series here should be thought of as the first part of this, allowing PFNs to flow without VMAs. IMHO the second part of preventing machine checks is not complete.
In the approach I have been talking about the secure memory would be represented by a p2p_provider structure that is incompatible with everything else. For instance importers that can only do DMA would simply cleanly fail when presented with this memory.
That's a rather interesting use case, but not something I consider fitting for the DMA-buf interface.
To recast the problem statement, it is basically the same as your device private interconnects. There are certain devices that understand how to use this memory, and if they work together they can access it.
See, DMA-buf is meant to be used between drivers to allow DMA access on shared buffers.
They are shared, just not with everyone :)
What you try to do here instead is to give memory in the form of a file descriptor to a client VM to do things like CPU mapping and giving it to drivers to do DMA etc...
How is this paragraph different from the first? It is a shared buffer that we want real DMA and CPU "DMA" access to. It is "private" so things that don't understand the interconnect rules cannot access it.
Yeah, but it's private to the exporter. And a very fundamental rule of DMA-buf is that the exporter is the one in control of things.
So for example it is illegal for an importer to setup CPU mappings to a buffer. That's why we have dma_buf_mmap() which redirects mmap() requests from the importer to the exporter.
In your use case here the importer wants to be in control and do things like both CPU as well as DMA mappings.
As far as I can see that is really not a use case which fits DMA-buf in any way.
That sounds more like something for the TEE driver instead of anything DMA-buf should be dealing with.
Has nothing to do with TEE.
Why?
Regards, Christian.
Jason
Explicitly replying as text mail once more.
I just love the AMD mail servers :(
Christian.
On Wed, Jan 15, 2025 at 02:46:56PM +0100, Christian König wrote:
Explicitly replying as text mail once more.
I just love the AMD mail servers :(
:( This is hard
Yeah, but it's private to the exporter. And a very fundamental rule of DMA-buf is that the exporter is the one in control of things.
I've said a few times now, I don't think we can build the kind of buffer sharing framework we need to solve all the problems with this philosophy. It is also inefficient with the new DMA API.
I think it is backwards looking and we need to move forwards with fixing the fundamental API issues which motivated that design.
So for example it is illegal for an importer to setup CPU mappings to a buffer. That's why we have dma_buf_mmap() which redirects mmap() requests from the importer to the exporter.
Like this, in a future no-scatter list world I would want to make this safe. The importer will have enough information to know if CPU mappings exist and are safe to use under what conditions.
There is no reason the importer should not be able to CPU access memory that is HW permitted to be CPU accessible.
If the importer needs CPU access and the exporter cannot provide it then the attachment simply fails.
Saying CPU access is banned 100% of the time is not a helpful position when we have use cases that need it.
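One way to picture this is as a capability negotiated at attach time. A hedged sketch on top of today's attach ops, where the need_cpu_access field and the exporter helper are invented purely for illustration:

    /* today's dma_buf_attach_ops plus one invented field for illustration */
    struct dma_buf_attach_ops {
            bool allow_peer2peer;
            void (*move_notify)(struct dma_buf_attachment *attach);
            bool need_cpu_access;           /* hypothetical importer capability */
    };

    /* the attach path could then simply refuse; exporter_allows_cpu_access()
     * is a made-up helper */
    if (ops->need_cpu_access && !exporter_allows_cpu_access(dmabuf))
            return ERR_PTR(-EOPNOTSUPP);    /* "the attachment simply fails" */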
As far as I can see that is really not a use case which fits DMA-buf in any way.
I really don't want to make a dmabuf2 - everyone would have to implement it, including all the GPU drivers if they want to work with RDMA. I don't think this makes any sense compared to incrementally evolving dmabuf with more optional capabilities.
That sounds more like something for the TEE driver instead of anything DMA-buf should be dealing with.
Has nothing to do with TEE.
Why?
The Linux TEE framework is not used as part of confidential compute.
CC already has guest memfd for holding its private CPU memory.
This is about confidential MMIO memory.
This is also not just about the KVM side, the VM side also has issues with DMABUF and CC - only co-operating devices can interact with the VM side "encrypted" memory and there needs to be a negotiation as part of all buffer setup what the mutual capability is. :\ swiotlb hides some of this some times, but confidential P2P is currently unsolved.
Jason
On 15.01.25 at 15:14, Jason Gunthorpe wrote:
On Wed, Jan 15, 2025 at 02:46:56PM +0100, Christian König wrote: [SNIP]
Yeah, but it's private to the exporter. And a very fundamental rule of DMA-buf is that the exporter is the one in control of things.
I've said a few times now, I don't think we can build the kind of buffer sharing framework we need to solve all the problems with this philosophy. It is also inefficient with the new DMA API.
I think it is backwards looking and we need to move forwards with fixing the fundamental API issues which motivated that design.
And that's where I see things completely differently.
Those rules are not something we came up with because of some limitation of the DMA-API, but rather from experience working with different device drivers and especially their developers.
Applying and enforcing those restrictions is an absolute must-have for extending DMA-buf.
So for example it is illegal for an importer to setup CPU mappings to a buffer. That's why we have dma_buf_mmap() which redirects mmap() requests from the importer to the exporter.
Like this, in a future no-scatter list world I would want to make this safe. The importer will have enough information to know if CPU mappings exist and are safe to use under what conditions.
There is no reason the importer should not be able to CPU access memory that is HW permitted to be CPU accessible.
If the importer needs CPU access and the exporter cannot provide it then the attachment simply fails.
Saying CPU access is banned 100% of the time is not a helpful position when we have use cases that need it.
That approach is an absolutely no-go from my side.
We have fully intentionally implemented the restriction that importers can't CPU access DMA-buf for both kernel and userspace without going through the exporter because of design requirements and a lot of negative experience with exactly this approach.
This is not something which is up for discussion in any way.
As far as I can see that is really not a use case which fits DMA-buf in any way.
I really don't want to make a dmabuf2 - everyone would have to implement it, including all the GPU drivers if they want to work with RDMA. I don't think this makes any sense compared to incrementally evolving dmabuf with more optional capabilities.
The point is that a dmabuf2 would most likely be rejected as well or otherwise run into the same issues we have seen before.
That sounds more like something for the TEE driver instead of anything DMA-buf should be dealing with.
Has nothing to do with TEE.
Why?
The Linux TEE framework is not used as part of confidential compute.
CC already has guest memfd for holding its private CPU memory.
Where is that coming from and how it is used?
This is about confidential MMIO memory.
Who is the exporter and who is the importer of the DMA-buf in this use case?
This is also not just about the KVM side, the VM side also has issues with DMABUF and CC - only co-operating devices can interact with the VM side "encrypted" memory and there needs to be a negotiation as part of all buffer setup what the mutual capability is. :\ swiotlb hides some of this some times, but confidential P2P is currently unsolved.
Yes and it is documented by now how that is supposed to happen with DMA-buf.
As far as I can see there is not much new approach here.
Regards, Christian.
Jason
Sending it as text mail to the mailing lists once more :(
Christian.
On Wed, Jan 15, 2025 at 03:30:47PM +0100, Christian König wrote:
Those rules are not something we came up with because of some limitation of the DMA-API, but rather from experience working with different device drivers and especially their developers.
I would say it stems from the use of scatter list. You do not have enough information exchanged between exporter and importer to implement something sane and correct. At that point being restrictive is a reasonable path.
Because of scatterlist, developers don't have APIs that correctly solve the problems they want to solve, so of course things get into a mess.
Applying and enforcing those restrictions is an absolute must-have for extending DMA-buf.
You said to come to the maintainers with the problems, here are the problems. Your answer is don't use dmabuf.
That doesn't make the problems go away :(
I really don't want to make a dmabuf2 - everyone would have to implement it, including all the GPU drivers if they want to work with RDMA. I don't think this makes any sense compared to incrementally evolving dmabuf with more optional capabilities.
The point is that a dmabuf2 would most likely be rejected as well or otherwise run into the same issues we have seen before.
You'd need to be much more concrete and technical in your objections to cause a rejection. "We tried something else before and it didn't work" won't cut it.
There is a very simple problem statement here, we need a FD handle for various kinds of memory, with a lifetime model that fits a couple of different use cases. The exporter and importer need to understand what type of memory it is and what rules apply to working with it. The required importers are more general than just simple PCI DMA.
I feel like this is already exactly DMABUF's mission.
Besides, you have been saying to go do this in TEE or whatever, how is that any different from dmabuf2?
That sounds more like something for the TEE driver instead of anything DMA-buf should be dealing with.
Has nothing to do with TEE.
Why?
The Linux TEE framework is not used as part of confidential compute.
CC already has guest memfd for holding its private CPU memory.
Where is that coming from and how it is used?
What do you mean? guest memfd is the result of years of negotiation in the mm and x86 arch subsystems :( It is used like a normal memfd, and we now have APIs in KVM and iommufd to directly intake and map from a memfd. I expect guestmemfd will soon grow some more generic dmabuf-like lifetime callbacks to avoid pinning - it already has some KVM specific APIs IIRC.
But it is 100% exclusively focused on CPU memory and nothing else.
This is about confidential MMIO memory.
Who is the exporter and who is the importer of the DMA-buf in this use case?
In this case Xu is exporting MMIO from VFIO and importing to KVM and iommufd.
This is also not just about the KVM side, the VM side also has issues with DMABUF and CC - only co-operating devices can interact with the VM side "encrypted" memory and there needs to be a negotiation as part of all buffer setup what the mutual capability is. :\ swiotlb hides some of this some times, but confidential P2P is currently unsolved.
Yes and it is documented by now how that is supposed to happen with DMA-buf.
I doubt that. It is complex and not fully solved in the core code today. Many scenarios do not work correctly, devices don't even exist yet that can exercise the hard paths. This is a future problem :(
Jason
On 15.01.25 at 16:10, Jason Gunthorpe wrote:
On Wed, Jan 15, 2025 at 03:30:47PM +0100, Christian König wrote:
Those rules are not something we came up with because of some limitation of the DMA-API, but rather from experience working with different device drivers and especially their developers.
I would say it stems from the use of scatter list. You do not have enough information exchanged between exporter and importer to implement something sane and correct. At that point being restrictive is a reasonable path.
Because of scatterlist, developers don't have APIs that correctly solve the problems they want to solve, so of course things get into a mess.
Well I completely agree that scatterlists have many many problems. And at least some of the stuff you note here sounds like a good idea to tackle those problems.
But I'm trying to explain the restrictions and requirements we previously found necessary. And I strongly think that any new approach needs to respect those restrictions as well or otherwise we will just repeat history.
Applying and enforcing those restrictions is an absolute must-have for extending DMA-buf.
You said to come to the maintainers with the problems, here are the problems. Your answer is don't use dmabuf.
That doesn't make the problems go away :(
Yeah, that's why I'm desperately trying to understand your use case.
I really don't want to make a dmabuf2 - everyone would have to implement it, including all the GPU drivers if they want to work with RDMA. I don't think this makes any sense compared to incrementally evolving dmabuf with more optional capabilities.
The point is that a dmabuf2 would most likely be rejected as well or otherwise run into the same issues we have seen before.
You'd need to be much more concrete and technical in your objections to cause a rejection. "We tried something else before and it didn't work" won't cut it.
Granted, let me try to improve this.
Here is a real world example of one of the issues we ran into and why CPU mappings of importers are redirected to the exporter.
We have a good bunch of different exporters who track the CPU mappings of their backing store using address_space objects in one way or another and then use unmap_mapping_range() to invalidate those CPU mappings.
But when importers get the PFNs of the backing store they can look behind the curtain and directly insert those PFNs into the CPU page tables.
We had literally tons of cases like this where driver developers caused access-after-free issues because the importer created CPU mappings on its own without the exporter knowing about it.
This is just one example of what we ran into. In addition to that, basically the whole synchronization between drivers was overhauled as well because we found that we can't trust importers to always do the right thing.
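To make that failure mode concrete, the exporter-side pattern being described is roughly the following; the struct and its fields are made up, unmap_mapping_range() is the real call:

    #include <linux/mm.h>
    #include <linux/pagemap.h>

    struct my_exporter_bo {                 /* made-up exporter object */
            struct address_space *mapping;  /* tracks the exporter-created
                                             * CPU mappings of this buffer */
            pgoff_t index;                  /* page offset inside that mapping */
            size_t size;
    };

    static void my_exporter_move(struct my_exporter_bo *bo)
    {
            /* zap every CPU PTE the exporter knows about before the backing
             * store moves ... */
            unmap_mapping_range(bo->mapping,
                                (loff_t)bo->index << PAGE_SHIFT, bo->size, 1);
            /* ... but a PTE an importer inserted into its own VMA from a raw
             * PFN is not part of this address_space, keeps pointing at the old
             * backing store and becomes a use-after-free */
    }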
There is a very simple problem statement here, we need a FD handle for various kinds of memory, with a lifetime model that fits a couple of different use cases. The exporter and importer need to understand what type of memory it is and what rules apply to working with it. The required importers are more general than just simple PCI DMA.
I feel like this is already exactly DMABUF's mission.
Besides, you have been saying to go do this in TEE or whatever, how is that any different from dmabuf2?
You can already turn both a TEE allocated buffer as well as a memfd into a DMA-buf. So basically TEE and memfd already provide different interfaces which go beyond what DMA-buf does and allows.
In other words if you want to do things like direct I/O to block or network devices you can mmap() your memfd and do this while at the same time send your memfd as DMA-buf to your GPU, V4L or neural accelerator.
Would this be a way you could work with as well? E.g. you have your separate file descriptor representing the private MMIO which iommufd and KVM uses but you can turn it into a DMA-buf whenever you need to give it to a DMA-buf importer?
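For context, the memfd-to-DMA-buf conversion referred to here is what the udmabuf driver provides today. A minimal userspace sketch (error handling omitted, size assumed page aligned):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <linux/udmabuf.h>

    static int memfd_to_dmabuf(size_t size)
    {
            struct udmabuf_create create = { 0 };
            int memfd, dev;

            memfd = memfd_create("buf", MFD_ALLOW_SEALING);
            ftruncate(memfd, size);
            fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);   /* udmabuf requires this */

            dev = open("/dev/udmabuf", O_RDWR);
            create.memfd  = memfd;
            create.offset = 0;
            create.size   = size;

            /* the memfd stays usable for mmap()/direct I/O; the returned fd is
             * a regular DMA-buf that can be handed to a GPU/V4L importer */
            return ioctl(dev, UDMABUF_CREATE, &create);
    }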
That sounds more like something for the TEE driver instead of anything DMA-buf should be dealing with.
Has nothing to do with TEE.
Why?
The Linux TEE framework is not used as part of confidential compute.
CC already has guest memfd for holding its private CPU memory.
Where is that coming from and how it is used?
What do you mean? guest memfd is the result of years of negotiation in the mm and x86 arch subsystems :( It is used like a normal memfd, and we now have APIs in KVM and iommufd to directly intake and map from a memfd. I expect guestmemfd will soon grow some more generic dmabuf-like lifetime callbacks to avoid pinning - it already has some KVM specific APIs IIRC.
But it is 100% exclusively focused on CPU memory and nothing else.
I have seen patches for that flying by on mailing lists and have a high-level understanding of what it's supposed to do, but never really looked more deeply into the code.
This is about confidential MMIO memory.
Who is the exporter and who is the importer of the DMA-buf in this use case?
In this case Xu is exporting MMIO from VFIO and importing to KVM and iommufd.
So basically a portion of a PCIe BAR is imported into iommufd?
This is also not just about the KVM side, the VM side also has issues with DMABUF and CC - only co-operating devices can interact with the VM side "encrypted" memory and there needs to be a negotiation as part of all buffer setup what the mutual capability is. :\ swiotlb hides some of this some times, but confidential P2P is currently unsolved.
Yes and it is documented by now how that is supposed to happen with DMA-buf.
I doubt that. It is complex and not fully solved in the core code today. Many scenarios do not work correctly, devices don't even exist yet that can exercise the hard paths. This is a future problem :(
Let's just say that both the ARM guys as well as the GPU people already have some pretty "interesting" ways of doing digital rights management and content protection.
Regards, Christian.
Jason
On Wed, Jan 15, 2025 at 05:34:23PM +0100, Christian König wrote:
Granted, let me try to improve this.
Here is a real world example of one of the issues we ran into and why CPU mappings of importers are redirected to the exporter.
We have a good bunch of different exporters who track the CPU mappings of their backing store using address_space objects in one way or another and then use unmap_mapping_range() to invalidate those CPU mappings.
But when importers get the PFNs of the backing store they can look behind the curtain and directly insert those PFNs into the CPU page tables.
We had literally tons of cases like this where driver developers caused access-after-free issues because the importer created CPU mappings on its own without the exporter knowing about it.
This is just one example of what we ran into. In addition to that, basically the whole synchronization between drivers was overhauled as well because we found that we can't trust importers to always do the right thing.
But this, fundamentally, is importers creating attachments and then *ignoring the lifetime rules of DMABUF*. If you created an attachment, got a move and *ignored the move* because you put the PFN in your own VMA, then you are not following the attachment lifetime rules!
To implement this safely the driver would need to use unmap_mapping_range() on the driver VMA inside the move callback, and hook into the VMA fault callback to re-attach the dmabuf.
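Roughly, that importer-side pattern would look like the following sketch; the imp_obj layout and the re-validation helper are hypothetical, the locking and zap calls are the existing kernel APIs:

    struct imp_obj {                        /* hypothetical importer object */
            struct dma_buf_attachment *attach;
            struct address_space *mapping;  /* address_space of the importer VMA */
            unsigned long pfn;              /* current start PFN of the buffer */
            size_t size;
            bool valid;
    };

    static void imp_move_notify(struct dma_buf_attachment *attach)
    {
            struct imp_obj *obj = attach->importer_priv;

            dma_resv_assert_held(attach->dmabuf->resv);
            unmap_mapping_range(obj->mapping, 0, obj->size, 1); /* drop our PTEs */
            obj->valid = false;
    }

    static vm_fault_t imp_fault(struct vm_fault *vmf)
    {
            struct imp_obj *obj = vmf->vma->vm_private_data;
            vm_fault_t ret;

            dma_resv_lock(obj->attach->dmabuf->resv, NULL);
            if (!obj->valid)
                    imp_revalidate(obj);            /* hypothetical: re-pin / re-look-up */
            ret = vmf_insert_pfn(vmf->vma, vmf->address, obj->pfn + vmf->pgoff);
            dma_resv_unlock(obj->attach->dmabuf->resv);

            return ret;
    }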
This is where I get into trouble with your argument. It is not that the API has an issue, or that the rules of the API are not logical and functional.
You are arguing that even a logical and functional API will be mis-used by some people and that reviewers will not catch it.
Honestly, I don't think that is consistent with the kernel philosophy.
We should do our best to make APIs that are hard to mis-use, but if we can't achieve that it doesn't mean we stop and give up on problems, we go into the world of APIs that can be mis-used and we are supposed to rely on the reviewer system to catch it.
You can already turn both a TEE allocated buffer as well as a memfd into a DMA-buf. So basically TEE and memfd already provide different interfaces which go beyond what DMA-buf does and allows.
In other words if you want to do things like direct I/O to block or network devices you can mmap() your memfd and do this while at the same time send your memfd as DMA-buf to your GPU, V4L or neural accelerator. Would this be a way you could work with as well?
I guess, but this still requires creating a dmabuf2 type thing with very similar semantics and then shimming dmabuf2 to 1 for DRM consumers.
I don't see how it addresses your fundamental concern that the semantics we want are an API that is too easy for drivers to abuse.
And being more functional and efficient we'd just see people wanting to use dmabuf2 directly instead of bothering with 1.
separate file descriptor representing the private MMIO which iommufd and KVM uses but you can turn it into a DMA-buf whenever you need to give it to a DMA-buf importer?
Well, it would end up just being used everywhere. I think one person wanted to use this with DRM drivers for some reason, but RDMA would use the dmabuf2 directly because it will be much more efficient than using scatterlist.
Honestly, I'd much rather extend dmabuf and see DRM institute some rule that DRM drivers may not use XYZ parts of the improvement. Like maybe we could use some symbol namespaces to really enforce it eg. MODULE_IMPORT_NS(DMABUF_NOT_FOR_DRM_USAGE)
Some of the improvements we want like the revoke rules for lifetime seem to be agreeable.
Block the API that gives you the non-scatterlist attachment. Only VFIO/RDMA/kvm/iommufd will get to implement it.
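Concretely, that gating could reuse the existing symbol-namespace machinery; the namespace name is the one suggested above, while the exported function name is made up for illustration:

    /* dma-buf core: export the non-scatterlist attachment API into a
     * restricted namespace (dma_buf_attach_phys is a made-up name) */
    EXPORT_SYMBOL_NS_GPL(dma_buf_attach_phys, DMABUF_NOT_FOR_DRM_USAGE);

    /* vfio / iommufd / kvm / rdma modules opt in explicitly: */
    MODULE_IMPORT_NS(DMABUF_NOT_FOR_DRM_USAGE);

    /* a module that uses the symbol without importing the namespace gets a
     * modpost warning at build time and is rejected by the module loader */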
In this case Xu is exporting MMIO from VFIO and importing to KVM and iommufd.
So basically a portion of a PCIe BAR is imported into iommufd?
And KVM. We need to get the CPU address into KVM and IOMMU page tables. It must go through a private FD path and not a VMA because of the CC rules about machine check I mentioned earlier. The private FD must have a lifetime model to ensure we don't UAF the PCIe BAR memory.
Someone else had some use case where they wanted to put the VFIO MMIO PCIe BAR into a DMABUF and ship it into a GPU driver for something-something virtualization but I didn't understand it.
Let's just say that both the ARM guys as well as the GPU people already have some pretty "interesting" ways of doing digital rights management and content protection.
Well, that is TEE stuff, TEE and CC are not the same thing, though they have some high level conceptual overlap.
In a certain sense CC is a TEE that is built using KVM instead of the TEE subsystem. Using KVM and integrating with the MM brings a whole set of unique challenges that TEE got to avoid..
Jason
On 15.01.25 at 18:09, Jason Gunthorpe wrote:
On Wed, Jan 15, 2025 at 05:34:23PM +0100, Christian König wrote:
Granted, let me try to improve this.
Here is a real world example of one of the issues we ran into and why CPU mappings of importers are redirected to the exporter.
We have a good bunch of different exporters who track the CPU mappings of their backing store using address_space objects in one way or another and then use unmap_mapping_range() to invalidate those CPU mappings.
But when importers get the PFNs of the backing store they can look behind the curtain and directly insert those PFNs into the CPU page tables.
We had literally tons of cases like this where driver developers caused access-after-free issues because the importer created CPU mappings on its own without the exporter knowing about it.
This is just one example of what we ran into. In addition to that, basically the whole synchronization between drivers was overhauled as well because we found that we can't trust importers to always do the right thing.
But this, fundamentally, is importers creating attachments and then *ignoring the lifetime rules of DMABUF*. If you created an attachment, got a move and *ignored the move* because you put the PFN in your own VMA, then you are not following the attachment lifetime rules!
Move notify is solely for informing the importer that they need to refresh their DMA mappings and eventually block for ongoing DMA to end.
These semantics don't work well for CPU mappings because you need to hold the reservation lock to make sure that the information stays valid, and you can't hold a lock while returning from a page fault.
In other words, page faults are opportunistic and happen concurrently with invalidation, while the move_notify approach is serialized through a common lock.
I think that's what Sima tried to point out as well.
To implement this safely the driver would need to use unmap_mapping_range() on the driver VMA inside the move callback, and hook into the VMA fault callback to re-attach the dmabuf.
Yeah and exactly that is something we don't want to allow because it means that every importer needs to get things right to prevent exporters from running into problems.
This is where I get into trouble with your argument. It is not that the API has an issue, or that the rules of the API are not logical and functional.
You are arguing that even a logical and functional API will be mis-used by some people and that reviewers will not catch it.
Well it's not mis-used, it's just a very bad design decision to let every importer implement functionality which actually belongs in a single place in the exporter.
Honestly, I don't think that is consistent with the kernel philosophy.
We should do our best to make APIs that are hard to mis-use, but if we can't achieve that it doesn't mean we stop and give up on problems, we go into the world of APIs that can be mis-used and we are supposed to rely on the reviewer system to catch it.
This is not giving up, but rather just apply good design approaches.
See, I can only repeat myself: we came up with this approach because of experience and finding that what you suggest here doesn't work.
In other words we already tried what you suggest here and it doesn't work.
You can already turn both a TEE allocated buffer as well as a memfd into a DMA-buf. So basically TEE and memfd already provide different interfaces which go beyond what DMA-buf does and allows.
In other words if you want to do things like direct I/O to block or network devices you can mmap() your memfd and do this while at the same time send your memfd as DMA-buf to your GPU, V4L or neural accelerator.
Would this be a way you could work with as well?
I guess, but this still requires creating a dmabuf2 type thing with very similar semantics and then shimming dmabuf2 to 1 for DRM consumers.
I don't see how it addresses your fundamental concern that the semantics we want are an API that is too easy for drivers to abuse.
And being more functional and efficient we'd just see people wanting to use dmabuf2 directly instead of bothering with 1.
Why would you want to do a dmabuf2 here?
separate file descriptor representing the private MMIO which iommufd and KVM uses but you can turn it into a DMA-buf whenever you need to give it to a DMA-buf importer?
Well, it would end up just being used everywhere. I think one person wanted to use this with DRM drivers for some reason, but RDMA would use the dmabuf2 directly because it will be much more efficient than using scatterlist.
Honestly, I'd much rather extend dmabuf and see DRM institute some rule that DRM drivers may not use XYZ parts of the improvement. Like maybe we could use some symbol namespaces to really enforce it eg. MODULE_IMPORT_NS(DMABUF_NOT_FOR_DRM_USAGE)
Some of the improvements we want like the revoke rules for lifetime seem to be agreeable.
Block the API that gives you the non-scatterlist attachment. Only VFIO/RDMA/kvm/iommufd will get to implement it.
I don't mind improving the scatterlist approach in any way possible. I'm just rejecting things which we already tried and turned out to be a bad idea.
If you make an interface which gives DMA addresses plus additional information like address space, access hints etc.. to importers that would be really welcomed.
But exposing PFNs and letting the importers create their DMA mappings themselves and make CPU mappings themselves is an absolutely clear no-go.
In this case Xu is exporting MMIO from VFIO and importing to KVM and iommufd.
So basically a portion of a PCIe BAR is imported into iommufd?
And KVM. We need to get the CPU address into KVM and IOMMU page tables. It must go through a private FD path and not a VMA because of the CC rules about machine check I mentioned earlier. The private FD must have a lifetime model to ensure we don't UAF the PCIe BAR memory.
Then create an interface between VFIO and KVM/iommufd which allows passing data between the two.
We already do this between DMA-buf exporters/importers all the time. Just don't make it a general DMA-buf API.
Someone else had some use case where they wanted to put the VFIO MMIO PCIe BAR into a DMABUF and ship it into a GPU driver for something-something virtualization but I didn't understand it.
Yeah, that is already perfectly supported.
Let's just say that both the ARM guys as well as the GPU people already have some pretty "interesting" ways of doing digital rights management and content protection.
Well, that is TEE stuff, TEE and CC are not the same thing, though they have some high level conceptual overlap.
In a certain sense CC is a TEE that is built using KVM instead of the TEE subsystem. Using KVM and integrating with the MM brings a whole set of unique challenges that TEE got to avoid..
Please go over those challenges in more detail. I need to get a better understanding of what's going on here.
E.g. who manages encryption keys, who raises the machine check on violations etc...
Regards, Christian.
Jason
On Thu, Jan 16, 2025 at 04:13:13PM +0100, Christian König wrote:
But this, fundamentally, is importers creating attachments and then *ignoring the lifetime rules of DMABUF*. If you created an attachment, got a move and *ignored the move* because you put the PFN in your own VMA, then you are not following the attachment lifetime rules!
Move notify is solely for informing the importer that they need to refresh their DMA mappings and eventually block for ongoing DMA to end.
I feel that it is a bit pedantic to say DMA and CPU are somehow different. The DMABUF API gives you a scatterlist, it is reasonable to say that move invalidates the entire scatterlist, CPU and DMA equally.
These semantics don't work well for CPU mappings because you need to hold the reservation lock to make sure that the information stays valid, and you can't hold a lock while returning from a page fault.
Sure, I imagine hooking up a VMA is hard - but that doesn't change my point. The semantics can be reasonable and well defined.
Yeah and exactly that is something we don't want to allow because it means that every importer needs to get things right to prevent exporters from running into problems.
You can make the same argument about the DMA address. We should just get rid of DMABUF entirely because people are going to mis-use it and wrongly implement the invalidation callback.
I have no idea why GPU drivers want to implement mmap of dmabuf, that seems to be a uniquely GPU thing. We are not going to be doing stuff like that in KVM and other places. And we can implement the invalidation callback with correct locking. Why should we all be punished because DRM drivers seem to have this weird historical mmap problem?
I don't think that is a reasonable way to approach building a general purpose linux kernel API.
Well it's not mis-used, it's just a very bad design decision to let every importer implement functionality which actually belongs in a single place in the exporter.
Well, this is the problem. Sure it may be that importers should not implement mmap - but using the PFN side address is needed for more than just mmap!
DMA mapping belongs in the importer, and the new DMA API makes this even more explicit by allowing the importer a lot of options to optimize the process of building the HW datastructures. Scatterlist and the enforced representation of the DMA list are very inefficient and we are working to get rid of them. It isn't going to be replaced by any sort of list of DMA addresses though.
If you really disagree you can try to convince the NVMe people to give up their optimizations the new DMA API allows so DRM can prevent this code-review problem.
I also want the same optimizations in RDMA, and I am also not convinced giving them up is a worthwhile tradeoff.
Why would you want to do a dmabuf2 here?
Because I need the same kind of common framework. I need to hook VFIO to RDMA as well. I need to fix RDMA to have working P2P in all cases. I need to hook KVM virtual device stuff to iommufd. Someone else need VFIO to hook into DRM.
How many different times do I need to implement a buffer sharing lifetime model? No, we should not make a VFIO specific thing, we need a general tool to do this properly and cover all the different use cases. That's "dmabuf2" or whatever you want to call it. There are more than enough use cases to justify doing this. I think this is a bad idea, we do not need two things, we should have dmabuf to handle all the use cases people have, not just DRMs.
I don't mind improving the scatterlist approach in any way possible. I'm just rejecting things which we already tried and turned out to be a bad idea.
If you make an interface which gives DMA addresses plus additional information like address space, access hints etc.. to importers that would be really welcomed.
This is not welcomed, having lists of DMA addresses is inefficient and does not match the direction of the DMA API. We are trying very hard to completely remove the lists of DMA addresses in common fast paths.
But exposing PFNs and letting the importers create their DMA mappings themselves and make CPU mappings themselves is an absolutely clear no-go.
Again, this is what we must have to support the new DMA API, the KVM and IOMMUFD use cases I mentioned.
In this case Xu is exporting MMIO from VFIO and importing to KVM and iommufd.
So basically a portion of a PCIe BAR is imported into iommufd?
Yeah, and KVM. And RMDA.
Then create an interface between VFIO and KVM/iommufd which allows passing data between the two.
We already do this between DMA-buf exporters/importers all the time. Just don't make it a general DMA-buf API.
I have no idea what this means. We'd need a new API linked to DMABUF that would be optional and used by this part of the world. As I said above we could protect it with some module namespace so you can keep it out of DRM. If you can agree to that then it seems fine..
Someone else had some use case where they wanted to put the VFIO MMIO PCIe BAR into a DMABUF and ship it into a GPU driver for somethingsomething virtualization but I didn't understand it.
Yeah, that is already perfectly supported.
No, it isn't. Christoph is blocking DMABUF in VFIO because he does not want the scatterlist abuses that dmabuf is doing to proliferate. We already have some ARM systems where the naive way typical DMABUF implementations are setting up P2P does not work. Those systems have PCI offset.
Getting this to be "perfectly supported" is why we are working on all these aspects to improve the DMA API and remove the scatterlist abuses.
In a certain sense CC is a TEE that is built using KVM instead of the TEE subsystem. Using KVM and integrating with the MM brings a whole set of unique challenges that TEE got to avoid..
Please go over those challenges in more detail. I need to get a better understanding of what's going on here. E.g. who manages encryption keys, who raises the machine check on violations etc...
TEE broadly has Linux launch a secure world that does some private work. The secure worlds tend to be very limited, they are not really VMs and they don't run full Linux inside
CC broadly has the secure world exist at boot and launch Linux and provide services to Linux. The secure world enforces memory isolation on Linux and generates faults on violations. KVM is the gateway to launch new secure worlds and the secure worlds are full VMs with all the device emulation and more.
CC is much more like Xen with its hypervisor and DOM0 concepts.
From this perspective, the only thing that matters is that CC secure memory is different and special - it is very much like your private memory concept. Only special places that understand it and have the right HW capability can use it. All the consumers need a CPU address to program their HW because of how the secure world security works.
Jason