On Thu, Aug 22, 2019 at 11:14 AM Christian König ckoenig.leichtzumerken@gmail.com wrote:
Am 21.08.19 um 22:22 schrieb Daniel Vetter:
On Wed, Aug 21, 2019 at 10:11 PM Chris Wilson chris@chris-wilson.co.uk wrote:
Quoting Christian König (2019-08-21 13:31:37)
Hi everyone,
In previous discussion it surfaced that different drivers use the shared and explicit fences in the dma_resv object with different meanings.
This is problematic when we share buffers between those drivers, and the requirements for implicit and explicit synchronization led to quite a number of workarounds related to this.
So I started an effort to get all drivers back to a common understanding of what the fences in the dma_resv object mean and be able to use the object for different kinds of workloads independent of the classic DRM command submission interface.
The result is this patch set which modifies the dma_resv API to get away from a single explicit fence and multiple shared fences, towards a notation where we have explicit categories for writers, readers and others.
Fwiw, I would like the distinction here between optional fences (writers, readers) and mandatory fences (others). The optional fences are where we put the implicit fence tracking that clever userspace would rather avoid. The mandatory fences (I would call them internal) are where we put the fences for tracking migration that userspace cannot opt out of.
I think this would make sense, and is kinda what I expected here.
Yeah, exactly that's the intention here.
Basic idea is to group the fences into the categories of "you always need to wait for when doing implicit synchronization" (writers), "you only need to wait for them when you want to write to the object" (readers) and "ignore them for implicit synchronization".
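A minimal sketch of that grouping, purely for illustration: the enum and function names below are hypothetical and are not part of the dma_resv API or of the patch set. It only encodes the three wait rules stated above.

```c
#include <stdbool.h>

/* Hypothetical fence categories, named after the proposal in this thread. */
enum fence_category {
	FENCE_WRITER,	/* always wait for these when doing implicit sync */
	FENCE_READER,	/* only wait for these when you want to write */
	FENCE_OTHER,	/* ignored for implicit sync */
};

/* Hypothetical description of what the new job does to the object. */
enum access_kind {
	ACCESS_READ,
	ACCESS_WRITE,
};

/*
 * Returns true if a job doing 'access' has to wait for a fence of
 * category 'cat' under implicit synchronization.
 */
static bool implicit_sync_must_wait(enum fence_category cat,
				    enum access_kind access)
{
	switch (cat) {
	case FENCE_WRITER:
		return true;
	case FENCE_READER:
		return access == ACCESS_WRITE;
	case FENCE_OTHER:
		return false;
	}
	return false;
}
```

Note that this covers implicit synchronization only. Whether some "other" fences still have to be honoured for different reasons (buffer moves, for example) is exactly the open question in this thread.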
If (and I think that's a huge if) we can agree on what those internal fences are. There's a huge difference between internal fences for buffer moves (better not ignore those) and internal fences like amdkfd's eviction fence (better ignore those).
Yeah, that's exactly why I want to get away from that exclusive/shared naming.
The bikeshed was epic. The idea behind exclusive/shared was that you might want to ignore writers (like amdgpu does for internal buffers), so shared doesn't necessarily mean it only contains readers, there might also be writers in there. But only writers who are coordinating their writes through some other means.
For writers the reason for going with exclusive was again the above, that you might not want to put all writers into the exclusive slot (amdgpu doesn't, at least for internal stuff). Also, some exclusive fences might not be traditional writers, but other stuff like bo moves.
But clearly amdkfd_eviction fence doesn't fit into this scheme. And on the other hand we might want to have better rules to differentiate between writers/readers for implicit sync and other stuff the kernel does. Currently the rules are that you always have to sync with the exclusive fence, since you have no idea why exactly it is exclusive - it could be implicit sync, or it could be a bo move, or something else entirely. At least for foreign fences.
For your own fences I don't think any of this matters, and essentially you can treat them all as just an overall list of fences on your bo. E.g. you could also treat the exclusive fence slot as a shared fence slot for internal purposes, if the driver-internal semantics allow that.
For readers/writers I hoped the semantics would be clearer, but that doesn't seem to be the case.
So whatever we do add, it better come with really clear docs and pretty diagrams about what it's supposed to do, and how it's supposed to be used. Or we're just back to the current mess we're in, times two.
Well documenting it in the end is clearly a good idea, but I don't think we should start with that before we actually know what we want to implement and how we want to implement it.
Otherwise I would write tons of documentation which can be thrown away again in the end because we decided not to do it this way.
Yeah there's a bit of a problem there. So for your RFC I guess the main goal is the "other" bucket, which is both going to be a huge bikeshed (not again ...) but also real tricky to figure out the semantics. If the goal with "other" is to use that for bo moves, and not for amdkfd_eviction (that one I feel like doesn't fit, at least amdgpu code tends to ignore it in many cases), then maybe we should call it kernel_exclusive? Or something else that indicates it's a level above the normal exclusive, kinda the usual user/kernel split. And if you ignore the kernel_exclusive fence then the kernel will break (and not just some data corruption visible to userspace).
Or is there something else you want to clear up? I mean aside from the reader/writer vs exclusive/shared thing, imo that's easier to decide once we know what exactly "other" is. -Daniel