Am 21.05.21 um 20:31 schrieb Daniel Vetter:
[SNIP]
We could provide an IOCTL for the BO to change the flag.
That's not the semantics we need.
But could we first figure out the semantics we want to use here?
Cause I'm pretty sure we don't actually need those changes at all and as said before I'm certainly NAKing things which break existing use cases.
Please read how other drivers do this and at least _try_ to understand it. I'm really losing my patience here with you NAKing patches you're not even understanding (or did you actually read and fully understand the entire story I typed up here, and your NAK is on the entire thing?). There's not much useful conversation to be had with that approach. And with drivers I mean kernel + userspace here.
Well to be honest I did fully read that, but I was just too emotionally attached to answer more appropriately in that moment.
And I'm sorry that I reacted emotionally to that, but it is really frustrating that I'm not able to convince you that we have a major problem which affects all drivers and not just amdgpu.
Regarding the reason why I'm NAKing this particular patch: you are breaking existing uAPI for RADV with that. And as a maintainer of the driver I have simply no other choice than saying halt, stop, we can't do it like this.
I'm perfectly aware that I have some holes in my understanding of how ANV or other Vulkan/OpenGL stacks work. But you should probably also admit that you have some holes in how amdgpu works, or otherwise I can't imagine why you would suggest a patch which simply breaks RADV.
I mean we have been working together for years now and I think you know me pretty well, do you really think I scream bloody hell we can't do this without a good reason?
So let's stop throwing half-baked solutions at each other and discuss what we can do to solve the different problems we are both seeing here.
That's the other frustrating part: You're trying to fix this purely in the kernel. This is exactly one of these issues why we require open source userspace, so that we can fix the issues correctly across the entire stack. And meanwhile you're steadfastly refusing to even look at the userspace side of the picture.
Well I do fully understand the userspace side of the picture for the AMD stack. I just don't think we should give userspace that much control over the fences in the dma_resv object without untangling them from resource management.
And RADV is exercising exclusive sync for amdgpu already. You can do submission to the GFX, Compute and SDMA queues in Vulkan and those currently won't over-synchronize.
When you then send a texture generated by multiple engines to the Compositor, the kernel will correctly insert waits for all submissions of the other process.
So this already works for RADV and completely without the IOCTL Jason proposed. IIRC we also have unit tests which exercised that feature for the video decoding use case long before RADV even existed.
And yes I have to admit that I haven't thought about interaction with other drivers when I came up with this, because the rules of that interaction weren't clear to me at that time.
Also I thought through your tlb issue, why are you even putting these tlb flush fences into the shared dma_resv slots? If you store them somewhere else in the amdgpu private part, the oversync issue goes away
- in your ttm bo move callback, you can just make your bo copy job depend on them too (you have to anyway)
- even for p2p there's not an issue here, because you have the ->move_notify callback, and can then lift the tlb flush fences from your private place to the shared slots so the exporter can see them.
Because adding a shared fence requires that this shared fence signals after the exclusive fence. And this is a perfect example to explain why this is so problematic and also why we currently stumble over that only in amdgpu.
In TTM we have a feature which allows evictions to be pipelined so that they don't wait for the evicting DMA operation. Without that, drivers will stall waiting for their allocations to finish when we need to allocate memory.
For certain use cases this gives you a ~20% fps increase under memory pressure, so it is a really important feature.
This works by adding the fence of the last eviction DMA operation to BOs when their backing store is newly allocated. That's what the ttm_bo_add_move_fence() function you stumbled over is good for: https://elixir.bootlin.com/linux/v5.13-rc2/source/drivers/gpu/drm/ttm/ttm_bo...
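To make the pipelining idea a bit more concrete, here is a minimal toy model in C. It is only a sketch and not TTM code: toy_fence, toy_manager, toy_bo and both helpers are hypothetical names made up for illustration, and the real ttm_bo_add_move_fence() does more than this. The assumption is simply that the manager remembers the fence of the last eviction, and a BO with freshly allocated backing store inherits it instead of the CPU blocking on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, NOT the kernel's struct dma_fence or TTM types. */
struct toy_fence {
	bool signaled;
};

struct toy_manager {
	struct toy_fence *last_evict;	/* fence of the last eviction DMA */
};

struct toy_bo {
	struct toy_fence *moving;	/* must signal before first use */
};

/*
 * Hypothetical helper: instead of waiting for the eviction to finish,
 * hand its fence to the BO that now owns the freed-up memory.
 */
static void toy_bo_add_move_fence(struct toy_manager *man, struct toy_bo *bo)
{
	assert(man && bo);

	if (man->last_evict && !man->last_evict->signaled)
		bo->moving = man->last_evict;
	else
		bo->moving = NULL;
}

/* The BO's memory may only be touched once the eviction has completed. */
static bool toy_bo_ready(const struct toy_bo *bo)
{
	return bo->moving == NULL || bo->moving->signaled;
}
```

The allocation itself returns immediately in this model; only the first user of the BO has to honor the inherited fence.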
Now the problem is that the application may be terminated before it can complete its command submission. But since resource management only waits for the shared fences when there are some, there is a chance that we free up memory while it is still in use.
Because of this we have some rather crude workarounds in amdgpu. For example IIRC we manually wait for any potential exclusive fence before freeing memory.
We could enable this feature for radeon and nouveau as well with a one-line change. But that would mean we need to maintain the workarounds for shortcomings of the dma_resv object design in those drivers as well.
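A toy model in C of the hole described above, as a sketch only. None of this is the kernel's dma_resv API: toy_fence, toy_resv and the three helpers are hypothetical names, and the slot limit is arbitrary. It assumes the rule as stated here, that the shared fences alone are checked whenever at least one is present, and contrasts it with a check that also looks at the exclusive fence, which is roughly the kind of workaround mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, NOT the kernel's struct dma_fence / struct dma_resv. */
struct toy_fence {
	bool signaled;
};

#define TOY_MAX_SHARED 4

struct toy_resv {
	struct toy_fence *excl;
	struct toy_fence *shared[TOY_MAX_SHARED];
	size_t shared_count;
};

static void toy_resv_add_shared(struct toy_resv *r, struct toy_fence *f)
{
	assert(r->shared_count < TOY_MAX_SHARED);
	r->shared[r->shared_count++] = f;
}

/*
 * Rule as described in the text: if there are shared fences, only those
 * are checked. This is only safe when every shared fence signals after
 * the exclusive fence.
 */
static bool toy_idle_shared_only(const struct toy_resv *r)
{
	if (r->shared_count == 0)
		return r->excl == NULL || r->excl->signaled;

	for (size_t i = 0; i < r->shared_count; i++)
		if (!r->shared[i]->signaled)
			return false;
	return true;
}

/* Workaround-style check: also look at the exclusive fence explicitly. */
static bool toy_idle_all(const struct toy_resv *r)
{
	if (r->excl && !r->excl->signaled)
		return false;

	for (size_t i = 0; i < r->shared_count; i++)
		if (!r->shared[i]->signaled)
			return false;
	return true;
}
```

With a still pending exclusive fence and an already signaled shared fence that is unrelated to it, the first check reports the object as idle while the second one does not.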
To summarize, I think that adding an unbound fence to protect an object is a perfectly valid operation for resource management, but this is restricted by the needs of implicit sync at the moment.
The kernel move fences otoh are a bit more nasty to wring through the p2p dma-buf interface. That one probably needs something new.
Well the p2p interface is my least concern.
Adding the move fence means that you need to touch every place we do CS or page flip since you now have something which is parallel to the explicit sync fence.
Otherwise having the move fence separately wouldn't make much sense in the first place if we always set it together with the exclusive fence.
Best regards and sorry for getting on your nerves so much, Christian.
-Daniel
Regards, Christian.
-Daniel
Are you bored enough to type this up for radv? I'll give Jason's kernel stuff another review meanwhile. -Daniel
-- Daniel Vetter Software Engineer, Intel Corporation https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fblog.ffwll....