On Fri, 21 May 2021 at 14:09, Christian König christian.koenig@amd.com wrote:
On 21.05.21 at 14:54, Daniel Stone wrote:
If you're curious, the interface definitions are in the csf/ directory in the 'Bifrost kernel driver' r30p0 download you can get from the Arm developer site. Unfortunately the exact semantics aren't completely clear.
Well it is actually relatively simple. Take a look at the timeline semaphores from Vulkan, everybody is basically implementing the same semantics now.
When you queue up a bunch of commands on your hardware, the first one will write value 1 to a 64-bit memory location, the second one will write value 2, the third value 3 and so on. After writing the value the hardware raises an interrupt signal to everybody interested.
In other words pretty standard memory fence behavior.
When you now have a second queue which depends on the work of the first one, you look at the memory location and do a compare. If you depend on the third submission you just wait for the value to be >=3 and are done.
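To make the signal/compare scheme concrete, here is a rough userspace sketch in Python. It is only a toy model, not how any driver or firmware implements it: the class name TimelineFence and its methods are made up for illustration, a plain integer stands in for the 64-bit memory location, and a condition variable stands in for the interrupt. The wait is satisfied once the counter has reached the target value (a >= compare), as with Vulkan timeline semaphores.

```python
import threading


class TimelineFence:
    """Toy model of a timeline fence: a counter that only moves forward.

    Hypothetical illustration only; names are not taken from any real driver.
    """

    def __init__(self):
        self._value = 0                     # stands in for the 64-bit memory location
        self._cond = threading.Condition()  # stands in for the interrupt

    def signal(self, seqno):
        """Record that submission number `seqno` has completed."""
        with self._cond:
            # Never move the counter backwards.
            self._value = max(self._value, seqno)
            # Wake everybody interested ("raise the interrupt").
            self._cond.notify_all()

    def is_signaled(self, seqno):
        """Non-blocking compare against the current counter value."""
        with self._cond:
            return self._value >= seqno

    def wait(self, seqno, timeout=None):
        """Block until the counter reaches `seqno`; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._value >= seqno, timeout)


if __name__ == "__main__":
    fence = TimelineFence()
    # Second "queue": depends on the third submission of the first queue.
    waiter = threading.Thread(target=fence.wait, args=(3,))
    waiter.start()
    # First "queue": three submissions complete in order.
    for seqno in (1, 2, 3):
        fence.signal(seqno)
    waiter.join()
    print("dependency on submission 3 satisfied")
```

Because the counter is monotonic, a waiter that shows up late (after the value has already passed its target) returns immediately, which is what makes the plain compare sufficient.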
Right, it is clearly defined in terms of the timeline semaphore semantics, I just meant that it's not clear how it works at a lower level wrt the synchronisation and signaling. The simplest possible interpretation is that wait_addrval blocks indefinitely before kick-cmdbuf, but that seems painful with only 32 queues. And the same for fences, which are a binary signal. I guess we'll find out. My tooth hurts.
Cheers, Daniel