On 5/20/26 14:33, Xaver Hugl wrote:
On Wed, 20 May 2026 at 10:08, Christian König christian.koenig@amd.com wrote:
Well I would say the other way around is a pretty common use case.
In other words, the compositor uses the internal GPU for composing and displaying the picture, and the client uses the external GPU for fast rendering.
Sure, but that's not what I'm talking about.
Yeah sorry for that, I wasn't sure if I misunderstood your use case because it's usually the other way around.
- the buffers from the client stay valid
Buffers from the hot plugged GPU don't stay valid. Accessing CPU mappings either results in a SIGBUS or is redirected to a dummy page.
Again, not what I wrote about. The buffers are on the integrated GPU.
General rule of thumb is that as long as the exporter stays around the buffers stay around as well.
- the syncobj stays valid on the client side
- the syncobj becomes invalid on the compositor side
Nope that's not correct. The syncobj itself stays valid even if you completely hot plug the device.
It can just be that the fences inside the syncobj are terminated with an error.
What about eventfd created for a point on the syncobj?
The eventfd unfortunately doesn't have error handling as far as I know, so when a fence signals with an error condition, on the eventfd you only see that it is signaled.
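To illustrate why the eventfd cannot carry an error: a minimal sketch, using only the plain eventfd(2) API and no DRM device, of what a waiter can read back after a signal. The function name eventfd_signal_and_read is made up for this example, and the write() merely stands in for whatever would signal the eventfd when a fence completes.

```c
#include <assert.h>   /* for callers that want to assert on the result */
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: signal an eventfd once and return what the
 * waiter reads back. Returns 0 if any step fails.
 */
uint64_t eventfd_signal_and_read(void)
{
    uint64_t one = 1, seen = 0;
    int efd = eventfd(0, EFD_CLOEXEC);

    if (efd < 0)
        return 0;

    /* Stand-in for the notification that a fence has signaled. */
    if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
        close(efd);
        return 0;
    }

    /* All the waiter gets is a 64-bit counter; there is no status field. */
    if (read(efd, &seen, sizeof(seen)) != (ssize_t)sizeof(seen))
        seen = 0;

    close(efd);
    return seen;
}
```

The value read is just the accumulated counter, so a fence that completed successfully and one that completed with an error look identical at this level.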
Another (future) problem with hotplugs will be if the sync file hasn't materialized for the timeline point when the device is hotunplugged, since there can't be an error on the fence if there isn't one. Or could userspace somehow set an 'artificial' fence with an error in that case?
In general the answer is yes, userspace needs to take care of inserting fences when wait-before-signal is used and the work cannot be submitted to the HW for some reason.
Currently we only have an IOCTL to insert the signaled dummy fence at some timeline sequence, but it should be trivial as well to insert a signaled fence with an error code.
But the compositor needs to be able to handle that case anyway, because it can be that a malicious or just buggy client just never inserts the fence.
So a device being hot plugged is no different from a client simply not inserting the fence in the first place.
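One way a compositor might guard against a fence that never shows up is a bounded wait on the eventfd. The following is only a sketch under assumptions: wait_acquire_point, the wait_result values and the idea of a fixed timeout are illustrative, not taken from KWin or any other compositor.

```c
#include <assert.h>   /* for callers that want to assert on the result */
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical result codes for this example. */
enum wait_result {
    WAIT_FAILED    = -1,
    WAIT_SIGNALED  = 0,
    WAIT_TIMED_OUT = 1,
};

/*
 * Wait until the eventfd tied to an acquire point becomes readable,
 * but give up after timeout_ms so a client that never inserts the
 * fence cannot stall the waiter forever.
 */
int wait_acquire_point(int efd, int timeout_ms)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    int ret;

    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0)
        return WAIT_FAILED;
    if (ret == 0)
        return WAIT_TIMED_OUT;   /* fence never materialized or never signaled */

    return (pfd.revents & POLLIN) ? WAIT_SIGNALED : WAIT_FAILED;
}
```

A real compositor would more likely register the eventfd with its event loop and arm a separate timer than block in poll(), but the handling of the timed-out case is the same.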
"invalid" there means either
- the acquire point of the client is marked as signaled, before rendering on the client side is completed
- the acquire point of the client is never signaled. Since the compositor waits for the acquire point, the Wayland surface is stuck forever
Both of those would be a *massive* violation of documented kernel rules for hot-plugging which could lead to random data corruption and/or deadlocks.
If you see any HW driver showing behavior like that please open up a bug report and ping the relevant maintainers immediately.
If there are no error codes with syncobj yet, then to userspace, the latter behavior is exactly what we get, isn't it?
No, from userspace side you just see a signaled fence. It's just that you need to export the timeline point of the syncobj to a syncfile and then you can call the QUERY IOCTL on the syncfile to see the error code.
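A rough sketch of the query step, assuming the timeline point has already been exported to a sync file fd (with libdrm that would presumably be drmSyncobjTransfer() to a binary syncobj followed by drmSyncobjExportSyncFile(), which is not shown here). The helper name sync_file_query_status is made up; the ioctl and struct come from <linux/sync_file.h>.

```c
#include <assert.h>   /* for callers that want to assert on the result */
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/sync_file.h>

/*
 * Hypothetical helper: ask a sync file for its aggregate fence status.
 * Returns 0 and stores the status in *status, or a negative errno if
 * the ioctl itself fails. Per the uapi header, status is 1 for
 * signaled, 0 for active and a negative error code for a fence that
 * completed with an error.
 */
int sync_file_query_status(int sync_file_fd, int *status)
{
    struct sync_file_info info;

    /* num_fences == 0: request only the summary, no per-fence array. */
    memset(&info, 0, sizeof(info));

    if (ioctl(sync_file_fd, SYNC_IOC_FILE_INFO, &info) < 0)
        return -errno;

    *status = info.status;
    return 0;
}
```

With this, a waiter woken up by the eventfd could follow up with one extra query and treat a negative status (for example -ENODEV after a hot unplug) differently from a normal completion.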
When a hotplug happens, all operations of the device should return an -ENODEV error, even when exposed to other devices/applications through syncobj or syncfile.
Okay, that at least gives us a way to fail imports somewhat gracefully. Normally, failing to import a syncobj is a fatal error in the Wayland protocol.
So the task at hand would be to avoid importing the syncobj into a driver. That should be relatively trivial.
The only real problem I see is if you want to create a syncobj without having any device whatsoever.
One problem is that only syncfile allows for querying such error codes at the moment. We have patches pending to add that to syncobj as well, but we lack a compositor that supports it as a userspace client.
As long as the error case can be detected with an eventfd,
Yeah that's the problem. The eventfd only tells you if the operation is completed (or at least has materialized).
To query the error you would need to ask the underlying syncobj or syncfile directly.
implementing that in KWin shouldn't be a challenge.
Well, the question here is whether it is the device the compositor is using or the device the client is using that is gone.
If the client device is hot removed, the compositor should be perfectly capable of importing the syncobj.
If the compositor device is gone then you don't have a device to display anything any more, so generating the next frame doesn't seem to make sense either.
What could be is that you want the compositor to be kept alive even when the display device is gone to switch over to vkms or whatever so that a VNC session or other remote desktop still works.
There are two GPUs in the example I gave. The compositor can use both for rendering (in cosmic-comp's case) or switch between them (what I'm trying to do with KWin), or use one device for rendering, and another for importing the syncobj.
Ah! I think I got the problem now. You basically want to avoid importing the syncobj because when the wrong device goes away you are busted.
The reason we didn't consider having the IOCTLs on the FD is that if you don't import them and instead keep them around, you can run out of file descriptors quite quickly.
When you have a use case where you receive an FD from the client and do a one-shot conversion to an eventfd, that will probably work, but for keeping them in the long run you need some kind of container for the syncobjs, don't you?
> 3. It removes the need to translate between syncobjs fds and handles.
That's a pretty big no-go as well. The differentiation between FDs and handles is completely intentional.
Could you expand on why it's needed? For compositors, the handle is just an intermediary thing when translating between file descriptors.
Well, what we could do is add an IOCTL to directly attach a syncobj file descriptor to an eventfd.
That would be nice.
Take a look at drm_syncobj_file_fops and how drm_syncobj_add_eventfd() is used. Adding that functionality shouldn't be more than a typing exercise.
Yeah, this patchset already adds that functionality (on the new device).
Do I see it right that this would already solve most problems on the compositor side?
Skipping the syncobj handle step would only reduce the number of ioctls the compositor does, but afaict it wouldn't solve any compositor problems. At least not as long as it's still tied to a drm device.
Yeah, you need something like a syncobj container or dummy DRM device.
For device hotplugs, the only new thing we need for correctly handling syncobj is a way to receive errors on the eventfd.
I need to look into the eventfd code, could be that this is somehow possible but it's clearly not something I used before.
A device-independent way to create and use syncobj would still be useful to us though, both to simplify the compositor and to improve the software rendering use cases.
Yeah not sure how to cleanly do that. We could have a dummy /dev/dri/rendersync or something like that, but that would be quite a hack.
At least I understand the requirement now.
Thanks, Christian.
- Xaver