On Fri, 2021-10-29 at 12:11 +0000, Lin, Wayne wrote:
[Public]
Thanks Lyude for patiently guiding on this : ) Would like to learn more as following
I do follow your bit about connectors only being created when a virtual path is instantiated, but that still doesn't follow how connectors in DRM typically behave though as this idea still comes down to "we don't have disconnected connectors, only connected ones". Which still breaks force probing (if the connector doesn't exist in userspace because we destroyed it, how do we get to it's force sysfs file?), and also just makes hides information from userspace that it might actually care about (what if for instance, a GUI wanted to display the topology layout of an MST hub -including- all of the currently disconnected ports on it? Considering we allow this for things like USB, it doesn't make sense to hide them for MST.
As well, while your idea for what an MST connector is honestly does make a lot more sense then what we have, that's not really the issue here. The problem is that connector creation/destruction is already quite racy, and requires a _lot_ of care to get right. We've already had tons of bugs in the past that lead to us resorting to all of the tricks we're currently using, for instance: Which just seems to add a lot of complication to the current MST code, without much reason here besides trying to reduce the amount of connectors along with a potential bug with leaking connectors that we still don't know the cause of. Trying to solve problems without understanding exactly what's causing them something around a bug that could be entirely unrelated to how we create connectors, because then it's not even really guaranteed we've fixed anything if we don't know what caused the problem in the first place. Working around problems might temporarily fix the ones we're dealing with right now, but without understanding what's causing it there's no guarantee it won't just pop up again in the future or that we won't introduce new problems in the process.
Regardless though, I would think that we could just handle this mostly from the atomic state even with a connector for every port. For instance, i915 already has something called "big joiner" for supporting display configurations where one display can take up two separate display pipes (CRTCs). We could likely do something similar but with connectors if we end up having to deal with a display driven by two DP links.
I was thinking to associate a drm connector for end stream sink only. I think we probably won't want to attach a connector to a relay/retimer/redriver within a stream path? I treat MST port as the similar role when It's fixed to connect to a MST branch device.
If it's a fixed connection, this might actually be OK to avoid attaching connectors on. Currently with input ports where we know we can never receive a CSN for them during runtime, we're able to avoid creating a connector because no potential for CSN during runtime means the only possible time an input port could transition would be suspend/resume. So if we detect we're on another topology where something that was previously an output port that is an input port on the new topology, we get rid of the connector by removing the drm_dp_mst_port it's associated with from the topology and replace it with a new one. This works pretty well, as it avoids doing any actual connector destruction from the suspend/resume codepath and ensures that any pointer references to the now non-existent output port remain valid for as long as needed. So I might actually be open to expanding this for fixed connections like relays, retimers and redrivers if we handle things in a similar manner. For anything that can receive a CSN though, a drm_connector is unconditionally needed even if nothing's connected.
Want to deepen my knowledge here. Sorry Lyude could you explain more on this please? Are you saying that if we change to associate drm connector as what I proposed in this patch, we will create actual connector destruction from the suspend/resume codepath and which is a problem here? I thought once the connection status changed from connected to disconnected during suspend/resume, we still use the same way as what we did in drm_dp_delayed_destroy_port(): i.e. if (port->connector) { drm_connector_unregister(port->connector); drm_connector_put(port->connector); } We won't directly destruct the drm connector?
Something like that, I'd need to to go look further into the details because I very vividly remember most of the tricks we do in the MST helpers regarding delayed connector destruction and when/how we change various members of the drm_dp_mst_port/drm_dp_mst_branch structures. I vaguely remember the problem with trying to hot add/remove connectors (I -did- actually try to do this once I believe! but not as thoroughly as you have) being some kind of lockdep issue. I started trying to dig into the MST code a bit deeper to get a clear answer on this, but I actually decided to take that time and just finish up the debug helpers I mentioned (I'll send the WIP patch I've got to you in a moment, and will send it off the mailing list once I finish hooking things up in i915) because it really just doesn't seem to me like we actually have a clear understanding of how this issue is being caused - and it's not a good idea for us to make any kind of API change like this to attempt (and inevitably fail or break something else) to fix an issue we don't fully understand.
[snip...]
Right! I might not recall correctly, but I think that's why I want this patch. I probably encountered that userspace doesn’t explicitly try to react to this unplug event. Instead, it tries to react when we plug in monitor next time. And the problem is when we plug in monitor next time, stale resources are not released yet. It then hits the limitation within our HW. Which let me want to explicitly release resource once driver detect the unplug event (just like sst long HPD event I think). By the way, just out of curiosity, when do you think is the timing to release sink related resource if we rely on hotplug event notifying userspace? When userspace frees the associated pipe of the connector? Won't it be a transient state that userspace just free the pipe temporarily?
The timing of releasing resources should be done at the same time that we disable the connector. In general, MST modesetting shouldn't be much different from anything else - except for having to maintain a payload table and bandwidth limitations across a shared connection. So pretty much everything related to enabling or disabling streams should be in the atomic commit phase (with any bandwidth calculations done beforehand, WIP...). I'm going to say, let's figure out where this is happening first. I've got the debugging patches for this ready and will send them to you now.
Also, I'm still working on the debugging stuff btw!
Much appreciate Lyude! Thanks!
-- Cheers, Lyude Paul (she/her) Software Engineer at Red Hat
-- Regards, Wayne Lin