Comments below
On Tue, 2021-10-26 at 03:50 +0000, Lin, Wayne wrote:
[Public]
Hi Lyude! Apologize for replying late and really thanks for elaborating in such details! Following are some of my thoughts : )
-----Original Message----- From: Lyude Paul lyude@redhat.com Sent: Saturday, September 18, 2021 1:48 AM To: Lin, Wayne Wayne.Lin@amd.com; dri-devel@lists.freedesktop.org Cc: Kazlauskas, Nicholas Nicholas.Kazlauskas@amd.com; Wentland, Harry Harry.Wentland@amd.com; Zuo, Jerry Jerry.Zuo@amd.com; Wu, Hersen hersenxs.wu@amd.com; Juston Li juston.li@intel.com; Imre Deak imre.deak@intel.com; Ville Syrjälä ville.syrjala@linux.intel.com; Daniel Vetter daniel.vetter@ffwll.ch; Sean Paul sean@poorly.run; Maarten Lankhorst maarten.lankhorst@linux.intel.com; Maxime Ripard mripard@kernel.org; Thomas Zimmermann tzimmermann@suse.de; David Airlie airlied@linux.ie; Daniel Vetter daniel@ffwll.ch; Deucher, Alexander Alexander.Deucher@amd.com; Siqueira, Rodrigo Rodrigo.Siqueira@amd.com; Pillai, Aurabindo Aurabindo.Pillai@amd.com; Bas Nieuwenhuizen bas@basnieuwenhuizen.nl; Cornij, Nikola Nikola.Cornij@amd.com; Jani Nikula jani.nikula@intel.com; Manasi Navare manasi.d.navare@intel.com; Ankit Nautiyal ankit.k.nautiyal@intel.com; José Roberto de Souza jose.souza@intel.com; Sean Paul seanpaul@chromium.org; Ben Skeggs bskeggs@redhat.com; stable@vger.kernel.org Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device
Sorry about the slow response, this week XDC has been going on and I've been mostly paying attention to that.
On Tue, 2021-09-14 at 08:46 +0000, Lin, Wayne wrote:
[Public]
-----Original Message----- From: Lyude Paul lyude@redhat.com Sent: Thursday, September 2, 2021 6:00 AM To: Lin, Wayne Wayne.Lin@amd.com; dri-devel@lists.freedesktop.org Cc: Kazlauskas, Nicholas Nicholas.Kazlauskas@amd.com; Wentland, Harry Harry.Wentland@amd.com; Zuo, Jerry Jerry.Zuo@amd.com; Wu, Hersen hersenxs.wu@amd.com; Juston Li juston.li@intel.com; Imre Deak imre.deak@intel.com; Ville Syrjälä ville.syrjala@linux.intel.com; Daniel Vetter daniel.vetter@ffwll.ch; Sean Paul sean@poorly.run; Maarten Lankhorst maarten.lankhorst@linux.intel.com; Maxime Ripard mripard@kernel.org; Thomas Zimmermann tzimmermann@suse.de; David Airlie airlied@linux.ie; Daniel Vetter daniel@ffwll.ch; Deucher, Alexander Alexander.Deucher@amd.com; Siqueira, Rodrigo Rodrigo.Siqueira@amd.com; Pillai, Aurabindo Aurabindo.Pillai@amd.com; Bas Nieuwenhuizen bas@basnieuwenhuizen.nl; Cornij, Nikola Nikola.Cornij@amd.com; Jani Nikula jani.nikula@intel.com; Manasi Navare manasi.d.navare@intel.com; Ankit Nautiyal ankit.k.nautiyal@intel.com; José Roberto de Souza jose.souza@intel.com; Sean Paul seanpaul@chromium.org; Ben Skeggs bskeggs@redhat.com; stable@vger.kernel.org Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device
Actually - did some more thinking, and I think we shouldn't try to make changes like this until we actually know what the problem is here. I could try to figure out what the actual race conditions I was facing before with trying to add/destroy connectors based on PDT, but we still don't even actually have a clear idea of what's broken here. I'd much rather us figure out exactly how this leak is happening before considering making changes like this, because we have no way of knowing if we've properly fixed it or not if we don't know what the problem is in the first place.
I'm still happy to write up the topology debugging stuff I mentioned to you if you think that would help you debug this issue - since that would make it a lot easier for you to track down what references are keeping a connector alive (and additkionally, where those references were taken in code. thanks stack_depot!)
Hi Lyude, Sorry for late response. A bit busy on other stuff recently..
Really really thankful for all your help : ) I'm also glad to have the debugging tool if it won’t bother you too much. But before debugging,
no problem! I will get to it early next week then
I need to have consensus with you about where do we expect to release resource allocated for a stream sink when it's reported as disconnected. Previous patch suggests releasing resource when connector is destroyed which will happen when topology refcount reaches zero (i.e. unplug mstb from topology). But when the case is receiving CSN notifying connection change, we don't try to destroy connector in this case now. And this is not caused by topology/malloc refcount leak since I don't expect neither one of them get decrease to zero under this case (topology of mstbs and ports is not changed). Hence, my plan was to also try to destroy connector under
Ah - I wonder if this might have been where some of the confusion here came from. So-both mstbs and ports (assume I'm talking the actual drm_dp_mst_port and drm_dp_mst_branch structs here) are supposed to have non-zero topology refcounts as long as there is a valid path between the port or mstb, and our source. This also means that for ports, the drm_connector associated with these ports should stay around as long as the port is reachable from the sink
- regardless of whether anything is actually plugged into the port or not.
This concept is the place where a bit hard for me to get through. I was thinking that we don’t have to associate a drm connector with a MST port whenever the port exists, since MST port is not always connected to a stream sink. I treat MST port as an intermediate node of a virtual channel, which is an end-to-end direct virtual connection between a stream source and a stream sink. Virtual channel could be constructed by multiple link count and stream sink is connected at end port. Hence,
Just to clarify I'm understanding you correctly, when you say multiple link count do you just mean VCPI slots or do you mean two separate DP links? I haven't dealt with the former, but IIRC that is how certain high resolutions displays are handled over TB correct?
Regardless though, I would think that we could just handle this mostly from the atomic state even with a connector for every port. For instance, i915 already has something called "big joiner" for supporting display configurations where one display can take up two separate display pipes (CRTCs). We could likely do something similar but with connectors if we end up having to deal with a display driven by two DP links.
I was thinking to associate a drm connector for end stream sink only. I think we probably won't want to attach a connector to a relay/retimer/redriver within a stream path? I treat MST port as the similar role when It's fixed to connect to a MST branch device.
If it's a fixed connection, this might actually be OK to avoid attaching connectors on. Currently with input ports where we know we can never receive a CSN for them during runtime, we're able to avoid creating a connector because no potential for CSN during runtime means the only possible time an input port could transition would be suspend/resume. So if we detect we're on another topology where something that was previously an output port that is an input port on the new topology, we get rid of the connector by removing the drm_dp_mst_port it's associated with from the topology and replace it with a new one. This works pretty well, as it avoids doing any actual connector destruction from the suspend/resume codepath and ensures that any pointer references to the now non-existent output port remain valid for as long as needed. So I might actually be open to expanding this for fixed connections like relays, retimers and redrivers if we handle things in a similar manner. For anything that can receive a CSN though, a drm_connector is unconditionally needed even if nothing's connected.
I think it's a bit different to SST case. For legacy DP (before DP 1.2), we can attach a connector to its physical end output port since it's dedicated for a stream sink. But MST port is not. However, I understand if there is any implementation requirement for us to associate drm connector for all MST ports.
So - a CSN on it's own shouldn't really get rid of the port it was notifying us about. But if that CSN results in an MSTB -with- its own ports being removed, this would mean there would no longer be a valid path between our source and the ports on said MSTB and as such - the connector for each one of those ports is removed from the topology. Remember however, when I say "removed from the topology" what I'm referring to is the fact that the MST helpers have dropped the main topology reference for a given mstb or port. Since various MST helpers retrieve temporary topology references to connectors they work on in order to simplify handling I/O errors, the operations from those helpers would potentially keep the port or mstb around in the topology until those helpers have had a chance to abort and drop their refs. And then once all the topology references are released, a destruction worker gets scheduled which handles unregistering the drm_connector (not destroying it). The drm_connector stays around unregistered, up until the point at which all malloc references to the drm_dp_mst_port have been released.
I think it may also be worth clarifying the lifetime of drm_connector itself here as well, since that also actually has a refcount. Basically, as long as userspace has a mode committed which references a drm_connector - that drm_connector will still exist in memory, and its mode object ID will remain valid. This means if we were to have a MST topology hooked up with one display turned on and then suddenly unplugged it, keeping in mind that the port with said display now becomes inaccessible from the topology, the drm_connector associated with that display would continue to have a valid mode object ID up until the point at which userspace has committed a new mode which disables it. The sysfs paths for the connector however, will disappear immediately once the connector is unregistered so as to ensure that userspace applications cannot try to reuse it later or attempt to reprobe it.
Any resource releases beyond this (streams on the driver side, for instance) are up to the driver, but typically I would expect them to happen in the same places as they would with an SST connector. Does that answer your question?
Unplug event of SST sink and MST remote sink is a bit different. SST unplug event relies on long HPD IRQ but MST CSN relies on short HPD IRQ. Now we use MST helper function drm_dp_mst_handle_conn_stat() to deal with CSN short HPD IRQ. But within this function, driver won't get notification of disconnection event to release associated allocated resource. So, by not changing the drm connector association logic here, should we add a new call back function here?
Sorry Lyude, I don't understand as well as you on this and would like to learn more from you. Please correct me if I misunderstand anything here. Much appreciate!
It's no problem at all! I'm always glad to help :). This still sounds a lot like a bug to me in amdgpu, because we actually do send a hotplug event here. Basically, the only function that calls this; drm_dp_mst_process_up_req(), will assume it needs to request a hotplug if we ever call drm_dp_mst_handle_conn_stat(). From there we pass this information up to drm_dp_mst_up_req_work(). Then once we've finished handling all pending up requests, we send a single hotplug to indicate to userspace it needs to reprobe everything.
Also, I'm still working on the debugging stuff btw!