Jerry Zuo pointed out a rather obscure hotplugging issue that it seems I accidentally introduced into DRM two years ago.
Pretend we have a topology like this:
|- DP-1: mst_primary |- DP-4: active display |- DP-5: disconnected |- DP-6: active hub |- DP-7: active display |- DP-8: disconnected |- DP-9: disconnected
If we unplug DP-6, the topology starting at DP-7 will be destroyed but it's payloads will live on in DP-1's VCPI allocations and thus require removal. However, this removal currently fails because drm_dp_update_payload_part1() will (rightly so) try to validate the port before accessing it, fail then abort. If we keep going, eventually we run the MST hub out of bandwidth and all new allocations will start to fail (or in my case; all new displays just start flickering a ton).
We could just teach drm_dp_update_payload_part1() not to drop the port ref in this case, but then we also need to teach drm_dp_destroy_payload_step1() to do the same thing, then hope no one ever adds anything to the that requires a validated port reference in drm_dp_destroy_connector_work(). Kind of sketchy.
So let's go with a more clever solution: any port that drm_dp_destroy_connector_work() interacts with is guaranteed to still exist in memory until we say so. While said port might not be valid we don't really care: that's the whole reason we're destroying it in the first place! So, teach drm_dp_get_validated_port_ref() to use the all mighty current_work() function to avoid attempting to validate ports from the context of mgr->destroy_connector_work. I can't see any situation where this wouldn't be safe, and this avoids having to play whack-a-mole in the future of trying to work around port validation.
Signed-off-by: Lyude Paul lyude@redhat.com Fixes: 263efde31f97 ("drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()") Reported-by: Jerry Zuo Jerry.Zuo@amd.com Cc: Jerry Zuo Jerry.Zuo@amd.com Cc: Harry Wentland Harry.Wentland@amd.com Cc: stable@vger.kernel.org # v4.6+ --- drivers/gpu/drm/drm_dp_mst_topology.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 529414556962..08978ad72f33 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -1023,9 +1023,20 @@ static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_ static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port) { struct drm_dp_mst_port *rport = NULL; + mutex_lock(&mgr->lock); - if (mgr->mst_primary) - rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, port); + /* + * Port may or may not be 'valid' but we don't care about that when + * destroying the port and we are guaranteed that the port pointer + * will be valid until we've finished + */ + if (current_work() == &mgr->destroy_connector_work) { + kref_get(&port->kref); + rport = port; + } else if (mgr->mst_primary) { + rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, + port); + } mutex_unlock(&mgr->lock); return rport; }
On Wed, 14 Nov 2018 at 08:47, Lyude Paul lyude@redhat.com wrote:
Jerry Zuo pointed out a rather obscure hotplugging issue that it seems I accidentally introduced into DRM two years ago.
Pretend we have a topology like this:
|- DP-1: mst_primary |- DP-4: active display |- DP-5: disconnected |- DP-6: active hub |- DP-7: active display |- DP-8: disconnected |- DP-9: disconnected
If we unplug DP-6, the topology starting at DP-7 will be destroyed but it's payloads will live on in DP-1's VCPI allocations and thus require removal. However, this removal currently fails because drm_dp_update_payload_part1() will (rightly so) try to validate the port before accessing it, fail then abort. If we keep going, eventually we run the MST hub out of bandwidth and all new allocations will start to fail (or in my case; all new displays just start flickering a ton).
We could just teach drm_dp_update_payload_part1() not to drop the port ref in this case, but then we also need to teach drm_dp_destroy_payload_step1() to do the same thing, then hope no one ever adds anything to the that requires a validated port reference in drm_dp_destroy_connector_work(). Kind of sketchy.
So let's go with a more clever solution: any port that drm_dp_destroy_connector_work() interacts with is guaranteed to still exist in memory until we say so. While said port might not be valid we don't really care: that's the whole reason we're destroying it in the first place! So, teach drm_dp_get_validated_port_ref() to use the all mighty current_work() function to avoid attempting to validate ports from the context of mgr->destroy_connector_work. I can't see any situation where this wouldn't be safe, and this avoids having to play whack-a-mole in the future of trying to work around port validation.
Signed-off-by: Lyude Paul lyude@redhat.com Fixes: 263efde31f97 ("drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()") Reported-by: Jerry Zuo Jerry.Zuo@amd.com Cc: Jerry Zuo Jerry.Zuo@amd.com Cc: Harry Wentland Harry.Wentland@amd.com Cc: stable@vger.kernel.org # v4.6+
Much as I constantly feel we are missing some better way to do all of this, this seems like a reasonable fix.
Reviewed-by: Dave Airlie airlied@redhat.com
drivers/gpu/drm/drm_dp_mst_topology.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 529414556962..08978ad72f33 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -1023,9 +1023,20 @@ static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_ static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port) { struct drm_dp_mst_port *rport = NULL;
mutex_lock(&mgr->lock);
if (mgr->mst_primary)
rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, port);
/*
* Port may or may not be 'valid' but we don't care about that when
* destroying the port and we are guaranteed that the port pointer
* will be valid until we've finished
*/
if (current_work() == &mgr->destroy_connector_work) {
kref_get(&port->kref);
rport = port;
} else if (mgr->mst_primary) {
rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary,
port);
} mutex_unlock(&mgr->lock); return rport;
}
2.19.1
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Pushed to drm-misc-fixes, thanks!
On Thu, 2018-11-22 at 06:08 +1000, Dave Airlie wrote:
On Wed, 14 Nov 2018 at 08:47, Lyude Paul lyude@redhat.com wrote:
Jerry Zuo pointed out a rather obscure hotplugging issue that it seems I accidentally introduced into DRM two years ago.
Pretend we have a topology like this:
- DP-1: mst_primary
|- DP-4: active display |- DP-5: disconnected |- DP-6: active hub |- DP-7: active display |- DP-8: disconnected |- DP-9: disconnected
If we unplug DP-6, the topology starting at DP-7 will be destroyed but it's payloads will live on in DP-1's VCPI allocations and thus require removal. However, this removal currently fails because drm_dp_update_payload_part1() will (rightly so) try to validate the port before accessing it, fail then abort. If we keep going, eventually we run the MST hub out of bandwidth and all new allocations will start to fail (or in my case; all new displays just start flickering a ton).
We could just teach drm_dp_update_payload_part1() not to drop the port ref in this case, but then we also need to teach drm_dp_destroy_payload_step1() to do the same thing, then hope no one ever adds anything to the that requires a validated port reference in drm_dp_destroy_connector_work(). Kind of sketchy.
So let's go with a more clever solution: any port that drm_dp_destroy_connector_work() interacts with is guaranteed to still exist in memory until we say so. While said port might not be valid we don't really care: that's the whole reason we're destroying it in the first place! So, teach drm_dp_get_validated_port_ref() to use the all mighty current_work() function to avoid attempting to validate ports from the context of mgr->destroy_connector_work. I can't see any situation where this wouldn't be safe, and this avoids having to play whack-a-mole in the future of trying to work around port validation.
Signed-off-by: Lyude Paul lyude@redhat.com Fixes: 263efde31f97 ("drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()") Reported-by: Jerry Zuo Jerry.Zuo@amd.com Cc: Jerry Zuo Jerry.Zuo@amd.com Cc: Harry Wentland Harry.Wentland@amd.com Cc: stable@vger.kernel.org # v4.6+
Much as I constantly feel we are missing some better way to do all of this, this seems like a reasonable fix.
Reviewed-by: Dave Airlie airlied@redhat.com
drivers/gpu/drm/drm_dp_mst_topology.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 529414556962..08978ad72f33 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -1023,9 +1023,20 @@ static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_ static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port) { struct drm_dp_mst_port *rport = NULL;
mutex_lock(&mgr->lock);
if (mgr->mst_primary)
rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary,
port);
/*
* Port may or may not be 'valid' but we don't care about that
when
* destroying the port and we are guaranteed that the port pointer
* will be valid until we've finished
*/
if (current_work() == &mgr->destroy_connector_work) {
kref_get(&port->kref);
rport = port;
} else if (mgr->mst_primary) {
rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary,
port);
} mutex_unlock(&mgr->lock); return rport;
}
2.19.1
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Tue, Nov 13, 2018 at 11:47 PM Lyude Paul lyude@redhat.com wrote:
Jerry Zuo pointed out a rather obscure hotplugging issue that it seems I accidentally introduced into DRM two years ago.
Pretend we have a topology like this:
|- DP-1: mst_primary |- DP-4: active display |- DP-5: disconnected |- DP-6: active hub |- DP-7: active display |- DP-8: disconnected |- DP-9: disconnected
If we unplug DP-6, the topology starting at DP-7 will be destroyed but it's payloads will live on in DP-1's VCPI allocations and thus require removal. However, this removal currently fails because drm_dp_update_payload_part1() will (rightly so) try to validate the port before accessing it, fail then abort. If we keep going, eventually we run the MST hub out of bandwidth and all new allocations will start to fail (or in my case; all new displays just start flickering a ton).
We could just teach drm_dp_update_payload_part1() not to drop the port ref in this case, but then we also need to teach drm_dp_destroy_payload_step1() to do the same thing, then hope no one ever adds anything to the that requires a validated port reference in drm_dp_destroy_connector_work(). Kind of sketchy.
So let's go with a more clever solution: any port that drm_dp_destroy_connector_work() interacts with is guaranteed to still exist in memory until we say so. While said port might not be valid we don't really care: that's the whole reason we're destroying it in the first place! So, teach drm_dp_get_validated_port_ref() to use the all mighty current_work() function to avoid attempting to validate ports from the context of mgr->destroy_connector_work. I can't see any situation where this wouldn't be safe, and this avoids having to play whack-a-mole in the future of trying to work around port validation.
Signed-off-by: Lyude Paul lyude@redhat.com Fixes: 263efde31f97 ("drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()") Reported-by: Jerry Zuo Jerry.Zuo@amd.com Cc: Jerry Zuo Jerry.Zuo@amd.com Cc: Harry Wentland Harry.Wentland@amd.com Cc: stable@vger.kernel.org # v4.6+
Hm, sounds very similar to the bug I pointed out in your "make vcpi alloc more atomic" series. Will this all interact nicely with the solution we've worked out there (where we need to delay at least the kfree(port) until the last vcpi allocation is released? -Daniel
drivers/gpu/drm/drm_dp_mst_topology.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 529414556962..08978ad72f33 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -1023,9 +1023,20 @@ static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_ static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port) { struct drm_dp_mst_port *rport = NULL;
mutex_lock(&mgr->lock);
if (mgr->mst_primary)
rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, port);
/*
* Port may or may not be 'valid' but we don't care about that when
* destroying the port and we are guaranteed that the port pointer
* will be valid until we've finished
*/
if (current_work() == &mgr->destroy_connector_work) {
kref_get(&port->kref);
rport = port;
} else if (mgr->mst_primary) {
rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary,
port);
} mutex_unlock(&mgr->lock); return rport;
}
2.19.1
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Thu, 2018-11-22 at 09:34 +0100, Daniel Vetter wrote:
On Tue, Nov 13, 2018 at 11:47 PM Lyude Paul lyude@redhat.com wrote:
Jerry Zuo pointed out a rather obscure hotplugging issue that it seems I accidentally introduced into DRM two years ago.
Pretend we have a topology like this:
- DP-1: mst_primary
|- DP-4: active display |- DP-5: disconnected |- DP-6: active hub |- DP-7: active display |- DP-8: disconnected |- DP-9: disconnected
If we unplug DP-6, the topology starting at DP-7 will be destroyed but it's payloads will live on in DP-1's VCPI allocations and thus require removal. However, this removal currently fails because drm_dp_update_payload_part1() will (rightly so) try to validate the port before accessing it, fail then abort. If we keep going, eventually we run the MST hub out of bandwidth and all new allocations will start to fail (or in my case; all new displays just start flickering a ton).
We could just teach drm_dp_update_payload_part1() not to drop the port ref in this case, but then we also need to teach drm_dp_destroy_payload_step1() to do the same thing, then hope no one ever adds anything to the that requires a validated port reference in drm_dp_destroy_connector_work(). Kind of sketchy.
So let's go with a more clever solution: any port that drm_dp_destroy_connector_work() interacts with is guaranteed to still exist in memory until we say so. While said port might not be valid we don't really care: that's the whole reason we're destroying it in the first place! So, teach drm_dp_get_validated_port_ref() to use the all mighty current_work() function to avoid attempting to validate ports from the context of mgr->destroy_connector_work. I can't see any situation where this wouldn't be safe, and this avoids having to play whack-a-mole in the future of trying to work around port validation.
Signed-off-by: Lyude Paul lyude@redhat.com Fixes: 263efde31f97 ("drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()") Reported-by: Jerry Zuo Jerry.Zuo@amd.com Cc: Jerry Zuo Jerry.Zuo@amd.com Cc: Harry Wentland Harry.Wentland@amd.com Cc: stable@vger.kernel.org # v4.6+
Hm, sounds very similar to the bug I pointed out in your "make vcpi alloc more atomic" series. Will this all interact nicely with the solution we've worked out there (where we need to delay at least the kfree(port) until the last vcpi allocation is released?
Yes, it seems to work fine in my tests
-Daniel
drivers/gpu/drm/drm_dp_mst_topology.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 529414556962..08978ad72f33 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -1023,9 +1023,20 @@ static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_ static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port) { struct drm_dp_mst_port *rport = NULL;
mutex_lock(&mgr->lock);
if (mgr->mst_primary)
rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary,
port);
/*
* Port may or may not be 'valid' but we don't care about that
when
* destroying the port and we are guaranteed that the port pointer
* will be valid until we've finished
*/
if (current_work() == &mgr->destroy_connector_work) {
kref_get(&port->kref);
rport = port;
} else if (mgr->mst_primary) {
rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary,
port);
} mutex_unlock(&mgr->lock); return rport;
}
2.19.1
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
linux-stable-mirror@lists.linaro.org