On Fri, May 1, 2020 at 12:03 PM Jordan Crouse jcrouse@codeaurora.org wrote:
Writing to the devfreq sysfs nodes while the GPU is powered down can result in a system crash (on a5xx) or a nasty GMU error (on a6xx):
$ /sys/class/devfreq/5000000.gpu# echo 500000000 > min_freq [ 104.841625] platform 506a000.gmu: [drm:a6xx_gmu_set_oob] *ERROR* Timeout waiting for GMU OOB set GPU_DCVS: 0x0
Despite the fact that we carefully try to suspend the devfreq device when the hardware is powered down there are lots of holes in the governors that don't check for the suspend state and blindly call into the devfreq callbacks that end up triggering hardware reads in the GPU driver.
Call pm_runtime_get_if_in_use() in the gpu_busy() and gpu_set_freq() callbacks to skip the hardware access if it isn't active.
v2: Use pm_runtime_get_if_in_use() per Eric Anholt
Cc: stable@vger.kernel.org Signed-off-by: Jordan Crouse jcrouse@codeaurora.org
drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 6 ++++++ drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 8 ++++++++ drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 7 +++++++ 3 files changed, 21 insertions(+)
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c index 724024a2243a..4d7f269edfcc 100644 --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c @@ -1404,6 +1404,10 @@ static unsigned long a5xx_gpu_busy(struct msm_gpu *gpu) { u64 busy_cycles, busy_time;
/* Only read the gpu busy if the hardware is already active */
if (pm_runtime_get_if_in_use(&gpu->pdev->dev) <= 0)
return 0;
RPM's APIs are a bit of a trap and will return a negative errno for the get functions if runtime PM is disabled in kconfig, even though usually that would mean that the power domain is not ever disabled by RPM. I think in these checks you want "if (pm_runtime_get_if_in_use() == 0)", and that seems to be a common pattern in other drivers. With that,
Reviewed-by: Eric Anholt eric@anholt.net
(and tested, too)