It looks like this GPU core triggers an abort when reading VIVS_HI_CHIP_PRODUCT_ID and/or VIVS_HI_CHIP_CUSTOMER_ID.
I looked at different versions of Vivante's kernel driver and did not find anything about this issue or any feature flag that could be used. So take the simplest route and do not read these two registers on the affected GPU core.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reported-by: Josua Mayer <josua.mayer@jm0.eu>
Fixes: 815e45bbd4d3 ("drm/etnaviv: determine product, customer and eco id")
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index d5a4cd85a0f6..d3906688c2b3 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -337,10 +337,17 @@ static void etnaviv_hw_identify(struct etnaviv_gpu *gpu)
 
 	gpu->identity.model = gpu_read(gpu, VIVS_HI_CHIP_MODEL);
 	gpu->identity.revision = gpu_read(gpu, VIVS_HI_CHIP_REV);
-	gpu->identity.product_id = gpu_read(gpu, VIVS_HI_CHIP_PRODUCT_ID);
-	gpu->identity.customer_id = gpu_read(gpu, VIVS_HI_CHIP_CUSTOMER_ID);
 	gpu->identity.eco_id = gpu_read(gpu, VIVS_HI_CHIP_ECO_ID);
 
+	/*
+	 * Reading these two registers on GC600 rev 0x19 results in an
+	 * unhandled fault: external abort on non-linefetch
+	 */
+	if (!etnaviv_is_model_rev(gpu, GC600, 0x19)) {
+		gpu->identity.product_id = gpu_read(gpu, VIVS_HI_CHIP_PRODUCT_ID);
+		gpu->identity.customer_id = gpu_read(gpu, VIVS_HI_CHIP_CUSTOMER_ID);
+	}
+
 	/*
 	 * !!!! HACK ALERT !!!!
 	 * Because people change device IDs without letting software
Hi Christian,
I have formally tested the patch with 5.7.10 - and it doesn't resolve the issue - sadly :(
From my testing, the reads on VIVS_HI_CHIP_PRODUCT_ID and VIVS_HI_CHIP_ECO_ID need to be conditional - while VIVS_HI_CHIP_CUSTOMER_ID seems to be okay.
br, Josua Mayer
On 21.08.20 at 20:17, Christian Gmeiner wrote:
It looks like this GPU core triggers an abort when reading VIVS_HI_CHIP_PRODUCT_ID and/or VIVS_HI_CHIP_CUSTOMER_ID.
I looked at different versions of Vivante's kernel driver and did not find anything about this issue or any feature flag that could be used. So take the simplest route and do not read these two registers on the affected GPU core.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reported-by: Josua Mayer <josua.mayer@jm0.eu>
Fixes: 815e45bbd4d3 ("drm/etnaviv: determine product, customer and eco id")
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index d5a4cd85a0f6..d3906688c2b3 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -337,10 +337,17 @@ static void etnaviv_hw_identify(struct etnaviv_gpu *gpu)
 
 	gpu->identity.model = gpu_read(gpu, VIVS_HI_CHIP_MODEL);
 	gpu->identity.revision = gpu_read(gpu, VIVS_HI_CHIP_REV);
-	gpu->identity.product_id = gpu_read(gpu, VIVS_HI_CHIP_PRODUCT_ID);
-	gpu->identity.customer_id = gpu_read(gpu, VIVS_HI_CHIP_CUSTOMER_ID);
 	gpu->identity.eco_id = gpu_read(gpu, VIVS_HI_CHIP_ECO_ID);
 
+	/*
+	 * Reading these two registers on GC600 rev 0x19 results in an
+	 * unhandled fault: external abort on non-linefetch
+	 */
+	if (!etnaviv_is_model_rev(gpu, GC600, 0x19)) {
+		gpu->identity.product_id = gpu_read(gpu, VIVS_HI_CHIP_PRODUCT_ID);
+		gpu->identity.customer_id = gpu_read(gpu, VIVS_HI_CHIP_CUSTOMER_ID);
+	}
+
 	/*
 	 * !!!! HACK ALERT !!!!
 	 * Because people change device IDs without letting software
Hi
I have formally tested the patch with 5.7.10 - and it doesn't resolve the issue - sadly :(
From my testing, the reads on VIVS_HI_CHIP_PRODUCT_ID and VIVS_HI_CHIP_ECO_ID need to be conditional - while VIVS_HI_CHIP_CUSTOMER_ID seems to be okay.
Uhh.. okay.. I'll just send a V2 - thanks for testing :)
On Sun, Aug 23, 2020 at 09:10:25PM +0200, Christian Gmeiner wrote:
Hi
I have formally tested the patch with 5.7.10 - and it doesn't resolve the issue - sadly :(
From my testing, the reads on VIVS_HI_CHIP_PRODUCT_ID and VIVS_HI_CHIP_ECO_ID need to be conditional - while VIVS_HI_CHIP_CUSTOMER_ID seems to be okay.
Uhh.. okay.. I'll just send a V2 - thanks for testing :)
There is also something else going on with the GC600 - 5.4 worked fine, 5.8 doesn't - my 2D Xorg driver gets stuck waiting on a BO after just a couple of minutes. Looking in debugfs, there's a whole load of BOs that are listed as "active", yet the GPU is idle:
00020000: A 0 ( 7) 00000000 00000000  8294400
00010000: I 0 ( 1) 00000000 00000000     4096
00010000: I 0 ( 1) 00000000 00000000     4096
00010000: I 0 ( 1) 00000000 00000000   327680
00010000: A 0 ( 7) 00000000 00000000  8388608
00010000: I 0 ( 1) 00000000 00000000  8388608
00010000: I 0 ( 1) 00000000 00000000  8388608
00010000: A 0 ( 7) 00000000 00000000  8388608
00010000: A 0 ( 3) 00000000 00000000  8388608
00010000: A 0 ( 4) 00000000 00000000  8388608
00010000: A 0 ( 3) 00000000 00000000  8388608
00010000: A 0 ( 3) 00000000 00000000  8388608
00010000: A 0 ( 3) 00000000 00000000  8388608
....
00010000: A 0 ( 3) 00000000 00000000  8388608
Total 38 objects, 293842944 bytes
My guess is there's something up with the way a job completes that's causing the BOs not to be marked inactive. I haven't yet been able to debug any further.
Hi Russell,
On Sunday, 23.08.2020 at 20:19 +0100, Russell King - ARM Linux admin wrote:
On Sun, Aug 23, 2020 at 09:10:25PM +0200, Christian Gmeiner wrote:
Hi
I have formally tested the patch with 5.7.10 - and it doesn't resolve the issue - sadly :(
From my testing, the reads on VIVS_HI_CHIP_PRODUCT_ID and VIVS_HI_CHIP_ECO_ID need to be conditional - while VIVS_HI_CHIP_CUSTOMER_ID seems to be okay.
Uhh.. okay.. I'll just send a V2 - thanks for testing :)
There is also something else going on with the GC600 - 5.4 worked fine, 5.8 doesn't - my 2D Xorg driver gets stuck waiting on a BO after just a couple of minutes. Looking in debugfs, there's a whole load of BOs that are listed as "active", yet the GPU is idle:
00020000: A 0 ( 7) 00000000 00000000  8294400
00010000: I 0 ( 1) 00000000 00000000     4096
00010000: I 0 ( 1) 00000000 00000000     4096
00010000: I 0 ( 1) 00000000 00000000   327680
00010000: A 0 ( 7) 00000000 00000000  8388608
00010000: I 0 ( 1) 00000000 00000000  8388608
00010000: I 0 ( 1) 00000000 00000000  8388608
00010000: A 0 ( 7) 00000000 00000000  8388608
00010000: A 0 ( 3) 00000000 00000000  8388608
00010000: A 0 ( 4) 00000000 00000000  8388608
00010000: A 0 ( 3) 00000000 00000000  8388608
00010000: A 0 ( 3) 00000000 00000000  8388608
00010000: A 0 ( 3) 00000000 00000000  8388608
....
00010000: A 0 ( 3) 00000000 00000000  8388608
Total 38 objects, 293842944 bytes
My guess is there's something up with the way a job completes that's causing the BOs not to be marked inactive. I haven't yet been able to debug any further.
The patch I just sent out should fix this issue. The DRM scheduler is doing some funny business which breaks our job done signalling if the GPU timeout has been hit, even if our timeout handler is just extending the timeout as the GPU is still working normally.
Regards, Lucas