On Thu, Jan 23, 2025 at 08:52:13AM -0600, Jose Souza wrote:
On Thu, 2025-01-23 at 08:30 -0600, Lucas De Marchi wrote:
On Thu, Jan 23, 2025 at 08:14:11AM -0600, Jose Souza wrote:
On Wed, 2025-01-22 at 21:11 -0800, Lucas De Marchi wrote:
Having the exec queue snapshot inside a "GuC CT" section was always wrong. Commit c28fd6c358db ("drm/xe/devcoredump: Improve section headings and add tile info") tried to fix that bug, but with that also broke the mesa tool that parses the devcoredump, hence it was reverted in commit 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa debug tool").
With the mesa tool also fixed, this can propagate as a fix on both kernel and userspace side to avoid unnecessary headache for a debug feature.
This will break older versions of the Mesa parser. Is this really necessary?
See cover letter with the mesa MR that would fix the tool to follow the kernel fix and work with both newer and older format. Linking it here anyway: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33177
Still someone running the older version of the parser with a new Xe KMD would not be able to parse it. I understand that we can break it but is this really worthy? not in my opinion.
because for the debug nature of this file, it's hard if we always keep the cruft around. In 5 years developers implementing new decoders will have to get data from random places because we will notice things are wrongly placed.
isn't it easier to do do it early so we don't increase the exposure and just say "kernel screwed that up and fixed it", then propagate the change in mesa in a stable release, just like we are doing in the kernel?
It's a fix so simple that IMO it's better than carrying the cruft ad infinitum on all the tools that may possibly parse the devcoredump.
Is it worth breaking the tool? In my opinion, it is not.
Also, do we need to discuss this now? Wouldn't it be better to focus on bringing the GuC log in first?
That's what the second patch does. We need to discuss both now and decide, otherwise we can't re-enable it and have either the guc log parser or mesa's aubinator_error_decode_xe broken.
I can't understand why it needs both, could you explain further?
we are already discussing it, why not? Also as I said there's the guc log parser, another tool, that is already expecting it in the other place. So if we are going to re-enable the guc log, it's the best opportunity to fix this, otherwise we will probably never do it and keep accumulating.
Lucas De Marchi
Lucas De Marchi
Cc: John Harrison John.C.Harrison@Intel.com Cc: Julia Filipchuk julia.filipchuk@intel.com Cc: José Roberto de Souza jose.souza@intel.com Cc: stable@vger.kernel.org Fixes: 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa debug tool") Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com
drivers/gpu/drm/xe/xe_devcoredump.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c index 81dc7795c0651..a7946a76777e7 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump.c +++ b/drivers/gpu/drm/xe/xe_devcoredump.c @@ -119,11 +119,7 @@ static ssize_t __xe_devcoredump_read(char *buffer, size_t count, drm_puts(&p, "\n**** GuC CT ****\n"); xe_guc_ct_snapshot_print(ss->guc.ct, &p);
- /*
* Don't add a new section header here because the mesa debug decoder
* tool expects the context information to be in the 'GuC CT' section.
*/
- /* drm_puts(&p, "\n**** Contexts ****\n"); */
drm_puts(&p, "\n**** Contexts ****\n"); xe_guc_exec_queue_snapshot_print(ss->ge, &p);
drm_puts(&p, "\n**** Job ****\n");