On Tue, Dec 10, 2024 at 01:37:11PM +0100, Greg Kroah-Hartman wrote:
On Tue, Dec 10, 2024 at 02:24:56PM +0200, Jani Nikula wrote:
On Tue, 10 Dec 2024, Genes Lists <lists@sapience.com> wrote:
On Tue, 2024-12-10 at 10:58 +0200, Jani Nikula wrote:
On Tue, 10 Dec 2024, Sakari Ailus <sakari.ailus@linux.intel.com> wrote:
Hi,
... FYI 6.12.4 got a crash shortly after booting in dma_alloc_attrs - maybe triggered in ipu6_probe. Crash only happened on laptop with ipu6. All other machines are running fine.
Have you read the dmesg further than the IPU6 related warning? The IPU6 driver won't work (maybe not even probe?) but if the system crashes, it appears unlikely the IPU6 drivers would have something to do with that. Look for warnings on linked list corruption later, they seem to be coming from the i915 driver.
And the list corruption is actually happening in cpu_latency_qos_update_request(). I don't see any i915 changes in 6.12.4 that could cause it.
I guess the question is, when did it work? Did 6.12.3 work?
BR, Jani.
- 6.12.1 - worked
- mainline - works (but only with i915 patch set [1], otherwise there
  are no graphics at all)
  [1] https://patchwork.freedesktop.org/series/141911/
- 6.12.3 - crashed (I see i915, not ipu6) and again it has cpu_latency_qos_update_request+0x61/0xc0
Thanks for testing.
There are no changes to either i915 or kernel/power between 6.12.1 and 6.12.4.
There are some changes to drm core, but none that could explain this.
Maybe try the same kernels a few more times to see if it's really deterministic? Not that I have obvious ideas where to go from there, but it's a clue nonetheless.
'git bisect' would be nice to run if possible...
I've reproduced the issue. It's caused by 6.12.y commit:
commit 6ac269abab9ca5ae910deb2d3ca54351c3467e99
Author: Bingbu Cao <bingbu.cao@intel.com>
Date:   Wed Oct 16 15:53:01 2024 +0800

    media: ipu6: not override the dma_ops of device in driver

    [ Upstream commit daabc5c64703432c4a8798421a3588c2c142c51b ]
It makes alloc_fw_msg_bufs() fail in isys_probe():
	cpu_latency_qos_add_request(&isys->pm_qos, PM_QOS_DEFAULT_VALUE);

	ret = alloc_fw_msg_bufs(isys, 20);
	if (ret < 0)
		goto out_remove_pkg_dir_shared_buffer;
And on the error path we do not call cpu_latency_qos_remove_request(), which causes pm_qos_request list corruption: the request stays linked in the list after the memory holding it is freed (a use-after-free bug).
The problem will disappear after applying: https://lore.kernel.org/stable/20241209175416.59433-1-stanislaw.gruszka@linu... since the allocation will no longer fail.
But we also need to handle the failure case correctly by adding cpu_latency_qos_remove_request() on the error path. This requires a mainline fix; I'll post it.
Regards Stanislaw
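
For readers following the thread, here is a minimal userspace sketch of the error-path pattern described above. It is not the actual mainline fix and does not use kernel APIs: every name in it (qos_request, model_qos_add(), model_qos_remove(), model_probe(), the out_remove_qos label) is a hypothetical stand-in, and it assumes the request is embedded in a structure that is freed when probe fails. The point it illustrates is only that a request added to a global list must be unlinked before its memory goes away.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model of a PM QoS request on a global list. */
struct qos_request {
	struct qos_request *prev, *next;
};

/* Circular doubly linked list head, empty when it points to itself. */
static struct qos_request qos_list = { &qos_list, &qos_list };

static void model_qos_add(struct qos_request *req)
{
	req->next = qos_list.next;
	req->prev = &qos_list;
	qos_list.next->prev = req;
	qos_list.next = req;
}

static void model_qos_remove(struct qos_request *req)
{
	assert(req->prev && req->next);	/* must currently be linked */
	req->prev->next = req->next;
	req->next->prev = req->prev;
	req->prev = req->next = NULL;
}

static size_t model_qos_count(void)
{
	size_t n = 0;

	for (struct qos_request *p = qos_list.next; p != &qos_list; p = p->next)
		n++;
	return n;
}

/* Hypothetical device state with the request embedded in it. */
struct model_isys {
	struct qos_request pm_qos;
};

/* Stand-in for a buffer allocation that may fail. */
static int model_alloc_bufs(bool fail)
{
	return fail ? -ENOMEM : 0;
}

static int model_probe(bool alloc_fails, struct model_isys **out)
{
	struct model_isys *isys;
	int ret;

	*out = NULL;
	isys = calloc(1, sizeof(*isys));
	if (!isys)
		return -ENOMEM;

	model_qos_add(&isys->pm_qos);

	ret = model_alloc_bufs(alloc_fails);
	if (ret < 0)
		goto out_remove_qos;

	*out = isys;
	return 0;

out_remove_qos:
	/*
	 * Unlink before freeing. Skipping this step would leave freed
	 * memory on qos_list, and the next list walk or update would
	 * touch it.
	 */
	model_qos_remove(&isys->pm_qos);
	free(isys);
	return ret;
}

static void model_remove(struct model_isys *isys)
{
	model_qos_remove(&isys->pm_qos);
	free(isys);
}
```

Calling model_probe(true, &isys) returns a negative errno and leaves the list empty, while a successful probe leaves exactly one request linked until model_remove() is called.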