Hi
Am 03.12.23 um 21:57 schrieb Alyssa Ross:
Thomas Zimmermann tzimmermann@suse.de writes:
Invoke drm_plane_helper_funcs.end_fb_access before drm_atomic_helper_commit_hw_done(). The latter function hands over ownership of the plane state to the following commit, which might free it. Releasing resources in end_fb_access then operates on undefined state. This bug has been observed with non-blocking commits when they are being queued up quickly.
Here is an example stack trace from the bug report. The plane state has been free'd already, so the pages for drm_gem_fb_vunmap() are gone.
Unable to handle kernel paging request at virtual address 0000000100000049 [...] drm_gem_fb_vunmap+0x18/0x74 drm_gem_end_shadow_fb_access+0x1c/0x2c drm_atomic_helper_cleanup_planes+0x58/0xd8 drm_atomic_helper_commit_tail+0x90/0xa0 commit_tail+0x15c/0x188 commit_work+0x14/0x20
Fix this by running end_fb_access immediately after updating all planes in drm_atomic_helper_commit_planes(). The existing clean-up helper drm_atomic_helper_cleanup_planes() now only handles cleanup_fb.
For aborted commits, roll back from drm_atomic_helper_prepare_planes() in the new helper drm_atomic_helper_unprepare_planes(). This case is different from regular cleanup, as we have to release the new state; regular cleanup releases the old state. The new helper also invokes cleanup_fb for all planes.
The changes mostly involve DRM's atomic helpers. Only two drivers, i915 and nouveau, implement their own commit function. Update them to invoke drm_atomic_helper_unprepare_planes(). Drivers with custom commit_tail function do not require changes.
v3:
- add drm_atomic_helper_unprepare_planes() for rolling back
- use correct state for end_fb_access
v2:
- fix test in drm_atomic_helper_cleanup_planes()
Reported-by: Alyssa Ross hi@alyssa.is Closes: https://lore.kernel.org/dri-devel/87leazm0ya.fsf@alyssa.is/ Suggested-by: Daniel Vetter daniel@ffwll.ch Fixes: 94d879eaf7fb ("drm/atomic-helper: Add {begin,end}_fb_access to plane helpers") Signed-off-by: Thomas Zimmermann tzimmermann@suse.de Cc: stable@vger.kernel.org # v6.2+
I've been running this for days now, and haven't had a single Oops. Given the rate with which I encountered them before in this configuration, it looks very likely that the issue is resolved.
Tested-by: Alyssa Ross hi@alyssa.is
And, once the wrong parameter name in the kerneldoc identified by the kernel test robot is resolved,
Reviewed-by: Alyssa Ross hi@alyssa.is
Great. I'll prepare another update so this fix can land before the next -fixes PR. Thanks a lot!
Best regards Thomas