Invoke drm_plane_helper_funcs.end_fb_access before drm_atomic_helper_commit_hw_done(). The latter function hands over ownership of the plane state to the following commit, which might free it. Releasing resources in end_fb_access then operates on undefined state. This bug has been observed with non-blocking commits when they are being queued up quickly.
Here is an example stack trace from the bug report. The plane state has been free'd already, so the pages for drm_gem_fb_vunmap() are gone.
Unable to handle kernel paging request at virtual address 0000000100000049 [...] drm_gem_fb_vunmap+0x18/0x74 drm_gem_end_shadow_fb_access+0x1c/0x2c drm_atomic_helper_cleanup_planes+0x58/0xd8 drm_atomic_helper_commit_tail+0x90/0xa0 commit_tail+0x15c/0x188 commit_work+0x14/0x20
Fix this by running end_fb_access immediately after updating all planes in drm_atomic_helper_commit_planes(). The existing clean-up helper drm_atomic_helper_cleanup_planes() now only handles cleanup_fb.
For aborted commits, roll back from drm_atomic_helper_prepare_planes() in the new helper drm_atomic_helper_unprepare_planes(). This case is different from regular cleanup, as we have to release the new state; regular cleanup releases the old state. The new helper also invokes cleanup_fb for all planes.
The changes mostly involve DRM's atomic helpers. Only two drivers, i915 and nouveau, implement their own commit function. Update them to invoke drm_atomic_helper_unprepare_planes(). Drivers with custom commit_tail function do not require changes.
v3: * add drm_atomic_helper_unprepare_planes() for rolling back * use correct state for end_fb_access v2: * fix test in drm_atomic_helper_cleanup_planes()
Reported-by: Alyssa Ross hi@alyssa.is Closes: https://lore.kernel.org/dri-devel/87leazm0ya.fsf@alyssa.is/ Suggested-by: Daniel Vetter daniel@ffwll.ch Fixes: 94d879eaf7fb ("drm/atomic-helper: Add {begin,end}_fb_access to plane helpers") Signed-off-by: Thomas Zimmermann tzimmermann@suse.de Cc: stable@vger.kernel.org # v6.2+ --- drivers/gpu/drm/drm_atomic_helper.c | 78 +++++++++++++------- drivers/gpu/drm/i915/display/intel_display.c | 2 +- drivers/gpu/drm/nouveau/dispnv50/disp.c | 2 +- include/drm/drm_atomic_helper.h | 2 + 4 files changed, 56 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index c3f677130def0..9adec3eb78563 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -2012,7 +2012,7 @@ int drm_atomic_helper_commit(struct drm_device *dev, return ret;
drm_atomic_helper_async_commit(dev, state); - drm_atomic_helper_cleanup_planes(dev, state); + drm_atomic_helper_unprepare_planes(dev, state);
return 0; } @@ -2072,7 +2072,7 @@ int drm_atomic_helper_commit(struct drm_device *dev, return 0;
err: - drm_atomic_helper_cleanup_planes(dev, state); + drm_atomic_helper_unprepare_planes(dev, state); return ret; } EXPORT_SYMBOL(drm_atomic_helper_commit); @@ -2650,6 +2650,39 @@ int drm_atomic_helper_prepare_planes(struct drm_device *dev, } EXPORT_SYMBOL(drm_atomic_helper_prepare_planes);
+/** + * drm_atomic_helper_unprepare_planes - release plane resources on aborts + * @dev: DRM device + * @old_state: atomic state object with old state structures + * + * This function cleans up plane state, specifically framebuffers, from the + * atomic state. It undoes the effects of drm_atomic_helper_prepare_planes() + * when aborting an atomic commit. For cleaning up after a successful commit + * use drm_atomic_helper_cleanup_planes(). + */ +void drm_atomic_helper_unprepare_planes(struct drm_device *dev, + struct drm_atomic_state *state) +{ + struct drm_plane *plane; + struct drm_plane_state *new_plane_state; + int i; + + for_each_new_plane_in_state(state, plane, new_plane_state, i) { + const struct drm_plane_helper_funcs *funcs = plane->helper_private; + + if (funcs->end_fb_access) + funcs->end_fb_access(plane, new_plane_state); + } + + for_each_new_plane_in_state(state, plane, new_plane_state, i) { + const struct drm_plane_helper_funcs *funcs = plane->helper_private; + + if (funcs->cleanup_fb) + funcs->cleanup_fb(plane, new_plane_state); + } +} +EXPORT_SYMBOL(drm_atomic_helper_unprepare_planes); + static bool plane_crtc_active(const struct drm_plane_state *state) { return state->crtc && state->crtc->state->active; @@ -2784,6 +2817,17 @@ void drm_atomic_helper_commit_planes(struct drm_device *dev,
funcs->atomic_flush(crtc, old_state); } + + /* + * Signal end of framebuffer access here before hw_done. After hw_done, + * a later commit might have already released the plane state. + */ + for_each_old_plane_in_state(old_state, plane, old_plane_state, i) { + const struct drm_plane_helper_funcs *funcs = plane->helper_private; + + if (funcs->end_fb_access) + funcs->end_fb_access(plane, old_plane_state); + } } EXPORT_SYMBOL(drm_atomic_helper_commit_planes);
@@ -2911,40 +2955,22 @@ EXPORT_SYMBOL(drm_atomic_helper_disable_planes_on_crtc); * configuration. Hence the old configuration must be perserved in @old_state to * be able to call this function. * - * This function must also be called on the new state when the atomic update - * fails at any point after calling drm_atomic_helper_prepare_planes(). + * This function may not be called on the new state when the atomic update + * fails at any point after calling drm_atomic_helper_prepare_planes(). Use + * drm_atomic_helper_unprepare_planes() in this case. */ void drm_atomic_helper_cleanup_planes(struct drm_device *dev, struct drm_atomic_state *old_state) { struct drm_plane *plane; - struct drm_plane_state *old_plane_state, *new_plane_state; + struct drm_plane_state *old_plane_state; int i;
- for_each_oldnew_plane_in_state(old_state, plane, old_plane_state, new_plane_state, i) { + for_each_old_plane_in_state(old_state, plane, old_plane_state, i) { const struct drm_plane_helper_funcs *funcs = plane->helper_private;
- if (funcs->end_fb_access) - funcs->end_fb_access(plane, new_plane_state); - } - - for_each_oldnew_plane_in_state(old_state, plane, old_plane_state, new_plane_state, i) { - const struct drm_plane_helper_funcs *funcs; - struct drm_plane_state *plane_state; - - /* - * This might be called before swapping when commit is aborted, - * in which case we have to cleanup the new state. - */ - if (old_plane_state == plane->state) - plane_state = new_plane_state; - else - plane_state = old_plane_state; - - funcs = plane->helper_private; - if (funcs->cleanup_fb) - funcs->cleanup_fb(plane, plane_state); + funcs->cleanup_fb(plane, old_plane_state); } } EXPORT_SYMBOL(drm_atomic_helper_cleanup_planes); diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index 5cf162628b95e..ace834c9e8f9f 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -7354,7 +7354,7 @@ int intel_atomic_commit(struct drm_device *dev, struct drm_atomic_state *_state, for_each_new_intel_crtc_in_state(state, crtc, new_crtc_state, i) intel_color_cleanup_commit(new_crtc_state);
- drm_atomic_helper_cleanup_planes(dev, &state->base); + drm_atomic_helper_unprepare_planes(dev, &state->base); intel_runtime_pm_put(&dev_priv->runtime_pm, state->wakeref); return ret; } diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c index 11fe75b68e95c..8d37a694b7724 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c @@ -2476,7 +2476,7 @@ nv50_disp_atomic_commit(struct drm_device *dev,
err_cleanup: if (ret) - drm_atomic_helper_cleanup_planes(dev, state); + drm_atomic_helper_unprepare_planes(dev, state); done: pm_runtime_put_autosuspend(dev->dev); return ret; diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h index 536a0b0091c3a..006b5c977ad77 100644 --- a/include/drm/drm_atomic_helper.h +++ b/include/drm/drm_atomic_helper.h @@ -97,6 +97,8 @@ void drm_atomic_helper_commit_modeset_enables(struct drm_device *dev,
int drm_atomic_helper_prepare_planes(struct drm_device *dev, struct drm_atomic_state *state); +void drm_atomic_helper_unprepare_planes(struct drm_device *dev, + struct drm_atomic_state *state);
#define DRM_PLANE_COMMIT_ACTIVE_ONLY BIT(0) #define DRM_PLANE_COMMIT_NO_DISABLE_AFTER_MODESET BIT(1)
Hi Thomas,
kernel test robot noticed the following build warnings:
[auto build test WARNING on drm-misc/drm-misc-next] [also build test WARNING on drm-intel/for-linux-next drm-intel/for-linux-next-fixes linus/master v6.7-rc3 next-20231128] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Thomas-Zimmermann/drm-atomic-... base: git://anongit.freedesktop.org/drm/drm-misc drm-misc-next patch link: https://lore.kernel.org/r/20231128090158.15564-1-tzimmermann%40suse.de patch subject: [PATCH v3] drm/atomic-helpers: Invoke end_fb_access while owning plane state config: i386-buildonly-randconfig-002-20231128 (https://download.01.org/0day-ci/archive/20231128/202311282308.nZrBqKpT-lkp@i...) compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231128/202311282308.nZrBqKpT-lkp@i...)
If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot lkp@intel.com | Closes: https://lore.kernel.org/oe-kbuild-all/202311282308.nZrBqKpT-lkp@intel.com/
All warnings (new ones prefixed by >>):
drivers/gpu/drm/drm_atomic_helper.c:2665: warning: Function parameter or member 'state' not described in 'drm_atomic_helper_unprepare_planes' drivers/gpu/drm/drm_atomic_helper.c:2665: warning: Excess function parameter 'old_state' description in 'drm_atomic_helper_unprepare_planes'
vim +2665 drivers/gpu/drm/drm_atomic_helper.c
2652 2653 /** 2654 * drm_atomic_helper_unprepare_planes - release plane resources on aborts 2655 * @dev: DRM device 2656 * @old_state: atomic state object with old state structures 2657 * 2658 * This function cleans up plane state, specifically framebuffers, from the 2659 * atomic state. It undoes the effects of drm_atomic_helper_prepare_planes() 2660 * when aborting an atomic commit. For cleaning up after a successful commit 2661 * use drm_atomic_helper_cleanup_planes(). 2662 */ 2663 void drm_atomic_helper_unprepare_planes(struct drm_device *dev, 2664 struct drm_atomic_state *state)
2665 {
2666 struct drm_plane *plane; 2667 struct drm_plane_state *new_plane_state; 2668 int i; 2669 2670 for_each_new_plane_in_state(state, plane, new_plane_state, i) { 2671 const struct drm_plane_helper_funcs *funcs = plane->helper_private; 2672 2673 if (funcs->end_fb_access) 2674 funcs->end_fb_access(plane, new_plane_state); 2675 } 2676 2677 for_each_new_plane_in_state(state, plane, new_plane_state, i) { 2678 const struct drm_plane_helper_funcs *funcs = plane->helper_private; 2679 2680 if (funcs->cleanup_fb) 2681 funcs->cleanup_fb(plane, new_plane_state); 2682 } 2683 } 2684 EXPORT_SYMBOL(drm_atomic_helper_unprepare_planes); 2685
Thomas Zimmermann tzimmermann@suse.de writes:
Invoke drm_plane_helper_funcs.end_fb_access before drm_atomic_helper_commit_hw_done(). The latter function hands over ownership of the plane state to the following commit, which might free it. Releasing resources in end_fb_access then operates on undefined state. This bug has been observed with non-blocking commits when they are being queued up quickly.
Here is an example stack trace from the bug report. The plane state has been free'd already, so the pages for drm_gem_fb_vunmap() are gone.
Unable to handle kernel paging request at virtual address 0000000100000049 [...] drm_gem_fb_vunmap+0x18/0x74 drm_gem_end_shadow_fb_access+0x1c/0x2c drm_atomic_helper_cleanup_planes+0x58/0xd8 drm_atomic_helper_commit_tail+0x90/0xa0 commit_tail+0x15c/0x188 commit_work+0x14/0x20
Fix this by running end_fb_access immediately after updating all planes in drm_atomic_helper_commit_planes(). The existing clean-up helper drm_atomic_helper_cleanup_planes() now only handles cleanup_fb.
For aborted commits, roll back from drm_atomic_helper_prepare_planes() in the new helper drm_atomic_helper_unprepare_planes(). This case is different from regular cleanup, as we have to release the new state; regular cleanup releases the old state. The new helper also invokes cleanup_fb for all planes.
The changes mostly involve DRM's atomic helpers. Only two drivers, i915 and nouveau, implement their own commit function. Update them to invoke drm_atomic_helper_unprepare_planes(). Drivers with custom commit_tail function do not require changes.
v3:
- add drm_atomic_helper_unprepare_planes() for rolling back
- use correct state for end_fb_access
v2:
- fix test in drm_atomic_helper_cleanup_planes()
Reported-by: Alyssa Ross hi@alyssa.is Closes: https://lore.kernel.org/dri-devel/87leazm0ya.fsf@alyssa.is/ Suggested-by: Daniel Vetter daniel@ffwll.ch Fixes: 94d879eaf7fb ("drm/atomic-helper: Add {begin,end}_fb_access to plane helpers") Signed-off-by: Thomas Zimmermann tzimmermann@suse.de Cc: stable@vger.kernel.org # v6.2+
I've been running this for days now, and haven't had a single Oops. Given the rate with which I encountered them before in this configuration, it looks very likely that the issue is resolved.
Tested-by: Alyssa Ross hi@alyssa.is
And, once the wrong parameter name in the kerneldoc identified by the kernel test robot is resolved,
Reviewed-by: Alyssa Ross hi@alyssa.is
Hi
Am 03.12.23 um 21:57 schrieb Alyssa Ross:
Thomas Zimmermann tzimmermann@suse.de writes:
Invoke drm_plane_helper_funcs.end_fb_access before drm_atomic_helper_commit_hw_done(). The latter function hands over ownership of the plane state to the following commit, which might free it. Releasing resources in end_fb_access then operates on undefined state. This bug has been observed with non-blocking commits when they are being queued up quickly.
Here is an example stack trace from the bug report. The plane state has been free'd already, so the pages for drm_gem_fb_vunmap() are gone.
Unable to handle kernel paging request at virtual address 0000000100000049 [...] drm_gem_fb_vunmap+0x18/0x74 drm_gem_end_shadow_fb_access+0x1c/0x2c drm_atomic_helper_cleanup_planes+0x58/0xd8 drm_atomic_helper_commit_tail+0x90/0xa0 commit_tail+0x15c/0x188 commit_work+0x14/0x20
Fix this by running end_fb_access immediately after updating all planes in drm_atomic_helper_commit_planes(). The existing clean-up helper drm_atomic_helper_cleanup_planes() now only handles cleanup_fb.
For aborted commits, roll back from drm_atomic_helper_prepare_planes() in the new helper drm_atomic_helper_unprepare_planes(). This case is different from regular cleanup, as we have to release the new state; regular cleanup releases the old state. The new helper also invokes cleanup_fb for all planes.
The changes mostly involve DRM's atomic helpers. Only two drivers, i915 and nouveau, implement their own commit function. Update them to invoke drm_atomic_helper_unprepare_planes(). Drivers with custom commit_tail function do not require changes.
v3:
- add drm_atomic_helper_unprepare_planes() for rolling back
- use correct state for end_fb_access
v2:
- fix test in drm_atomic_helper_cleanup_planes()
Reported-by: Alyssa Ross hi@alyssa.is Closes: https://lore.kernel.org/dri-devel/87leazm0ya.fsf@alyssa.is/ Suggested-by: Daniel Vetter daniel@ffwll.ch Fixes: 94d879eaf7fb ("drm/atomic-helper: Add {begin,end}_fb_access to plane helpers") Signed-off-by: Thomas Zimmermann tzimmermann@suse.de Cc: stable@vger.kernel.org # v6.2+
I've been running this for days now, and haven't had a single Oops. Given the rate with which I encountered them before in this configuration, it looks very likely that the issue is resolved.
Tested-by: Alyssa Ross hi@alyssa.is
And, once the wrong parameter name in the kerneldoc identified by the kernel test robot is resolved,
Reviewed-by: Alyssa Ross hi@alyssa.is
Great. I'll prepare another update so this fix can land before the next -fixes PR. Thanks a lot!
Best regards Thomas
linux-stable-mirror@lists.linaro.org