From: Vladimir Popov <popov.nkv@gmail.com>
If the vmw_execbuf_fence_commands() call in vmw_kms_helper_validation_finish() fails, it sets *p_fence = NULL. If ctx->bo_list is not empty, vmw_kms_helper_validation_finish() then passes the NULL fence down a chain of functions to dma_fence_is_array(), causing a NULL pointer dereference:
vmw_kms_helper_validation_finish()    // pass NULL fence
  vmw_validation_done()
    vmw_validation_bo_fence()
      ttm_eu_fence_buffer_objects()   // pass NULL fence
        dma_resv_add_fence()
          dma_fence_is_container()
            dma_fence_is_array()      // NULL deref
Fix this by adding a NULL check in vmw_validation_bo_fence(): if the fence is NULL, fall back to ttm_eu_backoff_reservation() to safely release the buffer object reservations without attempting to add a NULL fence to dma_resv. This is safe because when the fence is NULL, vmw_fallback_wait() has already been called inside vmw_execbuf_fence_commands() to synchronize the GPU.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 038ecc503236 ("drm/vmwgfx: Add a validation module v2")
Cc: stable@vger.kernel.org
Signed-off-by: Vladimir Popov <popov.nkv@gmail.com>
---
 drivers/gpu/drm/vmwgfx/vmwgfx_validation.h | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
index 353d837907d8..fc04555ca505 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
@@ -127,16 +127,23 @@ vmw_validation_bo_reserve(struct vmw_validation_context *ctx,
  * vmw_validation_bo_fence - Unreserve and fence buffer objects registered
  * with a validation context
  * @ctx: The validation context
+ * @fence: Fence with which to fence all buffer objects taking part in the
+ * command submission.
  *
  * This function unreserves the buffer objects previously reserved using
- * vmw_validation_bo_reserve, and fences them with a fence object.
+ * vmw_validation_bo_reserve, and fences them with a fence object if the
+ * given fence object is not NULL.
  */
 static inline void vmw_validation_bo_fence(struct vmw_validation_context *ctx,
 					   struct vmw_fence_obj *fence)
 {
-	ttm_eu_fence_buffer_objects(&ctx->ticket, &ctx->bo_list,
-				    (void *) fence);
+	/* fence can be NULL if vmw_execbuf_fence_commands() fails */
+	if (fence)
+		ttm_eu_fence_buffer_objects(&ctx->ticket, &ctx->bo_list,
+					    (void *)fence);
+	else
+		ttm_eu_backoff_reservation(&ctx->ticket, &ctx->bo_list);
 }
/**
On 4/14/26 12:55, popov.nkv@gmail.com wrote:
From: Vladimir Popov popov.nkv@gmail.com
If the vmw_execbuf_fence_commands() call in vmw_kms_helper_validation_finish() fails, it sets *p_fence = NULL. If ctx->bo_list is not empty, vmw_kms_helper_validation_finish() then passes the NULL fence down a chain of functions to dma_fence_is_array(), causing a NULL pointer dereference:
vmw_kms_helper_validation_finish()    // pass NULL fence
  vmw_validation_done()
    vmw_validation_bo_fence()
      ttm_eu_fence_buffer_objects()   // pass NULL fence
        dma_resv_add_fence()
          dma_fence_is_container()
            dma_fence_is_array()      // NULL deref
Well, good catch, but that is clearly not the right fix.
I'm not an expert on the vmwgfx code, but in case of an error vmw_validation_revert() should be called and not vmw_kms_helper_validation_finish().
Regards, Christian.
Fix this by adding a NULL check in vmw_validation_bo_fence(): if the fence is NULL, fall back to ttm_eu_backoff_reservation() to safely release the buffer object reservations without attempting to add a NULL fence to dma_resv. This is safe because when the fence is NULL, vmw_fallback_wait() has already been called inside vmw_execbuf_fence_commands() to synchronize the GPU.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 038ecc503236 ("drm/vmwgfx: Add a validation module v2")
Cc: stable@vger.kernel.org
Signed-off-by: Vladimir Popov <popov.nkv@gmail.com>
 drivers/gpu/drm/vmwgfx/vmwgfx_validation.h | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
index 353d837907d8..fc04555ca505 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
@@ -127,16 +127,23 @@ vmw_validation_bo_reserve(struct vmw_validation_context *ctx,
  * vmw_validation_bo_fence - Unreserve and fence buffer objects registered
  * with a validation context
  * @ctx: The validation context
+ * @fence: Fence with which to fence all buffer objects taking part in the
+ * command submission.
  *
  * This function unreserves the buffer objects previously reserved using
- * vmw_validation_bo_reserve, and fences them with a fence object.
+ * vmw_validation_bo_reserve, and fences them with a fence object if the
+ * given fence object is not NULL.
  */
 static inline void vmw_validation_bo_fence(struct vmw_validation_context *ctx,
 					   struct vmw_fence_obj *fence)
 {
-	ttm_eu_fence_buffer_objects(&ctx->ticket, &ctx->bo_list,
-				    (void *) fence);
+	/* fence can be NULL if vmw_execbuf_fence_commands() fails */
+	if (fence)
+		ttm_eu_fence_buffer_objects(&ctx->ticket, &ctx->bo_list,
+					    (void *)fence);
+	else
+		ttm_eu_backoff_reservation(&ctx->ticket, &ctx->bo_list);
 }
/**
2.43.0
On Tue, Apr 14, 2026 at 9:25 AM Christian König christian.koenig@amd.com wrote:
On 4/14/26 12:55, popov.nkv@gmail.com wrote:
From: Vladimir Popov popov.nkv@gmail.com
If the vmw_execbuf_fence_commands() call in vmw_kms_helper_validation_finish() fails, it sets *p_fence = NULL. If ctx->bo_list is not empty, vmw_kms_helper_validation_finish() then passes the NULL fence down a chain of functions to dma_fence_is_array(), causing a NULL pointer dereference:
vmw_kms_helper_validation_finish()    // pass NULL fence
  vmw_validation_done()
    vmw_validation_bo_fence()
      ttm_eu_fence_buffer_objects()   // pass NULL fence
        dma_resv_add_fence()
          dma_fence_is_container()
            dma_fence_is_array()      // NULL deref
Well, good catch, but that is clearly not the right fix.
I'm not an expert on the vmwgfx code, but in case of an error vmw_validation_revert() should be called and not vmw_kms_helper_validation_finish().
To me the patch looks correct. This path is explicitly for submission failure and does BO backoff plus vmw_validation_res_unreserve(ctx, true). The backoff=true branch skips committing dirty-state / backup-MOB changes, which is only correct if commands were not committed. Here the commands have already been submitted; only fence creation failed. So I think unlocking BO reservations without attaching a fence, then letting vmw_validation_done() keep taking the success path for resources is correct.
iirc the same helper is used by execbuf, and the shared-helper fix correctly covers both paths so this is probably not only a kms issue.
Untangling this code would make sense because it's confusing, but that's not something I'd expect Vladimir to do :)
z
On 4/15/26 03:08, Zack Rusin wrote:
On Tue, Apr 14, 2026 at 9:25 AM Christian König christian.koenig@amd.com wrote:
On 4/14/26 12:55, popov.nkv@gmail.com wrote:
From: Vladimir Popov popov.nkv@gmail.com
If the vmw_execbuf_fence_commands() call in vmw_kms_helper_validation_finish() fails, it sets *p_fence = NULL. If ctx->bo_list is not empty, vmw_kms_helper_validation_finish() then passes the NULL fence down a chain of functions to dma_fence_is_array(), causing a NULL pointer dereference:
vmw_kms_helper_validation_finish()    // pass NULL fence
  vmw_validation_done()
    vmw_validation_bo_fence()
      ttm_eu_fence_buffer_objects()   // pass NULL fence
        dma_resv_add_fence()
          dma_fence_is_container()
            dma_fence_is_array()      // NULL deref
Well, good catch, but that is clearly not the right fix.
I'm not an expert on the vmwgfx code, but in case of an error vmw_validation_revert() should be called and not vmw_kms_helper_validation_finish().
To me the patch looks correct. This path is explicitly for submission failure and does BO backoff plus vmw_validation_res_unreserve(ctx, true). The backoff=true branch skips committing dirty-state / backup-MOB changes, which is only correct if commands were not committed. Here the commands have already been submitted; only fence creation failed. So I think unlocking BO reservations without attaching a fence, then letting vmw_validation_done() keep taking the success path for resources is correct.
Ah! I would just avoid adding more TTM exec code dependencies.
We also have the always signaled stub fence for such use cases. How about that change here:
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index e1f18020170a..8dcb8cd19e29 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -3843,7 +3843,7 @@ int vmw_execbuf_fence_commands(struct drm_file *file_priv,
 	if (unlikely(ret != 0 && !synced)) {
 		(void) vmw_fallback_wait(dev_priv, false, false, sequence,
 					 false, VMW_FENCE_WAIT_TIMEOUT);
-		*p_fence = NULL;
+		*p_fence = dma_fence_get_stub();
 	}
 
 	return ret;
iirc the same helper is used by execbuf, and the shared-helper fix correctly covers both paths so this is probably not only a kms issue.
Untangling this code would make sense because it's confusing, but that's not something I'd expect Vladimir to do :)
Yeah, fence memory allocation should definitely be moved before submitting the commands.
But that is clearly more work.
Thanks, Christian.
z
On Wed, Apr 15, 2026 at 3:56 AM Christian König christian.koenig@amd.com wrote:
On 4/15/26 03:08, Zack Rusin wrote:
On Tue, Apr 14, 2026 at 9:25 AM Christian König christian.koenig@amd.com wrote:
On 4/14/26 12:55, popov.nkv@gmail.com wrote:
From: Vladimir Popov popov.nkv@gmail.com
If the vmw_execbuf_fence_commands() call in vmw_kms_helper_validation_finish() fails, it sets *p_fence = NULL. If ctx->bo_list is not empty, vmw_kms_helper_validation_finish() then passes the NULL fence down a chain of functions to dma_fence_is_array(), causing a NULL pointer dereference:
vmw_kms_helper_validation_finish()    // pass NULL fence
  vmw_validation_done()
    vmw_validation_bo_fence()
      ttm_eu_fence_buffer_objects()   // pass NULL fence
        dma_resv_add_fence()
          dma_fence_is_container()
            dma_fence_is_array()      // NULL deref
Well, good catch, but that is clearly not the right fix.
I'm not an expert on the vmwgfx code, but in case of an error vmw_validation_revert() should be called and not vmw_kms_helper_validation_finish().
To me the patch looks correct. This path is explicitly for submission failure and does BO backoff plus vmw_validation_res_unreserve(ctx, true). The backoff=true branch skips committing dirty-state / backup-MOB changes, which is only correct if commands were not committed. Here the commands have already been submitted; only fence creation failed. So I think unlocking BO reservations without attaching a fence, then letting vmw_validation_done() keep taking the success path for resources is correct.
Ah! I would just avoid adding more TTM exec code dependencies.
We also have the always signaled stub fence for such use cases. How about that change here:
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index e1f18020170a..8dcb8cd19e29 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -3843,7 +3843,7 @@ int vmw_execbuf_fence_commands(struct drm_file *file_priv,
 	if (unlikely(ret != 0 && !synced)) {
 		(void) vmw_fallback_wait(dev_priv, false, false, sequence,
 					 false, VMW_FENCE_WAIT_TIMEOUT);
-		*p_fence = NULL;
+		*p_fence = dma_fence_get_stub();
 	}
 
 	return ret;
Yeah, that would be the ideal cleanup, but it needs a lot more work. The p_fence is a vmw_fence_obj, so we would need code that allows creating a vmw_fence_obj backed by a signaled dma_fence and then plumb that through the driver. We would also have to change a bunch of places in vmwgfx (especially in the older kms code) that treat a NULL fence as "the device has already synchronized". It's the right path, but to fix this particular issue I'd be happy to take Vladimir's patch for now, and perhaps ask Ian to put a proper cleanup on his todo.
z