On Wed, Apr 15, 2026 at 3:56 AM Christian König christian.koenig@amd.com wrote:
On 4/15/26 03:08, Zack Rusin wrote:
On Tue, Apr 14, 2026 at 9:25 AM Christian König christian.koenig@amd.com wrote:
On 4/14/26 12:55, popov.nkv@gmail.com wrote:
From: Vladimir Popov popov.nkv@gmail.com
If vmw_execbuf_fence_commands() call fails in vmw_kms_helper_validation_finish(), it sets *p_fence = NULL. If ctx->bo_list is not empty, the caller, vmw_kms_helper_validation_finish(), passes the fence through a chain of functions to dma_fence_is_array(), which causes a NULL pointer dereference in dma_fence_is_array():
vmw_kms_helper_validation_finish() // pass NULL fence vmw_validation_done() vmw_validation_bo_fence() ttm_eu_fence_buffer_objects() // pass NULL fence dma_resv_add_fence() dma_fence_is_container() dma_fence_is_array() // NULL deref
Well good catch, but that is clearly not the right fix.
I'm not an expert for the vmwgfx code but in case of an error vmw_validation_revert() should be called an not vmw_kms_helper_validation_finish().
To me the patch looks correct. This path is explicitly for submission failure and does BO backoff plus vmw_validation_res_unreserve(ctx, true). The backoff=true branch skips committing dirty-state / backup-MOB changes, which is only correct if commands were not committed. Here the commands have already been submitted; only fence creation failed. So I think unlocking BO reservations without attaching a fence, then letting vmw_validation_done() keep taking the success path for resources is correct.
Ah! I would just avoid adding more TTM exec code dependencies.
We also have the always signaled stub fence for such use cases. How about that change here:
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c index e1f18020170a..8dcb8cd19e29 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c @@ -3843,7 +3843,7 @@ int vmw_execbuf_fence_commands(struct drm_file *file_priv, if (unlikely(ret != 0 && !synced)) { (void) vmw_fallback_wait(dev_priv, false, false, sequence, false, VMW_FENCE_WAIT_TIMEOUT);
*p_fence = NULL;
*p_fence = dma_fence_get_stub(); } return ret;
Yeah, that would be an ideal cleanup, but it needs a lot more work. The p_fence is a vmw_fence_obj so we'll need to write code that allows creation of vmw_fence_obj with a signaled dma_fence and then plumb that through the driver. We'll also have to change a bunch of places (especially in older kms code) in vmwgfx that treat null fence as "the device has already synchronized". It's the right path, but to fix this particular issue I'd be happy to take Vladimir patch for now and perhaps I'd ask Ian to put a proper cleanup on his todo.
z