On 07/12/2021 13:00, Christian König wrote:
Am 07.12.21 um 11:40 schrieb Bas Nieuwenhuizen:
On Tue, Dec 7, 2021 at 8:21 AM Christian König christian.koenig@amd.com wrote:
Am 07.12.21 um 08:10 schrieb Lionel Landwerlin:
On 07/12/2021 03:32, Bas Nieuwenhuizen wrote:
See the comments in the code. Basically if the seqno is already signalled then we get a NULL fence. If we then put the NULL fence in a binary syncobj it counts as unsignalled, making that syncobj pretty much useless for all expected uses.
Not 100% sure about the transfer to a timeline syncobj but I believe it is needed there too, as AFAICT the add_point function assumes the fence isn't NULL.
Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between binary and timeline v2") Cc: stable@vger.kernel.org Signed-off-by: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl
drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index fdd2ec87cdd1..eb28a40400d2 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -861,6 +861,19 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private, &fence); if (ret) goto err;
+ /* If the requested seqno is already signaled drm_syncobj_find_fence may + * return a NULL fence. To make sure the recipient gets signalled, use + * a new fence instead. + */ + if (!fence) { + fence = dma_fence_allocate_private_stub(); + if (!fence) { + ret = -ENOMEM; + goto err; + } + }
Shouldn't we fix drm_syncobj_find_fence() instead?
Mhm, now that you mention it. Bas, why do you think that dma_fence_chain_find_seqno() may return NULL when the fence is already signaled?
Double checking the code that should never ever happen.
Well, I tested the patch with https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.fre...
so I'm pretty sure it happens, and this patch fixes it, though I may have misidentified what the code should do.
My reading is that the dma_fence_chain_for_each in dma_fence_chain_find_seqno will never visit a signalled fence (unless the top one is signalled), as dma_fence_chain_walk will never return a signalled fence (it only returns on NULL or !signalled).
Ah, yes that suddenly makes more sense.
Happy to move this to drm_syncobj_find_fence.
No, I think that your current patch is fine.
That drm_syncobj_find_fence() only returns NULL when it can't find anything !signaled is correct behavior I think.
We should probably update the docs then :
* Returns 0 on success or a negative error value on failure. On success @fence * contains a reference to the fence, which must be released by calling * dma_fence_put().
Looking at some of the kernel drivers, it looks like they don't all protect themselves against NULL pointers :
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/vc4/vc4_gem.c...
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/am...
-Lionel
Going to push your original patch if nobody has any more objections.
But somebody might want to take care of the IGT as well.
Regards, Christian.
Regards, Christian.
By returning a stub fence for the timeline case if there isn't one.
Because the same NULL fence check appears missing in amdgpu (and probably other drivers).
Also we should have tests for this in IGT.
AMD contributed some tests when this code was written but they never got reviewed :(
-Lionel
chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL); if (!chain) { ret = -ENOMEM; @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file *file_private, args->src_point, args->flags, &fence); if (ret) goto err;
+ /* If the requested seqno is already signaled drm_syncobj_find_fence may + * return a NULL fence. To make sure the recipient gets signalled, use + * a new fence instead. + */ + if (!fence) { + fence = dma_fence_allocate_private_stub(); + if (!fence) { + ret = -ENOMEM; + goto err; + } + }
drm_syncobj_replace_fence(binary_syncobj, fence); dma_fence_put(fence); err: