See the comments in the code. Basically if the seqno is already signalled then we get a NULL fence. If we then put the NULL fence in a binary syncobj it counts as unsignalled, making that syncobj pretty much useless for all expected uses.
Not 100% sure about the transfer to a timeline syncobj but I believe it is needed there too, as AFAICT the add_point function assumes the fence isn't NULL.
Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between binary and timeline v2") Cc: stable@vger.kernel.org Signed-off-by: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl --- drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index fdd2ec87cdd1..eb28a40400d2 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -861,6 +861,19 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private, &fence); if (ret) goto err; + + /* If the requested seqno is already signaled drm_syncobj_find_fence may + * return a NULL fence. To make sure the recipient gets signalled, use + * a new fence instead. + */ + if (!fence) { + fence = dma_fence_allocate_private_stub(); + if (!fence) { + ret = -ENOMEM; + goto err; + } + } + chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL); if (!chain) { ret = -ENOMEM; @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file *file_private, args->src_point, args->flags, &fence); if (ret) goto err; + + /* If the requested seqno is already signaled drm_syncobj_find_fence may + * return a NULL fence. To make sure the recipient gets signalled, use + * a new fence instead. + */ + if (!fence) { + fence = dma_fence_allocate_private_stub(); + if (!fence) { + ret = -ENOMEM; + goto err; + } + } + drm_syncobj_replace_fence(binary_syncobj, fence); dma_fence_put(fence); err:
Am 07.12.21 um 02:32 schrieb Bas Nieuwenhuizen:
See the comments in the code. Basically if the seqno is already signalled then we get a NULL fence. If we then put the NULL fence in a binary syncobj it counts as unsignalled, making that syncobj pretty much useless for all expected uses.
Not 100% sure about the transfer to a timeline syncobj but I believe it is needed there too, as AFAICT the add_point function assumes the fence isn't NULL.
Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between binary and timeline v2") Cc: stable@vger.kernel.org Signed-off-by: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl
Reviewed-by: Christian König christian.koenig@amd.com
Going to push that to drm-misc-fixes later today if nobody objects in the meantime.
drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index fdd2ec87cdd1..eb28a40400d2 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -861,6 +861,19 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private, &fence); if (ret) goto err;
- /* If the requested seqno is already signaled drm_syncobj_find_fence may
* return a NULL fence. To make sure the recipient gets signalled, use
* a new fence instead.
*/
- if (!fence) {
fence = dma_fence_allocate_private_stub();
if (!fence) {
ret = -ENOMEM;
goto err;
}
- }
- chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL); if (!chain) { ret = -ENOMEM;
@@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file *file_private, args->src_point, args->flags, &fence); if (ret) goto err;
- /* If the requested seqno is already signaled drm_syncobj_find_fence may
* return a NULL fence. To make sure the recipient gets signalled, use
* a new fence instead.
*/
- if (!fence) {
fence = dma_fence_allocate_private_stub();
if (!fence) {
ret = -ENOMEM;
goto err;
}
- }
- drm_syncobj_replace_fence(binary_syncobj, fence); dma_fence_put(fence); err:
Am 07.12.21 um 08:09 schrieb Christian König:
Am 07.12.21 um 02:32 schrieb Bas Nieuwenhuizen:
See the comments in the code. Basically if the seqno is already signalled then we get a NULL fence. If we then put the NULL fence in a binary syncobj it counts as unsignalled, making that syncobj pretty much useless for all expected uses.
Not 100% sure about the transfer to a timeline syncobj but I believe it is needed there too, as AFAICT the add_point function assumes the fence isn't NULL.
Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between binary and timeline v2") Cc: stable@vger.kernel.org Signed-off-by: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl
Reviewed-by: Christian König christian.koenig@amd.com
Going to push that to drm-misc-fixes later today if nobody objects in the meantime.
Need to retreat that rb, Lionel correctly pointed out that this should never ever happen.
Christian.
drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index fdd2ec87cdd1..eb28a40400d2 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -861,6 +861,19 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private, &fence); if (ret) goto err;
+ /* If the requested seqno is already signaled drm_syncobj_find_fence may + * return a NULL fence. To make sure the recipient gets signalled, use + * a new fence instead. + */ + if (!fence) { + fence = dma_fence_allocate_private_stub(); + if (!fence) { + ret = -ENOMEM; + goto err; + } + }
chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL); if (!chain) { ret = -ENOMEM; @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file *file_private, args->src_point, args->flags, &fence); if (ret) goto err;
+ /* If the requested seqno is already signaled drm_syncobj_find_fence may + * return a NULL fence. To make sure the recipient gets signalled, use + * a new fence instead. + */ + if (!fence) { + fence = dma_fence_allocate_private_stub(); + if (!fence) { + ret = -ENOMEM; + goto err; + } + }
drm_syncobj_replace_fence(binary_syncobj, fence); dma_fence_put(fence); err:
On 07/12/2021 03:32, Bas Nieuwenhuizen wrote:
See the comments in the code. Basically if the seqno is already signalled then we get a NULL fence. If we then put the NULL fence in a binary syncobj it counts as unsignalled, making that syncobj pretty much useless for all expected uses.
Not 100% sure about the transfer to a timeline syncobj but I believe it is needed there too, as AFAICT the add_point function assumes the fence isn't NULL.
Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between binary and timeline v2") Cc: stable@vger.kernel.org Signed-off-by: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl
drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index fdd2ec87cdd1..eb28a40400d2 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -861,6 +861,19 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private, &fence); if (ret) goto err;
- /* If the requested seqno is already signaled drm_syncobj_find_fence may
* return a NULL fence. To make sure the recipient gets signalled, use
* a new fence instead.
*/
- if (!fence) {
fence = dma_fence_allocate_private_stub();
if (!fence) {
ret = -ENOMEM;
goto err;
}
- }
Shouldn't we fix drm_syncobj_find_fence() instead?
By returning a stub fence for the timeline case if there isn't one.
Because the same NULL fence check appears missing in amdgpu (and probably other drivers).
Also we should have tests for this in IGT.
AMD contributed some tests when this code was written but they never got reviewed :(
-Lionel
chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL); if (!chain) { ret = -ENOMEM; @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file *file_private, args->src_point, args->flags, &fence); if (ret) goto err;
- /* If the requested seqno is already signaled drm_syncobj_find_fence may
* return a NULL fence. To make sure the recipient gets signalled, use
* a new fence instead.
*/
- if (!fence) {
fence = dma_fence_allocate_private_stub();
if (!fence) {
ret = -ENOMEM;
goto err;
}
- }
- drm_syncobj_replace_fence(binary_syncobj, fence); dma_fence_put(fence); err:
Am 07.12.21 um 08:10 schrieb Lionel Landwerlin:
On 07/12/2021 03:32, Bas Nieuwenhuizen wrote:
See the comments in the code. Basically if the seqno is already signalled then we get a NULL fence. If we then put the NULL fence in a binary syncobj it counts as unsignalled, making that syncobj pretty much useless for all expected uses.
Not 100% sure about the transfer to a timeline syncobj but I believe it is needed there too, as AFAICT the add_point function assumes the fence isn't NULL.
Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between binary and timeline v2") Cc: stable@vger.kernel.org Signed-off-by: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl
drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index fdd2ec87cdd1..eb28a40400d2 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -861,6 +861,19 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private, &fence); if (ret) goto err;
+ /* If the requested seqno is already signaled drm_syncobj_find_fence may + * return a NULL fence. To make sure the recipient gets signalled, use + * a new fence instead. + */ + if (!fence) { + fence = dma_fence_allocate_private_stub(); + if (!fence) { + ret = -ENOMEM; + goto err; + } + }
Shouldn't we fix drm_syncobj_find_fence() instead?
Mhm, now that you mention it. Bas, why do you think that dma_fence_chain_find_seqno() may return NULL when the fence is already signaled?
Double checking the code that should never ever happen.
Regards, Christian.
By returning a stub fence for the timeline case if there isn't one.
Because the same NULL fence check appears missing in amdgpu (and probably other drivers).
Also we should have tests for this in IGT.
AMD contributed some tests when this code was written but they never got reviewed :(
-Lionel
chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL); if (!chain) { ret = -ENOMEM; @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file *file_private, args->src_point, args->flags, &fence); if (ret) goto err;
+ /* If the requested seqno is already signaled drm_syncobj_find_fence may + * return a NULL fence. To make sure the recipient gets signalled, use + * a new fence instead. + */ + if (!fence) { + fence = dma_fence_allocate_private_stub(); + if (!fence) { + ret = -ENOMEM; + goto err; + } + }
drm_syncobj_replace_fence(binary_syncobj, fence); dma_fence_put(fence); err:
On Tue, Dec 7, 2021 at 8:21 AM Christian König christian.koenig@amd.com wrote:
Am 07.12.21 um 08:10 schrieb Lionel Landwerlin:
On 07/12/2021 03:32, Bas Nieuwenhuizen wrote:
See the comments in the code. Basically if the seqno is already signalled then we get a NULL fence. If we then put the NULL fence in a binary syncobj it counts as unsignalled, making that syncobj pretty much useless for all expected uses.
Not 100% sure about the transfer to a timeline syncobj but I believe it is needed there too, as AFAICT the add_point function assumes the fence isn't NULL.
Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between binary and timeline v2") Cc: stable@vger.kernel.org Signed-off-by: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl
drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index fdd2ec87cdd1..eb28a40400d2 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -861,6 +861,19 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private, &fence); if (ret) goto err;
- /* If the requested seqno is already signaled
drm_syncobj_find_fence may
* return a NULL fence. To make sure the recipient gets
signalled, use
* a new fence instead.
*/
- if (!fence) {
fence = dma_fence_allocate_private_stub();
if (!fence) {
ret = -ENOMEM;
goto err;
}
- }
Shouldn't we fix drm_syncobj_find_fence() instead?
Mhm, now that you mention it. Bas, why do you think that dma_fence_chain_find_seqno() may return NULL when the fence is already signaled?
Double checking the code that should never ever happen.
Well, I tested the patch with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14097/diffs?commit... so I'm pretty sure it happens, and this patch fixes it, though I may have misidentified what the code should do.
My reading is that the dma_fence_chain_for_each in dma_fence_chain_find_seqno will never visit a signalled fence (unless the top one is signalled), as dma_fence_chain_walk will never return a signalled fence (it only returns on NULL or !signalled).
Happy to move this to drm_syncobj_find_fence.
Regards, Christian.
By returning a stub fence for the timeline case if there isn't one.
Because the same NULL fence check appears missing in amdgpu (and probably other drivers).
Also we should have tests for this in IGT.
AMD contributed some tests when this code was written but they never got reviewed :(
-Lionel
chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL); if (!chain) { ret = -ENOMEM;
@@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file *file_private, args->src_point, args->flags, &fence); if (ret) goto err;
- /* If the requested seqno is already signaled
drm_syncobj_find_fence may
* return a NULL fence. To make sure the recipient gets
signalled, use
* a new fence instead.
*/
- if (!fence) {
fence = dma_fence_allocate_private_stub();
if (!fence) {
ret = -ENOMEM;
goto err;
}
- }
err:drm_syncobj_replace_fence(binary_syncobj, fence); dma_fence_put(fence);
Am 07.12.21 um 11:40 schrieb Bas Nieuwenhuizen:
On Tue, Dec 7, 2021 at 8:21 AM Christian König christian.koenig@amd.com wrote:
Am 07.12.21 um 08:10 schrieb Lionel Landwerlin:
On 07/12/2021 03:32, Bas Nieuwenhuizen wrote:
See the comments in the code. Basically if the seqno is already signalled then we get a NULL fence. If we then put the NULL fence in a binary syncobj it counts as unsignalled, making that syncobj pretty much useless for all expected uses.
Not 100% sure about the transfer to a timeline syncobj but I believe it is needed there too, as AFAICT the add_point function assumes the fence isn't NULL.
Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between binary and timeline v2") Cc: stable@vger.kernel.org Signed-off-by: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl
drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index fdd2ec87cdd1..eb28a40400d2 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -861,6 +861,19 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private, &fence); if (ret) goto err;
- /* If the requested seqno is already signaled
drm_syncobj_find_fence may
* return a NULL fence. To make sure the recipient gets
signalled, use
* a new fence instead.
*/
- if (!fence) {
fence = dma_fence_allocate_private_stub();
if (!fence) {
ret = -ENOMEM;
goto err;
}
- }
Shouldn't we fix drm_syncobj_find_fence() instead?
Mhm, now that you mention it. Bas, why do you think that dma_fence_chain_find_seqno() may return NULL when the fence is already signaled?
Double checking the code that should never ever happen.
Well, I tested the patch with https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.fre... so I'm pretty sure it happens, and this patch fixes it, though I may have misidentified what the code should do.
My reading is that the dma_fence_chain_for_each in dma_fence_chain_find_seqno will never visit a signalled fence (unless the top one is signalled), as dma_fence_chain_walk will never return a signalled fence (it only returns on NULL or !signalled).
Ah, yes that suddenly makes more sense.
Happy to move this to drm_syncobj_find_fence.
No, I think that your current patch is fine.
That drm_syncobj_find_fence() only returns NULL when it can't find anything !signaled is correct behavior I think.
Going to push your original patch if nobody has any more objections.
But somebody might want to take care of the IGT as well.
Regards, Christian.
Regards, Christian.
By returning a stub fence for the timeline case if there isn't one.
Because the same NULL fence check appears missing in amdgpu (and probably other drivers).
Also we should have tests for this in IGT.
AMD contributed some tests when this code was written but they never got reviewed :(
-Lionel
chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL); if (!chain) { ret = -ENOMEM;
@@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file *file_private, args->src_point, args->flags, &fence); if (ret) goto err;
- /* If the requested seqno is already signaled
drm_syncobj_find_fence may
* return a NULL fence. To make sure the recipient gets
signalled, use
* a new fence instead.
*/
- if (!fence) {
fence = dma_fence_allocate_private_stub();
if (!fence) {
ret = -ENOMEM;
goto err;
}
- }
err:drm_syncobj_replace_fence(binary_syncobj, fence); dma_fence_put(fence);
On 07/12/2021 13:00, Christian König wrote:
Am 07.12.21 um 11:40 schrieb Bas Nieuwenhuizen:
On Tue, Dec 7, 2021 at 8:21 AM Christian König christian.koenig@amd.com wrote:
Am 07.12.21 um 08:10 schrieb Lionel Landwerlin:
On 07/12/2021 03:32, Bas Nieuwenhuizen wrote:
See the comments in the code. Basically if the seqno is already signalled then we get a NULL fence. If we then put the NULL fence in a binary syncobj it counts as unsignalled, making that syncobj pretty much useless for all expected uses.
Not 100% sure about the transfer to a timeline syncobj but I believe it is needed there too, as AFAICT the add_point function assumes the fence isn't NULL.
Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between binary and timeline v2") Cc: stable@vger.kernel.org Signed-off-by: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl
drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index fdd2ec87cdd1..eb28a40400d2 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -861,6 +861,19 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private, &fence); if (ret) goto err;
+ /* If the requested seqno is already signaled drm_syncobj_find_fence may + * return a NULL fence. To make sure the recipient gets signalled, use + * a new fence instead. + */ + if (!fence) { + fence = dma_fence_allocate_private_stub(); + if (!fence) { + ret = -ENOMEM; + goto err; + } + }
Shouldn't we fix drm_syncobj_find_fence() instead?
Mhm, now that you mention it. Bas, why do you think that dma_fence_chain_find_seqno() may return NULL when the fence is already signaled?
Double checking the code that should never ever happen.
Well, I tested the patch with https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.fre...
so I'm pretty sure it happens, and this patch fixes it, though I may have misidentified what the code should do.
My reading is that the dma_fence_chain_for_each in dma_fence_chain_find_seqno will never visit a signalled fence (unless the top one is signalled), as dma_fence_chain_walk will never return a signalled fence (it only returns on NULL or !signalled).
Ah, yes that suddenly makes more sense.
Happy to move this to drm_syncobj_find_fence.
No, I think that your current patch is fine.
That drm_syncobj_find_fence() only returns NULL when it can't find anything !signaled is correct behavior I think.
We should probably update the docs then :
* Returns 0 on success or a negative error value on failure. On success @fence * contains a reference to the fence, which must be released by calling * dma_fence_put().
Looking at some of the kernel drivers, it looks like they don't all protect themselves against NULL pointers :
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/vc4/vc4_gem.c...
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/am...
-Lionel
Going to push your original patch if nobody has any more objections.
But somebody might want to take care of the IGT as well.
Regards, Christian.
Regards, Christian.
By returning a stub fence for the timeline case if there isn't one.
Because the same NULL fence check appears missing in amdgpu (and probably other drivers).
Also we should have tests for this in IGT.
AMD contributed some tests when this code was written but they never got reviewed :(
-Lionel
chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL); if (!chain) { ret = -ENOMEM; @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file *file_private, args->src_point, args->flags, &fence); if (ret) goto err;
+ /* If the requested seqno is already signaled drm_syncobj_find_fence may + * return a NULL fence. To make sure the recipient gets signalled, use + * a new fence instead. + */ + if (!fence) { + fence = dma_fence_allocate_private_stub(); + if (!fence) { + ret = -ENOMEM; + goto err; + } + }
drm_syncobj_replace_fence(binary_syncobj, fence); dma_fence_put(fence); err:
On Tue, Dec 7, 2021 at 12:28 PM Lionel Landwerlin lionel.g.landwerlin@intel.com wrote:
On 07/12/2021 13:00, Christian König wrote:
Am 07.12.21 um 11:40 schrieb Bas Nieuwenhuizen:
On Tue, Dec 7, 2021 at 8:21 AM Christian König christian.koenig@amd.com wrote:
Am 07.12.21 um 08:10 schrieb Lionel Landwerlin:
On 07/12/2021 03:32, Bas Nieuwenhuizen wrote:
See the comments in the code. Basically if the seqno is already signalled then we get a NULL fence. If we then put the NULL fence in a binary syncobj it counts as unsignalled, making that syncobj pretty much useless for all expected uses.
Not 100% sure about the transfer to a timeline syncobj but I believe it is needed there too, as AFAICT the add_point function assumes the fence isn't NULL.
Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between binary and timeline v2") Cc: stable@vger.kernel.org Signed-off-by: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl
drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index fdd2ec87cdd1..eb28a40400d2 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -861,6 +861,19 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private, &fence); if (ret) goto err;
- /* If the requested seqno is already signaled
drm_syncobj_find_fence may
* return a NULL fence. To make sure the recipient gets
signalled, use
* a new fence instead.
*/
- if (!fence) {
fence = dma_fence_allocate_private_stub();
if (!fence) {
ret = -ENOMEM;
goto err;
}
- }
Shouldn't we fix drm_syncobj_find_fence() instead?
Mhm, now that you mention it. Bas, why do you think that dma_fence_chain_find_seqno() may return NULL when the fence is already signaled?
Double checking the code that should never ever happen.
Well, I tested the patch with https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.fre...
so I'm pretty sure it happens, and this patch fixes it, though I may have misidentified what the code should do.
My reading is that the dma_fence_chain_for_each in dma_fence_chain_find_seqno will never visit a signalled fence (unless the top one is signalled), as dma_fence_chain_walk will never return a signalled fence (it only returns on NULL or !signalled).
Ah, yes that suddenly makes more sense.
Happy to move this to drm_syncobj_find_fence.
No, I think that your current patch is fine.
That drm_syncobj_find_fence() only returns NULL when it can't find anything !signaled is correct behavior I think.
We should probably update the docs then :
- Returns 0 on success or a negative error value on failure. On
success @fence
- contains a reference to the fence, which must be released by calling
- dma_fence_put().
Looking at some of the kernel drivers, it looks like they don't all protect themselves against NULL pointers :
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/vc4/vc4_gem.c...
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/am...
amdgpu handles it here (amdgpu_sync_fence checks for a NULL fence). But yeah I think it is a bit treacherous, especially as this only occurs with timeline semaphores.
-Lionel
Going to push your original patch if nobody has any more objections.
But somebody might want to take care of the IGT as well.
Regards, Christian.
Regards, Christian.
By returning a stub fence for the timeline case if there isn't one.
Because the same NULL fence check appears missing in amdgpu (and probably other drivers).
Also we should have tests for this in IGT.
AMD contributed some tests when this code was written but they never got reviewed :(
-Lionel
chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL); if (!chain) { ret = -ENOMEM;
@@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file *file_private, args->src_point, args->flags, &fence); if (ret) goto err;
- /* If the requested seqno is already signaled
drm_syncobj_find_fence may
* return a NULL fence. To make sure the recipient gets
signalled, use
* a new fence instead.
*/
- if (!fence) {
fence = dma_fence_allocate_private_stub();
if (!fence) {
ret = -ENOMEM;
goto err;
}
- }
err:drm_syncobj_replace_fence(binary_syncobj, fence); dma_fence_put(fence);
Am 07.12.21 um 12:35 schrieb Bas Nieuwenhuizen:
On Tue, Dec 7, 2021 at 12:28 PM Lionel Landwerlin lionel.g.landwerlin@intel.com wrote:
On 07/12/2021 13:00, Christian König wrote:
Am 07.12.21 um 11:40 schrieb Bas Nieuwenhuizen:
On Tue, Dec 7, 2021 at 8:21 AM Christian König christian.koenig@amd.com wrote:
Am 07.12.21 um 08:10 schrieb Lionel Landwerlin:
On 07/12/2021 03:32, Bas Nieuwenhuizen wrote: > See the comments in the code. Basically if the seqno is already > signalled then we get a NULL fence. If we then put the NULL fence > in a binary syncobj it counts as unsignalled, making that syncobj > pretty much useless for all expected uses. > > Not 100% sure about the transfer to a timeline syncobj but I > believe it is needed there too, as AFAICT the add_point function > assumes the fence isn't NULL. > > Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between > binary and timeline v2") > Cc: stable@vger.kernel.org > Signed-off-by: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl > --- > drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++ > 1 file changed, 26 insertions(+) > > diff --git a/drivers/gpu/drm/drm_syncobj.c > b/drivers/gpu/drm/drm_syncobj.c > index fdd2ec87cdd1..eb28a40400d2 100644 > --- a/drivers/gpu/drm/drm_syncobj.c > +++ b/drivers/gpu/drm/drm_syncobj.c > @@ -861,6 +861,19 @@ static int > drm_syncobj_transfer_to_timeline(struct drm_file *file_private, > &fence); > if (ret) > goto err; > + > + /* If the requested seqno is already signaled > drm_syncobj_find_fence may > + * return a NULL fence. To make sure the recipient gets > signalled, use > + * a new fence instead. > + */ > + if (!fence) { > + fence = dma_fence_allocate_private_stub(); > + if (!fence) { > + ret = -ENOMEM; > + goto err; > + } > + } > + Shouldn't we fix drm_syncobj_find_fence() instead?
Mhm, now that you mention it. Bas, why do you think that dma_fence_chain_find_seqno() may return NULL when the fence is already signaled?
Double checking the code that should never ever happen.
Well, I tested the patch with https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.fre...
so I'm pretty sure it happens, and this patch fixes it, though I may have misidentified what the code should do.
My reading is that the dma_fence_chain_for_each in dma_fence_chain_find_seqno will never visit a signalled fence (unless the top one is signalled), as dma_fence_chain_walk will never return a signalled fence (it only returns on NULL or !signalled).
Ah, yes that suddenly makes more sense.
Happy to move this to drm_syncobj_find_fence.
No, I think that your current patch is fine.
That drm_syncobj_find_fence() only returns NULL when it can't find anything !signaled is correct behavior I think.
We should probably update the docs then :
- Returns 0 on success or a negative error value on failure. On
success @fence
- contains a reference to the fence, which must be released by calling
- dma_fence_put().
Looking at some of the kernel drivers, it looks like they don't all protect themselves against NULL pointers :
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.boo...
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.boo...
amdgpu handles it here (amdgpu_sync_fence checks for a NULL fence). But yeah I think it is a bit treacherous, especially as this only occurs with timeline semaphores.
Mhm, that's a good point.
While I still think it makes more sense from the design perspective to return NULL to distinct that there is nothing to wait for, I see as well that this is it not the most defensive approach.
Ok in that case let's move it into drm_syncobj_find_fence() instead.
Just one more thing, the timestamp is now busted anyway (cause the original fence is already garbage collected). So we can probably use dma_fence_get_stub() instead of dma_fence_allocate_private_stub().
Regards, Christian.
-Lionel
Going to push your original patch if nobody has any more objections.
But somebody might want to take care of the IGT as well.
Regards, Christian.
Regards, Christian.
By returning a stub fence for the timeline case if there isn't one.
Because the same NULL fence check appears missing in amdgpu (and probably other drivers).
Also we should have tests for this in IGT.
AMD contributed some tests when this code was written but they never got reviewed :(
-Lionel
> chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL); > if (!chain) { > ret = -ENOMEM; > @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file > *file_private, > args->src_point, args->flags, &fence); > if (ret) > goto err; > + > + /* If the requested seqno is already signaled > drm_syncobj_find_fence may > + * return a NULL fence. To make sure the recipient gets > signalled, use > + * a new fence instead. > + */ > + if (!fence) { > + fence = dma_fence_allocate_private_stub(); > + if (!fence) { > + ret = -ENOMEM; > + goto err; > + } > + } > + > drm_syncobj_replace_fence(binary_syncobj, fence); > dma_fence_put(fence); > err:
linux-stable-mirror@lists.linaro.org