If dqm->ops.initialize() fails, add deallocate_hiq_sdma_mqd() to release the memory allocated by allocate_hiq_sdma_mqd(). Move deallocate_hiq_sdma_mqd() up to ensure proper function visibility at the point of use.
Fixes: 11614c36bc8f ("drm/amdkfd: Allocate MQD trunk for HIQ and SDMA") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li lihaoxiang@isrc.iscas.ac.cn --- Changes in v2: - Move deallocate_hiq_sdma_mqd() up. Thanks, Felix! - Add a Fixes tag. --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index d7a2e7178ea9..8af0929ca40a 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -2919,6 +2919,14 @@ static int allocate_hiq_sdma_mqd(struct device_queue_manager *dqm) return retval; }
+static void deallocate_hiq_sdma_mqd(struct kfd_node *dev, + struct kfd_mem_obj *mqd) +{ + WARN(!mqd, "No hiq sdma mqd trunk to free"); + + amdgpu_amdkfd_free_gtt_mem(dev->adev, &mqd->gtt_mem); +} + struct device_queue_manager *device_queue_manager_init(struct kfd_node *dev) { struct device_queue_manager *dqm; @@ -3042,19 +3050,14 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_node *dev) return dqm; }
+ if (!dev->kfd->shared_resources.enable_mes) + deallocate_hiq_sdma_mqd(dev, &dqm->hiq_sdma_mqd); + out_free: kfree(dqm); return NULL; }
-static void deallocate_hiq_sdma_mqd(struct kfd_node *dev, - struct kfd_mem_obj *mqd) -{ - WARN(!mqd, "No hiq sdma mqd trunk to free"); - - amdgpu_amdkfd_free_gtt_mem(dev->adev, &mqd->gtt_mem); -} - void device_queue_manager_uninit(struct device_queue_manager *dqm) { dqm->ops.stop(dqm);
…
Move deallocate_hiq_sdma_mqd() up to ensure proper function visibility at the point of use.
…
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -2919,6 +2919,14 @@ static int allocate_hiq_sdma_mqd(struct device_queue_manager *dqm) return retval; } +static void deallocate_hiq_sdma_mqd(struct kfd_node *dev,
struct kfd_mem_obj *mqd)+{
- WARN(!mqd, "No hiq sdma mqd trunk to free");
- amdgpu_amdkfd_free_gtt_mem(dev->adev, &mqd->gtt_mem);
+}
Is there also a need to reconsider the implementation of the applied null pointer check here?
Regards, Markus
On 2026-01-06 04:30, Markus Elfring wrote:
…
Move deallocate_hiq_sdma_mqd() up to ensure proper function visibility at the point of use.
…
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -2919,6 +2919,14 @@ static int allocate_hiq_sdma_mqd(struct device_queue_manager *dqm) return retval; } +static void deallocate_hiq_sdma_mqd(struct kfd_node *dev,
struct kfd_mem_obj *mqd)+{
- WARN(!mqd, "No hiq sdma mqd trunk to free");
- amdgpu_amdkfd_free_gtt_mem(dev->adev, &mqd->gtt_mem);
+}
Is there also a need to reconsider the implementation of the applied null pointer check here?
Yeah, we have a WARN if mqd is NULL but then we still call amdgpu_amdkfd_free_gtt_mem. There is a NULL pointer check in amdgpu_amdkfd_free_gtt_mem, but &mqd->gtt_mem won't be NULL because it's not at the start of struct kfd_mem_obj. So you'd get a kernel oops if mqd is ever NULL here.
That said, I've never seen anyone complain about this WARN and a subsequent kernel oops.
The only other place that calls deallocate_hiq_sdma_mqd is device_queue_manager_uninit (only if not using MES). It can only be called with a valid dqm if device_queue_manager_init succeeded, which always ends with a valid dqm->hiq_sdma_mqd if not using MES.
My conclusion is that this WARN is just unnecessary. But it's also harmless.
Regards, Felix
Regards, Markus
My conclusion is that this WARN is just unnecessary.
Would you like to omit such a questionable macro call then?
But it's also harmless.
How do you think about to avoid special development concerns here? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Docu...
Regards, Markus
On 2026-01-09 03:37, Markus Elfring wrote:
My conclusion is that this WARN is just unnecessary.
Would you like to omit such a questionable macro call then?
I don't feel strongly about it. I already submitted Haoxiang's patch.
But it's also harmless.
How do you think about to avoid special development concerns here? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Docu...
No. I think the WARN is used exactly as it was meant to be here: to check for something that should never happen.
Regards, Felix
Regards, Markus
[AMD Official Use Only - AMD Internal Distribution Only]
-----Original Message----- From: Haoxiang Li lihaoxiang@isrc.iscas.ac.cn Sent: Monday, January 5, 2026 8:58 PM To: Kuehling, Felix Felix.Kuehling@amd.com; Deucher, Alexander Alexander.Deucher@amd.com; Koenig, Christian Christian.Koenig@amd.com; airlied@gmail.com; simona@ffwll.ch; Zeng, Oak Oak.Zeng@amd.com Cc: amd-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org; linux-kernel@vger.kernel.org; Haoxiang Li lihaoxiang@isrc.iscas.ac.cn; stable@vger.kernel.org Subject: [PATCH v2] drm/amdkfd: fix a memory leak in device_queue_manager_init()
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
If dqm->ops.initialize() fails, add deallocate_hiq_sdma_mqd() to release the memory allocated by allocate_hiq_sdma_mqd(). Move deallocate_hiq_sdma_mqd() up to ensure proper function visibility at the point of use.
Fixes: 11614c36bc8f ("drm/amdkfd: Allocate MQD trunk for HIQ and SDMA") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li lihaoxiang@isrc.iscas.ac.cn --- Changes in v2: - Move deallocate_hiq_sdma_mqd() up. Thanks, Felix! - Add a Fixes tag. --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index d7a2e7178ea9..8af0929ca40a 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -2919,6 +2919,14 @@ static int allocate_hiq_sdma_mqd(struct device_queue_manager *dqm) return retval; }
+static void deallocate_hiq_sdma_mqd(struct kfd_node *dev, + struct kfd_mem_obj *mqd) { + WARN(!mqd, "No hiq sdma mqd trunk to free"); + + amdgpu_amdkfd_free_gtt_mem(dev->adev, &mqd->gtt_mem); }
} in last line should be in a new line.
Other than that, patch lgtm. Reviewed-by: Oak.Zeng@amd.com + struct device_queue_manager *device_queue_manager_init(struct kfd_node *dev) { struct device_queue_manager *dqm; @@ -3042,19 +3050,14 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_node *dev) return dqm; }
+ if (!dev->kfd->shared_resources.enable_mes) + deallocate_hiq_sdma_mqd(dev, &dqm->hiq_sdma_mqd); + out_free: kfree(dqm); return NULL; }
-static void deallocate_hiq_sdma_mqd(struct kfd_node *dev, - struct kfd_mem_obj *mqd) -{ - WARN(!mqd, "No hiq sdma mqd trunk to free"); - - amdgpu_amdkfd_free_gtt_mem(dev->adev, &mqd->gtt_mem); -} - void device_queue_manager_uninit(struct device_queue_manager *dqm) { dqm->ops.stop(dqm); -- 2.25.1
On Thu, 8 Jan 2026 02:15:12 +0000, Zeng wrote:
} in last line should be in a new line.
Hi, Zeng! I rechecked my patch and found that this issue does not appear in the version I submitted. I’m not sure why this discrepancy occurred, but I’ve sent a v3 revision anyway and hope it now shows up correctly.
Other than that, patch lgtm. Reviewed-by: Oak.Zeng@amd.com
Thanks for review!
Thanks, Haoxiang Li
linux-stable-mirror@lists.linaro.org