From: David Francis David.Francis@amd.com
[ Upstream commit 85705b18ae7674347f8675f64b2b3115fb1d5629 ]
The kfd CRIU checkpoint ioctl would return an error if trying to checkpoint a process with no kfd buffer objects.
This is a normal case and should not be an error.
Reviewed-by: Felix Kuehling felix.kuehling@amd.com Signed-off-by: David Francis David.Francis@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
YES
- What it fixes: Previously, the CRIU path rejected processes with no KFD buffer objects by requiring both a non-NULL `bos` pointer and a non-zero `num_bos`. The commit relaxes this so that a process with zero BOs is treated as a normal case instead of an error.
- Precise change: In `criu_restore`, the validation changes from rejecting zero BOs to only requiring `args->bos` when there actually are BOs: - New check only requires `bos` if `num_bos > 0`: `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2568-2570` - `(args->num_bos > 0 && !args->bos) || !args->devices || !args->priv_data || !args->priv_data_size || !args->num_devices` - This removes the old unconditional `!args->bos` and `!args->num_bos` rejection.
- Why it’s correct and safe: - Downstream restore code already handles zero BOs correctly: - Size checks scale with `num_bos`: `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2439-2440` - Zero-length allocations are fine; `kvmalloc_array(args->num_bos, ...)` and `kvzalloc(...)` safely handle `num_bos == 0` and `kvfree` is safe: `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2445-2453, 2463-2467, 2495-2499` - `copy_from_user` and `copy_to_user` with size 0 are no-ops and safe even if `bos` is NULL: `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2455-2461, 2487-2492` - The loop over BOs naturally skips when `num_bos == 0`: `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2479-2485` - For `num_bos > 0`, the new check still requires a valid `bos` pointer, preserving existing behavior where needed.
- Scope and risk: - Small, localized input validation fix in KFD CRIU restore path only; no architectural changes. - No impact on other subsystems; error handling paths remain unchanged. - Regression risk is minimal because it only relaxes a reject condition for a valid scenario and downstream code already supports zero BOs.
- User impact: - Fixes spurious `-EINVAL` on CRIU operations for processes without KFD BOs, which is a normal scenario per the commit message. - Improves reliability of CRIU-based workflows for AMD GPU compute processes.
- Stable backport criteria: - Important bugfix affecting real users. - Minimal, contained change with low risk. - No new features or API changes; aligns behavior with existing code expectations.
Note: While the commit message mentions the checkpoint ioctl, this change updates the restore validation (`drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2568-2570`). It still satisfies stable criteria by correcting CRIU handling for the no-BO case on restore, with the downstream code already safely handling `num_bos == 0`.
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 43115a3744694..8535a52a62cab 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -2571,8 +2571,8 @@ static int criu_restore(struct file *filep, pr_debug("CRIU restore (num_devices:%u num_bos:%u num_objects:%u priv_data_size:%llu)\n", args->num_devices, args->num_bos, args->num_objects, args->priv_data_size);
- if (!args->bos || !args->devices || !args->priv_data || !args->priv_data_size || - !args->num_devices || !args->num_bos) + if ((args->num_bos > 0 && !args->bos) || !args->devices || !args->priv_data || + !args->priv_data_size || !args->num_devices) return -EINVAL;
mutex_lock(&p->mutex);