From: Christian König christian.koenig@amd.com
[ Upstream commit 5d55ed19d4190d2c210ac05ac7a53f800a8c6fe5 ]
Those can be triggered trivially by userspace.
Signed-off-by: Christian König christian.koenig@amd.com
Reviewed-by: Alex Deucher alexander.deucher@amd.com
Acked-by: Timur Kristóf timur.kristof@gmail.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
Based on my comprehensive analysis using semantic code tools and repository examination, here is my recommendation:
## **BACKPORT RECOMMENDATION: YES**
This commit **MUST be backported** to stable kernel trees as a **critical security fix**.
---
## **DETAILED ANALYSIS**
### **1. Semantic Analysis Tools Used:**
- **mcp__semcode__find_function**: Located both affected functions (`gfx_v11_0_ring_emit_ib_gfx` and `gfx_v12_0_ring_emit_ib_gfx`)
- **mcp__semcode__find_callers**: Traced the call chain from `amdgpu_ib_schedule` (25 callers) including the critical `amdgpu_job_run` function
- **Grep/Read tools**: Analyzed the userspace API definitions and call paths
- **Git history analysis**: Confirmed the BUG_ON was present from initial GFX11/12 implementation
### **2. Security Impact - CRITICAL DoS Vulnerability:**
**The removed BUG_ON() checks a userspace-controlled flag:**
```c
BUG_ON(ib->flags & AMDGPU_IB_FLAG_CE);  // Line removed in gfx_v11_0.c:5867
BUG_ON(ib->flags & AMDGPU_IB_FLAG_CE);  // Line removed in gfx_v12_0.c:4424
```
**Evidence of userspace control:**
- `AMDGPU_IB_FLAG_CE` is defined in `include/uapi/drm/amdgpu_drm.h:935` (UAPI header)
- Userspace sets this via `chunk_ib->flags` in command submissions (amdgpu_cs.c:381)
- The commit message explicitly states: **"Those can be triggered trivially by userspace"**
### **3. Call Chain Analysis - Confirmed Userspace Reachability:**
```
Userspace ioctl
  → amdgpu_cs.c (command submission with user-controlled chunk_ib->flags)
  → amdgpu_job_run (assigned to .run_job callback at amdgpu_job.c:467)
  → amdgpu_ib_schedule (called from amdgpu_job.c:378)
  → gfx_v11_0_ring_emit_ib_gfx / gfx_v12_0_ring_emit_ib_gfx
  → **BUG_ON() PANIC if AMDGPU_IB_FLAG_CE is set**
```
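The path from submission to emission can be mimicked in a small userspace mock, to show that a flag copied in at submission arrives at the emit hook unchanged when nothing in between filters it. This is only a sketch: every `mock_`/`MOCK_` name is hypothetical and mirrors the shape of the chain described here, not the amdgpu code, and the flag value is illustrative.

```c
#include <stdint.h>

/* Hypothetical stand-in for the UAPI flag bit; the value is illustrative. */
#define MOCK_IB_FLAG_CE (1u << 0)

struct mock_ib  { uint32_t flags; };
struct mock_job { struct mock_ib ib; };

/* Records the flags the emit hook was handed, so callers can inspect them. */
static uint32_t mock_last_emitted_flags;

/* Stand-in for the per-generation emit hook. */
static void mock_ring_emit_ib(const struct mock_ib *ib)
{
	mock_last_emitted_flags = ib->flags;
}

/* Stand-in for the scheduling step that calls the emit hook. */
static void mock_ib_schedule(const struct mock_job *job)
{
	mock_ring_emit_ib(&job->ib);
}

/* Stand-in for the function installed as the run_job callback. */
static void mock_job_run(struct mock_job *job)
{
	mock_ib_schedule(job);
}

struct mock_sched_ops { void (*run_job)(struct mock_job *job); };
static const struct mock_sched_ops mock_ops = { .run_job = mock_job_run };

/* Stand-in for command submission: copies the caller's flags verbatim. */
static void mock_cs_submit(struct mock_job *job, uint32_t user_flags)
{
	job->ib.flags = user_flags;
	mock_ops.run_job(job);
}
```

Calling `mock_cs_submit(&job, MOCK_IB_FLAG_CE)` leaves the CE bit set in `mock_last_emitted_flags`, which is the situation in which an assertion inside the emit hook would fire.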
**Impact Assessment:**
- Any process with access to `/dev/dri/renderD*` can crash the kernel
- No special privileges required beyond GPU access
- Affects all AMD GPUs using GFX version 11 and 12 hardware
- Complete system denial-of-service
### **4. Code Context - Improper Error Handling:**
**Other GPU versions handle this correctly** (gfx_v10_0.c:8651-8654):
```c
if (ib->flags & AMDGPU_IB_FLAG_CE)
	header = PACKET3(PACKET3_INDIRECT_BUFFER_CNST, 2);
else
	header = PACKET3(PACKET3_INDIRECT_BUFFER, 2);
```
**GFX11/12 incorrectly used BUG_ON()** instead of either:
- Gracefully handling the flag (like older versions)
- Returning an error
- Simply ignoring the flag (as this fix does)
The CE (Constant Engine) flag is a legacy feature from older GPU architectures. Newer hardware doesn't support it for this code path, but the kernel should never panic on invalid userspace input.
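The three alternatives to BUG_ON() can be compared side by side in a userspace mock. This is a sketch under stated assumptions: the `mock_`/`MOCK_` names, the opcode values and the header encoding are hypothetical placeholders, not the driver's definitions.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical placeholders; values are illustrative, not the real ones. */
#define MOCK_IB_FLAG_CE        (1u << 0)
#define MOCK_OP_INDIRECT       0x01u
#define MOCK_OP_INDIRECT_CNST  0x02u
#define MOCK_PACKET(op, n)     (((uint32_t)(op) << 8) | (uint32_t)(n))

/* Option 1: honour the flag by selecting a different packet header. */
static uint32_t mock_header_honour_ce(uint32_t flags)
{
	if (flags & MOCK_IB_FLAG_CE)
		return MOCK_PACKET(MOCK_OP_INDIRECT_CNST, 2);
	return MOCK_PACKET(MOCK_OP_INDIRECT, 2);
}

/* Option 2: reject the unsupported flag with an error code. */
static int mock_validate_flags(uint32_t flags)
{
	return (flags & MOCK_IB_FLAG_CE) ? -EINVAL : 0;
}

/* Option 3: ignore the flag, so every input yields the same header. */
static uint32_t mock_header_ignore_ce(uint32_t flags)
{
	(void)flags;
	return MOCK_PACKET(MOCK_OP_INDIRECT, 2);
}
```

None of the three variants aborts on caller-supplied input. Since the emit functions touched by this patch return `void`, option 2 would presumably have to run earlier in the submission path; option 3 needs no such plumbing, which is consistent with the fix being a pure deletion.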
### **5. Change Characteristics - Perfect Backport Candidate:**
- ✅ **Bug fix**: Removes a kernel panic path
- ✅ **Security fix**: Prevents DoS vulnerability
- ✅ **Small and contained**: Only 2 lines removed across 2 files
- ✅ **No architectural changes**: No struct modifications or API changes
- ✅ **No new features**: Pure removal of incorrect assertion
- ✅ **Low regression risk**: Removing a panic path cannot cause new crashes
- ✅ **No dependencies**: Standalone fix with no prerequisite commits
- ✅ **Well-reviewed**: Reviewed-by and Acked-by tags from maintainers present
### **6. Stable Tree Compliance:**
This fix perfectly aligns with stable kernel rules (Documentation/process/stable-kernel-rules.rst):
- "It must fix a real bug that bothers people" ✅ (DoS vulnerability)
- "It must be obviously correct" ✅ (removing invalid assertion)
- "It cannot be bigger than 100 lines" ✅ (2 lines)
- "It must fix only one thing" ✅ (single issue)
- Security fixes are high priority for stable ✅
### **7. Historical Context:**
Git analysis shows:
- BUG_ON was present from initial GFX11 implementation (commit 3d879e81f0f9e)
- This was a **design bug from the start** - should never have used BUG_ON on user input
- Fixed in commit 5d55ed19d4190 in mainline
- **Must be backported to all stable kernels with GFX11/12 support**
---
## **CONCLUSION:**
**This is a critical security fix that removes a trivially exploitable kernel panic.** Any user with GPU access can crash the system by setting a single flag bit in their command submission. The fix is minimal, safe, and has zero risk of regression.
**Backport Status: YES** - High priority for all affected stable trees.
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 2 --
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 2 --
 2 files changed, 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index c37527704d433..25a5f7fa5077d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -5864,8 +5864,6 @@ static void gfx_v11_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
 	unsigned vmid = AMDGPU_JOB_GET_VMID(job);
 	u32 header, control = 0;
 
-	BUG_ON(ib->flags & AMDGPU_IB_FLAG_CE);
-
 	header = PACKET3(PACKET3_INDIRECT_BUFFER, 2);
 
 	control |= ib->length_dw | (vmid << 24);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index fd44d5503e282..329632388b43e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -4421,8 +4421,6 @@ static void gfx_v12_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
 	unsigned vmid = AMDGPU_JOB_GET_VMID(job);
 	u32 header, control = 0;
 
-	BUG_ON(ib->flags & AMDGPU_IB_FLAG_CE);
-
 	header = PACKET3(PACKET3_INDIRECT_BUFFER, 2);
 
 	control |= ib->length_dw | (vmid << 24);