From: Tejun Heo <tj@kernel.org>
[ Upstream commit d452972858e5cfa4262320ab74fe8f016460b96f ]
The qmap dump operation was destructively consuming queue entries while displaying them. As dump can be triggered anytime, this can easily lead to stalls. Add a temporary dump_store queue and modify the dump logic to pop entries, display them, and then restore them back to the original queue. This allows dump operations to be performed without affecting the scheduler's queue state.
Note that if racing against new enqueues during dump, ordering can get mixed up, but this is acceptable for debugging purposes.
Acked-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM-generated explanations, may be completely bogus:
## Backport Analysis: YES
**This commit SHOULD be backported to stable kernel trees** (and notably, it has already been backported to 6.17 stable as commit 2f64156df4204 by Sasha Levin on Oct 1, 2025).
### Detailed Analysis
#### 1. **Nature of the Bug (Critical Factor)**
The bug is **real and impactful**, not theoretical:
- **Location**: `tools/sched_ext/scx_qmap.bpf.c` lines 567-588 (qmap_dump function)
- **Problem**: The dump operation uses `bpf_map_pop_elem(fifo, &pid)` to display queue contents but **never restores the entries**
- **Impact**: Tasks are permanently removed from scheduler queues, causing **system stalls**
- **Trigger**: Can be invoked at any time via SysRq-D (as documented in commit 07814a9439a3b) or during error exits
From the old code (lines 581-586):

```c
bpf_repeat(4096) {
	if (bpf_map_pop_elem(fifo, &pid))
		break;
	scx_bpf_dump(" %d", pid);	/* ← never restored! */
}
```
This is a destructive read that removes tasks from the runnable queue, effectively "losing" them from the scheduler.
#### 2. **The Fix is Simple and Safe**
The fix adds 17 insertions, 1 deletion (well under the 100-line limit):
- Adds one new queue map (`dump_store`) for temporary storage
- Modifies the dump logic to: pop → store → display → restore
- Two `bpf_repeat` loops: the first pops and displays, the second restores
- Low regression risk: only affects dump operations, not the scheduling path
**Code changes at lines 579-600:**

```c
/* First loop: pop from the queue, save to dump_store, display */
bpf_map_push_elem(&dump_store, &pid, 0);	/* ← save for restoration */
scx_bpf_dump(" %d", pid);

/* Second loop: restore from dump_store back to the original queue */
bpf_map_push_elem(fifo, &pid, 0);		/* ← restore to scheduler queue */
```
#### 3. **Meets Stable Kernel Criteria**
Per `Documentation/process/stable-kernel-rules.rst`:
- ✅ **Already in mainline**: Upstream commit d452972858e5c
- ✅ **Obviously correct**: Simple save-restore pattern
- ✅ **Small size**: 41 total lines of diff
- ✅ **Fixes real bug**: Prevents stalls from destructive dump operations
- ✅ **User impact**: Anyone triggering dumps (SysRq-D, error exits) on a system running scx_qmap would experience task loss
#### 4. **Why This Qualifies Despite Being in tools/**
While `tools/` changes are typically not backported, this case is exceptional:
1. **BPF programs run in kernel space**: `scx_qmap.bpf.c` is not userspace tooling—it's a BPF program loaded into the kernel that implements actual scheduling decisions
2. **sched_ext schedulers are functional**: Although documented as "example schedulers" in the README (lines 6-15), they are **production-capable**. The README states: "Some of the examples are performant, production-ready schedulers" (line 11)
3. **Debugging is critical infrastructure**: The dump operation (added in commit 07814a9439a3b "Print debug dump after an error exit") is essential for debugging BPF scheduler failures. A broken dump that causes stalls defeats its purpose
4. **Already validated by stable maintainer**: Sasha Levin backported this on Oct 1, 2025, confirming it meets stable criteria
#### 5. **Historical Context**
- **sched_ext introduced**: v6.12-rc1 (commit f0e1a0643a59b)
- **Dump operations added**: June 18, 2024 (commit 07814a9439a3b)
- **Bug window**: ~15 months of potential stalls from dump operations
- **Fix date**: September 23, 2025 (upstream d452972858e5c)
#### 6. **No Security CVE, But Real Impact**
A search found no CVE assigned to this issue, but that does not diminish its importance:
- Stalls impact system availability
- Debugging a broken scheduler with a broken dump tool compounds problems
- Users investigating scheduler issues via SysRq-D would inadvertently cause more stalls
#### 7. **Risk Assessment**
**Regression risk**: **Very Low**

- Only modifies dump operations (debugging path)
- Does not touch scheduling hot paths
- The temporary-storage pattern is standard and safe
- The race with concurrent enqueues is explicitly acceptable (per the commit message: "ordering can get mixed up, but this is acceptable for debugging purposes")
**Benefit**: **High for affected users**

- Makes dump operations actually usable
- Prevents cascading failures during debugging
- Enables proper root-cause analysis of scheduler issues
### Conclusion
**YES - This commit should be backported.** It fixes a real bug causing system stalls, is small and safe, and affects functionality that users rely on for debugging. The fact that it has already been accepted into 6.17 stable by Sasha Levin validates this assessment. This is an appropriate stable backport that improves system reliability for users of sched_ext schedulers.
 tools/sched_ext/scx_qmap.bpf.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/tools/sched_ext/scx_qmap.bpf.c b/tools/sched_ext/scx_qmap.bpf.c
index 69d877501cb72..cd50a94326e3a 100644
--- a/tools/sched_ext/scx_qmap.bpf.c
+++ b/tools/sched_ext/scx_qmap.bpf.c
@@ -56,7 +56,8 @@ struct qmap {
 	queue1 SEC(".maps"),
 	queue2 SEC(".maps"),
 	queue3 SEC(".maps"),
-	queue4 SEC(".maps");
+	queue4 SEC(".maps"),
+	dump_store SEC(".maps");
 
 struct {
 	__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
@@ -578,11 +579,26 @@ void BPF_STRUCT_OPS(qmap_dump, struct scx_dump_ctx *dctx)
 			return;
 
 		scx_bpf_dump("QMAP FIFO[%d]:", i);
+
+		/*
+		 * Dump can be invoked anytime and there is no way to iterate in
+		 * a non-destructive way. Pop and store in dump_store and then
+		 * restore afterwards. If racing against new enqueues, ordering
+		 * can get mixed up.
+		 */
 		bpf_repeat(4096) {
 			if (bpf_map_pop_elem(fifo, &pid))
 				break;
+			bpf_map_push_elem(&dump_store, &pid, 0);
 			scx_bpf_dump(" %d", pid);
 		}
+
+		bpf_repeat(4096) {
+			if (bpf_map_pop_elem(&dump_store, &pid))
+				break;
+			bpf_map_push_elem(fifo, &pid, 0);
+		}
+
 		scx_bpf_dump("\n");
 	}
 }