From: Pavel Begunkov <asml.silence@gmail.com>
[ Upstream commit 31bf77dcc3810e08bcc7d15470e92cdfffb7f7f1 ]
net_iov / freelist / etc. arrays can be quite long, make sure they're accounted.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES
- What it does: Switches three kvmalloc_array() allocations used by io_uring zcrx areas from GFP_KERNEL to GFP_KERNEL_ACCOUNT so their memory is charged to the creating task's memcg. Specifically:
  - `area->nia.niovs` allocation: io_uring/zcrx.c:425
  - `area->freelist` allocation: io_uring/zcrx.c:430
  - `area->user_refs` allocation: io_uring/zcrx.c:435
Why it matters
- Fixes unaccounted kernel memory: These arrays can be very large (one entry per page of the registered area). Without GFP_KERNEL_ACCOUNT, a cgroup can allocate significant kernel memory that is not charged to its memcg, breaking containment and potentially causing host memory pressure. The commit explicitly addresses this: "arrays can be quite long, make sure they're accounted."
- Brings consistency with existing accounting in the same path: The user memory backing the area is already accounted to memcg via `sg_alloc_table_from_pages(..., GFP_KERNEL_ACCOUNT)` (io_uring/zcrx.c:196) and to the io_uring context via `io_account_mem()` (io_uring/zcrx.c:205). Accounting these control arrays aligns with that design and closes a loophole where only the big page backing was charged but the (potentially multi-MiB) array metadata was not.
- Scope is tiny and contained: The change is three flag substitutions within `io_zcrx_create_area()` and has no API/ABI or behavioral changes beyond proper memcg charging. No architectural changes; hot paths are unaffected (this is registration-time allocation).
Risk assessment
- Low regression risk: Uses a long-standing flag (`GFP_KERNEL_ACCOUNT`) already used in this file when importing the area's pages (io_uring/zcrx.c:196). The only behavioral change is that allocations will now fail earlier with `-ENOMEM` if a cgroup's limits would be exceeded, which is the desired and correct behavior for accounting fixes.
- No ordering dependencies: The patch doesn't rely on recent refactors; the affected allocations exist in v6.15–v6.17 and are currently done with `GFP_KERNEL`. The change applies cleanly to those stable series where `io_uring/zcrx.c` is present.
Stable tree fit
- Fixes a real bug affecting users: memcg under-accounting in a new but shipped subsystem (zcrx has been present since v6.15).
- Minimal, localized, and low risk: Three flag changes in one function.
- No feature additions or architectural changes: Pure accounting fix.
- Consistent with stable policy: Similar accounting fixes are regularly accepted; related earlier work in this area explicitly targeted stable (e.g., "io_uring/zcrx: account area memory" carries a `Cc: stable@vger.kernel.org`, complementing this change).
Conclusion
- Backporting will prevent unaccounted kernel memory growth from zcrx area metadata, aligning with memcg expectations and improving containment with negligible risk.
 io_uring/zcrx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 39d1ef52a57b1..5928544cd1687 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -426,17 +426,17 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,

 	ret = -ENOMEM;
 	area->nia.niovs = kvmalloc_array(nr_iovs, sizeof(area->nia.niovs[0]),
-					 GFP_KERNEL | __GFP_ZERO);
+					 GFP_KERNEL_ACCOUNT | __GFP_ZERO);
 	if (!area->nia.niovs)
 		goto err;

 	area->freelist = kvmalloc_array(nr_iovs, sizeof(area->freelist[0]),
-					GFP_KERNEL | __GFP_ZERO);
+					GFP_KERNEL_ACCOUNT | __GFP_ZERO);
 	if (!area->freelist)
 		goto err;

 	area->user_refs = kvmalloc_array(nr_iovs, sizeof(area->user_refs[0]),
-					GFP_KERNEL | __GFP_ZERO);
+					GFP_KERNEL_ACCOUNT | __GFP_ZERO);
 	if (!area->user_refs)
 		goto err;