From: Stanislav Fomichev <sdf@fomichev.me>
[ Upstream commit 8c0b9ed2401b9b3f164c8c94221899a1ace6e9ab ]
devmem test fails on NIPA. Most likely we get skb(s) with readable frags (why?) but the failure manifests as an OOM. The OOM happens because ncdevmem spams the following message:
recvmsg ret=-1 recvmsg: Bad address
As of today, ncdevmem can't deal with various reasons of EFAULT:
- falling back to regular recvmsg for non-devmem skbs
- increasing ctrl_data size (can't happen with ncdevmem's large buffer)
Exit (cleanly) with error when recvmsg returns EFAULT. This should at least cause the test to cleanup its state.
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250904182710.1586473-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES
Explanation
- What changed
  - Adds a specific EFAULT handling path in the server receive loop: on recvmsg() returning -1 with errno == EFAULT, the test logs and exits the connection, instead of retrying indefinitely.
    - recvmsg call: tools/testing/selftests/drivers/net/hw/ncdevmem.c:940
    - Error branch: tools/testing/selftests/drivers/net/hw/ncdevmem.c:944
    - New EFAULT handling: tools/testing/selftests/drivers/net/hw/ncdevmem.c:946–949
    - Other errors still “continue” (retry): tools/testing/selftests/drivers/net/hw/ncdevmem.c:950
  - The new fatal path drops into the existing cleanup cascade via goto err_close_client, ensuring full resource cleanup:
    - err_close_client label and cleanup: tools/testing/selftests/drivers/net/hw/ncdevmem.c:1039–1055
- Why it matters (bug fixed)
  - The test currently spams “recvmsg: Bad address” (EFAULT) in a tight loop and can OOM the test host, as described in the commit message. With the new branch, the test fails fast and performs cleanup instead of repeatedly retrying a non-recoverable condition.
  - The commit notes likely causes of EFAULT (e.g., fallback to regular recvmsg for non-devmem skbs), which ncdevmem cannot meaningfully handle at present. Continuing to retry is not productive and causes resource exhaustion.
  - The control buffer is intentionally very large (tools/testing/selftests/drivers/net/hw/ncdevmem.c:830), so the “control buffer too small” EFAULT scenario is not applicable here, aligning with the commit message.
- Scope and risk
  - Extremely contained: changes only the ncdevmem selftest, not kernel code, UAPI, or any production subsystem behavior.
  - Minimal behavior change: only EFAULT is treated as fatal; other transient errors continue to be retried (tools/testing/selftests/drivers/net/hw/ncdevmem.c:950).
  - Cleanup is comprehensive: closes fds, frees memory, unbinds, and restores NIC state (flow steering, RSS, ring config), preventing test pollution:
    - close(client_fd): tools/testing/selftests/drivers/net/hw/ncdevmem.c:1040
    - close(socket_fd): tools/testing/selftests/drivers/net/hw/ncdevmem.c:1042
    - free(tmp_mem): tools/testing/selftests/drivers/net/hw/ncdevmem.c:1044
    - ynl_sock_destroy(ys): tools/testing/selftests/drivers/net/hw/ncdevmem.c:1046
    - reset_flow_steering/reset_rss/restore_ring_config/free ring_config: tools/testing/selftests/drivers/net/hw/ncdevmem.c:1048–1054
- Fit for stable
  - Fixes a real, practical problem in selftests (runaway logging leading to OOM), improving reliability of stable selftest runs and CI.
  - Small and surgical change in a test; no architectural changes; no user-visible kernel behavior change; very low regression risk.
  - Aligns with stable policy for backporting important selftest fixes that prevent hangs/OOM and ensure tests can complete and clean up.
Given it prevents a test-induced OOM, improves determinism, and is tightly scoped to selftests, this is a good candidate for stable backport.
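The retry-versus-fatal distinction discussed above can be sketched outside of ncdevmem roughly as follows. This is only an illustration under stated assumptions, not the selftest's code: recv_fn_t, struct script, scripted_recv(), struct loop_result and recv_loop() are hypothetical names, and a scripted callback stands in for recvmsg() so the loop can be exercised without a devmem-capable NIC.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical receive callback; stands in for recvmsg() on a client fd. */
typedef ssize_t (*recv_fn_t)(void *ctx);

/* Hypothetical scripted source: replays return values and errno values. */
struct script {
	const ssize_t *rets;	/* value to return on each call */
	const int *errs;	/* errno to set on each call */
	size_t len;		/* number of scripted calls */
	size_t pos;		/* next scripted call to replay */
};

/* Replays the script; reports EOF (0) once the script is exhausted. */
static ssize_t scripted_recv(void *ctx)
{
	struct script *s = ctx;

	if (s->pos >= s->len)
		return 0;
	errno = s->errs[s->pos];
	return s->rets[s->pos++];
}

struct loop_result {
	int status;	/* 0 on clean EOF, -1 on fatal error */
	int retries;	/* failures that were retried */
	int cleaned_up;	/* 1 once the shared cleanup path has run */
	size_t total;	/* bytes accounted for */
};

/*
 * Receive until EOF. Errors other than EFAULT are retried; EFAULT is
 * treated as unrecoverable and jumps straight to the cleanup label
 * instead of spinning (and logging) forever.
 */
static struct loop_result recv_loop(recv_fn_t recv_fn, void *ctx)
{
	struct loop_result res = { .status = -1 };
	ssize_t ret;
	int err;

	assert(recv_fn != NULL);

	while (1) {
		ret = recv_fn(ctx);
		if (ret < 0) {
			err = errno;	/* save before any call can clobber it */
			if (err == EFAULT) {
				fprintf(stderr, "received EFAULT, won't recover\n");
				goto cleanup;
			}
			res.retries++;
			continue;
		}
		if (ret == 0) {
			res.status = 0;
			goto cleanup;
		}
		res.total += (size_t)ret;
	}

cleanup:
	/* A real test would close fds and restore device state here. */
	res.cleaned_up = 1;
	return res;
}
```

In this sketch the EOF path and the fatal path share one cleanup label, so state is released however the loop ends; the status field is what distinguishes a clean finish from the EFAULT exit.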
 tools/testing/selftests/drivers/net/hw/ncdevmem.c | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
index 72f828021f832..147976e55dac2 100644
--- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c
+++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
@@ -631,6 +631,10 @@ static int do_server(struct memory_buffer *mem)
 			continue;
 		if (ret < 0) {
 			perror("recvmsg");
+			if (errno == EFAULT) {
+				pr_err("received EFAULT, won't recover");
+				goto err_close_client;
+			}
 			continue;
 		}
 		if (ret == 0) {