On Fri, Aug 1, 2025 at 12:31 AM Carlos Llamas cmllamas@google.com wrote:
On Fri, Jul 11, 2025 at 06:33:36PM +0200, Jann Horn wrote:
Ensure that epoll instances can never form a graph deeper than EP_MAX_NESTS+1 links.
Currently, ep_loop_check_proc() ensures that the graph is loop-free and does some recursion depth checks, but those recursion depth checks don't limit the depth of the resulting tree for two reasons:
- They don't look upwards in the tree.
- If there are multiple downwards paths of different lengths, only one of the paths is actually considered for the depth check since commit 28d82dc1c4ed ("epoll: limit paths").
Essentially, the current recursion depth check in ep_loop_check_proc() just serves to prevent it from recursing too deeply while checking for loops.
A more thorough check is done in reverse_path_check() after the new graph edge has already been created; this checks, among other things, that no paths going upwards from any non-epoll file with a length of more than 5 edges exist. However, this check does not apply to non-epoll files.
As a result, it is possible to recurse to a depth of at least roughly 500, tested on v6.15. (I am unsure if deeper recursion is possible; and this may have changed with commit 8c44dac8add7 ("eventpoll: Fix priority inversion problem").)
To fix it:
- In ep_loop_check_proc(), note the subtree depth of each visited node,
and use subtree depths for the total depth calculation even when a subtree has already been visited. 2. Add ep_get_upwards_depth_proc() for similarly determining the maximum depth of an upwards walk. 3. In ep_loop_check(), use these values to limit the total path length between epoll nodes to EP_MAX_NESTS edges.
Fixes: 22bacca48a17 ("epoll: prevent creating circular epoll structures") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn jannh@google.com
Hey Jann,
I've bisected an LTP test failure to this commit and I can't find any reports of this. The test is epoll_ctl04:
https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sysca...
This is what I get: ####################################################################3 root@debian:~# ./epoll_ctl04 tst_test.c:2004: TINFO: LTP version: 20250530-116-g91e6272fe tst_test.c:2007: TINFO: Tested kernel: 6.16.0-rc1-00017-gf2e467a48287 #28 SMP PREEMPT Thu Jul 31 21:12:22 UTC 2025 aarch64 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz' tst_test.c:1825: TINFO: Overall timeout per run is 0h 00m 30s epoll_ctl04.c:59: TFAIL: epoll_ctl(..., EPOLL_CTL_ADD, ...) with number of nesting is 5 expected EINVAL: ELOOP (40)
Summary: passed 0 failed 1 broken 0 skipped 0 warnings 0 ####################################################################3
I haven't looked much into this but it seems the test expects EINVAL at nesting depth 5 and is instead getting ELOOP. Any chance there is an off-by-one error in ep_loop_check() as it fails with depth=4 and upwards_depth=0, which isn't correct?
This is an area where the existing code is very inconsistent in how it reports errors; limits on the structure of the epoll graph (in particular limits on the graph depth) are enforced by both ep_loop_check() (which fails with ELOOP) and reverse_path_check() (which fails with EINVAL). The comments suggest that ep_loop_check() is supposed to be doing depth checks, and reverse_path_check() is mostly supposed to count wakeup paths, but reverse_path_check() happens to be what you hit in that LTP testcase.
The manpage also says that ELOOP is for too-deep nesting, and does not mention this case of EINVAL at all. (It doesn't even mention the wakeup path count restriction in the ERRORS section...)
I think this is one of those cases where the existing semantics are too convoluted and tainted with kernel implementation details for userspace to have handled the existing error cases well enough to be broken by this change; the existing behavior was something like (not sure if I'm getting it exactly right) "ELOOP is for loops; EINVAL is for hitting a depth limit when constructing a chain of epoll instances with a file at the bottom; ELOOP is for hitting a depth limit when constructing a chain of epoll instances with no file at the bottom and you'd only get it depending on which way around you build the chain and, for more complex structures, in what order the addresses of kernel objects are"; and the implementation was different from what the manpage says.
So my opinion is that the right fix is to change the testcase to also accept ELOOP, though I can see how a bugfix that breaks a unit test is going to raise eyebrows.
diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 44648cc09250..811960b2a74c 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -2237,7 +2237,7 @@ static int ep_loop_check(struct eventpoll *ep, struct eventpoll *to) upwards_depth = ep_get_upwards_depth_proc(ep, 0); rcu_read_unlock();
return (depth+1+upwards_depth > EP_MAX_NESTS) ? -1 : 0;
return (depth+upwards_depth > EP_MAX_NESTS) ? -1 : 0;
Here I am calculating: We want to create a new link between two nodes, and want to know how long the longest resulting chain of epoll instances will be. For that, I add the following numbers of links:
- The number of links going upwards from "ep". - One for the new link we're adding. - The number of links going downward from "to".
So I think this is correct.