Re: [PATCH] eventpoll: Fix semi-unbounded recursion

4 Aug 2025

On Fri, Aug 1, 2025 at 12:31 AM Carlos Llamas cmllamas@google.com wrote:
...
On Fri, Jul 11, 2025 at 06:33:36PM +0200, Jann Horn wrote:
...
Ensure that epoll instances can never form a graph deeper than
EP_MAX_NESTS+1 links.
Currently, ep_loop_check_proc() ensures that the graph is loop-free and
does some recursion depth checks, but those recursion depth checks don't
limit the depth of the resulting tree for two reasons:

They don't look upwards in the tree.
If there are multiple downwards paths of different lengths, only one of
the paths is actually considered for the depth check since commit
28d82dc1c4ed ("epoll: limit paths").

Essentially, the current recursion depth check in ep_loop_check_proc() just
serves to prevent it from recursing too deeply while checking for loops.
A more thorough check is done in reverse_path_check() after the new graph
edge has already been created; this checks, among other things, that no
paths going upwards from any non-epoll file with a length of more than 5
edges exist. However, this check does not apply to non-epoll files.
As a result, it is possible to recurse to a depth of at least roughly 500,
tested on v6.15. (I am unsure if deeper recursion is possible; and this may
have changed with commit 8c44dac8add7 ("eventpoll: Fix priority inversion
problem").)
To fix it:

In ep_loop_check_proc(), note the subtree depth of each visited node,

and use subtree depths for the total depth calculation even when a subtree
has already been visited.
2. Add ep_get_upwards_depth_proc() for similarly determining the maximum
depth of an upwards walk.
3. In ep_loop_check(), use these values to limit the total path length
between epoll nodes to EP_MAX_NESTS edges.
Fixes: 22bacca48a17 ("epoll: prevent creating circular epoll structures")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn jannh@google.com

Hey Jann,
I've bisected an LTP test failure to this commit and I can't find any
reports of this. The test is epoll_ctl04:
https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sysca...
This is what I get:
  ####################################################################3
  root@debian:~# ./epoll_ctl04
  tst_test.c:2004: TINFO: LTP version: 20250530-116-g91e6272fe
  tst_test.c:2007: TINFO: Tested kernel: 6.16.0-rc1-00017-gf2e467a48287 #28 SMP PREEMPT Thu Jul 31 21:12:22 UTC 2025 aarch64
  tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
  tst_test.c:1825: TINFO: Overall timeout per run is 0h 00m 30s
  epoll_ctl04.c:59: TFAIL: epoll_ctl(..., EPOLL_CTL_ADD, ...) with number of nesting is 5 expected EINVAL: ELOOP (40)
Summary:
  passed   0
  failed   1
  broken   0
  skipped  0
  warnings 0
  ####################################################################3
I haven't looked much into this but it seems the test expects EINVAL at
nesting depth 5 and is instead getting ELOOP. Any chance there is an
off-by-one error in ep_loop_check() as it fails with depth=4 and
upwards_depth=0, which isn't correct?
This is an area where the existing code is very inconsistent in how it
reports errors; limits on the structure of the epoll graph (in
particular limits on the graph depth) are enforced by both
ep_loop_check() (which fails with ELOOP) and reverse_path_check()
(which fails with EINVAL). The comments suggest that ep_loop_check()
is supposed to be doing depth checks, and reverse_path_check() is
mostly supposed to count wakeup paths, but reverse_path_check()
happens to be what you hit in that LTP testcase.
The manpage also says that ELOOP is for too-deep nesting, and does not
mention this case of EINVAL at all. (It doesn't even mention the
wakeup path count restriction in the ERRORS section...)
I think this is one of those cases where the existing semantics are
too convoluted and tainted with kernel implementation details for
userspace to have handled the existing error cases well enough to be
broken by this change; the existing behavior was something like (not
sure if I'm getting it exactly right) "ELOOP is for loops; EINVAL is
for hitting a depth limit when constructing a chain of epoll instances
with a file at the bottom; ELOOP is for hitting a depth limit when
constructing a chain of epoll instances with no file at the bottom and
you'd only get it depending on which way around you build the chain
and, for more complex structures, in what order the addresses of
kernel objects are"; and the implementation was different from what
the manpage says.
So my opinion is that the right fix is to change the testcase to also
accept ELOOP, though I can see how a bugfix that breaks a unit test is
going to raise eyebrows.
...

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 44648cc09250..811960b2a74c 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2237,7 +2237,7 @@ static int ep_loop_check(struct eventpoll *ep, struct eventpoll *to)
        upwards_depth = ep_get_upwards_depth_proc(ep, 0);
        rcu_read_unlock();

  return (depth+1+upwards_depth > EP_MAX_NESTS) ? -1 : 0;




  return (depth+upwards_depth > EP_MAX_NESTS) ? -1 : 0;



Here I am calculating: We want to create a new link between two nodes,
and want to know how long the longest resulting chain of epoll
instances will be. For that, I add the following numbers of links:
- The number of links going upwards from "ep".
 - One for the new link we're adding.
 - The number of links going downward from "to".
So I think this is correct.

    

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH] eventpoll: Fix semi-unbounded recursion