On Tue, Jul 15, 2025 at 02:46:34PM +0200, Nam Cao wrote:
The ready event list of an epoll object is protected by a read-write lock (rwlock_t):
- The consumer (waiter) acquires the write lock and takes items.
- The producer (waker) takes the read lock and adds items.

The point of this design is to let epoll scale well with a large number of producers, as multiple producers can hold the read lock at the same time.
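To make the pattern concrete, here is a minimal sketch of the two paths. The names (ep_lock, rdllist, producer_add(), consumer_take()) are hypothetical and the structure is simplified; this is not the actual fs/eventpoll.c code:

  #include <linux/list.h>
  #include <linux/spinlock.h>             /* also pulls in rwlock_t */

  /* Simplified illustration of the design described above,
   * not the actual fs/eventpoll.c code. */
  struct epitem {
          struct list_head rdllink;       /* linked into the ready list */
  };

  static DEFINE_RWLOCK(ep_lock);          /* protects the ready list */
  static LIST_HEAD(rdllist);              /* the ready event list */

  static bool list_add_tail_lockless(struct list_head *new,
                                     struct list_head *head);

  /* Producer (waker): many can run concurrently under the read lock,
   * so adding to the list needs a lockless helper (sketched further
   * below). */
  static void producer_add(struct epitem *epi)
  {
          unsigned long flags;

          read_lock_irqsave(&ep_lock, flags);
          list_add_tail_lockless(&epi->rdllink, &rdllist);
          read_unlock_irqrestore(&ep_lock, flags);
  }

  /* Consumer (waiter): takes the write lock to move all ready items
   * out in one go. */
  static void consumer_take(struct list_head *txlist)
  {
          write_lock_irq(&ep_lock);
          list_splice_init(&rdllist, txlist);
          write_unlock_irq(&ep_lock);
  }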
Unfortunately, this implementation can cause a scheduling priority inversion. Suppose the consumer has higher scheduling priority than the producer. The consumer needs to acquire the write lock, but may be blocked by the producer holding the read lock. Since the rwlock does not support priority boosting for the readers (even with CONFIG_PREEMPT_RT=y), we have a case of priority inversion: a higher-priority consumer is blocked by a lower-priority producer. This problem was reported in [1].
Furthermore, it can also cause stalls, as described in [2].
Fix this problem by replacing the rwlock with a spinlock.
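Schematically, the one-to-one replacement turns the sketch above into the following (again simplified, not the literal diff):

  static DEFINE_SPINLOCK(ep_lock);        /* was: DEFINE_RWLOCK(ep_lock) */

  static void producer_add(struct epitem *epi)
  {
          unsigned long flags;

          spin_lock_irqsave(&ep_lock, flags);
          /* Producers are now fully serialized, so the plain list
           * helper suffices and the lockless variants can be
           * deleted. */
          list_add_tail(&epi->rdllink, &rdllist);
          spin_unlock_irqrestore(&ep_lock, flags);
  }

  static void consumer_take(struct list_head *txlist)
  {
          spin_lock_irq(&ep_lock);
          list_splice_init(&rdllist, txlist);
          spin_unlock_irq(&ep_lock);
  }

With CONFIG_PREEMPT_RT=y, spinlock_t is a sleeping lock with priority inheritance, which is what resolves the inversion for the consumer.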
This reduces the event bandwidth, as the producers now have to contend with each other for the spinlock. According to the benchmark from https://github.com/rouming/test-tools/blob/master/stress-epoll.c:
On 12 x86 CPUs:
              Before     After      Diff
    threads   events/ms  events/ms
          8        7162       4956   -31%
         16        8733       5383   -38%
         32        7968       5572   -30%
         64       10652       5739   -46%
        128       11236       5931   -47%

On 4 riscv CPUs:
              Before     After      Diff
    threads   events/ms  events/ms
          8        2958       2833    -4%
         16        3323       3097    -7%
         32        3451       3240    -6%
         64        3554       3178   -11%
        128        3601       3235   -10%
Although the numbers look bad, it should be noted that this benchmark creates multiple threads that do nothing except constantly generate new epoll events, so contention on the spinlock is high. For real workloads, the event rate is likely much lower, and the performance drop is not as bad.
Using another benchmark (perf bench epoll wait) where spinlock contention is lower, an improvement is even observed on x86:
On 12 x86 CPUs:
    Before: Averaged 110279 operations/sec (+- 1.09%), total secs = 8
    After:  Averaged 114577 operations/sec (+- 2.25%), total secs = 8

On 4 riscv CPUs:
    Before: Averaged 175767 operations/sec (+- 0.62%), total secs = 8
    After:  Averaged 167396 operations/sec (+- 0.23%), total secs = 8
In conclusion, no one is likely to be upset over this change. After all, a spinlock was used originally for years, and the commit that converted it to an rwlock didn't mention a real workload, only that the benchmark numbers were nice.
This patch is not exactly a revert of commit a218cc491420 ("epoll: use rwlock in order to reduce ep_poll_callback() contention"), because a git revert conflicts in some places where the resolution is not obvious. This patch is intended to be backported; therefore it goes with the obvious approach:
- Replace rwlock_t with spinlock_t one to one.
- Delete list_add_tail_lockless() and chain_epi_lockless(). These were introduced to allow producers to concurrently add items to the list. Now that the spinlock no longer allows producers to touch the event list concurrently, these two functions are no longer necessary (a sketch of the technique they implemented follows below).
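For reference, a lockless tail-add of this kind is typically built on cmpxchg()/xchg(), roughly along the following lines. This is a simplified illustration of the technique, not necessarily the exact code being deleted:

  /* Precondition: 'new' is self-linked (new->next == new) while it
   * is off the list. */
  static inline bool list_add_tail_lockless(struct list_head *new,
                                            struct list_head *head)
  {
          struct list_head *prev;

          /* Claim 'new' first: if two CPUs race to add the same
           * element, only the one that swaps new->next from 'new'
           * to 'head' wins. */
          if (cmpxchg(&new->next, new, head) != new)
                  return false;

          /* Atomically make 'new' the tail, then stitch up the
           * links. */
          prev = xchg(&head->prev, new);
          WRITE_ONCE(prev->next, new);
          WRITE_ONCE(new->prev, prev);

          return true;
  }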
Fixes: a218cc491420 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
I forgot to add:
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Closes: https://lore.kernel.org/linux-rt-users/20210825132754.GA895675@lothringen/ [1]
Reported-by: Valentin Schneider <vschneid@redhat.com>
Closes: https://lore.kernel.org/linux-rt-users/xhsmhttqvnall.mognet@vschneid.remote.... [2]
Christian, do you mind adding those for me, if/when you apply the patch?
Nam