Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible

11 Sep 2023

* Paul E. McKenney <paulmck@kernel.org> [230906 14:03]:
...
On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
...

* Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:

...
On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
...
(Adding Paul & Shanker to the Cc list; please see below for why.)
Apologies for the late response, I was away and have been struggling to
get a working PPC32 test environment.

* Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:

...
Hi Liam,
On Fri, 18 Aug 2023, Liam R. Howlett wrote:
...
The current implementation of append may cause duplicate data and/or
incorrect ranges to be returned to a reader during an update.  Although
this has not been reported or seen, disable the append write operation
while the tree is in rcu mode out of an abundance of caution.
...
...
...
...
...
...
...
...
...
RCU-related configs:
$ grep RCU .config
# RCU Subsystem
CONFIG_TINY_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_TINY_SRCU=y
# end of RCU Subsystem
# RCU Debugging
# CONFIG_RCU_SCALE_TEST is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_REF_SCALE_TEST is not set
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set
# end of RCU Debugging

I used the configuration from Debian 8 and ran 'make oldconfig' to build
my kernel.  I have attached the configuration.
...
...
...
...
...
It appears to be something to do with struct maple_tree sparse_irqs.  If
you drop the rcu flag from that maple tree, then my configuration boots
without the warning.
I *think* this is because we will reuse a lot more nodes.  And I *think*
the rcu flag is not needed, since there is a comment about reading the
tree being protected by the mutex sparse_irq_lock within the
kernel/irq/irqdesc.c file.  Shanker, can you comment on that?
I wonder if there is a limit to the number of RCU free events before
something is triggered to flush them out which could trigger IRQ
enabling? Paul, could this be the case?
Are you asking if call_rcu() will re-enable interrupts in the following
use case?
    local_irq_disable();
    call_rcu(&p->rh, my_cb_func);
    local_irq_enable();
I am not.
...
...
...
...
Or am I missing your point?
This is very early in the boot sequence when interrupts have not been
enabled.  What we are seeing is a WARN_ON() that is triggered by
interrupts being enabled before they should be enabled.
I was wondering if, for example, I called call_rcu() a lot *before*
interrupts were enabled, that something could trigger that would either
enable interrupts or indicate the task needs rescheduling?
You aren't doing call_rcu() enough to hit OOM, are you?  The actual RCU
callback invocations won't happen until some time after the scheduler
starts up.
I am not, it's just a detection of IRQs being enabled early.
...
...
Specifically the rescheduling part is suspect.  I tracked down the call
to a mutex_lock() which calls cond_resched(), so could rcu be
'encouraging' the rcu window by a reschedule request?
During boot before interrupts are enabled, RCU has not yet spawned any of
its kthreads.  Therefore, all of its attempts to do wakeups would notice
a NULL task_struct pointer and refrain from actually doing the wakeup.
If it did do the wakeup, you would see a NULL-pointer exception.  See
for example, invoke_rcu_core_kthread(), though that won't happen unless
you booted with rcutree.use_softirq=0.
Besides, since when did doing a wakeup enable interrupts?  That would
make it hard to do wakeups from hardware interrupt handlers, no?
Taking the mutex lock in kernel/irq/manage.c __setup_irq() is calling a
cond_resched().
From what Michael said [1] in this thread, since something has already
set TIF_NEED_RESCHED, it will eventually enable interrupts on us.
I've traced this to running call_rcu() in kernel/rcu/tiny.c and
is_idle_task(current) is true, which means rcu runs:
    /* force scheduling for rcu_qs() */
    resched_cpu(0);
The task is set idle in sched_init() -> init_idle() and is never changed,
afaict.
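
A minimal userspace sketch of the sequence described above, for readers
following along.  It is a model only, not kernel code: every name ending
in _model, the boot_state struct, and early_boot_enables_irqs() are
hypothetical, and the step where a pending reschedule ends with
interrupts enabled is simply assumed from what was reported in this
thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; names echo the kernel but nothing here is kernel code. */
struct rcu_head_model {
	struct rcu_head_model *next;
	void (*func)(struct rcu_head_model *head);
};

struct boot_state {
	bool current_is_idle;	/* stands in for is_idle_task(current) */
	bool need_resched;	/* stands in for TIF_NEED_RESCHED */
	bool irqs_enabled;	/* should stay false this early in boot */
	struct rcu_head_model *cb_list;
};

/* Queue a callback; if the caller is the idle task, request a reschedule. */
static void call_rcu_model(struct boot_state *s, struct rcu_head_model *head,
			   void (*func)(struct rcu_head_model *head))
{
	head->func = func;
	head->next = s->cb_list;	/* simplified: push onto the front */
	s->cb_list = head;

	if (s->current_is_idle)
		s->need_resched = true;	/* models resched_cpu(0) */
}

/*
 * Models cond_resched(): if a reschedule is pending, the scheduler runs
 * and, by assumption in this model, interrupts end up enabled.
 */
static void cond_resched_model(struct boot_state *s)
{
	if (s->need_resched) {
		s->need_resched = false;
		s->irqs_enabled = true;
	}
}

/* Returns true if interrupts ended up enabled after the modelled sequence. */
static bool early_boot_enables_irqs(bool tree_uses_rcu)
{
	struct boot_state s = { .current_is_idle = true };
	struct rcu_head_model node = { 0 };

	if (tree_uses_rcu)
		call_rcu_model(&s, &node, NULL);	/* maple tree frees a node */

	cond_resched_model(&s);	/* reached via mutex_lock() in __setup_irq() */
	return s.irqs_enabled;
}
```

In this model, early_boot_enables_irqs(true) reports interrupts enabled
and early_boot_enables_irqs(false) does not, which matches the
observation that dropping the RCU flag from the tree hides the warning.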
Removing the RCU option from the maple tree in kernel/irq/irqdesc.c
fixes the issue by avoiding the maple tree running call_rcu().  I am not
sure about the locking of the tree, so I feel this change may cause other
issues...also it's before lockdep_init(), so any issue I introduce may
not be detected.
When CONFIG_DEBUG_ATOMIC_SLEEP is configured, it seems that rcu does the
same thing, but the IRQs are not enabled on return.  So, resched_cpu(0)
is called, but the warning about IRQs being enabled isn't triggered.  I failed to
find a reason why.
I am not entirely sure what makes ppc32 different from other platforms
in that the initial task is configured as an idle task and the first
call to call_rcu() (tiny!) would cause the observed behaviour.
The non-tiny call_rcu() (as I am sure you know, but others may not) is in
kernel/rcu/tree.c and in turn calls __call_rcu_common().  That
function is far more complex than the tiny version.  Maybe it's part of
why we see different behaviour based on platforms?  I don't see an idle
check in that version of call_rcu().
Or maybe PPC32 has something set incorrectly to cause this failure in
early boot and I've just found something that needs to be set
differently?
...
But why not put some WARN_ON_ONCE(!irqs_disabled()) calls in the areas
of greatest suspicion, starting from the stack trace generated by that
mutex_lock()?  A stray interrupt-enable could be pretty much anywhere.
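
As a rough illustration of that bisection approach, here is a userspace
sketch.  In the kernel the check would be WARN_ON_ONCE(!irqs_disabled());
everything below (CHECK_IRQS_OFF_ONCE, the step_* functions, and the
choice of step_lock() as the culprit) is hypothetical and only shows how
once-only checkpoints narrow down the first place the state went wrong.

```c
#include <stdbool.h>
#include <stdio.h>

static bool irqs_on;			/* stand-in for the interrupt-enable state */
static int first_bad_checkpoint;	/* 0 means no checkpoint has fired */

/*
 * Userspace stand-in for WARN_ON_ONCE(!irqs_disabled()).  Each use site
 * gets its own static flag, so each site warns at most once.
 */
#define CHECK_IRQS_OFF_ONCE(id)						\
	do {								\
		static bool warned;					\
		if (irqs_on && !warned) {				\
			warned = true;					\
			if (!first_bad_checkpoint)			\
				first_bad_checkpoint = (id);		\
			fprintf(stderr,					\
				"WARNING: IRQs enabled at checkpoint %d\n", \
				(id));					\
		}							\
	} while (0)

/* Hypothetical steps along the suspect call path. */
static void step_alloc(void) { }
static void step_lock(void) { irqs_on = true; }	/* pretend culprit */
static void step_register(void) { }

/* Walk the path with a checkpoint after each step; return the first that fired. */
static int run_instrumented_path(void)
{
	irqs_on = false;
	first_bad_checkpoint = 0;

	step_alloc();
	CHECK_IRQS_OFF_ONCE(1);
	step_lock();
	CHECK_IRQS_OFF_ONCE(2);
	step_register();
	CHECK_IRQS_OFF_ONCE(3);

	return first_bad_checkpoint;
}
```

The first checkpoint to fire brackets the offending step; later
checkpoints fire too, since the state stays wrong, so only the earliest
one is interesting.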
But where are those call_rcu() invocations?  Before rcu_init()?
During init_IRQ(), which is after rcu_init() but before rcu_init_nohz(),
srcu_init(), and softirq_init() in init/main.c start_kernel().
...
Presumably before init is spawned and the early_init() calls.
And what is the RCU-related Kconfig and boot-parameter setup?
The .config was attached to the email I sent, and it matches what was
quoted above in the "RCU-related configs" section.
[1] https://lore.kernel.org/linux-mm/87v8cv22jh.fsf@mail.lhotse/
