Le 12/09/2023 à 17:08, Paul E. McKenney a écrit :
On Tue, Sep 12, 2023 at 10:29:30AM -0400, Liam R. Howlett wrote:
- Liam R. Howlett Liam.Howlett@Oracle.com [230912 09:56]:
- Paul E. McKenney paulmck@kernel.org [230912 06:00]:
On Tue, Sep 12, 2023 at 10:34:44AM +0200, Geert Uytterhoeven wrote:
Hi Paul,
On Tue, Sep 12, 2023 at 10:30 AM Paul E. McKenney paulmck@kernel.org wrote:
On Tue, Sep 12, 2023 at 10:23:37AM +0200, Geert Uytterhoeven wrote: > On Tue, Sep 12, 2023 at 10:14 AM Paul E. McKenney paulmck@kernel.org wrote: >> On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote: >>> * Paul E. McKenney paulmck@kernel.org [230906 14:03]: >>>> On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote: >>>>> * Paul E. McKenney paulmck@kernel.org [230906 13:24]: >>>>>> On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote: >>>>>>> (Adding Paul & Shanker to Cc list.. please see below for why) >>>>>>> >>>>>>> Apologies on the late response, I was away and have been struggling to >>>>>>> get a working PPC32 test environment. >>>>>>> >>>>>>> * Geert Uytterhoeven geert@linux-m68k.org [230829 12:42]: >>>>>>>> Hi Liam, >>>>>>>> >>>>>>>> On Fri, 18 Aug 2023, Liam R. Howlett wrote: >>>>>>>>> The current implementation of append may cause duplicate data and/or >>>>>>>>> incorrect ranges to be returned to a reader during an update. Although >>>>>>>>> this has not been reported or seen, disable the append write operation >>>>>>>>> while the tree is in rcu mode out of an abundance of caution. >>>>>>> >>>>>>> ... >>>>>>>>> >>> >>> ... >>> >>>>>>>> RCU-related configs: >>>>>>>> >>>>>>>> $ grep RCU .config >>>>>>>> # RCU Subsystem >>>>>>>> CONFIG_TINY_RCU=y >> >> I must have been asleep last time I looked at this. I was looking at >> Tree RCU. Please accept my apologies for my lapse. :-/ >> >> However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would >> have said the same thing, albeit after looking at a lot less RCU code. >> >> TL;DR: >> >> 1. Try making the __setup_irq() function's call to mutex_lock() >> instead be as follows: >> >> if (!mutex_trylock(&desc->request_mutex)) >> mutex_lock(&desc->request_mutex); >> >> This might fail if __setup_irq() has other dependencies on a >> fully operational scheduler.
This changes where the interrupts become enabled, but doesn't stop it from happening. It still throws a WARN after init_IRQ(). I suspect it is not the way to proceed as there are probably many places that will need to be changed in both common and arch specific code. As soon as TIF_NEED_RESCHED is set, then all the checks will need to be altered.
Thank you for trying it!
I think we either need to set the boot thread to !idle, avoid call_rcu() to set TIF_NEED_RESCHED (how was this working before? Do we need rcu for the IRQs?), or alter the boot order (note this is NOT arch or platform code here).
I don't like any of these. I'd like another option, please?
My favorite is to move the interrupt enabling later, but Michael Ellerman would know better than would I about the feasibility of this.
I digged into it a bit more, looks like IRQs get enabled by the call to cond_resched() in the loop in vm_area_alloc_pages(), which is called from powerpc's init_IRQ() fonction when allocating IRQ stacks.
And IRQ stacks definitely need to be enabled before IRQs get enabled, so there's something wrong here isn't it ?
Christophe