From: Andy Lutomirski luto@kernel.org
We don't need spinlocks to protect batched entropy -- all we need is a little bit of care. This should fix up the following splat that Jonathan received with a PROVE_LOCKING=y/PROVE_RAW_LOCK_NESTING=y kernel:
[ 2.500000] [ BUG: Invalid wait context ]
[ 2.500000] 5.17.0-rc1 #563 Not tainted
[ 2.500000] -----------------------------
[ 2.500000] swapper/1 is trying to lock:
[ 2.500000] c0b0e9cc (batched_entropy_u32.lock){....}-{3:3}, at: invalidate_batched_entropy+0x18/0x4c
[ 2.500000] other info that might help us debug this:
[ 2.500000] context-{2:2}
[ 2.500000] 3 locks held by swapper/1:
[ 2.500000] #0: c0ae86ac (event_mutex){+.+.}-{4:4}, at: event_trace_init+0x4c/0xd8
[ 2.500000] #1: c0ae81b8 (trace_event_sem){+.+.}-{4:4}, at: event_trace_init+0x68/0xd8
[ 2.500000] #2: c19b05cc (&sb->s_type->i_mutex_key#2){+.+.}-{4:4}, at: start_creating+0x40/0xc4
[ 2.500000] stack backtrace:
[ 2.500000] CPU: 0 PID: 1 Comm: swapper Not tainted 5.17.0-rc1 #563
[ 2.500000] Hardware name: WPCM450 chip
[ 2.500000] [<c00100a8>] (unwind_backtrace) from [<c000db2c>] (show_stack+0x10/0x14)
[ 2.500000] [<c000db2c>] (show_stack) from [<c0054eec>] (__lock_acquire+0x3f0/0x189c)
[ 2.500000] [<c0054eec>] (__lock_acquire) from [<c0054478>] (lock_acquire+0x2b8/0x354)
[ 2.500000] [<c0054478>] (lock_acquire) from [<c0568028>] (_raw_spin_lock_irqsave+0x60/0x74)
[ 2.500000] [<c0568028>] (_raw_spin_lock_irqsave) from [<c030b6f4>] (invalidate_batched_entropy+0x18/0x4c)
[ 2.500000] [<c030b6f4>] (invalidate_batched_entropy) from [<c030e7fc>] (crng_fast_load+0xf0/0x110)
[ 2.500000] [<c030e7fc>] (crng_fast_load) from [<c030e954>] (add_interrupt_randomness+0x138/0x200)
[ 2.500000] [<c030e954>] (add_interrupt_randomness) from [<c0061b34>] (handle_irq_event_percpu+0x18/0x38)
[ 2.500000] [<c0061b34>] (handle_irq_event_percpu) from [<c0061b8c>] (handle_irq_event+0x38/0x5c)
[ 2.500000] [<c0061b8c>] (handle_irq_event) from [<c0065b28>] (handle_fasteoi_irq+0x9c/0x114)
[ 2.500000] [<c0065b28>] (handle_fasteoi_irq) from [<c0061178>] (handle_irq_desc+0x24/0x34)
[ 2.500000] [<c0061178>] (handle_irq_desc) from [<c056214c>] (generic_handle_arch_irq+0x28/0x3c)
[ 2.500000] [<c056214c>] (generic_handle_arch_irq) from [<c0008eb4>] (__irq_svc+0x54/0x80)
[ 2.500000] Exception stack(0xc1485d48 to 0xc1485d90)
[ 2.500000] 5d40: 9780e804 00000001 c09413d4 200000d3 60000053 c016af54
[ 2.500000] 5d60: 00000000 c0afa5b8 c14194e0 c19a1d48 c0789ce0 00000000 c1490480 c1485d98
[ 2.500000] 5d80: c0168970 c0168984 20000053 ffffffff
[ 2.500000] [<c0008eb4>] (__irq_svc) from [<c0168984>] (read_seqbegin.constprop.0+0x6c/0x90)
[ 2.500000] [<c0168984>] (read_seqbegin.constprop.0) from [<c016af54>] (d_lookup+0x14/0x40)
[ 2.500000] [<c016af54>] (d_lookup) from [<c015cecc>] (lookup_dcache+0x18/0x50)
[ 2.500000] [<c015cecc>] (lookup_dcache) from [<c015d868>] (lookup_one_len+0x90/0xe0)
[ 2.500000] [<c015d868>] (lookup_one_len) from [<c01e33e4>] (start_creating+0x68/0xc4)
[ 2.500000] [<c01e33e4>] (start_creating) from [<c01e398c>] (tracefs_create_file+0x30/0x11c)
[ 2.500000] [<c01e398c>] (tracefs_create_file) from [<c00c42f8>] (trace_create_file+0x14/0x38)
[ 2.500000] [<c00c42f8>] (trace_create_file) from [<c00cc854>] (event_create_dir+0x310/0x420)
[ 2.500000] [<c00cc854>] (event_create_dir) from [<c00cc9d8>] (__trace_early_add_event_dirs+0x28/0x50)
[ 2.500000] [<c00cc9d8>] (__trace_early_add_event_dirs) from [<c07c8d64>] (event_trace_init+0x70/0xd8)
[ 2.500000] [<c07c8d64>] (event_trace_init) from [<c07c8560>] (tracer_init_tracefs+0x14/0x284)
[ 2.500000] [<c07c8560>] (tracer_init_tracefs) from [<c000a330>] (do_one_initcall+0xdc/0x288)
[ 2.500000] [<c000a330>] (do_one_initcall) from [<c07bd1e8>] (kernel_init_freeable+0x1c4/0x20c)
[ 2.500000] [<c07bd1e8>] (kernel_init_freeable) from [<c05629c0>] (kernel_init+0x10/0x110)
[ 2.500000] [<c05629c0>] (kernel_init) from [<c00084f8>] (ret_from_fork+0x14/0x3c)
[ 2.500000] Exception stack(0xc1485fb0 to 0xc1485ff8)
[ 2.500000] 5fa0: 00000000 00000000 00000000 00000000
[ 2.500000] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 2.500000] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
Signed-off-by: Andy Lutomirski luto@kernel.org
[Jason: I extracted this from a larger in-progress series of Andy's that also unifies the two batches into one and does other performance things. Since that's still under development, and because we need this part to fix the CONFIG_PROVE_RAW_LOCK_NESTING issue, I've extracted it out and applied it to the current setup. This will also make it easier to backport to old kernels that also need the fix. I've also amended Andy's original commit message.]
Reported-by: Jonathan Neuschäfer j.neuschaefer@gmx.net
Link: https://lore.kernel.org/lkml/YfMa0QgsjCVdRAvJ@latitude/
Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
Cc: Sebastian Andrzej Siewior bigeasy@linutronix.de
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com
---
Andy - could you take a look at this and let me know if it's still correct after I've ported it out of your series and into a standalone thing here? I'd prefer to hold off on moving forward on this until I receive your green light. I'm also still a bit uncertain about your NB: comment regarding the acceptable race. If you could elaborate on that argument, it might save me a few cycles with my thinking cap on.
 drivers/char/random.c | 57 ++++++++++++++++++++++++-------------------
 1 file changed, 32 insertions(+), 25 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index b411182df6f6..22c190aecbe8 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2063,7 +2063,6 @@ struct batched_entropy {
 		u32 entropy_u32[CHACHA_BLOCK_SIZE / sizeof(u32)];
 	};
 	unsigned int position;
-	spinlock_t batch_lock;
 };
 
 /*
@@ -2075,7 +2074,7 @@ struct batched_entropy {
  * point prior.
  */
 static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64) = {
-	.batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u64.lock),
+	.position = ARRAY_SIZE(((struct batched_entropy *)0)->entropy_u64)
 };
 
 u64 get_random_u64(void)
@@ -2083,42 +2082,55 @@ u64 get_random_u64(void)
 	u64 ret;
 	unsigned long flags;
 	struct batched_entropy *batch;
+	size_t position;
 	static void *previous;
 
 	warn_unseeded_randomness(&previous);
 
-	batch = raw_cpu_ptr(&batched_entropy_u64);
-	spin_lock_irqsave(&batch->batch_lock, flags);
-	if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
+	local_irq_save(flags);
+	batch = this_cpu_ptr(&batched_entropy_u64);
+	position = READ_ONCE(batch->position);
+	/* NB: position can change to ARRAY_SIZE(batch->entropy_u64) out
+	 * from under us -- see invalidate_batched_entropy(). If this
+	 * happens it's okay if we still return the data in the batch. */
+	if (unlikely(position + 1 > ARRAY_SIZE(batch->entropy_u64))) {
 		extract_crng((u8 *)batch->entropy_u64);
-		batch->position = 0;
+		position = 0;
 	}
-	ret = batch->entropy_u64[batch->position++];
-	spin_unlock_irqrestore(&batch->batch_lock, flags);
+	ret = batch->entropy_u64[position++];
+	WRITE_ONCE(batch->position, position);
+	local_irq_restore(flags);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_u64);
 
 static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u32) = {
-	.batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u32.lock),
+	.position = ARRAY_SIZE(((struct batched_entropy *)0)->entropy_u32)
 };
+
 u32 get_random_u32(void)
 {
 	u32 ret;
 	unsigned long flags;
 	struct batched_entropy *batch;
+	size_t position;
 	static void *previous;
 
 	warn_unseeded_randomness(&previous);
 
-	batch = raw_cpu_ptr(&batched_entropy_u32);
-	spin_lock_irqsave(&batch->batch_lock, flags);
-	if (batch->position % ARRAY_SIZE(batch->entropy_u32) == 0) {
+	local_irq_save(flags);
+	batch = this_cpu_ptr(&batched_entropy_u32);
+	position = READ_ONCE(batch->position);
+	/* NB: position can change to ARRAY_SIZE(batch->entropy_u32) out
+	 * from under us -- see invalidate_batched_entropy(). If this
+	 * happens it's okay if we still return the data in the batch. */
+	if (unlikely(position + 1 > ARRAY_SIZE(batch->entropy_u32))) {
 		extract_crng((u8 *)batch->entropy_u32);
-		batch->position = 0;
+		position = 0;
 	}
-	ret = batch->entropy_u32[batch->position++];
-	spin_unlock_irqrestore(&batch->batch_lock, flags);
+	ret = batch->entropy_u64[position++];
+	WRITE_ONCE(batch->position, position);
+	local_irq_restore(flags);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_u32);
@@ -2130,20 +2142,15 @@ EXPORT_SYMBOL(get_random_u32);
 static void invalidate_batched_entropy(void)
 {
 	int cpu;
-	unsigned long flags;
 
 	for_each_possible_cpu(cpu) {
-		struct batched_entropy *batched_entropy;
+		struct batched_entropy *batch;
 
-		batched_entropy = per_cpu_ptr(&batched_entropy_u32, cpu);
-		spin_lock_irqsave(&batched_entropy->batch_lock, flags);
-		batched_entropy->position = 0;
-		spin_unlock(&batched_entropy->batch_lock);
+		batch = per_cpu_ptr(&batched_entropy_u32, cpu);
+		WRITE_ONCE(batch->position, ARRAY_SIZE(batch->entropy_u32));
 
-		batched_entropy = per_cpu_ptr(&batched_entropy_u64, cpu);
-		spin_lock(&batched_entropy->batch_lock);
-		batched_entropy->position = 0;
-		spin_unlock_irqrestore(&batched_entropy->batch_lock, flags);
+		batch = per_cpu_ptr(&batched_entropy_u64, cpu);
+		WRITE_ONCE(batch->position, ARRAY_SIZE(batch->entropy_u64));
 	}
 }
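The lockless scheme above is small enough to model outside the kernel. Below is a minimal standalone sketch (an illustration only, not kernel code: READ_ONCE()/WRITE_ONCE() are stubbed with volatile accesses, extract_crng() is replaced by a fake refill, and the per-CPU handling and IRQ disabling are omitted) of how the position sentinel replaces the lock:

	#include <stdio.h>

	/* Userspace stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(). */
	#define READ_ONCE(x)		(*(volatile typeof(x) *)&(x))
	#define WRITE_ONCE(x, val)	(*(volatile typeof(x) *)&(x) = (val))

	#define BATCH_WORDS 8	/* stands in for ARRAY_SIZE(entropy_u64) */

	struct batch {
		unsigned long long words[BATCH_WORDS];
		unsigned int position;	/* == BATCH_WORDS means "empty, refill" */
	};

	static unsigned long long counter;

	/* Stand-in for extract_crng(): refill the batch with fresh (fake) data. */
	static void refill(struct batch *b)
	{
		for (int i = 0; i < BATCH_WORDS; i++)
			b->words[i] = ++counter;
	}

	static unsigned long long get_random(struct batch *b)
	{
		unsigned int position = READ_ONCE(b->position);

		/* A concurrent invalidate() may set position to BATCH_WORDS
		 * right after this read; we then keep handing out words from
		 * the old batch until it runs dry -- the accepted race. */
		if (position + 1 > BATCH_WORDS) {
			refill(b);
			position = 0;
		}
		unsigned long long ret = b->words[position++];
		WRITE_ONCE(b->position, position);
		return ret;
	}

	static void invalidate(struct batch *b)
	{
		/* No lock taken: just force the next get_random() to refill. */
		WRITE_ONCE(b->position, BATCH_WORDS);
	}

	int main(void)
	{
		struct batch b = { .position = BATCH_WORDS };	/* starts "empty" */

		printf("%llu\n", get_random(&b));	/* triggers first refill */
		printf("%llu\n", get_random(&b));
		invalidate(&b);
		printf("%llu\n", get_random(&b));	/* refilled again */
		return 0;
	}

Invalidation never touches the data, only the position sentinel; the consumer may still serve words from the old batch in the race window, which is exactly the acceptable race the NB: comment is about.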
On 2022-01-28 16:33:44 [+0100], Jason A. Donenfeld wrote:
From: Andy Lutomirski luto@kernel.org
We don't need spinlocks to protect batched entropy -- all we need is a little bit of care. This should fix up the following splat that Jonathan received with a PROVE_LOCKING=y/PROVE_RAW_LOCK_NESTING=y kernel:
NO. Could we please look at my RANDOM patches first? This affects PREEMPT_RT. There is no need to stuff this in and tag it stable.
I can repost my rebased patches if there is no objection. This patch invokes extract_crng() with interrupts disabled, so we haven't gained anything IMHO.
Sebastian
Hi Sebastian,
On Fri, Jan 28, 2022 at 4:44 PM Sebastian Andrzej Siewior bigeasy@linutronix.de wrote:
NO. Could we please look at my RANDOM patches first? I can repost my rebased patches if there is no objection.
I did, and my reply is here: https://lore.kernel.org/lkml/CAHmME9pzdXyD0oRYyCoVUSqqsA9h03-oR7kcNhJuPEcEMT...
I was hoping for a series that addresses these issues. As I mentioned before, I'm not super keen on deferring that processing in a conditional case and having multiple entry ways into that same functionality. I don't think that's worth it, especially if your actual concern is just userspace calling RNDADDTOENTCNT too often (which can be safely ratelimited). I don't think that thread needs to spill over here, though, so feel free to follow up with a v+1 on that series and I'll happily take a look. Alternatively, if you'd like to approach this by providing a patch for Jonathan's issue, that makes sense too. So far, the things in front of me are: 1) your patchset from last month that has unresolved issues, and 2) Andy's thing, which maybe will solve the problem (or it won't?). A third alternative from you would be most welcome too.
Jason
On 2022-01-28 16:54:06 [+0100], Jason A. Donenfeld wrote:
Hi Sebastian,
Hi Jason,
On Fri, Jan 28, 2022 at 4:44 PM Sebastian Andrzej Siewior bigeasy@linutronix.de wrote:
NO. Could we please look at my RANDOM patches first? I can repost my rebased patches if there is no objection.
I did, and my reply is here: https://lore.kernel.org/lkml/CAHmME9pzdXyD0oRYyCoVUSqqsA9h03-oR7kcNhJuPEcEMT...
I was hoping for a series that addresses these issues. As I mentioned before, I'm not super keen on deferring that processing in a conditional case and having multiple entry ways into that same functionality. I don't think that's worth it, especially if your actual concern is just userspace calling RNDADDTOENTCNT too often (which can be safely ratelimited). I don't think that thread needs to
And what do you do with ratelimiting? As I explained, you get 20 that "enter" and the following are blocked. The first 20 are already problematic, and you need a plan B for those that can't enter. So I suggested a mutex_t around the ioctl(), which would act as rate limiting. You did not follow up on that idea.
spill over here, though, so feel free to follow up with a v+1 on that series and I'll happily take a look. Alternatively, if you'd like to approach this by providing a patch for Jonathan's issue, that makes sense too. So far, the things in front of me are: 1) your patchset from last month that has unresolved issues, and 2) Andy's thing, which maybe will solve the problem (or it won't?). A third alternative from you would be most welcome too.
I made a reply yesterday, I think, with some numbers. From my point of view it is in-IRQ code that can be avoided. RNDADDTOENTCNT is a simple way to hammer on the lock and see how bad it gets. Things like add_hwgenerator_randomness() don't appear so often, so it is hard to figure out what the worst case can be.
Please ignore Jonathan's report for now. As I tried to explain: this lockdep report shows a serious problem on PREEMPT_RT. There is _no_ need to be concerned on a non-PREEMPT_RT kernel, but it should be addressed. If this gets merged as-is then, thanks to the stable tag, it will get backported (again, no change for !RT) and will collide with the PREEMPT_RT patch. And as I mentioned, the locking is not working on PREEMPT_RT.
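To spell the RT problem out (roughly -- the chain below is lifted from Jonathan's splat): with CONFIG_PREEMPT_RT, spinlock_t is backed by a sleeping rtmutex, so this path acquires a sleeping lock in hard-IRQ context:

	add_interrupt_randomness()                 /* hard IRQ, context-{2:2} */
	  crng_fast_load()
	    invalidate_batched_entropy()
	      spin_lock_irqsave(&batch_lock, ...)  /* spinlock_t, {3:3}: a sleeping
	                                              lock on RT -- invalid nesting */

On !PREEMPT_RT the same spinlock_t really spins, which is why nothing goes wrong there; the splat only shows up once PROVE_RAW_LOCK_NESTING=y models the RT rules.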
Jason
Sebastian
Hi Sebastian,
I wrote in my last message, "I don't think that thread needs to spill over here, though," but clearly you disagreed, which is fine I guess. Replies inline below:
On Fri, Jan 28, 2022 at 5:15 PM Sebastian Andrzej Siewior bigeasy@linutronix.de wrote:
I did, and my reply is here: https://lore.kernel.org/lkml/CAHmME9pzdXyD0oRYyCoVUSqqsA9h03-oR7kcNhJuPEcEMT...
I was hoping for a series that addresses these issues. As I mentioned before, I'm not super keen on deferring that processing in a conditional case and having multiple entry ways into that same functionality. I don't think that's worth it, especially if your actual concern is just userspace calling RNDADDTOENTCNT too often (which can be safely ratelimited). I don't think that thread needs to
And what do you do in ratelimiting?
If I'm understanding the RT issue correctly, the problem is that userspace can `for (;;) ioctl(...);`, and the high frequency of ioctls then increases the average latency of interrupts to a level beyond the requirements for RT. The idea of ratelimiting the ioctl would be to throttle userspace from calling it too often, so that the average latency isn't increased.
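As a concrete sketch of what I mean by throttling (hypothetical: rnd_write_rs is a made-up name, the one-second/20-call budget is invented, and the real random_ioctl() body is elided), the kernel's ratelimit helper could gate the entropy-writing commands:

	#include <linux/ratelimit.h>

	/* Invented numbers: allow at most 20 entropy-writing ioctls per second. */
	static DEFINE_RATELIMIT_STATE(rnd_write_rs, HZ, 20);

	static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
	{
		/* Refuse a tight for (;;) ioctl() loop before it ever
		 * reaches the input pool lock. */
		if ((cmd == RNDADDTOENTCNT || cmd == RNDADDENTROPY) &&
		    !__ratelimit(&rnd_write_rs))
			return -EAGAIN;

		/* ... existing random_ioctl() body ... */
		return 0;
	}

Userspace that hits the limit gets -EAGAIN instead of adding lock contention, so the interrupt path's worst case stays bounded.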
As I explained, you get 20 that "enter" and the following are blocked. The first 20 are already problematic, and you need a plan B for those that can't enter. So I suggested a mutex_t around the ioctl(), which would act as rate limiting. You did not follow up on that idea.
A mutex_t would be fine I think? I'd like to see what this looks like in code, but conceptually I don't see why not.
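Something like this hypothetical shape is what I'd picture (random_ioctl_locked() is a stand-in for the current ioctl body, not a real function):

	static DEFINE_MUTEX(random_write_mutex);

	static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
	{
		long ret;

		/* Serialize the ioctl: a mutex sleeps, so on PREEMPT_RT
		 * concurrent callers queue here instead of hammering the
		 * pool spinlock from many CPUs at once. */
		mutex_lock(&random_write_mutex);
		ret = random_ioctl_locked(f, cmd, arg);
		mutex_unlock(&random_write_mutex);
		return ret;
	}

The hold time of the mutex then naturally bounds how often any one caller can reach the underlying lock.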
Please ignore Jonathan's report for now. As I tried to explain: this lockdep report shows a serious problem on PREEMPT_RT. There is _no_ need to be concerned on a non-PREEMPT_RT kernel, but it should be addressed. If this gets merged as-is then, thanks to the stable tag, it will get backported (again, no change for !RT) and will collide with the PREEMPT_RT patch. And as I mentioned, the locking is not working on PREEMPT_RT.
Gotcha, okay, that makes sense. It sounds like Andy's patch and your patch might be two sides of the same non-stable-marked coin for cutting down on locks in the IRQ path.
[Relatedly, I've been doing a bit of research on other ways to cut down the amount of processing we're doing in the IRQ path, such as https://xn--4db.cc/K4zqXPh8/diff. This is really not ready to go, and I'm not ready to have a discussion on the crypto there (please, nobody comment on the crypto there yet; I'll be really annoyed), but the general gist is that I think it might be possible to reduce the number of cycles spent in IRQ with some nice new tricks.]
Jason
Hi Andy,
On Fri, Jan 28, 2022 at 4:34 PM Jason A. Donenfeld Jason@zx2c4.com wrote:
Andy - could you take a look at this and let me know if it's still correct after I've ported it out of your series and into a standalone thing here? I'd prefer to hold off on moving forward on this until I receive your green light. I'm also still a bit uncertain about your NB: comment regarding the acceptable race. If you could elaborate on that argument, it might save me a few cycles with my thinking cap on.
[...]
+	position = READ_ONCE(batch->position);
+	/* NB: position can change to ARRAY_SIZE(batch->entropy_u64) out
+	 * from under us -- see invalidate_batched_entropy(). If this
+	 * happens it's okay if we still return the data in the batch. */
+	if (unlikely(position + 1 > ARRAY_SIZE(batch->entropy_u64))) {
 		extract_crng((u8 *)batch->entropy_u64);
-		batch->position = 0;
+		position = 0;
So the argument for this is something along the lines of: if the pool is invalidated _after_ the position value is read, then in the worst case we read old data and set position to 1, so the remaining reads are _also_ of invalidated data, until the batch eventually depletes and we get newly extracted data? So this means that in some racy situations, we're lengthening the window during which get_random_u{32,64}() returns bad randomness, right?
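To spell out the interleaving I have in mind (my reading, so worth double-checking; assume ARRAY_SIZE(batch->entropy_u64) == 8 and position was 3):

	get_random_u64()                          invalidate_batched_entropy()
	----------------                          ----------------------------
	position = READ_ONCE(batch->position);    /* reads 3 */
	                                          WRITE_ONCE(batch->position, 8);
	ret = batch->entropy_u64[3];              /* pre-invalidation word */
	WRITE_ONCE(batch->position, 4);           /* clobbers the invalidation:
	                                             slots 4..7 also get served
	                                             from the stale batch */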
Couldn't that race be avoided entirely by something like the below?
Jason
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 50c50a59847e..664f1a7eb293 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2082,14 +2082,15 @@ u64 get_random_u64(void)
 	u64 ret;
 	unsigned long flags;
 	struct batched_entropy *batch;
-	unsigned int position;
+	unsigned int position, original;
 	static void *previous;
 
 	warn_unseeded_randomness(&previous);
 
 	local_irq_save(flags);
 	batch = this_cpu_ptr(&batched_entropy_u64);
-	position = READ_ONCE(batch->position);
+try_again:
+	original = position = READ_ONCE(batch->position);
 	/* NB: position can change to ARRAY_SIZE(batch->entropy_u64) out
 	 * from under us -- see invalidate_batched_entropy(). If this
 	 * happens it's okay if we still return the data in the batch. */
@@ -2098,6 +2099,8 @@ u64 get_random_u64(void)
 		position = 0;
 	}
 	ret = batch->entropy_u64[position++];
+	if (unlikely(cmpxchg(&batch->position, original, position) != original))
+		goto try_again;
 	WRITE_ONCE(batch->position, position);
 	local_irq_restore(flags);
 	return ret;
On Fri, Jan 28, 2022 at 04:33:44PM +0100, Jason A. Donenfeld wrote:
From: Andy Lutomirski luto@kernel.org
We don't need spinlocks to protect batched entropy -- all we need is a little bit of care. This should fix up the following splat that Jonathan received with a PROVE_LOCKING=y/PROVE_RAW_LOCK_NESTING=y kernel:
[...]
FWIW, this does fix the splat on my machine.
Tested-by: Jonathan Neuschäfer j.neuschaefer@gmx.net
Thanks everyone, Jonathan
Hey Jonathan,
On Fri, Jan 28, 2022 at 7:05 PM Jonathan Neuschäfer j.neuschaefer@gmx.net wrote:
FWIW, this does fix the splat on my machine.
Tested-by: Jonathan Neuschäfer j.neuschaefer@gmx.net
Thanks for testing. If it's not too much trouble, would you verify v2 as well? It's different enough from v1 that it doesn't seem right to carry your Tested-by through, and from talking a bit with Andy, I think we're more likely to go with a v2-like approach than a v1-like one.
Jason
Greetings,
FYI, we noticed the following commit (built with clang-14):
commit: 1e1724f9ddd1649555105fd31a8973e7a2e5466c ("[PATCH] random: remove batched entropy locking")
url: https://github.com/0day-ci/linux/commits/Jason-A-Donenfeld/random-remove-bat...
base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git 710f8af199ee9d72dd87083edd55c5ee250ee6f4
patch link: https://lore.kernel.org/lkml/20220128153344.34211-1-Jason@zx2c4.com
in testcase: boot
on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
+----------------------------------------------------------+------------+------------+
|                                                          | 710f8af199 | 1e1724f9dd |
+----------------------------------------------------------+------------+------------+
| UBSAN:array-index-out-of-bounds_in_drivers/char/random.c | 0          | 13         |
| BUG:KASAN:global-out-of-bounds_in_get_random_u32         | 0          | 13         |
+----------------------------------------------------------+------------+------------+
If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot oliver.sang@intel.com
[ 29.921782][ T1] UBSAN: array-index-out-of-bounds in drivers/char/random.c:2141:8
[ 29.923207][ T1] index 8 is out of range for type 'u64[8]' (aka 'unsigned long long[8]')
[ 29.923634][ T1] CPU: 0 PID: 1 Comm: swapper Not tainted 5.17.0-rc1-00010-g1e1724f9ddd1 #2 51d507a9ab4d92cb438b1c02ba5a02d8ac52cd1d
[ 29.923634][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 29.923634][ T1] Call Trace:
[ 29.923634][ T1]  <TASK>
[ 29.923634][ T1] dump_stack_lvl (??:?)
[ 29.923634][ T1] dump_stack (??:?)
[ 29.923634][ T1] __ubsan_handle_out_of_bounds (??:?)
[ 29.923634][ T1] get_random_u32 (??:?)
[ 29.923634][ T1] bucket_table_alloc (rhashtable.c:?)
[ 29.923634][ T1] rhashtable_init (??:?)
[ 29.923634][ T1] ? rcu_read_lock_sched_held (??:?)
[ 29.923634][ T1] ? bpf_iter_netlink (af_netlink.c:?)
[ 29.923634][ T1] netlink_proto_init (af_netlink.c:?)
[ 29.923634][ T1] do_one_initcall (??:?)
[ 29.923634][ T1] ? bpf_iter_netlink (af_netlink.c:?)
[ 29.923634][ T1] do_initcall_level (main.c:?)
[ 29.923634][ T1] do_initcalls (main.c:?)
[ 29.923634][ T1] do_basic_setup (main.c:?)
[ 29.923634][ T1] kernel_init_freeable (main.c:?)
[ 29.923634][ T1] ? rest_init (main.c:?)
[ 29.923634][ T1] kernel_init (main.c:?)
[ 29.923634][ T1] ? rest_init (main.c:?)
[ 29.923634][ T1] ret_from_fork (??:?)
[ 29.923634][ T1]  </TASK>
[ 29.923634][ T1] ================================================================================
[ 29.923718][ T1] ==================================================================
[ 29.924895][ T1] BUG: KASAN: global-out-of-bounds in get_random_u32 (??:?)
[ 29.926024][ T1] Read of size 8 at addr ffffffffb4fe94c0 by task swapper/1
[ 29.926967][ T1]
[ 29.926967][ T1] CPU: 0 PID: 1 Comm: swapper Not tainted 5.17.0-rc1-00010-g1e1724f9ddd1 #2 51d507a9ab4d92cb438b1c02ba5a02d8ac52cd1d
[ 29.926967][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 29.926967][ T1] Call Trace:
[ 29.926967][ T1]  <TASK>
[ 29.926967][ T1] dump_stack_lvl (??:?)
[ 29.926967][ T1] print_address_description (report.c:?)
[ 29.926967][ T1] __kasan_report (report.c:?)
[ 29.926967][ T1] ? get_random_u32 (??:?)
[ 29.926967][ T1] kasan_report (??:?)
[ 29.926967][ T1] __asan_report_load8_noabort (??:?)
[ 29.926967][ T1] get_random_u32 (??:?)
[ 29.926967][ T1] bucket_table_alloc (rhashtable.c:?)
[ 29.926967][ T1] rhashtable_init (??:?)
[ 29.926967][ T1] ? rcu_read_lock_sched_held (??:?)
[ 29.926967][ T1] ? bpf_iter_netlink (af_netlink.c:?)
[ 29.926967][ T1] netlink_proto_init (af_netlink.c:?)
[ 29.926967][ T1] do_one_initcall (??:?)
[ 29.926967][ T1] ? bpf_iter_netlink (af_netlink.c:?)
[ 29.926967][ T1] do_initcall_level (main.c:?)
[ 29.926967][ T1] do_initcalls (main.c:?)
[ 29.926967][ T1] do_basic_setup (main.c:?)
[ 29.926967][ T1] kernel_init_freeable (main.c:?)
[ 29.926967][ T1] ? rest_init (main.c:?)
[ 29.926967][ T1] kernel_init (main.c:?)
[ 29.926967][ T1] ? rest_init (main.c:?)
[ 29.926967][ T1] ret_from_fork (??:?)
[ 29.926967][ T1]  </TASK>
[ 29.926967][ T1]
[ 29.926967][ T1] The buggy address belongs to the variable:
[ 29.926967][ T1] random_write_wakeup_bits+0x0/0x20
[ 29.926967][ T1]
[ 29.926967][ T1] Memory state around the buggy address:
[ 29.926967][ T1] ffffffffb4fe9380: 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9
[ 29.926967][ T1] ffffffffb4fe9400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 29.926967][ T1] >ffffffffb4fe9480: 00 00 00 00 00 00 00 00 04 f9 f9 f9 00 00 00 00
[ 29.926967][ T1]                                            ^
[ 29.926967][ T1] ffffffffb4fe9500: 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 01 f9 f9 f9
[ 29.926967][ T1] ffffffffb4fe9580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 29.926967][ T1] ==================================================================
[ 29.926967][ T1] Disabling lock debugging due to kernel taint
[ 29.927133][ T1] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[ 29.930966][ T1] thermal_sys: Registered thermal governor 'fair_share'
[ 29.930971][ T1] thermal_sys: Registered thermal governor 'bang_bang'
[ 29.932004][ T1] thermal_sys: Registered thermal governor 'step_wise'
[ 29.933055][ T1] thermal_sys: Registered thermal governor 'user_space'
[ 29.933708][ T1] cpuidle: using governor ladder
[ 29.935795][ T1] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[ 29.937434][ T1] PCI: Using configuration type 1 for base access
[ 29.958988][ T1] kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
[ 29.960327][ T7] Callback from call_rcu_tasks() invoked.
[ 29.961915][ T1] HugeTLB: can free 6 vmemmap pages for hugepages-2048kB
[ 29.962897][ T1] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[ 29.965886][ T1] cryptd: max_cpu_qlen set to 1000
[ 29.967924][ T1] raid6: skipped pq benchmark and selected sse2x4
[ 29.968825][ T1] raid6: using ssse3x2 recovery algorithm
[ 29.969891][ T1] ACPI: Added _OSI(Module Device)
[ 29.970307][ T1] ACPI: Added _OSI(Processor Device)
[ 29.971058][ T1] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 29.971841][ T1] ACPI: Added _OSI(Processor Aggregator Device)
[ 29.972747][ T1] ACPI: Added _OSI(Linux-Dell-Video)
[ 29.973549][ T1] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[ 29.973648][ T1] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[ 29.994328][ T1] ACPI: 1 ACPI AML tables successfully acquired and loaded
[ 30.006626][ T1] ACPI: Interpreter enabled
[ 30.007071][ T1] ACPI: PM: (supports S0 S3 S5)
[ 30.007783][ T1] ACPI: Using IOAPIC for interrupt routing
[ 30.008714][ T1] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[ 30.011387][ T1] ACPI: Enabled 2 GPEs in block 00 to 0F
[ 30.053305][ T1] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[ 30.053667][ T1] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3]
[ 30.054872][ T1] acpi PNP0A03:00: PCIe port services disabled; not requesting _OSC control
[ 30.056154][ T1] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[ 30.057970][ T1] acpiphp: Slot [3] registered
[ 30.058877][ T1] acpiphp: Slot [4] registered
[ 30.059769][ T1] acpiphp: Slot [5] registered
[ 30.060516][ T1] acpiphp: Slot [6] registered
[ 30.061393][ T1] acpiphp: Slot [7] registered
[ 30.062306][ T1] acpiphp: Slot [8] registered
[ 30.063187][ T1] acpiphp: Slot [9] registered
[ 30.063877][ T1] acpiphp: Slot [10] registered
[ 30.064814][ T1] acpiphp: Slot [11] registered
[ 30.065712][ T1] acpiphp: Slot [12] registered
[ 30.066613][ T1] acpiphp: Slot [13] registered
[ 30.067181][ T1] acpiphp: Slot [14] registered
[ 30.068082][ T1] acpiphp: Slot [15] registered
[ 30.068992][ T1] acpiphp: Slot [16] registered
[ 30.069889][ T1] acpiphp: Slot [17] registered
[ 30.070506][ T1] acpiphp: Slot [18] registered
[ 30.071401][ T1] acpiphp: Slot [19] registered
[ 30.072314][ T1] acpiphp: Slot [20] registered
[ 30.073206][ T1] acpiphp: Slot [21] registered
[ 30.073840][ T1] acpiphp: Slot [22] registered
[ 30.074765][ T1] acpiphp: Slot [23] registered
[ 30.075669][ T1] acpiphp: Slot [24] registered
[ 30.076557][ T1] acpiphp: Slot [25] registered
[ 30.077176][ T1] acpiphp: Slot [26] registered
[ 30.078073][ T1] acpiphp: Slot [27] registered
[ 30.078982][ T1] acpiphp: Slot [28] registered
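The reported index appears to line up with a typo in the v1 patch quoted earlier in the thread (worth double-checking against the log): get_random_u32() bounds-checks position against the 16-entry entropy_u32[] but then loads from the 8-entry entropy_u64[]:

	position = READ_ONCE(batch->position);
	if (unlikely(position + 1 > ARRAY_SIZE(batch->entropy_u32))) {	/* 16 entries */
		extract_crng((u8 *)batch->entropy_u32);
		position = 0;
	}
	ret = batch->entropy_u64[position++];	/* u64[8]: indices 8..15 overrun */

Once position reaches 8, UBSAN reports exactly "index 8 is out of range for type 'u64[8]'".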
To reproduce:
# build kernel
cd linux
cp config-5.17.0-rc1-00010-g1e1724f9ddd1 .config
make HOSTCC=clang-14 CC=clang-14 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=clang-14 CC=clang-14 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

# if you come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.
---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation
Thanks, Oliver Sang