From: Eric Biggers ebiggers@google.com
When a CPU selects which CRNG to use, it accesses crng_node_pool without a memory barrier. That's wrong, because crng_node_pool can be set by another CPU concurrently. Without a memory barrier, the crng_state that is used might not appear to be fully initialized.
There's an explicit mb() on the write side, but it's redundant with cmpxchg() (or cmpxchg_release()) and does nothing to fix the read side.
Implement this correctly by using a cmpxchg_release() + smp_load_acquire() pair.
Note: READ_ONCE() could be used instead of smp_load_acquire(), but it is harder to verify that it is correct, so I'd prefer not to use it here.
Fixes: 1e7f583af67b ("random: make /dev/urandom scalable for silly userspace programs") Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Eric Biggers ebiggers@google.com ---
I sent this fix about a year ago (https://lore.kernel.org/lkml/20200916233042.51634-1-ebiggers@kernel.org/T/#u), and though it's a correct fix, it was derailed by a debate about whether it's safe to use READ_ONCE() instead of smp_load_acquire() or not. Therefore, the current code, which (AFAIK) everyone agrees is buggy, was never actually fixed. Since random.c has a new maintainer now, I think it's worth sending this fix for reconsideration.
drivers/char/random.c | 42 ++++++++++++++++++++++-------------------- 1 file changed, 22 insertions(+), 20 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c index 605969ed0f96..349a6f235c61 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -843,8 +843,8 @@ static void do_numa_crng_init(struct work_struct *work) crng_initialize_secondary(crng); pool[i] = crng; } - mb(); - if (cmpxchg(&crng_node_pool, NULL, pool)) { + /* pairs with smp_load_acquire() in select_crng() */ + if (cmpxchg_release(&crng_node_pool, NULL, pool) != NULL) { for_each_node(i) kfree(pool[i]); kfree(pool); @@ -857,8 +857,26 @@ static void numa_crng_init(void) { schedule_work(&numa_crng_init_work); } + +static inline struct crng_state *select_crng(void) +{ + struct crng_state **pool; + int nid = numa_node_id(); + + /* pairs with cmpxchg_release() in do_numa_crng_init() */ + pool = smp_load_acquire(&crng_node_pool); + if (pool && pool[nid]) + return pool[nid]; + + return &primary_crng; +} #else static void numa_crng_init(void) {} + +static inline struct crng_state *select_crng(void) +{ + return &primary_crng; +} #endif
/* @@ -1005,15 +1023,7 @@ static void _extract_crng(struct crng_state *crng,
static void extract_crng(__u8 out[CHACHA_BLOCK_SIZE]) { - struct crng_state *crng = NULL; - -#ifdef CONFIG_NUMA - if (crng_node_pool) - crng = crng_node_pool[numa_node_id()]; - if (crng == NULL) -#endif - crng = &primary_crng; - _extract_crng(crng, out); + _extract_crng(select_crng(), out); }
/* @@ -1042,15 +1052,7 @@ static void _crng_backtrack_protect(struct crng_state *crng,
static void crng_backtrack_protect(__u8 tmp[CHACHA_BLOCK_SIZE], int used) { - struct crng_state *crng = NULL; - -#ifdef CONFIG_NUMA - if (crng_node_pool) - crng = crng_node_pool[numa_node_id()]; - if (crng == NULL) -#endif - crng = &primary_crng; - _crng_backtrack_protect(crng, tmp, used); + _crng_backtrack_protect(select_crng(), tmp, used); }
static ssize_t extract_crng_user(void __user *buf, size_t nbytes)
Hi Eric,
This patch seems fine to me, and I'll apply it in a few days after sitting on the list for comments, but:
Note: READ_ONCE() could be used instead of smp_load_acquire(), but it is harder to verify that it is correct, so I'd prefer not to use it here. (https://lore.kernel.org/lkml/20200916233042.51634-1-ebiggers@kernel.org/T/#u), and though it's a correct fix, it was derailed by a debate about whether it's safe to use READ_ONCE() instead of smp_load_acquire() or not.
But holy smokes... I chuckled at your, "please explain in English." :)
Paul - if you'd like to look at this patch and confirm that this specific patch and usage is fine to be changed into READ_ONCE() instead of smp_load_acquire(), please pipe up here. And I really do mean this specific patch and usage, not to be confused with any other usage elsewhere in the kernel or question about general things, which doubtlessly involve larger discussions like the one Eric linked to above. If you're certain this patch here is READ_ONCE()able, I'd appreciate your saying so with a simple, "it is safe; go for it", since I'd definitely like the optimization if it's safe. If I don't hear from you, I'll apply this as-is from Eric, as I'd rather be safe than sorry.
Jason
On Mon, Dec 20, 2021 at 04:07:28PM +0100, Jason A. Donenfeld wrote:
Hi Eric,
This patch seems fine to me, and I'll apply it in a few days after sitting on the list for comments, but:
Note: READ_ONCE() could be used instead of smp_load_acquire(), but it is harder to verify that it is correct, so I'd prefer not to use it here. (https://lore.kernel.org/lkml/20200916233042.51634-1-ebiggers@kernel.org/T/#u), and though it's a correct fix, it was derailed by a debate about whether it's safe to use READ_ONCE() instead of smp_load_acquire() or not.
But holy smokes... I chuckled at your, "please explain in English." :)
Paul - if you'd like to look at this patch and confirm that this specific patch and usage is fine to be changed into READ_ONCE() instead of smp_load_acquire(), please pipe up here. And I really do mean this specific patch and usage, not to be confused with any other usage elsewhere in the kernel or question about general things, which doubtlessly involve larger discussions like the one Eric linked to above. If you're certain this patch here is READ_ONCE()able, I'd appreciate your saying so with a simple, "it is safe; go for it", since I'd definitely like the optimization if it's safe. If I don't hear from you, I'll apply this as-is from Eric, as I'd rather be safe than sorry.
First I would want to see some evidence that READ_ONCE() was really providing measurable performance benefit. Such evidence would be easiest to obtain by running on a weakly ordered system such as ARM, ARMv8, or PowerPC.
If this does provide a measurable benefit, why not the following?
static inline struct crng_state *select_crng(void) { struct crng_state **pool; struct crng_state *pooln; int nid = numa_node_id();
/* pairs with cmpxchg_release() in do_numa_crng_init() */ pool = rcu_dereference(&crng_node_pool); if (pool) { pooln = rcu_dereference(pool[nid]); if (pooln) return pooln; }
return &primary_crng; }
This is in ignorance of the kfree() side of this code. So another question is "Suppose that there was a long delay (vCPU preemption, for example) just before the 'return pooln'. What prevents a use-after-free bug?"
Of course, this question applies equally to the smp_load_acquire() version.
Thanx, Paul
On Mon, Dec 20, 2021 at 7:11 PM Paul E. McKenney paulmck@kernel.org wrote:
First I would want
It looks like you've answered my question with four other questions, which seem certainly technically warranted, but also indicates we're probably not going to get to the nice easy resting place of, "it is safe; go for it" that I was hoping for. In light of that, it seems like merging Eric's patch is reasonable.
Jason
On Mon, Dec 20, 2021 at 07:16:48PM +0100, Jason A. Donenfeld wrote:
On Mon, Dec 20, 2021 at 7:11 PM Paul E. McKenney paulmck@kernel.org wrote:
First I would want
It looks like you've answered my question with four other questions, which seem certainly technically warranted, but also indicates we're probably not going to get to the nice easy resting place of, "it is safe; go for it" that I was hoping for. In light of that, it seems like merging Eric's patch is reasonable.
My hope would be that the questions can be quickly answered by the developers and maintainers. But yes, hope springs eternal.
Thanx, Paul
On Mon, Dec 20, 2021 at 10:31:40AM -0800, Paul E. McKenney wrote:
On Mon, Dec 20, 2021 at 07:16:48PM +0100, Jason A. Donenfeld wrote:
On Mon, Dec 20, 2021 at 7:11 PM Paul E. McKenney paulmck@kernel.org wrote:
First I would want
It looks like you've answered my question with four other questions, which seem certainly technically warranted, but also indicates we're probably not going to get to the nice easy resting place of, "it is safe; go for it" that I was hoping for. In light of that, it seems like merging Eric's patch is reasonable.
My hope would be that the questions can be quickly answered by the developers and maintainers. But yes, hope springs eternal.
Thanx, Paul
I wouldn't expect READ_ONCE() to provide a noticable performance improvement here, as it would be lost in the noise of the other work done, especially chacha20_block().
The data structures in question are never freed, so your other questions are irrelevant, if I understand correctly.
- Eric
On Mon, Dec 20, 2021 at 12:35:05PM -0600, Eric Biggers wrote:
On Mon, Dec 20, 2021 at 10:31:40AM -0800, Paul E. McKenney wrote:
On Mon, Dec 20, 2021 at 07:16:48PM +0100, Jason A. Donenfeld wrote:
On Mon, Dec 20, 2021 at 7:11 PM Paul E. McKenney paulmck@kernel.org wrote:
First I would want
It looks like you've answered my question with four other questions, which seem certainly technically warranted, but also indicates we're probably not going to get to the nice easy resting place of, "it is safe; go for it" that I was hoping for. In light of that, it seems like merging Eric's patch is reasonable.
My hope would be that the questions can be quickly answered by the developers and maintainers. But yes, hope springs eternal.
Thanx, Paul
I wouldn't expect READ_ONCE() to provide a noticable performance improvement here, as it would be lost in the noise of the other work done, especially chacha20_block().
The data structures in question are never freed, so your other questions are irrelevant, if I understand correctly.
Very good, and thank you! You are correct, if the structures never are freed, there is no use-after-free issue. And that also explains why I was not able to find the free path. ;-)
So the main issue is the race between insertion and lookup. So yes, READ_ONCE() suffices.
This assumes that the various crng_node_pool[i] pointers never change while accessible to readers (and that some sort of synchronization applies to the values in the pointed-to structure). If these pointers do change, then there also needs to be a READ_ONCE(pool[nid]) in select_crng(), where the value returned from this READ_ONCE() is both tested and returned. (As in assign this value to a temporary.)
But if the various crng_node_pool[i] pointers really are constant while readers can access them, then the cmpxchg_release() suffices. The loads from pool[nid] are then data-race free, and because they are unmarked, the compiler is prohibited from hoisting them out from within the "if" statement. The address dependency prohibits the CPU from reordering them.
So READ_ONCE() should be just fine. Which answers Jason's question. ;-)
Looking at _extract_crng(), if this was my code, I would use READ_ONCE() in the checks, but that might be my misunderstanding boot-time constraints or some such. Without some sort of constraint, I don't see how the code avoids confusion from reloads of crng->init_time if two CPUs concurrently see the expiration of CRNG_RESEED_INTERVAL, but I could easily be missing something that makes this safe. (And this is irrelevant to this patch.)
You do appear to have ->lock guarding the pointed-to data, so that is good.
Thanx, Paul
Hi Paul,
On Mon, Dec 20, 2021 at 8:00 PM Paul E. McKenney paulmck@kernel.org wrote:
This assumes that the various crng_node_pool[i] pointers never change while accessible to readers (and that some sort of synchronization applies to the values in the pointed-to structure). If these pointers do change, then there also needs to be a READ_ONCE(pool[nid]) in select_crng(), where the value returned from this READ_ONCE() is both tested and returned. (As in assign this value to a temporary.)
But if the various crng_node_pool[i] pointers really are constant while readers can access them, then the cmpxchg_release() suffices. The loads from pool[nid] are then data-race free, and because they are unmarked, the compiler is prohibited from hoisting them out from within the "if" statement. The address dependency prohibits the CPU from reordering them.
Right, this is just an initialization-time allocation and assignment, never updated or freed again after.
So READ_ONCE() should be just fine. Which answers Jason's question. ;-)
Great. So v2 of this patch can use READ_ONCE then. Thanks!
Looking at _extract_crng(), if this was my code, I would use READ_ONCE() in the checks, but that might be my misunderstanding boot-time constraints or some such. Without some sort of constraint, I don't see how the code avoids confusion from reloads of crng->init_time if two CPUs concurrently see the expiration of CRNG_RESEED_INTERVAL, but I could easily be missing something that makes this safe. (And this is irrelevant to this patch.)
Indeed init_time seems to race via the crng_reseed path, and READ_ONCE()ing that seems reasonable. The other setters of it -- initialize_{primary,secondary} -- are in the boot path.
Jason
On Mon, Dec 20, 2021 at 10:45:15PM +0100, Jason A. Donenfeld wrote:
Hi Paul,
On Mon, Dec 20, 2021 at 8:00 PM Paul E. McKenney paulmck@kernel.org wrote:
This assumes that the various crng_node_pool[i] pointers never change while accessible to readers (and that some sort of synchronization applies to the values in the pointed-to structure). If these pointers do change, then there also needs to be a READ_ONCE(pool[nid]) in select_crng(), where the value returned from this READ_ONCE() is both tested and returned. (As in assign this value to a temporary.)
But if the various crng_node_pool[i] pointers really are constant while readers can access them, then the cmpxchg_release() suffices. The loads from pool[nid] are then data-race free, and because they are unmarked, the compiler is prohibited from hoisting them out from within the "if" statement. The address dependency prohibits the CPU from reordering them.
Right, this is just an initialization-time allocation and assignment, never updated or freed again after.
So READ_ONCE() should be just fine. Which answers Jason's question. ;-)
Great. So v2 of this patch can use READ_ONCE then. Thanks!
Sure, I really don't care anymore. If people want READ_ONCE() here, I'll use it. It seems that the people who really prefer smp_load_acquire() aren't on this thread (unlike on https://lore.kernel.org/linux-fsdevel/20200713033330.205104-1-ebiggers@kerne... for example, where READ_ONCE() was rejected), so I guess that is what people are going to agree on in this particular case.
- Eric
On Sun, Dec 19, 2021 at 3:52 AM Eric Biggers ebiggers@kernel.org wrote:
+static inline struct crng_state *select_crng(void) +{
+static inline struct crng_state *select_crng(void) +{
Usually static inline is avoided in .c files. Any special reason why we'd need this especially much here? Those functions are pretty small and I assume will be inlined anyway on most architectures.
I just did a test on x86_64 with GCC 11, and the same file was produced with 'static' as with 'static inline'. Was there an arch/config/compiler combo you were concerned about here?
Jason
On Mon, Dec 20, 2021 at 04:17:59PM +0100, Jason A. Donenfeld wrote:
On Sun, Dec 19, 2021 at 3:52 AM Eric Biggers ebiggers@kernel.org wrote:
+static inline struct crng_state *select_crng(void) +{
+static inline struct crng_state *select_crng(void) +{
Usually static inline is avoided in .c files. Any special reason why we'd need this especially much here? Those functions are pretty small and I assume will be inlined anyway on most architectures.
I just did a test on x86_64 with GCC 11, and the same file was produced with 'static' as with 'static inline'. Was there an arch/config/compiler combo you were concerned about here?
No special reason, this is just a bad habit. I'm fine with omitting 'inline' here.
- Eric
linux-stable-mirror@lists.linaro.org