Hi Greg,
You just sent me an automated email about these failing, so here they are backported.
Jason
Jason A. Donenfeld (3): random: restore O_NONBLOCK support random: avoid reading two cache lines on irq randomness random: use expired timer rather than wq for mixing fast pool
drivers/char/mem.c | 4 ++-- drivers/char/random.c | 23 ++++++++++++++++------- 2 files changed, 18 insertions(+), 9 deletions(-)
commit cd4f24ae9404fd31fc461066e57889be3b68641b upstream.
Prior to 5.6, when /dev/random was opened with O_NONBLOCK, it would return -EAGAIN if there was no entropy. When the pools were unified in 5.6, this was lost. The post 5.6 behavior of blocking until the pool is initialized, and ignoring O_NONBLOCK in the process, went unnoticed, with no reports about the regression received for two and a half years. However, eventually this indeed did break somebody's userspace.
So we restore the old behavior, by returning -EAGAIN if the pool is not initialized. Unlike the old /dev/random, this can only occur during early boot, after which it never blocks again.
In order to make this O_NONBLOCK behavior consistent with other expectations, also respect users reading with preadv2(RWF_NOWAIT) and similar.
Fixes: 30c08efec888 ("random: make /dev/random be almost like /dev/urandom") Reported-by: Guozihua guozihua@huawei.com Reported-by: Zhongguohua zhongguohua1@huawei.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: Theodore Ts'o tytso@mit.edu Cc: Andrew Lutomirski luto@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com --- drivers/char/mem.c | 4 ++-- drivers/char/random.c | 5 +++++ 2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/char/mem.c b/drivers/char/mem.c index 6b56bff9b68c..c5025ae6a37e 100644 --- a/drivers/char/mem.c +++ b/drivers/char/mem.c @@ -953,8 +953,8 @@ static const struct memdev { #endif [5] = { "zero", 0666, &zero_fops, 0 }, [7] = { "full", 0666, &full_fops, 0 }, - [8] = { "random", 0666, &random_fops, 0 }, - [9] = { "urandom", 0666, &urandom_fops, 0 }, + [8] = { "random", 0666, &random_fops, FMODE_NOWAIT }, + [9] = { "urandom", 0666, &urandom_fops, FMODE_NOWAIT }, #ifdef CONFIG_PRINTK [11] = { "kmsg", 0644, &kmsg_fops, 0 }, #endif diff --git a/drivers/char/random.c b/drivers/char/random.c index 1ef94d112521..39f811f3dcc9 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -1294,6 +1294,11 @@ static ssize_t random_read_iter(struct kiocb *kiocb, struct iov_iter *iter) { int ret;
+ if (!crng_ready() && + ((kiocb->ki_flags & IOCB_NOWAIT) || + (kiocb->ki_filp->f_flags & O_NONBLOCK))) + return -EAGAIN; + ret = wait_for_random_bytes(); if (ret != 0) return ret;
On Thu, Oct 13, 2022 at 09:36:52AM -0600, Jason A. Donenfeld wrote:
commit cd4f24ae9404fd31fc461066e57889be3b68641b upstream.
Prior to 5.6, when /dev/random was opened with O_NONBLOCK, it would return -EAGAIN if there was no entropy. When the pools were unified in 5.6, this was lost. The post 5.6 behavior of blocking until the pool is initialized, and ignoring O_NONBLOCK in the process, went unnoticed, with no reports about the regression received for two and a half years. However, eventually this indeed did break somebody's userspace.
So we restore the old behavior, by returning -EAGAIN if the pool is not initialized. Unlike the old /dev/random, this can only occur during early boot, after which it never blocks again.
In order to make this O_NONBLOCK behavior consistent with other expectations, also respect users reading with preadv2(RWF_NOWAIT) and similar.
Fixes: 30c08efec888 ("random: make /dev/random be almost like /dev/urandom") Reported-by: Guozihua guozihua@huawei.com Reported-by: Zhongguohua zhongguohua1@huawei.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: Theodore Ts'o tytso@mit.edu Cc: Andrew Lutomirski luto@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com
drivers/char/mem.c | 4 ++-- drivers/char/random.c | 5 +++++ 2 files changed, 7 insertions(+), 2 deletions(-)
Still breaks on older kernels:
drivers/char/random.c: In function ‘random_read_iter’: drivers/char/random.c:1299:33: error: ‘IOCB_NOWAIT’ undeclared (first use in this function); did you mean ‘IPC_NOWAIT’? 1299 | ((kiocb->ki_flags & IOCB_NOWAIT) || | ^~~~~~~~~~~ | IPC_NOWAIT drivers/char/random.c:1299:33: note: each undeclared identifier is reported only once for each function it appears in drivers/char/mem.c:872:48: error: ‘FMODE_NOWAIT’ undeclared here (not in a function); did you mean ‘FOLL_NOWAIT’? 872 | [8] = { "random", 0666, &random_fops, FMODE_NOWAIT }, | ^~~~~~~~~~~~ | FOLL_NOWAIT
On Thu, Oct 13, 2022 at 06:20:10PM +0200, Greg KH wrote:
On Thu, Oct 13, 2022 at 09:36:52AM -0600, Jason A. Donenfeld wrote:
commit cd4f24ae9404fd31fc461066e57889be3b68641b upstream.
Prior to 5.6, when /dev/random was opened with O_NONBLOCK, it would return -EAGAIN if there was no entropy. When the pools were unified in 5.6, this was lost. The post 5.6 behavior of blocking until the pool is initialized, and ignoring O_NONBLOCK in the process, went unnoticed, with no reports about the regression received for two and a half years. However, eventually this indeed did break somebody's userspace.
So we restore the old behavior, by returning -EAGAIN if the pool is not initialized. Unlike the old /dev/random, this can only occur during early boot, after which it never blocks again.
In order to make this O_NONBLOCK behavior consistent with other expectations, also respect users reading with preadv2(RWF_NOWAIT) and similar.
Fixes: 30c08efec888 ("random: make /dev/random be almost like /dev/urandom") Reported-by: Guozihua guozihua@huawei.com Reported-by: Zhongguohua zhongguohua1@huawei.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: Theodore Ts'o tytso@mit.edu Cc: Andrew Lutomirski luto@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com
drivers/char/mem.c | 4 ++-- drivers/char/random.c | 5 +++++ 2 files changed, 7 insertions(+), 2 deletions(-)
Still breaks on older kernels:
drivers/char/random.c: In function ‘random_read_iter’: drivers/char/random.c:1299:33: error: ‘IOCB_NOWAIT’ undeclared (first use in this function); did you mean ‘IPC_NOWAIT’? 1299 | ((kiocb->ki_flags & IOCB_NOWAIT) || | ^~~~~~~~~~~ | IPC_NOWAIT drivers/char/random.c:1299:33: note: each undeclared identifier is reported only once for each function it appears in drivers/char/mem.c:872:48: error: ‘FMODE_NOWAIT’ undeclared here (not in a function); did you mean ‘FOLL_NOWAIT’? 872 | [8] = { "random", 0666, &random_fops, FMODE_NOWAIT }, | ^~~~~~~~~~~~ | FOLL_NOWAIT
Hm, that's only broken on 4.9, the other ones it worked, now queued up for 4.14, 4.19, and 5.4, thanks.
greg k-h
commit cd4f24ae9404fd31fc461066e57889be3b68641b upstream.
Prior to 5.6, when /dev/random was opened with O_NONBLOCK, it would return -EAGAIN if there was no entropy. When the pools were unified in 5.6, this was lost. The post 5.6 behavior of blocking until the pool is initialized, and ignoring O_NONBLOCK in the process, went unnoticed, with no reports about the regression received for two and a half years. However, eventually this indeed did break somebody's userspace.
So we restore the old behavior, by returning -EAGAIN if the pool is not initialized. Unlike the old /dev/random, this can only occur during early boot, after which it never blocks again.
In order to make this O_NONBLOCK behavior consistent with other expectations, also respect users reading with preadv2(RWF_NOWAIT) and similar.
Fixes: 30c08efec888 ("random: make /dev/random be almost like /dev/urandom") Reported-by: Guozihua guozihua@huawei.com Reported-by: Zhongguohua zhongguohua1@huawei.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: Theodore Ts'o tytso@mit.edu Cc: Andrew Lutomirski luto@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com --- drivers/char/random.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/char/random.c b/drivers/char/random.c index 1cbc33ee5a5f..838f66723ccd 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -1295,6 +1295,10 @@ static ssize_t random_read_iter(struct kiocb *kiocb, struct iov_iter *iter) { int ret;
+ if (!crng_ready() && + (kiocb->ki_filp->f_flags & O_NONBLOCK)) + return -EAGAIN; + ret = wait_for_random_bytes(); if (ret != 0) return ret;
On Thu, Oct 13, 2022 at 10:32:31AM -0600, Jason A. Donenfeld wrote:
commit cd4f24ae9404fd31fc461066e57889be3b68641b upstream.
Prior to 5.6, when /dev/random was opened with O_NONBLOCK, it would return -EAGAIN if there was no entropy. When the pools were unified in 5.6, this was lost. The post 5.6 behavior of blocking until the pool is initialized, and ignoring O_NONBLOCK in the process, went unnoticed, with no reports about the regression received for two and a half years. However, eventually this indeed did break somebody's userspace.
So we restore the old behavior, by returning -EAGAIN if the pool is not initialized. Unlike the old /dev/random, this can only occur during early boot, after which it never blocks again.
In order to make this O_NONBLOCK behavior consistent with other expectations, also respect users reading with preadv2(RWF_NOWAIT) and similar.
Fixes: 30c08efec888 ("random: make /dev/random be almost like /dev/urandom") Reported-by: Guozihua guozihua@huawei.com Reported-by: Zhongguohua zhongguohua1@huawei.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: Theodore Ts'o tytso@mit.edu Cc: Andrew Lutomirski luto@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com
drivers/char/random.c | 4 ++++ 1 file changed, 4 insertions(+)
Now queued up, thanks.
greg k-h
commit 9ee0507e896b45af6d65408c77815800bce30008 upstream.
In order to avoid reading and dirtying two cache lines on every IRQ, move the work_struct to the bottom of the fast_pool struct. add_ interrupt_randomness() always touches .pool and .count, which are currently split, because .mix pushes everything down. Instead, move .mix to the bottom, so that .pool and .count are always in the first cache line, since .mix is only accessed when the pool is full.
Fixes: 58340f8e952b ("random: defer fast pool mixing to worker") Reviewed-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com --- drivers/char/random.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c index 39f811f3dcc9..6dd9544930f8 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -890,10 +890,10 @@ void __init add_bootloader_randomness(const void *buf, size_t len) }
struct fast_pool { - struct work_struct mix; unsigned long pool[4]; unsigned long last; unsigned int count; + struct work_struct mix; };
static DEFINE_PER_CPU(struct fast_pool, irq_randomness) = {
commit 748bc4dd9e663f23448d8ad7e58c011a67ea1eca upstream.
Previously, the fast pool was dumped into the main pool periodically in the fast pool's hard IRQ handler. This worked fine and there weren't problems with it, until RT came around. Since RT converts spinlocks into sleeping locks, problems cropped up. Rather than switching to raw spinlocks, the RT developers preferred we make the transformation from originally doing:
do_some_stuff() spin_lock() do_some_other_stuff() spin_unlock()
to doing:
do_some_stuff() queue_work_on(some_other_stuff_worker)
This is an ordinary pattern done all over the kernel. However, Sherry noticed a 10% performance regression in qperf TCP over a 40gbps InfiniBand card. Quoting her message:
MT27500 Family [ConnectX-3] cards: Infiniband device 'mlx4_0' port 1 status: default gid: fe80:0000:0000:0000:0010:e000:0178:9eb1 base lid: 0x6 sm lid: 0x1 state: 4: ACTIVE phys state: 5: LinkUp rate: 40 Gb/sec (4X QDR) link_layer: InfiniBand
Cards are configured with IP addresses on private subnet for IPoIB performance testing. Regression identified in this bug is in TCP latency in this stack as reported by qperf tcp_lat metric:
We have one system listen as a qperf server: [root@yourQperfServer ~]# qperf
Have the other system connect to qperf server as a client (in this case, it’s X7 server with Mellanox card): [root@yourQperfClient ~]# numactl -m0 -N0 qperf 20.20.20.101 -v -uu -ub --time 60 --wait_server 20 -oo msg_size:4K:1024K:*2 tcp_lat
Rather than incur the scheduling latency from queue_work_on, we can instead switch to running on the next timer tick, on the same core. This also batches things a bit more -- once per jiffy -- which is okay now that mix_interrupt_randomness() can credit multiple bits at once.
Reported-by: Sherry Yang sherry.yang@oracle.com Tested-by: Paul Webb paul.x.webb@oracle.com Cc: Sherry Yang sherry.yang@oracle.com Cc: Phillip Goerl phillip.goerl@oracle.com Cc: Jack Vogel jack.vogel@oracle.com Cc: Nicky Veitch nicky.veitch@oracle.com Cc: Colm Harrington colm.harrington@oracle.com Cc: Ramanan Govindarajan ramanan.govindarajan@oracle.com Cc: Sebastian Andrzej Siewior bigeasy@linutronix.de Cc: Dominik Brodowski linux@dominikbrodowski.net Cc: Tejun Heo tj@kernel.org Cc: Sultan Alsawaf sultan@kerneltoast.com Cc: stable@vger.kernel.org Fixes: 58340f8e952b ("random: defer fast pool mixing to worker") Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com --- drivers/char/random.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c index 6dd9544930f8..2d6bf0900536 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -893,17 +893,20 @@ struct fast_pool { unsigned long pool[4]; unsigned long last; unsigned int count; - struct work_struct mix; + struct timer_list mix; };
+static void mix_interrupt_randomness(struct timer_list *work); + static DEFINE_PER_CPU(struct fast_pool, irq_randomness) = { #ifdef CONFIG_64BIT #define FASTMIX_PERM SIPHASH_PERMUTATION - .pool = { SIPHASH_CONST_0, SIPHASH_CONST_1, SIPHASH_CONST_2, SIPHASH_CONST_3 } + .pool = { SIPHASH_CONST_0, SIPHASH_CONST_1, SIPHASH_CONST_2, SIPHASH_CONST_3 }, #else #define FASTMIX_PERM HSIPHASH_PERMUTATION - .pool = { HSIPHASH_CONST_0, HSIPHASH_CONST_1, HSIPHASH_CONST_2, HSIPHASH_CONST_3 } + .pool = { HSIPHASH_CONST_0, HSIPHASH_CONST_1, HSIPHASH_CONST_2, HSIPHASH_CONST_3 }, #endif + .mix = __TIMER_INITIALIZER(mix_interrupt_randomness, 0) };
/* @@ -945,7 +948,7 @@ int __cold random_online_cpu(unsigned int cpu) } #endif
-static void mix_interrupt_randomness(struct work_struct *work) +static void mix_interrupt_randomness(struct timer_list *work) { struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix); /* @@ -999,10 +1002,11 @@ void add_interrupt_randomness(int irq) if (new_count < 1024 && !time_is_before_jiffies(fast_pool->last + HZ)) return;
- if (unlikely(!fast_pool->mix.func)) - INIT_WORK(&fast_pool->mix, mix_interrupt_randomness); fast_pool->count |= MIX_INFLIGHT; - queue_work_on(raw_smp_processor_id(), system_highpri_wq, &fast_pool->mix); + if (!timer_pending(&fast_pool->mix)) { + fast_pool->mix.expires = jiffies; + add_timer_on(&fast_pool->mix, raw_smp_processor_id()); + } } EXPORT_SYMBOL_GPL(add_interrupt_randomness);
On Thu, Oct 13, 2022 at 09:36:51AM -0600, Jason A. Donenfeld wrote:
Hi Greg,
You just sent me an automated email about these failing, so here they are backported.
Backported where? Patch 1 is already in 5.10 and newer, does this one work in older?
And 2 and 3 for all branches?
confused,
greg k-h
On Thu, Oct 13, 2022 at 06:18:53PM +0200, Greg KH wrote:
On Thu, Oct 13, 2022 at 09:36:51AM -0600, Jason A. Donenfeld wrote:
Hi Greg,
You just sent me an automated email about these failing, so here they are backported.
Backported where? Patch 1 is already in 5.10 and newer, does this one work in older?
And 2 and 3 for all branches?
For all of them they're not yet in.
I'll have a look at the 4.9 breakage.
Jason
On Thu, Oct 13, 2022 at 10:29:40AM -0600, Jason A. Donenfeld wrote:
On Thu, Oct 13, 2022 at 06:18:53PM +0200, Greg KH wrote:
On Thu, Oct 13, 2022 at 09:36:51AM -0600, Jason A. Donenfeld wrote:
Hi Greg,
You just sent me an automated email about these failing, so here they are backported.
Backported where? Patch 1 is already in 5.10 and newer, does this one work in older?
And 2 and 3 for all branches?
For all of them they're not yet in.
I'll have a look at the 4.9 breakage.
Oops, 748bc4dd9e66 ("random: use expired timer rather than wq for mixing fast pool") does not work for 4.9.y or 4.14.y, it breaks the build there too:
drivers/char/random.c:909:63: error: macro "__TIMER_INITIALIZER" requires 4 arguments, but only 2 given 909 | .mix = __TIMER_INITIALIZER(mix_interrupt_randomness, 0) | ^ In file included from ./include/linux/workqueue.h:9, from ./include/linux/rhashtable.h:26, from ./include/linux/ipc.h:7, from ./include/uapi/linux/sem.h:5, from ./include/linux/sem.h:9, from ./include/linux/sched.h:15, from ./include/linux/utsname.h:6, from drivers/char/random.c:28: ./include/linux/timer.h:67: note: macro "__TIMER_INITIALIZER" defined here 67 | #define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \ | drivers/char/random.c:909:16: error: ‘__TIMER_INITIALIZER’ undeclared here (not in a function); did you mean ‘TIMER_INITIALIZER’? 909 | .mix = __TIMER_INITIALIZER(mix_interrupt_randomness, 0) | ^~~~~~~~~~~~~~~~~~~ | TIMER_INITIALIZER drivers/char/random.c:951:13: warning: ‘mix_interrupt_randomness’ defined but not used [-Wunused-function] 951 | static void mix_interrupt_randomness(struct timer_list *work) | ^~~~~~~~~~~~~~~~~~~~~~~~
On Thu, Oct 13, 2022 at 10:53 AM Greg KH gregkh@linuxfoundation.org wrote:
On Thu, Oct 13, 2022 at 10:29:40AM -0600, Jason A. Donenfeld wrote:
On Thu, Oct 13, 2022 at 06:18:53PM +0200, Greg KH wrote:
On Thu, Oct 13, 2022 at 09:36:51AM -0600, Jason A. Donenfeld wrote:
Hi Greg,
You just sent me an automated email about these failing, so here they are backported.
Backported where? Patch 1 is already in 5.10 and newer, does this one work in older?
And 2 and 3 for all branches?
For all of them they're not yet in.
I'll have a look at the 4.9 breakage.
Oops, 748bc4dd9e66 ("random: use expired timer rather than wq for mixing fast pool") does not work for 4.9.y or 4.14.y, it breaks the build there too:
drivers/char/random.c:909:63: error: macro "__TIMER_INITIALIZER" requires 4 arguments, but only 2 given 909 | .mix = __TIMER_INITIALIZER(mix_interrupt_randomness, 0) | ^ In file included from ./include/linux/workqueue.h:9, from ./include/linux/rhashtable.h:26, from ./include/linux/ipc.h:7, from ./include/uapi/linux/sem.h:5, from ./include/linux/sem.h:9, from ./include/linux/sched.h:15, from ./include/linux/utsname.h:6, from drivers/char/random.c:28: ./include/linux/timer.h:67: note: macro "__TIMER_INITIALIZER" defined here 67 | #define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \ | drivers/char/random.c:909:16: error: ‘__TIMER_INITIALIZER’ undeclared here (not in a function); did you mean ‘TIMER_INITIALIZER’? 909 | .mix = __TIMER_INITIALIZER(mix_interrupt_randomness, 0) | ^~~~~~~~~~~~~~~~~~~ | TIMER_INITIALIZER drivers/char/random.c:951:13: warning: ‘mix_interrupt_randomness’ defined but not used [-Wunused-function] 951 | static void mix_interrupt_randomness(struct timer_list *work) | ^~~~~~~~~~~~~~~~~~~~~~~~
Ahh the dark old days of timers taking an unsigned long. Fixing. (And testing...)
commit 748bc4dd9e663f23448d8ad7e58c011a67ea1eca upstream.
Previously, the fast pool was dumped into the main pool periodically in the fast pool's hard IRQ handler. This worked fine and there weren't problems with it, until RT came around. Since RT converts spinlocks into sleeping locks, problems cropped up. Rather than switching to raw spinlocks, the RT developers preferred we make the transformation from originally doing:
do_some_stuff() spin_lock() do_some_other_stuff() spin_unlock()
to doing:
do_some_stuff() queue_work_on(some_other_stuff_worker)
This is an ordinary pattern done all over the kernel. However, Sherry noticed a 10% performance regression in qperf TCP over a 40gbps InfiniBand card. Quoting her message:
MT27500 Family [ConnectX-3] cards: Infiniband device 'mlx4_0' port 1 status: default gid: fe80:0000:0000:0000:0010:e000:0178:9eb1 base lid: 0x6 sm lid: 0x1 state: 4: ACTIVE phys state: 5: LinkUp rate: 40 Gb/sec (4X QDR) link_layer: InfiniBand
Cards are configured with IP addresses on private subnet for IPoIB performance testing. Regression identified in this bug is in TCP latency in this stack as reported by qperf tcp_lat metric:
We have one system listen as a qperf server: [root@yourQperfServer ~]# qperf
Have the other system connect to qperf server as a client (in this case, it’s X7 server with Mellanox card): [root@yourQperfClient ~]# numactl -m0 -N0 qperf 20.20.20.101 -v -uu -ub --time 60 --wait_server 20 -oo msg_size:4K:1024K:*2 tcp_lat
Rather than incur the scheduling latency from queue_work_on, we can instead switch to running on the next timer tick, on the same core. This also batches things a bit more -- once per jiffy -- which is okay now that mix_interrupt_randomness() can credit multiple bits at once.
Reported-by: Sherry Yang sherry.yang@oracle.com Tested-by: Paul Webb paul.x.webb@oracle.com Cc: Sherry Yang sherry.yang@oracle.com Cc: Phillip Goerl phillip.goerl@oracle.com Cc: Jack Vogel jack.vogel@oracle.com Cc: Nicky Veitch nicky.veitch@oracle.com Cc: Colm Harrington colm.harrington@oracle.com Cc: Ramanan Govindarajan ramanan.govindarajan@oracle.com Cc: Sebastian Andrzej Siewior bigeasy@linutronix.de Cc: Dominik Brodowski linux@dominikbrodowski.net Cc: Tejun Heo tj@kernel.org Cc: Sultan Alsawaf sultan@kerneltoast.com Cc: stable@vger.kernel.org Fixes: 58340f8e952b ("random: defer fast pool mixing to worker") Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com --- drivers/char/random.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c index bf5f0149d9d4..174dd139d2f3 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -894,7 +894,7 @@ struct fast_pool { unsigned long pool[4]; unsigned long last; unsigned int count; - struct work_struct mix; + struct timer_list mix; };
static DEFINE_PER_CPU(struct fast_pool, irq_randomness) = { @@ -946,9 +946,9 @@ int __cold random_online_cpu(unsigned int cpu) } #endif
-static void mix_interrupt_randomness(struct work_struct *work) +static void mix_interrupt_randomness(unsigned long data) { - struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix); + struct fast_pool *fast_pool = (struct fast_pool *)data; /* * The size of the copied stack pool is explicitly 2 longs so that we * only ever ingest half of the siphash output each time, retaining @@ -1000,10 +1000,14 @@ void add_interrupt_randomness(int irq) if (new_count < 1024 && !time_is_before_jiffies(fast_pool->last + HZ)) return;
- if (unlikely(!fast_pool->mix.func)) - INIT_WORK(&fast_pool->mix, mix_interrupt_randomness); + if (unlikely(!fast_pool->mix.data)) + setup_timer(&fast_pool->mix, mix_interrupt_randomness, (unsigned long)fast_pool); + fast_pool->count |= MIX_INFLIGHT; - queue_work_on(raw_smp_processor_id(), system_highpri_wq, &fast_pool->mix); + if (!timer_pending(&fast_pool->mix)) { + fast_pool->mix.expires = jiffies; + add_timer_on(&fast_pool->mix, raw_smp_processor_id()); + } } EXPORT_SYMBOL_GPL(add_interrupt_randomness);
On Thu, Oct 13, 2022 at 11:07:31AM -0600, Jason A. Donenfeld wrote:
commit 748bc4dd9e663f23448d8ad7e58c011a67ea1eca upstream.
Previously, the fast pool was dumped into the main pool periodically in the fast pool's hard IRQ handler. This worked fine and there weren't problems with it, until RT came around. Since RT converts spinlocks into sleeping locks, problems cropped up. Rather than switching to raw spinlocks, the RT developers preferred we make the transformation from originally doing:
do_some_stuff() spin_lock() do_some_other_stuff() spin_unlock()
to doing:
do_some_stuff() queue_work_on(some_other_stuff_worker)
This is an ordinary pattern done all over the kernel. However, Sherry noticed a 10% performance regression in qperf TCP over a 40gbps InfiniBand card. Quoting her message:
MT27500 Family [ConnectX-3] cards: Infiniband device 'mlx4_0' port 1 status: default gid: fe80:0000:0000:0000:0010:e000:0178:9eb1 base lid: 0x6 sm lid: 0x1 state: 4: ACTIVE phys state: 5: LinkUp rate: 40 Gb/sec (4X QDR) link_layer: InfiniBand
Cards are configured with IP addresses on private subnet for IPoIB performance testing. Regression identified in this bug is in TCP latency in this stack as reported by qperf tcp_lat metric:
We have one system listen as a qperf server: [root@yourQperfServer ~]# qperf
Have the other system connect to qperf server as a client (in this case, it’s X7 server with Mellanox card): [root@yourQperfClient ~]# numactl -m0 -N0 qperf 20.20.20.101 -v -uu -ub --time 60 --wait_server 20 -oo msg_size:4K:1024K:*2 tcp_lat
Rather than incur the scheduling latency from queue_work_on, we can instead switch to running on the next timer tick, on the same core. This also batches things a bit more -- once per jiffy -- which is okay now that mix_interrupt_randomness() can credit multiple bits at once.
Reported-by: Sherry Yang sherry.yang@oracle.com Tested-by: Paul Webb paul.x.webb@oracle.com Cc: Sherry Yang sherry.yang@oracle.com Cc: Phillip Goerl phillip.goerl@oracle.com Cc: Jack Vogel jack.vogel@oracle.com Cc: Nicky Veitch nicky.veitch@oracle.com Cc: Colm Harrington colm.harrington@oracle.com Cc: Ramanan Govindarajan ramanan.govindarajan@oracle.com Cc: Sebastian Andrzej Siewior bigeasy@linutronix.de Cc: Dominik Brodowski linux@dominikbrodowski.net Cc: Tejun Heo tj@kernel.org Cc: Sultan Alsawaf sultan@kerneltoast.com Cc: stable@vger.kernel.org Fixes: 58340f8e952b ("random: defer fast pool mixing to worker") Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com
drivers/char/random.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-)
That worked, thanks, now queued up.
greg k-h
On Thu, Oct 13, 2022 at 06:18:53PM +0200, Greg KH wrote:
On Thu, Oct 13, 2022 at 09:36:51AM -0600, Jason A. Donenfeld wrote:
Hi Greg,
You just sent me an automated email about these failing, so here they are backported.
Backported where? Patch 1 is already in 5.10 and newer, does this one work in older?
And 2 and 3 for all branches?
Ok, 2 and 3 are now queued up everywhere, only thing that didn't work is patch 1 on the 4.9.y branch. Can you provide a working backport for there?
thanks,
greg k-h
linux-stable-mirror@lists.linaro.org