[PATCH stable 0/3] recent failed backports for the rng


Jason A. Donenfeld

13 Oct 2022

3:36 p.m.

Hi Greg,

You just sent me an automated email about these failing, so here they are backported.

Jason

Jason A. Donenfeld (3):
  random: restore O_NONBLOCK support
  random: avoid reading two cache lines on irq randomness
  random: use expired timer rather than wq for mixing fast pool

 drivers/char/mem.c    |  4 ++--
 drivers/char/random.c | 23 ++++++++++++++++-------
 2 files changed, 18 insertions(+), 9 deletions(-)

--
2.37.3


Jason A. Donenfeld

13 Oct

3:36 p.m.

New subject: [PATCH stable 1/3] random: restore O_NONBLOCK support

commit cd4f24ae9404fd31fc461066e57889be3b68641b upstream.

Prior to 5.6, when /dev/random was opened with O_NONBLOCK, it would return -EAGAIN if there was no entropy. When the pools were unified in 5.6, this was lost. The post 5.6 behavior of blocking until the pool is initialized, and ignoring O_NONBLOCK in the process, went unnoticed, with no reports about the regression received for two and a half years. However, eventually this indeed did break somebody's userspace.

So we restore the old behavior, by returning -EAGAIN if the pool is not initialized. Unlike the old /dev/random, this can only occur during early boot, after which it never blocks again.

In order to make this O_NONBLOCK behavior consistent with other expectations, also respect users reading with preadv2(RWF_NOWAIT) and similar.

Fixes: 30c08efec888 ("random: make /dev/random be almost like /dev/urandom")
Reported-by: Guozihua <guozihua@huawei.com>
Reported-by: Zhongguohua <zhongguohua1@huawei.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/char/mem.c    | 4 ++--
 drivers/char/random.c | 5 +++++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 6b56bff9b68c..c5025ae6a37e 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -953,8 +953,8 @@ static const struct memdev {
 #endif
 	[5] = { "zero", 0666, &zero_fops, 0 },
 	[7] = { "full", 0666, &full_fops, 0 },
-	[8] = { "random", 0666, &random_fops, 0 },
-	[9] = { "urandom", 0666, &urandom_fops, 0 },
+	[8] = { "random", 0666, &random_fops, FMODE_NOWAIT },
+	[9] = { "urandom", 0666, &urandom_fops, FMODE_NOWAIT },
 #ifdef CONFIG_PRINTK
 	[11] = { "kmsg", 0644, &kmsg_fops, 0 },
 #endif
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 1ef94d112521..39f811f3dcc9 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1294,6 +1294,11 @@ static ssize_t random_read_iter(struct kiocb *kiocb, struct iov_iter *iter)
 {
 	int ret;
 
+	if (!crng_ready() &&
+	    ((kiocb->ki_flags & IOCB_NOWAIT) ||
+	     (kiocb->ki_filp->f_flags & O_NONBLOCK)))
+		return -EAGAIN;
+
 	ret = wait_for_random_bytes();
 	if (ret != 0)
 		return ret;
--
2.37.3
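
Editorial illustration, not part of the patch: from userspace, the restored behavior could be exercised roughly as follows. This is a minimal sketch under stated assumptions; the helper name read_random_nonblock is hypothetical, and only open(2), read(2), and close(2) are used. With the patch applied, a read on an O_NONBLOCK descriptor is expected to fail with EAGAIN while the pool is uninitialized, rather than block.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to len bytes from /dev/random without
 * blocking. Returns the number of bytes read, or a negative errno value.
 * -EAGAIN means the pool is not initialized yet (early boot only).
 */
ssize_t read_random_nonblock(void *buf, size_t len)
{
	ssize_t n;
	int fd, saved;

	assert(buf != NULL);

	fd = open("/dev/random", O_RDONLY | O_NONBLOCK);
	if (fd < 0)
		return -errno;

	/* Retry only if a signal interrupted the call. */
	do {
		n = read(fd, buf, len);
	} while (n < 0 && errno == EINTR);

	saved = errno;	/* close() may overwrite errno */
	close(fd);
	return n < 0 ? -saved : n;
}
```

A caller that sees -EAGAIN can fall back to polling the descriptor or retrying later; on a fully booted system the call should simply return data.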

Greg KH

4:20 p.m.

New subject: [PATCH stable 1/3] random: restore O_NONBLOCK support

On Thu, Oct 13, 2022 at 09:36:52AM -0600, Jason A. Donenfeld wrote:
> commit cd4f24ae9404fd31fc461066e57889be3b68641b upstream.
>
> Prior to 5.6, when /dev/random was opened with O_NONBLOCK, it would return -EAGAIN if there was no entropy. When the pools were unified in 5.6, this was lost. The post 5.6 behavior of blocking until the pool is initialized, and ignoring O_NONBLOCK in the process, went unnoticed, with no reports about the regression received for two and a half years. However, eventually this indeed did break somebody's userspace.
>
> So we restore the old behavior, by returning -EAGAIN if the pool is not initialized. Unlike the old /dev/random, this can only occur during early boot, after which it never blocks again.
>
> In order to make this O_NONBLOCK behavior consistent with other expectations, also respect users reading with preadv2(RWF_NOWAIT) and similar.
>
> Fixes: 30c08efec888 ("random: make /dev/random be almost like /dev/urandom")
> Reported-by: Guozihua <guozihua@huawei.com>
> Reported-by: Zhongguohua <zhongguohua1@huawei.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Cc: Andrew Lutomirski <luto@kernel.org>
> Cc: stable@vger.kernel.org
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>  drivers/char/mem.c    | 4 ++--
>  drivers/char/random.c | 5 +++++
>  2 files changed, 7 insertions(+), 2 deletions(-)

Still breaks on older kernels:

drivers/char/random.c: In function ‘random_read_iter’:
drivers/char/random.c:1299:33: error: ‘IOCB_NOWAIT’ undeclared (first use in this function); did you mean ‘IPC_NOWAIT’?
 1299 |             ((kiocb->ki_flags & IOCB_NOWAIT) ||
      |                                 ^~~~~~~~~~~
      |                                 IPC_NOWAIT
drivers/char/random.c:1299:33: note: each undeclared identifier is reported only once for each function it appears in
drivers/char/mem.c:872:48: error: ‘FMODE_NOWAIT’ undeclared here (not in a function); did you mean ‘FOLL_NOWAIT’?
  872 |          [8] = { "random", 0666, &random_fops, FMODE_NOWAIT },
      |                                                ^~~~~~~~~~~~
      |                                                FOLL_NOWAIT

Greg KH

4:22 p.m.

New subject: [PATCH stable 1/3] random: restore O_NONBLOCK support

On Thu, Oct 13, 2022 at 06:20:10PM +0200, Greg KH wrote:
> On Thu, Oct 13, 2022 at 09:36:52AM -0600, Jason A. Donenfeld wrote:
> > commit cd4f24ae9404fd31fc461066e57889be3b68641b upstream.
> >
> > Prior to 5.6, when /dev/random was opened with O_NONBLOCK, it would return -EAGAIN if there was no entropy. When the pools were unified in 5.6, this was lost. The post 5.6 behavior of blocking until the pool is initialized, and ignoring O_NONBLOCK in the process, went unnoticed, with no reports about the regression received for two and a half years. However, eventually this indeed did break somebody's userspace.
> >
> > So we restore the old behavior, by returning -EAGAIN if the pool is not initialized. Unlike the old /dev/random, this can only occur during early boot, after which it never blocks again.
> >
> > In order to make this O_NONBLOCK behavior consistent with other expectations, also respect users reading with preadv2(RWF_NOWAIT) and similar.
> >
> > Fixes: 30c08efec888 ("random: make /dev/random be almost like /dev/urandom")
> > Reported-by: Guozihua <guozihua@huawei.com>
> > Reported-by: Zhongguohua <zhongguohua1@huawei.com>
> > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > Cc: Theodore Ts'o <tytso@mit.edu>
> > Cc: Andrew Lutomirski <luto@kernel.org>
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> > ---
> >  drivers/char/mem.c    | 4 ++--
> >  drivers/char/random.c | 5 +++++
> >  2 files changed, 7 insertions(+), 2 deletions(-)
>
> Still breaks on older kernels:
>
> drivers/char/random.c: In function ‘random_read_iter’:
> drivers/char/random.c:1299:33: error: ‘IOCB_NOWAIT’ undeclared (first use in this function); did you mean ‘IPC_NOWAIT’?
>  1299 |             ((kiocb->ki_flags & IOCB_NOWAIT) ||
>       |                                 ^~~~~~~~~~~
>       |                                 IPC_NOWAIT
> drivers/char/random.c:1299:33: note: each undeclared identifier is reported only once for each function it appears in
> drivers/char/mem.c:872:48: error: ‘FMODE_NOWAIT’ undeclared here (not in a function); did you mean ‘FOLL_NOWAIT’?
>   872 |          [8] = { "random", 0666, &random_fops, FMODE_NOWAIT },
>       |                                                ^~~~~~~~~~~~
>       |                                                FOLL_NOWAIT

Hm, that's only broken on 4.9, the other ones it worked, now queued up for 4.14, 4.19, and 5.4, thanks.

greg k-h

Jason A. Donenfeld

4:32 p.m.

New subject: [PATCH stable 4.9.y] random: restore O_NONBLOCK support

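commit cd4f24ae9404fd31fc461066e57889be3b68641b upstream.

Prior to 5.6, when /dev/random was opened with O_NONBLOCK, it would return -EAGAIN if there was no entropy. When the pools were unified in 5.6, this was lost. The post 5.6 behavior of blocking until the pool is initialized, and ignoring O_NONBLOCK in the process, went unnoticed, with no reports about the regression received for two and a half years. However, eventually this indeed did break somebody's userspace.

So we restore the old behavior, by returning -EAGAIN if the pool is not initialized. Unlike the old /dev/random, this can only occur during early boot, after which it never blocks again.

In order to make this O_NONBLOCK behavior consistent with other expectations, also respect users reading with preadv2(RWF_NOWAIT) and similar.

Fixes: 30c08efec888 ("random: make /dev/random be almost like /dev/urandom")
Reported-by: Guozihua <guozihua@huawei.com>
Reported-by: Zhongguohua <zhongguohua1@huawei.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/char/random.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 1cbc33ee5a5f..838f66723ccd 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1295,6 +1295,10 @@ static ssize_t random_read_iter(struct kiocb *kiocb, struct iov_iter *iter)
 {
 	int ret;
 
+	if (!crng_ready() &&
+	    (kiocb->ki_filp->f_flags & O_NONBLOCK))
+		return -EAGAIN;
+
 	ret = wait_for_random_bytes();
 	if (ret != 0)
 		return ret;
--
2.37.3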
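
Editorial illustration, not part of the patch: the 4.9.y variant differs from the earlier one only in which flags gate the early return, since IOCB_NOWAIT and FMODE_NOWAIT are reported as undeclared on that branch. The two checks can be modeled in userspace as below. This is a sketch only: MODEL_IOCB_NOWAIT is a stand-in constant rather than the kernel's definition, the function names are hypothetical, and crng_ready is passed in as a plain boolean.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/* Stand-in for the kernel's IOCB_NOWAIT; the value is illustrative only. */
#define MODEL_IOCB_NOWAIT 0x8

/* Gate as in the first backport: per-call NOWAIT or O_NONBLOCK. */
int gate_with_nowait(bool crng_ready, int ki_flags, int f_flags)
{
	if (!crng_ready &&
	    ((ki_flags & MODEL_IOCB_NOWAIT) || (f_flags & O_NONBLOCK)))
		return -EAGAIN;
	return 0;	/* the real code would go on to wait_for_random_bytes() */
}

/* Gate as in the 4.9.y backport: only O_NONBLOCK is consulted. */
int gate_4_9(bool crng_ready, int f_flags)
{
	if (!crng_ready && (f_flags & O_NONBLOCK))
		return -EAGAIN;
	return 0;
}
```

In both models the early return can only trigger while crng_ready is false, which matches the commit message's point that this happens during early boot only.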

Greg KH

5:02 p.m.

New subject: [PATCH stable 4.9.y] random: restore O_NONBLOCK support

On Thu, Oct 13, 2022 at 10:32:31AM -0600, Jason A. Donenfeld wrote:
> commit cd4f24ae9404fd31fc461066e57889be3b68641b upstream.
>
> Prior to 5.6, when /dev/random was opened with O_NONBLOCK, it would return -EAGAIN if there was no entropy. When the pools were unified in 5.6, this was lost. The post 5.6 behavior of blocking until the pool is initialized, and ignoring O_NONBLOCK in the process, went unnoticed, with no reports about the regression received for two and a half years. However, eventually this indeed did break somebody's userspace.
>
> So we restore the old behavior, by returning -EAGAIN if the pool is not initialized. Unlike the old /dev/random, this can only occur during early boot, after which it never blocks again.
>
> In order to make this O_NONBLOCK behavior consistent with other expectations, also respect users reading with preadv2(RWF_NOWAIT) and similar.
>
> Fixes: 30c08efec888 ("random: make /dev/random be almost like /dev/urandom")
> Reported-by: Guozihua <guozihua@huawei.com>
> Reported-by: Zhongguohua <zhongguohua1@huawei.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Cc: Andrew Lutomirski <luto@kernel.org>
> Cc: stable@vger.kernel.org
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>  drivers/char/random.c | 4 ++++
>  1 file changed, 4 insertions(+)

Now queued up, thanks.

greg k-h

Jason A. Donenfeld

3:36 p.m.

New subject: [PATCH stable 2/3] random: avoid reading two cache lines on irq randomness

commit 9ee0507e896b45af6d65408c77815800bce30008 upstream.

In order to avoid reading and dirtying two cache lines on every IRQ, move the work_struct to the bottom of the fast_pool struct. add_interrupt_randomness() always touches .pool and .count, which are currently split, because .mix pushes everything down. Instead, move .mix to the bottom, so that .pool and .count are always in the first cache line, since .mix is only accessed when the pool is full.

Fixes: 58340f8e952b ("random: defer fast pool mixing to worker")
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/char/random.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 39f811f3dcc9..6dd9544930f8 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -890,10 +890,10 @@ void __init add_bootloader_randomness(const void *buf, size_t len)
 }
 
 struct fast_pool {
-	struct work_struct mix;
 	unsigned long pool[4];
 	unsigned long last;
 	unsigned int count;
+	struct work_struct mix;
 };
 
 static DEFINE_PER_CPU(struct fast_pool, irq_randomness) = {
--
2.37.3
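
Editorial illustration, not part of the patch: the effect of the reordering can be checked in userspace with offsetof(). The sketch below rests on several assumptions: a 64-byte cache line, an LP64 target, a struct that starts on a cache-line boundary, and a four-long stand-in (model_work) in place of struct work_struct. All names here are hypothetical.

```c
#include <stddef.h>

#define MODEL_CACHE_LINE 64	/* assumed cache line size in bytes */

/* Stand-in for struct work_struct: four longs, 32 bytes on LP64. */
struct model_work {
	unsigned long opaque[4];
};

/* Layout before the patch: .mix comes first and pushes the rest down. */
struct fast_pool_before {
	struct model_work mix;
	unsigned long pool[4];
	unsigned long last;
	unsigned int count;
};

/* Layout after the patch: .mix sits at the bottom. */
struct fast_pool_after {
	unsigned long pool[4];
	unsigned long last;
	unsigned int count;
	struct model_work mix;
};

/* Index of the cache line that a given struct offset falls into. */
size_t cache_line_index(size_t offset)
{
	return offset / MODEL_CACHE_LINE;
}
```

Under these assumptions, .pool and .count land in different cache lines in the old layout (offsets 32 and 72) and in the same one in the new layout (offsets 0 and 40).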

Jason A. Donenfeld

3:36 p.m.

New subject: [PATCH stable 3/3] random: use expired timer rather than wq for mixing fast pool

commit 748bc4dd9e663f23448d8ad7e58c011a67ea1eca upstream.

Previously, the fast pool was dumped into the main pool periodically in the fast pool's hard IRQ handler. This worked fine and there weren't problems with it, until RT came around. Since RT converts spinlocks into sleeping locks, problems cropped up. Rather than switching to raw spinlocks, the RT developers preferred we make the transformation from originally doing:

    do_some_stuff()
    spin_lock()
    do_some_other_stuff()
    spin_unlock()

to doing:

    do_some_stuff()
    queue_work_on(some_other_stuff_worker)

This is an ordinary pattern done all over the kernel. However, Sherry noticed a 10% performance regression in qperf TCP over a 40gbps InfiniBand card. Quoting her message:

> MT27500 Family [ConnectX-3] cards:
> Infiniband device 'mlx4_0' port 1 status:
> default gid: fe80:0000:0000:0000:0010:e000:0178:9eb1
> base lid: 0x6
> sm lid: 0x1
> state: 4: ACTIVE
> phys state: 5: LinkUp
> rate: 40 Gb/sec (4X QDR)
> link_layer: InfiniBand
>
> Cards are configured with IP addresses on private subnet for IPoIB performance testing. Regression identified in this bug is in TCP latency in this stack as reported by qperf tcp_lat metric:
>
> We have one system listen as a qperf server:
> [root@yourQperfServer ~]# qperf
>
> Have the other system connect to qperf server as a client (in this case, it’s X7 server with Mellanox card):
> [root@yourQperfClient ~]# numactl -m0 -N0 qperf 20.20.20.101 -v -uu -ub --time 60 --wait_server 20 -oo msg_size:4K:1024K:*2 tcp_lat

Rather than incur the scheduling latency from queue_work_on, we can instead switch to running on the next timer tick, on the same core. This also batches things a bit more -- once per jiffy -- which is okay now that mix_interrupt_randomness() can credit multiple bits at once.

Reported-by: Sherry Yang <sherry.yang@oracle.com>
Tested-by: Paul Webb <paul.x.webb@oracle.com>
Cc: Sherry Yang <sherry.yang@oracle.com>
Cc: Phillip Goerl <phillip.goerl@oracle.com>
Cc: Jack Vogel <jack.vogel@oracle.com>
Cc: Nicky Veitch <nicky.veitch@oracle.com>
Cc: Colm Harrington <colm.harrington@oracle.com>
Cc: Ramanan Govindarajan <ramanan.govindarajan@oracle.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Cc: stable@vger.kernel.org
Fixes: 58340f8e952b ("random: defer fast pool mixing to worker")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/char/random.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 6dd9544930f8..2d6bf0900536 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -893,17 +893,20 @@ struct fast_pool {
 	unsigned long pool[4];
 	unsigned long last;
 	unsigned int count;
-	struct work_struct mix;
+	struct timer_list mix;
 };
 
+static void mix_interrupt_randomness(struct timer_list *work);
+
 static DEFINE_PER_CPU(struct fast_pool, irq_randomness) = {
 #ifdef CONFIG_64BIT
 #define FASTMIX_PERM SIPHASH_PERMUTATION
-	.pool = { SIPHASH_CONST_0, SIPHASH_CONST_1, SIPHASH_CONST_2, SIPHASH_CONST_3 }
+	.pool = { SIPHASH_CONST_0, SIPHASH_CONST_1, SIPHASH_CONST_2, SIPHASH_CONST_3 },
 #else
 #define FASTMIX_PERM HSIPHASH_PERMUTATION
-	.pool = { HSIPHASH_CONST_0, HSIPHASH_CONST_1, HSIPHASH_CONST_2, HSIPHASH_CONST_3 }
+	.pool = { HSIPHASH_CONST_0, HSIPHASH_CONST_1, HSIPHASH_CONST_2, HSIPHASH_CONST_3 },
 #endif
+	.mix = __TIMER_INITIALIZER(mix_interrupt_randomness, 0)
 };
 
 /*
@@ -945,7 +948,7 @@ int __cold random_online_cpu(unsigned int cpu)
 }
 #endif
 
-static void mix_interrupt_randomness(struct work_struct *work)
+static void mix_interrupt_randomness(struct timer_list *work)
 {
 	struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix);
 	/*
@@ -999,10 +1002,11 @@ void add_interrupt_randomness(int irq)
 	if (new_count < 1024 && !time_is_before_jiffies(fast_pool->last + HZ))
 		return;
 
-	if (unlikely(!fast_pool->mix.func))
-		INIT_WORK(&fast_pool->mix, mix_interrupt_randomness);
 	fast_pool->count |= MIX_INFLIGHT;
-	queue_work_on(raw_smp_processor_id(), system_highpri_wq, &fast_pool->mix);
+	if (!timer_pending(&fast_pool->mix)) {
+		fast_pool->mix.expires = jiffies;
+		add_timer_on(&fast_pool->mix, raw_smp_processor_id());
+	}
 }
 EXPORT_SYMBOL_GPL(add_interrupt_randomness);

--
2.37.3
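
Editorial illustration, not part of the patch: the batching effect described in the commit message (at most one mix per tick, however many interrupts arrive) can be modeled in userspace. This is a simplified sketch with hypothetical names. It leaves out the 1024-event and one-second thresholds that the real add_interrupt_randomness() applies before arming anything, and it uses a boolean in place of timer_pending().

```c
#include <stdbool.h>

/* Userspace model of one per-CPU fast pool with a one-shot timer. */
struct model_pool {
	unsigned int count;	/* events accumulated since the last mix */
	bool timer_pending;	/* models timer_pending(&fast_pool->mix) */
	unsigned int mixes;	/* how many times the mix callback ran */
	unsigned int credited;	/* total events handed to the mix callback */
};

/* Models the tail of the IRQ path: record the event, arm the timer once. */
void model_irq(struct model_pool *p)
{
	p->count++;
	if (!p->timer_pending)
		p->timer_pending = true;	/* models add_timer_on() with expires = jiffies */
}

/* Models the next timer tick: run the callback only if the timer was armed. */
void model_tick(struct model_pool *p)
{
	if (!p->timer_pending)
		return;
	p->timer_pending = false;
	p->credited += p->count;	/* one callback handles the whole batch */
	p->count = 0;
	p->mixes++;
}
```

Several calls to model_irq() followed by one model_tick() produce a single mix that accounts for every event seen since the previous tick.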

Greg KH

4:18 p.m.

On Thu, Oct 13, 2022 at 09:36:51AM -0600, Jason A. Donenfeld wrote:
> Hi Greg,
>
> You just sent me an automated email about these failing, so here they are backported.

Backported where? Patch 1 is already in 5.10 and newer, does this one work in older?

And 2 and 3 for all branches?

confused,

greg k-h

Jason A. Donenfeld

4:29 p.m.

On Thu, Oct 13, 2022 at 06:18:53PM +0200, Greg KH wrote:
> On Thu, Oct 13, 2022 at 09:36:51AM -0600, Jason A. Donenfeld wrote:
> > Hi Greg,
> >
> > You just sent me an automated email about these failing, so here they are backported.
>
> Backported where? Patch 1 is already in 5.10 and newer, does this one work in older?
>
> And 2 and 3 for all branches?

For all of them they're not yet in.

I'll have a look at the 4.9 breakage.

Jason

Greg KH

4:54 p.m.

On Thu, Oct 13, 2022 at 10:29:40AM -0600, Jason A. Donenfeld wrote:
> On Thu, Oct 13, 2022 at 06:18:53PM +0200, Greg KH wrote:
> > On Thu, Oct 13, 2022 at 09:36:51AM -0600, Jason A. Donenfeld wrote:
> > > Hi Greg,
> > >
> > > You just sent me an automated email about these failing, so here they are backported.
> >
> > Backported where? Patch 1 is already in 5.10 and newer, does this one work in older?
> >
> > And 2 and 3 for all branches?
>
> For all of them they're not yet in.
>
> I'll have a look at the 4.9 breakage.

Oops, 748bc4dd9e66 ("random: use expired timer rather than wq for mixing fast pool") does not work for 4.9.y or 4.14.y, it breaks the build there too:

drivers/char/random.c:909:63: error: macro "__TIMER_INITIALIZER" requires 4 arguments, but only 2 given
  909 |         .mix = __TIMER_INITIALIZER(mix_interrupt_randomness, 0)
      |                                                               ^
In file included from ./include/linux/workqueue.h:9,
                 from ./include/linux/rhashtable.h:26,
                 from ./include/linux/ipc.h:7,
                 from ./include/uapi/linux/sem.h:5,
                 from ./include/linux/sem.h:9,
                 from ./include/linux/sched.h:15,
                 from ./include/linux/utsname.h:6,
                 from drivers/char/random.c:28:
./include/linux/timer.h:67: note: macro "__TIMER_INITIALIZER" defined here
   67 | #define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \
      |
drivers/char/random.c:909:16: error: ‘__TIMER_INITIALIZER’ undeclared here (not in a function); did you mean ‘TIMER_INITIALIZER’?
  909 |         .mix = __TIMER_INITIALIZER(mix_interrupt_randomness, 0)
      |                ^~~~~~~~~~~~~~~~~~~
      |                TIMER_INITIALIZER
drivers/char/random.c:951:13: warning: ‘mix_interrupt_randomness’ defined but not used [-Wunused-function]
  951 | static void mix_interrupt_randomness(struct timer_list *work)
      |             ^~~~~~~~~~~~~~~~~~~~~~~~

Jason A. Donenfeld

5:01 p.m.

On Thu, Oct 13, 2022 at 10:53 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> On Thu, Oct 13, 2022 at 10:29:40AM -0600, Jason A. Donenfeld wrote:
> > On Thu, Oct 13, 2022 at 06:18:53PM +0200, Greg KH wrote:
> > > On Thu, Oct 13, 2022 at 09:36:51AM -0600, Jason A. Donenfeld wrote:
> > > > Hi Greg,
> > > >
> > > > You just sent me an automated email about these failing, so here they are backported.
> > >
> > > Backported where? Patch 1 is already in 5.10 and newer, does this one work in older?
> > >
> > > And 2 and 3 for all branches?
> >
> > For all of them they're not yet in.
> >
> > I'll have a look at the 4.9 breakage.
>
> Oops, 748bc4dd9e66 ("random: use expired timer rather than wq for mixing fast pool") does not work for 4.9.y or 4.14.y, it breaks the build there too:
>
> drivers/char/random.c:909:63: error: macro "__TIMER_INITIALIZER" requires 4 arguments, but only 2 given
>   909 |         .mix = __TIMER_INITIALIZER(mix_interrupt_randomness, 0)
>       |                                                               ^
> In file included from ./include/linux/workqueue.h:9,
>                  from ./include/linux/rhashtable.h:26,
>                  from ./include/linux/ipc.h:7,
>                  from ./include/uapi/linux/sem.h:5,
>                  from ./include/linux/sem.h:9,
>                  from ./include/linux/sched.h:15,
>                  from ./include/linux/utsname.h:6,
>                  from drivers/char/random.c:28:
> ./include/linux/timer.h:67: note: macro "__TIMER_INITIALIZER" defined here
>    67 | #define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \
>       |
> drivers/char/random.c:909:16: error: ‘__TIMER_INITIALIZER’ undeclared here (not in a function); did you mean ‘TIMER_INITIALIZER’?
>   909 |         .mix = __TIMER_INITIALIZER(mix_interrupt_randomness, 0)
>       |                ^~~~~~~~~~~~~~~~~~~
>       |                TIMER_INITIALIZER
> drivers/char/random.c:951:13: warning: ‘mix_interrupt_randomness’ defined but not used [-Wunused-function]
>   951 | static void mix_interrupt_randomness(struct timer_list *work)
>       |             ^~~~~~~~~~~~~~~~~~~~~~~~

Ahh the dark old days of timers taking an unsigned long. Fixing. (And testing...)

Jason A. Donenfeld

5:07 p.m.

New subject: [PATCH stable 4.9.y 4.14.y] random: use expired timer rather than wq for mixing fast pool

commit 748bc4dd9e663f23448d8ad7e58c011a67ea1eca upstream.

Previously, the fast pool was dumped into the main pool periodically in the fast pool's hard IRQ handler. This worked fine and there weren't problems with it, until RT came around. Since RT converts spinlocks into sleeping locks, problems cropped up. Rather than switching to raw spinlocks, the RT developers preferred we make the transformation from originally doing:

    do_some_stuff()
    spin_lock()
    do_some_other_stuff()
    spin_unlock()

to doing:

    do_some_stuff()
    queue_work_on(some_other_stuff_worker)

This is an ordinary pattern done all over the kernel. However, Sherry noticed a 10% performance regression in qperf TCP over a 40gbps InfiniBand card. Quoting her message:

> MT27500 Family [ConnectX-3] cards:
> Infiniband device 'mlx4_0' port 1 status:
> default gid: fe80:0000:0000:0000:0010:e000:0178:9eb1
> base lid: 0x6
> sm lid: 0x1
> state: 4: ACTIVE
> phys state: 5: LinkUp
> rate: 40 Gb/sec (4X QDR)
> link_layer: InfiniBand
>
> Cards are configured with IP addresses on private subnet for IPoIB performance testing. Regression identified in this bug is in TCP latency in this stack as reported by qperf tcp_lat metric:
>
> We have one system listen as a qperf server:
> [root@yourQperfServer ~]# qperf
>
> Have the other system connect to qperf server as a client (in this case, it’s X7 server with Mellanox card):
> [root@yourQperfClient ~]# numactl -m0 -N0 qperf 20.20.20.101 -v -uu -ub --time 60 --wait_server 20 -oo msg_size:4K:1024K:*2 tcp_lat

Rather than incur the scheduling latency from queue_work_on, we can instead switch to running on the next timer tick, on the same core. This also batches things a bit more -- once per jiffy -- which is okay now that mix_interrupt_randomness() can credit multiple bits at once.

Reported-by: Sherry Yang <sherry.yang@oracle.com>
Tested-by: Paul Webb <paul.x.webb@oracle.com>
Cc: Sherry Yang <sherry.yang@oracle.com>
Cc: Phillip Goerl <phillip.goerl@oracle.com>
Cc: Jack Vogel <jack.vogel@oracle.com>
Cc: Nicky Veitch <nicky.veitch@oracle.com>
Cc: Colm Harrington <colm.harrington@oracle.com>
Cc: Ramanan Govindarajan <ramanan.govindarajan@oracle.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Cc: stable@vger.kernel.org
Fixes: 58340f8e952b ("random: defer fast pool mixing to worker")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/char/random.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index bf5f0149d9d4..174dd139d2f3 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -894,7 +894,7 @@ struct fast_pool {
 	unsigned long pool[4];
 	unsigned long last;
 	unsigned int count;
-	struct work_struct mix;
+	struct timer_list mix;
 };
 
 static DEFINE_PER_CPU(struct fast_pool, irq_randomness) = {
@@ -946,9 +946,9 @@ int __cold random_online_cpu(unsigned int cpu)
 }
 #endif
 
-static void mix_interrupt_randomness(struct work_struct *work)
+static void mix_interrupt_randomness(unsigned long data)
 {
-	struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix);
+	struct fast_pool *fast_pool = (struct fast_pool *)data;
 	/*
 	 * The size of the copied stack pool is explicitly 2 longs so that we
 	 * only ever ingest half of the siphash output each time, retaining
@@ -1000,10 +1000,14 @@ void add_interrupt_randomness(int irq)
 	if (new_count < 1024 && !time_is_before_jiffies(fast_pool->last + HZ))
 		return;
 
-	if (unlikely(!fast_pool->mix.func))
-		INIT_WORK(&fast_pool->mix, mix_interrupt_randomness);
+	if (unlikely(!fast_pool->mix.data))
+		setup_timer(&fast_pool->mix, mix_interrupt_randomness, (unsigned long)fast_pool);
+
 	fast_pool->count |= MIX_INFLIGHT;
-	queue_work_on(raw_smp_processor_id(), system_highpri_wq, &fast_pool->mix);
+	if (!timer_pending(&fast_pool->mix)) {
+		fast_pool->mix.expires = jiffies;
+		add_timer_on(&fast_pool->mix, raw_smp_processor_id());
+	}
 }
 EXPORT_SYMBOL_GPL(add_interrupt_randomness);

--
2.37.3
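
Editorial illustration, not part of the patch: the two backports differ in how the timer callback finds its fast_pool. The newer one receives the timer and uses container_of(); this one receives an opaque unsigned long set at setup time. Both styles can be shown in userspace as below. This is a sketch with hypothetical names: model_container_of stands in for the kernel macro, and model_timer is not the kernel's struct timer_list.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct model_timer {
	unsigned long expires;
};

struct model_fast_pool {
	unsigned int count;
	struct model_timer mix;
};

/* Newer style: the callback gets the timer and walks back to its owner. */
struct model_fast_pool *owner_from_timer(struct model_timer *t)
{
	return model_container_of(t, struct model_fast_pool, mix);
}

/* Older style: the callback gets an opaque value stored at setup time. */
struct model_fast_pool *owner_from_data(unsigned long data)
{
	return (struct model_fast_pool *)data;
}
```

Both helpers return the same enclosing object; the older style just needs the pointer to be stashed ahead of time, which is what the setup_timer() call in the diff above does.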

Greg KH

5:17 p.m.

New subject: [PATCH stable 4.9.y 4.14.y] random: use expired timer rather than wq for mixing fast pool

On Thu, Oct 13, 2022 at 11:07:31AM -0600, Jason A. Donenfeld wrote:
> commit 748bc4dd9e663f23448d8ad7e58c011a67ea1eca upstream.
>
> Previously, the fast pool was dumped into the main pool periodically in the fast pool's hard IRQ handler. This worked fine and there weren't problems with it, until RT came around. Since RT converts spinlocks into sleeping locks, problems cropped up. Rather than switching to raw spinlocks, the RT developers preferred we make the transformation from originally doing:
>
>     do_some_stuff()
>     spin_lock()
>     do_some_other_stuff()
>     spin_unlock()
>
> to doing:
>
>     do_some_stuff()
>     queue_work_on(some_other_stuff_worker)
>
> This is an ordinary pattern done all over the kernel. However, Sherry noticed a 10% performance regression in qperf TCP over a 40gbps InfiniBand card. Quoting her message:
>
> > MT27500 Family [ConnectX-3] cards:
> > Infiniband device 'mlx4_0' port 1 status:
> > default gid: fe80:0000:0000:0000:0010:e000:0178:9eb1
> > base lid: 0x6
> > sm lid: 0x1
> > state: 4: ACTIVE
> > phys state: 5: LinkUp
> > rate: 40 Gb/sec (4X QDR)
> > link_layer: InfiniBand
> >
> > Cards are configured with IP addresses on private subnet for IPoIB performance testing. Regression identified in this bug is in TCP latency in this stack as reported by qperf tcp_lat metric:
> >
> > We have one system listen as a qperf server:
> > [root@yourQperfServer ~]# qperf
> >
> > Have the other system connect to qperf server as a client (in this case, it’s X7 server with Mellanox card):
> > [root@yourQperfClient ~]# numactl -m0 -N0 qperf 20.20.20.101 -v -uu -ub --time 60 --wait_server 20 -oo msg_size:4K:1024K:*2 tcp_lat
>
> Rather than incur the scheduling latency from queue_work_on, we can instead switch to running on the next timer tick, on the same core. This also batches things a bit more -- once per jiffy -- which is okay now that mix_interrupt_randomness() can credit multiple bits at once.
>
> Reported-by: Sherry Yang <sherry.yang@oracle.com>
> Tested-by: Paul Webb <paul.x.webb@oracle.com>
> Cc: Sherry Yang <sherry.yang@oracle.com>
> Cc: Phillip Goerl <phillip.goerl@oracle.com>
> Cc: Jack Vogel <jack.vogel@oracle.com>
> Cc: Nicky Veitch <nicky.veitch@oracle.com>
> Cc: Colm Harrington <colm.harrington@oracle.com>
> Cc: Ramanan Govindarajan <ramanan.govindarajan@oracle.com>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Dominik Brodowski <linux@dominikbrodowski.net>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Sultan Alsawaf <sultan@kerneltoast.com>
> Cc: stable@vger.kernel.org
> Fixes: 58340f8e952b ("random: defer fast pool mixing to worker")
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>  drivers/char/random.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)

That worked, thanks, now queued up.

greg k-h

Greg KH

4:30 p.m.

On Thu, Oct 13, 2022 at 06:18:53PM +0200, Greg KH wrote:
> On Thu, Oct 13, 2022 at 09:36:51AM -0600, Jason A. Donenfeld wrote:
> > Hi Greg,
> >
> > You just sent me an automated email about these failing, so here they are backported.
>
> Backported where? Patch 1 is already in 5.10 and newer, does this one work in older?
>
> And 2 and 3 for all branches?

Ok, 2 and 3 are now queued up everywhere, only thing that didn't work is patch 1 on the 4.9.y branch. Can you provide a working backport for there?

thanks,

greg k-h


linux-stable-mirror@lists.linaro.org
