There's currently an almost deadlock when ath9k shuts down if no random bytes are available:
Thread A Thread B ------------------------------------------------------------------------- rng_dev_read get_current_rng kref_get(¤t_rng->ref) rng_get_data ath9k_rng_read msleep_interruptible(...) ath9k_rng_stop devm_hwrng_unregister devm_hwrng_release hwrng_unregister drop_current_rng kref_put(¤t_rng->ref, cleanup_rng) // Does NOT call cleanup_rng here, // because of thread A's kref_get. wait_for_completion(&rng->cleanup_done); // Waits for a really long time... // Eventually sleep is over... put_rng kref_put(&rng->ref, cleanup_rng); cleanup_rng complete(&rng->cleanup_done); return
When thread B doesn't return right away, sleep and other functions that are waiting for ath9k_rng_stop to complete time out, and problems ensue.
This commit fixes the problem by using another completion inside of ath9k_rng_read so that hwrng_unregister can interrupt it as needed.
Reported-by: Gregory Erwin gregerwin256@gmail.com Cc: Toke Høiland-Jørgensen toke@redhat.com Cc: Kalle Valo kvalo@kernel.org Cc: Rui Salvaterra rsalvaterra@gmail.com Cc: stable@vger.kernel.org Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c") Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00... Link: https://bugs.archlinux.org/task/75138 Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com --- I do not have an ath9k and therefore I can't test this myself. The analysis above was done completely statically, with no dynamic tracing and just a bug report of symptoms from Gregory. So it might be totally wrong. Thus, this patch very much requires Gregory's testing. Please don't apply it until we have his `Tested-by` line.
drivers/net/wireless/ath/ath9k/ath9k.h | 1 + drivers/net/wireless/ath/ath9k/rng.c | 26 ++++++++++++++++---------- 2 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h index 3ccf8cfc6b63..731db7f70e5d 100644 --- a/drivers/net/wireless/ath/ath9k/ath9k.h +++ b/drivers/net/wireless/ath/ath9k/ath9k.h @@ -1072,6 +1072,7 @@ struct ath_softc {
#ifdef CONFIG_ATH9K_HWRNG struct hwrng rng_ops; + struct completion rng_shutdown; u32 rng_last; char rng_name[sizeof("ath9k_65535")]; #endif diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c index cb5414265a9b..67c6b03a22ac 100644 --- a/drivers/net/wireless/ath/ath9k/rng.c +++ b/drivers/net/wireless/ath/ath9k/rng.c @@ -1,5 +1,6 @@ /* * Copyright (c) 2015 Qualcomm Atheros, Inc. + * Copyright (C) 2022 Jason A. Donenfeld Jason@zx2c4.com. All Rights Reserved. * * Permission to use, copy, modify, and/or distribute this software for any * purpose with or without fee is hereby granted, provided that the above @@ -52,18 +53,13 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size) return j << 2; }
-static u32 ath9k_rng_delay_get(u32 fail_stats) +static unsigned long ath9k_rng_delay_get(u32 fail_stats) { - u32 delay; - if (fail_stats < 100) - delay = 10; + return msecs_to_jiffies(10); else if (fail_stats < 105) - delay = 1000; - else - delay = 10000; - - return delay; + return msecs_to_jiffies(1000); + return msecs_to_jiffies(10000); }
static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait) @@ -83,7 +79,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait) if (!wait || !max || likely(bytes_read) || fail_stats > 110) break;
- msleep_interruptible(ath9k_rng_delay_get(++fail_stats)); + if (wait_for_completion_interruptible_timeout( + &sc->rng_shutdown, + ath9k_rng_delay_get(++fail_stats))) + break; }
if (wait && !bytes_read && max) @@ -91,6 +90,11 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait) return bytes_read; }
+static void ath9k_rng_cleanup(struct hwrng *rng) +{ + complete(&container_of(rng, struct ath_softc, rng_ops)->rng_shutdown); +} + void ath9k_rng_start(struct ath_softc *sc) { static atomic_t serial = ATOMIC_INIT(0); @@ -104,8 +108,10 @@ void ath9k_rng_start(struct ath_softc *sc)
snprintf(sc->rng_name, sizeof(sc->rng_name), "ath9k_%u", (atomic_inc_return(&serial) - 1) & U16_MAX); + init_completion(&sc->rng_shutdown); sc->rng_ops.name = sc->rng_name; sc->rng_ops.read = ath9k_rng_read; + sc->rng_ops.cleanup = ath9k_rng_cleanup; sc->rng_ops.quality = 320;
if (devm_hwrng_register(sc->dev, &sc->rng_ops))
Hi Jason,
I think you are on the right track, but even with this patch 'ip link set wlan0 down' blocks until the hwrng reader gives up. The reader can either be userspace (dd, cat, etc) or it can also be the rng_core module. I can replicate the hang in the two different situations, so I gathered two stack traces for 'ip' depending on the reader of hwrng:
-userspace- $ ip link set wlan0 up $ dd if=/dev/hwrng count=1 & # blocks until interrupted or receives -EIO $ ip link set wlan0 down & # blocks until dd exits $ cat /proc/$(pidof ip)/stack [<0>] devres_release+0x2b/0x60 [<0>] ath9k_rng_stop+0x27/0x40 [ath9k] [<0>] ath9k_stop+0x40/0x230 [ath9k] [<0>] drv_stop+0x34/0x100 [mac80211] [<0>] ieee80211_do_stop+0x685/0x880 [mac80211] [<0>] ieee80211_stop+0x41/0x170 [mac80211] [<0>] __dev_close_many+0x9e/0x110 [<0>] __dev_change_flags+0x1a7/0x250 [<0>] dev_change_flags+0x26/0x60 [<0>] do_setlink+0x32d/0x1220 [<0>] __rtnl_newlink+0x5a2/0x9f0 [<0>] rtnl_newlink+0x47/0x70 [<0>] rtnetlink_rcv_msg+0x143/0x370 [<0>] netlink_rcv_skb+0x55/0x100 [<0>] netlink_unicast+0x243/0x390 [<0>] netlink_sendmsg+0x254/0x4b0 [<0>] sock_sendmsg+0x60/0x70 [<0>] ____sys_sendmsg+0x264/0x2a0 [<0>] ___sys_sendmsg+0x96/0xd0 [<0>] __sys_sendmsg+0x7a/0xd0 [<0>] do_syscall_64+0x5f/0x90 [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
-rng_core- $ ip link set wlan0 up $ # wait a while, maybe a minute or two $ ip link set wlan0 down & # blocks until 10s after 'hwrng: no data available' $ cat /proc/$(pidof ip)/stack [<0>] kthread_stop+0x65/0x170 [<0>] hwrng_unregister+0xbe/0xe0 [rng_core] [<0>] devres_release+0x2b/0x60 [<0>] ath9k_rng_stop+0x27/0x40 [ath9k] [<0>] ath9k_stop+0x40/0x230 [ath9k] [<0>] drv_stop+0x34/0x100 [mac80211] [<0>] ieee80211_do_stop+0x685/0x880 [mac80211] [<0>] ieee80211_stop+0x41/0x170 [mac80211] [<0>] __dev_close_many+0x9e/0x110 [<0>] __dev_change_flags+0x1a7/0x250 [<0>] dev_change_flags+0x26/0x60 [<0>] do_setlink+0x32d/0x1220 [<0>] __rtnl_newlink+0x5a2/0x9f0 [<0>] rtnl_newlink+0x47/0x70 [<0>] rtnetlink_rcv_msg+0x143/0x370 [<0>] netlink_rcv_skb+0x55/0x100 [<0>] netlink_unicast+0x243/0x390 [<0>] netlink_sendmsg+0x254/0x4b0 [<0>] sock_sendmsg+0x60/0x70 [<0>] ____sys_sendmsg+0x264/0x2a0 [<0>] ___sys_sendmsg+0x96/0xd0 [<0>] __sys_sendmsg+0x7a/0xd0 [<0>] do_syscall_64+0x5f/0x90 [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
Regards, - Greg.
On Thu, Jun 23, 2022 at 6:15 PM Jason A. Donenfeld Jason@zx2c4.com wrote:
There's currently an almost deadlock when ath9k shuts down if no random bytes are available:
Thread A Thread B
rng_dev_read get_current_rng kref_get(¤t_rng->ref) rng_get_data ath9k_rng_read msleep_interruptible(...) ath9k_rng_stop devm_hwrng_unregister devm_hwrng_release hwrng_unregister drop_current_rng kref_put(¤t_rng->ref, cleanup_rng) // Does NOT call cleanup_rng here, // because of thread A's kref_get. wait_for_completion(&rng->cleanup_done); // Waits for a really long time... // Eventually sleep is over... put_rng kref_put(&rng->ref, cleanup_rng); cleanup_rng complete(&rng->cleanup_done); return
When thread B doesn't return right away, sleep and other functions that are waiting for ath9k_rng_stop to complete time out, and problems ensue.
This commit fixes the problem by using another completion inside of ath9k_rng_read so that hwrng_unregister can interrupt it as needed.
Reported-by: Gregory Erwin gregerwin256@gmail.com Cc: Toke Høiland-Jørgensen toke@redhat.com Cc: Kalle Valo kvalo@kernel.org Cc: Rui Salvaterra rsalvaterra@gmail.com Cc: stable@vger.kernel.org Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c") Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00... Link: https://bugs.archlinux.org/task/75138 Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com
I do not have an ath9k and therefore I can't test this myself. The analysis above was done completely statically, with no dynamic tracing and just a bug report of symptoms from Gregory. So it might be totally wrong. Thus, this patch very much requires Gregory's testing. Please don't apply it until we have his `Tested-by` line.
drivers/net/wireless/ath/ath9k/ath9k.h | 1 + drivers/net/wireless/ath/ath9k/rng.c | 26 ++++++++++++++++---------- 2 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h index 3ccf8cfc6b63..731db7f70e5d 100644 --- a/drivers/net/wireless/ath/ath9k/ath9k.h +++ b/drivers/net/wireless/ath/ath9k/ath9k.h @@ -1072,6 +1072,7 @@ struct ath_softc {
#ifdef CONFIG_ATH9K_HWRNG struct hwrng rng_ops;
struct completion rng_shutdown; u32 rng_last; char rng_name[sizeof("ath9k_65535")];
#endif diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c index cb5414265a9b..67c6b03a22ac 100644 --- a/drivers/net/wireless/ath/ath9k/rng.c +++ b/drivers/net/wireless/ath/ath9k/rng.c @@ -1,5 +1,6 @@ /*
- Copyright (c) 2015 Qualcomm Atheros, Inc.
- Copyright (C) 2022 Jason A. Donenfeld Jason@zx2c4.com. All Rights Reserved.
- Permission to use, copy, modify, and/or distribute this software for any
- purpose with or without fee is hereby granted, provided that the above
@@ -52,18 +53,13 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size) return j << 2; }
-static u32 ath9k_rng_delay_get(u32 fail_stats) +static unsigned long ath9k_rng_delay_get(u32 fail_stats) {
u32 delay;
if (fail_stats < 100)
delay = 10;
return msecs_to_jiffies(10); else if (fail_stats < 105)
delay = 1000;
else
delay = 10000;
return delay;
return msecs_to_jiffies(1000);
return msecs_to_jiffies(10000);
}
static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait) @@ -83,7 +79,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait) if (!wait || !max || likely(bytes_read) || fail_stats > 110) break;
msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
if (wait_for_completion_interruptible_timeout(
&sc->rng_shutdown,
ath9k_rng_delay_get(++fail_stats)))
break; } if (wait && !bytes_read && max)
@@ -91,6 +90,11 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait) return bytes_read; }
+static void ath9k_rng_cleanup(struct hwrng *rng) +{
complete(&container_of(rng, struct ath_softc, rng_ops)->rng_shutdown);
+}
void ath9k_rng_start(struct ath_softc *sc) { static atomic_t serial = ATOMIC_INIT(0); @@ -104,8 +108,10 @@ void ath9k_rng_start(struct ath_softc *sc)
snprintf(sc->rng_name, sizeof(sc->rng_name), "ath9k_%u", (atomic_inc_return(&serial) - 1) & U16_MAX);
init_completion(&sc->rng_shutdown); sc->rng_ops.name = sc->rng_name; sc->rng_ops.read = ath9k_rng_read;
sc->rng_ops.cleanup = ath9k_rng_cleanup; sc->rng_ops.quality = 320; if (devm_hwrng_register(sc->dev, &sc->rng_ops))
-- 2.35.1
Hi Gregory,
On Thu, Jun 23, 2022 at 10:25:26PM -0700, Gregory Erwin wrote:
Hi Jason,
I think you are on the right track, but even with this patch 'ip link set wlan0 down' blocks until the hwrng reader gives up. The reader can either be userspace (dd, cat, etc) or it can also be the rng_core module. I can replicate the hang in the two different situations, so I gathered two stack traces for 'ip' depending on the reader of hwrng:
Thanks for the traces. I'll send a v2 to you shortly.
Jason
Even though hwrng provides a `wait` parameter, it doesn't work very well when waiting for a long time. There are numerous deadlocks that emerge related to shutdown. Work around this API limitation by waiting for a shorter amount of time and erroring more frequently. This commit also prevents hwrng from splatting messages to dmesg when there's a timeout and prevents calling msleep_interruptible() for tons of time when a thread is supposed to be shutting down, since msleep_interruptible() isn't actually interrupted by kthread_stop().
Reported-by: Gregory Erwin gregerwin256@gmail.com Cc: Toke Høiland-Jørgensen toke@redhat.com Cc: Kalle Valo kvalo@kernel.org Cc: Rui Salvaterra rsalvaterra@gmail.com Cc: Herbert Xu herbert@gondor.apana.org.au Cc: stable@vger.kernel.org Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c") Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00... Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXten... Link: https://bugs.archlinux.org/task/75138 Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com --- I do not have an ath9k and therefore I can't test this myself. The analysis above was done completely statically, with no dynamic tracing and just a bug report of symptoms from Gregory. So it might be totally wrong. Thus, this patch very much requires Gregory's testing. Please don't apply it until we have his `Tested-by` line.
drivers/char/hw_random/core.c | 10 ++++++++-- drivers/net/wireless/ath/ath9k/rng.c | 19 ++----------------- 2 files changed, 10 insertions(+), 19 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index 16f227b995e8..af1c1905bb7e 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused) break;
if (rc <= 0) { - pr_warn("hwrng: no data available\n"); - msleep_interruptible(10000); + int i; + + for (i = 0; i < 100; ++i) { + if (kthread_should_stop() || + msleep_interruptible(10000 / 100)) + goto out; + } continue; }
@@ -529,6 +534,7 @@ static int hwrng_fillfn(void *unused) add_hwgenerator_randomness((void *)rng_fillbuf, rc, entropy >> 10); } +out: hwrng_fill = NULL; return 0; } diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c index cb5414265a9b..883110c66e5e 100644 --- a/drivers/net/wireless/ath/ath9k/rng.c +++ b/drivers/net/wireless/ath/ath9k/rng.c @@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size) return j << 2; }
-static u32 ath9k_rng_delay_get(u32 fail_stats) -{ - u32 delay; - - if (fail_stats < 100) - delay = 10; - else if (fail_stats < 105) - delay = 1000; - else - delay = 10000; - - return delay; -} - static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait) { struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops); @@ -80,10 +66,9 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait) bytes_read += max & 3UL; memzero_explicit(&word, sizeof(word)); } - if (!wait || !max || likely(bytes_read) || fail_stats > 110) + if (!wait || !max || likely(bytes_read) || + ++fail_stats >= 100 || msleep_interruptible(5)) break; - - msleep_interruptible(ath9k_rng_delay_get(++fail_stats)); }
if (wait && !bytes_read && max)
On Fri, Jun 24, 2022 at 1:44 PM Jason A. Donenfeld Jason@zx2c4.com wrote:
Even though hwrng provides a `wait` parameter, it doesn't work very well when waiting for a long time. There are numerous deadlocks that emerge related to shutdown. Work around this API limitation by waiting for a shorter amount of time and erroring more frequently. This commit also prevents hwrng from splatting messages to dmesg when there's a timeout and prevents calling msleep_interruptible() for tons of time when a thread is supposed to be shutting down, since msleep_interruptible() isn't actually interrupted by kthread_stop().
Reported-by: Gregory Erwin gregerwin256@gmail.com Cc: Toke Høiland-Jørgensen toke@redhat.com Cc: Kalle Valo kvalo@kernel.org Cc: Rui Salvaterra rsalvaterra@gmail.com Cc: Herbert Xu herbert@gondor.apana.org.au Cc: stable@vger.kernel.org Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c") Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00... Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXten... Link: https://bugs.archlinux.org/task/75138 Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com
I do not have an ath9k and therefore I can't test this myself. The analysis above was done completely statically, with no dynamic tracing and just a bug report of symptoms from Gregory. So it might be totally wrong. Thus, this patch very much requires Gregory's testing. Please don't apply it until we have his `Tested-by` line.
drivers/char/hw_random/core.c | 10 ++++++++-- drivers/net/wireless/ath/ath9k/rng.c | 19 ++----------------- 2 files changed, 10 insertions(+), 19 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index 16f227b995e8..af1c1905bb7e 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused) break;
if (rc <= 0) {
pr_warn("hwrng: no data available\n");
msleep_interruptible(10000);
int i;
for (i = 0; i < 100; ++i) {
if (kthread_should_stop() ||
msleep_interruptible(10000 / 100))
goto out;
} continue; }
@@ -529,6 +534,7 @@ static int hwrng_fillfn(void *unused) add_hwgenerator_randomness((void *)rng_fillbuf, rc, entropy >> 10); } +out: hwrng_fill = NULL; return 0; } diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c index cb5414265a9b..883110c66e5e 100644 --- a/drivers/net/wireless/ath/ath9k/rng.c +++ b/drivers/net/wireless/ath/ath9k/rng.c @@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size) return j << 2; }
-static u32 ath9k_rng_delay_get(u32 fail_stats) -{
u32 delay;
if (fail_stats < 100)
delay = 10;
else if (fail_stats < 105)
delay = 1000;
else
delay = 10000;
return delay;
-}
static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait) { struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops); @@ -80,10 +66,9 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait) bytes_read += max & 3UL; memzero_explicit(&word, sizeof(word)); }
if (!wait || !max || likely(bytes_read) || fail_stats > 110)
if (!wait || !max || likely(bytes_read) ||
++fail_stats >= 100 || msleep_interruptible(5)) break;
msleep_interruptible(ath9k_rng_delay_get(++fail_stats)); } if (wait && !bytes_read && max)
-- 2.35.1
Jason,
This patch is working as you described. Trying to read from /dev/hwrng consistently blocks for only 1.3s before returning an IO error. The longest that I observed 'ip link set wlan0 down' to block was also about 1.3s, and that was immediately after 'cat /dev/hwrng'. Additionally, the longest duration that I observed for wiphy_suspend() to return was just under 100ms.
Tested-by: Gregory Erwin gregerwin256@gmail.com
Hi Gregory,
On Sat, Jun 25, 2022 at 2:13 AM Gregory Erwin gregerwin256@gmail.com wrote:
This patch is working as you described. Trying to read from /dev/hwrng consistently blocks for only 1.3s before returning an IO error. The longest that I observed 'ip link set wlan0 down' to block was also about 1.3s, and that was immediately after 'cat /dev/hwrng'. Additionally, the longest duration that I observed for wiphy_suspend() to return was just under 100ms.
Tested-by: Gregory Erwin gregerwin256@gmail.com
Great, thanks for testing. I think that barring more invasive changes to the hwrng subsystem, a heuristic approach like this is the best we're going to do inside the ath9k driver itself.
So Toke/Kalle - can you queue this up?
Jason
On Fri, Jun 24, 2022 at 10:44:33PM +0200, Jason A. Donenfeld wrote: .
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index 16f227b995e8..af1c1905bb7e 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused) break; if (rc <= 0) {
pr_warn("hwrng: no data available\n");
msleep_interruptible(10000);
int i;
for (i = 0; i < 100; ++i) {
if (kthread_should_stop() ||
msleep_interruptible(10000 / 100))
goto out;
}
Please use schedule_timeout_interruptible. But if you're going to make it interruptible it should probably at least try to do something about signals rather than just ignore them.
Cheers,
Even though hwrng provides a `wait` parameter, it doesn't work very well when waiting for a long time. There are numerous deadlocks that emerge related to shutdown. Work around this API limitation by waiting for a shorter amount of time and erroring more frequently. This commit also prevents hwrng from splatting messages to dmesg when there's a timeout and prevents calling msleep_interruptible() for tons of time when a thread is supposed to be shutting down, since msleep_interruptible() isn't actually interrupted by kthread_stop().
Reported-by: Gregory Erwin gregerwin256@gmail.com Tested-by: Gregory Erwin gregerwin256@gmail.com Cc: Toke Høiland-Jørgensen toke@redhat.com Cc: Kalle Valo kvalo@kernel.org Cc: Rui Salvaterra rsalvaterra@gmail.com Cc: Herbert Xu herbert@gondor.apana.org.au Cc: stable@vger.kernel.org Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c") Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00... Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXten... Link: https://bugs.archlinux.org/task/75138 Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com --- drivers/char/hw_random/core.c | 10 ++++++++-- drivers/net/wireless/ath/ath9k/rng.c | 20 +++----------------- 2 files changed, 11 insertions(+), 19 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index 16f227b995e8..a15273271d87 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused) break;
if (rc <= 0) { - pr_warn("hwrng: no data available\n"); - msleep_interruptible(10000); + int i; + + for (i = 0; i < 100; ++i) { + if (kthread_should_stop() || + schedule_timeout_interruptible(HZ / 20)) + goto out; + } continue; }
@@ -529,6 +534,7 @@ static int hwrng_fillfn(void *unused) add_hwgenerator_randomness((void *)rng_fillbuf, rc, entropy >> 10); } +out: hwrng_fill = NULL; return 0; } diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c index cb5414265a9b..39195f89ea85 100644 --- a/drivers/net/wireless/ath/ath9k/rng.c +++ b/drivers/net/wireless/ath/ath9k/rng.c @@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size) return j << 2; }
-static u32 ath9k_rng_delay_get(u32 fail_stats) -{ - u32 delay; - - if (fail_stats < 100) - delay = 10; - else if (fail_stats < 105) - delay = 1000; - else - delay = 10000; - - return delay; -} - static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait) { struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops); @@ -80,10 +66,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait) bytes_read += max & 3UL; memzero_explicit(&word, sizeof(word)); } - if (!wait || !max || likely(bytes_read) || fail_stats > 110) + if (!wait || !max || likely(bytes_read) || ++fail_stats >= 100 || + schedule_timeout_interruptible(HZ / 20) || + ((current->flags & PF_KTHREAD) && kthread_should_stop())) break; - - msleep_interruptible(ath9k_rng_delay_get(++fail_stats)); }
if (wait && !bytes_read && max)
Even though hwrng provides a `wait` parameter, it doesn't work very well when waiting for a long time. There are numerous deadlocks that emerge related to shutdown. Work around this API limitation by waiting for a shorter amount of time and erroring more frequently. This commit also prevents hwrng from splatting messages to dmesg when there's a timeout and prevents calling msleep_interruptible() for tons of time when a thread is supposed to be shutting down, since msleep_interruptible() isn't actually interrupted by kthread_stop().
Reported-by: Gregory Erwin gregerwin256@gmail.com Tested-by: Gregory Erwin gregerwin256@gmail.com Cc: Toke Høiland-Jørgensen toke@redhat.com Cc: Kalle Valo kvalo@kernel.org Cc: Rui Salvaterra rsalvaterra@gmail.com Cc: Herbert Xu herbert@gondor.apana.org.au Cc: stable@vger.kernel.org Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c") Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00... Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXten... Link: https://bugs.archlinux.org/task/75138 Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com --- drivers/char/hw_random/core.c | 10 ++++++++-- drivers/net/wireless/ath/ath9k/rng.c | 20 +++----------------- 2 files changed, 11 insertions(+), 19 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index 16f227b995e8..a15273271d87 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused) break;
if (rc <= 0) { - pr_warn("hwrng: no data available\n"); - msleep_interruptible(10000); + int i; + + for (i = 0; i < 100; ++i) { + if (kthread_should_stop() || + schedule_timeout_interruptible(HZ / 20)) + goto out; + } continue; }
@@ -529,6 +534,7 @@ static int hwrng_fillfn(void *unused) add_hwgenerator_randomness((void *)rng_fillbuf, rc, entropy >> 10); } +out: hwrng_fill = NULL; return 0; } diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c index cb5414265a9b..757603d1949d 100644 --- a/drivers/net/wireless/ath/ath9k/rng.c +++ b/drivers/net/wireless/ath/ath9k/rng.c @@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size) return j << 2; }
-static u32 ath9k_rng_delay_get(u32 fail_stats) -{ - u32 delay; - - if (fail_stats < 100) - delay = 10; - else if (fail_stats < 105) - delay = 1000; - else - delay = 10000; - - return delay; -} - static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait) { struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops); @@ -80,10 +66,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait) bytes_read += max & 3UL; memzero_explicit(&word, sizeof(word)); } - if (!wait || !max || likely(bytes_read) || fail_stats > 110) + if (!wait || !max || likely(bytes_read) || ++fail_stats >= 100 || + ((current->flags & PF_KTHREAD) && kthread_should_stop()) || + schedule_timeout_interruptible(HZ / 20)) break; - - msleep_interruptible(ath9k_rng_delay_get(++fail_stats)); }
if (wait && !bytes_read && max)
Even though hwrng provides a `wait` parameter, it doesn't work very well when waiting for a long time. There are numerous deadlocks that emerge related to shutdown. Work around this API limitation by waiting for a shorter amount of time and erroring more frequently. This commit also prevents hwrng from splatting messages to dmesg when there's a timeout and switches to using schedule_timeout_interruptible(), so that the kthread can be stopped.
Reported-by: Gregory Erwin gregerwin256@gmail.com Tested-by: Gregory Erwin gregerwin256@gmail.com Cc: Toke Høiland-Jørgensen toke@redhat.com Cc: Kalle Valo kvalo@kernel.org Cc: Rui Salvaterra rsalvaterra@gmail.com Cc: Herbert Xu herbert@gondor.apana.org.au Cc: stable@vger.kernel.org Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c") Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00... Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXten... Link: https://bugs.archlinux.org/task/75138 Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com --- Sorry for all the churn here in sending a v4 and v5 so soon. The semantics of schedule_timeout_interruptible vs msleep_interruptible with respect to kthreads is kind of confusing. I'll send a follow up patch for that elsewhere. For now I think this should suffice for fixing the bug.
drivers/char/hw_random/core.c | 3 +-- drivers/net/wireless/ath/ath9k/rng.c | 20 +++----------------- 2 files changed, 4 insertions(+), 19 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index 16f227b995e8..5309fab98631 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -513,8 +513,7 @@ static int hwrng_fillfn(void *unused) break;
if (rc <= 0) { - pr_warn("hwrng: no data available\n"); - msleep_interruptible(10000); + schedule_timeout_interruptible(HZ * 10); continue; }
diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c index cb5414265a9b..757603d1949d 100644 --- a/drivers/net/wireless/ath/ath9k/rng.c +++ b/drivers/net/wireless/ath/ath9k/rng.c @@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size) return j << 2; }
-static u32 ath9k_rng_delay_get(u32 fail_stats) -{ - u32 delay; - - if (fail_stats < 100) - delay = 10; - else if (fail_stats < 105) - delay = 1000; - else - delay = 10000; - - return delay; -} - static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait) { struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops); @@ -80,10 +66,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait) bytes_read += max & 3UL; memzero_explicit(&word, sizeof(word)); } - if (!wait || !max || likely(bytes_read) || fail_stats > 110) + if (!wait || !max || likely(bytes_read) || ++fail_stats >= 100 || + ((current->flags & PF_KTHREAD) && kthread_should_stop()) || + schedule_timeout_interruptible(HZ / 20)) break; - - msleep_interruptible(ath9k_rng_delay_get(++fail_stats)); }
if (wait && !bytes_read && max)
Even though hwrng provides a `wait` parameter, it doesn't work very well when waiting for a long time. There are numerous deadlocks that emerge related to shutdown. Work around this API limitation by waiting for a shorter amount of time and erroring more frequently. This commit also prevents hwrng from splatting messages to dmesg when there's a timeout and switches to using schedule_timeout_interruptible(), so that the kthread can be stopped.
Reported-by: Gregory Erwin gregerwin256@gmail.com Tested-by: Gregory Erwin gregerwin256@gmail.com Cc: Toke Høiland-Jørgensen toke@redhat.com Cc: Kalle Valo kvalo@kernel.org Cc: Rui Salvaterra rsalvaterra@gmail.com Cc: Herbert Xu herbert@gondor.apana.org.au Cc: stable@vger.kernel.org Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c") Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00... Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXten... Link: https://bugs.archlinux.org/task/75138 Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com --- drivers/char/hw_random/core.c | 5 +++-- drivers/net/wireless/ath/ath9k/rng.c | 20 +++----------------- 2 files changed, 6 insertions(+), 19 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index 16f227b995e8..9987f0642285 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -513,8 +513,9 @@ static int hwrng_fillfn(void *unused) break;
if (rc <= 0) { - pr_warn("hwrng: no data available\n"); - msleep_interruptible(10000); + if (kthread_should_stop()) + break; + schedule_timeout_interruptible(HZ * 10); continue; }
diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c index cb5414265a9b..757603d1949d 100644 --- a/drivers/net/wireless/ath/ath9k/rng.c +++ b/drivers/net/wireless/ath/ath9k/rng.c @@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size) return j << 2; }
-static u32 ath9k_rng_delay_get(u32 fail_stats) -{ - u32 delay; - - if (fail_stats < 100) - delay = 10; - else if (fail_stats < 105) - delay = 1000; - else - delay = 10000; - - return delay; -} - static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait) { struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops); @@ -80,10 +66,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait) bytes_read += max & 3UL; memzero_explicit(&word, sizeof(word)); } - if (!wait || !max || likely(bytes_read) || fail_stats > 110) + if (!wait || !max || likely(bytes_read) || ++fail_stats >= 100 || + ((current->flags & PF_KTHREAD) && kthread_should_stop()) || + schedule_timeout_interruptible(HZ / 20)) break; - - msleep_interruptible(ath9k_rng_delay_get(++fail_stats)); }
if (wait && !bytes_read && max)
"Jason A. Donenfeld" Jason@zx2c4.com writes:
Even though hwrng provides a `wait` parameter, it doesn't work very well when waiting for a long time. There are numerous deadlocks that emerge related to shutdown. Work around this API limitation by waiting for a shorter amount of time and erroring more frequently. This commit also prevents hwrng from splatting messages to dmesg when there's a timeout and switches to using schedule_timeout_interruptible(), so that the kthread can be stopped.
Reported-by: Gregory Erwin gregerwin256@gmail.com Tested-by: Gregory Erwin gregerwin256@gmail.com Cc: Toke Høiland-Jørgensen toke@redhat.com Cc: Kalle Valo kvalo@kernel.org Cc: Rui Salvaterra rsalvaterra@gmail.com Cc: Herbert Xu herbert@gondor.apana.org.au Cc: stable@vger.kernel.org Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c") Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00... Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXten... Link: https://bugs.archlinux.org/task/75138 Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com
Gregory, care to take this version for a spin as well to double-check that it still resolves the issue? :)
-Toke
On Mon, Jun 27, 2022 at 5:18 AM Toke Høiland-Jørgensen toke@redhat.com wrote:
"Jason A. Donenfeld" Jason@zx2c4.com writes:
Even though hwrng provides a `wait` parameter, it doesn't work very well when waiting for a long time. There are numerous deadlocks that emerge related to shutdown. Work around this API limitation by waiting for a shorter amount of time and erroring more frequently. This commit also prevents hwrng from splatting messages to dmesg when there's a timeout and switches to using schedule_timeout_interruptible(), so that the kthread can be stopped.
Reported-by: Gregory Erwin gregerwin256@gmail.com Tested-by: Gregory Erwin gregerwin256@gmail.com Cc: Toke Høiland-Jørgensen toke@redhat.com Cc: Kalle Valo kvalo@kernel.org Cc: Rui Salvaterra rsalvaterra@gmail.com Cc: Herbert Xu herbert@gondor.apana.org.au Cc: stable@vger.kernel.org Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c") Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00... Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXten... Link: https://bugs.archlinux.org/task/75138 Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com
Gregory, care to take this version for a spin as well to double-check that it still resolves the issue? :)
-Toke
With patch v6, reads from /dev/hwrng block for 5-6s, during which 'ip link set wlan0 down' will also block. Otherwise, 'ip link set wlan0 down' returns immediately. Similarly, wiphy_suspend() consistently returns in under 10ms.
Tested-by: Gregory Erwin gregerwin256@gmail.com
Hi Gregory,
On Tue, Jun 28, 2022 at 3:39 AM Gregory Erwin gregerwin256@gmail.com wrote:
On Mon, Jun 27, 2022 at 5:18 AM Toke Høiland-Jørgensen toke@redhat.com wrote:
"Jason A. Donenfeld" Jason@zx2c4.com writes:
Even though hwrng provides a `wait` parameter, it doesn't work very well when waiting for a long time. There are numerous deadlocks that emerge related to shutdown. Work around this API limitation by waiting for a shorter amount of time and erroring more frequently. This commit also prevents hwrng from splatting messages to dmesg when there's a timeout and switches to using schedule_timeout_interruptible(), so that the kthread can be stopped.
Reported-by: Gregory Erwin gregerwin256@gmail.com Tested-by: Gregory Erwin gregerwin256@gmail.com Cc: Toke Høiland-Jørgensen toke@redhat.com Cc: Kalle Valo kvalo@kernel.org Cc: Rui Salvaterra rsalvaterra@gmail.com Cc: Herbert Xu herbert@gondor.apana.org.au Cc: stable@vger.kernel.org Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c") Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00... Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXten... Link: https://bugs.archlinux.org/task/75138 Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com
Gregory, care to take this version for a spin as well to double-check that it still resolves the issue? :)
-Toke
With patch v6, reads from /dev/hwrng block for 5-6s, during which 'ip link set wlan0 down' will also block. Otherwise, 'ip link set wlan0 down' returns immediately. Similarly, wiphy_suspend() consistently returns in under 10ms.
Tested-by: Gregory Erwin gregerwin256@gmail.com
Oh 5-6s... so it's actually worse. Yikes. Sounds like v4 might have been the better patch, then?
Jason
On Tue, Jun 28, 2022 at 12:46 PM Jason A. Donenfeld Jason@zx2c4.com wrote:
Hi Gregory,
On Tue, Jun 28, 2022 at 3:39 AM Gregory Erwin gregerwin256@gmail.com wrote:
On Mon, Jun 27, 2022 at 5:18 AM Toke Høiland-Jørgensen toke@redhat.com wrote:
"Jason A. Donenfeld" Jason@zx2c4.com writes:
Even though hwrng provides a `wait` parameter, it doesn't work very well when waiting for a long time. There are numerous deadlocks that emerge related to shutdown. Work around this API limitation by waiting for a shorter amount of time and erroring more frequently. This commit also prevents hwrng from splatting messages to dmesg when there's a timeout and switches to using schedule_timeout_interruptible(), so that the kthread can be stopped.
Reported-by: Gregory Erwin gregerwin256@gmail.com Tested-by: Gregory Erwin gregerwin256@gmail.com Cc: Toke Høiland-Jørgensen toke@redhat.com Cc: Kalle Valo kvalo@kernel.org Cc: Rui Salvaterra rsalvaterra@gmail.com Cc: Herbert Xu herbert@gondor.apana.org.au Cc: stable@vger.kernel.org Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c") Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00... Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXten... Link: https://bugs.archlinux.org/task/75138 Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com
Gregory, care to take this version for a spin as well to double-check that it still resolves the issue? :)
-Toke
With patch v6, reads from /dev/hwrng block for 5-6s, during which 'ip link set wlan0 down' will also block. Otherwise, 'ip link set wlan0 down' returns immediately. Similarly, wiphy_suspend() consistently returns in under 10ms.
Tested-by: Gregory Erwin gregerwin256@gmail.com
Oh 5-6s... so it's actually worse. Yikes. Sounds like v4 might have been the better patch, then?
$ curl https://lore.kernel.org/lkml/20220627104955.534013-1-Jason@zx2c4.com/raw | git am
That one, if you want to give it a spin and see if that 5-6s is back to ~1s or less.
Jason
On Tue, Jun 28, 2022 at 12:48:50PM +0200, Jason A. Donenfeld wrote:
$ curl https://lore.kernel.org/lkml/20220627104955.534013-1-Jason@zx2c4.com/raw | git am
That one, if you want to give it a spin and see if that 5-6s is back to ~1s or less.
Whatever caused kthread_should_stop to return true should have woken the thread and caused schedule_timeout to return.
If it's not waking the thread up then we should find out why.
Oh wait you're checking kthread_should_stop before the schedule call instead of afterwards, that would do it.
Cheers,
Hi Herbert,
On Tue, Jun 28, 2022 at 12:52 PM Herbert Xu herbert@gondor.apana.org.au wrote:
On Tue, Jun 28, 2022 at 12:48:50PM +0200, Jason A. Donenfeld wrote:
$ curl https://lore.kernel.org/lkml/20220627104955.534013-1-Jason@zx2c4.com/raw | git am
That one, if you want to give it a spin and see if that 5-6s is back to ~1s or less.
Whatever caused kthread_should_stop to return true should have woken the thread and caused schedule_timeout to return.
If it's not waking the thread up then we should find out why.
Oh wait you're checking kthread_should_stop before the schedule call instead of afterwards, that would do it.
Oh, that's a really good observation, thank you! I'll send a patch that does it right. I have to check kthread_should_stop() before, because the wake up might have been consumed by a sleep inside of the hwrng ->read_data() function, causing it to return early. So we have to check it before going to sleep again. And after, I need to be checking the return value of schedule_timeout_interruptible(), which I'm not in this latest. So v+1 coming up.
By the way, this thread might interest you: https://lore.kernel.org/lkml/20220627145716.641185-1-Jason@zx2c4.com/
Jason
Hi again,
On Tue, Jun 28, 2022 at 12:55:57PM +0200, Jason A. Donenfeld wrote:
Oh wait you're checking kthread_should_stop before the schedule call instead of afterwards, that would do it.
Oh, that's a really good observation, thank you!
Wait, no. I already am kthread_should_stop() it afterwards. That "continue" goes to the top of the loop, which checks it in the while() statement. So something else is amiss. I guess I'll investigate.
Jason
Hi Herbert,
On Tue, Jun 28, 2022 at 02:05:34PM +0200, Jason A. Donenfeld wrote:
Hi again,
On Tue, Jun 28, 2022 at 12:55:57PM +0200, Jason A. Donenfeld wrote:
Oh wait you're checking kthread_should_stop before the schedule call instead of afterwards, that would do it.
Oh, that's a really good observation, thank you!
Wait, no. I already am kthread_should_stop() it afterwards. That "continue" goes to the top of the loop, which checks it in the while() statement. So something else is amiss. I guess I'll investigate.
I worked out a solution for the "larger problem" now. It's a bit involved, but not too bad. I'll post a patch shortly.
Jason
On Mon, Jun 27, 2022 at 02:07:35PM +0200, Jason A. Donenfeld wrote:
Even though hwrng provides a `wait` parameter, it doesn't work very well when waiting for a long time. There are numerous deadlocks that emerge related to shutdown. Work around this API limitation by waiting for a shorter amount of time and erroring more frequently. This commit also prevents hwrng from splatting messages to dmesg when there's a timeout and switches to using schedule_timeout_interruptible(), so that the kthread can be stopped.
Reported-by: Gregory Erwin gregerwin256@gmail.com Tested-by: Gregory Erwin gregerwin256@gmail.com Cc: Toke Høiland-Jørgensen toke@redhat.com Cc: Kalle Valo kvalo@kernel.org Cc: Rui Salvaterra rsalvaterra@gmail.com Cc: Herbert Xu herbert@gondor.apana.org.au Cc: stable@vger.kernel.org Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c") Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00... Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXten... Link: https://bugs.archlinux.org/task/75138 Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com
drivers/char/hw_random/core.c | 5 +++-- drivers/net/wireless/ath/ath9k/rng.c | 20 +++----------------- 2 files changed, 6 insertions(+), 19 deletions(-)
Acked-by: Herbert Xu herbert@gondor.apana.org.au
Thanks,
"Jason A. Donenfeld" Jason@zx2c4.com writes:
Even though hwrng provides a `wait` parameter, it doesn't work very well when waiting for a long time. There are numerous deadlocks that emerge related to shutdown. Work around this API limitation by waiting for a shorter amount of time and erroring more frequently. This commit also prevents hwrng from splatting messages to dmesg when there's a timeout and switches to using schedule_timeout_interruptible(), so that the kthread can be stopped.
Reported-by: Gregory Erwin gregerwin256@gmail.com Tested-by: Gregory Erwin gregerwin256@gmail.com Cc: Toke Høiland-Jørgensen toke@redhat.com Cc: Kalle Valo kvalo@kernel.org Cc: Rui Salvaterra rsalvaterra@gmail.com Cc: Herbert Xu herbert@gondor.apana.org.au Cc: stable@vger.kernel.org Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c") Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00... Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXten... Link: https://bugs.archlinux.org/task/75138 Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com
Acked-by: Toke Høiland-Jørgensen toke@toke.dk
linux-stable-mirror@lists.linaro.org