syzbot sent a hung task report and Eric explains that adversarial receiver may keep RWIN at 0 for a long time, so we are not guaranteed to make forward progress. Thread which took tx_lock and went to sleep may not release tx_lock for hours. Use interruptible sleep where possible and reschedule the work if it can't take the lock.
Testing: existing selftest passes
Reported-by: syzbot+9c0268252b8ef967c62e@syzkaller.appspotmail.com Fixes: 79ffe6087e91 ("net/tls: add a TX lock") Link: https://lore.kernel.org/all/000000000000e412e905f5b46201@google.com/ Cc: stable@vger.kernel.org # wait 4 weeks Signed-off-by: Jakub Kicinski kuba@kernel.org --- CC: borisp@nvidia.com CC: john.fastabend@gmail.com CC: simon.horman@netronome.com --- net/tls/tls_sw.c | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 021d760f9133..635b8bf6b937 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -956,7 +956,9 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) MSG_CMSG_COMPAT)) return -EOPNOTSUPP;
- mutex_lock(&tls_ctx->tx_lock); + ret = mutex_lock_interruptible(&tls_ctx->tx_lock); + if (ret) + return ret; lock_sock(sk);
if (unlikely(msg->msg_controllen)) { @@ -1290,7 +1292,9 @@ int tls_sw_sendpage(struct sock *sk, struct page *page, MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY)) return -EOPNOTSUPP;
- mutex_lock(&tls_ctx->tx_lock); + ret = mutex_lock_interruptible(&tls_ctx->tx_lock); + if (ret) + return ret; lock_sock(sk); ret = tls_sw_do_sendpage(sk, page, offset, size, flags); release_sock(sk); @@ -2435,11 +2439,19 @@ static void tx_work_handler(struct work_struct *work)
if (!test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) return; - mutex_lock(&tls_ctx->tx_lock); - lock_sock(sk); - tls_tx_records(sk, -1); - release_sock(sk); - mutex_unlock(&tls_ctx->tx_lock); + + if (mutex_trylock(&tls_ctx->tx_lock)) { + lock_sock(sk); + tls_tx_records(sk, -1); + release_sock(sk); + mutex_unlock(&tls_ctx->tx_lock); + } else if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) { + /* Someone is holding the tx_lock, they will likely run Tx + * and cancel the work on their way out of the lock section. + * Schedule a long delay just in case. + */ + schedule_delayed_work(&ctx->tx_work.work, msecs_to_jiffies(10)); + } }
static bool tls_is_tx_ready(struct tls_sw_context_tx *ctx)
On Wed, Mar 1, 2023 at 1:29 AM Jakub Kicinski kuba@kernel.org wrote:
syzbot sent a hung task report and Eric explains that adversarial receiver may keep RWIN at 0 for a long time, so we are not guaranteed to make forward progress. Thread which took tx_lock and went to sleep may not release tx_lock for hours. Use interruptible sleep where possible and reschedule the work if it can't take the lock.
Testing: existing selftest passes
Reported-by: syzbot+9c0268252b8ef967c62e@syzkaller.appspotmail.com Fixes: 79ffe6087e91 ("net/tls: add a TX lock") Link: https://lore.kernel.org/all/000000000000e412e905f5b46201@google.com/ Cc: stable@vger.kernel.org # wait 4 weeks Signed-off-by: Jakub Kicinski kuba@kernel.org
CC: borisp@nvidia.com CC: john.fastabend@gmail.com CC: simon.horman@netronome.com
This seems sane to me, thanks.
Reviewed-by: Eric Dumazet edumazet@google.com
Hello:
This patch was applied to netdev/net.git (main) by Jakub Kicinski kuba@kernel.org:
On Tue, 28 Feb 2023 16:28:57 -0800 you wrote:
syzbot sent a hung task report and Eric explains that adversarial receiver may keep RWIN at 0 for a long time, so we are not guaranteed to make forward progress. Thread which took tx_lock and went to sleep may not release tx_lock for hours. Use interruptible sleep where possible and reschedule the work if it can't take the lock.
Testing: existing selftest passes
[...]
Here is the summary with links: - [net] net: tls: avoid hanging tasks on the tx_lock https://git.kernel.org/netdev/net/c/f3221361dc85
You are awesome, thank you!
linux-stable-mirror@lists.linaro.org