6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit c91c46de6bbc1147ae5dfe046b87f5f3d6593215 ]
A lot of drivers follow the same scheme to stop / start queues without introducing locks between xmit and NAPI tx completions. I'm guessing they all copy'n'paste each other's code. The original code dates back all the way to e1000 and Linux 2.6.19.
Smaller drivers shy away from the scheme and introduce a lock which may cause deadlocks in netpoll.
Provide macros which encapsulate the necessary logic.
The macros do not prevent false wake ups, the extra barrier required to close that race is not worth it. See discussion in: https://lore.kernel.org/all/c39312a2-4537-14b4-270c-9fe1fbb91e89@gmail.com/
Acked-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 95ecba62e2fd ("net: fix races in netdev_tx_sent_queue()/dev_watchdog()") Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/networking/driver.rst | 6 ++ include/linux/netdevice.h | 1 + include/net/netdev_queues.h | 144 ++++++++++++++++++++++++++++ 3 files changed, 151 insertions(+) create mode 100644 include/net/netdev_queues.h
diff --git a/Documentation/networking/driver.rst b/Documentation/networking/driver.rst index 3040a74d421c7..870f933e4a1a1 100644 --- a/Documentation/networking/driver.rst +++ b/Documentation/networking/driver.rst @@ -67,6 +67,12 @@ and:: TX_BUFFS_AVAIL(dp) > 0) netif_wake_queue(dp->dev);
+Lockless queue stop / wake helper macros +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. kernel-doc:: include/net/netdev_queues.h + :doc: Lockless queue stopping / waking helpers. + No exclusive ownership ----------------------
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 0373e09359905..8b67b266cce63 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3274,6 +3274,7 @@ static inline void netif_tx_wake_all_queues(struct net_device *dev)
static __always_inline void netif_tx_stop_queue(struct netdev_queue *dev_queue) { + /* Must be an atomic op see netif_txq_try_stop() */ set_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state); }
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h new file mode 100644 index 0000000000000..5236d78bbdebb --- /dev/null +++ b/include/net/netdev_queues.h @@ -0,0 +1,144 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_NET_QUEUES_H +#define _LINUX_NET_QUEUES_H + +#include <linux/netdevice.h> + +/** + * DOC: Lockless queue stopping / waking helpers. + * + * The netif_txq_maybe_stop() and __netif_txq_completed_wake() + * macros are designed to safely implement stopping + * and waking netdev queues without full lock protection. + * + * We assume that there can be no concurrent stop attempts and no concurrent + * wake attempts. The try-stop should happen from the xmit handler, + * while wake up should be triggered from NAPI poll context. + * The two may run concurrently (single producer, single consumer). + * + * The try-stop side is expected to run from the xmit handler and therefore + * it does not reschedule Tx (netif_tx_start_queue() instead of + * netif_tx_wake_queue()). Uses of the ``stop`` macros outside of the xmit + * handler may lead to xmit queue being enabled but not run. + * The waking side does not have similar context restrictions. + * + * The macros guarantee that rings will not remain stopped if there's + * space available, but they do *not* prevent false wake ups when + * the ring is full! Drivers should check for ring full at the start + * for the xmit handler. + * + * All descriptor ring indexes (and other relevant shared state) must + * be updated before invoking the macros. + */ + +#define netif_txq_try_stop(txq, get_desc, start_thrs) \ + ({ \ + int _res; \ + \ + netif_tx_stop_queue(txq); \ + /* Producer index and stop bit must be visible \ + * to consumer before we recheck. \ + * Pairs with a barrier in __netif_txq_maybe_wake(). \ + */ \ + smp_mb__after_atomic(); \ + \ + /* We need to check again in a case another \ + * CPU has just made room available. \ + */ \ + _res = 0; \ + if (unlikely(get_desc >= start_thrs)) { \ + netif_tx_start_queue(txq); \ + _res = -1; \ + } \ + _res; \ + }) \ + +/** + * netif_txq_maybe_stop() - locklessly stop a Tx queue, if needed + * @txq: struct netdev_queue to stop/start + * @get_desc: get current number of free descriptors (see requirements below!) + * @stop_thrs: minimal number of available descriptors for queue to be left + * enabled + * @start_thrs: minimal number of descriptors to re-enable the queue, can be + * equal to @stop_thrs or higher to avoid frequent waking + * + * All arguments may be evaluated multiple times, beware of side effects. + * @get_desc must be a formula or a function call, it must always + * return up-to-date information when evaluated! + * Expected to be used from ndo_start_xmit, see the comment on top of the file. + * + * Returns: + * 0 if the queue was stopped + * 1 if the queue was left enabled + * -1 if the queue was re-enabled (raced with waking) + */ +#define netif_txq_maybe_stop(txq, get_desc, stop_thrs, start_thrs) \ + ({ \ + int _res; \ + \ + _res = 1; \ + if (unlikely(get_desc < stop_thrs)) \ + _res = netif_txq_try_stop(txq, get_desc, start_thrs); \ + _res; \ + }) \ + + +/** + * __netif_txq_maybe_wake() - locklessly wake a Tx queue, if needed + * @txq: struct netdev_queue to stop/start + * @get_desc: get current number of free descriptors (see requirements below!) + * @start_thrs: minimal number of descriptors to re-enable the queue + * @down_cond: down condition, predicate indicating that the queue should + * not be woken up even if descriptors are available + * + * All arguments may be evaluated multiple times. + * @get_desc must be a formula or a function call, it must always + * return up-to-date information when evaluated! + * + * Returns: + * 0 if the queue was woken up + * 1 if the queue was already enabled (or disabled but @down_cond is true) + * -1 if the queue was left unchanged (@start_thrs not reached) + */ +#define __netif_txq_maybe_wake(txq, get_desc, start_thrs, down_cond) \ + ({ \ + int _res; \ + \ + _res = -1; \ + if (likely(get_desc > start_thrs)) { \ + /* Make sure that anybody stopping the queue after \ + * this sees the new next_to_clean. \ + */ \ + smp_mb(); \ + _res = 1; \ + if (unlikely(netif_tx_queue_stopped(txq)) && \ + !(down_cond)) { \ + netif_tx_wake_queue(txq); \ + _res = 0; \ + } \ + } \ + _res; \ + }) + +#define netif_txq_maybe_wake(txq, get_desc, start_thrs) \ + __netif_txq_maybe_wake(txq, get_desc, start_thrs, false) + +/* subqueue variants follow */ + +#define netif_subqueue_try_stop(dev, idx, get_desc, start_thrs) \ + ({ \ + struct netdev_queue *txq; \ + \ + txq = netdev_get_tx_queue(dev, idx); \ + netif_txq_try_stop(txq, get_desc, start_thrs); \ + }) + +#define netif_subqueue_maybe_stop(dev, idx, get_desc, stop_thrs, start_thrs) \ + ({ \ + struct netdev_queue *txq; \ + \ + txq = netdev_get_tx_queue(dev, idx); \ + netif_txq_maybe_stop(txq, get_desc, stop_thrs, start_thrs); \ + }) + +#endif