On Wed, 2024-05-15 at 13:12 +0200, Marc Kleine-Budde wrote:
On 14.05.2024 15:34:01, Vitor Soares wrote:
+void mcp251xfd_tx_obj_write_sync(struct work_struct *ws) +{ + struct mcp251xfd_priv *priv = container_of(ws, struct mcp251xfd_priv, + tx_work); + struct mcp251xfd_tx_obj *tx_obj = priv->tx_work_obj; + struct mcp251xfd_tx_ring *tx_ring = priv->tx; + int err;
+ err = spi_sync(priv->spi, &tx_obj->msg); + if (err) + mcp251xfd_tx_failure_drop(priv, tx_ring, err);
+ priv->tx_work_obj = NULL;
Race condition:
- after spi_sync() the CAN frame is send
- after the TX complete IRQ the TX queue is restarted
- the xmit handler might get BUSY
- fill the tx_work_obj again
You can avoid the race condition by moving "priv->tx_work_obj = NULL;" in front of the "spi_sync();". Right?
+}
static int mcp251xfd_tx_obj_write(const struct mcp251xfd_priv *priv, struct mcp251xfd_tx_obj *tx_obj) { @@ -175,7 +210,7 @@ netdev_tx_t mcp251xfd_start_xmit(struct sk_buff *skb, if (can_dev_dropped_skb(ndev, skb)) return NETDEV_TX_OK; - if (mcp251xfd_tx_busy(priv, tx_ring)) + if (mcp251xfd_tx_busy(priv, tx_ring) || priv->tx_work_obj)
This should not happen, but better save than sorry.
As there is the race condition you mentioned above, on this condition: priv->tx_work_obj = tx_obj --> xmit will return NETDEV_TX_BUSY
or
priv->tx_work_obj = NULL --> It goes through the rest of the code or the workqueue may sleep after setting tx_work_obj to NULL. Should I use work_busy() here instead or do you have another suggestion?
Yes, introduce mcp251xfd_work_busy().
I'll implement it.
I'm not sure what happens if the xmit is called between the "priv->tx_work_obj = NULL" and the end of the work. Will queue_work() return false, as the queue is still running?
From the test I did so far, my understanding is the following:
If mcp251xfd_tx_obj_write doesn't fail, everything is OK.
if mcp251xfd_tx_obj_write fails with EBUSY - stop netif queue - fill the tx_work_obj - start worker
queue_work() doesn't return false even when work_busy() = true. - xmit handler return, and wait netif_wake_queue() - the work handler waits until the previous job gets done before starting the next one. - after the TX completes IRQ, the TX queue is restarted
If the TX queue is restarted immediately after queue_work(), tx_work_obj is filled, making the xmit handler return NETDEV_TX_BUSY.
The tests were done with a delay after priv->tx_work_obj = NULL.
Best regards, Vitor Soares