On 14.05.2024 15:34:01, Vitor Soares wrote:
+void mcp251xfd_tx_obj_write_sync(struct work_struct *ws) +{
struct mcp251xfd_priv *priv = container_of(ws, struct
mcp251xfd_priv,
tx_work);
struct mcp251xfd_tx_obj *tx_obj = priv->tx_work_obj;
struct mcp251xfd_tx_ring *tx_ring = priv->tx;
int err;
err = spi_sync(priv->spi, &tx_obj->msg);
if (err)
mcp251xfd_tx_failure_drop(priv, tx_ring, err);
priv->tx_work_obj = NULL;
Race condition:
- after spi_sync() the CAN frame is send
- after the TX complete IRQ the TX queue is restarted
- the xmit handler might get BUSY
- fill the tx_work_obj again
You can avoid the race condition by moving "priv->tx_work_obj = NULL;" in front of the "spi_sync();". Right?
+}
static int mcp251xfd_tx_obj_write(const struct mcp251xfd_priv *priv, struct mcp251xfd_tx_obj *tx_obj) { @@ -175,7 +210,7 @@ netdev_tx_t mcp251xfd_start_xmit(struct sk_buff *skb, if (can_dev_dropped_skb(ndev, skb)) return NETDEV_TX_OK;
if (mcp251xfd_tx_busy(priv, tx_ring))
if (mcp251xfd_tx_busy(priv, tx_ring) || priv->tx_work_obj)
This should not happen, but better save than sorry.
As there is the race condition you mentioned above, on this condition: priv->tx_work_obj = tx_obj --> xmit will return NETDEV_TX_BUSY
or
priv->tx_work_obj = NULL --> It goes through the rest of the code or the workqueue may sleep after setting tx_work_obj to NULL. Should I use work_busy() here instead or do you have another suggestion?
Yes, introduce mcp251xfd_work_busy().
I'm not sure what happens if the xmit is called between the "priv->tx_work_obj = NULL" and the end of the work. Will queue_work() return false, as the queue is still running?
Everything else is clear to me and I'm addressing it for the v6.
Marc