Marc Kleine-Budde <mkl@pengutronix.de> writes:
On 31.03.2025 09:25:27, Axel Forsman wrote:
The functions kvaser_pciefd_start_xmit() and kvaser_pciefd_handle_ack_packet() raced to stop/wake TX queues and get/put echo skbs, as kvaser_pciefd_can->echo_lock was only ever taken when transmitting. E.g., this caused the following error:
can_put_echo_skb: BUG! echo_skb 5 is occupied!
Instead, use the synchronization helpers in netdev_queues.h. As those piggyback on BQL barriers, start updating in-flight packets and bytes counts as well.
This looks like it goes in the right direction. Using the netif_subqueue_completed helpers is a great idea.
What usually works even better is to have 2 counters and a mask:
- unsigned int tx_head, tx_tail
- TXFIFO_DEPTH
The tx_head is incremented in the xmit function, tail is incremented in the tx_done function.
There's no need to check how many buffers are free in the HW.
Have a look at the rockchip-canfd driver for an example.
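Roughly like this (just a sketch with made-up names, see the rockchip driver for the real thing):

	struct foo_priv {
		unsigned int tx_head;	/* bumped in ndo_start_xmit() */
		unsigned int tx_tail;	/* bumped in the TX-done/ACK handler */
	};

	static unsigned int foo_tx_free(const struct foo_priv *priv)
	{
		/* Free HW TX buffers, no register read needed. */
		return FOO_TXFIFO_DEPTH - (priv->tx_head - priv->tx_tail);
	}

	/* Buffer/echo-skb slot index, if the depth is a power of 2:
	 * priv->tx_head & (FOO_TXFIFO_DEPTH - 1)
	 */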
An attempt was made to keep this a bugfix-only commit, but I do agree it is nicer to maintain an in-flight counter instead of querying the device.
(A mask is inapplicable, as the size of the device TX FIFO is not necessarily a power-of-2, though the packets have sequence numbers so it does not matter.)
I guess a write barrier would be needed before tx_tail is incremented, so that the xmit function cannot observe the new tail before __can_get_echo_skb() has cleared the echo skb slot? (Otherwise, can_put_echo_skb() errors out if the slot is still non-NULL.) It is not obvious to me how rockchip-canfd handles this?
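Concretely, with bare counters I would have expected something along these lines (only a sketch; priv, ndev, seq and FOO_TXFIFO_DEPTH are made-up names):

	/* TX-done/ACK path */
	static void foo_tx_done_one(struct foo_priv *priv, unsigned int seq)
	{
		unsigned int frame_len;

		can_get_echo_skb(priv->ndev, seq, &frame_len);	/* clears the echo skb slot */
		/* Make the cleared slot visible before the new tail value. */
		smp_store_release(&priv->tx_tail, priv->tx_tail + 1);
	}

	/* ndo_start_xmit() path */
	if (priv->tx_head - smp_load_acquire(&priv->tx_tail) >= FOO_TXFIFO_DEPTH)
		return NETDEV_TX_BUSY;	/* full; no slot released yet */
	can_put_echo_skb(skb, ndev, priv->tx_head % FOO_TXFIFO_DEPTH,
			 can_skb_get_frame_len(skb));
	priv->tx_head++;

The netdev_queues.h helpers piggyback on the BQL barriers for this ordering, which is part of why the patch uses them.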
+	/*
+	 * Without room for a new message, stop the queue until at least
+	 * one successful transmit.
+	 */
+	if (!netif_subqueue_maybe_stop(netdev, 0, kvaser_pciefd_tx_avail(can), 1, 1))
+		return NETDEV_TX_BUSY;
Returning NETDEV_TX_BUSY is quite expensive; stop the queue at the end of this function if the buffers are full.
Will do.
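I.e. drop the early check and only stop at the tail of kvaser_pciefd_start_xmit(), something like (untested sketch):

	/* After the packet has been handed to the hardware: stop the
	 * queue if no buffer is left for another message.
	 * netif_subqueue_maybe_stop() re-checks after stopping, so a
	 * racing completion restarts the queue and NETDEV_TX_BUSY is
	 * never needed.
	 */
	netif_subqueue_maybe_stop(netdev, 0, kvaser_pciefd_tx_avail(can), 1, 1);

	return NETDEV_TX_OK;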
@@ -1638,11 +1650,25 @@ static int kvaser_pciefd_read_buffer(struct kvaser_pciefd *pcie, int dma_buf)
 {
 	int pos = 0;
 	int res = 0;
+	unsigned int i;

 	do {
 		res = kvaser_pciefd_read_packet(pcie, &pos, dma_buf);
 	} while (!res && pos > 0 && pos < KVASER_PCIEFD_DMA_SIZE);

+	for (i = 0; i < pcie->nr_channels; ++i) {
+		struct kvaser_pciefd_can *can = pcie->can[i];
+
+		if (!can->completed_tx_pkts)
+			continue;
+
+		netif_subqueue_completed_wake(can->can.dev, 0,
+					      can->completed_tx_pkts,
+					      can->completed_tx_bytes,
+					      kvaser_pciefd_tx_avail(can), 1);
You can do this as soon as one packet is finished; if you want to avoid too frequent wakeups, use a threshold of more than 1.
I did that initially, but changed it to follow the advice of the netdev_tx_completed_queue() docstring. Since the RX buffer for a single IRQ may have multiple packets, would not BQL see incorrect intervals in that case?
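For reference, the ACK side only accumulates, and the BQL report plus wakeup happens once per processed DMA buffer; roughly (sketch, not the literal patch; seq and stats are placeholders):

	/* In kvaser_pciefd_handle_ack_packet(): */
	struct net_device_stats *stats = &can->can.dev->stats;
	unsigned int frame_len = 0;

	stats->tx_bytes += can_get_echo_skb(can->can.dev, seq, &frame_len);
	stats->tx_packets++;
	can->completed_tx_pkts++;
	can->completed_tx_bytes += frame_len;

That way BQL sees one completion interval per IRQ rather than one per packet.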
Thanks for the review.

Axel Forsman