Fedor Pchelkin pchelkin@ispras.ru wrote:
[...]
drivers/net/wireless/realtek/rtw89/core.c | 15 ++++++++--- drivers/net/wireless/realtek/rtw89/core.h | 32 ++++++++++++++--------- drivers/net/wireless/realtek/rtw89/pci.c | 6 +++-- 3 files changed, 36 insertions(+), 17 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtw89/core.c b/drivers/net/wireless/realtek/rtw89/core.c index 57590f5577a3..826540319027 100644 --- a/drivers/net/wireless/realtek/rtw89/core.c +++ b/drivers/net/wireless/realtek/rtw89/core.c @@ -1088,6 +1088,7 @@ int rtw89_core_tx_kick_off_and_wait(struct rtw89_dev *rtwdev, struct sk_buff *sk struct rtw89_tx_skb_data *skb_data = RTW89_TX_SKB_CB(skb); struct rtw89_tx_wait_info *wait; unsigned long time_left;
bool free_wait = true; int ret = 0; wait = kzalloc(sizeof(*wait), GFP_KERNEL); @@ -1097,7
+1098,8 @@ int rtw89_core_tx_kick_off_and_wait(struct rtw89_dev *rtwdev, struct
sk_buff *sk
} init_completion(&wait->completion);
rcu_assign_pointer(skb_data->wait, wait);
spin_lock_init(&wait->owner_lock);
skb_data->wait = wait; rtw89_core_tx_kick_off(rtwdev, qsel); time_left = wait_for_completion_timeout(&wait->completion,
@@ -1107,8 +1109,15 @@ int rtw89_core_tx_kick_off_and_wait(struct rtw89_dev *rtwdev, struct sk_buff *sk else if (!wait->tx_done) ret = -EAGAIN;
rcu_assign_pointer(skb_data->wait, NULL);
kfree_rcu(wait, rcu_head);
Please consider the following. (moving "rcu_assign_pointer(skb_data->wait, NULL)" to be under "if (time_left == 0)")
There is still a tiny race window. Suppose wait_for_completion_timeout() exits with a timeout, so time_left is 0. If completing side goes on in parallel just after that, it has a chance to proceed and free skb_data before the below if (time_left == 0) fragment is executed.
Okay, logically it sounds right.
if (time_left == 0) { rcu_assign_pointer(skb_data->wait, NULL); ret = -ETIMEDOUT; } else if (!wait->tx_done) { ret = -EAGAIN; } kfree_rcu(wait, rcu_head);
If completing side does run as expected (potential racing mentioned in this patch), there is no real need to assign NULL back.
Actually the race happens regardless of wait_for_completion_timeout() exit status, it's briefly mentioned in the race diagram inside commit message (but the diagram can show only one possible concurrency scenario). I agree this may be improved and described more explicitly though.
Will appreciate to see that in next version. Thanks.
As for the patch itself, currently I can't see another way of fixing that other than introducing locks on both waiting and completing side.
I took some time on thinking this. The following is another idea. The skb, which are sent by tx_wait_complete, are owned by driver. They don't come from stack, so we don't need to do ieee80211_tx_status_ni. Based on above, some rough points of the new idea are listed below.
1. Let rtw89_core_tx_wait_complete return true/false to indicate whether tx_wait or not
2. Add some new field into rtw89_tx_wait_info e.g. list_head, skb, finished
3. Add a list_head to rtwdev Add a work func doing things as for each wait in rtwdev->XXX_list: if !wait->finished: wait_for_completion() free wait->skb free wait
4. [waiting side] [completing side] wait_for_completion_timeout() ... ... /* make complete the last step */ ... if (rtw89_core_tx_wait_complete) ... return; ... // not assign NULL back to skb_data->wait if time_left != 0: wait-> finished = true wait->skb = skb add wait to rtwdev->XXX_list queue above work
Please help evaluate the new idea. Thank you.