Thanks for the feedback, Zong-Zhe!
On Thu, 21. Aug 04:01, Zong-Zhe Yang wrote:
Fedor Pchelkin pchelkin@ispras.ru wrote:
Though one solution that _works_ currently is to get rid of 'struct rtw89_tx_wait_info' and replace it with the only field it is used for - 'bool tx_done'. Then it can be stored at 'struct ieee80211_tx_info::status::status_driver_data' directly without the need for allocating an extra dynamic object and tracking its lifecycle. I didn't post this since then the structure won't be expandable for new fields and that's probably the reason for why it wasn't done in this manner initially.
With a busy waiting on tx waiting side ? If so, it would be unacceptable.
Ohh, I forgot about the need for async completion here. Nevermind that solution, sorry.
drivers/net/wireless/realtek/rtw89/core.c | 15 ++++++++--- drivers/net/wireless/realtek/rtw89/core.h | 32 ++++++++++++++--------- drivers/net/wireless/realtek/rtw89/pci.c | 6 +++-- 3 files changed, 36 insertions(+), 17 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtw89/core.c b/drivers/net/wireless/realtek/rtw89/core.c index 57590f5577a3..826540319027 100644 --- a/drivers/net/wireless/realtek/rtw89/core.c +++ b/drivers/net/wireless/realtek/rtw89/core.c @@ -1088,6 +1088,7 @@ int rtw89_core_tx_kick_off_and_wait(struct rtw89_dev *rtwdev, struct sk_buff *sk struct rtw89_tx_skb_data *skb_data = RTW89_TX_SKB_CB(skb); struct rtw89_tx_wait_info *wait; unsigned long time_left;
bool free_wait = true; int ret = 0; wait = kzalloc(sizeof(*wait), GFP_KERNEL); @@ -1097,7 +1098,8 @@ int
rtw89_core_tx_kick_off_and_wait(struct rtw89_dev *rtwdev, struct sk_buff *sk }
init_completion(&wait->completion);
rcu_assign_pointer(skb_data->wait, wait);
spin_lock_init(&wait->owner_lock);
skb_data->wait = wait; rtw89_core_tx_kick_off(rtwdev, qsel); time_left = wait_for_completion_timeout(&wait->completion,
@@ -1107,8 +1109,15 @@ int rtw89_core_tx_kick_off_and_wait(struct rtw89_dev *rtwdev, struct sk_buff *sk else if (!wait->tx_done) ret = -EAGAIN;
rcu_assign_pointer(skb_data->wait, NULL);
kfree_rcu(wait, rcu_head);
Please consider the following. (moving "rcu_assign_pointer(skb_data->wait, NULL)" to be under "if (time_left == 0)")
There is still a tiny race window. Suppose wait_for_completion_timeout() exits with a timeout, so time_left is 0. If completing side goes on in parallel just after that, it has a chance to proceed and free skb_data before the below if (time_left == 0) fragment is executed.
if (time_left == 0) { rcu_assign_pointer(skb_data->wait, NULL); ret = -ETIMEDOUT; } else if (!wait->tx_done) { ret = -EAGAIN; } kfree_rcu(wait, rcu_head);
If completing side does run as expected (potential racing mentioned in this patch), there is no real need to assign NULL back.
Actually the race happens regardless of wait_for_completion_timeout() exit status, it's briefly mentioned in the race diagram inside commit message (but the diagram can show only one possible concurrency scenario). I agree this may be improved and described more explicitly though.
As for the patch itself, currently I can't see another way of fixing that other than introducing locks on both waiting and completing side.