On (25/03/27 16:20), Mika Westerberg wrote:
On (25/03/27 15:37), Mika Westerberg wrote:
Another possibility is tb_cfg_request_sync():
tb_cfg_request_sync()
  tb_cfg_request()
    schedule_work(&req->work) -> tb_cfg_request_dequeue()
  tb_cfg_request_cancel()
    schedule_work(&req->work) -> tb_cfg_request_dequeue()
Not sure about this one, because &req->work will only be scheduled once; the second schedule_work() should not queue it again (as far as I can tell).
If the second schedule_work() happens after a timeout (which is what !wait_for_completion_timeout() indicates), then the first schedule_work() may have already executed the work by that time, and we can schedule the work again (but the request has already been dequeued). Am I missing something?
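Roughly, the path I mean (a paraphrased sketch, not the actual drivers/thunderbolt/ctl.c code; the signatures, the completion callback, and the result handling here are simplified/assumed):

struct tb_cfg_result tb_cfg_request_sync(struct tb_ctl *ctl,
					 struct tb_cfg_request *req,
					 int timeout_msec)
{
	unsigned long timeout = msecs_to_jiffies(timeout_msec);
	DECLARE_COMPLETION_ONSTACK(done);

	/* Queues the request; on completion the response path does
	 * schedule_work(&req->work) -> tb_cfg_request_dequeue(). */
	tb_cfg_request(ctl, req, tb_cfg_request_complete, &done);

	if (!wait_for_completion_timeout(&done, timeout))
		/* Timed out: the cancel path does a second
		 * schedule_work(&req->work) -> tb_cfg_request_dequeue().
		 * If the first work item has already executed by now,
		 * this re-queues the work, and the (already dequeued)
		 * request is dequeued a second time. */
		tb_cfg_request_cancel(req, -ETIMEDOUT);

	return req->result;
}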
schedule_work() does not schedule the work again if it is already scheduled.
Yes, if it's still pending. If it has already executed, then we can schedule it again.
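That is, schedule_work() only checks WORK_STRUCT_PENDING_BIT, and that bit is cleared just before the handler starts running. A small illustration of those semantics (demo_fn/demo_work are made up for this example):

#include <linux/printk.h>
#include <linux/workqueue.h>

static void demo_fn(struct work_struct *work)
{
	pr_info("demo_fn ran\n");
}
static DECLARE_WORK(demo_work, demo_fn);

static void demo(void)
{
	schedule_work(&demo_work);	/* true: work queued */
	schedule_work(&demo_work);	/* false: still pending, no-op */

	flush_work(&demo_work);		/* handler has run; the pending bit
					 * was cleared just before it ran */

	schedule_work(&demo_work);	/* true again: same work re-queued */
}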
tb_cfg_request_sync() {
  tb_cfg_request()
    schedule_work()                 executes tb_cfg_request_dequeue()
  wait_for_completion_timeout()
    schedule_work()                 executes tb_cfg_request_dequeue() again
}
I guess there can be enough delay (for whatever reason, not only wait_for_completion_timeout(), but maybe also preemption) between the two schedule_work() calls?
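For what it's worth, the obvious way to close this is to make the dequeue idempotent. A minimal sketch of that idea (the TB_CFG_REQUEST_ACTIVE flag and request_queue_lock names are assumptions here, not necessarily what the patch does):

static void tb_cfg_request_dequeue(struct tb_cfg_request *req)
{
	struct tb_ctl *ctl = req->ctl;

	mutex_lock(&ctl->request_queue_lock);
	if (!test_bit(TB_CFG_REQUEST_ACTIVE, &req->flags)) {
		/* Already dequeued by the other schedule_work() path;
		 * bail out instead of doing a second list_del(). */
		mutex_unlock(&ctl->request_queue_lock);
		return;
	}
	list_del(&req->list);
	clear_bit(TB_CFG_REQUEST_ACTIVE, &req->flags);
	mutex_unlock(&ctl->request_queue_lock);
	tb_cfg_request_put(req);
}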
The 0xdead000000000122 dereference is a LIST_POISON value on x86_64, which is set explicitly by list_del(), so I'd say I'm fairly confident that we have a double list_del() in tb_cfg_request_dequeue().
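For reference, the relevant poison values and list_del() behavior, abridged from include/linux/poison.h and include/linux/list.h (with CONFIG_ILLEGAL_POINTER_VALUE = 0xdead000000000000 on x86_64, LIST_POISON1 is 0xdead000000000100 and LIST_POISON2 is 0xdead000000000122):

static inline void list_del(struct list_head *entry)
{
	__list_del(entry->prev, entry->next);	/* unlinks via prev/next */
	entry->next = LIST_POISON1;	/* first list_del() poisons the node */
	entry->prev = LIST_POISON2;
}

A second list_del() on the same node then dereferences the poisoned pointers, e.g. WRITE_ONCE(entry->prev->next, entry->next) with entry->prev == LIST_POISON2 touches 0xdead000000000122, matching the faulting address in the report.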
Yes, I agree but since I have not seen any similar reports (sans what I saw ages ago), I would like to be sure the issue you see is actually fixed with the patch (and that there are no unexpected side-effects). ;-)
Let me see what I can do (we don't normally apply patches that were not in the corresponding subsystem tree).
In the meantime, do you have a subsystem/driver tree that is exposed to linux-next? If so, it would be cool if you could pick up the patch so that it gets some extra testing via linux-next.