On Thu, 29 Aug 2024 22:41:17 -0700 Mina Almasry wrote:
> Thank you, I think the right fix here is to reacquire rtnl_lock before the `goto err_unbind;`, since err_unbind expects rtnl to be locked at this point.
FWIW it's best to keep the error path a mirror image of the success path, so I'd add a new label "err_relock" or something. But..
> This could introduce a weird edge case where we drop rtnl_lock, then find out genlmsg_reply failed, then reacquire rtnl_lock to do the cleanup. I can't think of anything that would horribly break if we do that, but I may be missing something. In theory we could race with a dmabuf unbind call happening in parallel.
> If we can't reacquire rtnl_lock to do the cleanup, I think I need to revert back to doing genlmsg_reply inside of rtnl_lock, and dropping the lock before we return from the function.
..indeed, best to keep it atomic. So let's hold rtnl_lock longer. genlmsg_reply() shouldn't block, AFAIU.
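A minimal user-space sketch of the "hold the lock longer" shape, for illustration only. It is not the actual netdev code: every `model_*` name is a made-up stand-in (`model_lock()` for rtnl_lock, `model_reply()` for genlmsg_reply(), and so on), and the lock is just a flag with asserts, so it shows the ordering of the labels and nothing about real concurrency.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of a lock; stands in for rtnl_lock. */
static bool model_lock_held;
static int model_bindings;	/* only modified with the model lock held */

static void model_lock(void)
{
	assert(!model_lock_held);
	model_lock_held = true;
}

static void model_unlock(void)
{
	assert(model_lock_held);
	model_lock_held = false;
}

/* Stand-in for the bind step; must run under the lock. */
static int model_bind(void)
{
	assert(model_lock_held);
	model_bindings++;
	return 0;
}

/* Stand-in for the unbind cleanup; it expects the lock to be held. */
static void model_unbind(void)
{
	assert(model_lock_held);
	model_bindings--;
}

/* Stand-in for sending the reply; the caller chooses whether it fails. */
static int model_reply(bool fail)
{
	return fail ? -1 : 0;
}

int model_bind_doit(bool reply_fails)
{
	int err;

	model_lock();

	err = model_bind();
	if (err)
		goto err_unlock;

	/* The reply is sent with the lock still held, so a failure can be
	 * unwound without dropping and reacquiring the lock.
	 */
	err = model_reply(reply_fails);
	if (err)
		goto err_unbind;

	model_unlock();
	return 0;

	/* Error labels undo the steps above in reverse order. */
err_unbind:
	model_unbind();
err_unlock:
	model_unlock();
	return err;
}

int model_binding_count(void)
{
	return model_bindings;
}

bool model_is_locked(void)
{
	return model_lock_held;
}
```

Because the reply happens before the unlock, no "err_relock" label is needed, and the unbind on the error path runs in the same locked section as the bind it undoes.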
BTW CI is quite behind but Yunsheng ignored it and reposted his "refactor" which is going to take us another 10 hours back, so whatever, just post v24 when you're ready...