On Mon, Jan 06, 2025 at 10:47:16AM +0000, Hangbin Liu wrote:
On Thu, Jan 02, 2025 at 11:33:34AM +0800, Jianbo Liu wrote:
Re-locking doesn't look great, but glancing at the code I don't see any obviously better workarounds. The easiest fix would be to not let the drivers sleep in the callbacks, and then we can go back to a spin lock. Maybe the nvidia people have better ideas; I'm not familiar with this offload.
I don't know how to stop bonding from sleeping, since we use mutex_lock now. Hi Jianbo, do you have any idea?
I think we should allow drivers to sleep in the callbacks. So, maybe it's better to move driver's xdo_dev_state_delete out of state's spin lock.
I just checked the code: xfrm_dev_state_delete() and the later dev->xfrmdev_ops->xdo_dev_state_delete(x) have too many xfrm_state x checks. Can we really move it out of the spin lock in xfrm_state_delete()?
I tried to move the mutex lock code to a work queue, but found we need to check (ipsec->xs == xs) in bonding. So we still need xfrm_state x during bond ipsec gc.
So either we add a new lock for xfrm_state, or we need to drop the spin lock in bonding's bond_ipsec_del_sa().
Cc IPsec experts to see if they have any comments.
Background: the xfrm_dev_state_delete() call in xfrm_state_delete() is protected by a spin lock, but the driver delete op dev->xfrmdev_ops->xdo_dev_state_delete(x) may sleep, e.g. bond_ipsec_del_sa(). How should we deal with this issue?
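
To make the "move the driver callback out of the spin lock" idea concrete, here is a rough user-space sketch, not kernel code and not a proposed patch. Everything in it is a hypothetical stand-in: struct fake_state plays the role of xfrm_state, the atomic_flag plays the role of the state's spin lock, and fake_driver_delete() plays the role of a driver delete op that takes a mutex and may therefore sleep. It only shows the general pattern: decide under the spin lock, call the sleeping op after unlocking. It ignores the xfrm_state checks and the (ipsec->xs == xs) comparison mentioned above, which are exactly what makes this hard in the real code.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for xfrm_state; not the kernel's layout. */
struct fake_state {
    atomic_flag lock;     /* plays the role of the state's spin lock */
    bool lock_held;       /* debugging aid: true while "lock" is held */
    bool dead;            /* state has already been deleted */
    bool offloaded;       /* a driver still holds resources for it */
    int driver_calls;     /* how many times the driver op ran */
    int calls_under_lock; /* driver op invoked with the spin lock held */
};

static void fake_spin_lock(struct fake_state *x)
{
    while (atomic_flag_test_and_set_explicit(&x->lock, memory_order_acquire))
        ; /* busy-wait, as a spin lock does */
    x->lock_held = true;
}

static void fake_spin_unlock(struct fake_state *x)
{
    x->lock_held = false;
    atomic_flag_clear_explicit(&x->lock, memory_order_release);
}

/* Stand-in for a driver delete op that may sleep because it takes a mutex. */
static pthread_mutex_t fake_driver_mutex = PTHREAD_MUTEX_INITIALIZER;

static void fake_driver_delete(struct fake_state *x)
{
    pthread_mutex_lock(&fake_driver_mutex); /* may block */
    if (x->lock_held)
        x->calls_under_lock++;              /* would be a bug in the kernel */
    x->driver_calls++;
    pthread_mutex_unlock(&fake_driver_mutex);
}

/* Mark the state dead under the spin lock, then call the driver unlocked. */
static int fake_state_delete(struct fake_state *x)
{
    bool call_driver = false;

    fake_spin_lock(x);
    if (x->dead) {
        fake_spin_unlock(x);
        return -1;            /* already deleted, nothing to do */
    }
    x->dead = true;
    if (x->offloaded) {
        x->offloaded = false; /* claim the driver teardown exactly once */
        call_driver = true;
    }
    fake_spin_unlock(x);

    if (call_driver)
        fake_driver_delete(x); /* sleeping is fine here, no spin lock held */
    return 0;
}
```

The design choice in this sketch is that the "offloaded" flag is cleared under the spin lock, so only one caller can win the right to invoke the driver op, and the op itself runs with no spin lock held. Whether the real xfrm and bonding code can tolerate the state being visible as dead before the driver has torn down its side is the open question in this thread.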
Thanks
Hangbin