On Fri, Apr 11, 2025 at 10:49:52AM +0300, Cosmin Ratiu wrote:
> This patch series was motivated by fixing a few bugs in the bonding driver related to xfrm state migration on device failover.
>
> struct xfrm_dev_offload has two net_device pointers: dev and real_dev. The first one is the device the xfrm_state is offloaded on and the second one is used by the bonding driver to manage the underlying device xfrm_states are actually offloaded on. When bonding isn't used, the two pointers are the same.
>
> This causes confusion in drivers: Which device pointer should they use? If they want to support bonding, they need to only use real_dev and never look at dev.
>
> Furthermore, real_dev is used without proper locking from multiple code paths and changing it is dangerous. See commit [1] for example.
>
> This patch series clears things out by removing all uses of real_dev from outside the bonding driver. Then, the bonding driver is refactored to fix a couple of long standing races and the original bug which motivated this patch series.

I'm still a bit skeptical about the bonding offloads themselves, as mentioned here:
https://lore.kernel.org/all/ZsbkdzvjVf3GiYHa@gauss3.secunet.de/
but I'm OK with this particular patchset.
How should we merge this patchset? It touches several subsystems, including xfrm. I'm fine with merging it through the ipsec-next tree, but it would also be OK if it goes through the net-next tree, if that's easier.