This patch series was motivated by fixing a few bugs in the bonding driver related to xfrm state migration on device failover.
struct xfrm_dev_offload has two net_device pointers: dev and real_dev. The first one is the device the xfrm_state is offloaded on and the second one is used by the bonding driver to manage the underlying device xfrm_states are actually offloaded on. When bonding isn't used, the two pointers are the same.
This causes confusion in drivers: Which device pointer should they use? If they want to support bonding, they need to only use real_dev and never look at dev.
Furthermore, real_dev is used without proper locking from multiple code paths and changing it is dangerous. See commit [1] for example.
This patch series clears things out by removing all uses of real_dev from outside the bonding driver. Then, the bonding driver is refactored to fix a couple of long standing races and the original bug which motivated this patch series.
[1] commit f8cde9805981 ("bonding: fix xfrm real_dev null pointer dereference")
Cosmin Ratiu (6): Cleaning up unnecessary uses of xso.real_dev: net/mlx5: Avoid using xso.real_dev unnecessarily xfrm: Use xdo.dev instead of xdo.real_dev xfrm: Remove unneeded device check from validate_xmit_xfrm Refactoring device operations to get an explicit device pointer: xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free} Fixing a bonding xfrm state migration bug: bonding: Mark active offloaded xfrm_states Fixing long standing races in bonding: bonding: Fix multiple long standing offload races
Documentation/networking/xfrm_device.rst | 10 +- drivers/net/bonding/bond_main.c | 93 +++++++++++-------- .../net/ethernet/chelsio/cxgb4/cxgb4_main.c | 20 ++-- .../inline_crypto/ch_ipsec/chcr_ipsec.c | 18 ++-- .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 40 ++++---- drivers/net/ethernet/intel/ixgbevf/ipsec.c | 20 ++-- .../marvell/octeontx2/nic/cn10k_ipsec.c | 18 ++-- .../mellanox/mlx5/core/en_accel/ipsec.c | 28 +++--- .../mellanox/mlx5/core/en_accel/ipsec.h | 1 + .../net/ethernet/netronome/nfp/crypto/ipsec.c | 11 +-- drivers/net/netdevsim/ipsec.c | 15 ++- include/linux/netdevice.h | 10 +- include/net/xfrm.h | 8 ++ net/xfrm/xfrm_device.c | 13 +-- net/xfrm/xfrm_state.c | 16 ++-- 15 files changed, 175 insertions(+), 146 deletions(-)
xso.real_dev is the active device of an offloaded xfrm state and is managed by bonding. As such, it's subject to change when states are migrated to a new device. Using it in places other than offloading/unoffloading the states is risky.
This commit saves the device into the driver-specific struct mlx5e_ipsec_sa_entry and switches mlx5e_ipsec_init_macs() and mlx5e_ipsec_netevent_event() to make use of it.
Additionally, mlx5e_xfrm_update_stats() used xso.real_dev to validate that correct net locks are held. But in a bonding config, the net of the master device is the same as the underlying devices, and the net is already a local var, so use that instead.
The only remaining references to xso.real_dev are now in the .xdo_dev_state_add() / .xdo_dev_state_delete() path.
Signed-off-by: Cosmin Ratiu cratiu@nvidia.com Reviewed-by: Leon Romanovsky leonro@nvidia.com --- .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 16 +++++----------- .../ethernet/mellanox/mlx5/core/en_accel/ipsec.h | 1 + 2 files changed, 6 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c index 2dd842aac6fc..626e525c0f0d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c @@ -259,8 +259,7 @@ static void mlx5e_ipsec_init_macs(struct mlx5e_ipsec_sa_entry *sa_entry, struct mlx5_accel_esp_xfrm_attrs *attrs) { struct mlx5_core_dev *mdev = mlx5e_ipsec_sa2dev(sa_entry); - struct xfrm_state *x = sa_entry->x; - struct net_device *netdev; + struct net_device *netdev = sa_entry->dev; struct neighbour *n; u8 addr[ETH_ALEN]; const void *pkey; @@ -270,8 +269,6 @@ static void mlx5e_ipsec_init_macs(struct mlx5e_ipsec_sa_entry *sa_entry, attrs->type != XFRM_DEV_OFFLOAD_PACKET) return;
- netdev = x->xso.real_dev; - mlx5_query_mac_address(mdev, addr); switch (attrs->dir) { case XFRM_DEV_OFFLOAD_IN: @@ -713,6 +710,7 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x, return -ENOMEM;
sa_entry->x = x; + sa_entry->dev = netdev; sa_entry->ipsec = ipsec; /* Check if this SA is originated from acquire flow temporary SA */ if (x->xso.flags & XFRM_DEV_OFFLOAD_FLAG_ACQ) @@ -855,8 +853,6 @@ static int mlx5e_ipsec_netevent_event(struct notifier_block *nb, struct mlx5e_ipsec_sa_entry *sa_entry; struct mlx5e_ipsec *ipsec; struct neighbour *n = ptr; - struct net_device *netdev; - struct xfrm_state *x; unsigned long idx;
if (event != NETEVENT_NEIGH_UPDATE || !(n->nud_state & NUD_VALID)) @@ -876,11 +872,9 @@ static int mlx5e_ipsec_netevent_event(struct notifier_block *nb, continue; }
- x = sa_entry->x; - netdev = x->xso.real_dev; data = sa_entry->work->data;
- neigh_ha_snapshot(data->addr, n, netdev); + neigh_ha_snapshot(data->addr, n, sa_entry->dev); queue_work(ipsec->wq, &sa_entry->work->work); }
@@ -996,8 +990,8 @@ static void mlx5e_xfrm_update_stats(struct xfrm_state *x) size_t headers;
lockdep_assert(lockdep_is_held(&x->lock) || - lockdep_is_held(&dev_net(x->xso.real_dev)->xfrm.xfrm_cfg_mutex) || - lockdep_is_held(&dev_net(x->xso.real_dev)->xfrm.xfrm_state_lock)); + lockdep_is_held(&net->xfrm.xfrm_cfg_mutex) || + lockdep_is_held(&net->xfrm.xfrm_state_lock));
if (x->xso.flags & XFRM_DEV_OFFLOAD_FLAG_ACQ) return; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h index a63c2289f8af..ffcd0cdeb775 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h @@ -274,6 +274,7 @@ struct mlx5e_ipsec_limits { struct mlx5e_ipsec_sa_entry { struct mlx5e_ipsec_esn_state esn_state; struct xfrm_state *x; + struct net_device *dev; struct mlx5e_ipsec *ipsec; struct mlx5_accel_esp_xfrm_attrs attrs; void (*set_iv_op)(struct sk_buff *skb, struct xfrm_state *x,
On 07/04/2025 16:35, Cosmin Ratiu wrote:
xso.real_dev is the active device of an offloaded xfrm state and is managed by bonding. As such, it's subject to change when states are migrated to a new device. Using it in places other than offloading/unoffloading the states is risky.
This commit saves the device into the driver-specific struct mlx5e_ipsec_sa_entry and switches mlx5e_ipsec_init_macs() and mlx5e_ipsec_netevent_event() to make use of it.
Additionally, mlx5e_xfrm_update_stats() used xso.real_dev to validate that correct net locks are held. But in a bonding config, the net of the master device is the same as the underlying devices, and the net is already a local var, so use that instead.
The only remaining references to xso.real_dev are now in the .xdo_dev_state_add() / .xdo_dev_state_delete() path.
Signed-off-by: Cosmin Ratiu cratiu@nvidia.com Reviewed-by: Leon Romanovsky leonro@nvidia.com
Acked-by: Tariq Toukan tariqt@nvidia.com
Thanks.
The policy offload struct was reused from the state offload and real_dev was copied from dev, but it was never set to anything else. Simplify the code by always using xdo.dev for policies.
Signed-off-by: Cosmin Ratiu cratiu@nvidia.com Reviewed-by: Leon Romanovsky leonro@nvidia.com --- drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 2 +- net/xfrm/xfrm_device.c | 2 -- net/xfrm/xfrm_state.c | 2 -- 3 files changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c index 626e525c0f0d..0dfbbe21936f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c @@ -1164,7 +1164,7 @@ mlx5e_ipsec_build_accel_pol_attrs(struct mlx5e_ipsec_pol_entry *pol_entry, static int mlx5e_xfrm_add_policy(struct xfrm_policy *x, struct netlink_ext_ack *extack) { - struct net_device *netdev = x->xdo.real_dev; + struct net_device *netdev = x->xdo.dev; struct mlx5e_ipsec_pol_entry *pol_entry; struct mlx5e_priv *priv; int err; diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c index d62f76161d83..4f4165ff738d 100644 --- a/net/xfrm/xfrm_device.c +++ b/net/xfrm/xfrm_device.c @@ -378,7 +378,6 @@ int xfrm_dev_policy_add(struct net *net, struct xfrm_policy *xp,
xdo->dev = dev; netdev_tracker_alloc(dev, &xdo->dev_tracker, GFP_ATOMIC); - xdo->real_dev = dev; xdo->type = XFRM_DEV_OFFLOAD_PACKET; switch (dir) { case XFRM_POLICY_IN: @@ -400,7 +399,6 @@ int xfrm_dev_policy_add(struct net *net, struct xfrm_policy *xp, err = dev->xfrmdev_ops->xdo_dev_policy_add(xp, extack); if (err) { xdo->dev = NULL; - xdo->real_dev = NULL; xdo->type = XFRM_DEV_OFFLOAD_UNSPECIFIED; xdo->dir = 0; netdev_put(dev, &xdo->dev_tracker); diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index d896c3fefb07..33d1f5679e8b 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -1555,7 +1555,6 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, xso->type = XFRM_DEV_OFFLOAD_PACKET; xso->dir = xdo->dir; xso->dev = xdo->dev; - xso->real_dev = xdo->real_dev; xso->flags = XFRM_DEV_OFFLOAD_FLAG_ACQ; netdev_hold(xso->dev, &xso->dev_tracker, GFP_ATOMIC); error = xso->dev->xfrmdev_ops->xdo_dev_state_add(x, NULL); @@ -1563,7 +1562,6 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, xso->dir = 0; netdev_put(xso->dev, &xso->dev_tracker); xso->dev = NULL; - xso->real_dev = NULL; xso->type = XFRM_DEV_OFFLOAD_UNSPECIFIED; x->km.state = XFRM_STATE_DEAD; to_put = x;
validate_xmit_xfrm checks whether a packet already passed through it on the master device (xso.dev) and skips processing the skb again on the slave device (xso.real_dev).
This check was added in commit [1] to avoid tx packets on a bond device pass through xfrm twice and get two sets of headers, but the check was soon obsoleted by commit [2], which was added around the same time to fix a similar but unrelated problem. Commit [3] set XFRM_XMIT only when packets are hw offloaded.
xso.dev is usually equal to xso.real_dev, unless bonding is used, in which case the bonding driver uses xso.real_dev to manage offloaded xfrm states.
Since commit [3], the check added in commit [1] is unused on all cases, since packets going through validate_xmit_xfrm twice bail out on the check added in commit [2]. Here's a breakdown of relevant scenarios:
1. ESP offload off: validate_xmit_xfrm returns early on !xo. 2. ESP offload on, no bond: skb->dev == xso.real_dev == xso.dev. 3. ESP offload on, bond, xs on bond dev: 1st pass adds XFRM_XMIT, 2nd pass returns early on XFRM_XMIT. 3. ESP offload on, bond, xs on slave dev: 1st pass returns early on !xo, 2nd pass adds XFRM_XMIT. 4. ESP offload on, bond, xs on both bond AND slave dev: only 1 offload possible in secpath. Either 1st pass adds XFRM_XMIT and 2nd pass returns early on XFRM_XMIT, or 1st pass is sw and returns early on !xo. 6. ESP offload on, crypto fallback triggered in esp_xmit/esp6_xmit: 1st pass does sw crypto & secpath_reset, 2nd pass returns on !xo.
This commit removes the unnecessary check, so xso.real_dev becomes what it is in practice: a private field managed by bonding driver. The check immediately below that can be simplified as well.
[1] commit 272c2330adc9 ("xfrm: bail early on slave pass over skb") [2] commit 94579ac3f6d0 ("xfrm: Fix double ESP trailer insertion in IPsec crypto offload.") [3] commit c7dbf4c08868 ("xfrm: Provide private skb extensions for segmented and hw offloaded ESP packets")
Signed-off-by: Cosmin Ratiu cratiu@nvidia.com Reviewed-by: Leon Romanovsky leonro@nvidia.com --- net/xfrm/xfrm_device.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c index 4f4165ff738d..0be5f7ffd019 100644 --- a/net/xfrm/xfrm_device.c +++ b/net/xfrm/xfrm_device.c @@ -145,10 +145,6 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur return NULL; }
- /* This skb was already validated on the upper/virtual dev */ - if ((x->xso.dev != dev) && (x->xso.real_dev == dev)) - return skb; - local_irq_save(flags); sd = this_cpu_ptr(&softnet_data); err = !skb_queue_empty(&sd->xfrm_backlog); @@ -159,8 +155,7 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur return skb; }
- if (skb_is_gso(skb) && (unlikely(x->xso.dev != dev) || - unlikely(xmit_xfrm_check_overflow(skb)))) { + if (skb_is_gso(skb) && unlikely(xmit_xfrm_check_overflow(skb))) { struct sk_buff *segs;
/* Packet got rerouted, fixup features and segment it. */
Previously, device driver IPSec offload implementations would fall into two categories: 1. Those that used xso.dev to determine the offload device. 2. Those that used xso.real_dev to determine the offload device.
The first category didn't work with bonding while the second did. In a non-bonding setup the two pointers are the same.
This commit adds explicit pointers for the offload netdevice to .xdo_dev_state_add() / .xdo_dev_state_delete() / .xdo_dev_state_free() which eliminates the confusion and allows drivers from the first category to work with bonding.
xso.real_dev now becomes a private pointer managed by the bonding driver.
Signed-off-by: Cosmin Ratiu cratiu@nvidia.com Reviewed-by: Leon Romanovsky leonro@nvidia.com --- Documentation/networking/xfrm_device.rst | 10 +++-- drivers/net/bonding/bond_main.c | 31 +++++++------- .../net/ethernet/chelsio/cxgb4/cxgb4_main.c | 20 +++++----- .../inline_crypto/ch_ipsec/chcr_ipsec.c | 18 ++++++--- .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 40 ++++++++++--------- drivers/net/ethernet/intel/ixgbevf/ipsec.c | 20 ++++++---- .../marvell/octeontx2/nic/cn10k_ipsec.c | 18 ++++----- .../mellanox/mlx5/core/en_accel/ipsec.c | 12 +++--- .../net/ethernet/netronome/nfp/crypto/ipsec.c | 11 +++-- drivers/net/netdevsim/ipsec.c | 15 ++++--- include/linux/netdevice.h | 10 +++-- include/net/xfrm.h | 8 ++++ net/xfrm/xfrm_device.c | 4 +- net/xfrm/xfrm_state.c | 14 ++++--- 14 files changed, 132 insertions(+), 99 deletions(-)
diff --git a/Documentation/networking/xfrm_device.rst b/Documentation/networking/xfrm_device.rst index 7f24c09f2694..122204da0fff 100644 --- a/Documentation/networking/xfrm_device.rst +++ b/Documentation/networking/xfrm_device.rst @@ -65,9 +65,13 @@ Callbacks to implement /* from include/linux/netdevice.h */ struct xfrmdev_ops { /* Crypto and Packet offload callbacks */ - int (*xdo_dev_state_add) (struct xfrm_state *x, struct netlink_ext_ack *extack); - void (*xdo_dev_state_delete) (struct xfrm_state *x); - void (*xdo_dev_state_free) (struct xfrm_state *x); + int (*xdo_dev_state_add)(struct net_device *dev, + struct xfrm_state *x, + struct netlink_ext_ack *extack); + void (*xdo_dev_state_delete)(struct net_device *dev, + struct xfrm_state *x); + void (*xdo_dev_state_free)(struct net_device *dev, + struct xfrm_state *x); bool (*xdo_dev_offload_ok) (struct sk_buff *skb, struct xfrm_state *x); void (*xdo_dev_state_advance_esn) (struct xfrm_state *x); diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 950d8e4d86f8..2ffde8f672c2 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -456,10 +456,10 @@ static struct net_device *bond_ipsec_dev(struct xfrm_state *xs) * @xs: pointer to transformer state struct * @extack: extack point to fill failure reason **/ -static int bond_ipsec_add_sa(struct xfrm_state *xs, +static int bond_ipsec_add_sa(struct net_device *bond_dev, + struct xfrm_state *xs, struct netlink_ext_ack *extack) { - struct net_device *bond_dev = xs->xso.dev; struct net_device *real_dev; netdevice_tracker tracker; struct bond_ipsec *ipsec; @@ -496,7 +496,7 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs, }
xs->xso.real_dev = real_dev; - err = real_dev->xfrmdev_ops->xdo_dev_state_add(xs, extack); + err = real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, xs, extack); if (!err) { ipsec->xs = xs; INIT_LIST_HEAD(&ipsec->list); @@ -540,7 +540,8 @@ static void bond_ipsec_add_sa_all(struct bonding *bond) continue;
ipsec->xs->xso.real_dev = real_dev; - if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) { + if (real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, + ipsec->xs, NULL)) { slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__); ipsec->xs->xso.real_dev = NULL; } @@ -553,9 +554,9 @@ static void bond_ipsec_add_sa_all(struct bonding *bond) * bond_ipsec_del_sa - clear out this specific SA * @xs: pointer to transformer state struct **/ -static void bond_ipsec_del_sa(struct xfrm_state *xs) +static void bond_ipsec_del_sa(struct net_device *bond_dev, + struct xfrm_state *xs) { - struct net_device *bond_dev = xs->xso.dev; struct net_device *real_dev; netdevice_tracker tracker; struct bond_ipsec *ipsec; @@ -587,7 +588,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs) goto out; }
- real_dev->xfrmdev_ops->xdo_dev_state_delete(xs); + real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, xs); out: netdev_put(real_dev, &tracker); mutex_lock(&bond->ipsec_lock); @@ -624,18 +625,20 @@ static void bond_ipsec_del_sa_all(struct bonding *bond) slave_warn(bond_dev, real_dev, "%s: no slave xdo_dev_state_delete\n", __func__); - } else { - real_dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs); - if (real_dev->xfrmdev_ops->xdo_dev_state_free) - real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs); + continue; } + real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, + ipsec->xs); + if (real_dev->xfrmdev_ops->xdo_dev_state_free) + real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, + ipsec->xs); } mutex_unlock(&bond->ipsec_lock); }
-static void bond_ipsec_free_sa(struct xfrm_state *xs) +static void bond_ipsec_free_sa(struct net_device *bond_dev, + struct xfrm_state *xs) { - struct net_device *bond_dev = xs->xso.dev; struct net_device *real_dev; netdevice_tracker tracker; struct bonding *bond; @@ -661,7 +664,7 @@ static void bond_ipsec_free_sa(struct xfrm_state *xs)
if (real_dev && real_dev->xfrmdev_ops && real_dev->xfrmdev_ops->xdo_dev_state_free) - real_dev->xfrmdev_ops->xdo_dev_state_free(xs); + real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, xs); out: netdev_put(real_dev, &tracker); } diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c index 551c279dc14b..51395c96b2e9 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c @@ -6480,10 +6480,11 @@ static const struct tlsdev_ops cxgb4_ktls_ops = {
#if IS_ENABLED(CONFIG_CHELSIO_IPSEC_INLINE)
-static int cxgb4_xfrm_add_state(struct xfrm_state *x, +static int cxgb4_xfrm_add_state(struct net_device *dev, + struct xfrm_state *x, struct netlink_ext_ack *extack) { - struct adapter *adap = netdev2adap(x->xso.dev); + struct adapter *adap = netdev2adap(dev); int ret;
if (!mutex_trylock(&uld_mutex)) { @@ -6494,7 +6495,8 @@ static int cxgb4_xfrm_add_state(struct xfrm_state *x, if (ret) goto out_unlock;
- ret = adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_add(x, extack); + ret = adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_add(dev, x, + extack);
out_unlock: mutex_unlock(&uld_mutex); @@ -6502,9 +6504,9 @@ static int cxgb4_xfrm_add_state(struct xfrm_state *x, return ret; }
-static void cxgb4_xfrm_del_state(struct xfrm_state *x) +static void cxgb4_xfrm_del_state(struct net_device *dev, struct xfrm_state *x) { - struct adapter *adap = netdev2adap(x->xso.dev); + struct adapter *adap = netdev2adap(dev);
if (!mutex_trylock(&uld_mutex)) { dev_dbg(adap->pdev_dev, @@ -6514,15 +6516,15 @@ static void cxgb4_xfrm_del_state(struct xfrm_state *x) if (chcr_offload_state(adap, CXGB4_XFRMDEV_OPS)) goto out_unlock;
- adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_delete(x); + adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_delete(dev, x);
out_unlock: mutex_unlock(&uld_mutex); }
-static void cxgb4_xfrm_free_state(struct xfrm_state *x) +static void cxgb4_xfrm_free_state(struct net_device *dev, struct xfrm_state *x) { - struct adapter *adap = netdev2adap(x->xso.dev); + struct adapter *adap = netdev2adap(dev);
if (!mutex_trylock(&uld_mutex)) { dev_dbg(adap->pdev_dev, @@ -6532,7 +6534,7 @@ static void cxgb4_xfrm_free_state(struct xfrm_state *x) if (chcr_offload_state(adap, CXGB4_XFRMDEV_OPS)) goto out_unlock;
- adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_free(x); + adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_free(dev, x);
out_unlock: mutex_unlock(&uld_mutex); diff --git a/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c b/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c index baba96883f48..ecd9a0bd5e18 100644 --- a/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c +++ b/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c @@ -75,9 +75,12 @@ static int ch_ipsec_uld_state_change(void *handle, enum cxgb4_state new_state); static int ch_ipsec_xmit(struct sk_buff *skb, struct net_device *dev); static void *ch_ipsec_uld_add(const struct cxgb4_lld_info *infop); static void ch_ipsec_advance_esn_state(struct xfrm_state *x); -static void ch_ipsec_xfrm_free_state(struct xfrm_state *x); -static void ch_ipsec_xfrm_del_state(struct xfrm_state *x); -static int ch_ipsec_xfrm_add_state(struct xfrm_state *x, +static void ch_ipsec_xfrm_free_state(struct net_device *dev, + struct xfrm_state *x); +static void ch_ipsec_xfrm_del_state(struct net_device *dev, + struct xfrm_state *x); +static int ch_ipsec_xfrm_add_state(struct net_device *dev, + struct xfrm_state *x, struct netlink_ext_ack *extack);
static const struct xfrmdev_ops ch_ipsec_xfrmdev_ops = { @@ -223,7 +226,8 @@ static int ch_ipsec_setkey(struct xfrm_state *x, * returns 0 on success, negative error if failed to send message to FPGA * positive error if FPGA returned a bad response */ -static int ch_ipsec_xfrm_add_state(struct xfrm_state *x, +static int ch_ipsec_xfrm_add_state(struct net_device *dev, + struct xfrm_state *x, struct netlink_ext_ack *extack) { struct ipsec_sa_entry *sa_entry; @@ -302,14 +306,16 @@ static int ch_ipsec_xfrm_add_state(struct xfrm_state *x, return res; }
-static void ch_ipsec_xfrm_del_state(struct xfrm_state *x) +static void ch_ipsec_xfrm_del_state(struct net_device *dev, + struct xfrm_state *x) { /* do nothing */ if (!x->xso.offload_handle) return; }
-static void ch_ipsec_xfrm_free_state(struct xfrm_state *x) +static void ch_ipsec_xfrm_free_state(struct net_device *dev, + struct xfrm_state *x) { struct ipsec_sa_entry *sa_entry;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c index 07ea1954a276..bfb26a78da85 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c @@ -9,7 +9,7 @@ #define IXGBE_IPSEC_KEY_BITS 160 static const char aes_gcm_name[] = "rfc4106(gcm(aes))";
-static void ixgbe_ipsec_del_sa(struct xfrm_state *xs); +static void ixgbe_ipsec_del_sa(struct net_device *dev, struct xfrm_state *xs);
/** * ixgbe_ipsec_set_tx_sa - set the Tx SA registers @@ -321,7 +321,7 @@ void ixgbe_ipsec_restore(struct ixgbe_adapter *adapter)
if (r->used) { if (r->mode & IXGBE_RXTXMOD_VF) - ixgbe_ipsec_del_sa(r->xs); + ixgbe_ipsec_del_sa(adapter->netdev, r->xs); else ixgbe_ipsec_set_rx_sa(hw, i, r->xs->id.spi, r->key, r->salt, @@ -330,7 +330,7 @@ void ixgbe_ipsec_restore(struct ixgbe_adapter *adapter)
if (t->used) { if (t->mode & IXGBE_RXTXMOD_VF) - ixgbe_ipsec_del_sa(t->xs); + ixgbe_ipsec_del_sa(adapter->netdev, t->xs); else ixgbe_ipsec_set_tx_sa(hw, i, t->key, t->salt); } @@ -424,10 +424,10 @@ static struct xfrm_state *ixgbe_ipsec_find_rx_state(struct ixgbe_ipsec *ipsec, * This copies the protocol keys and salt to our own data tables. The * 82599 family only supports the one algorithm. **/ -static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs, +static int ixgbe_ipsec_parse_proto_keys(struct net_device *dev, + struct xfrm_state *xs, u32 *mykey, u32 *mysalt) { - struct net_device *dev = xs->xso.real_dev; unsigned char *key_data; char *alg_name = NULL; int key_len; @@ -473,11 +473,12 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
/** * ixgbe_ipsec_check_mgmt_ip - make sure there is no clash with mgmt IP filters + * @dev: pointer to net device * @xs: pointer to transformer state struct **/ -static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs) +static int ixgbe_ipsec_check_mgmt_ip(struct net_device *dev, + struct xfrm_state *xs) { - struct net_device *dev = xs->xso.real_dev; struct ixgbe_adapter *adapter = netdev_priv(dev); struct ixgbe_hw *hw = &adapter->hw; u32 mfval, manc, reg; @@ -556,13 +557,14 @@ static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
/** * ixgbe_ipsec_add_sa - program device with a security association + * @dev: pointer to device to program * @xs: pointer to transformer state struct * @extack: extack point to fill failure reason **/ -static int ixgbe_ipsec_add_sa(struct xfrm_state *xs, +static int ixgbe_ipsec_add_sa(struct net_device *dev, + struct xfrm_state *xs, struct netlink_ext_ack *extack) { - struct net_device *dev = xs->xso.real_dev; struct ixgbe_adapter *adapter = netdev_priv(dev); struct ixgbe_ipsec *ipsec = adapter->ipsec; struct ixgbe_hw *hw = &adapter->hw; @@ -581,7 +583,7 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs, return -EINVAL; }
- if (ixgbe_ipsec_check_mgmt_ip(xs)) { + if (ixgbe_ipsec_check_mgmt_ip(dev, xs)) { NL_SET_ERR_MSG_MOD(extack, "IPsec IP addr clash with mgmt filters"); return -EINVAL; } @@ -615,7 +617,7 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs, rsa.decrypt = xs->ealg || xs->aead;
/* get the key and salt */ - ret = ixgbe_ipsec_parse_proto_keys(xs, rsa.key, &rsa.salt); + ret = ixgbe_ipsec_parse_proto_keys(dev, xs, rsa.key, &rsa.salt); if (ret) { NL_SET_ERR_MSG_MOD(extack, "Failed to get key data for Rx SA table"); return ret; @@ -724,7 +726,7 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs, if (xs->id.proto & IPPROTO_ESP) tsa.encrypt = xs->ealg || xs->aead;
- ret = ixgbe_ipsec_parse_proto_keys(xs, tsa.key, &tsa.salt); + ret = ixgbe_ipsec_parse_proto_keys(dev, xs, tsa.key, &tsa.salt); if (ret) { NL_SET_ERR_MSG_MOD(extack, "Failed to get key data for Tx SA table"); memset(&tsa, 0, sizeof(tsa)); @@ -752,11 +754,11 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs,
/** * ixgbe_ipsec_del_sa - clear out this specific SA + * @dev: pointer to device to program * @xs: pointer to transformer state struct **/ -static void ixgbe_ipsec_del_sa(struct xfrm_state *xs) +static void ixgbe_ipsec_del_sa(struct net_device *dev, struct xfrm_state *xs) { - struct net_device *dev = xs->xso.real_dev; struct ixgbe_adapter *adapter = netdev_priv(dev); struct ixgbe_ipsec *ipsec = adapter->ipsec; struct ixgbe_hw *hw = &adapter->hw; @@ -841,7 +843,8 @@ void ixgbe_ipsec_vf_clear(struct ixgbe_adapter *adapter, u32 vf) continue; if (ipsec->rx_tbl[i].mode & IXGBE_RXTXMOD_VF && ipsec->rx_tbl[i].vf == vf) - ixgbe_ipsec_del_sa(ipsec->rx_tbl[i].xs); + ixgbe_ipsec_del_sa(adapter->netdev, + ipsec->rx_tbl[i].xs); }
/* search tx sa table */ @@ -850,7 +853,8 @@ void ixgbe_ipsec_vf_clear(struct ixgbe_adapter *adapter, u32 vf) continue; if (ipsec->tx_tbl[i].mode & IXGBE_RXTXMOD_VF && ipsec->tx_tbl[i].vf == vf) - ixgbe_ipsec_del_sa(ipsec->tx_tbl[i].xs); + ixgbe_ipsec_del_sa(adapter->netdev, + ipsec->tx_tbl[i].xs); } }
@@ -930,7 +934,7 @@ int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf) memcpy(xs->aead->alg_name, aes_gcm_name, sizeof(aes_gcm_name));
/* set up the HW offload */ - err = ixgbe_ipsec_add_sa(xs, NULL); + err = ixgbe_ipsec_add_sa(adapter->netdev, xs, NULL); if (err) goto err_aead;
@@ -1034,7 +1038,7 @@ int ixgbe_ipsec_vf_del_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf) xs = ipsec->tx_tbl[sa_idx].xs; }
- ixgbe_ipsec_del_sa(xs); + ixgbe_ipsec_del_sa(adapter->netdev, xs);
/* remove the xs that was made-up in the add request */ kfree_sensitive(xs); diff --git a/drivers/net/ethernet/intel/ixgbevf/ipsec.c b/drivers/net/ethernet/intel/ixgbevf/ipsec.c index 8ba037e3d9c2..70002c89c884 100644 --- a/drivers/net/ethernet/intel/ixgbevf/ipsec.c +++ b/drivers/net/ethernet/intel/ixgbevf/ipsec.c @@ -208,10 +208,10 @@ struct xfrm_state *ixgbevf_ipsec_find_rx_state(struct ixgbevf_ipsec *ipsec, * This copies the protocol keys and salt to our own data tables. The * 82599 family only supports the one algorithm. **/ -static int ixgbevf_ipsec_parse_proto_keys(struct xfrm_state *xs, +static int ixgbevf_ipsec_parse_proto_keys(struct net_device *dev, + struct xfrm_state *xs, u32 *mykey, u32 *mysalt) { - struct net_device *dev = xs->xso.real_dev; unsigned char *key_data; char *alg_name = NULL; int key_len; @@ -256,13 +256,14 @@ static int ixgbevf_ipsec_parse_proto_keys(struct xfrm_state *xs,
/** * ixgbevf_ipsec_add_sa - program device with a security association + * @dev: pointer to net device to program * @xs: pointer to transformer state struct * @extack: extack point to fill failure reason **/ -static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs, +static int ixgbevf_ipsec_add_sa(struct net_device *dev, + struct xfrm_state *xs, struct netlink_ext_ack *extack) { - struct net_device *dev = xs->xso.real_dev; struct ixgbevf_adapter *adapter; struct ixgbevf_ipsec *ipsec; u16 sa_idx; @@ -310,7 +311,8 @@ static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs, rsa.decrypt = xs->ealg || xs->aead;
/* get the key and salt */ - ret = ixgbevf_ipsec_parse_proto_keys(xs, rsa.key, &rsa.salt); + ret = ixgbevf_ipsec_parse_proto_keys(dev, xs, rsa.key, + &rsa.salt); if (ret) { NL_SET_ERR_MSG_MOD(extack, "Failed to get key data for Rx SA table"); return ret; @@ -363,7 +365,8 @@ static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs, if (xs->id.proto & IPPROTO_ESP) tsa.encrypt = xs->ealg || xs->aead;
- ret = ixgbevf_ipsec_parse_proto_keys(xs, tsa.key, &tsa.salt); + ret = ixgbevf_ipsec_parse_proto_keys(dev, xs, tsa.key, + &tsa.salt); if (ret) { NL_SET_ERR_MSG_MOD(extack, "Failed to get key data for Tx SA table"); memset(&tsa, 0, sizeof(tsa)); @@ -389,10 +392,11 @@ static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs, /** * ixgbevf_ipsec_del_sa - clear out this specific SA * @xs: pointer to transformer state struct + * @dev: pointer to net device to program **/ -static void ixgbevf_ipsec_del_sa(struct xfrm_state *xs) +static void ixgbevf_ipsec_del_sa(struct net_device *dev, + struct xfrm_state *xs) { - struct net_device *dev = xs->xso.real_dev; struct ixgbevf_adapter *adapter; struct ixgbevf_ipsec *ipsec; u16 sa_idx; diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_ipsec.c b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_ipsec.c index fc59e50bafce..a6500e3673f2 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_ipsec.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_ipsec.c @@ -663,10 +663,10 @@ static int cn10k_ipsec_inb_add_state(struct xfrm_state *x, return -EOPNOTSUPP; }
-static int cn10k_ipsec_outb_add_state(struct xfrm_state *x, +static int cn10k_ipsec_outb_add_state(struct net_device *dev, + struct xfrm_state *x, struct netlink_ext_ack *extack) { - struct net_device *netdev = x->xso.dev; struct cn10k_tx_sa_s *sa_entry; struct qmem *sa_info; struct otx2_nic *pf; @@ -676,7 +676,7 @@ static int cn10k_ipsec_outb_add_state(struct xfrm_state *x, if (err) return err;
- pf = netdev_priv(netdev); + pf = netdev_priv(dev);
err = qmem_alloc(pf->dev, &sa_info, pf->ipsec.sa_size, OTX2_ALIGN); if (err) @@ -700,18 +700,18 @@ static int cn10k_ipsec_outb_add_state(struct xfrm_state *x, return 0; }
-static int cn10k_ipsec_add_state(struct xfrm_state *x, +static int cn10k_ipsec_add_state(struct net_device *dev, + struct xfrm_state *x, struct netlink_ext_ack *extack) { if (x->xso.dir == XFRM_DEV_OFFLOAD_IN) return cn10k_ipsec_inb_add_state(x, extack); else - return cn10k_ipsec_outb_add_state(x, extack); + return cn10k_ipsec_outb_add_state(dev, x, extack); }
-static void cn10k_ipsec_del_state(struct xfrm_state *x) +static void cn10k_ipsec_del_state(struct net_device *dev, struct xfrm_state *x) { - struct net_device *netdev = x->xso.dev; struct cn10k_tx_sa_s *sa_entry; struct qmem *sa_info; struct otx2_nic *pf; @@ -720,7 +720,7 @@ static void cn10k_ipsec_del_state(struct xfrm_state *x) if (x->xso.dir == XFRM_DEV_OFFLOAD_IN) return;
- pf = netdev_priv(netdev); + pf = netdev_priv(dev);
sa_info = (struct qmem *)x->xso.offload_handle; sa_entry = (struct cn10k_tx_sa_s *)sa_info->base; @@ -732,7 +732,7 @@ static void cn10k_ipsec_del_state(struct xfrm_state *x)
err = cn10k_outb_write_sa(pf, sa_info); if (err) - netdev_err(netdev, "Error (%d) deleting SA\n", err); + netdev_err(dev, "Error (%d) deleting SA\n", err);
x->xso.offload_handle = 0; qmem_free(pf->dev, sa_info); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c index 0dfbbe21936f..77f61cd28a79 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c @@ -689,17 +689,17 @@ static int mlx5e_ipsec_create_dwork(struct mlx5e_ipsec_sa_entry *sa_entry) return 0; }
-static int mlx5e_xfrm_add_state(struct xfrm_state *x, +static int mlx5e_xfrm_add_state(struct net_device *dev, + struct xfrm_state *x, struct netlink_ext_ack *extack) { struct mlx5e_ipsec_sa_entry *sa_entry = NULL; - struct net_device *netdev = x->xso.real_dev; struct mlx5e_ipsec *ipsec; struct mlx5e_priv *priv; gfp_t gfp; int err;
- priv = netdev_priv(netdev); + priv = netdev_priv(dev); if (!priv->ipsec) return -EOPNOTSUPP;
@@ -710,7 +710,7 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x, return -ENOMEM;
sa_entry->x = x; - sa_entry->dev = netdev; + sa_entry->dev = dev; sa_entry->ipsec = ipsec; /* Check if this SA is originated from acquire flow temporary SA */ if (x->xso.flags & XFRM_DEV_OFFLOAD_FLAG_ACQ) @@ -807,7 +807,7 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x, return err; }
-static void mlx5e_xfrm_del_state(struct xfrm_state *x) +static void mlx5e_xfrm_del_state(struct net_device *dev, struct xfrm_state *x) { struct mlx5e_ipsec_sa_entry *sa_entry = to_ipsec_sa_entry(x); struct mlx5e_ipsec *ipsec = sa_entry->ipsec; @@ -820,7 +820,7 @@ static void mlx5e_xfrm_del_state(struct xfrm_state *x) WARN_ON(old != sa_entry); }
-static void mlx5e_xfrm_free_state(struct xfrm_state *x) +static void mlx5e_xfrm_free_state(struct net_device *dev, struct xfrm_state *x) { struct mlx5e_ipsec_sa_entry *sa_entry = to_ipsec_sa_entry(x); struct mlx5e_ipsec *ipsec = sa_entry->ipsec; diff --git a/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c b/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c index 671af5d4c5d2..9e7c285eaa6b 100644 --- a/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c +++ b/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c @@ -266,17 +266,17 @@ static void set_sha2_512hmac(struct nfp_ipsec_cfg_add_sa *cfg, int *trunc_len) } }
-static int nfp_net_xfrm_add_state(struct xfrm_state *x, +static int nfp_net_xfrm_add_state(struct net_device *dev, + struct xfrm_state *x, struct netlink_ext_ack *extack) { - struct net_device *netdev = x->xso.real_dev; struct nfp_ipsec_cfg_mssg msg = {}; int i, key_len, trunc_len, err = 0; struct nfp_ipsec_cfg_add_sa *cfg; struct nfp_net *nn; unsigned int saidx;
- nn = netdev_priv(netdev); + nn = netdev_priv(dev); cfg = &msg.cfg_add_sa;
/* General */ @@ -546,17 +546,16 @@ static int nfp_net_xfrm_add_state(struct xfrm_state *x, return 0; }
-static void nfp_net_xfrm_del_state(struct xfrm_state *x) +static void nfp_net_xfrm_del_state(struct net_device *dev, struct xfrm_state *x) { struct nfp_ipsec_cfg_mssg msg = { .cmd = NFP_IPSEC_CFG_MSSG_INV_SA, .sa_idx = x->xso.offload_handle - 1, }; - struct net_device *netdev = x->xso.real_dev; struct nfp_net *nn; int err;
- nn = netdev_priv(netdev); + nn = netdev_priv(dev); err = nfp_net_sched_mbox_amsg_work(nn, NFP_NET_CFG_MBOX_CMD_IPSEC, &msg, sizeof(msg), nfp_net_ipsec_cfg); if (err) diff --git a/drivers/net/netdevsim/ipsec.c b/drivers/net/netdevsim/ipsec.c index d88bdb9a1717..47cdee5577d4 100644 --- a/drivers/net/netdevsim/ipsec.c +++ b/drivers/net/netdevsim/ipsec.c @@ -85,11 +85,11 @@ static int nsim_ipsec_find_empty_idx(struct nsim_ipsec *ipsec) return -ENOSPC; }
-static int nsim_ipsec_parse_proto_keys(struct xfrm_state *xs, +static int nsim_ipsec_parse_proto_keys(struct net_device *dev, + struct xfrm_state *xs, u32 *mykey, u32 *mysalt) { const char aes_gcm_name[] = "rfc4106(gcm(aes))"; - struct net_device *dev = xs->xso.real_dev; unsigned char *key_data; char *alg_name = NULL; int key_len; @@ -129,17 +129,16 @@ static int nsim_ipsec_parse_proto_keys(struct xfrm_state *xs, return 0; }
-static int nsim_ipsec_add_sa(struct xfrm_state *xs, +static int nsim_ipsec_add_sa(struct net_device *dev, + struct xfrm_state *xs, struct netlink_ext_ack *extack) { struct nsim_ipsec *ipsec; - struct net_device *dev; struct netdevsim *ns; struct nsim_sa sa; u16 sa_idx; int ret;
- dev = xs->xso.real_dev; ns = netdev_priv(dev); ipsec = &ns->ipsec;
@@ -174,7 +173,7 @@ static int nsim_ipsec_add_sa(struct xfrm_state *xs, sa.crypt = xs->ealg || xs->aead;
/* get the key and salt */ - ret = nsim_ipsec_parse_proto_keys(xs, sa.key, &sa.salt); + ret = nsim_ipsec_parse_proto_keys(dev, xs, sa.key, &sa.salt); if (ret) { NL_SET_ERR_MSG_MOD(extack, "Failed to get key data for SA table"); return ret; @@ -200,9 +199,9 @@ static int nsim_ipsec_add_sa(struct xfrm_state *xs, return 0; }
-static void nsim_ipsec_del_sa(struct xfrm_state *xs) +static void nsim_ipsec_del_sa(struct net_device *dev, struct xfrm_state *xs) { - struct netdevsim *ns = netdev_priv(xs->xso.real_dev); + struct netdevsim *ns = netdev_priv(dev); struct nsim_ipsec *ipsec = &ns->ipsec; u16 sa_idx;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index fa79145518d1..4f8163783956 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1012,9 +1012,13 @@ struct netdev_bpf {
#ifdef CONFIG_XFRM_OFFLOAD struct xfrmdev_ops { - int (*xdo_dev_state_add) (struct xfrm_state *x, struct netlink_ext_ack *extack); - void (*xdo_dev_state_delete) (struct xfrm_state *x); - void (*xdo_dev_state_free) (struct xfrm_state *x); + int (*xdo_dev_state_add)(struct net_device *dev, + struct xfrm_state *x, + struct netlink_ext_ack *extack); + void (*xdo_dev_state_delete)(struct net_device *dev, + struct xfrm_state *x); + void (*xdo_dev_state_free)(struct net_device *dev, + struct xfrm_state *x); bool (*xdo_dev_offload_ok) (struct sk_buff *skb, struct xfrm_state *x); void (*xdo_dev_state_advance_esn) (struct xfrm_state *x); diff --git a/include/net/xfrm.h b/include/net/xfrm.h index 39365fd2ea17..3d2f6c879311 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -147,8 +147,16 @@ enum { };
struct xfrm_dev_offload { + /* The device for this offload. + * Device drivers should not use this directly, as that will prevent + * them from working with bonding device. Instead, the device passed + * to the add/delete callbacks should be used. + */ struct net_device *dev; netdevice_tracker dev_tracker; + /* This is a private pointer used by the bonding driver. + * Device drivers should not use it. + */ struct net_device *real_dev; unsigned long offload_handle; u8 dir : 2; diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c index 0be5f7ffd019..3be0139373f7 100644 --- a/net/xfrm/xfrm_device.c +++ b/net/xfrm/xfrm_device.c @@ -309,7 +309,6 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
xso->dev = dev; netdev_tracker_alloc(dev, &xso->dev_tracker, GFP_ATOMIC); - xso->real_dev = dev;
if (xuo->flags & XFRM_OFFLOAD_INBOUND) xso->dir = XFRM_DEV_OFFLOAD_IN; @@ -321,11 +320,10 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x, else xso->type = XFRM_DEV_OFFLOAD_CRYPTO;
- err = dev->xfrmdev_ops->xdo_dev_state_add(x, extack); + err = dev->xfrmdev_ops->xdo_dev_state_add(dev, x, extack); if (err) { xso->dev = NULL; xso->dir = 0; - xso->real_dev = NULL; netdev_put(dev, &xso->dev_tracker); xso->type = XFRM_DEV_OFFLOAD_UNSPECIFIED;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 33d1f5679e8b..e9b6e25508f6 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -767,7 +767,7 @@ void xfrm_dev_state_delete(struct xfrm_state *x) struct net_device *dev = READ_ONCE(xso->dev);
if (dev) { - dev->xfrmdev_ops->xdo_dev_state_delete(x); + dev->xfrmdev_ops->xdo_dev_state_delete(dev, x); spin_lock_bh(&xfrm_state_dev_gc_lock); hlist_add_head(&x->dev_gclist, &xfrm_state_dev_gc_list); spin_unlock_bh(&xfrm_state_dev_gc_lock); @@ -789,7 +789,7 @@ void xfrm_dev_state_free(struct xfrm_state *x) spin_unlock_bh(&xfrm_state_dev_gc_lock);
if (dev->xfrmdev_ops->xdo_dev_state_free) - dev->xfrmdev_ops->xdo_dev_state_free(x); + dev->xfrmdev_ops->xdo_dev_state_free(dev, x); WRITE_ONCE(xso->dev, NULL); xso->type = XFRM_DEV_OFFLOAD_UNSPECIFIED; netdev_put(dev, &xso->dev_tracker); @@ -1551,16 +1551,18 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, if (pol->xdo.type == XFRM_DEV_OFFLOAD_PACKET) { struct xfrm_dev_offload *xdo = &pol->xdo; struct xfrm_dev_offload *xso = &x->xso; + struct net_device *dev = xdo->dev;
xso->type = XFRM_DEV_OFFLOAD_PACKET; xso->dir = xdo->dir; - xso->dev = xdo->dev; + xso->dev = dev; xso->flags = XFRM_DEV_OFFLOAD_FLAG_ACQ; - netdev_hold(xso->dev, &xso->dev_tracker, GFP_ATOMIC); - error = xso->dev->xfrmdev_ops->xdo_dev_state_add(x, NULL); + netdev_hold(dev, &xso->dev_tracker, GFP_ATOMIC); + error = dev->xfrmdev_ops->xdo_dev_state_add(dev, x, + NULL); if (error) { xso->dir = 0; - netdev_put(xso->dev, &xso->dev_tracker); + netdev_put(dev, &xso->dev_tracker); xso->dev = NULL; xso->type = XFRM_DEV_OFFLOAD_UNSPECIFIED; x->km.state = XFRM_STATE_DEAD;
On Mon, 7 Apr 2025 16:35:40 +0300 Cosmin Ratiu wrote:
@@ -424,10 +424,10 @@ static struct xfrm_state *ixgbe_ipsec_find_rx_state(struct ixgbe_ipsec *ipsec,
- This copies the protocol keys and salt to our own data tables. The
- 82599 family only supports the one algorithm.
**/ -static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs, +static int ixgbe_ipsec_parse_proto_keys(struct net_device *dev,
struct xfrm_state *xs, u32 *mykey, u32 *mysalt)
nit: you missed a few kdoc changes in this patch
Hi Cosmin,
kernel test robot noticed the following build warnings:
[auto build test WARNING on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Cosmin-Ratiu/net-mlx5-Avoid-u... base: net-next/main patch link: https://lore.kernel.org/r/20250407133542.2668491-5-cratiu%40nvidia.com patch subject: [PATCH net-next 4/6] xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free} config: hexagon-randconfig-002-20250408 (https://download.01.org/0day-ci/archive/20250409/202504091346.cvaZAxVI-lkp@i...) compiler: clang version 21.0.0git (https://github.com/llvm/llvm-project 92c93f5286b9ff33f27ff694d2dc33da1c07afdd) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250409/202504091346.cvaZAxVI-lkp@i...)
If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot lkp@intel.com | Closes: https://lore.kernel.org/oe-kbuild-all/202504091346.cvaZAxVI-lkp@intel.com/
All warnings (new ones prefixed by >>):
drivers/net/bonding/bond_main.c:462: warning: Function parameter or struct member 'bond_dev' not described in 'bond_ipsec_add_sa' drivers/net/bonding/bond_main.c:559: warning: Function parameter or struct member 'bond_dev' not described in 'bond_ipsec_del_sa'
vim +462 drivers/net/bonding/bond_main.c
1ddec5d0eec4b7 Hangbin Liu 2024-09-04 453 18cb261afd7bf5 Jarod Wilson 2020-06-19 454 /** 18cb261afd7bf5 Jarod Wilson 2020-06-19 455 * bond_ipsec_add_sa - program device with a security association 18cb261afd7bf5 Jarod Wilson 2020-06-19 456 * @xs: pointer to transformer state struct 7681a4f58fb9c3 Leon Romanovsky 2023-01-24 457 * @extack: extack point to fill failure reason 18cb261afd7bf5 Jarod Wilson 2020-06-19 458 **/ 1e4d9370eb7223 Cosmin Ratiu 2025-04-07 459 static int bond_ipsec_add_sa(struct net_device *bond_dev, 1e4d9370eb7223 Cosmin Ratiu 2025-04-07 460 struct xfrm_state *xs, 7681a4f58fb9c3 Leon Romanovsky 2023-01-24 461 struct netlink_ext_ack *extack) 18cb261afd7bf5 Jarod Wilson 2020-06-19 @462 { 907ed83a7583e8 Jianbo Liu 2024-08-23 463 struct net_device *real_dev; 2aeeef906d5a52 Jianbo Liu 2024-08-23 464 netdevice_tracker tracker; 9a5605505d9c7d Taehee Yoo 2021-07-05 465 struct bond_ipsec *ipsec; 5cd24cbe7dca62 Jarod Wilson 2020-07-08 466 struct bonding *bond; 5cd24cbe7dca62 Jarod Wilson 2020-07-08 467 struct slave *slave; b648eba4c69e58 Taehee Yoo 2021-07-05 468 int err; 5cd24cbe7dca62 Jarod Wilson 2020-07-08 469 5cd24cbe7dca62 Jarod Wilson 2020-07-08 470 if (!bond_dev) 5cd24cbe7dca62 Jarod Wilson 2020-07-08 471 return -EINVAL; 18cb261afd7bf5 Jarod Wilson 2020-06-19 472 b648eba4c69e58 Taehee Yoo 2021-07-05 473 rcu_read_lock(); 5cd24cbe7dca62 Jarod Wilson 2020-07-08 474 bond = netdev_priv(bond_dev); f548a476268d62 Jarod Wilson 2020-07-08 475 slave = rcu_dereference(bond->curr_active_slave); 2aeeef906d5a52 Jianbo Liu 2024-08-23 476 real_dev = slave ? slave->dev : NULL; 2aeeef906d5a52 Jianbo Liu 2024-08-23 477 netdev_hold(real_dev, &tracker, GFP_ATOMIC); 105cd17a866017 Taehee Yoo 2021-07-05 478 rcu_read_unlock(); 2aeeef906d5a52 Jianbo Liu 2024-08-23 479 if (!real_dev) { 2aeeef906d5a52 Jianbo Liu 2024-08-23 480 err = -ENODEV; 2aeeef906d5a52 Jianbo Liu 2024-08-23 481 goto out; 105cd17a866017 Taehee Yoo 2021-07-05 482 } 105cd17a866017 Taehee Yoo 2021-07-05 483 907ed83a7583e8 Jianbo Liu 2024-08-23 484 if (!real_dev->xfrmdev_ops || 907ed83a7583e8 Jianbo Liu 2024-08-23 485 !real_dev->xfrmdev_ops->xdo_dev_state_add || 907ed83a7583e8 Jianbo Liu 2024-08-23 486 netif_is_bond_master(real_dev)) { 3fe57986271aee Leon Romanovsky 2023-01-24 487 NL_SET_ERR_MSG_MOD(extack, "Slave does not support ipsec offload"); 2aeeef906d5a52 Jianbo Liu 2024-08-23 488 err = -EINVAL; 2aeeef906d5a52 Jianbo Liu 2024-08-23 489 goto out; 18cb261afd7bf5 Jarod Wilson 2020-06-19 490 } 18cb261afd7bf5 Jarod Wilson 2020-06-19 491 2aeeef906d5a52 Jianbo Liu 2024-08-23 492 ipsec = kmalloc(sizeof(*ipsec), GFP_KERNEL); 9a5605505d9c7d Taehee Yoo 2021-07-05 493 if (!ipsec) { 2aeeef906d5a52 Jianbo Liu 2024-08-23 494 err = -ENOMEM; 2aeeef906d5a52 Jianbo Liu 2024-08-23 495 goto out; 9a5605505d9c7d Taehee Yoo 2021-07-05 496 } 9a5605505d9c7d Taehee Yoo 2021-07-05 497 907ed83a7583e8 Jianbo Liu 2024-08-23 498 xs->xso.real_dev = real_dev; 1e4d9370eb7223 Cosmin Ratiu 2025-04-07 499 err = real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, xs, extack); 9a5605505d9c7d Taehee Yoo 2021-07-05 500 if (!err) { 9a5605505d9c7d Taehee Yoo 2021-07-05 501 ipsec->xs = xs; 9a5605505d9c7d Taehee Yoo 2021-07-05 502 INIT_LIST_HEAD(&ipsec->list); 2aeeef906d5a52 Jianbo Liu 2024-08-23 503 mutex_lock(&bond->ipsec_lock); 9a5605505d9c7d Taehee Yoo 2021-07-05 504 list_add(&ipsec->list, &bond->ipsec_list); 2aeeef906d5a52 Jianbo Liu 2024-08-23 505 mutex_unlock(&bond->ipsec_lock); 9a5605505d9c7d Taehee Yoo 2021-07-05 506 } else { 9a5605505d9c7d Taehee Yoo 2021-07-05 507 kfree(ipsec); 9a5605505d9c7d Taehee Yoo 2021-07-05 508 } 2aeeef906d5a52 Jianbo Liu 2024-08-23 509 out: 2aeeef906d5a52 Jianbo Liu 2024-08-23 510 netdev_put(real_dev, &tracker); b648eba4c69e58 Taehee Yoo 2021-07-05 511 return err; 18cb261afd7bf5 Jarod Wilson 2020-06-19 512 } 18cb261afd7bf5 Jarod Wilson 2020-06-19 513 9a5605505d9c7d Taehee Yoo 2021-07-05 514 static void bond_ipsec_add_sa_all(struct bonding *bond) 9a5605505d9c7d Taehee Yoo 2021-07-05 515 { 9a5605505d9c7d Taehee Yoo 2021-07-05 516 struct net_device *bond_dev = bond->dev; 907ed83a7583e8 Jianbo Liu 2024-08-23 517 struct net_device *real_dev; 9a5605505d9c7d Taehee Yoo 2021-07-05 518 struct bond_ipsec *ipsec; 9a5605505d9c7d Taehee Yoo 2021-07-05 519 struct slave *slave; 9a5605505d9c7d Taehee Yoo 2021-07-05 520 2aeeef906d5a52 Jianbo Liu 2024-08-23 521 slave = rtnl_dereference(bond->curr_active_slave); 2aeeef906d5a52 Jianbo Liu 2024-08-23 522 real_dev = slave ? slave->dev : NULL; 2aeeef906d5a52 Jianbo Liu 2024-08-23 523 if (!real_dev) 2aeeef906d5a52 Jianbo Liu 2024-08-23 524 return; 9a5605505d9c7d Taehee Yoo 2021-07-05 525 2aeeef906d5a52 Jianbo Liu 2024-08-23 526 mutex_lock(&bond->ipsec_lock); 907ed83a7583e8 Jianbo Liu 2024-08-23 527 if (!real_dev->xfrmdev_ops || 907ed83a7583e8 Jianbo Liu 2024-08-23 528 !real_dev->xfrmdev_ops->xdo_dev_state_add || 907ed83a7583e8 Jianbo Liu 2024-08-23 529 netif_is_bond_master(real_dev)) { 9a5605505d9c7d Taehee Yoo 2021-07-05 530 if (!list_empty(&bond->ipsec_list)) 907ed83a7583e8 Jianbo Liu 2024-08-23 531 slave_warn(bond_dev, real_dev, 9a5605505d9c7d Taehee Yoo 2021-07-05 532 "%s: no slave xdo_dev_state_add\n", 9a5605505d9c7d Taehee Yoo 2021-07-05 533 __func__); 9a5605505d9c7d Taehee Yoo 2021-07-05 534 goto out; 9a5605505d9c7d Taehee Yoo 2021-07-05 535 } 9a5605505d9c7d Taehee Yoo 2021-07-05 536 9a5605505d9c7d Taehee Yoo 2021-07-05 537 list_for_each_entry(ipsec, &bond->ipsec_list, list) { 2aeeef906d5a52 Jianbo Liu 2024-08-23 538 /* If new state is added before ipsec_lock acquired */ 2aeeef906d5a52 Jianbo Liu 2024-08-23 539 if (ipsec->xs->xso.real_dev == real_dev) 2aeeef906d5a52 Jianbo Liu 2024-08-23 540 continue; 2aeeef906d5a52 Jianbo Liu 2024-08-23 541 907ed83a7583e8 Jianbo Liu 2024-08-23 542 ipsec->xs->xso.real_dev = real_dev; 1e4d9370eb7223 Cosmin Ratiu 2025-04-07 543 if (real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, 1e4d9370eb7223 Cosmin Ratiu 2025-04-07 544 ipsec->xs, NULL)) { 907ed83a7583e8 Jianbo Liu 2024-08-23 545 slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__); 9a5605505d9c7d Taehee Yoo 2021-07-05 546 ipsec->xs->xso.real_dev = NULL; 9a5605505d9c7d Taehee Yoo 2021-07-05 547 } 9a5605505d9c7d Taehee Yoo 2021-07-05 548 } 9a5605505d9c7d Taehee Yoo 2021-07-05 549 out: 2aeeef906d5a52 Jianbo Liu 2024-08-23 550 mutex_unlock(&bond->ipsec_lock); 9a5605505d9c7d Taehee Yoo 2021-07-05 551 } 9a5605505d9c7d Taehee Yoo 2021-07-05 552 18cb261afd7bf5 Jarod Wilson 2020-06-19 553 /** 18cb261afd7bf5 Jarod Wilson 2020-06-19 554 * bond_ipsec_del_sa - clear out this specific SA 18cb261afd7bf5 Jarod Wilson 2020-06-19 555 * @xs: pointer to transformer state struct 18cb261afd7bf5 Jarod Wilson 2020-06-19 556 **/ 1e4d9370eb7223 Cosmin Ratiu 2025-04-07 557 static void bond_ipsec_del_sa(struct net_device *bond_dev, 1e4d9370eb7223 Cosmin Ratiu 2025-04-07 558 struct xfrm_state *xs) 18cb261afd7bf5 Jarod Wilson 2020-06-19 @559 { 907ed83a7583e8 Jianbo Liu 2024-08-23 560 struct net_device *real_dev; 2aeeef906d5a52 Jianbo Liu 2024-08-23 561 netdevice_tracker tracker; 9a5605505d9c7d Taehee Yoo 2021-07-05 562 struct bond_ipsec *ipsec; 5cd24cbe7dca62 Jarod Wilson 2020-07-08 563 struct bonding *bond; 5cd24cbe7dca62 Jarod Wilson 2020-07-08 564 struct slave *slave; 5cd24cbe7dca62 Jarod Wilson 2020-07-08 565 5cd24cbe7dca62 Jarod Wilson 2020-07-08 566 if (!bond_dev) 5cd24cbe7dca62 Jarod Wilson 2020-07-08 567 return; 5cd24cbe7dca62 Jarod Wilson 2020-07-08 568 a22c39b831a081 Taehee Yoo 2021-07-05 569 rcu_read_lock(); 5cd24cbe7dca62 Jarod Wilson 2020-07-08 570 bond = netdev_priv(bond_dev); f548a476268d62 Jarod Wilson 2020-07-08 571 slave = rcu_dereference(bond->curr_active_slave); 2aeeef906d5a52 Jianbo Liu 2024-08-23 572 real_dev = slave ? slave->dev : NULL; 2aeeef906d5a52 Jianbo Liu 2024-08-23 573 netdev_hold(real_dev, &tracker, GFP_ATOMIC); 2aeeef906d5a52 Jianbo Liu 2024-08-23 574 rcu_read_unlock(); 18cb261afd7bf5 Jarod Wilson 2020-06-19 575 18cb261afd7bf5 Jarod Wilson 2020-06-19 576 if (!slave) a22c39b831a081 Taehee Yoo 2021-07-05 577 goto out; 18cb261afd7bf5 Jarod Wilson 2020-06-19 578 9a5605505d9c7d Taehee Yoo 2021-07-05 579 if (!xs->xso.real_dev) 9a5605505d9c7d Taehee Yoo 2021-07-05 580 goto out; 9a5605505d9c7d Taehee Yoo 2021-07-05 581 907ed83a7583e8 Jianbo Liu 2024-08-23 582 WARN_ON(xs->xso.real_dev != real_dev); 18cb261afd7bf5 Jarod Wilson 2020-06-19 583 907ed83a7583e8 Jianbo Liu 2024-08-23 584 if (!real_dev->xfrmdev_ops || 907ed83a7583e8 Jianbo Liu 2024-08-23 585 !real_dev->xfrmdev_ops->xdo_dev_state_delete || 907ed83a7583e8 Jianbo Liu 2024-08-23 586 netif_is_bond_master(real_dev)) { 907ed83a7583e8 Jianbo Liu 2024-08-23 587 slave_warn(bond_dev, real_dev, "%s: no slave xdo_dev_state_delete\n", __func__); a22c39b831a081 Taehee Yoo 2021-07-05 588 goto out; 18cb261afd7bf5 Jarod Wilson 2020-06-19 589 } 18cb261afd7bf5 Jarod Wilson 2020-06-19 590 1e4d9370eb7223 Cosmin Ratiu 2025-04-07 591 real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, xs); a22c39b831a081 Taehee Yoo 2021-07-05 592 out: 2aeeef906d5a52 Jianbo Liu 2024-08-23 593 netdev_put(real_dev, &tracker); 2aeeef906d5a52 Jianbo Liu 2024-08-23 594 mutex_lock(&bond->ipsec_lock); 9a5605505d9c7d Taehee Yoo 2021-07-05 595 list_for_each_entry(ipsec, &bond->ipsec_list, list) { 9a5605505d9c7d Taehee Yoo 2021-07-05 596 if (ipsec->xs == xs) { 9a5605505d9c7d Taehee Yoo 2021-07-05 597 list_del(&ipsec->list); 9a5605505d9c7d Taehee Yoo 2021-07-05 598 kfree(ipsec); 9a5605505d9c7d Taehee Yoo 2021-07-05 599 break; 9a5605505d9c7d Taehee Yoo 2021-07-05 600 } 9a5605505d9c7d Taehee Yoo 2021-07-05 601 } 2aeeef906d5a52 Jianbo Liu 2024-08-23 602 mutex_unlock(&bond->ipsec_lock); 9a5605505d9c7d Taehee Yoo 2021-07-05 603 } 9a5605505d9c7d Taehee Yoo 2021-07-05 604
When the active link is changed for a bond device, the existing xfrm states need to be migrated over to the new link. This is done with: - bond_ipsec_del_sa_all() goes through the offloaded states list and removes all of them from hw. - bond_ipsec_add_sa_all() re-offloads all states to the new device.
But because the offload status of xfrm states isn't marked in any way, there can be bugs.
When all bond links are down, bond_ipsec_del_sa_all() unoffloads everything from the previous active link. If the same link then comes back up, nothing gets reoffloaded by bond_ipsec_add_sa_all(). This results in a stack trace like this a bit later when user space removes the offloaded rules, because mlx5e_xfrm_del_state() is asked to remove a rule that's no longer offloaded:
[] Call Trace: [] <TASK> [] ? __warn+0x7d/0x110 [] ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core] [] ? report_bug+0x16d/0x180 [] ? handle_bug+0x4f/0x90 [] ? exc_invalid_op+0x14/0x70 [] ? asm_exc_invalid_op+0x16/0x20 [] ? mlx5e_xfrm_del_state+0x73/0xa0 [mlx5_core] [] ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core] [] bond_ipsec_del_sa+0x1ab/0x200 [bonding] [] xfrm_dev_state_delete+0x1f/0x60 [] __xfrm_state_delete+0x196/0x200 [] xfrm_state_delete+0x21/0x40 [] xfrm_del_sa+0x69/0x110 [] xfrm_user_rcv_msg+0x11d/0x300 [] ? release_pages+0xca/0x140 [] ? copy_to_user_tmpl.part.0+0x110/0x110 [] netlink_rcv_skb+0x54/0x100 [] xfrm_netlink_rcv+0x31/0x40 [] netlink_unicast+0x1fc/0x2d0 [] netlink_sendmsg+0x1e4/0x410 [] __sock_sendmsg+0x38/0x60 [] sock_write_iter+0x94/0xf0 [] vfs_write+0x338/0x3f0 [] ksys_write+0xba/0xd0 [] do_syscall_64+0x4c/0x100 [] entry_SYSCALL_64_after_hwframe+0x4b/0x53
There's also another theoretical bug: Calling bond_ipsec_del_sa_all() multiple times can result in corruption in the driver implementation if the double-free isn't tolerated. This isn't nice.
Before the "Fixes" commit, xs->xso.real_dev was set to NULL when an xfrm state was unoffloaded from a device, but a race with netdevsim's .xdo_dev_offload_ok() accessing real_dev was considered a sufficient reason to not set real_dev to NULL anymore. This unfortunately introduced the new bugs.
Since .xdo_dev_offload_ok() was significantly refactored by [1] and there are no more users in the stack of xso.real_dev, that race is now gone and xs->xso.real_dev can now once again be used to represent which device (if any) currently holds the offloaded rule.
Go one step further and set real_dev after add/before delete calls, to catch any future driver misuses of real_dev.
[1] https://lore.kernel.org/netdev/cover.1739972570.git.leon@kernel.org/ Fixes: f8cde9805981 ("bonding: fix xfrm real_dev null pointer dereference") Signed-off-by: Cosmin Ratiu cratiu@nvidia.com Reviewed-by: Leon Romanovsky leonro@nvidia.com --- drivers/net/bonding/bond_main.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 2ffde8f672c2..443624504767 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -495,9 +495,9 @@ static int bond_ipsec_add_sa(struct net_device *bond_dev, goto out; }
- xs->xso.real_dev = real_dev; err = real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, xs, extack); if (!err) { + xs->xso.real_dev = real_dev; ipsec->xs = xs; INIT_LIST_HEAD(&ipsec->list); mutex_lock(&bond->ipsec_lock); @@ -539,12 +539,12 @@ static void bond_ipsec_add_sa_all(struct bonding *bond) if (ipsec->xs->xso.real_dev == real_dev) continue;
- ipsec->xs->xso.real_dev = real_dev; if (real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, ipsec->xs, NULL)) { slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__); - ipsec->xs->xso.real_dev = NULL; + continue; } + ipsec->xs->xso.real_dev = real_dev; } out: mutex_unlock(&bond->ipsec_lock); @@ -627,6 +627,7 @@ static void bond_ipsec_del_sa_all(struct bonding *bond) __func__); continue; } + ipsec->xs->xso.real_dev = NULL; real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, ipsec->xs); if (real_dev->xfrmdev_ops->xdo_dev_state_free) @@ -662,6 +663,7 @@ static void bond_ipsec_free_sa(struct net_device *bond_dev,
WARN_ON(xs->xso.real_dev != real_dev);
+ xs->xso.real_dev = NULL; if (real_dev && real_dev->xfrmdev_ops && real_dev->xfrmdev_ops->xdo_dev_state_free) real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, xs);
Refactor the bonding ipsec offload operations to fix a number of long-standing control plane races between state migration and user deletion and a few other issues.
xfrm state deletion can happen concurrently with bond_change_active_slave() operation. This manifests itself as a bond_ipsec_del_sa() call with x->lock held, followed by a bond_ipsec_free_sa() a bit later from a wq. The alternate path of these calls coming from xfrm_dev_state_flush() can't happen, as that needs the RTNL lock and bond_change_active_slave() already holds it.
1. bond_ipsec_del_sa_all() might call xdo_dev_state_delete() a second time on an xfrm state that was concurrently killed. This is bad. 2. bond_ipsec_add_sa_all() can add a state on the new device, but pending bond_ipsec_free_sa() calls from the old device will then hit the WARN_ON() and then, worse, call xdo_dev_state_free() on the new device without a corresponding xdo_dev_state_delete(). 3. Resolve a sleeping in atomic context introduced by the mentioned "Fixes" commit.
bond_ipsec_del_sa_all() and bond_ipsec_add_sa_all() now acquire x->lock and check for x->km.state to help with problems 1 and 2. And since xso.real_dev is now a private pointer managed by the bonding driver in xfrm state, make better use of it to fully fix problems 1 and 2. In bond_ipsec_del_sa_all(), set xso.real_dev to NULL while holding both the mutex and x->lock, which makes sure that neither bond_ipsec_del_sa() nor bond_ipsec_free_sa() could run concurrently.
Fix problem 3 by moving the list cleanup (which requires the mutex) from bond_ipsec_del_sa() (called from atomic context) to bond_ipsec_free_sa()
Finally, simplify bond_ipsec_free_sa() by not using current_active_slave at all, because now that xso.real_dev is protected by locks it can be trusted to always reflect the offload device.
Fixes: 2aeeef906d5a ("bonding: change ipsec_lock from spin lock to mutex") Signed-off-by: Cosmin Ratiu cratiu@nvidia.com Reviewed-by: Leon Romanovsky leonro@nvidia.com --- drivers/net/bonding/bond_main.c | 58 +++++++++++++++++++-------------- 1 file changed, 33 insertions(+), 25 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 443624504767..ede3287318f8 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -544,7 +544,20 @@ static void bond_ipsec_add_sa_all(struct bonding *bond) slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__); continue; } + + spin_lock_bh(&ipsec->xs->lock); + /* xs might have been killed by the user during the migration + * to the new dev, but bond_ipsec_del_sa() should have done + * nothing, as xso.real_dev is NULL. + * Delete it from the device we just added it to. The pending + * bond_ipsec_free_sa() call will do the rest of the cleanup. + */ + if (ipsec->xs->km.state == XFRM_STATE_DEAD && + real_dev->xfrmdev_ops->xdo_dev_state_delete) + real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, + ipsec->xs); ipsec->xs->xso.real_dev = real_dev; + spin_unlock_bh(&ipsec->xs->lock); } out: mutex_unlock(&bond->ipsec_lock); @@ -559,7 +572,6 @@ static void bond_ipsec_del_sa(struct net_device *bond_dev, { struct net_device *real_dev; netdevice_tracker tracker; - struct bond_ipsec *ipsec; struct bonding *bond; struct slave *slave;
@@ -591,15 +603,6 @@ static void bond_ipsec_del_sa(struct net_device *bond_dev, real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, xs); out: netdev_put(real_dev, &tracker); - mutex_lock(&bond->ipsec_lock); - list_for_each_entry(ipsec, &bond->ipsec_list, list) { - if (ipsec->xs == xs) { - list_del(&ipsec->list); - kfree(ipsec); - break; - } - } - mutex_unlock(&bond->ipsec_lock); }
static void bond_ipsec_del_sa_all(struct bonding *bond) @@ -627,9 +630,15 @@ static void bond_ipsec_del_sa_all(struct bonding *bond) __func__); continue; } + + spin_lock_bh(&ipsec->xs->lock); ipsec->xs->xso.real_dev = NULL; - real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, - ipsec->xs); + /* Don't double delete states killed by the user. */ + if (ipsec->xs->km.state != XFRM_STATE_DEAD) + real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, + ipsec->xs); + spin_unlock_bh(&ipsec->xs->lock); + if (real_dev->xfrmdev_ops->xdo_dev_state_free) real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, ipsec->xs); @@ -641,34 +650,33 @@ static void bond_ipsec_free_sa(struct net_device *bond_dev, struct xfrm_state *xs) { struct net_device *real_dev; - netdevice_tracker tracker; + struct bond_ipsec *ipsec; struct bonding *bond; - struct slave *slave;
if (!bond_dev) return;
- rcu_read_lock(); bond = netdev_priv(bond_dev); - slave = rcu_dereference(bond->curr_active_slave); - real_dev = slave ? slave->dev : NULL; - netdev_hold(real_dev, &tracker, GFP_ATOMIC); - rcu_read_unlock(); - - if (!slave) - goto out;
+ mutex_lock(&bond->ipsec_lock); if (!xs->xso.real_dev) goto out;
- WARN_ON(xs->xso.real_dev != real_dev); + real_dev = xs->xso.real_dev;
xs->xso.real_dev = NULL; - if (real_dev && real_dev->xfrmdev_ops && + if (real_dev->xfrmdev_ops && real_dev->xfrmdev_ops->xdo_dev_state_free) real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, xs); out: - netdev_put(real_dev, &tracker); + list_for_each_entry(ipsec, &bond->ipsec_list, list) { + if (ipsec->xs == xs) { + list_del(&ipsec->list); + kfree(ipsec); + break; + } + } + mutex_unlock(&bond->ipsec_lock); }
/**
Hi Cosmin, On Mon, Apr 07, 2025 at 04:35:42PM +0300, Cosmin Ratiu wrote:
Refactor the bonding ipsec offload operations to fix a number of long-standing control plane races between state migration and user deletion and a few other issues.
xfrm state deletion can happen concurrently with bond_change_active_slave() operation. This manifests itself as a bond_ipsec_del_sa() call with x->lock held, followed by a bond_ipsec_free_sa() a bit later from a wq. The alternate path of these calls coming from xfrm_dev_state_flush() can't happen, as that needs the RTNL lock and bond_change_active_slave() already holds it.
- bond_ipsec_del_sa_all() might call xdo_dev_state_delete() a second time on an xfrm state that was concurrently killed. This is bad.
- bond_ipsec_add_sa_all() can add a state on the new device, but pending bond_ipsec_free_sa() calls from the old device will then hit the WARN_ON() and then, worse, call xdo_dev_state_free() on the new device without a corresponding xdo_dev_state_delete().
- Resolve a sleeping in atomic context introduced by the mentioned "Fixes" commit.
bond_ipsec_del_sa_all() and bond_ipsec_add_sa_all() now acquire x->lock and check for x->km.state to help with problems 1 and 2. And since xso.real_dev is now a private pointer managed by the bonding driver in xfrm state, make better use of it to fully fix problems 1 and 2. In bond_ipsec_del_sa_all(), set xso.real_dev to NULL while holding both the mutex and x->lock, which makes sure that neither bond_ipsec_del_sa() nor bond_ipsec_free_sa() could run concurrently.
Fix problem 3 by moving the list cleanup (which requires the mutex) from bond_ipsec_del_sa() (called from atomic context) to bond_ipsec_free_sa()
Finally, simplify bond_ipsec_free_sa() by not using current_active_slave at all, because now that xso.real_dev is protected by locks it can be trusted to always reflect the offload device.
Fixes: 2aeeef906d5a ("bonding: change ipsec_lock from spin lock to mutex") Signed-off-by: Cosmin Ratiu cratiu@nvidia.com Reviewed-by: Leon Romanovsky leonro@nvidia.com
drivers/net/bonding/bond_main.c | 58 +++++++++++++++++++-------------- 1 file changed, 33 insertions(+), 25 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 443624504767..ede3287318f8 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -544,7 +544,20 @@ static void bond_ipsec_add_sa_all(struct bonding *bond) slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__); continue; }
spin_lock_bh(&ipsec->xs->lock);
/* xs might have been killed by the user during the migration
* to the new dev, but bond_ipsec_del_sa() should have done
* nothing, as xso.real_dev is NULL.
* Delete it from the device we just added it to. The pending
* bond_ipsec_free_sa() call will do the rest of the cleanup.
*/
if (ipsec->xs->km.state == XFRM_STATE_DEAD &&
real_dev->xfrmdev_ops->xdo_dev_state_delete)
real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
ipsec->xs->xso.real_dev = real_dev;ipsec->xs);
}spin_unlock_bh(&ipsec->xs->lock);
out: mutex_unlock(&bond->ipsec_lock); @@ -559,7 +572,6 @@ static void bond_ipsec_del_sa(struct net_device *bond_dev, { struct net_device *real_dev; netdevice_tracker tracker;
- struct bond_ipsec *ipsec; struct bonding *bond; struct slave *slave;
@@ -591,15 +603,6 @@ static void bond_ipsec_del_sa(struct net_device *bond_dev, real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, xs);
Thanks a lot for the fixes. With your patch applied. I see the bond_ipsec_del_sa() still has WARN_ON(xs->xso.real_dev != real_dev);
Do you think if we still has this possibility? If yes, should we do xdo_dev_state_delete() on xs->xso.real_dev or real_dev?
Thanks Hangbin
On Tue, 2025-04-08 at 08:08 +0000, Hangbin Liu wrote:
Hi Cosmin,
Thanks a lot for the fixes. With your patch applied. I see the bond_ipsec_del_sa() still has WARN_ON(xs->xso.real_dev != real_dev);
Do you think if we still has this possibility? If yes, should we do xdo_dev_state_delete() on xs->xso.real_dev or real_dev?
You are right, the WARN_ON doesn't make sense any more. bond_ipsec_del_sa() can get the same treatment as bond_ipsec_free_sa(), I'll do so in the next version.
Cosmin.
linux-kselftest-mirror@lists.linaro.org