The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e219440da0c3a63b3cec23d08473436ae7d95fa6 Mon Sep 17 00:00:00 2001
From: Maor Dickman <maord(a)nvidia.com>
Date: Tue, 23 Nov 2021 14:37:11 +0200
Subject: [PATCH] net/mlx5: E-Switch, Use indirect table only if all
destinations support it
When adding rule with multiple destinations, indirect table is used for all of
the destinations if at least one of the destinations support it, this can cause
creation of invalid indirect tables for the destinations that doesn't support it.
Fixed it by using indirect table only if all destinations support it.
Fixes: a508728a4c8b ("net/mlx5e: VF tunnel RX traffic offloading")
Signed-off-by: Maor Dickman <maord(a)nvidia.com>
Reviewed-by: Roi Dayan <roid(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 275af1d2b4d3..32bc08a39925 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -329,14 +329,25 @@ static bool
esw_is_indir_table(struct mlx5_eswitch *esw, struct mlx5_flow_attr *attr)
{
struct mlx5_esw_flow_attr *esw_attr = attr->esw_attr;
+ bool result = false;
int i;
- for (i = esw_attr->split_count; i < esw_attr->out_count; i++)
+ /* Indirect table is supported only for flows with in_port uplink
+ * and the destination is vport on the same eswitch as the uplink,
+ * return false in case at least one of destinations doesn't meet
+ * this criteria.
+ */
+ for (i = esw_attr->split_count; i < esw_attr->out_count; i++) {
if (esw_attr->dests[i].rep &&
mlx5_esw_indir_table_needed(esw, attr, esw_attr->dests[i].rep->vport,
- esw_attr->dests[i].mdev))
- return true;
- return false;
+ esw_attr->dests[i].mdev)) {
+ result = true;
+ } else {
+ result = false;
+ break;
+ }
+ }
+ return result;
}
static int
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4cce2ccf08fbc27ae34ce0e72db15166e7b5f6a7 Mon Sep 17 00:00:00 2001
From: Tariq Toukan <tariqt(a)nvidia.com>
Date: Mon, 13 Sep 2021 13:54:30 +0300
Subject: [PATCH] net/mlx5e: Sync TIR params updates against concurrent
create/modify
Transport Interface Receive (TIR) objects perform the packet processing and
reassembly and is also responsible for demultiplexing the packets into the
different RQs.
There are certain TIR context attributes that propagate to the pointed RQs
and applied to them (like packet_merge offloads (LRO/SHAMPO) and
tunneled_offload_en). When TIRs do not agree on attributes values, a "last
one wins" policy is applied. Hence, if not synced properly, a race between
TIR params update and a concurrent TIR create/modify operation might yield
to a mismatch between the shadow parameters in SW and the actual applied
state of the RQs in HW.
tunneled_offload_en is a fixed attribute per profile, while packet merge
offload state might be toggled and get out-of-sync. When this happens,
packet_merge offload might be working although not requested, or the
opposite.
All updates to packet_merge state and all create/modify operations of
regular redirection/steering TIRs are done under the same priv->state_lock,
so they do not run in parallel, and no race is possible.
However, there are other kind of TIRs (acceleration offloads TIRs, like TLS
TIRs) which are created on demand for each new connection without holding
the coarse priv->state_lock, hence might race.
Fix this by synchronizing all packet_merge state reads and writes against
all TIR create/modify operations. Include the modify operations of the
regular redirection steering TIRs under the new lock, for better code
layering and division of responsibilities.
Fixes: 1182f3659357 ("net/mlx5e: kTLS, Add kTLS RX HW offload support")
Signed-off-by: Tariq Toukan <tariqt(a)nvidia.com>
Reviewed-by: Moshe Shemesh <moshe(a)nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c
index 142953847996..0015a81eb9a1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c
@@ -13,6 +13,9 @@ struct mlx5e_rx_res {
unsigned int max_nch;
u32 drop_rqn;
+ struct mlx5e_packet_merge_param pkt_merge_param;
+ struct rw_semaphore pkt_merge_param_sem;
+
struct mlx5e_rss *rss[MLX5E_MAX_NUM_RSS];
bool rss_active;
u32 rss_rqns[MLX5E_INDIR_RQT_SIZE];
@@ -392,6 +395,7 @@ static int mlx5e_rx_res_ptp_init(struct mlx5e_rx_res *res)
if (err)
goto out;
+ /* Separated from the channels RQs, does not share pkt_merge state with them */
mlx5e_tir_builder_build_rqt(builder, res->mdev->mlx5e_res.hw_objs.td.tdn,
mlx5e_rqt_get_rqtn(&res->ptp.rqt),
inner_ft_support);
@@ -447,6 +451,9 @@ int mlx5e_rx_res_init(struct mlx5e_rx_res *res, struct mlx5_core_dev *mdev,
res->max_nch = max_nch;
res->drop_rqn = drop_rqn;
+ res->pkt_merge_param = *init_pkt_merge_param;
+ init_rwsem(&res->pkt_merge_param_sem);
+
err = mlx5e_rx_res_rss_init_def(res, init_pkt_merge_param, init_nch);
if (err)
goto err_out;
@@ -513,7 +520,7 @@ u32 mlx5e_rx_res_get_tirn_ptp(struct mlx5e_rx_res *res)
return mlx5e_tir_get_tirn(&res->ptp.tir);
}
-u32 mlx5e_rx_res_get_rqtn_direct(struct mlx5e_rx_res *res, unsigned int ix)
+static u32 mlx5e_rx_res_get_rqtn_direct(struct mlx5e_rx_res *res, unsigned int ix)
{
return mlx5e_rqt_get_rqtn(&res->channels[ix].direct_rqt);
}
@@ -656,6 +663,9 @@ int mlx5e_rx_res_packet_merge_set_param(struct mlx5e_rx_res *res,
if (!builder)
return -ENOMEM;
+ down_write(&res->pkt_merge_param_sem);
+ res->pkt_merge_param = *pkt_merge_param;
+
mlx5e_tir_builder_build_packet_merge(builder, pkt_merge_param);
final_err = 0;
@@ -681,6 +691,7 @@ int mlx5e_rx_res_packet_merge_set_param(struct mlx5e_rx_res *res,
}
}
+ up_write(&res->pkt_merge_param_sem);
mlx5e_tir_builder_free(builder);
return final_err;
}
@@ -689,3 +700,31 @@ struct mlx5e_rss_params_hash mlx5e_rx_res_get_current_hash(struct mlx5e_rx_res *
{
return mlx5e_rss_get_hash(res->rss[0]);
}
+
+int mlx5e_rx_res_tls_tir_create(struct mlx5e_rx_res *res, unsigned int rxq,
+ struct mlx5e_tir *tir)
+{
+ bool inner_ft_support = res->features & MLX5E_RX_RES_FEATURE_INNER_FT;
+ struct mlx5e_tir_builder *builder;
+ u32 rqtn;
+ int err;
+
+ builder = mlx5e_tir_builder_alloc(false);
+ if (!builder)
+ return -ENOMEM;
+
+ rqtn = mlx5e_rx_res_get_rqtn_direct(res, rxq);
+
+ mlx5e_tir_builder_build_rqt(builder, res->mdev->mlx5e_res.hw_objs.td.tdn, rqtn,
+ inner_ft_support);
+ mlx5e_tir_builder_build_direct(builder);
+ mlx5e_tir_builder_build_tls(builder);
+ down_read(&res->pkt_merge_param_sem);
+ mlx5e_tir_builder_build_packet_merge(builder, &res->pkt_merge_param);
+ err = mlx5e_tir_init(tir, builder, res->mdev, false);
+ up_read(&res->pkt_merge_param_sem);
+
+ mlx5e_tir_builder_free(builder);
+
+ return err;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h
index d09f7d174a51..b39b20a720e0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h
@@ -37,9 +37,6 @@ u32 mlx5e_rx_res_get_tirn_rss(struct mlx5e_rx_res *res, enum mlx5_traffic_types
u32 mlx5e_rx_res_get_tirn_rss_inner(struct mlx5e_rx_res *res, enum mlx5_traffic_types tt);
u32 mlx5e_rx_res_get_tirn_ptp(struct mlx5e_rx_res *res);
-/* RQTN getters for modules that create their own TIRs */
-u32 mlx5e_rx_res_get_rqtn_direct(struct mlx5e_rx_res *res, unsigned int ix);
-
/* Activate/deactivate API */
void mlx5e_rx_res_channels_activate(struct mlx5e_rx_res *res, struct mlx5e_channels *chs);
void mlx5e_rx_res_channels_deactivate(struct mlx5e_rx_res *res);
@@ -69,4 +66,7 @@ struct mlx5e_rss *mlx5e_rx_res_rss_get(struct mlx5e_rx_res *res, u32 rss_idx);
/* Workaround for hairpin */
struct mlx5e_rss_params_hash mlx5e_rx_res_get_current_hash(struct mlx5e_rx_res *res);
+/* Accel TIRs */
+int mlx5e_rx_res_tls_tir_create(struct mlx5e_rx_res *res, unsigned int rxq,
+ struct mlx5e_tir *tir);
#endif /* __MLX5_EN_RX_RES_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
index a2a9f68579dd..15711814d2d2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
@@ -100,25 +100,6 @@ mlx5e_ktls_rx_resync_create_resp_list(void)
return resp_list;
}
-static int mlx5e_ktls_create_tir(struct mlx5_core_dev *mdev, struct mlx5e_tir *tir, u32 rqtn)
-{
- struct mlx5e_tir_builder *builder;
- int err;
-
- builder = mlx5e_tir_builder_alloc(false);
- if (!builder)
- return -ENOMEM;
-
- mlx5e_tir_builder_build_rqt(builder, mdev->mlx5e_res.hw_objs.td.tdn, rqtn, false);
- mlx5e_tir_builder_build_direct(builder);
- mlx5e_tir_builder_build_tls(builder);
- err = mlx5e_tir_init(tir, builder, mdev, false);
-
- mlx5e_tir_builder_free(builder);
-
- return err;
-}
-
static void accel_rule_handle_work(struct work_struct *work)
{
struct mlx5e_ktls_offload_context_rx *priv_rx;
@@ -609,7 +590,6 @@ int mlx5e_ktls_add_rx(struct net_device *netdev, struct sock *sk,
struct mlx5_core_dev *mdev;
struct mlx5e_priv *priv;
int rxq, err;
- u32 rqtn;
tls_ctx = tls_get_ctx(sk);
priv = netdev_priv(netdev);
@@ -635,9 +615,7 @@ int mlx5e_ktls_add_rx(struct net_device *netdev, struct sock *sk,
priv_rx->sw_stats = &priv->tls->sw_stats;
mlx5e_set_ktls_rx_priv_ctx(tls_ctx, priv_rx);
- rqtn = mlx5e_rx_res_get_rqtn_direct(priv->rx_res, rxq);
-
- err = mlx5e_ktls_create_tir(mdev, &priv_rx->tir, rqtn);
+ err = mlx5e_rx_res_tls_tir_create(priv->rx_res, rxq, &priv_rx->tir);
if (err)
goto err_create_tir;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4cce2ccf08fbc27ae34ce0e72db15166e7b5f6a7 Mon Sep 17 00:00:00 2001
From: Tariq Toukan <tariqt(a)nvidia.com>
Date: Mon, 13 Sep 2021 13:54:30 +0300
Subject: [PATCH] net/mlx5e: Sync TIR params updates against concurrent
create/modify
Transport Interface Receive (TIR) objects perform the packet processing and
reassembly and is also responsible for demultiplexing the packets into the
different RQs.
There are certain TIR context attributes that propagate to the pointed RQs
and applied to them (like packet_merge offloads (LRO/SHAMPO) and
tunneled_offload_en). When TIRs do not agree on attributes values, a "last
one wins" policy is applied. Hence, if not synced properly, a race between
TIR params update and a concurrent TIR create/modify operation might yield
to a mismatch between the shadow parameters in SW and the actual applied
state of the RQs in HW.
tunneled_offload_en is a fixed attribute per profile, while packet merge
offload state might be toggled and get out-of-sync. When this happens,
packet_merge offload might be working although not requested, or the
opposite.
All updates to packet_merge state and all create/modify operations of
regular redirection/steering TIRs are done under the same priv->state_lock,
so they do not run in parallel, and no race is possible.
However, there are other kind of TIRs (acceleration offloads TIRs, like TLS
TIRs) which are created on demand for each new connection without holding
the coarse priv->state_lock, hence might race.
Fix this by synchronizing all packet_merge state reads and writes against
all TIR create/modify operations. Include the modify operations of the
regular redirection steering TIRs under the new lock, for better code
layering and division of responsibilities.
Fixes: 1182f3659357 ("net/mlx5e: kTLS, Add kTLS RX HW offload support")
Signed-off-by: Tariq Toukan <tariqt(a)nvidia.com>
Reviewed-by: Moshe Shemesh <moshe(a)nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c
index 142953847996..0015a81eb9a1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.c
@@ -13,6 +13,9 @@ struct mlx5e_rx_res {
unsigned int max_nch;
u32 drop_rqn;
+ struct mlx5e_packet_merge_param pkt_merge_param;
+ struct rw_semaphore pkt_merge_param_sem;
+
struct mlx5e_rss *rss[MLX5E_MAX_NUM_RSS];
bool rss_active;
u32 rss_rqns[MLX5E_INDIR_RQT_SIZE];
@@ -392,6 +395,7 @@ static int mlx5e_rx_res_ptp_init(struct mlx5e_rx_res *res)
if (err)
goto out;
+ /* Separated from the channels RQs, does not share pkt_merge state with them */
mlx5e_tir_builder_build_rqt(builder, res->mdev->mlx5e_res.hw_objs.td.tdn,
mlx5e_rqt_get_rqtn(&res->ptp.rqt),
inner_ft_support);
@@ -447,6 +451,9 @@ int mlx5e_rx_res_init(struct mlx5e_rx_res *res, struct mlx5_core_dev *mdev,
res->max_nch = max_nch;
res->drop_rqn = drop_rqn;
+ res->pkt_merge_param = *init_pkt_merge_param;
+ init_rwsem(&res->pkt_merge_param_sem);
+
err = mlx5e_rx_res_rss_init_def(res, init_pkt_merge_param, init_nch);
if (err)
goto err_out;
@@ -513,7 +520,7 @@ u32 mlx5e_rx_res_get_tirn_ptp(struct mlx5e_rx_res *res)
return mlx5e_tir_get_tirn(&res->ptp.tir);
}
-u32 mlx5e_rx_res_get_rqtn_direct(struct mlx5e_rx_res *res, unsigned int ix)
+static u32 mlx5e_rx_res_get_rqtn_direct(struct mlx5e_rx_res *res, unsigned int ix)
{
return mlx5e_rqt_get_rqtn(&res->channels[ix].direct_rqt);
}
@@ -656,6 +663,9 @@ int mlx5e_rx_res_packet_merge_set_param(struct mlx5e_rx_res *res,
if (!builder)
return -ENOMEM;
+ down_write(&res->pkt_merge_param_sem);
+ res->pkt_merge_param = *pkt_merge_param;
+
mlx5e_tir_builder_build_packet_merge(builder, pkt_merge_param);
final_err = 0;
@@ -681,6 +691,7 @@ int mlx5e_rx_res_packet_merge_set_param(struct mlx5e_rx_res *res,
}
}
+ up_write(&res->pkt_merge_param_sem);
mlx5e_tir_builder_free(builder);
return final_err;
}
@@ -689,3 +700,31 @@ struct mlx5e_rss_params_hash mlx5e_rx_res_get_current_hash(struct mlx5e_rx_res *
{
return mlx5e_rss_get_hash(res->rss[0]);
}
+
+int mlx5e_rx_res_tls_tir_create(struct mlx5e_rx_res *res, unsigned int rxq,
+ struct mlx5e_tir *tir)
+{
+ bool inner_ft_support = res->features & MLX5E_RX_RES_FEATURE_INNER_FT;
+ struct mlx5e_tir_builder *builder;
+ u32 rqtn;
+ int err;
+
+ builder = mlx5e_tir_builder_alloc(false);
+ if (!builder)
+ return -ENOMEM;
+
+ rqtn = mlx5e_rx_res_get_rqtn_direct(res, rxq);
+
+ mlx5e_tir_builder_build_rqt(builder, res->mdev->mlx5e_res.hw_objs.td.tdn, rqtn,
+ inner_ft_support);
+ mlx5e_tir_builder_build_direct(builder);
+ mlx5e_tir_builder_build_tls(builder);
+ down_read(&res->pkt_merge_param_sem);
+ mlx5e_tir_builder_build_packet_merge(builder, &res->pkt_merge_param);
+ err = mlx5e_tir_init(tir, builder, res->mdev, false);
+ up_read(&res->pkt_merge_param_sem);
+
+ mlx5e_tir_builder_free(builder);
+
+ return err;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h
index d09f7d174a51..b39b20a720e0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rx_res.h
@@ -37,9 +37,6 @@ u32 mlx5e_rx_res_get_tirn_rss(struct mlx5e_rx_res *res, enum mlx5_traffic_types
u32 mlx5e_rx_res_get_tirn_rss_inner(struct mlx5e_rx_res *res, enum mlx5_traffic_types tt);
u32 mlx5e_rx_res_get_tirn_ptp(struct mlx5e_rx_res *res);
-/* RQTN getters for modules that create their own TIRs */
-u32 mlx5e_rx_res_get_rqtn_direct(struct mlx5e_rx_res *res, unsigned int ix);
-
/* Activate/deactivate API */
void mlx5e_rx_res_channels_activate(struct mlx5e_rx_res *res, struct mlx5e_channels *chs);
void mlx5e_rx_res_channels_deactivate(struct mlx5e_rx_res *res);
@@ -69,4 +66,7 @@ struct mlx5e_rss *mlx5e_rx_res_rss_get(struct mlx5e_rx_res *res, u32 rss_idx);
/* Workaround for hairpin */
struct mlx5e_rss_params_hash mlx5e_rx_res_get_current_hash(struct mlx5e_rx_res *res);
+/* Accel TIRs */
+int mlx5e_rx_res_tls_tir_create(struct mlx5e_rx_res *res, unsigned int rxq,
+ struct mlx5e_tir *tir);
#endif /* __MLX5_EN_RX_RES_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
index a2a9f68579dd..15711814d2d2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
@@ -100,25 +100,6 @@ mlx5e_ktls_rx_resync_create_resp_list(void)
return resp_list;
}
-static int mlx5e_ktls_create_tir(struct mlx5_core_dev *mdev, struct mlx5e_tir *tir, u32 rqtn)
-{
- struct mlx5e_tir_builder *builder;
- int err;
-
- builder = mlx5e_tir_builder_alloc(false);
- if (!builder)
- return -ENOMEM;
-
- mlx5e_tir_builder_build_rqt(builder, mdev->mlx5e_res.hw_objs.td.tdn, rqtn, false);
- mlx5e_tir_builder_build_direct(builder);
- mlx5e_tir_builder_build_tls(builder);
- err = mlx5e_tir_init(tir, builder, mdev, false);
-
- mlx5e_tir_builder_free(builder);
-
- return err;
-}
-
static void accel_rule_handle_work(struct work_struct *work)
{
struct mlx5e_ktls_offload_context_rx *priv_rx;
@@ -609,7 +590,6 @@ int mlx5e_ktls_add_rx(struct net_device *netdev, struct sock *sk,
struct mlx5_core_dev *mdev;
struct mlx5e_priv *priv;
int rxq, err;
- u32 rqtn;
tls_ctx = tls_get_ctx(sk);
priv = netdev_priv(netdev);
@@ -635,9 +615,7 @@ int mlx5e_ktls_add_rx(struct net_device *netdev, struct sock *sk,
priv_rx->sw_stats = &priv->tls->sw_stats;
mlx5e_set_ktls_rx_priv_ctx(tls_ctx, priv_rx);
- rqtn = mlx5e_rx_res_get_rqtn_direct(priv->rx_res, rxq);
-
- err = mlx5e_ktls_create_tir(mdev, &priv_rx->tir, rqtn);
+ err = mlx5e_rx_res_tls_tir_create(priv->rx_res, rxq, &priv_rx->tir);
if (err)
goto err_create_tir;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c65d638ab39034cbaa36773b980d28106cfc81fa Mon Sep 17 00:00:00 2001
From: Raed Salem <raeds(a)nvidia.com>
Date: Wed, 17 Nov 2021 13:33:57 +0200
Subject: [PATCH] net/mlx5e: IPsec: Fix Software parser inner l3 type setting
in case of encapsulation
Current code wrongly uses the skb->protocol field which reflects the
outer l3 protocol to set the inner l3 type in Software Parser (SWP)
fields settings in the ethernet segment (eseg) in flows where inner
l3 exists like in Vxlan over ESP flow, the above method wrongly use
the outer protocol type instead of the inner one. thus breaking cases
where inner and outer headers have different protocols.
Fix by setting the inner l3 type in SWP according to the inner l3 ip
header version.
Fixes: 2ac9cfe78223 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path")
Signed-off-by: Raed Salem <raeds(a)nvidia.com>
Reviewed-by: Maor Dickman <maord(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
index fb5397324aa4..2db9573a3fe6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
@@ -191,7 +191,7 @@ static void mlx5e_ipsec_set_swp(struct sk_buff *skb,
eseg->swp_inner_l3_offset = skb_inner_network_offset(skb) / 2;
eseg->swp_inner_l4_offset =
(skb->csum_start + skb->head - skb->data) / 2;
- if (skb->protocol == htons(ETH_P_IPV6))
+ if (inner_ip_hdr(skb)->version == 6)
eseg->swp_flags |= MLX5_ETH_WQE_SWP_INNER_L3_IPV6;
break;
default:
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c65d638ab39034cbaa36773b980d28106cfc81fa Mon Sep 17 00:00:00 2001
From: Raed Salem <raeds(a)nvidia.com>
Date: Wed, 17 Nov 2021 13:33:57 +0200
Subject: [PATCH] net/mlx5e: IPsec: Fix Software parser inner l3 type setting
in case of encapsulation
Current code wrongly uses the skb->protocol field which reflects the
outer l3 protocol to set the inner l3 type in Software Parser (SWP)
fields settings in the ethernet segment (eseg) in flows where inner
l3 exists like in Vxlan over ESP flow, the above method wrongly use
the outer protocol type instead of the inner one. thus breaking cases
where inner and outer headers have different protocols.
Fix by setting the inner l3 type in SWP according to the inner l3 ip
header version.
Fixes: 2ac9cfe78223 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path")
Signed-off-by: Raed Salem <raeds(a)nvidia.com>
Reviewed-by: Maor Dickman <maord(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
index fb5397324aa4..2db9573a3fe6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
@@ -191,7 +191,7 @@ static void mlx5e_ipsec_set_swp(struct sk_buff *skb,
eseg->swp_inner_l3_offset = skb_inner_network_offset(skb) / 2;
eseg->swp_inner_l4_offset =
(skb->csum_start + skb->head - skb->data) / 2;
- if (skb->protocol == htons(ETH_P_IPV6))
+ if (inner_ip_hdr(skb)->version == 6)
eseg->swp_flags |= MLX5_ETH_WQE_SWP_INNER_L3_IPV6;
break;
default:
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c65d638ab39034cbaa36773b980d28106cfc81fa Mon Sep 17 00:00:00 2001
From: Raed Salem <raeds(a)nvidia.com>
Date: Wed, 17 Nov 2021 13:33:57 +0200
Subject: [PATCH] net/mlx5e: IPsec: Fix Software parser inner l3 type setting
in case of encapsulation
Current code wrongly uses the skb->protocol field which reflects the
outer l3 protocol to set the inner l3 type in Software Parser (SWP)
fields settings in the ethernet segment (eseg) in flows where inner
l3 exists like in Vxlan over ESP flow, the above method wrongly use
the outer protocol type instead of the inner one. thus breaking cases
where inner and outer headers have different protocols.
Fix by setting the inner l3 type in SWP according to the inner l3 ip
header version.
Fixes: 2ac9cfe78223 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path")
Signed-off-by: Raed Salem <raeds(a)nvidia.com>
Reviewed-by: Maor Dickman <maord(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
index fb5397324aa4..2db9573a3fe6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
@@ -191,7 +191,7 @@ static void mlx5e_ipsec_set_swp(struct sk_buff *skb,
eseg->swp_inner_l3_offset = skb_inner_network_offset(skb) / 2;
eseg->swp_inner_l4_offset =
(skb->csum_start + skb->head - skb->data) / 2;
- if (skb->protocol == htons(ETH_P_IPV6))
+ if (inner_ip_hdr(skb)->version == 6)
eseg->swp_flags |= MLX5_ETH_WQE_SWP_INNER_L3_IPV6;
break;
default:
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c65d638ab39034cbaa36773b980d28106cfc81fa Mon Sep 17 00:00:00 2001
From: Raed Salem <raeds(a)nvidia.com>
Date: Wed, 17 Nov 2021 13:33:57 +0200
Subject: [PATCH] net/mlx5e: IPsec: Fix Software parser inner l3 type setting
in case of encapsulation
Current code wrongly uses the skb->protocol field which reflects the
outer l3 protocol to set the inner l3 type in Software Parser (SWP)
fields settings in the ethernet segment (eseg) in flows where inner
l3 exists like in Vxlan over ESP flow, the above method wrongly use
the outer protocol type instead of the inner one. thus breaking cases
where inner and outer headers have different protocols.
Fix by setting the inner l3 type in SWP according to the inner l3 ip
header version.
Fixes: 2ac9cfe78223 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path")
Signed-off-by: Raed Salem <raeds(a)nvidia.com>
Reviewed-by: Maor Dickman <maord(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
index fb5397324aa4..2db9573a3fe6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
@@ -191,7 +191,7 @@ static void mlx5e_ipsec_set_swp(struct sk_buff *skb,
eseg->swp_inner_l3_offset = skb_inner_network_offset(skb) / 2;
eseg->swp_inner_l4_offset =
(skb->csum_start + skb->head - skb->data) / 2;
- if (skb->protocol == htons(ETH_P_IPV6))
+ if (inner_ip_hdr(skb)->version == 6)
eseg->swp_flags |= MLX5_ETH_WQE_SWP_INNER_L3_IPV6;
break;
default:
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From dacb5d8875cc6cd3a553363b4d6f06760fcbe70c Mon Sep 17 00:00:00 2001
From: Paolo Abeni <pabeni(a)redhat.com>
Date: Fri, 26 Nov 2021 19:34:21 +0100
Subject: [PATCH] tcp: fix page frag corruption on page fault
Steffen reported a TCP stream corruption for HTTP requests
served by the apache web-server using a cifs mount-point
and memory mapping the relevant file.
The root cause is quite similar to the one addressed by
commit 20eb4f29b602 ("net: fix sk_page_frag() recursion from
memory reclaim"). Here the nested access to the task page frag
is caused by a page fault on the (mmapped) user-space memory
buffer coming from the cifs file.
The page fault handler performs an smb transaction on a different
socket, inside the same process context. Since sk->sk_allaction
for such socket does not prevent the usage for the task_frag,
the nested allocation modify "under the hood" the page frag
in use by the outer sendmsg call, corrupting the stream.
The overall relevant stack trace looks like the following:
httpd 78268 [001] 3461630.850950: probe:tcp_sendmsg_locked:
ffffffff91461d91 tcp_sendmsg_locked+0x1
ffffffff91462b57 tcp_sendmsg+0x27
ffffffff9139814e sock_sendmsg+0x3e
ffffffffc06dfe1d smb_send_kvec+0x28
[...]
ffffffffc06cfaf8 cifs_readpages+0x213
ffffffff90e83c4b read_pages+0x6b
ffffffff90e83f31 __do_page_cache_readahead+0x1c1
ffffffff90e79e98 filemap_fault+0x788
ffffffff90eb0458 __do_fault+0x38
ffffffff90eb5280 do_fault+0x1a0
ffffffff90eb7c84 __handle_mm_fault+0x4d4
ffffffff90eb8093 handle_mm_fault+0xc3
ffffffff90c74f6d __do_page_fault+0x1ed
ffffffff90c75277 do_page_fault+0x37
ffffffff9160111e page_fault+0x1e
ffffffff9109e7b5 copyin+0x25
ffffffff9109eb40 _copy_from_iter_full+0xe0
ffffffff91462370 tcp_sendmsg_locked+0x5e0
ffffffff91462370 tcp_sendmsg_locked+0x5e0
ffffffff91462b57 tcp_sendmsg+0x27
ffffffff9139815c sock_sendmsg+0x4c
ffffffff913981f7 sock_write_iter+0x97
ffffffff90f2cc56 do_iter_readv_writev+0x156
ffffffff90f2dff0 do_iter_write+0x80
ffffffff90f2e1c3 vfs_writev+0xa3
ffffffff90f2e27c do_writev+0x5c
ffffffff90c042bb do_syscall_64+0x5b
ffffffff916000ad entry_SYSCALL_64_after_hwframe+0x65
The cifs filesystem rightfully sets sk_allocations to GFP_NOFS,
we can avoid the nesting using the sk page frag for allocation
lacking the __GFP_FS flag. Do not define an additional mm-helper
for that, as this is strictly tied to the sk page frag usage.
v1 -> v2:
- use a stricted sk_page_frag() check instead of reordering the
code (Eric)
Reported-by: Steffen Froemer <sfroemer(a)redhat.com>
Fixes: 5640f7685831 ("net: use a per task frag allocator")
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/include/net/sock.h b/include/net/sock.h
index b32906e1ab55..715cdb4b2b79 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2430,19 +2430,22 @@ static inline void sk_stream_moderate_sndbuf(struct sock *sk)
* @sk: socket
*
* Use the per task page_frag instead of the per socket one for
- * optimization when we know that we're in the normal context and owns
+ * optimization when we know that we're in process context and own
* everything that's associated with %current.
*
- * gfpflags_allow_blocking() isn't enough here as direct reclaim may nest
- * inside other socket operations and end up recursing into sk_page_frag()
- * while it's already in use.
+ * Both direct reclaim and page faults can nest inside other
+ * socket operations and end up recursing into sk_page_frag()
+ * while it's already in use: explicitly avoid task page_frag
+ * usage if the caller is potentially doing any of them.
+ * This assumes that page fault handlers use the GFP_NOFS flags.
*
* Return: a per task page_frag if context allows that,
* otherwise a per socket one.
*/
static inline struct page_frag *sk_page_frag(struct sock *sk)
{
- if (gfpflags_normal_context(sk->sk_allocation))
+ if ((sk->sk_allocation & (__GFP_DIRECT_RECLAIM | __GFP_MEMALLOC | __GFP_FS)) ==
+ (__GFP_DIRECT_RECLAIM | __GFP_FS))
return ¤t->task_frag;
return &sk->sk_frag;