From: Tianyu Yuan tianyu.yuan@corigine.com
When CONFIG_PROVE_LOCKING and CONFIG_PROVE_RCU are enabled, using OVS with vf reprs on bridge will lead to following log in dmesg:
''' .../nfp/flower/main.c:269 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1 no locks held by swapper/15/0.
...... Call Trace: <IRQ> dump_stack_lvl+0x8c/0xa0 lockdep_rcu_suspicious+0x118/0x1a0 nfp_flower_dev_get+0xc1/0x240 [nfp] nfp_nfd3_rx+0x419/0xb90 [nfp] ? validate_chain+0x640/0x1880 nfp_nfd3_poll+0x3e/0x180 [nfp] __napi_poll+0x28/0x1d0 net_rx_action+0x2bd/0x3c0 ? _raw_spin_unlock_irqrestore+0x42/0x70 __do_softirq+0xc3/0x3c6 irq_exit_rcu+0xeb/0x130 common_interrupt+0xb9/0xd0 </IRQ> <TASK> ...... </TASK> '''
This debug log is caused by missing of rcu_read_lock()/unlock(). In previous patch rcu_read_lock/unlock are removed while they are still needed when calling rcu_dereference() in nfp_app_dev_get().
Fixes: d5789621b658 ("nfp: Remove rcu_read_lock() around XDP program invocation") CC: stable@vger.kernel.org Signed-off-by: Tianyu Yuan tianyu.yuan@corigine.com Acked-by: Simon Horman simon.horman@corigine.com Signed-off-by: Louis Peens louis.peens@corigine.com --- drivers/net/ethernet/netronome/nfp/nfd3/dp.c | 4 ++-- drivers/net/ethernet/netronome/nfp/nfd3/xsk.c | 2 +- drivers/net/ethernet/netronome/nfp/nfdk/dp.c | 4 ++-- drivers/net/ethernet/netronome/nfp/nfp_app.h | 16 ++++++++++++++++ 4 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/netronome/nfp/nfd3/dp.c b/drivers/net/ethernet/netronome/nfp/nfd3/dp.c index 0cc026b0aefd..3e5a84370d8a 100644 --- a/drivers/net/ethernet/netronome/nfp/nfd3/dp.c +++ b/drivers/net/ethernet/netronome/nfp/nfd3/dp.c @@ -1058,8 +1058,8 @@ static int nfp_nfd3_rx(struct nfp_net_rx_ring *rx_ring, int budget) struct nfp_net *nn;
nn = netdev_priv(dp->netdev); - netdev = nfp_app_dev_get(nn->app, meta.portid, - &redir_egress); + netdev = nfp_app_dev_get_locked(nn->app, meta.portid, + &redir_egress); if (unlikely(!netdev)) { nfp_nfd3_rx_drop(dp, r_vec, rx_ring, rxbuf, NULL); diff --git a/drivers/net/ethernet/netronome/nfp/nfd3/xsk.c b/drivers/net/ethernet/netronome/nfp/nfd3/xsk.c index 5d9db8c2a5b4..6ec5b0d430ea 100644 --- a/drivers/net/ethernet/netronome/nfp/nfd3/xsk.c +++ b/drivers/net/ethernet/netronome/nfp/nfd3/xsk.c @@ -71,7 +71,7 @@ static void nfp_nfd3_xsk_rx_skb(struct nfp_net_rx_ring *rx_ring, } else { struct nfp_net *nn = netdev_priv(dp->netdev);
- netdev = nfp_app_dev_get(nn->app, meta->portid, NULL); + netdev = nfp_app_dev_get_locked(nn->app, meta->portid, NULL); if (unlikely(!netdev)) { nfp_net_xsk_rx_drop(r_vec, xrxbuf); return; diff --git a/drivers/net/ethernet/netronome/nfp/nfdk/dp.c b/drivers/net/ethernet/netronome/nfp/nfdk/dp.c index 33b6d74adb4b..0ac68985d289 100644 --- a/drivers/net/ethernet/netronome/nfp/nfdk/dp.c +++ b/drivers/net/ethernet/netronome/nfp/nfdk/dp.c @@ -1177,8 +1177,8 @@ static int nfp_nfdk_rx(struct nfp_net_rx_ring *rx_ring, int budget) struct nfp_net *nn;
nn = netdev_priv(dp->netdev); - netdev = nfp_app_dev_get(nn->app, meta.portid, - &redir_egress); + netdev = nfp_app_dev_get_locked(nn->app, meta.portid, + &redir_egress); if (unlikely(!netdev)) { nfp_nfdk_rx_drop(dp, r_vec, rx_ring, rxbuf, NULL); diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.h b/drivers/net/ethernet/netronome/nfp/nfp_app.h index 90707346a4ef..639bb1a6349b 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_app.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_app.h @@ -417,6 +417,22 @@ struct net_device *nfp_app_dev_get(struct nfp_app *app, u32 id, return app->type->dev_get(app, id, redir_egress); }
+static inline +struct net_device *nfp_app_dev_get_locked(struct nfp_app *app, u32 id, + bool *redir_egress) +{ + struct net_device *dev; + + if (unlikely(!app || !app->type->dev_get)) + return NULL; + + rcu_read_lock(); + dev = app->type->dev_get(app, id, redir_egress); + rcu_read_unlock(); + + return dev; +} + struct nfp_app *nfp_app_from_netdev(struct net_device *netdev);
u64 *nfp_app_port_get_stats(struct nfp_port *port, u64 *data);
On Tue, May 09, 2023 at 08:06:32AM +0200, Louis Peens wrote:
From: Tianyu Yuan tianyu.yuan@corigine.com
When CONFIG_PROVE_LOCKING and CONFIG_PROVE_RCU are enabled, using OVS with vf reprs on bridge will lead to following log in dmesg:
''' .../nfp/flower/main.c:269 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1 no locks held by swapper/15/0.
...... Call Trace:
<IRQ> dump_stack_lvl+0x8c/0xa0 lockdep_rcu_suspicious+0x118/0x1a0 nfp_flower_dev_get+0xc1/0x240 [nfp] nfp_nfd3_rx+0x419/0xb90 [nfp] ? validate_chain+0x640/0x1880 nfp_nfd3_poll+0x3e/0x180 [nfp] __napi_poll+0x28/0x1d0 net_rx_action+0x2bd/0x3c0 ? _raw_spin_unlock_irqrestore+0x42/0x70 __do_softirq+0xc3/0x3c6 irq_exit_rcu+0xeb/0x130 common_interrupt+0xb9/0xd0 </IRQ> <TASK> ...... </TASK> '''
This debug log is caused by missing of rcu_read_lock()/unlock(). In previous patch rcu_read_lock/unlock are removed while they are still needed when calling rcu_dereference() in nfp_app_dev_get().
Fixes: d5789621b658 ("nfp: Remove rcu_read_lock() around XDP program invocation") CC: stable@vger.kernel.org Signed-off-by: Tianyu Yuan tianyu.yuan@corigine.com Acked-by: Simon Horman simon.horman@corigine.com Signed-off-by: Louis Peens louis.peens@corigine.com
drivers/net/ethernet/netronome/nfp/nfd3/dp.c | 4 ++-- drivers/net/ethernet/netronome/nfp/nfd3/xsk.c | 2 +- drivers/net/ethernet/netronome/nfp/nfdk/dp.c | 4 ++-- drivers/net/ethernet/netronome/nfp/nfp_app.h | 16 ++++++++++++++++ 4 files changed, 21 insertions(+), 5 deletions(-)
Thanks, Reviewed-by: Leon Romanovsky leonro@nvidia.com
On Tue, 9 May 2023 08:06:32 +0200 Louis Peens wrote:
+static inline +struct net_device *nfp_app_dev_get_locked(struct nfp_app *app, u32 id,
_locked() in what way? RCU functions typically use an _rcu suffix, no?
bool *redir_egress)
+{
- struct net_device *dev;
- if (unlikely(!app || !app->type->dev_get))
return NULL;
- rcu_read_lock();
- dev = app->type->dev_get(app, id, redir_egress);
- rcu_read_unlock();
- return dev;
this looks very suspicious, RCU takes care primarily of the lifetime of objects, in this case dev. Returning it after dropping the lock seems wrong.
If the context is safe maybe it's a better idea to change the condition in rcu_dereference_check() to include rcu_read_lock_bh_held()?
On Tue, May 09, 2023 at 07:40:13PM -0700, Jakub Kicinski wrote:
On Tue, 9 May 2023 08:06:32 +0200 Louis Peens wrote:
+static inline +struct net_device *nfp_app_dev_get_locked(struct nfp_app *app, u32 id,
_locked() in what way? RCU functions typically use an _rcu suffix, no?
We were discussing the naming during internal review, for some reason didn't think about using _rcu, will update if there is a v2.
bool *redir_egress)
+{
- struct net_device *dev;
- if (unlikely(!app || !app->type->dev_get))
return NULL;
- rcu_read_lock();
- dev = app->type->dev_get(app, id, redir_egress);
- rcu_read_unlock();
- return dev;
this looks very suspicious, RCU takes care primarily of the lifetime of objects, in this case dev. Returning it after dropping the lock seems wrong.
If the context is safe maybe it's a better idea to change the condition in rcu_dereference_check() to include rcu_read_lock_bh_held()?
Thanks, will take a closer look at this.
-- pw-bot: cr
linux-stable-mirror@lists.linaro.org