On Tue, Nov 23, 2021 at 01:48:32PM +0100, msizanoen1 wrote:
> commit cdef485217d30382f3bf6448c54b4401648fe3f1 upstream.
>
> The kernel leaks memory when a `fib` rule is present in IPv6 nftables
> firewall rules and a suppress_prefix rule is present in the IPv6 routing
> rules (used by certain tools such as wg-quick). In such scenarios, every
> incoming packet will leak an allocation in `ip6_dst_cache` slab cache.
>
> After some hours of `bpftrace`-ing and source code reading, I tracked
> down the issue to ca7a03c41753 ("ipv6: do not free rt if
> FIB_LOOKUP_NOREF is set on suppress rule").
>
> The problem with that change is that the generic `args->flags` always have
> `FIB_LOOKUP_NOREF` set[1][2] but the IPv6-specific flag
> `RT6_LOOKUP_F_DST_NOREF` might not be, leading to `fib6_rule_suppress` not
> decreasing the refcount when needed.
>
> How to reproduce:
>  - Add the following nftables rule to a prerouting chain:
>      meta nfproto ipv6 fib saddr . mark . iif oif missing drop
>    This can be done with:
>      sudo nft create table inet test
>      sudo nft create chain inet test test_chain '{ type filter hook prerouting priority filter + 10; policy accept; }'
>      sudo nft add rule inet test test_chain meta nfproto ipv6 fib saddr . mark . iif oif missing drop
>  - Run:
>      sudo ip -6 rule add table main suppress_prefixlength 0
>  - Watch `sudo slabtop -o | grep ip6_dst_cache` to see memory usage increase
>    with every incoming ipv6 packet.
>
> This patch exposes the protocol-specific flags to the protocol-specific
> `suppress` function, and checks the protocol-specific `flags`
> argument for RT6_LOOKUP_F_DST_NOREF instead of the generic
> FIB_LOOKUP_NOREF when decreasing the refcount, like this.
>
> [1]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c326…
> [2]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c326…
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215105
> Fixes: ca7a03c41753 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule")
> Cc: stable(a)vger.kernel.org
> Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
> Signed-off-by: David S. Miller <davem(a)davemloft.net>
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> ---
> This is a backport of the patch to the 5.4 LTS kernels.

Wonderful, now queued up, thanks!

greg k-h
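
The flag mismatch described in the quoted commit message can be modelled in user space. The following is only a sketch: every `model_` / `MODEL_` name is hypothetical, the flag values are made up, and the real kernel functions do considerably more. It assumes, as the commit message states, that the generic NOREF flag is always set while the IPv6-specific one may be clear.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel flags; the values are illustrative. */
#define MODEL_FIB_LOOKUP_NOREF       0x1 /* generic flag, always set by the caller */
#define MODEL_RT6_LOOKUP_F_DST_NOREF 0x2 /* IPv6-specific flag, may be clear */

struct model_rt {
	int refcnt;
};

/* The lookup takes a reference unless the IPv6-specific NOREF flag is set. */
static void model_lookup(struct model_rt *rt, int rt6_flags)
{
	if (!(rt6_flags & MODEL_RT6_LOOKUP_F_DST_NOREF))
		rt->refcnt++;
}

/* Pre-fix behaviour: consults the generic flag, which is always set. */
static void model_suppress_buggy(struct model_rt *rt, int generic_flags)
{
	if (!(generic_flags & MODEL_FIB_LOOKUP_NOREF))
		rt->refcnt--;
}

/* Post-fix behaviour: consults the same flag the lookup used. */
static void model_suppress_fixed(struct model_rt *rt, int rt6_flags)
{
	if (!(rt6_flags & MODEL_RT6_LOOKUP_F_DST_NOREF))
		rt->refcnt--;
}

/* Returns the references still held after `packets` suppressed lookups. */
static int model_leak(int packets, int rt6_flags, bool fixed)
{
	struct model_rt rt = { .refcnt = 0 };

	assert(packets >= 0);
	for (int i = 0; i < packets; i++) {
		model_lookup(&rt, rt6_flags);
		if (fixed)
			model_suppress_fixed(&rt, rt6_flags);
		else
			model_suppress_buggy(&rt, MODEL_FIB_LOOKUP_NOREF);
	}
	return rt.refcnt;
}
```

In this model, `model_leak(n, 0, false)` grows with `n` (one reference per packet, mirroring the slab growth in the reproduction steps), while `model_leak(n, 0, true)` stays at zero.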
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0c980a006d3fbee86c4d0698f66d6f5381831787 Mon Sep 17 00:00:00 2001
From: Maxime Ripard <maxime(a)cerno.tech>
Date: Wed, 17 Nov 2021 10:45:22 +0100
Subject: [PATCH] drm/vc4: kms: Wait for the commit before increasing our clock
 rate

Several DRM/KMS atomic commits can run in parallel if they affect
different CRTCs. These commits share the global HVS state, so we have
some code to make sure we run commits in sequence. This synchronization
code is one of the first things that runs in vc4_atomic_commit_tail().

Another constraint we have is that we need to make sure the HVS clock
gets a boost during the commit. That code relies on clk_set_min_rate and
will remove the old minimum and set a new one. We also need another,
temporary, minimum for the duration of the commit.

The algorithm is thus to set a temporary minimum, drop the previous
one, do the commit, and finally set the minimum for the current mode.

However, the part that sets the temporary minimum and drops the older
one runs before the commit synchronization code.

Thus, under the right conditions, we can mix up the minimums and end up
with the wrong one for our current step.

To avoid it, let's move the clock setup into the protected section.

Fixes: d7d96c00e585 ("drm/vc4: hvs: Boost the core clock during modeset")
Signed-off-by: Maxime Ripard <maxime(a)cerno.tech>
Reviewed-by: Dave Stevenson <dave.stevenson(a)raspberrypi.com>
Tested-by: Jian-Hong Pan <jhp(a)endlessos.org>
Link: https://lore.kernel.org/r/20211117094527.146275-2-maxime@cerno.tech
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c
index f0b3e4cf5bce..764ddb41a4ce 100644
--- a/drivers/gpu/drm/vc4/vc4_kms.c
+++ b/drivers/gpu/drm/vc4/vc4_kms.c
@@ -353,9 +353,6 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state)
 		vc4_hvs_mask_underrun(dev, vc4_crtc_state->assigned_channel);
 	}
 
-	if (vc4->hvs->hvs5)
-		clk_set_min_rate(hvs->core_clk, 500000000);
-
 	old_hvs_state = vc4_hvs_get_old_global_state(state);
 	if (!old_hvs_state)
 		return;
@@ -377,6 +374,9 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state *state)
 			drm_err(dev, "Timed out waiting for commit\n");
 	}
 
+	if (vc4->hvs->hvs5)
+		clk_set_min_rate(hvs->core_clk, 500000000);
+
 	drm_atomic_helper_commit_modeset_disables(dev, state);
 
 	vc4_ctm_commit(vc4, state);
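
The ordering problem in the vc4 commit message can be illustrated with a deliberately simplified, single-threaded model. This is a sketch under stated assumptions, not the driver's logic: `model_rate_during_commit_b()` and `MODEL_BOOST_HZ` are hypothetical names, the shared minimum is a plain variable, and the "previous commit A finishes and sets its final minimum" step is assumed to happen while commit B waits in the synchronization code.

```c
#include <assert.h>

#define MODEL_BOOST_HZ 500000000UL /* the boost value passed to clk_set_min_rate() in the patch */

/*
 * Returns the clock minimum in effect while commit B performs its modeset,
 * given that an earlier commit A is still finishing when B starts.
 * boost_after_sync == 0 models the old ordering, 1 models the patched one.
 */
static unsigned long model_rate_during_commit_b(int boost_after_sync,
						unsigned long a_final_hz)
{
	unsigned long min_rate = MODEL_BOOST_HZ; /* A is mid-commit and boosted */

	/* The model is only interesting when A's final rate is below the boost. */
	assert(a_final_hz < MODEL_BOOST_HZ);

	if (!boost_after_sync)
		min_rate = MODEL_BOOST_HZ; /* old code: B boosts before waiting */

	/* B waits for A; A completes and replaces the minimum with its own. */
	min_rate = a_final_hz;

	if (boost_after_sync)
		min_rate = MODEL_BOOST_HZ; /* patched code: B boosts after waiting */

	return min_rate;
}
```

With the old ordering, B's boost is overwritten by A's final minimum before B's modeset runs; with the boost moved after the wait, B runs with the boosted minimum.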
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f8e7dfd6fdabb831846ab1970a875746559d491b Mon Sep 17 00:00:00 2001
From: Vincent Whitchurch <vincent.whitchurch(a)axis.com>
Date: Fri, 26 Nov 2021 16:51:15 +0100
Subject: [PATCH] net: stmmac: Avoid DMA_CHAN_CONTROL write if no Split Header
 support

The driver assumes that split headers can be enabled/disabled without
stopping/starting the device, so it writes DMA_CHAN_CONTROL from
stmmac_set_features().  However, on my system (IP v5.10a without Split
Header support), simply writing DMA_CHAN_CONTROL when DMA is running
(for example, with the commands below) leads to a TX watchdog timeout.

 host$ socat TCP-LISTEN:1024,fork,reuseaddr - &
 device$ ethtool -K eth0 tso off
 device$ ethtool -K eth0 tso on
 device$ dd if=/dev/zero bs=1M count=10 | socat - TCP4:host:1024
 <tx watchdog timeout>

Note that since my IP is configured without Split Header support, the
driver always just reads and writes the same value to the
DMA_CHAN_CONTROL register.

I don't have access to any platforms with Split Header support so I
don't know if these writes to the DMA_CHAN_CONTROL while DMA is running
actually work properly on such systems.  I could not find anything in
the databook that says that DMA_CHAN_CONTROL should not be written when
the DMA is running.

But on systems without Split Header support, there is in any case no
need to call enable_sph() in stmmac_set_features() at all since SPH can
never be toggled, so we can avoid the watchdog timeout there by skipping
this call.

Fixes: 8c6fc097a2f4acf ("net: stmmac: gmac4+: Add Split Header support")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch(a)axis.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 748195697e5a..da8306f60730 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -5540,8 +5540,6 @@ static int stmmac_set_features(struct net_device *netdev,
 			       netdev_features_t features)
 {
 	struct stmmac_priv *priv = netdev_priv(netdev);
-	bool sph_en;
-	u32 chan;
 
 	/* Keep the COE Type in case of csum is supporting */
 	if (features & NETIF_F_RXCSUM)
@@ -5553,10 +5551,13 @@ static int stmmac_set_features(struct net_device *netdev,
 	 */
 	stmmac_rx_ipc(priv, priv->hw);
 
-	sph_en = (priv->hw->rx_csum > 0) && priv->sph;
+	if (priv->sph_cap) {
+		bool sph_en = (priv->hw->rx_csum > 0) && priv->sph;
+		u32 chan;
 
-	for (chan = 0; chan < priv->plat->rx_queues_to_use; chan++)
-		stmmac_enable_sph(priv, priv->ioaddr, sph_en, chan);
+		for (chan = 0; chan < priv->plat->rx_queues_to_use; chan++)
+			stmmac_enable_sph(priv, priv->ioaddr, sph_en, chan);
+	}
 
 	return 0;
 }
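
The effect of the new `sph_cap` guard can be sketched outside the kernel by counting simulated register writes. All `model_` names below are hypothetical, and the register write is replaced by a counter; the sketch only mirrors the control flow of the patched stmmac_set_features().

```c
#include <assert.h>
#include <stdbool.h>

struct model_priv {
	bool sph_cap;                     /* hardware has Split Header support */
	bool sph;                         /* Split Header requested */
	int rx_csum;
	unsigned int rx_queues;
	unsigned int chan_control_writes; /* simulated DMA_CHAN_CONTROL writes */
};

/* Stand-in for the per-channel enable call: just counts the write. */
static void model_enable_sph(struct model_priv *p, bool en, unsigned int chan)
{
	(void)en;
	(void)chan;
	p->chan_control_writes++;
}

/* Mirrors the patched flow: touch the register only if SPH is supported. */
static int model_set_features(struct model_priv *p)
{
	if (p->sph_cap) {
		bool sph_en = (p->rx_csum > 0) && p->sph;

		for (unsigned int chan = 0; chan < p->rx_queues; chan++)
			model_enable_sph(p, sph_en, chan);
	}
	return 0;
}

/* Convenience wrapper: number of simulated writes for a given configuration. */
static unsigned int model_writes(bool sph_cap, unsigned int rx_queues)
{
	struct model_priv p = {
		.sph_cap = sph_cap,
		.sph = sph_cap,
		.rx_csum = 1,
		.rx_queues = rx_queues,
	};
	int ret = model_set_features(&p);

	assert(ret == 0);
	(void)ret;
	return p.chan_control_writes;
}
```

On a model device without Split Header support, `model_writes(false, n)` reports no writes at all, which is the property the patch relies on to avoid the watchdog timeout.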
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From addad7643142f500080417dd7272f49b7a185570 Mon Sep 17 00:00:00 2001
From: Zhou Qingyang <zhou1615(a)umn.edu>
Date: Wed, 1 Dec 2021 00:44:38 +0800
Subject: [PATCH] net/mlx4_en: Fix an use-after-free bug in
 mlx4_en_try_alloc_resources()

In mlx4_en_try_alloc_resources(), mlx4_en_copy_priv() is called and
tmp->tx_cq will be freed on the error path of mlx4_en_copy_priv().
After that mlx4_en_alloc_resources() is called and there is a dereference
of &tmp->tx_cq[t][i] in mlx4_en_alloc_resources(), which could lead to
a use-after-free problem on failure of mlx4_en_copy_priv().

Fix this bug by checking the return value of mlx4_en_copy_priv().

This bug was found by a static analyzer. The analysis employs
differential checking to identify inconsistent security operations
(e.g., checks or kfrees) between two code paths and confirms that the
inconsistent operations are not recovered in the current function or
the callers, so they constitute bugs.

Note that, as a bug found by static analysis, it can be a false
positive or hard to trigger. Multiple researchers have cross-reviewed
the bug.

Builds with CONFIG_MLX4_EN=m show no new warnings,
and our static analyzer no longer warns about this code.

Fixes: ec25bc04ed8e ("net/mlx4_en: Add resilience in low memory systems")
Signed-off-by: Zhou Qingyang <zhou1615(a)umn.edu>
Reviewed-by: Leon Romanovsky <leonro(a)nvidia.com>
Link: https://lore.kernel.org/r/20211130164438.190591-1-zhou1615@umn.edu
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
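
The pattern described in the commit message above (stop as soon as the copy step fails, instead of going on to use what its error path freed) can be sketched in user space. The `model_` names are hypothetical, and the forced-failure parameter exists only for illustration. Unlike the description of the kernel error path, this model also clears the pointer after freeing it, so that misuse would trip an assertion rather than touch freed memory.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct model_priv {
	int *tx_cq;
};

/* On failure, releases tx_cq (as the real error path does) and returns -ENOMEM. */
static int model_copy_priv(struct model_priv *tmp, int force_fail)
{
	tmp->tx_cq = calloc(4, sizeof(*tmp->tx_cq));
	if (!tmp->tx_cq)
		return -ENOMEM;

	if (force_fail) {
		free(tmp->tx_cq);
		tmp->tx_cq = NULL; /* model-only safety net */
		return -ENOMEM;
	}
	return 0;
}

/* Dereferences tx_cq, so it must never run after a failed copy. */
static int model_alloc_resources(struct model_priv *tmp)
{
	assert(tmp->tx_cq != NULL);
	tmp->tx_cq[0] = 1;
	return 0;
}

/* Mirrors the patched flow: check the copy result before continuing. */
static int model_try_alloc_resources(int force_fail)
{
	struct model_priv tmp = { 0 };
	int ret;

	ret = model_copy_priv(&tmp, force_fail);
	if (ret)
		return ret; /* nothing freed is touched past this point */

	ret = model_alloc_resources(&tmp);
	free(tmp.tx_cq);
	return ret;
}
```

Without the early return, `model_alloc_resources()` would run on a structure whose `tx_cq` had already been released, which corresponds to the dereference the static analyzer flagged.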
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 3f6d5c384637..f1c10f2bda78 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2286,9 +2286,14 @@ int mlx4_en_try_alloc_resources(struct mlx4_en_priv *priv,
 				bool carry_xdp_prog)
 {
 	struct bpf_prog *xdp_prog;
-	int i, t;
+	int i, t, ret;
 
-	mlx4_en_copy_priv(tmp, priv, prof);
+	ret = mlx4_en_copy_priv(tmp, priv, prof);
+	if (ret) {
+		en_warn(priv, "%s: mlx4_en_copy_priv() failed, return\n",
+			__func__);
+		return ret;
+	}
 
 	if (mlx4_en_alloc_resources(tmp)) {
 		en_warn(priv,