 
            [Re: Possible linux-stable mis-backport in ethernet/mellanox/mlx5] On 09/06/2020 (Tue 19:29) Greg Kroah-Hartman wrote:
On Sun, Jun 07, 2020 at 04:34:25PM -0400, Paul Gortmaker wrote:
I happened to notice this commit:
9ca415399dae - "net/mlx5: Annotate mutex destroy for root ns"
...was backported to 4.19 and 5.4 and v5.6 in linux-stable.
It patches del_sw_root_ns() - which only exists after v5.7-rc7 from:
6eb7a268a99b - "net/mlx5: Don't maintain a case of del_sw_func being null"
which creates the one line del_sw_root_ns stub function around kfree(node) by breaking it out of tree_put_node().
In the absense of del_sw_root_ns - the backport finds an identical one line kfree stub fcn - named del_sw_prio from this earlier commit:
139ed6c6c46a - "net/mlx5: Fix steering memory leak" [in v4.15-rc5]
and then puts the mutex_destroy() into that (wrong) function, instead of putting it into tree_put_node where the root ns case used to be handled.
Ugh, good catch. I'll go revert this from everywhere. If you could, can you provide a working backport?
Maybe the simplest option is to just forget the commit, now that you reverted the mis-applied version. In doing so, we are assuming mutex_destroy is just a trap for future bugs/issues and not really a run-time bug-fix (it is a no-op unless CONFIG_DEBUG_MUTEXES=y).
Or, if you want a valid backport I'd suggest the following:
Patching tree_put_node() in-place with the mutex_destroy results in long lines, and isn't very future-proof if there are future mellanox backports looking for del_sw_root_ns() -- and IIRC, the mellanox stuff does seem to see a higher than average number of stable backports.
Instead as indicated below, establishing del_sw_root_ns() in the various currently active stable versions can be achieved w/o any real runtime impact - thus allowing for any future backports to have a higher probability of applying, and applying properly.
I've read the code, and compile tested the below on x86-64 allmodconfig, but I'm not familiar with the code and don't have the hardware, so I defer to the Mellanox guys to double check on what I've outlined below.
Hope this helps, Paul.
v5.6 ===== -apply 6eb7a268a99b so there is now a del_sw_root_ns(). [applies as-is] This commit claims that it "Fixes: 2cc43b494a6c" [v4.5-rc1] but my reading of the commit just says to me that it simplifies the code by assigning the root ns a trivial delete so it doesn't have to be special-cased. Runtime doesn't appear to change IMHO.
-now apply 9ca415399dae - the original mutex_destroy() commit, and it will apply as-is, and in the right function this time.`
v5.4 ===== -apply 476a028f0a2d ("net/mlx5: Fix cleaning unmanaged flow tables"). While it claims that it "Fixes: 5281a0c90919" [v5.6-rc1] - and I'm assuming that is true, but what it really does is remove all the fragile assumptions about who has a parent and how that translates to being the root ns or not -- and simply replaces it with the more logical "if you have a delete function, then I'll use it."
-now you can do the v5.6 steps above.
v4.19 ===== -two choices here, because v4.19 doesn't have 476d61b783e5 ("net/mlx5: Add a locked flag to node removal functions") [v5.1-rc1] which does: -static void down_write_ref_node(struct fs_node *node) +static void down_write_ref_node(struct fs_node *node, bool locked) ...and the same for up.
Choice one is to strip the ", locked" from the commits used above in v5.4/5.6; choice #2 is to simply apply this 476d61 as it applies as is with only one minor fuzz warn, and as the commit says itself, it is inert since "locked == false" at this stage of development.
-I went with #2 and then the steps above in v5.4 went "hands-free"
--
thanks,
greg k-h