On Fri, Jan 07, 2022 at 06:57:51AM +0100, Thorsten Leemhuis wrote:
Hi Greg!
On 01.01.22 11:56, Thorsten Leemhuis wrote:
Hi, this is your Linux kernel regression tracker speaking.
On 15.12.21 18:20, Greg Kroah-Hartman wrote:
From: Alaa Hleihel <alaa@nvidia.com>
[ Upstream commit f0ae4afe3d35e67db042c58a52909e06262b740f ]
For the case of IB_MR_TYPE_DM, the MR doesn't have a umem, even though it is a user MR. This causes mlx5_free_priv_descs() to treat it as a kernel MR and wrongly access mr->descs, which reads unrelated values from the union and leads to an attempt to release resources that were never allocated.
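To make the failure mode above concrete, here is a simplified, hypothetical C model of it. Everything in it is an assumption for illustration: struct model_mr, enum mr_kind and the helper functions are invented names, not the real struct mlx5_ib_mr or mlx5 driver code. It only assumes that the "kernel vs. user MR" decision is keyed on whether umem is NULL, and that kernel-MR fields share a union with other MR-type fields. It illustrates the bug described in the commit message, not the upstream fix (which was later reverted as incomplete).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified MR model; NOT struct mlx5_ib_mr. */
enum mr_kind { MR_KERNEL, MR_USER, MR_USER_DM };

struct model_mr {
	enum mr_kind kind;
	void *umem;                   /* set only for ordinary user MRs */
	union {
		struct {                  /* meaningful only for kernel MRs */
			void *descs;
			size_t desc_size;
		} kernel;
		struct {                  /* meaningful only for DM MRs */
			uintptr_t dm_offset;  /* same size as a pointer, overlays descs */
			size_t dm_length;
		} dm;
	} u;
};

/* Flawed heuristic: "no umem means kernel MR". A DM MR is a user MR
 * without a umem, so it gets misclassified as a kernel MR. */
static bool is_kernel_mr_by_umem(const struct model_mr *mr)
{
	return mr->umem == NULL;
}

/* Classification based on the MR's actual type instead of umem. */
static bool is_kernel_mr_by_type(const struct model_mr *mr)
{
	return mr->kind == MR_KERNEL;
}

/* Returns true if the free path would try to release u.kernel.descs.
 * For a DM MR those bytes actually hold u.dm, so the "descs" value is
 * really dm_offset reinterpreted as a pointer. */
static bool would_free_descs(const struct model_mr *mr, bool use_umem_check)
{
	bool kernel = use_umem_check ? is_kernel_mr_by_umem(mr)
				     : is_kernel_mr_by_type(mr);
	return kernel && mr->u.kernel.descs != NULL;
}
```

With a DM MR whose u.dm.dm_offset is non-zero, the umem-based check reports a kernel MR and would "free" a descs pointer that was never allocated; the type-based check skips it.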
TWIMC, that commit made it into 5.15.y, but is known to cause a regression in v5.16-rc:
https://lore.kernel.org/lkml/f298db4ec5fdf7a2d1d166ca2f66020fd9397e5c.164007...
https://lore.kernel.org/all/EEBA2D1C-F29C-4237-901C-587B60CEE113@oracle.com/
A fix for mainline was posted, but got stuck afaics: https://lore.kernel.org/lkml/f298db4ec5fdf7a2d1d166ca2f66020fd9397e5c.164007...
A revert was also discussed, but not performed: https://lore.kernel.org/all/20211222101312.1358616-1-maorg@nvidia.com/
I assume your scripts will catch this; nevertheless, FYI:
The patch below was reverted in mainline, as it "is not the full fix and still causes to call traces". You likely want to revert it from v5.15.y as well. For details, see
4163cb3d1980 ("Revert "RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow"")
https://git.kernel.org/torvalds/c/4163cb3d1980383220ad7043002b930995dcba33
Thanks for the heads-up, I have now queued this patch up for 5.15.y.
greg k-h