Hi Pavel,
On 02/09/2021 23.50, Pavel Machek wrote:
>> [ Upstream commit fb4b1373dcab086d0619c29310f0466a0b2ceb8a ]
>> Function "dma_map_sg" is entitled to merge adjacent entries and return a value smaller than what was passed as "nents".
>> Subsequently "ib_map_mr_sg" needs to work with this value ("sg_dma_len") rather than the original "nents" parameter ("sg_len").
>> This old RDS bug was exposed and reliably causes kernel panics (using RDMA operations "rds-stress -D") on x86_64 starting with: commit c588072bba6b ("iommu/vt-d: Convert intel iommu driver to the iommu ops")
>> Simply put: Linux 5.11 and later.
> I see this queued for 4.19 and 5.10 where "iommu/vt-d: Convert intel iommu driver to the iommu ops" is not present. It may be okay for older kernels, too, but I wanted to double-check.
It should be okay for older kernels as well.
The bug has always been there, but only started to cause panics in cases where "dma_map_sg" actually did merge adjacent entries.
We bisected the crash down to the commit mentioned above (c588072bba6b), on platforms that use the Intel IOMMU.
That intel-iommu commit isn't in Linux 5.10 and older, but the RDS bug is.
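To illustrate the pattern, here is a small userspace sketch. It is not the RDS code and does not use the kernel API: "struct toy_sg", "toy_map_sg" and "toy_total_len" are names made up for this example. It only models a mapping step that may merge adjacent entries and return a smaller count, which the caller must then use instead of the original "nents".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a scatterlist entry (not struct scatterlist). */
struct toy_sg {
	unsigned long addr;
	unsigned int len;
};

/*
 * Toy model of a mapping step that, like dma_map_sg, is allowed to merge
 * adjacent entries. Copies the (possibly merged) entries into out[] and
 * returns how many entries out[] now holds, which can be less than nents.
 */
static int toy_map_sg(const struct toy_sg *in, int nents, struct toy_sg *out)
{
	int mapped = 0;

	assert(nents >= 0);
	for (int i = 0; i < nents; i++) {
		if (mapped > 0 &&
		    out[mapped - 1].addr + out[mapped - 1].len == in[i].addr) {
			/* Contiguous with the previous entry: merge. */
			out[mapped - 1].len += in[i].len;
		} else {
			out[mapped++] = in[i];
		}
	}
	return mapped;
}

/*
 * Toy consumer. "count" must be the value returned by toy_map_sg, not the
 * original nents: entries past the returned count were never written.
 */
static unsigned long toy_total_len(const struct toy_sg *mapped, int count)
{
	unsigned long total = 0;

	for (int i = 0; i < count; i++)
		total += mapped[i].len;
	return total;
}
```

As long as the mapping step never merges anything, passing the original count happens to work, which is how a bug of this kind can stay hidden for years.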
Hope this helps,
Gerd