From: Maor Gottlieb <maorg@nvidia.com>
[ Upstream commit 333e4d79316c9ed5877d7aac8b8ed22efc74e96d ]
The GID cache warning messages can flood the kernel log when there are multiple failed attempts to add GIDs. This can happen when creating many virtual interfaces without having enough space for their GIDs in the GID table.
Change pr_warn to pr_warn_ratelimited to prevent log flooding while still maintaining visibility of the issue.
Link: https://patch.msgid.link/r/fd45ed4a1078e743f498b234c3ae816610ba1b18.17500623...
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
**YES**
This commit should be backported to stable kernel trees. The reasoning follows:
## Key Reasons for Backporting:
### 1. **Fixes a Real-World Bug**

The commit addresses a concrete problem where warning messages can flood the kernel log. This is not a theoretical issue - it occurs when users create many virtual interfaces without sufficient GID table space. This is a common scenario in production environments with RDMA/InfiniBand deployments.
### 2. **Small and Contained Change**

The change is minimal: it replaces a single `pr_warn()` call with `pr_warn_ratelimited()`. Because the call spans two source lines (585-586), the diff shows 2 insertions and 2 deletions; the second line changes only in continuation indentation. There are no architectural modifications and no new features.
### 3. **Low Risk of Regression**

- The change only affects logging behavior, not functional behavior
- `pr_warn_ratelimited()` is a well-established kernel API that behaves identically to `pr_warn()` except for rate limiting
- No data structures are modified
- No locking changes
- No protocol or API changes
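To make the "identical except for rate limiting" point concrete, here is a minimal userspace sketch of an interval/burst limiter. It is a simplified model, not the kernel's implementation: `struct ratelimit`, `ratelimit_allow()` and `warn_gid_failures()` are hypothetical names, time is passed in as plain seconds, and the 5-second interval and burst of 10 used in the usage note reflect my understanding of the kernel's default ratelimit parameters.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of an interval/burst rate limiter. */
struct ratelimit {
	long interval;	/* window length, in seconds */
	int burst;	/* messages allowed per window */
	long begin;	/* start time of the current window */
	int printed;	/* messages emitted in the current window */
	int missed;	/* messages suppressed in the current window */
};

/* Return true if a message may be emitted at time `now` (seconds). */
static bool ratelimit_allow(struct ratelimit *rl, long now)
{
	if (now - rl->begin >= rl->interval) {
		/* The window has expired: start a fresh one. */
		rl->begin = now;
		rl->printed = 0;
		rl->missed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->missed++;
	return false;
}

/*
 * Simulate `attempts` failed GID additions that all happen at time `now`.
 * Returns how many warnings actually reached the log (stderr here).
 */
static int warn_gid_failures(struct ratelimit *rl, long now, int attempts)
{
	int emitted = 0;

	for (int i = 0; i < attempts; i++) {
		if (ratelimit_allow(rl, now)) {
			fprintf(stderr, "unable to add gid error=%d\n", -ENOSPC);
			emitted++;
		}
	}
	return emitted;
}
```

With `struct ratelimit rl = { .interval = 5, .burst = 10 };`, a burst of 1000 failures at time 0 emits 10 warnings and suppresses the other 990, and the next window (time 5 or later) allows 10 more. The failing operation itself is untouched; only the number of log lines changes.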
### 4. **Follows Stable Kernel Rules**

According to stable kernel rules, this fix qualifies because:

- It fixes a bug that bothers people (kernel log flooding)
- It's a trivial change (2 lines)
- It is already merged in mainline
- It doesn't add new features
### 5. **Similar to Historical Backports**

Looking at the similar commits provided:

- Similar Commit #1 (backported): Fixed a hardware limitation issue with clear bounds
- Similar Commit #2 (NOT backported): Added rate limiting to ipoib_warn but was more invasive with macro changes
- This commit is more like #1 - a targeted fix for a specific issue
### 6. **Production Impact**

Log flooding can have serious production impacts:

- Fills up `/var/log` partitions
- Makes it difficult to find other important kernel messages
- Can impact system performance due to excessive logging
- Can trigger log rotation issues
### 7. **Context-Specific Analysis**

The error path where this warning occurs (line 571: `ret = -ENOSPC`) specifically handles the case when the GID table is full. In environments with many virtual interfaces, this condition can be hit repeatedly in quick succession, causing the flood described in the commit message.
The commit message clearly states this is fixing an observed problem: "This can happen when creating many virtual interfaces without having enough space for their GIDs in the GID table."
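The table-full condition described above can be illustrated with a small userspace sketch. This is not the code in `cache.c`: `struct gid_table`, `gid_table_add()`, `count_failed_adds()` and the capacity of 4 are all hypothetical, chosen only to show why every add attempt past capacity fails with `-ENOSPC` and would therefore trigger one warning each.

```c
#include <errno.h>
#include <stdbool.h>

#define TABLE_SIZE 4	/* hypothetical capacity; real tables are device-specific */

/* Hypothetical fixed-size table: one flag per slot. */
struct gid_table {
	bool used[TABLE_SIZE];
};

/* Claim the first free slot; return its index, or -ENOSPC when full. */
static int gid_table_add(struct gid_table *t)
{
	for (int i = 0; i < TABLE_SIZE; i++) {
		if (!t->used[i]) {
			t->used[i] = true;
			return i;
		}
	}
	return -ENOSPC;
}

/* Try to add `n` entries; return how many attempts failed with -ENOSPC. */
static int count_failed_adds(struct gid_table *t, int n)
{
	int failures = 0;

	for (int i = 0; i < n; i++) {
		if (gid_table_add(t) == -ENOSPC)
			failures++;
	}
	return failures;
}
```

Starting from an empty table, ten add attempts fill the four slots and the remaining six fail. Without rate limiting, each of those failures maps to one warning line, so the log volume grows linearly with the number of interfaces that cannot be accommodated.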
This is a good example of a stable-appropriate fix: a minimal change that fixes a real bug, carries low risk of functional regression, and improves system stability in production environments.
 drivers/infiniband/core/cache.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 0b88203720b05..77c0b89259911 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -582,8 +582,8 @@ static int __ib_cache_gid_add(struct ib_device *ib_dev, u32 port,
 out_unlock:
 	mutex_unlock(&table->lock);
 	if (ret)
-		pr_warn("%s: unable to add gid %pI6 error=%d\n",
-			__func__, gid->raw, ret);
+		pr_warn_ratelimited("%s: unable to add gid %pI6 error=%d\n",
+				    __func__, gid->raw, ret);
 	return ret;
 }