From: Chen Linxuan chenlinxuan@uniontech.com
[ Upstream commit ffe1cee21f8b533ae27c3a31bfa56b8c1b27fa6e ]
On x86_64 with gcc version 13.3.0, I compile drivers/infiniband/hw/hns/hns_roce_hw_v2.c with:
make defconfig ./scripts/kconfig/merge_config.sh .config <( echo CONFIG_COMPILE_TEST=y echo CONFIG_HNS3=m echo CONFIG_INFINIBAND=m echo CONFIG_INFINIBAND_HNS_HIP08=m ) make KCFLAGS="-fno-inline-small-functions -fno-inline-functions-called-once" \ drivers/infiniband/hw/hns/hns_roce_hw_v2.o
Then I get a compile error:
CALL scripts/checksyscalls.sh DESCEND objtool INSTALL libsubcmd_headers CC [M] drivers/infiniband/hw/hns/hns_roce_hw_v2.o In file included from drivers/infiniband/hw/hns/hns_roce_hw_v2.c:47: drivers/infiniband/hw/hns/hns_roce_hw_v2.c: In function 'update_srq_db': drivers/infiniband/hw/hns/hns_roce_common.h:74:17: error: 'db' is used uninitialized [-Werror=uninitialized] 74 | *((__le32 *)_ptr + (field_h) / 32) &= \ | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/hns/hns_roce_common.h:90:17: note: in expansion of macro '_hr_reg_clear' 90 | _hr_reg_clear(ptr, field_type, field_h, field_l); \ | ^~~~~~~~~~~~~ drivers/infiniband/hw/hns/hns_roce_common.h:95:39: note: in expansion of macro '_hr_reg_write' 95 | #define hr_reg_write(ptr, field, val) _hr_reg_write(ptr, field, val) | ^~~~~~~~~~~~~ drivers/infiniband/hw/hns/hns_roce_hw_v2.c:948:9: note: in expansion of macro 'hr_reg_write' 948 | hr_reg_write(&db, DB_TAG, srq->srqn); | ^~~~~~~~~~~~ drivers/infiniband/hw/hns/hns_roce_hw_v2.c:946:31: note: 'db' declared here 946 | struct hns_roce_v2_db db; | ^~ cc1: all warnings being treated as errors
Signed-off-by: Chen Linxuan chenlinxuan@uniontech.com Co-developed-by: Winston Wen wentao@uniontech.com Signed-off-by: Winston Wen wentao@uniontech.com Link: https://patch.msgid.link/FF922C77946229B6+20250411105459.90782-5-chenlinxuan... Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org ---
Based on my analysis of the commit and examination of the kernel source code, here is my assessment: **YES** This commit should be backported to stable kernel trees for the following extensive reasons: ## 1. **Critical Hardware Register Corruption Risk** The commit fixes a genuine bug where an uninitialized stack variable containing random data is written directly to hardware registers. In the `update_srq_db()` function: ```c struct hns_roce_v2_db db; // UNINITIALIZED - contains random stack data hr_reg_write(&db, DB_TAG, srq->srqn); // Only sets specific bits hr_reg_write(&db, DB_CMD, HNS_ROCE_V2_SRQ_DB); // Only sets specific bits hr_reg_write(&db, DB_PI, srq->idx_que.head); // Only sets specific bits hns_roce_write64(hr_dev, (__le32 *)&db, srq->db_reg); // Writes ENTIRE structure to hardware ``` The `hr_reg_write()` macros only modify specific bit fields within the 64-bit structure. Any bits not explicitly set retain their random initial values from the stack, which then get written to the hardware doorbell register. ## 2. **Pattern Inconsistency Indicates Bug** My examination of the codebase shows that ALL other similar functions correctly initialize their database structures: - `update_sq_db()`: `struct hns_roce_v2_db sq_db = {};` ✓ - `update_rq_db()`: `struct hns_roce_v2_db rq_db = {};` ✓ - `update_cq_db()`: `struct hns_roce_v2_db cq_db = {};` ✓ - `update_srq_db()`: `struct hns_roce_v2_db db;` ✗ (the only exception) This pattern strongly indicates that the uninitialized `db` variable in `update_srq_db()` is a bug rather than intentional design. ## 3. **Real Runtime Impact Potential** This is not merely a cosmetic compiler warning. The uninitialized data can cause: - **Subtle SRQ functionality degradation**: Random bits in hardware doorbell registers can confuse the hardware - **Performance issues**: Incorrect doorbell values may cause hardware to misinterpret commands - **Silent data corruption**: Unlike crashes, this bug could cause hard-to-debug networking issues - **Hardware state corruption**: Writing garbage to hardware registers can put the device in unexpected states ## 4. **Production Environment Risk** SRQ (Shared Receive Queue) is a critical InfiniBand/RoCE feature used in high-performance computing and data center environments. Any corruption in SRQ operations can: - Impact network throughput and latency - Cause intermittent packet loss or reordering - Lead to difficult-to-reproduce performance regressions - Affect reliability of distributed applications ## 5. **Low-Risk, High-Value Fix** The fix is minimal and safe: - Changes only initialization: `struct hns_roce_v2_db db = {};` - Zero-initialization has no functional side effects - Follows established patterns in the same driver - No API changes or behavioral modifications - Extremely low regression risk ## 6. **Similar Historical Precedent** Looking at the similar commits provided, commit #4 (Status: YES) shows a nearly identical fix for uninitialized variable usage in the same driver family: ```c doorbell[0] = cons_index & ((hr_cq->cq_depth << 1) - 1); +doorbell[1] = 0; // Initialize before use roce_set_bit(doorbell[1], ROCEE_DB_OTHERS_H_ROCEE_DB_OTH_HW_SYNS_S, 1); ``` This historical precedent confirms that uninitialized variable fixes in HNS RoCE hardware register operations are considered important enough for stable backporting. ## 7. **Affects Multiple Compiler Versions** The commit message shows this affects GCC 13.3.0 with specific optimization flags, but uninitialized variable usage is fundamentally unsafe regardless of compiler version or warning visibility. **Conclusion**: This commit fixes a real hardware register corruption bug with potential for silent data corruption in production RoCE/InfiniBand environments. The fix is minimal, safe, and follows established patterns. It should be backported to ensure system reliability in stable kernel deployments.
drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c index 160e8927d364e..afd2ea6da3ee2 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c @@ -943,7 +943,7 @@ static void fill_wqe_idx(struct hns_roce_srq *srq, unsigned int wqe_idx) static void update_srq_db(struct hns_roce_srq *srq) { struct hns_roce_dev *hr_dev = to_hr_dev(srq->ibsrq.device); - struct hns_roce_v2_db db; + struct hns_roce_v2_db db = {};
hr_reg_write(&db, DB_TAG, srq->srqn); hr_reg_write(&db, DB_CMD, HNS_ROCE_V2_SRQ_DB);