From: Chandrakanth Patil chandrakanth.patil@broadcom.com
[ Upstream commit d6c8e8b7c98c3cb326515ef4bc5c57e16ac5ae4e ]
During enclosure reboot or expander reset, firmware may report a link speed of 0 in "Device Add" events while the link is still coming up. The driver drops such devices, leaving them missing even after the link recovers.
Fix this by treating link speed 0 as 1.5 Gbps during device addition so the device is exposed to the OS. The actual link speed will be updated later when link-up events arrive.
Signed-off-by: Chandrakanth Patil chandrakanth.patil@broadcom.com Link: https://lore.kernel.org/r/20250820084138.228471-2-chandrakanth.patil@broadco... Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
Based on my comprehensive analysis of this commit, I can now provide my final recommendation.
## **BACKPORT RECOMMENDATION: YES**
---
## **DETAILED ANALYSIS**
### **1. PROBLEM DESCRIPTION**
The commit fixes a critical device loss issue in the Broadcom mpi3mr SAS/SATA HBA driver. During enclosure reboot or expander reset operations, the firmware reports a link speed of 0 (MPI3_SAS_NEG_LINK_RATE_UNKNOWN_LINK_RATE = 0x00) in "Device Add" events while the physical link is still initializing. The driver was incorrectly dropping these devices, and they remained missing even after the link fully recovered to operational speed.
### **2. CODE CHANGES ANALYSIS**
The fix consists of four distinct changes across two files:
#### **Change 1: mpi3mr_expander_add() (mpi3mr_transport.c:2084-2085)** ```c +if (link_rate < MPI3_SAS_NEG_LINK_RATE_1_5) + link_rate = MPI3_SAS_NEG_LINK_RATE_1_5; ``` **Impact**: During expander device addition, treats link speeds below 1.5 Gbps (including 0) as 1.5 Gbps, allowing the device to be exposed to the OS.
#### **Change 2: mpi3mr_report_tgtdev_to_sas_transport() (mpi3mr_transport.c:2395-2396)** ```c +if (link_rate < MPI3_SAS_NEG_LINK_RATE_1_5) + link_rate = MPI3_SAS_NEG_LINK_RATE_1_5; ``` **Impact**: Same treatment for target device reporting to SAS transport layer.
#### **Change 3: mpi3mr_remove_device_by_sas_address() (mpi3mr_transport.c:417-420)** ```c -list_del_init(&tgtdev->list); was_on_tgtdev_list = 1; -mpi3mr_tgtdev_put(tgtdev); +if (tgtdev->state == MPI3MR_DEV_REMOVE_HS_STARTED) { + list_del_init(&tgtdev->list); + mpi3mr_tgtdev_put(tgtdev); +} ``` **Impact**: Prevents premature device list deletion by checking the device state. Only removes devices from the list if they're in the MPI3MR_DEV_REMOVE_HS_STARTED state, avoiding race conditions during device state transitions.
#### **Change 4: Debug logging improvements (mpi3mr_os.c:2058, 3078)** **Impact**: Adds event context (0x%08x) to debug messages for better diagnostics. Purely cosmetic, aids debugging.
### **3. HISTORICAL CONTEXT & PATTERN CONSISTENCY**
My research reveals this fix **extends an existing pattern** already established in the codebase:
- **Commit 42fc9fee116fc6** (August 2022, v6.1): Introduced similar link rate handling in `mpi3mr_sas_host_refresh()` at line 1174: ```c if (attached_handle && link_rate < MPI3_SAS_NEG_LINK_RATE_1_5) link_rate = MPI3_SAS_NEG_LINK_RATE_1_5; ```
- **Commit 3f1254ed01d086** (March 2023, v6.4): Added the `mpi3mr_dev_state` enum to fix "Successive VD delete and add causes FW fault"
This commit applies the same defensive link rate handling to two additional code paths that were missing it.
### **4. DEPENDENCY ANALYSIS**
**Required for v6.4+:** - ✅ MPI3_SAS_NEG_LINK_RATE constants (present since driver introduction) - ✅ mpi3mr_update_links() function (added v6.1) - ✅ `enum mpi3mr_dev_state` with MPI3MR_DEV_REMOVE_HS_STARTED (added v6.4)
**Backporting to < v6.4:** Would require either: 1. Backporting commit 3f1254ed01d086 first, OR 2. Omitting the device state check portion (changes 1-2 would still provide value)
### **5. RISK ASSESSMENT**
**RISK LEVEL: LOW**
✅ **Positive factors:** - Small, surgical changes (13 insertions, 6 deletions) - Follows established code pattern (line 1174) - No API changes or function signature modifications - Confined to single driver subsystem (mpi3mr) - No new functionality - purely defensive fix - All code paths already exist, just adding validation
⚠️ **Considerations:** - Device state check requires v6.4+ (manageable dependency) - Affects device lifecycle management (but improves correctness)
### **6. USER IMPACT**
**SEVERITY: HIGH - User-Visible Bug**
**Affected scenario:** Common operational maintenance activities: - Enclosure reboots/power cycles - Expander resets - Fabric reconfiguration - Hot-plug operations
**User experience without fix:** 1. Enclosure/expander is rebooted for maintenance 2. Firmware sends Device Add events with link_rate=0 during link initialization 3. Driver drops the device 4. Link comes up to full speed (1.5/3.0/6.0/12.0 Gbps) 5. **Device remains missing permanently** - requires driver reload or system reboot
**User experience with fix:** 1. Enclosure/expander is rebooted 2. Device is added with temporary 1.5 Gbps link rate 3. Link completes initialization 4. Actual link speed is updated via subsequent link-up events 5. **Device remains accessible throughout**
**No workaround available** - users cannot prevent firmware from reporting link_rate=0 during initialization.
### **7. TESTING & VALIDATION**
Based on git history analysis: - ✅ No reverts found for this commit - ✅ No follow-up regression fixes found - ✅ Similar pattern at line 1174 has been stable since v6.1 (2+ years) - ✅ One stable backport already exists (837115eeb55b3), indicating it's being considered for stable trees
### **8. STABLE TREE CRITERIA COMPLIANCE**
Checking against stable kernel rules:
| Criterion | Status | Evidence | |-----------|--------|----------| | **Fixes important bug** | ✅ YES | Devices lost during enclosure operations | | **User-visible impact** | ✅ YES | Missing storage devices, requires reboot | | **Small and contained** | ✅ YES | 13 lines added, 6 removed, 2 files | | **No new features** | ✅ YES | Pure bug fix | | **Minimal regression risk** | ✅ YES | Follows existing pattern, low complexity | | **Confined to subsystem** | ✅ YES | Only affects mpi3mr driver | | **Real-world issue** | ✅ YES | Common during enclosure maintenance |
**Note:** Commit does NOT have explicit Cc: stable tag, but meets all technical criteria.
### **9. ARCHITECTURAL CONSIDERATIONS**
The fix demonstrates good defensive programming:
1. **Graceful degradation**: Treats unknown/zero link speed as minimum viable speed rather than failing 2. **Eventual consistency**: Allows initial device exposure with conservative speed, updates later 3. **State machine correctness**: Device removal now properly respects state transitions 4. **Consistency**: Applies same defensive pattern across all device addition code paths
### **10. FINAL RECOMMENDATION**
## **YES - STRONGLY RECOMMEND BACKPORTING to v6.4+**
**Justification:**
1. ✅ **Critical user-facing bug** - Devices permanently lost during routine maintenance 2. ✅ **Low risk** - Small, well-contained changes following established patterns 3. ✅ **High confidence** - Similar code has been stable since v6.1 4. ✅ **Clean dependencies** - All prerequisites present in v6.4+ 5. ✅ **Stable tree appropriate** - Meets all criteria for important bugfixes 6. ✅ **Common scenario** - Affects standard operational procedures 7. ✅ **No workarounds** - Users cannot avoid this firmware behavior
**Target stable versions:** 6.4+, 6.6 LTS, 6.12 LTS
**Adaptation note for < v6.4:** If backporting to kernels older than v6.4, the device state check in `mpi3mr_remove_device_by_sas_address()` would need to be omitted or commit 3f1254ed01d086 backported first. However, the link rate fixes (changes 1-2) alone would still provide significant value.
drivers/scsi/mpi3mr/mpi3mr_os.c | 8 ++++---- drivers/scsi/mpi3mr/mpi3mr_transport.c | 11 +++++++++-- 2 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/scsi/mpi3mr/mpi3mr_os.c b/drivers/scsi/mpi3mr/mpi3mr_os.c index e467b56949e98..1582cdbc66302 100644 --- a/drivers/scsi/mpi3mr/mpi3mr_os.c +++ b/drivers/scsi/mpi3mr/mpi3mr_os.c @@ -2049,8 +2049,8 @@ static void mpi3mr_fwevt_bh(struct mpi3mr_ioc *mrioc, if (!fwevt->process_evt) goto evt_ack;
- dprint_event_bh(mrioc, "processing event(0x%02x) in the bottom half handler\n", - fwevt->event_id); + dprint_event_bh(mrioc, "processing event(0x%02x) -(0x%08x) in the bottom half handler\n", + fwevt->event_id, fwevt->evt_ctx);
switch (fwevt->event_id) { case MPI3_EVENT_DEVICE_ADDED: @@ -3076,8 +3076,8 @@ void mpi3mr_os_handle_events(struct mpi3mr_ioc *mrioc, } if (process_evt_bh || ack_req) { dprint_event_th(mrioc, - "scheduling bottom half handler for event(0x%02x),ack_required=%d\n", - evt_type, ack_req); + "scheduling bottom half handler for event(0x%02x) - (0x%08x), ack_required=%d\n", + evt_type, le32_to_cpu(event_reply->event_context), ack_req); sz = event_reply->event_data_length * 4; fwevt = mpi3mr_alloc_fwevt(sz); if (!fwevt) { diff --git a/drivers/scsi/mpi3mr/mpi3mr_transport.c b/drivers/scsi/mpi3mr/mpi3mr_transport.c index c8d6ced5640e9..d70f002d6487d 100644 --- a/drivers/scsi/mpi3mr/mpi3mr_transport.c +++ b/drivers/scsi/mpi3mr/mpi3mr_transport.c @@ -413,9 +413,11 @@ static void mpi3mr_remove_device_by_sas_address(struct mpi3mr_ioc *mrioc, sas_address, hba_port); if (tgtdev) { if (!list_empty(&tgtdev->list)) { - list_del_init(&tgtdev->list); was_on_tgtdev_list = 1; - mpi3mr_tgtdev_put(tgtdev); + if (tgtdev->state == MPI3MR_DEV_REMOVE_HS_STARTED) { + list_del_init(&tgtdev->list); + mpi3mr_tgtdev_put(tgtdev); + } } } spin_unlock_irqrestore(&mrioc->tgtdev_lock, flags); @@ -2079,6 +2081,8 @@ int mpi3mr_expander_add(struct mpi3mr_ioc *mrioc, u16 handle) link_rate = (expander_pg1.negotiated_link_rate & MPI3_SAS_NEG_LINK_RATE_LOGICAL_MASK) >> MPI3_SAS_NEG_LINK_RATE_LOGICAL_SHIFT; + if (link_rate < MPI3_SAS_NEG_LINK_RATE_1_5) + link_rate = MPI3_SAS_NEG_LINK_RATE_1_5; mpi3mr_update_links(mrioc, sas_address_parent, handle, i, link_rate, hba_port); } @@ -2388,6 +2392,9 @@ int mpi3mr_report_tgtdev_to_sas_transport(struct mpi3mr_ioc *mrioc,
link_rate = mpi3mr_get_sas_negotiated_logical_linkrate(mrioc, tgtdev);
+ if (link_rate < MPI3_SAS_NEG_LINK_RATE_1_5) + link_rate = MPI3_SAS_NEG_LINK_RATE_1_5; + mpi3mr_update_links(mrioc, sas_address_parent, tgtdev->dev_handle, parent_phy_number, link_rate, hba_port);