From: Sumeet Pawnikar sumeet4linux@gmail.com
[ Upstream commit efc4c35b741af973de90f6826bf35d3b3ac36bf1 ]
Fix inconsistent error handling for sscanf() return value check.
Implicit boolean conversion is used instead of explicit return value checks. The code checks if (!sscanf(...)) which is incorrect because: 1. sscanf returns the number of successfully parsed items 2. On success, it returns 1 (one item passed) 3. On failure, it returns 0 or EOF 4. The check 'if (!sscanf(...))' is wrong because it treats success (1) as failure
All occurrences of sscanf() now uses explicit return value check. With this behavior it returns '-EINVAL' when parsing fails (returns 0 or EOF), and continues when parsing succeeds (returns 1).
Signed-off-by: Sumeet Pawnikar sumeet4linux@gmail.com [ rjw: Subject and changelog edits ] Link: https://patch.msgid.link/20251207151549.202452-1-sumeet4linux@gmail.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
Looking at this commit carefully to assess its suitability for stable backporting.
## Commit Analysis
### 1. COMMIT MESSAGE ANALYSIS
The commit fixes incorrect error handling for `sscanf()` return value checks in the powercap subsystem. The message claims that `if (!sscanf(...))` is wrong because: - `sscanf()` returns the number of successfully parsed items - On success: returns 1 - On failure: returns 0 or EOF (typically -1)
### 2. CODE CHANGE ANALYSIS
Three identical changes in `drivers/powercap/powercap_sys.c`: ```c // Before: if (!sscanf(dev_attr->attr.name, "constraint_%d_", &id))
// After: if (sscanf(dev_attr->attr.name, "constraint_%d_", &id) != 1) ```
Let me trace through the bug: - When `sscanf()` returns 0 (no match): `!0` = 1 (true) → error path taken ✓ - When `sscanf()` returns 1 (success): `!1` = 0 (false) → continues ✓ - When `sscanf()` returns EOF (-1): `!(-1)` = 0 (false) → **error path NOT taken** ✗
The critical issue is the EOF case. If `sscanf()` returns EOF (e.g., empty string input), the check passes incorrectly and execution continues with an **uninitialized `id` variable**. This then leads to: ```c if (id >= power_zone->const_id_cnt) // garbage comparison return -EINVAL; pconst = &power_zone->constraints[id]; // POTENTIAL OUT-OF-BOUNDS ACCESS ```
### 3. CLASSIFICATION
This is a **bug fix** that prevents: - Potential out-of-bounds array access - Use of uninitialized variable - Possible kernel crash or memory corruption in edge cases
Not a feature addition, code cleanup, or optimization.
### 4. SCOPE AND RISK ASSESSMENT
- **Lines changed**: 3 lines (identical pattern) - **Files affected**: 1 file - **Risk**: Extremely low - the change only makes the check stricter and more explicit - **Could break anything?**: No - the new check `!= 1` is strictly more conservative than `!`
### 5. USER IMPACT
The powercap subsystem manages: - Intel RAPL power capping - Power domain constraints - Used by tools like powertop, thermald
While the EOF triggering condition is rare (would require malformed attribute names), the consequence (OOB access) could be severe.
### 6. STABILITY INDICATORS
- Signed-off by Rafael J. Wysocki (Intel power management maintainer) - Self-contained fix with no dependencies - The powercap subsystem has existed since kernel 3.13
### 7. DEPENDENCY CHECK
- No dependencies on other commits - Simple standalone fix - Code exists in all active stable trees
## Summary
**What it fixes**: Incorrect sscanf error handling that could allow execution with uninitialized data, potentially leading to out-of-bounds array access.
**Why it matters for stable**: While the triggering condition (EOF from sscanf) is rare, the fix prevents a potential memory safety issue. The change is trivially correct with zero regression risk.
**Meets stable criteria**: - ✓ Obviously correct (explicit `!= 1` check is cleaner and more robust) - ✓ Fixes a real bug (uninitialized variable use, potential OOB access) - ✓ Small and contained (3 identical one-line changes) - ✓ No new features - ✓ No dependencies
**Risk vs benefit**: The benefit (preventing potential memory corruption) outweighs the near-zero risk of regression.
**YES**
drivers/powercap/powercap_sys.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/powercap/powercap_sys.c b/drivers/powercap/powercap_sys.c index d14b36b75189..1ff369880beb 100644 --- a/drivers/powercap/powercap_sys.c +++ b/drivers/powercap/powercap_sys.c @@ -68,7 +68,7 @@ static ssize_t show_constraint_##_attr(struct device *dev, \ int id; \ struct powercap_zone_constraint *pconst;\ \ - if (!sscanf(dev_attr->attr.name, "constraint_%d_", &id)) \ + if (sscanf(dev_attr->attr.name, "constraint_%d_", &id) != 1) \ return -EINVAL; \ if (id >= power_zone->const_id_cnt) \ return -EINVAL; \ @@ -93,7 +93,7 @@ static ssize_t store_constraint_##_attr(struct device *dev,\ int id; \ struct powercap_zone_constraint *pconst;\ \ - if (!sscanf(dev_attr->attr.name, "constraint_%d_", &id)) \ + if (sscanf(dev_attr->attr.name, "constraint_%d_", &id) != 1) \ return -EINVAL; \ if (id >= power_zone->const_id_cnt) \ return -EINVAL; \ @@ -162,7 +162,7 @@ static ssize_t show_constraint_name(struct device *dev, ssize_t len = -ENODATA; struct powercap_zone_constraint *pconst;
- if (!sscanf(dev_attr->attr.name, "constraint_%d_", &id)) + if (sscanf(dev_attr->attr.name, "constraint_%d_", &id) != 1) return -EINVAL; if (id >= power_zone->const_id_cnt) return -EINVAL;
From: Florian Westphal fw@strlen.de
[ Upstream commit 8e1a1bc4f5a42747c08130b8242ebebd1210b32f ]
Hamza Mahfooz reports cpu soft lock-ups in nft_chain_validate():
watchdog: BUG: soft lockup - CPU#1 stuck for 27s! [iptables-nft-re:37547] [..] RIP: 0010:nft_chain_validate+0xcb/0x110 [nf_tables] [..] nft_immediate_validate+0x36/0x50 [nf_tables] nft_chain_validate+0xc9/0x110 [nf_tables] nft_immediate_validate+0x36/0x50 [nf_tables] nft_chain_validate+0xc9/0x110 [nf_tables] nft_immediate_validate+0x36/0x50 [nf_tables] nft_chain_validate+0xc9/0x110 [nf_tables] nft_immediate_validate+0x36/0x50 [nf_tables] nft_chain_validate+0xc9/0x110 [nf_tables] nft_immediate_validate+0x36/0x50 [nf_tables] nft_chain_validate+0xc9/0x110 [nf_tables] nft_immediate_validate+0x36/0x50 [nf_tables] nft_chain_validate+0xc9/0x110 [nf_tables] nft_table_validate+0x6b/0xb0 [nf_tables] nf_tables_validate+0x8b/0xa0 [nf_tables] nf_tables_commit+0x1df/0x1eb0 [nf_tables] [..]
Currently nf_tables will traverse the entire table (chain graph), starting from the entry points (base chains), exploring all possible paths (chain jumps). But there are cases where we could avoid revalidation.
Consider: 1 input -> j2 -> j3 2 input -> j2 -> j3 3 input -> j1 -> j2 -> j3
Then the second rule does not need to revalidate j2, and, by extension j3, because this was already checked during validation of the first rule. We need to validate it only for rule 3.
This is needed because chain loop detection also ensures we do not exceed the jump stack: Just because we know that j2 is cycle free, its last jump might now exceed the allowed stack size. We also need to update all reachable chains with the new largest observed call depth.
Care has to be taken to revalidate even if the chain depth won't be an issue: chain validation also ensures that expressions are not called from invalid base chains. For example, the masquerade expression can only be called from NAT postrouting base chains.
Therefore we also need to keep record of the base chain context (type, hooknum) and revalidate if the chain becomes reachable from a different hook location.
Reported-by: Hamza Mahfooz hamzamahfooz@linux.microsoft.com Closes: https://lore.kernel.org/netfilter-devel/20251118221735.GA5477@linuxonhyperv3... Tested-by: Hamza Mahfooz hamzamahfooz@linux.microsoft.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
## Commit Analysis: netfilter: nf_tables: avoid chain re-validation if possible
### 1. COMMIT MESSAGE ANALYSIS
**Bug Indicators:** - Reports CPU soft lock-ups in `nft_chain_validate()` - a serious denial-of-service condition - Stack trace shows recursive validation causing 27+ second lockups - Has `Reported-by:` and `Tested-by:` tags from Hamza Mahfooz (Microsoft) - Has `Closes:` link to lore.kernel.org bug report
The commit explains that chain validation currently traverses the entire table graph, causing exponential complexity when chains are reachable through multiple paths. This is a clear performance/DoS bug, not a feature addition.
### 2. CODE CHANGE ANALYSIS
**Changes to nf_tables.h:** - Adds new `struct nft_chain_validate_state` with hook_mask and depth tracking - Adds `vstate` field to `struct nft_chain` for validation state caching - Changes `nft_chain_validate()` signature to allow chain modification
**Changes to nf_tables_api.c:** - `nft_chain_vstate_valid()`: New function checking if chain is already validated for current context - `nft_chain_vstate_update()`: Updates chain's validation state after successful validation - `nft_chain_validate()`: Added early-exit when chain already validated, added depth/base-chain checks - `nft_table_validate()`: Added cleanup loop to clear vstate after validation completes
**The technical fix:** Implements memoization to avoid redundant chain validation. If a chain has already been validated at the current call depth and hook context, skip re-validation.
### 3. CLASSIFICATION
This is a **bug fix** for a denial-of-service condition. While it adds new structures for state tracking, the purpose is purely to fix the exponential validation complexity causing soft lockups.
### 4. SCOPE AND RISK ASSESSMENT
**Size:** ~60 lines added across 2 files **Complexity:** Moderate - adds memoization logic with proper cleanup
**Risks:** - Changes validation logic which is security-critical - If memoization is incorrect, could theoretically allow invalid chains or block valid ones - New field in `struct nft_chain`
**Mitigations:** - Well-structured with defensive checks (`WARN_ON_ONCE`, `BUILD_BUG_ON`) - Cleanup loop at end of `nft_table_validate()` ensures no stale state - vstate only used during control plane commit phase - Trusted author (Florian Westphal, netfilter maintainer)
### 5. USER IMPACT
**Severity: HIGH** - CPU soft lockups make the system unresponsive for 27+ seconds - Triggered by iptables-nft/nftables operations with complex rulesets - Affects any user managing firewalls with multiple jump paths between chains - Could potentially be exploited for DoS if unprivileged users can modify rules
### 6. STABILITY INDICATORS
- `Reported-by:` and `Tested-by:` from the same reporter confirms the fix works - Author is a well-known netfilter maintainer - Clear technical explanation in commit message - The fix targets existing core netfilter code that exists in stable trees
### 7. DEPENDENCY CHECK
Looking at the code, this doesn't appear to depend on other recent commits. It uses existing macros and structures (`nft_is_base_chain`, `nft_base_chain`, etc.) that should exist in stable kernels.
### DECISION ANALYSIS
**Pro-backport:** 1. Fixes real user-reported DoS (CPU soft lockup for 27+ seconds) 2. Tested by reporter, confirmed to resolve the issue 3. Authored by trusted netfilter maintainer 4. Netfilter is critical infrastructure - firewall lockups are serious 5. The bug affects real-world complex rulesets
**Against-backport:** 1. Larger than typical stable patches (~60 lines) 2. Adds new data structure and field 3. Touches security-critical validation path 4. Some complexity in the memoization logic
### CONCLUSION
Despite being larger than a typical stable fix, this commit addresses a **severe denial-of-service condition** (soft lockups) in critical firewall infrastructure. The 27+ second lockups render systems unusable and this has real user impact. The fix is well-designed with proper cleanup, tested by the reporter, and authored by a trusted netfilter maintainer. The stable kernel rules explicitly allow fixes for "serious crash, deadlock" issues - soft lockups fall into this category.
The benefit (fixing DoS) significantly outweighs the risk, and the code quality/testing gives confidence in the fix's correctness.
**YES**
include/net/netfilter/nf_tables.h | 34 +++++++++++---- net/netfilter/nf_tables_api.c | 69 +++++++++++++++++++++++++++++-- 2 files changed, 91 insertions(+), 12 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index fab7dc73f738..0e266c2d0e7f 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -1091,6 +1091,29 @@ struct nft_rule_blob { __attribute__((aligned(__alignof__(struct nft_rule_dp)))); };
+enum nft_chain_types { + NFT_CHAIN_T_DEFAULT = 0, + NFT_CHAIN_T_ROUTE, + NFT_CHAIN_T_NAT, + NFT_CHAIN_T_MAX +}; + +/** + * struct nft_chain_validate_state - validation state + * + * If a chain is encountered again during table validation it is + * possible to avoid revalidation provided the calling context is + * compatible. This structure stores relevant calling context of + * previous validations. + * + * @hook_mask: the hook numbers and locations the chain is linked to + * @depth: the deepest call chain level the chain is linked to + */ +struct nft_chain_validate_state { + u8 hook_mask[NFT_CHAIN_T_MAX]; + u8 depth; +}; + /** * struct nft_chain - nf_tables chain * @@ -1109,6 +1132,7 @@ struct nft_rule_blob { * @udlen: user data length * @udata: user data in the chain * @blob_next: rule blob pointer to the next in the chain + * @vstate: validation state */ struct nft_chain { struct nft_rule_blob __rcu *blob_gen_0; @@ -1128,9 +1152,10 @@ struct nft_chain {
/* Only used during control plane commit phase: */ struct nft_rule_blob *blob_next; + struct nft_chain_validate_state vstate; };
-int nft_chain_validate(const struct nft_ctx *ctx, const struct nft_chain *chain); +int nft_chain_validate(const struct nft_ctx *ctx, struct nft_chain *chain); int nft_setelem_validate(const struct nft_ctx *ctx, struct nft_set *set, const struct nft_set_iter *iter, struct nft_elem_priv *elem_priv); @@ -1138,13 +1163,6 @@ int nft_set_catchall_validate(const struct nft_ctx *ctx, struct nft_set *set); int nf_tables_bind_chain(const struct nft_ctx *ctx, struct nft_chain *chain); void nf_tables_unbind_chain(const struct nft_ctx *ctx, struct nft_chain *chain);
-enum nft_chain_types { - NFT_CHAIN_T_DEFAULT = 0, - NFT_CHAIN_T_ROUTE, - NFT_CHAIN_T_NAT, - NFT_CHAIN_T_MAX -}; - /** * struct nft_chain_type - nf_tables chain type info * diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index eed434e0a970..7fbfa1e5d27c 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -123,6 +123,29 @@ static void nft_validate_state_update(struct nft_table *table, u8 new_validate_s
table->validate_state = new_validate_state; } + +static bool nft_chain_vstate_valid(const struct nft_ctx *ctx, + const struct nft_chain *chain) +{ + const struct nft_base_chain *base_chain; + enum nft_chain_types type; + u8 hooknum; + + if (WARN_ON_ONCE(!nft_is_base_chain(ctx->chain))) + return false; + + base_chain = nft_base_chain(ctx->chain); + hooknum = base_chain->ops.hooknum; + type = base_chain->type->type; + + /* chain is already validated for this call depth */ + if (chain->vstate.depth >= ctx->level && + chain->vstate.hook_mask[type] & BIT(hooknum)) + return true; + + return false; +} + static void nf_tables_trans_destroy_work(struct work_struct *w);
static void nft_trans_gc_work(struct work_struct *work); @@ -4079,6 +4102,29 @@ static void nf_tables_rule_release(const struct nft_ctx *ctx, struct nft_rule *r nf_tables_rule_destroy(ctx, rule); }
+static void nft_chain_vstate_update(const struct nft_ctx *ctx, struct nft_chain *chain) +{ + const struct nft_base_chain *base_chain; + enum nft_chain_types type; + u8 hooknum; + + /* ctx->chain must hold the calling base chain. */ + if (WARN_ON_ONCE(!nft_is_base_chain(ctx->chain))) { + memset(&chain->vstate, 0, sizeof(chain->vstate)); + return; + } + + base_chain = nft_base_chain(ctx->chain); + hooknum = base_chain->ops.hooknum; + type = base_chain->type->type; + + BUILD_BUG_ON(BIT(NF_INET_NUMHOOKS) > U8_MAX); + + chain->vstate.hook_mask[type] |= BIT(hooknum); + if (chain->vstate.depth < ctx->level) + chain->vstate.depth = ctx->level; +} + /** nft_chain_validate - loop detection and hook validation * * @ctx: context containing call depth and base chain @@ -4088,15 +4134,25 @@ static void nf_tables_rule_release(const struct nft_ctx *ctx, struct nft_rule *r * and set lookups until either the jump limit is hit or all reachable * chains have been validated. */ -int nft_chain_validate(const struct nft_ctx *ctx, const struct nft_chain *chain) +int nft_chain_validate(const struct nft_ctx *ctx, struct nft_chain *chain) { struct nft_expr *expr, *last; struct nft_rule *rule; int err;
+ BUILD_BUG_ON(NFT_JUMP_STACK_SIZE > 255); if (ctx->level == NFT_JUMP_STACK_SIZE) return -EMLINK;
+ if (ctx->level > 0) { + /* jumps to base chains are not allowed. */ + if (nft_is_base_chain(chain)) + return -ELOOP; + + if (nft_chain_vstate_valid(ctx, chain)) + return 0; + } + list_for_each_entry(rule, &chain->rules, list) { if (fatal_signal_pending(current)) return -EINTR; @@ -4117,6 +4173,7 @@ int nft_chain_validate(const struct nft_ctx *ctx, const struct nft_chain *chain) } }
+ nft_chain_vstate_update(ctx, chain); return 0; } EXPORT_SYMBOL_GPL(nft_chain_validate); @@ -4128,7 +4185,7 @@ static int nft_table_validate(struct net *net, const struct nft_table *table) .net = net, .family = table->family, }; - int err; + int err = 0;
list_for_each_entry(chain, &table->chains, list) { if (!nft_is_base_chain(chain)) @@ -4137,12 +4194,16 @@ static int nft_table_validate(struct net *net, const struct nft_table *table) ctx.chain = chain; err = nft_chain_validate(&ctx, chain); if (err < 0) - return err; + goto err;
cond_resched(); }
- return 0; +err: + list_for_each_entry(chain, &table->chains, list) + memset(&chain->vstate, 0, sizeof(chain->vstate)); + + return err; }
int nft_setelem_validate(const struct nft_ctx *ctx, struct nft_set *set,
Sasha Levin sashal@kernel.org wrote:
- if (WARN_ON_ONCE(!nft_is_base_chain(ctx->chain)))
return false;
This WARN will fire unless you also pick up a67fd55f6a09 ("netfilter: nf_tables: remove redundant chain validation on register store")
Alternatively, drop the WARN: "if (!nft_is_base_chain...".
From: Fei Shao fshao@chromium.org
[ Upstream commit 8c04b77f87e6e321ae6acd28ce1de5553916153f ]
This driver is migrated to use threaded IRQ since commit 5972eb05ca32 ("spi: spi-mt65xx: Use threaded interrupt for non-SPIMEM transfer"), and we almost always want to disable the interrupt line to avoid excess interrupts while the threaded handler is processing SPI transfer. Use IRQF_ONESHOT for that purpose.
In practice, we see MediaTek devices show SPI transfer timeout errors when communicating with ChromeOS EC in certain scenarios, and with IRQF_ONESHOT, the issue goes away.
Signed-off-by: Fei Shao fshao@chromium.org Link: https://patch.msgid.link/20251217101131.1975131-1-fshao@chromium.org Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
### 4. SCOPE AND RISK ASSESSMENT
- **Lines changed:** 1 line - **Files touched:** 1 file - **Complexity:** Trivially low - just a flag change - **Risk:** Very low
The fix is: - **Obviously correct**: IRQF_ONESHOT is the documented standard for threaded interrupts - **Small and contained**: One flag change - **Fixes a real bug**: Transfer timeout errors on real hardware
Looking at the interrupt handler code: - `mtk_spi_interrupt` returns `IRQ_WAKE_THREAD` for non-SPIMEM transfers - Without IRQF_ONESHOT, the IRQ line stays enabled, allowing repeated interrupts while the thread processes - This causes race conditions on `mdata->state` and transfer handling, leading to timeouts
### 5. USER IMPACT
- **Who is affected:** Users of MediaTek SPI devices, particularly ChromeOS users with embedded controllers - **Severity:** Medium-High - SPI transfer timeouts can cause significant functionality issues - **Real-world impact:** Confirmed on MediaTek devices communicating with ChromeOS EC
### 6. DEPENDENCY CHECK
The prerequisite commit `5972eb05ca32` ("spi: spi-mt65xx: Use threaded interrupt for non-SPIMEM transfer") was first included in **v6.12-rc1**. This means: - This fix is only applicable to 6.12.y stable trees and newer - Older stable trees (6.6.y, 6.1.y, 5.15.y) don't have the threaded IRQ code, so this fix doesn't apply
### 7. SUMMARY
| Criterion | Assessment | |-----------|------------| | Fixes real bug | ✅ Yes - SPI transfer timeouts | | Obviously correct | ✅ Yes - standard pattern | | Small and contained | ✅ Yes - 1 line change | | Tested | ✅ Yes - real hardware testing | | Risk | ✅ Very low | | Dependencies | ✅ Clear - requires 5972eb05ca32 (v6.12+) |
### Conclusion
This commit is an excellent candidate for stable backporting:
1. **Fixes a real, user-visible bug**: SPI transfer timeouts on MediaTek hardware 2. **Minimal change**: Single flag addition, no logic changes 3. **Obviously correct**: IRQF_ONESHOT is the proper pattern for threaded IRQs with hardirq handlers that return IRQ_WAKE_THREAD 4. **Low risk**: Well-understood kernel mechanism with no side effects 5. **Tested**: Confirmed to resolve the issue on real ChromeOS EC hardware
The only consideration is that this fix only applies to stable trees with kernel 6.12 or later (which contain the prerequisite threaded IRQ commit). For those trees, this is a straightforward, low-risk fix that resolves real hardware issues.
**YES**
drivers/spi/spi-mt65xx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/spi/spi-mt65xx.c b/drivers/spi/spi-mt65xx.c index 4b40985af1ea..90e5813cfdc3 100644 --- a/drivers/spi/spi-mt65xx.c +++ b/drivers/spi/spi-mt65xx.c @@ -1320,7 +1320,7 @@ static int mtk_spi_probe(struct platform_device *pdev)
ret = devm_request_threaded_irq(dev, irq, mtk_spi_interrupt, mtk_spi_interrupt_thread, - IRQF_TRIGGER_NONE, dev_name(dev), host); + IRQF_ONESHOT, dev_name(dev), host); if (ret) return dev_err_probe(dev, ret, "failed to register irq\n");
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
[ Upstream commit 5d5602236f5db19e8b337a2cd87a90ace5ea776d ]
syzbot is still reporting
unregister_netdevice: waiting for vcan0 to become free. Usage count = 2
even after commit 93a27b5891b8 ("can: j1939: add missing calls in NETDEV_UNREGISTER notification handler") was added. A debug printk() patch found that j1939_session_activate() can succeed even after j1939_cancel_active_session() from j1939_netdev_notify(NETDEV_UNREGISTER) has completed.
Since j1939_cancel_active_session() is processed with the session list lock held, checking ndev->reg_state in j1939_session_activate() with the session list lock held can reliably close the race window.
Reported-by: syzbot syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84 Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Acked-by: Oleksij Rempel o.rempel@pengutronix.de Link: https://patch.msgid.link/b9653191-d479-4c8b-8536-1326d028db5c@I-love.SAKURA.... Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
---
## FINAL ANALYSIS SUMMARY
### What the commit fixes: A race condition in the CAN J1939 protocol driver where `j1939_session_activate()` can succeed even after `j1939_cancel_active_session()` from the NETDEV_UNREGISTER handler has completed. This leaves orphaned sessions with references to devices being unregistered, causing the "waiting for device to become free" hang.
### Why it matters to stable users: - **System hang**: The bug can cause network device unregistration to hang indefinitely - **Syzbot reproducible**: The bug was found by automated fuzzing, meaning it's triggerable in practice - **CAN J1939 users**: Affects automotive and industrial systems using the CAN J1939 protocol
### Stable kernel rule compliance:
| Criterion | Assessment | |-----------|------------| | Obviously correct | ✅ Yes - simple check using established kernel pattern | | Fixes real bug | ✅ Yes - syzbot-reported hang | | Small and contained | ✅ Yes - 2 lines added, 1 file changed | | No new features | ✅ Correct - only adds error path | | Tested | ✅ Has Acked-by from maintainer |
### Risk assessment: - **Very low risk**: The change only adds a conditional check before adding a session to the active list - **Uses established pattern**: `dev->reg_state != NETREG_REGISTERED` is used throughout net/can/raw.c and other networking code - **Cannot break existing functionality**: Adds a new error return path that's already properly handled (callers handle error returns)
### Dependencies: - The fix is logically independent - it adds a safety check that works regardless of other commits - The related commit 93a27b5891b8 is about a different part of the fix (adding missing calls in the unregister handler) - Both commits are improvements to the same underlying issue but can be applied separately
### Conclusion: This is a textbook example of a good stable backport candidate: 1. Fixes a real, user-visible bug (system hang) 2. Minimal change (2 lines) 3. Uses established kernel patterns 4. Has proper subsystem maintainer review 5. Low risk of regression
**YES**
net/can/j1939/transport.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c index fbf5c8001c9d..613a911dda10 100644 --- a/net/can/j1939/transport.c +++ b/net/can/j1939/transport.c @@ -1567,6 +1567,8 @@ int j1939_session_activate(struct j1939_session *session) if (active) { j1939_session_put(active); ret = -EAGAIN; + } else if (priv->ndev->reg_state != NETREG_REGISTERED) { + ret = -ENODEV; } else { WARN_ON_ONCE(session->state != J1939_SESSION_NEW); list_add_tail(&session->active_session_list_entry,
From: Sumeet Pawnikar sumeet4linux@gmail.com
[ Upstream commit 7bda1910c4bccd4b8d4726620bb3d6bbfb62286e ]
The device becomes visible to userspace via device_register() even before it fully initialized by idr_init(). If userspace or another thread tries to register a zone immediately after device_register(), the control_type_valid() will fail because the control_type is not yet in the list. The IDR is not yet initialized, so this race condition causes zone registration failure.
Move idr_init() and list addition before device_register() fix the race condition.
Signed-off-by: Sumeet Pawnikar sumeet4linux@gmail.com [ rjw: Subject adjustment, empty line added ] Link: https://patch.msgid.link/20251205190216.5032-1-sumeet4linux@gmail.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
This confirms the race condition mechanism:
1. `control_type_valid()` walks `powercap_cntrl_list` to validate control types (lines 329-345) 2. Called at `powercap_register_zone()` entry (line 497) 3. **Race window**: After `device_register()` but before `list_add_tail()`, any zone registration fails because the control_type isn't in the list yet
### 8. FINAL ASSESSMENT
| Criteria | Assessment | |----------|------------| | Fixes a real bug | ✅ Race condition causing zone registration failure | | Obviously correct | ✅ Classic init-before-publish pattern | | Small and contained | ✅ ~20 lines in one function | | No new features | ✅ Only reorders existing code | | No new APIs | ✅ No userspace-visible changes | | Low regression risk | ✅ Standard pattern, proper error handling | | Affects stable users | ✅ Powercap used on many Intel systems |
**Why this matters to stable users:** - Intel RAPL (Running Average Power Limit) uses powercap - Common on laptops and servers for power management - Race can cause power capping features to fail during boot/module load - This is an existing bug that has been present since powercap was introduced
**Risk vs Benefit:** - **Benefit**: Fixes a real race that causes functional failures - **Risk**: Very low - the fix follows standard kernel patterns and adds proper cleanup - **Trade-off**: Strongly favors backporting
**Concerns:** - No dependencies on other commits - The change is self-contained and should apply cleanly to stable trees - The powercap subsystem has existed since kernel 3.11, so applicable to all current stable trees
The fix is small, surgical, obviously correct, and addresses a real race condition that can cause user-visible failures. It meets all stable kernel criteria.
**YES**
drivers/powercap/powercap_sys.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/powercap/powercap_sys.c b/drivers/powercap/powercap_sys.c index 4112a0097338..d14b36b75189 100644 --- a/drivers/powercap/powercap_sys.c +++ b/drivers/powercap/powercap_sys.c @@ -625,17 +625,23 @@ struct powercap_control_type *powercap_register_control_type( INIT_LIST_HEAD(&control_type->node); control_type->dev.class = &powercap_class; dev_set_name(&control_type->dev, "%s", name); - result = device_register(&control_type->dev); - if (result) { - put_device(&control_type->dev); - return ERR_PTR(result); - } idr_init(&control_type->idr);
mutex_lock(&powercap_cntrl_list_lock); list_add_tail(&control_type->node, &powercap_cntrl_list); mutex_unlock(&powercap_cntrl_list_lock);
+ result = device_register(&control_type->dev); + if (result) { + mutex_lock(&powercap_cntrl_list_lock); + list_del(&control_type->node); + mutex_unlock(&powercap_cntrl_list_lock); + + idr_destroy(&control_type->idr); + put_device(&control_type->dev); + return ERR_PTR(result); + } + return control_type; } EXPORT_SYMBOL_GPL(powercap_register_control_type);
From: Caleb Sander Mateos csander@purestorage.com
[ Upstream commit ccb8a3c08adf8121e2afb8e704f007ce99324d79 ]
The PI tuple must be contained within the metadata value, so validate that pi_offset + pi_tuple_size <= metadata_size. This guards against block drivers that report invalid pi_offset values.
Signed-off-by: Caleb Sander Mateos csander@purestorage.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
## Summary
### Problem Being Fixed The commit fixes an **incomplete bounds validation bug** in the block layer's integrity limit validation code. When the PI (Protection Information) tuple can be placed at a non-zero offset within the metadata buffer (a feature added in kernel 6.9), the validation check only verified that `pi_tuple_size <= metadata_size`, but **completely ignored the pi_offset**.
This is mathematically incorrect. If: - `metadata_size = 8` bytes - `pi_tuple_size = 8` bytes - `pi_offset = 4` bytes
The old check (`8 > 8`) passes, but the PI tuple would extend 4 bytes beyond the buffer boundary, potentially causing out-of-bounds memory access.
### Why It Matters to Stable Users - **Data Integrity Risk**: Block layer integrity/PI is used in enterprise storage environments (NVMe, SAS with T10 PI) - **Defense Against Buggy Drivers**: The fix prevents the kernel from accepting invalid configurations from misbehaving block drivers - **Potential for Memory Corruption**: Without this validation, accessing PI data could read/write beyond buffer bounds
### Stable Kernel Criteria Assessment
| Criterion | Met? | Notes | |-----------|------|-------| | Obviously correct | ✅ | Mathematical correctness: offset + size must fit in buffer | | Fixes real bug | ✅ | Incomplete bounds check could allow invalid configurations | | Small scope | ✅ | 4 lines changed, single file | | No new features | ✅ | Only strengthens existing validation | | No API changes | ✅ | No user-visible changes |
### Risk vs Benefit
**Risk**: Extremely low - The change only makes validation stricter - Can only reject configurations that were previously (incorrectly) accepted - Any configuration rejected by the new check was already semantically invalid and potentially dangerous
**Benefit**: Moderate to high - Prevents kernel from accepting invalid integrity configurations - Guards against memory corruption in PI-enabled storage stacks - Important for enterprise environments using DIF/PI
### Concerns
**Applicability**: The `pi_offset` field was introduced in kernel 6.9 (commit 60d21aac52e2). This fix is only applicable to stable kernels 6.9.y and later.
**Dependencies**: None - this is a standalone validation fix.
### Quality Indicators - Reviewed-by: Christoph Hellwig (highly respected kernel developer) - Signed-off-by: Jens Axboe (block layer maintainer) - Part of a series of validation improvements (similar commit for `interval_exp`) - No follow-up fixes needed
The fix is small, surgical, obviously correct, and addresses a real validation gap that could lead to memory safety issues. It meets all stable kernel criteria.
**YES**
block/blk-settings.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c index d74b13ec8e54..f2c1940fe6f1 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -148,10 +148,9 @@ static int blk_validate_integrity_limits(struct queue_limits *lim) return -EINVAL; }
- if (bi->pi_tuple_size > bi->metadata_size) { - pr_warn("pi_tuple_size (%u) exceeds metadata_size (%u)\n", - bi->pi_tuple_size, - bi->metadata_size); + if (bi->pi_offset + bi->pi_tuple_size > bi->metadata_size) { + pr_warn("pi_offset (%u) + pi_tuple_size (%u) exceeds metadata_size (%u)\n", + bi->pi_offset, bi->pi_tuple_size, bi->metadata_size); return -EINVAL; }
From: Charlene Liu Charlene.Liu@amd.com
[ Upstream commit 3886b198bd6e49c801fe9552fcfbfc387a49fbbc ]
[why] need to enable APG_CLOCK_ENABLE enable first also need to wake up az from D3 before access az block
Reviewed-by: Swapnil Patel swapnil.patel@amd.com Signed-off-by: Charlene Liu Charlene.Liu@amd.com Signed-off-by: Chenyu Chen chen-yu.chen@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit bf5e396957acafd46003318965500914d5f4edfa) Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
### 3. TECHNICAL ANALYSIS
**The Bug:** The original code had an incorrect operation sequence: 1. Called `az_enable()` - attempts to access audio controller registers 2. Called `enable_pme_wa()` - wakes audio controller from D3 power state
This is backwards - you cannot access hardware registers while the device is still in D3 (sleeping). The hardware must be woken up FIRST.
**The Fix:** The fix simply reorders these operations: 1. Call `enable_pme_wa()` - wake audio controller from D3 first 2. Call `az_enable()` - now safe to access registers
### 4. SCOPE AND RISK ASSESSMENT
- **Lines changed:** ~8 lines (just reordering) - **Files touched:** 1 file (dce110_hwseq.c) - **Complexity:** Very low - no new logic, just reordering existing function calls - **Subsystem:** AMD display driver, well-tested and mature - **Risk:** Minimal - same operations, correct sequence
### 5. USER IMPACT
- **Affected users:** AMD GPU users with DisplayPort audio output - **Severity:** High - complete loss of audio functionality - **Usage scenario:** Very common (monitors with speakers, AV receivers, docking stations) - **Symptoms:** "DP no audio issue" - a clearly user-visible bug
### 6. STABILITY INDICATORS
- Reviewed-by: Swapnil Patel (AMD engineer) - Tested-by: Daniel Wheeler (AMD engineer) - Multiple sign-offs from AMD display team - Cherry-picked from mainline - indicates AMD considers it important
### 7. DEPENDENCY CHECK
The dce110_hwseq.c file has been in the kernel for years and the `dce110_enable_audio_stream` function is stable code. The fix only reorders existing calls to `enable_pme_wa()` and `az_enable()` - both of which already exist in stable trees.
## Summary
**Meets stable criteria:** - ✅ **Obviously correct:** Simple reordering to match hardware requirements - ✅ **Fixes real bug:** Complete audio failure on DisplayPort is a significant user-visible bug - ✅ **Small and contained:** ~8 lines changed, same functions, just different order - ✅ **No new features:** No new functionality added - ✅ **Tested:** Has Tested-by and Reviewed-by tags from AMD engineers - ✅ **No dependencies:** Uses existing functions already in stable trees
**Risk vs Benefit:** - **Benefit:** Restores DisplayPort audio functionality for affected AMD GPU users - **Risk:** Very low - the same operations are performed, just in the correct hardware-required sequence
This is an ideal stable backport candidate: a small, surgical fix for a clear user-visible bug (no audio on DP), with proper code review and testing from the hardware vendor.
**YES**
drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c index ebc220b29d14..0bf98d834e61 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c @@ -1098,13 +1098,13 @@ void dce110_enable_audio_stream(struct pipe_ctx *pipe_ctx) if (dc->current_state->res_ctx.pipe_ctx[i].stream_res.audio != NULL) num_audio++; } + if (num_audio >= 1 && clk_mgr->funcs->enable_pme_wa) { + /*wake AZ from D3 first before access az endpoint*/ + clk_mgr->funcs->enable_pme_wa(clk_mgr); + }
pipe_ctx->stream_res.audio->funcs->az_enable(pipe_ctx->stream_res.audio);
- if (num_audio >= 1 && clk_mgr->funcs->enable_pme_wa) - /*this is the first audio. apply the PME w/a in order to wake AZ from D3*/ - clk_mgr->funcs->enable_pme_wa(clk_mgr); - link_hwss->enable_audio_packet(pipe_ctx);
if (pipe_ctx->stream_res.audio)
From: Niklas Cassel cassel@kernel.org
[ Upstream commit ba624ba88d9f5c3e2ace9bb6697dbeb05b2dbc44 ]
According to a user report, the ST2000DM008-2FR102 has problems with LPM.
Reported-by: Emerson Pinter e@pinter.dev Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220693 Signed-off-by: Niklas Cassel cassel@kernel.org Signed-off-by: Damien Le Moal dlemoal@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
## Analysis of Commit: Disable LPM on ST2000DM008-2FR102
### 1. COMMIT MESSAGE ANALYSIS
The commit message is clear and provides essential information: - **Purpose:** Disabling Link Power Management (LPM) for a specific Seagate hard drive model - **Evidence of real issue:** Has `Reported-by:` tag and references kernel bugzilla #220693 - **Proper sign-offs:** Signed by both the patch author (Niklas Cassel) and the libata maintainer (Damien Le Moal)
### 2. CODE CHANGE ANALYSIS
The change adds exactly 3 lines to the `__ata_dev_quirks[]` array in `drivers/ata/libata-core.c`:
```c /* Seagate disks with LPM issues */ { "ST2000DM008-2FR102", NULL, ATA_QUIRK_NOLPM }, ```
This follows the exact same pattern as existing quirks in the table. Looking at the context, there's already another Seagate drive with `ATA_QUIRK_NOLPM`: ```c { "ST1000LM024 HN-M101MBB", NULL, ATA_QUIRK_BROKEN_FPDMA_AA | ATA_QUIRK_NOLPM }, ```
The mechanism is well-understood: when this specific drive model is detected, LPM is disabled to avoid the hardware-specific issues it causes.
### 3. CLASSIFICATION
This is a **hardware quirk/workaround** - one of the explicit exceptions allowed in stable: - It's adding a device-specific workaround to an existing quirk table - Identical pattern to dozens of other quirks already in the table - Fixes real-world hardware issues even though it "adds code"
### 4. SCOPE AND RISK ASSESSMENT
| Metric | Assessment | |--------|------------| | Lines changed | 3 (minimal) | | Files touched | 1 | | Complexity | Trivial - data table addition | | Risk to other users | Zero - only affects this specific drive model | | Regression potential | None - well-established mechanism |
The risk is essentially zero. This is a data-only change to a static array. The quirk mechanism has been stable for many years and is used extensively for similar hardware workarounds.
### 5. USER IMPACT
- **Affected users:** Owners of ST2000DM008-2FR102 (2TB Seagate Barracuda) - a popular consumer HDD - **Without the fix:** LPM-related issues typically cause drive disconnects, I/O errors, or system hangs - **Severity:** Medium to High - LPM issues can cause data corruption or loss in severe cases - **Real bug evidence:** Bugzilla report indicates actual users hitting this problem
### 6. STABILITY INDICATORS
- Approved by libata maintainer (Damien Le Moal) - Proper bug tracking with Closes: tag - Reported-by indicates real user testing/reporting
### 7. DEPENDENCY CHECK
- **Dependencies:** None - standalone quirk addition - **Code exists in stable:** Yes - the quirk table and `ATA_QUIRK_NOLPM` exist in all maintained stable kernels - **Backport complexity:** Trivial - should apply cleanly to any stable kernel
### FINAL ASSESSMENT
This commit is a textbook example of what SHOULD be backported to stable:
**Meets all stable criteria:** 1. ✅ Obviously correct - identical pattern to existing quirks 2. ✅ Fixes a real bug - LPM issues reported by users 3. ✅ Small and contained - 3 lines, 1 file 4. ✅ No new features - just a hardware workaround 5. ✅ Falls into "quirks/workarounds" exception category
**Risk vs Benefit:** - **Risk:** Essentially zero - only affects one specific drive model, cannot break anything else - **Benefit:** Fixes LPM-related issues (drive disconnects, errors, potential data corruption) for users of this popular Seagate drive
This is exactly the type of low-risk, high-value hardware workaround that stable trees are designed to include.
**YES**
drivers/ata/libata-core.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index f48fb63d7e85..1216b4f2eb90 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -4132,6 +4132,9 @@ static const struct ata_dev_quirks_entry __ata_dev_quirks[] = { { "ST3320[68]13AS", "SD1[5-9]", ATA_QUIRK_NONCQ | ATA_QUIRK_FIRMWARE_WARN },
+ /* Seagate disks with LPM issues */ + { "ST2000DM008-2FR102", NULL, ATA_QUIRK_NOLPM }, + /* drives which fail FPDMA_AA activation (some may freeze afterwards) the ST disks also have LPM issues */ { "ST1000LM024 HN-M101MBB", NULL, ATA_QUIRK_BROKEN_FPDMA_AA |
From: "Mario Limonciello (AMD)" superm1@kernel.org
[ Upstream commit 7bbf6d15e935abbb3d604c1fa157350e84a26f98 ]
SVA support is required, which isn't configured by hypervisor solutions.
Closes: https://github.com/QubesOS/qubes-issues/issues/10275 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4656 Reviewed-by: Lizhi Hou lizhi.hou@amd.com Link: https://patch.msgid.link/20251213054513.87925-1-superm1@kernel.org Signed-off-by: Mario Limonciello (AMD) superm1@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
## Commit Analysis: accel/amdxdna: Block running under a hypervisor
### 1. COMMIT MESSAGE ANALYSIS
The commit message indicates this fix prevents the amdxdna driver from running under hypervisors because SVA (Shared Virtual Addressing) support is required but not configured in hypervisor solutions. Two bug tracker links are provided: - QubesOS issue #10275 - freedesktop.org AMD issue #4656
This shows real users are hitting this problem.
### 2. CODE CHANGE ANALYSIS
The change is minimal:
```c +#include <asm/hypervisor.h> ... + if (!hypervisor_is_type(X86_HYPER_NATIVE)) { + XDNA_ERR(xdna, "Running under hypervisor not supported"); + return -EINVAL; + } ```
The fix adds an early check in `aie2_init()` that: 1. Uses the well-established x86 hypervisor detection infrastructure 2. If not running on bare metal (native), prints an error and returns -EINVAL 3. This happens before any resource allocation, making it a clean early- exit
**The bug mechanism:** Without this check, when users run this driver in virtualized environments (QubesOS, etc.), the driver attempts to initialize but fails due to missing SVA support. This leads to confusing errors, potential crashes, or undefined behavior. The fix makes the driver fail gracefully with a clear message.
### 3. CLASSIFICATION
This is a **bug fix** - specifically a "graceful failure" fix that prevents the driver from attempting an unsupported configuration. It does not add features; it blocks an unsupported environment with a clear error.
### 4. SCOPE AND RISK ASSESSMENT
- **Lines changed:** 5 lines (1 include + 4 lines of logic) - **Files touched:** 1 file - **Complexity:** Very low - trivial conditional check - **Risk:** Very low - early return before any resource allocation - **Dependencies:** Uses `hypervisor_is_type()` and `X86_HYPER_NATIVE` which have been in the kernel for years (x86 hypervisor detection is mature infrastructure)
### 5. USER IMPACT
- **Affected users:** Those running VMs (QubesOS, etc.) with AMD XDNA hardware - **Severity without fix:** Confusing failures, crashes, or undefined behavior when attempting to use the driver - **Severity with fix:** Clean error message explaining the limitation
The linked issues show this is a real problem users are encountering.
### 6. STABILITY INDICATORS
- Has `Reviewed-by:` from AMD engineer (Lizhi Hou) - Has real bug reports from multiple sources - Simple, obvious fix
### 7. DEPENDENCY CHECK
The amdxdna driver is relatively new (added around kernel 6.11-6.12). It exists in: - Recent mainline releases - Potentially only very recent stable trees (6.12.x if supported)
The hypervisor detection APIs used (`hypervisor_is_type`, `X86_HYPER_NATIVE`) are stable and have existed for years.
### 8. STABLE CRITERIA ASSESSMENT
| Criterion | Assessment | |-----------|------------| | Obviously correct | ✅ Yes - trivial check | | Fixes real bug | ✅ Yes - prevents crashes in VMs | | Important issue | ✅ Yes - affects VM users | | Small/contained | ✅ Yes - 5 lines | | No new features | ✅ Yes - disables unsupported config | | Low risk | ✅ Yes - clean early return |
### CONCLUSION
This commit is appropriate for stable backporting. It's a small, low- risk fix that prevents the driver from running in unsupported virtualized environments where it would otherwise fail with confusing behavior or crashes. The fix is surgically precise - adding an early check with a clean error return.
The stable maintainers will only apply this to kernel versions that contain the amdxdna driver (likely only very recent stable trees like 6.12.x). The patch itself meets all stable criteria: obviously correct, fixes a real bug that users are hitting, small scope, and no new features.
**YES**
drivers/accel/amdxdna/aie2_pci.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c index 43f725e1a2d7..6e07793bbeac 100644 --- a/drivers/accel/amdxdna/aie2_pci.c +++ b/drivers/accel/amdxdna/aie2_pci.c @@ -17,6 +17,7 @@ #include <linux/iopoll.h> #include <linux/pci.h> #include <linux/xarray.h> +#include <asm/hypervisor.h>
#include "aie2_msg_priv.h" #include "aie2_pci.h" @@ -486,6 +487,11 @@ static int aie2_init(struct amdxdna_dev *xdna) unsigned long bars = 0; int i, nvec, ret;
+ if (!hypervisor_is_type(X86_HYPER_NATIVE)) { + XDNA_ERR(xdna, "Running under hypervisor not supported"); + return -EINVAL; + } + ndev = drmm_kzalloc(&xdna->ddev, sizeof(*ndev), GFP_KERNEL); if (!ndev) return -ENOMEM;
From: Brian Kocoloski brian.kocoloski@amd.com
[ Upstream commit 969faea4e9d01787c58bab4d945f7ad82dad222d ]
Pass character "0" rather than NULL terminator to properly format queue restoration SMI events. Currently, the NULL terminator precedes the newline character that is intended to delineate separate events in the SMI event buffer, which can break userspace parsers.
Signed-off-by: Brian Kocoloski brian.kocoloski@amd.com Reviewed-by: Philip Yang Philip.Yang@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 6e7143e5e6e21f9d5572e0390f7089e6d53edf3c) Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
## Summary
### Technical Analysis
**The Bug:** Commit 663b0f1e141d introduced a refactoring that changed the inline format string to use the `KFD_EVENT_FMT_QUEUE_RESTORE` macro. This macro is defined as: ```c #define KFD_EVENT_FMT_QUEUE_RESTORE(ns, pid, node, rescheduled)\ "%lld -%d %x %c\n", (ns), (pid), (node), (rescheduled) ```
The format specifier `%c` expects a **character**. The buggy code passes integer `0` (which is the NULL terminator 0x00). When printed with `%c`, this NULL byte terminates the string before the newline `\n` gets written, causing: - Malformed event strings in the SMI buffer - Userspace parsers expecting newline-delimited events to fail
**The Fix:** Change `0` to `'0'` (ASCII character 0x30 = 48) so the `%c` format prints the character "0" followed by the newline.
**Evidence the fix is correct:** 1. The sibling function `kfd_smi_event_queue_restore_rescheduled` correctly uses `'R'` (a character) for the same parameter 2. The format specifier is `%c` which requires a character 3. The original pre-refactoring code had no character parameter at all (format was `"%lld -%d %x\n"`)
### Stable Kernel Criteria Assessment
| Criteria | Assessment | |----------|------------| | Obviously correct | ✅ Yes - format `%c` requires character, `'0'` vs `0` is clearly the fix | | Fixes real bug | ✅ Yes - breaks userspace parsers relying on newline- delimited events | | Important issue | ✅ Yes - affects userspace ABI/behavior | | Small and contained | ✅ Yes - 1 line, 1 file, single character change | | No new features | ✅ Correct - no new functionality | | Tested | ✅ Reviewed-by and cherry-picked from mainline |
### Risk Assessment
- **Risk:** Extremely low - the change is trivial and obviously correct - **Scope:** Only affects AMD GPU users using SMI event monitoring - **Regression potential:** None - this is restoring correct behavior
### Affected Versions
The bug was introduced in commit 663b0f1e141d which landed in v6.12-rc1. This fix is relevant for the 6.12.y stable branch.
### Conclusion
This is an ideal stable backport candidate: a small, obvious, low-risk fix for a user-visible bug that breaks userspace tools. The fix is trivial (single character), has clear evidence of correctness, and has been reviewed by AMD engineers. The commit message clearly explains the problem and the solution.
**YES**
drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c index a499449fcb06..d2bc169e84b0 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c @@ -312,7 +312,7 @@ void kfd_smi_event_queue_restore(struct kfd_node *node, pid_t pid) { kfd_smi_event_add(pid, node, KFD_SMI_EVENT_QUEUE_RESTORE, KFD_EVENT_FMT_QUEUE_RESTORE(ktime_get_boottime_ns(), pid, - node->id, 0)); + node->id, '0')); }
void kfd_smi_event_queue_restore_rescheduled(struct mm_struct *mm)
From: Marcus Hughes marcus.hughes@betterinternet.ltd
[ Upstream commit 71cfa7c893a05d09e7dc14713b27a8309fd4a2db ]
Some Potron SFP+ XGSPON ONU sticks are shipped with different EEPROM vendor ID and vendor name strings, but are otherwise functionally identical to the existing "Potron SFP+ XGSPON ONU Stick" handled by sfp_quirk_potron().
These modules, including units distributed under the "Better Internet" branding, use the same UART pin assignment and require the same TX_FAULT/LOS behaviour and boot delay. Re-use the existing Potron quirk for this EEPROM variant.
Signed-off-by: Marcus Hughes marcus.hughes@betterinternet.ltd Link: https://patch.msgid.link/20251207210355.333451-1-marcus.hughes@betterinterne... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
## SUMMARY
**What the commit does**: Adds a single entry to the SFP quirks array for hardware that has a different vendor/part ID ("BIDB"/"X-ONU-SFPP") but requires identical handling to already-supported Potron modules.
**Does it meet stable kernel rules?**
1. ✅ **Obviously correct**: It's a one-line addition using existing, proven quirk infrastructure 2. ✅ **Fixes a real bug**: Without this quirk, the hardware doesn't work properly (TX_FAULT/LOS pins are misinterpreted) 3. ✅ **Important issue**: Hardware that users have purchased doesn't function 4. ✅ **Small and contained**: Single line addition to an array 5. ✅ **No new features**: Just extends existing quirk to another device ID
**Dependency check**: The `sfp_fixup_potron()` function was introduced in commit `dfec1c14aece` (June 2025) and has already been backported to stable trees (confirmed by seeing backport commit `34a890983183`). This commit requires that parent commit to be present.
**Risk vs Benefit**: - **Risk**: Near zero - only affects specific hardware identified by exact vendor/part match - **Benefit**: High for affected users - enables hardware to work properly
## CONCLUSION
This commit is a textbook example of a hardware quirk addition that IS appropriate for stable backporting. It: - Uses existing, tested infrastructure - Has minimal code change (1 line) - Enables real hardware that users have in the field - Has zero risk of regression for anyone else - The parent quirk function is already in stable trees
The only caveat is that stable kernels must have the original Potron quirk commit (`dfec1c14aece`) first, which based on the git history appears to have already been backported.
**YES**
drivers/net/phy/sfp.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c index 0401fa6b24d2..6166e9196364 100644 --- a/drivers/net/phy/sfp.c +++ b/drivers/net/phy/sfp.c @@ -497,6 +497,8 @@ static const struct sfp_quirk sfp_quirks[] = { SFP_QUIRK("ALCATELLUCENT", "3FE46541AA", sfp_quirk_2500basex, sfp_fixup_nokia),
+ SFP_QUIRK_F("BIDB", "X-ONU-SFPP", sfp_fixup_potron), + // FLYPRO SFP-10GT-CS-30M uses Rollball protocol to talk to the PHY. SFP_QUIRK_F("FLYPRO", "SFP-10GT-CS-30M", sfp_fixup_rollball),
linux-stable-mirror@lists.linaro.org