From: Jiufei Xue jiufei.xue@samsung.com
[ Upstream commit d02d2c98d25793902f65803ab853b592c7a96b29 ]
A use-after-free issue occurred when __mark_inode_dirty() got a bdi_writeback that was in the process of being switched.
CPU: 1 PID: 562 Comm: systemd-random- Not tainted 6.6.56-gb4403bd46a8e #1
......
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __mark_inode_dirty+0x124/0x418
lr : __mark_inode_dirty+0x118/0x418
sp : ffffffc08c9dbbc0
........
Call trace:
 __mark_inode_dirty+0x124/0x418
 generic_update_time+0x4c/0x60
 file_modified+0xcc/0xd0
 ext4_buffered_write_iter+0x58/0x124
 ext4_file_write_iter+0x54/0x704
 vfs_write+0x1c0/0x308
 ksys_write+0x74/0x10c
 __arm64_sys_write+0x1c/0x28
 invoke_syscall+0x48/0x114
 el0_svc_common.constprop.0+0xc0/0xe0
 do_el0_svc+0x1c/0x28
 el0_svc+0x40/0xe4
 el0t_64_sync_handler+0x120/0x12c
 el0t_64_sync+0x194/0x198
Root cause is:
systemd-random-seed                     kworker
----------------------------------------------------------------------
___mark_inode_dirty                     inode_switch_wbs_work_fn

  spin_lock(&inode->i_lock);
  inode_attach_wb
  locked_inode_to_wb_and_lock_list
     get inode->i_wb
     spin_unlock(&inode->i_lock);
     spin_lock(&wb->list_lock)
  spin_lock(&inode->i_lock)
  inode_io_list_move_locked
  spin_unlock(&wb->list_lock)
  spin_unlock(&inode->i_lock)
                                        spin_lock(&old_wb->list_lock)
                                          inode_do_switch_wbs
                                            spin_lock(&inode->i_lock)
                                            inode->i_wb = new_wb
                                            spin_unlock(&inode->i_lock)
                                        spin_unlock(&old_wb->list_lock)
                                        wb_put_many(old_wb, nr_switched)
                                          cgwb_release
                                          old wb released
  wb_wakeup_delayed() accesses wb,
  then trigger the use-after-free
  issue
Fix this race condition by holding inode spinlock until wb_wakeup_delayed() finished.
Signed-off-by: Jiufei Xue jiufei.xue@samsung.com
Link: https://lore.kernel.org/20250728100715.3863241-1-jiufei.xue@samsung.com
Reviewed-by: Jan Kara jack@suse.cz
Signed-off-by: Christian Brauner brauner@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
Based on my thorough analysis of the commit and the kernel code, here's my assessment:
**Backport Status: YES**
## Extensive Explanation:
This commit fixes a **critical use-after-free vulnerability** in the writeback subsystem that should definitely be backported to stable kernel trees. Here's why:
### 1. **Real Security Bug with Clear Impact**

The commit fixes a genuine use-after-free in `__mark_inode_dirty()`. The bug manifests as a kernel crash, and the commit message includes the resulting call trace. This is not a theoretical issue: the trace was captured on a running 6.6.56 kernel.

### 2. **Race Condition Details**

The race condition occurs between two concurrent operations:

- **Thread A** (`__mark_inode_dirty`): Obtains a pointer to `inode->i_wb`, releases the inode lock, then calls `wb_wakeup_delayed(wb)`
- **Thread B** (`inode_switch_wbs_work_fn`): Switches the inode's writeback context, releases the old wb via `wb_put_many()`, which can trigger `cgwb_release` and free the wb structure

The vulnerability window exists because Thread A accesses the wb structure (`wb_wakeup_delayed(wb)`) after releasing the inode lock, while Thread B can switch the inode away from that wb and free it in parallel.
### 3. **Minimal and Contained Fix**

The fix is simple and surgical. It only moves the point at which the locks are released:

```c
-	spin_unlock(&wb->list_lock);
-	spin_unlock(&inode->i_lock);
-	trace_writeback_dirty_inode_enqueue(inode);
-
 	if (wakeup_bdi &&
 	    (wb->bdi->capabilities & BDI_CAP_WRITEBACK))
 		wb_wakeup_delayed(wb);
+
+	spin_unlock(&wb->list_lock);
+	spin_unlock(&inode->i_lock);
+	trace_writeback_dirty_inode_enqueue(inode);
+
```

The fix ensures that `wb_wakeup_delayed()` is called while the locks are still held. Because the switching worker must take those same locks before it can change `inode->i_wb` and drop the old wb, the wb cannot be freed during the wakeup. The lock order itself is unchanged; only the unlock point moves (three statements relocated, 5 insertions and 4 deletions in the diffstat).
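The effect of moving the unlock can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every `toy_*` name is hypothetical, the spinlock is modelled as a boolean that the switcher checks, the list lock is omitted, and the "concurrent" switch is replayed deterministically at the worst possible moment instead of running on a second thread. It is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the kernel objects; all names here are hypothetical. */
struct toy_wb {
	int refcount;	/* references held on this writeback context */
	bool released;	/* set once the last reference is dropped */
	int wakeups;	/* how many times toy_wakeup() touched it */
};

struct toy_inode {
	bool locked;		/* models inode->i_lock */
	struct toy_wb *wb;	/* models inode->i_wb */
};

/* Number of times a released wb was touched (the modelled use-after-free). */
static int toy_uaf_count;

static void toy_wakeup(struct toy_wb *wb)
{
	if (wb->released)
		toy_uaf_count++;
	wb->wakeups++;
}

/*
 * Models the switcher: it needs the inode lock, so it cannot make progress
 * while the dirtier still holds it. Returns false when it has to wait.
 */
static bool toy_switch_wb(struct toy_inode *inode, struct toy_wb *new_wb)
{
	struct toy_wb *old = inode->wb;

	if (inode->locked)
		return false;
	inode->wb = new_wb;
	if (--old->refcount == 0)
		old->released = true;
	return true;
}

/*
 * Models the dirtier. The switcher gets a chance to run right before the
 * wakeup, which is the worst-case interleaving. With unlock_first set
 * (the old order) the lock has already been dropped at that point.
 */
static void toy_mark_dirty(struct toy_inode *inode, struct toy_wb *new_wb,
			   bool unlock_first)
{
	struct toy_wb *wb;

	inode->locked = true;
	wb = inode->wb;
	if (unlock_first)
		inode->locked = false;

	(void)toy_switch_wb(inode, new_wb);	/* concurrent switch attempt */
	toy_wakeup(wb);

	inode->locked = false;
}

/* Runs one scenario and returns how often a released wb was touched. */
static int toy_run(bool unlock_first)
{
	struct toy_wb old_wb = { .refcount = 1 };
	struct toy_wb new_wb = { .refcount = 1 };
	struct toy_inode inode = { .wb = &old_wb };

	toy_uaf_count = 0;
	toy_mark_dirty(&inode, &new_wb, unlock_first);
	return toy_uaf_count;
}
```

Calling `toy_run(true)` replays the old ordering and records one access to a released wb; `toy_run(false)` replays the patched ordering, where the switch attempt has to wait and no such access happens.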
### 4. **Critical Subsystem**

The writeback subsystem is fundamental to filesystem operations: it handles dirty page writeback, inode management, and I/O scheduling. A use-after-free here can lead to:

- System crashes/panics
- Data corruption
- Security vulnerabilities (potential for exploitation)

### 5. **Matches Stable Criteria**

This fix aligns with the stable kernel rules:

- **Fixes a real bug**: Documented use-after-free with crash trace
- **Already upstream**: Has been reviewed by Jan Kara (filesystem maintainer) and merged by Christian Brauner
- **Small change**: 5 insertions and 4 deletions in one file, no new logic
- **No new features**: Pure bug fix
- **Low regression risk**: Simply extends existing lock protection slightly

### 6. **Similar Historical Fixes**

The git history shows multiple similar use-after-free fixes in this subsystem have been backported:

- `f87904c07551` - writeback: avoid use-after-free after removing device
- `8e00c4e9dd85` - writeback: fix use-after-free in finish_writeback_work()
- `614a4e377314` - writeback, cgroup: fix premature wb_put()

### 7. **CONFIG_CGROUP_WRITEBACK Dependency**

While this bug primarily affects systems with `CONFIG_CGROUP_WRITEBACK` enabled (used for cgroup-aware writeback), this is a common configuration in modern distributions, especially those using cgroups v2 for container workloads.

### 8. **No Side Effects**

The change has no functional side effects beyond fixing the race:

- The trace event still fires at the same logical point
- The wb_wakeup_delayed() still executes under the same conditions
- Only the lock hold duration is slightly extended to cover the wb access
This is a textbook example of a commit that should be backported to stable kernels - it fixes a serious bug with minimal, well-understood changes in a critical subsystem.
 fs/fs-writeback.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index cc57367fb641..a07b8cf73ae2 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2608,10 +2608,6 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 			wakeup_bdi = inode_io_list_move_locked(inode, wb,
 							       dirty_list);
 
-			spin_unlock(&wb->list_lock);
-			spin_unlock(&inode->i_lock);
-			trace_writeback_dirty_inode_enqueue(inode);
-
 			/*
 			 * If this is the first dirty inode for this bdi,
 			 * we have to wake-up the corresponding bdi thread
@@ -2621,6 +2617,11 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 			if (wakeup_bdi &&
 			    (wb->bdi->capabilities & BDI_CAP_WRITEBACK))
 				wb_wakeup_delayed(wb);
+
+			spin_unlock(&wb->list_lock);
+			spin_unlock(&inode->i_lock);
+			trace_writeback_dirty_inode_enqueue(inode);
+
 			return;
 		}
 	}
From: Lubomir Rintel lkundrak@v3.sk
[ Upstream commit 4a73a36cb704813f588af13d9842d0ba5a185758 ]
This lets NetworkManager/ModemManager know that this is a modem and needs to be connected first.
Signed-off-by: Lubomir Rintel lkundrak@v3.sk
Link: https://patch.msgid.link/20250814154214.250103-1-lkundrak@v3.sk
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
## Analysis of the Commit:
1. **Nature of the change**: This is a simple device ID addition that adds USB vendor/device ID pair (0x8087:0x095a) for the Intel OEM version of Fibocom L850-GL modem to the cdc_ncm driver's device table.
2. **Bug fix vs feature**: This is a **hardware enablement fix** rather than a new feature. Without this change, the Intel-branded Fibocom L850-GL modem won't be properly recognized as a WWAN device, causing NetworkManager/ModemManager to fail to handle it correctly. This directly impacts users with this hardware.
3. **Code impact**: The change is minimal, just 7 lines adding a new entry to the `cdc_devs[]` USB device table:

   ```c
   /* Intel modem (label from OEM reads Fibocom L850-GL) */
   { USB_DEVICE_AND_INTERFACE_INFO(0x8087, 0x095a,
   	USB_CLASS_COMM,
   	USB_CDC_SUBCLASS_NCM, USB_CDC_PROTO_NONE),
     .driver_info = (unsigned long)&wwan_info,
   },
   ```

4. **Risk assessment**:
   - **Extremely low risk** - The change only adds a new device ID entry
   - No existing functionality is modified
   - Uses the existing `wwan_info` driver configuration (FLAG_WWAN flag)
   - Follows the same pattern as other WWAN devices in the driver
   - Cannot cause regressions for other hardware

5. **User impact**: Users with this specific hardware (Intel OEM version with VID:PID 0x8087:0x095a) cannot use their modem properly without this fix. The modem won't be recognized as a WWAN device, preventing proper network management.

6. **Stable tree criteria compliance**:
   - ✓ Fixes a real bug (hardware not working properly)
   - ✓ Minimal change (7 lines)
   - ✓ No architectural changes
   - ✓ Self-contained to specific hardware
   - ✓ Clear and obvious correctness
7. **Historical context**: The git history shows numerous quirks and fixes for the Fibocom L850-GL modem variants, indicating this is well-known hardware that has required various fixes over time. This particular Intel OEM variant (0x8087:0x095a) was simply missing from the device table.
This is a textbook example of a stable-worthy commit: it enables specific hardware that should already be working, and because it only adds one table entry, the risk to existing functionality is very low.
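How a more specific table entry takes precedence over a generic one can be sketched in plain C. This is a simplified model, not the kernel's matching code: the `toy_*` names and the two-entry table are hypothetical, the match logic assumes a first-match-wins table walk, and the `info` strings merely stand in for `.driver_info`. The class, subclass and protocol values are the ones the USB CDC specification assigns to Communications class, NCM subclass and "no protocol".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from the USB CDC specification. */
#define TOY_CLASS_COMM		0x02
#define TOY_SUBCLASS_NCM	0x0d
#define TOY_PROTO_NONE		0x00

/* Simplified, hypothetical model of one device-table entry. */
struct toy_usb_id {
	uint16_t vendor;
	uint16_t product;
	uint8_t intf_class;
	uint8_t intf_subclass;
	uint8_t intf_protocol;
	int match_vid_pid;	/* nonzero: vendor/product must match too */
	const char *info;	/* stands in for .driver_info */
};

/* The specific modem entry comes before the generic NCM entry. */
static const struct toy_usb_id toy_table[] = {
	{ 0x8087, 0x095a, TOY_CLASS_COMM, TOY_SUBCLASS_NCM, TOY_PROTO_NONE,
	  1, "wwan_info" },
	{ 0, 0, TOY_CLASS_COMM, TOY_SUBCLASS_NCM, TOY_PROTO_NONE,
	  0, "cdc_ncm_info" },
};

/* Returns the info of the first matching entry, or NULL if none matches. */
static const char *toy_match(uint16_t vid, uint16_t pid, uint8_t cls,
			     uint8_t sub, uint8_t proto)
{
	size_t i;

	for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
		const struct toy_usb_id *id = &toy_table[i];

		if (id->match_vid_pid &&
		    (id->vendor != vid || id->product != pid))
			continue;
		if (id->intf_class != cls || id->intf_subclass != sub ||
		    id->intf_protocol != proto)
			continue;
		return id->info;
	}
	return NULL;
}
```

In this model the 0x8087:0x095a device resolves to `"wwan_info"`, any other NCM device falls through to the generic entry, and a non-NCM interface matches nothing.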
 drivers/net/usb/cdc_ncm.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index ea0e5e276cd6..5d123df0a866 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -2087,6 +2087,13 @@ static const struct usb_device_id cdc_devs[] = {
 	  .driver_info = (unsigned long)&wwan_info,
 	},
 
+	/* Intel modem (label from OEM reads Fibocom L850-GL) */
+	{ USB_DEVICE_AND_INTERFACE_INFO(0x8087, 0x095a,
+		USB_CLASS_COMM,
+		USB_CDC_SUBCLASS_NCM, USB_CDC_PROTO_NONE),
+	  .driver_info = (unsigned long)&wwan_info,
+	},
+
 	/* DisplayLink docking stations */
 	{ .match_flags = USB_DEVICE_ID_MATCH_INT_INFO
 		| USB_DEVICE_ID_MATCH_VENDOR,
From: Huacai Chen chenhuacai@loongson.cn
[ Upstream commit 112ca94f6c3b3e0b2002a240de43c487a33e0234 ]
Currently, if preemption happens between protected_save_fpu_context() and protected_save_lbt_context(), the FTOP context is lost, because FTOP is saved by protected_save_lbt_context() but protected_save_fpu_context() disables TM before that. So save LBT before FPU in setup_sigcontext() to avoid this potential risk.
Signed-off-by: Hanlu Li lihanlu@loongson.cn
Signed-off-by: Huacai Chen chenhuacai@loongson.cn
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
Based on my analysis of this commit and the Linux kernel codebase, I can now provide a comprehensive assessment.
**Backport Status: YES**
## Extensive Analysis
This commit fixes a **critical race condition** in the LoongArch signal handling code that can lead to **data loss** of processor state during signal context saving. Here's the detailed analysis:
### The Bug Being Fixed
1. **Context Loss Issue**: The commit addresses a race condition where the FTOP (x87 FPU stack pointer) context can be lost during signal handling. This happens when:
   - A preemption occurs between `protected_save_fpu_context()` and `protected_save_lbt_context()`
   - The FPU context save operation disables TM before the LBT context is saved (the commit message does not expand the abbreviation; it most plausibly refers to the x87 top-of-stack mode of the LBT extension, not to transactional memory)
   - Since FTOP is part of the LBT (Loongson Binary Translation) extension context, it gets lost

2. **Binary Translation Context**: LBT is a hardware extension used to accelerate binary translation on LoongArch processors. According to the original LBT support commit (bd3c5798484a), it includes:
   - 4 scratch registers (scr0-scr3)
   - x86/ARM eflags register
   - x87 FPU stack pointer (FTOP)
### Code Changes Analysis
The fix is **minimal and surgical** - it simply reorders the save operations:
**Before (buggy order):**

```c
// Save FPU contexts first (LASX/LSX/FPU)
if (extctx->lasx.addr)
	err |= protected_save_lasx_context(extctx);
else if (extctx->lsx.addr)
	err |= protected_save_lsx_context(extctx);
else if (extctx->fpu.addr)
	err |= protected_save_fpu_context(extctx);

// Save LBT context last - PROBLEM: FTOP may be lost by now
#ifdef CONFIG_CPU_HAS_LBT
if (extctx->lbt.addr)
	err |= protected_save_lbt_context(extctx);
#endif
```

**After (fixed order):**

```c
// Save LBT context FIRST to preserve FTOP
#ifdef CONFIG_CPU_HAS_LBT
if (extctx->lbt.addr)
	err |= protected_save_lbt_context(extctx);
#endif

// Then save FPU contexts (LASX/LSX/FPU)
if (extctx->lasx.addr)
	err |= protected_save_lasx_context(extctx);
else if (extctx->lsx.addr)
	err |= protected_save_lsx_context(extctx);
else if (extctx->fpu.addr)
	err |= protected_save_fpu_context(extctx);
```
### Why This Should Be Backported
1. **Data Corruption Risk**: This bug can cause loss of processor state during signal handling, which could lead to:
   - Incorrect program execution after signal return
   - Potential application crashes
   - Data corruption in applications using binary translation features

2. **Small, Contained Fix**: The change is:
   - Only 10 lines (5 insertions, 5 deletions)
   - Confined to a single function in signal handling
   - Simply reorders existing operations without adding new logic
   - Protected by `#ifdef CONFIG_CPU_HAS_LBT` so it only affects systems with LBT support

3. **No Architectural Changes**: This is purely a bug fix that:
   - Doesn't introduce new features
   - Doesn't change kernel APIs or ABIs
   - Doesn't modify core subsystem behavior
   - Only affects LoongArch architecture with LBT extension enabled

4. **Clear Bug with Clear Fix**: The problem is well-defined (race condition causing context loss) and the solution is straightforward (reorder operations to save LBT before FPU).

5. **Affects User-Space Reliability**: Signal handling is a fundamental mechanism used by many applications. A bug here can affect system stability and application reliability.

### Risk Assessment

The risk of regression is **very low** because:

- The change only affects code paths when LBT is enabled (`CONFIG_CPU_HAS_LBT`)
- It's a simple reordering of independent save operations
- The fix has been tested and merged into mainline
- It doesn't change the fundamental logic, just the execution order
This commit clearly meets the stable kernel criteria for backporting as it fixes an important bug with minimal risk of introducing new issues.
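Why the order of the two saves matters can be shown with a deliberately crude model. Everything here is an assumption made for illustration: the `toy_*` names are hypothetical, and the model simply assumes that once the FPU save has disabled TM, a preemption no longer preserves the live FTOP value. The real hardware and context-switch behaviour are more involved; only the ordering argument is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state relevant to the ordering problem. */
struct toy_cpu {
	unsigned int ftop;	/* live x87 stack-top value */
	bool tm_enabled;	/* assumed precondition for FTOP to survive */
};

/* Hypothetical signal frame: only the field the LBT save fills in. */
struct toy_sigframe {
	unsigned int saved_ftop;
};

/* Models the FPU save: per the commit message it disables TM. */
static void toy_save_fpu(struct toy_cpu *cpu)
{
	cpu->tm_enabled = false;
}

/* Models the LBT save: this is the step that records FTOP. */
static void toy_save_lbt(const struct toy_cpu *cpu, struct toy_sigframe *f)
{
	f->saved_ftop = cpu->ftop;
}

/* Assumption of this model: FTOP is lost across preemption if TM is off. */
static void toy_preempt(struct toy_cpu *cpu)
{
	if (!cpu->tm_enabled)
		cpu->ftop = 0;
}

/*
 * Replays setup_sigcontext() with a preemption between the two saves and
 * returns the FTOP value that ends up in the signal frame.
 */
static unsigned int toy_setup_sigcontext(unsigned int ftop, bool lbt_first)
{
	struct toy_cpu cpu = { .ftop = ftop, .tm_enabled = true };
	struct toy_sigframe frame = { 0 };

	if (lbt_first) {
		toy_save_lbt(&cpu, &frame);
		toy_preempt(&cpu);
		toy_save_fpu(&cpu);
	} else {
		toy_save_fpu(&cpu);
		toy_preempt(&cpu);
		toy_save_lbt(&cpu, &frame);
	}
	return frame.saved_ftop;
}
```

With the old order (`lbt_first == false`) the frame records a stale FTOP of 0 in this model; with the patched order the original value is captured before anything can disturb it.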
 arch/loongarch/kernel/signal.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/loongarch/kernel/signal.c b/arch/loongarch/kernel/signal.c
index 4740cb5b2388..c9f7ca778364 100644
--- a/arch/loongarch/kernel/signal.c
+++ b/arch/loongarch/kernel/signal.c
@@ -677,6 +677,11 @@ static int setup_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc,
 	for (i = 1; i < 32; i++)
 		err |= __put_user(regs->regs[i], &sc->sc_regs[i]);
 
+#ifdef CONFIG_CPU_HAS_LBT
+	if (extctx->lbt.addr)
+		err |= protected_save_lbt_context(extctx);
+#endif
+
 	if (extctx->lasx.addr)
 		err |= protected_save_lasx_context(extctx);
 	else if (extctx->lsx.addr)
@@ -684,11 +689,6 @@ static int setup_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc,
 	else if (extctx->fpu.addr)
 		err |= protected_save_fpu_context(extctx);
 
-#ifdef CONFIG_CPU_HAS_LBT
-	if (extctx->lbt.addr)
-		err |= protected_save_lbt_context(extctx);
-#endif
-
 	/* Set the "end" magic */
 	info = (struct sctx_info *)extctx->end.addr;
 	err |= __put_user(0, &info->magic);
From: Qu Wenruo wqu@suse.com
[ Upstream commit 4bcd3061e8154606af7f721cb75ca04ffe191a12 ]
[BUG]
If submit_one_sector() fails, the block is kept dirty, but its corresponding range is already finished in the ordered extent.

This means if a writeback happens later again, we can hit the following problems:

- ASSERT(block_start != EXTENT_MAP_HOLE) in submit_one_sector()
  If the original extent map is a hole, then we can hit this case: as the new ordered extent failed, we will drop the new extent map and re-read one from the disk.

- DEBUG_WARN() in btrfs_writepage_cow_fixup()
  This is because we no longer have an ordered extent for those dirty blocks. The original one for them is already finished with error.

[CAUSE]
The function submit_one_sector() is not following the regular error handling of writeback. The common practice is to clear the folio dirty, start and finish the writeback for the block.

This is normally done by extent_clear_unlock_delalloc() with PAGE_START_WRITEBACK | PAGE_END_WRITEBACK flags during run_delalloc_range().

So if we keep those failed blocks dirty, they will stay in the page cache and wait for the next writeback.

And since the original ordered extent is already finished and removed, depending on the original extent map, we either hit the ASSERT() inside submit_one_sector(), or hit the DEBUG_WARN() in btrfs_writepage_cow_fixup().

[FIX]
Follow the regular error handling to clear the dirty flag for the block, start and finish writeback for that block instead.
Reviewed-by: Filipe Manana fdmanana@suse.com
Signed-off-by: Qu Wenruo wqu@suse.com
Signed-off-by: David Sterba dsterba@suse.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
Based on my analysis of the commit and the kernel repository context, here's my assessment:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
## Bug Fix Analysis
1. **Critical Bug Being Fixed**: The commit fixes a serious error handling bug in btrfs writeback that can lead to two different assertion failures:
   - `ASSERT(block_start != EXTENT_MAP_HOLE)` in submit_one_sector()
   - `DEBUG_WARN()` in btrfs_writepage_cow_fixup()

2. **Data Integrity Issue**: The bug causes dirty blocks to remain dirty after a failed submission, but their corresponding ordered extent is already finished with error. This creates an inconsistent state where:
   - Dirty blocks exist without proper ordered extent tracking
   - Subsequent writeback attempts will fail with assertions/warnings
   - The filesystem enters an undefined state that could affect data integrity

3. **Clear Root Cause**: The commit message clearly identifies the problem: submit_one_sector() was not following standard writeback error handling practices. The fix aligns the error handling with the rest of the btrfs writeback code.
## Code Change Analysis
The fix is minimal and contained:

```c
 	if (IS_ERR(em)) {
+		/*
+		 * When submission failed, we should still clear the folio dirty.
+		 * Or the folio will be written back again but without any
+		 * ordered extent.
+		 */
+		btrfs_folio_clear_dirty(fs_info, folio, filepos, sectorsize);
+		btrfs_folio_set_writeback(fs_info, folio, filepos, sectorsize);
+		btrfs_folio_clear_writeback(fs_info, folio, filepos, sectorsize);
 		return PTR_ERR(em);
 	}
```

The changes:

- Add proper error handling to clear dirty flag
- Set and clear writeback status to properly finish the failed writeback
- Update comments to clarify the behavior
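The state problem the fix addresses can be modelled with two flag bits per block. This is a sketch only: the `toy_*` names are hypothetical, a block is reduced to a flags word plus a boolean for "still covered by an ordered extent", and the failure is injected by a parameter instead of a real extent-map lookup.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_DIRTY	(1u << 0)
#define TOY_WRITEBACK	(1u << 1)

/* Hypothetical per-block state. */
struct toy_block {
	unsigned int flags;
	bool has_ordered_extent;
};

/*
 * Models one submission attempt. On failure the ordered extent range is
 * finished with error either way; clear_on_error selects whether the
 * block also gets dirty cleared and a start/finish writeback cycle
 * (the patched behaviour) or is left dirty (the old behaviour).
 */
static int toy_submit_block(struct toy_block *b, bool lookup_fails,
			    bool clear_on_error)
{
	if (lookup_fails) {
		b->has_ordered_extent = false;
		if (clear_on_error) {
			b->flags &= ~TOY_DIRTY;
			b->flags |= TOY_WRITEBACK;	/* start writeback */
			b->flags &= ~TOY_WRITEBACK;	/* and finish it */
		}
		return -1;
	}
	b->flags &= ~TOY_DIRTY;
	b->flags |= TOY_WRITEBACK;
	return 0;
}

/* A later writeback pass trips over dirty blocks with no ordered extent. */
static bool toy_writeback_would_trip(const struct toy_block *b)
{
	return (b->flags & TOY_DIRTY) && !b->has_ordered_extent;
}

/* Fails one submission and reports whether the next pass would trip. */
static bool toy_fail_then_check(bool clear_on_error)
{
	struct toy_block b = { .flags = TOY_DIRTY, .has_ordered_extent = true };

	(void)toy_submit_block(&b, true, clear_on_error);
	return toy_writeback_would_trip(&b);
}
```

`toy_fail_then_check(false)` reproduces the inconsistent state (dirty block, no ordered extent), while `toy_fail_then_check(true)` leaves nothing for a later writeback pass to pick up.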
## Stable Tree Criteria Met
1. **Fixes a real bug**: Yes, it prevents assertion failures and potential filesystem corruption
2. **Small and contained**: Yes, 13 insertions and 4 deletions in one file; the functional change is three calls added to one error path in `submit_one_sector()`, and the rest is comment updates
3. **No new features**: Correct, purely a bug fix
4. **Low regression risk**: The change follows established patterns used elsewhere in btrfs (extent_clear_unlock_delalloc)
5. **Important enough**: Yes, it prevents filesystem errors and potential data integrity issues
The fix is straightforward, follows existing btrfs patterns, and addresses a clear bug that could affect users running btrfs filesystems, especially under I/O error conditions. This makes it an excellent candidate for stable backporting.
 fs/btrfs/extent_io.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1dc931c4937f..2e127c109e5b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1483,7 +1483,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 
 /*
  * Return 0 if we have submitted or queued the sector for submission.
- * Return <0 for critical errors.
+ * Return <0 for critical errors, and the sector will have its dirty flag cleared.
  *
  * Caller should make sure filepos < i_size and handle filepos >= i_size case.
  */
@@ -1506,8 +1506,17 @@ static int submit_one_sector(struct btrfs_inode *inode,
 	ASSERT(filepos < i_size);
 
 	em = btrfs_get_extent(inode, NULL, filepos, sectorsize);
-	if (IS_ERR(em))
+	if (IS_ERR(em)) {
+		/*
+		 * When submission failed, we should still clear the folio dirty.
+		 * Or the folio will be written back again but without any
+		 * ordered extent.
+		 */
+		btrfs_folio_clear_dirty(fs_info, folio, filepos, sectorsize);
+		btrfs_folio_set_writeback(fs_info, folio, filepos, sectorsize);
+		btrfs_folio_clear_writeback(fs_info, folio, filepos, sectorsize);
 		return PTR_ERR(em);
+	}
 
 	extent_offset = filepos - em->start;
 	em_end = btrfs_extent_map_end(em);
@@ -1637,8 +1646,8 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
 	 * Here we set writeback and clear for the range. If the full folio
 	 * is no longer dirty then we clear the PAGECACHE_TAG_DIRTY tag.
 	 *
-	 * If we hit any error, the corresponding sector will still be dirty
-	 * thus no need to clear PAGECACHE_TAG_DIRTY.
+	 * If we hit any error, the corresponding sector will have its dirty
+	 * flag cleared and writeback finished, thus no need to handle the error case.
 	 */
 	if (!submitted_io && !error) {
 		btrfs_folio_set_writeback(fs_info, folio, start, len);
From: Mario Limonciello mario.limonciello@amd.com
[ Upstream commit 5b9e07551faa7bb2f26cb039cc6e8d00bc4d0831 ]
Chris reported that even on a BIOS that has a new enough SMU F/W version there is still a spurious IRQ1. Although the solution was added to SMU F/W 64.66.0 it turns out there needs to be a matching SBIOS change to activate it. Thus Linux shouldn't be avoiding the IRQ1 workaround on newer SMU F/W because there is no indication the BIOS change is in place.
Drop the match for 64.66.0+ and instead match all RN/CZN/BRC (they all share same SMU F/W). Adjust the quirk infrastructure to allow quirking the workaround on or off and also adjust existing quirks to match properly.
Unfortunately this may cause some systems that did have the SBIOS change in place to regress in keyboard wakeup but we don't have a way to know. If a user reports a keyboard wakeup regression they can run with amd_pmc.disable_workarounds=1 to deactivate the workaround and share DMI data so that their system can be quirked not to use the workaround in the upstream kernel.
Reported-by: Chris Bainbridge chris.bainbridge@gmail.com
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4449
Tested-by: Chris Bainbridge chris.bainbridge@gmail.com
Signed-off-by: Mario Limonciello mario.limonciello@amd.com
Link: https://lore.kernel.org/r/20250724185156.1827592-1-superm1@kernel.org
Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
Now I have a better understanding. Let me analyze the commit to determine if it should be backported:
**Backport Status: YES**
## Extensive Explanation:
This commit should be backported to stable kernel trees for the following reasons:
### 1. **Fixes a Real User-Impacting Bug**

The commit addresses a spurious IRQ1 (keyboard interrupt) issue that causes unintended system wakeups from suspend. The commit message references a specific bug report (https://gitlab.freedesktop.org/drm/amd/-/issues/4449) where users are experiencing this problem. This directly impacts user experience with system suspend/resume functionality.

### 2. **Regression Fix for Previous Workaround**

The code shows this is fixing an incomplete workaround that was previously implemented. The original workaround avoided applying the IRQ1 disable on SMU firmware version 64.66.0+, assuming the firmware fix was sufficient. However, the commit message reveals that:

- The SMU firmware fix requires a matching SBIOS change to be activated
- Linux has no way to detect if the SBIOS change is present
- This means systems with newer SMU firmware but without the SBIOS change still experience the spurious IRQ1 issue

### 3. **Limited Scope and Low Risk**

The changes are confined to the AMD PMC driver quirks handling:

- Removes the SMU firmware version check from `amd_pmc_wa_irq1()` function
- Adjusts the quirk infrastructure to allow both s2idle bug and spurious 8042 fixes
- Updates DMI matches to use the combined quirk where appropriate
- The changes are self-contained within the platform-specific driver

### 4. **Hardware-Specific Fix**

The fix targets specific AMD CPU models (Renoir/Cezanne/Barcelo - RN/CZN/BRC) that share the same SMU firmware. This hardware-specific nature means:

- It won't affect other platforms
- The risk is limited to AMD systems that already have the issue
- The workaround provides a module parameter (`amd_pmc.disable_workarounds=1`) for users who might experience regressions

### 5. **Addresses Known Hardware/Firmware Limitation**

The commit acknowledges a hardware/firmware limitation where:

- A fix exists in SMU firmware 64.66.0+
- But it requires SBIOS activation that Linux cannot detect
- This is a defensive approach to ensure all affected systems get the workaround

### 6. **Provides User Control**

The commit message mentions that users who experience keyboard wakeup regression can use `amd_pmc.disable_workarounds=1` to disable the workaround and provide DMI data for future quirking. This gives users an escape hatch if needed.

### 7. **Follows Stable Kernel Criteria**

This commit meets the stable kernel backport criteria:

- **Fixes a real bug**: Spurious IRQ1 wakeups affecting suspend/resume
- **Already tested**: Has a "Tested-by" tag from the bug reporter
- **Small and contained**: Changes are limited to the AMD PMC driver
- **No new features**: Only adjusts existing workaround logic
- **Clear impact**: Users experience unwanted system wakeups
### Code Analysis Details:

The key change in `drivers/platform/x86/amd/pmc/pmc.c` removes the SMU version check:

```c
-	/* cezanne platform firmware has a fix in 64.66.0 */
-	if (pdev->cpu_id == AMD_CPU_ID_CZN) {
-		if (!pdev->major) {
-			rc = amd_pmc_get_smu_version(pdev);
-			if (rc)
-				return rc;
-		}
-		if (pdev->major > 64 || (pdev->major == 64 && pdev->minor > 65))
-			return 0;
-	}
```

This ensures the workaround is applied by default on affected CPUs regardless of SMU firmware version, unless a DMI quirk for the system turns it off or the user disables workarounds with `amd_pmc.disable_workarounds=1`.
The quirks restructuring in `pmc-quirks.c` creates a combined quirk (`quirk_s2idle_spurious_8042`) that applies both fixes where needed, showing careful consideration of the various affected systems.
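The before/after decision logic can be condensed into two small predicates. This is a sketch, not the driver code: the `toy_*` function names and their boolean parameters are hypothetical. The version comparison is copied from the removed check, and the second function mirrors the order of assignments in the patched `amd_pmc_quirks_init()` as shown in the diff (default on for the affected CPU family, then overwritten by a matching quirk entry).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Old behaviour of the removed check: on the affected CPU family the
 * IRQ1 workaround was skipped for SMU firmware 64.66.0 and newer.
 * Returns true when the workaround would still have been applied.
 */
static bool toy_old_applies_workaround(int major, int minor)
{
	if (major > 64 || (major == 64 && minor > 65))
		return false;
	return true;
}

/*
 * New behaviour: firmware version no longer matters. The wakeup source
 * is disabled by default on the affected family, and a matching DMI
 * quirk entry can force the setting on or off.
 */
static bool toy_new_disable_wakeup(bool is_affected_cpu, bool has_quirk,
				   bool quirk_spurious_8042)
{
	bool disable = is_affected_cpu;

	if (has_quirk)
		disable = quirk_spurious_8042;
	return disable;
}
```

For example, firmware 64.66 used to skip the workaround (`toy_old_applies_workaround(64, 66)` is false), whereas after the patch an affected CPU without any quirk entry always gets it, and a quirk entry with `spurious_8042` unset is the way to opt a known-good system out.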
 drivers/platform/x86/amd/pmc/pmc-quirks.c | 54 ++++++++++++++---------
 drivers/platform/x86/amd/pmc/pmc.c        | 13 ------
 2 files changed, 34 insertions(+), 33 deletions(-)

diff --git a/drivers/platform/x86/amd/pmc/pmc-quirks.c b/drivers/platform/x86/amd/pmc/pmc-quirks.c
index ded4c84f5ed1..7ffc659b2794 100644
--- a/drivers/platform/x86/amd/pmc/pmc-quirks.c
+++ b/drivers/platform/x86/amd/pmc/pmc-quirks.c
@@ -28,10 +28,15 @@ static struct quirk_entry quirk_spurious_8042 = {
 	.spurious_8042 = true,
 };
 
+static struct quirk_entry quirk_s2idle_spurious_8042 = {
+	.s2idle_bug_mmio = FCH_PM_BASE + FCH_PM_SCRATCH,
+	.spurious_8042 = true,
+};
+
 static const struct dmi_system_id fwbug_list[] = {
 	{
 		.ident = "L14 Gen2 AMD",
-		.driver_data = &quirk_s2idle_bug,
+		.driver_data = &quirk_s2idle_spurious_8042,
 		.matches = {
 			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "20X5"),
@@ -39,7 +44,7 @@ static const struct dmi_system_id fwbug_list[] = {
 	},
 	{
 		.ident = "T14s Gen2 AMD",
-		.driver_data = &quirk_s2idle_bug,
+		.driver_data = &quirk_s2idle_spurious_8042,
 		.matches = {
 			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "20XF"),
@@ -47,7 +52,7 @@ static const struct dmi_system_id fwbug_list[] = {
 	},
 	{
 		.ident = "X13 Gen2 AMD",
-		.driver_data = &quirk_s2idle_bug,
+		.driver_data = &quirk_s2idle_spurious_8042,
 		.matches = {
 			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "20XH"),
@@ -55,7 +60,7 @@ static const struct dmi_system_id fwbug_list[] = {
 	},
 	{
 		.ident = "T14 Gen2 AMD",
-		.driver_data = &quirk_s2idle_bug,
+		.driver_data = &quirk_s2idle_spurious_8042,
 		.matches = {
 			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "20XK"),
@@ -63,7 +68,7 @@ static const struct dmi_system_id fwbug_list[] = {
 	},
 	{
 		.ident = "T14 Gen1 AMD",
-		.driver_data = &quirk_s2idle_bug,
+		.driver_data = &quirk_s2idle_spurious_8042,
 		.matches = {
 			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "20UD"),
@@ -71,7 +76,7 @@ static const struct dmi_system_id fwbug_list[] = {
 	},
 	{
 		.ident = "T14 Gen1 AMD",
-		.driver_data = &quirk_s2idle_bug,
+		.driver_data = &quirk_s2idle_spurious_8042,
 		.matches = {
 			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "20UE"),
@@ -79,7 +84,7 @@ static const struct dmi_system_id fwbug_list[] = {
 	},
 	{
 		.ident = "T14s Gen1 AMD",
-		.driver_data = &quirk_s2idle_bug,
+		.driver_data = &quirk_s2idle_spurious_8042,
 		.matches = {
 			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "20UH"),
@@ -87,7 +92,7 @@ static const struct dmi_system_id fwbug_list[] = {
 	},
 	{
 		.ident = "T14s Gen1 AMD",
-		.driver_data = &quirk_s2idle_bug,
+		.driver_data = &quirk_s2idle_spurious_8042,
 		.matches = {
 			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "20UJ"),
@@ -95,7 +100,7 @@ static const struct dmi_system_id fwbug_list[] = {
 	},
 	{
 		.ident = "P14s Gen1 AMD",
-		.driver_data = &quirk_s2idle_bug,
+		.driver_data = &quirk_s2idle_spurious_8042,
 		.matches = {
 			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "20Y1"),
@@ -103,7 +108,7 @@ static const struct dmi_system_id fwbug_list[] = {
 	},
 	{
 		.ident = "P14s Gen2 AMD",
-		.driver_data = &quirk_s2idle_bug,
+		.driver_data = &quirk_s2idle_spurious_8042,
 		.matches = {
 			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "21A0"),
@@ -111,7 +116,7 @@ static const struct dmi_system_id fwbug_list[] = {
 	},
 	{
 		.ident = "P14s Gen2 AMD",
-		.driver_data = &quirk_s2idle_bug,
+		.driver_data = &quirk_s2idle_spurious_8042,
 		.matches = {
 			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "21A1"),
@@ -152,7 +157,7 @@ static const struct dmi_system_id fwbug_list[] = {
 	},
 	{
 		.ident = "IdeaPad 1 14AMN7",
-		.driver_data = &quirk_s2idle_bug,
+		.driver_data = &quirk_s2idle_spurious_8042,
 		.matches = {
 			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "82VF"),
@@ -160,7 +165,7 @@ static const struct dmi_system_id fwbug_list[] = {
 	},
 	{
 		.ident = "IdeaPad 1 15AMN7",
-		.driver_data = &quirk_s2idle_bug,
+		.driver_data = &quirk_s2idle_spurious_8042,
 		.matches = {
 			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "82VG"),
@@ -168,7 +173,7 @@ static const struct dmi_system_id fwbug_list[] = {
 	},
 	{
 		.ident = "IdeaPad 1 15AMN7",
-		.driver_data = &quirk_s2idle_bug,
+		.driver_data = &quirk_s2idle_spurious_8042,
 		.matches = {
 			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "82X5"),
@@ -176,7 +181,7 @@ static const struct dmi_system_id fwbug_list[] = {
 	},
 	{
 		.ident = "IdeaPad Slim 3 14AMN8",
-		.driver_data = &quirk_s2idle_bug,
+		.driver_data = &quirk_s2idle_spurious_8042,
 		.matches = {
 			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "82XN"),
@@ -184,7 +189,7 @@ static const struct dmi_system_id fwbug_list[] = {
 	},
 	{
 		.ident = "IdeaPad Slim 3 15AMN8",
-		.driver_data = &quirk_s2idle_bug,
+		.driver_data = &quirk_s2idle_spurious_8042,
 		.matches = {
 			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "82XQ"),
@@ -193,7 +198,7 @@ static const struct dmi_system_id fwbug_list[] = {
 	/* https://gitlab.freedesktop.org/drm/amd/-/issues/4434 */
 	{
 		.ident = "Lenovo Yoga 6 13ALC6",
-		.driver_data = &quirk_s2idle_bug,
+		.driver_data = &quirk_s2idle_spurious_8042,
 		.matches = {
 			DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "82ND"),
@@ -202,7 +207,7 @@ static const struct dmi_system_id fwbug_list[] = {
 	/* https://gitlab.freedesktop.org/drm/amd/-/issues/2684 */
 	{
 		.ident = "HP Laptop 15s-eq2xxx",
-		.driver_data = &quirk_s2idle_bug,
+		.driver_data = &quirk_s2idle_spurious_8042,
 		.matches = {
 			DMI_MATCH(DMI_SYS_VENDOR, "HP"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "HP Laptop 15s-eq2xxx"),
@@ -285,6 +290,16 @@ void amd_pmc_quirks_init(struct amd_pmc_dev *dev)
 {
 	const struct dmi_system_id *dmi_id;
 
+	/*
+	 * IRQ1 may cause an interrupt during resume even without a keyboard
+	 * press.
+	 *
+	 * Affects Renoir, Cezanne and Barcelo SoCs
+	 *
+	 * A solution is available in PMFW 64.66.0, but it must be activated by
+	 * SBIOS. If SBIOS is known to have the fix a quirk can be added for
+	 * a given system to avoid workaround.
+	 */
 	if (dev->cpu_id == AMD_CPU_ID_CZN)
 		dev->disable_8042_wakeup = true;
 
@@ -295,6 +310,5 @@ void amd_pmc_quirks_init(struct amd_pmc_dev *dev)
 	if (dev->quirks->s2idle_bug_mmio)
 		pr_info("Using s2idle quirk to avoid %s platform firmware bug\n",
 			dmi_id->ident);
-	if (dev->quirks->spurious_8042)
-		dev->disable_8042_wakeup = true;
+	dev->disable_8042_wakeup = dev->quirks->spurious_8042;
 }
diff --git a/drivers/platform/x86/amd/pmc/pmc.c b/drivers/platform/x86/amd/pmc/pmc.c
index 0b9b23eb7c2c..bd318fd02ccf 100644
--- a/drivers/platform/x86/amd/pmc/pmc.c
+++ b/drivers/platform/x86/amd/pmc/pmc.c
@@ -530,19 +530,6 @@ static int amd_pmc_get_os_hint(struct amd_pmc_dev *dev)
 static int amd_pmc_wa_irq1(struct amd_pmc_dev *pdev)
 {
 	struct device *d;
-	int rc;
-
-	/* cezanne platform firmware has a fix in 64.66.0 */
-	if (pdev->cpu_id == AMD_CPU_ID_CZN) {
-		if (!pdev->major) {
-			rc = amd_pmc_get_smu_version(pdev);
-			if (rc)
-				return rc;
-		}
-
-		if (pdev->major > 64 || (pdev->major == 64 && pdev->minor > 65))
-			return 0;
-	}
 
 	d = bus_find_device_by_name(&serio_bus, NULL, "serio0");
 	if (!d)
From: Xianglai Li lixianglai@loongson.cn
[ Upstream commit 8ef7f3132e4005a103b382e71abea7ad01fbeb86 ]
When a CPU goes offline, the LoongArch timer is not shut down correctly. This is harmless on real machines, but in virtual machines it results in excessively high CPU usage by the offline vCPU thread.
To correctly close the timer, we have made the following modifications:
Register the cpu hotplug event (CPUHP_AP_LOONGARCH_ARCH_TIMER_STARTING) for LoongArch. This event's hooks will be called to close the timer when the CPU is offline.
Clear the timer interrupt when the timer is turned off. Before the timer is turned off, a timer interrupt may already be pending because interrupts were disabled, and this pending interrupt also affects the halt state of the offline vCPU.
Signed-off-by: Xianglai Li lixianglai@loongson.cn
Signed-off-by: Huacai Chen chenhuacai@loongson.cn
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
Based on my analysis of the commit and its context, here is my assessment:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
## Bug Fix for Real User Impact
1. **Fixes a concrete bug affecting virtual machines**: The commit addresses a problem where offline vCPUs consume excessive CPU resources in virtual machines. This is a real performance bug that affects users running LoongArch VMs.
2. **Clear problem statement**: The commit message clearly describes the issue - when a CPU is offlined, the LoongArch timer is not properly disabled, leading to high CPU usage by offline vCPU threads in virtual environments.
## Small and Contained Change
The commit is relatively small and well-contained:
- Adds 23 lines of code for timer management hooks
- Registers CPU hotplug callbacks using the existing cpuhotplug infrastructure, under a new state (CPUHP_AP_LOONGARCH_ARCH_TIMER_STARTING)
- Apart from that one new enum entry in include/linux/cpuhotplug.h, the changes are isolated to the LoongArch timer code
## Follows Established Patterns
1. **Uses standard kernel infrastructure**: The fix properly uses the cpuhotplug framework that other architectures already use (ARM, MIPS, RISCV all have similar CPUHP_AP_*_TIMER_STARTING entries).
2. **Similar to previous fixes**: Commit 355170a7ecac ("LoongArch: Implement constant timer shutdown interface") addressed a related issue with timer shutdown, and this commit completes the proper timer management during CPU hotplug.
## Minimal Risk of Regression
1. **Architecture-specific**: Changes are confined to LoongArch architecture code, apart from one new enum entry in the generic `include/linux/cpuhotplug.h`, so other architectures are not affected.
2. **Clear timer interrupt handling**: The fix properly clears pending timer interrupts when disabling the timer, preventing interrupt storms.
3. **Protected by proper locking**: Uses existing state_lock for synchronization.
## Virtual Machine Support is Important
With increasing use of virtualization, proper vCPU management is critical for production environments. High CPU usage by offline vCPUs can significantly impact VM performance and host resource utilization.
## Technical Correctness
The implementation correctly:
- Enables timer interrupts on CPU startup (`set_csr_ecfg(ECFGF_TIMER)`)
- Shuts down the timer on CPU dying (`constant_set_state_shutdown()`)
- Clears pending timer interrupts (`write_csr_tintclear(CSR_TINTCLR_TI)`)
This is a straightforward bug fix that addresses a clear performance issue in virtual machine environments without introducing new features or architectural changes, making it an ideal candidate for stable backport.
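The startup/teardown pairing described above can be sketched in userspace C. This is only a model under stated assumptions: `struct model_cpu`, `struct hotplug_state` and every `model_*` function are hypothetical names invented for illustration, and the three booleans merely stand in for the ECFG timer-enable bit, the running timer and a latched timer interrupt. It is not the kernel's cpuhotplug API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU timer state; not a kernel structure. */
struct model_cpu {
	bool timer_irq_enabled;  /* models the ECFG timer-enable bit */
	bool timer_running;      /* models the constant timer counting */
	bool timer_irq_pending;  /* models a latched timer interrupt */
};

typedef int (*hotplug_cb)(struct model_cpu *cpu);

/* A hotplug state pairs a startup callback with a teardown callback. */
struct hotplug_state {
	hotplug_cb startup;
	hotplug_cb teardown;
};

static int model_timer_starting(struct model_cpu *cpu)
{
	cpu->timer_irq_enabled = true;
	return 0;
}

static int model_timer_dying(struct model_cpu *cpu)
{
	cpu->timer_running = false;      /* shut the timer down */
	cpu->timer_irq_pending = false;  /* then drop any latched interrupt */
	return 0;
}

static const struct hotplug_state model_timer_state = {
	.startup = model_timer_starting,
	.teardown = model_timer_dying,
};

static int model_cpu_online(struct model_cpu *cpu)
{
	assert(cpu != NULL);
	return model_timer_state.startup(cpu);
}

static int model_cpu_offline(struct model_cpu *cpu)
{
	assert(cpu != NULL);
	return model_timer_state.teardown(cpu);
}

/* An offline CPU can stay halted only if nothing is left to wake it. */
static bool model_cpu_can_stay_halted(const struct model_cpu *cpu)
{
	return !cpu->timer_running && !cpu->timer_irq_pending;
}
```

In this model, a CPU that goes offline with a running timer or a pending interrupt can stay halted only after `model_cpu_offline()` has cleared both, which mirrors why the patch clears the pending interrupt in addition to shutting the timer down.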
 arch/loongarch/kernel/time.c | 22 ++++++++++++++++++++++
 include/linux/cpuhotplug.h   |  1 +
 2 files changed, 23 insertions(+)

diff --git a/arch/loongarch/kernel/time.c b/arch/loongarch/kernel/time.c
index 367906b10f81..f3092f2de8b5 100644
--- a/arch/loongarch/kernel/time.c
+++ b/arch/loongarch/kernel/time.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2020-2022 Loongson Technology Corporation Limited
  */
 #include <linux/clockchips.h>
+#include <linux/cpuhotplug.h>
 #include <linux/delay.h>
 #include <linux/export.h>
 #include <linux/init.h>
@@ -102,6 +103,23 @@ static int constant_timer_next_event(unsigned long delta, struct clock_event_dev
 	return 0;
 }
 
+static int arch_timer_starting(unsigned int cpu)
+{
+	set_csr_ecfg(ECFGF_TIMER);
+
+	return 0;
+}
+
+static int arch_timer_dying(unsigned int cpu)
+{
+	constant_set_state_shutdown(this_cpu_ptr(&constant_clockevent_device));
+
+	/* Clear Timer Interrupt */
+	write_csr_tintclear(CSR_TINTCLR_TI);
+
+	return 0;
+}
+
 static unsigned long get_loops_per_jiffy(void)
 {
 	unsigned long lpj = (unsigned long)const_clock_freq;
@@ -172,6 +190,10 @@ int constant_clockevent_init(void)
 
 	lpj_fine = get_loops_per_jiffy();
 	pr_info("Constant clock event device register\n");
 
+	cpuhp_setup_state(CPUHP_AP_LOONGARCH_ARCH_TIMER_STARTING,
+			  "clockevents/loongarch/timer:starting",
+			  arch_timer_starting, arch_timer_dying);
+
 	return 0;
 }
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index df366ee15456..e62064cb9e08 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -169,6 +169,7 @@ enum cpuhp_state {
 	CPUHP_AP_QCOM_TIMER_STARTING,
 	CPUHP_AP_TEGRA_TIMER_STARTING,
 	CPUHP_AP_ARMADA_TIMER_STARTING,
+	CPUHP_AP_LOONGARCH_ARCH_TIMER_STARTING,
 	CPUHP_AP_MIPS_GIC_TIMER_STARTING,
 	CPUHP_AP_ARC_TIMER_STARTING,
 	CPUHP_AP_REALTEK_TIMER_STARTING,
From: Johannes Thumshirn johannes.thumshirn@wdc.com
[ Upstream commit f0ba0e7172a222ea6043b61ecd86723c46d7bcf2 ]
Don't call ZONE FINISH for conventional zones as this will result in I/O errors. Instead check if the zone that needs finishing is a conventional zone and if yes skip it.
Also factor out the actual handling of finishing a single zone into a helper function, as do_zone_finish() is growing ever bigger and the indentation levels are getting higher.

Reviewed-by: Naohiro Aota naohiro.aota@wdc.com
Reviewed-by: Anand Jain anand.jain@oracle.com
Signed-off-by: Johannes Thumshirn johannes.thumshirn@wdc.com
Signed-off-by: David Sterba dsterba@suse.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
## Bug Fix Analysis
1. **Critical Bug Fix**: The commit fixes a concrete bug where calling `REQ_OP_ZONE_FINISH` on conventional zones results in I/O errors. This is a real runtime error that affects users of btrfs on zoned storage devices that have conventional zones mixed with sequential zones.
2. **Small and Contained Change**: The fix is relatively small and well-contained:
   - Adds a new helper function `call_zone_finish()` that encapsulates the zone finishing logic
   - Most importantly, adds the critical check: `if (btrfs_dev_is_sequential(device, physical))` before issuing the `REQ_OP_ZONE_FINISH` operation
   - The refactoring merely moves existing code into the helper function without changing the logic
3. **Clear Root Cause**: The bug occurs because the original code unconditionally calls `blkdev_zone_mgmt()` with `REQ_OP_ZONE_FINISH` on all zones, but this operation is invalid for conventional zones (non-sequential zones). The fix properly checks if a zone is sequential before attempting to finish it.
4. **No New Features**: This commit doesn't introduce any new functionality - it's purely a bug fix that prevents I/O errors.
5. **Minimal Risk**: The change has minimal risk of regression:
   - The check for sequential zones is straightforward
   - The refactoring doesn't change the existing logic flow
   - The fix has been reviewed by multiple developers familiar with the zoned code
6. **Affects Real Users**: This bug affects users running btrfs on SMR (Shingled Magnetic Recording) drives or ZNS (Zoned Namespace) SSDs that have a mix of conventional and sequential zones, which is a common configuration.
## Code Analysis
The key fix in `call_zone_finish()` at line 2262:

```c
if (btrfs_dev_is_sequential(device, physical)) {
	// Only call zone finish for sequential zones
	ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH, ...);
}
```
This prevents the invalid operation on conventional zones while maintaining the correct behavior for sequential zones. The subsequent operations (updating reserved_active_zones and clearing active zone) are still performed regardless of zone type, which is the correct behavior.
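The skip-then-bookkeep behavior described above can be modelled with a small userspace sketch. All names here (`struct model_zone`, `model_issue_zone_finish()`, `model_finish_one_zone()`) are hypothetical and exist only for illustration. The stand-in for the block layer simply fails with a value modelling `-EIO` when asked to finish a conventional zone, which is an assumption based on the commit message rather than a description of any specific device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_zone_type { MODEL_ZONE_CONVENTIONAL, MODEL_ZONE_SEQUENTIAL };

/* Hypothetical zone descriptor; not a btrfs structure. */
struct model_zone {
	enum model_zone_type type;
	bool active;
};

/* Stand-in for the block layer: finishing a conventional zone fails. */
static int model_issue_zone_finish(const struct model_zone *zone,
				   unsigned int *issued)
{
	(*issued)++;
	return zone->type == MODEL_ZONE_SEQUENTIAL ? 0 : -5; /* models -EIO */
}

/*
 * Finish one zone: the finish command is issued only for sequential
 * zones, while the active-zone bookkeeping runs for every zone type.
 */
static int model_finish_one_zone(struct model_zone *zone,
				 unsigned int *issued)
{
	assert(zone != NULL && issued != NULL);

	if (zone->type == MODEL_ZONE_SEQUENTIAL) {
		int ret = model_issue_zone_finish(zone, issued);

		if (ret)
			return ret;
	}

	zone->active = false;
	return 0;
}
```

With this shape, a conventional zone never reaches the command that would fail, yet it is still marked inactive, which matches the behavior the analysis attributes to the helper.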
The commit follows stable kernel rules perfectly: it's a clear bug fix, has minimal changes, doesn't introduce new features, and addresses a real user-facing issue that causes I/O errors.
 fs/btrfs/zoned.c | 55 ++++++++++++++++++++++++++++++------------------
 1 file changed, 35 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 5439d8374716..950e72dc537c 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -2246,6 +2246,40 @@ static void wait_eb_writebacks(struct btrfs_block_group *block_group)
 	rcu_read_unlock();
 }
 
+static int call_zone_finish(struct btrfs_block_group *block_group,
+			    struct btrfs_io_stripe *stripe)
+{
+	struct btrfs_device *device = stripe->dev;
+	const u64 physical = stripe->physical;
+	struct btrfs_zoned_device_info *zinfo = device->zone_info;
+	int ret;
+
+	if (!device->bdev)
+		return 0;
+
+	if (zinfo->max_active_zones == 0)
+		return 0;
+
+	if (btrfs_dev_is_sequential(device, physical)) {
+		unsigned int nofs_flags;
+
+		nofs_flags = memalloc_nofs_save();
+		ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH,
+				       physical >> SECTOR_SHIFT,
+				       zinfo->zone_size >> SECTOR_SHIFT);
+		memalloc_nofs_restore(nofs_flags);
+
+		if (ret)
+			return ret;
+	}
+
+	if (!(block_group->flags & BTRFS_BLOCK_GROUP_DATA))
+		zinfo->reserved_active_zones++;
+	btrfs_dev_clear_active_zone(device, physical);
+
+	return 0;
+}
+
 static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_written)
 {
 	struct btrfs_fs_info *fs_info = block_group->fs_info;
@@ -2330,31 +2364,12 @@ static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
 	down_read(&dev_replace->rwsem);
 	map = block_group->physical_map;
 	for (i = 0; i < map->num_stripes; i++) {
-		struct btrfs_device *device = map->stripes[i].dev;
-		const u64 physical = map->stripes[i].physical;
-		struct btrfs_zoned_device_info *zinfo = device->zone_info;
-		unsigned int nofs_flags;
-
-		if (!device->bdev)
-			continue;
-
-		if (zinfo->max_active_zones == 0)
-			continue;
-
-		nofs_flags = memalloc_nofs_save();
-		ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH,
-				       physical >> SECTOR_SHIFT,
-				       zinfo->zone_size >> SECTOR_SHIFT);
-		memalloc_nofs_restore(nofs_flags);
 
+		ret = call_zone_finish(block_group, &map->stripes[i]);
 		if (ret) {
 			up_read(&dev_replace->rwsem);
 			return ret;
 		}
-
-		if (!(block_group->flags & BTRFS_BLOCK_GROUP_DATA))
-			zinfo->reserved_active_zones++;
-		btrfs_dev_clear_active_zone(device, physical);
 	}
 	up_read(&dev_replace->rwsem);
 
From: Timur Kristóf timur.kristof@gmail.com
[ Upstream commit 8246147f1fbaed522b8bcc02ca34e4260747dcfb ]
On some GPUs the VBIOS just doesn't have encoder caps, or maybe not for every encoder.
This isn't really a problem and it's handled well, so let's not litter the logs with it.
Signed-off-by: Timur Kristóf timur.kristof@gmail.com
Acked-by: Alex Deucher alexander.deucher@amd.com
Reviewed-by: Rodrigo Siqueira siqueira@igalia.com
Reviewed-by: Alex Hung alex.hung@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
(cherry picked from commit 33e0227ee96e62d034781e91f215e32fd0b1d512)
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
Based on my analysis of this commit, here is my determination:
**Backport Status: YES**
## Extensive Explanation:
This commit is a good candidate for backporting to stable kernel trees for the following reasons:
### 1. **It fixes a user-visible issue (log spam)**

The commit addresses warning messages that unnecessarily clutter system logs. The code shows that when `get_encoder_cap_info()` returns `BP_RESULT_NORECORD` (meaning the VBIOS doesn't have encoder capabilities for that specific encoder), it was incorrectly logging a warning. Looking at the code pattern across the AMD display driver, `BP_RESULT_NORECORD` is a normal, expected condition, not an error.

### 2. **The fix is minimal and contained**

The change is extremely simple. It only modifies the condition check from:

```c
if (BP_RESULT_OK == result) {
	// handle success
} else {
	DC_LOG_WARNING(...); // Always warn on non-OK
}
```

to:

```c
if (result == BP_RESULT_OK) {
	// handle success
} else if (result != BP_RESULT_NORECORD) {
	DC_LOG_WARNING(...); // Only warn on actual errors
}
```

### 3. **No functional changes or new features**

The commit doesn't change any behavior; it only suppresses inappropriate warning messages. The driver already handles the missing encoder caps gracefully (as noted in the commit message: "This isn't really a problem and it's handled well").

### 4. **Low risk of regression**

The change only affects logging behavior. It doesn't modify:
- Any hardware initialization sequences
- Any encoder capabilities detection logic
- Any functional paths in the driver
- Any data structures or APIs

### 5. **Pattern consistency across the codebase**

My grep analysis shows that `BP_RESULT_NORECORD` is commonly handled without warnings in other parts of the AMD display driver. For example, in `dc_link.c`, it uses `ASSERT(bp_result == BP_RESULT_NORECORD)` showing it's an expected condition. Many other functions simply return `BP_RESULT_NORECORD` without logging warnings.

### 6. **Applied to two identical code paths**

The commit applies the same fix to both `dce110_link_encoder_construct()` and `dce60_link_encoder_construct()` functions, maintaining consistency and preventing the warning in both DCE 11.0 and DCE 6.0 hardware paths.

### 7. **Already cherry-picked**

The commit message shows "(cherry picked from commit 33e0227ee96e...)", indicating it was already deemed important enough to cherry-pick to another branch, suggesting its stability importance.

### 8. **Quality of Life improvement for users**

Reducing unnecessary log spam improves the user experience and makes it easier to identify real problems in system logs. This is especially important for users monitoring their systems for actual issues.
The commit follows stable tree rules perfectly: it's a small, contained fix for a real issue (log spam) that affects users, with minimal risk of introducing regressions.
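The change in warning policy can be written down as two small predicates. This is a sketch only: the `model_bp_result` enum and both functions are hypothetical names for illustration, and the enum lists just three representative outcomes rather than the driver's real result codes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes; the real driver enum has more values. */
enum model_bp_result {
	MODEL_BP_RESULT_OK,
	MODEL_BP_RESULT_NORECORD,  /* VBIOS simply has no record */
	MODEL_BP_RESULT_FAILURE,   /* any genuine error */
};

/* Old policy: every non-OK result produces a warning. */
static bool model_should_warn_old(enum model_bp_result result)
{
	return result != MODEL_BP_RESULT_OK;
}

/* New policy: a missing record is expected and stays silent. */
static bool model_should_warn_new(enum model_bp_result result)
{
	return result != MODEL_BP_RESULT_OK &&
	       result != MODEL_BP_RESULT_NORECORD;
}
```

The two predicates differ only for the "no record" case, so in this model genuine failures are still reported exactly as before.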
 drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.c b/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.c
index 4a9d07c31bc5..0c50fe266c8a 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.c
@@ -896,13 +896,13 @@ void dce110_link_encoder_construct(
 						enc110->base.id, &bp_cap_info);
 
 	/* Override features with DCE-specific values */
-	if (BP_RESULT_OK == result) {
+	if (result == BP_RESULT_OK) {
 		enc110->base.features.flags.bits.IS_HBR2_CAPABLE =
 				bp_cap_info.DP_HBR2_EN;
 		enc110->base.features.flags.bits.IS_HBR3_CAPABLE =
 				bp_cap_info.DP_HBR3_EN;
 		enc110->base.features.flags.bits.HDMI_6GB_EN = bp_cap_info.HDMI_6GB_EN;
-	} else {
+	} else if (result != BP_RESULT_NORECORD) {
 		DC_LOG_WARNING("%s: Failed to get encoder_cap_info from VBIOS with error code %d!\n",
 				__func__,
 				result);
@@ -1798,13 +1798,13 @@ void dce60_link_encoder_construct(
 						enc110->base.id, &bp_cap_info);
 
 	/* Override features with DCE-specific values */
-	if (BP_RESULT_OK == result) {
+	if (result == BP_RESULT_OK) {
 		enc110->base.features.flags.bits.IS_HBR2_CAPABLE =
 				bp_cap_info.DP_HBR2_EN;
 		enc110->base.features.flags.bits.IS_HBR3_CAPABLE =
 				bp_cap_info.DP_HBR3_EN;
 		enc110->base.features.flags.bits.HDMI_6GB_EN = bp_cap_info.HDMI_6GB_EN;
-	} else {
+	} else if (result != BP_RESULT_NORECORD) {
 		DC_LOG_WARNING("%s: Failed to get encoder_cap_info from VBIOS with error code %d!\n",
 				__func__,
 				result);
From: Yang Li yang.li@amlogic.com
[ Upstream commit de5d7d3f27ddd4046736f558a40e252ddda82013 ]
list_empty(&hdev->adv_instances) is always true during startup, so an advertising instance is added by default.
Call trace: dump_backtrace+0x94/0xec show_stack+0x18/0x24 dump_stack_lvl+0x48/0x60 dump_stack+0x18/0x24 hci_setup_ext_adv_instance_sync+0x17c/0x328 hci_powered_update_adv_sync+0xb4/0x12c hci_powered_update_sync+0x54/0x70 hci_power_on_sync+0xe4/0x278 hci_set_powered_sync+0x28/0x34 set_powered_sync+0x40/0x58 hci_cmd_sync_work+0x94/0x100 process_one_work+0x168/0x444 worker_thread+0x378/0x3f4 kthread+0x108/0x10c ret_from_fork+0x10/0x20
Link: https://github.com/bluez/bluez/issues/1442
Signed-off-by: Yang Li yang.li@amlogic.com
Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees. Here's my extensive analysis:
## Bug Analysis
The commit fixes a **logic error** in `hci_powered_update_adv_sync()` that causes unnecessary default advertising instance creation during Bluetooth startup. The problematic code at line 3290 uses an OR condition:
```c
if (hci_dev_test_flag(hdev, HCI_ADVERTISING) ||
    list_empty(&hdev->adv_instances))
```
This condition is **always true** during startup because `list_empty(&hdev->adv_instances)` returns true when the advertising instances list is initially empty. This causes the code block to execute unconditionally, creating a default advertising instance even when not needed.
## The Fix
The patch changes the OR (`||`) to AND (`&&`):
```c
if (hci_dev_test_flag(hdev, HCI_ADVERTISING) &&
    list_empty(&hdev->adv_instances))
```

This ensures the default advertising setup only occurs when:
1. The HCI_ADVERTISING flag is explicitly set, AND
2. There are no advertising instances configured
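The difference between the two conditions is easiest to see as a truth table. The sketch below is a userspace model in which both function names and both boolean parameters are hypothetical; they stand in for the `HCI_ADVERTISING` flag test and the `list_empty()` check, nothing more.

```c
#include <assert.h>
#include <stdbool.h>

/* Old condition: set up default advertising if EITHER input is true. */
static bool model_default_adv_old(bool advertising_flag, bool no_instances)
{
	return advertising_flag || no_instances;
}

/* New condition: set up default advertising only if BOTH are true. */
static bool model_default_adv_new(bool advertising_flag, bool no_instances)
{
	return advertising_flag && no_instances;
}
```

At startup the instance list is empty, so the interesting row is `advertising_flag == false, no_instances == true`: the old predicate returns true there and the new one returns false, while both agree when the flag is set and the list is empty.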
## Why This Should Be Backported
1. **Clear Bug Fix**: This is a straightforward logic error that causes incorrect behavior during Bluetooth initialization. The stack trace in the commit message shows this happens during normal startup flow (`hci_power_on_sync` → `hci_powered_update_sync` → `hci_powered_update_adv_sync`).
2. **Small and Contained**: The fix is a single-operator change (`||` to `&&`) that only affects the conditional logic. No architectural changes or new features are introduced.
3. **Prevents Resource Waste**: The bug causes unnecessary advertising instance creation on every Bluetooth startup, which wastes system resources and may interfere with user-configured advertising settings.
4. **Low Risk**: The change is minimal and only affects the specific condition for creating default advertising. The same pattern (checking both flags with AND) is already used in other parts of the codebase (e.g., `reenable_adv_sync()` function).
5. **User-Visible Impact**: The issue has an associated BlueZ bug report (#1442), indicating real users are affected by this problem.
6. **Long-Standing Issue**: The problematic code was introduced in commit cf75ad8b41d2a (October 2021), meaning this bug has been affecting users for an extended period across multiple kernel versions.
The fix follows stable kernel rules perfectly: it's an important bug fix with minimal code change and very low regression risk, making it an ideal candidate for stable backporting.
 net/bluetooth/hci_sync.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index 7938c004071c..795952d5f921 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -3344,7 +3344,7 @@ static int hci_powered_update_adv_sync(struct hci_dev *hdev)
 	 * advertising data. This also applies to the case
 	 * where BR/EDR was toggled during the AUTO_OFF phase.
 	 */
-	if (hci_dev_test_flag(hdev, HCI_ADVERTISING) ||
+	if (hci_dev_test_flag(hdev, HCI_ADVERTISING) &&
 	    list_empty(&hdev->adv_instances)) {
 		if (ext_adv_capable(hdev)) {
 			err = hci_setup_ext_adv_instance_sync(hdev, 0x00);
From: Shinji Nomoto fj5851bi@fujitsu.com
[ Upstream commit b3eaf14f4c63fd6abc7b68c6d7a07c5680a6d8e5 ]
The set subcommand's -t option is documented as being available for boost configuration, but it was not actually functioning due to a bug in the option handling.
Link: https://lore.kernel.org/r/20250522061122.2149188-2-fj5851bi@fujitsu.com
Signed-off-by: Shinji Nomoto fj5851bi@fujitsu.com
Signed-off-by: Shuah Khan skhan@linuxfoundation.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit is a clear bug fix that should be backported to stable kernel trees. Here's my detailed analysis:
## Bug Analysis
The commit fixes a bug where the `-t` option for the `cpupower set` subcommand was documented and defined in the `set_opts[]` structure, but the short option did not work because `t:` was missing from the option string passed to `getopt_long()`.
### Specific Code Changes
The bug is in line 58-59 of the original code:

```c
while ((ret = getopt_long(argc, argv, "b:e:m:",
			  set_opts, NULL)) != -1) {
```

The fix adds the missing `t:` to the argument string:

```c
while ((ret = getopt_long(argc, argv, "b:e:m:t:",
			  set_opts, NULL)) != -1) {
```
### Root Cause
When the turbo-boost feature was added in commit `eb426fc6bdd6` (merged in v6.6-rc1), the developer:
1. Added the option to the `set_opts[]` array correctly
2. Added the case handler for 't' in the switch statement
3. **BUT forgot to add "t:" to the getopt_long argument string**

This means that while the option was defined and had handling code, `getopt_long()` would never recognize the short option `-t` as valid, so the documented `-t` form was unusable.
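The effect of the missing `t:` can be reproduced with a small standalone helper around `getopt_long()`. This is a sketch, not the cpupower source: `demo_opts` and `demo_first_option()` are hypothetical names, and the option table contains only the one entry needed for the demonstration. Resetting `optind` to 1 between scans is the conventional way to rescan; some C libraries need extra steps for a full reset, so treat that part as an assumption about the platform.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical option table with a single long/short pair. */
static const struct option demo_opts[] = {
	{ "turbo-boost", required_argument, NULL, 't' },
	{ NULL, 0, NULL, 0 },
};

/*
 * Return the first value getopt_long() reports for argv when short
 * options are described by optstring.
 */
static int demo_first_option(int argc, char **argv, const char *optstring)
{
	assert(argv != NULL && optstring != NULL);

	optind = 1;  /* restart scanning so the helper can be reused */
	opterr = 0;  /* suppress getopt_long()'s own error message */

	return getopt_long(argc, argv, optstring, demo_opts, NULL);
}
```

Called with the arguments `-t 1`, the helper reports `'?'` (unknown option) for the option string `"b:e:m:"` and `'t'` for `"b:e:m:t:"`, which is the behavioral difference the two-character fix makes.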
## Why This Should Be Backported
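1. **Clear Bug Fix**: This is an obvious bug where documented functionality doesn't work. Users trying to use the short form `cpupower set -t` would get an error (the long form `--turbo-boost` is matched through `set_opts[]` and does not depend on the short-option string).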
2. **Small and Contained**: The fix is a single-line change adding 2 characters ("t:") to a string. This is as minimal as fixes get.
3. **No Side Effects**: The change only enables already-implemented functionality. It doesn't introduce new code paths or change existing behavior.
4. **User Impact**: The turbo-boost control feature is important for power management, and users on stable kernels with v6.6+ would expect this documented feature to work.
5. **Affects Stable Versions**: The bug was introduced in v6.6-rc1 and affects all kernels from v6.6 onwards that include the turbo-boost feature.
6. **Low Risk**: There's virtually no regression risk - the worst case is the option continues not working, which is the current state.
This is exactly the type of fix that stable kernel rules recommend: a clear bug fix that restores documented functionality with minimal code change and no architectural modifications.
 tools/power/cpupower/utils/cpupower-set.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/power/cpupower/utils/cpupower-set.c b/tools/power/cpupower/utils/cpupower-set.c
index 0677b58374ab..59ace394cf3e 100644
--- a/tools/power/cpupower/utils/cpupower-set.c
+++ b/tools/power/cpupower/utils/cpupower-set.c
@@ -62,8 +62,8 @@ int cmd_set(int argc, char **argv)
 
 	params.params = 0;
 	/* parameter parsing */
-	while ((ret = getopt_long(argc, argv, "b:e:m:",
-						set_opts, NULL)) != -1) {
+	while ((ret = getopt_long(argc, argv, "b:e:m:t:",
+						set_opts, NULL)) != -1) {
 		switch (ret) {
 		case 'b':
 			if (params.perf_bias)
From: Piotr Zalewski pZ010001011111@proton.me
[ Upstream commit a52dffaa46c2c5ff0b311c4dc1288581f7b9109e ]
Make video port registers nonvolatile. The DSP_CTRL register is written twice, because the gamma LUT enable bit is set outside of the main DSP_CTRL initialization within atomic_enable (for the rk356x case it is also necessary to always disable the gamma LUT before writing a new LUT). As a result, there is a chance that the DSP_CTRL value read out in the gamma LUT init/update code is not the one that was written by the preceding DSP_CTRL initialization code within atomic_enable. This might result in a misconfigured DSP_CTRL, which leads to no visual output[1]. Since a DSP_CTRL write takes effect after VSYNC[1], the issue is not always present. When tested on Pinetab2 with kernel 6.14, it happens only when DRM is compiled as a module[1]. In order to confirm that it is a timing issue, I inserted an 18ms udelay before vop2_crtc_atomic_try_set_gamma in atomic enable and compiled DRM as a module; this also fixed the issue.
[1] https://lore.kernel.org/linux-rockchip/562b38e5.a496.1975f09f983.Coremail.an...
Reported-by: Diederik de Haas didi.debian@cknow.org
Closes: https://lore.kernel.org/linux-rockchip/DAEVDSTMWI1E.J454VZN0R9MA@cknow.org/
Suggested-by: Andy Yan andy.yan@rock-chips.com
Signed-off-by: Piotr Zalewski pZ010001011111@proton.me
Tested-by: Diederik de Haas didi.debian@cknow.org
Signed-off-by: Heiko Stuebner heiko@sntech.de
Link: https://lore.kernel.org/r/20250706083629.140332-2-pZ010001011111@proton.me
Signed-off-by: Sasha Levin sashal@kernel.org
---
LLM Generated explanations, may be completely bogus:
Based on my analysis of the commit, let me provide my determination:
**Backport Status: YES**
## Extensive Analysis:
This commit is a strong candidate for backporting to stable kernel trees for the following reasons:
### 1. **Bug Fix Nature**

The commit fixes a real bug that causes **no visual output** on displays. This is a critical user-visible issue where the display driver fails to produce any image output. The commit message clearly describes this as fixing a situation where "misconfigured DSP_CTRL which leads to no visual output."

### 2. **Race Condition Fix**

The code changes address a race condition in register access patterns. The issue occurs because:
- DSP_CTRL register is written twice during atomic_enable
- The second write happens for gamma LUT configuration
- Due to the register being volatile, read-modify-write operations can read stale values
- DSP_CTRL writes only take effect after VSYNC, creating a timing window for the race

### 3. **Small and Contained Change**

The actual code change is minimal, just one line adding the VP register range to the nonvolatile list:

```c
+ regmap_reg_range(RK3568_VP0_CTRL_BASE, RK3588_VP3_CTRL_BASE + 255),
```

This makes the video port registers (0xc00-0xeff based on the offsets) use the regmap cache instead of reading back from hardware, preventing the race condition.

### 4. **Hardware-Specific Fix**

The fix is specific to Rockchip VOP2 hardware (RK3568/RK3588 SoCs) and doesn't affect other subsystems. It's confined to the DRM/Rockchip display driver.

### 5. **Tested and Confirmed**

The commit includes:
- A "Reported-by" tag indicating a real user issue
- A "Tested-by" tag confirming the fix works
- Clear reproduction details (happens on Pinetab2 with kernel 6.14 when DRM is compiled as module)

### 6. **Low Risk of Regression**

The change only affects how registers are cached in the regmap framework. It doesn't change any functional logic, algorithms, or introduce new features. The comment update accurately reflects that both window AND video port registers are now marked as non-volatile for cache coherency.

### 7. **Follows Stable Rules**

This fix aligns with stable kernel rules:
- Fixes a real bug that users are hitting
- Small change (essentially 1 line of functional code)
- Already tested by reporter
- Fix is obvious once the problem is understood
- No new features or architectural changes

### 8. **Critical Display Functionality**

Display output is fundamental functionality. A bug that causes "no visual output" severely impacts usability of devices using these SoCs, making this a high-priority fix for stable trees.
The commit properly documents the issue with external references and follows good practices for a bug fix that should be backported to stable kernels.
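The stale read-back problem described in the analysis can be illustrated with a userspace model of a register whose writes only become visible at a later commit point. Everything here is hypothetical: `struct model_reg` and the `model_reg_*` functions are invented for the sketch and are not the regmap API, and `model_reg_commit()` merely stands in for the moment (VSYNC or "config done") at which the hardware picks up a write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register whose writes are latched until a commit point. */
struct model_reg {
	uint32_t hw_value;  /* value a hardware read returns right now */
	uint32_t latched;   /* last write, not yet visible in hw_value */
	bool has_latched;
	uint32_t cache;     /* software copy of the last written value */
	bool use_cache;     /* true models a non-volatile (cached) register */
};

static uint32_t model_reg_read(const struct model_reg *reg)
{
	assert(reg != NULL);
	return reg->use_cache ? reg->cache : reg->hw_value;
}

static void model_reg_write(struct model_reg *reg, uint32_t value)
{
	assert(reg != NULL);
	reg->latched = value;
	reg->has_latched = true;
	reg->cache = value;
}

/* Read-modify-write: only the bits selected by mask are changed. */
static void model_reg_update_bits(struct model_reg *reg, uint32_t mask,
				  uint32_t value)
{
	uint32_t old = model_reg_read(reg);

	model_reg_write(reg, (old & ~mask) | (value & mask));
}

/* The commit point: the latched write becomes the hardware value. */
static void model_reg_commit(struct model_reg *reg)
{
	assert(reg != NULL);
	if (reg->has_latched) {
		reg->hw_value = reg->latched;
		reg->has_latched = false;
	}
}
```

In this model, writing `0x0f` and then setting bit `0x10` before the commit point leaves `0x10` in the register when reads go to the hardware, because the read-modify-write starts from the old value. With `use_cache` set, the same sequence ends with `0x1f`, which is the property the patch relies on when it marks the video port range as non-volatile.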
 drivers/gpu/drm/rockchip/rockchip_drm_vop2.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
index 186f6452a7d3..b50927a824b4 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
@@ -2579,12 +2579,13 @@ static int vop2_win_init(struct vop2 *vop2)
 }
 
 /*
- * The window registers are only updated when config done is written.
- * Until that they read back the old value. As we read-modify-write
- * these registers mark them as non-volatile. This makes sure we read
- * the new values from the regmap register cache.
+ * The window and video port registers are only updated when config
+ * done is written. Until that they read back the old value. As we
+ * read-modify-write these registers mark them as non-volatile. This
+ * makes sure we read the new values from the regmap register cache.
  */
 static const struct regmap_range vop2_nonvolatile_range[] = {
+	regmap_reg_range(RK3568_VP0_CTRL_BASE, RK3588_VP3_CTRL_BASE + 255),
 	regmap_reg_range(0x1000, 0x23ff),
 };
 