From: John Garry john.g.garry@oracle.com
[ Upstream commit 448dfecc7ff807822ecd47a5c052acedca7d09e8 ]
In blk_stack_limits(), we check that the t->chunk_sectors value is a multiple of the t->physical_block_size value.
However, by finding the chunk_sectors value in bytes, we may overflow the unsigned int which holds chunk_sectors, so change the check to be based on sectors.
Reviewed-by: Hannes Reinecke hare@suse.de Reviewed-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: John Garry john.g.garry@oracle.com Reviewed-by: Damien Le Moal dlemoal@kernel.org Link: https://lore.kernel.org/r/20250729091448.1691334-2-john.g.garry@oracle.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
## Extensive Analysis
This commit should be backported to stable kernel trees for the following reasons:
### 1. **It fixes a real integer overflow bug** The original code performs `(t->chunk_sectors << 9)` which can cause an integer overflow. Since both `chunk_sectors` and `physical_block_size` are `unsigned int` (32-bit), when `chunk_sectors` is larger than 8,388,607 (2^23 - 1), shifting it left by 9 bits (multiplying by 512) will overflow the 32-bit unsigned integer. This overflow can lead to incorrect alignment checks.
### 2. **The fix is minimal and contained** The change is a single-line modification that replaces: ```c if ((t->chunk_sectors << 9) & (t->physical_block_size - 1)) ``` with: ```c if (t->chunk_sectors % (t->physical_block_size >> SECTOR_SHIFT)) ```
This mathematically equivalent check avoids the overflow by dividing `physical_block_size` by the sector size (512 bytes) instead of multiplying `chunk_sectors` by 512.
### 3. **It's a correctness issue in critical block layer code** The `blk_stack_limits()` function is fundamental to the block layer's device stacking functionality, used by MD, DM, and other stacked block devices. An overflow here could lead to: - Incorrect alignment detection - Improper chunk_sectors being cleared when it shouldn't be - Potential data corruption or I/O errors in stacked block devices
### 4. **The fix has low regression risk** - The new calculation is mathematically equivalent to the intended check - It only changes how the comparison is performed, not the logic - The fix has been reviewed by multiple maintainers (Hannes Reinecke, Martin K. Petersen, Damien Le Moal) - No new features or architectural changes are introduced
### 5. **Real-world impact potential** Modern storage devices can have large chunk sizes. For example: - NVMe devices with large zones could have chunk_sectors approaching or exceeding the overflow threshold - RAID configurations with large stripe sizes - This becomes more likely as storage devices grow in capacity and complexity
### 6. **Follows stable kernel criteria** According to stable kernel rules, this qualifies because it: - Fixes a bug that users can potentially hit - Is obviously correct and tested - Is small (single line change) - Fixes only one thing - Has no dependencies on other patches
The commit message clearly describes the problem (overflow in unsigned int) and the solution (changing the check to avoid overflow). The fix maintains the same semantic meaning while being overflow-safe, making it an ideal candidate for stable backporting.
block/blk-settings.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c index a000daafbfb4..88890e904320 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -779,7 +779,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, }
/* chunk_sectors a multiple of the physical block size? */ - if ((t->chunk_sectors << 9) & (t->physical_block_size - 1)) { + if (t->chunk_sectors % (t->physical_block_size >> SECTOR_SHIFT)) { t->chunk_sectors = 0; t->flags |= BLK_FLAG_MISALIGNED; ret = -1;
From: Len Brown len.brown@intel.com
[ Upstream commit e60a13bcef206795d3ddf82f130fe8f570176d06 ]
/sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/ may be readable by all, but /sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/current_freq_khz may be readable only by root.
Non-root turbostat users see complaints in this scenario.
Fail probe of the interface if we can't read current_freq_khz.
Reported-by: Artem Bityutskiy artem.bityutskiy@linux.intel.com Original-patch-by: Zhang Rui rui.zhang@intel.com Signed-off-by: Len Brown len.brown@intel.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
Now I have a clear understanding of the issue and fix. Let me analyze the code change:
## Backport Status: **YES**
## Extensive Analysis:
This commit should be backported to stable kernel trees for the following reasons:
### 1. **Bug Fix for Real User Issue** The commit fixes a specific bug where non-root users of turbostat encounter errors when the sysfs permissions are inconsistent. As stated in the commit message: - The directory `/sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/` may be readable by all users - But the file `current_freq_khz` within that directory may only be readable by root - This causes turbostat to fail with complaints for non-root users
### 2. **Simple and Contained Fix** The code change is minimal and focused: ```c // Before: Only checked if the directory was readable if (access(path_base, R_OK)) continue;
// After: Check if the actual file we need (current_freq_khz) is readable sprintf(path, "%s/current_freq_khz", path_base); if (access(path, R_OK)) continue; ```
The fix simply moves the permission check from the directory to the actual file that will be read later (line 6763-6765 in the original code reads `current_freq_khz`).
### 3. **Follows Established Pattern** Looking at the `probe_intel_uncore_frequency_cluster()` function (line 6776), it already checks the specific file permission: ```c if (access("/sys/devices/system/cpu/intel_uncore_frequency/uncore00/curr ent_freq_khz", R_OK)) return; ``` This fix makes the legacy function consistent with the cluster function's approach.
### 4. **No Risk of Regression** - The change only affects the probing logic - if the file cannot be read, the function simply skips that uncore frequency entry - This is better than the current behavior where it would try to read the file anyway and fail with errors - The fix maintains backward compatibility - systems where the file is readable continue to work as before
### 5. **Userspace Tool Fix** This is a fix for a userspace tool (turbostat) in the tools/ directory, not kernel code, making it even safer to backport as it cannot cause kernel instability.
### 6. **Clear Problem and Solution** - The problem is well-defined and reported by a real user (Artem Bityutskiy) - The solution is straightforward and logical - The fix prevents error messages that would confuse users running turbostat without root privileges
### 7. **Meets Stable Criteria** According to stable kernel rules, this fix: - Fixes a real bug that affects users ✓ - Is small and contained (2 line change) ✓ - Has minimal risk of introducing new issues ✓ - Doesn't introduce new features ✓ - Doesn't make architectural changes ✓
The commit improves the robustness of turbostat when dealing with permission inconsistencies in sysfs, which is a reasonable scenario that can occur in production systems with specific security configurations.
tools/power/x86/turbostat/turbostat.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c index 5230e072e414..bf011c2847f2 100644 --- a/tools/power/x86/turbostat/turbostat.c +++ b/tools/power/x86/turbostat/turbostat.c @@ -6740,7 +6740,8 @@ static void probe_intel_uncore_frequency_legacy(void) sprintf(path_base, "/sys/devices/system/cpu/intel_uncore_frequency/package_%02d_die_%02d", i, j);
- if (access(path_base, R_OK)) + sprintf(path, "%s/current_freq_khz", path_base); + if (access(path, R_OK)) continue;
BIC_PRESENT(BIC_UNCORE_MHZ);
From: Thomas Croft thomasmcft@gmail.com
[ Upstream commit dbe05428c4e54068a86e7e02405f3b30b1d2b3dd ]
Several months ago, Joshua Grisham submitted a patch [1] for several ALC298 based sound cards.
The entry for the LG gram 16 in the alc269_fixup_tbl only matches the Subsystem ID for the 16Z90R-Q and 16Z90R-K models [2]. My 16Z90R-A has a different Subsystem ID [3]. I'm not sure why these IDs differ, but I speculate it's due to the NVIDIA GPU included in the 16Z90R-A model that isn't present in the other models.
I applied the patch to the latest Arch Linux kernel and the card was initialized as expected.
[1]: https://lore.kernel.org/linux-sound/20240909193000.838815-1-josh@joshuagrish... [2]: https://linux-hardware.org/?id=pci:8086-51ca-1854-0488 [3]: https://linux-hardware.org/?id=pci:8086-51ca-1854-0489
Signed-off-by: Thomas Croft thomasmcft@gmail.com Link: https://patch.msgid.link/20250804151457.134761-2-thomasmcft@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit is a strong candidate for backporting to stable kernel trees for the following reasons:
## Analysis of the Commit
### 1. **Bug Fix Nature** This commit fixes a hardware enablement issue where the LG gram 16Z90R-A laptop's audio hardware was not properly initialized. The existing quirk table entry only covered subsystem IDs 0x0488 (for 16Z90R-Q and 16Z90R-K models) but missed ID 0x0489 (for the 16Z90R-A model with NVIDIA GPU).
### 2. **Minimal and Contained Change** The change is extremely minimal - it adds exactly **one line** to the quirk table: ```c SND_PCI_QUIRK(0x1854, 0x0489, "LG gram 16 (16Z90R-A)", ALC298_FIXUP_SAMSUNG_AMP_V2_4_AMPS), ```
### 3. **Low Risk of Regression** - The change only affects devices with the specific PCI subsystem ID (0x1854:0x0489) - It reuses an existing, proven fixup (`ALC298_FIXUP_SAMSUNG_AMP_V2_4_AMPS`) that's already used for similar LG gram models - The fixup function `alc298_fixup_samsung_amp_v2_4_amps` initializes speaker amplifiers, which is critical for proper audio output - No architectural changes or new features are introduced
### 4. **User Impact** Without this fix, users of the LG gram 16Z90R-A laptop experience: - Non-functional or improperly initialized audio hardware - Potential inability to use speakers or headphones correctly - Poor user experience requiring manual workarounds
### 5. **Proven Solution** - The commit message indicates the patch was tested on Arch Linux kernel - The fixup is identical to what's used for other 16Z90R variants (0x0488) and 17ZD90R (0x048a) - The Samsung amp v2 with 4 amplifiers initialization is a known working solution for this hardware family
### 6. **Hardware Enablement** This is a classic hardware enablement fix that allows existing kernel functionality to work with a specific hardware variant. Such fixes are routinely backported to stable kernels to ensure hardware compatibility.
The commit follows stable kernel criteria perfectly: it's a small, targeted fix for a real hardware issue that affects users, with minimal risk of introducing regressions since it only activates for one specific hardware ID.
sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 2627e2f49316..8544fbd816fc 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -11398,6 +11398,7 @@ static const struct hda_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1854, 0x0440, "LG CQ6", ALC256_FIXUP_HEADPHONE_AMP_VOL), SND_PCI_QUIRK(0x1854, 0x0441, "LG CQ6 AIO", ALC256_FIXUP_HEADPHONE_AMP_VOL), SND_PCI_QUIRK(0x1854, 0x0488, "LG gram 16 (16Z90R)", ALC298_FIXUP_SAMSUNG_AMP_V2_4_AMPS), + SND_PCI_QUIRK(0x1854, 0x0489, "LG gram 16 (16Z90R-A)", ALC298_FIXUP_SAMSUNG_AMP_V2_4_AMPS), SND_PCI_QUIRK(0x1854, 0x048a, "LG gram 17 (17ZD90R)", ALC298_FIXUP_SAMSUNG_AMP_V2_4_AMPS), SND_PCI_QUIRK(0x19e5, 0x3204, "Huawei MACH-WX9", ALC256_FIXUP_HUAWEI_MACH_WX9_PINS), SND_PCI_QUIRK(0x19e5, 0x320f, "Huawei WRT-WX9 ", ALC256_FIXUP_ASUS_MIC_NO_PRESENCE),
From: Yu Kuai yukuai3@huawei.com
[ Upstream commit 42e6c6ce03fd3e41e39a0f93f9b1a1d9fa664338 ]
Currently elevators will record internal 'async_depth' to throttle asynchronous requests, and they both calculate shallow_dpeth based on sb->shift, with the respect that sb->shift is the available tags in one word.
However, sb->shift is not the availbale tags in the last word, see __map_depth:
if (index == sb->map_nr - 1) return sb->depth - (index << sb->shift);
For consequence, if the last word is used, more tags can be get than expected, for example, assume nr_requests=256 and there are four words, in the worst case if user set nr_requests=32, then the first word is the last word, and still use bits per word, which is 64, to calculate async_depth is wrong.
One the ohter hand, due to cgroup qos, bfq can allow only one request to be allocated, and set shallow_dpeth=1 will still allow the number of words request to be allocated.
Fix this problems by using shallow_depth to the whole sbitmap instead of per word, also change kyber, mq-deadline and bfq to follow this, a new helper __map_depth_with_shallow() is introduced to calculate available bits in each word.
Signed-off-by: Yu Kuai yukuai3@huawei.com Link: https://lore.kernel.org/r/20250807032413.1469456-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
## Real Bug Fixes Identified
1. **Tag Allocation Overflow Bug**: The commit fixes a serious bug where more tags can be allocated than expected. As stated in the commit message: "if the last word is used, more tags can be get than expected". For example, with nr_requests=256 and four words, if nr_requests is set to 32, the calculation incorrectly uses 64 bits (full word size) instead of the correct limited depth.
2. **BFQ QoS Violation**: The bug specifically breaks BFQ's quality-of- service guarantees. When BFQ tries to limit a cgroup to only 1 request (shallow_depth=1), the bug allows "the number of words request to be allocated" instead, completely violating the intended throttling mechanism.
## Impact on Users
The bugs affect critical I/O scheduling functionality:
- **Performance Issues**: Async I/O can starve sync I/O by consuming more tags than intended - **Cgroup QoS Failure**: Cgroup-based I/O throttling doesn't work correctly, allowing one cgroup to starve others - **Incorrect Resource Allocation**: The system allocates more resources than configured limits
## Code Analysis Shows Clear Fixes
The changes fix the calculation errors by:
1. **Converting shallow_depth semantics** from per-word to whole-sbitmap basis (see `__map_depth_with_shallow()` function) 2. **Updating all affected schedulers** (BFQ, Kyber, mq-deadline) to use the corrected calculation 3. **Simplifying sbq_calc_wake_batch()** to use the correct depth calculation
## Meets Stable Criteria
- **Fixes real bugs**: Addresses actual resource allocation and QoS violations - **Obviously correct**: The math error is clear - using per-word depth when total depth is needed - **Well-tested**: Patch has been reviewed and signed-off by subsystem maintainer (Jens Axboe) - **Size reasonable**: While larger than typical, the changes are necessary to fix the bug across all affected schedulers - **No new features**: Only fixes existing broken functionality
The bug causes incorrect behavior in production systems using I/O scheduling with async depth limits or BFQ cgroup QoS, making this an important fix for stable kernels.
block/bfq-iosched.c | 35 ++++++++++++-------------- block/bfq-iosched.h | 3 +-- block/kyber-iosched.c | 9 ++----- block/mq-deadline.c | 16 +----------- include/linux/sbitmap.h | 6 ++--- lib/sbitmap.c | 56 +++++++++++++++++++++-------------------- 6 files changed, 52 insertions(+), 73 deletions(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index 0cb1e9873aab..d68da9e92e1e 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -701,17 +701,13 @@ static void bfq_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data) { struct bfq_data *bfqd = data->q->elevator->elevator_data; struct bfq_io_cq *bic = bfq_bic_lookup(data->q); - int depth; - unsigned limit = data->q->nr_requests; - unsigned int act_idx; + unsigned int limit, act_idx;
/* Sync reads have full depth available */ - if (op_is_sync(opf) && !op_is_write(opf)) { - depth = 0; - } else { - depth = bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(opf)]; - limit = (limit * depth) >> bfqd->full_depth_shift; - } + if (op_is_sync(opf) && !op_is_write(opf)) + limit = data->q->nr_requests; + else + limit = bfqd->async_depths[!!bfqd->wr_busy_queues][op_is_sync(opf)];
for (act_idx = 0; bic && act_idx < bfqd->num_actuators; act_idx++) { /* Fast path to check if bfqq is already allocated. */ @@ -725,14 +721,16 @@ static void bfq_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data) * available requests and thus starve other entities. */ if (bfqq_request_over_limit(bfqd, bic, opf, act_idx, limit)) { - depth = 1; + limit = 1; break; } } + bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u", - __func__, bfqd->wr_busy_queues, op_is_sync(opf), depth); - if (depth) - data->shallow_depth = depth; + __func__, bfqd->wr_busy_queues, op_is_sync(opf), limit); + + if (limit < data->q->nr_requests) + data->shallow_depth = limit; }
static struct bfq_queue * @@ -7128,9 +7126,8 @@ void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg) */ static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt) { - unsigned int depth = 1U << bt->sb.shift; + unsigned int nr_requests = bfqd->queue->nr_requests;
- bfqd->full_depth_shift = bt->sb.shift; /* * In-word depths if no bfq_queue is being weight-raised: * leaving 25% of tags only for sync reads. @@ -7142,13 +7139,13 @@ static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt) * limit 'something'. */ /* no more than 50% of tags for async I/O */ - bfqd->word_depths[0][0] = max(depth >> 1, 1U); + bfqd->async_depths[0][0] = max(nr_requests >> 1, 1U); /* * no more than 75% of tags for sync writes (25% extra tags * w.r.t. async I/O, to prevent async I/O from starving sync * writes) */ - bfqd->word_depths[0][1] = max((depth * 3) >> 2, 1U); + bfqd->async_depths[0][1] = max((nr_requests * 3) >> 2, 1U);
/* * In-word depths in case some bfq_queue is being weight- @@ -7158,9 +7155,9 @@ static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt) * shortage. */ /* no more than ~18% of tags for async I/O */ - bfqd->word_depths[1][0] = max((depth * 3) >> 4, 1U); + bfqd->async_depths[1][0] = max((nr_requests * 3) >> 4, 1U); /* no more than ~37% of tags for sync writes (~20% extra tags) */ - bfqd->word_depths[1][1] = max((depth * 6) >> 4, 1U); + bfqd->async_depths[1][1] = max((nr_requests * 6) >> 4, 1U); }
static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx) diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h index 687a3a7ba784..31217f196f4f 100644 --- a/block/bfq-iosched.h +++ b/block/bfq-iosched.h @@ -813,8 +813,7 @@ struct bfq_data { * Depth limits used in bfq_limit_depth (see comments on the * function) */ - unsigned int word_depths[2][2]; - unsigned int full_depth_shift; + unsigned int async_depths[2][2];
/* * Number of independent actuators. This is equal to 1 in diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c index 4dba8405bd01..bfd9a40bb33d 100644 --- a/block/kyber-iosched.c +++ b/block/kyber-iosched.c @@ -157,10 +157,7 @@ struct kyber_queue_data { */ struct sbitmap_queue domain_tokens[KYBER_NUM_DOMAINS];
- /* - * Async request percentage, converted to per-word depth for - * sbitmap_get_shallow(). - */ + /* Number of allowed async requests. */ unsigned int async_depth;
struct kyber_cpu_latency __percpu *cpu_latency; @@ -454,10 +451,8 @@ static void kyber_depth_updated(struct blk_mq_hw_ctx *hctx) { struct kyber_queue_data *kqd = hctx->queue->elevator->elevator_data; struct blk_mq_tags *tags = hctx->sched_tags; - unsigned int shift = tags->bitmap_tags.sb.shift; - - kqd->async_depth = (1U << shift) * KYBER_ASYNC_PERCENT / 100U;
+ kqd->async_depth = hctx->queue->nr_requests * KYBER_ASYNC_PERCENT / 100U; sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, kqd->async_depth); }
diff --git a/block/mq-deadline.c b/block/mq-deadline.c index 2edf1cac06d5..9ab6c6256695 100644 --- a/block/mq-deadline.c +++ b/block/mq-deadline.c @@ -487,20 +487,6 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx) return rq; }
-/* - * 'depth' is a number in the range 1..INT_MAX representing a number of - * requests. Scale it with a factor (1 << bt->sb.shift) / q->nr_requests since - * 1..(1 << bt->sb.shift) is the range expected by sbitmap_get_shallow(). - * Values larger than q->nr_requests have the same effect as q->nr_requests. - */ -static int dd_to_word_depth(struct blk_mq_hw_ctx *hctx, unsigned int qdepth) -{ - struct sbitmap_queue *bt = &hctx->sched_tags->bitmap_tags; - const unsigned int nrr = hctx->queue->nr_requests; - - return ((qdepth << bt->sb.shift) + nrr - 1) / nrr; -} - /* * Called by __blk_mq_alloc_request(). The shallow_depth value set by this * function is used by __blk_mq_get_tag(). @@ -517,7 +503,7 @@ static void dd_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data) * Throttle asynchronous requests and writes such that these requests * do not block the allocation of synchronous requests. */ - data->shallow_depth = dd_to_word_depth(data->hctx, dd->async_depth); + data->shallow_depth = dd->async_depth; }
/* Called by blk_mq_update_nr_requests(). */ diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h index 189140bf11fc..4adf4b364fcd 100644 --- a/include/linux/sbitmap.h +++ b/include/linux/sbitmap.h @@ -213,12 +213,12 @@ int sbitmap_get(struct sbitmap *sb); * sbitmap_get_shallow() - Try to allocate a free bit from a &struct sbitmap, * limiting the depth used from each word. * @sb: Bitmap to allocate from. - * @shallow_depth: The maximum number of bits to allocate from a single word. + * @shallow_depth: The maximum number of bits to allocate from the bitmap. * * This rather specific operation allows for having multiple users with * different allocation limits. E.g., there can be a high-priority class that * uses sbitmap_get() and a low-priority class that uses sbitmap_get_shallow() - * with a @shallow_depth of (1 << (@sb->shift - 1)). Then, the low-priority + * with a @shallow_depth of (sb->depth >> 1). Then, the low-priority * class can only allocate half of the total bits in the bitmap, preventing it * from starving out the high-priority class. * @@ -478,7 +478,7 @@ unsigned long __sbitmap_queue_get_batch(struct sbitmap_queue *sbq, int nr_tags, * sbitmap_queue, limiting the depth used from each word, with preemption * already disabled. * @sbq: Bitmap queue to allocate from. - * @shallow_depth: The maximum number of bits to allocate from a single word. + * @shallow_depth: The maximum number of bits to allocate from the queue. * See sbitmap_get_shallow(). * * If you call this, make sure to call sbitmap_queue_min_shallow_depth() after diff --git a/lib/sbitmap.c b/lib/sbitmap.c index d3412984170c..c07e3cd82e29 100644 --- a/lib/sbitmap.c +++ b/lib/sbitmap.c @@ -208,8 +208,28 @@ static int sbitmap_find_bit_in_word(struct sbitmap_word *map, return nr; }
+static unsigned int __map_depth_with_shallow(const struct sbitmap *sb, + int index, + unsigned int shallow_depth) +{ + u64 shallow_word_depth; + unsigned int word_depth, reminder; + + word_depth = __map_depth(sb, index); + if (shallow_depth >= sb->depth) + return word_depth; + + shallow_word_depth = word_depth * shallow_depth; + reminder = do_div(shallow_word_depth, sb->depth); + + if (reminder >= (index + 1) * word_depth) + shallow_word_depth++; + + return (unsigned int)shallow_word_depth; +} + static int sbitmap_find_bit(struct sbitmap *sb, - unsigned int depth, + unsigned int shallow_depth, unsigned int index, unsigned int alloc_hint, bool wrap) @@ -218,12 +238,12 @@ static int sbitmap_find_bit(struct sbitmap *sb, int nr = -1;
for (i = 0; i < sb->map_nr; i++) { - nr = sbitmap_find_bit_in_word(&sb->map[index], - min_t(unsigned int, - __map_depth(sb, index), - depth), - alloc_hint, wrap); + unsigned int depth = __map_depth_with_shallow(sb, index, + shallow_depth);
+ if (depth) + nr = sbitmap_find_bit_in_word(&sb->map[index], depth, + alloc_hint, wrap); if (nr != -1) { nr += index << sb->shift; break; @@ -406,27 +426,9 @@ EXPORT_SYMBOL_GPL(sbitmap_bitmap_show); static unsigned int sbq_calc_wake_batch(struct sbitmap_queue *sbq, unsigned int depth) { - unsigned int wake_batch; - unsigned int shallow_depth; - - /* - * Each full word of the bitmap has bits_per_word bits, and there might - * be a partial word. There are depth / bits_per_word full words and - * depth % bits_per_word bits left over. In bitwise arithmetic: - * - * bits_per_word = 1 << shift - * depth / bits_per_word = depth >> shift - * depth % bits_per_word = depth & ((1 << shift) - 1) - * - * Each word can be limited to sbq->min_shallow_depth bits. - */ - shallow_depth = min(1U << sbq->sb.shift, sbq->min_shallow_depth); - depth = ((depth >> sbq->sb.shift) * shallow_depth + - min(depth & ((1U << sbq->sb.shift) - 1), shallow_depth)); - wake_batch = clamp_t(unsigned int, depth / SBQ_WAIT_QUEUES, 1, - SBQ_WAKE_BATCH); - - return wake_batch; + return clamp_t(unsigned int, + min(depth, sbq->min_shallow_depth) / SBQ_WAIT_QUEUES, + 1, SBQ_WAKE_BATCH); }
int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth,
From: Cezary Rojewski cezary.rojewski@intel.com
[ Upstream commit 11f74f48c14c1f4fe16541900ea5944c42e30ccf ]
If pcim_request_all_regions() fails, error path operates on uninitialized 'bus' pointer. Found out by Coverity static analyzer.
Reviewed-by: Amadeusz Sławiński amadeuszx.slawinski@linux.intel.com Signed-off-by: Cezary Rojewski cezary.rojewski@intel.com Link: https://patch.msgid.link/20250730124906.351798-1-cezary.rojewski@intel.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
## Extensive Analysis:
This commit is an excellent candidate for backporting to stable kernel trees based on the following analysis:
### 1. **Clear Bug Fix for Uninitialized Pointer Access** The commit fixes a serious bug where an uninitialized `bus` pointer could be dereferenced in error paths. Looking at the original code flow: - Line 427: `struct hdac_bus *bus;` declares the pointer without initialization - Line 455-457: If `pcim_request_all_regions()` fails and returns an error, the function returns immediately - **Critical Issue**: At this point, `bus` has never been initialized (line 459 `bus = &adev->base.core;` would not be reached) - However, any error handling code that tries to use `bus` would crash with a null/uninitialized pointer dereference
### 2. **Minimal and Contained Fix** The fix is extremely simple and surgical: - Moves one line of code (`bus = &adev->base.core;`) from after the `pcim_request_all_regions()` call to immediately after `adev` allocation - This ensures `bus` is properly initialized before any potential error path that might use it - The change is only 2 lines (one deletion, one addition in a different location)
### 3. **No Functional Changes or Side Effects** - The fix doesn't change any logic or behavior in the success path - It merely ensures proper initialization order to prevent crashes - No new features or architectural changes are introduced - The initialization still happens at the same logical point in the flow (after `adev` is allocated and `avs_bus_init` succeeds)
### 4. **Found by Static Analysis** The commit message explicitly states this was "Found out by Coverity static analyzer", which indicates: - This is a real potential crash/security issue identified by automated tooling - Static analyzers are good at finding these initialization order bugs - The bug might be difficult to trigger in practice but represents a real vulnerability
### 5. **Affects Critical Audio Subsystem** This is in the Intel AVS (Audio Voice Speech) driver probe function, which: - Is part of system initialization for Intel audio hardware - A crash here could prevent audio subsystem from initializing - Could affect system stability on affected Intel platforms
### 6. **Clear Stable Tree Criteria Met** This commit meets multiple stable kernel criteria: - ✅ Fixes a real bug (uninitialized pointer dereference) - ✅ Minimal change (2 lines moved) - ✅ No new features - ✅ Low risk of regression (initialization order fix) - ✅ Could cause crashes/oops in error conditions - ✅ Found by static analysis (indicates real issue)
### 7. **Error Path Vulnerability** While the bug only manifests in error conditions (when `pcim_request_all_regions()` fails), error path bugs are particularly important to fix because: - They can be triggered by resource exhaustion or hardware issues - They can potentially be exploited for denial of service - They make debugging other issues more difficult when the error handler itself crashes
This is a textbook example of a commit that should be backported to stable trees - it fixes a clear bug with minimal risk and no functional changes.
sound/soc/intel/avs/core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/sound/soc/intel/avs/core.c b/sound/soc/intel/avs/core.c index ec1b3f55cb5c..d45e9279df27 100644 --- a/sound/soc/intel/avs/core.c +++ b/sound/soc/intel/avs/core.c @@ -446,6 +446,8 @@ static int avs_pci_probe(struct pci_dev *pci, const struct pci_device_id *id) adev = devm_kzalloc(dev, sizeof(*adev), GFP_KERNEL); if (!adev) return -ENOMEM; + bus = &adev->base.core; + ret = avs_bus_init(adev, pci, id); if (ret < 0) { dev_err(dev, "failed to init avs bus: %d\n", ret); @@ -456,7 +458,6 @@ static int avs_pci_probe(struct pci_dev *pci, const struct pci_device_id *id) if (ret < 0) return ret;
- bus = &adev->base.core; bus->addr = pci_resource_start(pci, 0); bus->remap_addr = pci_ioremap_bar(pci, 0); if (!bus->remap_addr) {
From: Stefan Metzmacher metze@samba.org
[ Upstream commit 550a194c5998e4e77affc6235e80d3766dc2d27e ]
It is already called long before we may hit this cleanup code path.
Cc: Steve French smfrench@gmail.com Cc: Tom Talpey tom@talpey.com Cc: Long Li longli@microsoft.com Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher metze@samba.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
Based on my analysis, I can now provide a comprehensive answer about whether this commit should be backported.
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
## Analysis of the Bug Fix
1. **Clear Bug Identification**: The commit fixes a redundant initialization bug where `init_waitqueue_head(&info->conn_wait)` is called twice in the error path: - First initialization at line 1630 (in the normal flow) - Second redundant initialization at line 1696 (in the `negotiation_failed` error path)
2. **Bug Impact**: While calling `init_waitqueue_head()` twice on the same wait queue is not catastrophic, it is incorrect and could potentially cause issues: - The wait queue is already initialized and potentially in use (via `wait_event()` at line 1639-1642) - Re-initializing a wait queue that might have waiters or be in an intermediate state could lead to subtle synchronization issues - This violates the kernel's initialization patterns and could mask other bugs
3. **Fix Characteristics**: - **Minimal and Contained**: The fix is a single-line removal of redundant code - **Low Risk**: Removing a redundant initialization cannot introduce new bugs - **Clear Intent**: The commit message clearly states the issue and the fix - **No Feature Changes**: This is purely a bug fix with no functional changes
4. **Code Flow Analysis**: - At line 1630, `info->conn_wait` is properly initialized before `rdma_connect()` - The wait queue is then used at line 1639-1642 with `wait_event_interruptible_timeout()` - If `smbd_negotiate()` fails (line 1672-1675), the code jumps to `negotiation_failed` - The redundant re-initialization at line 1696 happens AFTER the wait queue has already been used - This could potentially corrupt any pending waiters or internal state
5. **Historical Context**: - Both `init_waitqueue_head()` calls were present since the initial commit (f198186aa9bbd from 2017) - This appears to be an original coding error that has persisted - The error path likely wasn't thoroughly tested or reviewed
6. **Stable Tree Criteria Met**: - ✅ Fixes a real bug (redundant initialization that could cause issues) - ✅ Small change (single line removal) - ✅ Obviously correct (removing duplicate initialization) - ✅ No risk of regression (removing redundant code) - ✅ Already tested (merged into mainline)
The commit satisfies all stable kernel criteria for backporting: it's a simple, obvious fix for a real bug with minimal risk of introducing regressions.
fs/smb/client/smbdirect.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c index 754e94a0e07f..75142c391d0c 100644 --- a/fs/smb/client/smbdirect.c +++ b/fs/smb/client/smbdirect.c @@ -1735,7 +1735,6 @@ static struct smbd_connection *_smbd_get_connection( cancel_delayed_work_sync(&info->idle_timer_work); destroy_caches_and_workqueue(info); sc->status = SMBDIRECT_SOCKET_NEGOTIATE_FAILED; - init_waitqueue_head(&info->conn_wait); rdma_disconnect(sc->rdma.cm_id); wait_event(info->conn_wait, sc->status == SMBDIRECT_SOCKET_DISCONNECTED);
From: Sergey Bashirov sergeybashirov@gmail.com
[ Upstream commit 81438498a285759f31e843ac4800f82a5ce6521f ]
Because of integer division, we need to carefully calculate the disk offset. Consider the example below for a stripe of 6 volumes, a chunk size of 4096, and an offset of 70000.
chunk = div_u64(offset, dev->chunk_size) = 70000 / 4096 = 17 offset = chunk * dev->chunk_size = 17 * 4096 = 69632 disk_offset_wrong = div_u64(offset, dev->nr_children) = 69632 / 6 = 11605 disk_chunk = div_u64(chunk, dev->nr_children) = 17 / 6 = 2 disk_offset = disk_chunk * dev->chunk_size = 2 * 4096 = 8192
Signed-off-by: Sergey Bashirov sergeybashirov@gmail.com Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20250701122341.199112-1-sergeybashirov@gmail.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
## Bug Fix Nature The commit fixes a critical mathematical calculation error in the pNFS block/SCSI layout stripe mapping function. The bug causes incorrect disk offset calculations due to improper integer division, which could lead to data corruption or incorrect data access in striped pNFS block layouts.
## Specific Code Analysis
The bug is in the `bl_map_stripe()` function at line 276 of the original code: ```c disk_offset = div_u64(offset, dev->nr_children); ```
This calculation is fundamentally wrong. The commit message provides a clear example showing how the bug manifests: - With 6 volumes, chunk size 4096, and offset 70000 - The wrong calculation yields disk_offset = 11605 - The correct calculation yields disk_offset = 8192
The fix changes the calculation to: ```c disk_chunk = div_u64_rem(chunk, dev->nr_children, &chunk_idx); disk_offset = disk_chunk * dev->chunk_size; ```
This properly calculates which chunk on the specific disk should be accessed by: 1. First determining the disk_chunk number (which chunk on the target disk) 2. Then multiplying by chunk_size to get the actual byte offset
## Stable Backport Criteria Met
1. **Fixes a real bug**: Yes - incorrect stripe offset calculation leading to wrong data access 2. **Small and contained**: Yes - only 3 lines changed, localized to one function 3. **No major architectural changes**: Correct - simple math fix 4. **Low regression risk**: Yes - the new calculation is mathematically correct and doesn't change any interfaces 5. **Important for users**: Yes - data corruption/access issues in pNFS deployments are serious
## Historical Context Looking at the git history, this area has had previous stable-worthy fixes: - Commit 0914bb965e38 fixed an off-by-one error and was explicitly marked for stable (Cc: stable@vger.kernel.org # 3.17+) - Commit 5466112f0935 fixed 64-bit division issues in the same function
This indicates that `bl_map_stripe()` is a critical function that has needed careful attention for correctness, and fixes to it have historically been considered stable-worthy.
## Impact Assessment Without this fix, any system using pNFS block layout with striping could experience: - Data written to wrong disk locations - Data read from wrong disk locations - Potential data corruption or loss
The fix is essential for correct operation of pNFS block layouts with striping configurations.
fs/nfs/blocklayout/dev.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c index cab8809f0e0f..44306ac22353 100644 --- a/fs/nfs/blocklayout/dev.c +++ b/fs/nfs/blocklayout/dev.c @@ -257,10 +257,11 @@ static bool bl_map_stripe(struct pnfs_block_dev *dev, u64 offset, struct pnfs_block_dev *child; u64 chunk; u32 chunk_idx; + u64 disk_chunk; u64 disk_offset;
chunk = div_u64(offset, dev->chunk_size); - div_u64_rem(chunk, dev->nr_children, &chunk_idx); + disk_chunk = div_u64_rem(chunk, dev->nr_children, &chunk_idx);
if (chunk_idx >= dev->nr_children) { dprintk("%s: invalid chunk idx %d (%lld/%lld)\n", @@ -273,7 +274,7 @@ static bool bl_map_stripe(struct pnfs_block_dev *dev, u64 offset, offset = chunk * dev->chunk_size;
/* disk offset of the stripe */ - disk_offset = div_u64(offset, dev->nr_children); + disk_offset = disk_chunk * dev->chunk_size;
child = &dev->children[chunk_idx]; child->map(child, disk_offset, map);
From: Sergey Bashirov sergeybashirov@gmail.com
[ Upstream commit 9768797c219326699778fba9cd3b607b2f1e7950 ]
The error occurs on the third attempt to encode extents. When function ext_tree_prepare_commit() reallocates a larger buffer to retry encoding extents, the "layoutupdate_pages" page array is initialized only after the retry loop. But ext_tree_free_commitdata() is called on every iteration and tries to put pages in the array, thus dereferencing uninitialized pointers.
An additional problem is that there is no limit on the maximum possible buffer_size. When there are too many extents, the client may create a layoutcommit that is larger than the maximum possible RPC size accepted by the server.
During testing, we observed two typical scenarios. First, one memory page for extents is enough when we work with small files, append data to the end of the file, or preallocate extents before writing. But when we fill a new large file without preallocating, the number of extents can be huge, and counting the number of written extents in ext_tree_encode_commit() does not help much. Since this number increases even more between unlocking and locking of ext_tree, the reallocated buffer may not be large enough again and again.
Co-developed-by: Konstantin Evtushenko koevtushenko@yandex.com Signed-off-by: Konstantin Evtushenko koevtushenko@yandex.com Signed-off-by: Sergey Bashirov sergeybashirov@gmail.com Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20250630183537.196479-2-sergeybashirov@gmail.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
## Critical Bug Fix - Uninitialized Pointer Dereference
The commit fixes a **critical memory safety bug** - an uninitialized pointer dereference in the pNFS block/SCSI layout code path. According to the commit message, when `ext_tree_prepare_commit()` needs to retry encoding extents (on the third attempt), it dereferences uninitialized pointers in the `layoutupdate_pages` array.
## Specific Code Analysis
Looking at the original buggy code flow:
1. **Line 571 (old code)**: Has a `retry:` label for the retry loop 2. **Line 574**: Calls `ext_tree_free_commitdata()` on every retry iteration 3. **Line 579-583**: The `layoutupdate_pages` array is allocated AFTER entering the retry path 4. **Line 591**: `goto retry` sends execution back, but now `ext_tree_free_commitdata()` will try to access the uninitialized array
The fix removes the dangerous retry loop and replaces it with a single retry attempt: - Removes the `goto retry` pattern (line 591 deleted) - Limits to one retry with proper initialization order - Changes buffer size calculation to use `NFS_SERVER(arg->inode)->wsize` instead of unbounded growth
## Additional Security Issue Fixed
The commit also fixes a secondary but important issue: **unbounded buffer growth**. The old code could create layoutcommit messages larger than the server's maximum RPC size, potentially causing: - Memory exhaustion on the client - Server rejection of oversized RPCs - Potential DoS conditions
## Stable Kernel Criteria Met
1. **Fixes a real bug**: Uninitialized pointer dereference that can cause kernel crashes 2. **Small and contained**: Changes are limited to one function in the pNFS subsystem 3. **Clear fix**: The solution is straightforward - remove the retry loop and properly order initialization 4. **No new features**: Pure bug fix, no functionality additions 5. **Low regression risk**: The fix simplifies the code path rather than adding complexity 6. **User impact**: Affects NFS users with pNFS block/SCSI layouts, particularly when writing large files
## Return Value Documentation
The commit also adds proper documentation for the return values, changing the function to return the actual status rather than always returning 0, which allows proper error handling upstream.
This is a textbook example of a commit that should be backported to stable kernels - it fixes a memory safety bug that can cause kernel crashes, with a minimal, well-understood fix.
fs/nfs/blocklayout/extent_tree.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/blocklayout/extent_tree.c b/fs/nfs/blocklayout/extent_tree.c index 8f7cff7a4293..0add0f329816 100644 --- a/fs/nfs/blocklayout/extent_tree.c +++ b/fs/nfs/blocklayout/extent_tree.c @@ -552,6 +552,15 @@ static int ext_tree_encode_commit(struct pnfs_block_layout *bl, __be32 *p, return ret; }
+/** + * ext_tree_prepare_commit - encode extents that need to be committed + * @arg: layout commit data + * + * Return values: + * %0: Success, all required extents are encoded + * %-ENOSPC: Some extents are encoded, but not all, due to RPC size limit + * %-ENOMEM: Out of memory, extents not encoded + */ int ext_tree_prepare_commit(struct nfs4_layoutcommit_args *arg) { @@ -568,12 +577,12 @@ ext_tree_prepare_commit(struct nfs4_layoutcommit_args *arg) start_p = page_address(arg->layoutupdate_page); arg->layoutupdate_pages = &arg->layoutupdate_page;
-retry: - ret = ext_tree_encode_commit(bl, start_p + 1, buffer_size, &count, &arg->lastbytewritten); + ret = ext_tree_encode_commit(bl, start_p + 1, buffer_size, + &count, &arg->lastbytewritten); if (unlikely(ret)) { ext_tree_free_commitdata(arg, buffer_size);
- buffer_size = ext_tree_layoutupdate_size(bl, count); + buffer_size = NFS_SERVER(arg->inode)->wsize; count = 0;
arg->layoutupdate_pages = @@ -588,7 +597,8 @@ ext_tree_prepare_commit(struct nfs4_layoutcommit_args *arg) return -ENOMEM; }
- goto retry; + ret = ext_tree_encode_commit(bl, start_p + 1, buffer_size, + &count, &arg->lastbytewritten); }
*start_p = cpu_to_be32(count); @@ -608,7 +618,7 @@ ext_tree_prepare_commit(struct nfs4_layoutcommit_args *arg) }
dprintk("%s found %zu ranges\n", __func__, count); - return 0; + return ret; }
void
From: Buday Csaba buday.csaba@prolan.hu
[ Upstream commit 57ec5a8735dc5dccd1ee68afdb1114956a3fce0d ]
According to the LAN8710A datasheet (Rev. B, section 3.8.5.1), a hardware reset is required after power-on, and the reference clock (REF_CLK) must be established before asserting reset.
Signed-off-by: Buday Csaba buday.csaba@prolan.hu Cc: Csókás Bence csokas.bence@prolan.hu Reviewed-by: Andrew Lunn andrew@lunn.ch Link: https://patch.msgid.link/20250728152916.46249-2-csokas.bence@prolan.hu Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Hardware Requirement Fix**: The commit addresses a documented hardware requirement from the LAN8710A datasheet (section 3.8.5.1) that specifies a hardware reset is required after power-on with the reference clock established before asserting reset. This is fixing incorrect hardware initialization that could lead to device malfunction.
2. **Regression Fix**: Looking at the git history, the `PHY_RST_AFTER_CLK_EN` flag was: - Originally added in commit 7f64e5b18ebb (2017) for LAN8710/20 based on datasheet requirements - Removed in commit d65af21842f8 (2020) when refclk support was added, with the assumption that the refclk mechanism would handle the reset - Still present for LAN8740 (added in commit 76db2d466f6a in 2019)
The removal in 2020 appears to have been premature, as it relied on optional clock provider support that may not be configured in all systems. This commit re-adds the flag specifically for LAN8710A, restoring proper hardware initialization.
3. **Minimal and Contained Change**: The fix is a single-line addition of the `PHY_RST_AFTER_CLK_EN` flag to the driver structure for the LAN8710/LAN8720 PHY entry. This flag is already used by other PHYs in the same driver (LAN8740) and has well-established kernel infrastructure to handle it properly through `phy_reset_after_clk_enable()`.
4. **Bug Fix Nature**: This fixes a real hardware initialization issue that could cause the PHY to not work properly if the reference clock timing requirements aren't met. Systems without proper clock provider configuration would experience PHY initialization failures.
5. **Low Risk**: The change only affects the specific PHY model (LAN8710/LAN8720) and uses an existing, well-tested mechanism (`PHY_RST_AFTER_CLK_EN` flag). The flag is already successfully used by LAN8740 in the same driver, demonstrating its safety and effectiveness.
6. **Clear Problem Statement**: The commit message clearly documents the hardware requirement from the datasheet, making it evident this is fixing a specification compliance issue rather than adding a new feature.
The commit meets stable kernel criteria as it fixes a hardware initialization bug with minimal risk and a very contained change scope.
drivers/net/phy/smsc.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c index b6489da5cfcd..48487149c225 100644 --- a/drivers/net/phy/smsc.c +++ b/drivers/net/phy/smsc.c @@ -785,6 +785,7 @@ static struct phy_driver smsc_phy_driver[] = {
/* PHY_BASIC_FEATURES */
+ .flags = PHY_RST_AFTER_CLK_EN, .probe = smsc_phy_probe,
/* basic functions */
From: Peter Jakubek peterjakubek@gmail.com
[ Upstream commit 1b03391d073dad748636a1ad9668b837cce58265 ]
Add DMI quirk entry for Alienware systems with SKU "0CCC" to enable proper speaker codec configuration (SOC_SDW_CODEC_SPKR).
This system requires the same audio configuration as some existing Dell systems. Without this patch, the laptop's speakers and microphone will not work.
Signed-off-by: Peter Jakubek peterjakubek@gmail.com Link: https://patch.msgid.link/20250731172104.2009007-1-peterjakubek@gmail.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
Now, let me analyze the commit for backport suitability:
**Backport Status: YES**
## Extensive Analysis:
This commit should be backported to stable kernel trees based on the following detailed analysis:
### 1. **Hardware Enablement Fix** The commit adds a DMI quirk entry for the Alienware Area 51 (2025) with SKU "0CCC" to enable proper speaker codec configuration. The commit message explicitly states: "Without this patch, the laptop's speakers and microphone will not work." This is a clear hardware functionality fix where audio devices are completely non-functional without this patch.
### 2. **Minimal and Contained Change** The code change is extremely minimal - it adds exactly 8 lines to the `sof_sdw_quirk_table[]` array: ```c + { + .callback = sof_sdw_quirk_cb, + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Alienware"), + DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "0CCC") + }, + .driver_data = (void *)(SOC_SDW_CODEC_SPKR), + }, ```
### 3. **Follows Established Pattern** The change follows the exact same pattern as numerous other Dell/Alienware system quirks already in the file. Looking at lines 694-742 in the same file, we can see identical entries for Dell systems with different SKUs (0CF1, 0CF3, 0CF4, 0CF5, etc.), all using the same `SOC_SDW_CODEC_SPKR` flag.
### 4. **Zero Risk of Regression** The quirk only activates when the DMI system matches exactly "Alienware" vendor and SKU "0CCC". This makes it impossible to affect any other systems. The `SOC_SDW_CODEC_SPKR` flag (defined in `/home/sasha/linux/include/sound/soc_sdw_utils.h` as BIT(15)) enables codec internal speaker support, which is a well-tested configuration used by many other systems.
### 5. **Similar Commits Are Routinely Backported** Recent git history shows numerous similar quirk additions for Dell and Lenovo systems (commits like d859923faeca, af23d38caae5, 60f1c71ec874) that add DMI quirks for audio hardware enablement. These types of hardware enablement fixes are regularly accepted into stable trees.
### 6. **User Impact** Without this patch, users of the Alienware Area 51 (2025) laptop have completely non-functional audio (no speakers, no microphone). This severely impacts the usability of the device and would force users to use external audio devices or compile custom kernels.
### 7. **No Architectural Changes** The commit makes no architectural changes, adds no new features, and simply extends an existing quirk table with a new hardware ID. It uses existing, well-tested infrastructure (DMI matching and SOC_SDW_CODEC_SPKR flag).
### Conclusion: This is a textbook example of a commit that should be backported to stable kernels. It fixes a real hardware issue (non-functional audio), has zero regression risk due to exact DMI matching, follows established patterns, and is minimal in scope. The fix enables basic functionality for affected hardware without any possibility of impacting other systems.
sound/soc/intel/boards/sof_sdw.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/sound/soc/intel/boards/sof_sdw.c b/sound/soc/intel/boards/sof_sdw.c index 504887505e68..c576ec5527f9 100644 --- a/sound/soc/intel/boards/sof_sdw.c +++ b/sound/soc/intel/boards/sof_sdw.c @@ -741,6 +741,14 @@ static const struct dmi_system_id sof_sdw_quirk_table[] = { }, .driver_data = (void *)(SOC_SDW_CODEC_SPKR), }, + { + .callback = sof_sdw_quirk_cb, + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Alienware"), + DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "0CCC") + }, + .driver_data = (void *)(SOC_SDW_CODEC_SPKR), + }, /* Pantherlake devices*/ { .callback = sof_sdw_quirk_cb,
From: Mark Brown broonie@kernel.org
[ Upstream commit 1da33858af6250184d2ef907494d698af03283de ]
We do not currently free the mutex allocated by regmap-irq, do so.
Tested-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Reviewed-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: Mark Brown broonie@kernel.org Link: https://patch.msgid.link/20250731-regmap-irq-nesting-v1-1-98b4d1bf20f0@kerne... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit fixes a resource leak bug where a mutex allocated with `mutex_init()` in `regmap_add_irq_chip_fwnode()` was never properly freed with `mutex_destroy()`. This is a clear memory/resource leak that has existed since the mutex was first introduced in 2011 (commit f8beab2bb611).
**Key reasons for backporting:**
1. **Clear bug fix**: The commit fixes a resource leak where `mutex_init(&d->lock)` at line 804 allocates mutex resources but they were never freed. The fix adds corresponding `mutex_destroy(&d->lock)` calls in both error paths and the cleanup function.
2. **Long-standing issue**: This bug has existed since 2011 when the mutex was first introduced, affecting all stable kernels that include the regmap-irq subsystem.
3. **Small and contained change**: The patch only adds two `mutex_destroy()` calls: - One in the error path (`err_mutex:` label) at line 935 - One in `regmap_del_irq_chip()` at line 1031
4. **No behavioral changes**: The fix only ensures proper cleanup; it doesn't change any functional behavior or introduce new features.
5. **Low regression risk**: Adding `mutex_destroy()` calls is a standard cleanup operation that carries minimal risk. The patch also properly adjusts error handling labels (changing `goto err_alloc` to `goto err_mutex` after mutex initialization).
6. **Affects widely-used subsystem**: The regmap-irq framework is used by many drivers across the kernel for interrupt handling, making this fix broadly beneficial.
The commit follows stable kernel rules perfectly - it's a small, obvious fix for a real bug with minimal risk of regression. While not a critical security issue or crash fix, resource leaks are valid stable candidates, especially in widely-used infrastructure code like regmap.
drivers/base/regmap/regmap-irq.c | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/drivers/base/regmap/regmap-irq.c b/drivers/base/regmap/regmap-irq.c index d1585f073776..4aac12d38215 100644 --- a/drivers/base/regmap/regmap-irq.c +++ b/drivers/base/regmap/regmap-irq.c @@ -816,7 +816,7 @@ int regmap_add_irq_chip_fwnode(struct fwnode_handle *fwnode, d->mask_buf[i], chip->irq_drv_data); if (ret) - goto err_alloc; + goto err_mutex; }
if (chip->mask_base && !chip->handle_mask_sync) { @@ -827,7 +827,7 @@ int regmap_add_irq_chip_fwnode(struct fwnode_handle *fwnode, if (ret) { dev_err(map->dev, "Failed to set masks in 0x%x: %d\n", reg, ret); - goto err_alloc; + goto err_mutex; } }
@@ -838,7 +838,7 @@ int regmap_add_irq_chip_fwnode(struct fwnode_handle *fwnode, if (ret) { dev_err(map->dev, "Failed to set masks in 0x%x: %d\n", reg, ret); - goto err_alloc; + goto err_mutex; } }
@@ -855,7 +855,7 @@ int regmap_add_irq_chip_fwnode(struct fwnode_handle *fwnode, if (ret != 0) { dev_err(map->dev, "Failed to read IRQ status: %d\n", ret); - goto err_alloc; + goto err_mutex; } }
@@ -879,7 +879,7 @@ int regmap_add_irq_chip_fwnode(struct fwnode_handle *fwnode, if (ret != 0) { dev_err(map->dev, "Failed to ack 0x%x: %d\n", reg, ret); - goto err_alloc; + goto err_mutex; } } } @@ -901,7 +901,7 @@ int regmap_add_irq_chip_fwnode(struct fwnode_handle *fwnode, if (ret != 0) { dev_err(map->dev, "Failed to set masks in 0x%x: %d\n", reg, ret); - goto err_alloc; + goto err_mutex; } } } @@ -910,7 +910,7 @@ int regmap_add_irq_chip_fwnode(struct fwnode_handle *fwnode, if (chip->status_is_level) { ret = read_irq_data(d); if (ret < 0) - goto err_alloc; + goto err_mutex;
memcpy(d->prev_status_buf, d->status_buf, array_size(d->chip->num_regs, sizeof(d->prev_status_buf[0]))); @@ -918,7 +918,7 @@ int regmap_add_irq_chip_fwnode(struct fwnode_handle *fwnode,
ret = regmap_irq_create_domain(fwnode, irq_base, chip, d); if (ret) - goto err_alloc; + goto err_mutex;
ret = request_threaded_irq(irq, NULL, regmap_irq_thread, irq_flags | IRQF_ONESHOT, @@ -935,6 +935,8 @@ int regmap_add_irq_chip_fwnode(struct fwnode_handle *fwnode,
err_domain: /* Should really dispose of the domain but... */ +err_mutex: + mutex_destroy(&d->lock); err_alloc: kfree(d->type_buf); kfree(d->type_buf_def); @@ -1027,6 +1029,7 @@ void regmap_del_irq_chip(int irq, struct regmap_irq_chip_data *d) kfree(d->config_buf[i]); kfree(d->config_buf); } + mutex_destroy(&d->lock); kfree(d); } EXPORT_SYMBOL_GPL(regmap_del_irq_chip);
From: Calvin Owens calvin@wbinvd.org
[ Upstream commit 6ea0ec1b958a84aff9f03fb0ae4613a4d5bed3ea ]
turbostat.c: In function 'parse_int_file': turbostat.c:5567:19: error: 'PATH_MAX' undeclared (first use in this function) 5567 | char path[PATH_MAX]; | ^~~~~~~~
turbostat.c: In function 'probe_graphics': turbostat.c:6787:19: error: 'PATH_MAX' undeclared (first use in this function) 6787 | char path[PATH_MAX]; | ^~~~~~~~
Signed-off-by: Calvin Owens calvin@wbinvd.org Reviewed-by: Artem Bityutskiy artem.bityutskiy@linux.intel.com Signed-off-by: Len Brown len.brown@intel.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
1. **Clear Build Fix**: This is a straightforward build fix that addresses compilation failures when building turbostat with musl libc. The error messages in the commit clearly show `PATH_MAX` is undeclared, which prevents the tool from compiling.
2. **Minimal and Safe Change**: The fix is extremely minimal - it only adds a single include directive (`#include <limits.h>`) at line 70 of turbostat.c. This is a standard POSIX header that defines `PATH_MAX` and other system limits.
3. **No Functional Changes**: The commit doesn't change any logic or behavior - it simply ensures the code compiles correctly by including the necessary header file that defines `PATH_MAX`.
4. **Affects User Tools**: While turbostat is a userspace tool (not kernel code), it's an important power monitoring utility that's part of the kernel source tree. Users building kernel tools with musl libc (common in embedded systems, Alpine Linux, etc.) would be unable to build this tool without this fix.
5. **Cross-Platform Compatibility**: The issue affects systems using musl libc instead of glibc. With glibc, `PATH_MAX` might be defined through indirect includes, but musl requires explicit inclusion of `<limits.h>`. This fix improves portability.
6. **No Risk of Regression**: Adding the `<limits.h>` include has zero risk of breaking existing functionality. The header is standard and the constant `PATH_MAX` is used in at least 4 places in the code (lines 5641, 6870, 7481, 7537) where character arrays are declared.
7. **Follows Stable Rules**: This perfectly fits the stable kernel criteria: - Fixes a real bug (build failure) - Obviously correct (standard header inclusion) - Already tested (has sign-offs and review) - Small change (1 line) - No new features or architectural changes
The commit is an ideal candidate for stable backporting as it fixes a concrete build issue with minimal risk and maximum benefit for users of alternative libc implementations.
tools/power/x86/turbostat/turbostat.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c index bf011c2847f2..46ee85216373 100644 --- a/tools/power/x86/turbostat/turbostat.c +++ b/tools/power/x86/turbostat/turbostat.c @@ -67,6 +67,7 @@ #include <stdbool.h> #include <assert.h> #include <linux/kernel.h> +#include <limits.h>
#define UNUSED(x) (void)(x)
From: Elad Nachman enachman@marvell.com
[ Upstream commit 3c3d7dbab2c70a4bca47634d564bf659351c05ca ]
When a kexec'ed kernel boots up, there might be stale unhandled interrupts pending in the interrupt controller. These are delivered as spurious interrupts once the boot CPU enables interrupts.
Clear all pending interrupts when the driver is initialized to prevent these spurious interrupts from locking the CPU in an endless loop.
Signed-off-by: Elad Nachman enachman@marvell.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Link: https://lore.kernel.org/all/20250803102548.669682-2-enachman@marvell.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees. Here's my extensive analysis:
## Reasons for Backporting:
### 1. **Fixes a Real Bug Affecting Users** The commit addresses a concrete issue where stale, unhandled interrupts from before a kexec can cause spurious interrupts that lock the CPU in an endless loop. This is a serious functionality bug that can prevent systems from booting properly after kexec.
### 2. **Small and Contained Fix** The change is minimal and surgical: - Only adds 9 lines of functional code - The fix is self-contained within the probe function - Simply clears pending interrupts by writing to the GICP_CLRSPI_NSR_OFFSET register for all 64 possible interrupts - Uses standard kernel APIs (ioremap/iounmap)
### 3. **Low Risk of Regression** - The clearing operation only happens once during driver initialization - If ioremap fails, it's handled gracefully with an error message but doesn't fail the probe - The clearing loop writes to a register specifically designed for clearing interrupts (GICP_CLRSPI_NSR_OFFSET) - This is a write-only operation that doesn't affect normal interrupt handling flow
### 4. **Follows Established Pattern** Similar fixes for spurious/pending interrupts have been backported in other interrupt controllers: - commit 28e89cdac648 ("irqchip/renesas-rzv2h: Prevent TINT spurious interrupt") - marked with Cc: stable@vger.kernel.org - commit 853a6030303f ("irqchip/renesas-rzg2l: Prevent spurious interrupts when setting trigger type")
### 5. **Critical for Kexec Functionality** Kexec is an important feature for: - Crash dump collection (kdump) - Fast reboot scenarios - System recovery This fix ensures these use cases work reliably on Marvell platforms using the GICP interrupt controller.
### 6. **No Architectural Changes** The fix doesn't introduce: - New features or capabilities - Changes to existing APIs or interfaces - Modifications to interrupt handling logic - Any structural changes to the driver
### 7. **Clear Problem Statement** The commit message clearly describes: - The problem scenario (kexec with pending interrupts) - The symptom (CPU locked in endless loop) - The solution (clear all pending interrupts on init)
## Code Analysis:
The added code (lines 240-247 in the patched version): ```c base = ioremap(gicp->res->start, gicp->res->end - gicp->res->start); if (IS_ERR(base)) { dev_err(&pdev->dev, "ioremap() failed. Unable to clear pending interrupts.\n"); } else { for (i = 0; i < 64; i++) writel(i, base + GICP_CLRSPI_NSR_OFFSET); iounmap(base); } ```
This is a defensive programming approach that ensures system stability without affecting normal operation, making it an ideal candidate for stable backport.
drivers/irqchip/irq-mvebu-gicp.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/irqchip/irq-mvebu-gicp.c b/drivers/irqchip/irq-mvebu-gicp.c index d3232d6d8dce..fd85c845e015 100644 --- a/drivers/irqchip/irq-mvebu-gicp.c +++ b/drivers/irqchip/irq-mvebu-gicp.c @@ -177,6 +177,7 @@ static int mvebu_gicp_probe(struct platform_device *pdev) .ops = &gicp_domain_ops, }; struct mvebu_gicp *gicp; + void __iomem *base; int ret, i;
gicp = devm_kzalloc(&pdev->dev, sizeof(*gicp), GFP_KERNEL); @@ -236,6 +237,15 @@ static int mvebu_gicp_probe(struct platform_device *pdev) return -ENODEV; }
+ base = ioremap(gicp->res->start, gicp->res->end - gicp->res->start); + if (IS_ERR(base)) { + dev_err(&pdev->dev, "ioremap() failed. Unable to clear pending interrupts.\n"); + } else { + for (i = 0; i < 64; i++) + writel(i, base + GICP_CLRSPI_NSR_OFFSET); + iounmap(base); + } + return msi_create_parent_irq_domain(&info, &gicp_msi_parent_ops) ? 0 : -ENOMEM; }
From: Sergey Bashirov sergeybashirov@gmail.com
[ Upstream commit 7db6e66663681abda54f81d5916db3a3b8b1a13d ]
At the end of the isect translation, disc_addr represents the physical disk offset. Thus, end calculated from disk_addr is also a physical disk offset. Therefore, range checking should be done using map->disk_offset, not map->start.
Signed-off-by: Sergey Bashirov sergeybashirov@gmail.com Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20250702133226.212537-1-sergeybashirov@gmail.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit is a clear bugfix that should be backported to stable kernel trees. Here's my detailed analysis:
## Bug Analysis
The commit fixes an incorrect range check in the pNFS block/SCSI layout code. Looking at the code flow:
1. **Variable Transformations** (lines 140-148): - `disk_addr` starts as a physical disk offset (line 141) - It gets adjusted by `map->disk_offset` (line 147) - Then it's adjusted by subtracting `map->start` (line 148) - After these operations, `disk_addr` represents the **relative offset within the mapped region**
2. **The Bug** (lines 151-153): - `end = disk_addr + *len` calculates the end position using the **transformed** disk_addr - The buggy check `if (end >= map->start + map->len)` incorrectly compares this relative end position against `map->start + map->len` - This is wrong because `end` is now relative to the mapping, not an absolute address
3. **The Fix**: - Changes the comparison to `if (end >= map->disk_offset + map->len)` - This correctly uses `map->disk_offset` which corresponds to the physical disk space after the transformations
## Why This Should Be Backported
1. **Clear Bug Fix**: This fixes a logical error in range checking that could lead to: - Incorrect I/O length calculations - Potential data corruption if writes go beyond intended boundaries - Read errors or incomplete data transfers
2. **Small and Contained**: The fix changes only 2 lines in a single function, making it low-risk for regression
3. **Critical Subsystem**: pNFS block layout is used in production NFS environments for high-performance storage access
4. **No New Features**: This is purely a bugfix - no new functionality or architectural changes
5. **Stable Impact**: The bug could cause silent data corruption or I/O errors in stable kernels using pNFS block layouts
6. **Related History**: The codebase shows previous fixes for similar range checking issues (commit f34462c3c8a2), indicating this is a sensitive area that needs correct boundary validation
The incorrect range check could allow I/O operations to proceed with incorrect length limits, potentially accessing memory beyond the intended device mapping boundaries. This is exactly the type of bug that stable kernels need fixed - a subtle logic error with potentially serious consequences in production systems.
fs/nfs/blocklayout/blocklayout.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c index 47189476b553..5d6edafbed20 100644 --- a/fs/nfs/blocklayout/blocklayout.c +++ b/fs/nfs/blocklayout/blocklayout.c @@ -149,8 +149,8 @@ do_add_page_to_bio(struct bio *bio, int npg, enum req_op op, sector_t isect,
/* limit length to what the device mapping allows */ end = disk_addr + *len; - if (end >= map->start + map->len) - *len = map->start + map->len - disk_addr; + if (end >= map->disk_offset + map->len) + *len = map->disk_offset + map->len - disk_addr;
retry: if (!bio) {
From: Calvin Owens calvin@wbinvd.org
[ Upstream commit d34fe509f5f76d9dc36291242d67c6528027ebbd ]
Kernels configured with CONFIG_MULTIUSER=n have no cap_get_proc(). Check for ENOSYS to recognize this case, and continue on to attempt to access the requested MSRs (such as temperature).
Signed-off-by: Calvin Owens calvin@wbinvd.org Signed-off-by: Len Brown len.brown@intel.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees for the following reasons:
## Bug Fix Nature The commit fixes a clear bug where turbostat fails to work on kernels configured with `CONFIG_MULTIUSER=n`. Without this fix, turbostat would incorrectly fail when `cap_get_proc()` returns NULL with `ENOSYS`, preventing users from accessing MSRs (like temperature readings) even when they should be able to.
## Small and Contained Change The fix is minimal - only 9 lines added, 1 line removed. The change is entirely contained within the `check_for_cap_sys_rawio()` function in turbostat.c: - It adds a check for `errno == ENOSYS` when `cap_get_proc()` returns NULL - Returns 0 (success) in this specific case to allow MSR access attempts to proceed - Preserves the original error handling (return 1) for all other failure cases
## No Side Effects or Architectural Changes The change doesn't introduce any new features or architectural modifications. It simply adds proper error handling for a specific configuration scenario that was previously not handled correctly. The logic flow remains the same for all other cases - only the ENOSYS case gets special treatment.
## User Impact This affects real users running embedded or specialized Linux systems with `CONFIG_MULTIUSER=n` (single-user mode kernels). Without this fix, turbostat is completely broken on such systems, unable to read MSRs for temperature monitoring and other power management features.
## Stable Tree Criteria Compliance The fix meets stable kernel criteria: - **Fixes a real bug**: turbostat failure on CONFIG_MULTIUSER=n kernels - **Already upstream**: The commit is in the mainline kernel - **Minimal risk**: The change only affects the specific error case and doesn't alter behavior for normal configurations - **Clear and obvious**: The fix is straightforward - checking for ENOSYS and handling it appropriately
## Tool-Specific Nature Since this is a userspace tool fix (tools/power/x86/turbostat/), it has even lower risk of causing kernel regressions while providing immediate benefit to affected users.
The commit is an ideal candidate for stable backporting as it fixes a specific, reproducible issue with minimal code changes and no risk to systems where `CONFIG_MULTIUSER=y` (the common case).
tools/power/x86/turbostat/turbostat.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c index 46ee85216373..00fdb6589bea 100644 --- a/tools/power/x86/turbostat/turbostat.c +++ b/tools/power/x86/turbostat/turbostat.c @@ -6573,8 +6573,16 @@ int check_for_cap_sys_rawio(void) int ret = 0;
caps = cap_get_proc(); - if (caps == NULL) + if (caps == NULL) { + /* + * CONFIG_MULTIUSER=n kernels have no cap_get_proc() + * Allow them to continue and attempt to access MSRs + */ + if (errno == ENOSYS) + return 0; + return 1; + }
if (cap_get_flag(caps, CAP_SYS_RAWIO, CAP_EFFECTIVE, &cap_flag_value)) { ret = 1;
From: Sergey Bashirov sergeybashirov@gmail.com
[ Upstream commit d897d81671bc4615c80f4f3bd5e6b218f59df50c ]
When there are too many block extents for a layoutcommit, they may not all fit into the maximum-sized RPC. This patch allows the generic pnfs code to properly handle -ENOSPC returned by the block/scsi layout driver and trigger additional layoutcommits if necessary.
Co-developed-by: Konstantin Evtushenko koevtushenko@yandex.com Signed-off-by: Konstantin Evtushenko koevtushenko@yandex.com Signed-off-by: Sergey Bashirov sergeybashirov@gmail.com Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20250630183537.196479-5-sergeybashirov@gmail.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees. Here's my extensive analysis:
## Bug Fix Nature
This commit fixes a clear bug in the pNFS (parallel NFS) layoutcommit handling where the code didn't properly handle the `-ENOSPC` error returned when there are too many block extents to fit in a single RPC (Remote Procedure Call). Looking at the code changes:
1. **The Problem**: In `ext_tree_encode_commit()` (fs/nfs/blocklayout/extent_tree.c:536-539), when there are too many extents, the function returns `-ENOSPC` but continues counting. The original `pnfs_layoutcommit_inode()` function didn't handle this error properly.
2. **The Fix**: The patch adds specific handling for the `-ENOSPC` error case: - Line 3394 → 3396: Changed to only `put_cred()` if status is NOT `-ENOSPC` - Line 3399 → 3401-3403: Added special handling to check for `-ENOSPC` and set `mark_as_dirty = true` - Line 3409: Changed condition to mark inode dirty if either `status` is set OR `mark_as_dirty` is true
## Impact Analysis
1. **User-Visible Bug**: Without this fix, when users have workloads that generate many block extents (common in database or large file operations), layoutcommits would fail silently, potentially leading to data inconsistency or loss.
2. **Contained Fix**: The changes are minimal and localized to the layoutcommit path: - Only affects the error handling path - Doesn't change the normal operation flow - Doesn't introduce new features or APIs
3. **Clear Regression Risk Assessment**: - Low risk - the patch only adds proper error handling for a specific error code - Doesn't change fundamental data structures or algorithms - The `-ENOSPC` handling triggers additional layoutcommits as needed, which is the correct behavior
## Stable Criteria Compliance
The commit meets stable kernel criteria: - **Fixes a real bug**: Handles RPC size limit overflow that can occur in production - **Small and targeted**: Only ~15 lines of actual logic changes - **No new features**: Pure bug fix, no feature additions - **Tested**: Has review from Christoph Hellwig (a well-known filesystem maintainer) - **Important for users**: Prevents potential data consistency issues in pNFS deployments
## Code Analysis Details
The specific code flow shows: 1. `ext_tree_encode_commit()` returns `-ENOSPC` when buffer is too small (line 538 in extent_tree.c) 2. The old code would incorrectly release credentials and fail the entire operation 3. The new code: - Preserves the credentials when `-ENOSPC` occurs - Sets the inode as dirty to trigger another layoutcommit attempt - Allows the operation to be retried with proper handling
This is a classic case of missing error handling that should be backported to ensure data integrity in stable kernels running pNFS workloads.
fs/nfs/pnfs.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c index 1a7ec68bde15..3fd0971bf16f 100644 --- a/fs/nfs/pnfs.c +++ b/fs/nfs/pnfs.c @@ -3340,6 +3340,7 @@ pnfs_layoutcommit_inode(struct inode *inode, bool sync) struct nfs_inode *nfsi = NFS_I(inode); loff_t end_pos; int status; + bool mark_as_dirty = false;
if (!pnfs_layoutcommit_outstanding(inode)) return 0; @@ -3391,19 +3392,23 @@ pnfs_layoutcommit_inode(struct inode *inode, bool sync) if (ld->prepare_layoutcommit) { status = ld->prepare_layoutcommit(&data->args); if (status) { - put_cred(data->cred); + if (status != -ENOSPC) + put_cred(data->cred); spin_lock(&inode->i_lock); set_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags); if (end_pos > nfsi->layout->plh_lwb) nfsi->layout->plh_lwb = end_pos; - goto out_unlock; + if (status != -ENOSPC) + goto out_unlock; + spin_unlock(&inode->i_lock); + mark_as_dirty = true; } }
status = nfs4_proc_layoutcommit(data, sync); out: - if (status) + if (status || mark_as_dirty) mark_inode_dirty_sync(inode); dprintk("<-- %s status %d\n", __func__, status); return status;
linux-stable-mirror@lists.linaro.org