Hi Sergey,
On 11/21/25 03:55, Sergey Senozhatsky wrote:
> Hi Christian,
>
> On (25/11/20 10:15), Christian Loehle wrote:
>> On 11/20/25 04:45, Sergey Senozhatsky wrote:
>>> Hi,
>>>
>>> We are observing a performance regression on one of our arm64 boards.
>>> We tracked it down to the linux-6.6.y commit ada8d7fa0ad4 ("sched/cpufreq:
>>> Rework schedutil governor performance estimation").
>>>
>>> UI speedometer benchmark:
>>> w/commit: 395 +/-38
>>> w/o commit: 439 +/-14
>>>
>>
>> Hi Sergey,
>> Would be nice to get some details. What board?
>
> It's an MT8196 chromebook.
>
>> What do the OPPs look like?
>
> How do I find that out?
>
>> Does this system use uclamp during the benchmark? How?
>
> How do I find that out?
>
>> Given how large the stddev given by speedometer (version 3?) itself is, can we get the
>> stats of a few runs?
>
> v2.1
>
> w/o patch w/ patch
> 440 +/-30 406 +/-11
> 440 +/-14 413 +/-16
> 444 +/-12 403 +/-14
> 442 +/-12 412 +/-15
>
>> Maybe traces of cpu_frequency for both w/ and w/o?
>
> trace-cmd record -e power:cpu_frequency attached.
>
> "base" is with ada8d7fa0ad4
> "revert" is ada8d7fa0ad4 reverted.
I did some analysis based on your trace files.
I have been playing some time ago with speedometer performance
issues so that's why I'm curious about your report here.
I've filtered your trace purely based on cpu7 (the single biggest cpu).
Then I have cut the data from the 'warm-up' phase in both traces, to
have similar start point (I think).
It looks like the 2 traces can show similar 'pattern' of that benchmark
which is good for analysis. If you align the timestamp:
176.051s and 972.465s then both plots (frequency changes in time) look
similar.
There are some differences, though:
1. there are more deeps in the freq in time, so more often you would
pay extra penalty for the ramp-up again
2. some of the ramp-up phases are a bit longer ~100ms instead of ~80ms
going from 2GHz to 3.6GHz
3.
There are idle phases missing in the trace, so we have to be careful
when e.g. comparing avg frequency, because that might not be the real
indication of the delivered computation and not indicate the gap in the
score.
Here are the stats:
1. revert:
frequency
count 1.318000e+03
mean 2.932240e+06
std 5.434045e+05
min 2.000000e+06
50% 3.000000e+06
85% 3.600000e+06
90% 3.626000e+06
95% 3.626000e+06
99% 3.626000e+06
max 3.626000e+06
2. base:
frequency
count 1.551000e+03
mean 2.809391e+06
std 5.369750e+05
min 2.000000e+06
50% 2.800000e+06
85% 3.500000e+06
90% 3.600000e+06
95% 3.626000e+06
99% 3.626000e+06
max 3.626000e+06
A better indication in this case would be comparison of the frequency
residency in time, especially for the max freq:
1. revert: 11.92s
2. base: 9.11s
So there is 2.8s longer residency for that fmax (while we even have
longer period for finishing that Speedometer 2 test on 'base').
Here is some detail about that run*:
+---------------+---------------------+---------------+----------------+
| Trace | Total Trace | Time at Max | % of Total |
| | Duration (s) | Freq (s) | Time |
+---------------+---------------------+---------------+----------------+
| Base Trace | 24.72 | 9.11 | 36.9% |
| Revert Trace | 22.88 | 11.92 | 52.1% |
+---------------+---------------------+---------------+----------------+
*We don't know the idle periods which might happen for those frequencies
I wonder if you had a fix patch for the util_est in your kernel...
That fix has been recently backported to 6.6 stable [1].
You might want to try that patch as well, w/ or w/o this revert.
IMHO it might be worth to have it on top. It might help
the main Chrome task ('CrRendererMain') to stay longer on the biggest
cpu, since the util_est would be higher. You can read the discussion
that I had back then with PeterZ and VincentG [2].
Regards,
Lukasz
[1]
https://lore.kernel.org/stable/20251121130232.828187990@linuxfoundation.org/
[2]
https://lore.kernel.org/lkml/20230912142821.GA22166@noisy.programming.kicks…
This is v2 of a series to address multiple reports [0][1]
(+ 2 offlist) of suspend failing when NBCON console drivers are
in use. With the help of NXP and NVIDIA we were able to isolate
the problem and verify the fix.
v1 is here [2].
The first NBCON drivers appeared in 6.13, so currently there is
no LTS kernel that requires this series. But it should go into
6.17.x and 6.18.
The changes since v1:
- For printk_trigger_flush() add support for all flush types
that are available. This will prevent printk_trigger_flush()
from trying to inappropriately queue irq_work after this
series is applied.
- Add WARN_ON_ONCE() to the printk irq_work queueing functions
in case they are called when irq_work is blocked. There
should never be (and currently are no) such callers, but
these functions are externally available.
John Ogness
[0] https://lore.kernel.org/lkml/80b020fc-c18a-4da4-b222-16da1cab2f4c@nvidia.com
[1] https://lore.kernel.org/lkml/DB9PR04MB8429E7DDF2D93C2695DE401D92C4A@DB9PR04…
[2] https://lore.kernel.org/lkml/20251111144328.887159-1-john.ogness@linutronix…
John Ogness (2):
printk: Allow printk_trigger_flush() to flush all types
printk: Avoid scheduling irq_work on suspend
kernel/printk/internal.h | 8 ++--
kernel/printk/nbcon.c | 9 ++++-
kernel/printk/printk.c | 81 ++++++++++++++++++++++++++++++++--------
3 files changed, 78 insertions(+), 20 deletions(-)
base-commit: e9a6fb0bcdd7609be6969112f3fbfcce3b1d4a7c
--
2.47.3
media_pad_remote_pad_first() may return NULL when the media link is absent
or disabled. The code dereferenced remote_pad->entity unconditionally in
ipu6_isys_csi2_enable_streams() and ipu6_isys_csi2_disable_streams(),
leading to a possible NULL dereference.
Guard the remote pad/subdev: in enable path return -ENOLINK/-ENODEV, in
disable path always shut down the local stream first and return 0 if the
remote side is missing. This keeps local shutdown semantics intact and
prevents a crash when the graph is partially torn down.
Also, ipu6_isys_csi2_calc_timing() passes link_freq into calc_timing()
where it is used as a divisor. v4l2_get_link_freq() may yield 0; add an
explicit check and return -EINVAL to avoid division by zero.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 3a5c59ad926b ("media: ipu6: Rework CSI-2 sub-device streaming control")
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexei Safin <a.safin(a)rosa.ru>
---
drivers/media/pci/intel/ipu6/ipu6-isys-csi2.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/drivers/media/pci/intel/ipu6/ipu6-isys-csi2.c b/drivers/media/pci/intel/ipu6/ipu6-isys-csi2.c
index d1fece6210ab..58944918c344 100644
--- a/drivers/media/pci/intel/ipu6/ipu6-isys-csi2.c
+++ b/drivers/media/pci/intel/ipu6/ipu6-isys-csi2.c
@@ -171,6 +171,10 @@ ipu6_isys_csi2_calc_timing(struct ipu6_isys_csi2 *csi2,
if (link_freq < 0)
return link_freq;
+ /* Avoid division by zero in calc_timing() if link frequency is zero */
+ if (!link_freq)
+ return -EINVAL;
+
timing->ctermen = calc_timing(CSI2_CSI_RX_DLY_CNT_TERMEN_CLANE_A,
CSI2_CSI_RX_DLY_CNT_TERMEN_CLANE_B,
link_freq, accinv);
@@ -352,7 +356,12 @@ static int ipu6_isys_csi2_enable_streams(struct v4l2_subdev *sd,
int ret;
remote_pad = media_pad_remote_pad_first(&sd->entity.pads[CSI2_PAD_SINK]);
+ if (!remote_pad)
+ return -ENOLINK;
+
remote_sd = media_entity_to_v4l2_subdev(remote_pad->entity);
+ if (!remote_sd)
+ return -ENODEV;
sink_streams =
v4l2_subdev_state_xlate_streams(state, pad, CSI2_PAD_SINK,
@@ -389,7 +398,16 @@ static int ipu6_isys_csi2_disable_streams(struct v4l2_subdev *sd,
&streams_mask);
remote_pad = media_pad_remote_pad_first(&sd->entity.pads[CSI2_PAD_SINK]);
+ if (!remote_pad) {
+ ipu6_isys_csi2_set_stream(sd, NULL, 0, false);
+ return 0;
+ }
+
remote_sd = media_entity_to_v4l2_subdev(remote_pad->entity);
+ if (!remote_sd) {
+ ipu6_isys_csi2_set_stream(sd, NULL, 0, false);
+ return 0;
+ }
ipu6_isys_csi2_set_stream(sd, NULL, 0, false);
--
2.50.1 (Apple Git-155)
This reverts commit 7777f47f2ea64efd1016262e7b59fab34adfb869.
The commit 1a721de8489f ("block: don't add or resize partition on the disk
with GENHD_FL_NO_PART") and the commit 7777f47f2ea6 ("block: Move checking
GENHD_FL_NO_PART to bdev_add_partition()") used the flag GENHD_FL_NO_PART
to prevent the add or resize of partitions in 5.15 stable kernels.But in
these 5.15 kernels, this is giving an issue with the following error
where the loop driver wants to create a partition when the partscan is
disabled on the loop device:
dd if=/dev/zero of=loopDisk.dsk bs=1M count=1 seek=10240;
losetup -f loopDisk.dsk;parted -s /dev/loop0 -- mklabel gpt mkpart primary
2048s 4096s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0016293 s, 644 MB/s
""
Error: Partition(s) 1 on /dev/loop0 have been written, but we have been
unable to inform the kernel of the change, probably because it/they are
in use. As a result, the old partition(s) will remain in use. You should
reboot now before making further changes.
""
If the partition scan is not enabled on the loop device, this flag
GENHD_FL_NO_PART is getting set and when partition creation is tried,
it returns an error EINVAL thereby preventing the creation of partitions.
So, there is no such distinction between disabling of partition scan and
partition creation.
Later in 6.xxx kernels, the commit b9684a71fca7 ("block, loop: support
partitions without scanning") a new flag GD_SUPPRESS_PART_SCAN was
introduced that just disables the partition scan and uses GENHD_FL_NO_PART
only to prevent creating partition scan. So, the partition creationg can
proceed with even if partition scan is disabled.
As the commit b9684a71fca7 ("block, loop: support partitions without
scanning") is not available in 5.15 stable kernel, and since there is no
distinction between disabling of "partition scan" and "partition
creation", we need to revert the commits 1a721de8489f and 7777f47f2ea6
from 5.15 stable kernel to allow partition creation when partscan is
disabled.
Cc: stable(a)vger.kernel.org
Signed-off-by: Gulam Mohamed <gulam.mohamed(a)oracle.com>
---
block/ioctl.c | 2 ++
block/partitions/core.c | 5 -----
2 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c
index a260e39e56a4..d25b84441237 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -20,6 +20,8 @@ static int blkpg_do_ioctl(struct block_device *bdev,
struct blkpg_partition p;
sector_t start, length;
+ if (disk->flags & GENHD_FL_NO_PART)
+ return -EINVAL;
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
if (copy_from_user(&p, upart, sizeof(struct blkpg_partition)))
diff --git a/block/partitions/core.c b/block/partitions/core.c
index 0d1fe2b42b85..7b5750db7eaf 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -463,11 +463,6 @@ int bdev_add_partition(struct gendisk *disk, int partno, sector_t start,
goto out;
}
- if (disk->flags & GENHD_FL_NO_PART) {
- ret = -EINVAL;
- goto out;
- }
-
if (partition_overlaps(disk, start, length, -1)) {
ret = -EBUSY;
goto out;
--
2.47.3
This reverts commit 7777f47f2ea64efd1016262e7b59fab34adfb869.
The commit 1a721de8489f ("block: don't add or resize partition on the disk
with GENHD_FL_NO_PART") and the commit 7777f47f2ea6 ("block: Move checking
GENHD_FL_NO_PART to bdev_add_partition()") used the flag GENHD_FL_NO_PART
to prevent the add or resize of partitions in 5.15 stable kernels.But in
these 5.15 kernels, this is giving an issue with the following error
where the loop driver wants to create a partition when the partscan is
disabled on the loop device:
dd if=/dev/zero of=loopDisk.dsk bs=1M count=1 seek=10240;
losetup -f loopDisk.dsk;parted -s /dev/loop0 -- mklabel gpt mkpart primary
2048s 4096s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0016293 s, 644 MB/s
""
Error: Partition(s) 1 on /dev/loop0 have been written, but we have been
unable to inform the kernel of the change, probably because it/they are
in use. As a result, the old partition(s) will remain in use. You should
reboot now before making further changes.
""
If the partition scan is not enabled on the loop device, this flag
GENHD_FL_NO_PART is getting set and when partition creation is tried,
it returns an error EINVAL thereby preventing the creation of partitions.
So, there is no such distinction between disabling of partition scan and
partition creation.
Later in 6.xxx kernels, the commit b9684a71fca7 ("block, loop: support
partitions without scanning") a new flag GD_SUPPRESS_PART_SCAN was
introduced that just disables the partition scan and uses GENHD_FL_NO_PART
only to prevent creating partition scan. So, the partition creationg can
proceed with even if partition scan is disabled.
As the commit b9684a71fca7 ("block, loop: support partitions without
scanning") is not available in 5.15 stable kernel, and since there is no
distinction between disabling of "partition scan" and "partition
creation", we need to revert the commits 1a721de8489f and 7777f47f2ea6
from 5.15 stable kernel to allow partition creation when partscan is
disabled.
Cc: stable(a)vger.kernel.org
Signed-off-by: Gulam Mohamed <gulam.mohamed(a)oracle.com>
---
block/ioctl.c | 2 ++
block/partitions/core.c | 5 -----
2 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c
index a260e39e56a4..d25b84441237 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -20,6 +20,8 @@ static int blkpg_do_ioctl(struct block_device *bdev,
struct blkpg_partition p;
sector_t start, length;
+ if (disk->flags & GENHD_FL_NO_PART)
+ return -EINVAL;
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
if (copy_from_user(&p, upart, sizeof(struct blkpg_partition)))
diff --git a/block/partitions/core.c b/block/partitions/core.c
index 0d1fe2b42b85..7b5750db7eaf 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -463,11 +463,6 @@ int bdev_add_partition(struct gendisk *disk, int partno, sector_t start,
goto out;
}
- if (disk->flags & GENHD_FL_NO_PART) {
- ret = -EINVAL;
- goto out;
- }
-
if (partition_overlaps(disk, start, length, -1)) {
ret = -EBUSY;
goto out;
--
2.47.3
Although it is guided that `#mbox-cells` must be at least 1, there are
many instances of `#mbox-cells = <0>;` in the device tree. If that is
the case and the corresponding mailbox controller does not provide
`fw_xlate` and of_xlate` function pointers, `fw_mbox_index_xlate()` will
be used by default and out-of-bounds accesses could occur due to lack of
bounds check in that function.
Cc: stable(a)vger.kernel.org
Signed-off-by: Joonwon Kang <joonwonkang(a)google.com>
---
V3 -> V4: Prevented access to sp->args[0] if sp->nargs < 1 and rebased
on the linux-next tree.
For CVE review, below is a problematic control flow when
`#mbox-cells = <0>;`:
```
static struct mbox_chan *
fw_mbox_index_xlate(struct mbox_controller *mbox,
const struct fwnode_reference_args *sp)
{
int ind = sp->args[0]; // (4)
if (ind >= mbox->num_chans) // (5)
return ERR_PTR(-EINVAL);
return &mbox->chans[ind]; // (6)
}
struct mbox_chan *mbox_request_channel(struct mbox_client *cl, int index)
{
...
struct fwnode_reference_args fwspec; // (1)
...
ret = fwnode_property_get_reference_args(fwnode, "mboxes", // (2)
"#mbox-cells", 0, index, &fwspec);
...
scoped_guard(mutex, &con_mutex) {
...
list_for_each_entry(mbox, &mbox_cons, node) {
if (device_match_fwnode(mbox->dev, fwspec.fwnode)) {
if (mbox->fw_xlate) {
chan = mbox->fw_xlate(mbox, &fwspec); // (3)
if (!IS_ERR(chan))
break;
}
...
}
}
...
ret = __mbox_bind_client(chan, cl); // (7)
...
}
...
}
static int __mbox_bind_client(struct mbox_chan *chan,
struct mbox_client *cl)
{
if (chan->cl || ...) { // (8)
}
```
(1) `fwspec.args[]` is filled with arbitrary leftover values in the stack.
Let's say that `fwspec.args[0] == 0xffffffff`.
(2) Since `#mbox-cells = <0>;`, `fwspec.nargs` is assigned 0 and
`fwspec.args[]` are untouched.
(3) Since the controller has not provided `fw_xlate` and `of_xlate`,
`fw_mbox_index_xlate()` is used instead.
(4) `idx` is assigned -1 due to the value of `fwspec.args[0]`.
(5) Since `mbox->num_chans >= 0` and `idx == -1`, this condition does
not filter out this case.
(6) Out-of-bounds address is returned. Depending on what was left in
`fwspec.args[0]`, it could be an arbitrary(but confined to a specific
range) address.
(7) A function is called with the out-of-bounds address in `chan`.
(8) The out-of-bounds address is accessed.
drivers/mailbox/mailbox.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 2acc6ec229a4..617ba505691d 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -489,12 +489,10 @@ EXPORT_SYMBOL_GPL(mbox_free_channel);
static struct mbox_chan *fw_mbox_index_xlate(struct mbox_controller *mbox,
const struct fwnode_reference_args *sp)
{
- int ind = sp->args[0];
-
- if (ind >= mbox->num_chans)
+ if (sp->nargs < 1 || sp->args[0] >= mbox->num_chans)
return ERR_PTR(-EINVAL);
- return &mbox->chans[ind];
+ return &mbox->chans[sp->args[0]];
}
/**
--
2.52.0.487.g5c8c507ade-goog
From: Long Li <longli(a)microsoft.com>
Enable the user space to manage interrupt_mask for subchannels through
irqcontrol interface for uio device. Also remove the memory barrier
when monitor bit is enabled as it is not necessary.
This is a backport of the upstream
commit d062463edf17 ("uio_hv_generic: Set event for all channels on the device")
with some modifications to resolve merge conflicts and take care of
missing support for slow devices on older kernels.
Original change was not a fix, but it needs to be backported to fix a
NULL pointer crash resulting from missing interrupt mask setting.
Commit b15b7d2a1b09 ("uio_hv_generic: Let userspace take care of interrupt mask")
removed the default setting of interrupt_mask for channels (including
subchannels) in the uio_hv_generic driver, as it relies on the user space
to take care of managing it. This approach works fine when user space
can control this setting using the irqcontrol interface provided for uio
devices. Support for setting the interrupt mask through this interface for
subchannels came only after commit d062463edf17 ("uio_hv_generic: Set event
for all channels on the device"). On older kernels, this change is not
present. With uio_hv_generic no longer setting the interrupt_mask, and
userspace not having the capability to set it, it remains unset,
and interrupts can come for the subchannels, which can result in a crash
in hv_uio_channel_cb. Backport the change to older kernels, where this
change was not present, to allow userspace to set the interrupt mask
properly for subchannels. Additionally, this patch also adds certain
checks for primary vs subchannels in the hv_uio_channel_cb, which can
gracefully handle these two cases and prevent the NULL pointer crashes.
Signed-off-by: Long Li <longli(a)microsoft.com>
Fixes: b15b7d2a1b09 ("uio_hv_generic: Let userspace take care of interrupt mask")
Closes: https://bugs.debian.org/1120602
Cc: stable(a)vger.kernel.org # all LTS kernels from 5.4 till 6.6
Signed-off-by: Naman Jain <namjain(a)linux.microsoft.com>
---
Changes since v1:
https://lore.kernel.org/all/20251115085937.2237-1-namjain@linux.microsoft.c…
* Changed commit being fixed, to reflect upstream commit
* Added stable kernel versions info in Stable tag and email subject.
This needs to be ported to all the stable kernels, where the Fixes patch
went in - 5.4.x, 5.10.x, 5.15.x, 6.1.x, 6.6.x.
Previous notes:
Remove reviewed-by tags since the original code has changed quite a bit
while backporting.
Backported change for 6.12 kernel is sent separately.
---
drivers/uio/uio_hv_generic.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 137109f5f69b..95c1f08028a3 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -80,9 +80,15 @@ hv_uio_irqcontrol(struct uio_info *info, s32 irq_state)
{
struct hv_uio_private_data *pdata = info->priv;
struct hv_device *dev = pdata->device;
+ struct vmbus_channel *primary, *sc;
- dev->channel->inbound.ring_buffer->interrupt_mask = !irq_state;
- virt_mb();
+ primary = dev->channel;
+ primary->inbound.ring_buffer->interrupt_mask = !irq_state;
+
+ mutex_lock(&vmbus_connection.channel_mutex);
+ list_for_each_entry(sc, &primary->sc_list, sc_list)
+ sc->inbound.ring_buffer->interrupt_mask = !irq_state;
+ mutex_unlock(&vmbus_connection.channel_mutex);
return 0;
}
@@ -93,11 +99,18 @@ hv_uio_irqcontrol(struct uio_info *info, s32 irq_state)
static void hv_uio_channel_cb(void *context)
{
struct vmbus_channel *chan = context;
- struct hv_device *hv_dev = chan->device_obj;
- struct hv_uio_private_data *pdata = hv_get_drvdata(hv_dev);
+ struct hv_device *hv_dev;
+ struct hv_uio_private_data *pdata;
virt_mb();
+ /*
+ * The callback may come from a subchannel, in which case look
+ * for the hv device in the primary channel
+ */
+ hv_dev = chan->primary_channel ?
+ chan->primary_channel->device_obj : chan->device_obj;
+ pdata = hv_get_drvdata(hv_dev);
uio_event_notify(&pdata->info);
}
--
2.43.0