CPU frequency cannot be boosted when using the amd_pstate driver in
passive or guided mode.
On a host that has an AMD EPYC 7662 processor, while running with
amd-pstate configured for passive mode on full CPU load, the processor
only reaches 2.0 GHz. On later kernels the CPU can reach 3.3GHz.
The CPU frequency is dependent on a setting called highest_perf which is
the multiplier used to compute it. The highest_perf value comes from
cppc_init_perf when the driver is built-in and from pstate_init_perf when
it is a loaded module. Both of these calls have the following condition:
highest_perf = amd_get_highest_perf();
if (highest_perf > __cppc_highest_perf_)
highest_perf = __cppc_highest_perf;
Where again __cppc_highest_perf is either the return from
cppc_get_perf_caps in the built-in case or AMD_CPPC_HIGHEST_PERF in the
module case. Both of these functions actually return the nominal value,
whereas the call to amd_get_highest_perf returns the correct boost value,
so the condition tests true and highest_perf always ends up being the
nominal value, therefore never having the ability to boost CPU frequency.
Since amd_get_highest_perf already returns the boost value, we have
eliminated this check.
The issue was introduced in v6.1 via commit bedadcfb011f ("cpufreq:
amd-pstate: Fix initial highest_perf value"), and exists in stable v6.1
kernels. This has been fixed in v6.6.y and newer but due to refactoring that
change isn't feasible to bring back to v6.1.y. Thus, v6.1 kernels are
affected by this significant performance issue, and cannot be easily
remediated.
Signed-off-by: Nabil S. Alramli <dev(a)nalramli.com>
Reviewed-by: Joe Damato <jdamato(a)fastly.com>
Reviewed-by: Kyle Hubert <khubert(a)fastly.com>
Fixes: bedadcfb011f ("cpufreq: amd-pstate: Fix initial highest_perf value")
See-also: 1ec40a175a48 ("cpufreq: amd-pstate: Enable amd-pstate preferred core support")
Cc: mario.limonciello(a)amd.com
Cc: Perry.Yuan(a)amd.com
Cc: li.meng(a)amd.com
Cc: stable(a)vger.kernel.org # v6.1
---
v2:
- Omit cover letter
- Converted from RFC to PATCH
- Expand commit message based on feedback from Mario Limonciello
- Added Reviewed-by tags
- No functional/code changes
rfc:
https://lore.kernel.org/lkml/20241025010527.491605-1-dev@nalramli.com/
---
drivers/cpufreq/amd-pstate.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 90dcf26f0973..c66086ae624a 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -102,9 +102,7 @@ static int pstate_init_perf(struct amd_cpudata *cpudata)
*
* CPPC entry doesn't indicate the highest performance in some ASICs.
*/
- highest_perf = amd_get_highest_perf();
- if (highest_perf > AMD_CPPC_HIGHEST_PERF(cap1))
- highest_perf = AMD_CPPC_HIGHEST_PERF(cap1);
+ highest_perf = max(amd_get_highest_perf(), AMD_CPPC_HIGHEST_PERF(cap1));
WRITE_ONCE(cpudata->highest_perf, highest_perf);
@@ -124,9 +122,7 @@ static int cppc_init_perf(struct amd_cpudata *cpudata)
if (ret)
return ret;
- highest_perf = amd_get_highest_perf();
- if (highest_perf > cppc_perf.highest_perf)
- highest_perf = cppc_perf.highest_perf;
+ highest_perf = max(amd_get_highest_perf(), cppc_perf.highest_perf);
WRITE_ONCE(cpudata->highest_perf, highest_perf);
--
2.35.1
This problem reported by Clement LE GOFFIC manifest when
using CONFIG_KASAN_IN_VMALLOC and VMAP_STACK:
https://lore.kernel.org/linux-arm-kernel/a1a1d062-f3a2-4d05-9836-3b098de9db…
After some analysis it seems we are missing to sync the
VMALLOC shadow memory in top level PGD to all CPUs.
Add some code to perform this sync, and the bug appears
to go away.
As suggested by Ard, also perform a dummy read from the
shadow memory of the new VMAP_STACK in the low level
assembly.
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
---
Changes in v3:
- Collect Mark Rutlands ACK on patch 1
- Change the simplified assembly add r2, ip, lsr #n to the canonical
add r2, r2, ip, lsr #n in patch 2.
- Link to v2: https://lore.kernel.org/r/20241016-arm-kasan-vmalloc-crash-v2-0-0a52fd086ee…
Changes in v2:
- Implement the two helper functions suggested by Russell
making the KASAN PGD copying less messy.
- Link to v1: https://lore.kernel.org/r/20241015-arm-kasan-vmalloc-crash-v1-0-dbb23592ca8…
---
Linus Walleij (2):
ARM: ioremap: Sync PGDs for VMALLOC shadow
ARM: entry: Do a dummy read from VMAP shadow
arch/arm/kernel/entry-armv.S | 8 ++++++++
arch/arm/mm/ioremap.c | 25 +++++++++++++++++++++----
2 files changed, 29 insertions(+), 4 deletions(-)
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20241015-arm-kasan-vmalloc-crash-fcbd51416457
Best regards,
--
Linus Walleij <linus.walleij(a)linaro.org>
The 2nd patch fixes CVE-2024-38538, but it requires the helper function
pskb_may_pull_reason which is defined in the 1st patch. Backport both
together.
Eric Dumazet (1):
net: add pskb_may_pull_reason() helper
Nikolay Aleksandrov (1):
net: bridge: xmit: make sure we have at least eth header len bytes
include/linux/skbuff.h | 19 +++++++++++++++----
net/bridge/br_device.c | 6 ++++++
2 files changed, 21 insertions(+), 4 deletions(-)
--
2.46.0
Upstream commit 7f9ab862e05c ("leds: spi-byte: Call of_node_put() on error path")
after being backported to 5.10/5.15/6.1/6.6 stable kernels introduced an
access-before-initialization error - it will most likely lead to a crash
in the probe function of the driver if there is no default zero
initialization for the stack values.
The commit moved the initialization of `struct device_node *child` lower
in code, while in stable kernels its value is used in between those places.
Git context of the patch does not cover the situation so it was applied
without any failures.
Upstream commit which removed that intermediate access to the variable is
ccc35ff2fd29 ("leds: spi-byte: Use devm_led_classdev_register_ext()").
I think it's worth a backport, too. The patches for the corresponding
stable trees follow in this thread.
Judging by Documentation/devicetree/bindings/leds/common.yaml, "label"
leds property is deprecated at least since the start of 2020. So there
should be no problem with switching from "label" to "function"+"color"
device name generation in kernels down to 5.10.y.
--
Fedor
The quilt patch titled
Subject: sched/numa: fix the potential null pointer dereference in task_numa_work()
has been removed from the -mm tree. Its filename was
sched-numa-fix-the-potential-null-pointer-dereference-in-task_numa_work.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Shawn Wang <shawnwang(a)linux.alibaba.com>
Subject: sched/numa: fix the potential null pointer dereference in task_numa_work()
Date: Fri, 25 Oct 2024 10:22:08 +0800
When running stress-ng-vm-segv test, we found a null pointer dereference
error in task_numa_work(). Here is the backtrace:
[323676.066985] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
......
[323676.067108] CPU: 35 PID: 2694524 Comm: stress-ng-vm-se
......
[323676.067113] pstate: 23401009 (nzCv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--)
[323676.067115] pc : vma_migratable+0x1c/0xd0
[323676.067122] lr : task_numa_work+0x1ec/0x4e0
[323676.067127] sp : ffff8000ada73d20
[323676.067128] x29: ffff8000ada73d20 x28: 0000000000000000 x27: 000000003e89f010
[323676.067130] x26: 0000000000080000 x25: ffff800081b5c0d8 x24: ffff800081b27000
[323676.067133] x23: 0000000000010000 x22: 0000000104d18cc0 x21: ffff0009f7158000
[323676.067135] x20: 0000000000000000 x19: 0000000000000000 x18: ffff8000ada73db8
[323676.067138] x17: 0001400000000000 x16: ffff800080df40b0 x15: 0000000000000035
[323676.067140] x14: ffff8000ada73cc8 x13: 1fffe0017cc72001 x12: ffff8000ada73cc8
[323676.067142] x11: ffff80008001160c x10: ffff000be639000c x9 : ffff8000800f4ba4
[323676.067145] x8 : ffff000810375000 x7 : ffff8000ada73974 x6 : 0000000000000001
[323676.067147] x5 : 0068000b33e26707 x4 : 0000000000000001 x3 : ffff0009f7158000
[323676.067149] x2 : 0000000000000041 x1 : 0000000000004400 x0 : 0000000000000000
[323676.067152] Call trace:
[323676.067153] vma_migratable+0x1c/0xd0
[323676.067155] task_numa_work+0x1ec/0x4e0
[323676.067157] task_work_run+0x78/0xd8
[323676.067161] do_notify_resume+0x1ec/0x290
[323676.067163] el0_svc+0x150/0x160
[323676.067167] el0t_64_sync_handler+0xf8/0x128
[323676.067170] el0t_64_sync+0x17c/0x180
[323676.067173] Code: d2888001 910003fd f9000bf3 aa0003f3 (f9401000)
[323676.067177] SMP: stopping secondary CPUs
[323676.070184] Starting crashdump kernel...
stress-ng-vm-segv in stress-ng is used to stress test the SIGSEGV error
handling function of the system, which tries to cause a SIGSEGV error on
return from unmapping the whole address space of the child process.
Normally this program will not cause kernel crashes. But before the
munmap system call returns to user mode, a potential task_numa_work() for
numa balancing could be added and executed. In this scenario, since the
child process has no vma after munmap, the vma_next() in task_numa_work()
will return a null pointer even if the vma iterator restarts from 0.
Recheck the vma pointer before dereferencing it in task_numa_work().
Link: https://lkml.kernel.org/r/20241025022208.125527-1-shawnwang@linux.alibaba.c…
Fixes: 214dbc428137 ("sched: convert to vma iterator")
Signed-off-by: Shawn Wang <shawnwang(a)linux.alibaba.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Ben Segall <bsegall(a)google.com>
Cc: Dietmar Eggemann <dietmar.eggemann(a)arm.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Cc: Valentin Schneider <vschneid(a)redhat.com>
Cc: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: <stable(a)vger.kernel.org> [6.2+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/sched/fair.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/sched/fair.c~sched-numa-fix-the-potential-null-pointer-dereference-in-task_numa_work
+++ a/kernel/sched/fair.c
@@ -3369,7 +3369,7 @@ retry_pids:
vma = vma_next(&vmi);
}
- do {
+ for (; vma; vma = vma_next(&vmi)) {
if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) {
trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_UNSUITABLE);
@@ -3491,7 +3491,7 @@ retry_pids:
*/
if (vma_pids_forced)
break;
- } for_each_vma(vmi, vma);
+ }
/*
* If no VMAs are remaining and VMAs were skipped due to the PID
_
Patches currently in -mm which might be from shawnwang(a)linux.alibaba.com are
daddr can be NULL if there is no neighbour table entry present,
in that case the tx packet should be dropped.
saddr will usually be set by MCTP core, but check for NULL in case a
packet is transmitted by a different protocol.
Fixes: f5b8abf9fc3d ("mctp i2c: MCTP I2C binding driver")
Cc: stable(a)vger.kernel.org
Reported-by: Dung Cao <dung(a)os.amperecomputing.com>
Signed-off-by: Matt Johnston <matt(a)codeconstruct.com.au>
---
Changes in v3:
- Revert to simpler saddr check of v1, mention in commit message
- Revert whitespace change from v2
- Link to v2: https://lore.kernel.org/r/20241021-mctp-i2c-null-dest-v2-1-4503e478517c@cod…
Changes in v2:
- Set saddr to device address if NULL, mention in commit message
- Fix patch prefix formatting
- Link to v1: https://lore.kernel.org/r/20241018-mctp-i2c-null-dest-v1-1-ba1ab52966e9@cod…
---
drivers/net/mctp/mctp-i2c.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/mctp/mctp-i2c.c b/drivers/net/mctp/mctp-i2c.c
index 4dc057c121f5d0fb9c9c48bf16b6933ae2f7b2ac..e70fb66879941f3937b7ffc5bc1e20a8a435a441 100644
--- a/drivers/net/mctp/mctp-i2c.c
+++ b/drivers/net/mctp/mctp-i2c.c
@@ -588,6 +588,9 @@ static int mctp_i2c_header_create(struct sk_buff *skb, struct net_device *dev,
if (len > MCTP_I2C_MAXMTU)
return -EMSGSIZE;
+ if (!daddr || !saddr)
+ return -EINVAL;
+
lldst = *((u8 *)daddr);
llsrc = *((u8 *)saddr);
---
base-commit: cb560795c8c2ceca1d36a95f0d1b2eafc4074e37
change-id: 20241018-mctp-i2c-null-dest-a0ba271e0c48
Best regards,
--
Matt Johnston <matt(a)codeconstruct.com.au>
This patch series is to fix bugs for below APIs:
devm_phy_put()
devm_of_phy_provider_unregister()
devm_phy_destroy()
phy_get()
of_phy_get()
devm_phy_get()
devm_of_phy_get()
devm_of_phy_get_by_index()
And simplify below API:
of_phy_simple_xlate().
Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com>
---
Changes in v2:
- Correct title, commit message, and inline comments.
- Link to v1: https://lore.kernel.org/r/20241020-phy_core_fix-v1-0-078062f7da71@quicinc.c…
---
Zijun Hu (6):
phy: core: Fix that API devm_phy_put() fails to release the phy
phy: core: Fix that API devm_of_phy_provider_unregister() fails to unregister the phy provider
phy: core: Fix that API devm_phy_destroy() fails to destroy the phy
phy: core: Fix an OF node refcount leakage in _of_phy_get()
phy: core: Fix an OF node refcount leakage in of_phy_provider_lookup()
phy: core: Simplify API of_phy_simple_xlate() implementation
drivers/phy/phy-core.c | 39 +++++++++++++++++++--------------------
1 file changed, 19 insertions(+), 20 deletions(-)
---
base-commit: e70d2677ef4088d59158739d72b67ac36d1b132b
change-id: 20241020-phy_core_fix-e3ad65db98f7
Best regards,
--
Zijun Hu <quic_zijuhu(a)quicinc.com>