An invalid pointer dereference bug was reported on arm64 cpu, and has
not yet been seen on x86. A partial oops looks like:
Call trace:
update_cfs_rq_h_load+0x80/0xb0
wake_affine+0x158/0x168
select_task_rq_fair+0x364/0x3a8
try_to_wake_up+0x154/0x648
wake_up_q+0x68/0xd0
futex_wake_op+0x280/0x4c8
do_futex+0x198/0x1c0
__arm64_sys_futex+0x11c/0x198
Link: https://lore.kernel.org/all/20251013071820.1531295-1-CruzZhao@linux.alibaba…
We found that the task_group corresponding to the problematic se
is not in the parent task_group’s children list, indicating that
h_load_next points to an invalid address. Consider the following
cgroup and task hierarchy:
A
/ \
/ \
B E
/ \ |
/ \ t2
C D
| |
t0 t1
Here follows a timing sequence that may be responsible for triggering
the problem:
CPU X CPU Y CPU Z
wakeup t0
set list A->B->C
traverse A->B->C
t0 exits
destroy C
wakeup t2
set list A->E wakeup t1
set list A->B->D
traverse A->B->C
panic
CPU Z sets ->h_load_next list to A->B->D, but due to arm64 weaker memory
ordering, Y may observe A->B before it sees B->D, then in this time window,
it can traverse A->B->C and reach an invalid se.
We can avoid stale pointer accesses by clearing ->h_load_next when
unregistering cgroup.
Suggested-by: Vincent Guittot <vincent.guittot(a)linaro.org>
Fixes: 685207963be9 ("sched: Move h_load calculation to task_h_load()")
Cc: <stable(a)vger.kernel.org>
Co-developed-by: Cruz Zhao <CruzZhao(a)linux.alibaba.com>
Signed-off-by: Cruz Zhao <CruzZhao(a)linux.alibaba.com>
Signed-off-by: Peng Wang <peng_wang(a)linux.alibaba.com>
---
kernel/sched/fair.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cee1793e8277..a5fce15093d3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -13427,6 +13427,14 @@ void unregister_fair_sched_group(struct task_group *tg)
list_del_leaf_cfs_rq(cfs_rq);
}
remove_entity_load_avg(se);
+ /*
+ * Clear parent's h_load_next if it points to the
+ * sched_entity being freed to avoid stale pointer.
+ */
+ struct cfs_rq *parent_cfs_rq = cfs_rq_of(se);
+
+ if (READ_ONCE(parent_cfs_rq->h_load_next) == se)
+ WRITE_ONCE(parent_cfs_rq->h_load_next, NULL);
}
/*
--
2.27.0
On LoongArch system, guest PMU hardware is shared by guest and host
and PMU interrupt is separated. PMU is pass-through to VM, and there is
PMU context switch when exit to host and return to guest.
There is optimiation to check whether PMU is enabled by guest. If not,
it is not necessary to return to guest. However it is enabled, PMU
context for guest need switch on. Now KVM_REQ_PMU notification is set
on vcpu context switch, however it is missing if there is no vcpu context
switch and PMU is used by guest VM.
Fixes: f4e40ea9f78f ("LoongArch: KVM: Add PMU support for guest")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Bibo Mao <maobibo(a)loongson.cn>
---
arch/loongarch/kvm/vcpu.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
index 30e3b089a596..bf56ad29ac15 100644
--- a/arch/loongarch/kvm/vcpu.c
+++ b/arch/loongarch/kvm/vcpu.c
@@ -132,6 +132,9 @@ static void kvm_lose_pmu(struct kvm_vcpu *vcpu)
* Clear KVM_LARCH_PMU if the guest is not using PMU CSRs when
* exiting the guest, so that the next time trap into the guest.
* We don't need to deal with PMU CSRs contexts.
+ *
+ * Otherwise set request bit KVM_REQ_PMU to restore guest PMU
+ * before entering guest VM
*/
val = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_PERFCTRL0);
val |= kvm_read_sw_gcsr(csr, LOONGARCH_CSR_PERFCTRL1);
@@ -139,6 +142,8 @@ static void kvm_lose_pmu(struct kvm_vcpu *vcpu)
val |= kvm_read_sw_gcsr(csr, LOONGARCH_CSR_PERFCTRL3);
if (!(val & KVM_PMU_EVENT_ENABLED))
vcpu->arch.aux_inuse &= ~KVM_LARCH_PMU;
+ else
+ kvm_make_request(KVM_REQ_PMU, vcpu);
kvm_restore_host_pmu(vcpu);
}
--
2.39.3
A few cleanups and a bugfix that are either suitable after the swap
table phase I or found during code review.
Patch 1 is a bugfix and needs to be included in the stable branch,
the rest have no behavior change.
---
Kairui Song (4):
mm, swap: do not perform synchronous discard during allocation
mm, swap: rename helper for setup bad slots
mm, swap: cleanup swap entry allocation parameter
mm/migrate, swap: drop usage of folio_index
include/linux/swap.h | 4 ++--
mm/migrate.c | 4 ++--
mm/shmem.c | 2 +-
mm/swap.h | 21 -----------------
mm/swapfile.c | 64 ++++++++++++++++++++++++++++++++++++----------------
mm/vmscan.c | 4 ++--
6 files changed, 52 insertions(+), 47 deletions(-)
---
base-commit: 53e573001f2b5168f9b65d2b79e9563a3b479c17
change-id: 20251007-swap-clean-after-swap-table-p1-b9a7635ee3fa
Best regards,
--
Kairui Song <kasong(a)tencent.com>
When waiting for the PCIe link to come up, both link up and link down
are valid results depending on the device state.
Since the link may come up later and to get rid of the following
mis-reported PM errors. Do not return an -ETIMEDOUT error, as the
outcome has already been reported in dw_pcie_wait_for_link().
PM error logs introduced by the -ETIMEDOUT error return.
imx6q-pcie 33800000.pcie: Phy link never came up
imx6q-pcie 33800000.pcie: PM: dpm_run_callback(): genpd_resume_noirq returns -110
imx6q-pcie 33800000.pcie: PM: failed to resume noirq: error -110
Cc: stable(a)vger.kernel.org
Fixes: 4774faf854f5 ("PCI: dwc: Implement generic suspend/resume functionality")
Signed-off-by: Richard Zhu <hongxing.zhu(a)nxp.com>
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
---
drivers/pci/controller/dwc/pcie-designware-host.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index a4d9838bc33f0..8430ac433d457 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -1212,10 +1212,9 @@ int dw_pcie_resume_noirq(struct dw_pcie *pci)
if (ret)
return ret;
- ret = dw_pcie_wait_for_link(pci);
- if (ret)
- return ret;
+ /* Ignore errors, the link may come up later */
+ dw_pcie_wait_for_link(pci);
- return ret;
+ return 0;
}
EXPORT_SYMBOL_GPL(dw_pcie_resume_noirq);
--
2.37.1
A chip freeze is observed on i.MX7D when PCIe RC kicks off the PM_PME
message and no devices are connected on the port.
To work aroud such kind of issue, skip PME_Turn_Off message if there is
no endpoint connected.
Cc: stable(a)vger.kernel.org
Fixes: 4774faf854f5 ("PCI: dwc: Implement generic suspend/resume functionality")
Fixes: a528d1a72597 ("PCI: imx6: Use DWC common suspend resume method")
Signed-off-by: Richard Zhu <hongxing.zhu(a)nxp.com>
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
---
drivers/pci/controller/dwc/pcie-designware-host.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index 09b50a5ce19bb..a4d9838bc33f0 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -1136,12 +1136,15 @@ int dw_pcie_suspend_noirq(struct dw_pcie *pci)
if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1)
return 0;
- if (pci->pp.ops->pme_turn_off) {
- pci->pp.ops->pme_turn_off(&pci->pp);
- } else {
- ret = dw_pcie_pme_turn_off(pci);
- if (ret)
- return ret;
+ /* Skip PME_Turn_Off message if there is no endpoint connected */
+ if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_WAIT) {
+ if (pci->pp.ops->pme_turn_off) {
+ pci->pp.ops->pme_turn_off(&pci->pp);
+ } else {
+ ret = dw_pcie_pme_turn_off(pci);
+ if (ret)
+ return ret;
+ }
}
if (dwc_quirk(pci, QUIRK_NOL2POLL_IN_PM)) {
--
2.37.1