From: Ryan Seto <ryanseto(a)amd.com>
[Why]
In the case where a dml allocation fails for any reason, the
current state's dml contexts would no longer be valid. Then
subsequent calls dc_state_copy_internal would shallow copy
invalid memory and if the new state was released, a double
free would occur.
[How]
Reset dml pointers in new_state to NULL and avoid invalid
pointer
Cc: stable(a)vger.kernel.org
Reviewed-by: Dillon Varone <dillon.varone(a)amd.com>
Signed-off-by: Ryan Seto <ryanseto(a)amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/core/dc_state.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_state.c b/drivers/gpu/drm/amd/display/dc/core/dc_state.c
index 2597e3fd562b..e006f816ff2f 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_state.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_state.c
@@ -265,6 +265,9 @@ struct dc_state *dc_state_create_copy(struct dc_state *src_state)
dc_state_copy_internal(new_state, src_state);
#ifdef CONFIG_DRM_AMD_DC_FP
+ new_state->bw_ctx.dml2 = NULL;
+ new_state->bw_ctx.dml2_dc_power_source = NULL;
+
if (src_state->bw_ctx.dml2 &&
!dml2_create_copy(&new_state->bw_ctx.dml2, src_state->bw_ctx.dml2)) {
dc_state_release(new_state);
--
2.46.1
This patch fixes a reference count handling issue in the function
lpfc_bsg_hba_set_event(). In the branch
if (evt->reg_id == event_req->ev_reg_id), the function calls
lpfc_bsg_event_ref(), which increments the reference count of the
associated resource. However, in the subsequent branch
if (&evt->node == &phba->ct_ev_waiters), a new evt is allocated, but the
old evt should be released at this point. Failing to do so could lead to
issues.
To resolve this issue, we added a release instruction at the beginning of
the next branch if (&evt->node == &phba->ct_ev_waiters), ensuring that the
resources allocated in the previous branch are properly released, thereby
preventing a reference count leak.
This bug was identified by an experimental static analysis tool developed
by our team. The tool specializes in analyzing reference count operations
and detecting potential issues where resources are not properly managed.
In this case, the tool flagged the missing release operation as a
potential problem, which led to the development of this patch.
Fixes: 4cc0e56e977f ("[SCSI] lpfc 8.3.8: (BSG3) Modify BSG commands to operate asynchronously")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
drivers/scsi/lpfc/lpfc_bsg.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/scsi/lpfc/lpfc_bsg.c b/drivers/scsi/lpfc/lpfc_bsg.c
index 85059b83ea6b..3a65270c5584 100644
--- a/drivers/scsi/lpfc/lpfc_bsg.c
+++ b/drivers/scsi/lpfc/lpfc_bsg.c
@@ -1200,6 +1200,9 @@ lpfc_bsg_hba_set_event(struct bsg_job *job)
spin_unlock_irqrestore(&phba->ct_ev_lock, flags);
if (&evt->node == &phba->ct_ev_waiters) {
+ spin_lock_irqsave(&phba->ct_ev_lock, flags);
+ lpfc_bsg_event_unref(evt);
+ spin_unlock_irqrestore(&phba->ct_ev_lock, flags);
/* no event waiting struct yet - first call */
dd_data = kmalloc(sizeof(struct bsg_job_data), GFP_KERNEL);
if (dd_data == NULL) {
--
2.34.1
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: a5ca1dc46a6b610dd4627d8b633d6c84f9724ef0
Gitweb: https://git.kernel.org/tip/a5ca1dc46a6b610dd4627d8b633d6c84f9724ef0
Author: Mario Limonciello <mario.limonciello(a)amd.com>
AuthorDate: Tue, 05 Nov 2024 10:02:34 -06:00
Committer: Borislav Petkov (AMD) <bp(a)alien8.de>
CommitterDate: Tue, 05 Nov 2024 17:48:32 +01:00
x86/CPU/AMD: Clear virtualized VMLOAD/VMSAVE on Zen4 client
A number of Zen4 client SoCs advertise the ability to use virtualized
VMLOAD/VMSAVE, but using these instructions is reported to be a cause
of a random host reboot.
These instructions aren't intended to be advertised on Zen4 client
so clear the capability.
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219009
---
arch/x86/kernel/cpu/amd.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index fab5cae..823f44f 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -924,6 +924,17 @@ static void init_amd_zen4(struct cpuinfo_x86 *c)
{
if (!cpu_has(c, X86_FEATURE_HYPERVISOR))
msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT);
+
+ /*
+ * These Zen4 SoCs advertise support for virtualized VMLOAD/VMSAVE
+ * in some BIOS versions but they can lead to random host reboots.
+ */
+ switch (c->x86_model) {
+ case 0x18 ... 0x1f:
+ case 0x60 ... 0x7f:
+ clear_cpu_cap(c, X86_FEATURE_V_VMSAVE_VMLOAD);
+ break;
+ }
}
static void init_amd_zen5(struct cpuinfo_x86 *c)
When support for PCC Opregion was added, validation of field_obj
was missed.
Based on the acpi_ev_address_space_dispatch function description,
field_obj can be NULL, and also when acpi_ev_address_space_dispatch
is called in the acpi_ex_region_read() NULL is passed as field_obj.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 0acf24ad7e10 ("ACPICA: Add support for PCC Opregion special context data")
Cc: stable(a)vger.kernel.org
Signed-off-by: George Rurikov <grurikov(a)gmail.com>
---
drivers/acpi/acpica/evregion.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/acpi/acpica/evregion.c b/drivers/acpi/acpica/evregion.c
index cf53b9535f18..03e8b6f186af 100644
--- a/drivers/acpi/acpica/evregion.c
+++ b/drivers/acpi/acpica/evregion.c
@@ -164,13 +164,17 @@ acpi_ev_address_space_dispatch(union acpi_operand_object *region_obj,
}
if (region_obj->region.space_id == ACPI_ADR_SPACE_PLATFORM_COMM) {
- struct acpi_pcc_info *ctx =
- handler_desc->address_space.context;
-
- ctx->internal_buffer =
- field_obj->field.internal_pcc_buffer;
- ctx->length = (u16)region_obj->region.length;
- ctx->subspace_id = (u8)region_obj->region.address;
+ if (field_obj != NULL) {
+ struct acpi_pcc_info *ctx =
+ handler_desc->address_space.context;
+
+ ctx->internal_buffer =
+ field_obj->field.internal_pcc_buffer;
+ ctx->length = (u16)region_obj->region.length;
+ ctx->subspace_id = (u8)region_obj->region.address;
+ } else {
+ return_ACPI_STATUS(AE_ERROR);
+ }
}
if (region_obj->region.space_id ==
--
2.34.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 6575b268157f37929948a8d1f3bafb3d7c055bc1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110551-fragrance-deodorant-e7fa@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6575b268157f37929948a8d1f3bafb3d7c055bc1 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Fri, 25 Oct 2024 12:32:55 -0700
Subject: [PATCH] cxl/port: Fix CXL port initialization order when the
subsystem is built-in
When the CXL subsystem is built-in the module init order is determined
by Makefile order. That order violates expectations. The expectation is
that cxl_acpi and cxl_mem can race to attach. If cxl_acpi wins the race,
cxl_mem will find the enabled CXL root ports it needs. If cxl_acpi loses
the race it will retrigger cxl_mem to attach via cxl_bus_rescan(). That
flow only works if cxl_acpi can assume ports are enabled immediately
upon cxl_acpi_probe() return. That in turn can only happen in the
CONFIG_CXL_ACPI=y case if the cxl_port driver is registered before
cxl_acpi_probe() runs.
Fix up the order to prevent initialization failures. Ensure that
cxl_port is built-in when cxl_acpi is also built-in, arrange for
Makefile order to resolve the subsys_initcall() order of cxl_port and
cxl_acpi, and arrange for Makefile order to resolve the
device_initcall() (module_init()) order of the remaining objects.
As for what contributed to this not being found earlier, the CXL
regression environment, cxl_test, builds all CXL functionality as a
module to allow to symbol mocking and other dynamic reload tests. As a
result there is no regression coverage for the built-in case.
Reported-by: Gregory Price <gourry(a)gourry.net>
Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net
Tested-by: Gregory Price <gourry(a)gourry.net>
Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
Cc: stable(a)vger.kernel.org
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Jonathan Cameron <jonathan.cameron(a)huawei.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Alison Schofield <alison.schofield(a)intel.com>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Reviewed-by: Ira Weiny <ira.weiny(a)intel.com>
Tested-by: Alejandro Lucero <alucerop(a)amd.com>
Reviewed-by: Alejandro Lucero <alucerop(a)amd.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Link: https://patch.msgid.link/172988474904.476062.7961350937442459266.stgit@dwil…
Signed-off-by: Ira Weiny <ira.weiny(a)intel.com>
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 29c192f20082..876469e23f7a 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -60,6 +60,7 @@ config CXL_ACPI
default CXL_BUS
select ACPI_TABLE_LIB
select ACPI_HMAT
+ select CXL_PORT
help
Enable support for host managed device memory (HDM) resources
published by a platform's ACPI CXL memory layout description. See
diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
index db321f48ba52..2caa90fa4bf2 100644
--- a/drivers/cxl/Makefile
+++ b/drivers/cxl/Makefile
@@ -1,13 +1,21 @@
# SPDX-License-Identifier: GPL-2.0
+
+# Order is important here for the built-in case:
+# - 'core' first for fundamental init
+# - 'port' before platform root drivers like 'acpi' so that CXL-root ports
+# are immediately enabled
+# - 'mem' and 'pmem' before endpoint drivers so that memdevs are
+# immediately enabled
+# - 'pci' last, also mirrors the hardware enumeration hierarchy
obj-y += core/
-obj-$(CONFIG_CXL_PCI) += cxl_pci.o
-obj-$(CONFIG_CXL_MEM) += cxl_mem.o
+obj-$(CONFIG_CXL_PORT) += cxl_port.o
obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
-obj-$(CONFIG_CXL_PORT) += cxl_port.o
+obj-$(CONFIG_CXL_MEM) += cxl_mem.o
+obj-$(CONFIG_CXL_PCI) += cxl_pci.o
-cxl_mem-y := mem.o
-cxl_pci-y := pci.o
+cxl_port-y := port.o
cxl_acpi-y := acpi.o
cxl_pmem-y := pmem.o security.o
-cxl_port-y := port.o
+cxl_mem-y := mem.o
+cxl_pci-y := pci.o
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 861dde65768f..9dc394295e1f 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -208,7 +208,22 @@ static struct cxl_driver cxl_port_driver = {
},
};
-module_cxl_driver(cxl_port_driver);
+static int __init cxl_port_init(void)
+{
+ return cxl_driver_register(&cxl_port_driver);
+}
+/*
+ * Be ready to immediately enable ports emitted by the platform CXL root
+ * (e.g. cxl_acpi) when CONFIG_CXL_PORT=y.
+ */
+subsys_initcall(cxl_port_init);
+
+static void __exit cxl_port_exit(void)
+{
+ cxl_driver_unregister(&cxl_port_driver);
+}
+module_exit(cxl_port_exit);
+
MODULE_DESCRIPTION("CXL: Port enumeration and services");
MODULE_LICENSE("GPL v2");
MODULE_IMPORT_NS(CXL);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 6575b268157f37929948a8d1f3bafb3d7c055bc1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110550-chamomile-zit-2c77@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6575b268157f37929948a8d1f3bafb3d7c055bc1 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Fri, 25 Oct 2024 12:32:55 -0700
Subject: [PATCH] cxl/port: Fix CXL port initialization order when the
subsystem is built-in
When the CXL subsystem is built-in the module init order is determined
by Makefile order. That order violates expectations. The expectation is
that cxl_acpi and cxl_mem can race to attach. If cxl_acpi wins the race,
cxl_mem will find the enabled CXL root ports it needs. If cxl_acpi loses
the race it will retrigger cxl_mem to attach via cxl_bus_rescan(). That
flow only works if cxl_acpi can assume ports are enabled immediately
upon cxl_acpi_probe() return. That in turn can only happen in the
CONFIG_CXL_ACPI=y case if the cxl_port driver is registered before
cxl_acpi_probe() runs.
Fix up the order to prevent initialization failures. Ensure that
cxl_port is built-in when cxl_acpi is also built-in, arrange for
Makefile order to resolve the subsys_initcall() order of cxl_port and
cxl_acpi, and arrange for Makefile order to resolve the
device_initcall() (module_init()) order of the remaining objects.
As for what contributed to this not being found earlier, the CXL
regression environment, cxl_test, builds all CXL functionality as a
module to allow to symbol mocking and other dynamic reload tests. As a
result there is no regression coverage for the built-in case.
Reported-by: Gregory Price <gourry(a)gourry.net>
Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net
Tested-by: Gregory Price <gourry(a)gourry.net>
Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
Cc: stable(a)vger.kernel.org
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Jonathan Cameron <jonathan.cameron(a)huawei.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Alison Schofield <alison.schofield(a)intel.com>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Reviewed-by: Ira Weiny <ira.weiny(a)intel.com>
Tested-by: Alejandro Lucero <alucerop(a)amd.com>
Reviewed-by: Alejandro Lucero <alucerop(a)amd.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Link: https://patch.msgid.link/172988474904.476062.7961350937442459266.stgit@dwil…
Signed-off-by: Ira Weiny <ira.weiny(a)intel.com>
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 29c192f20082..876469e23f7a 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -60,6 +60,7 @@ config CXL_ACPI
default CXL_BUS
select ACPI_TABLE_LIB
select ACPI_HMAT
+ select CXL_PORT
help
Enable support for host managed device memory (HDM) resources
published by a platform's ACPI CXL memory layout description. See
diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
index db321f48ba52..2caa90fa4bf2 100644
--- a/drivers/cxl/Makefile
+++ b/drivers/cxl/Makefile
@@ -1,13 +1,21 @@
# SPDX-License-Identifier: GPL-2.0
+
+# Order is important here for the built-in case:
+# - 'core' first for fundamental init
+# - 'port' before platform root drivers like 'acpi' so that CXL-root ports
+# are immediately enabled
+# - 'mem' and 'pmem' before endpoint drivers so that memdevs are
+# immediately enabled
+# - 'pci' last, also mirrors the hardware enumeration hierarchy
obj-y += core/
-obj-$(CONFIG_CXL_PCI) += cxl_pci.o
-obj-$(CONFIG_CXL_MEM) += cxl_mem.o
+obj-$(CONFIG_CXL_PORT) += cxl_port.o
obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
-obj-$(CONFIG_CXL_PORT) += cxl_port.o
+obj-$(CONFIG_CXL_MEM) += cxl_mem.o
+obj-$(CONFIG_CXL_PCI) += cxl_pci.o
-cxl_mem-y := mem.o
-cxl_pci-y := pci.o
+cxl_port-y := port.o
cxl_acpi-y := acpi.o
cxl_pmem-y := pmem.o security.o
-cxl_port-y := port.o
+cxl_mem-y := mem.o
+cxl_pci-y := pci.o
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 861dde65768f..9dc394295e1f 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -208,7 +208,22 @@ static struct cxl_driver cxl_port_driver = {
},
};
-module_cxl_driver(cxl_port_driver);
+static int __init cxl_port_init(void)
+{
+ return cxl_driver_register(&cxl_port_driver);
+}
+/*
+ * Be ready to immediately enable ports emitted by the platform CXL root
+ * (e.g. cxl_acpi) when CONFIG_CXL_PORT=y.
+ */
+subsys_initcall(cxl_port_init);
+
+static void __exit cxl_port_exit(void)
+{
+ cxl_driver_unregister(&cxl_port_driver);
+}
+module_exit(cxl_port_exit);
+
MODULE_DESCRIPTION("CXL: Port enumeration and services");
MODULE_LICENSE("GPL v2");
MODULE_IMPORT_NS(CXL);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 101c268bd2f37e965a5468353e62d154db38838e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110539-swooned-cold-53f5@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 101c268bd2f37e965a5468353e62d154db38838e Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Tue, 22 Oct 2024 18:43:49 -0700
Subject: [PATCH] cxl/port: Fix use-after-free, permit out-of-order decoder
shutdown
In support of investigating an initialization failure report [1],
cxl_test was updated to register mock memory-devices after the mock
root-port/bus device had been registered. That led to cxl_test crashing
with a use-after-free bug with the following signature:
cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1
cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1
cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0
1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1
[..]
cxld_unregister: cxl decoder14.0:
cxl_region_decode_reset: cxl_region region3:
mock_decoder_reset: cxl_port port3: decoder3.0 reset
2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1
cxl_endpoint_decoder_release: cxl decoder14.0:
[..]
cxld_unregister: cxl decoder7.0:
3) cxl_region_decode_reset: cxl_region region3:
Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI
[..]
RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core]
[..]
Call Trace:
<TASK>
cxl_region_decode_reset+0x69/0x190 [cxl_core]
cxl_region_detach+0xe8/0x210 [cxl_core]
cxl_decoder_kill_region+0x27/0x40 [cxl_core]
cxld_unregister+0x5d/0x60 [cxl_core]
At 1) a region has been established with 2 endpoint decoders (7.0 and
14.0). Those endpoints share a common switch-decoder in the topology
(3.0). At teardown, 2), decoder14.0 is the first to be removed and hits
the "out of order reset case" in the switch decoder. The effect though
is that region3 cleanup is aborted leaving it in-tact and
referencing decoder14.0. At 3) the second attempt to teardown region3
trips over the stale decoder14.0 object which has long since been
deleted.
The fix here is to recognize that the CXL specification places no
mandate on in-order shutdown of switch-decoders, the driver enforces
in-order allocation, and hardware enforces in-order commit. So, rather
than fail and leave objects dangling, always remove them.
In support of making cxl_region_decode_reset() always succeed,
cxl_region_invalidate_memregion() failures are turned into warnings.
Crashing the kernel is ok there since system integrity is at risk if
caches cannot be managed around physical address mutation events like
CXL region destruction.
A new device_for_each_child_reverse_from() is added to cleanup
port->commit_end after all dependent decoders have been disabled. In
other words if decoders are allocated 0->1->2 and disabled 1->2->0 then
port->commit_end only decrements from 2 after 2 has been disabled, and
it decrements all the way to zero since 1 was disabled previously.
Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1]
Cc: stable(a)vger.kernel.org
Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Alison Schofield <alison.schofield(a)intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Zijun Hu <quic_zijuhu(a)quicinc.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: Ira Weiny <ira.weiny(a)intel.com>
Link: https://patch.msgid.link/172964782781.81806.17902885593105284330.stgit@dwil…
Signed-off-by: Ira Weiny <ira.weiny(a)intel.com>
diff --git a/drivers/base/core.c b/drivers/base/core.c
index a4c853411a6b..e42f1ad73078 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -4037,6 +4037,41 @@ int device_for_each_child_reverse(struct device *parent, void *data,
}
EXPORT_SYMBOL_GPL(device_for_each_child_reverse);
+/**
+ * device_for_each_child_reverse_from - device child iterator in reversed order.
+ * @parent: parent struct device.
+ * @from: optional starting point in child list
+ * @fn: function to be called for each device.
+ * @data: data for the callback.
+ *
+ * Iterate over @parent's child devices, starting at @from, and call @fn
+ * for each, passing it @data. This helper is identical to
+ * device_for_each_child_reverse() when @from is NULL.
+ *
+ * @fn is checked each iteration. If it returns anything other than 0,
+ * iteration stop and that value is returned to the caller of
+ * device_for_each_child_reverse_from();
+ */
+int device_for_each_child_reverse_from(struct device *parent,
+ struct device *from, const void *data,
+ int (*fn)(struct device *, const void *))
+{
+ struct klist_iter i;
+ struct device *child;
+ int error = 0;
+
+ if (!parent->p)
+ return 0;
+
+ klist_iter_init_node(&parent->p->klist_children, &i,
+ (from ? &from->p->knode_parent : NULL));
+ while ((child = prev_device(&i)) && !error)
+ error = fn(child, data);
+ klist_iter_exit(&i);
+ return error;
+}
+EXPORT_SYMBOL_GPL(device_for_each_child_reverse_from);
+
/**
* device_find_child - device iterator for locating a particular device.
* @parent: parent struct device
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 3df10517a327..223c273c0cd1 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -712,7 +712,44 @@ static int cxl_decoder_commit(struct cxl_decoder *cxld)
return 0;
}
-static int cxl_decoder_reset(struct cxl_decoder *cxld)
+static int commit_reap(struct device *dev, const void *data)
+{
+ struct cxl_port *port = to_cxl_port(dev->parent);
+ struct cxl_decoder *cxld;
+
+ if (!is_switch_decoder(dev) && !is_endpoint_decoder(dev))
+ return 0;
+
+ cxld = to_cxl_decoder(dev);
+ if (port->commit_end == cxld->id &&
+ ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
+ port->commit_end--;
+ dev_dbg(&port->dev, "reap: %s commit_end: %d\n",
+ dev_name(&cxld->dev), port->commit_end);
+ }
+
+ return 0;
+}
+
+void cxl_port_commit_reap(struct cxl_decoder *cxld)
+{
+ struct cxl_port *port = to_cxl_port(cxld->dev.parent);
+
+ lockdep_assert_held_write(&cxl_region_rwsem);
+
+ /*
+ * Once the highest committed decoder is disabled, free any other
+ * decoders that were pinned allocated by out-of-order release.
+ */
+ port->commit_end--;
+ dev_dbg(&port->dev, "reap: %s commit_end: %d\n", dev_name(&cxld->dev),
+ port->commit_end);
+ device_for_each_child_reverse_from(&port->dev, &cxld->dev, NULL,
+ commit_reap);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_port_commit_reap, CXL);
+
+static void cxl_decoder_reset(struct cxl_decoder *cxld)
{
struct cxl_port *port = to_cxl_port(cxld->dev.parent);
struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
@@ -721,14 +758,14 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
u32 ctrl;
if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
- return 0;
+ return;
- if (port->commit_end != id) {
+ if (port->commit_end == id)
+ cxl_port_commit_reap(cxld);
+ else
dev_dbg(&port->dev,
"%s: out of order reset, expected decoder%d.%d\n",
dev_name(&cxld->dev), port->id, port->commit_end);
- return -EBUSY;
- }
down_read(&cxl_dpa_rwsem);
ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
@@ -741,7 +778,6 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
writel(0, hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id));
up_read(&cxl_dpa_rwsem);
- port->commit_end--;
cxld->flags &= ~CXL_DECODER_F_ENABLE;
/* Userspace is now responsible for reconfiguring this decoder */
@@ -751,8 +787,6 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
cxled = to_cxl_endpoint_decoder(&cxld->dev);
cxled->state = CXL_DECODER_STATE_MANUAL;
}
-
- return 0;
}
static int cxl_setup_hdm_decoder_from_dvsec(
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e701e4b04032..3478d2058303 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -232,8 +232,8 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
"Bypassing cpu_cache_invalidate_memregion() for testing!\n");
return 0;
} else {
- dev_err(&cxlr->dev,
- "Failed to synchronize CPU cache state\n");
+ dev_WARN(&cxlr->dev,
+ "Failed to synchronize CPU cache state\n");
return -ENXIO;
}
}
@@ -242,19 +242,17 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
return 0;
}
-static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
+static void cxl_region_decode_reset(struct cxl_region *cxlr, int count)
{
struct cxl_region_params *p = &cxlr->params;
- int i, rc = 0;
+ int i;
/*
- * Before region teardown attempt to flush, and if the flush
- * fails cancel the region teardown for data consistency
- * concerns
+ * Before region teardown attempt to flush, evict any data cached for
+ * this region, or scream loudly about missing arch / platform support
+ * for CXL teardown.
*/
- rc = cxl_region_invalidate_memregion(cxlr);
- if (rc)
- return rc;
+ cxl_region_invalidate_memregion(cxlr);
for (i = count - 1; i >= 0; i--) {
struct cxl_endpoint_decoder *cxled = p->targets[i];
@@ -277,23 +275,17 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
cxl_rr = cxl_rr_load(iter, cxlr);
cxld = cxl_rr->decoder;
if (cxld->reset)
- rc = cxld->reset(cxld);
- if (rc)
- return rc;
+ cxld->reset(cxld);
set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
}
endpoint_reset:
- rc = cxled->cxld.reset(&cxled->cxld);
- if (rc)
- return rc;
+ cxled->cxld.reset(&cxled->cxld);
set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
}
/* all decoders associated with this region have been torn down */
clear_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
-
- return 0;
}
static int commit_decoder(struct cxl_decoder *cxld)
@@ -409,16 +401,8 @@ static ssize_t commit_store(struct device *dev, struct device_attribute *attr,
* still pending.
*/
if (p->state == CXL_CONFIG_RESET_PENDING) {
- rc = cxl_region_decode_reset(cxlr, p->interleave_ways);
- /*
- * Revert to committed since there may still be active
- * decoders associated with this region, or move forward
- * to active to mark the reset successful
- */
- if (rc)
- p->state = CXL_CONFIG_COMMIT;
- else
- p->state = CXL_CONFIG_ACTIVE;
+ cxl_region_decode_reset(cxlr, p->interleave_ways);
+ p->state = CXL_CONFIG_ACTIVE;
}
}
@@ -2054,13 +2038,7 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled)
get_device(&cxlr->dev);
if (p->state > CXL_CONFIG_ACTIVE) {
- /*
- * TODO: tear down all impacted regions if a device is
- * removed out of order
- */
- rc = cxl_region_decode_reset(cxlr, p->interleave_ways);
- if (rc)
- goto out;
+ cxl_region_decode_reset(cxlr, p->interleave_ways);
p->state = CXL_CONFIG_ACTIVE;
}
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 0d8b810a51f0..5406e3ab3d4a 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -359,7 +359,7 @@ struct cxl_decoder {
struct cxl_region *region;
unsigned long flags;
int (*commit)(struct cxl_decoder *cxld);
- int (*reset)(struct cxl_decoder *cxld);
+ void (*reset)(struct cxl_decoder *cxld);
};
/*
@@ -730,6 +730,7 @@ static inline bool is_cxl_root(struct cxl_port *port)
int cxl_num_decoders_committed(struct cxl_port *port);
bool is_cxl_port(const struct device *dev);
struct cxl_port *to_cxl_port(const struct device *dev);
+void cxl_port_commit_reap(struct cxl_decoder *cxld);
struct pci_bus;
int devm_cxl_register_pci_bus(struct device *host, struct device *uport_dev,
struct pci_bus *bus);
diff --git a/include/linux/device.h b/include/linux/device.h
index b4bde8d22697..667cb6db9019 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1078,6 +1078,9 @@ int device_for_each_child(struct device *dev, void *data,
int (*fn)(struct device *dev, void *data));
int device_for_each_child_reverse(struct device *dev, void *data,
int (*fn)(struct device *dev, void *data));
+int device_for_each_child_reverse_from(struct device *parent,
+ struct device *from, const void *data,
+ int (*fn)(struct device *, const void *));
struct device *device_find_child(struct device *dev, void *data,
int (*match)(struct device *dev, void *data));
struct device *device_find_child_by_name(struct device *parent,
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 90d5afd52dd0..c5bbd89b3192 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -693,26 +693,22 @@ static int mock_decoder_commit(struct cxl_decoder *cxld)
return 0;
}
-static int mock_decoder_reset(struct cxl_decoder *cxld)
+static void mock_decoder_reset(struct cxl_decoder *cxld)
{
struct cxl_port *port = to_cxl_port(cxld->dev.parent);
int id = cxld->id;
if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
- return 0;
+ return;
dev_dbg(&port->dev, "%s reset\n", dev_name(&cxld->dev));
- if (port->commit_end != id) {
+ if (port->commit_end == id)
+ cxl_port_commit_reap(cxld);
+ else
dev_dbg(&port->dev,
"%s: out of order reset, expected decoder%d.%d\n",
dev_name(&cxld->dev), port->id, port->commit_end);
- return -EBUSY;
- }
-
- port->commit_end--;
cxld->flags &= ~CXL_DECODER_F_ENABLE;
-
- return 0;
}
static void default_mock_decoder(struct cxl_decoder *cxld)
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x f8c879192465d9f328cb0df07208ef077c560bb1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110503-registry-reheat-cd11@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f8c879192465d9f328cb0df07208ef077c560bb1 Mon Sep 17 00:00:00 2001
From: Bjorn Andersson <bjorn.andersson(a)oss.qualcomm.com>
Date: Wed, 23 Oct 2024 17:24:33 +0000
Subject: [PATCH] soc: qcom: pmic_glink: Handle GLINK intent allocation
rejections
Some versions of the pmic_glink firmware does not allow dynamic GLINK
intent allocations, attempting to send a message before the firmware has
allocated its receive buffers and announced these intent allocations
will fail. When this happens something like this showns up in the log:
pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
GLINK has been updated to distinguish between the cases where the remote
is going down (-ECANCELED) and the intent allocation being rejected
(-EAGAIN).
Retry the send until intent buffers becomes available, or an actual
error occur.
To avoid infinitely waiting for the firmware in the event that this
misbehaves and no intents arrive, an arbitrary 5 second timeout is
used.
This patch was developed with input from Chris Lew.
Reported-by: Johan Hovold <johan(a)kernel.org>
Closes: https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/#t
Cc: stable(a)vger.kernel.org # rpmsg: glink: Handle rejected intent request better
Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver")
Tested-by: Johan Hovold <johan+linaro(a)kernel.org>
Reviewed-by: Johan Hovold <johan+linaro(a)kernel.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson(a)oss.qualcomm.com>
Reviewed-by: Chris Lew <quic_clew(a)quicinc.com>
Link: https://lore.kernel.org/r/20241023-pmic-glink-ecancelled-v2-2-ebc268129407@…
Signed-off-by: Bjorn Andersson <andersson(a)kernel.org>
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c
index 9606222993fd..baa4ac6704a9 100644
--- a/drivers/soc/qcom/pmic_glink.c
+++ b/drivers/soc/qcom/pmic_glink.c
@@ -4,6 +4,7 @@
* Copyright (c) 2022, Linaro Ltd
*/
#include <linux/auxiliary_bus.h>
+#include <linux/delay.h>
#include <linux/module.h>
#include <linux/of.h>
#include <linux/platform_device.h>
@@ -13,6 +14,8 @@
#include <linux/soc/qcom/pmic_glink.h>
#include <linux/spinlock.h>
+#define PMIC_GLINK_SEND_TIMEOUT (5 * HZ)
+
enum {
PMIC_GLINK_CLIENT_BATT = 0,
PMIC_GLINK_CLIENT_ALTMODE,
@@ -112,13 +115,29 @@ EXPORT_SYMBOL_GPL(pmic_glink_client_register);
int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len)
{
struct pmic_glink *pg = client->pg;
+ bool timeout_reached = false;
+ unsigned long start;
int ret;
mutex_lock(&pg->state_lock);
- if (!pg->ept)
+ if (!pg->ept) {
ret = -ECONNRESET;
- else
- ret = rpmsg_send(pg->ept, data, len);
+ } else {
+ start = jiffies;
+ for (;;) {
+ ret = rpmsg_send(pg->ept, data, len);
+ if (ret != -EAGAIN)
+ break;
+
+ if (timeout_reached) {
+ ret = -ETIMEDOUT;
+ break;
+ }
+
+ usleep_range(1000, 5000);
+ timeout_reached = time_after(jiffies, start + PMIC_GLINK_SEND_TIMEOUT);
+ }
+ }
mutex_unlock(&pg->state_lock);
return ret;