On Tue, 12 May 2026 09:56:07 +0800, Jie Gan wrote:
> When coresight_path_assign_trace_id() cannot assign a valid trace ID,
> coresight_enable_sysfs() takes the err_path goto with ret still 0,
> returning success to the caller despite no trace session being started.
>
> Change coresight_path_assign_trace_id() to return int, moving the
> IS_VALID_CS_TRACE_ID() check inside it so it returns -EINVAL on failure
> and 0 on success. Update both callers to propagate this return value
> directly instead of inspecting path->trace_id after the call.
>
> [...]
Applied, thanks!
[1/1] coresight: fix missing error code when trace ID is invalid
https://git.kernel.org/coresight/c/f4526ffee6ff
Best regards,
--
Suzuki K Poulose <suzuki.poulose(a)arm.com>
On Tue, May 12, 2026 at 09:56:07AM +0800, Jie Gan wrote:
> When coresight_path_assign_trace_id() cannot assign a valid trace ID,
> coresight_enable_sysfs() takes the err_path goto with ret still 0,
> returning success to the caller despite no trace session being started.
>
> Change coresight_path_assign_trace_id() to return int, moving the
> IS_VALID_CS_TRACE_ID() check inside it so it returns -EINVAL on failure
> and 0 on success. Update both callers to propagate this return value
> directly instead of inspecting path->trace_id after the call.
>
> Fixes: d87d76d823d1 ("Coresight: Allocate trace ID after building the path")
> Reviewed-by: James Clark <james.clark(a)linaro.org>
> Reviewed-by: Richard Cheng <icheng(a)nvidia.com>
> Signed-off-by: Jie Gan <jie.gan(a)oss.qualcomm.com>
Reviewed-by: Leo Yan <leo.yan(a)arm.com>
On Mon, May 11, 2026 at 05:27:10PM +0800, Jie Gan wrote:
[...]
> > > @@ -755,10 +755,16 @@ void coresight_path_assign_trace_id(struct coresight_path *path,
> > > * Non 0 is either success or fail.
> > > */
> > > if (trace_id != 0) {
> > > - path->trace_id = trace_id;
> > > - return;
> > > + if (IS_VALID_CS_TRACE_ID(trace_id)) {
> > > + path->trace_id = trace_id;
> > > + return 0;
> > > + }
> > > +
> > > + return -EINVAL;
I'd advocate a bit early exit style, like:
/* 0 means the device has no ID assignment, so keep searching */
if (trace_id == 0)
continue;
if (!IS_VALID_CS_TRACE_ID(trace_id))
return -EINVAL;
path->trace_id = trace_id;
return 0;
Early exit can reduce indentation depth, and it handles simple cases
first and then the complex logic. In some cases (maye not this case),
we may benefit a bit from compiler optimization [1].
[1] https://xania.org/202512/18-partial-inlining
[...]
> The return value has been ignored in perf mode. It will introduce noisy by
> adding __must_check. So I think its better without __must_check?
Wouldn't it need to update perf mode as well?
Regarding __must_check, I searched Documentation but didn't find
guidance on when it should be used. I don't want to use this annotation
randomly (some functions use it and some not), this will be hard for
everyone to follow up.
IMO, it's fine not to use __must_check here. I would leave this to
Suzuki and other maintainers if have different opinions.
Thanks,
Leo
This series focuses on CoreSight path power management. The changes can
be divided into four parts for review:
Patches 01 - 10: Preparison for CPU PM:
Fix source disabling on idr_alloc failure.
Fix helper enable failure handling.
Refactor CPU ID stored in csdev.
Move CPU lock to sysfs layer.
Move per-CPU source pointer from etm-perf to core layer.
Refactor etm-perf to retrieve source via per-CPU's event
data for lockless and get source reference during AUX
setup.
Patches 11 - 13: Refactor CPU idle flow managed in the CoreSight core
layer.
Patches 14 - 23: Refactor path enable / disable with range, control path
during CPU idle.
Patches 24 - 25: Support the sink (TRBE) control during CPU idle.
Patches 26 - 28: Move CPU hotplug into the core layer, and fix sysfs
mode for hotplug.
This series is rebased on the coresight-next branch and has been verified
on Juno-r2 (ETM + ETR) and FVP RevC (ETE + TRBE). Built successfully
for armv7 (ARCH=arm).
---
Changes in v12:
- Added comments on coresight_{get|put)_percpu_source_ref (Suzuki).
- Refined failure handling in path enable (Suzuki).
- Added coresight_is_software_source() helper (Suzuki).
- Reordered taking ref on csdev and its parent in patch 07.
- Define the enum mode with bit flags.
- Minor improvements on commit logs.
- Rebased on lastest coresight-next.
- Link to v11: https://lore.kernel.org/r/20260501-arm_coresight_path_power_management_impr…
Changes in v11:
- Moved per-CPU source pointer from etm-perf to core (Suzuki).
- Added grabbing/ungrabbing csdev for device reference (Suzuki).
- Minor refine for error handling and logs in CPU PM (James).
- Refactored etm-perf with fetching path/source from event data (Suzuki).
- Fixed Helper error handling (sashiko).
- Added Jie's test tag (thanks!).
- Minor improvement for comments and commit logs.
- Link to v10: https://lore.kernel.org/r/20260405-arm_coresight_path_power_management_impr…
Changes in v10:
- Removed redundant checks in ETMv4 PM callbacks (sashiko).
- Added a new const structure etm4_cs_pm_ops (sashiko).
- Used fine-grained spinlock on sysfs_active_config (sashiko).
- Blocked notification after failures in save / restore to avoid lockups.
- Changed Change CPUHP_AP_ARM_CORESIGHT_STARTING to
CPUHP_AP_ARM_CORESIGHT_ONLINE so that the CPU hotplug callback runs in
the thread context (sashiko).
- Link to v9: https://lore.kernel.org/r/20260401-arm_coresight_path_power_management_impr…
Signed-off-by: Leo Yan <leo.yan(a)arm.com>
---
Jie Gan (1):
coresight: Fix source not disabled on idr_alloc_u32 failure
Leo Yan (26):
coresight: Handle helper enable failure properly
coresight: Extract device init into coresight_init_device()
coresight: Populate CPU ID into coresight_device
coresight: Remove .cpu_id() callback from source ops
coresight: Take hotplug lock in enable_source_store() for Sysfs mode
coresight: perf: Retrieve path and source from event data
coresight: Take a reference on csdev
coresight: Move per-CPU source pointer to core layer
coresight: Take per-CPU source reference during AUX setup
coresight: Register CPU PM notifier in core layer
coresight: etm4x: Hook CPU PM callbacks
coresight: etm4x: Remove redundant checks in PM save and restore
coresight: syscfg: Use IRQ-safe spinlock to protect active variables
coresight: Disable source helpers in coresight_disable_path()
coresight: Control path with range
coresight: Use helpers to fetch first and last nodes
coresight: Introduce coresight_enable_source() helper
coresight: Save active path for system tracers
coresight: etm4x: Set active path on target CPU
coresight: etm3x: Set active path on target CPU
coresight: sysfs: Use source's path pointer for path control
coresight: Control path during CPU idle
coresight: Add PM callbacks for sink device
coresight: sysfs: Increment refcount only for software source
coresight: Move CPU hotplug callbacks to core layer
coresight: sysfs: Validate CPU online status for per-CPU sources
Yabin Cui (1):
coresight: trbe: Save and restore state across CPU low power state
drivers/hwtracing/coresight/coresight-catu.c | 2 +-
drivers/hwtracing/coresight/coresight-core.c | 548 ++++++++++++++++++---
drivers/hwtracing/coresight/coresight-cti-core.c | 9 +-
drivers/hwtracing/coresight/coresight-etm-perf.c | 285 ++++++-----
drivers/hwtracing/coresight/coresight-etm3x-core.c | 73 +--
drivers/hwtracing/coresight/coresight-etm4x-core.c | 166 ++-----
drivers/hwtracing/coresight/coresight-priv.h | 6 +
drivers/hwtracing/coresight/coresight-syscfg.c | 38 +-
drivers/hwtracing/coresight/coresight-syscfg.h | 2 +
drivers/hwtracing/coresight/coresight-sysfs.c | 131 ++---
drivers/hwtracing/coresight/coresight-trbe.c | 61 ++-
include/linux/coresight.h | 27 +-
include/linux/cpuhotplug.h | 2 +-
13 files changed, 870 insertions(+), 480 deletions(-)
---
base-commit: 0ec0a8785d21f63db520bd9d2a67c55e855d36a8
change-id: 20251104-arm_coresight_path_power_management_improvement-dab4966f8280
Best regards,
--
Leo Yan <leo.yan(a)arm.com>
On 08/05/2026 6:45 am, Jie Gan wrote:
> When coresight_path_assign_trace_id() fails to allocate a valid trace
> ID, the code jumps to err_path without setting ret to an error value.
> This causes coresight_enable_sysfs() to return 0 (success) to the
> caller even though no trace session was started.
>
> Set ret = -EINVAL before the goto so that callers receive a proper
> error code.
>
> Fixes: d87d76d823d1 ("Coresight: Allocate trace ID after building the path")
> Signed-off-by: Jie Gan <jie.gan(a)oss.qualcomm.com>
> ---
> drivers/hwtracing/coresight/coresight-sysfs.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-sysfs.c b/drivers/hwtracing/coresight/coresight-sysfs.c
> index d2a6ed8bcc74..c9338c783540 100644
> --- a/drivers/hwtracing/coresight/coresight-sysfs.c
> +++ b/drivers/hwtracing/coresight/coresight-sysfs.c
> @@ -195,42 +195,44 @@ int coresight_enable_sysfs(struct coresight_device *csdev)
> */
> if (subtype == CORESIGHT_DEV_SUBTYPE_SOURCE_SOFTWARE)
> csdev->refcnt++;
> goto out;
> }
>
> sink = coresight_find_activated_sysfs_sink(csdev);
> if (!sink) {
> ret = -EINVAL;
> goto out;
> }
>
> path = coresight_build_path(csdev, sink);
> if (IS_ERR(path)) {
> pr_err("building path(s) failed\n");
> ret = PTR_ERR(path);
> goto out;
> }
>
> coresight_path_assign_trace_id(path, CS_MODE_SYSFS);
> - if (!IS_VALID_CS_TRACE_ID(path->trace_id))
> + if (!IS_VALID_CS_TRACE_ID(path->trace_id)) {
> + ret = -EINVAL;
> goto err_path;
> + }
>
> ret = coresight_enable_path(path, CS_MODE_SYSFS);
> if (ret)
> goto err_path;
>
> ret = coresight_enable_source_sysfs(csdev, CS_MODE_SYSFS, path);
> if (ret)
> goto err_source;
>
> switch (subtype) {
> case CORESIGHT_DEV_SUBTYPE_SOURCE_PROC:
> /*
> * When working from sysFS it is important to keep track
> * of the paths that were created so that they can be
> * undone in 'coresight_disable()'. Since there can only
> * be a single session per tracer (when working from sysFS)
> * a per-cpu variable will do just fine.
> */
> cpu = source_ops(csdev)->cpu_id(csdev);
> per_cpu(tracer_path, cpu) = path;
>
> ---
> base-commit: 17c7841d09ee7d33557fd075562d9289b6018c90
> change-id: 20260508-fix-trace-id-error-dbfdd4d8f2d1
>
> Best regards,
Reviewed-by: James Clark <james.clark(a)linaro.org>
On Mon, May 11, 2026 at 05:28:37PM +0800, Jie Gan wrote:
> Hi Leo,
>
> On 5/11/2026 5:24 PM, Leo Yan wrote:
> > On Mon, May 11, 2026 at 05:04:44PM +0800, Jie Gan wrote:
> > > In coresight_enable_sysfs(), for non-CPU sources (SOFTWARE, TPDM,
> > > OTHERS), the source device is enabled via coresight_enable_source_sysfs()
> > > before idr_alloc_u32() maps the path. If idr_alloc_u32() fails, the
> > > original code jumped directly to err_source, which only calls
> > > coresight_disable_path() and coresight_release_path(). The source device
> > > was left enabled with an incremented refcnt but no path tracked for it,
> > > leaving the device in an inconsistent state.
> > >
> > > Disable the source before jumping to err_source so the enable and path
> > > operations are fully unwound.
> > >
> > > Fixes: 1f5149c7751c ("coresight: Move all sysfs code to sysfs file")
> > > Signed-off-by: Jie Gan <quic_jiegan(a)quicinc.com>
> >
> > Actually I have noticed this. Since my PM series will remove IDR things,
> > and I don't think anyone really hit idr alloc error, this is why I
> > didn't send fix for this.
> >
> > Anyway, this is a reasonable fix. I will send out my PM series later
> > in today, I will pick this patch into my series and rebase on it, hope
> > this is easier for all of us.
>
> Well noted. Please feel free to pick it into your series.
Thanks! Just note, I updated the Fixes tag as:
Fixes: 5c0016d7b343 ("coresight: core: Use IDR for non-cpu bound sources' paths.")
Which is the original patch for the issue.
Thanks,
Leo
On Mon, May 11, 2026 at 05:04:44PM +0800, Jie Gan wrote:
> In coresight_enable_sysfs(), for non-CPU sources (SOFTWARE, TPDM,
> OTHERS), the source device is enabled via coresight_enable_source_sysfs()
> before idr_alloc_u32() maps the path. If idr_alloc_u32() fails, the
> original code jumped directly to err_source, which only calls
> coresight_disable_path() and coresight_release_path(). The source device
> was left enabled with an incremented refcnt but no path tracked for it,
> leaving the device in an inconsistent state.
>
> Disable the source before jumping to err_source so the enable and path
> operations are fully unwound.
>
> Fixes: 1f5149c7751c ("coresight: Move all sysfs code to sysfs file")
> Signed-off-by: Jie Gan <quic_jiegan(a)quicinc.com>
Actually I have noticed this. Since my PM series will remove IDR things,
and I don't think anyone really hit idr alloc error, this is why I
didn't send fix for this.
Anyway, this is a reasonable fix. I will send out my PM series later
in today, I will pick this patch into my series and rebase on it, hope
this is easier for all of us.
Thanks,
Leo
On Fri, May 08, 2026 at 01:45:35PM +0800, Jie Gan wrote:
[...]
> coresight_path_assign_trace_id(path, CS_MODE_SYSFS);
> - if (!IS_VALID_CS_TRACE_ID(path->trace_id))
> + if (!IS_VALID_CS_TRACE_ID(path->trace_id)) {
> + ret = -EINVAL;
> goto err_path;
> + }
On the top of this patch, could we do a further improvement?
Move IS_VALID_CS_TRACE_ID() into coresight_path_assign_trace_id() and
return 0 for success and < 0 for failures. As result, callers only
need to check the returned value.
For this patch:
Reviewed-by: Leo Yan <leo.yan(a)arm.com>
On Thu, May 07, 2026 at 03:55:26PM +0100, James Clark wrote:
[...]
> Hi Leo,
>
> Testing on the Orion O6 board was all good, and so was stress testing
> concurrent sysfs mode and hotplug on Juno.
Thanks a lot for test!
> However, I was trying to stress test sysfs mode and rmmod on Juno and
> ran into an issue, although a similar issue is present without your
> patchset.
I don't think CPU PM introduces additional complexity for the above cases.
The reason is that CPU PM notifiers _only_ apply to active sessions, and
once a device is enabled, the module cannot be removed.
If the race conditions between enabling/disabling sessions and module
load/unload are properly handled, CPU PM should be safe. If we have bug
in these race conditions, the high frequency data access in CPU PM may
expose the issues - I don't expect CPU PM is the culprit.
> If you run an rmmod on all the coresight devices at the same time as an
> enable_source / disable loop you always get this:
>
> WARNING: possible circular locking dependency detected
> 7.0.0-rc1+ #713 Tainted: G N
> ------------------------------------------------------
> rmmod/1361 is trying to acquire lock:
> ffff0008042f69a8 (kn->active#144){++++}-{0:0}, at:
> __kernfs_remove+0x1b8/0x2c8
kn->active is not a lock but an active reference of sysfs node, but it
use lockdep annotation to detect lock dependency.
> Possible unsafe locking scenario:
> CPU0 CPU1
> ---- ----
> lock(coresight_mutex);
> lock(cpu_hotplug_lock);
> lock(coresight_mutex);
> lock(kn->active#144);
> *** DEADLOCK ***
The potential deadlock sequence could be:
kernfs_fop_write_iter()
`> kernfs_get_active_of() => acquire kn->active
`> coresight_enable_sysfs() => acquire coresight_mutex
coresight_unregister() => acquire coresight_mutex
`> device_unregister()
`> __kernfs_remove()
`> kernfs_drain() => acquire kn->active
> I think the issue can be fixed by releasing the coresight_mutex before
> device_unregister():
>
> diff --git a/drivers/hwtracing/coresight/coresight-core.c
> b/drivers/hwtracing/coresight/coresight-core.c
> index 015363da12fa..620560880f12 100644
> --- a/drivers/hwtracing/coresight/coresight-core.c
> +++ b/drivers/hwtracing/coresight/coresight-core.c
> @@ -1639,8 +1639,8 @@ void coresight_unregister(struct coresight_device
> *csdev)
> coresight_remove_conns(csdev);
> coresight_clear_default_sink(csdev);
> coresight_release_platform_data(csdev->dev.parent, csdev->pdata);
> - device_unregister(&csdev->dev);
> mutex_unlock(&coresight_mutex);
> + device_unregister(&csdev->dev);
> }
> EXPORT_SYMBOL_GPL(coresight_unregister);
If so, we also need to move device_register() out of the mutex scope.
That said, I still think we should dive a bit if can use smaller locking
granluarity (combining with bus management provided by device model).
> Although I didn't think too hard about the implications, but it might be ok
> because once all the connections are removed the device can't be used so
> releasing the coresight_mutex isn't an issue.
>
> But then testing that I ran into some kind of refleak where I couldn't
> unload modules anymore, even though I'd disabled everything. But that could
> be a different issue:
>
> rmmod: ERROR: Module coresight_funnel is in use
> rmmod: ERROR: Module coresight_replicator is in use
> rmmod: ERROR: Module coresight_etm4x is in use
> rmmod: ERROR: Module coresight_tmc is in use
> rmmod: ERROR: Module coresight_cti is in use
> rmmod: ERROR: Module coresight is in use by: coresight_tmc coresight_cti
> coresight_etm4x coresight_replicator coresight_funnel
I suspect this is due to module references are not properly released, or
the entire CS path is not properly disabled.
After the issue occurs, can the ETM sysfs knob still be accessed? I am
curious whether this is caused by the sysfs knob disappearing so no way
to disable the path or the sysfs knob still exists but the driver
internally misses to disable the path.
> Anyway I don't think your patches make this worse, so we can probably ignore
> it, but it would be good to be able to stress test the new modifications
> around the same area.
As no regression in test, I agree that we should not defer this series.
We can fix the race with module load/unload as a separate task:
- sysfs mode + module load/unload
- perf mode + module load/unload
Then we can combine stress test with CPU idle/hotplug.
Thanks,
Leo
On 09/05/2026 1:58 am, Jie Gan wrote:
> When coresight_path_assign_trace_id() cannot assign a valid trace ID,
> coresight_enable_sysfs() takes the err_path goto with ret still 0,
> returning success to the caller despite no trace session being started.
>
> Fix this by changing coresight_path_assign_trace_id() to return int.
> Move the IS_VALID_CS_TRACE_ID() check inside the function so it returns
> -EINVAL on failure and 0 on success. Update coresight_enable_sysfs() to
> check the return value directly instead of inspecting path->trace_id
> after the call.
>
> The other caller in coresight-etm-perf.c discards the return value and
> continues to check path->trace_id via IS_VALID_CS_TRACE_ID() directly.
> This is unaffected: on failure path->trace_id is no longer written, so
> it remains 0, which IS_VALID_CS_TRACE_ID() rejects the same as before.
>
> Fixes: d87d76d823d1 ("Coresight: Allocate trace ID after building the path")
> Signed-off-by: Jie Gan <jie.gan(a)oss.qualcomm.com>
> ---
> Changes in v2:
> - Refactor the coresight_path_assign_trace_id function.
> - Link to v1: https://lore.kernel.org/r/20260508-fix-trace-id-error-v1-1-5f11a5456fdf@oss…
> ---
> drivers/hwtracing/coresight/coresight-core.c | 14 ++++++++++----
> drivers/hwtracing/coresight/coresight-priv.h | 2 +-
> drivers/hwtracing/coresight/coresight-sysfs.c | 4 ++--
> 3 files changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
> index 46f247f73cf6..cabdc0c72f38 100644
> --- a/drivers/hwtracing/coresight/coresight-core.c
> +++ b/drivers/hwtracing/coresight/coresight-core.c
> @@ -739,12 +739,12 @@ static int coresight_get_trace_id(struct coresight_device *csdev,
> * Call this after creating the path and before enabling it. This leaves
> * the trace ID set on the path, or it remains 0 if it couldn't be assigned.
> */
> -void coresight_path_assign_trace_id(struct coresight_path *path,
> +int coresight_path_assign_trace_id(struct coresight_path *path,
> enum cs_mode mode)
> {
> struct coresight_device *sink = coresight_get_sink(path);
> struct coresight_node *nd;
> - int trace_id;
> + int trace_id, ret = -EINVAL;
>
> list_for_each_entry(nd, &path->path_list, link) {
> /* Assign a trace ID to the path for the first device that wants to do it */
> @@ -755,10 +755,16 @@ void coresight_path_assign_trace_id(struct coresight_path *path,
> * Non 0 is either success or fail.
> */
> if (trace_id != 0) {
> - path->trace_id = trace_id;
> - return;
> + if (IS_VALID_CS_TRACE_ID(trace_id)) {
> + path->trace_id = trace_id;
> + ret = 0;
Reviewed-by: James Clark <james.clark(a)linaro.org>
Minor nit, but just "return 0" here is a bit simpler.
> + }
> +
> + return ret;
And then "return -EINVAL" for these ones.
> }
> }
> +
> + return ret; > }
>
> /**
> diff --git a/drivers/hwtracing/coresight/coresight-priv.h b/drivers/hwtracing/coresight/coresight-priv.h
> index 1ea882dffd70..34c7e792adbd 100644
> --- a/drivers/hwtracing/coresight/coresight-priv.h
> +++ b/drivers/hwtracing/coresight/coresight-priv.h
> @@ -153,7 +153,7 @@ int coresight_make_links(struct coresight_device *orig,
> void coresight_remove_links(struct coresight_device *orig,
> struct coresight_connection *conn);
> u32 coresight_get_sink_id(struct coresight_device *csdev);
> -void coresight_path_assign_trace_id(struct coresight_path *path,
> +int coresight_path_assign_trace_id(struct coresight_path *path,
> enum cs_mode mode);
>
> #if IS_ENABLED(CONFIG_CORESIGHT_SOURCE_ETM3X)
> diff --git a/drivers/hwtracing/coresight/coresight-sysfs.c b/drivers/hwtracing/coresight/coresight-sysfs.c
> index d2a6ed8bcc74..b6a870399e83 100644
> --- a/drivers/hwtracing/coresight/coresight-sysfs.c
> +++ b/drivers/hwtracing/coresight/coresight-sysfs.c
> @@ -211,8 +211,8 @@ int coresight_enable_sysfs(struct coresight_device *csdev)
> goto out;
> }
>
> - coresight_path_assign_trace_id(path, CS_MODE_SYSFS);
> - if (!IS_VALID_CS_TRACE_ID(path->trace_id))
> + ret = coresight_path_assign_trace_id(path, CS_MODE_SYSFS);
> + if (ret)
> goto err_path;
>
> ret = coresight_enable_path(path, CS_MODE_SYSFS);
>
> ---
> base-commit: 17c7841d09ee7d33557fd075562d9289b6018c90
> change-id: 20260508-fix-trace-id-error-dbfdd4d8f2d1
>
> Best regards,