Hi Levi,
On Wed, May 14, 2025 at 12:04:39PM +0100, Yeoreum Yun wrote:
[...]
> > > +static bool cscfg_config_desc_get(struct cscfg_config_desc *config_desc)
> >
> > I would like to change the return type to int, so the error is handled
> > within the function. As a result, the caller _cscfg_activate_config()
> > does not need to explicitly return an error value.
>
> I think it's not good since cscfg_config_desc_get() failed only when
> try_module_get() failed and its return type is "bool".
Understood. I thought it would be easier for later maintenance if we
encapsulate error handling in the function, but I don't have strong
opinion. It is fine for me to return bool type.
Thanks,
Leo
Hi Levi,
On Fri, May 02, 2025 at 11:34:17AM +0100, Yeoreum Yun wrote:
[...]
> > > > > --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> > > > > +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> > > > > @@ -1020,6 +1020,9 @@ static void etm4_disable_sysfs(struct coresight_device *csdev)
> > > > > smp_call_function_single(drvdata->cpu, etm4_disable_hw, drvdata, 1);
> > > > >
> > > > > raw_spin_unlock(&drvdata->spinlock);
> > > > > +
> > > > > + cscfg_csdev_disable_active_config(csdev);
> > > > > +
> > > >
> > > > In general, we need to split changes into several patches if each
> > > > addresses a different issue. From my understanding, the change above is
> > > > to fix missing to disable config when disable Sysfs mode.
> > > >
> > > > If so, could we use a seperate patch for this change?
> > > >
> > >
> > > It's not a differnt issue. Without this line, the active count wouldn't
> > > decrese and it raise another issue -- unloadable moudle for active_cnt :(
> > > So I think it should be included in this patch.
> >
> > I read the code again and concluded the change above is not related to
> > locking and would be a separate issue: when we close a Sysfs session,
> > we need to disable a config on a CoreSight device.
> > Could you clarify what is meaning "unloadable moudle for active_cnt"?
> > I only saw "cscfg_mgr->sys_active_cnt" is used for module unloading,
> > but have no clue why "active_cnt" impacts module unloading.
>
> Yes. but it also related "by this patch".
> When the load config and "activate" configuration via sysfs,
> not only the cscfg_mgr->sys_active_cnt is increase but also
> "individual cscfg->active_cnt".
> This patch extends the meaning of "cscfg->active_cnt" includes
> "enable of configuraiton". so that cscfg_config_desc_put() prevent
> decrease "module reference" while holding individual active_cnt.
> That's why without this change, the "module reference" couldn't be
> decreased. The problem before this change is deactivation doesn't
> chekc cscfg->active_cnt but put module reference whenever it calls.
Thanks for explanation and it makes sense to me.
As we discussed, given this patch is relative big, let us divide into
three small patches for easier review.
- The first patch is to address that the sysfs interface misses to
call cscfg_csdev_disable_active_config() for disabling config.
- The second patch fixes the locking issue for "config_csdev_list".
As the "config_csdev_list" is protected by cscfg_csdev_lock, the
cscfg_remove_owned_csdev_configs() function should use lock for
exclusive operations.
- The third patch is to fix reference counter of a config module.
The patch adds cscfg_config_desc_get() and cscfg_config_desc_put()
in the config enabling / disabling flow for acquiring module
reference counter.
Thanks,
Leo
This series is to enable AUX pause and resume on Arm CoreSight.
The first patch extracts the trace unit controlling operations to two
functions. These two functions will be used by AUX pause and resume.
Patches 02 and 03 change the ETMv4 driver to prepare callback functions
for AUX pause and resume.
Patch 04 changes the ETM perf layer to support AUX pause and resume in a
perf session. The patch 05 re-enables sinks after buffer update, based
on it, the patch 06 updates buffer on AUX pause occasion, which can
mitigate the trace data lose issue.
Patch 07 documents the AUX pause usages with Arm CoreSight.
This patch set has been verified on the Hikey960 board.
It is suggested to disable CPUIdle (add `nohlt` option in Linux command
line) when verifying this series. ETM and funnel drivers are found
issues during CPU suspend and resume which will be addressed separately.
Changes from v3:
- Re-enabled sink in buffer update callbacks (Suzuki).
Changes from v2:
- Rebased on CoreSight next branch.
- Dropped the uAPI 'update_buf_on_pause' and updated document
respectively (Suzuki).
- Renamed ETM callbacks to .pause_perf() and .resume_perf() (Suzuki).
- Minor improvement for error handling in the AUX resume flow.
Changes from v1:
- Added validation function pointers in pause and resume APIs (Mike).
Leo Yan (7):
coresight: etm4x: Extract the trace unit controlling
coresight: Introduce pause and resume APIs for source
coresight: etm4x: Hook pause and resume callbacks
coresight: perf: Support AUX trace pause and resume
coresight: tmc: Re-enable sink after buffer update
coresight: perf: Update buffer on AUX pause
Documentation: coresight: Document AUX pause and resume
Documentation/trace/coresight/coresight-perf.rst | 31 +++++++++
drivers/hwtracing/coresight/coresight-core.c | 22 +++++++
drivers/hwtracing/coresight/coresight-etm-perf.c | 84 +++++++++++++++++++++++-
drivers/hwtracing/coresight/coresight-etm4x-core.c | 143 +++++++++++++++++++++++++++++------------
drivers/hwtracing/coresight/coresight-etm4x.h | 2 +
drivers/hwtracing/coresight/coresight-priv.h | 2 +
drivers/hwtracing/coresight/coresight-tmc-etf.c | 9 +++
drivers/hwtracing/coresight/coresight-tmc-etr.c | 10 +++
include/linux/coresight.h | 4 ++
9 files changed, 265 insertions(+), 42 deletions(-)
--
2.34.1
This series is to fix device registration and unregistration.
The first patch addresses the resource is not released properly for a
failure case during a device registration.
The second patch is to use mutex to protect unregistration flow.
The last three patches are for refactoring. Patch 03 explicitly uses
the parent device handler. Patch 04 separates the success and failure
flows for code readable and easier maintenance. Patch 05 improves the
error handling by invoking specific functions for resource cleanup.
Leo Yan (5):
coresight: Correct sink ID map allocation failure handling
coresight: Protect unregistration with mutex
coresight: Explicitly use the parent device handler
coresight: Separate failure and success flows
coresight: Refine error handling for device registration
drivers/hwtracing/coresight/coresight-core.c | 67 +++++++++++---------
1 file changed, 37 insertions(+), 30 deletions(-)
--
2.34.1
The Trace Network On Chip (TNOC) is an integration hierarchy which is a
hardware component that integrates the functionalities of TPDA and
funnels. It collects trace from subsystems and transfers it to coresight
sink.
In addition to the generic TNOC mentioned above, there is also a special type
of TNOC called Interconnect TNOC. Unlike the generic TNOC, the Interconnect
TNOC doesn't need ATID. Its primary function is to connect the source of
subsystems to the Aggregator TNOC. Its driver is different from this patch and
will describe it and upstream its driver separately.
Signed-off-by: Yuanfang Zhang <quic_yuanfang(a)quicinc.com>
---
Changes in v5:
- update cover-letter to describe the Interconnect TNOC.
- Link to v4: https://lore.kernel.org/r/20250415-trace-noc-v4-0-979938fedfd8@quicinc.com
Changes in v4:
- Fix dt_binding warning.
- update mask of trace_noc amba_id.
- Modify driver comments.
- rename TRACE_NOC_SYN_VAL to TRACE_NOC_SYNC_INTERVAL.
- Link to v3: https://lore.kernel.org/r/20250411-trace-noc-v3-0-1f19ddf7699b@quicinc.com
Changes in v3:
- Remove unnecessary sysfs nodes.
- update commit messages.
- Use 'writel' instead of 'write_relaxed' when writing to the register for the last time.
- Add trace_id ops.
- Link to v2: https://lore.kernel.org/r/20250226-trace-noc-driver-v2-0-8afc6584afc5@quici…
Changes in v2:
- Modified the format of DT binging file.
- Fix compile warnings.
- Link to v1: https://lore.kernel.org/r/46643089-b88d-49dc-be05-7bf0bb21f847@quicinc.com
---
Yuanfang Zhang (2):
dt-bindings: arm: Add device Trace Network On Chip definition
coresight: add coresight Trace Network On Chip driver
.../bindings/arm/qcom,coresight-tnoc.yaml | 111 ++++++++++++
drivers/hwtracing/coresight/Kconfig | 13 ++
drivers/hwtracing/coresight/Makefile | 1 +
drivers/hwtracing/coresight/coresight-tnoc.c | 191 +++++++++++++++++++++
drivers/hwtracing/coresight/coresight-tnoc.h | 34 ++++
5 files changed, 350 insertions(+)
---
base-commit: a2cc6ff5ec8f91bc463fd3b0c26b61166a07eb11
change-id: 20250403-trace-noc-f8286b30408e
Best regards,
--
Yuanfang Zhang <quic_yuanfang(a)quicinc.com>
On Mon, 31 Mar 2025 10:14:53 +0300, Andy Shevchenko wrote:
> The fwnode.h is not supposed to be used by the drivers as it
> has the definitions for the core parts for different device
> property provider implementations. Drop it.
>
> Since the code wants to use the pointer to the struct fwnode_handle
> the forward declaration is provided.
>
> [...]
Applied, thanks!
[1/1] coresight: cti: Replace inclusion by struct fwnode_handle forward declaration
https://git.kernel.org/coresight/c/aad548a9
Best regards,
--
Suzuki K Poulose <suzuki.poulose(a)arm.com>
On 5/7/25 23:43, Yabin Cui wrote:
> perf always allocates contiguous AUX pages based on aux_watermark.
> However, this contiguous allocation doesn't benefit all PMUs. For
> instance, ARM SPE and TRBE operate with virtual pages, and Coresight
> ETR allocates a separate buffer. For these PMUs, allocating contiguous
> AUX pages unnecessarily exacerbates memory fragmentation. This
> fragmentation can prevent their use on long-running devices.
>
> This patch modifies the perf driver to be memory-friendly by default,
> by allocating non-contiguous AUX pages. For PMUs requiring contiguous
> pages (Intel BTS and some Intel PT), the existing
> PERF_PMU_CAP_AUX_NO_SG capability can be used. For PMUs that don't
> require but can benefit from contiguous pages (some Intel PT), a new
> capability, PERF_PMU_CAP_AUX_PREFER_LARGE, is added to maintain their
> existing behavior.
>
> Signed-off-by: Yabin Cui <yabinc(a)google.com>
> ---
> Changes since v3:
> Add comments and a local variable to explain max_order value
> changes in rb_alloc_aux().
>
> Changes since v2:
> Let NO_SG imply PREFER_LARGE. So PMUs don't need to set both flags.
> Then the only place needing PREFER_LARGE is intel/pt.c.
>
> Changes since v1:
> In v1, default is preferring contiguous pages, and add a flag to
> allocate non-contiguous pages. In v2, default is allocating
> non-contiguous pages, and add a flag to prefer contiguous pages.
>
> v1 patchset:
> perf,coresight: Reduce fragmentation with non-contiguous AUX pages for
> cs_etm
>
> arch/x86/events/intel/pt.c | 2 ++
> include/linux/perf_event.h | 1 +
> kernel/events/ring_buffer.c | 33 ++++++++++++++++++++++++---------
> 3 files changed, 27 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
> index fa37565f6418..25ead919fc48 100644
> --- a/arch/x86/events/intel/pt.c
> +++ b/arch/x86/events/intel/pt.c
> @@ -1863,6 +1863,8 @@ static __init int pt_init(void)
>
> if (!intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries))
> pt_pmu.pmu.capabilities = PERF_PMU_CAP_AUX_NO_SG;
> + else
> + pt_pmu.pmu.capabilities = PERF_PMU_CAP_AUX_PREFER_LARGE;
>
> pt_pmu.pmu.capabilities |= PERF_PMU_CAP_EXCLUSIVE |
> PERF_PMU_CAP_ITRACE |
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 0069ba6866a4..56d77348c511 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -301,6 +301,7 @@ struct perf_event_pmu_context;
> #define PERF_PMU_CAP_AUX_OUTPUT 0x0080
> #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
> #define PERF_PMU_CAP_AUX_PAUSE 0x0200
> +#define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
>
> /**
> * pmu::scope
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 5130b119d0ae..69c90ea1b79a 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -679,7 +679,19 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
> {
> bool overwrite = !(flags & RING_BUFFER_WRITABLE);
> int node = (event->cpu == -1) ? -1 : cpu_to_node(event->cpu);
> - int ret = -ENOMEM, max_order;
> + /*
> + * True if the PMU needs a contiguous AUX buffer (CAP_AUX_NO_SG) or
> + * prefers large contiguous pages (CAP_AUX_PREFER_LARGE).
> + */
> + bool use_contiguous_pages = event->pmu->capabilities & (
> + PERF_PMU_CAP_AUX_NO_SG | PERF_PMU_CAP_AUX_PREFER_LARGE);
> + /*
> + * Initialize max_order to 0 for page allocation. This allocates single
> + * pages to minimize memory fragmentation. This is overriden if
Small nit typo -- s/overriden/overridden ^^^^
> + * use_contiguous_pages is true.
> + */
> + int max_order = 0;
> + int ret = -ENOMEM;
>
> if (!has_aux(event))
> return -EOPNOTSUPP;
> @@ -689,8 +701,8 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>
> if (!overwrite) {
> /*
> - * Watermark defaults to half the buffer, and so does the
> - * max_order, to aid PMU drivers in double buffering.
> + * Watermark defaults to half the buffer, to aid PMU drivers
> + * in double buffering.
> */
> if (!watermark)
> watermark = min_t(unsigned long,
> @@ -698,16 +710,19 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
> (unsigned long)nr_pages << (PAGE_SHIFT - 1));
>
> /*
> - * Use aux_watermark as the basis for chunking to
> - * help PMU drivers honor the watermark.
> + * If using contiguous pages, use aux_watermark as the basis
> + * for chunking to help PMU drivers honor the watermark.
> */
> - max_order = get_order(watermark);
> + if (use_contiguous_pages)
> + max_order = get_order(watermark);
> } else {
> /*
> - * We need to start with the max_order that fits in nr_pages,
> - * not the other way around, hence ilog2() and not get_order.
> + * If using contiguous pages, we need to start with the
> + * max_order that fits in nr_pages, not the other way around,
> + * hence ilog2() and not get_order.
> */
> - max_order = ilog2(nr_pages);
> + if (use_contiguous_pages)
> + max_order = ilog2(nr_pages);
> watermark = 0;
> }
>
Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>