CoreSight May 2025

coresight@lists.linaro.org

8 participants
38 discussions

[PATCH v4 0/2] coresight: Add Coresight Trace Network On Chip driver

by Yuanfang Zhang

The Trace Network On Chip (TNOC) is an integration hierarchy which is a hardware component that integrates the functionalities of TPDA and funnels. It collects trace from subsystems and transfers it to the coresight sink.

Signed-off-by: Yuanfang Zhang <quic_yuanfang(a)quicinc.com>
---
Changes in v4:
- Fix dt_binding warning.
- update mask of trace_noc amba_id.
- Modify driver comments.
- rename TRACE_NOC_SYN_VAL to TRACE_NOC_SYNC_INTERVAL.
- Link to v3: https://lore.kernel.org/r/20250411-trace-noc-v3-0-1f19ddf7699b@quicinc.com

Changes in v3:
- Remove unnecessary sysfs nodes.
- update commit messages.
- Use 'writel' instead of 'write_relaxed' when writing to the register for the last time.
- Add trace_id ops.
- Link to v2: https://lore.kernel.org/r/20250226-trace-noc-driver-v2-0-8afc6584afc5@quici…

Changes in v2:
- Modified the format of DT binding file.
- Fix compile warnings.
- Link to v1: https://lore.kernel.org/r/46643089-b88d-49dc-be05-7bf0bb21f847@quicinc.com

---
Yuanfang Zhang (2):
      dt-bindings: arm: Add device Trace Network On Chip definition
      coresight: add coresight Trace Network On Chip driver

 .../bindings/arm/qcom,coresight-tnoc.yaml    | 111 ++++++++++++
 drivers/hwtracing/coresight/Kconfig          |  13 ++
 drivers/hwtracing/coresight/Makefile         |   1 +
 drivers/hwtracing/coresight/coresight-tnoc.c | 191 +++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-tnoc.h |  34 ++++
 5 files changed, 350 insertions(+)
---
base-commit: a2cc6ff5ec8f91bc463fd3b0c26b61166a07eb11
change-id: 20250403-trace-noc-f8286b30408e

Best regards,
-- 
Yuanfang Zhang <quic_yuanfang(a)quicinc.com>

5 months, 2 weeks

Re: [PATCH v2 1/2] dt-bindings: arm: Add Qualcomm extended CTI

by Jinlong Mao

On 2025/5/2 14:43, Krzysztof Kozlowski wrote:
> On Tue, Apr 29, 2025 at 12:18:40AM GMT, Mao Jinlong wrote:
>> Add Qualcomm extended CTI support in CTI binding file. Qualcomm
>> extended CTI supports up to 128 triggers.
>>
>> Signed-off-by: Mao Jinlong <quic_jinlmao(a)quicinc.com>
>> ---
>> Documentation/devicetree/bindings/arm/arm,coresight-cti.yaml | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/devicetree/bindings/arm/arm,coresight-cti.yaml b/Documentation/devicetree/bindings/arm/arm,coresight-cti.yaml
>> index 2d5545a2b49c..1aa27461f5bc 100644
>> --- a/Documentation/devicetree/bindings/arm/arm,coresight-cti.yaml
>> +++ b/Documentation/devicetree/bindings/arm/arm,coresight-cti.yaml
>> @@ -84,7 +84,9 @@ properties:
>>            - const: arm,coresight-cti
>>            - const: arm,primecell
>>        - items:
>> -          - const: arm,coresight-cti-v8-arch
>> +          - enum:
>> +              - arm,coresight-cti-v8-arch
>> +              - qcom,coresight-cti-extended
>
> cpu phandle is not required? Or not even valid?

cpu phandle is not required for extended CTI.

>
> Best regards,
> Krzysztof
>

5 months, 2 weeks

Re: [PATCH v3] perf: Allocate non-contiguous AUX pages by default

by Anshuman Khandual

On 5/2/25 23:00, Yabin Cui wrote:
> On Fri, May 2, 2025 at 3:51 AM Anshuman Khandual
> <anshuman.khandual(a)arm.com> wrote:
>>
>> On 5/2/25 01:05, Yabin Cui wrote:
>>> perf always allocates contiguous AUX pages based on aux_watermark.
>>> However, this contiguous allocation doesn't benefit all PMUs. For
>>> instance, ARM SPE and TRBE operate with virtual pages, and Coresight
>>> ETR allocates a separate buffer. For these PMUs, allocating contiguous
>>> AUX pages unnecessarily exacerbates memory fragmentation. This
>>> fragmentation can prevent their use on long-running devices.
>>>
>>> This patch modifies the perf driver to be memory-friendly by default,
>>> by allocating non-contiguous AUX pages. For PMUs requiring contiguous
>>> pages (Intel BTS and some Intel PT), the existing
>>> PERF_PMU_CAP_AUX_NO_SG capability can be used. For PMUs that don't
>>> require but can benefit from contiguous pages (some Intel PT), a new
>>> capability, PERF_PMU_CAP_AUX_PREFER_LARGE, is added to maintain their
>>> existing behavior.
>>>
>>> Signed-off-by: Yabin Cui <yabinc(a)google.com>
>>> ---
>>> Changes since v2:
>>> Let NO_SG imply PREFER_LARGE. So PMUs don't need to set both flags.
>>> Then the only place needing PREFER_LARGE is intel/pt.c.
>>>
>>> Changes since v1:
>>> In v1, default is preferring contiguous pages, and add a flag to
>>> allocate non-contiguous pages. In v2, default is allocating
>>> non-contiguous pages, and add a flag to prefer contiguous pages.
>>>
>>> v1 patchset:
>>> perf,coresight: Reduce fragmentation with non-contiguous AUX pages for
>>> cs_etm
>>>
>>> arch/x86/events/intel/pt.c | 2 ++
>>> include/linux/perf_event.h | 1 +
>>> kernel/events/ring_buffer.c | 20 +++++++++++++-------
>>> 3 files changed, 16 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
>>> index fa37565f6418..25ead919fc48 100644
>>> --- a/arch/x86/events/intel/pt.c
>>> +++ b/arch/x86/events/intel/pt.c
>>> @@ -1863,6 +1863,8 @@ static __init int pt_init(void)
>>>
>>>  	if (!intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries))
>>>  		pt_pmu.pmu.capabilities = PERF_PMU_CAP_AUX_NO_SG;
>>> +	else
>>> +		pt_pmu.pmu.capabilities = PERF_PMU_CAP_AUX_PREFER_LARGE;
>>>
>>>  	pt_pmu.pmu.capabilities |= PERF_PMU_CAP_EXCLUSIVE |
>>>  				   PERF_PMU_CAP_ITRACE |
>>
>> Why this PMU has PERF_PMU_CAP_AUX_PREFER_LARGE fallback option but
>> not the other PMU in arch/x86/events/intel/bts.c even though both
>> had PERF_PMU_CAP_AUX_NO_SG ?
>
> Because Intel BTS always use NO_SG, while in some cases Intel PT
> doesn't use NO_SG.

Makes sense.

>>
>>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>>> index 0069ba6866a4..56d77348c511 100644
>>> --- a/include/linux/perf_event.h
>>> +++ b/include/linux/perf_event.h
>>> @@ -301,6 +301,7 @@ struct perf_event_pmu_context;
>>>  #define PERF_PMU_CAP_AUX_OUTPUT 0x0080
>>>  #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
>>>  #define PERF_PMU_CAP_AUX_PAUSE 0x0200
>>> +#define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
>>>
>>>  /**
>>>   * pmu::scope
>>> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
>>> index 5130b119d0ae..4d2f1c95673e 100644
>>> --- a/kernel/events/ring_buffer.c
>>> +++ b/kernel/events/ring_buffer.c
>>> @@ -679,7 +679,7 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>>>  {
>>>  	bool overwrite = !(flags & RING_BUFFER_WRITABLE);
>>>  	int node = (event->cpu == -1) ? -1 : cpu_to_node(event->cpu);
>>> -	int ret = -ENOMEM, max_order;
>>> +	int ret = -ENOMEM, max_order = 0;
>>
>> 0 order is now the default allocation granularity. This might benefit
>> from a comment above explaining that max_order could change only with
>> PERF_PMU_CAP_AUX_NO_SG or PERF_PMU_CAP_AUX_PREFER_LARGE PMU flags etc.
>>
> Will add the comment in the next respin.
>>>
>>>  	if (!has_aux(event))
>>>  		return -EOPNOTSUPP;
>>> @@ -689,8 +689,8 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>>>
>>>  	if (!overwrite) {
>>>  		/*
>>> -		 * Watermark defaults to half the buffer, and so does the
>>> -		 * max_order, to aid PMU drivers in double buffering.
>>> +		 * Watermark defaults to half the buffer, to aid PMU drivers
>>> +		 * in double buffering.
>>>  		 */
>>>  		if (!watermark)
>>>  			watermark = min_t(unsigned long,
>>> @@ -698,16 +698,22 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>>>  					  (unsigned long)nr_pages << (PAGE_SHIFT - 1));
>>>
>>>  		/*
>>> -		 * Use aux_watermark as the basis for chunking to
>>> +		 * For PMUs that need or prefer large contiguous buffers,
>>> +		 * use aux_watermark as the basis for chunking to
>>>  		 * help PMU drivers honor the watermark.
>>>  		 */
>>> -		max_order = get_order(watermark);
>>> +		if (event->pmu->capabilities & (PERF_PMU_CAP_AUX_NO_SG |
>>> +						PERF_PMU_CAP_AUX_PREFER_LARGE))
>>> +			max_order = get_order(watermark);
>>>  	} else {
>>>  		/*
>>> -		 * We need to start with the max_order that fits in nr_pages,
>>> +		 * For PMUs that need or prefer large contiguous buffers,
>>> +		 * we need to start with the max_order that fits in nr_pages,
>>>  		 * not the other way around, hence ilog2() and not get_order.
>>>  		 */
>>> -		max_order = ilog2(nr_pages);
>>> +		if (event->pmu->capabilities & (PERF_PMU_CAP_AUX_NO_SG |
>>> +						PERF_PMU_CAP_AUX_PREFER_LARGE))
>>> +			max_order = ilog2(nr_pages);
>>>  		watermark = 0;
>>>  	}
>>>
>>
>> Although not really sure, could event->pmu->capabilities check against the ORed
>> PMU flags PERF_PMU_CAP_AUX_NO_SG and PERF_PMU_CAP_AUX_PREFER_LARGE be contained
>> in a helper pmu_prefers_cont_alloc(struct *pmu ...) or something similar ?
>
> Sure, but I feel it's not very worthwhile. Maybe add a local variable
> use_contiguous_pages? It can also work as another comment near
> max_order.

Probably that will be better.
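The local-variable idea from the end of this exchange can be sketched in plain userspace C. This is only an illustration of the discussed logic, not the code that was posted or merged: the `sketch_*` helpers are hypothetical stand-ins for the kernel's `get_order()` and `ilog2()`, 4 KiB pages are assumed, the numeric value used for PERF_PMU_CAP_AUX_NO_SG is a placeholder (only the PREFER_LARGE value appears in the quoted patch), and the `min_t()` clamp from the quoted hunk is left out.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* PREFER_LARGE value is from the quoted patch; the NO_SG value is a placeholder. */
#define PERF_PMU_CAP_AUX_NO_SG		0x0004
#define PERF_PMU_CAP_AUX_PREFER_LARGE	0x0400

/* Hypothetical stand-in for ilog2(): floor of log2(n), for n > 0. */
static int sketch_ilog2(unsigned long n)
{
	int order = 0;

	while (n >>= 1)
		order++;
	return order;
}

/* Hypothetical stand-in for get_order(): smallest order whose pages cover 'size' bytes. */
static int sketch_get_order(unsigned long size)
{
	unsigned long pages = (size + (1UL << SKETCH_PAGE_SHIFT) - 1) >> SKETCH_PAGE_SHIFT;
	int order = 0;

	while ((1UL << order) < pages)
		order++;
	return order;
}

/*
 * Order 0 (single pages) is the default; max_order only grows when the PMU
 * needs (NO_SG) or prefers (PREFER_LARGE) large contiguous chunks.
 */
static int sketch_pick_max_order(unsigned int caps, bool overwrite,
				 unsigned long nr_pages, unsigned long watermark)
{
	bool use_contiguous_pages = caps & (PERF_PMU_CAP_AUX_NO_SG |
					    PERF_PMU_CAP_AUX_PREFER_LARGE);
	int max_order = 0;

	if (!use_contiguous_pages)
		return max_order;

	if (!overwrite) {
		/* Watermark defaults to half the buffer (clamp omitted here). */
		if (!watermark)
			watermark = nr_pages << (SKETCH_PAGE_SHIFT - 1);
		max_order = sketch_get_order(watermark);
	} else {
		max_order = sketch_ilog2(nr_pages);
	}
	return max_order;
}
```

With a 16-page buffer, a PMU that sets neither flag stays at order 0 in both modes, while a PMU that sets either flag gets order 3 in the non-overwrite case (half the buffer) and order 4 in overwrite mode.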

5 months, 2 weeks

[PATCH] coresight: Disable MMIO logging for coresight stm driver

by Mao Jinlong

When registers are read or written with readl_relaxed and writel_relaxed, log_read_mmio and log_write_mmio are called. If MMIO trace is routed to STM, the STM driver writes a register to send the trace, which calls writel_relaxed again. This produces a circular call chain like the call stack below. Disable MMIO logging for the STM driver to avoid this issue.

[] stm_source_write[stm_core]+0xc4
[] stm_ftrace_write[stm_ftrace]+0x40
[] trace_event_buffer_commit+0x238
[] trace_event_raw_event_rwmmio_rw_template+0x8c
[] log_post_write_mmio+0xb4
[] writel_relaxed[coresight_stm]+0x80
[] stm_generic_packet[coresight_stm]+0x1a8
[] stm_data_write[stm_core]+0x78
[] ost_write[stm_p_ost]+0xc8
[] stm_source_write[stm_core]+0x7c
[] stm_ftrace_write[stm_ftrace]+0x40
[] trace_event_buffer_commit+0x238
[] trace_event_raw_event_rwmmio_read+0x84
[] log_read_mmio+0xac
[] readl_relaxed[coresight_tmc]+0x50

Signed-off-by: Mao Jinlong <quic_jinlmao(a)quicinc.com>
---
 drivers/hwtracing/coresight/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
index 4ba478211b31..f3158266f75e 100644
--- a/drivers/hwtracing/coresight/Makefile
+++ b/drivers/hwtracing/coresight/Makefile
@@ -22,6 +22,8 @@ condflags := \
 	$(call cc-option, -Wstringop-truncation)
 subdir-ccflags-y += $(condflags)
 
+CFLAGS_coresight-stm.o := -D__DISABLE_TRACE_MMIO__
+
 obj-$(CONFIG_CORESIGHT) += coresight.o
 coresight-y := coresight-core.o coresight-etm-perf.o coresight-platform.o \
 	coresight-sysfs.o coresight-syscfg.o coresight-config.o \
-- 
2.25.1
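The re-entrancy described in this commit message can be modelled in userspace C. The sketch below is a toy model under stated assumptions, not kernel code: every `sketch_*` name is hypothetical, the register is a plain variable, and a depth cap stands in for the fact that the real call chain would otherwise keep nesting. The boolean plays the role that building coresight-stm.o with -D__DISABLE_TRACE_MMIO__ plays in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_DEPTH_CAP 8	/* hypothetical cap so the demo terminates */

static uint32_t sketch_stm_reg;		/* stand-in for an STM register */
static int sketch_depth, sketch_deepest;
static bool sketch_log_stm_mmio;	/* false models -D__DISABLE_TRACE_MMIO__ */

static void sketch_stm_writel(uint32_t val);

/*
 * Models the MMIO logging hook when its trace output is routed to STM:
 * emitting the event needs another STM register write.
 */
static void sketch_log_write_mmio(uint32_t val)
{
	if (sketch_depth < SKETCH_DEPTH_CAP)
		sketch_stm_writel(val);
}

/* Models the STM driver's register write accessor. */
static void sketch_stm_writel(uint32_t val)
{
	sketch_depth++;
	if (sketch_depth > sketch_deepest)
		sketch_deepest = sketch_depth;
	if (sketch_log_stm_mmio)
		sketch_log_write_mmio(val);
	sketch_stm_reg = val;
	sketch_depth--;
}

/* Returns the deepest nesting seen while the driver performs one write. */
static int sketch_deepest_nesting(bool log_stm_mmio)
{
	sketch_depth = 0;
	sketch_deepest = 0;
	sketch_log_stm_mmio = log_stm_mmio;
	sketch_stm_writel(0x1);
	return sketch_deepest;
}
```

With logging left on for the STM accessor, a single write nests until the artificial cap is reached; with logging off for that one object, the same write nests exactly once, which is the effect the per-object CFLAGS line is after.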

5 months, 2 weeks

Re: [PATCH v3] perf: Allocate non-contiguous AUX pages by default

by Anshuman Khandual

On 5/2/25 01:05, Yabin Cui wrote:
> perf always allocates contiguous AUX pages based on aux_watermark.
> However, this contiguous allocation doesn't benefit all PMUs. For
> instance, ARM SPE and TRBE operate with virtual pages, and Coresight
> ETR allocates a separate buffer. For these PMUs, allocating contiguous
> AUX pages unnecessarily exacerbates memory fragmentation. This
> fragmentation can prevent their use on long-running devices.
>
> This patch modifies the perf driver to be memory-friendly by default,
> by allocating non-contiguous AUX pages. For PMUs requiring contiguous
> pages (Intel BTS and some Intel PT), the existing
> PERF_PMU_CAP_AUX_NO_SG capability can be used. For PMUs that don't
> require but can benefit from contiguous pages (some Intel PT), a new
> capability, PERF_PMU_CAP_AUX_PREFER_LARGE, is added to maintain their
> existing behavior.
>
> Signed-off-by: Yabin Cui <yabinc(a)google.com>
> ---
> Changes since v2:
> Let NO_SG imply PREFER_LARGE. So PMUs don't need to set both flags.
> Then the only place needing PREFER_LARGE is intel/pt.c.
>
> Changes since v1:
> In v1, default is preferring contiguous pages, and add a flag to
> allocate non-contiguous pages. In v2, default is allocating
> non-contiguous pages, and add a flag to prefer contiguous pages.
>
> v1 patchset:
> perf,coresight: Reduce fragmentation with non-contiguous AUX pages for
> cs_etm
>
> arch/x86/events/intel/pt.c | 2 ++
> include/linux/perf_event.h | 1 +
> kernel/events/ring_buffer.c | 20 +++++++++++++-------
> 3 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
> index fa37565f6418..25ead919fc48 100644
> --- a/arch/x86/events/intel/pt.c
> +++ b/arch/x86/events/intel/pt.c
> @@ -1863,6 +1863,8 @@ static __init int pt_init(void)
>
>  	if (!intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries))
>  		pt_pmu.pmu.capabilities = PERF_PMU_CAP_AUX_NO_SG;
> +	else
> +		pt_pmu.pmu.capabilities = PERF_PMU_CAP_AUX_PREFER_LARGE;
>
>  	pt_pmu.pmu.capabilities |= PERF_PMU_CAP_EXCLUSIVE |
>  				   PERF_PMU_CAP_ITRACE |

Why this PMU has PERF_PMU_CAP_AUX_PREFER_LARGE fallback option but not the other PMU in arch/x86/events/intel/bts.c even though both had PERF_PMU_CAP_AUX_NO_SG ?

> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 0069ba6866a4..56d77348c511 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -301,6 +301,7 @@ struct perf_event_pmu_context;
>  #define PERF_PMU_CAP_AUX_OUTPUT 0x0080
>  #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
>  #define PERF_PMU_CAP_AUX_PAUSE 0x0200
> +#define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
>
>  /**
>   * pmu::scope
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 5130b119d0ae..4d2f1c95673e 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -679,7 +679,7 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>  {
>  	bool overwrite = !(flags & RING_BUFFER_WRITABLE);
>  	int node = (event->cpu == -1) ? -1 : cpu_to_node(event->cpu);
> -	int ret = -ENOMEM, max_order;
> +	int ret = -ENOMEM, max_order = 0;

0 order is now the default allocation granularity. This might benefit from a comment above explaining that max_order could change only with PERF_PMU_CAP_AUX_NO_SG or PERF_PMU_CAP_AUX_PREFER_LARGE PMU flags etc.

>
>  	if (!has_aux(event))
>  		return -EOPNOTSUPP;
> @@ -689,8 +689,8 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>
>  	if (!overwrite) {
>  		/*
> -		 * Watermark defaults to half the buffer, and so does the
> -		 * max_order, to aid PMU drivers in double buffering.
> +		 * Watermark defaults to half the buffer, to aid PMU drivers
> +		 * in double buffering.
>  		 */
>  		if (!watermark)
>  			watermark = min_t(unsigned long,
> @@ -698,16 +698,22 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>  					  (unsigned long)nr_pages << (PAGE_SHIFT - 1));
>
>  		/*
> -		 * Use aux_watermark as the basis for chunking to
> +		 * For PMUs that need or prefer large contiguous buffers,
> +		 * use aux_watermark as the basis for chunking to
>  		 * help PMU drivers honor the watermark.
>  		 */
> -		max_order = get_order(watermark);
> +		if (event->pmu->capabilities & (PERF_PMU_CAP_AUX_NO_SG |
> +						PERF_PMU_CAP_AUX_PREFER_LARGE))
> +			max_order = get_order(watermark);
>  	} else {
>  		/*
> -		 * We need to start with the max_order that fits in nr_pages,
> +		 * For PMUs that need or prefer large contiguous buffers,
> +		 * we need to start with the max_order that fits in nr_pages,
>  		 * not the other way around, hence ilog2() and not get_order.
>  		 */
> -		max_order = ilog2(nr_pages);
> +		if (event->pmu->capabilities & (PERF_PMU_CAP_AUX_NO_SG |
> +						PERF_PMU_CAP_AUX_PREFER_LARGE))
> +			max_order = ilog2(nr_pages);
>  		watermark = 0;
>  	}
>

Although not really sure, could event->pmu->capabilities check against the ORed PMU flags PERF_PMU_CAP_AUX_NO_SG and PERF_PMU_CAP_AUX_PREFER_LARGE be contained in a helper pmu_prefers_cont_alloc(struct *pmu ...) or something similar ?

5 months, 2 weeks

[PATCH] coresight: etm4x: Remove redundant claim register setting

by Leo Yan

The claim register is set twice in the restore flow; remove the duplicate operation.

Signed-off-by: Leo Yan <leo.yan(a)arm.com>
---
 drivers/hwtracing/coresight/coresight-etm4x-core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index 5c20ed4cf4ed..228317991ec2 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -1958,8 +1958,6 @@ static void __etm4_cpu_restore(struct etmv4_drvdata *drvdata)
 	if (drvdata->numvmidc > 4)
 		etm4x_relaxed_write32(csa, state->trcvmidcctlr0, TRCVMIDCCTLR1);
 
-	etm4x_relaxed_write32(csa, state->trcclaimset, TRCCLAIMSET);
-
 	if (!drvdata->skip_power_up)
 		etm4x_relaxed_write32(csa, state->trcpdcr, TRCPDCR);
 
-- 
2.34.1

5 months, 2 weeks

[PATCH v4] coresight: Add a KUnit test for coresight_find_default_sink()

by James Clark

Add a test to confirm that default sink selection skips over an ETF and returns an ETR even if it's further away.

This also makes it easier to add new unit tests in the future.

Reviewed-by: Leo Yan <leo.yan(a)arm.com>
Signed-off-by: James Clark <james.clark(a)linaro.org>
---
Changes in v4:
- Rename etm to src now that it's not CORESIGHT_DEV_SUBTYPE_SOURCE_PROC
- Remove the now empty src_ops too
- Fix a rebase mistake in the Makefile that removed CTCU
- Link to v3: https://lore.kernel.org/r/20250312-james-cs-kunit-test-v3-1-dcfb69730161@li…

Changes in v3:
- Use CORESIGHT_DEV_SUBTYPE_SOURCE_BUS type instead of the default
  (CORESIGHT_DEV_SUBTYPE_SOURCE_PROC) so that the test still works even
  when TRBE sinks are registered. This also removes the need for the fake
  CPU ID callback.
- Link to v2: https://lore.kernel.org/r/20250305-james-cs-kunit-test-v2-1-83ba682b976c@li…

Changes in v2:
- Let devm free everything rather than doing individual kfrees:
  "Like with managed drivers, KUnit-managed fake devices are
  automatically cleaned up when the test finishes, but can be manually
  cleaned up early with kunit_device_unregister()."
- Link to v1: https://lore.kernel.org/r/20250225164639.522741-1-james.clark@linaro.org
---
 drivers/hwtracing/coresight/Kconfig                |  9 +++
 drivers/hwtracing/coresight/Makefile               |  1 +
 drivers/hwtracing/coresight/coresight-core.c       |  1 +
 .../hwtracing/coresight/coresight-kunit-tests.c    | 74 ++++++++++++++++++++++
 4 files changed, 85 insertions(+)

diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig
index ecd7086a5b83..f064e3d172b3 100644
--- a/drivers/hwtracing/coresight/Kconfig
+++ b/drivers/hwtracing/coresight/Kconfig
@@ -259,4 +259,13 @@ config CORESIGHT_DUMMY
 
 	  To compile this driver as a module, choose M here: the module will be
 	  called coresight-dummy.
+
+config CORESIGHT_KUNIT_TESTS
+	tristate "Enable Coresight unit tests"
+	depends on KUNIT
+	default KUNIT_ALL_TESTS
+	help
+	  Enable Coresight unit tests. Only useful for development and not
+	  intended for production.
+
 endif
diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
index 8e62c3150aeb..4e6ea5b05b01 100644
--- a/drivers/hwtracing/coresight/Makefile
+++ b/drivers/hwtracing/coresight/Makefile
@@ -53,3 +53,4 @@ obj-$(CONFIG_ULTRASOC_SMB) += ultrasoc-smb.o
 obj-$(CONFIG_CORESIGHT_DUMMY) += coresight-dummy.o
 obj-$(CONFIG_CORESIGHT_CTCU) += coresight-ctcu.o
 coresight-ctcu-y := coresight-ctcu-core.o
+obj-$(CONFIG_CORESIGHT_KUNIT_TESTS) += coresight-kunit-tests.o
diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
index fb43ef6a3b1f..47af75ba7a00 100644
--- a/drivers/hwtracing/coresight/coresight-core.c
+++ b/drivers/hwtracing/coresight/coresight-core.c
@@ -959,6 +959,7 @@ coresight_find_default_sink(struct coresight_device *csdev)
 	}
 	return csdev->def_sink;
 }
+EXPORT_SYMBOL_GPL(coresight_find_default_sink);
 
 static int coresight_remove_sink_ref(struct device *dev, void *data)
 {
diff --git a/drivers/hwtracing/coresight/coresight-kunit-tests.c b/drivers/hwtracing/coresight/coresight-kunit-tests.c
new file mode 100644
index 000000000000..c8f361767c45
--- /dev/null
+++ b/drivers/hwtracing/coresight/coresight-kunit-tests.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <kunit/test.h>
+#include <kunit/device.h>
+#include <linux/coresight.h>
+
+#include "coresight-priv.h"
+
+static struct coresight_device *coresight_test_device(struct device *dev)
+{
+	struct coresight_device *csdev = devm_kcalloc(dev, 1,
+						       sizeof(struct coresight_device),
+						       GFP_KERNEL);
+	csdev->pdata = devm_kcalloc(dev, 1,
+				    sizeof(struct coresight_platform_data),
+				    GFP_KERNEL);
+	return csdev;
+}
+
+static void test_default_sink(struct kunit *test)
+{
+	/*
+	 * Source -> ETF -> ETR -> CATU
+	 *                   ^
+	 *                   | default
+	 */
+	struct device *dev = kunit_device_register(test, "coresight_kunit");
+	struct coresight_device *src = coresight_test_device(dev),
+				*etf = coresight_test_device(dev),
+				*etr = coresight_test_device(dev),
+				*catu = coresight_test_device(dev);
+	struct coresight_connection conn = {};
+
+	src->type = CORESIGHT_DEV_TYPE_SOURCE;
+	/*
+	 * Don't use CORESIGHT_DEV_SUBTYPE_SOURCE_PROC, that would always return
+	 * a TRBE sink if one is registered.
+	 */
+	src->subtype.source_subtype = CORESIGHT_DEV_SUBTYPE_SOURCE_BUS;
+	etf->type = CORESIGHT_DEV_TYPE_LINKSINK;
+	etf->subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_BUFFER;
+	etr->type = CORESIGHT_DEV_TYPE_SINK;
+	etr->subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_SYSMEM;
+	catu->type = CORESIGHT_DEV_TYPE_HELPER;
+
+	conn.src_dev = src;
+	conn.dest_dev = etf;
+	coresight_add_out_conn(dev, src->pdata, &conn);
+
+	conn.src_dev = etf;
+	conn.dest_dev = etr;
+	coresight_add_out_conn(dev, etf->pdata, &conn);
+
+	conn.src_dev = etr;
+	conn.dest_dev = catu;
+	coresight_add_out_conn(dev, etr->pdata, &conn);
+
+	KUNIT_ASSERT_PTR_EQ(test, coresight_find_default_sink(src), etr);
+}
+
+static struct kunit_case coresight_testcases[] = {
+	KUNIT_CASE(test_default_sink),
+	{}
+};
+
+static struct kunit_suite coresight_test_suite = {
+	.name = "coresight_test_suite",
+	.test_cases = coresight_testcases,
+};
+
+kunit_test_suites(&coresight_test_suite);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("James Clark <james.clark(a)linaro.org>");
+MODULE_DESCRIPTION("Arm CoreSight KUnit tests");

---
base-commit: 3eadce8308bc8d808fd9e3a9d211c84215087451
change-id: 20250305-james-cs-kunit-test-3af1df2401e6

Best regards,
-- 
James Clark <james.clark(a)linaro.org>
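The behaviour this test checks (an ETR is preferred over a nearer ETF) can be modelled outside the kernel. The sketch below is a simplified illustration, not the driver's algorithm: every `sketch_*` name is hypothetical, each device has at most one output, and ranking a system-memory sink above a buffer sink via enum order is an assumption made only to reproduce the outcome described in the commit message.

```c
#include <stddef.h>

/* Hypothetical ranking: a higher value is preferred as the default sink. */
enum sketch_sink_kind {
	SKETCH_NOT_SINK = 0,	/* source or helper (e.g. CATU) */
	SKETCH_SINK_BUFFER,	/* ETF-like link sink */
	SKETCH_SINK_SYSMEM,	/* ETR-like sink */
};

struct sketch_dev {
	const char *name;
	enum sketch_sink_kind kind;
	struct sketch_dev *out;	/* single output connection, or NULL */
};

/*
 * Walk the path from the source and keep the best-ranked sink. On equal
 * rank the nearer device wins, because only a strictly better rank
 * replaces the current choice.
 */
static struct sketch_dev *sketch_find_default_sink(struct sketch_dev *src)
{
	struct sketch_dev *best = NULL;

	for (struct sketch_dev *d = src->out; d; d = d->out) {
		if (d->kind == SKETCH_NOT_SINK)
			continue;
		if (!best || d->kind > best->kind)
			best = d;
	}
	return best;
}
```

Building the same Source -> ETF -> ETR -> CATU chain as the KUnit test and calling `sketch_find_default_sink()` on the source returns the ETR entry, mirroring the KUNIT_ASSERT_PTR_EQ() expectation in the patch.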

5 months, 2 weeks

Re: [PATCH v2] perf: Allocate non-contiguous AUX pages by default

by James Clark

On 29/04/2025 10:31 pm, Yabin Cui wrote:
> perf always allocates contiguous AUX pages based on aux_watermark.
> However, this contiguous allocation doesn't benefit all PMUs. For
> instance, ARM SPE and TRBE operate with virtual pages, and Coresight
> ETR allocates a separate buffer. For these PMUs, allocating contiguous
> AUX pages unnecessarily exacerbates memory fragmentation. This
> fragmentation can prevent their use on long-running devices.
>
> This patch modifies the perf driver to allocate non-contiguous AUX
> pages by default. For PMUs that can benefit from contiguous pages (
> Intel PT and BTS), a new PMU capability, PERF_PMU_CAP_AUX_PREFER_LARGE
> is introduced to maintain their existing behavior.
>
> Signed-off-by: Yabin Cui <yabinc(a)google.com>
> ---
> Changes since v1:
> In v1, default is preferring contiguous pages, and add a flag to
> allocate non-contiguous pages. In v2, default is allocating
> non-contiguous pages, and add a flag to prefer contiguous pages.
>
> v1 patchset:
> perf,coresight: Reduce fragmentation with non-contiguous AUX pages for
> cs_etm
>
> arch/x86/events/intel/bts.c | 3 ++-
> arch/x86/events/intel/pt.c | 3 ++-
> include/linux/perf_event.h | 1 +
> kernel/events/ring_buffer.c | 18 +++++++++++-------
> 4 files changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
> index a95e6c91c4d7..9129f00e4b9f 100644
> --- a/arch/x86/events/intel/bts.c
> +++ b/arch/x86/events/intel/bts.c
> @@ -625,7 +625,8 @@ static __init int bts_init(void)
>  		return -ENOMEM;
>
>  	bts_pmu.capabilities = PERF_PMU_CAP_AUX_NO_SG | PERF_PMU_CAP_ITRACE |
> -				  PERF_PMU_CAP_EXCLUSIVE;
> +				  PERF_PMU_CAP_EXCLUSIVE |
> +				  PERF_PMU_CAP_AUX_PREFER_LARGE;
>  	bts_pmu.task_ctx_nr = perf_sw_context;
>  	bts_pmu.event_init = bts_event_init;
>  	bts_pmu.add = bts_event_add;
> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
> index fa37565f6418..37179e813b8c 100644
> --- a/arch/x86/events/intel/pt.c
> +++ b/arch/x86/events/intel/pt.c
> @@ -1866,7 +1866,8 @@ static __init int pt_init(void)
>
>  	pt_pmu.pmu.capabilities |= PERF_PMU_CAP_EXCLUSIVE |
>  				   PERF_PMU_CAP_ITRACE |
> -				   PERF_PMU_CAP_AUX_PAUSE;
> +				   PERF_PMU_CAP_AUX_PAUSE |
> +				   PERF_PMU_CAP_AUX_PREFER_LARGE;
>  	pt_pmu.pmu.attr_groups = pt_attr_groups;
>  	pt_pmu.pmu.task_ctx_nr = perf_sw_context;
>  	pt_pmu.pmu.event_init = pt_event_init;
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 0069ba6866a4..56d77348c511 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -301,6 +301,7 @@ struct perf_event_pmu_context;
>  #define PERF_PMU_CAP_AUX_OUTPUT 0x0080
>  #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
>  #define PERF_PMU_CAP_AUX_PAUSE 0x0200
> +#define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
>
>  /**
>   * pmu::scope
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 5130b119d0ae..d76249ce4f17 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -679,7 +679,7 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>  {
>  	bool overwrite = !(flags & RING_BUFFER_WRITABLE);
>  	int node = (event->cpu == -1) ? -1 : cpu_to_node(event->cpu);
> -	int ret = -ENOMEM, max_order;
> +	int ret = -ENOMEM, max_order = 0;
>
>  	if (!has_aux(event))
>  		return -EOPNOTSUPP;
> @@ -689,8 +689,8 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>
>  	if (!overwrite) {
>  		/*
> -		 * Watermark defaults to half the buffer, and so does the
> -		 * max_order, to aid PMU drivers in double buffering.
> +		 * Watermark defaults to half the buffer, to aid PMU drivers
> +		 * in double buffering.
>  		 */
>  		if (!watermark)
>  			watermark = min_t(unsigned long,
> @@ -698,16 +698,20 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>  					  (unsigned long)nr_pages << (PAGE_SHIFT - 1));
>
>  		/*
> -		 * Use aux_watermark as the basis for chunking to
> +		 * For PMUs that prefer large contiguous buffers,
> +		 * use aux_watermark as the basis for chunking to
>  		 * help PMU drivers honor the watermark.
>  		 */
> -		max_order = get_order(watermark);
> +		if (event->pmu->capabilities & PERF_PMU_CAP_AUX_PREFER_LARGE)
> +			max_order = get_order(watermark);
>  	} else {
>  		/*
> -		 * We need to start with the max_order that fits in nr_pages,
> +		 * For PMUs that prefer large contiguous buffers,
> +		 * we need to start with the max_order that fits in nr_pages,
>  		 * not the other way around, hence ilog2() and not get_order.
>  		 */
> -		max_order = ilog2(nr_pages);
> +		if (event->pmu->capabilities & PERF_PMU_CAP_AUX_PREFER_LARGE)
> +			max_order = ilog2(nr_pages);

Doesn't this one need to be 'PERF_PMU_CAP_AUX_PREFER_LARGE | PERF_PMU_CAP_AUX_NO_SG', otherwise the NO_SG test further down doesn't work for devices that only have NO_SG and not PREFER_LARGE.

NO_SG implies PREFER_LARGE behavior, except that NO_SG additionally hard fails if it can't do it in one alloc. But I think you shouldn't have to set them both to get the correct behavior.
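The concern raised in this review can be illustrated with a small userspace model. It is a sketch under assumptions, not the kernel's logic: the `sketch_*` functions are hypothetical, the numeric value for PERF_PMU_CAP_AUX_NO_SG is a placeholder, and the "NO_SG test further down" is modelled simply as requiring that one chunk of 2^max_order pages covers the whole buffer.

```c
#include <stdbool.h>

/* PREFER_LARGE value is from the quoted patch; the NO_SG value is a placeholder. */
#define PERF_PMU_CAP_AUX_NO_SG		0x0004
#define PERF_PMU_CAP_AUX_PREFER_LARGE	0x0400

/* Floor of log2(n), for n > 0. */
static int sketch_floor_log2(unsigned long n)
{
	int order = 0;

	while (n >>= 1)
		order++;
	return order;
}

/* Overwrite-mode order as written in the quoted v2 hunk: PREFER_LARGE only. */
static int sketch_v2_overwrite_order(unsigned int caps, unsigned long nr_pages)
{
	if (caps & PERF_PMU_CAP_AUX_PREFER_LARGE)
		return sketch_floor_log2(nr_pages);
	return 0;
}

/* Overwrite-mode order with the combined mask suggested in the review. */
static int sketch_masked_overwrite_order(unsigned int caps, unsigned long nr_pages)
{
	if (caps & (PERF_PMU_CAP_AUX_PREFER_LARGE | PERF_PMU_CAP_AUX_NO_SG))
		return sketch_floor_log2(nr_pages);
	return 0;
}

/*
 * Assumed model of the NO_SG requirement: the buffer must fit in a single
 * contiguous chunk of 2^max_order pages.
 */
static bool sketch_no_sg_satisfied(unsigned int caps, int max_order,
				   unsigned long nr_pages)
{
	if (!(caps & PERF_PMU_CAP_AUX_NO_SG))
		return true;
	return (1UL << max_order) >= nr_pages;
}
```

For a PMU that sets only NO_SG with a 16-page buffer, the v2-style check leaves the order at 0, so the modelled single-chunk requirement cannot be met; with the combined mask the order is 4 and the requirement holds without the PMU having to set both flags.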

5 months, 2 weeks
