CoreSight

coresight@lists.linaro.org

4 participants
2693 discussions

Re: [PATCH v1 1/2] dt-bindings: arm: Add Qualcomm extended CTI

by Jinlong Mao

On 2025/4/28 15:39, Krzysztof Kozlowski wrote: > On 28/04/2025 09:31, Jinlong Mao wrote: >> >> >> On 2024/9/3 20:42, Krzysztof Kozlowski wrote: >>> On 03/09/2024 14:18, Mao Jinlong wrote: >>>> Add Qualcomm extended CTI support in CTI binding file. Qualcomm >>>> extended CTI supports up to 128 triggers. >>>> >>>> Signed-off-by: Mao Jinlong <quic_jinlmao(a)quicinc.com> >>>> --- >>>> .../devicetree/bindings/arm/arm,coresight-cti.yaml | 14 ++++++++++++++ >>>> 1 file changed, 14 insertions(+) >>>> >>>> diff --git a/Documentation/devicetree/bindings/arm/arm,coresight-cti.yaml b/Documentation/devicetree/bindings/arm/arm,coresight-cti.yaml >>>> index 6a73eaa66a42..141efba7c697 100644 >>>> --- a/Documentation/devicetree/bindings/arm/arm,coresight-cti.yaml >>>> +++ b/Documentation/devicetree/bindings/arm/arm,coresight-cti.yaml >>>> @@ -87,6 +87,10 @@ properties: >>>> - const: arm,coresight-cti-v8-arch >>>> - const: arm,coresight-cti >>>> - const: arm,primecell >>>> + - items: >>>> + - const: qcom,coresight-cti-extended >>> >>> That's just enum in previous entry/list. >> Sorry for the late response. This is a new CTI type. Need the three >> items in compatible at the same time, just like other kind of CTIs. > > > Comment stays valid, you did not address it at all. Hi Krzysztof, Do you mean we only need const: qcom,coresight-cti-extended here ? No need const: arm,coresight-cti and const: arm,primecell as they are in previous entry/list, right ? > >> >>> >>>> + - const: arm,coresight-cti >>>> + - const: arm,primecell >>>> >>>> reg: >>>> maxItems: 1 >>>> @@ -254,6 +258,16 @@ examples: >>>> clocks = <&soc_smc50mhz>; >>>> clock-names = "apb_pclk"; >>>> }; >>>> + # minimum extended CTI definition. >>>> + - | >>> >>> No need for new example. No differences here. >> This is a new type CTI. > > > Comment stays valid, you did not address it at all. > > Best regards, > Krzysztof

11 months, 3 weeks

Re: [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability

by James Clark

On 23/04/2025 8:52 pm, Yabin Cui wrote: > On Tue, Apr 22, 2025 at 7:10 AM Leo Yan <leo.yan(a)arm.com> wrote: >> >> On Tue, Apr 22, 2025 at 02:49:54PM +0200, Ingo Molnar wrote: >> >> [...] >> >>>> Hi Yabin, >>>> >>>> I was wondering if this is just the opposite of >>>> PERF_PMU_CAP_AUX_NO_SG, and that order 0 should be used by default >>>> for all devices to solve the issue you describe. Because we already >>>> have PERF_PMU_CAP_AUX_NO_SG for devices that need contiguous pages. >>>> Then I found commit 5768402fd9c6 ("perf/ring_buffer: Use high order >>>> allocations for AUX buffers optimistically") that explains that the >>>> current allocation strategy is an optimization. >>>> >>>> Your change seems to decide that for certain devices we want to >>>> optimize for fragmentation rather than performance. If these are >>>> rarely used features specifically when looking at performance should >>>> we not continue to optimize for performance? Or at least make it user >>>> configurable? >>> >>> So there seems to be 3 categories: >>> >>> - 1) Must have physically contiguous AUX buffers, it's a hardware ABI. >>> (PERF_PMU_CAP_AUX_NO_SG for Intel BTS and PT.) >>> >>> - 2) Would be nice to have continguous AUX buffers, for a bit more >>> performance. >>> >>> - 3) Doesn't really care. >>> >>> So we do have #1, and it appears Yabin's usecase is #3? > > Yes, in my usecase, I care much more about MM-friendly than a little potential > performance when using PMU. It's not a rarely used feature. On Android, we > collect ETM data periodically on internal user devices for AutoFDO optimization > (for both userspace libraries and the kernel). Allocating a large > chunk of contiguous > AUX pages (4M for each CPU) periodically is almost unbearable. The kernel may > need to kill many processes to fulfill the request. It affects user > experience even > after using PMU. > > I am totally fine to reuse PERF_PMU_CAP_AUX_NO_SG. If PMUs don't want to > sacrifice performance for MM-friendly, why support scatter gather mode? If there > are strong performance reasons to allocate contiguous AUX pages in > scatter gather > mode, I hope max_order is configurable in userspace. > > Currently, max_order is affected by aux_watermark. But aux_watermark > also affects > how frequently the PMU overflows AUX buffer and notifies userspace. > It's not ideal > to set aux_watermark to 1 page size. So if we want to make max_order user > configurable, maybe we can add a one bit field in perf_event_attr? > >> >> In Yabin's case, the AUX buffer work as a bounce buffer. The hardware >> trace data is copied by a driver from low level's contiguous buffer to >> the AUX buffer. >> >> In this case we cannot benefit much from continguous AUX buffers. >> >> Thanks, >> Leo Hi Yabin, So after doing some testing it looks like there is 0 difference in overhead for max_order=0 vs ensuring the buffer is one contiguous allocation for Arm SPE, and TRBE would be exactly the same. This makes sense because we're vmapping pages individually anyway regardless of the base allocation. Seems like the performance optimization of the optimistically large mappings is only for devices that require extra buffer management stuff other than normal virtual memory. Can we add a new capability PERF_PMU_CAP_AUX_PREFER_LARGE and apply it to Intel PT and BTS? Then the old (before the optimistic large allocs change) max_order=0 behavior becomes the default again, and PREFER_LARGE is just for those two devices. Other and new devices would get the more memory friendly allocations by default, as it's unlikely they'll benefit from anything different. Thanks James

11 months, 3 weeks

Re: [PATCH v3 06/11] dt-bindings: arm: arm,coresight-static-replicator: add optional clocks

by Suzuki K Poulose

Hi Dmitry On 25/04/2025 18:47, Dmitry Baryshkov wrote: > As most other CoreSight devices the replicator can use either of the > optional clocks. Document those optional clocks in the schema. > Additionally document the one-off case of Zynq-7000 platforms which uses > apb_pclk and two additional debug clocks. What is the plan for merging this series ? I am happy to pick this patch via coresight tree. > > Fixes: 3c15fddf3121 ("dt-bindings: arm: Convert CoreSight bindings to DT schema") > Reviewed-by: Rob Herring (Arm) <robh(a)kernel.org> > Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov(a)oss.qualcomm.com> Otherwise, Acked-by: Suzuki K Poulose <suzuki.poulose(a)arm.com> Suzuki > --- > .../bindings/arm/arm,coresight-static-replicator.yaml | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/Documentation/devicetree/bindings/arm/arm,coresight-static-replicator.yaml b/Documentation/devicetree/bindings/arm/arm,coresight-static-replicator.yaml > index a6f793ea03b6c193fc0ff72a45e0249a63a2ba3c..0c1017affbad2f03892b250ad864d9a5dc8d02f1 100644 > --- a/Documentation/devicetree/bindings/arm/arm,coresight-static-replicator.yaml > +++ b/Documentation/devicetree/bindings/arm/arm,coresight-static-replicator.yaml > @@ -30,6 +30,19 @@ properties: > power-domains: > maxItems: 1 > > + clocks: > + minItems: 1 > + maxItems: 3 > + > + clock-names: > + oneOf: > + - items: > + - enum: [apb_pclk, atclk] > + - items: # Zynq-700 > + - const: apb_pclk > + - const: dbg_trc > + - const: dbg_apb > + > in-ports: > $ref: /schemas/graph.yaml#/properties/ports > additionalProperties: false >

11 months, 3 weeks

Re: [PATCH v1 1/2] dt-bindings: arm: Add Qualcomm extended CTI

by Jinlong Mao

On 2024/9/3 20:42, Krzysztof Kozlowski wrote: > On 03/09/2024 14:18, Mao Jinlong wrote: >> Add Qualcomm extended CTI support in CTI binding file. Qualcomm >> extended CTI supports up to 128 triggers. >> >> Signed-off-by: Mao Jinlong <quic_jinlmao(a)quicinc.com> >> --- >> .../devicetree/bindings/arm/arm,coresight-cti.yaml | 14 ++++++++++++++ >> 1 file changed, 14 insertions(+) >> >> diff --git a/Documentation/devicetree/bindings/arm/arm,coresight-cti.yaml b/Documentation/devicetree/bindings/arm/arm,coresight-cti.yaml >> index 6a73eaa66a42..141efba7c697 100644 >> --- a/Documentation/devicetree/bindings/arm/arm,coresight-cti.yaml >> +++ b/Documentation/devicetree/bindings/arm/arm,coresight-cti.yaml >> @@ -87,6 +87,10 @@ properties: >> - const: arm,coresight-cti-v8-arch >> - const: arm,coresight-cti >> - const: arm,primecell >> + - items: >> + - const: qcom,coresight-cti-extended > > That's just enum in previous entry/list. Sorry for the late response. This is a new CTI type. Need the three items in compatible at the same time, just like other kind of CTIs. > >> + - const: arm,coresight-cti >> + - const: arm,primecell >> >> reg: >> maxItems: 1 >> @@ -254,6 +258,16 @@ examples: >> clocks = <&soc_smc50mhz>; >> clock-names = "apb_pclk"; >> }; >> + # minimum extended CTI definition. >> + - | > > No need for new example. No differences here. This is a new type CTI. > > > Best regards, > Krzysztof >

11 months, 3 weeks

Re: [PATCH v3 2/2] coresight: core: Disable helpers for devices that fail to enable

by James Clark

On 08/04/2025 8:59 pm, Yabin Cui wrote: > When enabling a SINK or LINK type coresight device fails, the > associated helpers should be disabled. > > Signed-off-by: Yabin Cui <yabinc(a)google.com> > Suggested-by: Suzuki K Poulose <suzuki.poulose(a)arm.com> > --- > drivers/hwtracing/coresight/coresight-core.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c > index fb43ef6a3b1f..a56ba9087538 100644 > --- a/drivers/hwtracing/coresight/coresight-core.c > +++ b/drivers/hwtracing/coresight/coresight-core.c > @@ -486,8 +486,10 @@ int coresight_enable_path(struct coresight_path *path, enum cs_mode mode, > * that need disabling. Disabling the path here > * would mean we could disrupt an existing session. > */ > - if (ret) > + if (ret) { > + coresight_disable_helpers(csdev); Hi Yabin, Unfortunately coresight_disable_helpers() takes a path pointer now so this needs to be updated. I tested with that change made and it works ok. > goto out; > + } > break; > case CORESIGHT_DEV_TYPE_SOURCE: > /* sources are enabled from either sysFS or Perf */ > @@ -496,10 +498,13 @@ int coresight_enable_path(struct coresight_path *path, enum cs_mode mode, > parent = list_prev_entry(nd, link)->csdev; > child = list_next_entry(nd, link)->csdev; > ret = coresight_enable_link(csdev, parent, child, source); > - if (ret) > + if (ret) { > + coresight_disable_helpers(csdev); > goto err; > + } > break; > default: > + coresight_disable_helpers(csdev); Minor nit, you could collapse these last two into "goto err_disable_helpers" and add another label before err: that disables helpers before falling through to err:. Other than that: Reviewed-by: James Clark <james.clark(a)linaro.org> > goto err; > } > }

11 months, 3 weeks

OpenCSD, ARM Cortex R8: observing I_COND_FLUSH and errors, no OCSD_GEN_TRC_ELEM_INSTR_RANGE seen.

by Tushar Rane

I am trying to get OpenCSD working to decode traces for code running on Dual core Arm Cortex-R8, all running in the context of Nucleus OS. I am running into some issues as described below; will appreciate any advice/suggestions. I have been able to get some of this working. So far following is the progress: -------------------------------------------------------------------------------------------------------- - OpenCSD library version: 1.5.5 - Using c_api_pkt_print_test.c test program to configure OpenCSD library and decode. Configuration or changes done in c_api_pkt_print_test.c::create_decoder_etmv4 for the Arm Cortex R8. Not using snapshot, but hard coded the mem_file_path and trace_file_path for my Arm Cortex R8 files. Logs are attached at the end of this message for reference. - Configuration from the code running on Arm Cortex-R8 for ETM, ETB, ETMFUN, Replicator, seems to be working. - Able to get ETB traces successfully. - Changed c_api_pkt_print_test.c and was able to compile and get some decoding done. - In the output of "c_api_pkt_print_test -raw -decode", the addresses can be co-related to the executed code, so it seems to be working so far till here. For example: Frame Data; Index 3984; ID_DATA[0x06]; e3 06 19 96 60 02 00 81 03 9a 04 00 ff ff Idx:3981;[ 0x96 0xDF 0xE3 ];I_ADDR_S_IS1 : Address, Short, IS1.; Addr=0x00000000201FE3BE ~[0xE3BE] Idx:3985;[ 0x06 0x19 ];I_EXCEPT : Exception.; Data Fault; Ret Addr Follows; Idx:3987;[ 0x96 0x60 ];I_ADDR_S_IS1 : Address, Short, IS1.; Addr=0x00000000201FE3C0 ~[0xC0] Idx:3989;[ 0x02 0x00 ];I_TIMESTAMP : Timestamp.; Updated val = 0x0 The “Addr=” can be co-related to the executed code before the trace was collected. Following are the issues encountered and the post here is to seek any ideas on what could be causing these and what could be to be done to fix them. ------------------------------------------------------------------- a. In the output for “c_api_pkt_print_test -raw -decode”, the decoder is not generating any OCSD_GEN_TRC_ELEM_INSTR_RANGE elements, which Is the main issue here. b. There are multiple occurrences of following errors: OCSD_ERR_BAD_DECODE_PKT (along with I_COND_FLUSH), OCSD_ERR_COMMIT_PKT_OVERRUN Idx:441;[ 0x9B 0x21 0xDB 0x21 0x20 ];I_ADDR_L_32IS1 : Address, Long, 32 bit, IS1.; Addr=0x000000002021DB42; Idx:446;[ 0x43 ];I_RESERVED_CFG : Reserved header for current configuration.[I_COND_FLUSH] DCD_ETMV4_0006 : 0x0019 (OCSD_ERR_BAD_DECODE_PKT) [Reserved or unknown packet in decoder.]; TrcIdx=446; CS ID=06; Unknown packet type.Frame Data; Index 448; ID_DATA[0x06]; 02 00 f7 da 2f 03 81 83 00 00 00 00 43 96 29 Idx:448;[ 0x02 0x00 ];I_TIMESTAMP : Timestamp.; Updated val = 0x0 Idx:448; TrcID:0x06; OCSD_GEN_TRC_ELEM_NO_SYNC( [bad-packet]) Idx:450;[ 0xF7 ];I_ATOM_F1 : Atom format 1.; E c. On running just “c_api_pkt_print_test -raw” (no decode), multiple OCSD_ERR_INVALID_PCKT_HDR errors are observed, along with I_COND_FLUSH are observed: PKTP_ETMV4I_0006 : 0x0014 (OCSD_ERR_INVALID_PCKT_HDR) [Invalid packet header]; TrcIdx=460; CS ID=06; Idx:460; I_RESERVED_CFG : Reserved header for current configuration.[I_COND_FLUSH] However, conditional Instruction Tracing is disabled (as seen below) , so it's unclear why I_COND_FLUSH above are observed. Could I_COND_FLUSH be throwing off the decoder? NOTES: -------- I did run a test by configuring the processor to stall, setting TRCSTALLCTLR to 0x50c (in case traces were being dropped and processor stalling helped). However that did not make any difference. CONFIGURATION DONE IN CODE RUNNING ON ARM CORTEX-R8: ------------------------------------------------------------------- // INSTP0 = 0 (Only branches are P0 instructions) // BB = 0 (Branch broadcast mode disabled) // CCI = 0 (cycle counting in instruction trace disabled) // CID = 1 (Context ID tracing enabled) // COND = 0 (Conditional instruction tracing disabled) // TS = 1 (Global timestamp tracing enabled) // RS = 1 (Return stack enabled) // DA = 0 (Data address tracing disabled) // DV = 0 (Data value tracing disabled) TRCCONFIGR = 0x1841; // disable all event tracing TRCEVENTCTL0R = 0; TRCEVENTCTL1R = 0; // ISTALL = 0 (Do not stall processor) // DSTALL = 0 (Do not stall processor) // LEVEL = 0 (Lowest level, where zero invasion occurs) // the risk of overflow) TRCSTALLCTLR = 0; // Enable trace synchronization every 256 bytes of trace TRCSYNCPR = 0x8 //Trace ID = 6, only on trace stream from one Cortex R8 core. TRCTRACEIDR = 6; // Disable the timestamp event. The trace unit still generates // timestamps due to other reasons such // as trace synchronization TRCTSCTLR = 0; // Enable ViewInst to trace everything, with the Start/Stop logic // started // EXLEVEL_S3 = 0 (Enable ViewInst in this exception level. // Highest privilege level. FW runs at this level) // TRCERR = 1 (System error exception is always traced regardless of // the value of ViewInst) // SSSTATUS = 1 (Start/stop logic is in the started state) // TYPE = 0 (Single selected resource) // SEL = 1 (Selects the resource number 1, which always returns TRUE) TRCVICTLR = 0xa01 // No address range filtering for ViewIns TRCVIIECTLR = 0; // No start or stop points for ViewInst TRCVISSCTLR = 0; // enable ETM TRCPRGCTLR = 1; Configuration for ETMv4 done in c_api_pkt_print_test.c --------------------------------------------------------- trace_config.arch_ver = ARCH_V7; trace_config.core_prof = profile_CortexR; trace_config.reg_configr = 0x1841; printf("trace_config.reg_configr = 0x%x", trace_config.reg_configr); trace_config.reg_traceidr = 0x6; // CRT = 6 if(test_trc_id_override != 0) { trace_config.reg_traceidr = (uint32_t)test_trc_id_override; } test_trc_id_override = trace_config.reg_traceidr; /* remember what id we actually used */ trace_config.reg_idr0 = 0x6001eff; trace_config.reg_idr1 = 0x4100f400; trace_config.reg_idr2 = 0x420084; trace_config.reg_idr8 = 0x40; trace_config.reg_idr9 = 0x40; trace_config.reg_idr10 = 0x40; trace_config.reg_idr11 = 0x11; trace_config.reg_idr12 = 0x20; trace_config.reg_idr13 = 0x0; /* create an ETMV4 decoder - no context needed as we have a single stream to a single handler. */ return create_generic_decoder(dcd_tree_h,OCSD_BUILTIN_DCD_ETMV4I,(void *)&trace_config,0); Attached following logs: ------------------------------ Output of “c_api_pkt_print_test -raw -decode” - 5.raw.decode.newoutput.log Output of “c_api_pkt_print_test -raw” - 5.raw.newoutput.log I would appreciate any feedback regarding this to help get the needed output from the decoder.

11 months, 3 weeks

Re: [PATCH] coresight: trbe: Save/restore state across CPU low power state

by Anshuman Khandual

On 4/24/25 04:30, Yabin Cui wrote: > Similar to ETE, TRBE may lose its context when a CPU enters low > power state. To make things worse, if ETE state is restored without > restoring TRBE state, it can lead to a stuck CPU due to an enabled > source device with no enabled sink devices. Could you please provide some more details about when the cpu gets stuck e.g dmesg, traces etc. Also consider adding those details in the commit message as well to establish the problem, this patch is trying to address. > > This patch introduces support for "arm,coresight-loses-context-with-cpu" > in the TRBE driver. When present, TRBE registers are saved before > and restored after CPU low power state. To prevent CPU hangs, TRBE > state is always saved after ETE state and restored after ETE state. The save and restore order here is 1. Save ETE 2. Save TRBE 3. Restore ETE 4. Restore TRBE > > Signed-off-by: Yabin Cui <yabinc(a)google.com> > --- > .../coresight/coresight-etm4x-core.c | 13 ++++- > drivers/hwtracing/coresight/coresight-trbe.c | 53 +++++++++++++++++++ > include/linux/coresight.h | 6 +++ > 3 files changed, 71 insertions(+), 1 deletion(-) > > diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c > index e5972f16abff..1bbaa1249206 100644 > --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c > +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c > @@ -1863,6 +1863,7 @@ static int __etm4_cpu_save(struct etmv4_drvdata *drvdata) > static int etm4_cpu_save(struct etmv4_drvdata *drvdata) > { > int ret = 0; > + struct coresight_device *sink; > > /* Save the TRFCR irrespective of whether the ETM is ON */ > if (drvdata->trfcr) > @@ -1871,8 +1872,14 @@ static int etm4_cpu_save(struct etmv4_drvdata *drvdata) > * Save and restore the ETM Trace registers only if > * the ETM is active. > */ > - if (coresight_get_mode(drvdata->csdev) && drvdata->save_state) > + if (coresight_get_mode(drvdata->csdev) && drvdata->save_state) { > ret = __etm4_cpu_save(drvdata); > + if (ret == 0) { > + sink = coresight_get_percpu_sink(drvdata->cpu); > + if (sink && sink_ops(sink)->percpu_save) > + sink_ops(sink)->percpu_save(sink); > + } > + } > return ret; > } > > @@ -1977,6 +1984,10 @@ static void __etm4_cpu_restore(struct etmv4_drvdata *drvdata) > > static void etm4_cpu_restore(struct etmv4_drvdata *drvdata) > { > + struct coresight_device *sink = coresight_get_percpu_sink(drvdata->cpu); > + > + if (sink && sink_ops(sink)->percpu_restore) > + sink_ops(sink)->percpu_restore(sink); TRBE gets restored first which contradicts the order mentioned earlier in the commit message ? > if (drvdata->trfcr) > write_trfcr(drvdata->save_trfcr); > if (drvdata->state_needs_restore) > diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c > index fff67aac8418..38bf46951a82 100644 > --- a/drivers/hwtracing/coresight/coresight-trbe.c > +++ b/drivers/hwtracing/coresight/coresight-trbe.c > @@ -115,6 +115,13 @@ static int trbe_errata_cpucaps[] = { > */ > #define TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES 256 > > +struct trbe_save_state { > + u64 trblimitr; > + u64 trbbaser; > + u64 trbptr; > + u64 trbsr; > +}; This seems to accommodate all required TRBE registers. Although this is very intuitive could you please also add a comment above this structure explaining the elements like other existing structures in the file ? > + > /* > * struct trbe_cpudata: TRBE instance specific data > * @trbe_flag - TRBE dirty/access flag support > @@ -123,6 +130,9 @@ static int trbe_errata_cpucaps[] = { > * @cpu - CPU this TRBE belongs to. > * @mode - Mode of current operation. (perf/disabled) > * @drvdata - TRBE specific drvdata > + * @state_needs_save - Need to save trace registers when entering cpu idle > + * @state_needs_restore - Need to restore trace registers when exiting cpu idle > + * @save_state - Saved trace registers > * @errata - Bit map for the errata on this TRBE. > */ > struct trbe_cpudata { > @@ -133,6 +143,9 @@ struct trbe_cpudata { > enum cs_mode mode; > struct trbe_buf *buf; > struct trbe_drvdata *drvdata; > + bool state_needs_save; This tracks whether coresight_loses_context_with_cpu() is supported on the CPU or not. > + bool state_needs_restore; This tracks whether the state has been saved earlier and hence can be restored later on when required. > + struct trbe_save_state save_state; > DECLARE_BITMAP(errata, TRBE_ERRATA_MAX); > }; > > @@ -1187,12 +1200,49 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev) > return IRQ_HANDLED; > } > > +static void arm_trbe_cpu_save(struct coresight_device *csdev) > +{ > + struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev); > + struct trbe_save_state *state = &cpudata->save_state; Please move the above statement after the following conditional block. Because struct trbe_save_state is not going to be required if arm_trbe_cpu_save() returns prematurely from here. > + > + if (cpudata->mode == CS_MODE_DISABLED || !cpudata->state_needs_save) > + return;> + > + state->trbbaser = read_sysreg_s(SYS_TRBBASER_EL1); > + state->trbptr = read_sysreg_s(SYS_TRBPTR_EL1); > + state->trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1); > + state->trbsr = read_sysreg_s(SYS_TRBSR_EL1); > + cpudata->state_needs_restore = true; This looks right. > +} > + > +static void arm_trbe_cpu_restore(struct coresight_device *csdev) > +{ > + struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev); > + struct trbe_save_state *state = &cpudata->save_state; Please move this assignment inside the block where these registers actually get restored after all checks are done. > + > + if (cpudata->state_needs_restore) { > + /* > + * To avoid disruption of normal tracing, restore trace > + * registers only when TRBE lost power (TRBLIMITR == 0). > + */ Although this sounds like a reasonable condition, but what happens when TRBE restoration is skipped ? > + if (read_sysreg_s(SYS_TRBLIMITR_EL1) == 0) { state = &cpudata->save_state > + write_sysreg_s(state->trbbaser, SYS_TRBBASER_EL1); > + write_sysreg_s(state->trbptr, SYS_TRBPTR_EL1); > + write_sysreg_s(state->trbsr, SYS_TRBSR_EL1); > + set_trbe_enabled(cpudata, state->trblimitr); > + } > + cpudata->state_needs_restore = false; That means earlier captured state is dropped without a restoration. > + } > +} > + > static const struct coresight_ops_sink arm_trbe_sink_ops = { > .enable = arm_trbe_enable, > .disable = arm_trbe_disable, > .alloc_buffer = arm_trbe_alloc_buffer, > .free_buffer = arm_trbe_free_buffer, > .update_buffer = arm_trbe_update_buffer, > + .percpu_save = arm_trbe_cpu_save, > + .percpu_restore = arm_trbe_cpu_restore, Why this percpu_* prefix ? > }; > > static const struct coresight_ops arm_trbe_cs_ops = { > @@ -1358,6 +1408,9 @@ static void arm_trbe_probe_cpu(void *info) > cpudata->trbe_flag = get_trbe_flag_update(trbidr); > cpudata->cpu = cpu; > cpudata->drvdata = drvdata; > + cpudata->state_needs_save = coresight_loses_context_with_cpu( > + &drvdata->pdev->dev); > + cpudata->state_needs_restore = false; The init here looks good. > return; > cpu_clear: > cpumask_clear_cpu(cpu, &drvdata->supported_cpus); > diff --git a/include/linux/coresight.h b/include/linux/coresight.h > index d79a242b271d..fec375d02535 100644 > --- a/include/linux/coresight.h > +++ b/include/linux/coresight.h > @@ -362,6 +362,10 @@ enum cs_mode { > * @alloc_buffer: initialises perf's ring buffer for trace collection. > * @free_buffer: release memory allocated in @get_config. > * @update_buffer: update buffer pointers after a trace session. > + * @percpu_save: saves state when CPU enters idle state. > + * Only set for percpu sink. > + * @percpu_restore: restores state when CPU exits idle state. > + * only set for percpu sink. > */ > struct coresight_ops_sink { > int (*enable)(struct coresight_device *csdev, enum cs_mode mode, > @@ -374,6 +378,8 @@ struct coresight_ops_sink { > unsigned long (*update_buffer)(struct coresight_device *csdev, > struct perf_output_handle *handle, > void *sink_config); > + void (*percpu_save)(struct coresight_device *csdev); > + void (*percpu_restore)(struct coresight_device *csdev); Again - why this percpu_* prefix ? > }; > > /**

11 months, 3 weeks

Re: [PATCH] coresight: trbe: Save/restore state across CPU low power state

by Anshuman Khandual

On 4/25/25 00:07, Yabin Cui wrote: > On Thu, Apr 24, 2025 at 1:46 AM Anshuman Khandual > <anshuman.khandual(a)arm.com> wrote: >> >> On 4/24/25 04:30, Yabin Cui wrote: >>> Similar to ETE, TRBE may lose its context when a CPU enters low >>> power state. To make things worse, if ETE state is restored without >>> restoring TRBE state, it can lead to a stuck CPU due to an enabled >>> source device with no enabled sink devices. >>> >>> This patch introduces support for "arm,coresight-loses-context-with-cpu" >> >> But could "arm,coresight-loses-context-with-cpu" device tree property >> be associated wit TRBE devices ? OR this state save and restore needs >> to be handled in the firmware. > > This property is handled by ETM/ETE driver. In ETM/ETE driver, the state has > options to be saved by firmware or by the driver. On my test device, which > is Pixel 9, the state is saved by the driver. I have also tested that TRBE state > can be saved by the TRBE driver. Basically if the state is saved and restored in the driver for the source ETM devices, then the same should also happen for corresponding TRBE sink devices as well ? > >> >>> in the TRBE driver. When present, TRBE registers are saved before >>> and restored after CPU low power state. To prevent CPU hangs, TRBE >>> state is always saved after ETE state and restored after ETE state. >>> >>> Signed-off-by: Yabin Cui <yabinc(a)google.com> >>> --- >>> .../coresight/coresight-etm4x-core.c | 13 ++++- >>> drivers/hwtracing/coresight/coresight-trbe.c | 53 +++++++++++++++++++ >>> include/linux/coresight.h | 6 +++ >>> 3 files changed, 71 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c >>> index e5972f16abff..1bbaa1249206 100644 >>> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c >>> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c >>> @@ -1863,6 +1863,7 @@ static int __etm4_cpu_save(struct etmv4_drvdata *drvdata) >>> static int etm4_cpu_save(struct etmv4_drvdata *drvdata) >>> { >>> int ret = 0; >>> + struct coresight_device *sink; >>> >>> /* Save the TRFCR irrespective of whether the ETM is ON */ >>> if (drvdata->trfcr) >>> @@ -1871,8 +1872,14 @@ static int etm4_cpu_save(struct etmv4_drvdata *drvdata) >>> * Save and restore the ETM Trace registers only if >>> * the ETM is active. >>> */ >>> - if (coresight_get_mode(drvdata->csdev) && drvdata->save_state) >>> + if (coresight_get_mode(drvdata->csdev) && drvdata->save_state) { >>> ret = __etm4_cpu_save(drvdata); >>> + if (ret == 0) { >>> + sink = coresight_get_percpu_sink(drvdata->cpu); >>> + if (sink && sink_ops(sink)->percpu_save) >>> + sink_ops(sink)->percpu_save(sink); >>> + } >>> + } >>> return ret; >>> } >>> >>> @@ -1977,6 +1984,10 @@ static void __etm4_cpu_restore(struct etmv4_drvdata *drvdata) >>> >>> static void etm4_cpu_restore(struct etmv4_drvdata *drvdata) >>> { >>> + struct coresight_device *sink = coresight_get_percpu_sink(drvdata->cpu); >>> + >>> + if (sink && sink_ops(sink)->percpu_restore) >>> + sink_ops(sink)->percpu_restore(sink); >>> if (drvdata->trfcr) >>> write_trfcr(drvdata->save_trfcr); >>> if (drvdata->state_needs_restore) >>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c >>> index fff67aac8418..38bf46951a82 100644 >>> --- a/drivers/hwtracing/coresight/coresight-trbe.c >>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c >>> @@ -115,6 +115,13 @@ static int trbe_errata_cpucaps[] = { >>> */ >>> #define TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES 256 >>> >>> +struct trbe_save_state { >>> + u64 trblimitr; >>> + u64 trbbaser; >>> + u64 trbptr; >>> + u64 trbsr; >>> +}; >>> + >>> /* >>> * struct trbe_cpudata: TRBE instance specific data >>> * @trbe_flag - TRBE dirty/access flag support >>> @@ -123,6 +130,9 @@ static int trbe_errata_cpucaps[] = { >>> * @cpu - CPU this TRBE belongs to. >>> * @mode - Mode of current operation. (perf/disabled) >>> * @drvdata - TRBE specific drvdata >>> + * @state_needs_save - Need to save trace registers when entering cpu idle >>> + * @state_needs_restore - Need to restore trace registers when exiting cpu idle >>> + * @save_state - Saved trace registers >>> * @errata - Bit map for the errata on this TRBE. >>> */ >>> struct trbe_cpudata { >>> @@ -133,6 +143,9 @@ struct trbe_cpudata { >>> enum cs_mode mode; >>> struct trbe_buf *buf; >>> struct trbe_drvdata *drvdata; >>> + bool state_needs_save; >>> + bool state_needs_restore; >>> + struct trbe_save_state save_state; >>> DECLARE_BITMAP(errata, TRBE_ERRATA_MAX); >>> }; >>> >>> @@ -1187,12 +1200,49 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev) >>> return IRQ_HANDLED; >>> } >>> >>> +static void arm_trbe_cpu_save(struct coresight_device *csdev) >>> +{ >>> + struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev); >>> + struct trbe_save_state *state = &cpudata->save_state; >>> + >>> + if (cpudata->mode == CS_MODE_DISABLED || !cpudata->state_needs_save) >>> + return; >>> + >>> + state->trbbaser = read_sysreg_s(SYS_TRBBASER_EL1); >>> + state->trbptr = read_sysreg_s(SYS_TRBPTR_EL1); >>> + state->trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1); >>> + state->trbsr = read_sysreg_s(SYS_TRBSR_EL1); >>> + cpudata->state_needs_restore = true; >>> +} >>> + >>> +static void arm_trbe_cpu_restore(struct coresight_device *csdev) >>> +{ >>> + struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev); >>> + struct trbe_save_state *state = &cpudata->save_state; >>> + >>> + if (cpudata->state_needs_restore) { >>> + /* >>> + * To avoid disruption of normal tracing, restore trace >>> + * registers only when TRBE lost power (TRBLIMITR == 0). >>> + */ >>> + if (read_sysreg_s(SYS_TRBLIMITR_EL1) == 0) { >>> + write_sysreg_s(state->trbbaser, SYS_TRBBASER_EL1); >>> + write_sysreg_s(state->trbptr, SYS_TRBPTR_EL1); >>> + write_sysreg_s(state->trbsr, SYS_TRBSR_EL1); >>> + set_trbe_enabled(cpudata, state->trblimitr); >>> + } >>> + cpudata->state_needs_restore = false; >>> + } >>> +} >>> + >>> static const struct coresight_ops_sink arm_trbe_sink_ops = { >>> .enable = arm_trbe_enable, >>> .disable = arm_trbe_disable, >>> .alloc_buffer = arm_trbe_alloc_buffer, >>> .free_buffer = arm_trbe_free_buffer, >>> .update_buffer = arm_trbe_update_buffer, >>> + .percpu_save = arm_trbe_cpu_save, >>> + .percpu_restore = arm_trbe_cpu_restore, >>> }; >>> >>> static const struct coresight_ops arm_trbe_cs_ops = { >>> @@ -1358,6 +1408,9 @@ static void arm_trbe_probe_cpu(void *info) >>> cpudata->trbe_flag = get_trbe_flag_update(trbidr); >>> cpudata->cpu = cpu; >>> cpudata->drvdata = drvdata; >>> + cpudata->state_needs_save = coresight_loses_context_with_cpu( >>> + &drvdata->pdev->dev); >>> + cpudata->state_needs_restore = false; >>> return; >>> cpu_clear: >>> cpumask_clear_cpu(cpu, &drvdata->supported_cpus); >>> diff --git a/include/linux/coresight.h b/include/linux/coresight.h >>> index d79a242b271d..fec375d02535 100644 >>> --- a/include/linux/coresight.h >>> +++ b/include/linux/coresight.h >>> @@ -362,6 +362,10 @@ enum cs_mode { >>> * @alloc_buffer: initialises perf's ring buffer for trace collection. >>> * @free_buffer: release memory allocated in @get_config. >>> * @update_buffer: update buffer pointers after a trace session. >>> + * @percpu_save: saves state when CPU enters idle state. >>> + * Only set for percpu sink. >>> + * @percpu_restore: restores state when CPU exits idle state. >>> + * only set for percpu sink. >>> */ >>> struct coresight_ops_sink { >>> int (*enable)(struct coresight_device *csdev, enum cs_mode mode, >>> @@ -374,6 +378,8 @@ struct coresight_ops_sink { >>> unsigned long (*update_buffer)(struct coresight_device *csdev, >>> struct perf_output_handle *handle, >>> void *sink_config); >>> + void (*percpu_save)(struct coresight_device *csdev); >>> + void (*percpu_restore)(struct coresight_device *csdev); >>> }; >>> >>> /**

11 months, 3 weeks

Re: [PATCH 3/4] coresight: tmc: refactor the tmc-etr mode setting

by hejunhao

On 2025/4/23 10:07, Jie Gan wrote: > > > On 4/22/2025 8:52 PM, hejunhao wrote: >> >> On 2025/4/18 15:12, Jie Gan wrote: >>> >>> >>> On 4/18/2025 1:58 PM, Junhao He wrote: >>>> When trying to run perf and sysfs mode simultaneously, the WARN_ON() >>>> in tmc_etr_enable_hw() is triggered sometimes: >>>> >>>> WARNING: CPU: 42 PID: 3911571 at drivers/hwtracing/coresight/ >>>> coresight-tmc-etr.c:1060 tmc_etr_enable_hw+0xc0/0xd8 [coresight_tmc] >>>> [..snip..] >>>> Call trace: >>>> tmc_etr_enable_hw+0xc0/0xd8 [coresight_tmc] (P) >>>> tmc_enable_etr_sink+0x11c/0x250 [coresight_tmc] (L) >>>> tmc_enable_etr_sink+0x11c/0x250 [coresight_tmc] >>>> coresight_enable_path+0x1c8/0x218 [coresight] >>>> coresight_enable_sysfs+0xa4/0x228 [coresight] >>>> enable_source_store+0x58/0xa8 [coresight] >>>> dev_attr_store+0x20/0x40 >>>> sysfs_kf_write+0x4c/0x68 >>>> kernfs_fop_write_iter+0x120/0x1b8 >>>> vfs_write+0x2c8/0x388 >>>> ksys_write+0x74/0x108 >>>> __arm64_sys_write+0x24/0x38 >>>> el0_svc_common.constprop.0+0x64/0x148 >>>> do_el0_svc+0x24/0x38 >>>> el0_svc+0x3c/0x130 >>>> el0t_64_sync_handler+0xc8/0xd0 >>>> el0t_64_sync+0x1ac/0x1b0 >>>> ---[ end trace 0000000000000000 ]--- >>>> >>>> Since the sysfs buffer allocation and the hardware enablement is not >>>> in the same critical region, it's possible to race with the perf >>>> >>>> mode: >>>> [sysfs mode] [perf mode] >>>> tmc_etr_get_sysfs_buffer() >>>> spin_lock(&drvdata->spinlock) >>>> [sysfs buffer allocation] >>>> spin_unlock(&drvdata->spinlock) >>>> spin_lock(&drvdata->spinlock) >>>> tmc_etr_enable_hw() >>>> drvdata->etr_buf = >>>> etr_perf->etr_buf >>>> spin_unlock(&drvdata->spinlock) >>>> spin_lock(&drvdata->spinlock) >>>> tmc_etr_enable_hw() >>>> WARN_ON(drvdata->etr_buf) // WARN sicne etr_buf initialized at >>>> the perf side >>>> spin_unlock(&drvdata->spinlock) >>>> >>>> To resolve this, configure the tmc-etr mode before invoking >>>> `enable_perf()` or sysfs interfaces. Prior to mode configuration, >>>> explicitly check if the tmc-etr sink is already enabled in a >>>> different mode to prevent race conditions between mode transitions. >>>> Furthermore, enforce spinlock protection around the critical >>>> sections to serialize concurrent accesses from sysfs and perf >>>> subsystems. >>> >>> Is any issue observed during this race condition? >>> >>> The etr_buf is checked before assign it to sysfs_buf or perf_buf. >>> In my understanding, the warning raised to indicate the etr already >>> enabled with sysfs mode or perf mode, so terminate current enable >>> session, right? >>> >>> Thanks, >>> Jie >>> >> >> The sysfs task is executing the process to enable tmc-etr and has >> allocated >> a sysfs_buf. However, the sysfs spinlock does not ensure atomicity of >> the >> entire enable_etr_sink_sysfs() function execution. In this scenario, >> the perf >> session may preemptively call tmc_etr_enable_hw() to enable tmc-etr. >> >> Consequently, during race conditions, perf always prioritizes >> execution, and >> the sysfs_buf remains un-released, leading to memory leaks. Then this >> triggers >> a WARN_ON(drvdata->etr_buf) warning. > > The pre-allocated sysfs_buf is stored in drvdata->sysfs_buf. And I > think it's fine to remain unreleased until the ETR is enabled with > sysfs mode. Yes, you're right. The tmc-etr is retaining the sysfs_buf, so no memory leak occurs. > The ETR will not setup a new sysfs_buf if drvdata->sysfs_buf already > exists. But in other words, it definitely consumes some > memory(especailly allocated a large size of sysfs_buf) for a while or > forever if we dont enable the ETR with sysfs mode in the future? > > I think this is the issue try to address? > Patch 3 of patchset [1] is visible, serving as the basis, and you can understand what the patch aims to achieve. This patch aims to avoid race conditions, as warnings can lead to concerns about potential hidden bugs, such as getting out of sync. I may need to update the commit to include the commit log for patch 1's fix in patchset [1]. [1]: https://lore.kernel.org/linux-arm-kernel/20241202092419.11777-4-yangyicong@… Best regards, Junhao. > Thanks, > Jie > >> >> This patch also addresses the issue described in [1], simplifying the >> setup and >> checks for "mode". >> >> [1] Closes: https://lore.kernel.org/linux-arm- >> kernel/20241202092419.11777-2-yangyicong(a)huawei.com/ >> >> Best regards, >> Junhao. >> >>>> >>>> Fixes: 296b01fd106e ("coresight: Refactor out buffer allocation >>>> function for ETR") >>>> Reported-by: Yicong Yang <yangyicong(a)hisilicon.com> >>>> Closes: https://lore.kernel.org/linux-arm- >>>> kernel/20241202092419.11777-2-yangyicong(a)huawei.com/ >>>> Signed-off-by: Junhao He <hejunhao3(a)huawei.com> >>>> --- >>>> .../hwtracing/coresight/coresight-tmc-etr.c | 77 >>>> +++++++++++-------- >>>> 1 file changed, 47 insertions(+), 30 deletions(-) >>>> >>>> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/ >>>> drivers/hwtracing/coresight/coresight-tmc-etr.c >>>> index a48bb85d0e7f..3d94d64cacaa 100644 >>>> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c >>>> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c >>>> @@ -1190,11 +1190,6 @@ static struct etr_buf >>>> *tmc_etr_get_sysfs_buffer(struct coresight_device *csdev) >>>> spin_lock_irqsave(&drvdata->spinlock, flags); >>>> } >>>> - if (drvdata->reading || coresight_get_mode(csdev) == >>>> CS_MODE_PERF) { >>>> - ret = -EBUSY; >>>> - goto out; >>>> - } >>>> - >>>> /* >>>> * If we don't have a buffer or it doesn't match the >>>> requested size, >>>> * use the buffer allocated above. Otherwise reuse the >>>> existing buffer. >>>> @@ -1205,7 +1200,6 @@ static struct etr_buf >>>> *tmc_etr_get_sysfs_buffer(struct coresight_device *csdev) >>>> drvdata->sysfs_buf = new_buf; >>>> } >>>> -out: >>>> spin_unlock_irqrestore(&drvdata->spinlock, flags); >>>> /* Free memory outside the spinlock if need be */ >>>> @@ -1216,7 +1210,7 @@ static struct etr_buf >>>> *tmc_etr_get_sysfs_buffer(struct coresight_device *csdev) >>>> static int tmc_enable_etr_sink_sysfs(struct coresight_device >>>> *csdev) >>>> { >>>> - int ret = 0; >>>> + int ret; >>>> unsigned long flags; >>>> struct tmc_drvdata *drvdata = >>>> dev_get_drvdata(csdev->dev.parent); >>>> struct etr_buf *sysfs_buf = tmc_etr_get_sysfs_buffer(csdev); >>>> @@ -1226,23 +1220,10 @@ static int tmc_enable_etr_sink_sysfs(struct >>>> coresight_device *csdev) >>>> spin_lock_irqsave(&drvdata->spinlock, flags); >>>> - /* >>>> - * In sysFS mode we can have multiple writers per sink. Since >>>> this >>>> - * sink is already enabled no memory is needed and the HW need >>>> not be >>>> - * touched, even if the buffer size has changed. >>>> - */ >>>> - if (coresight_get_mode(csdev) == CS_MODE_SYSFS) { >>>> - csdev->refcnt++; >>>> - goto out; >>>> - } >>>> - >>>> ret = tmc_etr_enable_hw(drvdata, sysfs_buf); >>>> - if (!ret) { >>>> - coresight_set_mode(csdev, CS_MODE_SYSFS); >>>> + if (!ret) >>>> csdev->refcnt++; >>>> - } >>>> -out: >>>> spin_unlock_irqrestore(&drvdata->spinlock, flags); >>>> if (!ret) >>>> @@ -1652,11 +1633,6 @@ static int tmc_enable_etr_sink_perf(struct >>>> coresight_device *csdev, void *data) >>>> struct etr_perf_buffer *etr_perf = etm_perf_sink_config(handle); >>>> spin_lock_irqsave(&drvdata->spinlock, flags); >>>> - /* Don't use this sink if it is already claimed by sysFS */ >>>> - if (coresight_get_mode(csdev) == CS_MODE_SYSFS) { >>>> - rc = -EBUSY; >>>> - goto unlock_out; >>>> - } >>>> if (WARN_ON(!etr_perf || !etr_perf->etr_buf)) { >>>> rc = -EINVAL; >>>> @@ -1685,7 +1661,6 @@ static int tmc_enable_etr_sink_perf(struct >>>> coresight_device *csdev, void *data) >>>> if (!rc) { >>>> /* Associate with monitored process. */ >>>> drvdata->pid = pid; >>>> - coresight_set_mode(csdev, CS_MODE_PERF); >>>> drvdata->perf_buf = etr_perf->etr_buf; >>>> csdev->refcnt++; >>>> } >>>> @@ -1698,14 +1673,56 @@ static int tmc_enable_etr_sink_perf(struct >>>> coresight_device *csdev, void *data) >>>> static int tmc_enable_etr_sink(struct coresight_device *csdev, >>>> enum cs_mode mode, void *data) >>>> { >>>> + struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); >>>> + enum cs_mode old_mode; >>>> + int rc; >>>> + >>>> + scoped_guard(spinlock_irqsave, &drvdata->spinlock) { >>>> + old_mode = coresight_get_mode(csdev); >>>> + if (old_mode != CS_MODE_DISABLED && old_mode != mode) >>>> + return -EBUSY; >>>> + >>>> + if (drvdata->reading) >>>> + return -EBUSY; >>>> + >>>> + /* >>>> + * In sysFS mode we can have multiple writers per sink. >>>> Since this >>>> + * sink is already enabled no memory is needed and the HW >>>> need not be >>>> + * touched, even if the buffer size has changed. >>>> + */ >>>> + if (old_mode == CS_MODE_SYSFS) { >>>> + csdev->refcnt++; >>>> + return 0; >>>> + } >>>> + >>>> + /* >>>> + * minor note: >>>> + * When sysfs-task1 get locked, it setup the mode first. Then >>>> + * sysfs-task2 gets locked，it will directly return success >>>> even >>>> + * when the tmc-etr is not enabled at this moment. >>>> Ultimately, >>>> + * sysfs-task1 will still successfully enable tmc-etr. >>>> + * This is a transient state and does not cause an anomaly. >>>> + */ >>>> + coresight_set_mode(csdev, mode); >>>> + } >>>> + >>>> switch (mode) { >>>> case CS_MODE_SYSFS: >>>> - return tmc_enable_etr_sink_sysfs(csdev); >>>> + rc = tmc_enable_etr_sink_sysfs(csdev); >>>> + break; >>>> case CS_MODE_PERF: >>>> - return tmc_enable_etr_sink_perf(csdev, data); >>>> + rc = tmc_enable_etr_sink_perf(csdev, data); >>>> + break; >>>> default: >>>> - return -EINVAL; >>>> + rc = -EINVAL; >>>> } >>>> + >>>> + scoped_guard(spinlock_irqsave, &drvdata->spinlock) { >>>> + if (rc && old_mode != mode) >>>> + coresight_set_mode(csdev, old_mode); >>>> + } >>>> + >>>> + return rc; >>>> } >>>> static int tmc_disable_etr_sink(struct coresight_device *csdev) >>> >>> . >>> >> > > . >

11 months, 3 weeks

Re: [PATCH 2/2] coresight: etm-perf: Add AUX_NON_CONTIGUOUS_PAGES to cs_etm PMU

by Anshuman Khandual

On 4/24/25 01:31, Yabin Cui wrote: > On Tue, Apr 22, 2025 at 7:21 AM Leo Yan <leo.yan(a)arm.com> wrote: >> >> On Mon, Apr 21, 2025 at 02:58:18PM -0700, Yabin Cui wrote: >>> The cs_etm PMU, regardless of the underlying trace sink (ETF, ETR or >>> TRBE), doesn't require contiguous pages for its AUX buffer. >> >> Though contiguous pages are not mandatory for TRBE, I would set the >> PERF_PMU_CAP_AUX_NO_SG flag for it. This can potentially benefit >> performance. > > As explained in the patch 1/2, my use case periodically collects ETM data > from the field (using both TRBE and ETR), and needs to reduce memory > fragmentation. If the performance impact is big, we can make it user > configurable. Otherwise, shall we default it to non-contiguous pages? But is not that already happening ? cs_etm does not set the PMU cap PERF_PMU_CAP_AUX_NO_SG that means it can allocate non-contig memory chunk. Where am I missing ? > >> >> For non per CPU sinks, it is fine to allocate non-contiguous pages. >> >> Thanks, >> Leo >> >>> This patch adds the PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES capability >>> to the cs_etm PMU. This allows the kernel to allocate non-contiguous >>> pages for the AUX buffer, reducing memory fragmentation when using >>> cs_etm. >>> >>> Signed-off-by: Yabin Cui <yabinc(a)google.com> >>> --- >>> drivers/hwtracing/coresight/coresight-etm-perf.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c >>> index f4cccd68e625..c98646eca7f8 100644 >>> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c >>> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c >>> @@ -899,7 +899,8 @@ int __init etm_perf_init(void) >>> int ret; >>> >>> etm_pmu.capabilities = (PERF_PMU_CAP_EXCLUSIVE | >>> - PERF_PMU_CAP_ITRACE); >>> + PERF_PMU_CAP_ITRACE | >>> + PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES); >>> >>> etm_pmu.attr_groups = etm_pmu_attr_groups; >>> etm_pmu.task_ctx_nr = perf_sw_context; >>> -- >>> 2.49.0.805.g082f7c87e0-goog >>> >>> >

11 months, 4 weeks

Jump to page:

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

CoreSight