On 03/08/2021 11:25, Anshuman Khandual wrote:
>
>
> On 7/28/21 7:22 PM, Suzuki K Poulose wrote:
>> ARM Neoverse-N2 (#2139208) and Cortex-A710(##2119858) suffers from
>> an erratum, which when triggered, might cause the TRBE to overwrite
>> the trace data already collected in FILL mode, in the event of a WRAP.
>> i.e, the TRBE doesn't stop writing the data, instead wraps to the base
>> and could write upto 3 cache line size worth trace. Thus, this could
>> corrupt the trace at the "BASE" pointer.
>>
>> The workaround is to program the write pointer 256bytes from the
>> base, such that if the erratum is triggered, it doesn't overwrite
>> the trace data that was captured. This skipped region could be
>> padded with ignore packets at the end of the session, so that
>> the decoder sees a continuous buffer with some padding at the
>> beginning. The trace data written at the base is considered
>> lost as the limit could have been in the middle of the perf
>> ring buffer, and jumping to the "base" is not acceptable.
>
> If the write pointer is guaranteed to have always started beyond
> [base + 256] and then wraps around. Why cannot the initial trace
> at [base .. base + 256] not to be considered for perf ?
Remember that we could be dealing with a region within the larger
ring buffer.
Ring buffer
| write (head + 256)
v
\---------xxxxx----------------------------------/
^ ^ ^
head | | Limit Ring buffer end
In this case, you are talking about the area marked as
"xxx", which ideally should be appearing just after Limit,
in the normal course of the trace without the erratum.
Thus leaving the data at [base.. base+256] should
be ideally moved to the end (after limit ptr). And :
1) We don't always have space after the "limit" in the ring buffer.
2) We don't know if the erratum was triggered at all, under the
micro architectural conditions.
So, it is ideal to live with this ignoring the trace for now.
>
>> We set the flags already to indicate that some amount of trace
>> was lost during the FILL event IRQ. So this is fine.
>
> Right, from perf data flag point of view it is not a problem.
> But I am still wondering why cannot the trace at the base can
> not be consumed by perf.
>
Please see above.
>>
>> One important change with the work around is, we program the
>> TRBBASER_EL1 to current page where we are allowed to write.
>> Otherwise, it could overwrite a region that may be consumed
>> by the perf. Towards this, we always make sure that the
>
> While enabling TRBE after required reconfiguration, this will
> cause both TRBBASER_EL1 and TRBPTR_EL1 to change each time
> around (basically TRBBASER_EL1 follows handle->head) ?
Yes. This is to ensure that the TRBE doesn't corrupt the "trace
data" at the "start" of the actual ring buffer (not the area
that we are allowed to write). In a nutshell, TRBE is only allowed
to write within the area [head ... head + size].
>
>> "handle->head" and thus the trbe_write is PAGE_SIZE aligned,
>> so that we can set the BASE to the PAGE base and move the
>> TRBPTR to the 256bytes offset.
>
> Now that base needs to follow handle->head, both needs to be
> PAGE_SIZE aligned (including the write pointer) because base
> pointer needs to be PAGE_SIZE aligned ?
Base must always be page aligned for TRBE. Now, with the erratum,
the Base must be within [ head ... head + size ]. Thus we cannot
put TRBPTR (write ptr) outside [TRBBASER ... TRBLIMITR], the
write ptr follows the base and it is shifted by 256byte to allow
the scratch space for overwrite, when the erratum triggers.
>
>>
>> Cc: Mike Leach <mike.leach(a)linaro.org>
>> Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
>> Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
>> Cc: Leo Yan <leo.yan(a)linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
>> ---
>> drivers/hwtracing/coresight/coresight-trbe.c | 111 +++++++++++++++++--
>> 1 file changed, 102 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index 9ea28813182b..cd997ed5d918 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -17,6 +17,7 @@
>>
>> #include <asm/barrier.h>
>> #include <asm/cputype.h>
>> +#include <asm/cpufeature.h>
>>
>> #include "coresight-self-hosted-trace.h"
>> #include "coresight-trbe.h"
>> @@ -84,9 +85,17 @@ struct trbe_buf {
>> * per TRBE instance, we keep track of the list of errata that
>> * affects the given instance of the TRBE.
>> */
>> -#define TRBE_ERRATA_MAX 0
>> +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE 0
>> +#define TRBE_ERRATA_MAX 1
>
> Although I am not sure how to achieve it, TRBE_ERRATA_MAX should be
> derived from errata cpucap start and end markers which identify all
> TRBE specific erratas. The hard coding above could be avoided.
This lets us hand pick the TRBE errata and avoid having to do
something you say is hard to achieve. What do you think is
problematic with this approach ?
>
>> +
>> +/*
>> + * Safe limit for the number of bytes that may be overwritten
>> + * when the erratum is triggered.
>
> Specify which errata ?
There are two distinct CPU errata. I will try to make it a bit more
clear.
>
>> + */
>> +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP 256
>
> Needs to have _BYTES. TRBE_ERRATA_OVERWRITE_FILL_MODE_SKIP_BYTES ?
>
>>
>> static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
>> + [TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
>
> s/TRBE_WORKAROUND_OVERWRITE_FILL_MODE/TRBE_ERRATA_OVERWRITE_FILL_MODE instead ?
>
> ARM64_WORKAROUND_TRBE_XXX -----> TRBE_ERRATA_XXX
I think WORKAROUND still makes it clear, and is inline with what we
follow in generic arm64 and I would prefer to keep that.
>
>> };
>>
>> /*
>> @@ -531,10 +540,13 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>> if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
>> return TRBE_FAULT_ACT_FATAL;
>>
>> - if (is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
>> - if (get_trbe_write_pointer() == get_trbe_base_pointer())
>> - return TRBE_FAULT_ACT_WRAP;
>> - }
>> + /*
>> + * It is not necessary to verify the TRBPTR == TRBBASER to detect
>> + * a FILL event. Moreover, CPU errata could make this check invalid.
>> + */
>
> Why should the check be dropped for CPU instances which dont have the errata ?
Good point. On the other hand, if the TRBPTR != TRBBASER, that
indicates a TRBE erratum. So, I don't see too much value in it. I could
keep that back in bypassed by the erratum check.
>
>> + if (is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED))
>> + return TRBE_FAULT_ACT_WRAP;
>> +
>> return TRBE_FAULT_ACT_SPURIOUS;
>> }
>>
>> @@ -544,6 +556,7 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>> {
>> u64 write;
>> u64 start_off, end_off;
>> + u64 size;
>>
>> /*
>> * If the TRBE has wrapped around the write pointer has
>> @@ -559,7 +572,18 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>>
>> if (WARN_ON_ONCE(end_off < start_off))
>> return 0;
>> - return (end_off - start_off);
>> +
>> + size = end_off - start_off;
>> + /*
>> + * If the TRBE is affected by the following erratum, we must fill
>> + * the space we skipped with IGNORE packets. And we are always
>> + * guaranteed to have at least a PAGE_SIZE space in the buffer.
>> + */
>> + if (trbe_has_erratum(TRBE_WORKAROUND_OVERWRITE_FILL_MODE, buf->cpudata) &&
>> + !WARN_ON(size < TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP))
>
> Why warn here ? The skid can some times be less than 256 bytes. OR because
> now write pointer alignment is PAGE_SIZE (from later code), size too needs
> to be always a PAGE_SIZE multiple ?
Because, we expect at least a page size worth space before we run.
(Since BASE, WRITE and LIMIT are page aligned). If that guarantee is
broken, we have serious problem with the size calculation logic.
>
>> + __trbe_pad_buf(buf, start_off, TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP);
>> +
>> + return size;
>> }
>>
>> static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
>> @@ -704,20 +728,73 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
>> return size;
>> }
>>
>> +
>> +static int trbe_apply_work_around_before_enable(struct trbe_buf *buf)
>> +{
>> + /*
>> + * TRBE_WORKAROUND_OVERWRITE_FILL_MODE causes the TRBE to overwrite a few cache
>> + * line size from the "TRBBASER_EL1" in the event of a "FILL".
>> + * Thus, we could loose some amount of the trace at the base.
>> + *
>> + * To work around this:
>> + * - Software must leave 256bytes of space from the base, so that
>> + * the trace collected now is not overwritten.
>> + * - Fill the first 256bytes with IGNORE packets for the decoder
>> + * to ignore at the end of the session, so that the decoder ignores
>> + * this gap.
>> + *
>> + * This also means that, the TRBE driver must set the TRBBASER_EL1
>> + * such that, when the erratum is triggered, it doesn't overwrite
>> + * the "area" outside the area marked by (handle->head, +size).
>> + * So, we make sure that the handle->head is always PAGE aligned,
>> + * by tweaking the required alignment for the TRBE (trbe_align).
>> + * And when we enable the TRBE,
>> + *
>> + * - move the TRBPTR_EL1 to 256bytes past the starting point.
>> + * So that any trace collected in this run is not overwritten.
>> + *
>> + * - set the TRBBASER_EL1 to the original trbe_write. This will
>> + * ensure that, if the TRBE hits the erratum, it would only
>> + * write within the region allowed for the TRBE.
>> + *
>> + * At the trace collection time, we always pad the skipped bytes
>> + * with IGNORE packets to make sure the decoder doesn't see any
>> + * overwritten packets.
>> + */
>> + if (trbe_has_erratum(TRBE_WORKAROUND_OVERWRITE_FILL_MODE, buf->cpudata)) {
>> + if (WARN_ON(!IS_ALIGNED(buf->trbe_write, PAGE_SIZE)))
>> + return -EINVAL;
>> + buf->trbe_hw_base = buf->trbe_write;
>> + buf->trbe_write += TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP;
>
> Not sure if I really understood this correctly, but would not the
> above keep on reducing the usable buffer space for TRBE each time
> around ? BTW buffer adjustment complexity here requires a proper
Yes, unfortunately, we have to make sure every time the TRBE is
turned ON, we leave this space. We always recaculate the offsets
anyway, before turning it ON and this additional step makes sure that
the WRITE ptr is moved 256bytes (and also the BASE is moved to the
"head").
I don't think this adjustment changes much. It is only shifting the
write pointer within the region [head ... limit]. The calculation
of the head and the limit still remains the same.
> ASCII diagram based illustration like before.
>
>> + }
>> +
>> + return 0;
>> +}
>> +
>> static int __arm_trbe_enable(struct trbe_buf *buf,
>> struct perf_output_handle *handle)
>> {
>> + int ret = 0;
>> +
>> buf->trbe_limit = compute_trbe_buffer_limit(handle);
>> buf->trbe_write = buf->trbe_base + PERF_IDX2OFF(handle->head, buf);
>> if (buf->trbe_limit == buf->trbe_base) {
>> - trbe_stop_and_truncate_event(handle);
>> - return -ENOSPC;
>> + ret = -ENOSPC;
>> + goto err;
>> }
>> /* Set the base of the TRBE to the buffer base */
>> buf->trbe_hw_base = buf->trbe_base;
>> +
>> + ret = trbe_apply_work_around_before_enable(buf);
>> + if (ret)
>> + goto err;
>> +
>> *this_cpu_ptr(buf->cpudata->drvdata->handle) = handle;
>> trbe_enable_hw(buf);
>> return 0;
>> +err:
>> + trbe_stop_and_truncate_event(handle);
>> + return ret;
>> }
>>
>> static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data)
>> @@ -1003,7 +1080,23 @@ static void arm_trbe_probe_cpu(void *info)
>> pr_err("Unsupported alignment on cpu %d\n", cpu);
>> goto cpu_clear;
>> }
>> - cpudata->trbe_align = cpudata->trbe_hw_align;
>> +
>> + /*
>> + * If the TRBE is affected by erratum TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
>
> Better to address it with the original errata name i.e
> ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE instead. But not a big issue
> though.
We deal with the TRBE_xx name everywhere, for even checking the erratum,
so using the other one is going to confuse the code reading.
>
>> + * we must always program the TBRPTR_EL1, 256bytes from a page
>> + * boundary, with TRBBASER_EL1 set to the page, to prevent
>> + * TRBE over-writing 256bytes at TRBBASER_EL1 on FILL event.
>> + *
>> + * Thus make sure we always align our write pointer to a PAGE_SIZE,
>> + * which also guarantees that we have at least a PAGE_SIZE space in
>> + * the buffer (TRBLIMITR is PAGE aligned) and thus we can skip
>> + * the required bytes at the base.
>> + */
>
> Should not TRBPTR_EL1 be aligned to 256 bytes instead, as TRBBASER_EL1 is
> always at PAGE_SIZE boundary. Hence TRBPTR_EL1 could never be in between
> base and [base + 256 bytes]. Why alignment needs to go all the way upto
> PAGE_SIZE instead ? OR am I missing something here ?
Please see the diagram above. When the TRBE wraps, it jumps to the
TRBBASER, and if that is outside the region permitted for TRBE, i.e
[head... head + size], we risk corrupting data potentially consumed by
the userspace. So, we must make sure that the TRBBASER is within the
[head...head+size], which is why we update the hw_base accordingly
when we detect the erratum.
>
>> + if (trbe_has_erratum(TRBE_WORKAROUND_OVERWRITE_FILL_MODE, cpudata))
>> + cpudata->trbe_align = PAGE_SIZE;
>> + else
>> + cpudata->trbe_align = cpudata->trbe_hw_align;
>> +
>> cpudata->trbe_flag = get_trbe_flag_update(trbidr);
>> cpudata->cpu = cpu;
>> cpudata->drvdata = drvdata;
>>
>
> I will visit this patch again. Not sure if I really understand all this
> changes correctly enough.
Please let me know if you have further questions.
Thanks for taking a look
Suzuki
>
This patchset add support for SMB(System Memory Buffer) device, SMB
obtains CPU instructions from Coresight ETM device and stores these
messages in system memory.
SMB is developed by Ultrasoc technology, which is acquired by Siemens,
and we still use "Ultrasoc" to name document and driver.
Change since RFC:
- Move ultrasoc driver to drivers/hwtracing/coresight.
- Remove ultrasoc-axi-com.c, as AXI-COM doesn't need to be configured in
basic tracing function.
- Remove ultrasoc.c as SMB does not need to register with the ultrasoc core.
- Address the comments from Mathieu and Suzuki.
- Link: https://lists.linaro.org/pipermail/coresight/2021-June/006535.html
Qi Liu (2):
Documentation: tracing: Documentation for ultrasoc SMB drivers
coresight: ultrasoc: Add System Memory Buffer driver
.../trace/coresight/ultrasoc-trace.rst | 193 +++++
MAINTAINERS | 7 +
drivers/hwtracing/coresight/Kconfig | 3 +
drivers/hwtracing/coresight/Makefile | 2 +
drivers/hwtracing/coresight/ultrasoc/Kconfig | 12 +
drivers/hwtracing/coresight/ultrasoc/Makefile | 6 +
.../coresight/ultrasoc/ultrasoc-smb.c | 722 ++++++++++++++++++
.../coresight/ultrasoc/ultrasoc-smb.h | 142 ++++
8 files changed, 1087 insertions(+)
create mode 100644 Documentation/trace/coresight/ultrasoc-trace.rst
create mode 100644 drivers/hwtracing/coresight/ultrasoc/Kconfig
create mode 100644 drivers/hwtracing/coresight/ultrasoc/Makefile
create mode 100644 drivers/hwtracing/coresight/ultrasoc/ultrasoc-smb.c
create mode 100644 drivers/hwtracing/coresight/ultrasoc/ultrasoc-smb.h
--
2.17.1
This series adds CPU erratum work arounds related to the self-hosted
tracing. The list of affected errata handled in this series are :
* TRBE may overwrite trace in FILL mode
- Arm Neoverse-N2 #2139208
- Cortex-A710 #2119858
* A TSB instruction may not flush the trace completely when executed
in trace prohibited region.
- Arm Neoverse-N2 #2067961
- Cortex-A710 #2054223
The series applies on the self-hosted/trbe fixes posted here [0].
A tree containing both the series is available here [1].
[0] https://lkml.kernel.org/r/20210723124456.3828769-1-suzuki.poulose@arm.com
[1] git@git.gitlab.arm.com:linux-arm/linux-skp.git coresight/errata/trbe-tsb-n2-a710/v1
Suzuki K Poulose (10):
coresight: trbe: Add infrastructure for Errata handling
coresight: trbe: Add a helper to calculate the trace generated
coresight: trbe: Add a helper to pad a given buffer area
coresight: trbe: Decouple buffer base from the hardware base
coresight: trbe: Allow driver to choose a different alignment
arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
arm64: Add erratum detection for TRBE overwrite in FILL mode
coresight: trbe: Workaround TRBE errat overwrite in FILL mode
arm64: Enable workaround for TRBE overwrite in FILL mode
arm64: errata: Add workaround for TSB flush failures
Documentation/arm64/silicon-errata.rst | 8 +
arch/arm64/Kconfig | 70 ++++++
arch/arm64/include/asm/barrier.h | 17 +-
arch/arm64/include/asm/cputype.h | 4 +
arch/arm64/kernel/cpu_errata.c | 44 ++++
arch/arm64/tools/cpucaps | 2 +
drivers/hwtracing/coresight/coresight-trbe.c | 227 ++++++++++++++++---
7 files changed, 341 insertions(+), 31 deletions(-)
--
2.24.1
On 02/08/2021 07:43, Anshuman Khandual wrote:
>
>
> On 7/28/21 7:22 PM, Suzuki K Poulose wrote:
>> Add a minimal infrastructure to keep track of the errata
>> affecting the given TRBE instance. Given that we have
>> heterogeneous CPUs, we have to manage the list per-TRBE
>> instance to be able to apply the work around as needed.
>>
>> We rely on the arm64 errata framework for the actual
>> description and the discovery of a given erratum, to
>> keep the Erratum work around at a central place and
>> benefit from the code and the advertisement from the
>> kernel. We use a local mapping of the erratum to
>> avoid bloating up the individual TRBE structures.
>
> I guess there is no other way around apart from each TRBE instance
> tracking applicable erratas locally per CPU, even though it sounds
> bit redundant.
>
>> i.e, each arm64 TRBE erratum bit is assigned a new number
>> within the driver to track. Each trbe instance updates
>> the list of affected erratum at probe time on the CPU.
>> This makes sure that we can easily access the list of
>> errata on a given TRBE instance without much overhead.
>
> It also ensures that the generic errata framework is queried just
> once during individual CPU probe.
>
>>
>> Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
>> Cc: Mike Leach <mike.leach(a)linaro.org>
>> Cc: Leo Yan <leo.yan(a)linaro.org>
>> Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
>> ---
>> drivers/hwtracing/coresight/coresight-trbe.c | 48 ++++++++++++++++++++
>> 1 file changed, 48 insertions(+)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index b8586c170889..0368bf405e35 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -16,6 +16,8 @@
>> #define pr_fmt(fmt) DRVNAME ": " fmt
>>
>> #include <asm/barrier.h>
>> +#include <asm/cputype.h>
>> +
>> #include "coresight-self-hosted-trace.h"
>> #include "coresight-trbe.h"
>>
>> @@ -65,6 +67,35 @@ struct trbe_buf {
>> struct trbe_cpudata *cpudata;
>> };
>>
>> +/*
>> + * TRBE erratum list
>> + *
>> + * We rely on the corresponding cpucaps to be defined for a given
>> + * TRBE erratum. We map the given cpucap into a TRBE internal number
>> + * to make the tracking of the errata lean.
>> + *
>> + * This helps in :
>> + * - Not duplicating the detection logic
>> + * - Streamlined detection of erratum across the system
>> + *
>> + * Since the erratum work arounds could be applied individually
>> + * per TRBE instance, we keep track of the list of errata that
>> + * affects the given instance of the TRBE.
>> + */
>> +#define TRBE_ERRATA_MAX 0
>> +
>> +static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
>> +};
>
> This needs to be tighten up. There should be build time guard rails in
> arm64 errata cpucaps, so that only TRBE specific ones could be assigned
> here as trbe_errata_cpucaps[].
I don't get your point. The actual arm64 erratum caps are not linear
and as such we don't have to force it. This approach gives us a hand
picked exact list of errata that apply to the TRBE driver by mapping
it linearly here. The only reason why we have that TRBE_ERRATA_MAX,
is such that we can track it per TRBE instance and ...
>
>> +
>> +/*
>> + * struct trbe_cpudata: TRBE instance specific data
>> + * @trbe_flag - TRBE dirty/access flag support
>> + * @tbre_align - Actual TRBE alignment required for TRBPTR_EL1.
>> + * @cpu - CPU this TRBE belongs to.
>> + * @mode - Mode of current operation. (perf/disabled)
>> + * @drvdata - TRBE specific drvdata
>> + * @errata - Bit map for the errata on this TRBE.
>> + */
>> struct trbe_cpudata {
>> bool trbe_flag;
>> u64 trbe_align;
>> @@ -72,6 +103,7 @@ struct trbe_cpudata {
>> enum cs_mode mode;
>> struct trbe_buf *buf;
>> struct trbe_drvdata *drvdata;
>> + DECLARE_BITMAP(errata, TRBE_ERRATA_MAX);
>> };
>>
>> struct trbe_drvdata {
>> @@ -84,6 +116,21 @@ struct trbe_drvdata {
>> struct platform_device *pdev;
>> };
>>
>> +static void trbe_check_errata(struct trbe_cpudata *cpudata)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < ARRAY_SIZE(trbe_errata_cpucaps); i++) {
>
> BUILD_BUG_ON() - if trbe_errata_cpucaps[i] is not inside TRBE specific
> errata cpucap range ?
... also run these detection tests.
>
>> + if (this_cpu_has_cap(trbe_errata_cpucaps[i]))
>> + set_bit(i, cpudata->errata);
>> + }
>> +}
>> +
>> +static inline bool trbe_has_erratum(int i, struct trbe_cpudata *cpudata)
>
> Switch the argument positions here ? 'int i' should be the second one.
>
ok.
>> +{
>> + return (i < TRBE_ERRATA_MAX) && test_bit(i, cpudata->errata);
>> +}
>> +
>> static int trbe_alloc_node(struct perf_event *event)
>> {
>> if (event->cpu == -1)
>> @@ -925,6 +972,7 @@ static void arm_trbe_probe_cpu(void *info)
>> goto cpu_clear;
>> }
>>
>> + trbe_check_errata(cpudata);
>
> This should be called right at the end before arm_trbe_probe_cpu() exits
> on the success path. Errata should not be evaluated if TRBE on the CPU
> wont be used for some reason i.e cpumask_clear_cpu() path.
ok
>
>> cpudata->trbe_align = 1ULL << get_trbe_address_align(trbidr);
>> if (cpudata->trbe_align > SZ_2K) {
>> pr_err("Unsupported alignment on cpu %d\n", cpu);
>>
>
> This patch should be moved after [PATCH 5/10] i.e just before adding the
> first TRBE errata.
>
I will take a look.
Thanks for the review
Suzuki
In the previous patch set for fixing CoreSight snapshot mode [1], the
patch for perf tool has been merged into the mainline kernel [2]; other
two patches for CoreSight driver have been left out.
This patch series resends these two missed out patches, alongside
patches 01 and 02 are updated with minor improvement commits.
This patch series has been tested on Arm64 Juno board.
Changes from v2:
- Minor improvement the commits for patches 01 and 02.
[1] https://lore.kernel.org/lkml/20210701093537.90759-1-leo.yan@linaro.org/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
Leo Yan (2):
coresight: tmc-etr: Use perf_output_handle::head for AUX ring buffer
coresight: Update comments for removing cs_etm_find_snapshot()
drivers/hwtracing/coresight/coresight-etb10.c | 2 +-
drivers/hwtracing/coresight/coresight-tmc-etf.c | 2 +-
drivers/hwtracing/coresight/coresight-tmc-etr.c | 12 ++++--------
3 files changed, 6 insertions(+), 10 deletions(-)
--
2.25.1
Changes since v1:
* Re-implement with a new magic number instead of piggybacking on ETMv4
* Improve comments and function name around cs_etm_decoder__get_etmv4_arch_ver()
* Add a warning for unrecognised magic numbers
* Split typo fix into new commit
* Add Leo's reviewed-by tags
* Create a new struct for ETE config (cs_ete_trace_params) instead of re-using ETMv4 config
Applies to perf/core f3c33cbd922
Also available at https://gitlab.arm.com/linux-arm/linux-jc/-/tree/james-ete-v2
James Clark (9):
perf cs-etm: Refactor initialisation of decoder params.
perf cs-etm: Initialise architecture based on TRCIDR1
perf cs-etm: Refactor out ETMv4 header saving
perf cs-etm: Save TRCDEVARCH register
perf cs-etm: Fix typo
perf cs-etm: Update OpenCSD decoder for ETE
perf cs-etm: Create ETE decoder
perf cs-etm: Print the decoder name
perf cs-etm: Show a warning for an unknown magic number
tools/build/feature/test-libopencsd.c | 4 +-
tools/perf/arch/arm/util/cs-etm.c | 97 ++++++++----
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 148 ++++++++----------
.../perf/util/cs-etm-decoder/cs-etm-decoder.h | 13 ++
tools/perf/util/cs-etm.c | 43 ++++-
tools/perf/util/cs-etm.h | 10 ++
6 files changed, 200 insertions(+), 115 deletions(-)
--
2.28.0
Hello everyone,
I'm interested in using CoreSight on a DragonBoard 410c. My preference
is to perform trace acquisition and decoding using perf tools. I
successfully cross-compiled the Linaro Linux release 21.03 [1] and
built a boot image with CoreSight components enabled. Then I tried to
figure out trace decoding, which I prefer to perform on a host
machine. I successfully compiled OpenCSD on my Ubuntu host using the
instructions in [2], but I got errors when I tried to compile perf
with OpenCSD.
Then I moved to the other option that was trace decoding on the board.
I successfully compiled OpenCSD on my DragonBoard 410c. Since the free
space on the eMMC is less than the size of the kernel source, I stored
the kernel source on a MicroSD card and tried to compile perf within
the MicroSD. But I got the following error in both cases of trying to
compile perf with OpenCSD (make -C tools/perf VF=1 CORESIGHT=1) or
compiling it standalone (make -C tools/perf):
linaro@linaro-alip:/media/linaro/mymicrosd/kernel$ make -C tools/perf
VF=1 CORESIGHT=1
make: Entering directory '/media/linaro/mymicrosd/kernel/tools/perf'
BUILD: Doing 'make -j4' parallel build
make[1]: ./check-headers.sh: Permission denied
make[1]: *** [Makefile.perf:232: sub-make] Error 127
make: *** [Makefile:70: all] Error 2
make: Leaving directory '/media/linaro/mymicrosd/kernel/tools/perf'
To conclude, my problem regarding trace acquisition is that I cannot
compile perf on DragonBoard 410c, and my problem regarding trace
decoding is that I cannot compile perf with OpenCSD either on a Ubuntu
host or on the board.
Any help is greatly appreciated.
Regards,
Farzam
[1] https://releases.linaro.org/96boards/dragonboard410c/linaro/debian/21.03/
[2] https://github.com/Linaro/OpenCSD/blob/master/HOWTO.md
Hi Tao,
Apologies for the late reply - this patch fell through the cracks.
On Thu, Aug 19, 2021 at 05:29:37PM +0800, Tao Zhang wrote:
> The input parameter of the function pm_runtime_put should be the
> same in the function cti_enable_hw and cti_disable_hw. The correct
> parameter to use here should be dev->parent.
>
> Signed-off-by: Tao Zhang <quic_taozha(a)quicinc.com>
> ---
> drivers/hwtracing/coresight/coresight-cti-core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c
> index e2a3620..8988b2e 100644
> --- a/drivers/hwtracing/coresight/coresight-cti-core.c
> +++ b/drivers/hwtracing/coresight/coresight-cti-core.c
> @@ -175,7 +175,7 @@ static int cti_disable_hw(struct cti_drvdata *drvdata)
> coresight_disclaim_device_unlocked(csdev);
> CS_LOCK(drvdata->base);
> spin_unlock(&drvdata->spinlock);
> - pm_runtime_put(dev);
> + pm_runtime_put(dev->parent);
You are correct - I have added this patch to my next tree.
Thanks,
Mathieu
> return 0;
>
> /* not disabled this call */
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>