Hi Tao,
On Sun, Sep 26, 2021 at 06:35:50PM +0800, Tao Zhang wrote:
> clang-12 fails to build the etm4x driver with -fsanitize=array-bounds,
> where it decides to unroll certain loops in a way that result in a
> C variable getting put into an inline assembly.
>
> Search this build failure and find this is a known issue and there
> has been a mail thread discussing it.
> https://patchwork.kernel.org/project/linux-arm-kernel/patch/20210429145752.…
> According to the modification suggestions of this mail thread,
> coresight infrastucture has already provided another API that
> can replace the function that caused the error.
>
> Used here "csdev_access_read32" to replace the original API
> "etm4x_relaxed_read32".
>
> This patch applies to coresight/next
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>
> Signed-off-by: Tao Zhang <quic_taozha(a)quicinc.com>
> ---
> drivers/hwtracing/coresight/coresight-etm4x-core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> index f58afbab6e6d..0bca8e2be070 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> @@ -797,7 +797,7 @@ static void etm4_disable_hw(void *info)
> /* read back the current counter values */
> for (i = 0; i < drvdata->nr_cntr; i++) {
> config->cntr_val[i] =
> - etm4x_relaxed_read32(csa, TRCCNTVRn(i));
> + csdev_access_read32(csa, TRCCNTVRn(i));
It seems like the patch you are referencing above was never applied... So the
question is, how is it that only this instance is giving you trouble when there
are many more instances of the same pattern in the file?
Thanks,
Mathieu
> }
>
> coresight_disclaim_device_unlocked(csdev);
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>
On 23/09/2021 07:13, Anshuman Khandual wrote:
>
>
> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>> ARM Neoverse-N2 (#2139208) and Cortex-A710(##2119858) suffers from
>> an erratum, which when triggered, might cause the TRBE to overwrite
>> the trace data already collected in FILL mode, in the event of a WRAP.
>> i.e, the TRBE doesn't stop writing the data, instead wraps to the base
>> and could write upto 3 cache line size worth trace. Thus, this could
>> corrupt the trace at the "BASE" pointer.
>>
>> The workaround is to program the write pointer 256bytes from the
>
> 3 cache lines = 256 bytes on all implementation which might have TRBE ?
> OR this skid bytes should be derived from the platform cache line size
> instead.
256bytes is the aligned (to the power of 2) value for the safe guard.
Not 3 cache lines. Ideally, if there is another CPU that has larger
cache line size, affected by the erratum, yes, we must do that.
But for now this is sufficient.
>
>> base, such that if the erratum is triggered, it doesn't overwrite
>> the trace data that was captured. This skipped region could be
>> padded with ignore packets at the end of the session, so that
>> the decoder sees a continuous buffer with some padding at the
>> beginning. The trace data written at the base is considered
>> lost as the limit could have been in the middle of the perf
>> ring buffer, and jumping to the "base" is not acceptable.
>> We set the flags already to indicate that some amount of trace
>> was lost during the FILL event IRQ. So this is fine.
>
> Via PERF_AUX_FLAG_TRUNCATED ? Should be specified here to be clear.
Please note that setting the flag is not a side effect of the
work around. And as such I don't think this needs to be mentioned
here. e.g, we changed this to COLLISION recently for WRAP events.
It makes sense to keep the details to the driver.
>
>>
>> One important change with the work around is, we program the
>> TRBBASER_EL1 to current page where we are allowed to write.
>> Otherwise, it could overwrite a region that may be consumed
>> by the perf. Towards this, we always make sure that the
>> "handle->head" and thus the trbe_write is PAGE_SIZE aligned,
>> so that we can set the BASE to the PAGE base and move the
>> TRBPTR to the 256bytes offset.
>>
>> Cc: Mike Leach <mike.leach(a)linaro.org>
>> Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
>> Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
>> Cc: Leo Yan <leo.yan(a)linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
>> ---
>> Change since v1:
>> - Updated comment with ASCII art
>> - Add _BYTES suffix for the space to skip for the work around.
>> ---
>> drivers/hwtracing/coresight/coresight-trbe.c | 144 +++++++++++++++++--
>> 1 file changed, 132 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index f569010c672b..983dd5039e52 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -16,6 +16,7 @@
>> #define pr_fmt(fmt) DRVNAME ": " fmt
>>
>> #include <asm/barrier.h>
>> +#include <asm/cpufeature.h>
>> #include <asm/cputype.h>
>>
>> #include "coresight-self-hosted-trace.h"
>> @@ -84,9 +85,17 @@ struct trbe_buf {
>> * per TRBE instance, we keep track of the list of errata that
>> * affects the given instance of the TRBE.
>> */
>> -#define TRBE_ERRATA_MAX 0
>> +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE 0
>> +#define TRBE_ERRATA_MAX 1
>> +
>> +/*
>> + * Safe limit for the number of bytes that may be overwritten
>> + * when the erratum is triggered.
>> + */
>> +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES 256
>
> As mentioned earlier, does it depend on the platform cache line size ?
> Otherwise if the skip bytes is something platform independent, should
> be mentioned here in a comment.
I could add in a comment.
>
>>
>> static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
>> + [TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
>> };
>>
>> /*
>> @@ -519,10 +528,13 @@ static void trbe_enable_hw(struct trbe_buf *buf)
>> set_trbe_limit_pointer_enabled(buf->trbe_limit);
>> }
>>
>> -static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>> +static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle,
>> + u64 trbsr)
>> {
>> int ec = get_trbe_ec(trbsr);
>> int bsc = get_trbe_bsc(trbsr);
>> + struct trbe_buf *buf = etm_perf_sink_config(handle);
>> + struct trbe_cpudata *cpudata = buf->cpudata;
>
> Passing down the perf handle to derive trbe_cpudata seems to be right.
>
>>
>> WARN_ON(is_trbe_running(trbsr));
>> if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))
>> @@ -531,10 +543,16 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>> if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
>> return TRBE_FAULT_ACT_FATAL;
>>
>> - if (is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
>> - if (get_trbe_write_pointer() == get_trbe_base_pointer())
>> - return TRBE_FAULT_ACT_WRAP;
>> - }
>> + /*
>> + * If the trbe is affected by TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
>> + * it might write data after a WRAP event in the fill mode.
>> + * Thus the check TRBPTR == TRBBASER will not be honored.
>> + */
>
> Needs bit formatting/alignment cleanup.
>
>> + if ((is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) &&
>> + (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) ||
>> + get_trbe_write_pointer() == get_trbe_base_pointer()))
>> + return TRBE_FAULT_ACT_WRAP;
>> +
>
> Right, TRBE without the errata should continue to have the write
> pointer = base pointer check. Could all TRBE errata checks like
> the following be shortened (without the workaround index) for
> better readability ? But not something very important.
>
> trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE)
Do you mean something like :
trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) ->
trbe_may_overwrite_in_fill_mode(cpudata) ?
>
>
>> return TRBE_FAULT_ACT_SPURIOUS;
>> }
>>
>> @@ -544,6 +562,8 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>> {
>> u64 write;
>> u64 start_off, end_off;
>> + u64 size;
>> + u64 overwrite_skip = TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
>>
>> /*
>> * If the TRBE has wrapped around the write pointer has
>> @@ -559,7 +579,18 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>>
>> if (WARN_ON_ONCE(end_off < start_off))
>> return 0;
>> - return (end_off - start_off);
>> +
>> + size = end_off - start_off;
>> + /*
>> + * If the TRBE is affected by the following erratum, we must fill
>> + * the space we skipped with IGNORE packets. And we are always
>> + * guaranteed to have at least a PAGE_SIZE space in the buffer.
>> + */
>> + if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) &&
>> + !WARN_ON(size < overwrite_skip))
>> + __trbe_pad_buf(buf, start_off, overwrite_skip);
>> +
>> + return size;
>> }
>>
>> static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
>> @@ -678,7 +709,7 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
>> clr_trbe_irq();
>> isb();
>>
>> - act = trbe_get_fault_act(status);
>> + act = trbe_get_fault_act(handle, status);
>> /*
>> * If this was not due to a WRAP event, we have some
>> * errors and as such buffer is empty.
>> @@ -702,21 +733,95 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
>> return size;
>> }
>>
>> +
>> +static int trbe_apply_work_around_before_enable(struct trbe_buf *buf)
>> +{
>> + /*
>> + * TRBE_WORKAROUND_OVERWRITE_FILL_MODE causes the TRBE to overwrite a few cache
>
> few cache lines = 3 cache lines ?
Yes, upto 3.
>
>> + * line size from the "TRBBASER_EL1" in the event of a "FILL".
>> + * Thus, we could loose some amount of the trace at the base.
>> + *
>> + * Before Fix:
>> + *
>> + * normal-BASE head normal-PTR tail normal-LIMIT
>> + * | \/ /
>> + * -------------------------------------------------------------
>> + * | | |xyzdefghij..|... tuvw| |
>> + * -------------------------------------------------------------
>> + * / | \
>> + * After Fix-> TRBBASER TRBPTR TRBLIMITR.LIMIT
>> + *
>> + * In the normal course of action, we would set the TRBBASER to the
>> + * beginning of the ring-buffer (normal-BASE). But with the erratum,
>> + * the TRBE could overwrite the contents at the "normal-BASE", after
>> + * hitting the "normal-LIMIT", since it doesn't stop as expected. And
>> + * this is wrong. So we must always make sure that the TRBBASER is
>> + * within the region [head, head+size].
>> + *
>> + * Also, we would set the TRBPTR to head (after adjusting for
>> + * alignment) at normal-PTR. This would mean that the last few bytes
>> + * of the trace (say, "xyz") might overwrite the first few bytes of
>> + * trace written ("abc"). More importantly they will appear in what\
>> + * userspace sees as the beginning of the trace, which is wrong. We may
>> + * not always have space to move the latest trace "xyz" to the correct
>> + * order as it must appear beyond the LIMIT. (i.e, [head..head+size].
>> + * Thus it is easier to ignore those bytes than to complicate the
>> + * driver to move it, assuming that the erratum was triggered and doing
>> + * additional checks to see if there is indeed allowed space at
>> + * TRBLIMITR.LIMIT.
>> + *
>> + * To summarize, with the work around:
>> + *
>> + * - We always align the offset for the next session to PAGE_SIZE
>> + * (This is to ensure we can program the TRBBASER to this offset
>> + * within the region [head...head+size]).
>> + *
>> + * - At TRBE enable:
>> + * - Set the TRBBASER to the page aligned offset of the current
>> + * proposed write offset. (which is guaranteed to be aligned
>> + * as above)
>> + * - Move the TRBPTR to skip first 256bytes (that might be
>> + * overwritten with the erratum). This ensures that the trace
>> + * generated in the session is not re-written.
>> + *
>> + * - At trace collection:
>> + * - Pad the 256bytes skipped above again with IGNORE packets.
>> + */
>> + if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE)) {
>> + if (WARN_ON(!IS_ALIGNED(buf->trbe_write, PAGE_SIZE)))
>> + return -EINVAL;
>> + buf->trbe_hw_base = buf->trbe_write;
>> + buf->trbe_write += TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> static int __arm_trbe_enable(struct trbe_buf *buf,
>> struct perf_output_handle *handle)
>> {
>> + int ret = 0;
>> +
>> perf_aux_output_flag(handle, PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW);
>> buf->trbe_limit = compute_trbe_buffer_limit(handle);
>> buf->trbe_write = buf->trbe_base + PERF_IDX2OFF(handle->head, buf);
>> if (buf->trbe_limit == buf->trbe_base) {
>> - trbe_stop_and_truncate_event(handle);
>> - return -ENOSPC;
>> + ret = -ENOSPC;
>> + goto err;
>> }
>> /* Set the base of the TRBE to the buffer base */
>> buf->trbe_hw_base = buf->trbe_base;
>> +
>> + ret = trbe_apply_work_around_before_enable(buf);
>> + if (ret)
>> + goto err;
>> +
>> *this_cpu_ptr(buf->cpudata->drvdata->handle) = handle;
>> trbe_enable_hw(buf);
>> return 0;
>> +err:
>> + trbe_stop_and_truncate_event(handle);
>> + return ret;
>> }
>>
>> static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data)
>> @@ -860,7 +965,7 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev)
>> if (!is_perf_trbe(handle))
>> return IRQ_NONE;
>>
>> - act = trbe_get_fault_act(status);
>> + act = trbe_get_fault_act(handle, status);
>> switch (act) {
>> case TRBE_FAULT_ACT_WRAP:
>> truncated = !!trbe_handle_overflow(handle);
>> @@ -1000,7 +1105,22 @@ static void arm_trbe_probe_cpu(void *info)
>> }
>>
>> trbe_check_errata(cpudata);
>> - cpudata->trbe_align = cpudata->trbe_hw_align;
>> + /*
>> + * If the TRBE is affected by erratum TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
>> + * we must always program the TBRPTR_EL1, 256bytes from a page
>> + * boundary, with TRBBASER_EL1 set to the page, to prevent
>> + * TRBE over-writing 256bytes at TRBBASER_EL1 on FILL event.
>> + *
>> + * Thus make sure we always align our write pointer to a PAGE_SIZE,
>> + * which also guarantees that we have at least a PAGE_SIZE space in
>> + * the buffer (TRBLIMITR is PAGE aligned) and thus we can skip
>> + * the required bytes at the base.
>> + */
>> + if (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE))
>> + cpudata->trbe_align = PAGE_SIZE;
>> + else
>> + cpudata->trbe_align = cpudata->trbe_hw_align;
>> +
>
> But like trbe_apply_work_around_before_enable(), trbe_align assignment
> should also be wrapped inside a new helper which should contain these
> comments and conditional block. Because it makes sense to have errata
> work arounds in the leaf level helper functions, rather than TRBE core
> operations.
That would imply we re-initialize the trbe_align in the new helper after
setting the value here for all other unaffected TRBEs. I would rather
leave it as it is, until we have more work arounds that touch this area.
This is one of code called per TRBE instance.
Suzuki
On 23/09/2021 04:15, Anshuman Khandual wrote:
>
>
> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>> TRBE implementations affected by Arm erratum (2253138 or 2224489), could
>> write to the next address after the TRBLIMITR.LIMIT, instead of wrapping
>> to the TRBBASER. This implies that the TRBE could potentially corrupt :
>>
>> - A page used by the rest of the kernel/user (if the LIMIT = end of
>> perf ring buffer)
>> - A page within the ring buffer, but outside the driver's range.
>> [head, head + size]. This may contain some trace data, may be
>> consumed by the userspace.
>>
>> We workaround this erratum by :
>> - Making sure that there is at least an extra PAGE space left in the
>> TRBE's range than we normally assign. This will be additional to other
>> restrictions (e.g, the TRBE alignment for working around
>> TRBE_WORKAROUND_OVERWRITE_IN_FILL_MODE, where there is a minimum of PAGE_SIZE.
>> Thus we would have 2 * PAGE_SIZE)
>>
>> - Adjust the LIMIT to leave the last PAGE_SIZE out of the TRBE's allowed
>> range (i.e, TRBEBASER...TRBLIMITR.LIMIT), by :
>>
>> TRBLIMITR.LIMIT -= PAGE_SIZE
>>
>> Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
>> Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
>> Cc: Mike Leach <mike.leach(a)linaro.org>
>> Cc: Leo Yan <leo.yan(a)linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
>> ---
>> drivers/hwtracing/coresight/coresight-trbe.c | 59 +++++++++++++++++++-
>> 1 file changed, 57 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index 02f9e00e2091..ea907345354c 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -86,7 +86,8 @@ struct trbe_buf {
>> * affects the given instance of the TRBE.
>> */
>> #define TRBE_WORKAROUND_OVERWRITE_FILL_MODE 0
>> -#define TRBE_ERRATA_MAX 1
>> +#define TRBE_WORKAROUND_WRITE_OUT_OF_RANGE 1
>> +#define TRBE_ERRATA_MAX 2
>>
>> /*
>> * Safe limit for the number of bytes that may be overwritten
>> @@ -96,6 +97,7 @@ struct trbe_buf {
>>
>> static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
>> [TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
>> + [TRBE_WORKAROUND_WRITE_OUT_OF_RANGE] = ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE,
>> };
>>
>> /*
>> @@ -279,7 +281,20 @@ trbe_handle_to_cpudata(struct perf_output_handle *handle)
>>
>> static u64 trbe_min_trace_buf_size(struct perf_output_handle *handle)
>> {
>> - return TRBE_TRACE_MIN_BUF_SIZE;
>> + u64 size = TRBE_TRACE_MIN_BUF_SIZE;
>> + struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
>> +
>> + /*
>> + * When the TRBE is affected by an erratum that could make it
>> + * write to the next "virtually addressed" page beyond the LIMIT.
>
> What if the next "virtually addressed" page is just blocked from future
> usage in the kernel and never really gets mapped into a physical page ?
That is the case today for vmap(), the end of the vm_area has a guard
page. But that implies when the erratum is triggered, the TRBE
encounters a fault and we need to handle that in the driver. This works
for "end" of the ring buffer. But not when the LIMIT is in the middle
of the ring buffer.
> In that case it would be guaranteed that, a next "virtually addressed"
> page would not even exist after the LIMIT pointer and hence the errata
> would not be triggered. Something like there is a virtual mapping cliff
> right after the LIMIT pointer from the MMU perspective.
>
> Although it might be bit tricky. Currently the entire ring buffer gets
> mapped at once with vmap() in arm_trbe_alloc_buffer(). Just to achieve
> the above solution, each computation of the LIMIT pointer needs to be
> followed by a temporary unmapping of next virtual page from existing
> vmap() buffer. Subsequently it could be mapped back as trbe_buf->pages
> always contains all the physical pages from the perf ring buffer.
It is much easier to leave a page aside than to do this map, unmap
dance, which might even change the VA address you get and thus it
complicates the TRBE driver in general. I believe this is much
simpler and we can reason about the code better. And all faults
are still illegal for the driver, which helps us to detect any
other issues in the TRBE.
Suzuki
On 24/09/2021 11:06, Tao Zhang wrote:
> Add ETM PID for Kryo-5XX to the list of supported ETMs.
> Otherwise, Kryo-5XX ETMs will not be initialized successfully.
> e.g.
> This change can be verified on qrb5165-rb5 board. ETM4-ETM7 nodes
> will not be visible without this change.
>
> Signed-off-by: Tao Zhang <quic_taozha(a)quicinc.com>
I have queued this one.
Suzuki
Hi Tao
On 24/09/2021 11:06, Tao Zhang wrote:
> Add the basic coresight components found on Qualcomm SM8250 Soc. The
> basic coresight components include ETF, ETMs,STM and the related
> funnels.
>
> Signed-off-by: Tao Zhang <quic_taozha(a)quicinc.com>
Acked-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
PS: This patch must go via the Qcom DT maintainers. I would
recommend sending this to the following people, so that it
can be queued.
$ scripts/get_maintainer.pl arch/arm64/boot/dts/qcom/qrb5165-rb5.dts
Andy Gross <agross(a)kernel.org> (maintainer:ARM/QUALCOMM SUPPORT)
Bjorn Andersson <bjorn.andersson(a)linaro.org> (maintainer:ARM/QUALCOMM
SUPPORT)
Rob Herring <robh+dt(a)kernel.org> (maintainer:OPEN FIRMWARE AND FLATTENED
DEVICE TREE BINDINGS)
linux-arm-msm(a)vger.kernel.org (open list:ARM/QUALCOMM SUPPORT)
devicetree(a)vger.kernel.org (open list:OPEN FIRMWARE AND FLATTENED DEVICE
TREE BINDINGS)
linux-kernel(a)vger.kernel.org (open list)
Kind regards
Suzuki
Here is the updated patches from the v3 of the series [0], parts of which
have been queued to coresight/next. This series now applies on the
coresight/next branch.
Changes since v3:
- Split the spurious IRQ handling patch to :
a) coresight: trbe: irq handler: Do not disable TRBE if no action is needed
b) coresight: trbe: Fix handling of spurious interrupts
- Added a helper to mark the ring buffer when there is WRAP event
and added a comment to explain.
[0] https://lkml.kernel.org/r/20210914102641.1852544-1-suzuki.poulose@arm.com
Suzuki K Poulose (5):
coresight: trbe: irq handler: Do not disable TRBE if no action is needed
coresight: trbe: Fix handling of spurious interrupts
coresight: trbe: Do not truncate buffer on IRQ
coresight: trbe: End the AUX handle on truncation
coresight: trbe: Prohibit trace before disabling TRBE
.../coresight/coresight-self-hosted-trace.h | 4 +-
drivers/hwtracing/coresight/coresight-trbe.c | 96 ++++++++++++-------
2 files changed, 64 insertions(+), 36 deletions(-)
--
2.24.1
On 23/09/2021 10:24, Tao Zhang wrote:
> On Tue, Sep 21, 2021 at 05:35:37PM +0100, Suzuki K Poulose wrote:
>> Hi Tao
>>
>> Are there no sinks at all on this platform ? I had this question on the
>> previous series. How is CoreSight useful on this platform otherwise ?
>>
>> On 13/09/2021 07:40, Tao Zhang wrote:
> ETF/ETR are the sinks on this target. And I have added the ETF to this
> device tree file. Since the ETR needs SMMU support on this target and
> SMMU has not been supported for now. I will add the ETR to device tree
> later if the SMMU is ready for this platform.
Thanks. That is fine. Btw, these sort of additional information could be
added to the cover letter to give a better picture of what you are
trying to do and why.
>>> Add the basic coresight components found on Qualcomm SM8250 Soc. The
>>> basic coresight components include ETF, ETMs,STM and the related
>>> funnels.
>>>
>>> Signed-off-by: Tao Zhang <quic_taozha(a)quicinc.com>
>>> ---
>>> arch/arm64/boot/dts/qcom/qrb5165-rb5.dts | 442 ++++++++++++++++++++++-
>>> 1 file changed, 438 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/arm64/boot/dts/qcom/qrb5165-rb5.dts b/arch/arm64/boot/dts/qcom/qrb5165-rb5.dts
>>> index 8ac96f8e79d4..9c8f87d80afc 100644
>>> --- a/arch/arm64/boot/dts/qcom/qrb5165-rb5.dts
>>> +++ b/arch/arm64/boot/dts/qcom/qrb5165-rb5.dts
>>> @@ -222,11 +222,445 @@
>>> regulator-max-microvolt = <1800000>;
>>> regulator-always-on;
>>> };
>>> -};
>>> -&adsp {
>>> - status = "okay";
>>> - firmware-name = "qcom/sm8250/adsp.mbn";
>>
>> Unrelated change ? Please keep it separate from the CoreSight changes.
>>
>> Suzuki
> I combined this change and ETM pid change into one seies because the ETM
> pid change validation needs ETM support. If there is no ETM
> configuration in the device tree, ETM pid change can not be verified.
> Do you think it would be better to separate them? Do I need to resubmit
> to separate them into two separate patches?
No, I am asking about the lines removed above. i.e,
-&adsp {
- status = "okay";
- firmware-name = "qcom/sm8250/adsp.mbn";
It doesn't seem to be added back in the patch. So that means, the DT
lost those lines, without any mention of that in the description of
the patch. Moreover, the lines do not look like they were anything to
do with CoreSight. This is why I mentioned they look like "unrelated".
Suzuki
On 23/09/2021 10:16, Tao Zhang wrote:
> On Tue, Sep 21, 2021 at 05:31:34PM +0100, Suzuki K Poulose wrote:
>> Hi Tao
>>
>> On 13/09/2021 07:40, Tao Zhang wrote:
>>> This series adds Coresight support for SM8250 Soc on RB5 board.
>>> It is composed of two elements.
>>> a) Add ETM PID for Kryo-5XX.
>>> b) Add coresight support to DTS for RB5.
>>>
>>> This series applies to coresight/next
>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>>>
>>
>> Please could you mention what has changed since the previous version
>> in the cover letter ?
>>
>> Kind regards
>> Suzuki
>>
> The version 2 of the series add more comments "Cortex-A77" for ETM pid.
> Do I need to rewrite the cover letter and then resubmit it for review?
No need to resubmit the series for reveiwing the patch. Ultimately
patches are what matters. But it is a good practise to keep the
changelog in the cover letter for a revision of a series. That helps
the reviewers to understand what has changed and we all are able to
spend our time efficiently. So, please follow that if possible.
Thanks
Suzuki
This series fixes the following issues with the TRBE and Self-Hosted
trace for CoreSight.
The Self-hosted trace filter control registers are now save-restored
across CPU PM event. And more importantly the Trace Filtering is now
used to control per ETM session (rather than allowing the trace
throughout the life time of the system). i.e, ETM configuration of
the given run is used to enforce trace filtering (TRFCR) along with the
Trace Exclusion controls in TRCVICTLR.
For the TRBE, we were using the TRUNCATED flag in the AUX buffer on
every IRQ to indicate that we may have lost a few bytes of trace. But
this causes the event to be disabled until the userspace re-enables
it back, even when there is space left in the ring buffer. To make
things worse, we were restarting the AUX handle, which would soon
be disabled, potentially creating 0 sized records (without truncation),
which the perf tool tends to ignore. This might cause the event to be
disabled permanently. Also, sometimes we leave the buffer TRUNCATED,
but delay the closing of the handle to event schedule out, which could
cause significant black out in the trace capture. This was reported
by Tamas Zsoldos.
This series removes the use of TRUNCATED flag for every IRQ. Instead,
uses the COLLISION flag to indicate that there was a WRAP event in the
trace collection. A patch to report the COLLISIONs has been queued for
the perf tool. The TRUNCATED is only used if we really run out of space
in the buffer. And also we make sure the "handle" is closed immediately
on TRUNCATED case, which triggers the userspace to take action. The core
perf layer has been hardened to handle this case where a "handle" is
closed out.
Finally, we make sure that the CPU trace is prohibited, when the TRBE
is left disabled. The ETE/ETM driver will program the Trace Filtering
appropriately since we do this dynamically now with the first half
of the series.
Applies on coresigh/next.
The tree is also available here:
https://gitlab.arm.com/linux-arm/linux-skp/-/tree/coresight/trbe-self-hoste…
Changes since v2 [1]:
- Address comments from Anshuman
- Perf tool patch queued, to report COLLISIONs in a session.
Changes since v1 [0]:
- Moved TRFCR related accessors to a new header file
- Following a discussion, dropped the TRUNCATED flag from
the TRBE IRQ handler on WRAP. Instead mark COLLISION.
- Added new patches to harden the ETM perf layer to handle
an error in the sink driver.
- Fix TRBE spurious IRQ handling
- Cleanup TRBE driver to make the "TRUNCATE" cases managed
at a central place.
[0] https://lkml.kernel.org/r/20210712113830.2803257-1-suzuki.poulose@arm.com
[1] https://lkml.kernel.org/r/20210723124611.3828908-1-suzuki.poulose@arm.com
Suzuki K Poulose (10):
coresight: etm4x: Save restore TRFCR_EL1
coresight: etm4x: Use Trace Filtering controls dynamically
coresight: etm-pmu: Ensure the AUX handle is valid
coresight: trbe: Ensure the format flag is set always
coresight: trbe: Drop duplicate TRUNCATE flags
coresight: trbe: Fix handling of spurious interrupts
coresight: trbe: Do not truncate buffer on IRQ
coresight: trbe: Unify the enabling sequence
coresight: trbe: End the AUX handle on truncation
coresight: trbe: Prohibit trace before disabling TRBE
.../hwtracing/coresight/coresight-etm-perf.c | 27 ++++-
.../coresight/coresight-etm4x-core.c | 100 ++++++++++++----
drivers/hwtracing/coresight/coresight-etm4x.h | 9 +-
.../coresight/coresight-self-hosted-trace.h | 34 ++++++
drivers/hwtracing/coresight/coresight-trbe.c | 111 ++++++++++--------
5 files changed, 200 insertions(+), 81 deletions(-)
create mode 100644 drivers/hwtracing/coresight/coresight-self-hosted-trace.h
--
2.24.1
On Wed, Sep 15, 2021 at 10:56:17AM +0530, Anshuman Khandual wrote:
>
>
> On 9/14/21 3:56 PM, Suzuki K Poulose wrote:
> > We mark the buffer as TRUNCATED when there is no space left
> > in the buffer. But we do it at different points.
> > __trbe_normal_offset()
> > and also, at all the callers of the above function via
> > compute_trbe_buffer_limit(), when the limit == base (i.e
> > offset = 0 as returned by the __trbe_normal_offset()).
> >
> > So, given that the callers already mark the buffer as TRUNCATED
> > drop the caller inside the __trbe_normal_offset().
> >
> > This is in preparation to moving the handling of TRUNCATED
> > into a central place.
> >
> > Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
> > Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
> > Cc: Mike Leach <mike.leach(a)linaro.org>
> > Cc: Leo Yan <leo.yan(a)linaro.org>
> > Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
>
> Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
>
> A small nit though.
>
> > ---
> > Change since v2:
> > - Added the flag for trbe_handle_spurious() - Anshuman
> > ---
> > drivers/hwtracing/coresight/coresight-trbe.c | 8 ++------
> > 1 file changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> > index 81bf183a73a1..5297b11f26b7 100644
> > --- a/drivers/hwtracing/coresight/coresight-trbe.c
> > +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> > @@ -252,13 +252,9 @@ static unsigned long __trbe_normal_offset(struct perf_output_handle *handle)
> > * trbe_base trbe_base + nr_pages
> > *
> > * Perf aux buffer does not have any space for the driver to write into.
> > - * Just communicate trace truncation event to the user space by marking
> > - * it with PERF_AUX_FLAG_TRUNCATED.
>
> Should not this comment be moved, where PERF_AUX_FLAG_TRUNCATED
> gets marked after this change ?
I moved the comment and applied this patch.
>
> > */
> > - if (!handle->size) {
> > - perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> > + if (!handle->size)
> > return 0;
> > - }
> >
> > /* Compute the tail and wakeup indices now that we've aligned head */
> > tail = PERF_IDX2OFF(handle->head + handle->size, buf);
> > @@ -360,7 +356,6 @@ static unsigned long __trbe_normal_offset(struct perf_output_handle *handle)
> > return limit;
> >
> > trbe_pad_buf(handle, handle->size);
> > - perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> > return 0;
> > }
> >
> > @@ -688,6 +683,7 @@ static void trbe_handle_spurious(struct perf_output_handle *handle)
> > buf->trbe_write = buf->trbe_base + PERF_IDX2OFF(handle->head, buf);
> > if (buf->trbe_limit == buf->trbe_base) {
> > trbe_drain_and_disable_local();
> > + perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> > return;
> > }
> > trbe_enable_hw(buf);
> >