On 23/09/2021 04:15, Anshuman Khandual wrote:
>
>
> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>> TRBE implementations affected by Arm erratum (2253138 or 2224489), could
>> write to the next address after the TRBLIMITR.LIMIT, instead of wrapping
>> to the TRBBASER. This implies that the TRBE could potentially corrupt :
>>
>> - A page used by the rest of the kernel/user (if the LIMIT = end of
>> perf ring buffer)
>> - A page within the ring buffer, but outside the driver's range.
>> [head, head + size]. This may contain some trace data, may be
>> consumed by the userspace.
>>
>> We workaround this erratum by :
>> - Making sure that there is at least an extra PAGE space left in the
>> TRBE's range than we normally assign. This will be additional to other
>> restrictions (e.g, the TRBE alignment for working around
>> TRBE_WORKAROUND_OVERWRITE_IN_FILL_MODE, where there is a minimum of PAGE_SIZE.
>> Thus we would have 2 * PAGE_SIZE)
>>
>> - Adjust the LIMIT to leave the last PAGE_SIZE out of the TRBE's allowed
>> range (i.e, TRBEBASER...TRBLIMITR.LIMIT), by :
>>
>> TRBLIMITR.LIMIT -= PAGE_SIZE
>>
>> Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
>> Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
>> Cc: Mike Leach <mike.leach(a)linaro.org>
>> Cc: Leo Yan <leo.yan(a)linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
>> ---
>> drivers/hwtracing/coresight/coresight-trbe.c | 59 +++++++++++++++++++-
>> 1 file changed, 57 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index 02f9e00e2091..ea907345354c 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -86,7 +86,8 @@ struct trbe_buf {
>> * affects the given instance of the TRBE.
>> */
>> #define TRBE_WORKAROUND_OVERWRITE_FILL_MODE 0
>> -#define TRBE_ERRATA_MAX 1
>> +#define TRBE_WORKAROUND_WRITE_OUT_OF_RANGE 1
>> +#define TRBE_ERRATA_MAX 2
>>
>> /*
>> * Safe limit for the number of bytes that may be overwritten
>> @@ -96,6 +97,7 @@ struct trbe_buf {
>>
>> static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
>> [TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
>> + [TRBE_WORKAROUND_WRITE_OUT_OF_RANGE] = ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE,
>> };
>>
>> /*
>> @@ -279,7 +281,20 @@ trbe_handle_to_cpudata(struct perf_output_handle *handle)
>>
>> static u64 trbe_min_trace_buf_size(struct perf_output_handle *handle)
>> {
>> - return TRBE_TRACE_MIN_BUF_SIZE;
>> + u64 size = TRBE_TRACE_MIN_BUF_SIZE;
>> + struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
>> +
>> + /*
>> + * When the TRBE is affected by an erratum that could make it
>> + * write to the next "virtually addressed" page beyond the LIMIT.
>
> What if the next "virtually addressed" page is just blocked from future
> usage in the kernel and never really gets mapped into a physical page ?
That is the case today for vmap(), the end of the vm_area has a guard
page. But that implies when the erratum is triggered, the TRBE
encounters a fault and we need to handle that in the driver. This works
for "end" of the ring buffer. But not when the LIMIT is in the middle
of the ring buffer.
> In that case it would be guaranteed that, a next "virtually addressed"
> page would not even exist after the LIMIT pointer and hence the errata
> would not be triggered. Something like there is a virtual mapping cliff
> right after the LIMIT pointer from the MMU perspective.
>
> Although it might be bit tricky. Currently the entire ring buffer gets
> mapped at once with vmap() in arm_trbe_alloc_buffer(). Just to achieve
> the above solution, each computation of the LIMIT pointer needs to be
> followed by a temporary unmapping of next virtual page from existing
> vmap() buffer. Subsequently it could be mapped back as trbe_buf->pages
> always contains all the physical pages from the perf ring buffer.
It is much easier to leave a page aside than to do this map, unmap
dance, which might even change the VA address you get and thus it
complicates the TRBE driver in general. I believe this is much
simpler and we can reason about the code better. And all faults
are still illegal for the driver, which helps us to detect any
other issues in the TRBE.
Suzuki
On 24/09/2021 11:06, Tao Zhang wrote:
> Add ETM PID for Kryo-5XX to the list of supported ETMs.
> Otherwise, Kryo-5XX ETMs will not be initialized successfully.
> e.g.
> This change can be verified on qrb5165-rb5 board. ETM4-ETM7 nodes
> will not be visible without this change.
>
> Signed-off-by: Tao Zhang <quic_taozha(a)quicinc.com>
I have queued this one.
Suzuki
Hi Tao
On 24/09/2021 11:06, Tao Zhang wrote:
> Add the basic coresight components found on Qualcomm SM8250 Soc. The
> basic coresight components include ETF, ETMs,STM and the related
> funnels.
>
> Signed-off-by: Tao Zhang <quic_taozha(a)quicinc.com>
Acked-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
PS: This patch must go via the Qcom DT maintainers. I would
recommend sending this to the following people, so that it
can be queued.
$ scripts/get_maintainer.pl arch/arm64/boot/dts/qcom/qrb5165-rb5.dts
Andy Gross <agross(a)kernel.org> (maintainer:ARM/QUALCOMM SUPPORT)
Bjorn Andersson <bjorn.andersson(a)linaro.org> (maintainer:ARM/QUALCOMM
SUPPORT)
Rob Herring <robh+dt(a)kernel.org> (maintainer:OPEN FIRMWARE AND FLATTENED
DEVICE TREE BINDINGS)
linux-arm-msm(a)vger.kernel.org (open list:ARM/QUALCOMM SUPPORT)
devicetree(a)vger.kernel.org (open list:OPEN FIRMWARE AND FLATTENED DEVICE
TREE BINDINGS)
linux-kernel(a)vger.kernel.org (open list)
Kind regards
Suzuki
Here is the updated patches from the v3 of the series [0], parts of which
have been queued to coresight/next. This series now applies on the
coresight/next branch.
Changes since v3:
- Split the spurious IRQ handling patch to :
a) coresight: trbe: irq handler: Do not disable TRBE if no action is needed
b) coresight: trbe: Fix handling of spurious interrupts
- Added a helper to mark the ring buffer when there is WRAP event
and added a comment to explain.
[0] https://lkml.kernel.org/r/20210914102641.1852544-1-suzuki.poulose@arm.com
Suzuki K Poulose (5):
coresight: trbe: irq handler: Do not disable TRBE if no action is needed
coresight: trbe: Fix handling of spurious interrupts
coresight: trbe: Do not truncate buffer on IRQ
coresight: trbe: End the AUX handle on truncation
coresight: trbe: Prohibit trace before disabling TRBE
.../coresight/coresight-self-hosted-trace.h | 4 +-
drivers/hwtracing/coresight/coresight-trbe.c | 96 ++++++++++++-------
2 files changed, 64 insertions(+), 36 deletions(-)
--
2.24.1
On 23/09/2021 10:24, Tao Zhang wrote:
> On Tue, Sep 21, 2021 at 05:35:37PM +0100, Suzuki K Poulose wrote:
>> Hi Tao
>>
>> Are there no sinks at all on this platform ? I had this question on the
>> previous series. How is CoreSight useful on this platform otherwise ?
>>
>> On 13/09/2021 07:40, Tao Zhang wrote:
> ETF/ETR are the sinks on this target. And I have added the ETF to this
> device tree file. Since the ETR needs SMMU support on this target and
> SMMU has not been supported for now. I will add the ETR to device tree
> later if the SMMU is ready for this platform.
Thanks. That is fine. Btw, these sort of additional information could be
added to the cover letter to give a better picture of what you are
trying to do and why.
>>> Add the basic coresight components found on Qualcomm SM8250 Soc. The
>>> basic coresight components include ETF, ETMs,STM and the related
>>> funnels.
>>>
>>> Signed-off-by: Tao Zhang <quic_taozha(a)quicinc.com>
>>> ---
>>> arch/arm64/boot/dts/qcom/qrb5165-rb5.dts | 442 ++++++++++++++++++++++-
>>> 1 file changed, 438 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/arm64/boot/dts/qcom/qrb5165-rb5.dts b/arch/arm64/boot/dts/qcom/qrb5165-rb5.dts
>>> index 8ac96f8e79d4..9c8f87d80afc 100644
>>> --- a/arch/arm64/boot/dts/qcom/qrb5165-rb5.dts
>>> +++ b/arch/arm64/boot/dts/qcom/qrb5165-rb5.dts
>>> @@ -222,11 +222,445 @@
>>> regulator-max-microvolt = <1800000>;
>>> regulator-always-on;
>>> };
>>> -};
>>> -&adsp {
>>> - status = "okay";
>>> - firmware-name = "qcom/sm8250/adsp.mbn";
>>
>> Unrelated change ? Please keep it separate from the CoreSight changes.
>>
>> Suzuki
> I combined this change and ETM pid change into one seies because the ETM
> pid change validation needs ETM support. If there is no ETM
> configuration in the device tree, ETM pid change can not be verified.
> Do you think it would be better to separate them? Do I need to resubmit
> to separate them into two separate patches?
No, I am asking about the lines removed above. i.e,
-&adsp {
- status = "okay";
- firmware-name = "qcom/sm8250/adsp.mbn";
It doesn't seem to be added back in the patch. So that means, the DT
lost those lines, without any mention of that in the description of
the patch. Moreover, the lines do not look like they were anything to
do with CoreSight. This is why I mentioned they look like "unrelated".
Suzuki
On 23/09/2021 10:16, Tao Zhang wrote:
> On Tue, Sep 21, 2021 at 05:31:34PM +0100, Suzuki K Poulose wrote:
>> Hi Tao
>>
>> On 13/09/2021 07:40, Tao Zhang wrote:
>>> This series adds Coresight support for SM8250 Soc on RB5 board.
>>> It is composed of two elements.
>>> a) Add ETM PID for Kryo-5XX.
>>> b) Add coresight support to DTS for RB5.
>>>
>>> This series applies to coresight/next
>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>>>
>>
>> Please could you mention what has changed since the previous version
>> in the cover letter ?
>>
>> Kind regards
>> Suzuki
>>
> The version 2 of the series add more comments "Cortex-A77" for ETM pid.
> Do I need to rewrite the cover letter and then resubmit it for review?
No need to resubmit the series for reveiwing the patch. Ultimately
patches are what matters. But it is a good practise to keep the
changelog in the cover letter for a revision of a series. That helps
the reviewers to understand what has changed and we all are able to
spend our time efficiently. So, please follow that if possible.
Thanks
Suzuki
This series fixes the following issues with the TRBE and Self-Hosted
trace for CoreSight.
The Self-hosted trace filter control registers are now save-restored
across CPU PM event. And more importantly the Trace Filtering is now
used to control per ETM session (rather than allowing the trace
throughout the life time of the system). i.e, ETM configuration of
the given run is used to enforce trace filtering (TRFCR) along with the
Trace Exclusion controls in TRCVICTLR.
For the TRBE, we were using the TRUNCATED flag in the AUX buffer on
every IRQ to indicate that we may have lost a few bytes of trace. But
this causes the event to be disabled until the userspace re-enables
it back, even when there is space left in the ring buffer. To make
things worse, we were restarting the AUX handle, which would soon
be disabled, potentially creating 0 sized records (without truncation),
which the perf tool tends to ignore. This might cause the event to be
disabled permanently. Also, sometimes we leave the buffer TRUNCATED,
but delay the closing of the handle to event schedule out, which could
cause significant black out in the trace capture. This was reported
by Tamas Zsoldos.
This series removes the use of TRUNCATED flag for every IRQ. Instead,
uses the COLLISION flag to indicate that there was a WRAP event in the
trace collection. A patch to report the COLLISIONs has been queued for
the perf tool. The TRUNCATED is only used if we really run out of space
in the buffer. And also we make sure the "handle" is closed immediately
on TRUNCATED case, which triggers the userspace to take action. The core
perf layer has been hardened to handle this case where a "handle" is
closed out.
Finally, we make sure that the CPU trace is prohibited, when the TRBE
is left disabled. The ETE/ETM driver will program the Trace Filtering
appropriately since we do this dynamically now with the first half
of the series.
Applies on coresigh/next.
The tree is also available here:
https://gitlab.arm.com/linux-arm/linux-skp/-/tree/coresight/trbe-self-hoste…
Changes since v2 [1]:
- Address comments from Anshuman
- Perf tool patch queued, to report COLLISIONs in a session.
Changes since v1 [0]:
- Moved TRFCR related accessors to a new header file
- Following a discussion, dropped the TRUNCATED flag from
the TRBE IRQ handler on WRAP. Instead mark COLLISION.
- Added new patches to harden the ETM perf layer to handle
an error in the sink driver.
- Fix TRBE spurious IRQ handling
- Cleanup TRBE driver to make the "TRUNCATE" cases managed
at a central place.
[0] https://lkml.kernel.org/r/20210712113830.2803257-1-suzuki.poulose@arm.com
[1] https://lkml.kernel.org/r/20210723124611.3828908-1-suzuki.poulose@arm.com
Suzuki K Poulose (10):
coresight: etm4x: Save restore TRFCR_EL1
coresight: etm4x: Use Trace Filtering controls dynamically
coresight: etm-pmu: Ensure the AUX handle is valid
coresight: trbe: Ensure the format flag is set always
coresight: trbe: Drop duplicate TRUNCATE flags
coresight: trbe: Fix handling of spurious interrupts
coresight: trbe: Do not truncate buffer on IRQ
coresight: trbe: Unify the enabling sequence
coresight: trbe: End the AUX handle on truncation
coresight: trbe: Prohibit trace before disabling TRBE
.../hwtracing/coresight/coresight-etm-perf.c | 27 ++++-
.../coresight/coresight-etm4x-core.c | 100 ++++++++++++----
drivers/hwtracing/coresight/coresight-etm4x.h | 9 +-
.../coresight/coresight-self-hosted-trace.h | 34 ++++++
drivers/hwtracing/coresight/coresight-trbe.c | 111 ++++++++++--------
5 files changed, 200 insertions(+), 81 deletions(-)
create mode 100644 drivers/hwtracing/coresight/coresight-self-hosted-trace.h
--
2.24.1
On Wed, Sep 15, 2021 at 10:56:17AM +0530, Anshuman Khandual wrote:
>
>
> On 9/14/21 3:56 PM, Suzuki K Poulose wrote:
> > We mark the buffer as TRUNCATED when there is no space left
> > in the buffer. But we do it at different points.
> > __trbe_normal_offset()
> > and also, at all the callers of the above function via
> > compute_trbe_buffer_limit(), when the limit == base (i.e
> > offset = 0 as returned by the __trbe_normal_offset()).
> >
> > So, given that the callers already mark the buffer as TRUNCATED
> > drop the caller inside the __trbe_normal_offset().
> >
> > This is in preparation to moving the handling of TRUNCATED
> > into a central place.
> >
> > Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
> > Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
> > Cc: Mike Leach <mike.leach(a)linaro.org>
> > Cc: Leo Yan <leo.yan(a)linaro.org>
> > Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
>
> Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
>
> A small nit though.
>
> > ---
> > Change since v2:
> > - Added the flag for trbe_handle_spurious() - Anshuman
> > ---
> > drivers/hwtracing/coresight/coresight-trbe.c | 8 ++------
> > 1 file changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> > index 81bf183a73a1..5297b11f26b7 100644
> > --- a/drivers/hwtracing/coresight/coresight-trbe.c
> > +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> > @@ -252,13 +252,9 @@ static unsigned long __trbe_normal_offset(struct perf_output_handle *handle)
> > * trbe_base trbe_base + nr_pages
> > *
> > * Perf aux buffer does not have any space for the driver to write into.
> > - * Just communicate trace truncation event to the user space by marking
> > - * it with PERF_AUX_FLAG_TRUNCATED.
>
> Should not this comment be moved, where PERF_AUX_FLAG_TRUNCATED
> gets marked after this change ?
I moved the comment and applied this patch.
>
> > */
> > - if (!handle->size) {
> > - perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> > + if (!handle->size)
> > return 0;
> > - }
> >
> > /* Compute the tail and wakeup indices now that we've aligned head */
> > tail = PERF_IDX2OFF(handle->head + handle->size, buf);
> > @@ -360,7 +356,6 @@ static unsigned long __trbe_normal_offset(struct perf_output_handle *handle)
> > return limit;
> >
> > trbe_pad_buf(handle, handle->size);
> > - perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> > return 0;
> > }
> >
> > @@ -688,6 +683,7 @@ static void trbe_handle_spurious(struct perf_output_handle *handle)
> > buf->trbe_write = buf->trbe_base + PERF_IDX2OFF(handle->head, buf);
> > if (buf->trbe_limit == buf->trbe_base) {
> > trbe_drain_and_disable_local();
> > + perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> > return;
> > }
> > trbe_enable_hw(buf);
> >
Changes since v1:
* Replace goto with return after failure to maintain original failure behaviour
* Add Mathieu and Suzuki's reviewed-by tags
* Repeated testing
James Clark (1):
coresight: Don't immediately close events that are run on invalid
CPU/sink combos
.../hwtracing/coresight/coresight-etm-perf.c | 29 +++++++++++++++----
1 file changed, 23 insertions(+), 6 deletions(-)
--
2.28.0
When a traced process runs on a CPU that can't reach the selected sink,
the event will be stopped with PERF_HES_STOPPED. This means that even if
the process migrates to a valid CPU, tracing will not resume.
This can be reproduced (on N1SDP) by using taskset to start the process
on CPU 0, and then switching it to CPU 2 (ETF 1 is only reachable from
CPU 2):
taskset --cpu-list 0 ./perf record -e cs_etm/@tmc_etf1/ --per-thread -- taskset --cpu-list 2 ls
This produces a single 0 length AUX record, and then no more trace:
0x3c8 [0x30]: PERF_RECORD_AUX offset: 0 size: 0 flags: 0x1 [T]
After the fix, the same command produces normal AUX records. The perf
self test "89: Check Arm CoreSight trace data recording and synthesized
samples" no longer fails intermittently. This was because the taskset in
the test is after the fork, so there is a period where the task is
scheduled on a random CPU rather than forced to a valid one.
Specifically selecting an invalid CPU will still result in a failure to
open the event because it will never produce trace:
./perf record -C 2 -e cs_etm/@tmc_etf0/
failed to mmap with 12 (Cannot allocate memory)
The only scenario that has changed is if the CPU mask has a valid CPU
sink combo in it.
Testing
=======
* Coresight self test passes consistently:
./perf test Coresight
* CPU wide mode still produces trace:
./perf record -e cs_etm// -a
* Invalid -C options still fail to open:
./perf record -C 2,3 -e cs_etm/@tmc_etf0/
failed to mmap with 12 (Cannot allocate memory)
* Migrating a task to a valid sink/CPU now produces trace:
taskset --cpu-list 0 ./perf record -e cs_etm/@tmc_etf1/ --per-thread -- taskset --cpu-list 2 ls
* If the task remains on an invalid CPU, no trace is emitted:
taskset --cpu-list 0 ./perf record -e cs_etm/@tmc_etf1/ --per-thread -- ls
Signed-off-by: James Clark <james.clark(a)arm.com>
---
.../hwtracing/coresight/coresight-etm-perf.c | 27 +++++++++++++++----
1 file changed, 22 insertions(+), 5 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 8ebd728d3a80..79346f0f0e0b 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -452,9 +452,14 @@ static void etm_event_start(struct perf_event *event, int flags)
* sink from this ETM. We can't do much in this case if
* the sink was specified or hinted to the driver. For
* now, simply don't record anything on this ETM.
+ *
+ * As such we pretend that everything is fine, and let
+ * it continue without actually tracing. The event could
+ * continue tracing when it moves to a CPU where it is
+ * reachable to a sink.
*/
if (!cpumask_test_cpu(cpu, &event_data->mask))
- goto fail_end_stop;
+ goto out;
path = etm_event_cpu_path(event_data, cpu);
/* We need a sink, no need to continue without one */
@@ -466,16 +471,15 @@ static void etm_event_start(struct perf_event *event, int flags)
if (coresight_enable_path(path, CS_MODE_PERF, handle))
goto fail_end_stop;
- /* Tell the perf core the event is alive */
- event->hw.state = 0;
-
/* Finally enable the tracer */
if (source_ops(csdev)->enable(csdev, event, CS_MODE_PERF))
goto fail_disable_path;
+out:
+ /* Tell the perf core the event is alive */
+ event->hw.state = 0;
/* Save the event_data for this ETM */
ctxt->event_data = event_data;
-out:
return;
fail_disable_path:
@@ -517,6 +521,19 @@ static void etm_event_stop(struct perf_event *event, int mode)
if (WARN_ON(!event_data))
return;
+ /*
+ * Check if this ETM was allowed to trace, as decided at
+ * etm_setup_aux(). If it wasn't allowed to trace, then
+ * nothing needs to be torn down other than outputting a
+ * zero sized record.
+ */
+ if (handle->event && (mode & PERF_EF_UPDATE) &&
+ !cpumask_test_cpu(cpu, &event_data->mask)) {
+ event->hw.state = PERF_HES_STOPPED;
+ perf_aux_output_end(handle, 0);
+ return;
+ }
+
if (!csdev)
return;
--
2.28.0