On 17/01/17 12:15, Sudeep Holla wrote:
> From: Mike Leach <mike.leach(a)linaro.org>
>
> The CoreSight support added for Juno is valid for only Juno r0.
> The Juno r1 and r2 variants have additional components and alternative
> connection routes between trace source and sinks.
>
> This patch builds on top of the existing r0 support and extends it to
> Juno r1/r2 variants.
>
> Reviewed-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
> Signed-off-by: Mike Leach <mike.leach(a)linaro.org>
> [sudeep.holla(a)arm.com: minor changelog update and major reorganisation of
> the common coresight components back into juno-base.dtsi to avoid
> duplication, also renamed funnel node names]
> Signed-off-by: Sudeep Holla <sudeep.holla(a)arm.com>
> ---
> arch/arm64/boot/dts/arm/juno-cs-r1r2.dtsi | 100 ++++++++++++++++++++++++++++++
> arch/arm64/boot/dts/arm/juno-r1.dts | 9 +++
> arch/arm64/boot/dts/arm/juno-r2.dts | 9 +++
> 3 files changed, 118 insertions(+)
> create mode 100644 arch/arm64/boot/dts/arm/juno-cs-r1r2.dtsi
>
> diff --git a/arch/arm64/boot/dts/arm/juno-cs-r1r2.dtsi b/arch/arm64/boot/dts/arm/juno-cs-r1r2.dtsi
> new file mode 100644
> index 000000000000..563463ed28c7
> --- /dev/null
> +++ b/arch/arm64/boot/dts/arm/juno-cs-r1r2.dtsi
> @@ -0,0 +1,100 @@
> +/ {
> + funnel@20130000 { /* cssys2 */
Typo, that should be csys1. Rest looks good to me.
Reviewed-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
To support cross-compiling OpenCSD for arm/arm64, this patch adds two
environment variables, 'CROSS_COMPILE' and 'ARCH', which users can set
before compiling the code.
As documented in the kernel Makefile, CROSS_COMPILE specifies the prefix
used for all executables used during compilation. CROSS_COMPILE can be
set on the command line, as can ARCH.
For example, to build the libraries for AArch64, set "ARCH=arm64"; the
compiled libs will then be located at lib/linux-arm64/<rel|dbg>.
Signed-off-by: Chunyan Zhang <zhang.chunyan(a)linaro.org>
---
decoder/build/linux/makefile | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/decoder/build/linux/makefile b/decoder/build/linux/makefile
index 3b6a623..dc1f32b 100644
--- a/decoder/build/linux/makefile
+++ b/decoder/build/linux/makefile
@@ -47,7 +47,11 @@ export LIB_CAPI_NAME
# determine base architecture, heavily borrowed from the Linux kernel v4.4's
# tools/perf/config/Makefile.arch
+# For example, to compile for arm64 on a X86 PC, you can issue the command:
+# "export ARCH=arm64"
+ifndef ARCH
ARCH := $(shell uname -m 2>/dev/null || echo not)
+endif
# source root directories
export OCSD_LIB_ROOT=$(OCSD_ROOT)/lib
@@ -58,10 +62,10 @@ export OCSD_SOURCE=$(OCSD_ROOT)/source
export OCSD_TESTS=$(OCSD_ROOT)/tests
# tools
-export MASTER_CC=gcc
-export MASTER_CPP=g++
-export MASTER_LINKER=g++
-export MASTER_LIB=ar
+export MASTER_CC=$(CROSS_COMPILE)gcc
+export MASTER_CPP=$(CROSS_COMPILE)g++
+export MASTER_LINKER=$(CROSS_COMPILE)g++
+export MASTER_LIB=$(CROSS_COMPILE)ar
# compile flags
MASTER_CC_FLAGS := -c -Wall -DLINUX
@@ -87,6 +91,10 @@ ifeq ($(ARCH),x86)
else ifeq ($(ARCH),x86_64)
MFLAG:="-m64"
BIT_VARIANT=64
+else ifeq ($(ARCH),arm)
+ BIT_VARIANT=-arm
+else ifeq ($(ARCH),arm64)
+ BIT_VARIANT=-arm64
endif
MASTER_CC_FLAGS += $(MFLAG)
--
2.7.4
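As a usage sketch for the makefile patch above: the helpers below only mirror
the ARCH and CROSS_COMPILE mapping visible in the diff, they do not run the
build. The names ocsd_tool and ocsd_lib_dir are hypothetical, the
aarch64-linux-gnu- prefix is an assumption about the installed toolchain, and
the "linux" directory prefix is inferred from the lib/linux-arm64 path in the
changelog.

```shell
#!/bin/sh
# Sketch only: mirrors the ARCH/CROSS_COMPILE handling from the diff above.
# ocsd_tool and ocsd_lib_dir are hypothetical helpers, not part of OpenCSD.

# Tool name the makefile would export, e.g. MASTER_CC=$(CROSS_COMPILE)gcc.
ocsd_tool() {
    printf '%s%s\n' "${CROSS_COMPILE:-}" "$1"
}

# Library output directory for a given ARCH and build type (rel or dbg).
# Only the ARCH values visible in the diff are handled.
ocsd_lib_dir() {
    case "$1" in
        x86_64) variant=64 ;;
        arm)    variant=-arm ;;
        arm64)  variant=-arm64 ;;
        *)      echo "unhandled ARCH: $1" >&2; return 1 ;;
    esac
    printf 'lib/linux%s/%s\n' "$variant" "$2"
}

# Example: what an arm64 cross build would use.
# The toolchain prefix is an assumption, adjust it to your installation.
CROSS_COMPILE=aarch64-linux-gnu-
ocsd_tool gcc            # aarch64-linux-gnu-gcc
ocsd_lib_dir arm64 rel   # lib/linux-arm64/rel

# The build itself would then be started along these lines (not run here):
#   make -C decoder/build/linux ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
```

Passing both variables on the make command line rather than exporting them
keeps the setting local to one build, which is convenient when native and
cross builds share a shell.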
The patch adds documentation to HOWTO.md on how to use CoreSight ETM to perform
Feedback Directed Optimization.
Sebastian Pop (1):
HOWTO: add example of how to extract coverage files for autoFDO
HOWTO.md | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
--
2.6.3
---------- Forwarded message ----------
From: Mike Leach <mike.leach(a)linaro.org>
Date: 2 January 2017 at 22:55
Subject: [PATCH] coresight: etm4x: Fix enabling of cycle accurate tracing in perf.
To: mathieu.poirier(a)linaro.org
Cc: linux-arm-kernel(a)lists.infradead.org, coresignt(a)lists.linaro.org, Mike
Leach <mike.leach(a)linaro.org>
Using the perf record 'cycacc' option in a cs_etm event did not set up cycle
accurate trace correctly.
Correct the bit set in TRCCONFIGR to enable cycle accurate trace.
Program TRCCCCTLR with a valid threshold value, as required by the ETMv4 spec.
Signed-off-by: Mike Leach <mike.leach(a)linaro.org>
---
drivers/hwtracing/coresight/coresight-etm4x.c | 7 +++++--
drivers/hwtracing/coresight/coresight-etm4x.h | 1 +
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c
index 4db8d6a..07be032 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -216,8 +216,11 @@ static int etm4_parse_event_config(struct etmv4_drvdata *drvdata,
goto out;
/* Go from generic option to ETMv4 specifics */
- if (attr->config & BIT(ETM_OPT_CYCACC))
- config->cfg |= ETMv4_MODE_CYCACC;
+ if (attr->config & BIT(ETM_OPT_CYCACC)) {
+ config->cfg |= BIT(4);
+ /* TRM: Must program this for cycacc to work */
+ config->ccctlr = ETM_CYC_THRESHOLD_DEFAULT;
+ }
if (attr->config & BIT(ETM_OPT_TS))
config->cfg |= ETMv4_MODE_TIMESTAMP;
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h
index ba8d3f8..8a62c6c 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.h
+++ b/drivers/hwtracing/coresight/coresight-etm4x.h
@@ -146,6 +146,7 @@
#define ETM_ARCH_V4 0x40
#define ETMv4_SYNC_MASK 0x1F
#define ETM_CYC_THRESHOLD_MASK 0xFFF
+#define ETM_CYC_THRESHOLD_DEFAULT 256
#define ETMv4_EVENT_MASK 0xFF
#define ETM_CNTR_MAX_VAL 0xFFFF
#define ETM_TRACEID_MASK 0x3f
--
2.7.4
--
Mike Leach
Principal Engineer, ARM Ltd.
Blackburn Design Centre. UK
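The two values programmed by the patch above can be checked by hand. The
following shell arithmetic is only a sketch of that check: the function names
are hypothetical, and the constants are copied from the diff
(ETM_CYC_THRESHOLD_MASK, ETM_CYC_THRESHOLD_DEFAULT, and bit 4 of the config
word).

```shell
#!/bin/sh
# Sketch only: reproduces the two values the patch programs when the
# cycle-accurate option is requested. Function names are hypothetical.

ETM_CYC_THRESHOLD_MASK=$((0xFFF))   # from coresight-etm4x.h in the diff
ETM_CYC_THRESHOLD_DEFAULT=256       # added by the patch

# Set bit 4 of a config value, as "config->cfg |= BIT(4)" does.
etm4_cfg_enable_cycacc() {
    printf '0x%x\n' $(( $1 | (1 << 4) ))
}

# Default threshold for TRCCCCTLR, masked to the width given by the mask.
etm4_cyc_threshold() {
    printf '0x%x\n' $(( $ETM_CYC_THRESHOLD_DEFAULT & $ETM_CYC_THRESHOLD_MASK ))
}

etm4_cfg_enable_cycacc 0x0   # 0x10
etm4_cyc_threshold           # 0x100
```

The default of 256 fits inside the 12-bit mask, so masking leaves it
unchanged.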
On 8 December 2016 at 02:04, Chunyan Zhang <zhang.chunyan(a)linaro.org> wrote:
>
> Hi Nicolas,
>
> On 8 December 2016 at 16:07, Nicolas GUION <nicolas.guion(a)st.com> wrote:
>
>> Chunyan,
>>
>> No problem, and it gives me the opportunity to let you know that over the
>> last few months at ST I have been working on ARM CoreSight trace.
>>
>> Several months ago I contacted Mathieu about the ARM CoreSight STM feature.
>> This year we started a new SoC project, Accordo5, built around an A7ss and
>> of course integrating ARM CoreSight components. In January Mathieu
>> described the status and the next steps to me, and added me to the group
>> that receives all patches on this topic.
>>
>>
>> So I followed the progress of the patch set towards the official Linux
>> tree, and in October I started integrating it into our BSP
>> (based on 4.1):
>>
>> - updated both hwtracing components (stm_class/coresight) in our old
>> kernel from a recent kernel
>> - integrated the ongoing ftrace patch (version 6 at the time)
>>
>>
> I'm happy to know you have been following the progress of this patch
> series. Steven Rostedt has queued it for the next merge window, so it is
> expected to be merged into 4.10.
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
> for-next
>
>>
>> One difference from the Linaro usage that your team usually describes is
>> how we capture the trace: instead of using the target itself,
>> we route the STM directly to the TPIU (skipping the ETF path) and use an
>> external probe (a Lauterbach tool) to capture the trace
>> (to cover a long trace session, get the trace for a kernel
>> crash/deadlock...)
>>
>
> As an assignee from Spreadtrum, I believe that Spreadtrum also would need
> this functionality.
>
Hello Nicolas,
First and foremost, congratulations on the very good integration work. I
have been adamant on that point many times before and today won't be
different - ST has really good tracing technology and knowledge. You guys
have been working on this for a very long time and the results are there.
Au plaisir,
Mathieu
>
>> Here is a view of the T32 output with 2 masters (Cortex A7 and Cortex
>> M3), and 2 STM clients for the A7 part (kernel log and ftrace)
>>
> That's amazing, but I haven't seen the snapshot you mentioned here :)
>
>
>> This snapshot is not the latest version: the timestamps are now correctly
>> handled, and the two A7 CPUs are now distinguished by STM channel
>> because of a problem in our SoC
>> (our SoC does not correctly implement the AHB link between the two A7
>> masters and the STM, so I used the even channels for A7_0 and the odd
>> channels for A7_1; it was more or less the only modification to your patch)
>>
>>
>> Thanks for sharing,
> Chunyan
>
>>
>> *Great Job for all this coresight trace development!*
>>
>>
>> br
>>
>> Nicolas
>>
>>
>>
>> On 12/08/2016 08:30 AM, Chunyan Zhang wrote:
>>
>>
>>
>> On 8 December 2016 at 15:04, Nicolas GUION <nicolas.guion(a)st.com> wrote:
>>
>>> Hi Chunyan,
>>>
>>> Are you sure that you contacted the right Nicolas? I know very little
>>> about the Dragonboard 410c board.
>>>
>>
>> Ah, my mistake, thanks for telling me :)
>>
>> Chunyan
>>
>>
>>> I work at STMicroelectronics and am not familiar with boards other than
>>> ST ones.
>>>
>>> br
>>> Nicolas
>>>
>>>
>>> On 12/08/2016 07:24 AM, Chunyan Zhang wrote:
>>>
>>> Hi Nicolas,
>>>
>>> I noticed that someone on the 96boards forum reported a problem similar
>>> to one that has been bothering me recently: "*Dragonboard not working
>>> after failed linux instalation*" [1].
>>>
>>> I posted some details on that page the day before yesterday. Could you
>>> give me some suggestions on how to recover my Dragonboard?
>>>
>>> Many thanks,
>>> Chunyan
>>>
>>>
>>> [1] http://www.96boards.org/forums/topic/dragonboard-not-working
>>> -after-failed-linux-instalation/#post-18901&gsc.tab=0
>>>
>>>
>>>
>>
>>
>
Hi,
could somebody help me understand why the total size of the
recorded ETM trace differs from run to run?
Is this something in my Juno machine setup, or do you also see this?
The maximum length that has been recorded is about 6MB on my setup.
When I am recording the trace of the same program on intel-pt:
$ perf record -e intel_pt//u ./sort
the amount of captured data is deterministic (around 296MB +/- a few KB.)
Thanks,
Sebastian
sort.c is from https://gcc.gnu.org/wiki/AutoFDO/Tutorial
+ gcc sort.c -o sort -O3
++ seq 1 10
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u --per-thread ./sort
Bubble sorting array of 30000 elements
7779 ms
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.610 MB perf.data ]
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u --per-thread ./sort
Bubble sorting array of 30000 elements
7789 ms
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.545 MB perf.data ]
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u --per-thread ./sort
Bubble sorting array of 30000 elements
7797 ms
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.353 MB perf.data ]
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u --per-thread ./sort
Bubble sorting array of 30000 elements
5949 ms
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.353 MB perf.data ]
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u --per-thread ./sort
Bubble sorting array of 30000 elements
7807 ms
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 3.287 MB perf.data ]
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u --per-thread ./sort
Bubble sorting array of 30000 elements
7772 ms
[ perf record: Woken up 3 times to write data ]
Warning:
AUX data lost 2 times out of 4!
[ perf record: Captured and wrote 0.126 MB perf.data ]
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u --per-thread ./sort
Bubble sorting array of 30000 elements
7811 ms
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data ]
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u --per-thread ./sort
Bubble sorting array of 30000 elements
7784 ms
[ perf record: Woken up 3 times to write data ]
Warning:
AUX data lost 2 times out of 4!
[ perf record: Captured and wrote 0.619 MB perf.data ]
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u --per-thread ./sort
Bubble sorting array of 30000 elements
7770 ms
[ perf record: Woken up 2 times to write data ]
Warning:
AUX data lost 1 times out of 1!
[ perf record: Captured and wrote 0.002 MB perf.data ]
+ for i in '$(seq 1 10)'
+ /root/etm/OpenCSD/tools/perf/perf record -e cs_etm/@20070000.etr/u --per-thread ./sort
Bubble sorting array of 30000 elements
5934 ms
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.236 MB perf.data ]
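To compare runs like the ones above, the log can be reduced to a one-line
summary. The helper below is a sketch only: summarize_perf_log is a
hypothetical name, it reads the log on stdin, and it matches just the two
message formats that appear in the output quoted above ("Captured and wrote"
and "AUX data lost"). It reports the spread; it does not explain it.

```shell
#!/bin/sh
# Sketch only: summarize "perf record" output captured from repeated runs.
# summarize_perf_log is a hypothetical helper; it reads the log on stdin.

summarize_perf_log() {
    awk '
        /Captured and wrote/ {
            # The size is the field immediately before "MB".
            for (i = 2; i <= NF; i++) if ($i == "MB") size = $(i - 1) + 0
            if (runs == 0 || size < min) min = size
            if (runs == 0 || size > max) max = size
            runs++
        }
        /AUX data lost/ { lost++ }
        END {
            printf "runs=%d min=%.3fMB max=%.3fMB aux_lost_warnings=%d\n",
                   runs, min, max, lost
        }'
}

# Example with three of the runs quoted above.
summarize_perf_log <<'EOF'
[ perf record: Captured and wrote 0.610 MB perf.data ]
AUX data lost 2 times out of 4!
[ perf record: Captured and wrote 0.126 MB perf.data ]
[ perf record: Captured and wrote 3.287 MB perf.data ]
EOF
```

With the function defined in the same script that runs the loop, the whole
log could be piped through it, for example by redirecting stderr into the
pipe if that is where perf writes these status lines on your setup.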
Hi Mathieu,
perf/Documentation/intel-pt.txt describes how to make autoFDO work
with Intel-PT recorded traces:
# perf record -e intel_pt//u ./sort
# perf inject -i perf.data -o inj --itrace=i100usl --strip
# create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1
# gcc -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
On the ARM side, I was able to get an ETM trace, and I started working
with my colleague Brian Rzycki on the second step that translates the trace
into branch events.
Attached is the current state of the patch that adds functionality from
intel-pt.c to cs-etm.c. We are still trying to get more than one branch
recorded in the branch stack before emitting an event, and it looks like we
need to decode more than one packet at a time in cs-etm-decoder.c, like in
cs_etm_decoder__buffer_packet().
Comments on the early version of the patch are welcome.
Thanks,
Sebastian