On Tue, 19 Jun 2018 at 07:37, Leo Yan <leo.yan(a)linaro.org> wrote:
>
> Hi Mathieu,
>
> On Mon, Jun 18, 2018 at 07:26:20PM -0600, Mathieu Poirier wrote:
> > On Mon, 18 Jun 2018 at 18:59, Leo Yan <leo.yan(a)linaro.org> wrote:
> > >
> > > Hi all,
> > >
> > > Just in case you have the same issue, when I use acme's branch perf/core
> > > with latest commit e238cf2e3d2e ("perf intel-pt: Fix packet decoding of
> > > CYC packets"), I can easily reproduce below failure with 'perf record'
> > > command.
> > >
> > > If you have a fix for this, please let me know. Otherwise I will dig
> > > into this issue a bit (probably relying on 'git bisect').
> >
> > Haven't seen this, though my tree is based on mainline 4.18-rc1. Let
> > me know what bisect gives you.
>
> I finally found that this issue is caused by my removing 'nohlt' from the
> kernel command line, which enables the CPU idle states.
>
> With CPU idle states enabled, we cannot access the register
> /sys/bus/coresight/devices/f659c000.etm/mgmt/trcauthstatus while the
> CPU stays in a low power state.
Interesting.
>
> Following on from this issue, I have two questions (sorry if you have
> already discussed them):
>
> - The first question: should we support runtime PM (or GenPD) in
> CoreSight? That is, when we access a component (e.g. the register
> trcauthstatus), we need to ensure the corresponding power domain is
> powered on before we access it.
We rely heavily on runtime PM everywhere [1] in the code to make sure
devices are accessed only when they're powered up.

TRCAUTHSTATUS is a "management" register [2] and as such should not
need to have the core powered up [3], which is why it is accessed with
the macro coresight_etm4x_reg() rather than coresight_etm4x_cross_read().
That being said, implementations can differ from the guidelines.

The first thing to do is to try going from coresight_etm4x_reg() to
coresight_etm4x_cross_read() here [4] and see if anything changes.
If that doesn't work you need to ask HiSilicon which power domain the
register is in. From there get back to me and we'll talk things over.

[1]. https://elixir.bootlin.com/linux/latest/source/arch/arm64/boot/dts/arm/juno…
[2]. See "ARM Embedded Trace Macrocell Architecture Specification"
(ID032614), table 7-1 on page 7-311 for details.
[3]. Same document as above, table 7-4 on page 7-314
[4]. https://elixir.bootlin.com/linux/latest/source/drivers/hwtracing/coresight/…
>
> - The second question is a side topic: if we support CPU-wide
> tracing, should we support a suspend/resume flow for the CoreSight
> ETM, so that we can save and restore the context for each CPU?
When using Perf, suspend/resume is already supported: tracers aren't
kept on when there are no more processes executing on a CPU (just
before going to idle). Tracers are also switched back on once the
CPU has resumed, just before processes start being executed on it.
Configuration of the trace unit is kept in memory. In sysFS mode I
expected users to disable CPUidle, but that can be changed if you have
a use case.
Thanks,
Mathieu
>
> Thanks,
> Leo Yan
>
> > > ---8<---
> > >
> > > # ./perf record -e cs_etm/@f6402000.etf/ --per-thread uname
> > > [ 240.623458] Internal error: synchronous external abort: 96000210 [#10] PREEMPT SMP
> > > [ 240.631055] Modules linked in:
> > > [ 240.634117] CPU: 1 PID: 2793 Comm: perf Tainted: G D 4.17.0-08502-ge238cf2 #26
> > > [ 240.642648] Hardware name: HiKey Development Board (DT)
> > > [ 240.647876] pstate: 40000005 (nZcv daif -PAN -UAO)
> > > [ 240.652677] pc : trcauthstatus_show+0x3c/0x78
> > > [ 240.657035] lr : trcauthstatus_show+0x34/0x78
> > > [ 240.661391] sp : ffff00000dcfbbc0
> > > [ 240.664703] x29: ffff00000dcfbbc0 x28: 0000000000000001
> > > [ 240.670019] x27: 00000000007000c0 x26: ffff800036fa4048
> > > [ 240.675335] x25: 0000000000001000 x24: ffff000008d95330
> > > [ 240.680651] x23: ffff8000372ad880 x22: ffff80003b346cb8
> > > [ 240.685966] x21: ffff800036fa8080 x20: ffff80003b346ca8
> > > [ 240.691281] x19: ffff0000099d5fb8 x18: 0000000000000000
> > > [ 240.696597] x17: 0000ffffbb44bb68 x16: ffff0000082a3908
> > > [ 240.701911] x15: 0000000000000000 x14: 0000000000000016
> > > [ 240.707227] x13: 7375746174736874 x12: 75616372742f746d
> > > [ 240.712542] x11: 0000000000000020 x10: 0000000080070007
> > > [ 240.717857] x9 : 0000000000000000 x8 : ffff800036fa9080
> > > [ 240.723173] x7 : 0000000000000000 x6 : 000000000000003f
> > > [ 240.728488] x5 : 0000000000000040 x4 : 0000000000000000
> > > [ 240.733803] x3 : 0000000000000000 x2 : 4d223a68697bbd00
> > > [ 240.739118] x1 : ffff80001fc30f80 x0 : 0000000000000001
> > > [ 240.744435] Process perf (pid: 2793, stack limit = 0x0000000051ca53e7)
> > > [ 240.750965] Call trace:
> > > [ 240.753411] trcauthstatus_show+0x3c/0x78
> > > [ 240.757423] dev_attr_show+0x3c/0x80
> > > [ 240.761002] sysfs_kf_seq_show+0xc0/0x158
> > > [ 240.765012] kernfs_seq_show+0x44/0x50
> > > [ 240.768764] seq_read+0x1cc/0x4b8
> > > [ 240.772078] kernfs_fop_read+0x13c/0x1e0
> > > [ 240.776002] __vfs_read+0x60/0x170
> > > [ 240.779403] vfs_read+0x94/0x150
> > > [ 240.782630] ksys_read+0x6c/0xd8
> > > [ 240.785856] sys_read+0x34/0x48
> > > [ 240.788998] el0_svc_naked+0x30/0x34
> > > [ 240.792575] Code: f9404c53 97f2054e f9400273 913ee273 (b9400273)
> > > [ 240.798673] ---[ end trace 838ff5bf36115622 ]---
> > >
> > > Message from syslogd@linaro-developer at May 23 09:50:03 ...
> > > kernel:[ 240.623458] Internal error: synchronous external abort: 96000210 [#10] PREEMPT SMP
> > >
> > > Message from syslogd@linaro-developer at May 23 09:50:03 ...
> > > kernel:[ 240.744435] Process perf (pid: 2793, stack limit = 0x0000000051ca53e7)
> > >
> > > Message from syslogd@linaro-developer at May 23 09:50:03 ...
> > > kernel:[ 240.792575] Code: f9404c53 97f2054e f9400273 913ee273 (b9400273)
> > > Segmentation fault
> > > root@linaro-developer:~#
> > >
I've been packaging libopenCSD for debian. That has resulted in
various changes, largely to the makefiles. Most of those changes
should go upstream, rather than exist only in the debian version.
I'll post my issues/changes here for discussion so we can decide what is upstreamable.
In the meantime anyone interested can try the debian packages here:
http://wookware.org/software/repo/
either manually from:
http://wookware.org/software/repo/pool/main/libo/libopencsd/
or with
deb [ trusted=yes ] http://wookware.org/software/repo unstable main
apt update; apt install libopencsd0
(and/or libopencsd-dev libopencsd-doc libopencsd-dbgsym)
(for debian unstable, amd64 only) (package is signed, but repo isn't. Sorry)
Or get the source and rebuild for a different debian-based distro/release.
I've not separated all my changes into proper standalone patches yet,
but some are so let's start with those.
1) The doxygen doc build doesn't find all the components because they moved.
This fixes that:
Index: libopencsd-0.8.0/decoder/docs/doxygen_config.dox
===================================================================
--- libopencsd-0.8.0.orig/decoder/docs/doxygen_config.dox
+++ libopencsd-0.8.0/decoder/docs/doxygen_config.dox
@@ -765,11 +765,11 @@ WARN_LOGFILE =
INPUT = ../include \
../include/interfaces \
- ../include/etmv3 \
- ../include/etmv4 \
- ../include/ptm \
- ../include/c_api \
- ../include/stm \
+ ../include/opencsd/etmv3 \
+ ../include/opencsd/etmv4 \
+ ../include/opencsd/ptm \
+ ../include/opencsd/c_api \
+ ../include/opencsd/stm \
../include/mem_acc \
../../README.md \
. \
2) The makefile provides no doc-build target. The patch below adds that.
I didn't include an install-docs target, although if you added one
with a variable to set the target dir then I'd use it.
Index: libopencsd-0.8.0/decoder/build/linux/makefile
===================================================================
--- libopencsd-0.8.0.orig/decoder/build/linux/makefile
+++ libopencsd-0.8.0/decoder/build/linux/makefile
@@ -155,6 +155,13 @@ tests: libs
cd $(OCSD_ROOT)/tests/build/linux/trc_pkt_lister && make
cd $(OCSD_ROOT)/tests/build/linux/c_api_pkt_print_test && make
+#
+# build docs
+.PHONY: docs
+docs:
+ (cd $(OCSD_ROOT)/docs; doxygen doxygen_config.dox)
+
+
#############################################################
# clean targets
#
@@ -176,3 +183,4 @@ clean_install:
rm -f $(INSTALL_LIB_DIR)/lib$(LIB_BASE_NAME).so
rm -f $(INSTALL_LIB_DIR)/lib$(LIB_CAPI_NAME).so
rm -rf $(INSTALL_INCLUDE_DIR)/$(LIB_UAPI_INC_DIR)
+ rm -rf $(OCSD_ROOT)/docs/html
3) The static libraries are built but not installed.
Now static libraries aren't much use to anyone these days, but if you
build them then you might as well install them.
This patch does that:
Index: libopencsd-0.8.0/decoder/build/linux/makefile
===================================================================
--- libopencsd-0.8.0.orig/decoder/build/linux/makefile
+++ libopencsd-0.8.0/decoder/build/linux/makefile
@@ -136,6 +136,8 @@
mkdir -p $(INSTALL_LIB_DIR) $(INSTALL_INCLUDE_DIR)
$(INSTALL) --mode=644 $(LIB_TARGET_DIR)/lib$(LIB_BASE_NAME).so $(INSTALL_LIB_DIR)/
$(INSTALL) --mode=644 $(LIB_TARGET_DIR)/lib$(LIB_CAPI_NAME).so $(INSTALL_LIB_DIR)/
+ $(INSTALL) --mode=644 $(LIB_TARGET_DIR)/lib$(LIB_BASE_NAME).a $(INSTALL_LIB_DIR)/
+ $(INSTALL) --mode=644 $(LIB_TARGET_DIR)/lib$(LIB_CAPI_NAME).a $(INSTALL_LIB_DIR)/
cd $(OCSD_ROOT)/build/linux/rctdl_c_api_lib && make install_inc
################################
@@ -192,6 +194,6 @@
cd $(OCSD_ROOT)/tests/build/linux/c_api_pkt_print_test && make clean
clean_install:
- rm -f $(INSTALL_LIB_DIR)/lib$(LIB_BASE_NAME).so
- rm -f $(INSTALL_LIB_DIR)/lib$(LIB_CAPI_NAME).so
+ rm -f $(INSTALL_LIB_DIR)/lib$(LIB_BASE_NAME).{so,a}
+ rm -f $(INSTALL_LIB_DIR)/lib$(LIB_CAPI_NAME).{so,a}
rm -rf $(INSTALL_INCLUDE_DIR)/$(LIB_UAPI_INC_DIR)
4) The makefile arbitrarily restricts the build to arm and x86
architectures. The only reason for this is in order to build into a
particular named directory, and to set -m32/-m64 flags for x86 which
should be defaulted correctly on any sensible toolchain or cross-toolchain
anyway.
This code can build, and be used, on any arch and the makefile should
allow that, so this should be fixed.
I don't see the need to build into a host-arch named PLAT_DIR
directory. Is there one? It only does one build at a time, whether
crossing or not, so I see no reason not to use a fixed name for this
dir, such as 'builddir'. This has no effect on the ability to cross or
not. If you really want to change the name of PLAT_DIR then find out
the HOST triplet and use that. On debian this is
dpkg-architecture -q DEB_HOST_GNU_TYPE, (which is also the same prefix as would
be used to specify the toolchain prefix for crossing). But like I say, I
think a fixed builddir is actually all you need unless I'm missing something.
So this patch lets the build work on any arch:
Index: libopencsd-0.8.0/decoder/build/linux/makefile
===================================================================
--- libopencsd-0.8.0.orig/decoder/build/linux/makefile
+++ libopencsd-0.8.0/decoder/build/linux/makefile
@@ -92,27 +92,6 @@ BUILD_VARIANT=rel
endif
-# platform bit size variant
-ifeq ($(ARCH),x86)
- MFLAG:="-m32"
- BIT_VARIANT=32
-else ifeq ($(ARCH),x86_64)
- MFLAG:="-m64"
- BIT_VARIANT=64
-else ifeq ($(ARCH),arm)
- BIT_VARIANT=-arm
-else ifeq ($(ARCH),arm64)
- BIT_VARIANT=-arm64
-else ifeq ($(ARCH),aarch64)
- BIT_VARIANT=-arm64
-else ifeq ($(ARCH),aarch32)
- BIT_VARIANT=-arm
-endif
-
-MASTER_CC_FLAGS += $(MFLAG)
-MASTER_CPP_FLAGS += $(MFLAG)
-MASTER_LINKER_FLAGS += $(MFLAG)
-
# export build flags
export MASTER_CC_FLAGS
export MASTER_CPP_FLAGS
@@ -120,7 +99,7 @@ export MASTER_LINKER_FLAGS
export MASTER_LIB_FLAGS
# target directories
-export PLAT_DIR=linux$(BIT_VARIANT)/$(BUILD_VARIANT)
+export PLAT_DIR=builddir
export LIB_TARGET_DIR=$(OCSD_LIB_ROOT)/$(PLAT_DIR)
export LIB_TEST_TARGET_DIR=$(OCSD_TESTS)/lib/$(PLAT_DIR)
export BIN_TEST_TARGET_DIR=$(OCSD_TESTS)/bin/$(PLAT_DIR)
All these patches are included as files too.
Wookey
--
Principal hats: Linaro, Debian, Wookware, ARM
http://wookware.org/
Latest version of OpenCSD library v0.9.0 is now released in the github
repository.
https://github.com/Linaro/OpenCSD
This contains:
- Updates for improved client performance - including perf.
- Additional documentation consisting of Programmers Guide, and
Generic Trace Output Packet reference.
Regards
Mike
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
Good afternoon to all,
For the 4.18 cycle I pushed Suzuki's new ETR driver implementation to
GitHub's perf-opencsd repository [1]. It includes support for SG and
contiguous mode, all handled automatically without the need for user
intervention. I gave it a couple of spins and things work as
advertised.
So it's up there now - please try it out in your setup while I review
the patches on the public mailing list.
Note that this has no correlation with the support for CPU-wide
scenarios patchset I sent out last week.
Regards,
Mathieu
[1]. https://github.com/Linaro/perf-opencsd/commits/master
This set adds support for CoreSight CPU-wide trace sessions. It borrows
most of its code from the per-thread implementation, with the exception that
range packets are processed and synthesised according to the time at which
the trace they contain was executed.
This is done using the timestamp and contextID features available on ETM4x
tracers (ETM3x/PTM aren't addressed yet). Decoding between processors is
done in chronological order using a min heap.
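
To illustrate the chronological ordering mentioned above, here is a minimal
Python sketch (not the perf code, which is C) that merges per-CPU queues of
timestamped packets with a min heap. The function name merge_by_timestamp()
and the (timestamp, packet) tuple layout are assumptions made purely for
illustration.

```python
import heapq


def merge_by_timestamp(per_cpu_packets):
    """Yield (timestamp, cpu, packet) in chronological order.

    per_cpu_packets maps a CPU number to a list of (timestamp, packet)
    tuples that is already sorted by timestamp (hypothetical layout).
    """
    heap = []
    iterators = {}
    for cpu, packets in per_cpu_packets.items():
        it = iter(packets)
        iterators[cpu] = it
        first = next(it, None)
        if first is not None:
            # The heap is keyed on the timestamp; the CPU number breaks ties.
            heapq.heappush(heap, (first[0], cpu, first[1]))

    while heap:
        timestamp, cpu, packet = heapq.heappop(heap)
        yield timestamp, cpu, packet
        # Refill the heap from the queue that was just consumed, so the
        # heap never holds more than one entry per CPU.
        nxt = next(iterators[cpu], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], cpu, nxt[1]))
```

Keeping only one pending entry per CPU in the heap keeps its size bounded
by the number of traced CPUs, whatever the amount of trace data.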
Of special interest is the way timestamp packets are used to account for
the temporal execution of traced instructions. Since a timestamp typically
arrives after range packets have been recorded, the timestamp from the
previous range is used as the start time of the current range. When a
timestamp for the previous range doesn't exist (i.e. start of trace or
discontinuity) the start time is estimated.
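
The rule above can be sketched in a few lines of Python. This is only a
model under stated assumptions: the event tuples and the function name
assign_start_times() are hypothetical, and since the text does not say how
the start time is estimated, the sketch merely flags the ranges that need
an estimate.

```python
def assign_start_times(events):
    """Return a list of (range_name, start_time, needs_estimate).

    events is a sequence of tuples in trace order (hypothetical layout):
      ("range", name)       an instruction range packet
      ("timestamp", value)  a timestamp packet, seen after the range(s)
                            it applies to
      ("discontinuity",)    a gap in the trace
    """
    result = []
    prev_timestamp = None  # timestamp recorded for the previous range
    for event in events:
        kind = event[0]
        if kind == "range":
            if prev_timestamp is None:
                # Start of trace or discontinuity: no earlier timestamp,
                # so the start time has to be estimated (method left open).
                result.append((event[1], None, True))
            else:
                result.append((event[1], prev_timestamp, False))
        elif kind == "timestamp":
            prev_timestamp = event[1]
        elif kind == "discontinuity":
            prev_timestamp = None
        else:
            raise ValueError("unknown event kind: %r" % (kind,))
    return result
```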
Open question:
--------------
At this time the implementation supports tracing a single CPU since the
only HW we have exhibits an N:1 source/sink topology. The HW itself does
support collecting traces from more than one source but using the feature
in this way could be very confusing and mislead users.
For example the following:
# perf record -e cs_etm/20070000.etr/ -C 2,3 application1
would end up tracing everything that is happening on CPU 2 and 3 for as long
as application1 is executing. Because the HW doesn't give us an interrupt
when buffers are full, traces from one CPU could easily clobber traces from
the other, giving the impression that nothing was executed on the latter.
So this would work:
# perf record -e cs_etm/20070000.etr/ -C 3 application1
I am open to discussion on the topic should someone think of something.
As with the cleanup set this code has been uploaded here [1].
Thanks,
Mathieu
[1]. https://git.linaro.org/people/mathieu.poirier/coresight.git perf-opencsd-master-cpu-wide-support
Mathieu Poirier (12):
perf tools: Add defines for CONTEXTID configuration
perf tools: Configure contextID tracing in CPU-wide mode
perf tools: Configure timestamp generation in CPU-wide mode
perf tools: Configure SWITCH_EVENTS in CPU-wide mode
perf tools: Add handling of itrace start events
perf tools: Add handling of switch-CPU-wide events
perf tools: Linking PE contextID with perf thread mechanic
perf tools: Allocate decoder tree as needed
perf tools: Make cs_etm__dump_event() work with CPU-wide scenarios
perf tools: Add notion of time to the decoding code
perf tools: Make function cs_etm_decoder__clear_buffer() public
perf tools: Add support for CPU-wide trace scenarios
include/linux/coresight-pmu.h | 2 +
tools/include/linux/coresight-pmu.h | 2 +
tools/perf/arch/arm/util/cs-etm.c | 174 ++++++++++--
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 140 +++++++++-
tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 4 +-
tools/perf/util/cs-etm.c | 334 ++++++++++++++++++++++--
tools/perf/util/cs-etm.h | 17 ++
7 files changed, 623 insertions(+), 50 deletions(-)
--
2.7.4
An exception packet appears as one element with 'elem_type' ==
OCSD_GEN_TRC_ELEM_EXCEPTION or OCSD_GEN_TRC_ELEM_EXCEPTION_RET,
which are present for exception entry and exit respectively. The decoder
sets the packet fields 'packet->exc' and 'packet->exc_ret' to indicate
exception packets, but exception packets don't have a dedicated sample
type and share the sample type CS_ETM_RANGE with normal instruction
packets.

As a result, exception packets are treated as normal instruction packets,
which mixes up the two packet types. Furthermore, these instruction range
packets are processed for branch samples only when
'packet->last_instr_taken_branch' is true and are omitted otherwise. This
breaks the handling of exception entry and exception return, because we
don't have complete address range info for the context switch.

To process exception packets properly, this patch introduces two new
sample types: CS_ETM_EXCEPTION and CS_ETM_EXCEPTION_RET. Both kinds of
packet are handled by cs_etm__exception(), which forces the flag
'prev_packet->last_instr_taken_branch' of the previous CS_ETM_RANGE packet
to true. This matches the program flow when an exception traps from user
space to kernel space, whether or not the most recent range ended with a
taken branch; it is also safe when returning to user space after exception
handling.

Now that exception packets have their own sample types, the packet fields
'packet->exc' and 'packet->exc_ret' aren't needed anymore, so remove
them.
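
For readers who prefer to see the rule in isolation, the following Python
sketch models what cs_etm__exception() does to the previous packet. The
enum values mirror enum cs_etm_sample_type from this patch; the Packet
class and the function name handle_exception() are hypothetical stand-ins
for the C structures, not perf code.

```python
from dataclasses import dataclass
from enum import IntEnum


class SampleType(IntEnum):
    # Values mirror enum cs_etm_sample_type in the patch.
    EMPTY = 0
    RANGE = 1 << 0
    TRACE_ON = 1 << 1
    EXCEPTION = 1 << 2
    EXCEPTION_RET = 1 << 3


@dataclass
class Packet:
    # Hypothetical, reduced stand-in for struct cs_etm_packet.
    sample_type: SampleType = SampleType.EMPTY
    last_instr_taken_branch: bool = False


def handle_exception(prev_packet):
    """Model of cs_etm__exception().

    An exception entry or return always changes the program flow, so the
    previous range packet is marked as ending in a taken branch. Packets
    of any other type are left untouched.
    """
    if prev_packet.sample_type == SampleType.RANGE:
        prev_packet.last_instr_taken_branch = True
    return 0
```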
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 26 +++++++++++++++++------
tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 10 ++++-----
tools/perf/util/cs-etm.c | 28 +++++++++++++++++++++++++
3 files changed, 53 insertions(+), 11 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 24aabf0..c1715ff 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -264,8 +264,6 @@ static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder)
decoder->packet_buffer[i].start_addr = CS_ETM_INVAL_ADDR;
decoder->packet_buffer[i].end_addr = CS_ETM_INVAL_ADDR;
decoder->packet_buffer[i].last_instr_taken_branch = false;
- decoder->packet_buffer[i].exc = false;
- decoder->packet_buffer[i].exc_ret = false;
decoder->packet_buffer[i].cpu = INT_MIN;
}
}
@@ -292,8 +290,6 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
decoder->packet_count++;
decoder->packet_buffer[et].sample_type = sample_type;
- decoder->packet_buffer[et].exc = false;
- decoder->packet_buffer[et].exc_ret = false;
decoder->packet_buffer[et].cpu = *((int *)inode->priv);
decoder->packet_buffer[et].start_addr = CS_ETM_INVAL_ADDR;
decoder->packet_buffer[et].end_addr = CS_ETM_INVAL_ADDR;
@@ -345,6 +341,22 @@ cs_etm_decoder__buffer_trace_on(struct cs_etm_decoder *decoder,
CS_ETM_TRACE_ON);
}
+static ocsd_datapath_resp_t
+cs_etm_decoder__buffer_exception(struct cs_etm_decoder *decoder,
+ const uint8_t trace_chan_id)
+{
+ return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+ CS_ETM_EXCEPTION);
+}
+
+static ocsd_datapath_resp_t
+cs_etm_decoder__buffer_exception_ret(struct cs_etm_decoder *decoder,
+ const uint8_t trace_chan_id)
+{
+ return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+ CS_ETM_EXCEPTION_RET);
+}
+
static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
const void *context,
const ocsd_trc_index_t indx __maybe_unused,
@@ -370,10 +382,12 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
trace_chan_id);
break;
case OCSD_GEN_TRC_ELEM_EXCEPTION:
- decoder->packet_buffer[decoder->tail].exc = true;
+ resp = cs_etm_decoder__buffer_exception(decoder,
+ trace_chan_id);
break;
case OCSD_GEN_TRC_ELEM_EXCEPTION_RET:
- decoder->packet_buffer[decoder->tail].exc_ret = true;
+ resp = cs_etm_decoder__buffer_exception_ret(decoder,
+ trace_chan_id);
break;
case OCSD_GEN_TRC_ELEM_PE_CONTEXT:
case OCSD_GEN_TRC_ELEM_EO_TRACE:
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index 108dc9d..cb57756 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -25,9 +25,11 @@ struct cs_etm_buffer {
};
enum cs_etm_sample_type {
- CS_ETM_EMPTY = 0,
- CS_ETM_RANGE = 1 << 0,
- CS_ETM_TRACE_ON = 1 << 1,
+ CS_ETM_EMPTY = 0,
+ CS_ETM_RANGE = 1 << 0,
+ CS_ETM_TRACE_ON = 1 << 1,
+ CS_ETM_EXCEPTION = 1 << 2,
+ CS_ETM_EXCEPTION_RET = 1 << 3,
};
struct cs_etm_packet {
@@ -35,8 +37,6 @@ struct cs_etm_packet {
u64 start_addr;
u64 end_addr;
u8 last_instr_taken_branch;
- u8 exc;
- u8 exc_ret;
int cpu;
};
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 2ae6402..b85100b 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -942,6 +942,25 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
return 0;
}
+static int cs_etm__exception(struct cs_etm_queue *etmq)
+{
+ /*
+ * When the exception packet is inserted, whether the last instruction
+ * in previous range packet is taken branch or not, we need to force
+ * to set 'prev_packet->last_instr_taken_branch' to true. This ensures
+ * to generate branch sample for the instruction range before the
+ * exception is trapped to kernel or before the exception returning.
+ *
+ * The exception packet includes the dummy address values, so don't
+ * swap PACKET with PREV_PACKET. This keeps PREV_PACKET to be useful
+ * for generating instruction and branch samples.
+ */
+ if (etmq->prev_packet->sample_type == CS_ETM_RANGE)
+ etmq->prev_packet->last_instr_taken_branch = true;
+
+ return 0;
+}
+
static int cs_etm__flush(struct cs_etm_queue *etmq)
{
int err = 0;
@@ -1057,6 +1076,15 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
*/
cs_etm__sample(etmq);
break;
+ case CS_ETM_EXCEPTION:
+ case CS_ETM_EXCEPTION_RET:
+ /*
+ * If the exception packet is coming,
+ * make sure the previous instruction
+ * range packet to be handled properly.
+ */
+ cs_etm__exception(etmq);
+ break;
case CS_ETM_TRACE_ON:
/*
* Discontinuity in trace, flush
--
2.7.4
Usually the first packet of a trace is a CS_ETM_TRACE_ON packet, which
is passed to cs_etm__flush(). cs_etm__flush() checks the condition
'prev_packet->sample_type == CS_ETM_RANGE', but 'prev_packet' is
allocated by zalloc(), so 'prev_packet->sample_type' is initially zero
and the condition is false. cs_etm__flush() therefore bails out without
handling the start tracing packet.

This patch introduces a new sample type CS_ETM_EMPTY, which indicates
that the packet is empty. cs_etm__flush() swaps packets when it finds
that the previous packet is empty, so the start tracing packet is
recorded in 'etmq->prev_packet'.

Another minor change in cs_etm__flush() is to check the condition
'etmq->prev_packet->sample_type == CS_ETM_TRACE_ON': if the previous
packet is also a CS_ETM_TRACE_ON packet, the function returns early and
skips the contiguous CS_ETM_TRACE_ON packet.
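
As a reading aid, the control flow of the patched cs_etm__flush() can be
modelled as a small decision function. This Python sketch is an
illustration only: the function name flush_actions() and the action strings
are hypothetical, and the logic is transcribed from the hunk in this patch
(including the fact that the swap happens only when 'last_branch' is set).

```python
# Values mirror enum cs_etm_sample_type after this patch.
EMPTY = 0
RANGE = 1 << 0
TRACE_ON = 1 << 1


def flush_actions(prev_sample_type, last_branch):
    """Return the steps the patched cs_etm__flush() takes, in order.

    prev_sample_type is None when there is no previous packet at all.
    last_branch models etmq->etm->synth_opts.last_branch.
    """
    if prev_sample_type is None:
        return []
    # Skip a contiguous CS_ETM_TRACE_ON packet.
    if prev_sample_type == TRACE_ON:
        return []

    actions = []
    # The start tracing packet (previous packet still empty) jumps
    # straight to the swap and generates no sample.
    if prev_sample_type != EMPTY and last_branch:
        actions.append("synth_last_branch_sample")
    if last_branch:
        actions.append("swap_packets")
    return actions
```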
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 1 +
tools/perf/util/cs-etm.c | 26 ++++++++++++++++++++++---
2 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index 743f5f4..612b575 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -23,6 +23,7 @@ struct cs_etm_buffer {
};
enum cs_etm_sample_type {
+ CS_ETM_EMPTY = 0,
CS_ETM_RANGE = 1 << 0,
CS_ETM_TRACE_ON = 1 << 1,
};
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 822ba91..67564c1 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -924,9 +924,18 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
int err = 0;
struct cs_etm_packet *tmp;
- if (etmq->etm->synth_opts.last_branch &&
- etmq->prev_packet &&
- etmq->prev_packet->sample_type == CS_ETM_RANGE) {
+ if (!etmq->prev_packet)
+ return 0;
+
+ /* Skip for contiguous CS_ETM_TRACE_ON packet */
+ if (etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
+ return 0;
+
+ /* Handle start tracing packet */
+ if (etmq->prev_packet->sample_type == CS_ETM_EMPTY)
+ goto swap_packet;
+
+ if (etmq->etm->synth_opts.last_branch) {
/*
* Generate a last branch event for the branches left in the
* circular buffer at the end of the trace.
@@ -941,6 +950,10 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
etmq->period_instructions);
etmq->period_instructions = 0;
+ }
+
+swap_packet:
+ if (etmq->etm->synth_opts.last_branch) {
/*
* Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
* the next incoming packet.
@@ -1020,6 +1033,13 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
*/
cs_etm__flush(etmq);
break;
+ case CS_ETM_EMPTY:
+ /*
+ * Should not receive empty packet,
+ * report error.
+ */
+ pr_err("CS ETM Trace: empty packet\n");
+ return -EINVAL;
default:
break;
}
--
2.7.4
This patch series adds support for using 'perf script' as a CoreSight
trace disassembler. To that end it adds a new python script that parses
CoreSight tracing events and uses the 'objdump' command to produce
disassembled lines, which yields a readable program execution flow for
reviewing tracing data.
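
To give an idea of the objdump step, here is a minimal Python sketch that
builds the command line for disassembling one address range. It is not the
script from this series: the function name objdump_command() and its
parameters are hypothetical, and only the standard GNU objdump options -d,
--start-address and --stop-address are assumed.

```python
def objdump_command(binary, start_addr, stop_addr, objdump="objdump"):
    """Build an argv list that disassembles [start_addr, stop_addr).

    binary is the path to the ELF file (for example vmlinux); objdump can
    be overridden with a cross toolchain binary.
    """
    if stop_addr <= start_addr:
        raise ValueError("stop address must be above start address")
    return [
        objdump,
        "-d",
        "--start-address=0x%x" % start_addr,
        "--stop-address=0x%x" % stop_addr,
        binary,
    ]
```

The resulting list can be handed to subprocess.run() as is; returning a
list rather than a shell string avoids quoting problems with file paths.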
Patches 0001 ~ 0003 generate samples for the start packet,
CS_ETM_TRACE_ON packet and exception packets.
Patch 0004 introduces an invalid address macro.
Patch 0005 adds the python script for the trace disassembler.
Patch 0006 adds documentation that explains the python script usage and
gives an example.
This patch series has been rebased on acme git tree [1] with the latest
commit e9175538c04f ("perf script python: Add addr into perf sample dict")
and tested on Hikey (ARM64 octa CA53 cores).
In this version the script has no dependency on the ARM64 platform and is
expected to support ARM32 as well, but I lack an ARM32 platform to test
on, so the series is upstreamed for ARM64 first.
This patch series initially supports 'per-thread' recording of tracing
data, and it has been verified with kernel panic kdump tracing data.
Please note, this patch series (v4) is ONLY intended for discussion of packet
handling; once we have a solid result I will send it to LKML for review and
merging into the mainline kernel.
Changes from v3:
* Split packet handling into three patches: one for the start tracing
packet, one for the CS_ETM_TRACE_ON packet and the last one for
exception packets;
* Introduce invalid address macro.
Changes from v2:
* Synced with Rob on handling the CS_ETM_TRACE_ON packet, so refined the
0001 patch according to the discussion;
* Minor cleanup and fixes in the 0003 patch for the python script: remove
'svc' checking.
Changes from v1:
* Following Mike's and Rob's suggestion, add the fix to generate samples
for the start packet and exception packets.
* Simplify the python script by removing the exception prediction algorithm;
we can rely on sane exception packets for the disassembler.
Leo Yan (6):
perf cs-etm: Fix start tracing packet handling
perf cs-etm: Generate branch sample for CS_ETM_TRACE_ON packet
perf cs-etm: Generate branch sample for exception packet
perf cs-etm: Introduce invalid address macro
perf script python: Add script for CoreSight trace disassembler
coresight: Document for CoreSight trace disassembler
Documentation/trace/coresight.txt | 52 +++++
tools/perf/scripts/python/arm-cs-trace-disasm.py | 235 +++++++++++++++++++++++
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 19 +-
tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 11 +-
tools/perf/util/cs-etm.c | 101 ++++++++--
5 files changed, 390 insertions(+), 28 deletions(-)
create mode 100644 tools/perf/scripts/python/arm-cs-trace-disasm.py
--
2.7.4