We need a simple way to test perf with the Arm CoreSight drivers. It can
serve as a smoke test when new patches arrive for perf or the CoreSight
drivers, and it can also confirm that CoreSight has been enabled
successfully on a new platform.
This patch introduces the shell script test_arm_coresight.sh, which runs
under the 'perf test' framework. The testing is source oriented: it
traverses every source (currently only ETM devices) and tests all of its
possible sinks. To find the complete paths from a specific source to its
sinks, the script relies on the sysfs links
'/sys/bus/coresight/devices/devX/out:Y' and uses depth-first search
(DFS) to iterate over the connected device nodes. If an output device is
detected as an ETR, ETF, or ETB, the script tests trace data recording
and decoding for the corresponding PMU device.
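The traversal described above can be sketched in Python as follows. This is
only an illustration, not the script from this patch: the directory tree is a
mock built in a temporary directory, the device names (etm0, funnel0,
tmc_etf0, tmc_etr0) are a hypothetical topology, and the 'seen' set is an
extra guard that the shell script does not have.

```python
import os
import re
import tempfile

# Device names containing one of these strings are treated as sinks.
SINK_RE = re.compile(r"etr|etb|etf")

def find_sinks(dev_dir, seen=None):
    """Depth-first walk over 'out:*' links; return names of sink-like devices."""
    seen = set() if seen is None else seen
    sinks = []
    for entry in sorted(os.listdir(dev_dir)):
        if not entry.startswith("out:"):
            continue
        # Resolve the symlink to the directory of the connected device.
        target = os.path.realpath(os.path.join(dev_dir, entry))
        if not os.path.isdir(target) or target in seen:
            continue
        seen.add(target)
        name = os.path.basename(target)
        if SINK_RE.search(name):
            sinks.append(name)
        # Keep descending: a sink such as an ETF may have outputs of its own.
        sinks.extend(find_sinks(target, seen))
    return sinks

def build_mock_tree(root):
    """Hypothetical topology: etm0 -> funnel0 -> tmc_etf0 -> tmc_etr0."""
    for name in ("etm0", "funnel0", "tmc_etf0", "tmc_etr0"):
        os.mkdir(os.path.join(root, name))
    links = [("etm0", "out:0", "funnel0"),
             ("funnel0", "out:0", "tmc_etf0"),
             ("tmc_etf0", "out:0", "tmc_etr0")]
    for src, port, dst in links:
        os.symlink(os.path.join(root, dst), os.path.join(root, src, port))

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_mock_tree(root)
        print(find_sinks(os.path.join(root, "etm0")))
```

On a real system the starting directory would be one of the
/sys/bus/coresight/devices/etm* entries rather than the mock tree.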
The script runs three output tests on every recorded trace:
- Test branch sample dumping with the 'perf script' command;
- Test branch sample reporting with the 'perf report' command;
- Use the option '--itrace=i1000i' to insert synthesized instruction
  events, then check that perf can output percentage values based on
  the instruction samples.
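As a rough illustration of what the three checks look for, the sketch below
applies equivalent regular expressions to single output lines. The patterns
are adapted from the egrep expressions in the script (with the character
class written as [uk]); the sample lines are modeled on the example comments
in the script, with spacing that is only assumed to resemble real perf output.

```python
import re

# Patterns adapted from the egrep expressions used by the test script.
BRANCH_DUMP = re.compile(r" +touch +[0-9]+ .* +branches:([uk]:)? +")
BRANCH_REPORT = re.compile(r" +[0-9]+\.[0-9]+% +[0-9]+\.[0-9]+% +touch ")
INSN_REPORT = re.compile(r" +[0-9]+\.[0-9]+% +touch")

# Sample lines modeled on the comments in the script (spacing is assumed).
DUMP_LINE = ("  touch  6512     1 branches:u:      ffffb220824c "
             "strcmp+0xc (/lib/aarch64-linux-gnu/ld-2.27.so)")
REPORT_LINE = "    73.04%    73.04%  touch    libc-2.27.so      [.] _dl_addr"
INSN_LINE = "    68.12%  touch    libc-2.27.so      [.] _dl_addr"

def has_match(pattern, text):
    """Return True if any line of 'text' matches 'pattern' (like egrep)."""
    return any(pattern.search(line) for line in text.splitlines())

if __name__ == "__main__":
    print(has_match(BRANCH_DUMP, DUMP_LINE))
    print(has_match(BRANCH_REPORT, REPORT_LINE))
    print(has_match(INSN_REPORT, INSN_LINE))
```

Each check passes as soon as one line matches, so the tests only confirm that
samples exist; they do not validate the sample contents.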
If the test fails for any device, it reports the failure and exits
immediately with an error. The test only runs on platforms that have
the PMU event 'cs_etm//'; otherwise it is skipped.
Below is the detailed usage:
# cd $linux/tools/perf -> This is important so that the shell script can be used
# perf test list
[...]
61: Check Arm CoreSight trace data recording and branch samples
62: Check open filename arg using perf trace + vfs_getname
63: Zstd perf.data compression/decompression
64: Add vfs_getname probe to get syscall args filenames
# perf test 61
61: Check Arm CoreSight trace data recording and branch samples: Ok
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
tools/perf/tests/shell/test_arm_coresight.sh | 120 +++++++++++++++++++
1 file changed, 120 insertions(+)
create mode 100755 tools/perf/tests/shell/test_arm_coresight.sh
diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
new file mode 100755
index 000000000000..7b1fa17a4512
--- /dev/null
+++ b/tools/perf/tests/shell/test_arm_coresight.sh
@@ -0,0 +1,120 @@
+#!/bin/sh
+# Check Arm CoreSight trace data recording and branch samples
+
+# Uses 'perf record' to record trace data with Arm CoreSight sinks,
+# then verifies with the 'perf script' and 'perf report' commands that
+# branch samples and instruction samples are generated from the
+# CoreSight trace.
+
+# Leo Yan <leo.yan(a)linaro.org>, 2019
+
+perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
+file=$(mktemp /tmp/temporary_file.XXXXX)
+
+skip_if_no_cs_etm_event() {
+ perf list | grep -q 'cs_etm//' && return 0
+
+ # cs_etm event doesn't exist
+ return 2
+}
+
+skip_if_no_cs_etm_event || exit 2
+
+record_touch_file() {
+ echo "Recording trace (only user mode) with path: CPU$2 => $1"
+ perf record -o ${perfdata} -e cs_etm/@$1/u --per-thread \
+ -- taskset -c $2 touch $file
+}
+
+perf_script_branch_samples() {
+ echo "Looking at perf.data file for dumping branch samples:"
+
+ # Below is an example of the branch samples dumping:
+ # touch 6512 1 branches:u: ffffb220824c strcmp+0xc (/lib/aarch64-linux-gnu/ld-2.27.so)
+ # touch 6512 1 branches:u: ffffb22082e0 strcmp+0xa0 (/lib/aarch64-linux-gnu/ld-2.27.so)
+ # touch 6512 1 branches:u: ffffb2208320 strcmp+0xe0 (/lib/aarch64-linux-gnu/ld-2.27.so)
+ perf script -F,-time -i ${perfdata} | \
+ egrep " +touch +[0-9]+ .* +branches:([uk]:)? +"
+}
+
+perf_report_branch_samples() {
+ echo "Looking at perf.data file for reporting branch samples:"
+
+ # Below is an example of the branch samples reporting:
+ # 73.04% 73.04% touch libc-2.27.so [.] _dl_addr
+ # 7.71% 7.71% touch libc-2.27.so [.] getenv
+ # 2.59% 2.59% touch ld-2.27.so [.] strcmp
+ perf report --stdio -i ${perfdata} | \
+ egrep " +[0-9]+\.[0-9]+% +[0-9]+\.[0-9]+% +touch "
+}
+
+perf_report_instruction_samples() {
+ echo "Looking at perf.data file for instruction samples:"
+
+ # Below is an example of the instruction samples reporting:
+ # 68.12% touch libc-2.27.so [.] _dl_addr
+ # 5.80% touch libc-2.27.so [.] getenv
+ # 4.35% touch ld-2.27.so [.] _dl_fixup
+ perf report --itrace=i1000i --stdio -i ${perfdata} | \
+ egrep " +[0-9]+\.[0-9]+% +touch"
+}
+
+arm_cs_iterate_devices() {
+ for dev in $1/out\:*; do
+
+ # Skip testing if it's not a directory
+ ! [ -d $dev ] && continue;
+
+ # Read out its symbol link file name
+ path=`readlink -f $dev`
+
+ # Extract device name from path, e.g.
+ # path = '/sys/devices/platform/20010000.etf/tmc_etf0'
+ # `> device_name = 'tmc_etf0'
+ device_name=`echo $path | awk -F/ '{print $(NF)}'`
+
+ echo $device_name | egrep -q "etr|etb|etf"
+
+ # Only test if the output device is ETR/ETB/ETF
+ if [ $? -eq 0 ]; then
+
+ pmu_dev="/sys/bus/event_source/devices/cs_etm/sinks/$device_name"
+
+ # Exit if PMU device node doesn't exist
+ if ! [ -f $pmu_dev ]; then
+ echo "PMU device $pmu_dev doesn't exist"
+ exit 1
+ fi
+
+ record_touch_file $device_name $2 &&
+ perf_script_branch_samples &&
+ perf_report_branch_samples &&
+ perf_report_instruction_samples
+
+ err=$?
+
+ # Exit on failure
+ [ $err != 0 ] && exit $err
+
+ rm -f ${perfdata}
+ rm -f ${file}
+ fi
+
+ arm_cs_iterate_devices $dev $2
+ done
+}
+
+arm_cs_etm_test() {
+ # Iterate for every ETM device
+ for dev in /sys/bus/coresight/devices/etm*; do
+
+ # Find the ETM device belonging to which CPU
+ cpu=`cat $dev/cpu`
+
+ # Use depth-first search (DFS) to iterate outputs
+ arm_cs_iterate_devices $dev $cpu
+ done
+}
+
+arm_cs_etm_test
+exit 0
--
2.17.1
CoreSight device connections are somewhat complicated and are not
currently exposed to the user. One has to look at the platform
descriptions (DT bindings or ACPI bindings) to understand them.
Given the new naming scheme, it will be helpful to have this information
to choose the appropriate devices for tracing. This patch exposes
the device connections via links in the sysfs directories.
E.g., a connection devA[OutputPort_X] -> devB[InputPort_Y]
is represented as two symlinks:
/sys/bus/coresight/.../devA/out:X -> /sys/bus/coresight/.../devB
/sys/bus/coresight/.../devB/in:Y -> /sys/bus/coresight/.../devA
Applies on coresight/next tree.
This is split from the ACPI bindings series. No functional changes.
Suzuki K Poulose (3):
coresight: Pass coresight_device for coresight_release_platform_data
coresight: add return value for fixup connections
coresight: Expose device connections via sysfs
drivers/hwtracing/coresight/coresight-platform.c | 2 +-
drivers/hwtracing/coresight/coresight-priv.h | 3 +-
drivers/hwtracing/coresight/coresight.c | 148 +++++++++++++++++++----
include/linux/coresight.h | 4 +
4 files changed, 132 insertions(+), 25 deletions(-)
--
2.7.4
We have a few places where we call smp_processor_id() from preemptible
contexts during the perf buffer handling. We do this to figure out the
NUMA node for the allocation in case the event is not bound to a CPU. Use
numa_node_id() instead in such cases to avoid a splat.
Suzuki K Poulose (4):
coresight: tmc-etr: Do not call smp_processor_id() from preemptible
coresight: tmc-etr: alloc_perf_buf: Do not call smp_processor_id from
preemptible
coresight: tmc-etf: Do not call smp_processor_id from preemptible
coresight: etb10: Do not call smp_processor_id from preemptible
drivers/hwtracing/coresight/coresight-etb10.c | 6 ++----
drivers/hwtracing/coresight/coresight-tmc-etf.c | 6 ++----
drivers/hwtracing/coresight/coresight-tmc-etr.c | 13 ++++---------
3 files changed, 8 insertions(+), 17 deletions(-)
--
2.7.4
From: Wojciech Zmuda <wzmuda(a)n7space.com>
Hello,
After our discussion in [1] I came up with this proposed solution for timestamping
perf samples, so that trace post-processing can tell how long it takes to execute
traced blocks of code. I'd be grateful if you could find some time to take a look at it.
The patch itself is 37 lines of additions, so it shouldn't consume a lot of your time.
I also have some ideas for improving this patch further that I'm not quite sure about,
so I'd like to ask for your input.
The simplest verification is visibility of the time field in 'perf script' output:
root@zynq:~# perf script --ns -F comm,cpu,event,time | head
Frame deformatter: Found 4 FSYNCS
singlethread-at [001] 4572.697372400: branches:u:
singlethread-at [001] 4572.697372404: branches:u:
singlethread-at [001] 4572.697372408: branches:u:
singlethread-at [001] 4572.697372412: branches:u:
singlethread-at [001] 4572.697372416: branches:u:
The step above works only if the trace has been collected in CPU-wide mode. I have an idea
on how to improve this, but I'd like to get some more information first; I elaborate
in the last paragraph.
Another way to access these values is the script engine, which I used to sanity-check
the timestamp values. Using a Python script, I checked whether timestamps in subsequent
samples are monotonically increasing, and it seems they are. The only exceptions are discontinuities
in the trace. From my understanding, we can't timestamp discontinuities reasonably, since after
the decoder resynchronizes following a trace loss, it needs to wait for another timestamp packet.
from __future__ import print_function
import os
import sys
sys.path.append(os.environ['PERF_EXEC_PATH'] + \
    '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
from perf_trace_context import *

prev_sample = None
cnt = 0

def process_event(s):
    global prev_sample
    global cnt
    sample = s['sample']
    if not prev_sample:
        prev_sample = sample
        cnt = cnt+1
        return
    if sample['time'] < prev_sample['time']:
        print('Sample %d has higher timestamp than sample %d' % (cnt-1, cnt))
    # Remember this sample so the next one is compared against it
    prev_sample = sample
    cnt = cnt+1
By subtracting the timestamps of two samples, we can tell approximately how long it took
(in timestamp units) to execute the instructions between these two points. Now (based on Al's response
in [1]), knowing the timestamp generator clock frequency, we can tell how long it took
in proper time units. The frequency is known (we have it in DT) and can be saved to and retrieved
from the TSGEN register. This is not done at the moment (the register is empty), but the idea is that,
if it were, it could be saved in the perf.data headers (similarly to how TRCIDR values are saved now)
and later retrieved programmatically to convert timestamp differences into actual time.
I haven't experimented with this yet; I'd like to hear your opinion on it.
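The conversion itself would be simple arithmetic. A minimal sketch, assuming
the timestamp values are raw counter ticks and the generator frequency is
available in Hz; the function name and the 50 MHz figure are made up for
illustration and do not come from the patch or from any real platform:

```python
def ticks_to_ns(start_ticks, end_ticks, freq_hz):
    """Convert a timestamp-counter delta into whole nanoseconds."""
    if freq_hz <= 0:
        raise ValueError("freq_hz must be positive")
    # Multiply before dividing to keep integer precision.
    return (end_ticks - start_ticks) * 1000000000 // freq_hz

if __name__ == "__main__":
    # Hypothetical 50 MHz timestamp generator, two samples 4 ticks apart.
    print(ticks_to_ns(100, 104, 50000000))
```

With the frequency stored in the perf.data headers, a post-processing script
could apply this to the 'time' field of any two samples.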
I've also tried to access the time field with 'perf script -F time'. While this is possible
for traces recorded in CPU-wide mode, it is not possible in per-thread mode, because PERF_SAMPLE_TIME
is not set in that mode (cs_etm__synth_events()). I tried to analyze that code, but I can't
understand the relation between this flag and ETM's timeless_decoding property. It looks to me
as if timeless decoding is equivalent to per-thread mode, where timestamps are not really utilized,
while in CPU-wide mode they are used for inter-queue correlation. Would it be a good direction to have
a separate property for TSGEN being on/off and to set PERF_SAMPLE_TIME based on that, rather
than on etm->timeless_decoding? I see an obstacle in the cs_etm__is_timeless_decoding() function,
which sets etm->timeless_decoding based on whether events have this SAMPLE_TIME flag. Could the
criterion here be the tracing mode (CPU-wide/per-thread) instead?
Thank you,
Wojciech
[1] https://lists.linaro.org/pipermail/coresight/2019-May/002577.html
Wojciech Zmuda (1):
perf cs-etm: Set time value for samples
tools/perf/util/cs-etm.c | 36 ++++++++++++++++++++++++++++++++++++
tools/perf/util/cs-etm.h | 1 +
2 files changed, 37 insertions(+)
--
2.11.0
This is the second revision of a set that fixes miscellaneous problems
preventing snapshot mode from working properly when CoreSight events are
used.
It applies cleanly on the coresight next branch[1] and will be posted
again when 5.2-rc1 comes out.
Best regards,
Mathieu
[1]. https://git.linaro.org/kernel/coresight.git/ branch next
Changes for V2:
* Drop requirement to make the perf AUX buffer the same size as the sink
buffers.
* Re-worked the user space algorithm to find '*head' and '*old'.
* Fixed typo in changelogs (Leo).
Mathieu Poirier (6):
coresight: etb10: Properly set AUX buffer head in snapshot mode
coresight: tmc-etr: Properly set AUX buffer head in snapshot mode
coresight: tmc-etf: Properly set AUX buffer head in snapshot mode
coresight: tmc-etf: Fix snapshot mode update function
coresight: perf: Don't set the truncated flag in snapshot mode
perf tools: Properly set the value of 'old' and 'head' in snapshot
mode
drivers/hwtracing/coresight/coresight-etb10.c | 21 ++-
.../hwtracing/coresight/coresight-tmc-etf.c | 28 ++--
.../hwtracing/coresight/coresight-tmc-etr.c | 19 ++-
tools/perf/arch/arm/util/cs-etm.c | 124 +++++++++++++++++-
4 files changed, 165 insertions(+), 27 deletions(-)
--
2.17.1
This patch set adds support for Arm CoreSight testing with the perf
tool. Since the tests could vary widely depending on different
requirements, and to keep the testing as simple as possible at the
start, I'd like to define the tests to fulfil the duties below:
- Sanity testing of the perf tool's integration with CoreSight tracing;
- Trace source oriented testing: it needs to test every source and
  iterate over the paths from the source to its possible sinks;
- Test with 'perf script' for branch samples;
- Test with 'perf report' for branch samples and synthesized instruction
  samples.
Before starting this work, we need a reliable and simple method to help
us analyze every possible path from one specific source to its
output sinks. Suzuki has a patch [1] which creates sysfs in/out
nodes in the CoreSight device folders; based on this, it's convenient to use
depth-first search (DFS) to traverse the paths from an ETM to its connected
sink devices.
Patch 0001 introduces a shell script which uses the sysfs in/out nodes to
find every feasible path from one CPU to one sink; we can then specify
the sink in the perf record command and use the taskset command to bind the
task to the target CPU. This way it can verify whether the target CPU can
generate trace data and output it to the specific sink successfully. Below is
the iteration flow on a Juno board:
Recording trace with path: CPU0 => 20010000.etf
Recording trace with path: CPU0 => 20070000.etr
Recording trace with path: CPU1 => 20010000.etf
Recording trace with path: CPU1 => 20070000.etr
Recording trace with path: CPU2 => 20010000.etf
Recording trace with path: CPU2 => 20070000.etr
Recording trace with path: CPU3 => 20010000.etf
Recording trace with path: CPU3 => 20070000.etr
Recording trace with path: CPU4 => 20010000.etf
Recording trace with path: CPU4 => 20070000.etr
Recording trace with path: CPU5 => 20010000.etf
Recording trace with path: CPU5 => 20070000.etr
Patch 0002 adds two tests for 'perf report': one is the general
test and the second tests with the option '--itrace'.
I verified this patch set on a Juno board. The code is based on the CoreSight
next branch with Suzuki's patch set 'coresight: Support for
ACPI bindings' applied (all 36 patches) [2].
[1] https://archive.armlinux.org.uk/lurker/message/20190415.160419.bed67191.en.…
[2] https://archive.armlinux.org.uk/lurker/message/20190415.160343.cdd208bb.en.…
Leo Yan (2):
perf test: Introduce script for Arm CoreSight testing
perf test: Add 'perf report' testing for Arm CoreSight
.../shell/record+script+report_arm_cs_etm.sh | 95 +++++++++++++++++++
1 file changed, 95 insertions(+)
create mode 100755 tools/perf/tests/shell/record+script+report_arm_cs_etm.sh
--
2.17.1
This series adds the support for CoreSight devices on ACPI based
platforms. The device connections are encoded as _DSD graph property[0],
with CoreSight specific extensions to indicate the direction of data
flow as described in [1]. Components attached to CPUs are listed
as child devices of the corresponding CPU, removing explicit links
to the CPU like we do in the DT.
The majority of the series cleans up the driver and prepares the subsystem
for platform agnostic firmware probing, naming scheme, searching etc.
We introduce platform independent helpers to parse the platform supplied
information. Thus we rename the platform handling code from:
of_coresight.c => coresight-platform.c
The CoreSight driver creates shadow devices that appear on the CoreSight
bus, in addition to the real devices (e.g., AMBA bus devices). The names
of these devices match those of the real devices. This makes the device
names a bit cryptic on ACPI platforms. So this series also introduces a generic
platform agnostic device naming scheme for the shadow CoreSight devices.
Towards this we also make changes to the way we lookup devices to resolve
the connections, as we can't use the names to identify the devices. So,
we use the "fwnode_handle" of the real device for the device lookups.
Towards that we clean up the drivers to keep track of the "CoreSight"
device rather than the "real" device. However, all real operations,
like DMA allocation, Power management etc. must be performed on
the real device which is the parent of the shadow device.
Finally we add the support for parsing the ACPI platform data. The power
management support is missing in the ACPI (and this is not specific to
CoreSight). The firmware must ensure that the respective power domains
are turned on.
Applies on Mathieu's coresight/next tree.
Tested on a Juno-r0 board with ACPI bindings patch (Patch 31/30) added on
top of [2]. You would need to make sure that the debug power domain is
turned on before the Linux kernel boots. (e.g, connect the DS-5 to the
Juno board while at UEFI). arm32 code is only compile tested.
[0] ACPI Device Graphs using _DSD (Not available online yet; approved but
awaiting publication, and should eventually be linked at:)
https://uefi.org/sites/default/files/resources/_DSD-implementation-guide-to…
[1] https://developer.arm.com/docs/den0067/latest/acpi-for-coresighttm-10-platf…
[2] https://github.com/tianocore/edk2-platforms.git
Changes since v2:
- Fix the symlink name for ETM devices under cs_etm PMU (Patch by Mathieu)
- Drop patches merged already in the tree.
- Add the tags from Mathieu
- More documentation with examples of ACPI graph in ACPI bindings support.
- Fix ETM4 error return path (Mathieu)
- Drop the patches exposing device links via sysfs, to be posted as separate
series.
- Drop the generic helper for device search by fwnode for a better cleanup
later.
- Split the ACPI bindings support patch for AMBA and platform devices.
- Return integer error for <platform>_get_platform_data() helpers.
- Fix comment about the return code for acpi_get_coresight_cpu().
- Ensure we don't have devices part of multiple graphs (Mathieu).
Changes since v1:
[ http://lists.infradead.org/pipermail/linux-arm-kernel/2019-March/639963.html ]
- Dropped the replicator driver merge changes as they were pulled already.
- Cleanups for Power management in the drivers.
- Reuse platform description for connection information. Also introduce
routines to clean up the platform description to make sure we drop
the references (fwnode_handle).
- Add RFC patches for exposing the device-links via sysfs.
- Drop tracking the device in favour of coresight_device.
- Name etb10 as "etb"
- Fix other comments in v1.
- Use a generic helper for searching with fwnode_handle rather than adding
one for CoreSight.
Mathieu Poirier (1):
coresight: Use coresight device names for sinks in PMU attribute
Suzuki K Poulose (29):
coresight: funnel: Clean up device book keeping
coresight: replicator: Cleanup device tracking
coresight: tmc: Clean up device specific data
coresight: catu: Cleanup device specific data
coresight: tpiu: Clean up device specific data
coresight: stm: Cleanup device specific data
coresight: etm: Clean up device specific data
coresight: etb10: Clean up device specific data
coresight: Rename of_coresight to coresight-platform
coresight: etm3x: Rearrange cp14 access detection
coresight: stm: Rearrange probing the stimulus area
coresight: tmc-etr: Rearrange probing default buffer size
coresight: platform: Make memory allocation helper generic
coresight: Make sure device uses DT for obsolete compatible check
coresight: Introduce generic platform data helper
coresight: Make device to CPU mapping generic
coresight: Remove cpu field from platform data
coresight: Remove name from platform description
coresight: Cleanup coresight_remove_conns
coresight: Reuse platform data structure for connection tracking
coresight: Rearrange platform data probing
coresight: Add support for releasing platform specific data
coresight: platform: Use fwnode handle for device search
coresight: Use fwnode handle instead of device names
coresight: Use platform agnostic names
coresight: stm: ACPI support for parsing stimulus base
coresight: Support for ACPI bindings
coresight: acpi: Support for AMBA components
coresight: acpi: Support for platform devices
drivers/acpi/acpi_amba.c | 9 +
drivers/hwtracing/coresight/Makefile | 3 +-
drivers/hwtracing/coresight/coresight-catu.c | 40 +-
drivers/hwtracing/coresight/coresight-catu.h | 1 -
drivers/hwtracing/coresight/coresight-cpu-debug.c | 3 +-
drivers/hwtracing/coresight/coresight-etb10.c | 51 +-
drivers/hwtracing/coresight/coresight-etm-perf.c | 8 +-
drivers/hwtracing/coresight/coresight-etm.h | 6 +-
.../hwtracing/coresight/coresight-etm3x-sysfs.c | 12 +-
drivers/hwtracing/coresight/coresight-etm3x.c | 45 +-
drivers/hwtracing/coresight/coresight-etm4x.c | 37 +-
drivers/hwtracing/coresight/coresight-etm4x.h | 2 -
drivers/hwtracing/coresight/coresight-funnel.c | 35 +-
drivers/hwtracing/coresight/coresight-platform.c | 810 +++++++++++++++++++++
drivers/hwtracing/coresight/coresight-priv.h | 4 +
drivers/hwtracing/coresight/coresight-replicator.c | 42 +-
drivers/hwtracing/coresight/coresight-stm.c | 118 ++-
drivers/hwtracing/coresight/coresight-tmc-etf.c | 9 +-
drivers/hwtracing/coresight/coresight-tmc-etr.c | 44 +-
drivers/hwtracing/coresight/coresight-tmc.c | 96 +--
drivers/hwtracing/coresight/coresight-tmc.h | 2 -
drivers/hwtracing/coresight/coresight-tpiu.c | 24 +-
drivers/hwtracing/coresight/coresight.c | 164 ++++-
drivers/hwtracing/coresight/of_coresight.c | 297 --------
include/linux/coresight.h | 61 +-
25 files changed, 1332 insertions(+), 591 deletions(-)
create mode 100644 drivers/hwtracing/coresight/coresight-platform.c
delete mode 100644 drivers/hwtracing/coresight/of_coresight.c
ACPI bindings for Juno-r0 (applies on [2] above)
Suzuki K Poulose (1):
edk2-platform: juno: Update ACPI CoreSight Bindings
Platform/ARM/JunoPkg/AcpiTables/Dsdt.asl | 241 +++++++++++++++++++++++++++++++
1 file changed, 241 insertions(+)
--
2.7.4
This set fixes miscellaneous problems that prevented perf's snapshot
mode from working properly when CoreSight events are used.
Given the late stage in the cycle, it is sent out for review
purposes and is not intended for the coming merge window.
Applies cleanly to coresight next[1].
Regards,
Mathieu
[1]. https://git.linaro.org/kernel/coresight.git/log/?h=next
Mathieu Poirier (5):
coresight: Fix buffer size in snapshot mode
coresight: tmc-etf: Fix snapshot mode update function
coresight: perf: Don't set the truncated flag in snapshot mode
perf tools: Properly set the value of 'old' in snapshot mode
docs: coresight: Document snapshot mode
Documentation/trace/coresight.txt | 41 ++++++++++++++++++-
drivers/hwtracing/coresight/coresight-etb10.c | 31 +++++++++++++-
.../hwtracing/coresight/coresight-tmc-etf.c | 34 +++++++++++++--
.../hwtracing/coresight/coresight-tmc-etr.c | 16 ++++++--
tools/perf/arch/arm/util/cs-etm.c | 12 +++++-
5 files changed, 124 insertions(+), 10 deletions(-)
--
2.17.1
Hello,
I'm trying to design a solution that uses CoreSight to measure application execution time,
with the granularity of specific ranges of instructions.
I have an idea of how this may be achieved and I'd like to know your opinion.
Great inspiration comes from this patch set by Leo Yan, especially from the disassembly script:
https://lists.linaro.org/pipermail/coresight/2018-May/001325.html
Analyzing this, I learned that perf-script is capable of understanding the perf.data AUXTRACE section
and parsing some of the trace elements into branch samples, which illustrate how the IP moved around.
This information is available to the built-in Python interpreter, so we can script it to get
the assembly from the program image.
If I understand perf-script in its current shape correctly, it ignores all the non-branching events
(so everything that's not an ATOM, EXCEPTION or TRACE_ON packet); specifically, timestamping
is lost during the process. I'd like to modify perf-script to generate samples on such timing events,
so that later I can have them in between assembly instructions to calculate deltas and be able to tell either:
- how much time and/or how many CPU cycles have been spent between two arbitrary instructions (ideally), or
- what instructions have been executed between timestamp T and T+1 (this seems to be more
  in line with how timestamping in CS works, I think)
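The second option could be sketched roughly like this in a post-processing
script. It is only an illustration: the event stream below is a made-up list
of ('ts', value) and ('insn', address) tuples, not a format that perf-script
produces today.

```python
def group_by_timestamp(events):
    """Return a list of (timestamp, [addresses]) buckets.

    Instructions seen before the first timestamp are dropped, because
    there is no timestamp to attribute them to.
    """
    buckets = []
    current = None
    for kind, value in events:
        if kind == "ts":
            # A new timestamp opens a new bucket.
            current = (value, [])
            buckets.append(current)
        elif kind == "insn" and current is not None:
            current[1].append(value)
    return buckets

if __name__ == "__main__":
    # Hypothetical decoded stream: timestamps interleaved with instructions.
    events = [("insn", 0x0ff0), ("ts", 100), ("insn", 0x1000),
              ("insn", 0x1004), ("ts", 104), ("insn", 0x2000)]
    for ts, addrs in group_by_timestamp(events):
        print(ts, [hex(a) for a in addrs])
```

The delta between the timestamps of two consecutive buckets then bounds the
time spent on the instructions in the first bucket.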
A brief analysis of tools/perf/util/cs-etm.c and cs-etm-decoder/cs-etm-decoder.c suggests that
timestamp events are not turned into packets, but merely recorded as a packet queue parameter
(I'm not sure why this is needed, though). The cycacc event is not processed further at all,
besides being decoded to plaintext later by OpenCSD. I think it may be worth giving them both
a dedicated `enum cs_etm_sample_type` value and packet generator functions.
Then, I think, it should be possible to generate samples (not sure of what type, though; perhaps not
'branch' this time) for timestamp/cycacc packets, analogously to what has been done for TRACE_ON
https://lists.linaro.org/pipermail/coresight/2018-May/001327.html
and then expose it in the python interface.
I'd be grateful for any opinion about this idea, especially about the usefulness of such a feature
for the general audience, as well as any possible compatibility issues. If you are aware
of another approach to correlating timestamps with branch samples, that would
also be very welcome.
I hope the idea is not completely pointless. I'm still making my way through
the perf subsystem, so I might have missed some crucial details.
Thank you for your time,
Wojciech