### Introduction ###
Embedded Trace Buffer (ETB) provides on-chip storage of trace data,
usually with a buffer size from 2KB to 8KB. This data has been used for
profiling, and that use case is well implemented in the coresight driver.
This patch set explores using ETB RAM data for postmortem debugging.
ETB RAM data can be quite useful for postmortem debugging, especially
when the hardware design has a local ETB buffer (ARM DDI 0461B,
chapter 1.2.7, 'Local ETF'). With this kind of design every CPU has one
dedicated ETB RAM, so it is quite handy to use a live CPU to dump the
ETB RAM of a hung CPU. Then we can quickly find out the exact execution
flow before the hang.
Because the ETB RAM buffer is small, if all CPUs share one ETB buffer
then the trace data leading to the error is easily overwritten by
other PEs; even so, we sometimes still have a chance to go through the
trace data to assist in debugging panic issues.
### Implementation ###
First, we need to provide a unified API for the panic dump functionality,
so it can easily be extended to enable panic dump for multiple drivers.
This is done by patch 0001: it registers a panic notifier and provides the
general APIs {coresight_dump_add|coresight_dump_del} as helper functions,
so any coresight device can add itself to the dump list or delete itself
as needed.
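To make the dump list idea concrete, here is a minimal userspace sketch in C of a list with add/delete helpers and the walk that a panic notifier would trigger. All names (struct dump_node, dump_add(), dump_del(), dump_run_all(), mark_dumped()) are hypothetical and are not the code in patch 0001; in the kernel the list would more likely be a struct list_head protected by a lock, with the walk started from a notifier registered on panic_notifier_list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dump list node; not the structure used in patch 0001. */
struct dump_node {
	const char *name;
	int (*panic_cb)(struct dump_node *node); /* saves this device's data */
	int dumped;                              /* set by the example callback */
	struct dump_node *next;
};

static struct dump_node *dump_list; /* head of a singly linked list */

/* Add a node at the head of the dump list. */
static void dump_add(struct dump_node *node)
{
	assert(node != NULL && node->panic_cb != NULL);
	node->next = dump_list;
	dump_list = node;
}

/* Unlink a node; returns 0 on success, -1 if it was not on the list. */
static int dump_del(struct dump_node *node)
{
	struct dump_node **pp;

	for (pp = &dump_list; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == node) {
			*pp = node->next;
			node->next = NULL;
			return 0;
		}
	}
	return -1;
}

/* What the panic notifier would do: run every registered callback.
 * Returns how many callbacks reported success (0). */
static int dump_run_all(void)
{
	struct dump_node *n;
	int ok = 0;

	for (n = dump_list; n != NULL; n = n->next)
		if (n->panic_cb(n) == 0)
			ok++;
	return ok;
}

/* Example callback: a real one would copy trace data somewhere safe. */
static int mark_dumped(struct dump_node *node)
{
	node->dumped = 1;
	return 0;
}
```

The pointer-to-pointer walk in dump_del() avoids a special case for removing the list head.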
Generally, coresight devices add themselves to the panic dump list at
registration time; a coresight device that wants to dump sets 'panic_cb'
in its ops structure. Patch 0002 adds and deletes the panic dump node
for devices.
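The registration hook can be sketched as follows, assuming a simplified device structure whose ops carry an optional panic_cb pointer: devices with a callback are placed on a fixed-size dump list when registered and removed again when unregistered. struct cs_ops, struct cs_device, cs_register(), cs_unregister() and example_panic_cb() are hypothetical stand-ins, not the coresight core API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the coresight structures. */
struct cs_ops {
	int (*panic_cb)(void *drvdata); /* NULL if the device does not dump */
};

struct cs_device {
	const char *name;
	const struct cs_ops *ops;
	bool on_dump_list;
};

#define MAX_DUMP_NODES 16
static struct cs_device *dump_nodes[MAX_DUMP_NODES];
static size_t dump_count;

/* Register a device; only devices with a panic_cb join the dump list. */
static int cs_register(struct cs_device *dev)
{
	assert(dev != NULL && dev->ops != NULL);
	/* ... the usual registration work would go here ... */
	if (dev->ops->panic_cb != NULL) {
		if (dump_count == MAX_DUMP_NODES)
			return -1;
		dump_nodes[dump_count++] = dev;
		dev->on_dump_list = true;
	}
	return 0;
}

/* Unregister a device and drop it from the dump list if it was added. */
static void cs_unregister(struct cs_device *dev)
{
	size_t i;

	if (!dev->on_dump_list)
		return;
	for (i = 0; i < dump_count; i++) {
		if (dump_nodes[i] == dev) {
			/* order does not matter: move the last entry into the hole */
			dump_nodes[i] = dump_nodes[--dump_count];
			dev->on_dump_list = false;
			return;
		}
	}
}

/* Example callback for a device that wants to be dumped. */
static int example_panic_cb(void *drvdata)
{
	(void)drvdata;
	return 0;
}
```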
Patches 0003 and 0004 add panic callback functions for the tmc and etm4x
drivers, so the tmc driver can save the trace data for ETB/ETF when a panic
happens, and the etm4x driver can save metadata for offline analysis.
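One detail a TMC panic callback has to deal with is that ETB/ETF RAM is a circular buffer. A minimal sketch, assuming the RAM contents, the write offset and a "wrapped" flag have already been read out: when the buffer has wrapped, the oldest data starts at the write offset. linearize_trace() is a hypothetical helper, not a function from the tmc driver, and the real driver reads the RAM through device registers rather than from a memory array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a circular trace RAM snapshot into dst, oldest byte first.
 *   ram, ram_size: snapshot of the trace RAM
 *   wr:            offset of the next byte the hardware would write
 *   wrapped:       true if the RAM filled up and wrapped at least once
 * dst must have room for ram_size bytes. Returns the bytes copied.
 */
static size_t linearize_trace(const unsigned char *ram, size_t ram_size,
			      size_t wr, bool wrapped, unsigned char *dst)
{
	assert(wr < ram_size);
	if (!wrapped) {
		/* Valid data is simply [0, wr). */
		memcpy(dst, ram, wr);
		return wr;
	}
	/* Oldest data is [wr, end), followed by the newer data [0, wr). */
	memcpy(dst, ram + wr, ram_size - wr);
	memcpy(dst + (ram_size - wr), ram, wr);
	return ram_size;
}
```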
### Usage ###
Below is an example of how to use the panic dump functionality on 96boards
Hikey. The brief flow is: when the panic happens, the ETB panic callback
function saves trace data into memory; kdump then boots the recovery
kernel, which saves the DDR content as a kernel core dump file. After
transferring the core dump file from the board to the host PC, use the
'crash' tool plus an extension program to extract the trace data and
generate a 'perf' format compatible file.
- Enable tracing on Hikey; in theory there are two methods to enable
tracing:
The first method is to use the sysfs interface to enable coresight tracing:
echo 1 > /sys/bus/coresight/devices/f6402000.etf/enable_sink
echo 1 > /sys/bus/coresight/devices/f659c000.etm/enable_source
echo 1 > /sys/bus/coresight/devices/f659d000.etm/enable_source
echo 1 > /sys/bus/coresight/devices/f659e000.etm/enable_source
echo 1 > /sys/bus/coresight/devices/f659f000.etm/enable_source
echo 1 > /sys/bus/coresight/devices/f65dc000.etm/enable_source
echo 1 > /sys/bus/coresight/devices/f65dd000.etm/enable_source
echo 1 > /sys/bus/coresight/devices/f65de000.etm/enable_source
echo 1 > /sys/bus/coresight/devices/f65df000.etm/enable_source
The second method is to use the 'perf' tool in snapshot mode. This
command is expected to enable tracing, wait for a specific event to
happen and capture the snapshot trace data, so this method can also be
used smoothly for panic dump. This command currently fails on Hikey
because coresight currently only supports the '--per-thread' method with
the perf tool:
./perf record --snapshot -S8196 -e cs_etm/@f6402000.etf/ -- sleep 1000 &
- Load the recovery kernel for kdump:
ARM64's kdump supports using the same kernel image for both the main
kernel and the dump-capture kernel, so we can simply load the
dump-capture kernel with the command below:
./kexec -p vmlinux --dtb=hi6220-hikey.dtb --append="root=/dev/mmcblk0p9
rw maxcpus=1 reset_devices earlycon=pl011,0xf7113000 nohlt
initcall_debug console=tty0 console=ttyAMA3,115200 clk_ignore_unused"
- Download the kernel dump file:
After a kernel panic happens, kdump launches the dump-capture kernel,
so we need to save the kernel's dump file on the target:
cp /proc/vmcore ./vmcore
Finally we can copy the 'vmcore' file to the PC.
- Use the 'crash' tool + csdump.so extension to extract trace data:
After downloading the vmcore file from the Hikey board to the host PC,
we can use the 'crash' tool + csdump.so to generate the 'perf.data' file:
./crash vmcore vmlinux
crash> extend csdump.so
crash> csdump output_dir
Three files are generated in 'output_dir':
output_dir/
├── cstrace.bin -> trace raw data
├── metadata.bin -> meta data
└── perf.data -> 'perf' format compatible file
The source code of 'csdump.so' will be sent to the mailing list separately.
- Use the 'perf' tool for offline analysis:
On Hikey board:
./perf script -v -F cpu,event,ip -i perf_2.data -k vmlinux
[001] instructions: ffff000008559ad0
[001] instructions: ffff000008559230
[001] instructions: ffff00000855924c
[001] instructions: ffff000008559ae0
[001] instructions: ffff000008559ad0
[001] instructions: ffff000008559230
[001] instructions: ffff00000855924c
[001] instructions: ffff000008559ae0
[001] instructions: ffff000008559ad0
Changes from v1:
* Add support to dump ETMv4 meta data.
* Wrote the 'crash' extension csdump.so and rely on it to generate a 'perf'
format compatible file.
* Refactored panic dump driver to support pre & post panic dump.
Changes from RFC:
* Following Mathieu's suggestion, use a general framework to support the dump
functionality.
* Changed to use perf to analyse trace data.
Leo Yan (4):
coresight: Support panic dump functionality
coresight: Add and delete dump node for registration/unregistration
coresight: tmc: Hook panic dump callback for ETB/ETF
coresight: etm4x: Hook panic dump callback for etmv4
drivers/hwtracing/coresight/Kconfig | 9 +
drivers/hwtracing/coresight/Makefile | 1 +
drivers/hwtracing/coresight/coresight-etm4x.c | 22 +++
drivers/hwtracing/coresight/coresight-etm4x.h | 15 ++
drivers/hwtracing/coresight/coresight-panic-dump.c | 211 +++++++++++++++++++++
drivers/hwtracing/coresight/coresight-priv.h | 17 ++
drivers/hwtracing/coresight/coresight-tmc-etf.c | 29 +++
drivers/hwtracing/coresight/coresight.c | 7 +
include/linux/coresight.h | 7 +
9 files changed, 318 insertions(+)
create mode 100644 drivers/hwtracing/coresight/coresight-panic-dump.c
These patches fix some issues with the branch stacks generated from
CoreSight ETM trace.
The main issues addressed are:
- The branch stack should only contain taken branches.
- The instruction samples are generated using the period specified by the
--itrace option to perf inject. Currently, the period can only be
specified as an instruction count - further work is required to specify
the period as a cycle count or time interval.
- The ordering of the branch stack should have newest branch first.
- Some minor fixes to the address calculations.
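The two branch stack rules above (only taken branches, newest entry first) can be illustrated with a small ring buffer. This is a sketch, not the cs-etm.c code from the patches; BS_DEPTH, struct branch_stack, bs_record() and bs_snapshot() are hypothetical names, and the fixed depth of 4 is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BS_DEPTH 4 /* hypothetical, arbitrary stack depth */

struct branch {
	uint64_t from, to;
};

struct branch_stack {
	struct branch ent[BS_DEPTH];
	size_t head; /* index of the most recently recorded branch */
	size_t nr;   /* number of valid entries, at most BS_DEPTH */
};

/* Record a decoded branch; not-taken branches are dropped. */
static void bs_record(struct branch_stack *bs, uint64_t from, uint64_t to,
		      bool taken)
{
	if (!taken)
		return;
	bs->head = (bs->head + 1) % BS_DEPTH; /* oldest entry is overwritten */
	bs->ent[bs->head] = (struct branch){ from, to };
	if (bs->nr < BS_DEPTH)
		bs->nr++;
}

/* Copy the entries into out[], newest first; returns the count. */
static size_t bs_snapshot(const struct branch_stack *bs, struct branch *out)
{
	size_t i, idx = bs->head;

	for (i = 0; i < bs->nr; i++) {
		out[i] = bs->ent[idx];
		idx = (idx + BS_DEPTH - 1) % BS_DEPTH; /* step back, wrapping */
	}
	return bs->nr;
}
```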
With these fixes, the branch stacks are more similar to the last branch
records produced by 'perf record -b' and Intel-PT on x86. There are
similar improvements in the autofdo profiles generated from these traces.
The patches apply to the autoFDO branch of
https://github.com/Linaro/perf-opencsd.git (d3fa0f7)
Regards
Robert Walker
Robert Walker (2):
Revert "perf inject: record branches in chronological order"
perf: Fix branch stack records from CoreSight ETM decode
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 4 +-
tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 2 +-
tools/perf/util/cs-etm.c | 134 +++++++++++++-----------
3 files changed, 73 insertions(+), 67 deletions(-)
The TMC-ETR supports routing the Coresight trace data to
system memory. It supports two different modes in which the memory
can be used.
1) Contiguous memory - The memory is assumed to be physically
contiguous.
2) Scatter Gather list - The memory can be chunks of 4K pages,
which are specified in a table of pointers which itself could be
multiple 4K size pages.
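To get a feel for the table overhead of Scatter Gather mode, the sketch below counts data pages and table pages for a given buffer size. It assumes, purely for illustration, 32-bit table entries in 4K table pages, with the last entry of every table page except the final one used as a link to the next table page; the exact TMC entry format is defined by the TMC documentation and the patches, not here.

```c
#include <assert.h>
#include <stddef.h>

#define SG_PAGE_SIZE        4096u
#define SG_ENTRY_SIZE       4u /* assumed 32-bit table entries */
#define SG_ENTRIES_PER_PAGE (SG_PAGE_SIZE / SG_ENTRY_SIZE)

/* Number of 4K data pages needed for buf_size bytes. */
static size_t sg_data_pages(size_t buf_size)
{
	return (buf_size + SG_PAGE_SIZE - 1) / SG_PAGE_SIZE;
}

/*
 * Number of table pages needed to point at nr_data_pages pages.
 * With t table pages, capacity is (entries - 1) * (t - 1) + entries,
 * because every table page but the last spends one entry on a link.
 */
static size_t sg_table_pages(size_t nr_data_pages)
{
	const size_t per_linked = SG_ENTRIES_PER_PAGE - 1;

	assert(nr_data_pages > 0);
	if (nr_data_pages <= SG_ENTRIES_PER_PAGE)
		return 1;
	/* smallest t with per_linked * t + 1 >= nr_data_pages */
	return (nr_data_pages - 1 + per_linked - 1) / per_linked;
}
```

Under these assumptions a 1MB buffer needs 256 data pages and a single table page.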
To avoid the complications of managing the buffer, this series
adds a layer for managing the ETR buffer, which makes the best possible
choice based on what is available. The allocation can be tuned by passing
in flags, existing pages (e.g., the perf ring buffer), etc.
Towards supporting ETR Scatter Gather mode, we introduce a generic TMC
scatter-gather table which can be used to manage the data and table pages.
The table can be filled in the format expected by the Scatter-Gather
mode.
The TMC ETR-SG mechanism doesn't allow starting the trace at a non-zero
offset (required by perf). So we make some tricky changes to the table
at run time to allow starting at any "page aligned" offset and then
wrapping around to the beginning of the buffer with very little overhead.
See patches for more description.
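The effect of the run-time table rotation can be modelled as index arithmetic: if tracing starts at a page-aligned offset, the table effectively lists pages from that offset onwards and then wraps to page 0. This sketch only models that mapping; it does not reproduce how the patches rewrite the table entries, and PAGE_SZ, hw_to_buf_offset() and rotated_page_order() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ 4096u

/* Buffer offset of the byte the hardware writes at linear position
 * hw_off, when tracing starts at the page-aligned buffer offset start. */
static size_t hw_to_buf_offset(size_t hw_off, size_t start, size_t nr_pages)
{
	size_t size = nr_pages * PAGE_SZ;

	assert(start % PAGE_SZ == 0 && start < size);
	return (start + hw_off) % size; /* wraps back to the buffer start */
}

/* Fill order[] with page indices in the order the rotated table lists them. */
static void rotated_page_order(size_t start, size_t nr_pages, size_t *order)
{
	size_t first = start / PAGE_SZ, i;

	assert(start % PAGE_SZ == 0);
	for (i = 0; i < nr_pages; i++)
		order[i] = (first + i) % nr_pages;
}
```

For a 4-page buffer started at offset 2 * PAGE_SZ, the order is pages 2, 3, 0, 1.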
The series also improves the way the ETR is controlled by different modes
(sysfs vs. perf) by keeping mode-specific data. This allows access
to the trace data collected in sysfs mode, even when the ETR is
operated in perf mode. Also, with the transparent management of the
buffer and the scatter-gather mechanism, we can allow the user to
request larger trace buffers for sysfs mode. This is supported
by providing a sysfs file, "buffer_size", which accepts a page-aligned
size that will be used by the ETR when allocating a buffer.
Finally, it cleans up the etm perf sink callbacks a little bit and
then adds support for the ETR sink. For the ETR, we try our best to
use the perf ring buffer as the target hardware buffer, provided:
1) The ETR is dma coherent (since the pages will be shared with
userspace perf tool).
2) The perf is used in snapshot mode (The ETR cannot be stopped
based on the size of the data written hence we could easily
overwrite the buffer. We may be able to fix this in the future)
3) The ETR supports the Scatter-Gather mode.
If we can't use the perf buffers directly, we fall back to using
software buffering, where we have to copy the trace data back
to the perf ring buffer.
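The three conditions for using the perf ring buffer directly reduce to a simple decision, sketched below. struct etr_caps, enum etr_buf_choice and etr_pick_buffer() are hypothetical names for illustration, not the driver's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum etr_buf_choice {
	ETR_USE_PERF_PAGES, /* hardware writes straight into perf's pages */
	ETR_USE_SW_BUFFER,  /* private buffer, copied to perf afterwards */
};

/* Hypothetical summary of the ETR properties that matter here. */
struct etr_caps {
	bool dma_coherent; /* pages are shared with the userspace perf tool */
	bool has_sg;       /* Scatter-Gather mode is supported */
};

/* Use perf's ring buffer only if all three conditions hold. */
static enum etr_buf_choice etr_pick_buffer(const struct etr_caps *caps,
					   bool perf_snapshot)
{
	assert(caps != NULL);
	if (caps->dma_coherent && perf_snapshot && caps->has_sg)
		return ETR_USE_PERF_PAGES;
	return ETR_USE_SW_BUFFER;
}
```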
Suzuki K Poulose (17):
coresight etr: Disallow perf mode temporarily
coresight tmc: Hide trace buffer handling for file read
coresight: Add helper for inserting synchronization packets
coresight: Add generic TMC sg table framework
coresight: Add support for TMC ETR SG unit
coresight: tmc: Make ETR SG table circular
coresight: tmc etr: Add transparent buffer management
coresight: tmc: Add configuration support for trace buffer size
coresight: Convert driver messages to dev_dbg
coresight: etr: Track if the device is coherent
coresight etr: Handle driver mode specific ETR buffers
coresight etr: Relax collection of trace from sysfs mode
coresight etr: Do not clean ETR trace buffer
coresight: etr: Add support for save restore buffers
coresight: etr_buf: Add helper for padding an area of trace data
coresight: perf: Remove reset_buffer call back for sinks
coresight perf: Add ETR backend support for etm-perf
.../ABI/testing/sysfs-bus-coresight-devices-tmc | 8 +
.../coresight/coresight-dynamic-replicator.c | 4 +-
drivers/hwtracing/coresight/coresight-etb10.c | 72 +-
drivers/hwtracing/coresight/coresight-etm-perf.c | 9 +-
drivers/hwtracing/coresight/coresight-etm3x.c | 4 +-
drivers/hwtracing/coresight/coresight-etm4x.c | 4 +-
drivers/hwtracing/coresight/coresight-funnel.c | 4 +-
drivers/hwtracing/coresight/coresight-priv.h | 8 +
drivers/hwtracing/coresight/coresight-replicator.c | 4 +-
drivers/hwtracing/coresight/coresight-stm.c | 4 +-
drivers/hwtracing/coresight/coresight-tmc-etf.c | 109 +-
drivers/hwtracing/coresight/coresight-tmc-etr.c | 1665 ++++++++++++++++++--
drivers/hwtracing/coresight/coresight-tmc.c | 75 +-
drivers/hwtracing/coresight/coresight-tmc.h | 128 +-
drivers/hwtracing/coresight/coresight-tpiu.c | 4 +-
include/linux/coresight.h | 5 +-
16 files changed, 1837 insertions(+), 270 deletions(-)
Hi, I’ve recently acquired a ZedBoard with the Zynq-7000 SoC and was interested in finding out if I could use `perf` as described on https://github.com/Linaro/OpenCSD/blob/master/HOWTO.md to grab trace data.
Unfortunately, zynq-7000.dtsi on (recent) Linux kernels does not yet contain the necessary device definitions, and zynq-zed.dts wasn’t even syntactically correct (but it was just the syntax for the include-statement, so easy to fix).
Based on Muhammad Wahab’s patch floating around the interwebs and studying the Zynq manual, I enabled support for some more of the devices (like the tpiu) in the devicetree.
But most crucially (I guess), I can’t identify what “etr” in the HOWTO corresponds to on the Zynq. This means that the sample line from the HOWTO above
$ ./tools/perf/perf record -e cs_etm/@20070000.etr/ --per-thread uname
won’t work.
Does anyone have experience in configuring the devicetree correctly for the Zynq? Should the perf-incantation on the Zynq also use .etr, or is there some other mechanism that perf can use on the Zynq?
Sincerely,
Volker Stolz
Good day all,
The kernel branches on the openCSD repository[1] have been moved to
their new living quarters [2] and the HOWTO.md on [1] modified to
reflect that. All we need to do is remove the kernel branches from
[1], something I'm planning to do by end of business on Monday.
Please get back to me if you need more time so that we can sketch out
a plan.
Thanks,
Mathieu
[1]. https://github.com/Linaro/OpenCSD
[2]. https://github.com/Linaro/perf-opencsd