== Progress ==
* Catching up after holidays
* [GlobalISel] Refactor CallLowering [LLVM-568]
- Still in progress, but I fixed the AMDGPU failure
- Decided to go a bit deeper than initially intended, since it gets
really awkward otherwise
* IR SVE Reviews [ LLVM-545]
- New version of the patch was posted, had a quick look
== Plan ==
* Continue LLVM-568
* Start LLVM-134 (LLVM SPEC2k6 Performance Analysis Leftovers from BUD17)
* LLVM-545 (IR SVE Reviews) - Have a look at the new, revised patch
hi all,considering the big progress achieved in coresight drivers, perf, as well as opencsd, the prerequisites for making a move towards developing branch tracing in gdb for arm processors, based on etm are now available. Therefore I am publishing this request for comment, and looking forward for your feedback on this proposal.
Non intrusive execution recording for GDB using ARM CoreSight
Status of this Memo
This memo provides information for Linaro coresight and toolchain communities. Distribution of this memo is unlimited.
Abstract
A method of realizing execution recording in GDB in a non-intrusive way. This method is based on the use of CoreSight hardware tracing, available on ARM Cortex devices.
Table of Contents
1 Introduction 2 State of the art 3 Use cases 3.1 Self hosted debug monitor 3.2 Remote debug monitor 3.3 External debugger 4 Implementation needs 4.1 Self hosted debug monitor 4.2 Remote Debug monitor 4.3 External debugger 5 Remote protocol execution sequence 6 Remote protocol extensions 7 Solutions and alternatives 7.1 Scope definition 7.2 CoreSight infrastructure exposure to the user 7.3 Parameters needed for parsing traces
1. Introduction
CoreSight technology offers a toolset for tracing the execution of a program on a CPU, as well as routing the traces to an external trace port analyzer or storing it in a dedicated internal memory. Those traces do not affects system performance, and can be used as a record for program execution. GDB offers reverse debugging by recording program execution and storing it. GDB offers either full record or program flow (branch) record. Records can be replayed later-on for forwards or backwards debugging. This request for comments is about realizing GDB record and replay functionality using CoreSight technology. it presents typical use cases and discuss different alternatives for realizing above mentioned feature. 2. State of the art
GDB currently supports two execution recording variants: - full record: where registers as well as memory are recorded for each instruction. in this case GDB collects the registers as well as involved memory area after each instruction. currently this has no support for hardware accelerators - branch record: where only program flow is recorded. in this case GDB collects a list of linear execution called blocks. each branch will terminate previous block and start a new one. currently branch is implemented either without hardware acceleration or using Intel branch trace store "bts" and Intel processor trace "pt" hardware accelerator on supported cpus.
3. Use cases
Programs running on ARM processors can be be debugged in many configurations. three of them are selected in this RFC as base for discussion : 3.1. Self hosted debug monitor Those are systems where the debugger program runs on the same cpu as the debugged program and monitors it. user interacts with the debugging session on the target host itself. Linux gdb is an example of such systems. in such a system following setup is considered - Target: a process running on an ARM cortex A - Debugger: gnu gdb via ptrace API (arm-linux-gnueabihf-gdb)
+-----------------------------------+ | Target | | +------------+ | | +------+ | Coresight | | | | | | components:| | | | GDB |<--------->| | | | | | ^ | DWT, ETM, | | | +------+ | | ITM, TPIU | | | ^ | | TMC, ETB | | | | | +------------+ | +----|---------|--------------------+ | | | | arm-linux- | gnueabihf- | gdb | debug: ptrace trace: perf/CoreSight drivers
3.2. Remote debug monitor
Those are usually systems where the debugger program runs on the same cpu as the debugged program and monitors it. user interacts with the debugging session remotely from a PC Linux gdb is an example of such systems. in such a system following setup is considered - Target: a process running on an ARM cortex A - Gdb server: gnu gdbserver (arm-linux-gnueabihf-gdbserver) - Gdb client: gnu gdb (arm-linux-gnueabihf-gdb) - UI: eclipse with needed plugins, MI interface is used.
+--------------------------+ +---------------------------------------+ | Host | | Target | | | | +------------+ | | +-----+ +------+ | | +------+ | Coresight | | | | | | GDB | | | | GDB | | components:| | | | UI |<--->| |<--->|<--->|<--->| |<--------->| | | | | | ^ |Client| ^ | ^ | |Server| ^ | DWT, ETM, | | | +-----+ | +------+ | | | | +------+ | | ITM, TPIU | | | ^ | ^ | | | | ^ | | TMC, ETB | | | | | | | | | | | | +------------+ | +----|-----|-----|------|--+ | +--------|---------|--------------------+ | | | | | | | | | | | | | | Eclipse | arm-linux- | | arm-linux- | | gnueabihf- | TCP/IP gnueabihf- | | gdb | UART gdbserver | GDB MI GDB remote debug: ptrace protocol trace: perf/CoreSight drivers
3.3. External debugger
Those are systems where an external debugger is used. It accesses the target using JTAG or SWD. Target is usually a bare metal embedded systems or systems with an rtos. as an example, following setup is considered: - Target: firmware running on ARM cortex M. - Debugger: external debug and trace device. - Gdb server: OpenOcd. - Gdb Client: arm-none-eabi-gdb. - UI: eclipse with needed plugins, MI interface is used.
+--------------------------------------+ +-------+ +-------------+ | Host | | dbggr | | Target | | | | | | | | +-----+ +------+ +------+ | | | | Coresight | | | | | GDB | | GDB | | | Debug | | components: | | | UI |<--->| |<--->| |<-->|<--->| + |<--->| | | | | ^ |Client| ^ |Server| | ^ | Trace | ^ | DWT, ETM, | | +-----+ | +------+ | +------+ | | | | | | ITM, TPIU | | ^ | ^ | ^ | | | | | | | | | | | | | | | | | | | | +----|-----|-----|------|-----|--------+ | +-------+ | +-------------+ | | | | | | | | | | | | | | Eclipse | arm-none- | OpenOcd | | | eabi-gdb | PyOcd | | | | | | GDB MI GDB remote Ethernet debug: JTAG/SWD protocol USB trace: Serial/Parallel
4. Implementation needs
4.1 Self hosted debug monitor
gdb : arm-linux-gnueabihf-gdb the interface defined in btrace.h for capturing and processing traces has to be implemented for arm CoreSight needed actions: - in btrace-common.h: add needed structures for capturing and handling etm traces - in linux-btrace.h: - add btrace_tinfo_etm - amend btrace_target_info - in linux-btrace.c: change following functions to support etm traces - linux_enable_btrace - linux_disable_btrace - linux_read_btrace - linux_btrace_conf - in arm-linux-nat.c:add an api to - configure btrace - enable btrace - disable btrace - read btrace - in btrace.c - btrace_add_pc btrace_fetch has to be implemented for Coresight this means using opencsd library to parse etms and then reconstruct executed instructions accordingly (btrace_compute_ftrace_1) - in record-btrace.c - add command for showing record btrace etm options - add command for starting tracing with CoreSight and its handler (cmd_record_btrace_etm_start) - adapt cmd_show_record_btrace_cpu ... perf: needed actions: - make sure that perf can start/stop tracing a process with its threads, collect etm traces and deliver them to the user
4.2 Remote Debug monitor
changes described in 7.1 are needed. in addition, and to support remote protocol following changes are needed gdb server: arm-linux-gnueabihf-gdbserver needed actions: - in linux-low - linux_low_read_btrace: add support for etm traces formatting. - linux_low_btrace_conf: :add support for etm configuration formatting. gdb client: arm-linux-gnueabihf-gdb needed actions: - in remote.c - adapt enable_btrace - adapt disable_btrace - in btrace.c - parse_xml_btrace: update btrace.dtd [2] and related data structures btrace_xxx - parse_xml_btrace_conf: update btrace-conf.dtd [3] and related data structures btrace_conf_xxx - extend Remote protocol handling to support coresight etm traces UI: eclipse needed actions make sure that the plugin for recoding execution and replaying it is coping well in case of arm-linux
Remote protocol needs to be extended by -1- Adding Qbtrace:CoreSight (or etm) to start collecting etm traces -2- Amending 'Branch Trace Format' xml specification to consider etm traces transfer -3- Amending 'Branch Trace Configuration Format' xml specification to consider parameters needed for etm
4.3 External debugger
changes described in 4.2 are needed. in addition, and to support tracing a remote dealing with an external debugger (bare metal embedded system) following changes are needed gdb server: OpenOcd needed actions: - rework etm driver to make it up to date. - add a driver for configuring trace interconnect IPs - rework the driver for TPIU. - integrate support for a Trace port analyzer. -Extend remote protocol implementation to support recording Coresight infrastructure of the SoC is to be set in OpenOcd through configuration files. Parameters that are not relevant for gdb are also specified in configuration files (trace sink, trace protocol, port size, trace synch frequency, cycle accurate tracing etc ...) gdb client: arm-none-eabi-gdb needed actions: - extend Remote protocol to support coresight etm traces - integrate etm trace parsing library - interface the parser to record_btrace_target Remote protocol needs -in addition to 4.2- to be extended by - Adding Qbtrace-conf:CoreSight:core=value to support multicore SoC - Adding btrace-conf:CoreSight:id=value to support demultiplexing multiple trace sources - Adding Qbtrace-conf:CoreSight:filter:context=value to support filtering traces belonging to a given process/thread - Adding Qbtrace-conf:CoreSight:filter:start-address=value and Qbtrace-conf:CoreSight:filter:end-address=value to support filtering traces for given functions/blocks/lib - Adding Qbtrace-conf:CoreSight:trigger:on-address=value and Qbtrace-conf:CoreSight:trigger:off-address=value to support triggering tracing or stop tracing if a certain function/block/lib is executed alternatively some of configurations related to filtering and triggering can be delegated to the GDB server. UI: eclipse test and verify that existing plugins cope well with gdb extensions
5. Remote protocol execution sequence
gdb and gdbserver are communicating using the gdb remote protocol. on a semantic level a tracing session runs though following sequence (1) gdb client queries gdb server support for branch trace (2) gdb server answers with - qXfer:btrace:read - qXfer:btrace-conf:read - Qbtrace:off - Qbtrace:CoreSight - Qbtrace-conf:CoreSight:xxx where xxx is the parameter name (3) gdb client sends command to let start emitting and collecting traces (Qbtrace:CoreSight) (4) gdb server executes the commands (5) gdb client sends command to stop emitting and collecting traces (Qbtrace:off) (6) gdb server exectues the command (7) gdb client sends command to get collected traces from trace sink (qXfer:btrace:read:annex:offset,length) (8) gdb server executes the command and sends back collected traces (9) gdb client parses the traces and reconstructs target states
6. Remote protocol extensions
the remote protocol needs be extended with following primitives to support CoreSight tracing - start tracing and traces capture using CoreSight (Qbtrace:CoreSight) the remote protocol can be extended with following primitives to take advantages of etm functionalities. - select the core to trace on in the case of a multicore system gdb client sends command to select the core to trace (Qbtrace-conf:CoreSight:core=value) - set the trace id for the traces gdb client sends command to set trace id (Qbtrace-conf:CoreSight:id=value) - select the context to trace gdb client sends command to select the context to trace (Qbtrace-conf:CoreSight:filter:context=value) - select the address range to trace gdb client sends command to select the address range to trace (Qbtrace-conf:CoreSight:filter:start-address=value) (Qbtrace-conf:CoreSight:filter:end-address=value) - set triggers for starting and stopping tracing gdb client sends command to select the address to trigger tracing (Qbtrace-conf:CoreSight:trigger:on-address=value) (Qbtrace-conf:CoreSight:trigger:off-address=value)
7. alternatives and discussions
7.1. Scope definition
Coresight ETM IP comes in many versions and many implementations. According to its capabilities, it can trace instructions only or instructions and involved data/data address. All ETMs variants support instructions tracing and can therefore be used for for branch tracing.
7.2. CoreSight infrastructure exposure to the user
it is here about assigning the responsibility of configuring Coresight infrastructure to generate and route traces. two alternatives are possible: - coresight infrastructure exposed to gdb client (and UI): in this alternative the user or the UI is responsible for configuring coresight IPs in the SoC, by accessing their registers directly or via coresigh driver. Remote protocol is used to configure trace sink (ETB or TPA) to start/stop collecting traces - coresight infrastructure is not exposed outside of gdbserver. in this case high level commands can be provided by gdbserver remote protocol to setup and configure coresight IPs in the SoC. My recommendation is to extend remote protocol to provide high level commands to setup and configure coresight IPs in the SoC, or to use a different channel to pass configuration parameters to gdb server
7.3. parameters needed for parsing traces Some configuration parameters like etm version, trace id ... (content of registers ETMCR, ETMIDR, ETMCCER, ETMTRACEIDR) are needed for extracting and parsing etm trace, those parameters needs to be exchanged between gdb server and client. following alternatives are possible: - extend the remote protocol to get those params with explicit queries - add them to the content of the response to qXfer:btrace-conf:read - add them to the content of the response to qXfer:btrace:read
Best RegardsZied Guermazi
o LLVM
* Machine outliner:
- Rebased on upstream.
- Improved stack alignment handling
- More cleanup before submission
o Misc
* Various meetings and discussions.
[VIRT-339 # ARMv8.5-BTI, Branch Target Identification ]
Posted v6. Some review from Dave Martin; will need at least
one further revision, and to wait til the kernel patches land.
[VIRT-327 # Richard's upstream QEMU work ]
Fix a reported bug in pauth Auth results.
Fix a reported bug in vector variable shift.
Review Peter's vfp decodetree patch set.
Review of v18 of target/rx. Never-ending, it seems...
Review of some target/ppc vector patches.
Posted v4 of CPUNegativeOffsetState. This looks ready to pull.
r~
Upstream Work ([VIRT-109])
==========================
- spent ages tracking down 64-on-32 cputlb errors which led to:
- adding x86_64 support to TCG system tests
- {PATCH v1 0/4} softmmu de-macro fix with tests Message-Id:
<20190605162326.13896-1-alex.bennee(a)linaro.org>
- {PATCH} cputlb: cast size_t to target_ulong before using for
address masks Message-Id:
<20190606154310.15830-1-alex.bennee(a)linaro.org>
- took over maintainership of orphaned gdbstub
- posted {PULL 00/52} testing, gdbstub and cputlb fixes Message-Id:
<20190607090552.12434-1-alex.bennee(a)linaro.org>
- problems on hackbox lead to {PATCH} tests/vm: favour the locally
built QEMU for bootstrapping Message-Id:
<20190607185337.14524-1-alex.bennee(a)linaro.org>
[VIRT-109] https://projects.linaro.org/browse/VIRT-109
Other Tasks
===========
- Did some initial reading on RPMB
- Half day fire training
Absences
========
- May 27th is a Bank Holiday
- May 31st working on train in the afternoon
Current Review Queue
====================
* {Qemu-devel} {PATCH v4 00/39} tcg: Move the softmmu tlb to CPUNegativeOffsetState
Message-Id: <20190604203351.27778-1-richard.henderson(a)linaro.org>
* {Qemu-devel} {PATCH v2} target/arm: Vectorize USHL and SSHL
Message-Id: <20190603232209.20704-1-richard.henderson(a)linaro.org>
* {Qemu-devel} {PATCH resend} test-thread-pool: be more reliable
Message-Id: <20190530093417.23370-1-pbonzini(a)redhat.com>
* {PATCH 0/4} tests/docker: add podman support
Message-Id: <20190523234011.583-1-marcandre.lureau(a)redhat.com>
* {Qemu-devel} {PATCH 0/2} Implement PowerPC FPSCR flag Fraction Rounded
Message-Id: <20190525022008.24788-1-programmingkidx(a)gmail.com>
* {Qemu-devel} {PATCH for-4.1 v2 00/36} tcg: Move the softmmu tlb to CPUNegativeOffsetState
Message-Id: <20190328230404.12909-1-richard.henderson(a)linaro.org>
--
Alex Bennée
[LLVM-122] BTI and PAC support
Committed the LLD work. Modulo bugs this should now be done.
[LLVM-542] LLVM/GCC code size investigation
- Revisited my Zephy build with clang patches and updated so that it
works with trunk.
- Work out next steps of work.
- Work out to build an embedded gcc toolchain using the linaro infrastructure.
[Misc]
Reported bug in gold whereupon it would generate v4t veneers for v8-a CPUs
Still waiting for TK-1 board to finish building clang so that it can
run the testsuite. Hopefully finished over the weekend.
Progress:
* VIRT-65 [QEMU upstream maintainership]
+ Got the VFP decodetree conversion patchset out for review
(42 patches, 8 files changed, 3024 insertions(+), 1476 deletions(-))
+ sent a patchset which does the (easy) first step in my plan
for converting QEMU's documentation to Sphinx; sadly all the other
steps are much trickier...
thanks
-- PMM
== Progress ==
* FDPIC
- GCC: No progress, still waiting for feedback
* GCC upstream validation:
- reported a few regressions.
* GCC:
- UBSAN/bare-metal: added sync primitives implementation for low-end
cores (eg cortex-m0) Seems OK
- noinit attribute: started work
* Infra
- Fixed Dejagnu configuration issue in ABE which prevented us from
using target variant specifications
- Investigated problems with ABE and failure to cross-build "recent"
glibc (trouble with C++ compiler detection). ABE patches under review.
- Reduced load on APM machines to avoid depending on them too much
== Next ==
GCC:
- handle feedback on FDPIC patches
- noinit attribute
- UBSAN/bare-metal: do more testing
Hi,
Food for thought for today's sync up. I've been writting QEMU plugins to
exercise the plugin system and see what sort of useful information you
can extract when you can control the instruction stream.
For example I now have a plugin that can break down instruction counts
for any given run, for example a kernel boot:
Instruction Classes:
Class: UDEF not counted
Class: SVE (68 hits)
Class: Reserved (0 hits)
Class: PCrel addr (4589078 hits)
Class: Add/Sub (imm,tags) (0 hits)
Class: Add/Sub (imm) (26832113 hits)
Class: Logical (imm) (74304974 hits)
Class: Move Wide (imm) (10933759 hits)
Class: Bitfield (71470957 hits)
Class: Extract (85655 hits)
Class: Data Proc Imm (0 hits)
Class: Cond Branch (imm) (37227632 hits)
Class: Exception Gen (6 hits)
Class: NOP not counted
Class: Hints (244825554 hits)
Class: Barriers (1668558 hits)
Class: PSTATE (202144 hits)
Class: System Insn (7132992 hits)
Class: System Reg (2268308 hits)
Class: Branch (reg) (6280976 hits)
Class: Branch (imm) (18347905 hits)
Class: Cmp & Branch (180167025 hits)
Class: Tst & Branch (4092972 hits)
Class: Branches (0 hits)
Class: AdvSimd ldstmult (0 hits)
Class: AdvSimd ldstmult++ (0 hits)
Class: AdvSimd ldst (0 hits)
Class: AdvSimd ldst++ (0 hits)
Class: ldst excl (160861365 hits)
Class: Prefetch (0 hits)
Class: Load Reg (lit) (12828544 hits)
Class: ldst noalloc pair (0 hits)
Class: ldst pair (60381349 hits)
Class: ldst reg (0 hits)
Class: Atomic ldst (0 hits)
Class: ldst reg (reg off) (0 hits)
Class: ldst reg (pac) (0 hits)
Class: ldst reg (imm) (119597941 hits)
Class: Loads & Stores (0 hits)
Class: Data Proc Reg (113586343 hits)
Class: Scalar FP (0 hits)
Class: Unclassified (0 hits)
You can break down each class to individual instructions. For example
the Hints are mostly:
Individual Instructions:
Instr: wfe (132400072 hits) (op=0xd503205f/ Hints)
Instr: sevl (66433640 hits) (op=0xd50320bf/ Hints)
Instr: yield (29619246 hits) (op=0xd503203f/ Hints)
Instr: wfi (2865 hits) (op=0xd503207f/ Hints)
So I'm looking for a similar experiment that would be useful for the
memory sub-system. When I chatted to Maxim we thought maybe a simplified
cache line simulator might be useful. The aim wouldn't be to simulate
what a real cache might do but to be useful say for identifying regions
of code which might be susceptible to cache line bouncing. So as
compiler writers what sort of run time memory behaviour would you like
to track? What sort of information would be useful to extract with such
a tool?
I'm open to ideas ;-)
--
Alex Bennée
Four day week.
[VIRT-327 # Richard's upstream QEMU work ]
Reviewed s390 fp vector patch set.
Posted v16 rx. This seemed so close to being ready
last week, but now I don't know. I think I should
quit pushing it myself and let Yoshinori do more of
the lifting here.
Reviewed avr v20 patch set.
Reviewed Alex's testing patch set.
Submitted patches to constify upstream capstone
(500k from .data to .rodata).
r~