On 30/05/2019 07:27, Alex Bennée wrote:
Hi,
Food for thought for today's sync-up. I've been writing QEMU plugins to exercise the plugin system and see what sort of useful information you can extract when you can control the instruction stream.
For example, I now have a plugin that can break down the instruction counts for any given run, such as a kernel boot:
Instruction Classes:
Class: UDEF not counted
Class: SVE (68 hits)
Class: Reserved (0 hits)
Class: PCrel addr (4589078 hits)
Class: Add/Sub (imm,tags) (0 hits)
Class: Add/Sub (imm) (26832113 hits)
Class: Logical (imm) (74304974 hits)
Class: Move Wide (imm) (10933759 hits)
Class: Bitfield (71470957 hits)
Class: Extract (85655 hits)
Class: Data Proc Imm (0 hits)
Class: Cond Branch (imm) (37227632 hits)
Class: Exception Gen (6 hits)
Class: NOP not counted
Class: Hints (244825554 hits)
Class: Barriers (1668558 hits)
Class: PSTATE (202144 hits)
Class: System Insn (7132992 hits)
Class: System Reg (2268308 hits)
Class: Branch (reg) (6280976 hits)
Class: Branch (imm) (18347905 hits)
Class: Cmp & Branch (180167025 hits)
Class: Tst & Branch (4092972 hits)
Class: Branches (0 hits)
Class: AdvSimd ldstmult (0 hits)
Class: AdvSimd ldstmult++ (0 hits)
Class: AdvSimd ldst (0 hits)
Class: AdvSimd ldst++ (0 hits)
Class: ldst excl (160861365 hits)
Class: Prefetch (0 hits)
Class: Load Reg (lit) (12828544 hits)
Class: ldst noalloc pair (0 hits)
Class: ldst pair (60381349 hits)
Class: ldst reg (0 hits)
Class: Atomic ldst (0 hits)
Class: ldst reg (reg off) (0 hits)
Class: ldst reg (pac) (0 hits)
Class: ldst reg (imm) (119597941 hits)
Class: Loads & Stores (0 hits)
Class: Data Proc Reg (113586343 hits)
Class: Scalar FP (0 hits)
Class: Unclassified (0 hits)
You can also break each class down into individual instructions. For example, the Hints are mostly:
Individual Instructions:
Instr: wfe (132400072 hits) (op=0xd503205f/ Hints)
Instr: sevl (66433640 hits) (op=0xd50320bf/ Hints)
Instr: yield (29619246 hits) (op=0xd503203f/ Hints)
Instr: wfi (2865 hits) (op=0xd503207f/ Hints)
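One way to produce a class and per-instruction breakdown like the one above is to match each opcode against an ordered table of mask/value pairs, first match wins, and then keep a second tally keyed by the raw opcode. Below is a minimal, standalone C sketch of that idea, not the plugin's actual code: `InsnClass`, `classify()`, `count_insn()` and the table contents are hypothetical, only a handful of AArch64 classes are included, and hooking it into the QEMU plugin callbacks is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical class descriptor: an opcode belongs to the class when
 * (opcode & mask) == value. */
typedef struct {
    const char *name;
    uint32_t mask;
    uint32_t value;
    bool counted;   /* false: matched but deliberately not counted */
    uint64_t hits;
} InsnClass;

/* First match wins, so the exact NOP pattern must come before the broader
 * Hints pattern, which would otherwise also match it. Only a few AArch64
 * classes are shown; the last entry is a catch-all. */
static InsnClass insn_classes[] = {
    { "NOP",          0xffffffff, 0xd503201f, false, 0 },
    { "Hints",        0xfffff01f, 0xd503201f, true,  0 },
    { "Barriers",     0xfffff01f, 0xd503301f, true,  0 },
    { "Branch (imm)", 0x7c000000, 0x14000000, true,  0 }, /* B and BL */
    { "Unclassified", 0x00000000, 0x00000000, true,  0 },
};
#define N_CLASSES (sizeof(insn_classes) / sizeof(insn_classes[0]))

/* Per-opcode tally for the individual-instruction breakdown. */
#define MAX_OPCODES 1024
typedef struct {
    uint32_t opcode;
    size_t cls;      /* index into insn_classes */
    uint64_t hits;
} InsnCount;
static InsnCount insn_counts[MAX_OPCODES];
static size_t n_insn_counts;

/* Return the index of the first class whose mask/value pair matches. */
static size_t classify(uint32_t opcode)
{
    for (size_t i = 0; i < N_CLASSES; i++) {
        if ((opcode & insn_classes[i].mask) == insn_classes[i].value) {
            return i;
        }
    }
    assert(!"catch-all entry should match every opcode");
    return N_CLASSES - 1;
}

/* Record one executed instruction in both tallies. */
static void count_insn(uint32_t opcode)
{
    size_t cls = classify(opcode);
    if (!insn_classes[cls].counted) {
        return;
    }
    insn_classes[cls].hits++;
    /* Linear search keeps the sketch short; a real tool would hash. */
    for (size_t i = 0; i < n_insn_counts; i++) {
        if (insn_counts[i].opcode == opcode) {
            insn_counts[i].hits++;
            return;
        }
    }
    if (n_insn_counts < MAX_OPCODES) {
        insn_counts[n_insn_counts++] = (InsnCount){ opcode, cls, 1 };
    }
}
```

Calling `count_insn(0xd503205f)` twice (wfe) and `count_insn(0xd503201f)` once (nop) leaves the Hints class at 2 hits while NOP stays uncounted, mirroring the "not counted" lines in the output above.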
So I'm looking for a similar experiment that would be useful for the memory subsystem. When I chatted to Maxim, we thought a simplified cache-line simulator might be useful. The aim wouldn't be to simulate what a real cache does, but to help identify regions of code that might be susceptible to cache-line bouncing. So, as compiler writers, what sort of run-time memory behaviour would you like to track? What sort of information would be useful to extract with such a tool?
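For the cache-line idea, one crude starting point is to remember, per cache line, which vCPU wrote it last and count how often a write arrives from a different vCPU. Below is a hypothetical, standalone C sketch of that bookkeeping only: 64-byte lines, the fixed-size hash table and the names `record_access()`, `line_bounces()` and `report_hot_lines()` are all assumptions, and feeding it from QEMU's memory-access callbacks is not shown.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define LINE_SHIFT 6                  /* assumption: 64-byte cache lines */
#define TABLE_BITS 12                 /* hypothetical fixed capacity */
#define TABLE_SIZE (1u << TABLE_BITS)
#define NO_OWNER   UINT32_MAX

typedef struct {
    bool used;
    uint64_t line;     /* address >> LINE_SHIFT */
    uint32_t owner;    /* vCPU that last wrote the line, or NO_OWNER */
    uint64_t bounces;  /* writes by a vCPU other than the previous writer */
} LineState;

static LineState lines[TABLE_SIZE];

/* Open addressing with linear probing; optionally inserts a new entry. */
static LineState *lookup_line(uint64_t line, bool create)
{
    /* Multiplicative hashing: keep the top TABLE_BITS bits of the product. */
    size_t idx = (size_t)((line * 0x9e3779b97f4a7c15ULL) >> (64 - TABLE_BITS));
    for (size_t probe = 0; probe < TABLE_SIZE; probe++) {
        LineState *s = &lines[(idx + probe) & (TABLE_SIZE - 1)];
        if (!s->used) {
            if (!create) {
                return NULL;  /* an empty slot ends the probe sequence */
            }
            s->used = true;
            s->line = line;
            s->owner = NO_OWNER;
            s->bounces = 0;
            return s;
        }
        if (s->line == line) {
            return s;
        }
    }
    return NULL;  /* table full */
}

/* Record one guest memory access. Crude model: only writes take
 * ownership of a line, so reads are ignored. */
static void record_access(uint32_t vcpu, uint64_t addr, bool is_write)
{
    assert(vcpu != NO_OWNER);
    if (!is_write) {
        return;
    }
    LineState *s = lookup_line(addr >> LINE_SHIFT, true);
    if (s == NULL) {
        return;  /* table full: dropped in this sketch */
    }
    if (s->owner != NO_OWNER && s->owner != vcpu) {
        s->bounces++;
    }
    s->owner = vcpu;
}

/* Number of ownership transfers seen for the line containing addr. */
static uint64_t line_bounces(uint64_t addr)
{
    const LineState *s = lookup_line(addr >> LINE_SHIFT, false);
    return s != NULL ? s->bounces : 0;
}

/* Print every line that bounced at least `threshold` times. */
static void report_hot_lines(uint64_t threshold)
{
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        if (lines[i].used && lines[i].bounces >= threshold) {
            printf("line 0x%" PRIx64 ": %" PRIu64 " bounces\n",
                   lines[i].line << LINE_SHIFT, lines[i].bounces);
        }
    }
}
```

Counting only writes ignores the read side of MESI-style sharing, and it cannot tell true sharing from false sharing. Recording which bytes within a line each vCPU touches would be a natural next step if the goal is to point compiler writers at data-layout problems.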
I'm open to ideas ;-)
Back at IBM, one internal project we used regularly was an instruction tracer based on an out-of-tree patch to valgrind. The idea was to capture the precise instruction sequence within the boundaries of a specific text segment, so that we could later load it into a PowerPC simulator and post-analyse the code's behaviour regarding instruction latency, op-port utilization, CPU stalls, etc.
I'm not sure it would be that useful without a post-analysis tool, but I think it might help with some arch-specific optimizations. What do you think?
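A tracer like that could presumably be rebuilt as a QEMU plugin: keep only instructions whose PC falls inside the text range of interest and write them out in execution order for the simulator to consume. Below is a hypothetical C sketch of just the filtering and output step; the `Tracer` struct, `trace_insn()` and the one-record-per-line hex format are made up here, and calling it from the plugin's per-instruction callbacks is not shown.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical tracer state: only PCs in [start, end) are recorded. */
typedef struct {
    uint64_t start;    /* first address of the text range (inclusive) */
    uint64_t end;      /* end of the text range (exclusive) */
    FILE *out;         /* trace sink, read later by the post-analysis tool */
    uint64_t emitted;  /* number of trace records written so far */
} Tracer;

/* Called once per executed instruction, in execution order.
 * Each record is "PC OPCODE" in fixed-width hex. */
static void trace_insn(Tracer *t, uint64_t pc, uint32_t opcode)
{
    assert(t != NULL && t->out != NULL);
    if (pc < t->start || pc >= t->end) {
        return;  /* outside the text segment of interest */
    }
    fprintf(t->out, "%016" PRIx64 " %08" PRIx32 "\n", pc, opcode);
    t->emitted++;
}
```

Plain text keeps the sketch easy to inspect; a tracer feeding a cycle-level simulator would more likely use a compact binary format agreed with the post-analysis tool.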