Hello,
While testing SMS on Cortex-A9 I see that the latency of the load instruction is 1 cycle when compiling with -mcpu=cortex-a9 -mthumb -mtune=cortex-a9 -O3.
Below is a snippet from the SMS dump file showing the DDG created for the loop in function foo. It shows the edge between the load of input[i] (insn 181) and the multiply instruction (insn 184): [181 -(T,1,0)-> 184] is the true dependence edge created between the two insns, with a latency of 1. On Cortex-A8 the latency of the load is 3, as expected. I've read in the cortex-a9.md file that loads should have a latency of 4 cycles, so I just wanted to check whether I should have used a different combination of flags for Cortex-A9, or whether the load latency should indeed be 1 cycle here.
Thanks, Revital
int foo (int max, signed short *input, int y)
{
  int i, accum;

  for (i = 0; i < max; i++)
    {
      accum += (signed int) input[i] * (signed int) input[i+y];
    }
  return accum;
}
The snippet from the DDG:
Node num: 2
(insn 181 178 184 13 (set (reg:SI 216 [ D.2019 ])
        (zero_extend:SI (mem:HI (plus:SI (reg:SI 319 [ ivtmp.34 ])
                    (reg:SI 345)) [2 MEM[base: D.2076_257, index: D.2079_226, offset: 0B]+0 S2 A16]))) tmp.c:7 714 {*thumb2_zero_extendhisi2_v6}
     (nil))
OUT ARCS: [181 -(A,0,1)-> 176] [181 -(T,1,0)-> 184]
IN ARCS: [184 -(A,0,1)-> 181] [176 -(T,1,0)-> 181]
Node num: 3
(insn 184 181 234 13 (set (reg/v:SI 209 [ accum ])
        (plus:SI (mult:SI (sign_extend:SI (subreg/s/u:HI (reg:SI 212 [ D.2013 ]) 0))
                (sign_extend:SI (subreg/s/u:HI (reg:SI 216 [ D.2019 ]) 0)))
            (reg/v:SI 209 [ accum ]))) tmp.c:7 64 {maddhisi4}
     (expr_list:REG_DEAD (reg:SI 216 [ D.2019 ])
        (expr_list:REG_DEAD (reg:SI 212 [ D.2013 ])
            (nil))))
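
In case it helps when reading the arcs above: as far as I understand the dump format, each arc is printed as [src -(type,latency,distance)-> dest], where type is T (true), A (anti) or O (output) and distance is the iteration distance. Below is a minimal sketch in C that pulls those fields out of one arc string. The struct ddg_arc type and the parse_ddg_arc function are hypothetical helpers written for this illustration; they are not part of GCC.

```c
#include <stdio.h>

/* Hypothetical helper type for illustration only, not a GCC structure. */
struct ddg_arc
{
  int src_uid;   /* insn UID of the source node */
  char type;     /* 'T' true, 'A' anti, 'O' output dependence */
  int latency;   /* latency in cycles as recorded on the edge */
  int distance;  /* iteration distance (0 = same iteration) */
  int dest_uid;  /* insn UID of the destination node */
};

/* Parse one arc such as "[181 -(T,1,0)-> 184]".
   Returns 1 if all five fields were read, 0 otherwise.  */
static int
parse_ddg_arc (const char *text, struct ddg_arc *arc)
{
  int n = sscanf (text, " [%d -(%c,%d,%d)-> %d]",
                  &arc->src_uid, &arc->type, &arc->latency,
                  &arc->distance, &arc->dest_uid);
  return n == 5;
}
```

Called on "[181 -(T,1,0)-> 184]", this should fill in src_uid 181, type 'T', latency 1, distance 0 and dest_uid 184, which is the 1-cycle load latency in question.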