I have just switched to gcc 5.2 from 4.9.2 and the code quality does seem to have improved significantly. For example, it now seems much better at using ldp/stp and it seems to have stopped making gratuitous use of the SIMD registers.
However, I still have a few whinges:-)
See attached copy.c / copy.s (This is a performance critical function from OpenJDK)
pd_disjoint_words:
        cmp     x2, 8           <<< (1)
        sub     sp, sp, #64     <<< (2)
        bhi     .L2
        cmp     w2, 8           <<< (1)
        bls     .L15
.L2:
        add     sp, sp, 64      <<< (2)
(1) If count as a 64 bit unsigned is <= 8 then it is probably still <= 8 as a 32 bit unsigned.
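The redundancy in (1) can be checked directly. The following is a minimal sketch, assuming a 64-bit size_t as on AArch64; the helper names are hypothetical and do not come from copy.c. It models the first compare (cmp x2, 8) and the second compare (cmp w2, 8) and confirms that every count that passes the wide test also passes the narrow one, so the second compare can never change the outcome.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the first test, on the full-width count (cmp x2, 8). */
static int fits_wide(size_t count)
{
    return count <= 8;
}

/* Hypothetical helper: the second test, on the low 32 bits only (cmp w2, 8). */
static int fits_narrow(size_t count)
{
    return (uint32_t)count <= 8;
}

/* Returns 1 if every count that passes the wide test also passes the
   narrow test, i.e. the second compare is implied by the first. */
static int narrow_test_is_redundant(void)
{
    for (size_t count = 0; count <= 8; count++) {
        if (fits_wide(count) && !fits_narrow(count))
            return 0;
    }
    return 1;
}
```

The reverse implication does not hold (a count whose low 32 bits are small can still be large), which is why only the second compare is the redundant one.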
Agreed. This could probably be done by the mid-end based on value range propagation. Please can you file a report in gcc bugzilla?
I am not sure how we can do this in VRP. It seems that this is generated at RTL expansion time, so maybe it has to be done during expansion. The optimized tree looks like:
;; Function pd_disjoint_words (pd_disjoint_words, funcdef_no=0, decl_uid=2763, cgraph_uid=0, symbol_order=0)
Removing basic block 13
pd_disjoint_words (HeapWord * from, HeapWord * to, size_t count)
{
  long int t$b;
  long int t$a;
  struct unit t;
  struct unit t;
  struct unit t;
  struct unit t;
  struct unit t;
  struct unit t;
  long int _5;

  <bb 2>:
  switch (count_2(D)) <default: <L16>, case 0: <L18>, case 1: <L1>, case 2: <L2>, case 3: <L4>, case 4: <L6>, case 5: <L8>, case 6: <L10>, case 7: <L12>, case 8: <L14>>

<L1>:
  _5 = *from_4(D);
  *to_6(D) = _5;
  goto <bb 12> (<L18>);

<L2>:
  t$a_8 = MEM[(struct unit *)from_4(D)];
  t$b_9 = MEM[(struct unit *)from_4(D) + 8B];
  MEM[(struct unit *)to_6(D)] = t$a_8;
  MEM[(struct unit *)to_6(D) + 8B] = t$b_9;
  goto <bb 12> (<L18>);

<L4>:
  t = MEM[(struct unit *)from_4(D)];
  MEM[(struct unit *)to_6(D)] = t;
  t ={v} {CLOBBER};
  goto <bb 12> (<L18>);

<L6>:
  t = MEM[(struct unit *)from_4(D)];
  MEM[(struct unit *)to_6(D)] = t;
  t ={v} {CLOBBER};
  goto <bb 12> (<L18>);

<L8>:
  t = MEM[(struct unit *)from_4(D)];
  MEM[(struct unit *)to_6(D)] = t;
  t ={v} {CLOBBER};
  goto <bb 12> (<L18>);

<L10>:
  t = MEM[(struct unit *)from_4(D)];
  MEM[(struct unit *)to_6(D)] = t;
  t ={v} {CLOBBER};
  goto <bb 12> (<L18>);

<L12>:
  t = MEM[(struct unit *)from_4(D)];
  MEM[(struct unit *)to_6(D)] = t;
  t ={v} {CLOBBER};
  goto <bb 12> (<L18>);

<L14>:
  t = MEM[(struct unit *)from_4(D)];
  MEM[(struct unit *)to_6(D)] = t;
  t ={v} {CLOBBER};
  goto <bb 12> (<L18>);

<L16>:
  _Copy_disjoint_words (from_4(D), to_6(D), count_2(D)); [tail call]

<L18>:
  return;

}
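For readers without the attachment, the dump is consistent with source of roughly the following shape. This is only a sketch inferred from the tree above, not the contents of copy.c: word_t, the unit2/unit3 structs, copy_small and copy_fallback are hypothetical stand-ins (for HeapWord, struct unit, pd_disjoint_words and _Copy_disjoint_words respectively), and only the cases up to 3 are shown.

```c
#include <stddef.h>
#include <string.h>

typedef long word_t;   /* stand-in for HeapWord (assumption) */

/* One struct type per small size, so each case is one struct assignment. */
struct unit2 { word_t a, b; };
struct unit3 { word_t a, b, c; };

/* Stand-in for _Copy_disjoint_words, which the dump shows only as a tail call. */
static void copy_fallback(const word_t *from, word_t *to, size_t count)
{
    memcpy(to, from, count * sizeof(word_t));
}

/* Small counts are dispatched through a switch; larger ones use the fallback. */
static void copy_small(const word_t *from, word_t *to, size_t count)
{
    switch (count) {
    case 0:
        break;
    case 1:
        *to = *from;
        break;
    case 2:
        *(struct unit2 *)to = *(const struct unit2 *)from;
        break;
    case 3:
        *(struct unit3 *)to = *(const struct unit3 *)from;
        break;
    default:
        copy_fallback(from, to, count);
        break;
    }
}
```

A switch of this kind is what produces the range check on count before the jump table, which is where the two compares in (1) come from.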
Thanks, Kugan