On 03/03/16 00:44, kugan wrote:
I have just switched to gcc 5.2 from 4.9.2 and the code quality does seem to have improved significantly. For example, it now seems much better at using ldp/stp and it seems to have stopped its gratuitous use of the SIMD registers.
However, I still have a few whinges :-)
See the attached copy.c / copy.s (this is a performance-critical function from OpenJDK).
pd_disjoint_words:
        cmp     x2, 8           <<< (1)
        sub     sp, sp, #64     <<< (2)
        bhi     .L2
        cmp     w2, 8           <<< (1)
        bls     .L15
.L2:
        add     sp, sp, 64      <<< (2)
(1) If count as a 64 bit unsigned is <= 8 then it is probably still <= 8 as a 32 bit unsigned.
Agreed. This could probably be done by the mid-end based on value range propagation. Please can you file a report in gcc bugzilla?
Not sure how we can do this in VRP. It seems that this is generated at RTL expansion time. Maybe it has to be done during expansion. The optimized tree looks like:
Ramana and I looked further into this last night. It turns out this is due to the way we expand switch tables. The ARM and AArch64 back-ends both use the casesi pattern, which is defined to do a range check and a branch into the table. The range check is based on a 32-bit value.
Because this example uses a 64-bit type as the controlling expression, the mid-end has to insert another check that the original value is within range before it is narrowed to 32 bits. This makes the second check (the one inside casesi) redundant, but by then there is no way to remove it. You're correct that VRP isn't going to help here.
We're looking at whether we can adjust things to use the tablejump expander, since that should eliminate the need for the second check.
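To make the redundancy concrete, here is a minimal C model of the two comparisons. It is only a sketch: the function names are invented for illustration and nothing here is GCC code. It shows that the 64-bit guard implies the 32-bit check, and also why the 64-bit guard cannot simply be dropped in favour of the truncated one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the guard the mid-end inserts on the full
   64-bit value (corresponds to "cmp x2, 8; bhi .L2"). */
static bool in_range_64(uint64_t count)
{
    return count <= 8;
}

/* Hypothetical model of the range check done on the value after it
   has been narrowed to 32 bits (corresponds to "cmp w2, 8; bls .L15"). */
static bool in_range_32(uint64_t count)
{
    return (uint32_t)count <= 8;
}

/* True when the 32-bit check adds nothing for this value:
   passing the 64-bit guard implies passing the 32-bit check. */
static bool second_check_redundant(uint64_t count)
{
    return !in_range_64(count) || in_range_32(count);
}

/* The converse does not hold: 2^32 + 3 truncates to 3, so the 32-bit
   check alone would wrongly accept it. This is why the 64-bit guard
   is the one that has to stay. */
static bool truncated_check_alone_is_unsafe(void)
{
    uint64_t big = (UINT64_C(1) << 32) + 3;
    return in_range_32(big) && !in_range_64(big);
}
```

second_check_redundant() holds for every 64-bit input, since any value that is <= 8 is unchanged by truncation to 32 bits.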
;; Function pd_disjoint_words (pd_disjoint_words, funcdef_no=0, decl_uid=2763, cgraph_uid=0, symbol_order=0)
Removing basic block 13
pd_disjoint_words (HeapWord * from, HeapWord * to, size_t count)
{
  long int t$b;
  long int t$a;
  struct unit t;
  struct unit t;
  struct unit t;
  struct unit t;
  struct unit t;
  struct unit t;
  long int _5;

  <bb 2>:
  switch (count_2(D)) <default: <L16>, case 0: <L18>, case 1: <L1>, case 2: <L2>, case 3: <L4>, case 4: <L6>, case 5: <L8>, case 6: <L10>, case 7: <L12>, case 8: <L14>>

<L1>:
  _5 = *from_4(D);
  *to_6(D) = _5;
  goto <bb 12> (<L18>);

<L2>:
  t$a_8 = MEM[(struct unit *)from_4(D)];
  t$b_9 = MEM[(struct unit *)from_4(D) + 8B];
  MEM[(struct unit *)to_6(D)] = t$a_8;
  MEM[(struct unit *)to_6(D) + 8B] = t$b_9;
  goto <bb 12> (<L18>);

<L4>:
  t = MEM[(struct unit *)from_4(D)];
  MEM[(struct unit *)to_6(D)] = t;
  t ={v} {CLOBBER};
  goto <bb 12> (<L18>);

<L6>:
  t = MEM[(struct unit *)from_4(D)];
  MEM[(struct unit *)to_6(D)] = t;
  t ={v} {CLOBBER};
  goto <bb 12> (<L18>);

<L8>:
  t = MEM[(struct unit *)from_4(D)];
  MEM[(struct unit *)to_6(D)] = t;
  t ={v} {CLOBBER};
  goto <bb 12> (<L18>);

<L10>:
  t = MEM[(struct unit *)from_4(D)];
  MEM[(struct unit *)to_6(D)] = t;
  t ={v} {CLOBBER};
  goto <bb 12> (<L18>);

<L12>:
  t = MEM[(struct unit *)from_4(D)];
  MEM[(struct unit *)to_6(D)] = t;
  t ={v} {CLOBBER};
  goto <bb 12> (<L18>);

<L14>:
  t = MEM[(struct unit *)from_4(D)];
  MEM[(struct unit *)to_6(D)] = t;
  t ={v} {CLOBBER};
  goto <bb 12> (<L18>);

<L16>:
  _Copy_disjoint_words (from_4(D), to_6(D), count_2(D)); [tail call]

<L18>:
  return;

}
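For readers without the attachment, the dump above is what one would expect from C source of roughly the following shape. This is a guess reconstructed from the dump, not the actual copy.c: the HeapWord typedef, the COPY_UNIT macro, the struct layout and the memcpy-based fallback (standing in for the out-of-line _Copy_disjoint_words) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: a heap word is one 64-bit long on AArch64. */
typedef long HeapWord;

/* Hypothetical stand-in for the out-of-line _Copy_disjoint_words. */
static void copy_disjoint_words_fallback(const HeapWord *from, HeapWord *to,
                                         size_t count)
{
    memcpy(to, from, count * sizeof(HeapWord));
}

/* Copy n words with a single struct assignment; each expansion
   declares its own block-local "struct unit" of the right size. */
#define COPY_UNIT(n)                                        \
    do {                                                    \
        struct unit { HeapWord w[n]; };                     \
        *(struct unit *)to = *(const struct unit *)from;    \
    } while (0)

/* Sketch of the switch on a 64-bit count: small sizes are copied
   inline, everything else goes to the fallback routine. */
static void pd_disjoint_words_sketch(const HeapWord *from, HeapWord *to,
                                     size_t count)
{
    switch (count) {
    case 0: break;
    case 1: to[0] = from[0]; break;
    case 2: COPY_UNIT(2); break;
    case 3: COPY_UNIT(3); break;
    case 4: COPY_UNIT(4); break;
    case 5: COPY_UNIT(5); break;
    case 6: COPY_UNIT(6); break;
    case 7: COPY_UNIT(7); break;
    case 8: COPY_UNIT(8); break;
    default: copy_disjoint_words_fallback(from, to, count); break;
    }
}
```

The point relevant to this thread is only that the controlling expression of the switch is a 64-bit size_t, which is what forces the extra range check before the jump table.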
Thanks, Kugan