On Tue, 26 Apr 2011, Michael Hope wrote:
Hi Barry. I think the toolchain is operating correctly here. The current version recognises a divide followed by a modulo and optimises this into a call to the standard EABI function __aeabi_uldivmod(). Note the code:
    do_div(Kpart, source);

    K = Kpart & 0xFFFFFFFF;

    /* Check if we need to round */
    if ((K % 10) >= 5)
        K += 5;
This function is provided by libgcc for normal applications. The kernel provides its own versions in arch/arm/lib/lib1funcs.s but is missing __aeabi_uldivmod (note the 'l' for 64 bit).
The kernel is omitting this function on purpose. The idea is to prevent people from ever using 64-bit by 64-bit divisions since they are always costly and avoidable.
This is why the kernel provides a do_div() macro: to allow dividing a 64-bit dividend by a divisor that is only 32 bits wide. And this stems from the fact that gcc has no (or used not to have) patterns to match a division with a 64-bit dividend and a 32-bit divisor, hence it promotes the divisor to a 64-bit value and performs the costly division that the kernel wants to avoid.
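To illustrate the calling convention described above, here is a minimal user-space sketch of what do_div() does from the caller's point of view: the 64-bit dividend is replaced in place by the quotient and the 32-bit remainder is returned. The name do_div_sketch() is hypothetical, and it takes a pointer where the kernel macro takes an lvalue. It shows the semantics only: compiled for 32-bit ARM in user space, the plain / and % below would simply call into libgcc, which is exactly what the kernel implementation is written to avoid.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's do_div() macro (assumption:
 * semantics only, not the kernel implementation).
 *
 * Divides the 64-bit value at *n by the 32-bit divisor 'base',
 * stores the quotient back into *n and returns the remainder.
 */
static inline uint32_t do_div_sketch(uint64_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);

    *n /= base;
    return rem;
}
```

With this sketch, starting from n = 1000000007 and calling do_div_sketch(&n, 10) leaves n holding 100000000 and returns 7.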
Worse, gcc isn't smart enough to optimize the operation even when the divisor is constant, which is quite a common operation in the kernel. This is why many years ago I wrote the code for the do_div() version you can find in arch/arm/include/asm/div64.h where the division is turned into a reciprocal multiplication. For example, despite the amount of added C code, do_div(x, 10000) now produces the following assembly code (where x is assigned to r0-r1):
        adr     r4, .L0
        ldmia   r4, {r4-r5}
        umull   r2, r3, r4, r0
        mov     r2, #0
        umlal   r3, r2, r5, r0
        umlal   r3, r2, r4, r1
        mov     r3, #0
        umlal   r2, r3, r5, r1
        mov     r0, r2, lsr #11
        orr     r0, r0, r3, lsl #21
        mov     r1, r3, lsr #11
        ...
.L0:
        .word   948328779
        .word   879609302
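For readers who prefer C to ARM assembly, the same sequence can be sketched as follows. This is only an illustration, not the code from div64.h: the function name div10000_sketch() is hypothetical, and the two 32-bit constants are the .word values from the listing, read as the low and high halves of a 64-bit reciprocal. The product of x and that reciprocal is built from 32x32->64 partial products (one per umull/umlal), its upper 64 bits are kept, and the result is shifted right by 11.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: x / 10000 via reciprocal multiplication,
 * mirroring the umull/umlal sequence in the assembly listing.
 */
static inline uint64_t div10000_sketch(uint64_t x)
{
    const uint64_t m_lo = 948328779u;   /* first .word: low half of reciprocal */
    const uint64_t m_hi = 879609302u;   /* second .word: high half of reciprocal */
    const uint64_t x_lo = (uint32_t)x;
    const uint64_t x_hi = x >> 32;
    uint64_t t;

    /* umull: low x low, only the upper word contributes further */
    t = (m_lo * x_lo) >> 32;
    /* two umlal: accumulate both cross products (cannot overflow,
     * since both halves of the reciprocal are below 2^30) */
    t += m_hi * x_lo;
    t += m_lo * x_hi;
    /* final umlal: carry into high x high, giving bits 64..127 */
    t = (t >> 32) + m_hi * x_hi;

    /* lsr #11: final scaling of the reciprocal */
    return t >> 11;
}
```

For example, div10000_sketch(10000) yields 1 and div10000_sketch(9999) yields 0, matching ordinary integer division by 10000.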
But I digress. This is just to say that gcc shouldn't pull __aeabi_uldivmod in this case because:
1) the division and the modulus are not performed on the same operands;
2) the modulus is performed on a 32-bit variable;
3) the do_div() implementation looks like nothing that gcc could recognize as being a division.
Therefore I don't see how the right pattern could have been matched.
Nicolas