Re: [PATCH] MIPS: Fix a longstanding error in div64.h

9 Apr 2021


On Thu, Apr 8, 2021 at 12:56 PM, Huacai Chen chenhuacai@kernel.org wrote:
...
Hi, Maciej,
On Wed, Apr 7, 2021 at 9:38 PM Maciej W. Rozycki macro@orcam.me.uk wrote:
...
On Wed, 7 Apr 2021, Huacai Chen wrote:
...
...
This code is rather broken in an obvious way, starting from:
    unsigned long long __n;                                         \
                                                                    \
    __high = *__n >> 32;                                            \
    __low = __n;                                                    \


where `__n' is used uninitialised.  Since this is my code originally I'll
look into it; we may want to reinstate `do_div' too, which didn't have to
be removed in the first place.
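For reference, the kernel's do_div(n, base) replaces n with the quotient and returns the 32-bit remainder, and __div64_32() does the same through a uint64_t pointer. A minimal sketch of what the quoted lines presumably intended (read the dividend through the pointer first, then split it into 32-bit halves) might look like this. `do_div_split` is a hypothetical name, and the second step still uses a 64-by-32 C division, so this only illustrates the contract, not an optimized MIPS implementation.

```c
#include <stdint.h>

/*
 * Hypothetical illustration of the do_div()/__div64_32() contract:
 * on return *n holds the quotient and the 32-bit remainder is returned.
 * base must be non-zero.
 */
static uint32_t do_div_split(uint64_t *n, uint32_t base)
{
	uint64_t dividend = *n;                    /* load first, then split */
	uint32_t high = (uint32_t)(dividend >> 32);
	uint32_t low = (uint32_t)dividend;

	/* Step 1: divide the high word; its remainder is < base. */
	uint64_t quot = (uint64_t)(high / base) << 32;
	uint32_t rem = high % base;

	/*
	 * Step 2: divide rem:low. Because rem < base, the quotient of this
	 * step fits in 32 bits and fills the low half of the result.
	 */
	uint64_t partial = ((uint64_t)rem << 32) | low;
	quot |= (uint32_t)(partial / base);
	rem = (uint32_t)(partial % base);

	*n = quot;
	return rem;
}
```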
I think we can reuse the generic do_div().
We can, but it's not clear to me if this is optimal.  We have a DIVMOD
instruction which the original code took advantage of (although I can see
potential in reusing bits from include/asm-generic/div64.h).  The two
implementations would have to be benchmarked against each other across a
couple of different CPUs.
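A generic fallback cannot assume a hardware 64-by-32 divide, so it works bit by bit. The restoring shift-and-subtract division below is a simplified illustration of that technique, not the kernel's code (which differs in detail, e.g. it reduces the high word first); `div64_32_shift_sub` is a hypothetical name. It needs one compare-and-subtract per quotient bit, up to 64 per call, whereas a hardware DIVU produces 32 quotient bits at once.

```c
#include <stdint.h>

/*
 * Restoring (shift-and-subtract) long division of a 64-bit dividend by a
 * 32-bit divisor. Divides *n in place and returns the remainder.
 * base must be non-zero.
 */
static uint32_t div64_32_shift_sub(uint64_t *n, uint32_t base)
{
	uint64_t dividend = *n;
	uint64_t quot = 0;
	uint64_t rem = 0;  /* < base after every step, so < 2^33 after a shift */

	for (int bit = 63; bit >= 0; bit--) {
		/* bring down the next dividend bit */
		rem = (rem << 1) | ((dividend >> bit) & 1);
		if (rem >= base) {
			rem -= base;
			quot |= (uint64_t)1 << bit;
		}
	}

	*n = quot;
	return (uint32_t)rem;
}
```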
The original MIPS do_div() uses the "h" constraint, which is also the
reason why Ralf rewrote this file. How can we reintroduce do_div()
without the "h" constraint?
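One way to avoid the "h" constraint entirely is to let the compiler choose the instruction by writing the widening 32x32->64 multiply in plain C. The assumption here, which should be confirmed in the generated assembly, is that GCC lowers this to multu followed by mfhi/mflo on pre-R6 32-bit MIPS; `mul32x32` is a hypothetical helper name.

```c
#include <stdint.h>

/*
 * Unsigned 32x32 -> 64-bit multiply without inline asm: the high and low
 * words of the product correspond to what mfhi/mflo would return after a
 * multu on pre-R6 MIPS.
 */
static inline void mul32x32(uint32_t x, uint32_t y, uint32_t *hi, uint32_t *lo)
{
	uint64_t p = (uint64_t)x * y;

	*hi = (uint32_t)(p >> 32);
	*lo = (uint32_t)p;
}
```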
I tried to come up with a new version:
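#include <stdint.h>

uint32_t __attribute__ ((noinline)) div64_32n(uint64_t *x, uint32_t b) {
        uint64_t a = *x;
        /* quotient of the high word, shifted into place */
        uint64_t t1 = ((a>>32)/b)<<32;
        /* remainder of the high word; t2:res is what is left to divide */
        uint32_t t2 = (a>>32)%b;
        uint32_t res = (uint32_t)a;
        uint32_t t1lo = 0;
        /* t4 = t3*b is the largest multiple of b not above 0xffffffff */
        uint32_t t3 = 0xffffffffu/b;
        uint32_t t4 = t3*b;
        uint32_t hi, lo;
        while(t2>0) {
                /* hi:lo = t4*t2; pre-R6 MIPS only (R6 has no HI/LO) */
                __asm__ (
                        "multu %2, %3\n"
                        "mfhi %0\n"
                        "mflo %1\n"
                        : "=r" (hi), "=r"(lo)
                        : "r" (t4), "r"(t2)
                );
                // yes, we are sure that t2*t3 will not overflow
                t1lo += (t3*t2);
                /* t2:res -= t4*t2, leaving t2*(2^32 - t4) + res */
                t2 -= hi;
                if (lo > 0) {
                        t2 --; // we are sure that t2 > 0
                        lo = 0xffffffff - lo + 1;
                        uint32_t tmp = lo + res;
                        // carry out of the low word
                        if (tmp < lo || tmp < res) {
                                t2 ++;
                        }
                        res = tmp;
                }
        }
        /* the high part is gone; finish with a 32-bit division */
        if (res >= b) {
                t1lo += (res/b);
                res = (res%b);
        }
        t1 += t1lo;
        *x = t1;
        return res;
}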
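To check the algorithm off-target, the same steps can be transcribed into portable C, replacing the multu/mfhi/mflo asm with a widening multiply (an assumption made for host testing, not part of the proposal); `div64_32n_portable` is a hypothetical name. Each pass moves t2 * t3 into the quotient and rewrites the pending value t2:res as t2 * (2^32 - t4) + res, which is strictly smaller, until the high word reaches zero.

```c
#include <stdint.h>

/*
 * Portable transcription of div64_32n(): divides *x by b in place and
 * returns the remainder. b must be non-zero.
 */
static uint32_t div64_32n_portable(uint64_t *x, uint32_t b)
{
	uint64_t a = *x;
	uint64_t quot_hi = ((a >> 32) / b) << 32;  /* quotient of the high word */
	uint32_t t2 = (uint32_t)((a >> 32) % b);   /* pending high word, < b */
	uint32_t res = (uint32_t)a;                /* pending low word */
	uint32_t quot_lo = 0;
	uint32_t t3 = 0xffffffffu / b;
	uint32_t t4 = t3 * b;  /* largest multiple of b not above 2^32 - 1 */

	while (t2 > 0) {
		/* stands in for multu/mfhi/mflo: hi:lo = t4 * t2 < t2 * 2^32 */
		uint64_t p = (uint64_t)t4 * t2;
		uint32_t hi = (uint32_t)(p >> 32);
		uint32_t lo = (uint32_t)p;

		quot_lo += t3 * t2;        /* t3 * t2 < t3 * b <= 2^32 - 1 */
		t2 -= hi;                  /* hi < t2, so t2 stays >= 1 */
		if (lo > 0) {
			uint32_t add = 0u - lo;    /* 2^32 - lo */
			uint32_t tmp = res + add;

			t2--;                      /* borrow for the low word */
			if (tmp < res)             /* carry out of the low word */
				t2++;
			res = tmp;
		}
	}

	/* only a 32-bit value is left */
	quot_lo += res / b;
	res %= b;

	*x = quot_hi + quot_lo;
	return res;
}
```

It can be compared against plain 64-bit `/` and `%` over a range of dividends and divisors, including divisors just above 2^31, where the loop takes the most passes.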
Some performance tests, computing ((uint64_t)(-1))/3 0xfffff times
(quotient, remainder, run time):
GCC: 5555555555555555, 0, seconds: 5
SYQ: 5555555555555555, 0, seconds: 4
KER: 5555555555555555, 0, seconds: 8
RAL: ffffffff, 2, seconds: 4
1. The current MIPS asm version (RAL) takes 4s, and its result is wrong.
2. The simplest C code, a / b and a % b (GCC), takes 5s.
3. The asm-generic version (KER) takes 8s.
4. My version (SYQ) takes 4s.
So the question is: why does the asm-generic version exist,
given that it performs worse than the code GCC generates?
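The thread does not show the harness behind these numbers. A minimal timing sketch along the same lines might look like this; `time_div` and `div_plain` are hypothetical names, and using clock() (processor time) is an assumption. The volatile source, divisor, and sinks keep the compiler from constant-folding the division by 3 into a multiply or dropping iterations whose results are overwritten.

```c
#include <stdint.h>
#include <time.h>

typedef uint32_t (*div_fn)(uint64_t *n, uint32_t base);

/* The "GCC" baseline: plain 64-bit / and % (hypothetical name). */
static uint32_t div_plain(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/* Times `iters` calls of fn; returns seconds, stores the last results. */
static double time_div(div_fn fn, uint64_t dividend, uint32_t base,
		       unsigned long iters, uint64_t *quot, uint32_t *rem)
{
	volatile uint64_t src = dividend;   /* re-read every iteration */
	volatile uint32_t divisor = base;   /* hides the constant divisor */
	volatile uint64_t sink_q = 0;       /* written every iteration */
	volatile uint32_t sink_r = 0;
	clock_t t0 = clock();

	for (unsigned long i = 0; i < iters; i++) {
		uint64_t q = src;

		sink_r = fn(&q, divisor);
		sink_q = q;
	}

	clock_t t1 = clock();

	*quot = sink_q;
	*rem = sink_r;
	return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling time_div(div_plain, UINT64_MAX, 3, 0xfffff, &q, &r) mirrors the test above; the other candidates would be passed the same way, provided they share the (uint64_t *, uint32_t) signature.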
...
Huacai
...
...
...
Huacai, thanks for your investigation!  Please be more careful in
verifying your future submissions however.
Sorry, I thought there was only one bug in div64.h, but in fact there
are three...
This just shows the verification you made was not good enough, hence my
observation.
Maciej
-- 
YunQiang Su

