On Tue, Sep 14, 2021 at 11:55 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Btw, these kinds of issues is exactly why I've been hardnosed about 64-bit divides for decades. 64-bit divides on 32-bit machines are *expensive*. It's why I don't like saying "just use '/' and we'll pick up the routines from libgcc".
I was going to ask about the history there; not to derail the thread further, but this is a question whose answer is important to me.
Are the helpers from libgcc insufficient? I've been working through https://github.com/ClangBuiltLinux/linux/issues/1438, which came about because LLVM's equivalent of libgcc, "compiler-rt," has a nice helper for builtin multiply with overflow check that libgcc does not. LLVM cannot assume compiler-rt is being linked against, so it must expand these inline every time. And the inline code is HUGE: https://godbolt.org/z/MM4hPGxTE. IMO we could do a much, much better job on code size (and thus probably I$ performance) had we just linked against the compiler runtime.
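For anyone following along, the pattern in question looks roughly like the sketch below. The helper name mul_s64_overflows is made up for illustration; __builtin_mul_overflow is the generic builtin that both GCC and clang provide. On a 32-bit target, the signed 64-bit case is where the compiler has to choose between a runtime helper (compiler-rt's __mulodi4, as I understand the issue above) and a large inline expansion.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): multiply two signed 64-bit
 * values, store the wrapped product in *res, and return true if the
 * mathematical result did not fit in int64_t.
 */
static bool mul_s64_overflows(int64_t a, int64_t b, int64_t *res)
{
	/* Generic overflow-checking builtin available in GCC and clang. */
	return __builtin_mul_overflow(a, b, res);
}
```

Comparing the -m32 output for this function with and without a runtime helper available should show the code size difference I'm complaining about.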
Perhaps the concern is about the quality of the implementations of the compiler runtime routines; that we may have arch-specific implementations that are better? 64b division on 32b targets is expensive either way; I'd rather have the compiler generate a libcall than try to expand these inline. I'm not sure if it's the case, but I can't help but wonder whether other optimization decisions are being based on whether the compiler runtime is being linked against or not; it's hard for the compiler to know what will happen at link time. Vaguely reminiscent of the issues we face with using -ffreestanding.
Switching that now (so that we did link in the compiler runtimes) would be a massive yak shave, for sure.
> In almost all real-life cases - at least in a kernel - the full divide is unnecessary. It's almost always people being silly and lazy, and the very expensive operation can be avoided entirely (or at least minimized to something like 64/32).
At least when dealing in powers of two, sure.
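To make the power-of-two case concrete, here is a minimal sketch. The helper names div_pow2_u64 and mod_pow2_u64 are hypothetical, and the code assumes the divisor is known to be 2^shift with shift < 64. With that assumption the 64-bit divide and modulo reduce to a shift and a mask, which are cheap even on 32-bit machines.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not kernel APIs). Both assume the divisor is
 * exactly 2^shift and that shift < 64; a larger shift is undefined.
 */

/* n / 2^shift, computed as a right shift instead of a 64-bit divide. */
static inline uint64_t div_pow2_u64(uint64_t n, unsigned int shift)
{
	return n >> shift;
}

/* n % 2^shift, computed by masking off the low 'shift' bits. */
static inline uint64_t mod_pow2_u64(uint64_t n, unsigned int shift)
{
	return n & ((UINT64_C(1) << shift) - 1);
}
```

For example, div_pow2_u64(1000, 3) is 1000 / 8 and mod_pow2_u64(1003, 3) is 1003 % 8, with no call into a division routine.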