All,
In the Toolchain Working Group, Mans has been examining SPEC 2000 and SPEC 2006 to see which C library (glibc) routines impact performance the most and are therefore worth tuning.
This identified two areas we consider worthy of further investigation: 1) malloc performance, and 2) floating-point rounding functions.
This email concerns the first of these.
Analysis of malloc shows that a large amount of time is spent executing synchronization primitives, even when the program under test is single-threaded.
The obvious 'fix' is to remove the synchronization primitives, which would give a performance boost. This is, of course, not safe as-is, and would require reworking malloc's algorithms to be (substantially) synchronization-free.
A quick Google search suggests that better-performing algorithms are available (TCMalloc, Lockless, Hoard, &c), so changing glibc's algorithm is well worth investigating.
Currently we see around 4.37% of time being spent in libc across the whole of SPEC CPU 2006. Around 75% of that is in malloc-related functions (so about 3.1% of the total). One benchmark, however, spends around 20% of its time in malloc. So overall we are looking at maybe a 1% improvement in the SPEC 2006 score, which is not large given the amount of effort I estimate this will require (as we have to convince the community we have made everyone's life better).
So before we go any further I would like to hear LEG's view on a better malloc. My questions boil down to:
* Is malloc important, or do server applications just implement their own?
* Do you have any benchmarks that stress malloc and would provide us with some more data points?
But any and all comments on the subject are welcome.
Thanks,
Matt