Hello,
- Is SMS arch-specific, or is this implementation in particular ARM-specific?
Neither SMS nor the GCC implementation is arch-specific. In general, SMS should be beneficial when applied to in-order machines (or machines with limited out-of-order capabilities).
- What's the expected benefit out of this? Is it a vague "should make things a bit faster" or are there indications that it is a significant win for certain scenarios?
Turning on SMS is not always beneficial, as it increases code size and can increase register pressure. I'm looking for cases where the register pressure increases so that we can construct a cost model based on them. There is currently no memory dependence analysis in the phase where SMS is applied (Richard Sandiford is working on this), so for loops with read/write dependencies we are conservative when constructing the dependencies between instructions, which eventually limits the algorithm. In the past we have therefore seen that SMS is beneficial for loops with no read/write dependency; in particular, kernels with a simple accumulator showed improvements with SMS.
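To make the two loop shapes concrete, here is a minimal sketch in C. The function names are made up for illustration and are not taken from any benchmark we ran. The first loop is a simple accumulator that only reads memory; the second both reads and writes memory, so without memory dependence analysis the scheduler has to assume the store may conflict with loads from other iterations.

```c
#include <stddef.h>

/* Simple accumulator (hypothetical example): the loop only loads from
 * memory, and the only loop-carried dependence is on 'sum'. This is the
 * kind of kernel described above as having shown improvements. */
int dot_product(const int *a, const int *b, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Read/write loop (hypothetical example): each iteration loads src[i]
 * and stores dst[i]. If the compiler cannot prove that dst and src do
 * not overlap, it must conservatively order the store against loads of
 * other iterations, which limits how much iterations can be overlapped. */
void scale_shift(int *dst, const int *src, size_t n, int k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k + 1;
}
```

One way to experiment is to compile such a file with and without GCC's -fmodulo-sched option (for example at -O2, with -S to get assembly) and compare the loop bodies and code size. Whether SMS actually transforms a given loop depends on the target and the GCC version, so treat this only as a starting point.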
Thanks, Revital
Thanks,
Christian Robottom Reis, Engineering VP Brazil (GMT-3) | [+55] 16 9112 6430 | [+1] 612 216 4935 Linaro.org: Open Source Software for ARM SoCs