[CC: Richard S.]
Hi Wilco,
We use Nvidia TK1s (Cortex-A15) for benchmarking on 32-bit ARM.
LTO tends to increase function size through additional inlining. Larger functions mean larger scheduling regions, which give the first scheduler more opportunities for inter-block instruction moves, and that in turn increases register pressure.
SCHED_PRESSURE_MODEL handles cases with high register pressure well, and switching it off caused a few additional spills in the hot blocks, which caused the slow-down.
It may be worthwhile to bring SCHED_PRESSURE_MODEL back when LTO is enabled.
-- Maxim Kuvyrkov https://www.linaro.org
On 12 Jul 2021, at 13:25, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
Hi Maxim,
That sounds rather strange; huge differences due to scheduling are very rare. Which microarchitecture was this run on? I can try running it on trunk and see what difference it makes with those options.
Cheers, Wilco