Hi Matt. I've fleshed out the Etherpad for the PGO and LTO session at: http://pad.linaro.org/GzRj35tXFt
It's a topic list that needs some specifics. Could you make sure we have basic answers to any correctness or performance questions?
Ramana, could you add the specifics from the performance call? I can't seem to find the right meeting in the logs.
-- Michael
On 15 August 2012 05:36, Michael Hope michael.hope@linaro.org wrote:
> Hi Matt. I've fleshed out the Etherpad for the PGO and LTO session at: http://pad.linaro.org/GzRj35tXFt
I've expanded it a bit further.
> It's a topic list that needs some specifics. Could you make sure we have basic answers to any correctness or performance questions?
We are as correct as upstream. I found 101 PRs in the LTO and gcov-profile components, and there are others.
On 'a popular embedded benchmark' PGO gives a 14% improvement and LTO gives 7%. I don't have figures for both together, or for SPEC.
> Ramana, could you add the specifics from the performance call? I can't seem to find the right meeting in the logs.
Ramana and I had a discussion this afternoon. My strawman proposal for a blueprint for the next 3-6 months goes something like the following:
* Get Hot/Cold partitioning working
* Pick a benchmark/application and ensure PGO/LTO works for it
* Then look at how the performance changes, and find the changes
* I-I into the next app to look at, and what needs doing.
Candidates for the benchmark/application:
* SPEC
* Firefox
* Linux Kernel
* OpenOffice/LibreOffice
SPEC is on the list because Ramana's current Hot/Cold partitioning breaks it. Firefox, Linux, and OOo all have bugs raised against them in the GCC Bugzilla, which indicates people are interested in them. A discussion at the session about what others would be interested in would be good.
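For whichever candidate is picked, the PGO cycle could be sketched roughly as below. This is only an illustration under assumptions: the file name `bench.c`, the binary name `bench`, the training argument and the helper `pgo_steps` are hypothetical, and a real application would drive this through its own build system. The GCC options used (`-fprofile-generate`, `-fprofile-use`, `-freorder-blocks-and-partition`) are the standard ones.

```python
def pgo_steps(source, binary, train_args, flags=("-O2",), partition=False, cc="gcc"):
    """Return the argv lists for an instrument / train / rebuild PGO cycle.

    All names passed in are placeholders chosen by the caller; nothing is executed here.
    """
    base = [cc, *flags]
    use = ["-fprofile-use"]
    if partition:
        # Hot/cold partitioning uses the profile to split functions into sections.
        use.append("-freorder-blocks-and-partition")
    return [
        base + ["-fprofile-generate", source, "-o", binary],  # instrumented build
        ["./" + binary, *train_args],                          # training run writes profile data
        base + use + [source, "-o", binary],                   # profile-guided rebuild
    ]


if __name__ == "__main__":
    # Hypothetical benchmark; print the commands instead of running them.
    for step in pgo_steps("bench.c", "bench", ["--quick"], partition=True):
        print(" ".join(step))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; each list could be passed to `subprocess.run` once the names are replaced with real ones.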
Do we need a slide as Kiko suggested or will the EtherPad be good enough?
I plan to be lurking at the VC session into Android later on today so we can have a side discussion then if you want.
Thanks,
Matt
On 15 August 2012 17:17, Matthew Gretton-Dann matthew.gretton-dann@linaro.org wrote:
> The performance of PGO on 'a popular embedded benchmark' is 14% improvement, LTO is 7%. Don't know both together, or SPEC.
On 'a popular media coding library' PGO gains 2-5% in general, in one case as much as 11%. The relative gains are larger on average with hand-written assembly disabled, but obviously nowhere near the performance with it enabled.
On the same library LTO is 2-3.5 _times_ slower than without on all tests, although it does pass the test suite.
On 15 August 2012 22:38, Mans Rullgard mans.rullgard@linaro.org wrote:
> On 15 August 2012 17:17, Matthew Gretton-Dann matthew.gretton-dann@linaro.org wrote:
>> The performance of PGO on 'a popular embedded benchmark' is 14% improvement, LTO is 7%. Don't know both together, or SPEC.
> On 'a popular media coding library' PGO gains 2-5% in general, in one case as much as 11%. The relative gains are larger on average with hand-written assembly disabled, but obviously nowhere near the performance with it enabled.
> On the same library LTO is 2-3.5 _times_ slower than without on all tests, although it does pass the test suite.
Sorry for crying wolf. I redid the LTO build and the huge performance drop is gone. Now I'm getting minor gains on most tests and 5-8% loss on a few. More worrying is that it is now failing a few tests. I'll look into both issues and report back.
On 16 August 2012 13:04, Mans Rullgard mans.rullgard@linaro.org wrote:
> On 15 August 2012 22:38, Mans Rullgard mans.rullgard@linaro.org wrote:
>> On 15 August 2012 17:17, Matthew Gretton-Dann matthew.gretton-dann@linaro.org wrote:
>>> The performance of PGO on 'a popular embedded benchmark' is 14% improvement, LTO is 7%. Don't know both together, or SPEC.
>> On 'a popular media coding library' PGO gains 2-5% in general, in one case as much as 11%. The relative gains are larger on average with hand-written assembly disabled, but obviously nowhere near the performance with it enabled.
>> On the same library LTO is 2-3.5 _times_ slower than without on all tests, although it does pass the test suite.
> Sorry for crying wolf. I redid the LTO build and the huge performance drop is gone. Now I'm getting minor gains on most tests and 5-8% loss on a few. More worrying is that it is now failing a few tests. I'll look into both issues and report back.
The test failures are caused by a known bug in GCC 4.8 trunk (PR 54132).
After hacking the build system to make sure exactly the same optimisation flags are used when compiling and linking, I'm getting a 4% gain as best result and 1% loss on a couple of tests. Most tests change less than 1%.
If the optimisation flags for compiling and linking differ, all kinds of bad things seem to happen.
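As a rough illustration of keeping the compile and link flags identical for an LTO build, something like the following could generate the commands from a single flag list and flag any mismatch. It is a sketch, not the build-system change described above: the helper names, the source file names and the `-mcpu=cortex-a9` example are assumptions made for the example, and only `-flto` and the usual `-O`/`-f`/`-m` option families are taken as given.

```python
def optimisation_flags(argv):
    """Return the set of -O, -f and -m options found in a compiler command line."""
    return {arg for arg in argv if arg.startswith(("-O", "-f", "-m"))}


def lto_commands(sources, output, flags, cc="gcc"):
    """Build per-file compile commands and one link command that share `flags`."""
    objects = [src.rsplit(".", 1)[0] + ".o" for src in sources]
    compiles = [[cc, *flags, "-flto", "-c", src, "-o", obj]
                for src, obj in zip(sources, objects)]
    # The link step repeats exactly the same flags as the compile steps.
    link = [cc, *flags, "-flto", *objects, "-o", output]
    return compiles, link


def mismatched_flags(compile_cmd, link_cmd):
    """Flags present in only one of the two commands; an empty set means consistent."""
    return optimisation_flags(compile_cmd) ^ optimisation_flags(link_cmd)


if __name__ == "__main__":
    # Hypothetical two-file project.
    compiles, link = lto_commands(["a.c", "b.c"], "app", ["-O2", "-mcpu=cortex-a9"])
    for cmd in compiles:
        assert not mismatched_flags(cmd, link)
    print(" ".join(link))
```

A check like `mismatched_flags` can also be pointed at command lines captured from an existing build log to spot where the compile and link steps have drifted apart.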
linaro-toolchain@lists.linaro.org