PGO and LTO session preparation


Michael Hope

15 Aug 2012

4:36 a.m.

Hi Matt. I've fleshed out the Etherpad for the PGO and LTO session at: http://pad.linaro.org/GzRj35tXFt

It's a topic list that needs some specifics. Could you make sure we have basic answers to any correctness or performance questions?

Ramana, could you add the specifics from the performance call? I can't seem to find the right meeting in the logs.

-- Michael


Matthew Gretton-Dann

15 Aug 2012

4:17 p.m.

On 15 August 2012 05:36, Michael Hope <michael.hope@linaro.org> wrote:

...

Hi Matt. I've fleshed out the Etherpad for the PGO and LTO session at: http://pad.linaro.org/GzRj35tXFt

I've expanded it a bit further.

...

It's a topic list that needs some specifics. Could you make sure we have basic answers to any correctness or performance questions?

We are as correct as upstream. I found 101 PRs filed against the LTO/gcov-profile components. There are others.

On 'a popular embedded benchmark' PGO gives a 14% improvement and LTO gives 7%. I don't know the figures for both together, or for SPEC.

...

Ramana, could you add the specifics from the performance call? I can't seem to find the right meeting in the logs.

Ramana and I had a discussion this afternoon. I think my strawman proposal for a blueprint for the next 3-6 months goes something like the following:

* Get Hot/Cold partitioning working
* Pick a benchmark/application and ensure PGO/LTO works for it
* Then look at how the performance changes, and find the changes
* I-I into the next app to look at, and what needs doing.

Candidates for the benchmark/application:

* SPEC
* Firefox
* Linux Kernel
* OpenOffice/LibreOffice

SPEC is on the list because Ramana's current Hot/Cold partitioning breaks it. Firefox, Linux, and OOo all have bugs raised against them in the GCC Bugzilla, which indicates people are interested in them. But a discussion at the session about what others would be interested in would be good.
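As a rough illustration of what "ensure PGO works" means for a candidate application, here is a minimal Python sketch that only prints the three steps of a basic GCC profile-guided build: an instrumented build, a training run, and a rebuild that consumes the profile. The file name `bench.c`, the binary name `bench`, the training arguments and the `-O2` baseline are all hypothetical placeholders, not details from this thread. `-freorder-blocks-and-partition` is the GCC option for hot/cold partitioning and is included here only as an assumption about how the two items in the plan might be combined.

```python
from typing import List

# Assumed baseline optimisation level; not taken from the thread.
BASE_FLAGS = ["-O2"]


def pgo_commands(sources: List[str], output: str,
                 training_args: List[str]) -> List[List[str]]:
    """Return the command lines of a basic GCC PGO cycle, in order."""
    # Step 1: build with instrumentation so running the binary writes profile data.
    instrumented = ["gcc", *BASE_FLAGS, "-fprofile-generate",
                    *sources, "-o", output]
    # Step 2: run a representative workload to collect the profile.
    training_run = ["./" + output, *training_args]
    # Step 3: rebuild using the collected profile, with hot/cold partitioning.
    optimised = ["gcc", *BASE_FLAGS, "-fprofile-use",
                 "-freorder-blocks-and-partition", *sources, "-o", output]
    return [instrumented, training_run, optimised]


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    for cmd in pgo_commands(["bench.c"], "bench", ["--iterations", "10"]):
        print(" ".join(cmd))
```

The sketch deliberately builds the command lists without executing them, so it can be adapted to whichever benchmark or application is picked.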

Do we need a slide as Kiko suggested or will the EtherPad be good enough?

I plan to be lurking at the VC session into Android later on today so we can have a side discussion then if you want.

Thanks,

Matt

--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-dann@linaro.org

Mans Rullgard

9:38 p.m.

On 15 August 2012 17:17, Matthew Gretton-Dann <matthew.gretton-dann@linaro.org> wrote:

...

On 'a popular embedded benchmark' PGO gives a 14% improvement and LTO gives 7%. I don't know the figures for both together, or for SPEC.

On 'a popular media coding library' PGO gains 2-5% in general, in one case as much as 11%. The relative gains are larger on average with hand-written assembly disabled, but obviously nowhere near the performance with it enabled.

On the same library LTO is 2-3.5 _times_ slower than without on all tests, although it does pass the test suite.
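For readers comparing these figures, note that a percentage gain and an "N times slower" factor are different measures. The Python sketch below shows one plausible way to compute each. The function names and the sample numbers are hypothetical and are not measurements from this thread.

```python
def percent_gain(baseline_score: float, new_score: float) -> float:
    """Percent change of a higher-is-better score; positive means a gain."""
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    return (new_score - baseline_score) / baseline_score * 100.0


def slowdown_factor(baseline_seconds: float, new_seconds: float) -> float:
    """Ratio of run times; 2.0 means the new build takes twice as long."""
    if baseline_seconds <= 0:
        raise ValueError("baseline time must be positive")
    return new_seconds / baseline_seconds


if __name__ == "__main__":
    # Hypothetical numbers chosen only to illustrate the two measures.
    print(round(percent_gain(100.0, 105.0), 1), "% gain")
    print(slowdown_factor(10.0, 35.0), "x run time")
```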

-- Mans Rullgard / mru

Mans Rullgard

16 Aug 2012

12:04 p.m.

On 15 August 2012 22:38, Mans Rullgard <mans.rullgard@linaro.org> wrote:


On 'a popular media coding library' PGO gains 2-5% in general, in one case as much as 11%. The relative gains are larger on average with hand-written assembly disabled, but obviously nowhere near the performance with it enabled.

On the same library LTO is 2-3.5 _times_ slower than without on all tests, although it does pass the test suite.

Sorry for crying wolf. I redid the LTO build and the huge performance drop is gone. Now I'm getting minor gains on most tests and 5-8% loss on a few. More worrying is that it is now failing a few tests. I'll look into both issues and report back.

-- Mans Rullgard / mru

Mans Rullgard

3:47 p.m.

On 16 August 2012 13:04, Mans Rullgard <mans.rullgard@linaro.org> wrote:


Sorry for crying wolf. I redid the LTO build and the huge performance drop is gone. Now I'm getting minor gains on most tests and 5-8% loss on a few. More worrying is that it is now failing a few tests. I'll look into both issues and report back.

The test failures are caused by a known bug in GCC 4.8 trunk (PR 54132).

After hacking the build system to make sure exactly the same optimisation flags are used when compiling and linking, I'm getting a 4% gain as best result and 1% loss on a couple of tests. Most tests change less than 1%.

If the optimisation flags for compiling and linking differ, all kinds of bad things seem to happen.
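One way to avoid that mismatch is to derive both the compile and the link command lines from a single shared flag list. The Python sketch below illustrates the idea. The source file names, the output name and the `-O2` level are hypothetical, and this is not the actual build-system hack described above.

```python
from typing import List, Tuple


def lto_build_commands(sources: List[str], output: str,
                       opt_flags: List[str]) -> Tuple[List[List[str]], List[str]]:
    """Build compile and link commands that share exactly the same flags."""
    # One flag list, used for every compile step and for the final link.
    shared = [*opt_flags, "-flto"]
    objects = [src.rsplit(".", 1)[0] + ".o" for src in sources]
    compiles = [["gcc", *shared, "-c", src, "-o", obj]
                for src, obj in zip(sources, objects)]
    link = ["gcc", *shared, *objects, "-o", output]
    return compiles, link


def flags_of(cmd: List[str]) -> List[str]:
    """Extract option flags from a command, ignoring -c and -o."""
    return [arg for arg in cmd[1:]
            if arg.startswith("-") and arg not in ("-c", "-o")]


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    compiles, link = lto_build_commands(["a.c", "b.c"], "app", ["-O2"])
    for cmd in compiles:
        assert flags_of(cmd) == flags_of(link)  # same flags at both stages
        print(" ".join(cmd))
    print(" ".join(link))
```

A check like `flags_of` could also be run over the commands an existing build system emits, to spot a compile/link mismatch before benchmarking.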

-- Mans Rullgard / mru


linaro-toolchain@lists.linaro.org
