Maxim Kuvyrkov maxim.kuvyrkov@linaro.org writes:
Hi Richard,
Heads up, our benchmarking CI flagged your commit as causing a 23% regression in 549.fotonik3d_r on Cortex-A57 at -O3.
Do you have internal benchmarks for this change?
Yeah, but we don't see any change in fotonik from before this change. That's running on Neoverse V1 with and (separately) without SVE enabled.
Could you look into why the new code is performing more poorly than the old code for your set-up? I can then look into whether the change in output is expected or not.
Thanks, Richard
Thanks!
-- Maxim Kuvyrkov https://www.linaro.org
On Mar 24, 2024, at 03:43, ci_notify@linaro.org wrote:
Dear contributor, our automatic CI has detected problems related to your patch(es). Please find some details below. If you have any questions, please follow up on linaro-toolchain@lists.linaro.org mailing list, Libera's #linaro-tcwg channel, or ping your favourite Linaro toolchain developer on the usual project channel.
We appreciate that it might be difficult to find the necessary logs or reproduce the issue locally. If you can't get what you need from our CI within minutes, let us know and we will be happy to help.
We track this report's status in https://linaro.atlassian.net/browse/GNU-1181 . Please let us know if you are looking at the problem and/or when you have a fix.
In CI config tcwg_bmk-code_speed-cpu2017rate/gnu-aarch64-master-O3 after:
| commit gcc-14-9157-gff442719cdb
| Author: Richard Sandiford richard.sandiford@arm.com
| Date: Fri Feb 23 14:12:55 2024 +0000
|
| aarch64: Spread out FPR usage between RA regions [PR113613]
|
| early-ra already had code to do regrename-style "broadening"
| of the allocation, to promote scheduling freedom. However,
| the pass divides the function into allocation regions
| and this broadening only worked within a single region.
| This meant that if a basic block contained one subblock
| ... 30 lines of the commit log omitted.
the following benchmarks slowed down by more than 3%:
- slowed down by 23% - 549.fotonik3d_r - from 16467 to 20213 perf samples
the following hot functions slowed down by more than 15% (but their benchmarks slowed down by less than 3%):
- slowed down by 88% - 549.fotonik3d_r:[.] __material_mod_MOD_mat_updatee - from 4373 to 8204 perf samples
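The percentages above appear to be the relative increase in perf samples over the baseline build. The following is a minimal sketch of that arithmetic, assuming the CI rounds to the nearest whole percent; the function name `slowdown_percent` is hypothetical and is not part of the Linaro CI scripts.

```python
def slowdown_percent(before: int, after: int) -> float:
    """Relative increase in perf samples, as a percentage of the baseline."""
    if before <= 0:
        raise ValueError("baseline sample count must be positive")
    return (after - before) / before * 100.0


# Sample counts quoted in the report above (baseline, regressed).
reports = {
    "549.fotonik3d_r": (16467, 20213),
    "__material_mod_MOD_mat_updatee": (4373, 8204),
}

for name, (before, after) in reports.items():
    # Assumption: the report rounds to the nearest whole percent.
    print(f"{name}: {round(slowdown_percent(before, after))}% more samples")
```

Under that assumption, the sample counts reproduce the reported 23% and 88% figures.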
The reproducer instructions below can be used to re-build both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
The configuration of this build is:
Configuration:
- Benchmark: SPEC CPU2017
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: O3
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain@lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
-----------------8<--------------------------8<--------------------------8<--------------------------
The information below can be used to reproduce a debug environment:
Current build : https://ci.linaro.org/job/tcwg_bmk-code_speed-cpu2017rate--gnu-aarch64-maste...
Reference build : https://ci.linaro.org/job/tcwg_bmk-code_speed-cpu2017rate--gnu-aarch64-maste...
Reproduce last good and first bad builds: https://git-us.linaro.org/toolchain/ci/interesting-commits.git/plain/gcc/sha...
Full commit : https://github.com/gcc-mirror/gcc/commit/ff442719cdb64c9df9d069af88e90d51bee...
List of configurations that regressed due to this commit :
- tcwg_bmk-code_speed-cpu2017rate
** gnu-aarch64-master-O3
*** slowed down by 23% - 549.fotonik3d_r
*** https://git-us.linaro.org/toolchain/ci/interesting-commits.git/plain/gcc/sha...
*** https://ci.linaro.org/job/tcwg_bmk-code_speed-cpu2017rate--gnu-aarch64-maste...