linaro-toolchain September 2021

linaro-toolchain@lists.linaro.org

18 participants
86 discussions

[TCWG CI] Regression caused by gcc: Daily bump.

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: Daily bump.:
commit d4b84aefe696a5783a58a30b3fb8dc4617cd147a
Author: GCC Administrator <gccadmin(a)gcc.gnu.org>

    Daily bump.

Results regressed to
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -8
# build_abe newlib: -6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -5
# true: 0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-d4b84aefe696a5783a58a30b3fb8dc4617cd147a/results_id: 1

from
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -8
# build_abe newlib: -6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -5
# true: 0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-baseline/results_id: 1

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
- tcwg_bmk_gnu_eabi_stm32/gnu_eabi-release-arm_eabi-coremark-O3_LTO_VECT

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…

Reproduce builds:
<cut>
mkdir investigate-gcc-d4b84aefe696a5783a58a30b3fb8dc4617cd147a
cd investigate-gcc-d4b84aefe696a5783a58a30b3fb8dc4617cd147a

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach d4b84aefe696a5783a58a30b3fb8dc4617cd147a
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach b1dc26d3543d79805751c26ba5b142eeeb1f55b8
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit d4b84aefe696a5783a58a30b3fb8dc4617cd147a
Author: GCC Administrator <gccadmin(a)gcc.gnu.org>
Date: Tue Sep 21 00:17:57 2021 +0000

    Daily bump.
---
 gcc/DATESTAMP | 2 +-
 gcc/fortran/ChangeLog | 5 +++++
 gcc/testsuite/ChangeLog | 4 ++++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/DATESTAMP b/gcc/DATESTAMP
index c1155ef2341..ed865cb70ab 100644
--- a/gcc/DATESTAMP
+++ b/gcc/DATESTAMP
@@ -1 +1 @@
-20210920
+20210921
diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog
index f6863fb900a..3d53ed99f33 100644
--- a/gcc/fortran/ChangeLog
+++ b/gcc/fortran/ChangeLog
@@ -1,3 +1,8 @@
+2021-09-20 Tobias Burnus <tobias(a)codesourcery.com>
+
+ * trans-openmp.c (gfc_split_omp_clauses): Don't put 'order(concurrent)'
+ on 'distribute' for combined directives, matching OpenMP 5.0
+
 2021-09-19 Harald Anlauf <anlauf(a)gmx.de>

 Backported from master:
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 2ea65ee2d7f..7f8d142942a 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2021-09-20 Tobias Burnus <tobias(a)codesourcery.com>
+
+ * gfortran.dg/gomp/distribute-order-concurrent.f90: New test.
+
 2021-09-19 Harald Anlauf <anlauf(a)gmx.de>

 Backported from master:
</cut>

4 years, 10 months

[TCWG CI] Regression caused by newlib: Cygwin: allow open_setup to fail

by ci_notify＠linaro.org

[TCWG CI] Regression caused by newlib: Cygwin: allow open_setup to fail:
commit e5fcb021cc9dcb1f19d45030457be86b4a226e65
Author: Ken Brown <kbrown(a)cornell.edu>

    Cygwin: allow open_setup to fail

Results regressed to
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -8
# build_abe newlib: -6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -5
# true: 0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-e5fcb021cc9dcb1f19d45030457be86b4a226e65/results_id: 1

from
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -8
# build_abe newlib: -6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -5
# true: 0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-baseline/results_id: 1

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
- tcwg_bmk_gnu_eabi_stm32/gnu_eabi-release-arm_eabi-coremark-O3_LTO_VECT

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…

Reproduce builds:
<cut>
mkdir investigate-newlib-e5fcb021cc9dcb1f19d45030457be86b4a226e65
cd investigate-newlib-e5fcb021cc9dcb1f19d45030457be86b4a226e65

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /newlib/ ./ ./bisect/baseline/

cd newlib

# Reproduce first_bad build
git checkout --detach e5fcb021cc9dcb1f19d45030457be86b4a226e65
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 9b0841aa789e74b6778744b89af76b60bd1a78bc
../artifacts/test.sh

cd ..
</cut> Full commit (up to 1000 lines): <cut> commit e5fcb021cc9dcb1f19d45030457be86b4a226e65 Author: Ken Brown <kbrown(a)cornell.edu> Date: Sat Sep 18 08:13:55 2021 -0400 Cygwin: allow open_setup to fail Convert fhandler_base::open_setup to a (virtual) method that returns a bool result. For the moment, it and its overrides always return true. --- winsup/cygwin/fhandler.cc | 3 ++- winsup/cygwin/fhandler.h | 10 +++++----- winsup/cygwin/fhandler_console.cc | 4 ++-- winsup/cygwin/fhandler_pipe.cc | 9 +++++++-- winsup/cygwin/fhandler_tty.cc | 8 ++++---- 5 files changed, 20 insertions(+), 14 deletions(-) diff --git a/winsup/cygwin/fhandler.cc b/winsup/cygwin/fhandler.cc index 9dfe70be38..1af469e0c9 100644 --- a/winsup/cygwin/fhandler.cc +++ b/winsup/cygwin/fhandler.cc @@ -789,9 +789,10 @@ fhandler_base::fd_reopen (int, mode_t) return NULL; } -void +bool fhandler_base::open_setup (int) { + return true; } /* states: diff --git a/winsup/cygwin/fhandler.h b/winsup/cygwin/fhandler.h index 61113e6981..3471e95b97 100644 --- a/winsup/cygwin/fhandler.h +++ b/winsup/cygwin/fhandler.h @@ -355,7 +355,7 @@ class fhandler_base int open_null (int flags); virtual int open (int, mode_t); virtual fhandler_base *fd_reopen (int, mode_t); - virtual void open_setup (int flags); + virtual bool open_setup (int flags); void set_unique_id (int64_t u) { unique_id = u; } void set_unique_id () { NtAllocateLocallyUniqueId ((PLUID) &unique_id); } @@ -1206,7 +1206,7 @@ public: select_record *select_except (select_stuff *); char *get_proc_fd_name (char *buf); int open (int flags, mode_t mode = 0); - void open_setup (int flags); + bool open_setup (int flags); void fixup_after_fork (HANDLE); int dup (fhandler_base *child, int); void set_close_on_exec (bool val); @@ -2132,7 +2132,7 @@ private: bool use_archetype () const {return true;} int open (int flags, mode_t mode); - void open_setup (int flags); + bool open_setup (int flags); int dup (fhandler_base *, int); void __reg3 read (void *ptr, size_t& len); 
@@ -2300,7 +2300,7 @@ class fhandler_pty_slave: public fhandler_pty_common HANDLE& get_handle_nat () { return io_handle_nat; } int open (int flags, mode_t mode = 0); - void open_setup (int flags); + bool open_setup (int flags); ssize_t __stdcall write (const void *ptr, size_t len); void __reg3 read (void *ptr, size_t& len); int init (HANDLE, DWORD, mode_t); @@ -2399,7 +2399,7 @@ public: void doecho (const void *str, DWORD len); int accept_input (); int open (int flags, mode_t mode = 0); - void open_setup (int flags); + bool open_setup (int flags); ssize_t __stdcall write (const void *ptr, size_t len); void __reg3 read (void *ptr, size_t& len); int close (); diff --git a/winsup/cygwin/fhandler_console.cc b/winsup/cygwin/fhandler_console.cc index e00f2cdbcc..ee862b17d1 100644 --- a/winsup/cygwin/fhandler_console.cc +++ b/winsup/cygwin/fhandler_console.cc @@ -1366,13 +1366,13 @@ fhandler_console::open (int flags, mode_t) return 1; } -void +bool fhandler_console::open_setup (int flags) { set_flags ((flags & ~O_TEXT) | O_BINARY); if (myself->set_ctty (this, flags) && !myself->cygstarted) init_console_handler (true); - fhandler_base::open_setup (flags); + return fhandler_base::open_setup (flags); } int diff --git a/winsup/cygwin/fhandler_pipe.cc b/winsup/cygwin/fhandler_pipe.cc index 73ace3ac53..590ecf6670 100644 --- a/winsup/cygwin/fhandler_pipe.cc +++ b/winsup/cygwin/fhandler_pipe.cc @@ -191,10 +191,11 @@ out: return 0; } -void +bool fhandler_pipe::open_setup (int flags) { - fhandler_base::open_setup (flags); + if (!fhandler_base::open_setup (flags)) + goto err; if (get_dev () == FH_PIPER && !read_mtx) { SECURITY_ATTRIBUTES *sa = sec_none_cloexec (flags); @@ -211,6 +212,10 @@ fhandler_pipe::open_setup (int flags) } if (get_dev () == FH_PIPEW && !query_hdl) set_pipe_non_blocking (is_nonblocking ()); + return true; + +err: + return false; } off_t diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc index 1ea9a47ac5..05fe5348af 100644 --- 
a/winsup/cygwin/fhandler_tty.cc +++ b/winsup/cygwin/fhandler_tty.cc @@ -964,13 +964,13 @@ err_no_msg: return 0; } -void +bool fhandler_pty_slave::open_setup (int flags) { set_flags ((flags & ~O_TEXT) | O_BINARY); myself->set_ctty (this, flags); report_tty_counts (this, "opened", ""); - fhandler_base::open_setup (flags); + return fhandler_base::open_setup (flags); } void @@ -1947,14 +1947,14 @@ fhandler_pty_master::open (int flags, mode_t) return 1; } -void +bool fhandler_pty_master::open_setup (int flags) { set_flags ((flags & ~O_TEXT) | O_BINARY); char buf[sizeof ("opened pty master for ptyNNNNNNNNNNN")]; __small_sprintf (buf, "opened pty master for pty%d", get_minor ()); report_tty_counts (this, buf, ""); - fhandler_base::open_setup (flags); + return fhandler_base::open_setup (flags); } off_t </cut>

4 years, 10 months

gcc-linaro-6.3.1-2017.05-i686_aarch64-elf.tar.xz

by maytte sanchez

I’m trying to import these files into DS-5. After unzipping the files, they still do not show up in the DS-5 search. Below is the error that I keep receiving:

Sent from my iPhone

4 years, 10 months

[TCWG CI] Regression caused by binutils: Automatic date update in version.in

by ci_notify＠linaro.org

[TCWG CI] Regression caused by binutils: Automatic date update in version.in:
commit 27439f0edab99c6870cf7fe042074e47632f3fbd
Author: GDB Administrator <gdbadmin(a)sourceware.org>

    Automatic date update in version.in

Results regressed to
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -8
# build_abe newlib: -6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -5
# true: 0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-27439f0edab99c6870cf7fe042074e47632f3fbd/results_id: 1

from
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -8
# build_abe newlib: -6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -5
# true: 0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-baseline/results_id: 1

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
- tcwg_bmk_gnu_eabi_stm32/gnu_eabi-release-arm_eabi-coremark-O3_LTO_VECT

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…

Reproduce builds:
<cut>
mkdir investigate-binutils-27439f0edab99c6870cf7fe042074e47632f3fbd
cd investigate-binutils-27439f0edab99c6870cf7fe042074e47632f3fbd

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /binutils/ ./ ./bisect/baseline/

cd binutils

# Reproduce first_bad build
git checkout --detach 27439f0edab99c6870cf7fe042074e47632f3fbd
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 6060c2f3373e18f76fa9e3e4d7cf2f3d5983da03
../artifacts/test.sh

cd ..
</cut> Full commit (up to 1000 lines): <cut> commit 27439f0edab99c6870cf7fe042074e47632f3fbd Author: GDB Administrator <gdbadmin(a)sourceware.org> Date: Tue Sep 21 00:00:39 2021 +0000 Automatic date update in version.in --- bfd/version.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/bfd/version.h b/bfd/version.h index 72a41aba322..338c1288a22 100644 --- a/bfd/version.h +++ b/bfd/version.h @@ -16,7 +16,7 @@ In releases, the date is not included in either version strings or sonames. */ -#define BFD_VERSION_DATE 20210920 +#define BFD_VERSION_DATE 20210921 #define BFD_VERSION @bfd_version@ #define BFD_VERSION_STRING @bfd_version_package@ @bfd_version_string@ #define REPORT_BUGS_TO @report_bugs_to@ </cut>

4 years, 10 months

[TCWG CI] Regression caused by gcc: GCC11 - Fortran: combined directives - order(concurrent) not on distribute

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: GCC11 - Fortran: combined directives - order(concurrent) not on distribute:
commit b1dc26d3543d79805751c26ba5b142eeeb1f55b8
Author: Tobias Burnus <tobias(a)codesourcery.com>

    GCC11 - Fortran: combined directives - order(concurrent) not on distribute

Results regressed to
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -8
# build_abe newlib: -6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -5
# true: 0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-b1dc26d3543d79805751c26ba5b142eeeb1f55b8/results_id: 1

from
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -8
# build_abe newlib: -6
# build_abe stage2 -- --patch linaro-local/vect-metric-branch --set gcc_override_configure=--disable-libsanitizer --set gcc_override_configure=--disable-multilib --set gcc_override_configure=--with-cpu=cortex-m4 --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--with-float=hard: -5
# true: 0
# benchmark -- -O3_LTO_VECT_mthumb artifacts/build-baseline/results_id: 1

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
- tcwg_bmk_gnu_eabi_stm32/gnu_eabi-release-arm_eabi-coremark-O3_LTO_VECT

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea…

Reproduce builds:
<cut>
mkdir investigate-gcc-b1dc26d3543d79805751c26ba5b142eeeb1f55b8
cd investigate-gcc-b1dc26d3543d79805751c26ba5b142eeeb1f55b8

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu_eabi-bisect-tcwg_bmk_stm32-gnu_ea… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach b1dc26d3543d79805751c26ba5b142eeeb1f55b8
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 79c523d40de1b7ce1dd0f4865c0855ab2bf6744b
../artifacts/test.sh

cd ..
</cut> Full commit (up to 1000 lines): <cut> commit b1dc26d3543d79805751c26ba5b142eeeb1f55b8 Author: Tobias Burnus <tobias(a)codesourcery.com> Date: Mon Sep 20 17:24:56 2021 +0200 GCC11 - Fortran: combined directives - order(concurrent) not on distribute While OpenMP 5.1 and GCC 12 permits 'order(concurrent)' on distribute, OpenMP 5.0 and GCC 11 don't. This patch for GCC 11 ensures the clause also does not end up on 'distribute' when splitting combined directives. gcc/fortran/ChangeLog: * trans-openmp.c (gfc_split_omp_clauses): Don't put 'order(concurrent)' on 'distribute' for combined directives, matching OpenMP 5.0 gcc/testsuite/ChangeLog: * gfortran.dg/gomp/distribute-order-concurrent.f90: New test. --- gcc/fortran/trans-openmp.c | 2 -- .../gomp/distribute-order-concurrent.f90 | 25 ++++++++++++++++++++++ 2 files changed, 25 insertions(+), 2 deletions(-) diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c index 7e931bf4bc7..973d916b4a2 100644 --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -5176,8 +5176,6 @@ gfc_split_omp_clauses (gfc_code *code, /* Duplicate collapse. */ clausesa[GFC_OMP_SPLIT_DISTRIBUTE].collapse = code->ext.omp_clauses->collapse; - clausesa[GFC_OMP_SPLIT_DISTRIBUTE].order_concurrent - = code->ext.omp_clauses->order_concurrent; } if (mask & GFC_OMP_MASK_PARALLEL) { diff --git a/gcc/testsuite/gfortran.dg/gomp/distribute-order-concurrent.f90 b/gcc/testsuite/gfortran.dg/gomp/distribute-order-concurrent.f90 new file mode 100644 index 00000000000..9597d913684 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/distribute-order-concurrent.f90 @@ -0,0 +1,25 @@ +! { dg-additional-options "-fdump-tree-original" } +! +! In OpenMP 5.0, 'order(concurrent)' does not apply to distribute +! Ensure that it is rejected in GCC 11. +! +! 
Note: OpenMP 5.1 allows it; the GCC 12 testcase for it is gfortran.dg/gomp/order-5.f90 + +subroutine f(a) +implicit none +integer :: i, thr +!save :: thr +integer :: a(:) + +!$omp distribute parallel do order(concurrent) private(thr) + do i = 1, 10 + thr = 5 + a(i) = thr + end do +!$omp end distribute parallel do +end + +! { dg-final { scan-tree-dump-not "omp distribute\[^\n\r]*order" "original" } } +! { dg-final { scan-tree-dump "#pragma omp distribute\[\n\r\]" "original" } } +! { dg-final { scan-tree-dump "#pragma omp parallel private\$thr\$" "original" } } +! { dg-final { scan-tree-dump "#pragma omp for nowait order\$concurrent\$" "original" } } </cut>

4 years, 10 months

[TCWG CI] 450.soplex grew in size by 2% after gcc: Avoid invalid loop transformations in jump threading registry.

by ci_notify＠linaro.org

After gcc commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez <aldyh(a)redhat.com>

    Avoid invalid loop transformations in jump threading registry.

the following benchmarks grew in size by more than 1%:
- 450.soplex grew in size by 2% from 207260 to 211436 bytes

The reproducer instructions below can be used to re-build both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -Os -flto
- Hardware: APM Mustang 8x X-Gene1

This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
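
The size figures quoted in this report (207260 to 211436 bytes for 450.soplex) can be checked by hand. The following is only a minimal sketch in POSIX shell; the helper name `growth_percent` is hypothetical and is not part of the TCWG jenkins-scripts, and nothing here reflects how the CI itself computes the percentage.

```shell
#!/bin/sh
# Hypothetical helper, not part of jenkins-scripts: prints the size growth
# from $1 (old size in bytes) to $2 (new size in bytes) as a percentage.
# Shell arithmetic is integer-only, so the result is truncated to a whole
# number.
growth_percent() {
    old=$1
    new=$2
    echo $(( (new - old) * 100 / old ))
}

# Sizes reported for 450.soplex in this message.
growth_percent 207260 211436   # prints 2
```

Because the division truncates, the exact growth of roughly 2.01% is printed as 2, which is still above the 1% threshold mentioned in the report.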
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…

Reproduce builds:
<cut>
mkdir investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
cd investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 4a960d548b7d7d942f316c5295f6d849b74214f5
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 29c92857039d0a105281be61c10c9e851aaeea4a
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez <aldyh(a)redhat.com>
Date: Thu Sep 23 10:59:24 2021 +0200

    Avoid invalid loop transformations in jump threading registry.

My upcoming improvements to the forward jump threader make it thread more aggressively. In investigating some "regressions", I noticed that it has always allowed threading through empty latches and across loop boundaries. As we have discussed recently, this should be avoided until after loop optimizations have run their course. Note that this wasn't much of a problem before because DOM/VRP couldn't find these opportunities, but with a smarter solver, we trip over them more easily. Because the forward threader doesn't have an independent localized cost model like the new threader (profitable_path_p), it is difficult to catch these things at discovery. However, we can catch them at registration time, with the added benefit that all the threaders (forward and backward) can share the handcuffs. This patch is an adaptation of what we do in the backward threader, but it is not meant to catch everything we do there, as some of the restrictions there are due to limitations of the different block copiers (for example, the generic copier does not re-use existing threading paths). We could ideally remove the now redundant bits in profitable_path_p, but I would prefer not to for two reasons. First, the backward threader uses profitable_path_p as it discovers paths to avoid discovering paths in unprofitable directions. Second, I would like to merge all the forward cost restrictions into the profitability class in the backward threader, not the other way around. Alas, that reshuffling will have to wait for the next release. As usual, there are quite a few tests that needed adjustments. It seems we were quite happily threading improper scenarios. With most of them, as can be seen in pr77445-2.c, we're merely shifting the threading to after loop optimizations. Tested on x86-64 Linux. gcc/ChangeLog: * tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths): New. (jt_path_registry::register_jump_thread): Call cancel_invalid_paths. 
* tree-ssa-threadupdate.h (class jt_path_registry): Add cancel_invalid_paths. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/20030714-2.c: Adjust. * gcc.dg/tree-ssa/pr66752-3.c: Adjust. * gcc.dg/tree-ssa/pr77445-2.c: Adjust. * gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust. * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust. * gcc.dg/vect/bb-slp-16.c: Adjust. --- gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c | 7 ++- gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c | 19 ++++--- gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 4 +- gcc/testsuite/gcc.dg/vect/bb-slp-16.c | 7 --- gcc/tree-ssa-threadupdate.c | 67 ++++++++++++++++++----- gcc/tree-ssa-threadupdate.h | 1 + 8 files changed, 78 insertions(+), 35 deletions(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c index eb663f2ff5b..9585ff11307 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c @@ -32,7 +32,8 @@ get_alias_set (t) } } -/* There should be exactly three IF conditionals if we thread jumps - properly. */ -/* { dg-final { scan-tree-dump-times "if " 3 "dom2"} } */ +/* There should be exactly 4 IF conditionals if we thread jumps + properly. There used to be 3, but one thread was crossing + loops. 
*/ +/* { dg-final { scan-tree-dump-times "if " 4 "dom2"} } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c index e1464e21170..922a331b217 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-dce2" } */ +/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3" } */ extern int status, pt; extern int count; @@ -32,10 +32,15 @@ foo (int N, int c, int b, int *a) pt--; } -/* There are 4 jump threading opportunities, all of which will be - realized, which will eliminate testing of FLAG, completely. */ -/* { dg-final { scan-tree-dump-times "Registering jump" 4 "thread1"} } */ +/* There are 2 jump threading opportunities (which don't cross loops), + all of which will be realized, which will eliminate testing of + FLAG, completely. */ +/* { dg-final { scan-tree-dump-times "Registering jump" 2 "thread1"} } */ -/* There should be no assignments or references to FLAG, verify they're - eliminated as early as possible. */ -/* { dg-final { scan-tree-dump-not "if .flag" "dce2"} } */ +/* We used to remove references to FLAG by DCE2, but this was + depending on early threaders threading through loop boundaries + (which we shouldn't do). However, the late threading passes, which + run after loop optimizations , can successfully eliminate the + references to FLAG. Verify that ther are no references by the late + threading passes. */ +/* { dg-final { scan-tree-dump-not "if .flag" "thread3"} } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c index f9fc212f49e..01a0f1f197d 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c @@ -123,8 +123,8 @@ enum STATES FMS( u8 **in , u32 *transitions) { aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. 
It's high enough to change decisions in switch expansion which in turn can expose new jump threading opportunities. Skip the later tests on aarch64. */ -/* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */ -/* { dg-final { scan-tree-dump-times "Invalid sum" 4 "thread1" } } */ +/* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread1" } } */ +/* { dg-final { scan-tree-dump-times "Invalid sum" 1 "thread1" } } */ /* { dg-final { scan-tree-dump-not "optimizing for size" "thread1" } } */ /* { dg-final { scan-tree-dump-not "optimizing for size" "thread2" } } */ /* { dg-final { scan-tree-dump-not "optimizing for size" "thread3" { target { ! aarch64*-*-* } } } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c index 60d4f76f076..2d78d045516 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c @@ -21,5 +21,7 @@ condition. All the cases are picked up by VRP1 as jump threads. */ -/* { dg-final { scan-tree-dump-times "Registering jump" 6 "thread1" } } */ + +/* There used to be 6 jump threads found by thread1, but they all + depended on threading through distinct loops in ethread. */ /* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c index e3d4b311c03..16abcde5053 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c @@ -1,8 +1,8 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */ -/* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread1" } } */ -/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" { target { ! 
aarch64*-*-* } } } } */ +/* { dg-final { scan-tree-dump "Jumps threaded: 12" "thread1" } } */ +/* { dg-final { scan-tree-dump "Jumps threaded: 5" "thread3" { target { ! aarch64*-*-* } } } } */ /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */ /* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c index 664e93e9b60..e68a9b62535 100644 --- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c @@ -1,8 +1,5 @@ /* { dg-require-effective-target vect_int } */ -/* See note below as to why we disable threading. */ -/* { dg-additional-options "-fdisable-tree-thread1" } */ - #include <stdarg.h> #include "tree-vect.h" @@ -30,10 +27,6 @@ main1 (int dummy) *pout++ = *pin++ + a; *pout++ = *pin++ + a; *pout++ = *pin++ + a; - /* In some architectures like ppc64, jump threading may thread - the iteration where i==0 such that we no longer optimize the - BB. Another alternative to disable jump threading would be - to wrap the read from `i' into a function returning i. */ if (arr[i] = i) a = i; else diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index baac11280fa..2b9b8f81274 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -2757,6 +2757,58 @@ fwd_jt_path_registry::update_cfg (bool may_peel_loop_headers) return retval; } +bool +jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path) +{ + gcc_checking_assert (!path.is_empty ()); + edge taken_edge = path[path.length () - 1]->e; + loop_p loop = taken_edge->src->loop_father; + bool seen_latch = false; + bool path_crosses_loops = false; + + for (unsigned int i = 0; i < path.length (); i++) + { + edge e = path[i]->e; + + if (e == NULL) + { + // NULL outgoing edges on a path can happen for jumping to a + // constant address. 
+ cancel_thread (&path, "Found NULL edge in jump threading path"); + return true; + } + + if (loop->latch == e->src || loop->latch == e->dest) + seen_latch = true; + + // The first entry represents the block with an outgoing edge + // that we will redirect to the jump threading path. Thus we + // don't care about that block's loop father. + if ((i > 0 && e->src->loop_father != loop) + || e->dest->loop_father != loop) + path_crosses_loops = true; + + if (flag_checking && !m_backedge_threads) + gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0); + } + + if (cfun->curr_properties & PROP_loop_opts_done) + return false; + + if (seen_latch && empty_block_p (loop->latch)) + { + cancel_thread (&path, "Threading through latch before loop opts " + "would create non-empty latch"); + return true; + } + if (path_crosses_loops) + { + cancel_thread (&path, "Path crosses loops"); + return true; + } + return false; +} + /* Register a jump threading opportunity. We queue up all the jump threading opportunities discovered by a pass and update the CFG and SSA form all at once. @@ -2776,19 +2828,8 @@ jt_path_registry::register_jump_thread (vec<jump_thread_edge *> *path) return false; } - /* First make sure there are no NULL outgoing edges on the jump threading - path. That can happen for jumping to a constant address. 
*/ - for (unsigned int i = 0; i < path->length (); i++) - { - if ((*path)[i]->e == NULL) - { - cancel_thread (path, "Found NULL edge in jump threading path"); - return false; - } - - if (flag_checking && !m_backedge_threads) - gcc_assert (((*path)[i]->e->flags & EDGE_DFS_BACK) == 0); - } + if (cancel_invalid_paths (*path)) + return false; if (dump_file && (dump_flags & TDF_DETAILS)) dump_jump_thread_path (dump_file, *path, true); diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h index 8b48a671212..d68795c9f27 100644 --- a/gcc/tree-ssa-threadupdate.h +++ b/gcc/tree-ssa-threadupdate.h @@ -75,6 +75,7 @@ protected: unsigned long m_num_threaded_edges; private: virtual bool update_cfg (bool peel_loop_headers) = 0; + bool cancel_invalid_paths (vec<jump_thread_edge *> &path); jump_thread_path_allocator m_allocator; // True if threading through back edges is allowed. This is only // allowed in the generic copier in the backward threader. </cut>
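
For readers skimming the diff above, the checks performed by the new jt_path_registry::cancel_invalid_paths can be modelled outside of GCC. What follows is a minimal standalone sketch, not GCC code. Block, Loop, Edge, Verdict and classify_path are hypothetical stand-ins for basic_block, loop_p, edge and the cancel_thread calls, and the loop_opts_done parameter stands in for the PROP_loop_opts_done test. Unlike the patch, the sketch scans for NULL edges before it reads the last edge of the path.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for GCC's CFG types; none of these names exist in GCC.
struct Block;
struct Loop { const Block *latch; };                // latch block of the loop
struct Block { const Loop *loop; bool empty; };     // innermost loop, "no statements" flag
struct Edge { const Block *src; const Block *dest; };

enum class Verdict { Keep, NullEdge, ThroughLatch, CrossesLoops };

// Classify a candidate jump threading path in the same order as the patch:
// NULL edge, then "loop opts already done", then latch, then loop crossing.
Verdict classify_path(const std::vector<const Edge *> &path, bool loop_opts_done) {
  assert(!path.empty());

  // Assumption of this sketch: reject NULL edges up front so that the
  // last edge can be dereferenced safely below.
  for (const Edge *e : path)
    if (e == nullptr)
      return Verdict::NullEdge;

  // The loop of the block that owns the final (taken) edge is the reference.
  const Loop *loop = path.back()->src->loop;
  bool seen_latch = false;
  bool crosses_loops = false;

  for (std::size_t i = 0; i < path.size(); ++i) {
    const Edge *e = path[i];
    if (loop->latch == e->src || loop->latch == e->dest)
      seen_latch = true;
    // The source block of the first edge is the one being redirected,
    // so its loop is not taken into account.
    if ((i > 0 && e->src->loop != loop) || e->dest->loop != loop)
      crosses_loops = true;
  }

  // After the loop optimizers have run, the two restrictions below are lifted.
  if (loop_opts_done)
    return Verdict::Keep;
  if (seen_latch && loop->latch->empty)
    return Verdict::ThroughLatch;
  if (crosses_loops)
    return Verdict::CrossesLoops;
  return Verdict::Keep;
}
```

In this model, a two-edge path whose middle block sits in an inner loop is classified as CrossesLoops while loop_opts_done is false and as Keep once it is true. That is consistent with the lower thread1 counts and the move from dce2 to thread3 in the adjusted tests.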

4 years, 10 months

[TCWG CI] 464.h264ref slowed down by 7% after llvm: [JumpThreading] Ignore free instructions

by ci_notify＠linaro.org

After llvm commit 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff Author: Nikita Popov <nikita.ppv(a)gmail.com> [JumpThreading] Ignore free instructions the following benchmarks slowed down by more than 2%: - 464.h264ref slowed down by 7% from 10715 to 11434 perf samples Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI. For your convenience, we have uploaded tarballs with pre-processed source and assembly files at: - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Configuration: - Benchmark: SPEC CPU2006 - Toolchain: Clang + Glibc + LLVM Linker - Version: all components were built from their tip of trunk - Target: aarch64-linux-gnu - Compiler flags: -O3 -flto - Hardware: NVidia TX1 4x Cortex-A57 This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports. THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. 
This commit has regressed these CI configurations: - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3_LTO First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Reproduce builds: <cut> mkdir investigate-llvm-1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff cd investigate-llvm-1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff ../artifacts/test.sh # Reproduce last_good build git checkout --detach 1a6e1ee42a6af255d45e3fd2fe87021dd31f79bb ../artifacts/test.sh cd .. 
</cut> Full commit (up to 1000 lines): <cut> commit 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff Author: Nikita Popov <nikita.ppv(a)gmail.com> Date: Wed Sep 22 21:34:24 2021 +0200 [JumpThreading] Ignore free instructions This is basically D108837 but for jump threading. Free instructions should be ignored for the threading decision. JumpThreading already skips some free instructions (like pointer bitcasts), but does not skip various free intrinsics -- in fact, it currently gives them a fairly large cost of 2. Differential Revision: https://reviews.llvm.org/D110290 --- .../include/llvm/Transforms/Scalar/JumpThreading.h | 8 +-- llvm/lib/Transforms/Scalar/JumpThreading.cpp | 61 ++++++++++------------ .../Transforms/JumpThreading/free_instructions.ll | 24 +++++---- .../inlining-alignment-assumptions.ll | 12 ++--- 4 files changed, 52 insertions(+), 53 deletions(-) diff --git a/llvm/include/llvm/Transforms/Scalar/JumpThreading.h b/llvm/include/llvm/Transforms/Scalar/JumpThreading.h index 816ea1071e52..0ac7d7c62b7a 100644 --- a/llvm/include/llvm/Transforms/Scalar/JumpThreading.h +++ b/llvm/include/llvm/Transforms/Scalar/JumpThreading.h @@ -44,6 +44,7 @@ class PHINode; class SelectInst; class SwitchInst; class TargetLibraryInfo; +class TargetTransformInfo; class Value; /// A private "module" namespace for types and utilities used by @@ -78,6 +79,7 @@ enum ConstantPreference { WantInteger, WantBlockAddress }; /// revectored to the false side of the second if. class JumpThreadingPass : public PassInfoMixin<JumpThreadingPass> { TargetLibraryInfo *TLI; + TargetTransformInfo *TTI; LazyValueInfo *LVI; AAResults *AA; DomTreeUpdater *DTU; @@ -99,9 +101,9 @@ public: JumpThreadingPass(bool InsertFreezeWhenUnfoldingSelect = false, int T = -1); // Glue for old PM. 
- bool runImpl(Function &F, TargetLibraryInfo *TLI, LazyValueInfo *LVI, - AAResults *AA, DomTreeUpdater *DTU, bool HasProfileData, - std::unique_ptr<BlockFrequencyInfo> BFI, + bool runImpl(Function &F, TargetLibraryInfo *TLI, TargetTransformInfo *TTI, + LazyValueInfo *LVI, AAResults *AA, DomTreeUpdater *DTU, + bool HasProfileData, std::unique_ptr<BlockFrequencyInfo> BFI, std::unique_ptr<BranchProbabilityInfo> BPI); PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM); diff --git a/llvm/lib/Transforms/Scalar/JumpThreading.cpp b/llvm/lib/Transforms/Scalar/JumpThreading.cpp index 688902ecb9ff..fe9a7211967c 100644 --- a/llvm/lib/Transforms/Scalar/JumpThreading.cpp +++ b/llvm/lib/Transforms/Scalar/JumpThreading.cpp @@ -331,7 +331,7 @@ bool JumpThreading::runOnFunction(Function &F) { BFI.reset(new BlockFrequencyInfo(F, *BPI, LI)); } - bool Changed = Impl.runImpl(F, TLI, LVI, AA, &DTU, F.hasProfileData(), + bool Changed = Impl.runImpl(F, TLI, TTI, LVI, AA, &DTU, F.hasProfileData(), std::move(BFI), std::move(BPI)); if (PrintLVIAfterJumpThreading) { dbgs() << "LVI for function '" << F.getName() << "':\n"; @@ -360,7 +360,7 @@ PreservedAnalyses JumpThreadingPass::run(Function &F, BFI.reset(new BlockFrequencyInfo(F, *BPI, LI)); } - bool Changed = runImpl(F, &TLI, &LVI, &AA, &DTU, F.hasProfileData(), + bool Changed = runImpl(F, &TLI, &TTI, &LVI, &AA, &DTU, F.hasProfileData(), std::move(BFI), std::move(BPI)); if (PrintLVIAfterJumpThreading) { @@ -377,12 +377,14 @@ PreservedAnalyses JumpThreadingPass::run(Function &F, } bool JumpThreadingPass::runImpl(Function &F, TargetLibraryInfo *TLI_, - LazyValueInfo *LVI_, AliasAnalysis *AA_, - DomTreeUpdater *DTU_, bool HasProfileData_, + TargetTransformInfo *TTI_, LazyValueInfo *LVI_, + AliasAnalysis *AA_, DomTreeUpdater *DTU_, + bool HasProfileData_, std::unique_ptr<BlockFrequencyInfo> BFI_, std::unique_ptr<BranchProbabilityInfo> BPI_) { LLVM_DEBUG(dbgs() << "Jump threading on function '" << F.getName() << "'\n"); TLI = TLI_; + 
TTI = TTI_; LVI = LVI_; AA = AA_; DTU = DTU_; @@ -514,7 +516,8 @@ static void replaceFoldableUses(Instruction *Cond, Value *ToVal) { /// Return the cost of duplicating a piece of this block from first non-phi /// and before StopAt instruction to thread across it. Stop scanning the block /// when exceeding the threshold. If duplication is impossible, returns ~0U. -static unsigned getJumpThreadDuplicationCost(BasicBlock *BB, +static unsigned getJumpThreadDuplicationCost(const TargetTransformInfo *TTI, + BasicBlock *BB, Instruction *StopAt, unsigned Threshold) { assert(StopAt->getParent() == BB && "Not an instruction from proper BB?"); @@ -550,26 +553,21 @@ static unsigned getJumpThreadDuplicationCost(BasicBlock *BB, if (Size > Threshold) return Size; - // Debugger intrinsics don't incur code size. - if (isa<DbgInfoIntrinsic>(I)) continue; - - // Pseudo-probes don't incur code size. - if (isa<PseudoProbeInst>(I)) - continue; - - // If this is a pointer->pointer bitcast, it is free. - if (isa<BitCastInst>(I) && I->getType()->isPointerTy()) - continue; - - // Freeze instruction is free, too. - if (isa<FreezeInst>(I)) - continue; - // Bail out if this instruction gives back a token type, it is not possible // to duplicate it if it is used outside this BB. if (I->getType()->isTokenTy() && I->isUsedOutsideOfBlock(BB)) return ~0U; + // Blocks with NoDuplicate are modelled as having infinite cost, so they + // are never duplicated. + if (const CallInst *CI = dyn_cast<CallInst>(I)) + if (CI->cannotDuplicate() || CI->isConvergent()) + return ~0U; + + if (TTI->getUserCost(&*I, TargetTransformInfo::TCK_SizeAndLatency) + == TargetTransformInfo::TCC_Free) + continue; + // All other instructions count for at least one unit. ++Size; @@ -578,11 +576,7 @@ static unsigned getJumpThreadDuplicationCost(BasicBlock *BB, // as having cost of 2 total, and if they are a vector intrinsic, we model // them as having cost 1. 
if (const CallInst *CI = dyn_cast<CallInst>(I)) { - if (CI->cannotDuplicate() || CI->isConvergent()) - // Blocks with NoDuplicate are modelled as having infinite cost, so they - // are never duplicated. - return ~0U; - else if (!isa<IntrinsicInst>(CI)) + if (!isa<IntrinsicInst>(CI)) Size += 3; else if (!CI->getType()->isVectorTy()) Size += 1; @@ -2234,10 +2228,10 @@ bool JumpThreadingPass::maybethreadThroughTwoBasicBlocks(BasicBlock *BB, } // Compute the cost of duplicating BB and PredBB. - unsigned BBCost = - getJumpThreadDuplicationCost(BB, BB->getTerminator(), BBDupThreshold); + unsigned BBCost = getJumpThreadDuplicationCost( + TTI, BB, BB->getTerminator(), BBDupThreshold); unsigned PredBBCost = getJumpThreadDuplicationCost( - PredBB, PredBB->getTerminator(), BBDupThreshold); + TTI, PredBB, PredBB->getTerminator(), BBDupThreshold); // Give up if costs are too high. We need to check BBCost and PredBBCost // individually before checking their sum because getJumpThreadDuplicationCost @@ -2345,8 +2339,8 @@ bool JumpThreadingPass::tryThreadEdge( return false; } - unsigned JumpThreadCost = - getJumpThreadDuplicationCost(BB, BB->getTerminator(), BBDupThreshold); + unsigned JumpThreadCost = getJumpThreadDuplicationCost( + TTI, BB, BB->getTerminator(), BBDupThreshold); if (JumpThreadCost > BBDupThreshold) { LLVM_DEBUG(dbgs() << " Not threading BB '" << BB->getName() << "' - Cost is too high: " << JumpThreadCost << "\n"); @@ -2614,8 +2608,8 @@ bool JumpThreadingPass::duplicateCondBranchOnPHIIntoPred( return false; } - unsigned DuplicationCost = - getJumpThreadDuplicationCost(BB, BB->getTerminator(), BBDupThreshold); + unsigned DuplicationCost = getJumpThreadDuplicationCost( + TTI, BB, BB->getTerminator(), BBDupThreshold); if (DuplicationCost > BBDupThreshold) { LLVM_DEBUG(dbgs() << " Not duplicating BB '" << BB->getName() << "' - Cost is too high: " << DuplicationCost << "\n"); @@ -3031,7 +3025,8 @@ bool JumpThreadingPass::threadGuard(BasicBlock *BB, IntrinsicInst *Guard, 
ValueToValueMapTy UnguardedMapping, GuardedMapping; Instruction *AfterGuard = Guard->getNextNode(); - unsigned Cost = getJumpThreadDuplicationCost(BB, AfterGuard, BBDupThreshold); + unsigned Cost = + getJumpThreadDuplicationCost(TTI, BB, AfterGuard, BBDupThreshold); if (Cost > BBDupThreshold) return false; // Duplicate all instructions before the guard and the guard itself to the diff --git a/llvm/test/Transforms/JumpThreading/free_instructions.ll b/llvm/test/Transforms/JumpThreading/free_instructions.ll index f768ec996779..76392af77d33 100644 --- a/llvm/test/Transforms/JumpThreading/free_instructions.ll +++ b/llvm/test/Transforms/JumpThreading/free_instructions.ll @@ -5,26 +5,28 @@ ; the jump threading threshold, as everything else are free instructions. define i32 @free_instructions(i1 %c, i32* %p) { ; CHECK-LABEL: @free_instructions( -; CHECK-NEXT: br i1 [[C:%.*]], label [[IF:%.*]], label [[ELSE:%.*]] -; CHECK: if: +; CHECK-NEXT: br i1 [[C:%.*]], label [[IF2:%.*]], label [[ELSE2:%.*]] +; CHECK: if2: ; CHECK-NEXT: store i32 -1, i32* [[P:%.*]], align 4 -; CHECK-NEXT: br label [[JOIN:%.*]] -; CHECK: else: -; CHECK-NEXT: store i32 -2, i32* [[P]], align 4 -; CHECK-NEXT: br label [[JOIN]] -; CHECK: join: ; CHECK-NEXT: call void @llvm.experimental.noalias.scope.decl(metadata [[META0:![0-9]+]]) ; CHECK-NEXT: store i32 1, i32* [[P]], align 4, !noalias !0 ; CHECK-NEXT: call void @llvm.assume(i1 true) [ "align"(i32* [[P]], i64 32) ] ; CHECK-NEXT: store i32 2, i32* [[P]], align 4 +; CHECK-NEXT: [[P21:%.*]] = bitcast i32* [[P]] to i8* +; CHECK-NEXT: [[P32:%.*]] = call i8* @llvm.launder.invariant.group.p0i8(i8* [[P21]]) +; CHECK-NEXT: [[P43:%.*]] = bitcast i8* [[P32]] to i32* +; CHECK-NEXT: store i32 3, i32* [[P43]], align 4, !invariant.group !3 +; CHECK-NEXT: ret i32 0 +; CHECK: else2: +; CHECK-NEXT: store i32 -2, i32* [[P]], align 4 +; CHECK-NEXT: call void @llvm.experimental.noalias.scope.decl(metadata [[META4:![0-9]+]]) +; CHECK-NEXT: store i32 1, i32* [[P]], align 4, 
!noalias !4 +; CHECK-NEXT: call void @llvm.assume(i1 true) [ "align"(i32* [[P]], i64 32) ] +; CHECK-NEXT: store i32 2, i32* [[P]], align 4 ; CHECK-NEXT: [[P2:%.*]] = bitcast i32* [[P]] to i8* ; CHECK-NEXT: [[P3:%.*]] = call i8* @llvm.launder.invariant.group.p0i8(i8* [[P2]]) ; CHECK-NEXT: [[P4:%.*]] = bitcast i8* [[P3]] to i32* ; CHECK-NEXT: store i32 3, i32* [[P4]], align 4, !invariant.group !3 -; CHECK-NEXT: br i1 [[C]], label [[IF2:%.*]], label [[ELSE2:%.*]] -; CHECK: if2: -; CHECK-NEXT: ret i32 0 -; CHECK: else2: ; CHECK-NEXT: ret i32 1 ; br i1 %c, label %if, label %else diff --git a/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll b/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll index 57014e856a09..f764a59dd8a2 100644 --- a/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll +++ b/llvm/test/Transforms/PhaseOrdering/inlining-alignment-assumptions.ll @@ -32,13 +32,10 @@ define void @caller1(i1 %c, i64* align 1 %ptr) { ; ASSUMPTIONS-OFF-NEXT: br label [[COMMON_RET]] ; ; ASSUMPTIONS-ON-LABEL: @caller1( -; ASSUMPTIONS-ON-NEXT: br i1 [[C:%.*]], label [[COMMON_RET:%.*]], label [[FALSE1:%.*]] -; ASSUMPTIONS-ON: false1: -; ASSUMPTIONS-ON-NEXT: store volatile i64 1, i64* [[PTR:%.*]], align 4 -; ASSUMPTIONS-ON-NEXT: br label [[COMMON_RET]] +; ASSUMPTIONS-ON-NEXT: br i1 [[C:%.*]], label [[COMMON_RET:%.*]], label [[FALSE2:%.*]] ; ASSUMPTIONS-ON: common.ret: -; ASSUMPTIONS-ON-NEXT: [[DOTSINK:%.*]] = phi i64 [ 3, [[FALSE1]] ], [ 2, [[TMP0:%.*]] ] -; ASSUMPTIONS-ON-NEXT: call void @llvm.assume(i1 true) [ "align"(i64* [[PTR]], i64 8) ] +; ASSUMPTIONS-ON-NEXT: [[DOTSINK:%.*]] = phi i64 [ 3, [[FALSE2]] ], [ 2, [[TMP0:%.*]] ] +; ASSUMPTIONS-ON-NEXT: call void @llvm.assume(i1 true) [ "align"(i64* [[PTR:%.*]], i64 8) ] ; ASSUMPTIONS-ON-NEXT: store volatile i64 0, i64* [[PTR]], align 8 ; ASSUMPTIONS-ON-NEXT: store volatile i64 -1, i64* [[PTR]], align 8 ; ASSUMPTIONS-ON-NEXT: store volatile i64 -1, i64* [[PTR]], align 8 @@ 
-47,6 +44,9 @@ define void @caller1(i1 %c, i64* align 1 %ptr) { ; ASSUMPTIONS-ON-NEXT: store volatile i64 -1, i64* [[PTR]], align 8 ; ASSUMPTIONS-ON-NEXT: store volatile i64 [[DOTSINK]], i64* [[PTR]], align 8 ; ASSUMPTIONS-ON-NEXT: ret void +; ASSUMPTIONS-ON: false2: +; ASSUMPTIONS-ON-NEXT: store volatile i64 1, i64* [[PTR]], align 4 +; ASSUMPTIONS-ON-NEXT: br label [[COMMON_RET]] ; br i1 %c, label %true1, label %false1 </cut>
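
The effect of the cost change in the diff above can be illustrated with a small standalone model. This is a sketch under stated assumptions, not LLVM code. Kind, kImpossible and duplication_cost are hypothetical names, the instruction categories replace the real TargetTransformInfo query, and anything outside the quoted hunks of getJumpThreadDuplicationCost is left out. The ignore_free flag switches between the old behaviour (a free intrinsic is charged 2) and the new one (it is skipped).

```cpp
#include <vector>

// Hypothetical instruction categories; real LLVM asks TargetTransformInfo.
enum class Kind {
  Plain,            // ordinary instruction, one unit
  AlwaysSkipped,    // e.g. debug intrinsic or pointer bitcast, free before and after
  FreeIntrinsic,    // intrinsic that the target cost model reports as free
  ScalarIntrinsic,  // non-free intrinsic with a scalar result
  VectorIntrinsic,  // non-free intrinsic with a vector result
  Call,             // ordinary (non-intrinsic) call
  NoDuplicate       // call that must not be duplicated
};

// Mirrors the ~0U "duplication is impossible" return value in the patch.
constexpr unsigned kImpossible = ~0U;

// Cost of duplicating a block, following the order of checks in the diff.
// ignore_free == false models the old code, true models the patched code.
unsigned duplication_cost(const std::vector<Kind> &block, unsigned threshold,
                          bool ignore_free) {
  unsigned size = 0;
  for (Kind k : block) {
    // Stop scanning once the running cost exceeds the threshold.
    if (size > threshold)
      return size;
    if (k == Kind::NoDuplicate)
      return kImpossible;
    if (k == Kind::AlwaysSkipped)
      continue;
    if (k == Kind::FreeIntrinsic && ignore_free)
      continue;

    // All other instructions count for at least one unit.
    ++size;

    if (k == Kind::Call)
      size += 3;  // calls are modelled as 4 units in total
    else if (k == Kind::ScalarIntrinsic || k == Kind::FreeIntrinsic)
      size += 1;  // scalar intrinsics are modelled as 2 units in total
    // Vector intrinsics stay at 1 unit.
  }
  return size;
}
```

For a block of one ordinary instruction followed by three free intrinsics and a threshold of 6 (a value chosen for illustration), the model gives 7 with ignore_free set to false and 1 with it set to true, so only the patched variant stays under the threshold. More blocks becoming eligible for duplication in this way is one plausible route to a code-layout change such as the one reported for 464.h264ref, although the report itself does not identify the cause.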

4 years, 10 months

[ACTIVITY] report week ending 24 Sep

by Peter Maydell

Progress

* UM-2 [QEMU upstream maintainership]
  + Still looking at the mess that is non-unique bus names. Worked
    through exactly which devices and machine types are affected for
    the i2c bus.
  + Sent a patchset which tries to make the "create a bus" function
    names a bit more regular across different bus types.
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
  + Luis figured out why GDB was crashing when fed the MVE XML by
    QEMU's gdbstub; this was a combination of QEMU giving GDB some
    non-standard extra registers in its "vfp" XML feature and GDB not
    being robust enough against those unexpected extras. Sent out a
    patchset which cleans up QEMU's XML in this area and also
    implements the extra XML for MVE. (This will only go into QEMU
    once the GDB patches have landed and the XML format is nailed
    down.)

-- PMM

4 years, 10 months

Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses

by Maxim Kuvyrkov

Hi Stanislav, FYI, your patch seems to be slowing down two of SPEC CPU2006 tests on 32-bit ARM at -O2 and -O3 optimization levels. -- Maxim Kuvyrkov https://www.linaro.org > On 15 Sep 2021, at 12:54, ci_notify(a)linaro.org wrote: > > After llvm commit 92c1fd19abb15bc68b1127a26137a69e033cdb39 > Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com> > > Allow rematerialization of virtual reg uses > > the following benchmarks slowed down by more than 2%: > - 456.hmmer slowed down by 6% > - 482.sphinx3 slowed down by 3% > > Benchmark: > Toolchain: Clang + Glibc + LLVM Linker > Version: all components were built from their tip of trunk > Target: arm-linux-gnueabihf > Compiler flags: -O3 -marm > Hardware: NVidia TK1 4x Cortex-A15 > > This commit has regressed these CI configurations: > - tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O2 > - tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O3 > > First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… > Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… > Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… > Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… > > Reproduce builds: > <cut> > mkdir investigate-llvm-92c1fd19abb15bc68b1127a26137a69e033cdb39 > cd investigate-llvm-92c1fd19abb15bc68b1127a26137a69e033cdb39 > > # Fetch scripts > git clone https://git.linaro.org/toolchain/jenkins-scripts > > # Fetch manifests and test.sh script > mkdir -p artifacts/manifests > curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail > curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail > curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail > chmod +x 
artifacts/test.sh > > # Reproduce the baseline build (build all pre-requisites) > ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh > > # Save baseline build state (which is then restored in artifacts/test.sh) > mkdir -p ./bisect > rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ > > cd llvm > > # Reproduce first_bad build > git checkout --detach 92c1fd19abb15bc68b1127a26137a69e033cdb39 > ../artifacts/test.sh > > # Reproduce last_good build > git checkout --detach 1d02a8bcd393ea9c50f0212797059888efc78002 > ../artifacts/test.sh > > cd .. > </cut> > > Full commit (up to 1000 lines): > <cut> > commit 92c1fd19abb15bc68b1127a26137a69e033cdb39 > Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com> > Date: Thu Aug 19 11:42:09 2021 -0700 > > Allow rematerialization of virtual reg uses > > Currently isReallyTriviallyReMaterializableGeneric() implementation > prevents rematerialization on any virtual register use on the grounds > that is not a trivial rematerialization and that we do not want to > extend liveranges. > > It appears that LRE logic does not attempt to extend a liverange of > a source register for rematerialization so that is not an issue. > That is checked in the LiveRangeEdit::allUsesAvailableAt(). > > The only non-trivial aspect of it is accounting for tied-defs which > normally represent a read-modify-write operation and not rematerializable. > > The test for a tied-def situation already exists in the > /CodeGen/AMDGPU/remat-vop.mir, > test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve. > > The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets > where I more or less understand the asm it seems to reduce spilling > (as expected) or be neutral. However, it needs a review by all targets' > specialists. 
> > Differential Revision: https://reviews.llvm.org/D106408 > --- > llvm/include/llvm/CodeGen/TargetInstrInfo.h | 12 +- > llvm/lib/CodeGen/TargetInstrInfo.cpp | 9 +- > llvm/test/CodeGen/AMDGPU/remat-sop.mir | 60 + > llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll | 28 +- > llvm/test/CodeGen/ARM/funnel-shift-rot.ll | 32 +- > llvm/test/CodeGen/ARM/funnel-shift.ll | 30 +- > .../test/CodeGen/ARM/illegal-bitfield-loadstore.ll | 30 +- > llvm/test/CodeGen/ARM/neon-copy.ll | 10 +- > llvm/test/CodeGen/Mips/llvm-ir/ashr.ll | 227 +- > llvm/test/CodeGen/Mips/llvm-ir/lshr.ll | 206 +- > llvm/test/CodeGen/Mips/llvm-ir/shl.ll | 95 +- > llvm/test/CodeGen/Mips/llvm-ir/sub.ll | 31 +- > llvm/test/CodeGen/Mips/tls.ll | 4 +- > llvm/test/CodeGen/RISCV/atomic-rmw.ll | 120 +- > llvm/test/CodeGen/RISCV/atomic-signext.ll | 24 +- > llvm/test/CodeGen/RISCV/bswap-ctlz-cttz-ctpop.ll | 96 +- > llvm/test/CodeGen/RISCV/rv32i-rv64i-half.ll | 12 +- > llvm/test/CodeGen/RISCV/rv32zbb-zbp.ll | 526 +-- > llvm/test/CodeGen/RISCV/rv32zbb.ll | 94 +- > llvm/test/CodeGen/RISCV/rv32zbp.ll | 282 +- > llvm/test/CodeGen/RISCV/rv32zbt.ll | 348 +- > .../CodeGen/RISCV/rvv/fixed-vectors-bitreverse.ll | 324 +- > llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bswap.ll | 146 +- > llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctlz.ll | 3540 ++++++++++---------- > llvm/test/CodeGen/RISCV/rvv/fixed-vectors-cttz.ll | 720 ++-- > llvm/test/CodeGen/RISCV/srem-vector-lkk.ll | 208 +- > llvm/test/CodeGen/RISCV/urem-vector-lkk.ll | 190 +- > llvm/test/CodeGen/Thumb/dyn-stackalloc.ll | 7 +- > .../tail-pred-disabled-in-loloops.ll | 14 +- > .../LowOverheadLoops/varying-outer-2d-reduction.ll | 64 +- > .../CodeGen/Thumb2/LowOverheadLoops/while-loops.ll | 67 +- > llvm/test/CodeGen/Thumb2/ldr-str-imm12.ll | 30 +- > llvm/test/CodeGen/Thumb2/mve-float16regloops.ll | 82 +- > llvm/test/CodeGen/Thumb2/mve-float32regloops.ll | 98 +- > llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll | 529 ++- > llvm/test/CodeGen/X86/addcarry.ll | 20 +- > 
llvm/test/CodeGen/X86/callbr-asm-blockplacement.ll | 12 +- > llvm/test/CodeGen/X86/dag-update-nodetomatch.ll | 17 +- > llvm/test/CodeGen/X86/inalloca-invoke.ll | 2 +- > llvm/test/CodeGen/X86/licm-regpressure.ll | 28 +- > llvm/test/CodeGen/X86/ragreedy-hoist-spill.ll | 40 +- > llvm/test/CodeGen/X86/sdiv_fix.ll | 5 +- > 42 files changed, 4217 insertions(+), 4202 deletions(-) > > diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h > index 2f853a2c6f9f..1c05afba730d 100644 > --- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h > +++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h > @@ -117,10 +117,11 @@ public: > const MachineFunction &MF) const; > > /// Return true if the instruction is trivially rematerializable, meaning it > - /// has no side effects and requires no operands that aren't always available. > - /// This means the only allowed uses are constants and unallocatable physical > - /// registers so that the instructions result is independent of the place > - /// in the function. > + /// has no side effects. Uses of constants and unallocatable physical > + /// registers are always trivial to rematerialize so that the instructions > + /// result is independent of the place in the function. Uses of virtual > + /// registers are allowed but it is caller's responsility to ensure these > + /// operands are valid at the point the instruction is beeing moved. > bool isTriviallyReMaterializable(const MachineInstr &MI, > AAResults *AA = nullptr) const { > return MI.getOpcode() == TargetOpcode::IMPLICIT_DEF || > @@ -140,8 +141,7 @@ protected: > /// set, this hook lets the target specify whether the instruction is actually > /// trivially rematerializable, taking into consideration its operands. This > /// predicate must return false if the instruction has any side effects other > - /// than producing a value, or if it requres any address registers that are > - /// not always available. > + /// than producing a value. 
> /// Requirements must be check as stated in isTriviallyReMaterializable() . > virtual bool isReallyTriviallyReMaterializable(const MachineInstr &MI, > AAResults *AA) const { > diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp > index 1eab8e7443a7..fe7d60e0b7e2 100644 > --- a/llvm/lib/CodeGen/TargetInstrInfo.cpp > +++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp > @@ -921,7 +921,8 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric( > const MachineRegisterInfo &MRI = MF.getRegInfo(); > > // Remat clients assume operand 0 is the defined register. > - if (!MI.getNumOperands() || !MI.getOperand(0).isReg()) > + if (!MI.getNumOperands() || !MI.getOperand(0).isReg() || > + MI.getOperand(0).isTied()) > return false; > Register DefReg = MI.getOperand(0).getReg(); > > @@ -983,12 +984,6 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric( > // same virtual register, though. > if (MO.isDef() && Reg != DefReg) > return false; > - > - // Don't allow any virtual-register uses. Rematting an instruction with > - // virtual register uses would length the live ranges of the uses, which > - // is not necessarily a good idea, certainly not "trivial". > - if (MO.isUse()) > - return false; > } > > // Everything checked out. > diff --git a/llvm/test/CodeGen/AMDGPU/remat-sop.mir b/llvm/test/CodeGen/AMDGPU/remat-sop.mir > index ed799bfca028..c9915aaabfde 100644 > --- a/llvm/test/CodeGen/AMDGPU/remat-sop.mir > +++ b/llvm/test/CodeGen/AMDGPU/remat-sop.mir > @@ -51,6 +51,66 @@ body: | > S_NOP 0, implicit %2 > S_ENDPGM 0 > ... > +# The liverange of %0 covers a point of rematerialization, source value is > +# availabe. 
> +--- > +name: test_remat_s_mov_b32_vreg_src_long_lr > +tracksRegLiveness: true > +machineFunctionInfo: > + stackPtrOffsetReg: $sgpr32 > +body: | > + bb.0: > + ; GCN-LABEL: name: test_remat_s_mov_b32_vreg_src_long_lr > + ; GCN: renamable $sgpr0 = IMPLICIT_DEF > + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 > + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 > + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 > + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 > + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 > + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 > + ; GCN: S_NOP 0, implicit killed renamable $sgpr0 > + ; GCN: S_ENDPGM 0 > + %0:sreg_32 = IMPLICIT_DEF > + %1:sreg_32 = S_MOV_B32 %0:sreg_32 > + %2:sreg_32 = S_MOV_B32 %0:sreg_32 > + %3:sreg_32 = S_MOV_B32 %0:sreg_32 > + S_NOP 0, implicit %1 > + S_NOP 0, implicit %2 > + S_NOP 0, implicit %3 > + S_NOP 0, implicit %0 > + S_ENDPGM 0 > +... > +# The liverange of %0 does not cover a point of rematerialization, source value is > +# unavailabe and we do not want to artificially extend the liverange. 
> +--- > +name: test_no_remat_s_mov_b32_vreg_src_short_lr > +tracksRegLiveness: true > +machineFunctionInfo: > + stackPtrOffsetReg: $sgpr32 > +body: | > + bb.0: > + ; GCN-LABEL: name: test_no_remat_s_mov_b32_vreg_src_short_lr > + ; GCN: renamable $sgpr0 = IMPLICIT_DEF > + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 > + ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.1, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.1, addrspace 5) > + ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 > + ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.0, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.0, addrspace 5) > + ; GCN: renamable $sgpr0 = S_MOV_B32 killed renamable $sgpr0 > + ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.1, addrspace 5) > + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 > + ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.0, addrspace 5) > + ; GCN: S_NOP 0, implicit killed renamable $sgpr1 > + ; GCN: S_NOP 0, implicit killed renamable $sgpr0 > + ; GCN: S_ENDPGM 0 > + %0:sreg_32 = IMPLICIT_DEF > + %1:sreg_32 = S_MOV_B32 %0:sreg_32 > + %2:sreg_32 = S_MOV_B32 %0:sreg_32 > + %3:sreg_32 = S_MOV_B32 %0:sreg_32 > + S_NOP 0, implicit %1 > + S_NOP 0, implicit %2 > + S_NOP 0, implicit %3 > + S_ENDPGM 0 > +... 
> --- > name: test_remat_s_mov_b64 > tracksRegLiveness: true > diff --git a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll > index a4243276c70a..175a2069a441 100644 > --- a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll > +++ b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll > @@ -29,20 +29,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon > ; ENABLE-NEXT: pophs {r11, pc} > ; ENABLE-NEXT: .LBB0_3: @ %while.body.preheader > ; ENABLE-NEXT: movw r12, :lower16:skip > -; ENABLE-NEXT: sub r1, r1, #1 > +; ENABLE-NEXT: sub r3, r1, #1 > ; ENABLE-NEXT: movt r12, :upper16:skip > ; ENABLE-NEXT: .LBB0_4: @ %while.body > ; ENABLE-NEXT: @ =>This Inner Loop Header: Depth=1 > -; ENABLE-NEXT: ldrb r3, [r0] > -; ENABLE-NEXT: ldrb r3, [r12, r3] > -; ENABLE-NEXT: add r0, r0, r3 > -; ENABLE-NEXT: sub r3, r1, #1 > -; ENABLE-NEXT: cmp r3, r1 > +; ENABLE-NEXT: ldrb r1, [r0] > +; ENABLE-NEXT: ldrb r1, [r12, r1] > +; ENABLE-NEXT: add r0, r0, r1 > +; ENABLE-NEXT: sub r1, r3, #1 > +; ENABLE-NEXT: cmp r1, r3 > ; ENABLE-NEXT: bhs .LBB0_6 > ; ENABLE-NEXT: @ %bb.5: @ %while.body > ; ENABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 > ; ENABLE-NEXT: cmp r0, r2 > -; ENABLE-NEXT: mov r1, r3 > +; ENABLE-NEXT: mov r3, r1 > ; ENABLE-NEXT: blo .LBB0_4 > ; ENABLE-NEXT: .LBB0_6: @ %if.end29 > ; ENABLE-NEXT: pop {r11, pc} > @@ -119,20 +119,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon > ; DISABLE-NEXT: pophs {r11, pc} > ; DISABLE-NEXT: .LBB0_3: @ %while.body.preheader > ; DISABLE-NEXT: movw r12, :lower16:skip > -; DISABLE-NEXT: sub r1, r1, #1 > +; DISABLE-NEXT: sub r3, r1, #1 > ; DISABLE-NEXT: movt r12, :upper16:skip > ; DISABLE-NEXT: .LBB0_4: @ %while.body > ; DISABLE-NEXT: @ =>This Inner Loop Header: Depth=1 > -; DISABLE-NEXT: ldrb r3, [r0] > -; DISABLE-NEXT: ldrb r3, [r12, r3] > -; DISABLE-NEXT: add r0, r0, r3 > -; DISABLE-NEXT: sub r3, r1, #1 > -; DISABLE-NEXT: cmp r3, 
r1 > +; DISABLE-NEXT: ldrb r1, [r0] > +; DISABLE-NEXT: ldrb r1, [r12, r1] > +; DISABLE-NEXT: add r0, r0, r1 > +; DISABLE-NEXT: sub r1, r3, #1 > +; DISABLE-NEXT: cmp r1, r3 > ; DISABLE-NEXT: bhs .LBB0_6 > ; DISABLE-NEXT: @ %bb.5: @ %while.body > ; DISABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 > ; DISABLE-NEXT: cmp r0, r2 > -; DISABLE-NEXT: mov r1, r3 > +; DISABLE-NEXT: mov r3, r1 > ; DISABLE-NEXT: blo .LBB0_4 > ; DISABLE-NEXT: .LBB0_6: @ %if.end29 > ; DISABLE-NEXT: pop {r11, pc} > diff --git a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll > index 55157875d355..ea15fcc5c824 100644 > --- a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll > +++ b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll > @@ -73,13 +73,13 @@ define i64 @rotl_i64(i64 %x, i64 %z) { > ; SCALAR-NEXT: push {r4, r5, r11, lr} > ; SCALAR-NEXT: rsb r3, r2, #0 > ; SCALAR-NEXT: and r4, r2, #63 > -; SCALAR-NEXT: and lr, r3, #63 > -; SCALAR-NEXT: rsb r3, lr, #32 > +; SCALAR-NEXT: and r12, r3, #63 > +; SCALAR-NEXT: rsb r3, r12, #32 > ; SCALAR-NEXT: lsl r2, r0, r4 > -; SCALAR-NEXT: lsr r12, r0, lr > -; SCALAR-NEXT: orr r3, r12, r1, lsl r3 > -; SCALAR-NEXT: subs r12, lr, #32 > -; SCALAR-NEXT: lsrpl r3, r1, r12 > +; SCALAR-NEXT: lsr lr, r0, r12 > +; SCALAR-NEXT: orr r3, lr, r1, lsl r3 > +; SCALAR-NEXT: subs lr, r12, #32 > +; SCALAR-NEXT: lsrpl r3, r1, lr > ; SCALAR-NEXT: subs r5, r4, #32 > ; SCALAR-NEXT: movwpl r2, #0 > ; SCALAR-NEXT: cmp r5, #0 > @@ -88,8 +88,8 @@ define i64 @rotl_i64(i64 %x, i64 %z) { > ; SCALAR-NEXT: lsr r3, r0, r3 > ; SCALAR-NEXT: orr r3, r3, r1, lsl r4 > ; SCALAR-NEXT: lslpl r3, r0, r5 > -; SCALAR-NEXT: lsr r0, r1, lr > -; SCALAR-NEXT: cmp r12, #0 > +; SCALAR-NEXT: lsr r0, r1, r12 > +; SCALAR-NEXT: cmp lr, #0 > ; SCALAR-NEXT: movwpl r0, #0 > ; SCALAR-NEXT: orr r1, r3, r0 > ; SCALAR-NEXT: mov r0, r2 > @@ -245,15 +245,15 @@ define i64 @rotr_i64(i64 %x, i64 %z) { > ; CHECK: @ %bb.0: > ; CHECK-NEXT: .save {r4, r5, r11, lr} > ; CHECK-NEXT: push {r4, r5, r11, lr} > -; 
CHECK-NEXT: and lr, r2, #63 > +; CHECK-NEXT: and r12, r2, #63 > ; CHECK-NEXT: rsb r2, r2, #0 > -; CHECK-NEXT: rsb r3, lr, #32 > +; CHECK-NEXT: rsb r3, r12, #32 > ; CHECK-NEXT: and r4, r2, #63 > -; CHECK-NEXT: lsr r12, r0, lr > -; CHECK-NEXT: orr r3, r12, r1, lsl r3 > -; CHECK-NEXT: subs r12, lr, #32 > +; CHECK-NEXT: lsr lr, r0, r12 > +; CHECK-NEXT: orr r3, lr, r1, lsl r3 > +; CHECK-NEXT: subs lr, r12, #32 > ; CHECK-NEXT: lsl r2, r0, r4 > -; CHECK-NEXT: lsrpl r3, r1, r12 > +; CHECK-NEXT: lsrpl r3, r1, lr > ; CHECK-NEXT: subs r5, r4, #32 > ; CHECK-NEXT: movwpl r2, #0 > ; CHECK-NEXT: cmp r5, #0 > @@ -262,8 +262,8 @@ define i64 @rotr_i64(i64 %x, i64 %z) { > ; CHECK-NEXT: lsr r3, r0, r3 > ; CHECK-NEXT: orr r3, r3, r1, lsl r4 > ; CHECK-NEXT: lslpl r3, r0, r5 > -; CHECK-NEXT: lsr r0, r1, lr > -; CHECK-NEXT: cmp r12, #0 > +; CHECK-NEXT: lsr r0, r1, r12 > +; CHECK-NEXT: cmp lr, #0 > ; CHECK-NEXT: movwpl r0, #0 > ; CHECK-NEXT: orr r1, r0, r3 > ; CHECK-NEXT: mov r0, r2 > diff --git a/llvm/test/CodeGen/ARM/funnel-shift.ll b/llvm/test/CodeGen/ARM/funnel-shift.ll > index 54c93b493c98..6372f9be2ca3 100644 > --- a/llvm/test/CodeGen/ARM/funnel-shift.ll > +++ b/llvm/test/CodeGen/ARM/funnel-shift.ll > @@ -224,31 +224,31 @@ define i37 @fshr_i37(i37 %x, i37 %y, i37 %z) { > ; CHECK-NEXT: mov r3, #0 > ; CHECK-NEXT: bl __aeabi_uldivmod > ; CHECK-NEXT: add r0, r2, #27 > -; CHECK-NEXT: lsl r6, r6, #27 > -; CHECK-NEXT: and r1, r0, #63 > ; CHECK-NEXT: lsl r2, r7, #27 > +; CHECK-NEXT: and r12, r0, #63 > +; CHECK-NEXT: lsl r6, r6, #27 > ; CHECK-NEXT: orr r7, r6, r7, lsr #5 > +; CHECK-NEXT: rsb r3, r12, #32 > +; CHECK-NEXT: lsr r2, r2, r12 > ; CHECK-NEXT: mov r6, #63 > -; CHECK-NEXT: rsb r3, r1, #32 > -; CHECK-NEXT: lsr r2, r2, r1 > -; CHECK-NEXT: subs r12, r1, #32 > -; CHECK-NEXT: bic r6, r6, r0 > ; CHECK-NEXT: orr r2, r2, r7, lsl r3 > +; CHECK-NEXT: subs r3, r12, #32 > +; CHECK-NEXT: bic r6, r6, r0 > ; CHECK-NEXT: lsl r5, r9, #1 > -; CHECK-NEXT: lsrpl r2, r7, r12 > +; CHECK-NEXT: lsrpl r2, r7, 
r3 > +; CHECK-NEXT: subs r1, r6, #32 > ; CHECK-NEXT: lsl r0, r5, r6 > -; CHECK-NEXT: subs r4, r6, #32 > -; CHECK-NEXT: lsl r3, r8, #1 > +; CHECK-NEXT: lsl r4, r8, #1 > ; CHECK-NEXT: movwpl r0, #0 > -; CHECK-NEXT: orr r3, r3, r9, lsr #31 > +; CHECK-NEXT: orr r4, r4, r9, lsr #31 > ; CHECK-NEXT: orr r0, r0, r2 > ; CHECK-NEXT: rsb r2, r6, #32 > -; CHECK-NEXT: cmp r4, #0 > -; CHECK-NEXT: lsr r1, r7, r1 > +; CHECK-NEXT: cmp r1, #0 > ; CHECK-NEXT: lsr r2, r5, r2 > -; CHECK-NEXT: orr r2, r2, r3, lsl r6 > -; CHECK-NEXT: lslpl r2, r5, r4 > -; CHECK-NEXT: cmp r12, #0 > +; CHECK-NEXT: orr r2, r2, r4, lsl r6 > +; CHECK-NEXT: lslpl r2, r5, r1 > +; CHECK-NEXT: lsr r1, r7, r12 > +; CHECK-NEXT: cmp r3, #0 > ; CHECK-NEXT: movwpl r1, #0 > ; CHECK-NEXT: orr r1, r2, r1 > ; CHECK-NEXT: pop {r4, r5, r6, r7, r8, r9, r11, pc} > diff --git a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll > index 2922e0ed5423..0a0bb62b0a09 100644 > --- a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll > +++ b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll > @@ -91,17 +91,17 @@ define void @i56_or(i56* %a) { > ; BE-LABEL: i56_or: > ; BE: @ %bb.0: > ; BE-NEXT: mov r1, r0 > -; BE-NEXT: ldr r12, [r0] > ; BE-NEXT: ldrh r2, [r1, #4]! 
> ; BE-NEXT: ldrb r3, [r1, #2] > ; BE-NEXT: orr r2, r3, r2, lsl #8 > -; BE-NEXT: orr r2, r2, r12, lsl #24 > -; BE-NEXT: orr r2, r2, #384 > -; BE-NEXT: strb r2, [r1, #2] > -; BE-NEXT: lsr r3, r2, #8 > -; BE-NEXT: strh r3, [r1] > -; BE-NEXT: bic r1, r12, #255 > -; BE-NEXT: orr r1, r1, r2, lsr #24 > +; BE-NEXT: ldr r3, [r0] > +; BE-NEXT: orr r2, r2, r3, lsl #24 > +; BE-NEXT: orr r12, r2, #384 > +; BE-NEXT: strb r12, [r1, #2] > +; BE-NEXT: lsr r2, r12, #8 > +; BE-NEXT: strh r2, [r1] > +; BE-NEXT: bic r1, r3, #255 > +; BE-NEXT: orr r1, r1, r12, lsr #24 > ; BE-NEXT: str r1, [r0] > ; BE-NEXT: mov pc, lr > %aa = load i56, i56* %a > @@ -127,13 +127,13 @@ define void @i56_and_or(i56* %a) { > ; BE-NEXT: ldrb r3, [r1, #2] > ; BE-NEXT: strb r2, [r1, #2] > ; BE-NEXT: orr r2, r3, r12, lsl #8 > -; BE-NEXT: ldr r12, [r0] > -; BE-NEXT: orr r2, r2, r12, lsl #24 > -; BE-NEXT: orr r2, r2, #384 > -; BE-NEXT: lsr r3, r2, #8 > -; BE-NEXT: strh r3, [r1] > -; BE-NEXT: bic r1, r12, #255 > -; BE-NEXT: orr r1, r1, r2, lsr #24 > +; BE-NEXT: ldr r3, [r0] > +; BE-NEXT: orr r2, r2, r3, lsl #24 > +; BE-NEXT: orr r12, r2, #384 > +; BE-NEXT: lsr r2, r12, #8 > +; BE-NEXT: strh r2, [r1] > +; BE-NEXT: bic r1, r3, #255 > +; BE-NEXT: orr r1, r1, r12, lsr #24 > ; BE-NEXT: str r1, [r0] > ; BE-NEXT: mov pc, lr > > diff --git a/llvm/test/CodeGen/ARM/neon-copy.ll b/llvm/test/CodeGen/ARM/neon-copy.ll > index 09a991da2e59..46490efb6631 100644 > --- a/llvm/test/CodeGen/ARM/neon-copy.ll > +++ b/llvm/test/CodeGen/ARM/neon-copy.ll > @@ -1340,16 +1340,16 @@ define <4 x i16> @test_extracts_inserts_varidx_insert(<8 x i16> %x, i32 %idx) { > ; CHECK-NEXT: .pad #8 > ; CHECK-NEXT: sub sp, sp, #8 > ; CHECK-NEXT: vmov.u16 r1, d0[1] > -; CHECK-NEXT: and r0, r0, #3 > +; CHECK-NEXT: and r12, r0, #3 > ; CHECK-NEXT: vmov.u16 r2, d0[2] > -; CHECK-NEXT: mov r3, sp > -; CHECK-NEXT: vmov.u16 r12, d0[3] > -; CHECK-NEXT: orr r0, r3, r0, lsl #1 > +; CHECK-NEXT: mov r0, sp > +; CHECK-NEXT: vmov.u16 r3, d0[3] > +; CHECK-NEXT: orr r0, r0, 
r12, lsl #1 > ; CHECK-NEXT: vst1.16 {d0[0]}, [r0:16] > ; CHECK-NEXT: vldr d0, [sp] > ; CHECK-NEXT: vmov.16 d0[1], r1 > ; CHECK-NEXT: vmov.16 d0[2], r2 > -; CHECK-NEXT: vmov.16 d0[3], r12 > +; CHECK-NEXT: vmov.16 d0[3], r3 > ; CHECK-NEXT: add sp, sp, #8 > ; CHECK-NEXT: bx lr > %tmp = extractelement <8 x i16> %x, i32 0 > diff --git a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll > index 8be7100d368b..a125446b27c3 100644 > --- a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll > +++ b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll > @@ -766,79 +766,85 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { > ; MMR3-NEXT: .cfi_offset 17, -4 > ; MMR3-NEXT: .cfi_offset 16, -8 > ; MMR3-NEXT: move $8, $7 > -; MMR3-NEXT: sw $6, 32($sp) # 4-byte Folded Spill > -; MMR3-NEXT: sw $5, 36($sp) # 4-byte Folded Spill > -; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill > +; MMR3-NEXT: move $2, $6 > +; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill > +; MMR3-NEXT: sw $4, 12($sp) # 4-byte Folded Spill > ; MMR3-NEXT: lw $16, 76($sp) > -; MMR3-NEXT: srlv $4, $7, $16 > -; MMR3-NEXT: not16 $3, $16 > -; MMR3-NEXT: sw $3, 24($sp) # 4-byte Folded Spill > -; MMR3-NEXT: sll16 $2, $6, 1 > -; MMR3-NEXT: sllv $3, $2, $3 > -; MMR3-NEXT: li16 $2, 64 > -; MMR3-NEXT: or16 $3, $4 > -; MMR3-NEXT: srlv $6, $6, $16 > -; MMR3-NEXT: sw $6, 12($sp) # 4-byte Folded Spill > -; MMR3-NEXT: subu16 $7, $2, $16 > +; MMR3-NEXT: srlv $3, $7, $16 > +; MMR3-NEXT: not16 $6, $16 > +; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill > +; MMR3-NEXT: move $4, $2 > +; MMR3-NEXT: sw $2, 32($sp) # 4-byte Folded Spill > +; MMR3-NEXT: sll16 $2, $2, 1 > +; MMR3-NEXT: sllv $2, $2, $6 > +; MMR3-NEXT: li16 $6, 64 > +; MMR3-NEXT: or16 $2, $3 > +; MMR3-NEXT: srlv $4, $4, $16 > +; MMR3-NEXT: sw $4, 16($sp) # 4-byte Folded Spill > +; MMR3-NEXT: subu16 $7, $6, $16 > ; MMR3-NEXT: sllv $9, $5, $7 > -; MMR3-NEXT: andi16 $2, $7, 32 > -; MMR3-NEXT: sw $2, 28($sp) # 4-byte Folded Spill > -; MMR3-NEXT: andi16 $5, 
$16, 32 > -; MMR3-NEXT: sw $5, 16($sp) # 4-byte Folded Spill > -; MMR3-NEXT: move $4, $9 > +; MMR3-NEXT: andi16 $5, $7, 32 > +; MMR3-NEXT: sw $5, 28($sp) # 4-byte Folded Spill > +; MMR3-NEXT: andi16 $6, $16, 32 > +; MMR3-NEXT: sw $6, 36($sp) # 4-byte Folded Spill > +; MMR3-NEXT: move $3, $9 > ; MMR3-NEXT: li16 $17, 0 > -; MMR3-NEXT: movn $4, $17, $2 > -; MMR3-NEXT: movn $3, $6, $5 > -; MMR3-NEXT: addiu $2, $16, -64 > -; MMR3-NEXT: lw $5, 36($sp) # 4-byte Folded Reload > -; MMR3-NEXT: srlv $5, $5, $2 > -; MMR3-NEXT: sw $5, 20($sp) # 4-byte Folded Spill > -; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload > -; MMR3-NEXT: sll16 $6, $17, 1 > -; MMR3-NEXT: sw $6, 4($sp) # 4-byte Folded Spill > -; MMR3-NEXT: not16 $5, $2 > -; MMR3-NEXT: sllv $5, $6, $5 > -; MMR3-NEXT: or16 $3, $4 > -; MMR3-NEXT: lw $4, 20($sp) # 4-byte Folded Reload > -; MMR3-NEXT: or16 $5, $4 > -; MMR3-NEXT: srav $1, $17, $2 > -; MMR3-NEXT: andi16 $2, $2, 32 > -; MMR3-NEXT: sw $2, 20($sp) # 4-byte Folded Spill > -; MMR3-NEXT: movn $5, $1, $2 > -; MMR3-NEXT: sllv $2, $17, $7 > -; MMR3-NEXT: not16 $4, $7 > -; MMR3-NEXT: lw $7, 36($sp) # 4-byte Folded Reload > -; MMR3-NEXT: srl16 $6, $7, 1 > -; MMR3-NEXT: srlv $6, $6, $4 > +; MMR3-NEXT: movn $3, $17, $5 > +; MMR3-NEXT: movn $2, $4, $6 > +; MMR3-NEXT: addiu $4, $16, -64 > +; MMR3-NEXT: lw $17, 0($sp) # 4-byte Folded Reload > +; MMR3-NEXT: srlv $4, $17, $4 > +; MMR3-NEXT: sw $4, 20($sp) # 4-byte Folded Spill > +; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload > +; MMR3-NEXT: sll16 $4, $6, 1 > +; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill > +; MMR3-NEXT: addiu $5, $16, -64 > +; MMR3-NEXT: not16 $5, $5 > +; MMR3-NEXT: sllv $5, $4, $5 > +; MMR3-NEXT: or16 $2, $3 > +; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload > +; MMR3-NEXT: or16 $5, $3 > +; MMR3-NEXT: addiu $3, $16, -64 > +; MMR3-NEXT: srav $1, $6, $3 > +; MMR3-NEXT: andi16 $3, $3, 32 > +; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill > +; MMR3-NEXT: movn $5, $1, $3 > +; MMR3-NEXT: sllv $3, $6, 
$7 > +; MMR3-NEXT: sw $3, 4($sp) # 4-byte Folded Spill > +; MMR3-NEXT: not16 $3, $7 > +; MMR3-NEXT: srl16 $4, $17, 1 > +; MMR3-NEXT: srlv $3, $4, $3 > ; MMR3-NEXT: sltiu $10, $16, 64 > -; MMR3-NEXT: movn $5, $3, $10 > -; MMR3-NEXT: or16 $6, $2 > -; MMR3-NEXT: srlv $2, $7, $16 > -; MMR3-NEXT: lw $3, 24($sp) # 4-byte Folded Reload > -; MMR3-NEXT: lw $4, 4($sp) # 4-byte Folded Reload > -; MMR3-NEXT: sllv $3, $4, $3 > +; MMR3-NEXT: movn $5, $2, $10 > +; MMR3-NEXT: lw $2, 4($sp) # 4-byte Folded Reload > ; MMR3-NEXT: or16 $3, $2 > -; MMR3-NEXT: srav $11, $17, $16 > -; MMR3-NEXT: lw $4, 16($sp) # 4-byte Folded Reload > -; MMR3-NEXT: movn $3, $11, $4 > -; MMR3-NEXT: sra $2, $17, 31 > +; MMR3-NEXT: srlv $2, $17, $16 > +; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload > +; MMR3-NEXT: lw $7, 8($sp) # 4-byte Folded Reload > +; MMR3-NEXT: sllv $17, $7, $4 > +; MMR3-NEXT: or16 $17, $2 > +; MMR3-NEXT: srav $11, $6, $16 > +; MMR3-NEXT: lw $2, 36($sp) # 4-byte Folded Reload > +; MMR3-NEXT: movn $17, $11, $2 > +; MMR3-NEXT: sra $2, $6, 31 > ; MMR3-NEXT: movz $5, $8, $16 > -; MMR3-NEXT: move $8, $2 > -; MMR3-NEXT: movn $8, $3, $10 > -; MMR3-NEXT: lw $3, 28($sp) # 4-byte Folded Reload > -; MMR3-NEXT: movn $6, $9, $3 > -; MMR3-NEXT: li16 $3, 0 > -; MMR3-NEXT: lw $7, 12($sp) # 4-byte Folded Reload > -; MMR3-NEXT: movn $7, $3, $4 > -; MMR3-NEXT: or16 $7, $6 > +; MMR3-NEXT: move $4, $2 > +; MMR3-NEXT: movn $4, $17, $10 > +; MMR3-NEXT: lw $6, 28($sp) # 4-byte Folded Reload > +; MMR3-NEXT: movn $3, $9, $6 > +; MMR3-NEXT: lw $6, 36($sp) # 4-byte Folded Reload > +; MMR3-NEXT: li16 $17, 0 > +; MMR3-NEXT: lw $7, 16($sp) # 4-byte Folded Reload > +; MMR3-NEXT: movn $7, $17, $6 > +; MMR3-NEXT: or16 $7, $3 > ; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload > ; MMR3-NEXT: movn $1, $2, $3 > ; MMR3-NEXT: movn $1, $7, $10 > ; MMR3-NEXT: lw $3, 32($sp) # 4-byte Folded Reload > ; MMR3-NEXT: movz $1, $3, $16 > -; MMR3-NEXT: movn $11, $2, $4 > +; MMR3-NEXT: movn $11, $2, $6 > ; MMR3-NEXT: movn $2, $11, 
$10 > -; MMR3-NEXT: move $3, $8 > +; MMR3-NEXT: move $3, $4 > ; MMR3-NEXT: move $4, $1 > ; MMR3-NEXT: lwp $16, 40($sp) > ; MMR3-NEXT: addiusp 48 > @@ -852,79 +858,80 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { > ; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill > ; MMR6-NEXT: .cfi_offset 17, -4 > ; MMR6-NEXT: .cfi_offset 16, -8 > -; MMR6-NEXT: move $1, $7 > +; MMR6-NEXT: move $12, $7 > ; MMR6-NEXT: lw $3, 44($sp) > ; MMR6-NEXT: li16 $2, 64 > -; MMR6-NEXT: subu16 $7, $2, $3 > -; MMR6-NEXT: sllv $8, $5, $7 > -; MMR6-NEXT: andi16 $2, $7, 32 > -; MMR6-NEXT: selnez $9, $8, $2 > -; MMR6-NEXT: sllv $10, $4, $7 > -; MMR6-NEXT: not16 $7, $7 > -; MMR6-NEXT: srl16 $16, $5, 1 > -; MMR6-NEXT: srlv $7, $16, $7 > -; MMR6-NEXT: or $7, $10, $7 > -; MMR6-NEXT: seleqz $7, $7, $2 > -; MMR6-NEXT: or $7, $9, $7 > -; MMR6-NEXT: srlv $9, $1, $3 > -; MMR6-NEXT: not16 $16, $3 > -; MMR6-NEXT: sw $16, 4($sp) # 4-byte Folded Spill > +; MMR6-NEXT: subu16 $16, $2, $3 > +; MMR6-NEXT: sllv $1, $5, $16 > +; MMR6-NEXT: andi16 $2, $16, 32 > +; MMR6-NEXT: selnez $8, $1, $2 > +; MMR6-NEXT: sllv $9, $4, $16 > +; MMR6-NEXT: not16 $16, $16 > +; MMR6-NEXT: srl16 $17, $5, 1 > +; MMR6-NEXT: srlv $10, $17, $16 > +; MMR6-NEXT: or $9, $9, $10 > +; MMR6-NEXT: seleqz $9, $9, $2 > +; MMR6-NEXT: or $8, $8, $9 > +; MMR6-NEXT: srlv $9, $7, $3 > +; MMR6-NEXT: not16 $7, $3 > +; MMR6-NEXT: sw $7, 4($sp) # 4-byte Folded Spill > ; MMR6-NEXT: sll16 $17, $6, 1 > -; MMR6-NEXT: sllv $10, $17, $16 > +; MMR6-NEXT: sllv $10, $17, $7 > ; MMR6-NEXT: or $9, $10, $9 > ; MMR6-NEXT: andi16 $17, $3, 32 > ; MMR6-NEXT: seleqz $9, $9, $17 > ; MMR6-NEXT: srlv $10, $6, $3 > ; MMR6-NEXT: selnez $11, $10, $17 > ; MMR6-NEXT: seleqz $10, $10, $17 > -; MMR6-NEXT: or $10, $10, $7 > -; MMR6-NEXT: seleqz $12, $8, $2 > -; MMR6-NEXT: or $8, $11, $9 > +; MMR6-NEXT: or $8, $10, $8 > +; MMR6-NEXT: seleqz $1, $1, $2 > +; MMR6-NEXT: or $9, $11, $9 > ; MMR6-NEXT: addiu $2, $3, -64 > -; MMR6-NEXT: srlv $9, $5, $2 > +; MMR6-NEXT: 
srlv $10, $5, $2 > ; MMR6-NEXT: sll16 $7, $4, 1 > ; MMR6-NEXT: not16 $16, $2 > ; MMR6-NEXT: sllv $11, $7, $16 > ; MMR6-NEXT: sltiu $13, $3, 64 > -; MMR6-NEXT: or $8, $8, $12 > -; MMR6-NEXT: selnez $10, $10, $13 > -; MMR6-NEXT: or $9, $11, $9 > -; MMR6-NEXT: srav $11, $4, $2 > +; MMR6-NEXT: or $1, $9, $1 > +; MMR6-NEXT: selnez $8, $8, $13 > +; MMR6-NEXT: or $9, $11, $10 > +; MMR6-NEXT: srav $10, $4, $2 > ; MMR6-NEXT: andi16 $2, $2, 32 > -; MMR6-NEXT: seleqz $12, $11, $2 > +; MMR6-NEXT: seleqz $11, $10, $2 > ; MMR6-NEXT: sra $14, $4, 31 > ; MMR6-NEXT: selnez $15, $14, $2 > ; MMR6-NEXT: seleqz $9, $9, $2 > -; MMR6-NEXT: or $12, $15, $12 > -; MMR6-NEXT: seleqz $12, $12, $13 > -; MMR6-NEXT: selnez $2, $11, $2 > -; MMR6-NEXT: seleqz $11, $14, $13 > -; MMR6-NEXT: or $10, $10, $12 > -; MMR6-NEXT: selnez $10, $10, $3 > -; MMR6-NEXT: selnez $8, $8, $13 > +; MMR6-NEXT: or $11, $15, $11 > +; MMR6-NEXT: seleqz $11, $11, $13 > +; MMR6-NEXT: selnez $2, $10, $2 > +; MMR6-NEXT: seleqz $10, $14, $13 > +; MMR6-NEXT: or $8, $8, $11 > +; MMR6-NEXT: selnez $8, $8, $3 > +; MMR6-NEXT: selnez $1, $1, $13 > ; MMR6-NEXT: or $2, $2, $9 > ; MMR6-NEXT: srav $9, $4, $3 > ; MMR6-NEXT: seleqz $4, $9, $17 > -; MMR6-NEXT: selnez $12, $14, $17 > -; MMR6-NEXT: or $4, $12, $4 > -; MMR6-NEXT: selnez $12, $4, $13 > +; MMR6-NEXT: selnez $11, $14, $17 > +; MMR6-NEXT: or $4, $11, $4 > +; MMR6-NEXT: selnez $11, $4, $13 > ; MMR6-NEXT: seleqz $2, $2, $13 > ; MMR6-NEXT: seleqz $4, $6, $3 > -; MMR6-NEXT: seleqz $1, $1, $3 > -; MMR6-NEXT: or $2, $8, $2 > -; MMR6-NEXT: selnez $2, $2, $3 > +; MMR6-NEXT: seleqz $6, $12, $3 > ; MMR6-NEXT: or $1, $1, $2 > -; MMR6-NEXT: or $4, $4, $10 > -; MMR6-NEXT: or $2, $12, $11 > -; MMR6-NEXT: srlv $3, $5, $3 > -; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload > -; MMR6-NEXT: sllv $5, $7, $5 > -; MMR6-NEXT: or $3, $5, $3 > -; MMR6-NEXT: seleqz $3, $3, $17 > -; MMR6-NEXT: selnez $5, $9, $17 > -; MMR6-NEXT: or $3, $5, $3 > -; MMR6-NEXT: selnez $3, $3, $13 > -; MMR6-NEXT: or $3, $3, 
$11 > +; MMR6-NEXT: selnez $1, $1, $3 > +; MMR6-NEXT: or $1, $6, $1 > +; MMR6-NEXT: or $4, $4, $8 > +; MMR6-NEXT: or $6, $11, $10 > +; MMR6-NEXT: srlv $2, $5, $3 > +; MMR6-NEXT: lw $3, 4($sp) # 4-byte Folded Reload > +; MMR6-NEXT: sllv $3, $7, $3 > +; MMR6-NEXT: or $2, $3, $2 > +; MMR6-NEXT: seleqz $2, $2, $17 > +; MMR6-NEXT: selnez $3, $9, $17 > +; MMR6-NEXT: or $2, $3, $2 > +; MMR6-NEXT: selnez $2, $2, $13 > +; MMR6-NEXT: or $3, $2, $10 > +; MMR6-NEXT: move $2, $6 > ; MMR6-NEXT: move $5, $1 > ; MMR6-NEXT: lw $16, 8($sp) # 4-byte Folded Reload > ; MMR6-NEXT: lw $17, 12($sp) # 4-byte Folded Reload > diff --git a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll > index ed2bfc9fcf60..e4b4b3ae1d0f 100644 > --- a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll > +++ b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll > @@ -776,76 +776,77 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { > ; MMR3-NEXT: .cfi_offset 17, -4 > ; MMR3-NEXT: .cfi_offset 16, -8 > ; MMR3-NEXT: move $8, $7 > -; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill > +; MMR3-NEXT: sw $5, 4($sp) # 4-byte Folded Spill > ; MMR3-NEXT: sw $4, 28($sp) # 4-byte Folded Spill > ; MMR3-NEXT: lw $16, 68($sp) > ; MMR3-NEXT: li16 $2, 64 > -; MMR3-NEXT: subu16 $7, $2, $16 > -; MMR3-NEXT: sllv $9, $5, $7 > -; MMR3-NEXT: move $17, $5 > -; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill > -; MMR3-NEXT: andi16 $3, $7, 32 > +; MMR3-NEXT: subu16 $17, $2, $16 > +; MMR3-NEXT: sllv $9, $5, $17 > +; MMR3-NEXT: andi16 $3, $17, 32 > ; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill > ; MMR3-NEXT: li16 $2, 0 > ; MMR3-NEXT: move $4, $9 > ; MMR3-NEXT: movn $4, $2, $3 > -; MMR3-NEXT: srlv $5, $8, $16 > +; MMR3-NEXT: srlv $5, $7, $16 > ; MMR3-NEXT: not16 $3, $16 > ; MMR3-NEXT: sw $3, 16($sp) # 4-byte Folded Spill > ; MMR3-NEXT: sll16 $2, $6, 1 > +; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill > ; MMR3-NEXT: sllv $2, $2, $3 > ; MMR3-NEXT: or16 $2, $5 > -; MMR3-NEXT: srlv $5, $6, $16 > -; MMR3-NEXT: sw 
$5, 4($sp) # 4-byte Folded Spill > +; MMR3-NEXT: srlv $7, $6, $16 > ; MMR3-NEXT: andi16 $3, $16, 32 > ; MMR3-NEXT: sw $3, 12($sp) # 4-byte Folded Spill > -; MMR3-NEXT: movn $2, $5, $3 > +; MMR3-NEXT: movn $2, $7, $3 > ; MMR3-NEXT: addiu $3, $16, -64 > ; MMR3-NEXT: or16 $2, $4 > -; MMR3-NEXT: srlv $4, $17, $3 > -; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill > -; MMR3-NEXT: lw $4, 28($sp) # 4-byte Folded Reload > -; MMR3-NEXT: sll16 $6, $4, 1 > -; MMR3-NEXT: not16 $5, $3 > -; MMR3-NEXT: sllv $5, $6, $5 > -; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload > -; MMR3-NEXT: or16 $5, $17 > -; MMR3-NEXT: srlv $1, $4, $3 > -; MMR3-NEXT: andi16 $3, $3, 32 > +; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload > +; MMR3-NEXT: srlv $3, $6, $3 > ; MMR3-NEXT: sw $3, 8($sp) # 4-byte Folded Spill > -; MMR3-NEXT: movn $5, $1, $3 > +; MMR3-NEXT: lw $3, 28($sp) # 4-byte Folded Reload > +; MMR3-NEXT: sll16 $4, $3, 1 > +; MMR3-NEXT: sw $4, 0($sp) # 4-byte Folded Spill > +; MMR3-NEXT: addiu $5, $16, -64 > +; MMR3-NEXT: not16 $5, $5 > +; MMR3-NEXT: sllv $5, $4, $5 > +; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload > +; MMR3-NEXT: or16 $5, $4 > +; MMR3-NEXT: addiu $4, $16, -64 > +; MMR3-NEXT: srlv $1, $3, $4 > +; MMR3-NEXT: andi16 $4, $4, 32 > +; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill > +; MMR3-NEXT: movn $5, $1, $4 > ; MMR3-NEXT: sltiu $10, $16, 64 > ; MMR3-NEXT: movn $5, $2, $10 > -; MMR3-NEXT: sllv $2, $4, $7 > -; MMR3-NEXT: not16 $3, $7 > -; MMR3-NEXT: lw $7, 0($sp) # 4-byte Folded Reload > -; MMR3-NEXT: srl16 $4, $7, 1 > +; MMR3-NEXT: sllv $2, $3, $17 > +; MMR3-NEXT: not16 $3, $17 > +; MMR3-NEXT: srl16 $4, $6, 1 > ; MMR3-NEXT: srlv $4, $4, $3 > ; MMR3-NEXT: or16 $4, $2 > -; MMR3-NEXT: srlv $2, $7, $16 > +; MMR3-NEXT: srlv $2, $6, $16 > ; MMR3-NEXT: lw $3, 16($sp) # 4-byte Folded Reload > +; MMR3-NEXT: lw $6, 0($sp) # 4-byte Folded Reload > ; MMR3-NEXT: sllv $3, $6, $3 > ; MMR3-NEXT: or16 $3, $2 > ; MMR3-NEXT: lw $2, 28($sp) # 4-byte Folded Reload > ; MMR3-NEXT: srlv $2, 
$2, $16 > -; MMR3-NEXT: lw $17, 12($sp) # 4-byte Folded Reload > -; MMR3-NEXT: movn $3, $2, $17 > +; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload > +; MMR3-NEXT: movn $3, $2, $6 > ; MMR3-NEXT: movz $5, $8, $16 > -; MMR3-NEXT: li16 $6, 0 > -; MMR3-NEXT: movz $3, $6, $10 > -; MMR3-NEXT: lw $7, 20($sp) # 4-byte Folded Reload > -; MMR3-NEXT: movn $4, $9, $7 > -; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload > -; MMR3-NEXT: li16 $7, 0 > -; MMR3-NEXT: movn $6, $7, $17 > -; MMR3-NEXT: or16 $6, $4 > +; MMR3-NEXT: li16 $17, 0 > +; MMR3-NEXT: movz $3, $17, $10 > +; MMR3-NEXT: lw $17, 20($sp) # 4-byte Folded Reload > +; MMR3-NEXT: movn $4, $9, $17 > +; MMR3-NEXT: li16 $17, 0 > +; MMR3-NEXT: movn $7, $17, $6 > +; MMR3-NEXT: or16 $7, $4 > ; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload > -; MMR3-NEXT: movn $1, $7, $4 > -; MMR3-NEXT: li16 $7, 0 > -; MMR3-NEXT: movn $1, $6, $10 > +; MMR3-NEXT: movn $1, $17, $4 > +; MMR3-NEXT: li16 $17, 0 > +; MMR3-NEXT: movn $1, $7, $10 > ; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload > ; MMR3-NEXT: movz $1, $4, $16 > -; MMR3-NEXT: movn $2, $7, $17 > +; MMR3-NEXT: movn $2, $17, $6 > ; MMR3-NEXT: li16 $4, 0 > ; MMR3-NEXT: movz $2, $4, $10 > ; MMR3-NEXT: move $4, $1 > @@ -855,98 +856,91 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { > ; > ; MMR6-LABEL: lshr_i128: > ; MMR6: # %bb.0: # %entry > -; MMR6-NEXT: addiu $sp, $sp, -32 > -; MMR6-NEXT: .cfi_def_cfa_offset 32 > -; MMR6-NEXT: sw $17, 28($sp) # 4-byte Folded Spill > -; MMR6-NEXT: sw $16, 24($sp) # 4-byte Folded Spill > +; MMR6-NEXT: addiu $sp, $sp, -24 > +; MMR6-NEXT: .cfi_def_cfa_offset 24 > +; MMR6-NEXT: sw $17, 20($sp) # 4-byte Folded Spill > +; MMR6-NEXT: sw $16, 16($sp) # 4-byte Folded Spill > ; MMR6-NEXT: .cfi_offset 17, -4 > ; MMR6-NEXT: .cfi_offset 16, -8 > ; MMR6-NEXT: move $1, $7 > -; MMR6-NEXT: move $7, $5 > -; MMR6-NEXT: lw $3, 60($sp) > +; MMR6-NEXT: move $7, $4 > +; MMR6-NEXT: lw $3, 52($sp) > ; MMR6-NEXT: srlv $2, $1, $3 > -; MMR6-NEXT: not16 $5, 
$3 > -; MMR6-NEXT: sw $5, 12($sp) # 4-byte Folded Spill > -; MMR6-NEXT: move $17, $6 > -; MMR6-NEXT: sw $6, 16($sp) # 4-byte Folded Spill > +; MMR6-NEXT: not16 $16, $3 > +; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill > +; MMR6-NEXT: move $4, $6 > +; MMR6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill > ; MMR6-NEXT: sll16 $6, $6, 1 > -; MMR6-NEXT: sllv $6, $6, $5 > +; MMR6-NEXT: sllv $6, $6, $16 > ; MMR6-NEXT: or $8, $6, $2 > -; MMR6-NEXT: addiu $5, $3, -64 > -; MMR6-NEXT: srlv $9, $7, $5 > -; MMR6-NEXT: move $6, $4 > -; MMR6-NEXT: sll16 $2, $4, 1 > -; MMR6-NEXT: sw $2, 8($sp) # 4-byte Folded Spill > -; MMR6-NEXT: not16 $16, $5 > +; MMR6-NEXT: addiu $6, $3, -64 > +; MMR6-NEXT: srlv $9, $5, $6 > +; MMR6-NEXT: sll16 $2, $7, 1 > +; MMR6-NEXT: sw $2, 4($sp) # 4-byte Folded Spill > +; MMR6-NEXT: not16 $16, $6 > ; MMR6-NEXT: sllv $10, $2, $16 > ; MMR6-NEXT: andi16 $16, $3, 32 > ; MMR6-NEXT: seleqz $8, $8, $16 > ; MMR6-NEXT: or $9, $10, $9 > -; MMR6-NEXT: srlv $10, $17, $3 > +; MMR6-NEXT: srlv $10, $4, $3 > ; MMR6-NEXT: selnez $11, $10, $16 > ; MMR6-NEXT: li16 $17, 64 > ; MMR6-NEXT: subu16 $2, $17, $3 > -; MMR6-NEXT: sllv $12, $7, $2 > -; MMR6-NEXT: move $17, $7 > +; MMR6-NEXT: sllv $12, $5, $2 > ; MMR6-NEXT: andi16 $4, $2, 32 > -; MMR6-NEXT: andi16 $7, $5, 32 > -; MMR6-NEXT: sw $7, 20($sp) # 4-byte Folded Spill > -; MMR6-NEXT: seleqz $9, $9, $7 > +; MMR6-NEXT: andi16 $17, $6, 32 > +; MMR6-NEXT: seleqz $9, $9, $17 > ; MMR6-NEXT: seleqz $13, $12, $4 > ; MMR6-NEXT: or $8, $11, $8 > ; MMR6-NEXT: selnez $11, $12, $4 > -; MMR6-NEXT: sllv $12, $6, $2 > -; MMR6-NEXT: move $7, $6 > -; MMR6-NEXT: sw $6, 4($sp) # 4-byte Folded Spill > +; MMR6-NEXT: sllv $12, $7, $2 > ; MMR6-NEXT: not16 $2, $2 > -; MMR6-NEXT: srl16 $6, $17, 1 > +; MMR6-NEXT: srl16 $6, $5, 1 > ; MMR6-NEXT: srlv $2, $6, $2 > ; MMR6-NEXT: or $2, $12, $2 > ; MMR6-NEXT: seleqz $2, $2, $4 > -; MMR6-NEXT: srlv $4, $7, $5 > -; MMR6-NEXT: or $11, $11, $2 > -; MMR6-NEXT: or $5, $8, $13 > -; MMR6-NEXT: srlv $6, $17, $3 > -; 
MMR6-NEXT: lw $2, 20($sp) # 4-byte Folded Reload > -; MMR6-NEXT: selnez $7, $4, $2 > -; MMR6-NEXT: sltiu $8, $3, 64 > -; MMR6-NEXT: selnez $12, $5, $8 > -; MMR6-NEXT: or $7, $7, $9 > -; MMR6-NEXT: lw $5, 12($sp) # 4-byte Folded Reload > +; MMR6-NEXT: addiu $4, $3, -64 > +; MMR6-NEXT: srlv $4, $7, $4 > +; MMR6-NEXT: or $12, $11, $2 > +; MMR6-NEXT: or $6, $8, $13 > +; MMR6-NEXT: srlv $5, $5, $3 > +; MMR6-NEXT: selnez $8, $4, $17 > +; MMR6-NEXT: sltiu $11, $3, 64 > +; MMR6-NEXT: selnez $13, $6, $11 > +; MMR6-NEXT: or $8, $8, $9 > ; MMR6-NEXT: lw $2, 8($sp) # 4-byte Folded Reload > -; MMR6-NEXT: sllv $9, $2, $5 > +; MMR6-NEXT: lw $6, 4($sp) # 4-byte Folded Reload > +; MMR6-NEXT: sllv $9, $6, $2 > ; MMR6-NEXT: seleqz $10, $10, $16 > -; MMR6-NEXT: li16 $5, 0 > -; MMR6-NEXT: or $10, $10, $11 > -; MMR6-NEXT: or $6, $9, $6 > -; MMR6-NEXT: seleqz $2, $7, $8 > -; MMR6-NEXT: seleqz $7, $5, $8 > -; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload > -; MMR6-NEXT: srlv $9, $5, $3 > -; MMR6-NEXT: seleqz $11, $9, $16 > -; MMR6-NEXT: selnez $11, $11, $8 > +; MMR6-NEXT: li16 $2, 0 > +; MMR6-NEXT: or $10, $10, $12 > +; MMR6-NEXT: or $9, $9, $5 > +; MMR6-NEXT: seleqz $5, $8, $11 > +; MMR6-NEXT: seleqz $8, $2, $11 > +; MMR6-NEXT: srlv $7, $7, $3 > +; MMR6-NEXT: seleqz $2, $7, $16 > +; MMR6-NEXT: selnez $2, $2, $11 > ; MMR6-NEXT: seleqz $1, $1, $3 > -; MMR6-NEXT: or $2, $12, $2 > -; MMR6-NEXT: selnez $2, $2, $3 > -; MMR6-NEXT: or $5, $1, $2 > -; MMR6-NEXT: or $2, $7, $11 > -; MMR6-NEXT: seleqz $1, $6, $16 > -; MMR6-NEXT: selnez $6, $9, $16 > -; MMR6-NEXT: lw $16, 16($sp) # 4-byte Folded Reload > -; MMR6-NEXT: seleqz $9, $16, $3 > -; MMR6-NEXT: selnez $10, $10, $8 > -; MMR6-NEXT: lw $16, 20($sp) # 4-byte Folded Reload > -; MMR6-NEXT: seleqz $4, $4, $16 > -; MMR6-NEXT: seleqz $4, $4, $8 > -; MMR6-NEXT: or $4, $10, $4 > +; MMR6-NEXT: or $5, $13, $5 > +; MMR6-NEXT: selnez $5, $5, $3 > +; MMR6-NEXT: or $5, $1, $5 > +; MMR6-NEXT: or $2, $8, $2 > +; MMR6-NEXT: seleqz $1, $9, $16 > +; MMR6-NEXT: 
selnez $6, $7, $16 > +; MMR6-NEXT: lw $7, 12($sp) # 4-byte Folded Reload > +; MMR6-NEXT: seleqz $7, $7, $3 > +; MMR6-NEXT: selnez $9, $10, $11 > +; MMR6-NEXT: seleqz $4, $4, $17 > +; MMR6-NEXT: seleqz $4, $4, $11 > </cut>

4 years, 10 months

[TCWG CI] 482.sphinx3 slowed down by 4% after gcc: tree-optimization/65206 - dependence analysis on mixed pointer/array

by ci_notify＠linaro.org

After gcc commit f92901a508305f291fcf2acae0825379477724de
Author: Richard Biener <rguenther(a)suse.de>

    tree-optimization/65206 - dependence analysis on mixed pointer/array

the following benchmarks slowed down by more than 2%:
- 482.sphinx3 slowed down by 4% from 20816 to 21661 perf samples

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_tx1/gnu-master-aarch64-spec2k6-O3

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…

Reproduce builds:
<cut>
mkdir investigate-gcc-f92901a508305f291fcf2acae0825379477724de
cd investigate-gcc-f92901a508305f291fcf2acae0825379477724de

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach f92901a508305f291fcf2acae0825379477724de
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach abdf63d782cba82b5ecf264248518cbb065650ed
../artifacts/test.sh

cd ..
</cut> Full commit (up to 1000 lines): <cut> commit f92901a508305f291fcf2acae0825379477724de Author: Richard Biener <rguenther(a)suse.de> Date: Wed Sep 8 14:42:31 2021 +0200 tree-optimization/65206 - dependence analysis on mixed pointer/array This adds the capability to analyze the dependence of mixed pointer/array accesses. The example is from where using a masked load/store creates the pointer-based access when an otherwise unconditional access is array based. Other examples would include accesses to an array mixed with accesses from inlined helpers that work on pointers. The idea is quite simple and old - analyze the data-ref indices as if the reference was pointer-based. The following change does this by changing dr_analyze_indices to work on the indices sub-structure and storing an alternate indices substructure in each data reference. That alternate set of indices is analyzed lazily by initialize_data_dependence_relation when it fails to match-up the main set of indices of two data references. initialize_data_dependence_relation is refactored into a head and a tail worker and changed to work on one of the indices structures and thus away from using DR_* access macros which continue to reference the main indices substructure. There are quite some vectorization and loop distribution opportunities unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r, 510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and 544.nab_r see amendments in what they report with -fopt-info-loop while the rest of the specrate set sees no changes there. Measuring runtime for the set where changes were reported reveals nothing off-noise besides 511.povray_r which seems to regress slightly for me (on a Zen2 machine with -Ofast -march=native). 2021-09-08 Richard Biener <rguenther(a)suse.de> PR tree-optimization/65206 * tree-data-ref.h (struct data_reference): Add alt_indices, order it last. * tree-data-ref.c (free_data_ref): Release alt_indices. 
(dr_analyze_indices): Work on struct indices and get DR_REF as tree. (create_data_ref): Adjust. (initialize_data_dependence_relation): Split into head and tail. When the base objects fail to match up try again with pointer-based analysis of indices. * tree-vectorizer.c (vec_info_shared::check_datarefs): Do not compare the lazily computed alternate set of indices. * gcc.dg/torture/20210916.c: New testcase. * gcc.dg/vect/pr65206.c: Likewise. --- gcc/testsuite/gcc.dg/torture/20210916.c | 20 ++++ gcc/testsuite/gcc.dg/vect/pr65206.c | 22 ++++ gcc/tree-data-ref.c | 174 +++++++++++++++++++++----------- gcc/tree-data-ref.h | 9 +- gcc/tree-vectorizer.c | 3 +- 5 files changed, 168 insertions(+), 60 deletions(-) diff --git a/gcc/testsuite/gcc.dg/torture/20210916.c b/gcc/testsuite/gcc.dg/torture/20210916.c new file mode 100644 index 00000000000..0ea6d45e463 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/20210916.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ + +typedef union tree_node *tree; +struct tree_base { + unsigned : 1; + unsigned lang_flag_2 : 1; +}; +struct tree_type { + tree main_variant; +}; +union tree_node { + struct tree_base base; + struct tree_type type; +}; +tree finish_struct_t, finish_struct_x; +void finish_struct() +{ + for (; finish_struct_t->type.main_variant;) + finish_struct_x->base.lang_flag_2 = 0; +} diff --git a/gcc/testsuite/gcc.dg/vect/pr65206.c b/gcc/testsuite/gcc.dg/vect/pr65206.c new file mode 100644 index 00000000000..3b6262622c0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr65206.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_double } */ +/* { dg-additional-options "-fno-trapping-math -fno-allow-store-data-races" } */ +/* { dg-additional-options "-mavx" { target avx } } */ + +#define N 1024 + +double a[N], b[N]; + +void foo () +{ + for (int i = 0; i < N; ++i) + if (b[i] < 3.) 
+ a[i] += b[i]; +} + +/* We get a .MASK_STORE because while the load of a[i] does not trap + the store would introduce store data races. Make sure we still + can handle the data dependence with zero distance. */ + +/* { dg-final { scan-tree-dump-not "versioning for alias required" "vect" { target { vect_masked_store || avx } } } } */ +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target { vect_masked_store || avx } } } } */ diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c index e061baa7c20..18307a554fc 100644 --- a/gcc/tree-data-ref.c +++ b/gcc/tree-data-ref.c @@ -99,6 +99,7 @@ along with GCC; see the file COPYING3. If not see #include "internal-fn.h" #include "vr-values.h" #include "range-op.h" +#include "tree-ssa-loop-ivopts.h" static struct datadep_stats { @@ -1300,22 +1301,18 @@ base_supports_access_fn_components_p (tree base) DR, analyzed in LOOP and instantiated before NEST. */ static void -dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop) +dr_analyze_indices (struct indices *dri, tree ref, edge nest, loop_p loop) { - vec<tree> access_fns = vNULL; - tree ref, op; - tree base, off, access_fn; - /* If analyzing a basic-block there are no indices to analyze and thus no access functions. */ if (!nest) { - DR_BASE_OBJECT (dr) = DR_REF (dr); - DR_ACCESS_FNS (dr).create (0); + dri->base_object = ref; + dri->access_fns.create (0); return; } - ref = DR_REF (dr); + vec<tree> access_fns = vNULL; /* REALPART_EXPR and IMAGPART_EXPR can be handled like accesses into a two element array with a constant index. 
The base is @@ -1338,8 +1335,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop) { if (TREE_CODE (ref) == ARRAY_REF) { - op = TREE_OPERAND (ref, 1); - access_fn = analyze_scalar_evolution (loop, op); + tree op = TREE_OPERAND (ref, 1); + tree access_fn = analyze_scalar_evolution (loop, op); access_fn = instantiate_scev (nest, loop, access_fn); access_fns.safe_push (access_fn); } @@ -1370,16 +1367,16 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop) analyzed nest, add it as an additional independent access-function. */ if (TREE_CODE (ref) == MEM_REF) { - op = TREE_OPERAND (ref, 0); - access_fn = analyze_scalar_evolution (loop, op); + tree op = TREE_OPERAND (ref, 0); + tree access_fn = analyze_scalar_evolution (loop, op); access_fn = instantiate_scev (nest, loop, access_fn); if (TREE_CODE (access_fn) == POLYNOMIAL_CHREC) { - tree orig_type; tree memoff = TREE_OPERAND (ref, 1); - base = initial_condition (access_fn); - orig_type = TREE_TYPE (base); + tree base = initial_condition (access_fn); + tree orig_type = TREE_TYPE (base); STRIP_USELESS_TYPE_CONVERSION (base); + tree off; split_constant_offset (base, &base, &off); STRIP_USELESS_TYPE_CONVERSION (base); /* Fold the MEM_REF offset into the evolutions initial @@ -1424,7 +1421,7 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop) base, memoff); MR_DEPENDENCE_CLIQUE (ref) = MR_DEPENDENCE_CLIQUE (old); MR_DEPENDENCE_BASE (ref) = MR_DEPENDENCE_BASE (old); - DR_UNCONSTRAINED_BASE (dr) = true; + dri->unconstrained_base = true; access_fns.safe_push (access_fn); } } @@ -1436,8 +1433,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop) build_int_cst (reference_alias_ptr_type (ref), 0)); } - DR_BASE_OBJECT (dr) = ref; - DR_ACCESS_FNS (dr) = access_fns; + dri->base_object = ref; + dri->access_fns = access_fns; } /* Extracts the alias analysis information from the memory reference DR. 
*/ @@ -1463,6 +1460,8 @@ void free_data_ref (data_reference_p dr) { DR_ACCESS_FNS (dr).release (); + if (dr->alt_indices.base_object) + dr->alt_indices.access_fns.release (); free (dr); } @@ -1497,7 +1496,7 @@ create_data_ref (edge nest, loop_p loop, tree memref, gimple *stmt, dr_analyze_innermost (&DR_INNERMOST (dr), memref, nest != NULL ? loop : NULL, stmt); - dr_analyze_indices (dr, nest, loop); + dr_analyze_indices (&dr->indices, DR_REF (dr), nest, loop); dr_analyze_alias (dr); if (dump_file && (dump_flags & TDF_DETAILS)) @@ -3066,41 +3065,30 @@ access_fn_components_comparable_p (tree ref_a, tree ref_b) TREE_TYPE (TREE_OPERAND (ref_b, 0))); } -/* Initialize a data dependence relation between data accesses A and - B. NB_LOOPS is the number of loops surrounding the references: the - size of the classic distance/direction vectors. */ +/* Initialize a data dependence relation RES in LOOP_NEST. USE_ALT_INDICES + is true when the main indices of A and B were not comparable so we try again + with alternate indices computed on an indirect reference. 
*/ struct data_dependence_relation * -initialize_data_dependence_relation (struct data_reference *a, - struct data_reference *b, - vec<loop_p> loop_nest) +initialize_data_dependence_relation (struct data_dependence_relation *res, + vec<loop_p> loop_nest, + bool use_alt_indices) { - struct data_dependence_relation *res; + struct data_reference *a = DDR_A (res); + struct data_reference *b = DDR_B (res); unsigned int i; - res = XCNEW (struct data_dependence_relation); - DDR_A (res) = a; - DDR_B (res) = b; - DDR_LOOP_NEST (res).create (0); - DDR_SUBSCRIPTS (res).create (0); - DDR_DIR_VECTS (res).create (0); - DDR_DIST_VECTS (res).create (0); - - if (a == NULL || b == NULL) + struct indices *indices_a = &a->indices; + struct indices *indices_b = &b->indices; + if (use_alt_indices) { - DDR_ARE_DEPENDENT (res) = chrec_dont_know; - return res; + if (TREE_CODE (DR_REF (a)) != MEM_REF) + indices_a = &a->alt_indices; + if (TREE_CODE (DR_REF (b)) != MEM_REF) + indices_b = &b->alt_indices; } - - /* If the data references do not alias, then they are independent. */ - if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL)) - { - DDR_ARE_DEPENDENT (res) = chrec_known; - return res; - } - - unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a); - unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b); + unsigned int num_dimensions_a = indices_a->access_fns.length (); + unsigned int num_dimensions_b = indices_b->access_fns.length (); if (num_dimensions_a == 0 || num_dimensions_b == 0) { DDR_ARE_DEPENDENT (res) = chrec_dont_know; @@ -3125,9 +3113,9 @@ initialize_data_dependence_relation (struct data_reference *a, the a and b accesses have a single ARRAY_REF component reference [0] but have two subscripts. 
*/ - if (DR_UNCONSTRAINED_BASE (a)) + if (indices_a->unconstrained_base) num_dimensions_a -= 1; - if (DR_UNCONSTRAINED_BASE (b)) + if (indices_b->unconstrained_base) num_dimensions_b -= 1; /* These structures describe sequences of component references in @@ -3210,6 +3198,10 @@ initialize_data_dependence_relation (struct data_reference *a, B: [3, 4] (i.e. s.e) */ while (index_a < num_dimensions_a && index_b < num_dimensions_b) { + /* The alternate indices form always has a single dimension + with unconstrained base. */ + gcc_assert (!use_alt_indices); + /* REF_A and REF_B must be one of the component access types allowed by dr_analyze_indices. */ gcc_checking_assert (access_fn_component_p (ref_a)); @@ -3280,11 +3272,12 @@ initialize_data_dependence_relation (struct data_reference *a, /* See whether FULL_SEQ ends at the base and whether the two bases are equal. We do not care about TBAA or alignment info so we can use OEP_ADDRESS_OF to avoid false negatives. */ - tree base_a = DR_BASE_OBJECT (a); - tree base_b = DR_BASE_OBJECT (b); + tree base_a = indices_a->base_object; + tree base_b = indices_b->base_object; bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a && full_seq.start_b + full_seq.length == num_dimensions_b - && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) + && (indices_a->unconstrained_base + == indices_b->unconstrained_base) && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) && (types_compatible_p (TREE_TYPE (base_a), TREE_TYPE (base_b)) @@ -3323,7 +3316,7 @@ initialize_data_dependence_relation (struct data_reference *a, both lvalues are distinct from the object's declared type. */ if (same_base_p) { - if (DR_UNCONSTRAINED_BASE (a)) + if (indices_a->unconstrained_base) full_seq.length += 1; } else @@ -3332,8 +3325,41 @@ initialize_data_dependence_relation (struct data_reference *a, /* Punt if we didn't find a suitable sequence. 
*/ if (full_seq.length == 0) { - DDR_ARE_DEPENDENT (res) = chrec_dont_know; - return res; + if (use_alt_indices + || (TREE_CODE (DR_REF (a)) == MEM_REF + && TREE_CODE (DR_REF (b)) == MEM_REF) + || may_be_nonaddressable_p (DR_REF (a)) + || may_be_nonaddressable_p (DR_REF (b))) + { + /* Fully exhausted possibilities. */ + DDR_ARE_DEPENDENT (res) = chrec_dont_know; + return res; + } + + /* Try evaluating both DRs as dereferences of pointers. */ + if (!a->alt_indices.base_object + && TREE_CODE (DR_REF (a)) != MEM_REF) + { + tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (a)), + build1 (ADDR_EXPR, ptr_type_node, DR_REF (a)), + build_int_cst + (reference_alias_ptr_type (DR_REF (a)), 0)); + dr_analyze_indices (&a->alt_indices, alt_ref, + loop_preheader_edge (loop_nest[0]), + loop_containing_stmt (DR_STMT (a))); + } + if (!b->alt_indices.base_object + && TREE_CODE (DR_REF (b)) != MEM_REF) + { + tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (b)), + build1 (ADDR_EXPR, ptr_type_node, DR_REF (b)), + build_int_cst + (reference_alias_ptr_type (DR_REF (b)), 0)); + dr_analyze_indices (&b->alt_indices, alt_ref, + loop_preheader_edge (loop_nest[0]), + loop_containing_stmt (DR_STMT (b))); + } + return initialize_data_dependence_relation (res, loop_nest, true); } if (!same_base_p) @@ -3381,8 +3407,8 @@ initialize_data_dependence_relation (struct data_reference *a, struct subscript *subscript; subscript = XNEW (struct subscript); - SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i); - SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i); + SUB_ACCESS_FN (subscript, 0) = indices_a->access_fns[full_seq.start_a + i]; + SUB_ACCESS_FN (subscript, 1) = indices_b->access_fns[full_seq.start_b + i]; SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); SUB_LAST_CONFLICT (subscript) = chrec_dont_know; @@ -3393,6 +3419,40 @@ initialize_data_dependence_relation (struct data_reference *a, 
return res; } +/* Initialize a data dependence relation between data accesses A and + B. NB_LOOPS is the number of loops surrounding the references: the + size of the classic distance/direction vectors. */ + +struct data_dependence_relation * +initialize_data_dependence_relation (struct data_reference *a, + struct data_reference *b, + vec<loop_p> loop_nest) +{ + data_dependence_relation *res = XCNEW (struct data_dependence_relation); + DDR_A (res) = a; + DDR_B (res) = b; + DDR_LOOP_NEST (res).create (0); + DDR_SUBSCRIPTS (res).create (0); + DDR_DIR_VECTS (res).create (0); + DDR_DIST_VECTS (res).create (0); + + if (a == NULL || b == NULL) + { + DDR_ARE_DEPENDENT (res) = chrec_dont_know; + return res; + } + + /* If the data references do not alias, then they are independent. */ + if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL)) + { + DDR_ARE_DEPENDENT (res) = chrec_known; + return res; + } + + return initialize_data_dependence_relation (res, loop_nest, false); +} + + /* Frees memory used by the conflict function F. */ static void diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h index 685f33d85ae..74f579c9f3f 100644 --- a/gcc/tree-data-ref.h +++ b/gcc/tree-data-ref.h @@ -166,14 +166,19 @@ struct data_reference and runs to completion. */ bool is_conditional_in_stmt; + /* Alias information for the data reference. */ + struct dr_alias alias; + /* Behavior of the memory reference in the innermost loop. */ struct innermost_loop_behavior innermost; /* Subscripts of this data reference. */ struct indices indices; - /* Alias information for the data reference. */ - struct dr_alias alias; + /* Alternate subscripts initialized lazily and used by data-dependence + analysis only when the main indices of two DRs are not comparable. + Keep last to keep vec_info_shared::check_datarefs happy. 
*/ + struct indices alt_indices; }; #define DR_STMT(DR) (DR)->stmt diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c index 3aa3e2a6783..20daa31187d 100644 --- a/gcc/tree-vectorizer.c +++ b/gcc/tree-vectorizer.c @@ -507,7 +507,8 @@ vec_info_shared::check_datarefs () return; gcc_assert (datarefs.length () == datarefs_copy.length ()); for (unsigned i = 0; i < datarefs.length (); ++i) - if (memcmp (&datarefs_copy[i], datarefs[i], sizeof (data_reference)) != 0) + if (memcmp (&datarefs_copy[i], datarefs[i], + offsetof (data_reference, alt_indices)) != 0) gcc_unreachable (); } </cut>
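For readers who want a small picture of the access pattern the commit message describes, the following is a minimal sketch and not part of the original report. The function foo has the same loop shape as the gcc.dg/vect/pr65206.c testcase in the commit above; add_to, foo_helper, init and sum_a are hypothetical names introduced only for illustration. The helper variant stands in for the "inlined helpers that work on pointers" case mentioned in the commit message. Whether GCC actually ends up with a pointer-based reference for either loop depends on the compiler version, target and flags, so treat this only as an illustration of the source-level shape.

```c
#define N 1024

static double a[N], b[N];

/* Hypothetical pointer-based helper (not from the commit).  Once inlined,
   the update of a[i] is written as a pointer dereference.  */
static inline void add_to(double *dst, double v)
{
    *dst += v;
}

/* Same loop shape as gcc.dg/vect/pr65206.c: a conditional a[i] += b[i].  */
void foo(void)
{
    for (int i = 0; i < N; ++i)
        if (b[i] < 3.)
            a[i] += b[i];
}

/* Variant mixing an array-based read of b[i] with a pointer-based
   update of a[i] through the helper.  */
void foo_helper(void)
{
    for (int i = 0; i < N; ++i)
        if (b[i] < 3.)
            add_to(&a[i], b[i]);
}

/* Fill the arrays with a simple pattern: a is all 1.0, b cycles 0..5.  */
void init(void)
{
    for (int i = 0; i < N; ++i) {
        a[i] = 1.0;
        b[i] = (double)(i % 6);
    }
}

/* Sum of a[], used to check that both variants compute the same result.  */
double sum_a(void)
{
    double s = 0.0;
    for (int i = 0; i < N; ++i)
        s += a[i];
    return s;
}
```

Both loops should produce identical results regardless of whether they are vectorized. To see what the vectorizer decides for each loop, the file can be compiled with -O3 together with the -fopt-info-loop option mentioned in the commit message.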

4 years, 10 months
