linaro-toolchain

linaro-toolchain@lists.linaro.org

6 participants
5605 discussions

[TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.

by ci_notify＠linaro.org

After gcc commit 4a960d548b7d7d942f316c5295f6d849b74214f5 Author: Aldy Hernandez <aldyh(a)redhat.com> Avoid invalid loop transformations in jump threading registry. the following benchmarks slowed down by more than 2%: - 471.omnetpp slowed down by 8% from 6348 to 6828 perf samples Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI. For your convenience, we have uploaded tarballs with pre-processed source and assembly files at: - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… Configuration: - Benchmark: SPEC CPU2006 - Toolchain: GCC + Glibc + GNU Linker - Version: all components were built from their tip of trunk - Target: arm-linux-gnueabihf - Compiler flags: -O3 -marm - Hardware: NVidia TK1 4x Cortex-A15 This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports. THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O3 First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… Reproduce builds: <cut> mkdir investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5 cd investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach 4a960d548b7d7d942f316c5295f6d849b74214f5 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 29c92857039d0a105281be61c10c9e851aaeea4a ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 4a960d548b7d7d942f316c5295f6d849b74214f5 Author: Aldy Hernandez <aldyh(a)redhat.com> Date: Thu Sep 23 10:59:24 2021 +0200 Avoid invalid loop transformations in jump threading registry. My upcoming improvements to the forward jump threader make it thread more aggressively. In investigating some "regressions", I noticed that it has always allowed threading through empty latches and across loop boundaries. As we have discussed recently, this should be avoided until after loop optimizations have run their course. Note that this wasn't much of a problem before because DOM/VRP couldn't find these opportunities, but with a smarter solver, we trip over them more easily. Because the forward threader doesn't have an independent localized cost model like the new threader (profitable_path_p), it is difficult to catch these things at discovery. However, we can catch them at registration time, with the added benefit that all the threaders (forward and backward) can share the handcuffs. This patch is an adaptation of what we do in the backward threader, but it is not meant to catch everything we do there, as some of the restrictions there are due to limitations of the different block copiers (for example, the generic copier does not re-use existing threading paths). We could ideally remove the now redundant bits in profitable_path_p, but I would prefer not to for two reasons. First, the backward threader uses profitable_path_p as it discovers paths to avoid discovering paths in unprofitable directions. Second, I would like to merge all the forward cost restrictions into the profitability class in the backward threader, not the other way around. Alas, that reshuffling will have to wait for the next release. As usual, there are quite a few tests that needed adjustments. It seems we were quite happily threading improper scenarios. With most of them, as can be seen in pr77445-2.c, we're merely shifting the threading to after loop optimizations. Tested on x86-64 Linux. gcc/ChangeLog: * tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths): New. (jt_path_registry::register_jump_thread): Call cancel_invalid_paths. * tree-ssa-threadupdate.h (class jt_path_registry): Add cancel_invalid_paths. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/20030714-2.c: Adjust. * gcc.dg/tree-ssa/pr66752-3.c: Adjust. * gcc.dg/tree-ssa/pr77445-2.c: Adjust. * gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust. * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust. * gcc.dg/vect/bb-slp-16.c: Adjust. --- gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c | 7 ++- gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c | 19 ++++--- gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 4 +- gcc/testsuite/gcc.dg/vect/bb-slp-16.c | 7 --- gcc/tree-ssa-threadupdate.c | 67 ++++++++++++++++++----- gcc/tree-ssa-threadupdate.h | 1 + 8 files changed, 78 insertions(+), 35 deletions(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c index eb663f2ff5b..9585ff11307 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c @@ -32,7 +32,8 @@ get_alias_set (t) } } -/* There should be exactly three IF conditionals if we thread jumps - properly. */ -/* { dg-final { scan-tree-dump-times "if " 3 "dom2"} } */ +/* There should be exactly 4 IF conditionals if we thread jumps + properly. There used to be 3, but one thread was crossing + loops. */ +/* { dg-final { scan-tree-dump-times "if " 4 "dom2"} } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c index e1464e21170..922a331b217 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-dce2" } */ +/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3" } */ extern int status, pt; extern int count; @@ -32,10 +32,15 @@ foo (int N, int c, int b, int *a) pt--; } -/* There are 4 jump threading opportunities, all of which will be - realized, which will eliminate testing of FLAG, completely. */ -/* { dg-final { scan-tree-dump-times "Registering jump" 4 "thread1"} } */ +/* There are 2 jump threading opportunities (which don't cross loops), + all of which will be realized, which will eliminate testing of + FLAG, completely. */ +/* { dg-final { scan-tree-dump-times "Registering jump" 2 "thread1"} } */ -/* There should be no assignments or references to FLAG, verify they're - eliminated as early as possible. */ -/* { dg-final { scan-tree-dump-not "if .flag" "dce2"} } */ +/* We used to remove references to FLAG by DCE2, but this was + depending on early threaders threading through loop boundaries + (which we shouldn't do). However, the late threading passes, which + run after loop optimizations , can successfully eliminate the + references to FLAG. Verify that ther are no references by the late + threading passes. */ +/* { dg-final { scan-tree-dump-not "if .flag" "thread3"} } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c index f9fc212f49e..01a0f1f197d 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c @@ -123,8 +123,8 @@ enum STATES FMS( u8 **in , u32 *transitions) { aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough to change decisions in switch expansion which in turn can expose new jump threading opportunities. Skip the later tests on aarch64. */ -/* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */ -/* { dg-final { scan-tree-dump-times "Invalid sum" 4 "thread1" } } */ +/* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread1" } } */ +/* { dg-final { scan-tree-dump-times "Invalid sum" 1 "thread1" } } */ /* { dg-final { scan-tree-dump-not "optimizing for size" "thread1" } } */ /* { dg-final { scan-tree-dump-not "optimizing for size" "thread2" } } */ /* { dg-final { scan-tree-dump-not "optimizing for size" "thread3" { target { ! aarch64*-*-* } } } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c index 60d4f76f076..2d78d045516 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c @@ -21,5 +21,7 @@ condition. All the cases are picked up by VRP1 as jump threads. */ -/* { dg-final { scan-tree-dump-times "Registering jump" 6 "thread1" } } */ + +/* There used to be 6 jump threads found by thread1, but they all + depended on threading through distinct loops in ethread. */ /* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c index e3d4b311c03..16abcde5053 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c @@ -1,8 +1,8 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */ -/* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread1" } } */ -/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" { target { ! aarch64*-*-* } } } } */ +/* { dg-final { scan-tree-dump "Jumps threaded: 12" "thread1" } } */ +/* { dg-final { scan-tree-dump "Jumps threaded: 5" "thread3" { target { ! aarch64*-*-* } } } } */ /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */ /* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c index 664e93e9b60..e68a9b62535 100644 --- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c @@ -1,8 +1,5 @@ /* { dg-require-effective-target vect_int } */ -/* See note below as to why we disable threading. */ -/* { dg-additional-options "-fdisable-tree-thread1" } */ - #include <stdarg.h> #include "tree-vect.h" @@ -30,10 +27,6 @@ main1 (int dummy) *pout++ = *pin++ + a; *pout++ = *pin++ + a; *pout++ = *pin++ + a; - /* In some architectures like ppc64, jump threading may thread - the iteration where i==0 such that we no longer optimize the - BB. Another alternative to disable jump threading would be - to wrap the read from `i' into a function returning i. */ if (arr[i] = i) a = i; else diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index baac11280fa..2b9b8f81274 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -2757,6 +2757,58 @@ fwd_jt_path_registry::update_cfg (bool may_peel_loop_headers) return retval; } +bool +jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path) +{ + gcc_checking_assert (!path.is_empty ()); + edge taken_edge = path[path.length () - 1]->e; + loop_p loop = taken_edge->src->loop_father; + bool seen_latch = false; + bool path_crosses_loops = false; + + for (unsigned int i = 0; i < path.length (); i++) + { + edge e = path[i]->e; + + if (e == NULL) + { + // NULL outgoing edges on a path can happen for jumping to a + // constant address. + cancel_thread (&path, "Found NULL edge in jump threading path"); + return true; + } + + if (loop->latch == e->src || loop->latch == e->dest) + seen_latch = true; + + // The first entry represents the block with an outgoing edge + // that we will redirect to the jump threading path. Thus we + // don't care about that block's loop father. + if ((i > 0 && e->src->loop_father != loop) + || e->dest->loop_father != loop) + path_crosses_loops = true; + + if (flag_checking && !m_backedge_threads) + gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0); + } + + if (cfun->curr_properties & PROP_loop_opts_done) + return false; + + if (seen_latch && empty_block_p (loop->latch)) + { + cancel_thread (&path, "Threading through latch before loop opts " + "would create non-empty latch"); + return true; + } + if (path_crosses_loops) + { + cancel_thread (&path, "Path crosses loops"); + return true; + } + return false; +} + /* Register a jump threading opportunity. We queue up all the jump threading opportunities discovered by a pass and update the CFG and SSA form all at once. @@ -2776,19 +2828,8 @@ jt_path_registry::register_jump_thread (vec<jump_thread_edge *> *path) return false; } - /* First make sure there are no NULL outgoing edges on the jump threading - path. That can happen for jumping to a constant address. */ - for (unsigned int i = 0; i < path->length (); i++) - { - if ((*path)[i]->e == NULL) - { - cancel_thread (path, "Found NULL edge in jump threading path"); - return false; - } - - if (flag_checking && !m_backedge_threads) - gcc_assert (((*path)[i]->e->flags & EDGE_DFS_BACK) == 0); - } + if (cancel_invalid_paths (*path)) + return false; if (dump_file && (dump_flags & TDF_DETAILS)) dump_jump_thread_path (dump_file, *path, true); diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h index 8b48a671212..d68795c9f27 100644 --- a/gcc/tree-ssa-threadupdate.h +++ b/gcc/tree-ssa-threadupdate.h @@ -75,6 +75,7 @@ protected: unsigned long m_num_threaded_edges; private: virtual bool update_cfg (bool peel_loop_headers) = 0; + bool cancel_invalid_paths (vec<jump_thread_edge *> &path); jump_thread_path_allocator m_allocator; // True if threading through back edges is allowed. This is only // allowed in the generic copier in the backward threader. </cut>

3 years, 9 months

[TCWG CI] Regression caused by gcc: tree-optimization/102570 - teach VN about internal functions

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: tree-optimization/102570 - teach VN about internal functions: commit 55a3be2f5255d69e740d61b2c5aaba1ccbc643b8 Author: Richard Biener <rguenther(a)suse.de> tree-optimization/102570 - teach VN about internal functions Results regressed to # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 18360 # First few build errors in logs: # 00:01:21 ./include/linux/arm-smccc.h:460:40: error: ‘res.a0’ is used uninitialized [-Werror=uninitialized] # 00:01:21 ./include/linux/arm-smccc.h:460:40: error: ‘res.a0’ is used uninitialized [-Werror=uninitialized] # 00:01:21 ./include/linux/arm-smccc.h:460:40: error: ‘res.a0’ is used uninitialized [-Werror=uninitialized] # 00:01:21 make[2]: *** [scripts/Makefile.build:288: arch/arm64/hyperv/hv_core.o] Error 1 # 00:01:22 crypto/asymmetric_keys/asymmetric_type.c:481:15: error: ‘restrict_method’ is used uninitialized [-Werror=uninitialized] # 00:01:22 make[2]: *** [scripts/Makefile.build:288: crypto/asymmetric_keys/asymmetric_type.o] Error 1 # 00:01:22 ./include/trace/perf.h:38:25: error: ‘entry’ is used uninitialized [-Werror=uninitialized] # 00:01:22 ./include/trace/perf.h:44:13: error: ‘__entry_size’ is used uninitialized [-Werror=uninitialized] # 00:01:23 security/keys/encrypted-keys/encrypted.c:660:19: error: ‘mkey’ is used uninitialized [-Werror=uninitialized] # 00:01:23 security/keys/encrypted-keys/encrypted.c:905:19: error: ‘epayload’ is used uninitialized [-Werror=uninitialized] from # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 21404 THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_kernel/gnu-master-aarch64-next-allmodconfig First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… Reproduce builds: <cut> mkdir investigate-gcc-55a3be2f5255d69e740d61b2c5aaba1ccbc643b8 cd investigate-gcc-55a3be2f5255d69e740d61b2c5aaba1ccbc643b8 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach 55a3be2f5255d69e740d61b2c5aaba1ccbc643b8 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 22d34a2a50651d01669b6fbcdb9677c18d2197c5 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 55a3be2f5255d69e740d61b2c5aaba1ccbc643b8 Author: Richard Biener <rguenther(a)suse.de> Date: Mon Oct 4 10:57:45 2021 +0200 tree-optimization/102570 - teach VN about internal functions We're now using internal functions for a lot of stuff but there's still missing VN support out of laziness. The following instantiates support and adds testcases for FRE and PRE (hoisting). 2021-10-04 Richard Biener <rguenther(a)suse.de> PR tree-optimization/102570 * tree-ssa-sccvn.h (vn_reference_op_struct): Document we are using clique for the internal function code. * tree-ssa-sccvn.c (vn_reference_op_eq): Compare the internal function code. (print_vn_reference_ops): Print the internal function code. (vn_reference_op_compute_hash): Hash it. (copy_reference_ops_from_call): Record it. (visit_stmt): Remove the restriction around internal function calls. (fully_constant_vn_reference_p): Use fold_const_call and handle internal functions. (vn_reference_eq): Compare call return types. * tree-ssa-pre.c (create_expression_by_pieces): Handle generating calls to internal functions. (compute_avail): Remove the restriction around internal function calls. * gcc.dg/tree-ssa/ssa-fre-96.c: New testcase. * gcc.dg/tree-ssa/ssa-pre-33.c: Likewise. --- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c | 14 +++++ gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c | 15 +++++ gcc/tree-ssa-pre.c | 27 +++++---- gcc/tree-ssa-sccvn.c | 91 ++++++++++++++++++------------ gcc/tree-ssa-sccvn.h | 3 +- 5 files changed, 103 insertions(+), 47 deletions(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c new file mode 100644 index 00000000000..fd1d5713b5f --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-fre1" } */ + +_Bool f1(unsigned x, unsigned y, unsigned *res) +{ + _Bool t = __builtin_add_overflow(x, y, res); + unsigned res1; + _Bool t1 = __builtin_add_overflow(x, y, &res1); + *res -= res1; + return t==t1; +} + +/* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 1 "fre1" } } */ +/* { dg-final { scan-tree-dump "return 1;" "fre1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c new file mode 100644 index 00000000000..3b3bd629bc2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-pre" } */ + +_Bool f1(unsigned x, unsigned y, unsigned *res, int flag, _Bool *t) +{ + if (flag) + *t = __builtin_add_overflow(x, y, res); + unsigned res1; + _Bool t1 = __builtin_add_overflow(x, y, &res1); + *res -= res1; + return *t==t1; +} + +/* We should hoist the .ADD_OVERFLOW to before the check. */ +/* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 1 "pre" } } */ diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c index 08755847f66..1cc1aae694f 100644 --- a/gcc/tree-ssa-pre.c +++ b/gcc/tree-ssa-pre.c @@ -2855,9 +2855,13 @@ create_expression_by_pieces (basic_block block, pre_expr expr, unsigned int operand = 1; vn_reference_op_t currop = &ref->operands[0]; tree sc = NULL_TREE; - tree fn = find_or_generate_expression (block, currop->op0, stmts); - if (!fn) - return NULL_TREE; + tree fn = NULL_TREE; + if (currop->op0) + { + fn = find_or_generate_expression (block, currop->op0, stmts); + if (!fn) + return NULL_TREE; + } if (currop->op1) { sc = find_or_generate_expression (block, currop->op1, stmts); @@ -2873,12 +2877,19 @@ create_expression_by_pieces (basic_block block, pre_expr expr, return NULL_TREE; args.quick_push (arg); } - gcall *call = gimple_build_call_vec (fn, args); + gcall *call; + if (currop->op0) + { + call = gimple_build_call_vec (fn, args); + gimple_call_set_fntype (call, currop->type); + } + else + call = gimple_build_call_internal_vec ((internal_fn)currop->clique, + args); gimple_set_location (call, expr->loc); - gimple_call_set_fntype (call, currop->type); if (sc) gimple_call_set_chain (call, sc); - tree forcedname = make_ssa_name (TREE_TYPE (currop->type)); + tree forcedname = make_ssa_name (ref->type); gimple_call_set_lhs (call, forcedname); /* There's no CCP pass after PRE which would re-compute alignment information so make sure we re-materialize this here. */ @@ -4004,10 +4015,6 @@ compute_avail (function *fun) vn_reference_s ref1; pre_expr result = NULL; - /* We can value number only calls to real functions. */ - if (gimple_call_internal_p (stmt)) - continue; - vn_reference_lookup_call (as_a <gcall *> (stmt), &ref, &ref1); /* There is no point to PRE a call without a value. */ if (!ref || !ref->result) diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c index 416a5252144..0d942218279 100644 --- a/gcc/tree-ssa-sccvn.c +++ b/gcc/tree-ssa-sccvn.c @@ -70,6 +70,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-scalar-evolution.h" #include "tree-ssa-loop-niter.h" #include "builtins.h" +#include "fold-const-call.h" #include "tree-ssa-sccvn.h" /* This algorithm is based on the SCC algorithm presented by Keith @@ -212,7 +213,8 @@ vn_reference_op_eq (const void *p1, const void *p2) TYPE_MAIN_VARIANT (vro2->type)))) && expressions_equal_p (vro1->op0, vro2->op0) && expressions_equal_p (vro1->op1, vro2->op1) - && expressions_equal_p (vro1->op2, vro2->op2)); + && expressions_equal_p (vro1->op2, vro2->op2) + && (vro1->opcode != CALL_EXPR || vro1->clique == vro2->clique)); } /* Free a reference operation structure VP. */ @@ -264,15 +266,18 @@ print_vn_reference_ops (FILE *outfile, const vec<vn_reference_op_s> ops) && TREE_CODE_CLASS (vro->opcode) != tcc_declaration) { fprintf (outfile, "%s", get_tree_code_name (vro->opcode)); - if (vro->op0) + if (vro->op0 || vro->opcode == CALL_EXPR) { fprintf (outfile, "<"); closebrace = true; } } - if (vro->op0) + if (vro->op0 || vro->opcode == CALL_EXPR) { - print_generic_expr (outfile, vro->op0); + if (!vro->op0) + fprintf (outfile, internal_fn_name ((internal_fn)vro->clique)); + else + print_generic_expr (outfile, vro->op0); if (vro->op1) { fprintf (outfile, ","); @@ -684,6 +689,8 @@ static void vn_reference_op_compute_hash (const vn_reference_op_t vro1, inchash::hash &hstate) { hstate.add_int (vro1->opcode); + if (vro1->opcode == CALL_EXPR && !vro1->op0) + hstate.add_int (vro1->clique); if (vro1->op0) inchash::add_expr (vro1->op0, hstate); if (vro1->op1) @@ -769,11 +776,16 @@ vn_reference_eq (const_vn_reference_t const vr1, const_vn_reference_t const vr2) if (vr1->type != vr2->type) return false; } + else if (vr1->type == vr2->type) + ; else if (COMPLETE_TYPE_P (vr1->type) != COMPLETE_TYPE_P (vr2->type) || (COMPLETE_TYPE_P (vr1->type) && !expressions_equal_p (TYPE_SIZE (vr1->type), TYPE_SIZE (vr2->type)))) return false; + else if (vr1->operands[0].opcode == CALL_EXPR + && !types_compatible_p (vr1->type, vr2->type)) + return false; else if (INTEGRAL_TYPE_P (vr1->type) && INTEGRAL_TYPE_P (vr2->type)) { @@ -1270,6 +1282,8 @@ copy_reference_ops_from_call (gcall *call, temp.type = gimple_call_fntype (call); temp.opcode = CALL_EXPR; temp.op0 = gimple_call_fn (call); + if (gimple_call_internal_p (call)) + temp.clique = gimple_call_internal_fn (call); temp.op1 = gimple_call_chain (call); if (stmt_could_throw_p (cfun, call) && (lr = lookup_stmt_eh_lp (call)) > 0) temp.op2 = size_int (lr); @@ -1459,9 +1473,11 @@ fully_constant_vn_reference_p (vn_reference_t ref) a call to a builtin function with at most two arguments. */ op = &operands[0]; if (op->opcode == CALL_EXPR - && TREE_CODE (op->op0) == ADDR_EXPR - && TREE_CODE (TREE_OPERAND (op->op0, 0)) == FUNCTION_DECL - && fndecl_built_in_p (TREE_OPERAND (op->op0, 0)) + && (!op->op0 + || (TREE_CODE (op->op0) == ADDR_EXPR + && TREE_CODE (TREE_OPERAND (op->op0, 0)) == FUNCTION_DECL + && fndecl_built_in_p (TREE_OPERAND (op->op0, 0), + BUILT_IN_NORMAL))) && operands.length () >= 2 && operands.length () <= 3) { @@ -1481,13 +1497,17 @@ fully_constant_vn_reference_p (vn_reference_t ref) anyconst = true; if (anyconst) { - tree folded = build_call_expr (TREE_OPERAND (op->op0, 0), - arg1 ? 2 : 1, - arg0->op0, - arg1 ? arg1->op0 : NULL); - if (folded - && TREE_CODE (folded) == NOP_EXPR) - folded = TREE_OPERAND (folded, 0); + combined_fn fn; + if (op->op0) + fn = as_combined_fn (DECL_FUNCTION_CODE + (TREE_OPERAND (op->op0, 0))); + else + fn = as_combined_fn ((internal_fn) op->clique); + tree folded; + if (arg1) + folded = fold_const_call (fn, ref->type, arg0->op0, arg1->op0); + else + folded = fold_const_call (fn, ref->type, arg0->op0); if (folded && is_gimple_min_invariant (folded)) return folded; @@ -5648,28 +5668,27 @@ visit_stmt (gimple *stmt, bool backedges_varying_p = false) && TREE_CODE (TREE_OPERAND (fn, 0)) == FUNCTION_DECL) extra_fnflags = flags_from_decl_or_type (TREE_OPERAND (fn, 0)); } - if (!gimple_call_internal_p (call_stmt) - && (/* Calls to the same function with the same vuse - and the same operands do not necessarily return the same - value, unless they're pure or const. */ - ((gimple_call_flags (call_stmt) | extra_fnflags) - & (ECF_PURE | ECF_CONST)) - /* If calls have a vdef, subsequent calls won't have - the same incoming vuse. So, if 2 calls with vdef have the - same vuse, we know they're not subsequent. - We can value number 2 calls to the same function with the - same vuse and the same operands which are not subsequent - the same, because there is no code in the program that can - compare the 2 values... */ - || (gimple_vdef (call_stmt) - /* ... unless the call returns a pointer which does - not alias with anything else. In which case the - information that the values are distinct are encoded - in the IL. */ - && !(gimple_call_return_flags (call_stmt) & ERF_NOALIAS) - /* Only perform the following when being called from PRE - which embeds tail merging. */ - && default_vn_walk_kind == VN_WALK))) + if (/* Calls to the same function with the same vuse + and the same operands do not necessarily return the same + value, unless they're pure or const. */ + ((gimple_call_flags (call_stmt) | extra_fnflags) + & (ECF_PURE | ECF_CONST)) + /* If calls have a vdef, subsequent calls won't have + the same incoming vuse. So, if 2 calls with vdef have the + same vuse, we know they're not subsequent. + We can value number 2 calls to the same function with the + same vuse and the same operands which are not subsequent + the same, because there is no code in the program that can + compare the 2 values... */ + || (gimple_vdef (call_stmt) + /* ... unless the call returns a pointer which does + not alias with anything else. In which case the + information that the values are distinct are encoded + in the IL. */ + && !(gimple_call_return_flags (call_stmt) & ERF_NOALIAS) + /* Only perform the following when being called from PRE + which embeds tail merging. */ + && default_vn_walk_kind == VN_WALK)) changed = visit_reference_op_call (lhs, call_stmt); else changed = defs_to_varying (call_stmt); diff --git a/gcc/tree-ssa-sccvn.h b/gcc/tree-ssa-sccvn.h index 96100596d2e..8a1b649c726 100644 --- a/gcc/tree-ssa-sccvn.h +++ b/gcc/tree-ssa-sccvn.h @@ -106,7 +106,8 @@ typedef const struct vn_phi_s *const_vn_phi_t; typedef struct vn_reference_op_struct { ENUM_BITFIELD(tree_code) opcode : 16; - /* Dependence info, used for [TARGET_]MEM_REF only. */ + /* Dependence info, used for [TARGET_]MEM_REF only. For internal + function calls clique is also used for the internal function code. */ unsigned short clique; unsigned short base; unsigned reverse : 1; </cut>

3 years, 9 months

[TCWG CI] 464.h264ref slowed down by 6% after llvm: [SCEV] Infer flags from add/gep in any block

by ci_notify＠linaro.org

After llvm commit 0658bab870c89d81678f1f37aac0396ddd0913b3 Author: Philip Reames <listmail(a)philipreames.com> [SCEV] Infer flags from add/gep in any block the following benchmarks slowed down by more than 2%: - 464.h264ref slowed down by 6% from 11124 to 11783 perf samples - 464.h264ref:[.] FastFullPelBlockMotionSearch slowed down by 41% from 1504 to 2116 perf samples Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI. For your convenience, we have uploaded tarballs with pre-processed source and assembly files at: - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Configuration: - Benchmark: SPEC CPU2006 - Toolchain: Clang + Glibc + LLVM Linker - Version: all components were built from their tip of trunk - Target: aarch64-linux-gnu - Compiler flags: -O2 - Hardware: NVidia TX1 4x Cortex-A57 This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports. THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2 First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Reproduce builds: <cut> mkdir investigate-llvm-0658bab870c89d81678f1f37aac0396ddd0913b3 cd investigate-llvm-0658bab870c89d81678f1f37aac0396ddd0913b3 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach 0658bab870c89d81678f1f37aac0396ddd0913b3 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 2ced9a42be8aba4533225fdb8ed02fe6f50060b6 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 0658bab870c89d81678f1f37aac0396ddd0913b3 Author: Philip Reames <listmail(a)philipreames.com> Date: Wed Oct 6 10:35:01 2021 -0700 [SCEV] Infer flags from add/gep in any block This patch removes a compile time restriction from isSCEVExprNeverPoison. We've strengthened our ability to reason about flags on scopes other than addrecs, and this bailout prevents us from using it. The comment is also suspect as well in that we're in the middle of constructing a SCEV for I. As such, we're going to visit all operands *anyways*. Differential Revision: https://reviews.llvm.org/D111186 --- llvm/lib/Analysis/ScalarEvolution.cpp | 10 --------- .../Analysis/DependenceAnalysis/Preliminary.ll | 2 +- .../Analysis/ScalarEvolution/flags-from-poison.ll | 8 +++---- .../SLPVectorizer/X86/consecutive-access.ll | 25 +++++++++------------- 4 files changed, 15 insertions(+), 30 deletions(-) diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp index 6683b1a5205c..4fb2266b6e89 100644 --- a/llvm/lib/Analysis/ScalarEvolution.cpp +++ b/llvm/lib/Analysis/ScalarEvolution.cpp @@ -6645,16 +6645,6 @@ bool ScalarEvolution::isGuaranteedToTransferExecutionTo(const Instruction *A, bool ScalarEvolution::isSCEVExprNeverPoison(const Instruction *I) { - // Here we check that I is in the header of the innermost loop containing I, - // since we only deal with instructions in the loop header. The actual loop we - // need to check later will come from an add recurrence, but getting that - // requires computing the SCEV of the operands, which can be expensive. This - // check we can do cheaply to rule out some cases early. - Loop *InnermostContainingLoop = LI.getLoopFor(I->getParent()); - if (InnermostContainingLoop == nullptr || - InnermostContainingLoop->getHeader() != I->getParent()) - return false; - // Only proceed if we can prove that I does not yield poison. if (!programUndefinedIfPoison(I)) return false; diff --git a/llvm/test/Analysis/DependenceAnalysis/Preliminary.ll b/llvm/test/Analysis/DependenceAnalysis/Preliminary.ll index 0899f67d6914..91827f3231ba 100644 --- a/llvm/test/Analysis/DependenceAnalysis/Preliminary.ll +++ b/llvm/test/Analysis/DependenceAnalysis/Preliminary.ll @@ -623,7 +623,7 @@ entry: ; CHECK-LABEL: p9 ; CHECK: da analyze - none! -; CHECK: da analyze - flow [|<]! +; CHECK: da analyze - none! ; CHECK: da analyze - confused! ; CHECK: da analyze - none! ; CHECK: da analyze - confused! diff --git a/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll b/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll index f0bda26edb38..0423854bbc3b 100644 --- a/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll +++ b/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll @@ -1628,9 +1628,9 @@ define noundef i32 @add-basic(i32 %a, i32 %b) { ; CHECK-LABEL: 'add-basic' ; CHECK-NEXT: Classifying expressions for: @add-basic ; CHECK-NEXT: %res = add nuw nsw i32 %a, %b -; CHECK-NEXT: --> (%a + %b) U: full-set S: full-set +; CHECK-NEXT: --> (%a + %b)<nuw><nsw> U: full-set S: full-set ; CHECK-NEXT: %res2 = udiv i32 255, %res -; CHECK-NEXT: --> (255 /u (%a + %b)) U: [0,256) S: [0,256) +; CHECK-NEXT: --> (255 /u (%a + %b)<nuw><nsw>) U: [0,256) S: [0,256) ; CHECK-NEXT: Determining loop execution counts for: @add-basic ; %res = add nuw nsw i32 %a, %b @@ -1656,9 +1656,9 @@ define noundef i32 @mul-basic(i32 %a, i32 %b) { ; CHECK-LABEL: 'mul-basic' ; CHECK-NEXT: Classifying expressions for: @mul-basic ; CHECK-NEXT: %res = mul nuw nsw i32 %a, %b -; CHECK-NEXT: --> (%a * %b) U: full-set S: full-set +; CHECK-NEXT: --> (%a * %b)<nuw><nsw> U: full-set S: full-set ; CHECK-NEXT: %res2 = udiv i32 255, %res -; CHECK-NEXT: --> (255 /u (%a * %b)) U: [0,256) S: [0,256) +; CHECK-NEXT: --> (255 /u (%a * %b)<nuw><nsw>) U: [0,256) S: [0,256) ; CHECK-NEXT: Determining loop execution counts for: @mul-basic ; %res = mul nuw nsw i32 %a, %b diff --git a/llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll b/llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll index e4000b52c4a9..8f57fe6866bd 100644 --- a/llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll +++ b/llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll @@ -8,10 +8,6 @@ target triple = "x86_64-apple-macosx10.9.0" @C = common global [2000 x float] zeroinitializer, align 16 @D = common global [2000 x float] zeroinitializer, align 16 -; Currently SCEV isn't smart enough to figure out that accesses -; A[3*i], A[3*i+1] and A[3*i+2] are consecutive, but in future -; that would hopefully be fixed. For now, check that this isn't -; vectorized. ; Function Attrs: nounwind ssp uwtable define void @foo_3double(i32 %u) #0 { ; CHECK-LABEL: @foo_3double( @@ -21,26 +17,25 @@ define void @foo_3double(i32 %u) #0 { ; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[U]], 3 ; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[MUL]] to i64 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM]] -; CHECK-NEXT: [[TMP0:%.*]] = load double, double* [[ARRAYIDX]], align 8 ; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM]] -; CHECK-NEXT: [[TMP1:%.*]] = load double, double* [[ARRAYIDX4]], align 8 -; CHECK-NEXT: [[ADD5:%.*]] = fadd double [[TMP0]], [[TMP1]] -; CHECK-NEXT: store double [[ADD5]], double* [[ARRAYIDX]], align 8 ; CHECK-NEXT: [[ADD11:%.*]] = add nsw i32 [[MUL]], 1 ; CHECK-NEXT: [[IDXPROM12:%.*]] = sext i32 [[ADD11]] to i64 ; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM12]] -; CHECK-NEXT: [[TMP2:%.*]] = load double, double* [[ARRAYIDX13]], align 8 +; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>* +; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8 ; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM12]] -; CHECK-NEXT: [[TMP3:%.*]] = load double, double* [[ARRAYIDX17]], align 8 -; CHECK-NEXT: [[ADD18:%.*]] = fadd double [[TMP2]], [[TMP3]] -; CHECK-NEXT: store double [[ADD18]], double* [[ARRAYIDX13]], align 8 +; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[ARRAYIDX4]] to <2 x double>* +; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8 +; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]] +; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>* +; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8 ; CHECK-NEXT: [[ADD24:%.*]] = add nsw i32 [[MUL]], 2 ; CHECK-NEXT: [[IDXPROM25:%.*]] = sext i32 [[ADD24]] to i64 ; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM25]] -; CHECK-NEXT: [[TMP4:%.*]] = load double, double* [[ARRAYIDX26]], align 8 +; CHECK-NEXT: [[TMP6:%.*]] = load double, double* [[ARRAYIDX26]], align 8 ; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM25]] -; CHECK-NEXT: [[TMP5:%.*]] = load double, double* [[ARRAYIDX30]], align 8 -; CHECK-NEXT: [[ADD31:%.*]] = fadd double [[TMP4]], [[TMP5]] +; CHECK-NEXT: [[TMP7:%.*]] = load double, double* [[ARRAYIDX30]], align 8 +; CHECK-NEXT: [[ADD31:%.*]] = fadd double [[TMP6]], [[TMP7]] ; CHECK-NEXT: store double [[ADD31]], double* [[ARRAYIDX26]], align 8 ; CHECK-NEXT: ret void ; </cut>

3 years, 9 months

[TCWG CI] 464.h264ref slowed down by 6% after llvm: [SCEV] Use full logic when infering flags on add and gep

by ci_notify＠linaro.org

After llvm commit d02db32644b7360bcda54cdf739fa42abe450fcd Author: Philip Reames <listmail(a)philipreames.com> [SCEV] Use full logic when infering flags on add and gep the following benchmarks slowed down by more than 2%: - 464.h264ref slowed down by 6% from 10842 to 11545 perf samples Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI. For your convenience, we have uploaded tarballs with pre-processed source and assembly files at: - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Configuration: - Benchmark: SPEC CPU2006 - Toolchain: Clang + Glibc + LLVM Linker - Version: all components were built from their tip of trunk - Target: aarch64-linux-gnu - Compiler flags: -O3 -flto - Hardware: NVidia TX1 4x Cortex-A57 This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports. THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3_LTO First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Reproduce builds: <cut> mkdir investigate-llvm-d02db32644b7360bcda54cdf739fa42abe450fcd cd investigate-llvm-d02db32644b7360bcda54cdf739fa42abe450fcd # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach d02db32644b7360bcda54cdf739fa42abe450fcd ../artifacts/test.sh # Reproduce last_good build git checkout --detach f39978b84f1d3a1da6c32db48f64c8daae64b3ad ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit d02db32644b7360bcda54cdf739fa42abe450fcd Author: Philip Reames <listmail(a)philipreames.com> Date: Sun Oct 3 15:32:15 2021 -0700 [SCEV] Use full logic when infering flags on add and gep This is a followon to D109845. With that landed, we will have fixed all known instances of pr51817, and can thus start inferring flags more aggressively with greatly reduced risk of miscompiles. This patch simply applies the same inference logic used in that patch to our other major flag inference path. We can still do much better here (on both paths), but this is our first step. Differential Revision: https://reviews.llvm.org/D111003 --- llvm/lib/Analysis/ScalarEvolution.cpp | 10 ++-------- .../Delinearization/multidim_ivs_and_integer_offsets_3d.ll | 2 +- .../Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll | 2 +- llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll | 4 ++-- llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll | 8 ++++---- llvm/test/Analysis/ScalarEvolution/load.ll | 2 +- llvm/test/Analysis/ScalarEvolution/ptrtoint.ll | 2 +- polly/test/IstAstInfo/simple-run-time-condition.ll | 2 +- 8 files changed, 13 insertions(+), 19 deletions(-) diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp index 75cecbf48c08..70bf9aee6e0a 100644 --- a/llvm/lib/Analysis/ScalarEvolution.cpp +++ b/llvm/lib/Analysis/ScalarEvolution.cpp @@ -6657,14 +6657,8 @@ bool ScalarEvolution::isSCEVExprNeverPoison(const Instruction *I) { // TODO: We can do better here in some cases. if (!isSCEVable(Op->getType())) return false; - // TODO: the following two lines should be: - // if (auto *DefI = getDefinedScopeRoot(getSCEV(Op))) - // if (isGuaranteedToTransferExecutionTo(DefI, I)) - // We use the following instead for the purposes of seperating a bugfix - // change from an optimization change. Once pr51817 is fully addressed, - // we should unlock this power. - if (auto *AddRecS = dyn_cast<SCEVAddRecExpr>(getSCEV(Op))) - if (isGuaranteedToExecuteForEveryIteration(I, AddRecS->getLoop())) + if (auto *DefI = getDefinedScopeRoot(getSCEV(Op))) + if (isGuaranteedToTransferExecutionTo(DefI, I)) return true; } return false; diff --git a/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll b/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll index 712a52927dcb..77982c786e6e 100644 --- a/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll +++ b/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll @@ -11,7 +11,7 @@ ; AddRec: {{{(56 + (8 * (-4 + (3 * %m)) * %o) + %A),+,(8 * %m * %o)}<%for.i>,+,(8 * %o)}<%for.j>,+,8}<%for.k> ; CHECK: Base offset: %A ; CHECK: ArrayDecl[UnknownSize][%m][%o] with elements of 8 bytes. -; CHECK: ArrayRef[{3,+,1}<nuw><%for.i>][{-4,+,1}<nw><%for.j>][{7,+,1}<nuw><nsw><%for.k>] +; CHECK: ArrayRef[{3,+,1}<nuw><%for.i>][{-4,+,1}<nsw><%for.j>][{7,+,1}<nuw><nsw><%for.k>] define void @foo(i64 %n, i64 %m, i64 %o, double* %A) { entry: diff --git a/llvm/test/Analysis/Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll b/llvm/test/Analysis/Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll index e3fdb0642211..8ecd498ea211 100644 --- a/llvm/test/Analysis/Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll +++ b/llvm/test/Analysis/Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll @@ -11,7 +11,7 @@ ; AddRec: {{{((8 * ((((%m * %p) + %q) * %o) + %r)) + %A),+,(8 * %m * %o)}<%for.i>,+,(8 * %o)}<%for.j>,+,8}<%for.k> ; CHECK: Base offset: %A ; CHECK: ArrayDecl[UnknownSize][%m][%o] with elements of 8 bytes. -; CHECK: ArrayRef[{%p,+,1}<nw><%for.i>][{%q,+,1}<nw><%for.j>][{%r,+,1}<nsw><%for.k>] +; CHECK: ArrayRef[{%p,+,1}<nw><%for.i>][{%q,+,1}<nsw><%for.j>][{%r,+,1}<nsw><%for.k>] define void @foo(i64 %n, i64 %m, i64 %o, double* %A, i64 %p, i64 %q, i64 %r) { entry: diff --git a/llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll b/llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll index 821513199546..1f1515435e1a 100644 --- a/llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll +++ b/llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll @@ -11,8 +11,8 @@ target triple = "powerpc64le-unknown-linux-gnu" ; } ; } -; CHECK-DAG: Loop 'for.i' has cost = 20300 -; CHECK-DAG: Loop 'for.j' has cost = 700 +; CHECK-DAG: Loop 'for.i' has cost = 20600 +; CHECK-DAG: Loop 'for.j' has cost = 800 define void @foo(i64 %n, i64 %m, i32* %A, i32* %B, i32* %C) { entry: diff --git a/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll b/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll index c8d3137f8dc9..5ab24159c250 100644 --- a/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll +++ b/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll @@ -273,9 +273,9 @@ define void @test-add-scope-bound-unkn-header(i32* %input, i32 %needle) { ; CHECK-NEXT: %offset = load i32, i32* %gep, align 4 ; CHECK-NEXT: --> %offset U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } ; CHECK-NEXT: %i.next = add nuw i32 %i, %offset -; CHECK-NEXT: --> (%offset + %i) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } +; CHECK-NEXT: --> (%offset + %i)<nuw> U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } ; CHECK-NEXT: %gep2 = getelementptr i32, i32* %input, i32 %i.next -; CHECK-NEXT: --> ((4 * (sext i32 (%offset + %i) to i64))<nsw> + %input) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } +; CHECK-NEXT: --> ((4 * (sext i32 (%offset + %i)<nuw> to i64))<nsw> + %input) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } ; CHECK-NEXT: Determining loop execution counts for: @test-add-scope-bound-unkn-header ; CHECK-NEXT: Loop %loop: Unpredictable backedge-taken count. ; CHECK-NEXT: Loop %loop: Unpredictable max backedge-taken count. @@ -307,9 +307,9 @@ define void @test-add-scope-bound-unkn-header2(i32* %input, i32 %needle) { ; CHECK-NEXT: %offset = load i32, i32* %gep, align 4 ; CHECK-NEXT: --> %offset U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } ; CHECK-NEXT: %i.next = add nuw i32 %i, %offset -; CHECK-NEXT: --> (%offset + %i) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } +; CHECK-NEXT: --> (%offset + %i)<nuw> U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } ; CHECK-NEXT: %gep2 = getelementptr i32, i32* %input, i32 %i.next -; CHECK-NEXT: --> ((4 * (sext i32 (%offset + %i) to i64))<nsw> + %input) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } +; CHECK-NEXT: --> ((4 * (sext i32 (%offset + %i)<nuw> to i64))<nsw> + %input) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } ; CHECK-NEXT: Determining loop execution counts for: @test-add-scope-bound-unkn-header2 ; CHECK-NEXT: Loop %loop: Unpredictable backedge-taken count. ; CHECK-NEXT: Loop %loop: Unpredictable max backedge-taken count. diff --git a/llvm/test/Analysis/ScalarEvolution/load.ll b/llvm/test/Analysis/ScalarEvolution/load.ll index c0d671342af7..e95a093b2a8b 100644 --- a/llvm/test/Analysis/ScalarEvolution/load.ll +++ b/llvm/test/Analysis/ScalarEvolution/load.ll @@ -73,7 +73,7 @@ define i32 @test2() nounwind uwtable readonly { ; CHECK-NEXT: %n.01 = phi %struct.ListNode* [ bitcast ({ %struct.ListNode*, i32, [4 x i8] }* @node5 to %struct.ListNode*), %entry ], [ %1, %for.body ] ; CHECK-NEXT: --> %n.01 U: full-set S: full-set Exits: @node1 LoopDispositions: { %for.body: Variant } ; CHECK-NEXT: %i = getelementptr inbounds %struct.ListNode, %struct.ListNode* %n.01, i64 0, i32 1 -; CHECK-NEXT: --> (4 + %n.01) U: full-set S: full-set Exits: (4 + @node1)<nuw><nsw> LoopDispositions: { %for.body: Variant } +; CHECK-NEXT: --> (4 + %n.01)<nuw> U: [4,0) S: [4,0) Exits: (4 + @node1)<nuw><nsw> LoopDispositions: { %for.body: Variant } ; CHECK-NEXT: %0 = load i32, i32* %i, align 4 ; CHECK-NEXT: --> %0 U: full-set S: full-set Exits: 0 LoopDispositions: { %for.body: Variant } ; CHECK-NEXT: %add = add nsw i32 %0, %sum.02 diff --git a/llvm/test/Analysis/ScalarEvolution/ptrtoint.ll b/llvm/test/Analysis/ScalarEvolution/ptrtoint.ll index 93d8782f373e..cb40ddda9369 100644 --- a/llvm/test/Analysis/ScalarEvolution/ptrtoint.ll +++ b/llvm/test/Analysis/ScalarEvolution/ptrtoint.ll @@ -502,7 +502,7 @@ define void @pr46786_c26_int(i32* %arg, i32* %arg1, i32* %arg2) { ; X32-NEXT: %i11 = ashr exact i64 %i10, 2 ; X32-NEXT: --> %i11 U: [-2147483648,2147483648) S: [-2147483648,2147483648) Exits: <<Unknown>> LoopDispositions: { %bb6: Variant } ; X32-NEXT: %i12 = getelementptr inbounds i32, i32* %arg2, i64 %i11 -; X32-NEXT: --> ((4 * (trunc i64 %i11 to i32)) + %arg2) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %bb6: Variant } +; X32-NEXT: --> ((4 * (trunc i64 %i11 to i32))<nsw> + %arg2) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %bb6: Variant } ; X32-NEXT: %i13 = load i32, i32* %i12, align 4 ; X32-NEXT: --> %i13 U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %bb6: Variant } ; X32-NEXT: %i14 = add nsw i32 %i13, %i8 diff --git a/polly/test/IstAstInfo/simple-run-time-condition.ll b/polly/test/IstAstInfo/simple-run-time-condition.ll index aba5d9e34f50..0d167566291b 100644 --- a/polly/test/IstAstInfo/simple-run-time-condition.ll +++ b/polly/test/IstAstInfo/simple-run-time-condition.ll @@ -20,7 +20,7 @@ target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f3 ; for the delinearization is simplified such that conditions that would not ; cause any code to be executed are not generated. -; CHECK: if (((o >= 1 && q <= 0 && m + q >= 0) || (o <= 0 && m + q >= 100 && q <= 100)) && 0 == ((m >= 1 && n + p >= 9223372036854775809) || (o <= 0 && n >= 1 && m + q >= 9223372036854775909) || (o <= 0 && m >= 1 && n >= 1 && q <= -9223372036854775709))) +; CHECK: if (((o >= 1 && q <= 0 && m + q >= 0) || (o <= 0 && m + q >= 100 && q <= 100)) && 0 == ((o <= 0 && n >= 1 && m + q >= 9223372036854775909) || (o <= 0 && m >= 1 && n >= 1 && q <= -9223372036854775709))) ; CHECK: if (o <= 0) { ; CHECK: for (int c0 = 0; c0 < n; c0 += 1) </cut>

3 years, 9 months

[TCWG CI] Regression caused by gcc: Enhance -Waddress to detect more suspicious expressions [PR102103].

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: Enhance -Waddress to detect more suspicious expressions [PR102103].: commit 4dc7ce6fb3917958d1a6036d8acf2953b9c1b868 Author: Martin Sebor <msebor(a)redhat.com> Enhance -Waddress to detect more suspicious expressions [PR102103]. Results regressed to # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 21397 # First few build errors in logs: # 00:02:12 sound/core/oss/mixer_oss.c:1035:21: error: ‘slot’ is used uninitialized [-Werror=uninitialized] # 00:02:15 make[3]: *** [scripts/Makefile.build:288: sound/core/oss/mixer_oss.o] Error 1 # 00:02:15 sound/core/oss/pcm_oss.c:108:29: error: ‘t’ is used uninitialized [-Werror=uninitialized] # 00:02:15 sound/core/oss/pcm_oss.c:2475:34: error: ‘setup’ is used uninitialized [-Werror=uninitialized] # 00:02:15 sound/core/oss/pcm_oss.c:2985:51: error: ‘template’ is used uninitialized [-Werror=uninitialized] # 00:02:15 sound/core/seq/oss/seq_oss_init.c:350:35: error: ‘qinfo’ is used uninitialized [-Werror=uninitialized] # 00:02:15 sound/core/seq/oss/seq_oss_init.c:370:35: error: ‘qinfo’ is used uninitialized [-Werror=uninitialized] # 00:02:16 make[4]: *** [scripts/Makefile.build:288: sound/core/seq/oss/seq_oss_init.o] Error 1 # 00:02:22 make[3]: *** [scripts/Makefile.build:288: sound/core/oss/pcm_oss.o] Error 1 # 00:02:28 make[3]: *** [scripts/Makefile.build:571: sound/core/seq/oss] Error 2 from # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 21403 THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_kernel/gnu-master-aarch64-next-allmodconfig First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… Reproduce builds: <cut> mkdir investigate-gcc-4dc7ce6fb3917958d1a6036d8acf2953b9c1b868 cd investigate-gcc-4dc7ce6fb3917958d1a6036d8acf2953b9c1b868 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach 4dc7ce6fb3917958d1a6036d8acf2953b9c1b868 ../artifacts/test.sh # Reproduce last_good build git checkout --detach f1710910087fb1f4a7706e9ce838163ffcbc50b4 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 4dc7ce6fb3917958d1a6036d8acf2953b9c1b868 Author: Martin Sebor <msebor(a)redhat.com> Date: Fri Oct 1 11:50:25 2021 -0600 Enhance -Waddress to detect more suspicious expressions [PR102103]. Resolves: PR c/102103 - missing warning comparing array address to null gcc/ChangeLog: PR c/102103 * doc/invoke.texi (-Waddress): Update. * gengtype.c (write_types): Avoid -Waddress. * poly-int.h (POLY_SET_COEFF): Avoid using null. gcc/c-family/ChangeLog: PR c/102103 * c-common.c (decl_with_nonnull_addr_p): Handle members. Check and perform warning suppression. (c_common_truthvalue_conversion): Enhance warning suppression. gcc/c/ChangeLog: PR c/102103 * c-typeck.c (maybe_warn_for_null_address): New function. (build_binary_op): Call it. gcc/cp/ChangeLog: PR c/102103 * typeck.c (warn_for_null_address): Enhance. (cp_build_binary_op): Call it also for member pointers. gcc/fortran/ChangeLog: PR c/102103 * array.c: Remove an unnecessary test. * trans-array.c: Same. gcc/testsuite/ChangeLog: PR c/102103 * g++.dg/cpp0x/constexpr-array-ptr10.C: Suppress a valid warning. * g++.dg/warn/Wreturn-local-addr-6.C: Correct a cast. * gcc.dg/Waddress.c: Expect a warning. * c-c++-common/Waddress-3.c: New test. * c-c++-common/Waddress-4.c: New test. * g++.dg/warn/Waddress-5.C: New test. * g++.dg/warn/Waddress-6.C: New test. * g++.dg/warn/pr101219.C: Expect a warning. * gcc.dg/Waddress-3.c: New test. --- gcc/c-family/c-common.c | 29 +++-- gcc/c/c-typeck.c | 140 ++++++++++++++++----- gcc/cp/typeck.c | 94 ++++++++++++-- gcc/doc/invoke.texi | 48 +++++-- gcc/fortran/array.c | 2 +- gcc/fortran/trans-array.c | 1 - gcc/gengtype.c | 4 +- gcc/poly-int.h | 4 +- gcc/testsuite/c-c++-common/Waddress-3.c | 125 ++++++++++++++++++ gcc/testsuite/c-c++-common/Waddress-4.c | 106 ++++++++++++++++ gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr10.C | 5 +- gcc/testsuite/g++.dg/warn/Waddress-5.C | 115 +++++++++++++++++ gcc/testsuite/g++.dg/warn/Waddress-6.C | 79 ++++++++++++ gcc/testsuite/g++.dg/warn/Wreturn-local-addr-6.C | 4 +- gcc/testsuite/g++.dg/warn/pr101219.C | 4 +- gcc/testsuite/gcc.dg/Waddress-3.c | 35 ++++++ gcc/testsuite/gcc.dg/Waddress.c | 2 +- 17 files changed, 722 insertions(+), 75 deletions(-) diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 5845c675e85..9d19e352725 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -3393,14 +3393,16 @@ c_wrap_maybe_const (tree expr, bool non_const) return expr; } -/* Return whether EXPR is a declaration whose address can never be - NULL. */ +/* Return whether EXPR is a declaration whose address can never be NULL. + The address of the first struct member could be NULL only if it were + accessed through a NULL pointer, and such an access would be invalid. */ bool decl_with_nonnull_addr_p (const_tree expr) { return (DECL_P (expr) - && (TREE_CODE (expr) == PARM_DECL + && (TREE_CODE (expr) == FIELD_DECL + || TREE_CODE (expr) == PARM_DECL || TREE_CODE (expr) == LABEL_DECL || !DECL_WEAK (expr))); } @@ -3488,13 +3490,17 @@ c_common_truthvalue_conversion (location_t location, tree expr) case ADDR_EXPR: { tree inner = TREE_OPERAND (expr, 0); - if (decl_with_nonnull_addr_p (inner)) + if (decl_with_nonnull_addr_p (inner) + /* Check both EXPR and INNER for suppression. */ + && !warning_suppressed_p (expr, OPT_Waddress) + && !warning_suppressed_p (inner, OPT_Waddress)) { - /* Common Ada programmer's mistake. */ + /* Common Ada programmer's mistake. */ warning_at (location, OPT_Waddress, "the address of %qD will always evaluate as %<true%>", inner); + suppress_warning (inner, OPT_Waddress); return truthvalue_true_node; } break; @@ -3627,8 +3633,17 @@ c_common_truthvalue_conversion (location_t location, tree expr) break; /* If this isn't narrowing the argument, we can ignore it. */ if (TYPE_PRECISION (totype) >= TYPE_PRECISION (fromtype)) - return c_common_truthvalue_conversion (location, - TREE_OPERAND (expr, 0)); + { + tree op0 = TREE_OPERAND (expr, 0); + if ((TREE_CODE (fromtype) == POINTER_TYPE + && TREE_CODE (totype) == INTEGER_TYPE) + || warning_suppressed_p (expr, OPT_Waddress)) + /* Suppress -Waddress for casts to intptr_t, propagating + any suppression from the enclosing expression to its + operand. */ + suppress_warning (op0, OPT_Waddress); + return c_common_truthvalue_conversion (location, op0); + } } break; diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c index c74f876e667..33963d7555a 100644 --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -11554,6 +11554,110 @@ build_vec_cmp (tree_code code, tree type, return build3 (VEC_COND_EXPR, type, cmp, minus_one_vec, zero_vec); } +/* Possibly warn about an address of OP never being NULL in a comparison + operation CODE involving null. */ + +static void +maybe_warn_for_null_address (location_t loc, tree op, tree_code code) +{ + if (!warn_address || warning_suppressed_p (op, OPT_Waddress)) + return; + + if (TREE_CODE (op) == NOP_EXPR) + { + /* Allow casts to intptr_t to suppress the warning. */ + tree type = TREE_TYPE (op); + if (TREE_CODE (type) == INTEGER_TYPE) + return; + op = TREE_OPERAND (op, 0); + } + + if (TREE_CODE (op) == POINTER_PLUS_EXPR) + { + /* Allow a cast to void* to suppress the warning. */ + tree type = TREE_TYPE (TREE_TYPE (op)); + if (VOID_TYPE_P (type)) + return; + + /* Adding any value to a null pointer, including zero, is undefined + in C. This includes the expression &p[0] where p is the null + pointer, although &p[0] will have been folded to p by this point + and so not diagnosed. */ + if (code == EQ_EXPR) + warning_at (loc, OPT_Waddress, + "the comparison will always evaluate as %<false%> " + "for the pointer operand in %qE must not be NULL", + op); + else + warning_at (loc, OPT_Waddress, + "the comparison will always evaluate as %<true%> " + "for the pointer operand in %qE must not be NULL", + op); + + return; + } + + if (TREE_CODE (op) != ADDR_EXPR) + return; + + op = TREE_OPERAND (op, 0); + + if (TREE_CODE (op) == IMAGPART_EXPR + || TREE_CODE (op) == REALPART_EXPR) + { + /* The address of either complex part may not be null. */ + if (code == EQ_EXPR) + warning_at (loc, OPT_Waddress, + "the comparison will always evaluate as %<false%> " + "for the address of %qE will never be NULL", + op); + else + warning_at (loc, OPT_Waddress, + "the comparison will always evaluate as %<true%> " + "for the address of %qE will never be NULL", + op); + return; + } + + /* Set to true in the loop below if OP dereferences is operand. + In such a case the ultimate target need not be a decl for + the null [in]equality test to be constant. */ + bool deref = false; + + /* Get the outermost array or object, or member. */ + while (handled_component_p (op)) + { + if (TREE_CODE (op) == COMPONENT_REF) + { + /* Get the member (its address is never null). */ + op = TREE_OPERAND (op, 1); + break; + } + + /* Get the outer array/object to refer to in the warning. */ + op = TREE_OPERAND (op, 0); + deref = true; + } + + if ((!deref && !decl_with_nonnull_addr_p (op)) + || from_macro_expansion_at (loc)) + return; + + if (code == EQ_EXPR) + warning_at (loc, OPT_Waddress, + "the comparison will always evaluate as %<false%> " + "for the address of %qE will never be NULL", + op); + else + warning_at (loc, OPT_Waddress, + "the comparison will always evaluate as %<true%> " + "for the address of %qE will never be NULL", + op); + + if (DECL_P (op)) + inform (DECL_SOURCE_LOCATION (op), "%qD declared here", op); +} + /* Build a binary-operation expression without default conversions. CODE is the kind of expression to build. LOCATION is the operator's location. @@ -12189,44 +12293,12 @@ build_binary_op (location_t location, enum tree_code code, short_compare = 1; else if (code0 == POINTER_TYPE && null_pointer_constant_p (orig_op1)) { - if (TREE_CODE (op0) == ADDR_EXPR - && decl_with_nonnull_addr_p (TREE_OPERAND (op0, 0)) - && !from_macro_expansion_at (location)) - { - if (code == EQ_EXPR) - warning_at (location, - OPT_Waddress, - "the comparison will always evaluate as %<false%> " - "for the address of %qD will never be NULL", - TREE_OPERAND (op0, 0)); - else - warning_at (location, - OPT_Waddress, - "the comparison will always evaluate as %<true%> " - "for the address of %qD will never be NULL", - TREE_OPERAND (op0, 0)); - } + maybe_warn_for_null_address (location, op0, code); result_type = type0; } else if (code1 == POINTER_TYPE && null_pointer_constant_p (orig_op0)) { - if (TREE_CODE (op1) == ADDR_EXPR - && decl_with_nonnull_addr_p (TREE_OPERAND (op1, 0)) - && !from_macro_expansion_at (location)) - { - if (code == EQ_EXPR) - warning_at (location, - OPT_Waddress, - "the comparison will always evaluate as %<false%> " - "for the address of %qD will never be NULL", - TREE_OPERAND (op1, 0)); - else - warning_at (location, - OPT_Waddress, - "the comparison will always evaluate as %<true%> " - "for the address of %qD will never be NULL", - TREE_OPERAND (op1, 0)); - } + maybe_warn_for_null_address (location, op1, code); result_type = type1; } else if (code0 == POINTER_TYPE && code1 == POINTER_TYPE) diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c index cd130f16a66..e880d34dcfe 100644 --- a/gcc/cp/typeck.c +++ b/gcc/cp/typeck.c @@ -4603,25 +4603,93 @@ warn_for_null_address (location_t location, tree op, tsubst_flags_t complain) || warning_suppressed_p (op, OPT_Waddress)) return; + if (TREE_CODE (op) == NON_DEPENDENT_EXPR) + op = TREE_OPERAND (op, 0); + tree cop = fold_for_warn (op); - if (TREE_CODE (cop) == ADDR_EXPR - && decl_with_nonnull_addr_p (TREE_OPERAND (cop, 0)) - && !warning_suppressed_p (cop, OPT_Waddress)) - warning_at (location, OPT_Waddress, "the address of %qD will never " - "be NULL", TREE_OPERAND (cop, 0)); + if (TREE_CODE (cop) == NON_LVALUE_EXPR) + /* Unwrap the expression for C++ 98. */ + cop = TREE_OPERAND (cop, 0); - if (CONVERT_EXPR_P (op) + if (TREE_CODE (cop) == PTRMEM_CST) + { + /* The address of a nonstatic data member is never null. */ + warning_at (location, OPT_Waddress, + "the address %qE will never be NULL", + cop); + return; + } + + if (TREE_CODE (cop) == NOP_EXPR) + { + /* Allow casts to intptr_t to suppress the warning. */ + tree type = TREE_TYPE (cop); + if (TREE_CODE (type) == INTEGER_TYPE) + return; + + STRIP_NOPS (cop); + } + + bool warned = false; + if (TREE_CODE (cop) == ADDR_EXPR) + { + cop = TREE_OPERAND (cop, 0); + + /* Set to true in the loop below if OP dereferences its operand. + In such a case the ultimate target need not be a decl for + the null [in]equality test to be necessarily constant. */ + bool deref = false; + + /* Get the outermost array or object, or member. */ + while (handled_component_p (cop)) + { + if (TREE_CODE (cop) == COMPONENT_REF) + { + /* Get the member (its address is never null). */ + cop = TREE_OPERAND (cop, 1); + break; + } + + /* Get the outer array/object to refer to in the warning. */ + cop = TREE_OPERAND (cop, 0); + deref = true; + } + + if ((!deref && !decl_with_nonnull_addr_p (cop)) + || from_macro_expansion_at (location) + || warning_suppressed_p (cop, OPT_Waddress)) + return; + + warned = warning_at (location, OPT_Waddress, + "the address of %qD will never be NULL", cop); + op = cop; + } + else if (TREE_CODE (cop) == POINTER_PLUS_EXPR) + { + /* Adding zero to the null pointer is well-defined in C++. When + the offset is unknown (i.e., not a constant) warn anyway since + it's less likely that the pointer operand is null than not. */ + tree off = TREE_OPERAND (cop, 1); + if (!integer_zerop (off) + && !warning_suppressed_p (cop, OPT_Waddress)) + warning_at (location, OPT_Waddress, "comparing the result of pointer " + "addition %qE and NULL", cop); + return; + } + else if (CONVERT_EXPR_P (op) && TYPE_REF_P (TREE_TYPE (TREE_OPERAND (op, 0)))) { - tree inner_op = op; - STRIP_NOPS (inner_op); + STRIP_NOPS (op); - if (DECL_P (inner_op)) - warning_at (location, OPT_Waddress, - "the compiler can assume that the address of " - "%qD will never be NULL", inner_op); + if (DECL_P (op)) + warned = warning_at (location, OPT_Waddress, + "the compiler can assume that the address of " + "%qD will never be NULL", op); } + + if (warned && DECL_P (op)) + inform (DECL_SOURCE_LOCATION (op), "%qD declared here", op); } /* Warn about [expr.arith.conv]/2: If one operand is of enumeration type and @@ -5411,6 +5479,8 @@ cp_build_binary_op (const op_location_t &location, op1 = cp_convert (TREE_TYPE (op0), op1, complain); } result_type = TREE_TYPE (op0); + + warn_for_null_address (location, orig_op0, complain); } else if (TYPE_PTRMEMFUNC_P (type1) && null_ptr_cst_p (orig_op0)) return cp_build_binary_op (location, code, op1, op0, complain); diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 5f39b208049..d35114c0727 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -8551,17 +8551,43 @@ by @option{-Wall}. @item -Waddress @opindex Waddress @opindex Wno-address -Warn about suspicious uses of memory addresses. These include using -the address of a function in a conditional expression, such as -@code{void func(void); if (func)}, and comparisons against the memory -address of a string literal, such as @code{if (x == "abc")}. Such -uses typically indicate a programmer error: the address of a function -always evaluates to true, so their use in a conditional usually -indicate that the programmer forgot the parentheses in a function -call; and comparisons against string literals result in unspecified -behavior and are not portable in C, so they usually indicate that the -programmer intended to use @code{strcmp}. This warning is enabled by -@option{-Wall}. +Warn about suspicious uses of address expressions. These include comparing +the address of a function or a declared object to the null pointer constant +such as in +@smallexample +void f (void); +void g (void) +@{ + if (!func) // warning: expression evaluates to false + abort (); +@} +@end smallexample +comparisons of a pointer to a string literal, such as in +@smallexample +void f (const char *x) +@{ + if (x == "abc") // warning: expression evaluates to false + puts ("equal"); +@} +@end smallexample +and tests of the results of pointer addition or subtraction for equality +to null, such as in +@smallexample +void f (const int *p, int i) +@{ + return p + i == NULL; +@} +@end smallexample +Such uses typically indicate a programmer error: the address of most +functions and objects necessarily evaluates to true (the exception are +weak symbols), so their use in a conditional might indicate missing +parentheses in a function call or a missing dereference in an array +expression. The subset of the warning for object pointers can be +suppressed by casting the pointer operand to an integer type such +as @code{inptr_t} or @code{uinptr_t}. +Comparisons against string literals result in unspecified behavior +and are not portable, and suggest the intent was to call @code{strcmp}. +@option{-Waddress} warning is enabled by @option{-Wall}. @item -Wno-address-of-packed-member @opindex Waddress-of-packed-member diff --git a/gcc/fortran/array.c b/gcc/fortran/array.c index a4d1cb4c72d..6552eaf3b0c 100644 --- a/gcc/fortran/array.c +++ b/gcc/fortran/array.c @@ -2581,7 +2581,7 @@ gfc_array_dimen_size (gfc_expr *array, int dimen, mpz_t *result) } } - if (array->shape && array->shape[dimen]) + if (array->shape) { mpz_init_set (*result, array->shape[dimen]); return true; diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c index b8061f37772..e2f59e0823c 100644 --- a/gcc/fortran/trans-array.c +++ b/gcc/fortran/trans-array.c @@ -5104,7 +5104,6 @@ set_loop_bounds (gfc_loopinfo *loop) if (info->shape) { - gcc_assert (info->shape[dim]); /* The frontend has worked out the size for us. */ if (!loopspec[n] || !specinfo->shape diff --git a/gcc/gengtype.c b/gcc/gengtype.c index 31d4bf4e5d0..a77cfd92bfa 100644 --- a/gcc/gengtype.c +++ b/gcc/gengtype.c @@ -3685,8 +3685,8 @@ write_types (outf_p output_header, type_p structures, output_mangled_typename (output_header, s); oprintf (output_header, "(X) do { \\\n"); oprintf (output_header, - " if (X != NULL) gt_%sx_%s (X);\\\n", wtd->prefix, - s_id_for_tag); + " if ((intptr_t)(X) != 0) gt_%sx_%s (X);\\\n", + wtd->prefix, s_id_for_tag); oprintf (output_header, " } while (0)\n"); for (opt = s->u.s.opt; opt; opt = opt->next) diff --git a/gcc/poly-int.h b/gcc/poly-int.h index f47f9e436a8..94e7b701f64 100644 --- a/gcc/poly-int.h +++ b/gcc/poly-int.h @@ -324,10 +324,10 @@ struct poly_result<T1, T2, 2> routine can take the address of RES rather than the address of a temporary. - The dummy comparison against a null C * is just a way of checking + The dummy self-comparison against C * is just a way of checking that C gives the right type. */ #define POLY_SET_COEFF(C, RES, I, VALUE) \ - ((void) (&(RES).coeffs[0] == (C *) 0), \ + ((void) (&(RES).coeffs[0] == (C *) (void *) &(RES).coeffs[0]), \ wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION \ ? (void) ((RES).coeffs[I] = VALUE) \ : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE))) diff --git a/gcc/testsuite/c-c++-common/Waddress-3.c b/gcc/testsuite/c-c++-common/Waddress-3.c new file mode 100644 index 00000000000..9a13a444045 --- /dev/null +++ b/gcc/testsuite/c-c++-common/Waddress-3.c @@ -0,0 +1,125 @@ +/* PR c/102103 - missing warning comparing array address to null + { dg-do compile } + { dg-options "-Wall" } */ + +typedef __INTPTR_TYPE__ intptr_t; +typedef __UINTPTR_TYPE__ uintptr_t; + +#ifndef __cplusplus +# define bool _Bool +#endif + +struct S { void *p, *a1[2], *a2[2][2]; } s, *p; + +extern const void *a1[2]; +extern void *a2[2][2], *ax[]; + +void T (bool); + +void test_array_eq_0 (int i) +{ + // Verify that casts intptr_t suppress the warning. + T ((intptr_t)a1 == 0); + T ((uintptr_t)a1 == 0); + T (a1 == 0); // { dg-warning "-Waddress" } + T (0 == &a1); // { dg-warning "-Waddress" } + // Verify that casts to other pointer types don't suppress it. + T ((void *)a1 == 0); // { dg-warning "-Waddress" } + T ((char *)a1 == 0); // { dg-warning "-Waddress" } + T (a1[0] == 0); + T (0 == (intptr_t)&a1[0]); + T (0 == &a1[0]); // { dg-warning "-Waddress" } + T (a1[i] == 0); + T (0 == (uintptr_t)&a1[i]); + T (0 == &a1[i]); // { dg-warning "-Waddress" } + + T ((intptr_t)a2 == 0); + T (a2 == 0); // { dg-warning "-Waddress" } + T (0 == &a2); // { dg-warning "-Waddress" } + T (a2[0] == 0); // { dg-warning "-Waddress" } + T (0 == &a1[0]); // { dg-warning "-Waddress" } + T (a2[i] == 0); // { dg-warning "-Waddress" } + T (0 == &a2[i]); // { dg-warning "-Waddress" } + T (a2[0][0] == 0); + T (0 == &a2[0][0]); // { dg-warning "-Waddress" } + T (&ax == 0); // { dg-warning "-Waddress" } + T (0 == &ax); // { dg-warning "-Waddress" } + T (&ax[0] == 0); // { dg-warning "-Waddress" } + T (0 == ax[0]); +} + + +void test_array_neq_0 (int i) +{ + // Verify that casts to intptr_t suppress the warning. + T ((uintptr_t)a1); + + T (a1); // { dg-warning "-Waddress" } + T ((void *)a1); // { dg-warning "-Waddress" } + T (&a1 != 0); // { dg-warning "-Waddress" } + T (a1[0]); + T (&a1[0] != 0); // { dg-warning "-Waddress" } + T (a1[i]); + T (&a1[i] != 0); // { dg-warning "-Waddress" } + + T ((intptr_t)a2); + T (a2); // { dg-warning "-Waddress" } + T ((void *)a2); // { dg-warning "-Waddress" } + T ((char *)a2); // { dg-warning "-Waddress" } + T (&a2 != 0); // { dg-warning "-Waddress" } + T (a2[0]); // { dg-warning "-Waddress" } + T (&a1[0] != 0); // { dg-warning "-Waddress" } + T (a2[i]); // { dg-warning "-Waddress" } + T (&a2[i] != 0); // { dg-warning "-Waddress" } + T (a2[0][0]); + T (&a2[0][0] != 0); // { dg-warning "-Waddress" } +} + + +void test_member_array_eq_0 (int i) +{ + // Verify that casts to intptr_t suppress the warning. + T ((intptr_t)s.a1 == 0); + T (s.a1 == 0); // { dg-warning "-Waddress" } + T (0 == &a1); // { dg-warning "-Waddress" } + T (s.a1[0] == 0); + T ((void*)s.a1); // { dg-warning "-Waddress" } + T (0 == &a1[0]); // { dg-warning "-Waddress" } + T (s.a1[i] == 0); + T (0 == &a1[i]); // { dg-warning "-Waddress" } + + T ((uintptr_t)s.a2 == 0); + T (s.a2 == 0); // { dg-warning "-Waddress" } + T (0 == &a2); // { dg-warning "-Waddress" } + T ((void *)s.a2 == 0);// { dg-warning "-Waddress" } + T (s.a2[0] == 0); // { dg-warning "-Waddress" } + T (0 == &a1[0]); // { dg-warning "-Waddress" } + T (s.a2[i] == 0); // { dg-warning "-Waddress" } + T (0 == &a2[i]); // { dg-warning "-Waddress" } + T (s.a2[0][0] == 0); + T (0 == &a2[0][0]); // { dg-warning "-Waddress" } +} + + +void test_member_array_neq_0 (int i) +{ + // Verify that casts to intptr_t suppress the warning. + T ((uintptr_t)s.a1); + T (s.a1); // { dg-warning "-Waddress" } + T (&s.a1 != 0); // { dg-warning "-Waddress" } + T ((void *)&s.a1[0]); // { dg-warning "-Waddress" } + T (s.a1[0]); + T (&s.a1[0] != 0); // { dg-warning "-Waddress" } + T (s.a1[i]); + T (&s.a1[i] != 0); // { dg-warning "-Waddress" } + + T ((intptr_t)s.a2); + T (s.a2); // { dg-warning "-Waddress" } + T (&s.a2 != 0); // { dg-warning "-Waddress" } + T (s.a2[0]); // { dg-warning "-Waddress" } + T (&s.a1[0] != 0); // { dg-warning "-Waddress" } + T (s.a2[i]); // { dg-warning "-Waddress" } + T (&s.a2[i] != 0); // { dg-warning "-Waddress" } + T (s.a2[0][0]); + T (&s.a2[0][0] != 0); // { dg-warning "-Waddress" } +} diff --git a/gcc/testsuite/c-c++-common/Waddress-4.c b/gcc/testsuite/c-c++-common/Waddress-4.c new file mode 100644 index 00000000000..489a0cd717c --- /dev/null +++ b/gcc/testsuite/c-c++-common/Waddress-4.c @@ -0,0 +1,106 @@ +/* PR c/102103 - missing warning comparing array address to null + { dg-do compile } + { dg-options "-Wall" } */ + +typedef __INTPTR_TYPE__ intptr_t; +typedef __INTPTR_TYPE__ uintptr_t; + +extern char *ax[], *a2[][2]; + +void T (int); + +void test_ax_plus_eq_0 (int i) +{ + // Verify that casts to intptr_t suppress the warning. + T ((intptr_t)(ax + 0) == 0); + T ((uintptr_t)(ax + 1) == 0); + + T (ax + 0 == 0); // { dg-warning "-Waddress" } + T (&ax[0] == 0); // { dg-warning "-Waddress" } + T (ax - 1 == 0); // { dg-warning "-Waddress" } + T (0 == &ax[-1]); // { dg-warning "-Waddress" } + T ((void *)(&ax[0] + 2) == 0); // { dg-warning "-Waddress" } + T (&ax[0] + 2 == 0); // { dg-warning "-Waddress" } + T (ax + 3 == 0); // { dg-warning "-Waddress" } + T (0 == &ax[-4]); // { dg-warning "-Waddress" } + T (ax - i == 0); // { dg-warning "-Waddress" } + T (&ax[i] == 0); // { dg-warning "-Waddress" } + T (0 == &ax[1] + i); // { dg-warning "-Waddress" } +} + +void test_a2_plus_eq_0 (int i) +{ + // Verify that casts to intptr_t suppress the warning. + T ((intptr_t)(a2 + 0) == 0); + T ((uintptr_t)(a2 + 1) == 0); + + T (a2 + 0 == 0); // { dg-warning "-Waddress" } + // Verify that a cast to another pointer type doesn't suppress it. + T ((void*)(a2 + 0) == 0); // { dg-warning "-Waddress" } + T ((char*)a2 + 1 == 0); // { dg-warning "-Waddress" } + T (&a2[0] == 0); // { dg-warning "-Waddress" } + T (a2 - 1 == 0); // { dg-warning "-Waddress" } + T (0 == &a2[-1]); // { dg-warning "-Waddress" } + T (a2 + 2 == 0); // { dg-warning "-Waddress" } + T (0 == &a2[-2]); // { dg-warning "-Waddress" } + T (a2 - i == 0); // { dg-warning "-Waddress" } + T (&a2[i] == 0); // { dg-warning "-Waddress" } +} + +// Exercise a pointer. +void test_p_plus_eq_0 (int *p, int i) +{ + /* P + 0 and equivalently &P[0] are invalid for a null P but they're + folded to p before the warning has a chance to trigger. */ + T (p + 0 == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + T (&p[0] == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + + T (p - 1 == 0); // { dg-warning "-Waddress" } + T (0 == &p[-1]); // { dg-warning "-Waddress" } + T (p + 2 == 0); // { dg-warning "-Waddress" } + T (0 == &p[-2]); // { dg-warning "-Waddress" } + T (p - i == 0); // { dg-warning "-Waddress" } + T (&p[i] == 0); // { dg-warning "-Waddress" } +} + +// Exercise pointer to array. +void test_pa_plus_eq_0 (int (*p)[], int (*p2)[][2], int i) +{ + // The array pointer may be null. + T (*p == 0); + /* &**P is equivalent to *P and might be the result od macro expansion. + Verify it doesn't cause a warning. */ + T (&**p == 0); + + /* *P + 0 is invalid but folded to *P before the warning has a chance + to trigger. */ + T (*p + 0 == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + + T (&(*p)[0] == 0); // { dg-warning "-Waddress" } + T (*p - 1 == 0); // { dg-warning "-Waddress" } + T (0 == &(*p)[-1]); // { dg-warning "-Waddress" } + T (*p + 2 == 0); // { dg-warning "-Waddress" } + T (0 == &(*p)[-2]); // { dg-warning "-Waddress" } + T (*p - i == 0); // { dg-warning "-Waddress" } + T (&(*p)[i] == 0); // { dg-warning "-Waddress" } + + + /* Similar to the above but for a pointer to a two-dimensional array, + referring to the higher-level element (i.e., an array itself). */ + T (*p2 == 0); + T (**p2 == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + T (&**p2 == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + T (&***p2 == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + T (&**p2 == 0); + + T (*p2 + 0 == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + T (&(*p2)[0] == 0); // { dg-warning "-Waddress" } + T (&(*p2)[0][1] == 0); // { dg-warning "-Waddress" } + T (*p2 - 1 == 0); // { dg-warning "-Waddress" } + T (0 == &(*p2)[-1]); // { dg-warning "-Waddress" } + T (0 == &(*p2)[1][2]); // { dg-warning "-Waddress" } + T (*p2 + 2 == 0); // { dg-warning "-Waddress" } + T (0 == &(*p2)[-2]); // { dg-warning "-Waddress" } + T (*p2 - i == 0); // { dg-warning "-Waddress" } + T (&(*p2)[i] == 0); // { dg-warning "-Waddress" } +} diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr10.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr10.C index 5224bb14234..63295230d51 100644 --- a/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr10.C +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr10.C @@ -85,8 +85,11 @@ extern __attribute__ ((weak)) int i; constexpr int *p1 = &i + 1; #pragma GCC diagnostic push +// Suppress warning: ordered comparison of pointer with integer zero. #pragma GCC diagnostic ignored "-Wextra" -// Suppress warning: ordered comparison of pointer with integer zero +// Also suppress -Waddress for comparisons of constant addresses to +// to null. +#pragma GCC diagnostic ignored "-Waddress" constexpr bool b0 = p1; // { dg-error "not a constant expression" } constexpr bool b1 = p1 == 0; // { dg-error "not a constant expression" } diff --git a/gcc/testsuite/g++.dg/warn/Waddress-5.C b/gcc/testsuite/g++.dg/warn/Waddress-5.C new file mode 100644 index 00000000000..b1ad38a8112 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Waddress-5.C @@ -0,0 +1,115 @@ +/* PR c/102103 - missing warning comparing array address to null + { dg-do compile } + { dg-options "-Wall" } */ + +#if __cplusplus < 201103L +# define nullptr __null +#endif + +struct A +{ + void f (); + virtual void vf (); + virtual void pvf () = 0; + + void sf (); + + int *p; + int a[2]; +}; + +void T (bool); + +void warn_memptr_if () +{ + // Exercise warnings for addresses of nonstatic member functions. + if (&A::f == 0) // { dg-warning "the address '&A::f'" } + T (0); + + if (&A::vf) // { dg-warning "-Waddress" } + T (0); + + if (&A::pvf != 0) // { dg-warning "-Waddress" } + T (0); + + // Exercise warnings for addresses of static member functions. + if (&A::sf == 0) // { dg-warning "-Waddress" } + T (0); + + if (&A::sf) // { dg-warning "-Waddress" } + T (0); + + // Exercise warnings for addresses of nonstatic data members. + if (&A::p == 0) // { dg-warning "the address '&A::p'" } + T (0); + + if (&A::a == nullptr) // { dg-warning "-Waddress" } + T (0); +} + +void warn_memptr_bool () +{ + // Exercise warnings for addresses of nonstatic member functions. + T (&A::f == 0); // { dg-warning "-Waddress" } + T (&A::vf); // { dg-warning "-Waddress" } + T (&A::pvf != 0); // { dg-warning "-Waddress" } + + // Exercise warnings for addresses of static member functions. + T (&A::sf == 0); // { dg-warning "-Waddress" } + T (&A::sf); // { dg-warning "-Waddress" } + + // Exercise warnings for addresses of nonstatic data members. + T (&A::p == 0); // { dg-warning "-Waddress" } + T (&A::a == nullptr); // { dg-warning "-Waddress" } +} + + +/* Verify that no warnings are issued for a dependent expression in + a template. */ + +template <int> +struct B +{ + // This is why. + struct F { void* operator& () const { return 0; } } f; +}; + +template <class Type, int N> +void nowarn_dependent (Type targ) +{ + T (&Type::x == 0); + T (&targ == 0); + + Type tarr[1]; + T (&tarr[0] == nullptr); + + T (&B<N>::f == 0); + + /* Like in the case above, the address-of operator could be a member + of B<N>::vf that returns zero. */ + T (&B<N>::vf); + T (&B<N>::pvf != 0); + T (&B<N>::p == 0); + T (&B<N>::a == 0); +} + + +/* Verify that in an uninstantiated template warnings are not issued + for dependent expressions but are issued otherwise. */ + +template <class Type> +void warn_non_dependent (Type targ, Type *tptr, int i) +{ + /* The address of a pointer to a dependent type cannot be null but + the warning doesn't have a chance to see it. */ + T (&tptr == 0); // { dg-warning "-Waddress" "pr102378" { xfail *-*-* } } + T (&i == 0); // { dg-warning "-Waddress" } + + int iarr[1]; + T (&iarr == 0); // { dg-warning "-Waddress" } + T (&*iarr != 0); // { dg-warning "-Waddress" "pr102378" { xfail *-*-* } } + T (&iarr[0] == 0); // { dg-warning "-Waddress" } + + Type tarr[1]; + T (&tarr == nullptr); // { dg-warning "-Waddress" "pr102378" { xfail *-*-* } } +} diff --git a/gcc/testsuite/g++.dg/warn/Waddress-6.C b/gcc/testsuite/g++.dg/warn/Waddress-6.C new file mode 100644 index 00000000000..c22a83a0dd7 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Waddress-6.C @@ -0,0 +1,79 @@ +/* PR c/102103 - missing warning comparing array address to null + { dg-do compile } + Verify -Waddress for member arrays of structs and notes. + { dg-options "-Wall" } */ + +#if __cplusplus < 201103L +# define nullptr __null +#endif + +void T (bool); + +struct A +{ + int n; + int ia[]; // { dg-message "'A::ia' declared here" } +}; + +struct B +{ + A a[3]; // { dg-message "'B::a' declared here" } +}; + +struct C +{ + B b[3]; // { dg-message "'C::b' declared here" } +}; + +struct D +{ + C c[3]; // { dg-message "'D::c' declared here" } +}; + + +void test_waddress_1d () +{ + D d[2]; // { dg-message "'d' declared here" } + + T (d); // { dg-warning "address of 'd'" } + T (d == nullptr); // { dg-warning "address of 'd'" } + T (&d); // { dg-warning "address of 'd'" } + T (d->c); // { dg-warning "address of 'D::c'" } + T (d->c != nullptr); // { dg-warning "address of 'D::c'" } + T (d->c->b); // { dg-warning "address of 'C::b'" } + T (d->c[1].b->a); // { dg-warning "address of 'B::a'" } + T (d->c->b[2].a->ia); // { dg-warning "address of 'A::ia'" } + + if (d->c->b[2].a[1].ia) // { dg-warning "address of 'A::ia'" } + T (0); + + if (bool b = d->c->b[1].a) // { dg-warning "address of 'B::a'" } + T (b); + + /* The following is represented as a declaration of P followed + by an if statement and so it isn't diagnosed. It's not clear + that it should be since the pointer is then used. + void *p = d->c->b[2].a; + if (p) ... + */ + if (void *p = d->c->b[2].a) // { dg-warning "address of 'A::ia'" "" { xfail *-*-* } } + T (p); +} + + +void test_waddress_2d (int i) +{ + D d[2][3]; // { dg-message "'d' declared here" } + + T (d); // { dg-warning "address of 'd'" } + T (d == nullptr); // { dg-warning "address of 'd'" } + T (&d); // { dg-warning "address of 'd'" } + T (*d); // { dg-warning "address of 'd'" } + T (d[1] != nullptr); // { dg-warning "address of 'd'" } + T (&d[1]->c); // { dg-warning "address of 'D::c'" } + T (d[1]->c); // { dg-warning "address of 'D::c'" } + T (d[1]->c == nullptr); // { dg-warning "address of 'D::c'" } + T (d[i]->c[1].b); // { dg-warning "address of 'C::b'" } + T ((*(d + i))->c->b->a); // { dg-warning "address of 'B::a'" } + T (d[1][2].c->b->a->ia); // { dg-warning "address of 'A::ia'" } +} </cut>

3 years, 9 months

[TCWG CI] 458.sjeng grew in size by 4% after gcc: aarch64: Improve size heuristic for cpymem expansion

by ci_notify＠linaro.org

After gcc commit a459ee44c0a74b0df0485ed7a56683816c02aae9 Author: Kyrylo Tkachov <kyrylo.tkachov(a)arm.com> aarch64: Improve size heuristic for cpymem expansion the following benchmarks grew in size by more than 1%: - 458.sjeng grew in size by 4% from 105780 to 109944 bytes - 459.GemsFDTD grew in size by 2% from 247504 to 251468 bytes Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI. For your convenience, we have uploaded tarballs with pre-processed source and assembly files at: - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… Configuration: - Benchmark: SPEC CPU2006 - Toolchain: GCC + Glibc + GNU Linker - Version: all components were built from their tip of trunk - Target: aarch64-linux-gnu - Compiler flags: -Os -flto - Hardware: APM Mustang 8x X-Gene1 This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports. THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os_LTO First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… Reproduce builds: <cut> mkdir investigate-gcc-a459ee44c0a74b0df0485ed7a56683816c02aae9 cd investigate-gcc-a459ee44c0a74b0df0485ed7a56683816c02aae9 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach a459ee44c0a74b0df0485ed7a56683816c02aae9 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 8f95e3c04d659d541ca4937b3df2f1175a1c5f05 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit a459ee44c0a74b0df0485ed7a56683816c02aae9 Author: Kyrylo Tkachov <kyrylo.tkachov(a)arm.com> Date: Wed Sep 29 11:21:45 2021 +0100 aarch64: Improve size heuristic for cpymem expansion Similar to my previous patch for setmem this one does the same for the cpymem expansion. We count the number of ops emitted and compare it against the alternative of just calling the library function when optimising for size. For the code: void cpy_127 (char *out, char *in) { __builtin_memcpy (out, in, 127); } void cpy_128 (char *out, char *in) { __builtin_memcpy (out, in, 128); } we now emit a call to memcpy (with an extra MOV-immediate instruction for the size) instead of: cpy_127(char*, char*): ldp q0, q1, [x1] stp q0, q1, [x0] ldp q0, q1, [x1, 32] stp q0, q1, [x0, 32] ldp q0, q1, [x1, 64] stp q0, q1, [x0, 64] ldr q0, [x1, 96] str q0, [x0, 96] ldr q0, [x1, 111] str q0, [x0, 111] ret cpy_128(char*, char*): ldp q0, q1, [x1] stp q0, q1, [x0] ldp q0, q1, [x1, 32] stp q0, q1, [x0, 32] ldp q0, q1, [x1, 64] stp q0, q1, [x0, 64] ldp q0, q1, [x1, 96] stp q0, q1, [x0, 96] ret which is a clear code size win. Speed optimisation heuristics remain unchanged. 2021-09-29 Kyrylo Tkachov <kyrylo.tkachov(a)arm.com> * config/aarch64/aarch64.c (aarch64_expand_cpymem): Count number of emitted operations and adjust heuristic for code size. 2021-09-29 Kyrylo Tkachov <kyrylo.tkachov(a)arm.com> * gcc.target/aarch64/cpymem-size.c: New test. --- gcc/config/aarch64/aarch64.c | 36 ++++++++++++++++++-------- gcc/testsuite/gcc.target/aarch64/cpymem-size.c | 29 +++++++++++++++++++++ 2 files changed, 54 insertions(+), 11 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ac17c1c88fb..a9a1800af53 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -23390,7 +23390,8 @@ aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst, } /* Expand cpymem, as if from a __builtin_memcpy. Return true if - we succeed, otherwise return false. */ + we succeed, otherwise return false, indicating that a libcall to + memcpy should be emitted. */ bool aarch64_expand_cpymem (rtx *operands) @@ -23407,11 +23408,13 @@ aarch64_expand_cpymem (rtx *operands) unsigned HOST_WIDE_INT size = INTVAL (operands[2]); - /* Inline up to 256 bytes when optimizing for speed. */ + /* Try to inline up to 256 bytes. */ unsigned HOST_WIDE_INT max_copy_size = 256; - if (optimize_function_for_size_p (cfun)) - max_copy_size = 128; + bool size_p = optimize_function_for_size_p (cfun); + + if (size > max_copy_size) + return false; int copy_bits = 256; @@ -23421,13 +23424,14 @@ aarch64_expand_cpymem (rtx *operands) || !TARGET_SIMD || (aarch64_tune_params.extra_tuning_flags & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)) - { - copy_bits = 128; - max_copy_size = max_copy_size / 2; - } + copy_bits = 128; - if (size > max_copy_size) - return false; + /* Emit an inline load+store sequence and count the number of operations + involved. We use a simple count of just the loads and stores emitted + rather than rtx_insn count as all the pointer adjustments and reg copying + in this function will get optimized away later in the pipeline. */ + start_sequence (); + unsigned nops = 0; base = copy_to_mode_reg (Pmode, XEXP (dst, 0)); dst = adjust_automodify_address (dst, VOIDmode, base, 0); @@ -23456,7 +23460,8 @@ aarch64_expand_cpymem (rtx *operands) cur_mode = V4SImode; aarch64_copy_one_block_and_progress_pointers (&src, &dst, cur_mode); - + /* A single block copy is 1 load + 1 store. */ + nops += 2; n -= mode_bits; /* Emit trailing copies using overlapping unaligned accesses - this is @@ -23471,7 +23476,16 @@ aarch64_expand_cpymem (rtx *operands) n = n_bits; } } + rtx_insn *seq = get_insns (); + end_sequence (); + + /* A memcpy libcall in the worst case takes 3 instructions to prepare the + arguments + 1 for the call. */ + unsigned libcall_cost = 4; + if (size_p && libcall_cost < nops) + return false; + emit_insn (seq); return true; } diff --git a/gcc/testsuite/gcc.target/aarch64/cpymem-size.c b/gcc/testsuite/gcc.target/aarch64/cpymem-size.c new file mode 100644 index 00000000000..4d488b74301 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/cpymem-size.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-Os" } */ + +#include <stdlib.h> + +/* +** cpy_127: +** mov x2, 127 +** b memcpy +*/ +void +cpy_127 (char *out, char *in) +{ + __builtin_memcpy (out, in, 127); +} + +/* +** cpy_128: +** mov x2, 128 +** b memcpy +*/ +void +cpy_128 (char *out, char *in) +{ + __builtin_memcpy (out, in, 128); +} + +/* { dg-final { check-function-bodies "**" "" "" } } */ + </cut>

3 years, 9 months

[TCWG CI] Regression caused by gcc: [PR102546] X << Y being non-zero implies X is also non-zero.

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: [PR102546] X << Y being non-zero implies X is also non-zero.: commit 5f9ccf17de7f7581412c6bffd4a37beca9a79836 Author: Aldy Hernandez <aldyh(a)redhat.com> [PR102546] X << Y being non-zero implies X is also non-zero. Results regressed to # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 18603 # First few build errors in logs: # 00:01:53 arch/arm/vfp/vfpdouble.c:1206:1: internal compiler error: in upper_bound, at value-range.h:531 # 00:01:53 arch/arm/vfp/vfpsingle.c:1246:1: internal compiler error: in upper_bound, at value-range.h:531 # 00:01:53 make[2]: *** [scripts/Makefile.build:271: arch/arm/vfp/vfpdouble.o] Error 1 # 00:01:54 make[2]: *** [scripts/Makefile.build:271: arch/arm/vfp/vfpsingle.o] Error 1 # 00:01:55 make[1]: *** [scripts/Makefile.build:514: arch/arm/vfp] Error 2 # 00:01:56 arch/arm/nwfpe/softfloat.c:3432:1: internal compiler error: in upper_bound, at value-range.h:531 # 00:01:57 make[2]: *** [scripts/Makefile.build:271: arch/arm/nwfpe/softfloat.o] Error 1 # 00:01:57 make[1]: *** [scripts/Makefile.build:514: arch/arm/nwfpe] Error 2 # 00:02:14 arch/arm/kernel/smp.c:857:1: internal compiler error: in upper_bound, at value-range.h:531 # 00:02:15 make[2]: *** [scripts/Makefile.build:271: arch/arm/kernel/smp.o] Error 1 from # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 19709 # linux build successful: all THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_kernel/gnu-master-arm-stable-allyesconfig First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… Reproduce builds: <cut> mkdir investigate-gcc-5f9ccf17de7f7581412c6bffd4a37beca9a79836 cd investigate-gcc-5f9ccf17de7f7581412c6bffd4a37beca9a79836 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach 5f9ccf17de7f7581412c6bffd4a37beca9a79836 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 257d2890a769a8aa564d079170377e637e07acb1 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 5f9ccf17de7f7581412c6bffd4a37beca9a79836 Author: Aldy Hernandez <aldyh(a)redhat.com> Date: Fri Oct 1 13:05:36 2021 +0200 [PR102546] X << Y being non-zero implies X is also non-zero. This patch teaches this to range-ops. Tested on x86-64 Linux. gcc/ChangeLog: PR tree-optimization/102546 * range-op.cc (operator_lshift::op1_range): Teach range-ops that X << Y is non-zero implies X is also non-zero. --- gcc/range-op.cc | 18 ++++++++++++++---- gcc/testsuite/gcc.dg/tree-ssa/pr102546.c | 23 +++++++++++++++++++++++ 2 files changed, 37 insertions(+), 4 deletions(-) diff --git a/gcc/range-op.cc b/gcc/range-op.cc index 5e37133026d..2baca4a197f 100644 --- a/gcc/range-op.cc +++ b/gcc/range-op.cc @@ -2078,6 +2078,12 @@ operator_lshift::op1_range (irange &r, relation_kind rel ATTRIBUTE_UNUSED) const { tree shift_amount; + + if (!lhs.contains_p (build_zero_cst (type))) + r.set_nonzero (type); + else + r.set_varying (type); + if (op2.singleton_p (&shift_amount)) { wide_int shift = wi::to_wide (shift_amount); @@ -2089,21 +2095,24 @@ operator_lshift::op1_range (irange &r, return false; if (shift == 0) { - r = lhs; + r.intersect (lhs); return true; } // Work completely in unsigned mode to start. tree utype = type; + int_range_max tmp_range; if (TYPE_SIGN (type) == SIGNED) { int_range_max tmp = lhs; utype = unsigned_type_for (type); range_cast (tmp, utype); - op_rshift.fold_range (r, utype, tmp, op2); + op_rshift.fold_range (tmp_range, utype, tmp, op2); } else - op_rshift.fold_range (r, utype, lhs, op2); + op_rshift.fold_range (tmp_range, utype, lhs, op2); + + r.intersect (tmp_range); // Start with ranges which can produce the LHS by right shifting the // result by the shift amount. @@ -2128,7 +2137,8 @@ operator_lshift::op1_range (irange &r, range_cast (r, type); return true; } - return false; + + return !r.varying_p (); } bool diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr102546.c b/gcc/testsuite/gcc.dg/tree-ssa/pr102546.c new file mode 100644 index 00000000000..4bd98747732 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr102546.c @@ -0,0 +1,23 @@ +// { dg-do compile } +// { dg-options "-O3 -fdump-tree-optimized" } + +static int a; +static char b, c, d; +void bar(void); +void foo(void); + +int main() { + int f = 0; + for (; f <= 5; f++) { + bar(); + b = b && f; + d = f << f; + if (!(a >= d || f)) + foo(); + c = 1; + for (; c; c = 0) + ; + } +} + +// { dg-final { scan-tree-dump-not "foo" "optimized" } } </cut>

3 years, 9 months

Re: [TCWG CI] 400.perlbench slowed down by 6% after llvm: [SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest

by Maxim Kuvyrkov

Hi Arthur, Thanks for looking into this! The flags to compile regexec.c were: -O3 --target=aarch64-linux-gnu -fgnu89-inline Clang was configured with (on x86_64-linux-gnu host): cmake -G Ninja ../llvm/llvm '-DLLVM_ENABLE_PROJECTS=clang;lld' -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=True -DCMAKE_INSTALL_PREFIX=../llvm-install -DLLVM_TARGETS_TO_BUILD=AArch64 Please let me know if the above doesn’t work for you. Regards, -- Maxim Kuvyrkov https://www.linaro.org > On 29 Sep 2021, at 20:47, Arthur Eubanks <aeubanks(a)google.com> wrote: > > Do you know the flags passed to Clang to compile the sources? I tried compiling the preprocessed sources but ran into the below, and couldn't find the flags in any of the logs. > > In file included from regexec.c:93: > In file included from ./perl.h:384: > In file included from /home/tcwg-buildslave/workspace/tcwg_bmk_0/abe/builds/destdir/x86_64-pc-linux-gnu/aarch64-linux-gnu/libc/usr/include/sys/types.h:144: > /home/tcwg-buildslave/workspace/tcwg_bmk_0/llvm-install/lib/clang/14.0.0/include/stddef.h:46:27: error: typedef redefinition with different types ('unsigned long' vs 'unsigned long long') > typedef long unsigned int size_t; > ^ > 1 error generated. > > > > And yeah just moving the code around could cause major performance regressions, I've had other patches do the same for various benchmarks, there's not much we can do about that if that's actually the root cause. If I can compile the file I can check if the optimization actually created worse IR or not. > > > On Wed, Sep 29, 2021 at 5:59 AM Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org> wrote: > Hi Arthur, > > Pre-processed source is in the save-temps tarballs linked below; S_regmatch() is in regexec.i . > > The save-temps also have .s assembly file for before and after your patch, and the only code-gen difference is in S_reginclass() function — see the attached screenshot #1. > > Looking into profile of S_regmatch(), some of the extra cycles come from hot loop starting with “cbz w19,...” getting misaligned — before your patch it was starting at "2bce10", and after it starts at "2bce6c”. > > Maybe the added instructions in S_reginclass() pushed the loop in S_regmatch() in an unfortunate way? > > -- > Maxim Kuvyrkov > https://www.linaro.org > >> On 27 Sep 2021, at 20:05, Arthur Eubanks <aeubanks(a)google.com> wrote: >> >> Could I get the source file with S_regmatch()? >> >> On Mon, Sep 27, 2021 at 6:07 AM Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org> wrote: >> Hi Arthur, >> >> Your patch seems to be slowing down 400.perlbench by 6% — due to slow down of its hot function S_regmatch() by 14%. >> >> Could you take a look if this is easily fixable, please? >> >> Regards, >> >> -- >> Maxim Kuvyrkov >> https://www.linaro.org >> >> > On 24 Sep 2021, at 15:07, ci_notify(a)linaro.org wrote: >> > >> > After llvm commit e7249e4acf3cf9438d6d9e02edecebd5b622a4dc >> > Author: Arthur Eubanks <aeubanks(a)google.com> >> > >> > [SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest >> > >> > the following benchmarks slowed down by more than 2%: >> > - 400.perlbench slowed down by 6% from 9730 to 10312 perf samples >> > - 400.perlbench:[.] S_regmatch slowed down by 14% from 3660 to 4188 perf samples >> > >> > Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI. >> > >> > For your convenience, we have uploaded tarballs with pre-processed source and assembly files at: >> > - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… >> > - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… >> > - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… >> > >> > Configuration: >> > - Benchmark: SPEC CPU2006 >> > - Toolchain: Clang + Glibc + LLVM Linker >> > - Version: all components were built from their tip of trunk >> > - Target: aarch64-linux-gnu >> > - Compiler flags: -O3 >> > - Hardware: NVidia TX1 4x Cortex-A57 >> > >> > This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports. > > <2021-09-29_15-44-27.png><2021-09-29_15-53-20.png>

3 years, 9 months

[TCWG CI] 456.hmmer slowed down by 5% after llvm: Revert "Allow rematerialization of virtual reg uses"

by ci_notify＠linaro.org

After llvm commit 08d7eec06e8cf5c15a96ce11f311f1480291a441 Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com> Revert "Allow rematerialization of virtual reg uses" the following benchmarks slowed down by more than 2%: - 456.hmmer slowed down by 5% from 7649 to 8028 perf samples Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI. For your convenience, we have uploaded tarballs with pre-processed source and assembly files at: - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… Configuration: - Benchmark: SPEC CPU2006 - Toolchain: Clang + Glibc + LLVM Linker - Version: all components were built from their tip of trunk - Target: arm-linux-gnueabihf - Compiler flags: -O2 -flto -marm - Hardware: NVidia TK1 4x Cortex-A15 This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports. THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O2_LTO First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… Reproduce builds: <cut> mkdir investigate-llvm-08d7eec06e8cf5c15a96ce11f311f1480291a441 cd investigate-llvm-08d7eec06e8cf5c15a96ce11f311f1480291a441 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach 08d7eec06e8cf5c15a96ce11f311f1480291a441 ../artifacts/test.sh # Reproduce last_good build git checkout --detach e8e2edd8ca88f8b0a7dba141349b2aa83284f3af ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 08d7eec06e8cf5c15a96ce11f311f1480291a441 Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com> Date: Fri Sep 24 09:53:51 2021 -0700 Revert "Allow rematerialization of virtual reg uses" Reverted due to two distcint performance regression reports. This reverts commit 92c1fd19abb15bc68b1127a26137a69e033cdb39. --- llvm/include/llvm/CodeGen/TargetInstrInfo.h | 12 +- llvm/lib/CodeGen/TargetInstrInfo.cpp | 9 +- llvm/test/CodeGen/AMDGPU/remat-sop.mir | 60 - llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll | 28 +- llvm/test/CodeGen/ARM/funnel-shift-rot.ll | 32 +- llvm/test/CodeGen/ARM/funnel-shift.ll | 30 +- .../test/CodeGen/ARM/illegal-bitfield-loadstore.ll | 30 +- llvm/test/CodeGen/ARM/neon-copy.ll | 10 +- llvm/test/CodeGen/Mips/llvm-ir/ashr.ll | 227 +- llvm/test/CodeGen/Mips/llvm-ir/lshr.ll | 206 +- llvm/test/CodeGen/Mips/llvm-ir/shl.ll | 95 +- llvm/test/CodeGen/Mips/llvm-ir/sub.ll | 31 +- llvm/test/CodeGen/Mips/tls.ll | 4 +- llvm/test/CodeGen/RISCV/atomic-rmw.ll | 120 +- llvm/test/CodeGen/RISCV/atomic-signext.ll | 24 +- llvm/test/CodeGen/RISCV/bswap-ctlz-cttz-ctpop.ll | 96 +- llvm/test/CodeGen/RISCV/mul.ll | 72 +- llvm/test/CodeGen/RISCV/rv32i-rv64i-half.ll | 12 +- llvm/test/CodeGen/RISCV/rv32zbb-zbp.ll | 270 +- llvm/test/CodeGen/RISCV/rv32zbb.ll | 94 +- llvm/test/CodeGen/RISCV/rv32zbp.ll | 262 +- llvm/test/CodeGen/RISCV/rv32zbt.ll | 206 +- .../CodeGen/RISCV/rvv/fixed-vectors-bitreverse.ll | 150 +- llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bswap.ll | 146 +- llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctlz.ll | 3584 ++++++++++---------- llvm/test/CodeGen/RISCV/rvv/fixed-vectors-cttz.ll | 664 ++-- llvm/test/CodeGen/RISCV/shifts.ll | 308 +- llvm/test/CodeGen/RISCV/srem-vector-lkk.ll | 208 +- llvm/test/CodeGen/RISCV/urem-vector-lkk.ll | 190 +- llvm/test/CodeGen/Thumb/dyn-stackalloc.ll | 7 +- .../tail-pred-disabled-in-loloops.ll | 14 +- .../LowOverheadLoops/varying-outer-2d-reduction.ll | 64 +- .../CodeGen/Thumb2/LowOverheadLoops/while-loops.ll | 67 +- llvm/test/CodeGen/Thumb2/ldr-str-imm12.ll | 30 +- llvm/test/CodeGen/Thumb2/mve-float16regloops.ll | 82 +- llvm/test/CodeGen/Thumb2/mve-float32regloops.ll | 98 +- llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll | 529 +-- llvm/test/CodeGen/X86/addcarry.ll | 20 +- llvm/test/CodeGen/X86/callbr-asm-blockplacement.ll | 12 +- llvm/test/CodeGen/X86/dag-update-nodetomatch.ll | 17 +- .../X86/delete-dead-instrs-with-live-uses.mir | 4 +- llvm/test/CodeGen/X86/inalloca-invoke.ll | 2 +- llvm/test/CodeGen/X86/licm-regpressure.ll | 28 +- llvm/test/CodeGen/X86/ragreedy-hoist-spill.ll | 40 +- llvm/test/CodeGen/X86/sdiv_fix.ll | 5 +- 45 files changed, 4093 insertions(+), 4106 deletions(-) diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h index a0c52e2f1a13..c394ac910be1 100644 --- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h +++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h @@ -117,11 +117,10 @@ public: const MachineFunction &MF) const; /// Return true if the instruction is trivially rematerializable, meaning it - /// has no side effects. Uses of constants and unallocatable physical - /// registers are always trivial to rematerialize so that the instructions - /// result is independent of the place in the function. Uses of virtual - /// registers are allowed but it is caller's responsility to ensure these - /// operands are valid at the point the instruction is beeing moved. + /// has no side effects and requires no operands that aren't always available. + /// This means the only allowed uses are constants and unallocatable physical + /// registers so that the instructions result is independent of the place + /// in the function. bool isTriviallyReMaterializable(const MachineInstr &MI, AAResults *AA = nullptr) const { return MI.getOpcode() == TargetOpcode::IMPLICIT_DEF || @@ -141,7 +140,8 @@ protected: /// set, this hook lets the target specify whether the instruction is actually /// trivially rematerializable, taking into consideration its operands. This /// predicate must return false if the instruction has any side effects other - /// than producing a value. + /// than producing a value, or if it requres any address registers that are + /// not always available. /// Requirements must be check as stated in isTriviallyReMaterializable() . virtual bool isReallyTriviallyReMaterializable(const MachineInstr &MI, AAResults *AA) const { diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp index fe7d60e0b7e2..1eab8e7443a7 100644 --- a/llvm/lib/CodeGen/TargetInstrInfo.cpp +++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp @@ -921,8 +921,7 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric( const MachineRegisterInfo &MRI = MF.getRegInfo(); // Remat clients assume operand 0 is the defined register. - if (!MI.getNumOperands() || !MI.getOperand(0).isReg() || - MI.getOperand(0).isTied()) + if (!MI.getNumOperands() || !MI.getOperand(0).isReg()) return false; Register DefReg = MI.getOperand(0).getReg(); @@ -984,6 +983,12 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric( // same virtual register, though. if (MO.isDef() && Reg != DefReg) return false; + + // Don't allow any virtual-register uses. Rematting an instruction with + // virtual register uses would length the live ranges of the uses, which + // is not necessarily a good idea, certainly not "trivial". + if (MO.isUse()) + return false; } // Everything checked out. diff --git a/llvm/test/CodeGen/AMDGPU/remat-sop.mir b/llvm/test/CodeGen/AMDGPU/remat-sop.mir index c9915aaabfde..ed799bfca028 100644 --- a/llvm/test/CodeGen/AMDGPU/remat-sop.mir +++ b/llvm/test/CodeGen/AMDGPU/remat-sop.mir @@ -51,66 +51,6 @@ body: | S_NOP 0, implicit %2 S_ENDPGM 0 ... -# The liverange of %0 covers a point of rematerialization, source value is -# availabe. ---- -name: test_remat_s_mov_b32_vreg_src_long_lr -tracksRegLiveness: true -machineFunctionInfo: - stackPtrOffsetReg: $sgpr32 -body: | - bb.0: - ; GCN-LABEL: name: test_remat_s_mov_b32_vreg_src_long_lr - ; GCN: renamable $sgpr0 = IMPLICIT_DEF - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: S_NOP 0, implicit killed renamable $sgpr0 - ; GCN: S_ENDPGM 0 - %0:sreg_32 = IMPLICIT_DEF - %1:sreg_32 = S_MOV_B32 %0:sreg_32 - %2:sreg_32 = S_MOV_B32 %0:sreg_32 - %3:sreg_32 = S_MOV_B32 %0:sreg_32 - S_NOP 0, implicit %1 - S_NOP 0, implicit %2 - S_NOP 0, implicit %3 - S_NOP 0, implicit %0 - S_ENDPGM 0 -... -# The liverange of %0 does not cover a point of rematerialization, source value is -# unavailabe and we do not want to artificially extend the liverange. ---- -name: test_no_remat_s_mov_b32_vreg_src_short_lr -tracksRegLiveness: true -machineFunctionInfo: - stackPtrOffsetReg: $sgpr32 -body: | - bb.0: - ; GCN-LABEL: name: test_no_remat_s_mov_b32_vreg_src_short_lr - ; GCN: renamable $sgpr0 = IMPLICIT_DEF - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.1, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.1, addrspace 5) - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.0, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.0, addrspace 5) - ; GCN: renamable $sgpr0 = S_MOV_B32 killed renamable $sgpr0 - ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.1, addrspace 5) - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.0, addrspace 5) - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: S_NOP 0, implicit killed renamable $sgpr0 - ; GCN: S_ENDPGM 0 - %0:sreg_32 = IMPLICIT_DEF - %1:sreg_32 = S_MOV_B32 %0:sreg_32 - %2:sreg_32 = S_MOV_B32 %0:sreg_32 - %3:sreg_32 = S_MOV_B32 %0:sreg_32 - S_NOP 0, implicit %1 - S_NOP 0, implicit %2 - S_NOP 0, implicit %3 - S_ENDPGM 0 -... --- name: test_remat_s_mov_b64 tracksRegLiveness: true diff --git a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll index 175a2069a441..a4243276c70a 100644 --- a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll +++ b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll @@ -29,20 +29,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon ; ENABLE-NEXT: pophs {r11, pc} ; ENABLE-NEXT: .LBB0_3: @ %while.body.preheader ; ENABLE-NEXT: movw r12, :lower16:skip -; ENABLE-NEXT: sub r3, r1, #1 +; ENABLE-NEXT: sub r1, r1, #1 ; ENABLE-NEXT: movt r12, :upper16:skip ; ENABLE-NEXT: .LBB0_4: @ %while.body ; ENABLE-NEXT: @ =>This Inner Loop Header: Depth=1 -; ENABLE-NEXT: ldrb r1, [r0] -; ENABLE-NEXT: ldrb r1, [r12, r1] -; ENABLE-NEXT: add r0, r0, r1 -; ENABLE-NEXT: sub r1, r3, #1 -; ENABLE-NEXT: cmp r1, r3 +; ENABLE-NEXT: ldrb r3, [r0] +; ENABLE-NEXT: ldrb r3, [r12, r3] +; ENABLE-NEXT: add r0, r0, r3 +; ENABLE-NEXT: sub r3, r1, #1 +; ENABLE-NEXT: cmp r3, r1 ; ENABLE-NEXT: bhs .LBB0_6 ; ENABLE-NEXT: @ %bb.5: @ %while.body ; ENABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 ; ENABLE-NEXT: cmp r0, r2 -; ENABLE-NEXT: mov r3, r1 +; ENABLE-NEXT: mov r1, r3 ; ENABLE-NEXT: blo .LBB0_4 ; ENABLE-NEXT: .LBB0_6: @ %if.end29 ; ENABLE-NEXT: pop {r11, pc} @@ -119,20 +119,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon ; DISABLE-NEXT: pophs {r11, pc} ; DISABLE-NEXT: .LBB0_3: @ %while.body.preheader ; DISABLE-NEXT: movw r12, :lower16:skip -; DISABLE-NEXT: sub r3, r1, #1 +; DISABLE-NEXT: sub r1, r1, #1 ; DISABLE-NEXT: movt r12, :upper16:skip ; DISABLE-NEXT: .LBB0_4: @ %while.body ; DISABLE-NEXT: @ =>This Inner Loop Header: Depth=1 -; DISABLE-NEXT: ldrb r1, [r0] -; DISABLE-NEXT: ldrb r1, [r12, r1] -; DISABLE-NEXT: add r0, r0, r1 -; DISABLE-NEXT: sub r1, r3, #1 -; DISABLE-NEXT: cmp r1, r3 +; DISABLE-NEXT: ldrb r3, [r0] +; DISABLE-NEXT: ldrb r3, [r12, r3] +; DISABLE-NEXT: add r0, r0, r3 +; DISABLE-NEXT: sub r3, r1, #1 +; DISABLE-NEXT: cmp r3, r1 ; DISABLE-NEXT: bhs .LBB0_6 ; DISABLE-NEXT: @ %bb.5: @ %while.body ; DISABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 ; DISABLE-NEXT: cmp r0, r2 -; DISABLE-NEXT: mov r3, r1 +; DISABLE-NEXT: mov r1, r3 ; DISABLE-NEXT: blo .LBB0_4 ; DISABLE-NEXT: .LBB0_6: @ %if.end29 ; DISABLE-NEXT: pop {r11, pc} diff --git a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll index ea15fcc5c824..55157875d355 100644 --- a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll +++ b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll @@ -73,13 +73,13 @@ define i64 @rotl_i64(i64 %x, i64 %z) { ; SCALAR-NEXT: push {r4, r5, r11, lr} ; SCALAR-NEXT: rsb r3, r2, #0 ; SCALAR-NEXT: and r4, r2, #63 -; SCALAR-NEXT: and r12, r3, #63 -; SCALAR-NEXT: rsb r3, r12, #32 +; SCALAR-NEXT: and lr, r3, #63 +; SCALAR-NEXT: rsb r3, lr, #32 ; SCALAR-NEXT: lsl r2, r0, r4 -; SCALAR-NEXT: lsr lr, r0, r12 -; SCALAR-NEXT: orr r3, lr, r1, lsl r3 -; SCALAR-NEXT: subs lr, r12, #32 -; SCALAR-NEXT: lsrpl r3, r1, lr +; SCALAR-NEXT: lsr r12, r0, lr +; SCALAR-NEXT: orr r3, r12, r1, lsl r3 +; SCALAR-NEXT: subs r12, lr, #32 +; SCALAR-NEXT: lsrpl r3, r1, r12 ; SCALAR-NEXT: subs r5, r4, #32 ; SCALAR-NEXT: movwpl r2, #0 ; SCALAR-NEXT: cmp r5, #0 @@ -88,8 +88,8 @@ define i64 @rotl_i64(i64 %x, i64 %z) { ; SCALAR-NEXT: lsr r3, r0, r3 ; SCALAR-NEXT: orr r3, r3, r1, lsl r4 ; SCALAR-NEXT: lslpl r3, r0, r5 -; SCALAR-NEXT: lsr r0, r1, r12 -; SCALAR-NEXT: cmp lr, #0 +; SCALAR-NEXT: lsr r0, r1, lr +; SCALAR-NEXT: cmp r12, #0 ; SCALAR-NEXT: movwpl r0, #0 ; SCALAR-NEXT: orr r1, r3, r0 ; SCALAR-NEXT: mov r0, r2 @@ -245,15 +245,15 @@ define i64 @rotr_i64(i64 %x, i64 %z) { ; CHECK: @ %bb.0: ; CHECK-NEXT: .save {r4, r5, r11, lr} ; CHECK-NEXT: push {r4, r5, r11, lr} -; CHECK-NEXT: and r12, r2, #63 +; CHECK-NEXT: and lr, r2, #63 ; CHECK-NEXT: rsb r2, r2, #0 -; CHECK-NEXT: rsb r3, r12, #32 +; CHECK-NEXT: rsb r3, lr, #32 ; CHECK-NEXT: and r4, r2, #63 -; CHECK-NEXT: lsr lr, r0, r12 -; CHECK-NEXT: orr r3, lr, r1, lsl r3 -; CHECK-NEXT: subs lr, r12, #32 +; CHECK-NEXT: lsr r12, r0, lr +; CHECK-NEXT: orr r3, r12, r1, lsl r3 +; CHECK-NEXT: subs r12, lr, #32 ; CHECK-NEXT: lsl r2, r0, r4 -; CHECK-NEXT: lsrpl r3, r1, lr +; CHECK-NEXT: lsrpl r3, r1, r12 ; CHECK-NEXT: subs r5, r4, #32 ; CHECK-NEXT: movwpl r2, #0 ; CHECK-NEXT: cmp r5, #0 @@ -262,8 +262,8 @@ define i64 @rotr_i64(i64 %x, i64 %z) { ; CHECK-NEXT: lsr r3, r0, r3 ; CHECK-NEXT: orr r3, r3, r1, lsl r4 ; CHECK-NEXT: lslpl r3, r0, r5 -; CHECK-NEXT: lsr r0, r1, r12 -; CHECK-NEXT: cmp lr, #0 +; CHECK-NEXT: lsr r0, r1, lr +; CHECK-NEXT: cmp r12, #0 ; CHECK-NEXT: movwpl r0, #0 ; CHECK-NEXT: orr r1, r0, r3 ; CHECK-NEXT: mov r0, r2 diff --git a/llvm/test/CodeGen/ARM/funnel-shift.ll b/llvm/test/CodeGen/ARM/funnel-shift.ll index 6372f9be2ca3..54c93b493c98 100644 --- a/llvm/test/CodeGen/ARM/funnel-shift.ll +++ b/llvm/test/CodeGen/ARM/funnel-shift.ll @@ -224,31 +224,31 @@ define i37 @fshr_i37(i37 %x, i37 %y, i37 %z) { ; CHECK-NEXT: mov r3, #0 ; CHECK-NEXT: bl __aeabi_uldivmod ; CHECK-NEXT: add r0, r2, #27 -; CHECK-NEXT: lsl r2, r7, #27 -; CHECK-NEXT: and r12, r0, #63 ; CHECK-NEXT: lsl r6, r6, #27 +; CHECK-NEXT: and r1, r0, #63 +; CHECK-NEXT: lsl r2, r7, #27 ; CHECK-NEXT: orr r7, r6, r7, lsr #5 -; CHECK-NEXT: rsb r3, r12, #32 -; CHECK-NEXT: lsr r2, r2, r12 ; CHECK-NEXT: mov r6, #63 -; CHECK-NEXT: orr r2, r2, r7, lsl r3 -; CHECK-NEXT: subs r3, r12, #32 +; CHECK-NEXT: rsb r3, r1, #32 +; CHECK-NEXT: lsr r2, r2, r1 +; CHECK-NEXT: subs r12, r1, #32 ; CHECK-NEXT: bic r6, r6, r0 +; CHECK-NEXT: orr r2, r2, r7, lsl r3 ; CHECK-NEXT: lsl r5, r9, #1 -; CHECK-NEXT: lsrpl r2, r7, r3 -; CHECK-NEXT: subs r1, r6, #32 +; CHECK-NEXT: lsrpl r2, r7, r12 ; CHECK-NEXT: lsl r0, r5, r6 -; CHECK-NEXT: lsl r4, r8, #1 +; CHECK-NEXT: subs r4, r6, #32 +; CHECK-NEXT: lsl r3, r8, #1 ; CHECK-NEXT: movwpl r0, #0 -; CHECK-NEXT: orr r4, r4, r9, lsr #31 +; CHECK-NEXT: orr r3, r3, r9, lsr #31 ; CHECK-NEXT: orr r0, r0, r2 ; CHECK-NEXT: rsb r2, r6, #32 -; CHECK-NEXT: cmp r1, #0 +; CHECK-NEXT: cmp r4, #0 +; CHECK-NEXT: lsr r1, r7, r1 ; CHECK-NEXT: lsr r2, r5, r2 -; CHECK-NEXT: orr r2, r2, r4, lsl r6 -; CHECK-NEXT: lslpl r2, r5, r1 -; CHECK-NEXT: lsr r1, r7, r12 -; CHECK-NEXT: cmp r3, #0 +; CHECK-NEXT: orr r2, r2, r3, lsl r6 +; CHECK-NEXT: lslpl r2, r5, r4 +; CHECK-NEXT: cmp r12, #0 ; CHECK-NEXT: movwpl r1, #0 ; CHECK-NEXT: orr r1, r2, r1 ; CHECK-NEXT: pop {r4, r5, r6, r7, r8, r9, r11, pc} diff --git a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll index 0a0bb62b0a09..2922e0ed5423 100644 --- a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll +++ b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll @@ -91,17 +91,17 @@ define void @i56_or(i56* %a) { ; BE-LABEL: i56_or: ; BE: @ %bb.0: ; BE-NEXT: mov r1, r0 +; BE-NEXT: ldr r12, [r0] ; BE-NEXT: ldrh r2, [r1, #4]! ; BE-NEXT: ldrb r3, [r1, #2] ; BE-NEXT: orr r2, r3, r2, lsl #8 -; BE-NEXT: ldr r3, [r0] -; BE-NEXT: orr r2, r2, r3, lsl #24 -; BE-NEXT: orr r12, r2, #384 -; BE-NEXT: strb r12, [r1, #2] -; BE-NEXT: lsr r2, r12, #8 -; BE-NEXT: strh r2, [r1] -; BE-NEXT: bic r1, r3, #255 -; BE-NEXT: orr r1, r1, r12, lsr #24 +; BE-NEXT: orr r2, r2, r12, lsl #24 +; BE-NEXT: orr r2, r2, #384 +; BE-NEXT: strb r2, [r1, #2] +; BE-NEXT: lsr r3, r2, #8 +; BE-NEXT: strh r3, [r1] +; BE-NEXT: bic r1, r12, #255 +; BE-NEXT: orr r1, r1, r2, lsr #24 ; BE-NEXT: str r1, [r0] ; BE-NEXT: mov pc, lr %aa = load i56, i56* %a @@ -127,13 +127,13 @@ define void @i56_and_or(i56* %a) { ; BE-NEXT: ldrb r3, [r1, #2] ; BE-NEXT: strb r2, [r1, #2] ; BE-NEXT: orr r2, r3, r12, lsl #8 -; BE-NEXT: ldr r3, [r0] -; BE-NEXT: orr r2, r2, r3, lsl #24 -; BE-NEXT: orr r12, r2, #384 -; BE-NEXT: lsr r2, r12, #8 -; BE-NEXT: strh r2, [r1] -; BE-NEXT: bic r1, r3, #255 -; BE-NEXT: orr r1, r1, r12, lsr #24 +; BE-NEXT: ldr r12, [r0] +; BE-NEXT: orr r2, r2, r12, lsl #24 +; BE-NEXT: orr r2, r2, #384 +; BE-NEXT: lsr r3, r2, #8 +; BE-NEXT: strh r3, [r1] +; BE-NEXT: bic r1, r12, #255 +; BE-NEXT: orr r1, r1, r2, lsr #24 ; BE-NEXT: str r1, [r0] ; BE-NEXT: mov pc, lr diff --git a/llvm/test/CodeGen/ARM/neon-copy.ll b/llvm/test/CodeGen/ARM/neon-copy.ll index 46490efb6631..09a991da2e59 100644 --- a/llvm/test/CodeGen/ARM/neon-copy.ll +++ b/llvm/test/CodeGen/ARM/neon-copy.ll @@ -1340,16 +1340,16 @@ define <4 x i16> @test_extracts_inserts_varidx_insert(<8 x i16> %x, i32 %idx) { ; CHECK-NEXT: .pad #8 ; CHECK-NEXT: sub sp, sp, #8 ; CHECK-NEXT: vmov.u16 r1, d0[1] -; CHECK-NEXT: and r12, r0, #3 +; CHECK-NEXT: and r0, r0, #3 ; CHECK-NEXT: vmov.u16 r2, d0[2] -; CHECK-NEXT: mov r0, sp -; CHECK-NEXT: vmov.u16 r3, d0[3] -; CHECK-NEXT: orr r0, r0, r12, lsl #1 +; CHECK-NEXT: mov r3, sp +; CHECK-NEXT: vmov.u16 r12, d0[3] +; CHECK-NEXT: orr r0, r3, r0, lsl #1 ; CHECK-NEXT: vst1.16 {d0[0]}, [r0:16] ; CHECK-NEXT: vldr d0, [sp] ; CHECK-NEXT: vmov.16 d0[1], r1 ; CHECK-NEXT: vmov.16 d0[2], r2 -; CHECK-NEXT: vmov.16 d0[3], r3 +; CHECK-NEXT: vmov.16 d0[3], r12 ; CHECK-NEXT: add sp, sp, #8 ; CHECK-NEXT: bx lr %tmp = extractelement <8 x i16> %x, i32 0 diff --git a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll index a125446b27c3..8be7100d368b 100644 --- a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll +++ b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll @@ -766,85 +766,79 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { ; MMR3-NEXT: .cfi_offset 17, -4 ; MMR3-NEXT: .cfi_offset 16, -8 ; MMR3-NEXT: move $8, $7 -; MMR3-NEXT: move $2, $6 -; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill -; MMR3-NEXT: sw $4, 12($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $6, 32($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $5, 36($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill ; MMR3-NEXT: lw $16, 76($sp) -; MMR3-NEXT: srlv $3, $7, $16 -; MMR3-NEXT: not16 $6, $16 -; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill -; MMR3-NEXT: move $4, $2 -; MMR3-NEXT: sw $2, 32($sp) # 4-byte Folded Spill -; MMR3-NEXT: sll16 $2, $2, 1 -; MMR3-NEXT: sllv $2, $2, $6 -; MMR3-NEXT: li16 $6, 64 -; MMR3-NEXT: or16 $2, $3 -; MMR3-NEXT: srlv $4, $4, $16 -; MMR3-NEXT: sw $4, 16($sp) # 4-byte Folded Spill -; MMR3-NEXT: subu16 $7, $6, $16 +; MMR3-NEXT: srlv $4, $7, $16 +; MMR3-NEXT: not16 $3, $16 +; MMR3-NEXT: sw $3, 24($sp) # 4-byte Folded Spill +; MMR3-NEXT: sll16 $2, $6, 1 +; MMR3-NEXT: sllv $3, $2, $3 +; MMR3-NEXT: li16 $2, 64 +; MMR3-NEXT: or16 $3, $4 +; MMR3-NEXT: srlv $6, $6, $16 +; MMR3-NEXT: sw $6, 12($sp) # 4-byte Folded Spill +; MMR3-NEXT: subu16 $7, $2, $16 ; MMR3-NEXT: sllv $9, $5, $7 -; MMR3-NEXT: andi16 $5, $7, 32 -; MMR3-NEXT: sw $5, 28($sp) # 4-byte Folded Spill -; MMR3-NEXT: andi16 $6, $16, 32 -; MMR3-NEXT: sw $6, 36($sp) # 4-byte Folded Spill -; MMR3-NEXT: move $3, $9 +; MMR3-NEXT: andi16 $2, $7, 32 +; MMR3-NEXT: sw $2, 28($sp) # 4-byte Folded Spill +; MMR3-NEXT: andi16 $5, $16, 32 +; MMR3-NEXT: sw $5, 16($sp) # 4-byte Folded Spill +; MMR3-NEXT: move $4, $9 ; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $3, $17, $5 -; MMR3-NEXT: movn $2, $4, $6 -; MMR3-NEXT: addiu $4, $16, -64 -; MMR3-NEXT: lw $17, 0($sp) # 4-byte Folded Reload -; MMR3-NEXT: srlv $4, $17, $4 -; MMR3-NEXT: sw $4, 20($sp) # 4-byte Folded Spill -; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload -; MMR3-NEXT: sll16 $4, $6, 1 -; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: addiu $5, $16, -64 -; MMR3-NEXT: not16 $5, $5 -; MMR3-NEXT: sllv $5, $4, $5 -; MMR3-NEXT: or16 $2, $3 -; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload -; MMR3-NEXT: or16 $5, $3 -; MMR3-NEXT: addiu $3, $16, -64 -; MMR3-NEXT: srav $1, $6, $3 -; MMR3-NEXT: andi16 $3, $3, 32 -; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $5, $1, $3 -; MMR3-NEXT: sllv $3, $6, $7 -; MMR3-NEXT: sw $3, 4($sp) # 4-byte Folded Spill -; MMR3-NEXT: not16 $3, $7 -; MMR3-NEXT: srl16 $4, $17, 1 -; MMR3-NEXT: srlv $3, $4, $3 +; MMR3-NEXT: movn $4, $17, $2 +; MMR3-NEXT: movn $3, $6, $5 +; MMR3-NEXT: addiu $2, $16, -64 +; MMR3-NEXT: lw $5, 36($sp) # 4-byte Folded Reload +; MMR3-NEXT: srlv $5, $5, $2 +; MMR3-NEXT: sw $5, 20($sp) # 4-byte Folded Spill +; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload +; MMR3-NEXT: sll16 $6, $17, 1 +; MMR3-NEXT: sw $6, 4($sp) # 4-byte Folded Spill +; MMR3-NEXT: not16 $5, $2 +; MMR3-NEXT: sllv $5, $6, $5 +; MMR3-NEXT: or16 $3, $4 +; MMR3-NEXT: lw $4, 20($sp) # 4-byte Folded Reload +; MMR3-NEXT: or16 $5, $4 +; MMR3-NEXT: srav $1, $17, $2 +; MMR3-NEXT: andi16 $2, $2, 32 +; MMR3-NEXT: sw $2, 20($sp) # 4-byte Folded Spill +; MMR3-NEXT: movn $5, $1, $2 +; MMR3-NEXT: sllv $2, $17, $7 +; MMR3-NEXT: not16 $4, $7 +; MMR3-NEXT: lw $7, 36($sp) # 4-byte Folded Reload +; MMR3-NEXT: srl16 $6, $7, 1 +; MMR3-NEXT: srlv $6, $6, $4 ; MMR3-NEXT: sltiu $10, $16, 64 -; MMR3-NEXT: movn $5, $2, $10 -; MMR3-NEXT: lw $2, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $5, $3, $10 +; MMR3-NEXT: or16 $6, $2 +; MMR3-NEXT: srlv $2, $7, $16 +; MMR3-NEXT: lw $3, 24($sp) # 4-byte Folded Reload +; MMR3-NEXT: lw $4, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: sllv $3, $4, $3 ; MMR3-NEXT: or16 $3, $2 -; MMR3-NEXT: srlv $2, $17, $16 -; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload -; MMR3-NEXT: lw $7, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: sllv $17, $7, $4 -; MMR3-NEXT: or16 $17, $2 -; MMR3-NEXT: srav $11, $6, $16 -; MMR3-NEXT: lw $2, 36($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $17, $11, $2 -; MMR3-NEXT: sra $2, $6, 31 +; MMR3-NEXT: srav $11, $17, $16 +; MMR3-NEXT: lw $4, 16($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $3, $11, $4 +; MMR3-NEXT: sra $2, $17, 31 ; MMR3-NEXT: movz $5, $8, $16 -; MMR3-NEXT: move $4, $2 -; MMR3-NEXT: movn $4, $17, $10 -; MMR3-NEXT: lw $6, 28($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $3, $9, $6 -; MMR3-NEXT: lw $6, 36($sp) # 4-byte Folded Reload -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: lw $7, 16($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $7, $17, $6 -; MMR3-NEXT: or16 $7, $3 +; MMR3-NEXT: move $8, $2 +; MMR3-NEXT: movn $8, $3, $10 +; MMR3-NEXT: lw $3, 28($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $6, $9, $3 +; MMR3-NEXT: li16 $3, 0 +; MMR3-NEXT: lw $7, 12($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $7, $3, $4 +; MMR3-NEXT: or16 $7, $6 ; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload ; MMR3-NEXT: movn $1, $2, $3 ; MMR3-NEXT: movn $1, $7, $10 ; MMR3-NEXT: lw $3, 32($sp) # 4-byte Folded Reload ; MMR3-NEXT: movz $1, $3, $16 -; MMR3-NEXT: movn $11, $2, $6 +; MMR3-NEXT: movn $11, $2, $4 ; MMR3-NEXT: movn $2, $11, $10 -; MMR3-NEXT: move $3, $4 +; MMR3-NEXT: move $3, $8 ; MMR3-NEXT: move $4, $1 ; MMR3-NEXT: lwp $16, 40($sp) ; MMR3-NEXT: addiusp 48 @@ -858,80 +852,79 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { ; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill ; MMR6-NEXT: .cfi_offset 17, -4 ; MMR6-NEXT: .cfi_offset 16, -8 -; MMR6-NEXT: move $12, $7 +; MMR6-NEXT: move $1, $7 ; MMR6-NEXT: lw $3, 44($sp) ; MMR6-NEXT: li16 $2, 64 -; MMR6-NEXT: subu16 $16, $2, $3 -; MMR6-NEXT: sllv $1, $5, $16 -; MMR6-NEXT: andi16 $2, $16, 32 -; MMR6-NEXT: selnez $8, $1, $2 -; MMR6-NEXT: sllv $9, $4, $16 -; MMR6-NEXT: not16 $16, $16 -; MMR6-NEXT: srl16 $17, $5, 1 -; MMR6-NEXT: srlv $10, $17, $16 -; MMR6-NEXT: or $9, $9, $10 -; MMR6-NEXT: seleqz $9, $9, $2 -; MMR6-NEXT: or $8, $8, $9 -; MMR6-NEXT: srlv $9, $7, $3 -; MMR6-NEXT: not16 $7, $3 -; MMR6-NEXT: sw $7, 4($sp) # 4-byte Folded Spill +; MMR6-NEXT: subu16 $7, $2, $3 +; MMR6-NEXT: sllv $8, $5, $7 +; MMR6-NEXT: andi16 $2, $7, 32 +; MMR6-NEXT: selnez $9, $8, $2 +; MMR6-NEXT: sllv $10, $4, $7 +; MMR6-NEXT: not16 $7, $7 +; MMR6-NEXT: srl16 $16, $5, 1 +; MMR6-NEXT: srlv $7, $16, $7 +; MMR6-NEXT: or $7, $10, $7 +; MMR6-NEXT: seleqz $7, $7, $2 +; MMR6-NEXT: or $7, $9, $7 +; MMR6-NEXT: srlv $9, $1, $3 +; MMR6-NEXT: not16 $16, $3 +; MMR6-NEXT: sw $16, 4($sp) # 4-byte Folded Spill ; MMR6-NEXT: sll16 $17, $6, 1 -; MMR6-NEXT: sllv $10, $17, $7 +; MMR6-NEXT: sllv $10, $17, $16 ; MMR6-NEXT: or $9, $10, $9 ; MMR6-NEXT: andi16 $17, $3, 32 ; MMR6-NEXT: seleqz $9, $9, $17 ; MMR6-NEXT: srlv $10, $6, $3 ; MMR6-NEXT: selnez $11, $10, $17 ; MMR6-NEXT: seleqz $10, $10, $17 -; MMR6-NEXT: or $8, $10, $8 -; MMR6-NEXT: seleqz $1, $1, $2 -; MMR6-NEXT: or $9, $11, $9 +; MMR6-NEXT: or $10, $10, $7 +; MMR6-NEXT: seleqz $12, $8, $2 +; MMR6-NEXT: or $8, $11, $9 ; MMR6-NEXT: addiu $2, $3, -64 -; MMR6-NEXT: srlv $10, $5, $2 +; MMR6-NEXT: srlv $9, $5, $2 ; MMR6-NEXT: sll16 $7, $4, 1 ; MMR6-NEXT: not16 $16, $2 ; MMR6-NEXT: sllv $11, $7, $16 ; MMR6-NEXT: sltiu $13, $3, 64 -; MMR6-NEXT: or $1, $9, $1 -; MMR6-NEXT: selnez $8, $8, $13 -; MMR6-NEXT: or $9, $11, $10 -; MMR6-NEXT: srav $10, $4, $2 +; MMR6-NEXT: or $8, $8, $12 +; MMR6-NEXT: selnez $10, $10, $13 +; MMR6-NEXT: or $9, $11, $9 +; MMR6-NEXT: srav $11, $4, $2 ; MMR6-NEXT: andi16 $2, $2, 32 -; MMR6-NEXT: seleqz $11, $10, $2 +; MMR6-NEXT: seleqz $12, $11, $2 ; MMR6-NEXT: sra $14, $4, 31 ; MMR6-NEXT: selnez $15, $14, $2 ; MMR6-NEXT: seleqz $9, $9, $2 -; MMR6-NEXT: or $11, $15, $11 -; MMR6-NEXT: seleqz $11, $11, $13 -; MMR6-NEXT: selnez $2, $10, $2 -; MMR6-NEXT: seleqz $10, $14, $13 -; MMR6-NEXT: or $8, $8, $11 -; MMR6-NEXT: selnez $8, $8, $3 -; MMR6-NEXT: selnez $1, $1, $13 +; MMR6-NEXT: or $12, $15, $12 +; MMR6-NEXT: seleqz $12, $12, $13 +; MMR6-NEXT: selnez $2, $11, $2 +; MMR6-NEXT: seleqz $11, $14, $13 +; MMR6-NEXT: or $10, $10, $12 +; MMR6-NEXT: selnez $10, $10, $3 +; MMR6-NEXT: selnez $8, $8, $13 ; MMR6-NEXT: or $2, $2, $9 ; MMR6-NEXT: srav $9, $4, $3 ; MMR6-NEXT: seleqz $4, $9, $17 -; MMR6-NEXT: selnez $11, $14, $17 -; MMR6-NEXT: or $4, $11, $4 -; MMR6-NEXT: selnez $11, $4, $13 +; MMR6-NEXT: selnez $12, $14, $17 +; MMR6-NEXT: or $4, $12, $4 +; MMR6-NEXT: selnez $12, $4, $13 ; MMR6-NEXT: seleqz $2, $2, $13 ; MMR6-NEXT: seleqz $4, $6, $3 -; MMR6-NEXT: seleqz $6, $12, $3 +; MMR6-NEXT: seleqz $1, $1, $3 +; MMR6-NEXT: or $2, $8, $2 +; MMR6-NEXT: selnez $2, $2, $3 ; MMR6-NEXT: or $1, $1, $2 -; MMR6-NEXT: selnez $1, $1, $3 -; MMR6-NEXT: or $1, $6, $1 -; MMR6-NEXT: or $4, $4, $8 -; MMR6-NEXT: or $6, $11, $10 -; MMR6-NEXT: srlv $2, $5, $3 -; MMR6-NEXT: lw $3, 4($sp) # 4-byte Folded Reload -; MMR6-NEXT: sllv $3, $7, $3 -; MMR6-NEXT: or $2, $3, $2 -; MMR6-NEXT: seleqz $2, $2, $17 -; MMR6-NEXT: selnez $3, $9, $17 -; MMR6-NEXT: or $2, $3, $2 -; MMR6-NEXT: selnez $2, $2, $13 -; MMR6-NEXT: or $3, $2, $10 -; MMR6-NEXT: move $2, $6 +; MMR6-NEXT: or $4, $4, $10 +; MMR6-NEXT: or $2, $12, $11 +; MMR6-NEXT: srlv $3, $5, $3 +; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload +; MMR6-NEXT: sllv $5, $7, $5 +; MMR6-NEXT: or $3, $5, $3 +; MMR6-NEXT: seleqz $3, $3, $17 +; MMR6-NEXT: selnez $5, $9, $17 +; MMR6-NEXT: or $3, $5, $3 +; MMR6-NEXT: selnez $3, $3, $13 +; MMR6-NEXT: or $3, $3, $11 ; MMR6-NEXT: move $5, $1 ; MMR6-NEXT: lw $16, 8($sp) # 4-byte Folded Reload ; MMR6-NEXT: lw $17, 12($sp) # 4-byte Folded Reload diff --git a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll index e4b4b3ae1d0f..ed2bfc9fcf60 100644 --- a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll +++ b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll @@ -776,77 +776,76 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { ; MMR3-NEXT: .cfi_offset 17, -4 ; MMR3-NEXT: .cfi_offset 16, -8 ; MMR3-NEXT: move $8, $7 -; MMR3-NEXT: sw $5, 4($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill ; MMR3-NEXT: sw $4, 28($sp) # 4-byte Folded Spill ; MMR3-NEXT: lw $16, 68($sp) ; MMR3-NEXT: li16 $2, 64 -; MMR3-NEXT: subu16 $17, $2, $16 -; MMR3-NEXT: sllv $9, $5, $17 -; MMR3-NEXT: andi16 $3, $17, 32 +; MMR3-NEXT: subu16 $7, $2, $16 +; MMR3-NEXT: sllv $9, $5, $7 +; MMR3-NEXT: move $17, $5 +; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill +; MMR3-NEXT: andi16 $3, $7, 32 ; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill ; MMR3-NEXT: li16 $2, 0 ; MMR3-NEXT: move $4, $9 ; MMR3-NEXT: movn $4, $2, $3 -; MMR3-NEXT: srlv $5, $7, $16 +; MMR3-NEXT: srlv $5, $8, $16 ; MMR3-NEXT: not16 $3, $16 ; MMR3-NEXT: sw $3, 16($sp) # 4-byte Folded Spill ; MMR3-NEXT: sll16 $2, $6, 1 -; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill ; MMR3-NEXT: sllv $2, $2, $3 ; MMR3-NEXT: or16 $2, $5 -; MMR3-NEXT: srlv $7, $6, $16 +; MMR3-NEXT: srlv $5, $6, $16 +; MMR3-NEXT: sw $5, 4($sp) # 4-byte Folded Spill ; MMR3-NEXT: andi16 $3, $16, 32 ; MMR3-NEXT: sw $3, 12($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $2, $7, $3 +; MMR3-NEXT: movn $2, $5, $3 ; MMR3-NEXT: addiu $3, $16, -64 ; MMR3-NEXT: or16 $2, $4 -; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload -; MMR3-NEXT: srlv $3, $6, $3 -; MMR3-NEXT: sw $3, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: lw $3, 28($sp) # 4-byte Folded Reload -; MMR3-NEXT: sll16 $4, $3, 1 -; MMR3-NEXT: sw $4, 0($sp) # 4-byte Folded Spill -; MMR3-NEXT: addiu $5, $16, -64 -; MMR3-NEXT: not16 $5, $5 -; MMR3-NEXT: sllv $5, $4, $5 -; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: or16 $5, $4 -; MMR3-NEXT: addiu $4, $16, -64 -; MMR3-NEXT: srlv $1, $3, $4 -; MMR3-NEXT: andi16 $4, $4, 32 +; MMR3-NEXT: srlv $4, $17, $3 ; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $5, $1, $4 +; MMR3-NEXT: lw $4, 28($sp) # 4-byte Folded Reload +; MMR3-NEXT: sll16 $6, $4, 1 +; MMR3-NEXT: not16 $5, $3 +; MMR3-NEXT: sllv $5, $6, $5 +; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload +; MMR3-NEXT: or16 $5, $17 +; MMR3-NEXT: srlv $1, $4, $3 +; MMR3-NEXT: andi16 $3, $3, 32 +; MMR3-NEXT: sw $3, 8($sp) # 4-byte Folded Spill +; MMR3-NEXT: movn $5, $1, $3 ; MMR3-NEXT: sltiu $10, $16, 64 ; MMR3-NEXT: movn $5, $2, $10 -; MMR3-NEXT: sllv $2, $3, $17 -; MMR3-NEXT: not16 $3, $17 -; MMR3-NEXT: srl16 $4, $6, 1 +; MMR3-NEXT: sllv $2, $4, $7 +; MMR3-NEXT: not16 $3, $7 +; MMR3-NEXT: lw $7, 0($sp) # 4-byte Folded Reload +; MMR3-NEXT: srl16 $4, $7, 1 ; MMR3-NEXT: srlv $4, $4, $3 ; MMR3-NEXT: or16 $4, $2 -; MMR3-NEXT: srlv $2, $6, $16 +; MMR3-NEXT: srlv $2, $7, $16 ; MMR3-NEXT: lw $3, 16($sp) # 4-byte Folded Reload -; MMR3-NEXT: lw $6, 0($sp) # 4-byte Folded Reload ; MMR3-NEXT: sllv $3, $6, $3 ; MMR3-NEXT: or16 $3, $2 ; MMR3-NEXT: lw $2, 28($sp) # 4-byte Folded Reload ; MMR3-NEXT: srlv $2, $2, $16 -; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $3, $2, $6 +; MMR3-NEXT: lw $17, 12($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $3, $2, $17 ; MMR3-NEXT: movz $5, $8, $16 -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movz $3, $17, $10 -; MMR3-NEXT: lw $17, 20($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $4, $9, $17 -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $7, $17, $6 -; MMR3-NEXT: or16 $7, $4 +; MMR3-NEXT: li16 $6, 0 +; MMR3-NEXT: movz $3, $6, $10 +; MMR3-NEXT: lw $7, 20($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $4, $9, $7 +; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: li16 $7, 0 +; MMR3-NEXT: movn $6, $7, $17 +; MMR3-NEXT: or16 $6, $4 ; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $1, $17, $4 -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $1, $7, $10 +; MMR3-NEXT: movn $1, $7, $4 +; MMR3-NEXT: li16 $7, 0 +; MMR3-NEXT: movn $1, $6, $10 ; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload ; MMR3-NEXT: movz $1, $4, $16 -; MMR3-NEXT: movn $2, $17, $6 +; MMR3-NEXT: movn $2, $7, $17 ; MMR3-NEXT: li16 $4, 0 ; MMR3-NEXT: movz $2, $4, $10 ; MMR3-NEXT: move $4, $1 @@ -856,91 +855,98 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { ; ; MMR6-LABEL: lshr_i128: ; MMR6: # %bb.0: # %entry -; MMR6-NEXT: addiu $sp, $sp, -24 -; MMR6-NEXT: .cfi_def_cfa_offset 24 -; MMR6-NEXT: sw $17, 20($sp) # 4-byte Folded Spill -; MMR6-NEXT: sw $16, 16($sp) # 4-byte Folded Spill +; MMR6-NEXT: addiu $sp, $sp, -32 +; MMR6-NEXT: .cfi_def_cfa_offset 32 +; MMR6-NEXT: sw $17, 28($sp) # 4-byte Folded Spill +; MMR6-NEXT: sw $16, 24($sp) # 4-byte Folded Spill ; MMR6-NEXT: .cfi_offset 17, -4 ; MMR6-NEXT: .cfi_offset 16, -8 ; MMR6-NEXT: move $1, $7 -; MMR6-NEXT: move $7, $4 -; MMR6-NEXT: lw $3, 52($sp) +; MMR6-NEXT: move $7, $5 +; MMR6-NEXT: lw $3, 60($sp) ; MMR6-NEXT: srlv $2, $1, $3 -; MMR6-NEXT: not16 $16, $3 -; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill -; MMR6-NEXT: move $4, $6 -; MMR6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill +; MMR6-NEXT: not16 $5, $3 +; MMR6-NEXT: sw $5, 12($sp) # 4-byte Folded Spill +; MMR6-NEXT: move $17, $6 +; MMR6-NEXT: sw $6, 16($sp) # 4-byte Folded Spill ; MMR6-NEXT: sll16 $6, $6, 1 -; MMR6-NEXT: sllv $6, $6, $16 +; MMR6-NEXT: sllv $6, $6, $5 ; MMR6-NEXT: or $8, $6, $2 -; MMR6-NEXT: addiu $6, $3, -64 -; MMR6-NEXT: srlv $9, $5, $6 -; MMR6-NEXT: sll16 $2, $7, 1 -; MMR6-NEXT: sw $2, 4($sp) # 4-byte Folded Spill -; MMR6-NEXT: not16 $16, $6 +; MMR6-NEXT: addiu $5, $3, -64 +; MMR6-NEXT: srlv $9, $7, $5 +; MMR6-NEXT: move $6, $4 +; MMR6-NEXT: sll16 $2, $4, 1 +; MMR6-NEXT: sw $2, 8($sp) # 4-byte Folded Spill +; MMR6-NEXT: not16 $16, $5 ; MMR6-NEXT: sllv $10, $2, $16 ; MMR6-NEXT: andi16 $16, $3, 32 ; MMR6-NEXT: seleqz $8, $8, $16 ; MMR6-NEXT: or $9, $10, $9 -; MMR6-NEXT: srlv $10, $4, $3 +; MMR6-NEXT: srlv $10, $17, $3 ; MMR6-NEXT: selnez $11, $10, $16 ; MMR6-NEXT: li16 $17, 64 ; MMR6-NEXT: subu16 $2, $17, $3 -; MMR6-NEXT: sllv $12, $5, $2 +; MMR6-NEXT: sllv $12, $7, $2 +; MMR6-NEXT: move $17, $7 ; MMR6-NEXT: andi16 $4, $2, 32 -; MMR6-NEXT: andi16 $17, $6, 32 -; MMR6-NEXT: seleqz $9, $9, $17 +; MMR6-NEXT: andi16 $7, $5, 32 +; MMR6-NEXT: sw $7, 20($sp) # 4-byte Folded Spill +; MMR6-NEXT: seleqz $9, $9, $7 ; MMR6-NEXT: seleqz $13, $12, $4 ; MMR6-NEXT: or $8, $11, $8 ; MMR6-NEXT: selnez $11, $12, $4 -; MMR6-NEXT: sllv $12, $7, $2 +; MMR6-NEXT: sllv $12, $6, $2 +; MMR6-NEXT: move $7, $6 +; MMR6-NEXT: sw $6, 4($sp) # 4-byte Folded Spill ; MMR6-NEXT: not16 $2, $2 -; MMR6-NEXT: srl16 $6, $5, 1 +; MMR6-NEXT: srl16 $6, $17, 1 ; MMR6-NEXT: srlv $2, $6, $2 ; MMR6-NEXT: or $2, $12, $2 ; MMR6-NEXT: seleqz $2, $2, $4 -; MMR6-NEXT: addiu $4, $3, -64 -; MMR6-NEXT: srlv $4, $7, $4 -; MMR6-NEXT: or $12, $11, $2 -; MMR6-NEXT: or $6, $8, $13 -; MMR6-NEXT: srlv $5, $5, $3 -; MMR6-NEXT: selnez $8, $4, $17 -; MMR6-NEXT: sltiu $11, $3, 64 -; MMR6-NEXT: selnez $13, $6, $11 -; MMR6-NEXT: or $8, $8, $9 +; MMR6-NEXT: srlv $4, $7, $5 +; MMR6-NEXT: or $11, $11, $2 +; MMR6-NEXT: or $5, $8, $13 +; MMR6-NEXT: srlv $6, $17, $3 +; MMR6-NEXT: lw $2, 20($sp) # 4-byte Folded Reload +; MMR6-NEXT: selnez $7, $4, $2 +; MMR6-NEXT: sltiu $8, $3, 64 +; MMR6-NEXT: selnez $12, $5, $8 +; MMR6-NEXT: or $7, $7, $9 +; MMR6-NEXT: lw $5, 12($sp) # 4-byte Folded Reload ; MMR6-NEXT: lw $2, 8($sp) # 4-byte Folded Reload -; MMR6-NEXT: lw $6, 4($sp) # 4-byte Folded Reload -; MMR6-NEXT: sllv $9, $6, $2 +; MMR6-NEXT: sllv $9, $2, $5 ; MMR6-NEXT: seleqz $10, $10, $16 -; MMR6-NEXT: li16 $2, 0 -; MMR6-NEXT: or $10, $10, $12 -; MMR6-NEXT: or $9, $9, $5 -; MMR6-NEXT: seleqz $5, $8, $11 -; MMR6-NEXT: seleqz $8, $2, $11 -; MMR6-NEXT: srlv $7, $7, $3 -; MMR6-NEXT: seleqz $2, $7, $16 -; MMR6-NEXT: selnez $2, $2, $11 +; MMR6-NEXT: li16 $5, 0 +; MMR6-NEXT: or $10, $10, $11 +; MMR6-NEXT: or $6, $9, $6 +; MMR6-NEXT: seleqz $2, $7, $8 +; MMR6-NEXT: seleqz $7, $5, $8 +; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload +; MMR6-NEXT: srlv $9, $5, $3 +; MMR6-NEXT: seleqz $11, $9, $16 +; MMR6-NEXT: selnez $11, $11, $8 ; MMR6-NEXT: seleqz $1, $1, $3 -; MMR6-NEXT: or $5, $13, $5 -; MMR6-NEXT: selnez $5, $5, $3 -; MMR6-NEXT: or $5, $1, $5 -; MMR6-NEXT: or $2, $8, $2 -; MMR6-NEXT: seleqz $1, $9, $16 -; MMR6-NEXT: selnez $6, $7, $16 -; MMR6-NEXT: lw $7, 12($sp) # 4-byte Folded Reload -; MMR6-NEXT: seleqz $7, $7, $3 -; MMR6-NEXT: selnez $9, $10, $11 -; MMR6-NEXT: seleqz $4, $4, $17 -; MMR6-NEXT: seleqz $4, $4, $11 -; MMR6-NEXT: or $4, $9, $4 +; MMR6-NEXT: or $2, $12, $2 +; MMR6-NEXT: selnez $2, $2, $3 +; MMR6-NEXT: or $5, $1, $2 +; MMR6-NEXT: or $2, $7, $11 +; MMR6-NEXT: seleqz $1, $6, $16 +; MMR6-NEXT: selnez $6, $9, $16 +; MMR6-NEXT: lw $16, 16($sp) # 4-byte Folded Reload +; MMR6-NEXT: seleqz $9, $16, $3 +; MMR6-NEXT: selnez $10, $10, $8 +; MMR6-NEXT: lw $16, 20($sp) # 4-byte Folded Reload +; MMR6-NEXT: seleqz $4, $4, $16 +; MMR6-NEXT: seleqz $4, $4, $8 +; MMR6-NEXT: or $4, $10, $4 ; MMR6-NEXT: selnez $3, $4, $3 -; MMR6-NEXT: or $4, $7, $3 +; MMR6-NEXT: or $4, $9, $3 ; MMR6-NEXT: or $1, $6, $1 -; MMR6-NEXT: selnez $1, $1, $11 -; MMR6-NEXT: or $3, $8, $1 -; MMR6-NEXT: lw $16, 16($sp) # 4-byte Folded Reload -; MMR6-NEXT: lw $17, 20($sp) # 4-byte Folded Reload -; MMR6-NEXT: addiu $sp, $sp, 24 +; MMR6-NEXT: selnez $1, $1, $8 +; MMR6-NEXT: or $3, $7, $1 +; MMR6-NEXT: lw $16, 24($sp) # 4-byte Folded Reload +; MMR6-NEXT: lw $17, 28($sp) # 4-byte Folded Reload +; MMR6-NEXT: addiu $sp, $sp, 32 ; MMR6-NEXT: jrc $ra </cut>

3 years, 9 months

[ACTIVITY] report week ending 1 Oct

by Peter Maydell

Progress * UM-2 [QEMU upstream maintainership] + Worked through my code-review backlog + Noticed that we never got round to making our emulated GICv3 support having redistributors in more than one contiguous region; this prevents using more than 123 CPUs with the virt board. Sent out a patchset which adds the necessary handling. + Generally trying to tie off loose ends pre-holiday :-) -- PMM

3 years, 9 months

Jump to page:

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

linaro-toolchain