linaro-toolchain September 2021

linaro-toolchain@lists.linaro.org

18 participants
86 discussions

[TCWG CI] Regression caused by gcc:5485bbebb3679245dd4bc7c149bbc940f8b2e632

by ci_notify＠linaro.org

Identified regression caused by *gcc:5485bbebb3679245dd4bc7c149bbc940f8b2e632*:
commit 5485bbebb3679245dd4bc7c149bbc940f8b2e632
Author: Aldy Hernandez <aldyh(a)redhat.com>

    Refactor jump_thread_path_registry.

Results regressed to (for first_bad == 5485bbebb3679245dd4bc7c149bbc940f8b2e632)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -Os artifacts/build-5485bbebb3679245dd4bc7c149bbc940f8b2e632/results_id:
1
# 458.sjeng,sjeng_base.default regressed by 104

from (for last_good == 3fca63b0b6faf6a30ed735b86b8eb59944701fc1)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -Os artifacts/build-3fca63b0b6faf6a30ed735b86b8eb59944701fc1/results_id:
1

This commit has regressed these CI configurations:
 - tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…

Reproduce builds:
<cut>
mkdir investigate-gcc-5485bbebb3679245dd4bc7c149bbc940f8b2e632
cd investigate-gcc-5485bbebb3679245dd4bc7c149bbc940f8b2e632

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 5485bbebb3679245dd4bc7c149bbc940f8b2e632
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 3fca63b0b6faf6a30ed735b86b8eb59944701fc1
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 5485bbebb3679245dd4bc7c149bbc940f8b2e632
Author: Aldy Hernandez <aldyh(a)redhat.com>
Date:   Sat Sep 11 09:37:39 2021 +0200

    Refactor jump_thread_path_registry.

    In an attempt to refactor thread_through_all_blocks(), I've realized
    that there is a mess of code dealing with coexisting forward and
    backward thread types.  However, this is an impossible scenario, as
    the registry contains either forward/old-style threads, or backward
    threads (EDGE_FSM_THREADs), never both.

    The fact that both types of threads cannot coexist simplifies the
    code considerably.  For that matter, it splits things up nicely
    because there are some common bits that can go into a base class, and
    some differing code that can go into derived classes.

    Dividing things in this way makes it very obvious which parts belong
    in the old-style copier and which parts belong to the generic copier.
    Doing all this provided some nice cleanups, as well as fixing a latent
    bug in adjust_paths_after_duplication.

    The diff is somewhat hard to read, so perhaps looking at the final
    output would be easier.

    A general overview of what this patch achieves can be seen by just
    looking at this simplified class layout:

    // Abstract class for the jump thread registry.

    class jt_path_registry
    {
    public:
      jt_path_registry ();
      virtual ~jt_path_registry ();
      bool register_jump_thread (vec<jump_thread_edge *> *);
      bool thread_through_all_blocks (bool peel_loop_headers);
      jump_thread_edge *allocate_thread_edge (edge e, jump_thread_edge_type t);
      vec<jump_thread_edge *> *allocate_thread_path ();
    protected:
      vec<vec<jump_thread_edge *> *> m_paths;
      unsigned long m_num_threaded_edges;
    private:
      virtual bool update_cfg (bool peel_loop_headers) = 0;
    };

    // Forward threader path registry using a custom BB copier.

    class fwd_jt_path_registry : public jt_path_registry
    {
    public:
      fwd_jt_path_registry ();
      ~fwd_jt_path_registry ();
      void remove_jump_threads_including (edge);
    private:
      bool update_cfg (bool peel_loop_headers) override;
      void mark_threaded_blocks (bitmap threaded_blocks);
      bool thread_block_1 (basic_block, bool noloop_only, bool joiners);
      bool thread_block (basic_block, bool noloop_only);
      bool thread_through_loop_header (class loop *loop,
                                       bool may_peel_loop_headers);
      class redirection_data *lookup_redirection_data (edge e, enum insert_option);
      hash_table<struct removed_edges> *m_removed_edges;
      hash_table<redirection_data> *m_redirection_data;
    };

    // Backward threader path registry using a generic BB copier.

    class back_jt_path_registry : public jt_path_registry
    {
    private:
      bool update_cfg (bool peel_loop_headers) override;
      void adjust_paths_after_duplication (unsigned curr_path_num);
      bool duplicate_thread_path (edge entry, edge exit, basic_block *region,
                                  unsigned n_region, unsigned current_path_no);
      bool rewire_first_differing_edge (unsigned path_num, unsigned edge_num);
    };

    That is, the forward and backward bits have been completely split,
    while deriving from a base class for the common functionality.

    Most everything is mechanical, but there are a few gotchas:

    a) back_jt_path_registry::update_cfg(), which contains the backward
    threading specific bits, is rather simple, since most of the code in
    the original thread_through_all_blocks() only applied to the forward
    threader: removed edges, mark_threaded_blocks,
    thread_through_loop_header, the copy tables (*).

    (*) The back threader has its own copy tables in
    duplicate_thread_path.

    b) In some cases, adjust_paths_after_duplication() was commoning out
    so many blocks that it was removing the initial EDGE_FSM_THREAD
    marker.  I've fixed this.

    c) AFAICT, when run from the forward threader,
    thread_through_all_blocks() attempts to remove threads starting with
    an edge already seen, but it would never see anything because the loop
    doing the checking only has a visited_starting_edges.contains(), and
    no corresponding visited_starting_edges.add().  The add() method in
    thread_through_all_blocks belongs to the backward threading bits, and
    as I've explained, both types cannot coexist.  I've removed the checks
    in the forward bits since they don't appear to do anything.  If this
    was an oversight, and we want to avoid threading already seen edges in
    the forward threader, I can move this functionality to the base class.

    Ultimately I would like to move all the registry code to
    tree-ssa-threadregistry.*.  I've avoided this in this patch to aid in
    review.

    My apologies for this longass explanation, but I want to make sure
    we're covering all of our bases.

    Tested on x86-64 Linux by a very tedious process of moving chunks
    around, running "make check-gcc RUNTESTFLAGS=tree-ssa.exp", and
    repeating ad nauseam.  And of course, by running a full bootstrap and
    tests.

    OK?

    p.s. In a follow-up patch I will rename the confusing EDGE_FSM_THREAD
    type.

    gcc/ChangeLog:

            * tree-ssa-threadbackward.c (class back_threader_registry): Use
            back_jt_path_registry.
            * tree-ssa-threadedge.c (jump_threader::jump_threader): Use
            fwd_jt_path_registry.
            * tree-ssa-threadedge.h (class jump_threader): Same..
            * tree-ssa-threadupdate.c
            (jump_thread_path_registry::jump_thread_path_registry): Rename...
            (jt_path_registry::jt_path_registry): ...to this.
            (jump_thread_path_registry::~jump_thread_path_registry): Rename...
            (jt_path_registry::~jt_path_registry): ...this.
            (fwd_jt_path_registry::fwd_jt_path_registry): New.
            (fwd_jt_path_registry::~fwd_jt_path_registry): New.
            (jump_thread_path_registry::allocate_thread_edge): Rename...
            (jt_path_registry::allocate_thread_edge): ...to this.
            (jump_thread_path_registry::allocate_thread_path): Rename...
            (jt_path_registry::allocate_thread_path): ...to this.
            (jump_thread_path_registry::lookup_redirection_data): Rename...
            (fwd_jt_path_registry::lookup_redirection_data): ...to this.
            (jump_thread_path_registry::thread_block_1): Rename...
            (fwd_jt_path_registry::thread_block_1): ...to this.
            (jump_thread_path_registry::thread_block): Rename...
            (fwd_jt_path_registry::thread_block): ...to this.
            (jt_path_registry::thread_through_loop_header): Rename...
            (fwd_jt_path_registry::thread_through_loop_header): ...to this.
            (jump_thread_path_registry::mark_threaded_blocks): Rename...
            (fwd_jt_path_registry::mark_threaded_blocks): ...to this.
            (jump_thread_path_registry::debug_path): Rename...
            (jt_path_registry::debug_path): ...to this.
            (jump_thread_path_registry::dump): Rename...
            (jt_path_registry::debug): ...to this.
            (jump_thread_path_registry::rewire_first_differing_edge): Rename...
            (back_jt_path_registry::rewire_first_differing_edge): ...to this.
            (jump_thread_path_registry::adjust_paths_after_duplication): Rename...
            (back_jt_path_registry::adjust_paths_after_duplication): ...to this.
            (jump_thread_path_registry::duplicate_thread_path): Rename...
            (back_jt_path_registry::duplicate_thread_path): ...to this.  Also,
            drop ill-formed candidates.
            (jump_thread_path_registry::remove_jump_threads_including): Rename...
            (fwd_jt_path_registry::remove_jump_threads_including): ...to this.
            (jt_path_registry::thread_through_all_blocks): New.
            (back_jt_path_registry::update_cfg): New.
            (fwd_jt_path_registry::update_cfg): New.
            (jump_thread_path_registry::register_jump_thread): Rename...
            (jt_path_registry::register_jump_thread): ...to this.
            * tree-ssa-threadupdate.h (class jump_thread_path_registry):
            Abstract to...
            (class jt_path_registry): ...here.
            (class fwd_jt_path_registry): New.
            (class back_jt_path_registry): New.
---
 gcc/tree-ssa-threadbackward.c |   2 +-
 gcc/tree-ssa-threadedge.c     |   2 +-
 gcc/tree-ssa-threadedge.h     |   2 +-
 gcc/tree-ssa-threadupdate.c   | 213 +++++++++++++++++++++---------------------
 gcc/tree-ssa-threadupdate.h   |  60 +++++++-----
 5 files changed, 149 insertions(+), 130 deletions(-)

diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index e72992328de..7ff5cecbdab 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -56,7 +56,7 @@ public:
   bool register_path (const vec<basic_block> &, edge taken);
   bool thread_through_all_blocks (bool may_peel_loop_headers);
 private:
-  jump_thread_path_registry m_lowlevel_registry;
+  back_jt_path_registry m_lowlevel_registry;
   const int m_max_allowable_paths;
   int m_threaded_paths;
 };
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 3c7cdc58b93..422cb89401b 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -71,7 +71,7 @@ jump_threader::jump_threader (jump_threader_simplifier *simplifier,
   dummy_cond = gimple_build_cond (NE_EXPR, integer_zero_node,
                                   integer_zero_node, NULL, NULL);
 
-  m_registry = new jump_thread_path_registry ();
+  m_registry = new fwd_jt_path_registry ();
   m_simplifier = simplifier;
   m_state = state;
 }
diff --git a/gcc/tree-ssa-threadedge.h b/gcc/tree-ssa-threadedge.h
index 0002b200d8b..18e6bd41aaa 100644
--- a/gcc/tree-ssa-threadedge.h
+++ b/gcc/tree-ssa-threadedge.h
@@ -75,7 +75,7 @@ private:
   // Dummy condition to avoid creating lots of throw away statements.
   gcond *dummy_cond;
 
-  class jump_thread_path_registry *m_registry;
+  class fwd_jt_path_registry *m_registry;
   jump_threader_simplifier *m_simplifier;
   jt_state *m_state;
 };
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 18f16efbb7a..93538104fdf 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -167,29 +167,36 @@ jump_thread_path_allocator::allocate_thread_path ()
   return new (r) vec<jump_thread_edge *> ();
 }
 
-jump_thread_path_registry::jump_thread_path_registry ()
+jt_path_registry::jt_path_registry ()
 {
   m_paths.create (5);
-  m_removed_edges = new hash_table<struct removed_edges> (17);
   m_num_threaded_edges = 0;
-  m_redirection_data = NULL;
 }
 
-jump_thread_path_registry::~jump_thread_path_registry ()
+jt_path_registry::~jt_path_registry ()
 {
   m_paths.release ();
+}
+
+fwd_jt_path_registry::fwd_jt_path_registry ()
+{
+  m_removed_edges = new hash_table<struct removed_edges> (17);
+  m_redirection_data = NULL;
+}
+
+fwd_jt_path_registry::~fwd_jt_path_registry ()
+{
   delete m_removed_edges;
 }
 
 jump_thread_edge *
-jump_thread_path_registry::allocate_thread_edge (edge e,
-                                                 jump_thread_edge_type t)
+jt_path_registry::allocate_thread_edge (edge e, jump_thread_edge_type t)
 {
   return m_allocator.allocate_thread_edge (e, t);
 }
 
 vec<jump_thread_edge *> *
-jump_thread_path_registry::allocate_thread_path ()
+jt_path_registry::allocate_thread_path ()
 {
   return m_allocator.allocate_thread_path ();
 }
@@ -426,8 +433,7 @@ create_block_for_threading (basic_block bb,
    edges associated with E in the hash table.  */
 
 redirection_data *
-jump_thread_path_registry::lookup_redirection_data (edge e,
-                                                    enum insert_option insert)
+fwd_jt_path_registry::lookup_redirection_data (edge e, insert_option insert)
 {
   struct redirection_data **slot;
   struct redirection_data *elt;
@@ -1413,9 +1419,9 @@ redirection_block_p (basic_block bb)
    If JOINERS is true, then thread through joiner blocks as well.  */
 
 bool
-jump_thread_path_registry::thread_block_1 (basic_block bb,
-                                           bool noloop_only,
-                                           bool joiners)
+fwd_jt_path_registry::thread_block_1 (basic_block bb,
+                                      bool noloop_only,
+                                      bool joiners)
 {
   /* E is an incoming edge into BB that we may or may not want to
      redirect to a duplicate of BB.  */
@@ -1594,7 +1600,7 @@ jump_thread_path_registry::thread_block_1 (basic_block bb,
    opportunity.  */
 
 bool
-jump_thread_path_registry::thread_block (basic_block bb, bool noloop_only)
+fwd_jt_path_registry::thread_block (basic_block bb, bool noloop_only)
 {
   bool retval;
   retval = thread_block_1 (bb, noloop_only, false);
@@ -1675,9 +1681,8 @@ determine_bb_domination_status (class loop *loop, basic_block bb)
    to the inside of the loop.  */
 
 bool
-jump_thread_path_registry::thread_through_loop_header
-                                (class loop *loop,
-                                 bool may_peel_loop_headers)
+fwd_jt_path_registry::thread_through_loop_header (class loop *loop,
+                                                  bool may_peel_loop_headers)
 {
   basic_block header = loop->header;
   edge e, tgt_edge, latch = loop_latch_edge (loop);
@@ -1932,7 +1937,7 @@ count_stmts_and_phis_in_block (basic_block bb)
    hash table lookups to map from threaded edge to new target.  */
 
 void
-jump_thread_path_registry::mark_threaded_blocks (bitmap threaded_blocks)
+fwd_jt_path_registry::mark_threaded_blocks (bitmap threaded_blocks)
 {
   unsigned int i;
   bitmap_iterator bi;
@@ -2197,7 +2202,7 @@ bb_in_bbs (basic_block bb, basic_block *bbs, int n)
 }
 
 void
-jump_thread_path_registry::debug_path (FILE *dump_file, int pathno)
+jt_path_registry::debug_path (FILE *dump_file, int pathno)
 {
   vec<jump_thread_edge *> *p = m_paths[pathno];
   fprintf (dump_file, "path: ");
@@ -2208,7 +2213,7 @@ jump_thread_path_registry::debug_path (FILE *dump_file, int pathno)
 }
 
 void
-jump_thread_path_registry::dump ()
+jt_path_registry::debug ()
 {
   for (unsigned i = 0; i < m_paths.length (); ++i)
     debug_path (stderr, i);
@@ -2223,8 +2228,8 @@ jump_thread_path_registry::dump ()
    Returns TRUE if we were able to successfully rewire the edge.  */
 
 bool
-jump_thread_path_registry::rewire_first_differing_edge (unsigned path_num,
-                                                        unsigned edge_num)
+back_jt_path_registry::rewire_first_differing_edge (unsigned path_num,
+                                                    unsigned edge_num)
 {
   vec<jump_thread_edge *> *path = m_paths[path_num];
   edge &e = (*path)[edge_num]->e;
@@ -2269,11 +2274,9 @@ jump_thread_path_registry::rewire_first_differing_edge (unsigned path_num,
    specifies the path that was just threaded.  */
 
 void
-jump_thread_path_registry::adjust_paths_after_duplication
-        (unsigned curr_path_num)
+back_jt_path_registry::adjust_paths_after_duplication (unsigned curr_path_num)
 {
   vec<jump_thread_edge *> *curr_path = m_paths[curr_path_num];
-  gcc_assert ((*curr_path)[0]->type == EDGE_FSM_THREAD);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -2347,8 +2350,16 @@ jump_thread_path_registry::adjust_paths_after_duplication
               m_paths.unordered_remove (cand_path_num);
               continue;
             }
-          /* Otherwise, just remove the redundant sub-path.  */
-          cand_path->block_remove (0, j);
+          if ((*cand_path)[j]->type != EDGE_FSM_THREAD)
+            {
+              /* If all the EDGE_FSM_THREADs are common, all that's
+                 left is the final EDGE_NO_COPY_SRC_BLOCK.  */
+              if (dump_file && (dump_flags & TDF_DETAILS))
+                fprintf (dump_file, "Dropping illformed candidate.\n");
+            }
+          else
+            /* Otherwise, just remove the redundant sub-path.  */
+            cand_path->block_remove (0, j);
         }
       if (dump_file && (dump_flags & TDF_DETAILS))
         {
@@ -2372,11 +2383,11 @@ jump_thread_path_registry::adjust_paths_after_duplication
    Returns false if it is unable to copy the region, true otherwise.  */
 
 bool
-jump_thread_path_registry::duplicate_thread_path (edge entry,
-                                                  edge exit,
-                                                  basic_block *region,
-                                                  unsigned n_region,
-                                                  unsigned current_path_no)
+back_jt_path_registry::duplicate_thread_path (edge entry,
+                                              edge exit,
+                                              basic_block *region,
+                                              unsigned n_region,
+                                              unsigned current_path_no)
 {
   unsigned i;
   class loop *loop = entry->dest->loop_father;
@@ -2551,7 +2562,7 @@ valid_jump_thread_path (vec<jump_thread_edge *> *path)
    DOM/VRP rather than for every case where DOM optimizes away a COND_EXPR.  */
 
 void
-jump_thread_path_registry::remove_jump_threads_including (edge_def *e)
+fwd_jt_path_registry::remove_jump_threads_including (edge_def *e)
 {
   if (!m_paths.exists ())
     return;
@@ -2560,69 +2571,52 @@ jump_thread_path_registry::remove_jump_threads_including (edge_def *e)
   *slot = e;
 }
 
-/* Walk through all blocks and thread incoming edges to the appropriate
-   outgoing edge for each edge pair recorded in THREADED_EDGES.
+/* Thread all paths that have been queued for jump threading, and
+   update the CFG accordingly.
 
    It is the caller's responsibility to fix the dominance information
    and rewrite duplicated SSA_NAMEs back into SSA form.
 
-   If MAY_PEEL_LOOP_HEADERS is false, we avoid threading edges through
-   loop headers if it does not simplify the loop.
+   If PEEL_LOOP_HEADERS is false, avoid threading edges through loop
+   headers if it does not simplify the loop.
 
-   Returns true if one or more edges were threaded, false otherwise.  */
+   Returns true if one or more edges were threaded.  */
 
 bool
-jump_thread_path_registry::thread_through_all_blocks
-        (bool may_peel_loop_headers)
+jt_path_registry::thread_through_all_blocks (bool peel_loop_headers)
 {
-  bool retval = false;
-  unsigned int i;
-  auto_bitmap threaded_blocks;
-  hash_set<edge> visited_starting_edges;
-
-  if (!m_paths.exists ())
-    {
-      retval = false;
-      goto out;
-    }
+  if (m_paths.length () == 0)
+    return false;
 
   m_num_threaded_edges = 0;
 
-  /* Remove any paths that referenced removed edges.  */
-  if (m_removed_edges)
-    for (i = 0; i < m_paths.length (); )
-      {
-        unsigned int j;
-        vec<jump_thread_edge *> *path = m_paths[i];
+  bool retval = update_cfg (peel_loop_headers);
 
-        for (j = 0; j < path->length (); j++)
-          {
-            edge e = (*path)[j]->e;
-            if (m_removed_edges->find_slot (e, NO_INSERT))
-              break;
-          }
+  statistics_counter_event (cfun, "Jumps threaded", m_num_threaded_edges);
 
-        if (j != path->length ())
-          {
-            cancel_thread (path, "Thread references removed edge");
-            m_paths.unordered_remove (i);
-            continue;
-          }
-        i++;
-      }
+  if (retval)
+    {
+      loops_state_set (LOOPS_NEED_FIXUP);
+      return true;
+    }
+  return false;
+}
 
-  /* Jump-thread all FSM threads before other jump-threads.  */
-  for (i = 0; i < m_paths.length ();)
+/* This is the backward threader version of thread_through_all_blocks
+   using a generic BB copier.  */
+
+bool
+back_jt_path_registry::update_cfg (bool /*peel_loop_headers*/)
+{
+  bool retval = false;
+  hash_set<edge> visited_starting_edges;
+
+  while (m_paths.length ())
     {
-      vec<jump_thread_edge *> *path = m_paths[i];
+      vec<jump_thread_edge *> *path = m_paths[0];
       edge entry = (*path)[0]->e;
 
-      /* Only code-generate FSM jump-threads in this loop.  */
-      if ((*path)[0]->type != EDGE_FSM_THREAD)
-        {
-          i++;
-          continue;
-        }
+      gcc_checking_assert ((*path)[0]->type == EDGE_FSM_THREAD);
 
       /* Do not jump-thread twice from the same starting edge.
 
@@ -2638,8 +2632,8 @@ jump_thread_path_registry::thread_through_all_blocks
           || !valid_jump_thread_path (path))
         {
           /* Remove invalid FSM jump-thread paths.  */
-          cancel_thread (path, "Invalid FSM jump-thread path");
-          m_paths.unordered_remove (i);
+          cancel_thread (path, "Avoiding threading twice from same edge");
+          m_paths.unordered_remove (0);
           continue;
         }
 
@@ -2650,7 +2644,7 @@ jump_thread_path_registry::thread_through_all_blocks
       for (unsigned int j = 0; j < len - 1; j++)
         region[j] = (*path)[j]->e->dest;
 
-      if (duplicate_thread_path (entry, exit, region, len - 1, i))
+      if (duplicate_thread_path (entry, exit, region, len - 1, 0))
         {
           /* We do not update dominance info.  */
           free_dominance_info (CDI_DOMINATORS);
@@ -2660,27 +2654,44 @@ jump_thread_path_registry::thread_through_all_blocks
         }
 
       path->release ();
-      m_paths.unordered_remove (i);
+      m_paths.unordered_remove (0);
       free (region);
     }
+  return retval;
+}
 
-  /* Remove from PATHS all the jump-threads starting with an edge already
-     jump-threaded.  */
-  for (i = 0; i < m_paths.length ();)
-    {
-      vec<jump_thread_edge *> *path = m_paths[i];
-      edge entry = (*path)[0]->e;
+/* This is the forward threader version of thread_through_all_blocks,
+   using a custom BB copier.  */
 
-      /* Do not jump-thread twice from the same block.  */
-      if (visited_starting_edges.contains (entry))
-        {
-          cancel_thread (path, "Avoiding threading twice from same BB");
-          m_paths.unordered_remove (i);
-        }
-      else
+bool
+fwd_jt_path_registry::update_cfg (bool may_peel_loop_headers)
+{
+  bool retval = false;
+
+  /* Remove any paths that referenced removed edges.  */
+  if (m_removed_edges)
+    for (unsigned i = 0; i < m_paths.length (); )
+      {
+        unsigned int j;
+        vec<jump_thread_edge *> *path = m_paths[i];
+
+        for (j = 0; j < path->length (); j++)
+          {
+            edge e = (*path)[j]->e;
+            if (m_removed_edges->find_slot (e, NO_INSERT))
+              break;
+          }
+
+        if (j != path->length ())
+          {
+            cancel_thread (path, "Thread references removed edge");
+            m_paths.unordered_remove (i);
+            continue;
+          }
         i++;
-    }
+      }
 
+  auto_bitmap threaded_blocks;
   mark_threaded_blocks (threaded_blocks);
 
   initialize_original_copy_tables ();
@@ -2737,16 +2748,8 @@ jump_thread_path_registry::thread_through_all_blocks
         gcc_assert (e->aux == NULL);
     }
 
-  statistics_counter_event (cfun, "Jumps threaded", m_num_threaded_edges);
-
   free_original_copy_tables ();
 
-  m_paths.release ();
-
-  if (retval)
-    loops_state_set (LOOPS_NEED_FIXUP);
-
- out:
   return retval;
 }
 
@@ -2761,7 +2764,7 @@ jump_thread_path_registry::thread_through_all_blocks
    Return TRUE if PATH was successfully threaded.  */
 
 bool
-jump_thread_path_registry::register_jump_thread (vec<jump_thread_edge *> *path)
+jt_path_registry::register_jump_thread (vec<jump_thread_edge *> *path)
 {
   if (!dbg_cnt (registered_jump_thread))
     {
diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h
index 2030bda15af..58e3a38e0c5 100644
--- a/gcc/tree-ssa-threadupdate.h
+++ b/gcc/tree-ssa-threadupdate.h
@@ -54,49 +54,65 @@ private:
   obstack m_obstack;
 };
 
-// This is the underlying jump thread registry.  When all candidates
-// have been registered with register_jump_thread(),
-// thread_through_all_blocks() is called to actually change the CFG.
+// Abstract class for the jump thread registry.
+//
+// When all candidates have been registered with
+// register_jump_thread(), thread_through_all_blocks() is called to
+// update the CFG.
 
-class jump_thread_path_registry
+class jt_path_registry
 {
 public:
-  jump_thread_path_registry ();
-  ~jump_thread_path_registry ();
+  jt_path_registry ();
+  virtual ~jt_path_registry ();
   bool register_jump_thread (vec<jump_thread_edge *> *);
-  void remove_jump_threads_including (edge);
-  bool thread_through_all_blocks (bool);
+  bool thread_through_all_blocks (bool peel_loop_headers);
   jump_thread_edge *allocate_thread_edge (edge e, jump_thread_edge_type t);
   vec<jump_thread_edge *> *allocate_thread_path ();
-  void dump ();
+  void debug ();
+protected:
+  void debug_path (FILE *, int pathno);
+  vec<vec<jump_thread_edge *> *> m_paths;
+  unsigned long m_num_threaded_edges;
+private:
+  virtual bool update_cfg (bool peel_loop_headers) = 0;
+  jump_thread_path_allocator m_allocator;
+  DISABLE_COPY_AND_ASSIGN (jt_path_registry);
+};
+
+// Forward threader path registry using a custom BB copier.
 
+class fwd_jt_path_registry : public jt_path_registry
+{
+public:
+  fwd_jt_path_registry ();
+  ~fwd_jt_path_registry ();
+  void remove_jump_threads_including (edge);
 private:
-  void debug_path (FILE *, int pathno);
+  bool update_cfg (bool peel_loop_headers) override;
   void mark_threaded_blocks (bitmap threaded_blocks);
-  bool rewire_first_differing_edge (unsigned path_num, unsigned edge_num);
-  void adjust_paths_after_duplication (unsigned curr_path_num);
-  bool duplicate_thread_path (edge entry,
-                              edge exit,
-                              basic_block *region,
-                              unsigned n_region,
-                              unsigned current_path_no);
   bool thread_block_1 (basic_block, bool noloop_only, bool joiners);
   bool thread_block (basic_block, bool noloop_only);
   bool thread_through_loop_header (class loop *loop,
                                    bool may_peel_loop_headers);
   class redirection_data *lookup_redirection_data (edge e, enum insert_option);
 
-  vec<vec<jump_thread_edge *> *> m_paths;
-
   hash_table<struct removed_edges> *m_removed_edges;
 
   // Main data structure to hold information for duplicates of BB.
   hash_table<redirection_data> *m_redirection_data;
+};
 
-  // Jump threading statistics.
-  unsigned long m_num_threaded_edges;
+// Backward threader path registry using a generic BB copier.
 
-  jump_thread_path_allocator m_allocator;
+class back_jt_path_registry : public jt_path_registry
+{
+private:
+  bool update_cfg (bool peel_loop_headers) override;
+  void adjust_paths_after_duplication (unsigned curr_path_num);
+  bool duplicate_thread_path (edge entry, edge exit, basic_block *region,
+                              unsigned n_region, unsigned current_path_no);
+  bool rewire_first_differing_edge (unsigned path_num, unsigned edge_num);
 };
 
 // Rather than search all the edges in jump thread paths each time DOM
</cut>
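
The class split described in the commit message above (a public, non-virtual thread_through_all_blocks() in the base class that delegates to a private pure-virtual update_cfg()) can be sketched outside of GCC roughly as follows. This is only an illustration: toy_path, toy_registry, toy_back_registry, toy_fwd_registry, register_path and num_threaded are hypothetical names, and the "threading" rules are invented stand-ins for the real CFG updates.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a queued jump-thread path (not GCC's type).
using toy_path = std::vector<int>;

// Base class: owns the queue and the bookkeeping shared by both threaders.
class toy_registry
{
public:
  virtual ~toy_registry () = default;

  void register_path (toy_path p) { m_paths.push_back (std::move (p)); }

  // Non-virtual entry point: common checks wrapped around update_cfg ().
  bool thread_through_all_blocks (bool peel_loop_headers)
  {
    if (m_paths.empty ())
      return false;
    m_num_threaded = 0;
    bool changed = update_cfg (peel_loop_headers);
    assert (changed == (m_num_threaded != 0));
    return changed;
  }

  std::size_t num_threaded () const { return m_num_threaded; }

protected:
  std::vector<toy_path> m_paths;
  std::size_t m_num_threaded = 0;

private:
  // Each derived registry supplies its own update strategy.
  virtual bool update_cfg (bool peel_loop_headers) = 0;
};

// "Backward" flavour: drains the queue from the front, one path at a time.
class toy_back_registry : public toy_registry
{
private:
  bool update_cfg (bool /*peel_loop_headers*/) override
  {
    bool changed = false;
    while (!m_paths.empty ())
      {
        // Invented rule: any non-empty path counts as threaded.
        if (!m_paths.front ().empty ())
          {
            ++m_num_threaded;
            changed = true;
          }
        m_paths.erase (m_paths.begin ());
      }
    return changed;
  }
};

// "Forward" flavour: walks the whole queue, then clears it.
class toy_fwd_registry : public toy_registry
{
private:
  bool update_cfg (bool peel_loop_headers) override
  {
    bool changed = false;
    for (const toy_path &p : m_paths)
      {
        // Invented rule: one-element paths stand in for loop-header
        // threads and are only taken when peeling is allowed.
        if (p.size () > 1 || (p.size () == 1 && peel_loop_headers))
          {
            ++m_num_threaded;
            changed = true;
          }
      }
    m_paths.clear ();
    return changed;
  }
};
```

Callers only ever see thread_through_all_blocks(); which update strategy runs is fixed by the concrete registry they construct, which mirrors how the patch swaps in back_jt_path_registry and fwd_jt_path_registry at the two construction sites.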

4 years, 4 months
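
Gotcha (c) in the GCC commit message above says that a "skip if already seen" check does nothing when the set is only queried and never populated. A minimal sketch of that effect, using std::unordered_set in place of GCC's hash_set and a hypothetical helper count_kept (neither is taken from the patch):

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Count the entries that survive a "skip if already seen" filter.
// When RECORD is false the set is queried but never populated, which
// models a contains () check with no matching add ().
std::size_t
count_kept (const std::vector<int> &starting_edges, bool record)
{
  std::unordered_set<int> visited;
  std::size_t kept = 0;
  for (int e : starting_edges)
    {
      if (visited.count (e) != 0)
        continue;               // Duplicate starting edge: drop it.
      if (record)
        visited.insert (e);
      ++kept;
    }
  assert (kept <= starting_edges.size ());
  return kept;
}
```

With record set to false every entry is kept, duplicates included, so the check is dead code; with record set to true only the first occurrence of each value survives.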

[CI-NOTIFY]: TCWG Bisect tcwg_kernel/llvm-master-aarch64-mainline-allmodconfig - Build # 10 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-master-aarch64-mainline-allmodconfig.  So far, this commit has regressed CI configurations:
 - tcwg_kernel/llvm-master-aarch64-mainline-allmodconfig

Culprit:
<cut>
commit c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
Author: Slark Xiao <slark_xiao(a)163.com>
Date:   Tue Aug 31 10:40:25 2021 +0800

    net: Add depends on OF_NET for LiteX's LiteETH

    Current settings may produce a build error when
    CONFIG_OF_NET is disabled. The CONFIG_OF_NET controls
    a header file <linux/of.h> and some functions
    in <linux/of_net.h>.

    Signed-off-by: Slark Xiao <slark_xiao(a)163.com>
    Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
</cut>

Results regressed to (for first_bad == c3496da580b0fc10fdeba8f6a5e6aef4c78b5598)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
29873
# linux build successful:
all
# First few build errors in logs:

from (for last_good == a9e7c3cedc2914f63cd135b75832b9bf850af782)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
29873
# linux build successful:
all
# linux boot successful:
boot

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-linux-c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
cd investigate-linux-c3496da580b0fc10fdeba8f6a5e6aef4c78b5598

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/

cd linux

# Reproduce first_bad build
git checkout --detach c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach a9e7c3cedc2914f63cd135b75832b9bf850af782
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Build log: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…

Full commit (up to 1000 lines):
<cut>
commit c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
Author: Slark Xiao <slark_xiao(a)163.com>
Date:   Tue Aug 31 10:40:25 2021 +0800

    net: Add depends on OF_NET for LiteX's LiteETH

    Current settings may produce a build error when
    CONFIG_OF_NET is disabled. The CONFIG_OF_NET controls
    a header file <linux/of.h> and some functions
    in <linux/of_net.h>.

    Signed-off-by: Slark Xiao <slark_xiao(a)163.com>
    Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
 drivers/net/ethernet/litex/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/litex/Kconfig b/drivers/net/ethernet/litex/Kconfig
index 265dba414b41..63bf01d28f0c 100644
--- a/drivers/net/ethernet/litex/Kconfig
+++ b/drivers/net/ethernet/litex/Kconfig
@@ -17,6 +17,7 @@ if NET_VENDOR_LITEX
 
 config LITEX_LITEETH
 	tristate "LiteX Ethernet support"
+	depends on OF_NET
 	help
 	  If you wish to compile a kernel for hardware with a LiteX LiteEth
 	  device then you should answer Y to this.
</cut>

4 years, 4 months

[TCWG CI] Regression caused by llvm:09507b53250dc266632c204558cb1c2b56e8ddea

by ci_notify＠linaro.org

Identified regression caused by *llvm:09507b53250dc266632c204558cb1c2b56e8ddea*:
commit 09507b53250dc266632c204558cb1c2b56e8ddea
Author: Cullen Rhodes <cullen.rhodes(a)arm.com>

    [AArch64][SME] Disable NEON in streaming mode

Results regressed to (for first_bad == 09507b53250dc266632c204558cb1c2b56e8ddea)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O3_marm artifacts/build-09507b53250dc266632c204558cb1c2b56e8ddea/results_id: 1
# 456.hmmer,hmmer_base.default regressed by 103

from (for last_good == 93c55d5ea24b8f455b0621bac373f142e0008739)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O3_marm artifacts/build-93c55d5ea24b8f455b0621bac373f142e0008739/results_id: 1

This commit has regressed these CI configurations:
 - tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O3

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-09507b53250dc266632c204558cb1c2b56e8ddea
cd investigate-llvm-09507b53250dc266632c204558cb1c2b56e8ddea

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 09507b53250dc266632c204558cb1c2b56e8ddea
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 93c55d5ea24b8f455b0621bac373f142e0008739
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 09507b53250dc266632c204558cb1c2b56e8ddea
Author: Cullen Rhodes <cullen.rhodes(a)arm.com>
Date:   Mon Aug 16 07:31:55 2021 +0000

    [AArch64][SME] Disable NEON in streaming mode

    In streaming mode most of the NEON instruction set is illegal, disable
    NEON when compiling with `+streaming-sve`, unless NEON is explicitly
    requested.

    Subsequent patches will add support for the small subset of NEON
    instructions that are legal in streaming mode.

    Reviewed By: paulwalker-arm, david-arm

    Differential Revision: https://reviews.llvm.org/D107902
---
 llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCTargetDesc.cpp | 11 ++++++++++-
 llvm/test/MC/AArch64/SME/streaming-sve-feature.s             |  8 ++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCTargetDesc.cpp b/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCTargetDesc.cpp
index 3c2df1621e11..987cabce6cc9 100644
--- a/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCTargetDesc.cpp
+++ b/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCTargetDesc.cpp
@@ -57,7 +57,16 @@ createAArch64MCSubtargetInfo(const Triple &TT, StringRef CPU, StringRef FS) {
       CPU = "apple-a12";
   }
 
-  return createAArch64MCSubtargetInfoImpl(TT, CPU, /*TuneCPU*/ CPU, FS);
+  // Most of the NEON instruction set isn't supported in streaming mode on SME
+  // targets, disable NEON unless explicitly requested.
+  bool RequestedNEON = FS.contains("neon");
+  bool RequestedStreamingSVE = FS.contains("streaming-sve");
+  MCSubtargetInfo *STI =
+      createAArch64MCSubtargetInfoImpl(TT, CPU, /*TuneCPU*/ CPU, FS);
+  if (RequestedStreamingSVE && !RequestedNEON &&
+      STI->hasFeature(AArch64::FeatureNEON))
+    STI->ToggleFeature(AArch64::FeatureNEON);
+  return STI;
 }
 
 void AArch64_MC::initLLVMToCVRegMapping(MCRegisterInfo *MRI) {
diff --git a/llvm/test/MC/AArch64/SME/streaming-sve-feature.s b/llvm/test/MC/AArch64/SME/streaming-sve-feature.s
new file mode 100644
index 000000000000..e35505ca39c5
--- /dev/null
+++ b/llvm/test/MC/AArch64/SME/streaming-sve-feature.s
@@ -0,0 +1,8 @@
+// RUN: llvm-mc -triple=aarch64 -mattr=+streaming-sve,+neon < %s 2>&1 | FileCheck %s
+// RUN: not llvm-mc -triple=aarch64 -mattr=+streaming-sve < %s 2>&1 | FileCheck %s --check-prefix=CHECK-ERROR
+
+// Verify NEON is disabled when targeting streaming mode, if it's not
+// explicitly requested.
+add v0.8b, v1.8b, v2.8b
+// CHECK: add v0.8b, v1.8b, v2.8b
+// CHECK-ERROR: error: instruction requires: neon
</cut>
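
For readers skimming the diff above, the decision it adds can be restated as a small standalone function. This is only a sketch under stated assumptions: `neonAfterCommit` and its parameters are hypothetical names, not LLVM API, and `hasNeon` stands in for the result of `STI->hasFeature(AArch64::FeatureNEON)` before the toggle.

```cpp
#include <string>

// Hypothetical stand-in (not LLVM API) for the check the commit adds to
// createAArch64MCSubtargetInfo. `hasNeon` models the NEON feature bit as it
// is before the new toggle runs.
bool neonAfterCommit(const std::string &featureString, bool hasNeon) {
  // Like FS.contains(...) in the diff, these are plain substring matches.
  const bool requestedNeon =
      featureString.find("neon") != std::string::npos;
  const bool requestedStreamingSve =
      featureString.find("streaming-sve") != std::string::npos;

  // ToggleFeature is only reached when NEON is currently on, so the toggle
  // always turns it off.
  if (requestedStreamingSve && !requestedNeon && hasNeon)
    return false;
  return hasNeon;
}
```

Because the match is a substring test, a feature string such as `+streaming-sve,-neon` also counts as "mentions neon"; in that case the toggle is skipped and the bit stays in whatever state it already had.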

4 years, 4 months

[TCWG CI] Regression caused by llvm:4389a413e2129d7d55ee779638b649aa852b6f8a

by ci_notify＠linaro.org

Identified regression caused by *llvm:4389a413e2129d7d55ee779638b649aa852b6f8a*:
commit 4389a413e2129d7d55ee779638b649aa852b6f8a
Author: Zahira Ammarguellat <zahira.ammarguellat(a)intel.com>

    Revert "[clang][fpenv][patch] Change clang option -ffp-model=precise to select ffp-contract=on"

Results regressed to (for first_bad == 4389a413e2129d7d55ee779638b649aa852b6f8a)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -Oz_LTO artifacts/build-4389a413e2129d7d55ee779638b649aa852b6f8a/results_id: 1
# 470.lbm,lbm_base.default regressed by 104

from (for last_good == dfce2909ee1ea1523ec27b834a0e56429e9c2beb)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -Oz_LTO artifacts/build-dfce2909ee1ea1523ec27b834a0e56429e9c2beb/results_id: 1

This commit has regressed these CI configurations:
 - tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Oz_LTO

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-4389a413e2129d7d55ee779638b649aa852b6f8a
cd investigate-llvm-4389a413e2129d7d55ee779638b649aa852b6f8a

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 4389a413e2129d7d55ee779638b649aa852b6f8a
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach dfce2909ee1ea1523ec27b834a0e56429e9c2beb
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 4389a413e2129d7d55ee779638b649aa852b6f8a
Author: Zahira Ammarguellat <zahira.ammarguellat(a)intel.com>
Date:   Fri Aug 6 12:01:47 2021 -0700

    Revert "[clang][fpenv][patch] Change clang option -ffp-model=precise to select ffp-contract=on"

    This reverts commit 48ad446a0fb2c9b98cb7047e4daf8a84c29cef8f.
---
 clang/docs/UsersManual.rst               | 48 ++-----------------------
 clang/lib/Driver/ToolChains/Clang.cpp    | 33 ++++++++---------
 clang/test/CodeGen/ffp-contract-option.c | 47 +++---------------------
 clang/test/CodeGen/ppc-emmintrin.c       |  4 +--
 clang/test/CodeGen/ppc-xmmintrin.c       |  4 +--
 clang/test/Driver/fp-model.c             | 61 +++++++++++++++-----------------
 6 files changed, 58 insertions(+), 139 deletions(-)

diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index 838669794ea8..980d0ab45975 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -1260,50 +1260,8 @@ installed.
Controlling Floating Point Behavior ----------------------------------- -Clang provides a number of ways to control floating point behavior, including -with command line options and source pragmas. This section -describes the various floating point semantic modes and the corresponding options. - -.. csv-table:: Floating Point Semantic Modes - :header: "Mode", "Values" - :widths: 15, 30, 30 - - "except_behavior", "{ignore, strict, may_trap}", "ffp-exception-behavior" - "fenv_access", "{off, on}", "(none)" - "rounding_mode", "{dynamic, tonearest, downward, upward, towardzero}", "frounding-math" - "contract", "{on, off, fast}", "ffp-contract" - "denormal_fp_math", "{IEEE, PreserveSign, PositiveZero}", "fdenormal-fp-math" - "denormal_fp32_math", "{IEEE, PreserveSign, PositiveZero}", "fdenormal-fp-math-fp32" - "support_math_errno", "{on, off}", "fmath-errno" - "no_honor_nans", "{on, off}", "fhonor-nans" - "no_honor_infinities", "{on, off}", "fhonor-infinities" - "no_signed_zeros", "{on, off}", "fsigned-zeros" - "allow_reciprocal", "{on, off}", "freciprocal-math" - "allow_approximate_fns", "{on, off}", "(none)" - "allow_reassociation", "{on, off}", "fassociative-math" - - -This table describes the option settings that correspond to the three -floating point semantic models: precise (the default), strict, and fast. - - -.. 
csv-table:: Floating Point Models - :header: "Mode", "Precise", "Strict", "Fast" - :widths: 25, 15, 15, 15 - - "except_behavior", "ignore", "strict", "ignore" - "fenv_access", "off", "on", "off" - "rounding_mode", "tonearest", "dynamic", "tonearest" - "contract", "on", "off", "fast" - "denormal_fp_math", "IEEE", "IEEE", "PreserveSign" - "denormal_fp32_math", "IEEE","IEEE", "PreserveSign" - "support_math_errno", "on", "on", "off" - "no_honor_nans", "off", "off", "on" - "no_honor_infinities", "off", "off", "on" - "no_signed_zeros", "off", "off", "on" - "allow_reciprocal", "off", "off", "on" - "allow_approximate_fns", "off", "off", "on" - "allow_reassociation", "off", "off", "on" +Clang provides a number of ways to control floating point behavior. The options +are listed below. .. option:: -ffast-math @@ -1498,7 +1456,7 @@ Note that floating-point operations performed as part of constant initialization and ``fast``. Details: - * ``precise`` Disables optimizations that are not value-safe on floating-point data, although FP contraction (FMA) is enabled (``-ffp-contract=on``). This is the default behavior. + * ``precise`` Disables optimizations that are not value-safe on floating-point data, although FP contraction (FMA) is enabled (``-ffp-contract=fast``). This is the default behavior. * ``strict`` Enables ``-frounding-math`` and ``-ffp-exception-behavior=strict``, and disables contractions (FMA). All of the ``-ffast-math`` enablements are disabled. Enables ``STDC FENV_ACCESS``: by default ``FENV_ACCESS`` is disabled. This option setting behaves as though ``#pragma STDC FENV_ACESS ON`` appeared at the top of the source file. 
* ``fast`` Behaves identically to specifying both ``-ffast-math`` and ``ffp-contract=fast`` diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index 1c79640be80f..96bbc0250126 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -2641,7 +2641,7 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, llvm::DenormalMode DenormalFPMath = DefaultDenormalFPMath; llvm::DenormalMode DenormalFP32Math = DefaultDenormalFP32Math; - StringRef FPContract = "on"; + StringRef FPContract = ""; bool StrictFPModel = false; @@ -2666,7 +2666,7 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, ReciprocalMath = false; SignedZeros = true; // -fno_fast_math restores default denormal and fpcontract handling - FPContract = "on"; + FPContract = ""; DenormalFPMath = llvm::DenormalMode::getIEEE(); // FIXME: The target may have picked a non-IEEE default mode here based on @@ -2686,18 +2686,20 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, // ffp-model= is a Driver option, it is entirely rewritten into more // granular options before being passed into cc1. // Use the gcc option in the switch below. 
- if (!FPModel.empty() && !FPModel.equals(Val)) + if (!FPModel.empty() && !FPModel.equals(Val)) { D.Diag(clang::diag::warn_drv_overriding_flag_option) << Args.MakeArgString("-ffp-model=" + FPModel) << Args.MakeArgString("-ffp-model=" + Val); + FPContract = ""; + } if (Val.equals("fast")) { optID = options::OPT_ffast_math; FPModel = Val; - FPContract = Val; + FPContract = "fast"; } else if (Val.equals("precise")) { optID = options::OPT_ffp_contract; FPModel = Val; - FPContract = "on"; + FPContract = "fast"; PreciseFPModel = true; } else if (Val.equals("strict")) { StrictFPModel = true; @@ -2783,11 +2785,9 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, case options::OPT_ffp_contract: { StringRef Val = A->getValue(); if (PreciseFPModel) { - // When -ffp-model=precise is seen on the command line, - // the boolean PreciseFPModel is set to true which indicates - // "the current option is actually PreciseFPModel". The optID - // is changed to OPT_ffp_contract and FPContract is set to "on". - // the argument Val string is "precise": it shouldn't be checked. + // -ffp-model=precise enables ffp-contract=fast as a side effect + // the FPContract value has already been set to a string literal + // and the Val string isn't a pertinent value. ; } else if (Val.equals("fast") || Val.equals("on") || Val.equals("off")) FPContract = Val; @@ -2897,17 +2897,18 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, // -fno_fast_math restores default denormal and fpcontract handling DenormalFPMath = DefaultDenormalFPMath; DenormalFP32Math = llvm::DenormalMode::getIEEE(); - FPContract = "on"; + FPContract = ""; break; } if (StrictFPModel) { // If -ffp-model=strict has been specified on command line but // subsequent options conflict then emit warning diagnostic. 
- if (HonorINFs && HonorNaNs && !AssociativeMath && !ReciprocalMath && - SignedZeros && TrappingMath && RoundingFPMath && - DenormalFPMath == llvm::DenormalMode::getIEEE() && - DenormalFP32Math == llvm::DenormalMode::getIEEE() && - FPContract.equals("off")) + if (HonorINFs && HonorNaNs && + !AssociativeMath && !ReciprocalMath && + SignedZeros && TrappingMath && RoundingFPMath && + (FPContract.equals("off") || FPContract.empty()) && + DenormalFPMath == llvm::DenormalMode::getIEEE() && + DenormalFP32Math == llvm::DenormalMode::getIEEE()) // OK: Current Arg doesn't conflict with -ffp-model=strict ; else { diff --git a/clang/test/CodeGen/ffp-contract-option.c b/clang/test/CodeGen/ffp-contract-option.c index efc72c2b5461..52b750795940 100644 --- a/clang/test/CodeGen/ffp-contract-option.c +++ b/clang/test/CodeGen/ffp-contract-option.c @@ -1,46 +1,9 @@ -// RUN: %clang_cc1 -O3 -ffp-contract=fast -triple=aarch64-apple-darwin -S -o - %s | FileCheck --check-prefix=CHECK-FMADD %s +// RUN: %clang_cc1 -O3 -ffp-contract=fast -triple=aarch64-apple-darwin -S -o - %s | FileCheck %s // REQUIRES: aarch64-registered-target float fma_test1(float a, float b, float c) { -// CHECK-FMADD: fmadd - float x = a * b; - float y = x + c; - return y; -} - -// RUN: %clang_cc1 -triple=x86_64 %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-DEFAULT %s -// -// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=off %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-DEFAULT %s -// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=on %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-ON %s -// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=fast %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-CONTRACTFAST %s -// -// RUN: %clang_cc1 -triple=x86_64 -ffast-math %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-DEFAULTFAST %s -// RUN: %clang_cc1 -triple=x86_64 -ffast-math -ffp-contract=off %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-DEFAULTFAST 
%s -// RUN: %clang_cc1 -triple=x86_64 -ffast-math -ffp-contract=on %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-ONFAST %s -// RUN: %clang_cc1 -triple=x86_64 -ffast-math -ffp-contract=fast %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-FASTFAST %s -float mymuladd( float x, float y, float z ) { - return x * y + z; - // CHECK-DEFAULT: = fmul float - // CHECK-DEFAULT: = fadd float - - // CHECK-ON: = call float @llvm.fmuladd.f32 - - // CHECK-CONTRACTFAST: = fmul contract float - // CHECK-CONTRACTFAST: = fadd contract float - - // CHECK-DEFAULTFAST: = fmul reassoc nnan ninf nsz arcp afn float - // CHECK-DEFAULTFAST: = fadd reassoc nnan ninf nsz arcp afn float - - // CHECK-ONFAST: = call reassoc nnan ninf nsz arcp afn float @llvm.fmuladd.f32 - - // CHECK-FASTFAST: = fmul fast float - // CHECK-FASTFAST: = fadd fast float +// CHECK: fmadd + float x = a * b; + float y = x + c; + return y; } diff --git a/clang/test/CodeGen/ppc-emmintrin.c b/clang/test/CodeGen/ppc-emmintrin.c index 4a246ff92d76..fa3801f50a01 100644 --- a/clang/test/CodeGen/ppc-emmintrin.c +++ b/clang/test/CodeGen/ppc-emmintrin.c @@ -2,9 +2,9 @@ // REQUIRES: powerpc-registered-target // RUN: %clang -S -emit-llvm -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ -// RUN: -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE +// RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE // RUN: %clang -S -emit-llvm -target powerpc64le-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ -// RUN: -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE +// RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE // 
CHECK-BE-DAG: @_mm_movemask_pd.perm_mask = internal constant <4 x i32> <i32 -2139062144, i32 -2139062144, i32 -2139062144, i32 -2139078656>, align 16 // CHECK-BE-DAG: @_mm_shuffle_epi32.permute_selectors = internal constant [4 x i32] [i32 66051, i32 67438087, i32 134810123, i32 202182159], align 4 diff --git a/clang/test/CodeGen/ppc-xmmintrin.c b/clang/test/CodeGen/ppc-xmmintrin.c index a7f6ed6e0e67..d3f18bfbb1e5 100644 --- a/clang/test/CodeGen/ppc-xmmintrin.c +++ b/clang/test/CodeGen/ppc-xmmintrin.c @@ -2,11 +2,11 @@ // REQUIRES: powerpc-registered-target // RUN: %clang -S -emit-llvm -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ -// RUN: -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE +// RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE // RUN: %clang -x c++ -fsyntax-only -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ // RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns // RUN: %clang -S -emit-llvm -target powerpc64le-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ -// RUN: -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE +// RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE // RUN: %clang -x c++ -fsyntax-only -target powerpc64le-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ // RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns diff --git a/clang/test/Driver/fp-model.c b/clang/test/Driver/fp-model.c index c6d683e25c0b..5fa9d110dd83 100644 --- a/clang/test/Driver/fp-model.c +++ b/clang/test/Driver/fp-model.c @@ -1,90 +1,88 @@ // Test that incompatible combinations of 
-ffp-model= options // and other floating point options get a warning diagnostic. +// +// REQUIRES: clang-driver -// RUN: %clang -target x86_64 -### -ffp-model=fast -ffp-contract=off -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=fast -ffp-contract=off -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN %s // WARN: warning: overriding '-ffp-model=fast' option with '-ffp-contract=off' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=fast -ffp-contract=on -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=fast -ffp-contract=on -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN1 %s // WARN1: warning: overriding '-ffp-model=fast' option with '-ffp-contract=on' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fassociative-math -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fassociative-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN2 %s // WARN2: warning: overriding '-ffp-model=strict' option with '-fassociative-math' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -ffast-math -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -ffast-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN3 %s // WARN3: warning: overriding '-ffp-model=strict' option with '-ffast-math' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -ffinite-math-only -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -ffinite-math-only -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN4 %s // WARN4: warning: overriding '-ffp-model=strict' option with '-ffinite-math-only' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -ffp-contract=fast -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -ffp-contract=fast -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN5 %s // WARN5: warning: overriding '-ffp-model=strict' option with '-ffp-contract=fast' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -ffp-contract=fast -c %s 
2>&1 \ -// RUN: | FileCheck --check-prefix=WARN6 %s -// WARN6: warning: overriding '-ffp-model=strict' option with '-ffp-contract=fast' [-Woverriding-t-option] - -// RUN: %clang -target x86_64 -### -ffp-model=strict -ffp-contract=on -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -ffp-contract=on -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN7 %s // WARN7: warning: overriding '-ffp-model=strict' option with '-ffp-contract=on' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-honor-infinities -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fno-honor-infinities -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN8 %s // WARN8: warning: overriding '-ffp-model=strict' option with '-fno-honor-infinities' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-honor-nans -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fno-honor-nans -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN9 %s // WARN9: warning: overriding '-ffp-model=strict' option with '-fno-honor-nans' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-rounding-math -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fno-rounding-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARNa %s // WARNa: warning: overriding '-ffp-model=strict' option with '-fno-rounding-math' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-signed-zeros -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fno-signed-zeros -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARNb %s // WARNb: warning: overriding '-ffp-model=strict' option with '-fno-signed-zeros' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-trapping-math -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fno-trapping-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARNc %s // WARNc: warning: overriding '-ffp-model=strict' option with '-fno-trapping-math' 
[-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -freciprocal-math -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -freciprocal-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARNd %s // WARNd: warning: overriding '-ffp-model=strict' option with '-freciprocal-math' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -funsafe-math-optimizations -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -funsafe-math-optimizations -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARNe %s // WARNe: warning: overriding '-ffp-model=strict' option with '-funsafe-math-optimizations' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -Ofast -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -Ofast -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARNf %s // WARNf: warning: overriding '-ffp-model=strict' option with '-Ofast' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fdenormal-fp-math=preserve-sign,preserve-sign -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fdenormal-fp-math=preserve-sign,preserve-sign -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN10 %s // WARN10: warning: overriding '-ffp-model=strict' option with '-fdenormal-fp-math=preserve-sign,preserve-sign' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -c %s 2>&1 \ +// RUN: %clang -### -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-NOROUND %s // CHECK-NOROUND: "-cc1" // CHECK-NOROUND: "-fno-rounding-math" -// RUN: %clang -target x86_64 -### -frounding-math -c %s 2>&1 \ +// RUN: %clang -### -frounding-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-ROUND --implicit-check-not ffp-exception-behavior=strict %s // CHECK-ROUND: "-cc1" // CHECK-ROUND: "-frounding-math" -// RUN: %clang -target x86_64 -### -ftrapping-math -c %s 2>&1 \ +// RUN: %clang -### -ftrapping-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-TRAP %s // CHECK-TRAP: "-cc1" // 
CHECK-TRAP: "-ffp-exception-behavior=strict" -// RUN: %clang -target x86_64 -### -nostdinc -ffp-model=fast -c %s 2>&1 \ +// RUN: %clang -### -nostdinc -ffp-model=fast -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-FPM-FAST %s // CHECK-FPM-FAST: "-cc1" // CHECK-FPM-FAST: "-menable-no-infs" @@ -98,35 +96,34 @@ // CHECK-FPM-FAST: "-ffast-math" // CHECK-FPM-FAST: "-ffinite-math-only" -// RUN: %clang -target x86_64 -### -nostdinc -ffp-model=precise -c %s 2>&1 \ +// RUN: %clang -### -nostdinc -ffp-model=precise -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-FPM-PRECISE %s // CHECK-FPM-PRECISE: "-cc1" -// CHECK-FPM-PRECISE: "-ffp-contract=on" +// CHECK-FPM-PRECISE: "-ffp-contract=fast" // CHECK-FPM-PRECISE: "-fno-rounding-math" -// RUN: %clang -target x86_64 -### -nostdinc -ffp-model=strict -c %s 2>&1 \ +// RUN: %clang -### -nostdinc -ffp-model=strict -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-FPM-STRICT %s // CHECK-FPM-STRICT: "-cc1" -// CHECK-FPM-STRICT: "-fmath-errno" -// CHECK-FPM-STRICT: "-ffp-contract=off" // CHECK-FPM-STRICT: "-frounding-math" // CHECK-FPM-STRICT: "-ffp-exception-behavior=strict" -// RUN: %clang -target x86_64 -### -nostdinc -ffp-exception-behavior=strict -c %s 2>&1 \ +// RUN: %clang -### -nostdinc -ffp-exception-behavior=strict -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-FEB-STRICT %s // CHECK-FEB-STRICT: "-cc1" // CHECK-FEB-STRICT: "-fno-rounding-math" // CHECK-FEB-STRICT: "-ffp-exception-behavior=strict" -// RUN: %clang -target x86_64 -### -nostdinc -ffp-exception-behavior=maytrap -c %s 2>&1 \ +// RUN: %clang -### -nostdinc -ffp-exception-behavior=maytrap -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-FEB-MAYTRAP %s // CHECK-FEB-MAYTRAP: "-cc1" // CHECK-FEB-MAYTRAP: "-fno-rounding-math" // CHECK-FEB-MAYTRAP: "-ffp-exception-behavior=maytrap" -// RUN: %clang -target x86_64 -### -nostdinc -ffp-exception-behavior=ignore -c %s 2>&1 \ +// RUN: %clang -### -nostdinc -ffp-exception-behavior=ignore -c %s 2>&1 \ 
// RUN: | FileCheck --check-prefix=CHECK-FEB-IGNORE %s // CHECK-FEB-IGNORE: "-cc1" // CHECK-FEB-IGNORE: "-fno-rounding-math" // CHECK-FEB-IGNORE: "-ffp-exception-behavior=ignore" + </cut>
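
The part of this revert most likely to matter for generated code is the `-ffp-model` to `-ffp-contract` mapping in the `Clang.cpp` hunk. A minimal sketch of just that mapping follows, read off the `-`/`+` lines of the quoted diff. The function name `contractForFpModel` and the `afterRevert` flag are hypothetical and not part of clang; an empty result means the quoted hunk does not show an assignment for that model.

```cpp
#include <string>

// Hypothetical helper (not clang API): the -ffp-contract value assigned for a
// given -ffp-model value, as visible in the quoted RenderFloatingPointOptions
// hunk. `afterRevert == false` models the '-' lines, `true` the '+' lines.
std::string contractForFpModel(const std::string &model, bool afterRevert) {
  if (model == "fast")
    return "fast"; // `FPContract = Val` before, `FPContract = "fast"` after
  if (model == "precise")
    return afterRevert ? "fast" : "on"; // the change this commit reverts
  return ""; // e.g. "strict": no assignment appears in the quoted hunk
}
```

Under this reading, the only model whose contraction setting changes is `precise`, which moves from `on` back to `fast`.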

4 years, 4 months

[TCWG CI] Regression caused by gcc:01b5038718056b024b370b74a874fbd92c5bbab3

by ci_notify＠linaro.org

Identified regression caused by *gcc:01b5038718056b024b370b74a874fbd92c5bbab3*:
commit 01b5038718056b024b370b74a874fbd92c5bbab3
Author: Aldy Hernandez <aldyh(a)redhat.com>

    Disable threading through latches until after loop optimizations.

Results regressed to (for first_bad == 01b5038718056b024b370b74a874fbd92c5bbab3)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# true: 0
# benchmark -- -Os artifacts/build-01b5038718056b024b370b74a874fbd92c5bbab3/results_id: 1
# 459.GemsFDTD,GemsFDTD_base.default regressed by 102
# 464.h264ref,h264ref_base.default regressed by 102

from (for last_good == fb88bf9931f17d137eb50c001e1c924aa1e34e83)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# true: 0
# benchmark -- -Os artifacts/build-fb88bf9931f17d137eb50c001e1c924aa1e34e83/results_id: 1

This commit has regressed these CI configurations:
 - tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…

Reproduce builds:
<cut>
mkdir investigate-gcc-01b5038718056b024b370b74a874fbd92c5bbab3
cd investigate-gcc-01b5038718056b024b370b74a874fbd92c5bbab3

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 01b5038718056b024b370b74a874fbd92c5bbab3
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach fb88bf9931f17d137eb50c001e1c924aa1e34e83
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 01b5038718056b024b370b74a874fbd92c5bbab3
Author: Aldy Hernandez <aldyh(a)redhat.com>
Date:   Thu Sep 9 20:30:28 2021 +0200

    Disable threading through latches until after loop optimizations.

    The motivation for this patch was enabling the use of global ranges in
    the path solver, but this caused certain properties of loops to be
    destroyed, which made subsequent loop optimizations fail.
    Consequently, this patch's main goal is to disable jump threading
    involving the latch until after loop optimizations have run.

    As can be seen in the test adjustments, we mostly shift the threading
    from the early threaders (ethread, thread[12]) to the late threaders
    (thread[34]). I have nuked some of the early notes in the testcases
    that came as part of the jump threader rewrite. They're mostly noise
    now.

    Note that we could probably relax some other restrictions in
    profitable_path_p when loop optimizations have completed, but it would
    require more testing, and I'm hesitant to touch more things than needed
    at this point. I have added a reminder to the function to keep this
    in mind.

    Finally, perhaps as a follow-up, we should apply the same restrictions
    to the forward threader. At some point I'd like to combine the cost
    models.

    Tested on x86-64 Linux.

    p.s. There is a thorough discussion involving the limitations of jump
    threading involving loops here:

    https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html

    gcc/ChangeLog:

    * tree-pass.h (PROP_loop_opts_done): New.
    * gimple-range-path.cc (path_range_query::internal_range_of_expr):
    Intersect with global range.
    * tree-ssa-loop.c (tree_ssa_loop_done): Set PROP_loop_opts_done.
    * tree-ssa-threadbackward.c
    (back_threader_profitability::profitable_path_p): Disable
    threading through latches until after loop optimizations have run.

    gcc/testsuite/ChangeLog:

    * gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Adjust for disabling of
    threading through latches.
    * gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
    * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.

    Co-authored-by: Michael Matz <matz(a)suse.de>
---
 gcc/gimple-range-path.cc                          |  3 ++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c |  4 +--
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c  | 37 ++---------------------
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c  | 17 +----------
 gcc/tree-pass.h                                   |  2 ++
 gcc/tree-ssa-loop.c                               |  2 +-
 gcc/tree-ssa-threadbackward.c                     | 28 +++++++++++++++--
 7 files changed, 37 insertions(+), 56 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index a4fa3b296ff..c616b65756f 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -127,6 +127,9 @@ path_range_query::internal_range_of_expr (irange &r, tree name, gimple *stmt)
   basic_block bb = stmt ?
gimple_bb (stmt) : exit_bb (); if (stmt && range_defined_in_block (r, name, bb)) { + if (TREE_CODE (name) == SSA_NAME) + r.intersect (gimple_range_global (name)); + set_cache (r, name); return true; } diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c index e1c33e86cd7..823ada982ff 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */ +/* { dg-options "-O2 -fdump-tree-thread3-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */ void foo(); void bla(); @@ -26,4 +26,4 @@ void thread_latch_through_header (void) case. And we want to thread through the header as well. These are both caught by threading in DOM. */ /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */ -/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread1"} } */ +/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread3"} } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c index c7bf867b084..ee46759bacc 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c @@ -1,41 +1,8 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread2-details" } */ +/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3-details" } */ -/* All the threads in the thread1 dump start on a X->BB12 edge, as can - be seen in the dump: - - Registering FSM jump thread: (x, 12) incoming edge; ... 
- etc - etc - - Before the new evrp, we were threading paths that started at the - following edges: - - Registering FSM jump thread: (10, 12) incoming edge - Registering FSM jump thread: (6, 12) incoming edge - Registering FSM jump thread: (9, 12) incoming edge - - This was because the PHI at BB12 had constant values coming in from - BB10, BB6, and BB9: - - # state_10 = PHI <state_11(7), 0(10), state_11(5), 1(6), state_11(8), 2(9), state_11(11)> - - Now with the new evrp, we get: - - # state_10 = PHI <0(7), 0(10), state_11(5), 1(6), 0(8), 2(9), 1(11)> - - Thus, we have 3 more paths that are known to be constant and can be - threaded. Which means that by the second threading pass, we can - only find one profitable path. - - For the record, all these extra constants are better paths coming - out of switches. For example: - - SWITCH_BB -> BBx -> BBy -> BBz -> PHI - - We now know the value of the switch index at PHI. */ /* { dg-final { scan-tree-dump-times "Registering FSM jump" 6 "thread1" } } */ -/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread2" } } */ +/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread3" } } */ int sum0, sum1, sum2, sum3; int foo (char *s, char **ret) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c index 5fc2145a432..ba07942f9dd 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c @@ -1,23 +1,8 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */ -/* Here we have the same issue as was commented in ssa-dom-thread-6.c. - The PHI coming into the threader has a lot more constants, so the - threader can thread more paths. 
- -$ diff clean/a.c.105t.mergephi2 a.c.105t.mergephi2 -252c252 -< # s_50 = PHI <s_49(10), 5(14), s_51(18), s_51(22), 1(26), 1(29), 1(31), s_51(5), 4(12), 1(15), 5(17), 1(19), 3(21), 1(23), 6(25), 7(28), s_51(30)> ---- -> # s_50 = PHI <s_49(10), 5(14), 4(18), 5(22), 1(26), 1(29), 1(31), s_51(5), 4(12), 1(15), 5(17), 1(19), 3(21), 1(23), 6(25), 7(28), 7(30)> -272a273 - - I spot checked a few and they all have the same pattern. We are - basically tracking the switch index better through multiple - paths. */ - /* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread1" } } */ -/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread2" } } */ +/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" } } */ /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */ /* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index 83941bc0cee..eb75eb17951 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -225,6 +225,8 @@ protected: been optimized. */ #define PROP_gimple_lomp_dev (1 << 16) /* done omp_device_lower */ #define PROP_rtl_split_insns (1 << 17) /* RTL has insns split. */ +#define PROP_loop_opts_done (1 << 18) /* SSA loop optimizations + have completed. 
*/ #define PROP_gimple \ (PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh | PROP_gimple_lomp) diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c index 0cc4b3bbccf..1bbf2f1fb2c 100644 --- a/gcc/tree-ssa-loop.c +++ b/gcc/tree-ssa-loop.c @@ -540,7 +540,7 @@ const pass_data pass_data_tree_loop_done = OPTGROUP_LOOP, /* optinfo_flags */ TV_NONE, /* tv_id */ PROP_cfg, /* properties_required */ - 0, /* properties_provided */ + PROP_loop_opts_done, /* properties_provided */ 0, /* properties_destroyed */ 0, /* todo_flags_start */ TODO_cleanup_cfg, /* todo_flags_finish */ diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c index 449232c7715..e72992328de 100644 --- a/gcc/tree-ssa-threadbackward.c +++ b/gcc/tree-ssa-threadbackward.c @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see #include "ssa.h" #include "tree-cfgcleanup.h" #include "tree-pretty-print.h" +#include "cfghooks.h" // Path registry for the backwards threader. After all paths have been // registered with register_path(), thread_through_all_blocks() is called @@ -564,7 +565,10 @@ back_threader_registry::thread_through_all_blocks (bool may_peel_loop_headers) TAKEN_EDGE, otherwise it is NULL. CREATES_IRREDUCIBLE_LOOP, if non-null is set to TRUE if threading this path - would create an irreducible loop. */ + would create an irreducible loop. + + ?? It seems we should be able to loosen some of the restrictions in + this function after loop optimizations have run. */ bool back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path, @@ -725,7 +729,11 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path, the last entry in the array when determining if we thread through the loop latch. 
*/ if (loop->latch == bb) - threaded_through_latch = true; + { + threaded_through_latch = true; + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, " (latch)"); + } } gimple *stmt = get_gimple_control_stmt (m_path[0]); @@ -845,6 +853,22 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path, "a multiway branch.\n"); return false; } + + /* Threading through an empty latch would cause code to be added to + the latch. This could alter the loop form sufficiently to cause + loop optimizations to fail. Disable these threads until after + loop optimizations have run. */ + if ((threaded_through_latch + || (taken_edge && taken_edge->dest == loop->latch)) + && !(cfun->curr_properties & PROP_loop_opts_done) + && empty_block_p (loop->latch)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, + " FAIL: FSM Thread through latch before loop opts would create non-empty latch\n"); + return false; + + } return true; } </cut>
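
For readers skimming the patch, the new early-exit in profitable_path_p can be restated as a small standalone predicate. This is only a sketch under stated assumptions: PROP_loop_opts_done and its bit position are taken from the tree-pass.h hunk above, but the LatchThreadQuery struct and the rejected_before_loop_opts function are hypothetical names invented here to stand in for GCC's loop, edge and cfun state. It is not GCC code.

```cpp
#include <cassert>
#include <cstdint>

// Bit value as added to gcc/tree-pass.h by the commit above.
constexpr std::uint32_t PROP_loop_opts_done = 1u << 18;

// Hypothetical, simplified stand-in for the state the real check reads.
struct LatchThreadQuery {
  bool threaded_through_latch;    // some block on the path is loop->latch
  bool taken_edge_hits_latch;     // taken_edge && taken_edge->dest == loop->latch
  bool latch_is_empty;            // empty_block_p (loop->latch)
  std::uint32_t curr_properties;  // cfun->curr_properties
};

// Returns true when the path would be refused by the new condition:
// it involves an empty latch and loop optimizations have not finished.
bool rejected_before_loop_opts(const LatchThreadQuery &q) {
  const bool involves_latch =
      q.threaded_through_latch || q.taken_edge_hits_latch;
  const bool loop_opts_done =
      (q.curr_properties & PROP_loop_opts_done) != 0;
  return involves_latch && !loop_opts_done && q.latch_is_empty;
}
```

With this model, a path through an empty latch is refused while the property bit is clear and accepted once tree_ssa_loop_done has provided it, which matches the shift from thread1/thread2 to thread3 seen in the adjusted tests.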

4 years, 4 months

[TCWG CI] Regression caused by llvm:50f4ae58eb136bc9d802cb98f02b6ff237eb61e0

by ci_notify＠linaro.org

Identified regression caused by *llvm:50f4ae58eb136bc9d802cb98f02b6ff237eb61e0*:
commit 50f4ae58eb136bc9d802cb98f02b6ff237eb61e0
Author: David Green <david.green(a)arm.com>

    [AArch64] Correct store ReadAdrBase operand

Results regressed to (for first_bad == 50f4ae58eb136bc9d802cb98f02b6ff237eb61e0)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O2 artifacts/build-50f4ae58eb136bc9d802cb98f02b6ff237eb61e0/results_id: 1
# 447.dealII,[.] _ZNK9MappingQ1ILi3EE12compute_fillERK12TriaIte regressed by 114

from (for last_good == 955c9437fd605216445fbd608de4ef1d96f825e9)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -O2 artifacts/build-955c9437fd605216445fbd608de4ef1d96f825e9/results_id: 1

This commit has regressed these CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-50f4ae58eb136bc9d802cb98f02b6ff237eb61e0
cd investigate-llvm-50f4ae58eb136bc9d802cb98f02b6ff237eb61e0

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 50f4ae58eb136bc9d802cb98f02b6ff237eb61e0
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 955c9437fd605216445fbd608de4ef1d96f825e9
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 50f4ae58eb136bc9d802cb98f02b6ff237eb61e0
Author: David Green <david.green(a)arm.com>
Date:   Mon Aug 23 21:07:55 2021 +0100

    [AArch64] Correct store ReadAdrBase operand

    It appears that the Read operand for stores was being placed on the
    first operand (the stored value) not the address base. This adds a
    ReadST for the stored value operand, allowing the ReadAdrBase to
    correctly act upon the address.
Differential Revision: https://reviews.llvm.org/D108287 --- llvm/lib/Target/AArch64/AArch64InstrFormats.td | 20 +- llvm/lib/Target/AArch64/AArch64SchedA53.td | 1 + llvm/lib/Target/AArch64/AArch64SchedA55.td | 1 + llvm/lib/Target/AArch64/AArch64SchedA57.td | 1 + llvm/lib/Target/AArch64/AArch64SchedA64FX.td | 1 + llvm/lib/Target/AArch64/AArch64SchedCyclone.td | 1 + llvm/lib/Target/AArch64/AArch64SchedExynosM3.td | 1 + llvm/lib/Target/AArch64/AArch64SchedExynosM4.td | 1 + llvm/lib/Target/AArch64/AArch64SchedExynosM5.td | 1 + llvm/lib/Target/AArch64/AArch64SchedFalkor.td | 1 + llvm/lib/Target/AArch64/AArch64SchedKryo.td | 1 + llvm/lib/Target/AArch64/AArch64SchedTSV110.td | 1 + llvm/lib/Target/AArch64/AArch64SchedThunderX.td | 1 + .../lib/Target/AArch64/AArch64SchedThunderX2T99.td | 1 + .../Target/AArch64/AArch64SchedThunderX3T110.td | 1 + llvm/lib/Target/AArch64/AArch64Schedule.td | 1 + .../llvm-mca/AArch64/Cortex/A55-store-readadv.s | 246 ++++++++++----------- 17 files changed, 148 insertions(+), 133 deletions(-) diff --git a/llvm/lib/Target/AArch64/AArch64InstrFormats.td b/llvm/lib/Target/AArch64/AArch64InstrFormats.td index 10c6fcd5cacd..ea0c62d2045f 100644 --- a/llvm/lib/Target/AArch64/AArch64InstrFormats.td +++ b/llvm/lib/Target/AArch64/AArch64InstrFormats.td @@ -3482,7 +3482,7 @@ multiclass Store8RO<bits<2> sz, bit V, bits<2> opc, DAGOperand regtype, [(storeop (Ty regtype:$Rt), (ro_Windexed8 GPR64sp:$Rn, GPR32:$Rm, ro_Wextend8:$extend))]>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b0; } @@ -3492,7 +3492,7 @@ multiclass Store8RO<bits<2> sz, bit V, bits<2> opc, DAGOperand regtype, [(storeop (Ty regtype:$Rt), (ro_Xindexed8 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend8:$extend))]>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b1; } @@ -3554,7 +3554,7 @@ multiclass Store16RO<bits<2> sz, bit V, bits<2> opc, DAGOperand regtype, [(storeop (Ty regtype:$Rt), (ro_Windexed16 
GPR64sp:$Rn, GPR32:$Rm, ro_Wextend16:$extend))]>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b0; } @@ -3564,7 +3564,7 @@ multiclass Store16RO<bits<2> sz, bit V, bits<2> opc, DAGOperand regtype, [(storeop (Ty regtype:$Rt), (ro_Xindexed16 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend16:$extend))]>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b1; } @@ -3626,7 +3626,7 @@ multiclass Store32RO<bits<2> sz, bit V, bits<2> opc, DAGOperand regtype, [(storeop (Ty regtype:$Rt), (ro_Windexed32 GPR64sp:$Rn, GPR32:$Rm, ro_Wextend32:$extend))]>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b0; } @@ -3636,7 +3636,7 @@ multiclass Store32RO<bits<2> sz, bit V, bits<2> opc, DAGOperand regtype, [(storeop (Ty regtype:$Rt), (ro_Xindexed32 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend32:$extend))]>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b1; } @@ -3698,7 +3698,7 @@ multiclass Store64RO<bits<2> sz, bit V, bits<2> opc, DAGOperand regtype, [(storeop (Ty regtype:$Rt), (ro_Windexed64 GPR64sp:$Rn, GPR32:$Rm, ro_Wextend64:$extend))]>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b0; } @@ -3708,7 +3708,7 @@ multiclass Store64RO<bits<2> sz, bit V, bits<2> opc, DAGOperand regtype, [(storeop (Ty regtype:$Rt), (ro_Xindexed64 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend64:$extend))]>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b1; } @@ -3768,7 +3768,7 @@ multiclass Store128RO<bits<2> sz, bit V, bits<2> opc, DAGOperand regtype, def roW : LoadStore128RO<sz, V, opc, regtype, asm, (outs), (ins regtype:$Rt, GPR64sp:$Rn, GPR32:$Rm, ro_Wextend128:$extend), []>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b0; } @@ -3776,7 +3776,7 @@ multiclass Store128RO<bits<2> sz, 
bit V, bits<2> opc, DAGOperand regtype, def roX : LoadStore128RO<sz, V, opc, regtype, asm, (outs), (ins regtype:$Rt, GPR64sp:$Rn, GPR64:$Rm, ro_Xextend128:$extend), []>, - Sched<[WriteSTIdx, ReadAdrBase]> { + Sched<[WriteSTIdx, ReadST, ReadAdrBase]> { let Inst{13} = 0b1; } diff --git a/llvm/lib/Target/AArch64/AArch64SchedA53.td b/llvm/lib/Target/AArch64/AArch64SchedA53.td index 65c84b1f39c0..3fef369a4e2b 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedA53.td +++ b/llvm/lib/Target/AArch64/AArch64SchedA53.td @@ -149,6 +149,7 @@ def A53WriteFSqrtDP : SchedWriteRes<[A53UnitFPMDS]> { let Latency = 32; // No forwarding for these reads. def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; +def : ReadAdvance<ReadST, 0>; def : ReadAdvance<ReadVLD, 0>; // ALU - Most operands in the ALU pipes are not needed for two cycles. Shiftable diff --git a/llvm/lib/Target/AArch64/AArch64SchedA55.td b/llvm/lib/Target/AArch64/AArch64SchedA55.td index 0e680078c348..34d6fb5fb306 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedA55.td +++ b/llvm/lib/Target/AArch64/AArch64SchedA55.td @@ -182,6 +182,7 @@ def CortexA55WriteFSqrtDP : SchedWriteRes<[CortexA55UnitFPDIV]> { let Latency = def : ReadAdvance<ReadVLD, 0>; def : ReadAdvance<ReadExtrHi, 1>; def : ReadAdvance<ReadAdrBase, 1>; +def : ReadAdvance<ReadST, 1>; // ALU - ALU input operands are generally needed in EX1. 
An operand produced in // in say EX2 can be forwarded for consumption to ALU in EX1, thereby diff --git a/llvm/lib/Target/AArch64/AArch64SchedA57.td b/llvm/lib/Target/AArch64/AArch64SchedA57.td index c1eacca8cc1f..c9addac18ba7 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedA57.td +++ b/llvm/lib/Target/AArch64/AArch64SchedA57.td @@ -116,6 +116,7 @@ def : ReadAdvance<ReadIM, 0>; def : ReadAdvance<ReadIMA, 2, [WriteIM32, WriteIM64]>; def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; +def : ReadAdvance<ReadST, 0>; def : ReadAdvance<ReadAdrBase, 0>; def : ReadAdvance<ReadVLD, 0>; diff --git a/llvm/lib/Target/AArch64/AArch64SchedA64FX.td b/llvm/lib/Target/AArch64/AArch64SchedA64FX.td index 6df800487ce2..dc551364ed6f 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedA64FX.td +++ b/llvm/lib/Target/AArch64/AArch64SchedA64FX.td @@ -761,6 +761,7 @@ def : ReadAdvance<ReadIMA, 0>; def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; +def : ReadAdvance<ReadST, 0>; def : ReadAdvance<ReadVLD, 0>; //===----------------------------------------------------------------------===// diff --git a/llvm/lib/Target/AArch64/AArch64SchedCyclone.td b/llvm/lib/Target/AArch64/AArch64SchedCyclone.td index 11df304a974c..310c240966f9 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedCyclone.td +++ b/llvm/lib/Target/AArch64/AArch64SchedCyclone.td @@ -258,6 +258,7 @@ def CyReadAdrBase : SchedReadVariant<[ SchedVar<ScaledIdxPred, [ReadBaseRS]>, // Read base reg after shifting offset. SchedVar<NoSchedPred, [ReadDefault]>]>; // Read base reg with no shift. def : SchedAlias<ReadAdrBase, CyReadAdrBase>; // Map AArch64->Cyclone type. +def : ReadAdvance<ReadST, 0>; //--- // 7.8.9,7.8.11. 
Load/Store, paired diff --git a/llvm/lib/Target/AArch64/AArch64SchedExynosM3.td b/llvm/lib/Target/AArch64/AArch64SchedExynosM3.td index 6a33258be02c..a96917c9364a 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedExynosM3.td +++ b/llvm/lib/Target/AArch64/AArch64SchedExynosM3.td @@ -277,6 +277,7 @@ def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; def : ReadAdvance<ReadVLD, 0>; +def : ReadAdvance<ReadST, 0>; //===----------------------------------------------------------------------===// // Finer scheduling model. diff --git a/llvm/lib/Target/AArch64/AArch64SchedExynosM4.td b/llvm/lib/Target/AArch64/AArch64SchedExynosM4.td index db066a19b0b6..8c5d6bbf0ceb 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedExynosM4.td +++ b/llvm/lib/Target/AArch64/AArch64SchedExynosM4.td @@ -581,6 +581,7 @@ def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; def : ReadAdvance<ReadVLD, 0>; +def : ReadAdvance<ReadST, 0>; //===----------------------------------------------------------------------===// // Finer scheduling model. diff --git a/llvm/lib/Target/AArch64/AArch64SchedExynosM5.td b/llvm/lib/Target/AArch64/AArch64SchedExynosM5.td index 0429b6ab2ee2..64f88d719aa9 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedExynosM5.td +++ b/llvm/lib/Target/AArch64/AArch64SchedExynosM5.td @@ -616,6 +616,7 @@ def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; def : ReadAdvance<ReadVLD, 0>; +def : ReadAdvance<ReadST, 0>; //===----------------------------------------------------------------------===// // Finer scheduling model. 
diff --git a/llvm/lib/Target/AArch64/AArch64SchedFalkor.td b/llvm/lib/Target/AArch64/AArch64SchedFalkor.td index 8bb95e442249..8c40efd07e8a 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedFalkor.td +++ b/llvm/lib/Target/AArch64/AArch64SchedFalkor.td @@ -111,6 +111,7 @@ def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; def : ReadAdvance<ReadVLD, 0>; +def : ReadAdvance<ReadST, 0>; // Detailed Refinements // ----------------------------------------------------------------------------- diff --git a/llvm/lib/Target/AArch64/AArch64SchedKryo.td b/llvm/lib/Target/AArch64/AArch64SchedKryo.td index 45964e1ed6de..f824ce462fe0 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedKryo.td +++ b/llvm/lib/Target/AArch64/AArch64SchedKryo.td @@ -117,6 +117,7 @@ def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; def : ReadAdvance<ReadVLD, 0>; +def : ReadAdvance<ReadST, 0>; //===----------------------------------------------------------------------===// diff --git a/llvm/lib/Target/AArch64/AArch64SchedTSV110.td b/llvm/lib/Target/AArch64/AArch64SchedTSV110.td index 438371c1b6a8..4a1b5167e89d 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedTSV110.td +++ b/llvm/lib/Target/AArch64/AArch64SchedTSV110.td @@ -113,6 +113,7 @@ def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; def : ReadAdvance<ReadVLD, 0>; +def : ReadAdvance<ReadST, 0>; def : InstRW<[WriteI], (instrs COPY)>; diff --git a/llvm/lib/Target/AArch64/AArch64SchedThunderX.td b/llvm/lib/Target/AArch64/AArch64SchedThunderX.td index 125eb284cfd1..f41f12733e69 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedThunderX.td +++ b/llvm/lib/Target/AArch64/AArch64SchedThunderX.td @@ -192,6 +192,7 @@ def THXT8XWriteFSqrtDP : SchedWriteRes<[THXT8XUnitFPMDS]> { def : ReadAdvance<ReadExtrHi, 1>; def : ReadAdvance<ReadAdrBase, 2>; def : ReadAdvance<ReadVLD, 2>; +def : ReadAdvance<ReadST, 2>; // FIXME: This 
needs more targeted benchmarking. // ALU - Most operands in the ALU pipes are not needed for two cycles. Shiftable diff --git a/llvm/lib/Target/AArch64/AArch64SchedThunderX2T99.td b/llvm/lib/Target/AArch64/AArch64SchedThunderX2T99.td index 8d8675b7ac6f..0da286e942a0 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedThunderX2T99.td +++ b/llvm/lib/Target/AArch64/AArch64SchedThunderX2T99.td @@ -362,6 +362,7 @@ def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; def : ReadAdvance<ReadVLD, 0>; +def : ReadAdvance<ReadST, 0>; //===----------------------------------------------------------------------===// // 3. Instruction Tables. diff --git a/llvm/lib/Target/AArch64/AArch64SchedThunderX3T110.td b/llvm/lib/Target/AArch64/AArch64SchedThunderX3T110.td index 00838cc4b9bd..8f03be9be0dd 100644 --- a/llvm/lib/Target/AArch64/AArch64SchedThunderX3T110.td +++ b/llvm/lib/Target/AArch64/AArch64SchedThunderX3T110.td @@ -621,6 +621,7 @@ def : ReadAdvance<ReadID, 0>; def : ReadAdvance<ReadExtrHi, 0>; def : ReadAdvance<ReadAdrBase, 0>; def : ReadAdvance<ReadVLD, 0>; +def : ReadAdvance<ReadST, 0>; //===----------------------------------------------------------------------===// // 3. Instruction Tables. diff --git a/llvm/lib/Target/AArch64/AArch64Schedule.td b/llvm/lib/Target/AArch64/AArch64Schedule.td index 49c0c1782236..4e5a67a3a394 100644 --- a/llvm/lib/Target/AArch64/AArch64Schedule.td +++ b/llvm/lib/Target/AArch64/AArch64Schedule.td @@ -47,6 +47,7 @@ def WriteAdr : SchedWrite; // Address pre/post increment. def WriteLDIdx : SchedWrite; // Load from a register index (maybe scaled). def WriteSTIdx : SchedWrite; // Store to a register index (maybe scaled). +def ReadST : SchedRead; // Read the stored value. def ReadAdrBase : SchedRead; // Read the base resister of a reg-offset LD/ST. // Serialized two-level address load. 
diff --git a/llvm/test/tools/llvm-mca/AArch64/Cortex/A55-store-readadv.s b/llvm/test/tools/llvm-mca/AArch64/Cortex/A55-store-readadv.s index ff45caf46e21..ad49a96c27c5 100644 --- a/llvm/test/tools/llvm-mca/AArch64/Cortex/A55-store-readadv.s +++ b/llvm/test/tools/llvm-mca/AArch64/Cortex/A55-store-readadv.s @@ -125,12 +125,12 @@ stp x0, x1, [x2], #16 # CHECK: Iterations: 100 # CHECK-NEXT: Instructions: 11800 -# CHECK-NEXT: Total Cycles: 20301 +# CHECK-NEXT: Total Cycles: 19801 # CHECK-NEXT: Total uOps: 14400 # CHECK: Dispatch Width: 2 -# CHECK-NEXT: uOps Per Cycle: 0.71 -# CHECK-NEXT: IPC: 0.58 +# CHECK-NEXT: uOps Per Cycle: 0.73 +# CHECK-NEXT: IPC: 0.60 # CHECK-NEXT: Block RThroughput: 72.0 # CHECK: Instruction Info: @@ -401,127 +401,127 @@ stp x0, x1, [x2], #16 # CHECK-NEXT: - - - - - - - - - - - 1.00 stp x0, x1, [x2], #16 # CHECK: Timeline view: -# CHECK-NEXT: 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 -# CHECK-NEXT: Index 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123 +# CHECK-NEXT: 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 012345678 +# CHECK-NEXT: Index 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 -# CHECK: [0,0] DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,1] . DE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str b0, [x2, #16] -# CHECK-NEXT: [0,2] . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,3] . .DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str b0, [x2, #16]! -# CHECK-NEXT: [0,4] . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
add x2, x3, #1 -# CHECK-NEXT: [0,5] . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str b0, [x2], #16 -# CHECK-NEXT: [0,6] . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,7] . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str d0, [x2], #16 -# CHECK-NEXT: [0,8] . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,9] . . . . DE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str d0, [x2, #16]! -# CHECK-NEXT: [0,10] . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,11] . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str d0, [x2, #16] -# CHECK-NEXT: [0,12] . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,13] . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str h0, [x2], #16 -# CHECK-NEXT: [0,14] . . . . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,15] . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str h0, [x2, #16]! -# CHECK-NEXT: [0,16] . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,17] . . . . . . . DE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str h0, [x2, #16] -# CHECK-NEXT: [0,18] . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,19] . . . . . . . .DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str q0, [x2], #16 -# CHECK-NEXT: [0,20] . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
add x2, x3, #1 -# CHECK-NEXT: [0,21] . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str q0, [x2, #16]! -# CHECK-NEXT: [0,22] . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,23] . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str q0, [x2, #16] -# CHECK-NEXT: [0,24] . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,25] . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str s0, [x2], #16 -# CHECK-NEXT: [0,26] . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,27] . . . . . . . . . . .DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str s0, [x2, #16]! -# CHECK-NEXT: [0,28] . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,29] . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str s0, [x2, #16] -# CHECK-NEXT: [0,30] . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,31] . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str w0, [x2], #16 -# CHECK-NEXT: [0,32] . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,33] . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str w0, [x2, #16]! -# CHECK-NEXT: [0,34] . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,35] . . . . . . . . . . . . . .DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . str w0, [x2, #16] -# CHECK-NEXT: [0,36] . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . . . . 
. add x2, x3, #1 -# CHECK-NEXT: [0,37] . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . str x0, [x2], #16 -# CHECK-NEXT: [0,38] . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,39] . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . . . . . . . . . . . str x0, [x2, #16]! -# CHECK-NEXT: [0,40] . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,41] . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . str x0, [x2, #16] -# CHECK-NEXT: [0,42] . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,43] . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . strb w0, [x2], #16 -# CHECK-NEXT: [0,44] . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,45] . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . strb w0, [x2, #16]! -# CHECK-NEXT: [0,46] . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,47] . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . . . . . . . . strb w0, [x2, #16] -# CHECK-NEXT: [0,48] . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,49] . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . . . . . . . . . . . strh w0, [x2], #16 -# CHECK-NEXT: [0,50] . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,51] . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . strh w0, [x2, #16]! -# CHECK-NEXT: [0,52] . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . 
. . . . add x2, x3, #1 -# CHECK-NEXT: [0,53] . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . strh w0, [x2, #16] -# CHECK-NEXT: [0,54] . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,55] . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . str d0, [x2, x2, lsl #3] -# CHECK-NEXT: [0,56] . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,57] . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . str q0, [x2, w0, sxtw] -# CHECK-NEXT: [0,58] . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,59] . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . . . . str w0, [x2, w0, sxtw] -# CHECK-NEXT: [0,60] . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,61] . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . . . . . . . str x0, [x2, w0, sxtw] -# CHECK-NEXT: [0,62] . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,63] . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . strb w0, [x2, w0, sxtw] -# CHECK-NEXT: [0,64] . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,65] . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . strh w0, [x2, w0, sxtw] -# CHECK-NEXT: [0,66] . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,67] . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . stur b0, [x2, #255] -# CHECK-NEXT: [0,68] . . . . . . . . . . . . . . . . . . . . . . 
. . DeeE . . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,69] . . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . stur d0, [x2, #255] -# CHECK-NEXT: [0,70] . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,71] . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . . . . stur h0, [x2, #255] -# CHECK-NEXT: [0,72] . . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,73] . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . stur q0, [x2, #255] -# CHECK-NEXT: [0,74] . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,75] . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . stur s0, [x2, #255] -# CHECK-NEXT: [0,76] . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,77] . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . stur w0, [x2, #255] -# CHECK-NEXT: [0,78] . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,79] . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . sturb w0, [x2, #255] -# CHECK-NEXT: [0,80] . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,81] . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . sturh w0, [x2, #255] -# CHECK-NEXT: [0,82] . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,83] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . stnp d0, d1, [x2, #16] -# CHECK-NEXT: [0,84] . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,85] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . stnp q0, q1, [x2, #16] -# CHECK-NEXT: [0,86] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,87] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . stnp s0, s1, [x2, #16] -# CHECK-NEXT: [0,88] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,89] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . stnp s0, s1, [x2, #16] -# CHECK-NEXT: [0,90] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,91] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . stnp w0, w1, [x2, #16] -# CHECK-NEXT: [0,92] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,93] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . stnp x0, x1, [x2, #16] -# CHECK-NEXT: [0,94] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,95] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . stp d0, d1, [x2, #16] -# CHECK-NEXT: [0,96] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,97] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . stp d0, d1, [x2, #16]! -# CHECK-NEXT: [0,98] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,99] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . 
stp d0, d1, [x2], #16 -# CHECK-NEXT: [0,100] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,101] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . stp q0, q1, [x2, #16] -# CHECK-NEXT: [0,102] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,103] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . stp q0, q1, [x2, #16]! -# CHECK-NEXT: [0,104] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . add x2, x3, #1 -# CHECK-NEXT: [0,105] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . stp q0, q1, [x2], #16 -# CHECK-NEXT: [0,106] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . add x2, x3, #1 -# CHECK-NEXT: [0,107] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . stp s0, s1, [x2, #16] -# CHECK-NEXT: [0,108] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . add x2, x3, #1 -# CHECK-NEXT: [0,109] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . stp s0, s1, [x2, #16]! -# CHECK-NEXT: [0,110] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . add x2, x3, #1 -# CHECK-NEXT: [0,111] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . stp s0, s1, [x2], #16 -# CHECK-NEXT: [0,112] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . add x2, x3, #1 -# CHECK-NEXT: [0,113] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . stp x0, x1, [x2, #16] -# CHECK-NEXT: [0,114] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . add x2, x3, #1 -# CHECK-NEXT: [0,115] . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . DE. . stp x0, x1, [x2, #16]! -# CHECK-NEXT: [0,116] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE. add x2, x3, #1 -# CHECK-NEXT: [0,117] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE stp x0, x1, [x2], #16 +# CHECK: [0,0] DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,1] . DE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str b0, [x2, #16] +# CHECK-NEXT: [0,2] . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,3] . .DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str b0, [x2, #16]! +# CHECK-NEXT: [0,4] . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,5] . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str b0, [x2], #16 +# CHECK-NEXT: [0,6] . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,7] . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str d0, [x2], #16 +# CHECK-NEXT: [0,8] . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,9] . . . . DE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str d0, [x2, #16]! +# CHECK-NEXT: [0,10] . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,11] . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str d0, [x2, #16] +# CHECK-NEXT: [0,12] . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,13] . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . str h0, [x2], #16 +# CHECK-NEXT: [0,14] . . . . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,15] . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str h0, [x2, #16]! +# CHECK-NEXT: [0,16] . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,17] . . . . . . . DE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str h0, [x2, #16] +# CHECK-NEXT: [0,18] . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,19] . . . . . . . .DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str q0, [x2], #16 +# CHECK-NEXT: [0,20] . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,21] . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str q0, [x2, #16]! +# CHECK-NEXT: [0,22] . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,23] . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str q0, [x2, #16] +# CHECK-NEXT: [0,24] . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,25] . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str s0, [x2], #16 +# CHECK-NEXT: [0,26] . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,27] . . . . . . . . . . .DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . str s0, [x2, #16]! +# CHECK-NEXT: [0,28] . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,29] . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
str s0, [x2, #16] +# CHECK-NEXT: [0,30] . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,31] . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . str w0, [x2], #16 +# CHECK-NEXT: [0,32] . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,33] . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . str w0, [x2, #16]! +# CHECK-NEXT: [0,34] . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,35] . . . . . . . . . . . . . .DE . . . . . . . . . . . . . . . . . . . . . . . . . . . str w0, [x2, #16] +# CHECK-NEXT: [0,36] . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,37] . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . str x0, [x2], #16 +# CHECK-NEXT: [0,38] . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,39] . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . . . . . . . . . . str x0, [x2, #16]! +# CHECK-NEXT: [0,40] . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,41] . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . . str x0, [x2, #16] +# CHECK-NEXT: [0,42] . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,43] . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . . strb w0, [x2], #16 +# CHECK-NEXT: [0,44] . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,45] . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . . . strb w0, [x2, #16]! 
+# CHECK-NEXT: [0,46] . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,47] . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . . . . . . . strb w0, [x2, #16] +# CHECK-NEXT: [0,48] . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,49] . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . . . . . . . . . . strh w0, [x2], #16 +# CHECK-NEXT: [0,50] . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,51] . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . . strh w0, [x2, #16]! +# CHECK-NEXT: [0,52] . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,53] . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . strh w0, [x2, #16] +# CHECK-NEXT: [0,54] . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,55] . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . . str d0, [x2, x2, lsl #3] +# CHECK-NEXT: [0,56] . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,57] . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . . str q0, [x2, w0, sxtw] +# CHECK-NEXT: [0,58] . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,59] . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . . . . . . . str w0, [x2, w0, sxtw] +# CHECK-NEXT: [0,60] . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,61] . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . . . 
str x0, [x2, w0, sxtw] +# CHECK-NEXT: [0,62] . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,63] . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . strb w0, [x2, w0, sxtw] +# CHECK-NEXT: [0,64] . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,65] . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . . strh w0, [x2, w0, sxtw] +# CHECK-NEXT: [0,66] . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,67] . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . . stur b0, [x2, #255] +# CHECK-NEXT: [0,68] . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,69] . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . . stur d0, [x2, #255] +# CHECK-NEXT: [0,70] . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,71] . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . . . . stur h0, [x2, #255] +# CHECK-NEXT: [0,72] . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,73] . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . stur q0, [x2, #255] +# CHECK-NEXT: [0,74] . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,75] . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . stur s0, [x2, #255] +# CHECK-NEXT: [0,76] . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,77] . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . 
stur w0, [x2, #255] +# CHECK-NEXT: [0,78] . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,79] . . . . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . sturb w0, [x2, #255] +# CHECK-NEXT: [0,80] . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,81] . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . sturh w0, [x2, #255] +# CHECK-NEXT: [0,82] . . . . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,83] . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . stnp d0, d1, [x2, #16] +# CHECK-NEXT: [0,84] . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,85] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . stnp q0, q1, [x2, #16] +# CHECK-NEXT: [0,86] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,87] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . stnp s0, s1, [x2, #16] +# CHECK-NEXT: [0,88] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,89] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . stnp s0, s1, [x2, #16] +# CHECK-NEXT: [0,90] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,91] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . stnp w0, w1, [x2, #16] +# CHECK-NEXT: [0,92] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,93] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . 
stnp x0, x1, [x2, #16] +# CHECK-NEXT: [0,94] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,95] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . stp d0, d1, [x2, #16] +# CHECK-NEXT: [0,96] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,97] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . stp d0, d1, [x2, #16]! +# CHECK-NEXT: [0,98] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,99] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . stp d0, d1, [x2], #16 +# CHECK-NEXT: [0,100] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,101] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . stp q0, q1, [x2, #16] +# CHECK-NEXT: [0,102] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,103] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . stp q0, q1, [x2, #16]! +# CHECK-NEXT: [0,104] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . . add x2, x3, #1 +# CHECK-NEXT: [0,105] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . stp q0, q1, [x2], #16 +# CHECK-NEXT: [0,106] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DeeE. . . . . add x2, x3, #1 +# CHECK-NEXT: [0,107] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . stp s0, s1, [x2, #16] +# CHECK-NEXT: [0,108] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . . add x2, x3, #1 +# CHECK-NEXT: [0,109] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
DE . . . . stp s0, s1, [x2, #16]! +# CHECK-NEXT: [0,110] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . . add x2, x3, #1 +# CHECK-NEXT: [0,111] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . stp s0, s1, [x2], #16 +# CHECK-NEXT: [0,112] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . add x2, x3, #1 +# CHECK-NEXT: [0,113] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . stp x0, x1, [x2, #16] +# CHECK-NEXT: [0,114] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE . . add x2, x3, #1 +# CHECK-NEXT: [0,115] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE. . stp x0, x1, [x2, #16]! +# CHECK-NEXT: [0,116] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeE. add x2, x3, #1 +# CHECK-NEXT: [0,117] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE stp x0, x1, [x2], #16 # CHECK: Average Wait times (based on the timeline view): # CHECK-NEXT: [0]: Executions </cut>

4 years, 4 months

[TCWG CI] Regression caused by llvm:a26f1bf67ec70f72e64101cf483b26466928fc38

by ci_notify＠linaro.org

Identified regression caused by *llvm:a26f1bf67ec70f72e64101cf483b26466928fc38*:
commit a26f1bf67ec70f72e64101cf483b26466928fc38
Author: Roman Lebedev <lebedev.ri(a)gmail.com>

[PassManager] Run additional LICM before LoopRotate

Results regressed to (for first_bad == a26f1bf67ec70f72e64101cf483b26466928fc38)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -Os_mthumb artifacts/build-a26f1bf67ec70f72e64101cf483b26466928fc38/results_id: 1
# 447.dealII,[.] SparseMatrix<double>::vmult<Vector<double>. Ve regressed by 111

from (for last_good == bb1e5399e4586239d6424f5eea5a9f06c52ebe9b)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -- -Os_mthumb artifacts/build-bb1e5399e4586239d6424f5eea5a9f06c52ebe9b/results_id: 1

This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-arm-spec2k6-Os

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…

Reproduce builds:
<cut>
mkdir investigate-llvm-a26f1bf67ec70f72e64101cf483b26466928fc38
cd investigate-llvm-a26f1bf67ec70f72e64101cf483b26466928fc38

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach a26f1bf67ec70f72e64101cf483b26466928fc38
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach bb1e5399e4586239d6424f5eea5a9f06c52ebe9b
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit a26f1bf67ec70f72e64101cf483b26466928fc38
Author: Roman Lebedev <lebedev.ri(a)gmail.com>
Date:   Fri Apr 2 10:40:12 2021 +0300

[PassManager] Run additional LICM before LoopRotate

Loop rotation often has to perform code duplication from header into preheader, which introduces PHI nodes.

>>! In D99204, @thopre wrote:
>
> With loop peeling, it is important that unnecessary PHIs be avoided or
> it will leads to spurious peeling. One source of such PHIs is loop
> rotation which creates PHIs for invariant loads. Those PHIs are
> particularly problematic since loop peeling is now run as part of simple
> loop unrolling before GVN is run, and are thus a source of spurious
> peeling.
>
> Note that while some of the load can be hoisted and eventually
> eliminated by instruction combine, this is not always possible due to
> alignment issue.
In particular, the motivating example [1] was a load > inside a class instance which cannot be hoisted because the `this' > pointer has an alignment of 1. > > [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/atta… Now, we could enhance LoopRotate to avoid duplicating code when not needed, but instead hoist loop-invariant code, but isn't that a code duplication? (*sic*) We have LICM, and in fact we already run it right after LoopRotation. We could try to move it to before LoopRotation, that is basically free from compile-time perspective: https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b9… But, looking at stats, i think it isn't great that we would no longer do LICM after LoopRotation, in particular: | statistic name | LoopRotate-LICM | LICM-LoopRotate | Δ | % | abs(%) | | asm-printer.EmittedInsts | 9015930 | 9015799 | -131 | 0.00% | 0.00% | | indvars.NumElimCmp | 3536 | 3544 | 8 | 0.23% | 0.23% | | indvars.NumElimExt | 36725 | 36580 | -145 | -0.39% | 0.39% | | indvars.NumElimIV | 1197 | 1187 | -10 | -0.84% | 0.84% | | indvars.NumElimIdentity | 143 | 136 | -7 | -4.90% | 4.90% | | indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% | | indvars.NumLFTR | 29842 | 29890 | 48 | 0.16% | 0.16% | | indvars.NumReplaced | 2293 | 2227 | -66 | -2.88% | 2.88% | | indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% | | indvars.NumWidened | 26438 | 26329 | -109 | -0.41% | 0.41% | | instcount.TotalBlocks | 1178338 | 1173840 | -4498 | -0.38% | 0.38% | | instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% | | instcount.TotalInsts | 9905442 | 9896139 | -9303 | -0.09% | 0.09% | | lcssa.NumLCSSA | 425871 | 423961 | -1910 | -0.45% | 0.45% | | licm.NumHoisted | 378357 | 378753 | 396 | 0.10% | 0.10% | | licm.NumMovedCalls | 2193 | 2208 | 15 | 0.68% | 0.68% | | licm.NumMovedLoads | 35899 | 31821 | -4078 | -11.36% | 11.36% | | licm.NumPromoted | 11178 | 11154 | -24 | -0.21% | 0.21% | | licm.NumSunk | 13359 | 13587 | 228 | 1.71% | 
1.71% | | loop-delete.NumDeleted | 8547 | 8402 | -145 | -1.70% | 1.70% | | loop-instsimplify.NumSimplified | 12876 | 11890 | -986 | -7.66% | 7.66% | | loop-peel.NumPeeled | 1008 | 925 | -83 | -8.23% | 8.23% | | loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% | | loop-rotate.NumRotated | 42015 | 42003 | -12 | -0.03% | 0.03% | | loop-simplifycfg.NumLoopBlocksDeleted | 240 | 242 | 2 | 0.83% | 0.83% | | loop-simplifycfg.NumLoopExitsDeleted | 497 | 20 | -477 | -95.98% | 95.98% | | loop-simplifycfg.NumTerminatorsFolded | 618 | 336 | -282 | -45.63% | 45.63% | | loop-unroll.NumCompletelyUnrolled | 11028 | 11032 | 4 | 0.04% | 0.04% | | loop-unroll.NumUnrolled | 12608 | 12529 | -79 | -0.63% | 0.63% | | mem2reg.NumDeadAlloca | 10222 | 10221 | -1 | -0.01% | 0.01% | | mem2reg.NumPHIInsert | 192110 | 192106 | -4 | 0.00% | 0.00% | | mem2reg.NumSingleStore | 637650 | 637643 | -7 | 0.00% | 0.00% | | scalar-evolution.NumBruteForceTripCountsComputed | 814 | 812 | -2 | -0.25% | 0.25% | | scalar-evolution.NumTripCountsComputed | 283108 | 282934 | -174 | -0.06% | 0.06% | | scalar-evolution.NumTripCountsNotComputed | 106712 | 106718 | 6 | 0.01% | 0.01% | | simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% | | simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% | | simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% | | simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% | ... but that actually regresses LICM (-12% `licm.NumMovedLoads`), loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`), simple-loop-unswitch (`NumTrivial`). What if we instead have LICM both before and after LoopRotate? 
| statistic name | LoopRotate-LICM | LICM-LoopRotate-LICM | Δ | % | abs(%) | | asm-printer.EmittedInsts | 9015930 | 9014474 | -1456 | -0.02% | 0.02% | | indvars.NumElimCmp | 3536 | 3546 | 10 | 0.28% | 0.28% | | indvars.NumElimExt | 36725 | 36681 | -44 | -0.12% | 0.12% | | indvars.NumElimIV | 1197 | 1185 | -12 | -1.00% | 1.00% | | indvars.NumElimIdentity | 143 | 146 | 3 | 2.10% | 2.10% | | indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% | | indvars.NumLFTR | 29842 | 29899 | 57 | 0.19% | 0.19% | | indvars.NumReplaced | 2293 | 2299 | 6 | 0.26% | 0.26% | | indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% | | indvars.NumWidened | 26438 | 26404 | -34 | -0.13% | 0.13% | | instcount.TotalBlocks | 1178338 | 1173652 | -4686 | -0.40% | 0.40% | | instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% | | instcount.TotalInsts | 9905442 | 9895452 | -9990 | -0.10% | 0.10% | | lcssa.NumLCSSA | 425871 | 425373 | -498 | -0.12% | 0.12% | | licm.NumHoisted | 378357 | 383352 | 4995 | 1.32% | 1.32% | | licm.NumMovedCalls | 2193 | 2204 | 11 | 0.50% | 0.50% | | licm.NumMovedLoads | 35899 | 35755 | -144 | -0.40% | 0.40% | | licm.NumPromoted | 11178 | 11163 | -15 | -0.13% | 0.13% | | licm.NumSunk | 13359 | 14321 | 962 | 7.20% | 7.20% | | loop-delete.NumDeleted | 8547 | 8538 | -9 | -0.11% | 0.11% | | loop-instsimplify.NumSimplified | 12876 | 12041 | -835 | -6.48% | 6.48% | | loop-peel.NumPeeled | 1008 | 924 | -84 | -8.33% | 8.33% | | loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% | | loop-rotate.NumRotated | 42015 | 42005 | -10 | -0.02% | 0.02% | | loop-simplifycfg.NumLoopBlocksDeleted | 240 | 241 | 1 | 0.42% | 0.42% | | loop-simplifycfg.NumTerminatorsFolded | 618 | 619 | 1 | 0.16% | 0.16% | | loop-unroll.NumCompletelyUnrolled | 11028 | 11029 | 1 | 0.01% | 0.01% | | loop-unroll.NumUnrolled | 12608 | 12525 | -83 | -0.66% | 0.66% | | mem2reg.NumPHIInsert | 192110 | 192073 | -37 | -0.02% | 0.02% | | mem2reg.NumSingleStore | 637650 | 637652 | 2 | 0.00% | 
0.00% |
| scalar-evolution.NumTripCountsComputed | 283108 | 282998 | -110 | -0.04% | 0.04% |
| scalar-evolution.NumTripCountsNotComputed | 106712 | 106691 | -21 | -0.02% | 0.02% |
| simple-loop-unswitch.NumBranches | 5178 | 5185 | 7 | 0.14% | 0.14% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 925 | 11 | 1.20% | 1.20% |
| simple-loop-unswitch.NumTrivial | 183 | 179 | -4 | -2.19% | 2.19% |

| simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% |

I.e. we end up with fewer instructions, less peeling, and more LICM activity. Also note that none of those 4 regressions are here. Namely:

| statistic name | LICM-LoopRotate | LICM-LoopRotate-LICM | Δ | % | abs(%) |
| asm-printer.EmittedInsts | 9015799 | 9014474 | -1325 | -0.01% | 0.01% |
| indvars.NumElimCmp | 3544 | 3546 | 2 | 0.06% | 0.06% |
| indvars.NumElimExt | 36580 | 36681 | 101 | 0.28% | 0.28% |
| indvars.NumElimIV | 1187 | 1185 | -2 | -0.17% | 0.17% |
| indvars.NumElimIdentity | 136 | 146 | 10 | 7.35% | 7.35% |
| indvars.NumLFTR | 29890 | 29899 | 9 | 0.03% | 0.03% |
| indvars.NumReplaced | 2227 | 2299 | 72 | 3.23% | 3.23% |
| indvars.NumWidened | 26329 | 26404 | 75 | 0.28% | 0.28% |
| instcount.TotalBlocks | 1173840 | 1173652 | -188 | -0.02% | 0.02% |
| instcount.TotalInsts | 9896139 | 9895452 | -687 | -0.01% | 0.01% |
| lcssa.NumLCSSA | 423961 | 425373 | 1412 | 0.33% | 0.33% |
| licm.NumHoisted | 378753 | 383352 | 4599 | 1.21% | 1.21% |
| licm.NumMovedCalls | 2208 | 2204 | -4 | -0.18% | 0.18% |
| licm.NumMovedLoads | 31821 | 35755 | 3934 | 12.36% | 12.36% |
| licm.NumPromoted | 11154 | 11163 | 9 | 0.08% | 0.08% |
| licm.NumSunk | 13587 | 14321 | 734 | 5.40% | 5.40% |
| loop-delete.NumDeleted | 8402 | 8538 | 136 | 1.62% | 1.62% |
| loop-instsimplify.NumSimplified | 11890 | 12041 | 151 | 1.27% | 1.27% |
| loop-peel.NumPeeled | 925 | 924 | -1 | -0.11% | 0.11% |
| loop-rotate.NumRotated | 42003 | 42005 | 2 | 0.00% | 0.00% |
| loop-simplifycfg.NumLoopBlocksDeleted | 242 | 241 | -1 | -0.41% | 0.41% |
| loop-simplifycfg.NumLoopExitsDeleted | 20 | 497 | 477 | 2385.00% | 2385.00% |
| loop-simplifycfg.NumTerminatorsFolded | 336 | 619 | 283 | 84.23% | 84.23% |
| loop-unroll.NumCompletelyUnrolled | 11032 | 11029 | -3 | -0.03% | 0.03% |
| loop-unroll.NumUnrolled | 12529 | 12525 | -4 | -0.03% | 0.03% |
| mem2reg.NumDeadAlloca | 10221 | 10222 | 1 | 0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192106 | 192073 | -33 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637643 | 637652 | 9 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 812 | 814 | 2 | 0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 282934 | 282998 | 64 | 0.02% | 0.02% |
| scalar-evolution.NumTripCountsNotComputed | 106718 | 106691 | -27 | -0.03% | 0.03% |
| simple-loop-unswitch.NumBranches | 4752 | 5185 | 433 | 9.11% | 9.11% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 503 | 925 | 422 | 83.90% | 83.90% |
| simple-loop-unswitch.NumSwitches | 18 | 20 | 2 | 11.11% | 11.11% |
| simple-loop-unswitch.NumTrivial | 95 | 179 | 84 | 88.42% | 88.42% |

{F15983613} {F15983615} {F15983616}
(this is vanilla llvm testsuite + rawspeed + darktable)

As an example of code where running only the early LICM is bad, see:
https://godbolt.org/z/GzEbacs4K

This does have an observable compile-time regression of +~0.5% geomean
https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e…
but I think that's basically nothing, and it might become avoidable in the future by fixing clang to produce alignment information on function arguments, which would make the second run unneeded.
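The Δ, % and abs(%) columns in the tables above appear to be derived from the two counter values in each row. Here is a minimal sketch of that arithmetic, assuming % is the change relative to the first value column, printed with two decimals. `StatRow`, `delta`, `percent` and `formatRow` are hypothetical names used only for illustration; they are not part of this patch or of LLVM.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>

// One row of a statistics comparison: a counter's value in the two builds.
// (Hypothetical helper type, not an LLVM API.)
struct StatRow {
  std::string name;
  long long before;
  long long after;
};

// Absolute change: after - before.
long long delta(const StatRow &r) { return r.after - r.before; }

// Relative change in percent, measured against the "before" value.
// Assumes before != 0.
double percent(const StatRow &r) {
  return 100.0 * static_cast<double>(delta(r)) /
         static_cast<double>(r.before);
}

// Format a row in the same layout as the tables:
// | name | before | after | delta | % | abs(%) |
std::string formatRow(const StatRow &r) {
  char buf[256];
  const double p = percent(r);
  std::snprintf(buf, sizeof buf,
                "| %s | %lld | %lld | %lld | %.2f%% | %.2f%% |",
                r.name.c_str(), r.before, r.after, delta(r), p, std::fabs(p));
  return buf;
}
```

For example, the `licm.NumMovedLoads` row (31821 and 35755) gives Δ = 3934 and 12.36%, which matches the table.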
Differential Revision: https://reviews.llvm.org/D99249
---
 llvm/lib/Passes/PassBuilder.cpp | 10 +++
 llvm/lib/Transforms/IPO/PassManagerBuilder.cpp | 4 +
 llvm/test/CodeGen/AMDGPU/opt-pipeline.ll | 30 +++++---
 llvm/test/Other/new-pm-defaults.ll | 7 +-
 llvm/test/Other/new-pm-thinlto-defaults.ll | 7 +-
 .../Other/new-pm-thinlto-postlink-pgo-defaults.ll | 9 ++-
 .../new-pm-thinlto-postlink-samplepgo-defaults.ll | 7 +-
 .../Other/new-pm-thinlto-prelink-pgo-defaults.ll | 9 ++-
 .../new-pm-thinlto-prelink-samplepgo-defaults.ll | 5 +-
 llvm/test/Other/opt-O2-pipeline.ll | 10 ++-
 llvm/test/Other/opt-O3-pipeline-enable-matrix.ll | 10 ++-
 llvm/test/Other/opt-O3-pipeline.ll | 10 ++-
 llvm/test/Other/opt-Os-pipeline.ll | 10 ++-
 llvm/test/Other/pass-pipelines.ll | 3 +
 llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll | 25 ++++---
 .../PhaseOrdering/X86/spurious-peeling.ll | 87 +++++++++-------------
 llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll | 78 +++++++++----------
 .../loop-rotation-vs-common-code-hoisting.ll | 22 +++---
 18 files changed, 193 insertions(+), 150 deletions(-)

diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 3a325277e370..5a2285215769 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -568,6 +568,11 @@ PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level,
   LPM1.addPass(LoopInstSimplifyPass());
   LPM1.addPass(LoopSimplifyCFGPass());
 
+  // Try to remove as much code from the loop header as possible,
+  // to reduce amount of IR that will have to be duplicated.
+  // TODO: Investigate promotion cap for O1.
+  LPM1.addPass(LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap));
+
   LPM1.addPass(LoopRotatePass(/* Disable header duplication */ true,
                               isLTOPreLink(Phase)));
   // TODO: Investigate promotion cap for O1.
@@ -736,6 +741,11 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
   LPM1.addPass(LoopInstSimplifyPass());
   LPM1.addPass(LoopSimplifyCFGPass());
 
+  // Try to remove as much code from the loop header as possible,
+  // to reduce amount of IR that will have to be duplicated.
+  // TODO: Investigate promotion cap for O1.
+  LPM1.addPass(LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap));
+
   // Disable header duplication in loop rotation at -Oz.
   LPM1.addPass(
       LoopRotatePass(Level != OptimizationLevel::Oz, isLTOPreLink(Phase)));
diff --git a/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp b/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
index 109e7c97ff1b..2c80a16febef 100644
--- a/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
+++ b/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
@@ -431,6 +431,10 @@ void PassManagerBuilder::addFunctionSimplificationPasses(
     MPM.add(createLoopInstSimplifyPass());
     MPM.add(createLoopSimplifyCFGPass());
   }
+  // Try to remove as much code from the loop header as possible,
+  // to reduce amount of IR that will have to be duplicated.
+  // TODO: Investigate promotion cap for O1.
+  MPM.add(createLICMPass(LicmMssaOptCap, LicmMssaNoAccForPromotionCap));
   // Rotate Loop - disable header duplication at -Oz
   MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1, PrepareForLTO));
   // TODO: Investigate promotion cap for O1.
diff --git a/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll b/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll index 34e5e6c647da..5e33d968c710 100644 --- a/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll +++ b/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll @@ -129,16 +129,20 @@ ; GCN-O1-NEXT: Simplify the CFG ; GCN-O1-NEXT: Reassociate expressions ; GCN-O1-NEXT: Dominator Tree Construction +; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl) +; GCN-O1-NEXT: Function Alias Analysis Results +; GCN-O1-NEXT: Memory SSA ; GCN-O1-NEXT: Natural Loop Information ; GCN-O1-NEXT: Canonicalize natural loops ; GCN-O1-NEXT: LCSSA Verifier ; GCN-O1-NEXT: Loop-Closed SSA Form Pass -; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl) -; GCN-O1-NEXT: Function Alias Analysis Results ; GCN-O1-NEXT: Scalar Evolution Analysis +; GCN-O1-NEXT: Lazy Branch Probability Analysis +; GCN-O1-NEXT: Lazy Block Frequency Analysis +; GCN-O1-NEXT: Loop Pass Manager +; GCN-O1-NEXT: Loop Invariant Code Motion ; GCN-O1-NEXT: Loop Pass Manager ; GCN-O1-NEXT: Rotate Loops -; GCN-O1-NEXT: Memory SSA ; GCN-O1-NEXT: Lazy Branch Probability Analysis ; GCN-O1-NEXT: Lazy Block Frequency Analysis ; GCN-O1-NEXT: Loop Pass Manager @@ -451,16 +455,20 @@ ; GCN-O2-NEXT: Simplify the CFG ; GCN-O2-NEXT: Reassociate expressions ; GCN-O2-NEXT: Dominator Tree Construction +; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl) +; GCN-O2-NEXT: Function Alias Analysis Results +; GCN-O2-NEXT: Memory SSA ; GCN-O2-NEXT: Natural Loop Information ; GCN-O2-NEXT: Canonicalize natural loops ; GCN-O2-NEXT: LCSSA Verifier ; GCN-O2-NEXT: Loop-Closed SSA Form Pass -; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl) -; GCN-O2-NEXT: Function Alias Analysis Results ; GCN-O2-NEXT: Scalar Evolution Analysis +; GCN-O2-NEXT: Lazy Branch Probability Analysis +; GCN-O2-NEXT: Lazy Block Frequency Analysis +; GCN-O2-NEXT: Loop Pass Manager +; GCN-O2-NEXT: Loop Invariant Code Motion ; GCN-O2-NEXT: Loop Pass Manager ; GCN-O2-NEXT: Rotate Loops -; GCN-O2-NEXT: 
Memory SSA ; GCN-O2-NEXT: Lazy Branch Probability Analysis ; GCN-O2-NEXT: Lazy Block Frequency Analysis ; GCN-O2-NEXT: Loop Pass Manager @@ -810,16 +818,20 @@ ; GCN-O3-NEXT: Simplify the CFG ; GCN-O3-NEXT: Reassociate expressions ; GCN-O3-NEXT: Dominator Tree Construction +; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl) +; GCN-O3-NEXT: Function Alias Analysis Results +; GCN-O3-NEXT: Memory SSA ; GCN-O3-NEXT: Natural Loop Information ; GCN-O3-NEXT: Canonicalize natural loops ; GCN-O3-NEXT: LCSSA Verifier ; GCN-O3-NEXT: Loop-Closed SSA Form Pass -; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl) -; GCN-O3-NEXT: Function Alias Analysis Results ; GCN-O3-NEXT: Scalar Evolution Analysis +; GCN-O3-NEXT: Lazy Branch Probability Analysis +; GCN-O3-NEXT: Lazy Block Frequency Analysis +; GCN-O3-NEXT: Loop Pass Manager +; GCN-O3-NEXT: Loop Invariant Code Motion ; GCN-O3-NEXT: Loop Pass Manager ; GCN-O3-NEXT: Rotate Loops -; GCN-O3-NEXT: Memory SSA ; GCN-O3-NEXT: Lazy Branch Probability Analysis ; GCN-O3-NEXT: Lazy Block Frequency Analysis ; GCN-O3-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/new-pm-defaults.ll b/llvm/test/Other/new-pm-defaults.ll index 01b02b8fd482..337a0857701c 100644 --- a/llvm/test/Other/new-pm-defaults.ll +++ b/llvm/test/Other/new-pm-defaults.ll @@ -113,9 +113,9 @@ ; CHECK-O-NEXT: Running analysis: CallGraphAnalysis ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-O-NEXT: Running analysis: ProfileSummaryAnalysis -; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy -; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis -; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy +; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy +; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis +; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy ; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy<{{.*}}LazyCallGraph::SCC{{.*}}> ; CHECK-O-NEXT: Running 
pass: DevirtSCCRepeatedPass ; CHECK-O-NEXT: Starting CGSCC pass manager run. @@ -156,6 +156,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. ; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/new-pm-thinlto-defaults.ll b/llvm/test/Other/new-pm-thinlto-defaults.ll index fbf47de87eeb..bba43dd50e7a 100644 --- a/llvm/test/Other/new-pm-thinlto-defaults.ll +++ b/llvm/test/Other/new-pm-thinlto-defaults.ll @@ -98,9 +98,9 @@ ; CHECK-O-NEXT: Running analysis: CallGraphAnalysis ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis ; CHECK-PRELINK-O-NEXT: Running analysis: ProfileSummaryAnalysis -; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy -; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis -; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy +; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy +; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis +; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy ; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy ; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass ; CHECK-O-NEXT: Starting CGSCC pass manager run. @@ -139,6 +139,7 @@ ; CHECK-O-NEXT: Starting Loop pass manager run. 
 ; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass
 ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass
+; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: LoopRotatePass
 ; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
index 4bcf70e15a5b..57f0e0da73b6 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
@@ -68,10 +68,10 @@
 ; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
 ; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
 ; CHECK-O-NEXT: Starting {{.*}}Module pass manager run.
-; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
-; CHECK-O-NEXT: Running analysis: GlobalsAA
-; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
-; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
+; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
+; CHECK-O-NEXT: Running analysis: GlobalsAA
+; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
+; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
 ; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
 ; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
 ; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
@@ -112,6 +112,7 @@
 ; CHECK-O-NEXT: Starting Loop pass manager run.
 ; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass
 ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass
+; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: LoopRotatePass
 ; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
index 1071d28432b9..0e0e2854b8df 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
@@ -78,9 +78,9 @@
 ; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
 ; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
 ; CHECK-O-NEXT: Starting {{.*}}Module pass manager run.
-; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
-; CHECK-O-NEXT: Running analysis: GlobalsAA
-; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
+; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
+; CHECK-O-NEXT: Running analysis: GlobalsAA
+; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
 ; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
 ; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
 ; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
@@ -121,6 +121,7 @@
 ; CHECK-O-NEXT: Starting Loop pass manager run.
 ; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass
 ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass
+; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: LoopRotatePass
 ; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
index e2f1385cf52b..4cfb9825c97e 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
@@ -93,10 +93,10 @@
 ; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis on foo
 ; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
 ; CHECK-O-NEXT: Starting {{.*}}Module pass manager run.
-; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
-; CHECK-O-NEXT: Running analysis: GlobalsAA
-; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
-; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
+; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
+; CHECK-O-NEXT: Running analysis: GlobalsAA
+; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
+; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
 ; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
 ; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
 ; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis on foo
@@ -158,6 +158,7 @@
 ; CHECK-O-NEXT: Starting Loop pass manager run.
 ; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass
 ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass
+; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: LoopRotatePass
 ; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
index d4dc552aea01..a05555c57003 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
@@ -73,8 +73,8 @@
 ; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
 ; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
 ; CHECK-O-NEXT: Starting {{.*}}Module pass manager run.
-; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
-; CHECK-O-NEXT: Running analysis: GlobalsAA
+; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
+; CHECK-O-NEXT: Running analysis: GlobalsAA
 ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
 ; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
 ; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
@@ -116,6 +116,7 @@
 ; CHECK-O-NEXT: Starting Loop pass manager run.
; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass ; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass +; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: LoopRotatePass ; CHECK-O-NEXT: Running pass: LICM ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass diff --git a/llvm/test/Other/opt-O2-pipeline.ll b/llvm/test/Other/opt-O2-pipeline.ll index f7217c122fdb..a3b01e5464d4 100644 --- a/llvm/test/Other/opt-O2-pipeline.ll +++ b/llvm/test/Other/opt-O2-pipeline.ll @@ -101,16 +101,20 @@ ; CHECK-NEXT: Simplify the CFG ; CHECK-NEXT: Reassociate expressions ; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) +; CHECK-NEXT: Function Alias Analysis Results +; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: Canonicalize natural loops ; CHECK-NEXT: LCSSA Verifier ; CHECK-NEXT: Loop-Closed SSA Form Pass -; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) -; CHECK-NEXT: Function Alias Analysis Results ; CHECK-NEXT: Scalar Evolution Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Loop Pass Manager +; CHECK-NEXT: Loop Invariant Code Motion ; CHECK-NEXT: Loop Pass Manager ; CHECK-NEXT: Rotate Loops -; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Lazy Branch Probability Analysis ; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/opt-O3-pipeline-enable-matrix.ll b/llvm/test/Other/opt-O3-pipeline-enable-matrix.ll index 6b98c1f80d9e..fafd5c8fdcb8 100644 --- a/llvm/test/Other/opt-O3-pipeline-enable-matrix.ll +++ b/llvm/test/Other/opt-O3-pipeline-enable-matrix.ll @@ -106,16 +106,20 @@ ; CHECK-NEXT: Simplify the CFG ; CHECK-NEXT: Reassociate expressions ; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) +; CHECK-NEXT: Function Alias Analysis Results +; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: Canonicalize 
natural loops ; CHECK-NEXT: LCSSA Verifier ; CHECK-NEXT: Loop-Closed SSA Form Pass -; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) -; CHECK-NEXT: Function Alias Analysis Results ; CHECK-NEXT: Scalar Evolution Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Loop Pass Manager +; CHECK-NEXT: Loop Invariant Code Motion ; CHECK-NEXT: Loop Pass Manager ; CHECK-NEXT: Rotate Loops -; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Lazy Branch Probability Analysis ; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/opt-O3-pipeline.ll b/llvm/test/Other/opt-O3-pipeline.ll index 00a1d61ac058..103d49bbbbab 100644 --- a/llvm/test/Other/opt-O3-pipeline.ll +++ b/llvm/test/Other/opt-O3-pipeline.ll @@ -106,16 +106,20 @@ ; CHECK-NEXT: Simplify the CFG ; CHECK-NEXT: Reassociate expressions ; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) +; CHECK-NEXT: Function Alias Analysis Results +; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: Canonicalize natural loops ; CHECK-NEXT: LCSSA Verifier ; CHECK-NEXT: Loop-Closed SSA Form Pass -; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) -; CHECK-NEXT: Function Alias Analysis Results ; CHECK-NEXT: Scalar Evolution Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Loop Pass Manager +; CHECK-NEXT: Loop Invariant Code Motion ; CHECK-NEXT: Loop Pass Manager ; CHECK-NEXT: Rotate Loops -; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Lazy Branch Probability Analysis ; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/opt-Os-pipeline.ll b/llvm/test/Other/opt-Os-pipeline.ll index 21f9b8c6009e..508c21edbc68 100644 --- a/llvm/test/Other/opt-Os-pipeline.ll +++ b/llvm/test/Other/opt-Os-pipeline.ll @@ -87,16 +87,20 @@ ; CHECK-NEXT: Simplify the CFG ; 
CHECK-NEXT: Reassociate expressions ; CHECK-NEXT: Dominator Tree Construction +; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) +; CHECK-NEXT: Function Alias Analysis Results +; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: Canonicalize natural loops ; CHECK-NEXT: LCSSA Verifier ; CHECK-NEXT: Loop-Closed SSA Form Pass -; CHECK-NEXT: Basic Alias Analysis (stateless AA impl) -; CHECK-NEXT: Function Alias Analysis Results ; CHECK-NEXT: Scalar Evolution Analysis +; CHECK-NEXT: Lazy Branch Probability Analysis +; CHECK-NEXT: Lazy Block Frequency Analysis +; CHECK-NEXT: Loop Pass Manager +; CHECK-NEXT: Loop Invariant Code Motion ; CHECK-NEXT: Loop Pass Manager ; CHECK-NEXT: Rotate Loops -; CHECK-NEXT: Memory SSA ; CHECK-NEXT: Lazy Branch Probability Analysis ; CHECK-NEXT: Lazy Block Frequency Analysis ; CHECK-NEXT: Loop Pass Manager diff --git a/llvm/test/Other/pass-pipelines.ll b/llvm/test/Other/pass-pipelines.ll index ccd364d5d740..768e8343529e 100644 --- a/llvm/test/Other/pass-pipelines.ll +++ b/llvm/test/Other/pass-pipelines.ll @@ -53,6 +53,9 @@ ; CHECK-O2-NEXT: FunctionPass Manager ; CHECK-O2-NOT: Manager ; CHECK-O2: Loop Pass Manager +; CHECK-O2-NOT: Manager +; CHECK-O2: Loop Pass Manager +; CHECK-O2-NOT: Manager ; CHECK-O2: Loop Pass Manager ; CHECK-O2-NOT: Manager ; FIXME: We shouldn't be pulling out to simplify-cfg and instcombine and diff --git a/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll b/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll index 82deee9f367b..8f43029fa303 100644 --- a/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll +++ b/llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll @@ -22,30 +22,33 @@ define dso_local i32 @main() { ; CHECK-NEXT: bb: ; CHECK-NEXT: [[I6:%.*]] = load i32, i32* @a, align 4 ; CHECK-NEXT: [[I24:%.*]] = load i32, i32* @b, align 4 -; CHECK-NEXT: [[D_PROMOTED9:%.*]] = load i32, i32* @d, align 4 -; CHECK-NEXT: [[TMP0:%.*]] = and i32 [[D_PROMOTED9]], [[I6]] +; CHECK-NEXT: 
[[D_PROMOTED7:%.*]] = load i32, i32* @d, align 4 +; CHECK-NEXT: [[TMP0:%.*]] = and i32 [[D_PROMOTED7]], [[I6]] ; CHECK-NEXT: [[I21:%.*]] = icmp eq i32 [[TMP0]], 0 -; CHECK-NEXT: br label [[BB1:%.*]] -; CHECK: bb1: -; CHECK-NEXT: br i1 [[I21]], label [[BB13_PREHEADER_BB27_THREAD_SPLIT_CRIT_EDGE:%.*]], label [[BB19_PREHEADER:%.*]] -; CHECK: bb19.preheader: +; CHECK-NEXT: br i1 [[I21]], label [[BB27_THREAD:%.*]], label [[BB27_PREHEADER:%.*]] +; CHECK: bb27.preheader: ; CHECK-NEXT: [[I26:%.*]] = urem i32 [[I24]], [[TMP0]] ; CHECK-NEXT: store i32 [[I26]], i32* @e, align 4 ; CHECK-NEXT: [[I30_NOT:%.*]] = icmp eq i32 [[I26]], 0 -; CHECK-NEXT: br i1 [[I30_NOT]], label [[BB32_LOOPEXIT:%.*]], label [[BB1]] -; CHECK: bb13.preheader.bb27.thread.split_crit_edge: -; CHECK-NEXT: store i32 -1, i32* @f, align 4 +; CHECK-NEXT: br label [[BB27:%.*]] +; CHECK: bb27.thread: ; CHECK-NEXT: store i32 0, i32* @d, align 4 +; CHECK-NEXT: store i32 -1, i32* @f, align 4 ; CHECK-NEXT: store i32 0, i32* @c, align 4 ; CHECK-NEXT: br label [[BB32:%.*]] +; CHECK: bb27: +; CHECK-NEXT: br i1 [[I30_NOT]], label [[BB32_LOOPEXIT:%.*]], label [[BB36:%.*]] ; CHECK: bb32.loopexit: -; CHECK-NEXT: store i32 -1, i32* @f, align 4 ; CHECK-NEXT: store i32 [[TMP0]], i32* @d, align 4 +; CHECK-NEXT: store i32 -1, i32* @f, align 4 ; CHECK-NEXT: br label [[BB32]] ; CHECK: bb32: -; CHECK-NEXT: [[C_SINK:%.*]] = phi i32* [ @c, [[BB32_LOOPEXIT]] ], [ @e, [[BB13_PREHEADER_BB27_THREAD_SPLIT_CRIT_EDGE]] ] +; CHECK-NEXT: [[C_SINK:%.*]] = phi i32* [ @c, [[BB32_LOOPEXIT]] ], [ @e, [[BB27_THREAD]] ] ; CHECK-NEXT: store i32 0, i32* [[C_SINK]], align 4 ; CHECK-NEXT: ret i32 0 +; CHECK: bb36: +; CHECK-NEXT: store i32 1, i32* @c, align 4 +; CHECK-NEXT: br i1 [[I21]], label [[BB27_THREAD]], label [[BB27]] ; bb: %i = alloca i32, align 4 diff --git a/llvm/test/Transforms/PhaseOrdering/X86/spurious-peeling.ll b/llvm/test/Transforms/PhaseOrdering/X86/spurious-peeling.ll index 3e659414d982..4661bd8a36cc 100644 --- 
a/llvm/test/Transforms/PhaseOrdering/X86/spurious-peeling.ll +++ b/llvm/test/Transforms/PhaseOrdering/X86/spurious-peeling.ll @@ -16,32 +16,28 @@ define dso_local void @_Z13vecIncFromPtrP12FloatVecPair(%class.FloatVecPair* %FV ; OLDPM-NEXT: entry: ; OLDPM-NEXT: [[BASE_I_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR:%.*]], %class.FloatVecPair* [[FVP:%.*]], i64 0, i32 1, i32 0 ; OLDPM-NEXT: [[TMP0:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I_I]], align 8, !tbaa [[TBAA0:![0-9]+]] -; OLDPM-NEXT: [[SIZE410_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1 -; OLDPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[SIZE410_I]], align 8, !tbaa [[TBAA6:![0-9]+]] -; OLDPM-NEXT: [[CMP511_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0 -; OLDPM-NEXT: br i1 [[CMP511_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]] +; OLDPM-NEXT: [[SIZE4_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1 +; OLDPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[SIZE4_I]], align 8, !tbaa [[TBAA6:![0-9]+]] +; OLDPM-NEXT: [[CMP510_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0 +; OLDPM-NEXT: br i1 [[CMP510_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]] ; OLDPM: for.body7.lr.ph.i: ; OLDPM-NEXT: [[BASE_I4_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR]], %class.FloatVecPair* [[FVP]], i64 0, i32 0, i32 0 -; OLDPM-NEXT: [[TMP2:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I4_I]], align 8, !tbaa [[TBAA0]] -; OLDPM-NEXT: [[BASE_I2_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP2]], i64 undef, i32 0 -; OLDPM-NEXT: [[TMP3:%.*]] = load float*, float** [[BASE_I2_I]], align 8, !tbaa [[TBAA8:![0-9]+]] -; OLDPM-NEXT: [[ARRAYIDX_I3_I:%.*]] = getelementptr inbounds float, float* [[TMP3]], i64 undef -; OLDPM-NEXT: 
[[BASE_I6_PEEL_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 0 -; OLDPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I6_PEEL_I]], align 8, !tbaa [[TBAA8]] -; OLDPM-NEXT: [[ARRAYIDX_I7_PEEL_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef -; OLDPM-NEXT: [[TMP5:%.*]] = load float, float* [[ARRAYIDX_I7_PEEL_I]], align 4, !tbaa [[TBAA9:![0-9]+]] -; OLDPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9]] -; OLDPM-NEXT: [[ADD_PEEL_I:%.*]] = fadd float [[TMP5]], [[TMP6]] -; OLDPM-NEXT: store float [[ADD_PEEL_I]], float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9]] -; OLDPM-NEXT: [[EXITCOND_PEEL_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 1 -; OLDPM-NEXT: br i1 [[EXITCOND_PEEL_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I:%.*]] +; OLDPM-NEXT: [[BASE_I6_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 0 +; OLDPM-NEXT: [[TMP2:%.*]] = load float*, float** [[BASE_I6_I]], align 8, !tbaa [[TBAA8:![0-9]+]] +; OLDPM-NEXT: [[ARRAYIDX_I7_I:%.*]] = getelementptr inbounds float, float* [[TMP2]], i64 undef +; OLDPM-NEXT: [[TMP3:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I4_I]], align 8, !tbaa [[TBAA0]] +; OLDPM-NEXT: [[BASE_I2_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP3]], i64 undef, i32 0 +; OLDPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I2_I]], align 8, !tbaa [[TBAA8]] +; OLDPM-NEXT: [[ARRAYIDX_I3_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef +; OLDPM-NEXT: [[DOTPRE_I:%.*]] = load float, float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9:![0-9]+]] +; OLDPM-NEXT: br label [[FOR_BODY7_I:%.*]] ; OLDPM: for.body7.i: -; OLDPM-NEXT: [[TMP7:%.*]] = phi float [ [[ADD_I:%.*]], [[FOR_BODY7_I]] ], [ [[ADD_PEEL_I]], [[FOR_BODY7_LR_PH_I]] ] -; OLDPM-NEXT: [[J_012_I:%.*]] = phi i32 [ [[INC_I:%.*]], 
[[FOR_BODY7_I]] ], [ 1, [[FOR_BODY7_LR_PH_I]] ] -; OLDPM-NEXT: [[TMP8:%.*]] = load float, float* [[ARRAYIDX_I7_PEEL_I]], align 4, !tbaa [[TBAA9]] -; OLDPM-NEXT: [[ADD_I]] = fadd float [[TMP7]], [[TMP8]] +; OLDPM-NEXT: [[TMP5:%.*]] = phi float [ [[DOTPRE_I]], [[FOR_BODY7_LR_PH_I]] ], [ [[ADD_I:%.*]], [[FOR_BODY7_I]] ] +; OLDPM-NEXT: [[J_011_I:%.*]] = phi i32 [ 0, [[FOR_BODY7_LR_PH_I]] ], [ [[INC_I:%.*]], [[FOR_BODY7_I]] ] +; OLDPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I7_I]], align 4, !tbaa [[TBAA9]] +; OLDPM-NEXT: [[ADD_I]] = fadd float [[TMP5]], [[TMP6]] ; OLDPM-NEXT: store float [[ADD_I]], float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9]] -; OLDPM-NEXT: [[INC_I]] = add nuw i32 [[J_012_I]], 1 +; OLDPM-NEXT: [[INC_I]] = add nuw i32 [[J_011_I]], 1 ; OLDPM-NEXT: [[EXITCOND_NOT_I:%.*]] = icmp eq i32 [[INC_I]], [[TMP1]] ; OLDPM-NEXT: br i1 [[EXITCOND_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I]], !llvm.loop [[LOOP11:![0-9]+]] ; OLDPM: _ZN12FloatVecPair6vecIncEv.exit: @@ -51,39 +47,30 @@ define dso_local void @_Z13vecIncFromPtrP12FloatVecPair(%class.FloatVecPair* %FV ; NEWPM-NEXT: entry: ; NEWPM-NEXT: [[BASE_I_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR:%.*]], %class.FloatVecPair* [[FVP:%.*]], i64 0, i32 1, i32 0 ; NEWPM-NEXT: [[TMP0:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I_I]], align 8, !tbaa [[TBAA0:![0-9]+]] -; NEWPM-NEXT: [[SIZE410_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1 -; NEWPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[SIZE410_I]], align 8, !tbaa [[TBAA6:![0-9]+]] -; NEWPM-NEXT: [[CMP511_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0 -; NEWPM-NEXT: br i1 [[CMP511_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]] +; NEWPM-NEXT: [[SIZE4_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1 +; NEWPM-NEXT: [[TMP1:%.*]] = 
load i32, i32* [[SIZE4_I]], align 8, !tbaa [[TBAA6:![0-9]+]] +; NEWPM-NEXT: [[CMP510_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0 +; NEWPM-NEXT: br i1 [[CMP510_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]] ; NEWPM: for.body7.lr.ph.i: ; NEWPM-NEXT: [[BASE_I6_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR]], %class.FloatVecPair* [[FVP]], i64 0, i32 0, i32 0 -; NEWPM-NEXT: [[TMP2:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I6_I]], align 8, !tbaa [[TBAA0]] -; NEWPM-NEXT: [[BASE_I8_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP2]], i64 undef, i32 0 -; NEWPM-NEXT: [[TMP3:%.*]] = load float*, float** [[BASE_I8_I]], align 8, !tbaa [[TBAA8:![0-9]+]] -; NEWPM-NEXT: [[ARRAYIDX_I9_I:%.*]] = getelementptr inbounds float, float* [[TMP3]], i64 undef -; NEWPM-NEXT: [[BASE_I4_PEEL_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 0 -; NEWPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I4_PEEL_I]], align 8, !tbaa [[TBAA8]] -; NEWPM-NEXT: [[ARRAYIDX_I5_PEEL_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef -; NEWPM-NEXT: [[TMP5:%.*]] = load float, float* [[ARRAYIDX_I5_PEEL_I]], align 4, !tbaa [[TBAA9:![0-9]+]] -; NEWPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9]] -; NEWPM-NEXT: [[ADD_PEEL_I:%.*]] = fadd float [[TMP5]], [[TMP6]] -; NEWPM-NEXT: store float [[ADD_PEEL_I]], float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9]] -; NEWPM-NEXT: [[EXITCOND_PEEL_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 1 -; NEWPM-NEXT: br i1 [[EXITCOND_PEEL_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_LR_PH_I_FOR_BODY7_I_CRIT_EDGE:%.*]] -; NEWPM: for.body7.lr.ph.i.for.body7.i_crit_edge: -; NEWPM-NEXT: [[INC_I_1:%.*]] = add nuw i32 1, 1 +; NEWPM-NEXT: [[BASE_I4_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 
undef, i32 0 +; NEWPM-NEXT: [[TMP2:%.*]] = load float*, float** [[BASE_I4_I]], align 8, !tbaa [[TBAA8:![0-9]+]] +; NEWPM-NEXT: [[ARRAYIDX_I5_I:%.*]] = getelementptr inbounds float, float* [[TMP2]], i64 undef +; NEWPM-NEXT: [[TMP3:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I6_I]], align 8, !tbaa [[TBAA0]] +; NEWPM-NEXT: [[BASE_I8_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP3]], i64 undef, i32 0 +; NEWPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I8_I]], align 8, !tbaa [[TBAA8]] +; NEWPM-NEXT: [[ARRAYIDX_I9_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef +; NEWPM-NEXT: [[DOTPRE_I:%.*]] = load float, float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9:![0-9]+]] ; NEWPM-NEXT: br label [[FOR_BODY7_I:%.*]] ; NEWPM: for.body7.i: -; NEWPM-NEXT: [[TMP7:%.*]] = phi float [ [[ADD_I:%.*]], [[FOR_BODY7_I_FOR_BODY7_I_CRIT_EDGE:%.*]] ], [ [[ADD_PEEL_I]], [[FOR_BODY7_LR_PH_I_FOR_BODY7_I_CRIT_EDGE]] ] -; NEWPM-NEXT: [[INC_I_PHI:%.*]] = phi i32 [ [[INC_I_0:%.*]], [[FOR_BODY7_I_FOR_BODY7_I_CRIT_EDGE]] ], [ [[INC_I_1]], [[FOR_BODY7_LR_PH_I_FOR_BODY7_I_CRIT_EDGE]] ] -; NEWPM-NEXT: [[TMP8:%.*]] = load float, float* [[ARRAYIDX_I5_PEEL_I]], align 4, !tbaa [[TBAA9]] -; NEWPM-NEXT: [[ADD_I]] = fadd float [[TMP7]], [[TMP8]] +; NEWPM-NEXT: [[TMP5:%.*]] = phi float [ [[DOTPRE_I]], [[FOR_BODY7_LR_PH_I]] ], [ [[ADD_I:%.*]], [[FOR_BODY7_I]] ] +; NEWPM-NEXT: [[J_011_I:%.*]] = phi i32 [ 0, [[FOR_BODY7_LR_PH_I]] ], [ [[INC_I:%.*]], [[FOR_BODY7_I]] ] +; NEWPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I5_I]], align 4, !tbaa [[TBAA9]] +; NEWPM-NEXT: [[ADD_I]] = fadd float [[TMP5]], [[TMP6]] ; NEWPM-NEXT: store float [[ADD_I]], float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9]] -; NEWPM-NEXT: [[EXITCOND_NOT_I:%.*]] = icmp eq i32 [[INC_I_PHI]], [[TMP1]] -; NEWPM-NEXT: br i1 [[EXITCOND_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I_FOR_BODY7_I_CRIT_EDGE]], !llvm.loop 
[[LOOP11:![0-9]+]] -; NEWPM: for.body7.i.for.body7.i_crit_edge: -; NEWPM-NEXT: [[INC_I_0]] = add nuw i32 [[INC_I_PHI]], 1 -; NEWPM-NEXT: br label [[FOR_BODY7_I]] +; NEWPM-NEXT: [[INC_I]] = add nuw i32 [[J_011_I]], 1 +; NEWPM-NEXT: [[EXITCOND_NOT_I:%.*]] = icmp eq i32 [[INC_I]], [[TMP1]] +; NEWPM-NEXT: br i1 [[EXITCOND_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I]], !llvm.loop [[LOOP11:![0-9]+]] ; NEWPM: _ZN12FloatVecPair6vecIncEv.exit: ; NEWPM-NEXT: ret void ; diff --git a/llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll b/llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll index 280f849dbb35..8b8b535f1a77 100644 --- a/llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll +++ b/llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll @@ -15,18 +15,18 @@ define void @vdiv(double* %x, double* %y, double %a, i32 %N) #0 { ; CHECK-LABEL: @vdiv( ; CHECK-NEXT: entry: ; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[N:%.*]], 0 -; CHECK-NEXT: br i1 [[CMP1]], label [[FOR_BODY_LR_PH:%.*]], label [[FOR_END:%.*]] -; CHECK: for.body.lr.ph: +; CHECK-NEXT: br i1 [[CMP1]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_END:%.*]] +; CHECK: for.body.preheader: ; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64 ; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N]], 4 -; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[FOR_BODY_PREHEADER:%.*]], label [[VECTOR_MEMCHECK:%.*]] +; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[FOR_BODY_PREHEADER8:%.*]], label [[VECTOR_MEMCHECK:%.*]] ; CHECK: vector.memcheck: ; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr double, double* [[X:%.*]], i64 [[WIDE_TRIP_COUNT]] ; CHECK-NEXT: [[SCEVGEP6:%.*]] = getelementptr double, double* [[Y:%.*]], i64 [[WIDE_TRIP_COUNT]] ; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt double* [[SCEVGEP6]], [[X]] ; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt double* [[SCEVGEP]], [[Y]] ; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]] -; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[FOR_BODY_PREHEADER]], label 
[[VECTOR_PH:%.*]] +; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[FOR_BODY_PREHEADER8]], label [[VECTOR_PH:%.*]] ; CHECK: vector.ph: ; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 4294967292 ; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x double> poison, double [[A:%.*]], i32 0 @@ -49,39 +49,39 @@ define void @vdiv(double* %x, double* %y, double %a, i32 %N) #0 { ; CHECK-NEXT: [[NITER:%.*]] = phi i64 [ [[UNROLL_ITER]], [[VECTOR_PH_NEW]] ], [ [[NITER_NSUB_3:%.*]], [[VECTOR_BODY]] ] ; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX]] ; CHECK-NEXT: [[TMP9:%.*]] = bitcast double* [[TMP8]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, <4 x double>* [[TMP9]], align 8, [[TBAA3:!tbaa !.*]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, <4 x double>* [[TMP9]], align 8, !tbaa [[TBAA3:![0-9]+]], !alias.scope !7 ; CHECK-NEXT: [[TMP10:%.*]] = fmul fast <4 x double> [[WIDE_LOAD]], [[TMP4]] ; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX]] ; CHECK-NEXT: [[TMP12:%.*]] = bitcast double* [[TMP11]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP10]], <4 x double>* [[TMP12]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP10]], <4 x double>* [[TMP12]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT:%.*]] = or i64 [[INDEX]], 4 ; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_NEXT]] ; CHECK-NEXT: [[TMP14:%.*]] = bitcast double* [[TMP13]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD_1:%.*]] = load <4 x double>, <4 x double>* [[TMP14]], align 8, [[TBAA3]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD_1:%.*]] = load <4 x double>, <4 x double>* [[TMP14]], align 8, !tbaa [[TBAA3]], !alias.scope !7 ; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_1]], [[TMP5]] ; CHECK-NEXT: [[TMP16:%.*]] = getelementptr 
inbounds double, double* [[X]], i64 [[INDEX_NEXT]] ; CHECK-NEXT: [[TMP17:%.*]] = bitcast double* [[TMP16]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP15]], <4 x double>* [[TMP17]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP15]], <4 x double>* [[TMP17]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT_1:%.*]] = or i64 [[INDEX]], 8 ; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_NEXT_1]] ; CHECK-NEXT: [[TMP19:%.*]] = bitcast double* [[TMP18]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD_2:%.*]] = load <4 x double>, <4 x double>* [[TMP19]], align 8, [[TBAA3]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD_2:%.*]] = load <4 x double>, <4 x double>* [[TMP19]], align 8, !tbaa [[TBAA3]], !alias.scope !7 ; CHECK-NEXT: [[TMP20:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_2]], [[TMP6]] ; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX_NEXT_1]] ; CHECK-NEXT: [[TMP22:%.*]] = bitcast double* [[TMP21]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP20]], <4 x double>* [[TMP22]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP20]], <4 x double>* [[TMP22]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT_2:%.*]] = or i64 [[INDEX]], 12 ; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_NEXT_2]] ; CHECK-NEXT: [[TMP24:%.*]] = bitcast double* [[TMP23]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD_3:%.*]] = load <4 x double>, <4 x double>* [[TMP24]], align 8, [[TBAA3]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD_3:%.*]] = load <4 x double>, <4 x double>* [[TMP24]], align 8, !tbaa [[TBAA3]], !alias.scope !7 ; CHECK-NEXT: [[TMP25:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_3]], [[TMP7]] ; CHECK-NEXT: [[TMP26:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX_NEXT_2]] ; CHECK-NEXT: 
[[TMP27:%.*]] = bitcast double* [[TMP26]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP25]], <4 x double>* [[TMP27]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP25]], <4 x double>* [[TMP27]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT_3]] = add i64 [[INDEX]], 16 ; CHECK-NEXT: [[NITER_NSUB_3]] = add i64 [[NITER]], -4 ; CHECK-NEXT: [[NITER_NCMP_3:%.*]] = icmp eq i64 [[NITER_NSUB_3]], 0 -; CHECK-NEXT: br i1 [[NITER_NCMP_3]], label [[MIDDLE_BLOCK_UNR_LCSSA]], label [[VECTOR_BODY]], [[LOOP12:!llvm.loop !.*]] +; CHECK-NEXT: br i1 [[NITER_NCMP_3]], label [[MIDDLE_BLOCK_UNR_LCSSA]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]] ; CHECK: middle.block.unr-lcssa: ; CHECK-NEXT: [[INDEX_UNR:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT_3]], [[VECTOR_BODY]] ] ; CHECK-NEXT: [[LCMP_MOD_NOT:%.*]] = icmp eq i64 [[XTRAITER]], 0 @@ -94,78 +94,78 @@ define void @vdiv(double* %x, double* %y, double %a, i32 %N) #0 { ; CHECK-NEXT: [[EPIL_ITER:%.*]] = phi i64 [ [[XTRAITER]], [[VECTOR_BODY_EPIL_PREHEADER]] ], [ [[EPIL_ITER_SUB:%.*]], [[VECTOR_BODY_EPIL]] ] ; CHECK-NEXT: [[TMP29:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_EPIL]] ; CHECK-NEXT: [[TMP30:%.*]] = bitcast double* [[TMP29]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD_EPIL:%.*]] = load <4 x double>, <4 x double>* [[TMP30]], align 8, [[TBAA3]], !alias.scope !7 +; CHECK-NEXT: [[WIDE_LOAD_EPIL:%.*]] = load <4 x double>, <4 x double>* [[TMP30]], align 8, !tbaa [[TBAA3]], !alias.scope !7 ; CHECK-NEXT: [[TMP31:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_EPIL]], [[TMP28]] ; CHECK-NEXT: [[TMP32:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX_EPIL]] ; CHECK-NEXT: [[TMP33:%.*]] = bitcast double* [[TMP32]] to <4 x double>* -; CHECK-NEXT: store <4 x double> [[TMP31]], <4 x double>* [[TMP33]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7 +; CHECK-NEXT: store <4 x double> [[TMP31]], <4 x 
double>* [[TMP33]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7 ; CHECK-NEXT: [[INDEX_NEXT_EPIL]] = add i64 [[INDEX_EPIL]], 4 ; CHECK-NEXT: [[EPIL_ITER_SUB]] = add i64 [[EPIL_ITER]], -1 ; CHECK-NEXT: [[EPIL_ITER_CMP_NOT:%.*]] = icmp eq i64 [[EPIL_ITER_SUB]], 0 -; CHECK-NEXT: br i1 [[EPIL_ITER_CMP_NOT]], label [[MIDDLE_BLOCK]], label [[VECTOR_BODY_EPIL]], [[LOOP14:!llvm.loop !.*]] +; CHECK-NEXT: br i1 [[EPIL_ITER_CMP_NOT]], label [[MIDDLE_BLOCK]], label [[VECTOR_BODY_EPIL]], !llvm.loop [[LOOP14:![0-9]+]] ; CHECK: middle.block: ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[WIDE_TRIP_COUNT]] -; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END]], label [[FOR_BODY_PREHEADER]] -; CHECK: for.body.preheader: -; CHECK-NEXT: [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[VECTOR_MEMCHECK]] ], [ 0, [[FOR_BODY_LR_PH]] ], [ [[N_VEC]], [[MIDDLE_BLOCK]] ] +; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END]], label [[FOR_BODY_PREHEADER8]] +; CHECK: for.body.preheader8: +; CHECK-NEXT: [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[VECTOR_MEMCHECK]] ], [ 0, [[FOR_BODY_PREHEADER]] ], [ [[N_VEC]], [[MIDDLE_BLOCK]] ] ; CHECK-NEXT: [[TMP34:%.*]] = xor i64 [[INDVARS_IV_PH]], -1 ; CHECK-NEXT: [[TMP35:%.*]] = add nsw i64 [[TMP34]], [[WIDE_TRIP_COUNT]] -; CHECK-NEXT: [[XTRAITER8:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 3 -; CHECK-NEXT: [[LCMP_MOD9_NOT:%.*]] = icmp eq i64 [[XTRAITER8]], 0 -; CHECK-NEXT: br i1 [[LCMP_MOD9_NOT]], label [[FOR_BODY_PROL_LOOPEXIT:%.*]], label [[FOR_BODY_PROL_PREHEADER:%.*]] +; CHECK-NEXT: [[XTRAITER9:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 3 +; CHECK-NEXT: [[LCMP_MOD10_NOT:%.*]] = icmp eq i64 [[XTRAITER9]], 0 +; CHECK-NEXT: br i1 [[LCMP_MOD10_NOT]], label [[FOR_BODY_PROL_LOOPEXIT:%.*]], label [[FOR_BODY_PROL_PREHEADER:%.*]] ; CHECK: for.body.prol.preheader: ; CHECK-NEXT: [[TMP36:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: br label [[FOR_BODY_PROL:%.*]] ; CHECK: for.body.prol: ; CHECK-NEXT: [[INDVARS_IV_PROL:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_PROL:%.*]], 
[[FOR_BODY_PROL]] ], [ [[INDVARS_IV_PH]], [[FOR_BODY_PROL_PREHEADER]] ] -; CHECK-NEXT: [[PROL_ITER:%.*]] = phi i64 [ [[PROL_ITER_SUB:%.*]], [[FOR_BODY_PROL]] ], [ [[XTRAITER8]], [[FOR_BODY_PROL_PREHEADER]] ] +; CHECK-NEXT: [[PROL_ITER:%.*]] = phi i64 [ [[PROL_ITER_SUB:%.*]], [[FOR_BODY_PROL]] ], [ [[XTRAITER9]], [[FOR_BODY_PROL_PREHEADER]] ] ; CHECK-NEXT: [[ARRAYIDX_PROL:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_PROL]] -; CHECK-NEXT: [[T0_PROL:%.*]] = load double, double* [[ARRAYIDX_PROL]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0_PROL:%.*]] = load double, double* [[ARRAYIDX_PROL]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP37:%.*]] = fmul fast double [[T0_PROL]], [[TMP36]] ; CHECK-NEXT: [[ARRAYIDX2_PROL:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_PROL]] -; CHECK-NEXT: store double [[TMP37]], double* [[ARRAYIDX2_PROL]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP37]], double* [[ARRAYIDX2_PROL]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT_PROL]] = add nuw nsw i64 [[INDVARS_IV_PROL]], 1 ; CHECK-NEXT: [[PROL_ITER_SUB]] = add i64 [[PROL_ITER]], -1 ; CHECK-NEXT: [[PROL_ITER_CMP_NOT:%.*]] = icmp eq i64 [[PROL_ITER_SUB]], 0 -; CHECK-NEXT: br i1 [[PROL_ITER_CMP_NOT]], label [[FOR_BODY_PROL_LOOPEXIT]], label [[FOR_BODY_PROL]], [[LOOP16:!llvm.loop !.*]] +; CHECK-NEXT: br i1 [[PROL_ITER_CMP_NOT]], label [[FOR_BODY_PROL_LOOPEXIT]], label [[FOR_BODY_PROL]], !llvm.loop [[LOOP16:![0-9]+]] ; CHECK: for.body.prol.loopexit: -; CHECK-NEXT: [[INDVARS_IV_UNR:%.*]] = phi i64 [ [[INDVARS_IV_PH]], [[FOR_BODY_PREHEADER]] ], [ [[INDVARS_IV_NEXT_PROL]], [[FOR_BODY_PROL]] ] +; CHECK-NEXT: [[INDVARS_IV_UNR:%.*]] = phi i64 [ [[INDVARS_IV_PH]], [[FOR_BODY_PREHEADER8]] ], [ [[INDVARS_IV_NEXT_PROL]], [[FOR_BODY_PROL]] ] ; CHECK-NEXT: [[TMP38:%.*]] = icmp ult i64 [[TMP35]], 3 -; CHECK-NEXT: br i1 [[TMP38]], label [[FOR_END]], label [[FOR_BODY_PREHEADER_NEW:%.*]] -; CHECK: for.body.preheader.new: +; CHECK-NEXT: br i1 
[[TMP38]], label [[FOR_END]], label [[FOR_BODY_PREHEADER8_NEW:%.*]] +; CHECK: for.body.preheader8.new: ; CHECK-NEXT: [[TMP39:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: [[TMP40:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: [[TMP41:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: [[TMP42:%.*]] = fdiv fast double 1.000000e+00, [[A]] ; CHECK-NEXT: br label [[FOR_BODY:%.*]] ; CHECK: for.body: -; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_UNR]], [[FOR_BODY_PREHEADER_NEW]] ], [ [[INDVARS_IV_NEXT_3:%.*]], [[FOR_BODY]] ] +; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_UNR]], [[FOR_BODY_PREHEADER8_NEW]] ], [ [[INDVARS_IV_NEXT_3:%.*]], [[FOR_BODY]] ] ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV]] -; CHECK-NEXT: [[T0:%.*]] = load double, double* [[ARRAYIDX]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0:%.*]] = load double, double* [[ARRAYIDX]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP43:%.*]] = fmul fast double [[T0]], [[TMP39]] ; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV]] -; CHECK-NEXT: store double [[TMP43]], double* [[ARRAYIDX2]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP43]], double* [[ARRAYIDX2]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1 ; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_NEXT]] -; CHECK-NEXT: [[T0_1:%.*]] = load double, double* [[ARRAYIDX_1]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0_1:%.*]] = load double, double* [[ARRAYIDX_1]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP44:%.*]] = fmul fast double [[T0_1]], [[TMP40]] ; CHECK-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_NEXT]] -; CHECK-NEXT: store double [[TMP44]], double* [[ARRAYIDX2_1]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP44]], double* [[ARRAYIDX2_1]], align 8, 
!tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT_1:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 2 ; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_NEXT_1]] -; CHECK-NEXT: [[T0_2:%.*]] = load double, double* [[ARRAYIDX_2]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0_2:%.*]] = load double, double* [[ARRAYIDX_2]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP45:%.*]] = fmul fast double [[T0_2]], [[TMP41]] ; CHECK-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_NEXT_1]] -; CHECK-NEXT: store double [[TMP45]], double* [[ARRAYIDX2_2]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP45]], double* [[ARRAYIDX2_2]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT_2:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 3 ; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_NEXT_2]] -; CHECK-NEXT: [[T0_3:%.*]] = load double, double* [[ARRAYIDX_3]], align 8, [[TBAA3]] +; CHECK-NEXT: [[T0_3:%.*]] = load double, double* [[ARRAYIDX_3]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[TMP46:%.*]] = fmul fast double [[T0_3]], [[TMP42]] ; CHECK-NEXT: [[ARRAYIDX2_3:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_NEXT_2]] -; CHECK-NEXT: store double [[TMP46]], double* [[ARRAYIDX2_3]], align 8, [[TBAA3]] +; CHECK-NEXT: store double [[TMP46]], double* [[ARRAYIDX2_3]], align 8, !tbaa [[TBAA3]] ; CHECK-NEXT: [[INDVARS_IV_NEXT_3]] = add nuw nsw i64 [[INDVARS_IV]], 4 ; CHECK-NEXT: [[EXITCOND_NOT_3:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT_3]], [[WIDE_TRIP_COUNT]] -; CHECK-NEXT: br i1 [[EXITCOND_NOT_3]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP17:!llvm.loop !.*]] +; CHECK-NEXT: br i1 [[EXITCOND_NOT_3]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP17:![0-9]+]] ; CHECK: for.end: ; CHECK-NEXT: ret void ; diff --git a/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll 
b/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll index 7d7d18a5247d..bb320af193e3 100644 --- a/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll +++ b/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll @@ -76,18 +76,20 @@ define void @_Z4loopi(i32 %width) { ; ROTATED_LATER_OLDPM-NEXT: [[CMP:%.*]] = icmp slt i32 [[WIDTH:%.*]], 1 ; ROTATED_LATER_OLDPM-NEXT: br i1 [[CMP]], label [[RETURN:%.*]], label [[FOR_COND_PREHEADER:%.*]] ; ROTATED_LATER_OLDPM: for.cond.preheader: +; ROTATED_LATER_OLDPM-NEXT: [[CMP13_NOT:%.*]] = icmp eq i32 [[WIDTH]], 1 +; ROTATED_LATER_OLDPM-NEXT: br i1 [[CMP13_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY_PREHEADER:%.*]] +; ROTATED_LATER_OLDPM: for.body.preheader: ; ROTATED_LATER_OLDPM-NEXT: [[TMP0:%.*]] = add nsw i32 [[WIDTH]], -1 -; ROTATED_LATER_OLDPM-NEXT: [[EXITCOND_NOT3:%.*]] = icmp eq i32 [[TMP0]], 0 -; ROTATED_LATER_OLDPM-NEXT: br i1 [[EXITCOND_NOT3]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY:%.*]] +; ROTATED_LATER_OLDPM-NEXT: br label [[FOR_BODY:%.*]] ; ROTATED_LATER_OLDPM: for.cond.cleanup: ; ROTATED_LATER_OLDPM-NEXT: tail call void @f0() ; ROTATED_LATER_OLDPM-NEXT: tail call void @f2() ; ROTATED_LATER_OLDPM-NEXT: br label [[RETURN]] ; ROTATED_LATER_OLDPM: for.body: -; ROTATED_LATER_OLDPM-NEXT: [[I_04:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_COND_PREHEADER]] ] +; ROTATED_LATER_OLDPM-NEXT: [[I_04:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ] ; ROTATED_LATER_OLDPM-NEXT: tail call void @f0() ; ROTATED_LATER_OLDPM-NEXT: tail call void @f1() -; ROTATED_LATER_OLDPM-NEXT: [[INC]] = add nuw i32 [[I_04]], 1 +; ROTATED_LATER_OLDPM-NEXT: [[INC]] = add nuw nsw i32 [[I_04]], 1 ; ROTATED_LATER_OLDPM-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC]], [[TMP0]] ; ROTATED_LATER_OLDPM-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]] ; ROTATED_LATER_OLDPM: return: @@ -98,24 +100,24 @@ 
define void @_Z4loopi(i32 %width) { ; ROTATED_LATER_NEWPM-NEXT: [[CMP:%.*]] = icmp slt i32 [[WIDTH:%.*]], 1 ; ROTATED_LATER_NEWPM-NEXT: br i1 [[CMP]], label [[RETURN:%.*]], label [[FOR_COND_PREHEADER:%.*]] ; ROTATED_LATER_NEWPM: for.cond.preheader: +; ROTATED_LATER_NEWPM-NEXT: [[CMP13_NOT:%.*]] = icmp eq i32 [[WIDTH]], 1 +; ROTATED_LATER_NEWPM-NEXT: br i1 [[CMP13_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY_PREHEADER:%.*]] +; ROTATED_LATER_NEWPM: for.body.preheader: ; ROTATED_LATER_NEWPM-NEXT: [[TMP0:%.*]] = add nsw i32 [[WIDTH]], -1 -; ROTATED_LATER_NEWPM-NEXT: [[EXITCOND_NOT3:%.*]] = icmp eq i32 [[TMP0]], 0 -; ROTATED_LATER_NEWPM-NEXT: br i1 [[EXITCOND_NOT3]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND_PREHEADER_FOR_BODY_CRIT_EDGE:%.*]] -; ROTATED_LATER_NEWPM: for.cond.preheader.for.body_crit_edge: -; ROTATED_LATER_NEWPM-NEXT: [[INC_1:%.*]] = add nuw i32 0, 1 +; ROTATED_LATER_NEWPM-NEXT: [[INC_1:%.*]] = add nuw nsw i32 0, 1 ; ROTATED_LATER_NEWPM-NEXT: br label [[FOR_BODY:%.*]] ; ROTATED_LATER_NEWPM: for.cond.cleanup: ; ROTATED_LATER_NEWPM-NEXT: tail call void @f0() ; ROTATED_LATER_NEWPM-NEXT: tail call void @f2() ; ROTATED_LATER_NEWPM-NEXT: br label [[RETURN]] ; ROTATED_LATER_NEWPM: for.body: -; ROTATED_LATER_NEWPM-NEXT: [[INC_PHI:%.*]] = phi i32 [ [[INC_0:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ], [ [[INC_1]], [[FOR_COND_PREHEADER_FOR_BODY_CRIT_EDGE]] ] +; ROTATED_LATER_NEWPM-NEXT: [[INC_PHI:%.*]] = phi i32 [ [[INC_0:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ], [ [[INC_1]], [[FOR_BODY_PREHEADER]] ] ; ROTATED_LATER_NEWPM-NEXT: tail call void @f0() ; ROTATED_LATER_NEWPM-NEXT: tail call void @f1() ; ROTATED_LATER_NEWPM-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC_PHI]], [[TMP0]] ; ROTATED_LATER_NEWPM-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ; ROTATED_LATER_NEWPM: for.body.for.body_crit_edge: -; ROTATED_LATER_NEWPM-NEXT: [[INC_0]] = add nuw i32 [[INC_PHI]], 1 +; ROTATED_LATER_NEWPM-NEXT: 
[[INC_0]] = add nuw nsw i32 [[INC_PHI]], 1 ; ROTATED_LATER_NEWPM-NEXT: br label [[FOR_BODY]] ; ROTATED_LATER_NEWPM: return: ; ROTATED_LATER_NEWPM-NEXT: ret void </cut>

4 years, 4 months

[ACTIVITY] report week ending 10 Sep

by Peter Maydell

Progress
 * UM-2 [QEMU upstream maintainership]
   + Respin of a linux-user cleanup patchset
   + Code review, as usual
 * QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
   + Working on version 2 of the "optimized code gen for MVE" patchset;
     this now covers all the insns that have an easy optimized version.

-- PMM

4 years, 5 months

[CI-NOTIFY]: TCWG Bisect tcwg_kernel/gnu-master-aarch64-lts-allyesconfig - Build # 13 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *gcc* in CI configuration tcwg_kernel/gnu-master-aarch64-lts-allyesconfig.

So far, this commit has regressed CI configurations:
- tcwg_kernel/gnu-master-aarch64-lts-allyesconfig

Culprit:
<cut>
commit a25e0b5e6ac8a77a71c229e0a7b744603365b0e9
Author: qing zhao <qing.zhao(a)oracle.com>
Date:   Thu Sep 9 15:44:49 2021 -0700

    Add -ftrivial-auto-var-init option and uninitialized variable attribute.

    Initialize automatic variables with either a pattern or with zeroes to increase the security and predictability of a program by preventing uninitialized memory disclosure and use. GCC still considers an automatic variable that doesn't have an explicit initializer as uninitialized, -Wuninitialized will still report warning messages on such automatic variables.

    With this option, GCC will also initialize any padding of automatic variables that have structure or union types to zeroes.

    You can control this behavior for a specific variable by using the variable attribute "uninitialized" to control runtime overhead.

    gcc/ChangeLog:

    2021-09-09  qing zhao  <qing.zhao(a)oracle.com>

        * builtins.c (expand_builtin_memset): Make external visible.
        * builtins.h (expand_builtin_memset): Declare extern.
        * common.opt (ftrivial-auto-var-init=): New option.
        * doc/extend.texi: Document the uninitialized attribute.
        * doc/invoke.texi: Document -ftrivial-auto-var-init.
        * flag-types.h (enum auto_init_type): New enumerated type auto_init_type.
        * gimple-fold.c (clear_padding_type): Add one new parameter.
        (clear_padding_union): Likewise.
        (clear_padding_emit_loop): Likewise.
        (clear_type_padding_in_mask): Likewise.
        (gimple_fold_builtin_clear_padding): Handle this new parameter.
        * gimplify.c (gimple_add_init_for_auto_var): New function.
        (gimple_add_padding_init_for_auto_var): New function.
        (is_var_need_auto_init): New function.
        (gimplify_decl_expr): Add initialization to automatic variables per users' requests.
        (gimplify_call_expr): Add one new parameter for call to __builtin_clear_padding.
        (gimplify_init_constructor): Add padding initialization in the end.
        * internal-fn.c (INIT_PATTERN_VALUE): New macro.
        (expand_DEFERRED_INIT): New function.
        * internal-fn.def (DEFERRED_INIT): New internal function.
        * tree-cfg.c (verify_gimple_call): Verify calls to .DEFERRED_INIT.
        * tree-sra.c (generate_subtree_deferred_init): New function.
        (scan_function): Avoid setting cannot_scalarize_away_bitmap for calls to .DEFERRED_INIT.
        (sra_modify_deferred_init): New function.
        (sra_modify_function_body): Handle calls to DEFERRED_INIT specially.
        * tree-ssa-structalias.c (find_func_aliases_for_call): Likewise.
        * tree-ssa-uninit.c (warn_uninit): Handle calls to DEFERRED_INIT specially.
        (check_defs): Likewise.
        (warn_uninitialized_vars): Likewise.
        * tree-ssa.c (ssa_undefined_value_p): Likewise.
        * tree.c (build_common_builtin_nodes): Build tree node for BUILT_IN_CLEAR_PADDING when needed.

    gcc/c-family/ChangeLog:

    2021-09-09  qing zhao  <qing.zhao(a)oracle.com>

        * c-attribs.c (handle_uninitialized_attribute): New function.
        (c_common_attribute_table): Add "uninitialized" attribute.

    gcc/testsuite/ChangeLog:

    2021-09-09  qing zhao  <qing.zhao(a)oracle.com>

        * c-c++-common/auto-init-1.c: New test.
        * c-c++-common/auto-init-10.c: New test.
        * c-c++-common/auto-init-11.c: New test.
        * c-c++-common/auto-init-12.c: New test.
        * c-c++-common/auto-init-13.c: New test.
        * c-c++-common/auto-init-14.c: New test.
        * c-c++-common/auto-init-15.c: New test.
        * c-c++-common/auto-init-16.c: New test.
        * c-c++-common/auto-init-2.c: New test.
        * c-c++-common/auto-init-3.c: New test.
        * c-c++-common/auto-init-4.c: New test.
        * c-c++-common/auto-init-5.c: New test.
        * c-c++-common/auto-init-6.c: New test.
        * c-c++-common/auto-init-7.c: New test.
        * c-c++-common/auto-init-8.c: New test.
        * c-c++-common/auto-init-9.c: New test.
        * c-c++-common/auto-init-esra.c: New test.
        * c-c++-common/auto-init-padding-1.c: New test.
        * c-c++-common/auto-init-padding-2.c: New test.
        * c-c++-common/auto-init-padding-3.c: New test.
        * g++.dg/auto-init-uninit-pred-1_a.C: New test.
        * g++.dg/auto-init-uninit-pred-2_a.C: New test.
        * g++.dg/auto-init-uninit-pred-3_a.C: New test.
        * g++.dg/auto-init-uninit-pred-4.C: New test.
        * gcc.dg/auto-init-sra-1.c: New test.
        * gcc.dg/auto-init-sra-2.c: New test.
        * gcc.dg/auto-init-uninit-1.c: New test.
        * gcc.dg/auto-init-uninit-12.c: New test.
        * gcc.dg/auto-init-uninit-13.c: New test.
        * gcc.dg/auto-init-uninit-14.c: New test.
        * gcc.dg/auto-init-uninit-15.c: New test.
        * gcc.dg/auto-init-uninit-16.c: New test.
        * gcc.dg/auto-init-uninit-17.c: New test.
        * gcc.dg/auto-init-uninit-18.c: New test.
        * gcc.dg/auto-init-uninit-19.c: New test.
        * gcc.dg/auto-init-uninit-2.c: New test.
        * gcc.dg/auto-init-uninit-20.c: New test.
        * gcc.dg/auto-init-uninit-21.c: New test.
        * gcc.dg/auto-init-uninit-22.c: New test.
        * gcc.dg/auto-init-uninit-23.c: New test.
        * gcc.dg/auto-init-uninit-24.c: New test.
        * gcc.dg/auto-init-uninit-25.c: New test.
        * gcc.dg/auto-init-uninit-26.c: New test.
        * gcc.dg/auto-init-uninit-3.c: New test.
        * gcc.dg/auto-init-uninit-34.c: New test.
        * gcc.dg/auto-init-uninit-36.c: New test.
        * gcc.dg/auto-init-uninit-37.c: New test.
        * gcc.dg/auto-init-uninit-4.c: New test.
        * gcc.dg/auto-init-uninit-5.c: New test.
        * gcc.dg/auto-init-uninit-6.c: New test.
        * gcc.dg/auto-init-uninit-8.c: New test.
        * gcc.dg/auto-init-uninit-9.c: New test.
        * gcc.dg/auto-init-uninit-A.c: New test.
        * gcc.dg/auto-init-uninit-B.c: New test.
        * gcc.dg/auto-init-uninit-C.c: New test.
        * gcc.dg/auto-init-uninit-H.c: New test.
        * gcc.dg/auto-init-uninit-I.c: New test.
        * gcc.target/aarch64/auto-init-1.c: New test.
        * gcc.target/aarch64/auto-init-2.c: New test.
        * gcc.target/aarch64/auto-init-3.c: New test.
        * gcc.target/aarch64/auto-init-4.c: New test.
        * gcc.target/aarch64/auto-init-5.c: New test.
        * gcc.target/aarch64/auto-init-6.c: New test.
        * gcc.target/aarch64/auto-init-7.c: New test.
        * gcc.target/aarch64/auto-init-8.c: New test.
        * gcc.target/aarch64/auto-init-padding-1.c: New test.
        * gcc.target/aarch64/auto-init-padding-10.c: New test.
        * gcc.target/aarch64/auto-init-padding-11.c: New test.
        * gcc.target/aarch64/auto-init-padding-12.c: New test.
        * gcc.target/aarch64/auto-init-padding-2.c: New test.
        * gcc.target/aarch64/auto-init-padding-3.c: New test.
        * gcc.target/aarch64/auto-init-padding-4.c: New test.
        * gcc.target/aarch64/auto-init-padding-5.c: New test.
        * gcc.target/aarch64/auto-init-padding-6.c: New test.
        * gcc.target/aarch64/auto-init-padding-7.c: New test.
        * gcc.target/aarch64/auto-init-padding-8.c: New test.
        * gcc.target/aarch64/auto-init-padding-9.c: New test.
        * gcc.target/i386/auto-init-1.c: New test.
        * gcc.target/i386/auto-init-2.c: New test.
        * gcc.target/i386/auto-init-21.c: New test.
        * gcc.target/i386/auto-init-22.c: New test.
        * gcc.target/i386/auto-init-23.c: New test.
        * gcc.target/i386/auto-init-24.c: New test.
        * gcc.target/i386/auto-init-3.c: New test.
        * gcc.target/i386/auto-init-4.c: New test.
        * gcc.target/i386/auto-init-5.c: New test.
        * gcc.target/i386/auto-init-6.c: New test.
        * gcc.target/i386/auto-init-7.c: New test.
        * gcc.target/i386/auto-init-8.c: New test.
        * gcc.target/i386/auto-init-padding-1.c: New test.
        * gcc.target/i386/auto-init-padding-10.c: New test.
        * gcc.target/i386/auto-init-padding-11.c: New test.
        * gcc.target/i386/auto-init-padding-12.c: New test.
        * gcc.target/i386/auto-init-padding-2.c: New test.
        * gcc.target/i386/auto-init-padding-3.c: New test.
        * gcc.target/i386/auto-init-padding-4.c: New test.
        * gcc.target/i386/auto-init-padding-5.c: New test.
        * gcc.target/i386/auto-init-padding-6.c: New test.
        * gcc.target/i386/auto-init-padding-7.c: New test.
        * gcc.target/i386/auto-init-padding-8.c: New test.
        * gcc.target/i386/auto-init-padding-9.c: New test.
</cut>

Results regressed to (for first_bad == a25e0b5e6ac8a77a71c229e0a7b744603365b0e9)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 29
# First few build errors in logs:

from (for last_good == 5fe0865ab788bdc387b284a3ad57e5a95a767b18)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 19270
# linux build successful: all

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-gcc-a25e0b5e6ac8a77a71c229e0a7b744603365b0e9
cd investigate-gcc-a25e0b5e6ac8a77a71c229e0a7b744603365b0e9

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach a25e0b5e6ac8a77a71c229e0a7b744603365b0e9
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 5fe0865ab788bdc387b284a3ad57e5a95a767b18
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all…
Build log: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-lts-all…

Full commit (up to 1000 lines):
<cut>
commit a25e0b5e6ac8a77a71c229e0a7b744603365b0e9
Author: qing zhao <qing.zhao(a)oracle.com>
Date: Thu Sep 9 15:44:49 2021 -0700

    Add -ftrivial-auto-var-init option and uninitialized variable attribute.

    Initialize automatic variables with either a pattern or with zeroes to increase
    the security and predictability of a program by preventing uninitialized memory
    disclosure and use. GCC still considers an automatic variable that doesn't have
    an explicit initializer as uninitialized, -Wuninitialized will still report
    warning messages on such automatic variables.
    With this option, GCC will also initialize any padding of automatic variables
    that have structure or union types to zeroes.
    You can control this behavior for a specific variable by using the variable
    attribute "uninitialized" to control runtime overhead.

    gcc/ChangeLog:

    2021-09-09 qing zhao <qing.zhao(a)oracle.com>

    * builtins.c (expand_builtin_memset): Make external visible.
    * builtins.h (expand_builtin_memset): Declare extern.
    * common.opt (ftrivial-auto-var-init=): New option.
    * doc/extend.texi: Document the uninitialized attribute.
    * doc/invoke.texi: Document -ftrivial-auto-var-init.
    * flag-types.h (enum auto_init_type): New enumerated type auto_init_type.
    * gimple-fold.c (clear_padding_type): Add one new parameter.
    (clear_padding_union): Likewise.
    (clear_padding_emit_loop): Likewise.
    (clear_type_padding_in_mask): Likewise.
    (gimple_fold_builtin_clear_padding): Handle this new parameter.
    * gimplify.c (gimple_add_init_for_auto_var): New function.
    (gimple_add_padding_init_for_auto_var): New function.
    (is_var_need_auto_init): New function.
    (gimplify_decl_expr): Add initialization to automatic variables per users' requests.
    (gimplify_call_expr): Add one new parameter for call to __builtin_clear_padding.
    (gimplify_init_constructor): Add padding initialization in the end.
    * internal-fn.c (INIT_PATTERN_VALUE): New macro.
    (expand_DEFERRED_INIT): New function.
    * internal-fn.def (DEFERRED_INIT): New internal function.
    * tree-cfg.c (verify_gimple_call): Verify calls to .DEFERRED_INIT.
    * tree-sra.c (generate_subtree_deferred_init): New function.
    (scan_function): Avoid setting cannot_scalarize_away_bitmap for calls to .DEFERRED_INIT.
    (sra_modify_deferred_init): New function.
    (sra_modify_function_body): Handle calls to DEFERRED_INIT specially.
    * tree-ssa-structalias.c (find_func_aliases_for_call): Likewise.
    * tree-ssa-uninit.c (warn_uninit): Handle calls to DEFERRED_INIT specially.
    (check_defs): Likewise.
    (warn_uninitialized_vars): Likewise.
    * tree-ssa.c (ssa_undefined_value_p): Likewise.
    * tree.c (build_common_builtin_nodes): Build tree node for BUILT_IN_CLEAR_PADDING when needed.

    gcc/c-family/ChangeLog:

    2021-09-09 qing zhao <qing.zhao(a)oracle.com>

    * c-attribs.c (handle_uninitialized_attribute): New function.
    (c_common_attribute_table): Add "uninitialized" attribute.

    gcc/testsuite/ChangeLog:

    2021-09-09 qing zhao <qing.zhao(a)oracle.com>

    * c-c++-common/auto-init-1.c: New test.
    * c-c++-common/auto-init-10.c: New test.
    * c-c++-common/auto-init-11.c: New test.
    * c-c++-common/auto-init-12.c: New test.
    * c-c++-common/auto-init-13.c: New test.
    * c-c++-common/auto-init-14.c: New test.
    * c-c++-common/auto-init-15.c: New test.
    * c-c++-common/auto-init-16.c: New test.
    * c-c++-common/auto-init-2.c: New test.
    * c-c++-common/auto-init-3.c: New test.
    * c-c++-common/auto-init-4.c: New test.
    * c-c++-common/auto-init-5.c: New test.
    * c-c++-common/auto-init-6.c: New test.
    * c-c++-common/auto-init-7.c: New test.
    * c-c++-common/auto-init-8.c: New test.
    * c-c++-common/auto-init-9.c: New test.
    * c-c++-common/auto-init-esra.c: New test.
    * c-c++-common/auto-init-padding-1.c: New test.
    * c-c++-common/auto-init-padding-2.c: New test.
    * c-c++-common/auto-init-padding-3.c: New test.
    * g++.dg/auto-init-uninit-pred-1_a.C: New test.
    * g++.dg/auto-init-uninit-pred-2_a.C: New test.
    * g++.dg/auto-init-uninit-pred-3_a.C: New test.
    * g++.dg/auto-init-uninit-pred-4.C: New test.
    * gcc.dg/auto-init-sra-1.c: New test.
    * gcc.dg/auto-init-sra-2.c: New test.
    * gcc.dg/auto-init-uninit-1.c: New test.
    * gcc.dg/auto-init-uninit-12.c: New test.
    * gcc.dg/auto-init-uninit-13.c: New test.
    * gcc.dg/auto-init-uninit-14.c: New test.
    * gcc.dg/auto-init-uninit-15.c: New test.
    * gcc.dg/auto-init-uninit-16.c: New test.
    * gcc.dg/auto-init-uninit-17.c: New test.
    * gcc.dg/auto-init-uninit-18.c: New test.
    * gcc.dg/auto-init-uninit-19.c: New test.
    * gcc.dg/auto-init-uninit-2.c: New test.
    * gcc.dg/auto-init-uninit-20.c: New test.
    * gcc.dg/auto-init-uninit-21.c: New test.
    * gcc.dg/auto-init-uninit-22.c: New test.
    * gcc.dg/auto-init-uninit-23.c: New test.
    * gcc.dg/auto-init-uninit-24.c: New test.
    * gcc.dg/auto-init-uninit-25.c: New test.
    * gcc.dg/auto-init-uninit-26.c: New test.
    * gcc.dg/auto-init-uninit-3.c: New test.
    * gcc.dg/auto-init-uninit-34.c: New test.
    * gcc.dg/auto-init-uninit-36.c: New test.
    * gcc.dg/auto-init-uninit-37.c: New test.
    * gcc.dg/auto-init-uninit-4.c: New test.
    * gcc.dg/auto-init-uninit-5.c: New test.
    * gcc.dg/auto-init-uninit-6.c: New test.
    * gcc.dg/auto-init-uninit-8.c: New test.
    * gcc.dg/auto-init-uninit-9.c: New test.
    * gcc.dg/auto-init-uninit-A.c: New test.
    * gcc.dg/auto-init-uninit-B.c: New test.
    * gcc.dg/auto-init-uninit-C.c: New test.
    * gcc.dg/auto-init-uninit-H.c: New test.
    * gcc.dg/auto-init-uninit-I.c: New test.
    * gcc.target/aarch64/auto-init-1.c: New test.
    * gcc.target/aarch64/auto-init-2.c: New test.
    * gcc.target/aarch64/auto-init-3.c: New test.
    * gcc.target/aarch64/auto-init-4.c: New test.
    * gcc.target/aarch64/auto-init-5.c: New test.
    * gcc.target/aarch64/auto-init-6.c: New test.
    * gcc.target/aarch64/auto-init-7.c: New test.
    * gcc.target/aarch64/auto-init-8.c: New test.
    * gcc.target/aarch64/auto-init-padding-1.c: New test.
    * gcc.target/aarch64/auto-init-padding-10.c: New test.
    * gcc.target/aarch64/auto-init-padding-11.c: New test.
    * gcc.target/aarch64/auto-init-padding-12.c: New test.
    * gcc.target/aarch64/auto-init-padding-2.c: New test.
    * gcc.target/aarch64/auto-init-padding-3.c: New test.
    * gcc.target/aarch64/auto-init-padding-4.c: New test.
    * gcc.target/aarch64/auto-init-padding-5.c: New test.
    * gcc.target/aarch64/auto-init-padding-6.c: New test.
    * gcc.target/aarch64/auto-init-padding-7.c: New test.
    * gcc.target/aarch64/auto-init-padding-8.c: New test.
    * gcc.target/aarch64/auto-init-padding-9.c: New test.
    * gcc.target/i386/auto-init-1.c: New test.
    * gcc.target/i386/auto-init-2.c: New test.
    * gcc.target/i386/auto-init-21.c: New test.
    * gcc.target/i386/auto-init-22.c: New test.
    * gcc.target/i386/auto-init-23.c: New test.
    * gcc.target/i386/auto-init-24.c: New test.
    * gcc.target/i386/auto-init-3.c: New test.
    * gcc.target/i386/auto-init-4.c: New test.
    * gcc.target/i386/auto-init-5.c: New test.
    * gcc.target/i386/auto-init-6.c: New test.
    * gcc.target/i386/auto-init-7.c: New test.
    * gcc.target/i386/auto-init-8.c: New test.
    * gcc.target/i386/auto-init-padding-1.c: New test.
    * gcc.target/i386/auto-init-padding-10.c: New test.
    * gcc.target/i386/auto-init-padding-11.c: New test.
    * gcc.target/i386/auto-init-padding-12.c: New test.
    * gcc.target/i386/auto-init-padding-2.c: New test.
    * gcc.target/i386/auto-init-padding-3.c: New test.
    * gcc.target/i386/auto-init-padding-4.c: New test.
    * gcc.target/i386/auto-init-padding-5.c: New test.
    * gcc.target/i386/auto-init-padding-6.c: New test.
    * gcc.target/i386/auto-init-padding-7.c: New test.
    * gcc.target/i386/auto-init-padding-8.c: New test.
    * gcc.target/i386/auto-init-padding-9.c: New test.
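
As a rough illustration of the option and attribute described in the commit message above, the C sketch below shows how a local variable might opt out of automatic initialization. It is not taken from the patch or its testsuite. The macro NO_AUTO_INIT and the functions checksum and pattern_word32 are hypothetical names chosen for this example. pattern_word32 only emulates, with memset, the 0xFE byte pattern that the patch's documentation names as the current value; it does not depend on the compiler flag.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical convenience macro (not part of the patch): expands to the
   "uninitialized" variable attribute when the compiler knows it, and to
   nothing otherwise. */
#if defined(__has_attribute)
#  if __has_attribute(uninitialized)
#    define NO_AUTO_INIT __attribute__((uninitialized))
#  endif
#endif
#ifndef NO_AUTO_INIT
#  define NO_AUTO_INIT
#endif

/* Sums at most 64 bytes of DATA. Every byte of scratch that is read has
   been written by memcpy first, so excluding scratch from automatic
   initialization should be safe here. */
unsigned checksum(const unsigned char *data, size_t len)
{
    unsigned char scratch[64] NO_AUTO_INIT;
    size_t n = len < sizeof scratch ? len : sizeof scratch;
    unsigned sum = 0;

    memcpy(scratch, data, n);
    for (size_t i = 0; i < n; i++)
        sum += scratch[i];
    return sum;
}

/* Emulation only: builds the value a 32-bit object holds when every one
   of its bytes is 0xFE. */
uint32_t pattern_word32(void)
{
    uint32_t v;
    memset(&v, 0xFE, sizeof v);
    return v;
}
```

With a GCC that includes this commit, compiling such a file with -ftrivial-auto-var-init=pattern or -ftrivial-auto-var-init=zero would be expected to initialize the other automatic variables that lack initializers while leaving scratch alone. Without the option, the attribute is documented as having no effect.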
--- gcc/builtins.c | 3 +- gcc/builtins.h | 1 + gcc/c-family/c-attribs.c | 27 +++ gcc/common.opt | 16 ++ gcc/doc/extend.texi | 16 ++ gcc/doc/invoke.texi | 41 +++- gcc/flag-types.h | 7 + gcc/gimple-fold.c | 54 +++-- gcc/gimplify.c | 151 ++++++++++++- gcc/internal-fn.c | 99 +++++++++ gcc/internal-fn.def | 4 + gcc/testsuite/c-c++-common/auto-init-1.c | 39 ++++ gcc/testsuite/c-c++-common/auto-init-10.c | 22 ++ gcc/testsuite/c-c++-common/auto-init-11.c | 14 ++ gcc/testsuite/c-c++-common/auto-init-12.c | 14 ++ gcc/testsuite/c-c++-common/auto-init-13.c | 23 ++ gcc/testsuite/c-c++-common/auto-init-14.c | 23 ++ gcc/testsuite/c-c++-common/auto-init-15.c | 13 ++ gcc/testsuite/c-c++-common/auto-init-16.c | 13 ++ gcc/testsuite/c-c++-common/auto-init-2.c | 39 ++++ gcc/testsuite/c-c++-common/auto-init-3.c | 19 ++ gcc/testsuite/c-c++-common/auto-init-4.c | 19 ++ gcc/testsuite/c-c++-common/auto-init-5.c | 21 ++ gcc/testsuite/c-c++-common/auto-init-6.c | 21 ++ gcc/testsuite/c-c++-common/auto-init-7.c | 35 +++ gcc/testsuite/c-c++-common/auto-init-8.c | 35 +++ gcc/testsuite/c-c++-common/auto-init-9.c | 20 ++ gcc/testsuite/c-c++-common/auto-init-esra.c | 35 +++ gcc/testsuite/c-c++-common/auto-init-padding-1.c | 23 ++ gcc/testsuite/c-c++-common/auto-init-padding-2.c | 114 ++++++++++ gcc/testsuite/c-c++-common/auto-init-padding-3.c | 114 ++++++++++ gcc/testsuite/g++.dg/auto-init-uninit-pred-1_a.C | 3 + gcc/testsuite/g++.dg/auto-init-uninit-pred-2_a.C | 3 + gcc/testsuite/g++.dg/auto-init-uninit-pred-3_a.C | 3 + gcc/testsuite/g++.dg/auto-init-uninit-pred-4.C | 3 + gcc/testsuite/gcc.dg/auto-init-sra-1.c | 24 +++ gcc/testsuite/gcc.dg/auto-init-sra-2.c | 24 +++ gcc/testsuite/gcc.dg/auto-init-uninit-1.c | 5 + gcc/testsuite/gcc.dg/auto-init-uninit-12.c | 4 + gcc/testsuite/gcc.dg/auto-init-uninit-13.c | 10 + gcc/testsuite/gcc.dg/auto-init-uninit-14.c | 4 + gcc/testsuite/gcc.dg/auto-init-uninit-15.c | 26 +++ gcc/testsuite/gcc.dg/auto-init-uninit-16.c | 25 +++ 
gcc/testsuite/gcc.dg/auto-init-uninit-17.c | 15 ++ gcc/testsuite/gcc.dg/auto-init-uninit-18.c | 3 + gcc/testsuite/gcc.dg/auto-init-uninit-19.c | 26 +++ gcc/testsuite/gcc.dg/auto-init-uninit-2.c | 5 + gcc/testsuite/gcc.dg/auto-init-uninit-20.c | 4 + gcc/testsuite/gcc.dg/auto-init-uninit-21.c | 4 + gcc/testsuite/gcc.dg/auto-init-uninit-22.c | 3 + gcc/testsuite/gcc.dg/auto-init-uninit-23.c | 27 +++ gcc/testsuite/gcc.dg/auto-init-uninit-24.c | 3 + gcc/testsuite/gcc.dg/auto-init-uninit-25.c | 23 ++ gcc/testsuite/gcc.dg/auto-init-uninit-26.c | 23 ++ gcc/testsuite/gcc.dg/auto-init-uninit-3.c | 5 + gcc/testsuite/gcc.dg/auto-init-uninit-34.c | 60 ++++++ gcc/testsuite/gcc.dg/auto-init-uninit-36.c | 238 +++++++++++++++++++++ gcc/testsuite/gcc.dg/auto-init-uninit-37.c | 156 ++++++++++++++ gcc/testsuite/gcc.dg/auto-init-uninit-4.c | 10 + gcc/testsuite/gcc.dg/auto-init-uninit-5.c | 6 + gcc/testsuite/gcc.dg/auto-init-uninit-6.c | 7 + gcc/testsuite/gcc.dg/auto-init-uninit-8.c | 8 + gcc/testsuite/gcc.dg/auto-init-uninit-9.c | 8 + gcc/testsuite/gcc.dg/auto-init-uninit-A.c | 7 + gcc/testsuite/gcc.dg/auto-init-uninit-B.c | 17 ++ gcc/testsuite/gcc.dg/auto-init-uninit-C.c | 5 + gcc/testsuite/gcc.dg/auto-init-uninit-H.c | 5 + gcc/testsuite/gcc.dg/auto-init-uninit-I.c | 3 + gcc/testsuite/gcc.target/aarch64/auto-init-1.c | 32 +++ gcc/testsuite/gcc.target/aarch64/auto-init-2.c | 35 +++ gcc/testsuite/gcc.target/aarch64/auto-init-3.c | 19 ++ gcc/testsuite/gcc.target/aarch64/auto-init-4.c | 19 ++ gcc/testsuite/gcc.target/aarch64/auto-init-5.c | 19 ++ gcc/testsuite/gcc.target/aarch64/auto-init-6.c | 18 ++ gcc/testsuite/gcc.target/aarch64/auto-init-7.c | 32 +++ gcc/testsuite/gcc.target/aarch64/auto-init-8.c | 32 +++ .../gcc.target/aarch64/auto-init-padding-1.c | 17 ++ .../gcc.target/aarch64/auto-init-padding-10.c | 22 ++ .../gcc.target/aarch64/auto-init-padding-11.c | 27 +++ .../gcc.target/aarch64/auto-init-padding-12.c | 27 +++ .../gcc.target/aarch64/auto-init-padding-2.c | 18 ++ 
.../gcc.target/aarch64/auto-init-padding-3.c | 27 +++ .../gcc.target/aarch64/auto-init-padding-4.c | 27 +++ .../gcc.target/aarch64/auto-init-padding-5.c | 22 ++ .../gcc.target/aarch64/auto-init-padding-6.c | 20 ++ .../gcc.target/aarch64/auto-init-padding-7.c | 20 ++ .../gcc.target/aarch64/auto-init-padding-8.c | 22 ++ .../gcc.target/aarch64/auto-init-padding-9.c | 21 ++ gcc/testsuite/gcc.target/i386/auto-init-1.c | 32 +++ gcc/testsuite/gcc.target/i386/auto-init-2.c | 36 ++++ gcc/testsuite/gcc.target/i386/auto-init-21.c | 14 ++ gcc/testsuite/gcc.target/i386/auto-init-22.c | 14 ++ gcc/testsuite/gcc.target/i386/auto-init-23.c | 13 ++ gcc/testsuite/gcc.target/i386/auto-init-24.c | 13 ++ gcc/testsuite/gcc.target/i386/auto-init-3.c | 17 ++ gcc/testsuite/gcc.target/i386/auto-init-4.c | 20 ++ gcc/testsuite/gcc.target/i386/auto-init-5.c | 20 ++ gcc/testsuite/gcc.target/i386/auto-init-6.c | 19 ++ gcc/testsuite/gcc.target/i386/auto-init-7.c | 33 +++ gcc/testsuite/gcc.target/i386/auto-init-8.c | 35 +++ .../gcc.target/i386/auto-init-padding-1.c | 19 ++ .../gcc.target/i386/auto-init-padding-10.c | 21 ++ .../gcc.target/i386/auto-init-padding-11.c | 26 +++ .../gcc.target/i386/auto-init-padding-12.c | 26 +++ .../gcc.target/i386/auto-init-padding-2.c | 19 ++ .../gcc.target/i386/auto-init-padding-3.c | 30 +++ .../gcc.target/i386/auto-init-padding-4.c | 28 +++ .../gcc.target/i386/auto-init-padding-5.c | 21 ++ .../gcc.target/i386/auto-init-padding-6.c | 22 ++ .../gcc.target/i386/auto-init-padding-7.c | 22 ++ .../gcc.target/i386/auto-init-padding-8.c | 22 ++ .../gcc.target/i386/auto-init-padding-9.c | 22 ++ gcc/tree-cfg.c | 47 +++- gcc/tree-sra.c | 124 ++++++++++- gcc/tree-ssa-structalias.c | 3 + gcc/tree-ssa-uninit.c | 48 +++++ gcc/tree-ssa.c | 40 ++++ gcc/tree.c | 13 ++ 118 files changed, 3131 insertions(+), 44 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index 99548627761..3e57eb03af0 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -142,7 +142,6 @@ static rtx 
expand_builtin_strcpy (tree, rtx); static rtx expand_builtin_strcpy_args (tree, tree, tree, rtx); static rtx expand_builtin_stpcpy (tree, rtx, machine_mode); static rtx expand_builtin_strncpy (tree, rtx); -static rtx expand_builtin_memset (tree, rtx, machine_mode); static rtx expand_builtin_memset_args (tree, tree, tree, rtx, machine_mode, tree); static rtx expand_builtin_bzero (tree); static rtx expand_builtin_strlen (tree, rtx, machine_mode); @@ -3872,7 +3871,7 @@ builtin_memset_gen_str (void *data, void *prev, try to get the result in TARGET, if convenient (and in mode MODE if that's convenient). */ -static rtx +rtx expand_builtin_memset (tree exp, rtx target, machine_mode mode) { if (!validate_arglist (exp, diff --git a/gcc/builtins.h b/gcc/builtins.h index 16b47ac1a7b..d330b78e591 100644 --- a/gcc/builtins.h +++ b/gcc/builtins.h @@ -114,6 +114,7 @@ extern rtx builtin_strncpy_read_str (void *, void *, HOST_WIDE_INT, fixed_size_mode); extern rtx builtin_memset_read_str (void *, void *, HOST_WIDE_INT, fixed_size_mode); +extern rtx expand_builtin_memset (tree, rtx, machine_mode); extern rtx expand_builtin_saveregs (void); extern tree std_build_builtin_va_list (void); extern tree std_fn_abi_va_list (tree); diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c index d14e9c441b3..007b928c54b 100644 --- a/gcc/c-family/c-attribs.c +++ b/gcc/c-family/c-attribs.c @@ -83,6 +83,7 @@ static tree handle_artificial_attribute (tree *, tree, tree, int, bool *); static tree handle_flatten_attribute (tree *, tree, tree, int, bool *); static tree handle_error_attribute (tree *, tree, tree, int, bool *); static tree handle_used_attribute (tree *, tree, tree, int, bool *); +static tree handle_uninitialized_attribute (tree *, tree, tree, int, bool *); static tree handle_externally_visible_attribute (tree *, tree, tree, int, bool *); static tree handle_no_reorder_attribute (tree *, tree, tree, int, @@ -333,6 +334,8 @@ const struct attribute_spec c_common_attribute_table[] = 
handle_used_attribute, NULL }, { "unused", 0, 0, false, false, false, false, handle_unused_attribute, NULL }, + { "uninitialized", 0, 0, true, false, false, false, + handle_uninitialized_attribute, NULL }, { "retain", 0, 0, true, false, false, false, handle_retain_attribute, NULL }, { "externally_visible", 0, 0, true, false, false, false, @@ -1617,6 +1620,30 @@ handle_retain_attribute (tree *pnode, tree name, tree ARG_UNUSED (args), return NULL_TREE; } +/* Handle an "uninitialized" attribute; arguments as in + struct attribute_spec.handler. */ + +static tree +handle_uninitialized_attribute (tree *node, tree name, tree ARG_UNUSED (args), + int ARG_UNUSED (flags), bool *no_add_attrs) +{ + tree decl = *node; + if (!VAR_P (decl)) + { + warning (OPT_Wattributes, "%qE attribute ignored because %qD " + "is not a variable", name, decl); + *no_add_attrs = true; + } + else if (TREE_STATIC (decl) || DECL_EXTERNAL (decl)) + { + warning (OPT_Wattributes, "%qE attribute ignored because %qD " + "is not a local variable", name, decl); + *no_add_attrs = true; + } + + return NULL_TREE; +} + /* Handle a "externally_visible" attribute; arguments as in struct attribute_spec.handler. */ diff --git a/gcc/common.opt b/gcc/common.opt index f103a7de004..b921f5e3b25 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -3081,6 +3081,22 @@ ftree-scev-cprop Common Var(flag_tree_scev_cprop) Init(1) Optimization Enable copy propagation of scalar-evolution information. +ftrivial-auto-var-init= +Common Joined RejectNegative Enum(auto_init_type) Var(flag_auto_var_init) Init(AUTO_INIT_UNINITIALIZED) Optimization +-ftrivial-auto-var-init=[uninitialized|pattern|zero] Add initializations to automatic variables. 
+ +Enum +Name(auto_init_type) Type(enum auto_init_type) UnknownError(unrecognized automatic variable initialization type %qs) + +EnumValue +Enum(auto_init_type) String(uninitialized) Value(AUTO_INIT_UNINITIALIZED) + +EnumValue +Enum(auto_init_type) String(pattern) Value(AUTO_INIT_PATTERN) + +EnumValue +Enum(auto_init_type) String(zero) Value(AUTO_INIT_ZERO) + ; -fverbose-asm causes extra commentary information to be produced in ; the generated assembly code (to make it more readable). This option ; is generally only of use to those who actually need to read the diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 52bc4e5b76e..8b324a097a4 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -7610,6 +7610,22 @@ will be placed in new, unique sections. This additional functionality requires Binutils version 2.36 or later. +@item uninitialized +@cindex @code{uninitialized} variable attribute +This attribute, attached to a variable with automatic storage, means that +the variable should not be automatically initialized by the compiler when +the option @code{-ftrivial-auto-var-init} presents. + +With the option @code{-ftrivial-auto-var-init}, all the automatic variables +that do not have explicit initializers will be initialized by the compiler. +These additional compiler initializations might incur run-time overhead, +sometimes dramatically. This attribute can be used to mark some variables +to be excluded from such automatical initialization in order to reduce runtime +overhead. + +This attribute has no effect when the option @code{-ftrivial-auto-var-init} +does not present. + @item vector_size (@var{bytes}) @cindex @code{vector_size} variable attribute This attribute specifies the vector size for the type of the declared diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index d4b3a66ee4f..b08a5eb4d73 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -573,9 +573,9 @@ Objective-C and Objective-C++ Dialects}. 
-ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol -ftree-reassoc -ftree-scev-cprop -ftree-sink -ftree-slsr -ftree-sra @gol -ftree-switch-conversion -ftree-tail-merge @gol --ftree-ter -ftree-vectorize -ftree-vrp -funconstrained-commons @gol --funit-at-a-time -funroll-all-loops -funroll-loops @gol --funsafe-math-optimizations -funswitch-loops @gol +-ftree-ter -ftree-vectorize -ftree-vrp -ftrivial-auto-var-init @gol +-funconstrained-commons -funit-at-a-time -funroll-all-loops @gol +-funroll-loops -funsafe-math-optimizations -funswitch-loops @gol -fipa-ra -fvariable-expansion-in-unroller -fvect-cost-model -fvpt @gol -fweb -fwhole-program -fwpa -fuse-linker-plugin -fzero-call-used-regs @gol --param @var{name}=@var{value} @@ -11843,6 +11843,41 @@ Perform basic block vectorization on trees. This flag is enabled by default at @option{-O3} and by @option{-ftree-vectorize}, @option{-fprofile-use}, and @option{-fauto-profile}. +@item -ftrivial-auto-var-init=@var{choice} +@opindex ftrivial-auto-var-init +Initialize automatic variables with either a pattern or with zeroes to increase +the security and predictability of a program by preventing uninitialized memory +disclosure and use. +GCC still considers an automatic variable that doesn't have an explicit +initializer as uninitialized, -Wuninitialized will still report warning messages +on such automatic variables. +With this option, GCC will also initialize any padding of automatic variables +that have structure or union types to zeroes. + +The three values of @var{choice} are: + +@itemize @bullet +@item +@samp{uninitialized} doesn't initialize any automatic variables. +This is C and C++'s default. + +@item +@samp{pattern} Initialize automatic variables with values which will likely +transform logic bugs into crashes down the line, are easily recognized in a +crash dump and without being values that programmers can rely on for useful +program semantics. 
+The current value is byte-repeatable pattern with byte "0xFE". +The values used for pattern initialization might be changed in the future. + +@item +@samp{zero} Initialize automatic variables with zeroes. +@end itemize + +The default is @samp{uninitialized}. + +You can control this behavior for a specific variable by using the variable +attribute @code{uninitialized} (@pxref{Variable Attributes}). + @item -fvect-cost-model=@var{model} @opindex fvect-cost-model Alter the cost model used for vectorization. The @var{model} argument diff --git a/gcc/flag-types.h b/gcc/flag-types.h index 45a2338d5f6..5bd1f771c8b 100644 --- a/gcc/flag-types.h +++ b/gcc/flag-types.h @@ -281,6 +281,13 @@ enum vect_cost_model { VECT_COST_MODEL_DEFAULT = 1 }; +/* Automatic variable initialization type. */ +enum auto_init_type { + AUTO_INIT_UNINITIALIZED = 0, + AUTO_INIT_PATTERN = 1, + AUTO_INIT_ZERO = 2 +}; + /* Different instrumentation modes. */ enum sanitize_code { /* AddressSanitizer. */ diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c index 3f2c176cff6..dd0e6b5daff 100644 --- a/gcc/gimple-fold.c +++ b/gcc/gimple-fold.c @@ -4518,12 +4518,14 @@ clear_padding_add_padding (clear_padding_struct *buf, } } -static void clear_padding_type (clear_padding_struct *, tree, HOST_WIDE_INT); +static void clear_padding_type (clear_padding_struct *, tree, + HOST_WIDE_INT, bool); /* Clear padding bits of union type TYPE. 
*/ static void -clear_padding_union (clear_padding_struct *buf, tree type, HOST_WIDE_INT sz) +clear_padding_union (clear_padding_struct *buf, tree type, + HOST_WIDE_INT sz, bool for_auto_init) { clear_padding_struct *union_buf; HOST_WIDE_INT start_off = 0, next_off = 0; @@ -4568,7 +4570,7 @@ clear_padding_union (clear_padding_struct *buf, tree type, HOST_WIDE_INT sz) continue; gcc_assert (TREE_CODE (TREE_TYPE (field)) == ARRAY_TYPE && !COMPLETE_TYPE_P (TREE_TYPE (field))); - if (!buf->clear_in_mask) + if (!buf->clear_in_mask && !for_auto_init) error_at (buf->loc, "flexible array member %qD does not have " "well defined padding bits for %qs", field, "__builtin_clear_padding"); @@ -4579,7 +4581,7 @@ clear_padding_union (clear_padding_struct *buf, tree type, HOST_WIDE_INT sz) union_buf->off = start_off; union_buf->size = start_size; memset (union_buf->buf, ~0, start_size); - clear_padding_type (union_buf, TREE_TYPE (field), fldsz); + clear_padding_type (union_buf, TREE_TYPE (field), fldsz, for_auto_init); clear_padding_add_padding (union_buf, sz - fldsz); clear_padding_flush (union_buf, true); } @@ -4649,7 +4651,8 @@ clear_padding_type_may_have_padding_p (tree type) __builtin_clear_padding (buf.base); */ static void -clear_padding_emit_loop (clear_padding_struct *buf, tree type, tree end) +clear_padding_emit_loop (clear_padding_struct *buf, tree type, + tree end, bool for_auto_init) { tree l1 = create_artificial_label (buf->loc); tree l2 = create_artificial_label (buf->loc); @@ -4660,7 +4663,7 @@ clear_padding_emit_loop (clear_padding_struct *buf, tree type, tree end) g = gimple_build_label (l1); gimple_set_location (g, buf->loc); gsi_insert_before (buf->gsi, g, GSI_SAME_STMT); - clear_padding_type (buf, type, buf->sz); + clear_padding_type (buf, type, buf->sz, for_auto_init); clear_padding_flush (buf, true); g = gimple_build_assign (buf->base, POINTER_PLUS_EXPR, buf->base, size_int (buf->sz)); @@ -4678,10 +4681,16 @@ clear_padding_emit_loop (clear_padding_struct 
*buf, tree type, tree end) } /* Clear padding bits for TYPE. Called recursively from - gimple_fold_builtin_clear_padding. */ + gimple_fold_builtin_clear_padding. If FOR_AUTO_INIT is true, + the __builtin_clear_padding is not called by the end user, + instead, it's inserted by the compiler to initialize the + paddings of automatic variable. Therefore, we should not + emit the error messages for flexible array members to confuse + the end user. */ static void -clear_padding_type (clear_padding_struct *buf, tree type, HOST_WIDE_INT sz) +clear_padding_type (clear_padding_struct *buf, tree type, + HOST_WIDE_INT sz, bool for_auto_init) { switch (TREE_CODE (type)) { @@ -4765,7 +4774,7 @@ clear_padding_type (clear_padding_struct *buf, tree type, HOST_WIDE_INT sz) continue; gcc_assert (TREE_CODE (ftype) == ARRAY_TYPE && !COMPLETE_TYPE_P (ftype)); - if (!buf->clear_in_mask) + if (!buf->clear_in_mask && !for_auto_init) error_at (buf->loc, "flexible array member %qD does not " "have well defined padding bits for %qs", field, "__builtin_clear_padding"); @@ -4781,7 +4790,8 @@ clear_padding_type (clear_padding_struct *buf, tree type, HOST_WIDE_INT sz) gcc_assert (pos >= 0 && fldsz >= 0 && pos >= cur_pos); clear_padding_add_padding (buf, pos - cur_pos); cur_pos = pos; - clear_padding_type (buf, TREE_TYPE (field), fldsz); + clear_padding_type (buf, TREE_TYPE (field), + fldsz, for_auto_init); cur_pos += fldsz; } } @@ -4821,7 +4831,7 @@ clear_padding_type (clear_padding_struct *buf, tree type, HOST_WIDE_INT sz) buf->align = TYPE_ALIGN (elttype); buf->off = 0; buf->size = 0; - clear_padding_emit_loop (buf, elttype, end); + clear_padding_emit_loop (buf, elttype, end, for_auto_init); buf->base = base; buf->sz = prev_sz; buf->align = prev_align; @@ -4831,10 +4841,10 @@ clear_padding_type (clear_padding_struct *buf, tree type, HOST_WIDE_INT sz) break; } for (HOST_WIDE_INT i = 0; i < nelts; i++) - clear_padding_type (buf, TREE_TYPE (type), fldsz); + clear_padding_type (buf, TREE_TYPE 
(type), fldsz, for_auto_init); break; case UNION_TYPE: - clear_padding_union (buf, type, sz); + clear_padding_union (buf, type, sz, for_auto_init); break; case REAL_TYPE: gcc_assert ((size_t) sz <= clear_padding_unit); @@ -4858,14 +4868,14 @@ clear_padding_type (clear_padding_struct *buf, tree type, HOST_WIDE_INT sz) break; case COMPLEX_TYPE: fldsz = int_size_in_bytes (TREE_TYPE (type)); - clear_padding_type (buf, TREE_TYPE (type), fldsz); - clear_padding_type (buf, TREE_TYPE (type), fldsz); + clear_padding_type (buf, TREE_TYPE (type), fldsz, for_auto_init); + clear_padding_type (buf, TREE_TYPE (type), fldsz, for_auto_init); break; case VECTOR_TYPE: nelts = TYPE_VECTOR_SUBPARTS (type).to_constant (); fldsz = int_size_in_bytes (TREE_TYPE (type)); for (HOST_WIDE_INT i = 0; i < nelts; i++) - clear_padding_type (buf, TREE_TYPE (type), fldsz); + clear_padding_type (buf, TREE_TYPE (type), fldsz, for_auto_init); break; case NULLPTR_TYPE: gcc_assert ((size_t) sz <= clear_padding_unit); @@ -4901,7 +4911,7 @@ clear_type_padding_in_mask (tree type, unsigned char *mask) buf.sz = int_size_in_bytes (type); buf.size = 0; buf.union_ptr = mask; - clear_padding_type (&buf, type, buf.sz); + clear_padding_type (&buf, type, buf.sz, false); clear_padding_flush (&buf, true); } @@ -4911,9 +4921,13 @@ static bool gimple_fold_builtin_clear_padding (gimple_stmt_iterator *gsi) { gimple *stmt = gsi_stmt (*gsi); - gcc_assert (gimple_call_num_args (stmt) == 2); + gcc_assert (gimple_call_num_args (stmt) == 3); tree ptr = gimple_call_arg (stmt, 0); tree typearg = gimple_call_arg (stmt, 1); + /* the 3rd argument of __builtin_clear_padding is to distinguish whether + this call is made by the user or by the compiler for automatic variable + initialization. 
*/ + bool for_auto_init = (bool) TREE_INT_CST_LOW (gimple_call_arg (stmt, 2)); tree type = TREE_TYPE (TREE_TYPE (typearg)); location_t loc = gimple_location (stmt); clear_padding_struct buf; @@ -4970,7 +4984,7 @@ gimple_fold_builtin_clear_padding (gimple_stmt_iterator *gsi) buf.sz = eltsz; buf.align = TYPE_ALIGN (elttype); buf.alias_type = build_pointer_type (elttype); - clear_padding_emit_loop (&buf, elttype, end); + clear_padding_emit_loop (&buf, elttype, end, for_auto_init); } } else @@ -4983,7 +4997,7 @@ gimple_fold_builtin_clear_padding (gimple_stmt_iterator *gsi) gsi_insert_before (gsi, g, GSI_SAME_STMT); } buf.alias_type = build_pointer_type (type); - clear_padding_type (&buf, type, buf.sz); + clear_padding_type (&buf, type, buf.sz, for_auto_init); clear_padding_flush (&buf, true); } diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 99d1c7fcce4..3314f76cf3f 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -1743,6 +1743,94 @@ force_labels_r (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED) return NULL_TREE; } +/* Generate an initialization to automatic variable DECL based on INIT_TYPE. + Build a call to internal const function DEFERRED_INIT: + 1st argument: SIZE of the DECL; + 2nd argument: INIT_TYPE; + 3rd argument: IS_VLA, 0 NO, 1 YES; + + as LHS = DEFERRED_INIT (SIZE of the DECL, INIT_TYPE, IS_VLA) + if IS_VLA is false, the LHS is the DECL itself, + if IS_VLA is true, the LHS is a MEM_REF whose address is the pointer + to this DECL. 
*/ +static void +gimple_add_init_for_auto_var (tree decl, + enum auto_init_type init_type, + bool is_vla, + gimple_seq *seq_p) +{ + gcc_assert (auto_var_p (decl)); + gcc_assert (init_type > AUTO_INIT_UNINITIALIZED); + location_t loc = EXPR_LOCATION (decl); + tree decl_size = TYPE_SIZE_UNIT (TREE_TYPE (decl)); + + tree init_type_node + = build_int_cst (integer_type_node, (int) init_type); + tree is_vla_node + = build_int_cst (integer_type_node, (int) is_vla); + + tree call = build_call_expr_internal_loc (loc, IFN_DEFERRED_INIT, + TREE_TYPE (decl), 3, + decl_size, init_type_node, + is_vla_node); + + gimplify_assign (decl, call, seq_p); +} + +/* Generate padding initialization for automatic vairable DECL. + C guarantees that brace-init with fewer initializers than members + aggregate will initialize the rest of the aggregate as-if it were + static initialization. In turn static initialization guarantees + that padding is initialized to zero. So, we always initialize paddings + to zeroes regardless INIT_TYPE. + To do the padding initialization, we insert a call to + __BUILTIN_CLEAR_PADDING (&decl, 0, for_auto_init = true). + Note, we add an additional dummy argument for __BUILTIN_CLEAR_PADDING, + 'for_auto_init' to distinguish whether this call is for automatic + variable initialization or not. + */ +static void +gimple_add_padding_init_for_auto_var (tree decl, bool is_vla, + gimple_seq *seq_p) +{ + tree addr_of_decl = NULL_TREE; + bool for_auto_init = true; + tree fn = builtin_decl_explicit (BUILT_IN_CLEAR_PADDING); + + if (is_vla) + { + /* The temporary address variable for this vla should be + created in gimplify_vla_decl. 
*/ + gcc_assert (DECL_HAS_VALUE_EXPR_P (decl)); + gcc_assert (TREE_CODE (DECL_VALUE_EXPR (decl)) == INDIRECT_REF); + addr_of_decl = TREE_OPERAND (DECL_VALUE_EXPR (decl), 0); + } + else + { + mark_addressable (decl); + addr_of_decl = build_fold_addr_expr (decl); + } + + gimple *call = gimple_build_call (fn, + 3, addr_of_decl, + build_zero_cst (TREE_TYPE (addr_of_decl)), + build_int_cst (integer_type_node, + (int) for_auto_init)); + gimplify_seq_add_stmt (seq_p, call); +} + +/* Return true if the DECL need to be automaticly initialized by the + compiler. */ +static bool +is_var_need_auto_init (tree decl) +{ + if (auto_var_p (decl) + && (flag_auto_var_init > AUTO_INIT_UNINITIALIZED) + && (!lookup_attribute ("uninitialized", DECL_ATTRIBUTES (decl)))) + return true; + return false; +} + /* Gimplify a DECL_EXPR node *STMT_P by making any necessary allocation and initialization explicit. */ @@ -1840,6 +1928,26 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p) as they may contain a label address. */ walk_tree (&init, force_labels_r, NULL, NULL); } + /* When there is no explicit initializer, if the user requested, + We should insert an artifical initializer for this automatic + variable. */ + else if (is_var_need_auto_init (decl)) + { + gimple_add_init_for_auto_var (decl, + flag_auto_var_init, + is_vla, + seq_p); + /* The expanding of a call to the above .DEFERRED_INIT will apply + block initialization to the whole space covered by this variable. + As a result, all the paddings will be initialized to zeroes + for zero initialization and 0xFE byte-repeatable patterns for + pattern initialization. + In order to make the paddings as zeroes for pattern init, We + should add a call to __builtin_clear_padding to clear the + paddings to zero in compatiple with CLANG. 
*/ + if (flag_auto_var_init == AUTO_INIT_PATTERN) + gimple_add_padding_init_for_auto_var (decl, is_vla, seq_p); + } } return GS_ALL_DONE; @@ -3411,11 +3519,15 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value) { /* Remember the original type of the argument in an internal dummy second argument, as in GIMPLE pointer conversions are - useless. */ + useless. also mark this call as not for automatic initialization + in the internal dummy third argument. */ p = CALL_EXPR_ARG (*expr_p, 0); + bool for_auto_init = false; *expr_p - = build_call_expr_loc (EXPR_LOCATION (*expr_p), fndecl, 2, p, - build_zero_cst (TREE_TYPE (p))); + = build_call_expr_loc (EXPR_LOCATION (*expr_p), fndecl, 3, p, + build_zero_cst (TREE_TYPE (p)), + build_int_cst (integer_type_node, + (int) for_auto_init)); return GS_OK; } break; @@ -4872,6 +4984,9 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, tree object, ctor, type; enum gimplify_status ret; vec<constructor_elt, va_gc> *elts; + bool cleared = false; + bool is_empty_ctor = false; + bool is_init_expr = (TREE_CODE (*expr_p) == INIT_EXPR); gcc_assert (TREE_CODE (TREE_OPERAND (*expr_p, 1)) == CONSTRUCTOR); @@ -4914,7 +5029,7 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, struct gimplify_init_ctor_preeval_data preeval_data; HOST_WIDE_INT num_ctor_elements, num_nonzero_elements; HOST_WIDE_INT num_unique_nonzero_elements; - bool cleared, complete_p, valid_const_initializer; + bool complete_p, valid_const_initializer; /* Aggregate types must lower constructors to initialization of individual elements. 
The exception is that a CONSTRUCTOR node @@ -4923,6 +5038,7 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, { if (notify_temp_creation) return GS_OK; + is_empty_ctor = true; break; } @@ -5248,13 +5364,28 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, if (want_value) { *expr_p = object; - return GS_OK; + ret = GS_OK; } else { *expr_p = NULL; - return GS_ALL_DONE; - } + ret = GS_ALL_DONE; + } + + /* If the user requests to initialize automatic variables, we + should initialize paddings inside the variable. Add a call to + __BUILTIN_CLEAR_PADDING (&object, 0, for_auto_init = true) to + initialize paddings of object always to zero regardless of + INIT_TYPE. Note, we will not insert this call if the aggregate + variable has be completely cleared already or it's initialized + with an empty constructor. */ + if (is_init_expr + && ((AGGREGATE_TYPE_P (type) && !cleared && !is_empty_ctor) + || !AGGREGATE_TYPE_P (type)) + && is_var_need_auto_init (object)) + gimple_add_padding_init_for_auto_var (object, false, pre_p); + + return ret; } /* Given a pointer value OP0, return a simplified version of an @@ -5395,10 +5526,12 @@ gimplify_modify_expr_rhs (tree *expr_p, tree *from_p, tree *to_p, crack at this before we break it down. */ if (ret != GS_UNHANDLED) break; + /* If we're initializing from a CONSTRUCTOR, break this into individual MODIFY_EXPRs. */ - return gimplify_init_constructor (expr_p, pre_p, post_p, want_value, - false); + ret = gimplify_init_constructor (expr_p, pre_p, post_p, want_value, + false); + return ret; case COND_EXPR: /* If we're assigning to a non-register type, push the assignment diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index 1360a00f0b9..ada2a820ff1 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -53,6 +53,9 @@ along with GCC; see the file COPYING3. If not see #include "rtl-iter.h" #include "gimple-range.h" +/* For lang_hooks.types.type_for_mode. 
*/ +#include "langhooks.h" + /* The names of each internal function, indexed by function number. */ const char *const internal_fn_name_array[] = { #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) #CODE, @@ -2977,6 +2980,102 @@ expand_UNIQUE (internal_fn, gcall *stmt) emit_insn (pattern); } +/* Expand the IFN_DEFERRED_INIT function: + LHS = DEFERRED_INIT (SIZE of the DECL, INIT_TYPE, IS_VLA); + + if IS_VLA is false, the LHS is the DECL itself, + if IS_VLA is true, the LHS is a MEM_REF whose address is the pointer + to this DECL. + + Initialize the LHS with zero/pattern according to its second argument + INIT_TYPE: + if INIT_TYPE is AUTO_INIT_ZERO, use zeroes to initialize; + if INIT_TYPE is AUTO_INIT_PATTERN, use 0xFE byte-repeatable pattern + to initialize; + The LHS variable is initialized including paddings. + The reasons to choose 0xFE for pattern initialization are: + 1. It is a non-canonical virtual address on x86_64, and at the + high end of the i386 kernel address space. + 2. It is a very large float value (-1.694739530317379e+38). + 3. It is also an unusual number for integers. */ +#define INIT_PATTERN_VALUE 0xFE +static void +expand_DEFERRED_INIT (internal_fn, gcall *stmt) +{ + tree lhs = gimple_call_lhs (stmt); + tree var_size = gimple_call_arg (stmt, 0); + enum auto_init_type init_type + = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1)); + bool is_vla = (bool) TREE_INT_CST_LOW (gimple_call_arg (stmt, 2)); + bool reg_lhs = true; + + tree var_type = TREE_TYPE (lhs); + gcc_assert (init_type > AUTO_INIT_UNINITIALIZED); + + if (DECL_P (lhs)) + { + rtx tem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); + reg_lhs = !MEM_P (tem); + } + else if (TREE_CODE (lhs) == SSA_NAME) + reg_lhs = true; + else + { + gcc_assert (is_vla); + reg_lhs = false; + } + + + if (!reg_lhs) + { + /* If this is a VLA or the variable is not in register, + expand to a memset to initialize it. 
*/ + + mark_addressable (lhs); + tree var_addr = build_fold_addr_expr (lhs); + + tree value = (init_type == AUTO_INIT_PATTERN) ? + build_int_cst (integer_type_node, + INIT_PATTERN_VALUE) : + integer_zero_node; + tree m_call = build_call_expr (builtin_decl_implicit (BUILT_IN_MEMSET), + 3, var_addr, value, var_size); + /* Expand this memset call. */ + expand_builtin_memset (m_call, NULL_RTX, TYPE_MODE (var_type)); + } + else + { + /* If this variable is in a register, use expand_assignment might + generate better code. */ + tree init = build_zero_cst (var_type); + unsigned HOST_WIDE_INT total_bytes + = tree_to_uhwi (TYPE_SIZE_UNIT (var_type)); + + if (init_type == AUTO_INIT_PATTERN) + { + tree alt_type = NULL_TREE; + if (!can_native_interpret_type_p (var_type)) + { + alt_type </cut>
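The comment on expand_DEFERRED_INIT quoted above gives three reasons for choosing 0xFE as the pattern byte. The sketch below is one way to check those claims in isolation. It is not part of the patch: the helper names are hypothetical, and it assumes a 32-bit IEEE-754 `float`, which holds on the AArch64 and x86_64 targets discussed here but is not guaranteed by the C standard.

```c
#include <stdint.h>
#include <string.h>

/* Same byte value as INIT_PATTERN_VALUE in the quoted patch. */
#define PATTERN_BYTE 0xFE

/* Hypothetical helper: a float whose bytes are all 0xFE.
   Assumes float is IEEE-754 binary32. */
float pattern_as_float(void)
{
    float f;
    memset(&f, PATTERN_BYTE, sizeof f);
    return f;
}

/* Hypothetical helper: a 32-bit integer whose bytes are all 0xFE. */
uint32_t pattern_as_u32(void)
{
    uint32_t u;
    memset(&u, PATTERN_BYTE, sizeof u);
    return u;
}

/* Hypothetical helper: a 64-bit value whose bytes are all 0xFE,
   i.e. what a pattern-initialized pointer would hold on a 64-bit target. */
uint64_t pattern_as_u64(void)
{
    uint64_t p;
    memset(&p, PATTERN_BYTE, sizeof p);
    return p;
}
```

Under those assumptions, the float is roughly -1.69e+38, the 32-bit integer is 0xFEFEFEFE, and the 64-bit value is 0xFEFEFEFEFEFEFEFE. All three are byte-order independent because every byte is identical.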


[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tx1/llvm-master-aarch64-spec2k6-O3 - Build # 22 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3. So far, this commit has regressed CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3

Culprit:
<cut>
commit 10c982e0b3e6d46d1fe288d7dbe0a393c65a640f
Author: Simon Pilgrim <llvm-dev(a)redking.me.uk>
Date: Mon Aug 23 21:06:06 2021 +0100

    Revert rG1c9bec727ab5c53fa060560dc8d346a911142170 : [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069)

    Reverted (manually due to merge conflicts) while regressions reported on PR51540 are investigated

    As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead.

    I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst.

    https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!)

    Differential Revision: https://reviews.llvm.org/D106450
</cut>

Results regressed to (for first_bad == 10c982e0b3e6d46d1fe288d7dbe0a393c65a640f)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-10c982e0b3e6d46d1fe288d7dbe0a393c65a640f/results_id:
1
# 447.dealII,[.] _ZNK13LaplaceSolver6SolverILi3EE15assemble_mat regressed by 120
# 464.h264ref,h264ref_base.default regressed by 104
# 464.h264ref,[.] FastFullPelBlockMotionSearch regressed by 135

from (for last_good == 50f4ae58eb136bc9d802cb98f02b6ff237eb61e0)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-50f4ae58eb136bc9d802cb98f02b6ff237eb61e0/results_id:
1

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3/4963
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3/4956
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Configuration details:

Reproduce builds:
<cut>
mkdir investigate-llvm-10c982e0b3e6d46d1fe288d7dbe0a393c65a640f
cd investigate-llvm-10c982e0b3e6d46d1fe288d7dbe0a393c65a640f

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 10c982e0b3e6d46d1fe288d7dbe0a393c65a640f
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 50f4ae58eb136bc9d802cb98f02b6ff237eb61e0
../artifacts/test.sh

cd ..
</cut>

History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…

Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Full commit (up to 1000 lines):
<cut>
commit 10c982e0b3e6d46d1fe288d7dbe0a393c65a640f
Author: Simon Pilgrim <llvm-dev(a)redking.me.uk>
Date: Mon Aug 23 21:06:06 2021 +0100

    Revert rG1c9bec727ab5c53fa060560dc8d346a911142170 : [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069)

    Reverted (manually due to merge conflicts) while regressions reported on PR51540 are investigated

    As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead.

    I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst.

    https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!)
Differential Revision: https://reviews.llvm.org/D106450 --- .../InstCombine/InstructionCombining.cpp | 21 ----- .../InstCombine/gep-combine-loop-invariant.ll | 12 +-- llvm/test/Transforms/InstCombine/gep-custom-dl.ll | 4 +- llvm/test/Transforms/InstCombine/getelementptr.ll | 4 +- llvm/test/Transforms/InstCombine/select-gep.ll | 12 +-- llvm/test/Transforms/InstCombine/shift.ll | 4 +- .../LoopVectorize/AArch64/sve-vector-reverse.ll | 100 ++++++++++----------- .../LoopVectorize/AArch64/vector-reverse-mask4.ll | 54 +++++------ .../Transforms/LoopVectorize/ARM/mve-reductions.ll | 26 +++--- .../X86/x86-interleaved-accesses-masked-group.ll | 60 +++++++------ .../x86-interleaved-store-accesses-with-gaps.ll | 58 ++++++------ .../LoopVectorize/consecutive-ptr-uniforms.ll | 4 +- .../LoopVectorize/interleaved-accesses.ll | 62 +++++++------ 13 files changed, 210 insertions(+), 211 deletions(-) diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp index 1026b9da44e9..48645b484fd2 100644 --- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp +++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp @@ -2131,27 +2131,6 @@ Instruction *InstCombinerImpl::visitGetElementPtrInst(GetElementPtrInst &GEP) { } } } - - // Guard the gep(gep) fold so we don't create an add inside a loop - // when there wasn't an equivalent instruction there before. 
- bool DifferentLoops = false; - if (LI) - if (auto *GEPLoop = LI->getLoopFor(GEP.getParent())) - if (auto *SrcOpI = dyn_cast<Instruction>(Src)) - if (LI->getLoopFor(SrcOpI->getParent()) != GEPLoop) - DifferentLoops = true; - - // Fold (gep(gep(Ptr,Idx0),Idx1) -> gep(Ptr,add(Idx0,Idx1)) - if (!DifferentLoops && GO1->getType() == SO1->getType()) { - bool NewInBounds = GEP.isInBounds() && Src->isInBounds(); - auto *NewIdx = - Builder.CreateAdd(GO1, SO1, GEP.getName() + ".idx", - /*HasNUW*/ false, /*HasNSW*/ NewInBounds); - auto *NewGEP = GetElementPtrInst::Create( - GEPEltType, Src->getPointerOperand(), {NewIdx}); - NewGEP->setIsInBounds(NewInBounds); - return NewGEP; - } } // Note that if our source is a gep chain itself then we wait for that diff --git a/llvm/test/Transforms/InstCombine/gep-combine-loop-invariant.ll b/llvm/test/Transforms/InstCombine/gep-combine-loop-invariant.ll index dfa664fde208..f9aac12cfb1f 100644 --- a/llvm/test/Transforms/InstCombine/gep-combine-loop-invariant.ll +++ b/llvm/test/Transforms/InstCombine/gep-combine-loop-invariant.ll @@ -8,10 +8,10 @@ define i32 @foo(i8* nocapture readnone %match, i32 %cur_match, i32 %best_len, i3 ; CHECK-LABEL: @foo( ; CHECK-NEXT: entry: ; CHECK-NEXT: [[IDX_EXT2:%.*]] = zext i32 [[CUR_MATCH:%.*]] to i64 +; CHECK-NEXT: [[ADD_PTR4:%.*]] = getelementptr inbounds i8, i8* [[WIN:%.*]], i64 [[IDX_EXT2]] ; CHECK-NEXT: [[IDX_EXT1:%.*]] = zext i32 [[BEST_LEN:%.*]] to i64 -; CHECK-NEXT: [[ADD_PTR25_IDX:%.*]] = add nuw nsw i64 [[IDX_EXT1]], [[IDX_EXT2]] -; CHECK-NEXT: [[ADD_PTR36_IDX:%.*]] = add nsw i64 [[ADD_PTR25_IDX]], -1 -; CHECK-NEXT: [[ADD_PTR36:%.*]] = getelementptr inbounds i8, i8* [[WIN:%.*]], i64 [[ADD_PTR36_IDX]] +; CHECK-NEXT: [[ADD_PTR25:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR4]], i64 [[IDX_EXT1]] +; CHECK-NEXT: [[ADD_PTR36:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR25]], i64 -1 ; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[ADD_PTR36]] to i32* ; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[TMP0]], 
align 4 ; CHECK-NEXT: [[CMP7:%.*]] = icmp eq i32 [[TMP1]], [[SCAN_END:%.*]] @@ -20,9 +20,9 @@ define i32 @foo(i8* nocapture readnone %match, i32 %cur_match, i32 %best_len, i3 ; CHECK-NEXT: br label [[IF_THEN:%.*]] ; CHECK: do.body: ; CHECK-NEXT: [[IDX_EXT:%.*]] = zext i32 [[TMP4:%.*]] to i64 -; CHECK-NEXT: [[ADD_PTR2_IDX:%.*]] = add nuw nsw i64 [[IDX_EXT]], [[IDX_EXT1]] -; CHECK-NEXT: [[ADD_PTR3_IDX:%.*]] = add nsw i64 [[ADD_PTR2_IDX]], -1 -; CHECK-NEXT: [[ADD_PTR3:%.*]] = getelementptr inbounds i8, i8* [[WIN]], i64 [[ADD_PTR3_IDX]] +; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8* [[WIN]], i64 [[IDX_EXT1]] +; CHECK-NEXT: [[ADD_PTR2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR]], i64 -1 +; CHECK-NEXT: [[ADD_PTR3:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR2]], i64 [[IDX_EXT]] ; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[ADD_PTR3]] to i32* ; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP2]], align 4 ; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[TMP3]], [[SCAN_END]] diff --git a/llvm/test/Transforms/InstCombine/gep-custom-dl.ll b/llvm/test/Transforms/InstCombine/gep-custom-dl.ll index 0980451d8ec7..3de70f3c151c 100644 --- a/llvm/test/Transforms/InstCombine/gep-custom-dl.ll +++ b/llvm/test/Transforms/InstCombine/gep-custom-dl.ll @@ -75,8 +75,8 @@ define void @test_evaluate_gep_as_ptrs_array(i8 addrspace(2)* %B) { define i32* @test4(i32* %I, i32 %C, i32 %D) { ; CHECK-LABEL: @test4( -; CHECK-NEXT: [[B_IDX:%.*]] = add i32 [[D:%.*]], [[C:%.*]] -; CHECK-NEXT: [[B:%.*]] = getelementptr i32, i32* [[I:%.*]], i32 [[B_IDX]] +; CHECK-NEXT: [[A:%.*]] = getelementptr i32, i32* [[I:%.*]], i32 [[C:%.*]] +; CHECK-NEXT: [[B:%.*]] = getelementptr i32, i32* [[A]], i32 [[D:%.*]] ; CHECK-NEXT: ret i32* [[B]] ; %A = getelementptr i32, i32* %I, i32 %C diff --git a/llvm/test/Transforms/InstCombine/getelementptr.ll b/llvm/test/Transforms/InstCombine/getelementptr.ll index 688303d308c1..f2a336767fda 100644 --- a/llvm/test/Transforms/InstCombine/getelementptr.ll +++ 
b/llvm/test/Transforms/InstCombine/getelementptr.ll @@ -115,8 +115,8 @@ define void @test_evaluate_gep_as_ptrs_array(i8 addrspace(2)* %B) { define i32* @test7(i32* %I, i64 %C, i64 %D) { ; CHECK-LABEL: @test7( -; CHECK-NEXT: [[B_IDX:%.*]] = add i64 [[D:%.*]], [[C:%.*]] -; CHECK-NEXT: [[B:%.*]] = getelementptr i32, i32* [[I:%.*]], i64 [[B_IDX]] +; CHECK-NEXT: [[A:%.*]] = getelementptr i32, i32* [[I:%.*]], i64 [[C:%.*]] +; CHECK-NEXT: [[B:%.*]] = getelementptr i32, i32* [[A]], i64 [[D:%.*]] ; CHECK-NEXT: ret i32* [[B]] ; %A = getelementptr i32, i32* %I, i64 %C diff --git a/llvm/test/Transforms/InstCombine/select-gep.ll b/llvm/test/Transforms/InstCombine/select-gep.ll index 2e112fe93a4c..519f0a94a136 100644 --- a/llvm/test/Transforms/InstCombine/select-gep.ll +++ b/llvm/test/Transforms/InstCombine/select-gep.ll @@ -102,10 +102,10 @@ define i32* @test2b(i32* %p, i64 %x, i64 %y) { ; PR51069 define i32* @test2c(i32* %p, i64 %x, i64 %y) { ; CHECK-LABEL: @test2c( -; CHECK-NEXT: [[ICMP:%.*]] = icmp ugt i64 [[X:%.*]], [[Y:%.*]] +; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds i32, i32* [[P:%.*]], i64 [[X:%.*]] +; CHECK-NEXT: [[ICMP:%.*]] = icmp ugt i64 [[X]], [[Y:%.*]] ; CHECK-NEXT: [[SEL_IDX:%.*]] = select i1 [[ICMP]], i64 0, i64 6 -; CHECK-NEXT: [[SEL_IDX1:%.*]] = add i64 [[SEL_IDX]], [[X]] -; CHECK-NEXT: [[SEL:%.*]] = getelementptr i32, i32* [[P:%.*]], i64 [[SEL_IDX1]] +; CHECK-NEXT: [[SEL:%.*]] = getelementptr i32, i32* [[GEP1]], i64 [[SEL_IDX]] ; CHECK-NEXT: ret i32* [[SEL]] ; %gep1 = getelementptr inbounds i32, i32* %p, i64 %x @@ -118,10 +118,10 @@ define i32* @test2c(i32* %p, i64 %x, i64 %y) { ; PR51069 define i32* @test2d(i32* %p, i64 %x, i64 %y) { ; CHECK-LABEL: @test2d( -; CHECK-NEXT: [[ICMP:%.*]] = icmp ugt i64 [[X:%.*]], [[Y:%.*]] +; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds i32, i32* [[P:%.*]], i64 [[X:%.*]] +; CHECK-NEXT: [[ICMP:%.*]] = icmp ugt i64 [[X]], [[Y:%.*]] ; CHECK-NEXT: [[SEL_IDX:%.*]] = select i1 [[ICMP]], i64 6, i64 0 -; CHECK-NEXT: 
[[SEL_IDX1:%.*]] = add i64 [[SEL_IDX]], [[X]] -; CHECK-NEXT: [[SEL:%.*]] = getelementptr i32, i32* [[P:%.*]], i64 [[SEL_IDX1]] +; CHECK-NEXT: [[SEL:%.*]] = getelementptr i32, i32* [[GEP1]], i64 [[SEL_IDX]] ; CHECK-NEXT: ret i32* [[SEL]] ; %gep1 = getelementptr inbounds i32, i32* %p, i64 %x diff --git a/llvm/test/Transforms/InstCombine/shift.ll b/llvm/test/Transforms/InstCombine/shift.ll index f87de574bc99..2c5c4a7dbe1c 100644 --- a/llvm/test/Transforms/InstCombine/shift.ll +++ b/llvm/test/Transforms/InstCombine/shift.ll @@ -1774,10 +1774,10 @@ define void @ashr_out_of_range(i177* %A) { define void @ashr_out_of_range_1(i177* %A) { ; CHECK-LABEL: @ashr_out_of_range_1( ; CHECK-NEXT: [[L:%.*]] = load i177, i177* [[A:%.*]], align 4 +; CHECK-NEXT: [[G11:%.*]] = getelementptr i177, i177* [[A]], i64 -1 ; CHECK-NEXT: [[B24_LOBIT:%.*]] = ashr i177 [[L]], 175 ; CHECK-NEXT: [[TMP1:%.*]] = trunc i177 [[B24_LOBIT]] to i64 -; CHECK-NEXT: [[G62_IDX:%.*]] = add i64 [[TMP1]], -1 -; CHECK-NEXT: [[G62:%.*]] = getelementptr i177, i177* [[A]], i64 [[G62_IDX]] +; CHECK-NEXT: [[G62:%.*]] = getelementptr i177, i177* [[G11]], i64 [[TMP1]] ; CHECK-NEXT: store i177 0, i177* [[G62]], align 4 ; CHECK-NEXT: ret void ; diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse.ll index d406c6de1571..5cd5af5dd9e6 100644 --- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse.ll +++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse.ll @@ -34,30 +34,30 @@ define void @vector_reverse_f64(i64 %N, double* %a, double* %b) #0{ ; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] ; CHECK-NEXT: [[TMP4:%.*]] = xor i64 [[INDEX]], -1 ; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[TMP4]], [[N]] -; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vscale.i32() -; CHECK-NEXT: [[DOTNEG:%.*]] = mul i32 [[TMP6]], -8 -; CHECK-NEXT: [[TMP7:%.*]] = or i32 [[DOTNEG]], 1 -; 
CHECK-NEXT: [[TMP8:%.*]] = sext i32 [[TMP7]] to i64 -; CHECK-NEXT: [[DOTIDX:%.*]] = add nsw i64 [[TMP5]], [[TMP8]] -; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[DOTIDX]] -; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[TMP9]] to <vscale x 8 x double>* -; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 8 x double>, <vscale x 8 x double>* [[TMP10]], align 8, !alias.scope !0 +; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[TMP5]] +; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vscale.i32() +; CHECK-NEXT: [[DOTNEG:%.*]] = mul i32 [[TMP7]], -8 +; CHECK-NEXT: [[TMP8:%.*]] = or i32 [[DOTNEG]], 1 +; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP8]] to i64 +; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds double, double* [[TMP6]], i64 [[TMP9]] +; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[TMP10]] to <vscale x 8 x double>* +; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 8 x double>, <vscale x 8 x double>* [[TMP11]], align 8, !alias.scope !0 ; CHECK-NEXT: [[REVERSE:%.*]] = call <vscale x 8 x double> @llvm.experimental.vector.reverse.nxv8f64(<vscale x 8 x double> [[WIDE_LOAD]]) -; CHECK-NEXT: [[TMP11:%.*]] = fadd <vscale x 8 x double> [[REVERSE]], shufflevector (<vscale x 8 x double> insertelement (<vscale x 8 x double> poison, double 1.000000e+00, i32 0), <vscale x 8 x double> poison, <vscale x 8 x i32> zeroinitializer) -; CHECK-NEXT: [[REVERSE6:%.*]] = call <vscale x 8 x double> @llvm.experimental.vector.reverse.nxv8f64(<vscale x 8 x double> [[TMP11]]) -; CHECK-NEXT: [[TMP12:%.*]] = call i32 @llvm.vscale.i32() -; CHECK-NEXT: [[DOTNEG7:%.*]] = mul i32 [[TMP12]], -8 -; CHECK-NEXT: [[TMP13:%.*]] = or i32 [[DOTNEG7]], 1 -; CHECK-NEXT: [[TMP14:%.*]] = sext i32 [[TMP13]] to i64 -; CHECK-NEXT: [[DOTIDX8:%.*]] = add nsw i64 [[TMP5]], [[TMP14]] -; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds double, double* [[A]], i64 [[DOTIDX8]] -; CHECK-NEXT: [[TMP16:%.*]] = bitcast double* [[TMP15]] to <vscale x 8 x 
double>* -; CHECK-NEXT: store <vscale x 8 x double> [[REVERSE6]], <vscale x 8 x double>* [[TMP16]], align 8, !alias.scope !3, !noalias !0 -; CHECK-NEXT: [[TMP17:%.*]] = call i64 @llvm.vscale.i64() -; CHECK-NEXT: [[TMP18:%.*]] = shl i64 [[TMP17]], 3 -; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP18]] -; CHECK-NEXT: [[TMP19:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] -; CHECK-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]] +; CHECK-NEXT: [[TMP12:%.*]] = fadd <vscale x 8 x double> [[REVERSE]], shufflevector (<vscale x 8 x double> insertelement (<vscale x 8 x double> poison, double 1.000000e+00, i32 0), <vscale x 8 x double> poison, <vscale x 8 x i32> zeroinitializer) +; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds double, double* [[A]], i64 [[TMP5]] +; CHECK-NEXT: [[REVERSE6:%.*]] = call <vscale x 8 x double> @llvm.experimental.vector.reverse.nxv8f64(<vscale x 8 x double> [[TMP12]]) +; CHECK-NEXT: [[TMP14:%.*]] = call i32 @llvm.vscale.i32() +; CHECK-NEXT: [[DOTNEG7:%.*]] = mul i32 [[TMP14]], -8 +; CHECK-NEXT: [[TMP15:%.*]] = or i32 [[DOTNEG7]], 1 +; CHECK-NEXT: [[TMP16:%.*]] = sext i32 [[TMP15]] to i64 +; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds double, double* [[TMP13]], i64 [[TMP16]] +; CHECK-NEXT: [[TMP18:%.*]] = bitcast double* [[TMP17]] to <vscale x 8 x double>* +; CHECK-NEXT: store <vscale x 8 x double> [[REVERSE6]], <vscale x 8 x double>* [[TMP18]], align 8, !alias.scope !3, !noalias !0 +; CHECK-NEXT: [[TMP19:%.*]] = call i64 @llvm.vscale.i64() +; CHECK-NEXT: [[TMP20:%.*]] = shl i64 [[TMP19]], 3 +; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP20]] +; CHECK-NEXT: [[TMP21:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] +; CHECK-NEXT: br i1 [[TMP21]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]] ; CHECK: middle.block: ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0 ; CHECK-NEXT: br i1 [[CMP_N]], label 
[[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]] @@ -72,8 +72,8 @@ define void @vector_reverse_f64(i64 %N, double* %a, double* %b) #0{ ; CHECK-NEXT: [[I_08_IN:%.*]] = phi i64 [ [[I_08:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ] ; CHECK-NEXT: [[I_08]] = add nsw i64 [[I_08_IN]], -1 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[I_08]] -; CHECK-NEXT: [[TMP20:%.*]] = load double, double* [[ARRAYIDX]], align 8 -; CHECK-NEXT: [[ADD:%.*]] = fadd double [[TMP20]], 1.000000e+00 +; CHECK-NEXT: [[TMP22:%.*]] = load double, double* [[ARRAYIDX]], align 8 +; CHECK-NEXT: [[ADD:%.*]] = fadd double [[TMP22]], 1.000000e+00 ; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds double, double* [[A]], i64 [[I_08]] ; CHECK-NEXT: store double [[ADD]], double* [[ARRAYIDX1]], align 8 ; CHECK-NEXT: [[CMP:%.*]] = icmp sgt i64 [[I_08_IN]], 1 @@ -126,30 +126,30 @@ define void @vector_reverse_i64(i64 %N, i64* %a, i64* %b) #0 { ; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] ; CHECK-NEXT: [[TMP4:%.*]] = xor i64 [[INDEX]], -1 ; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[TMP4]], [[N]] -; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vscale.i32() -; CHECK-NEXT: [[DOTNEG:%.*]] = mul i32 [[TMP6]], -8 -; CHECK-NEXT: [[TMP7:%.*]] = or i32 [[DOTNEG]], 1 -; CHECK-NEXT: [[TMP8:%.*]] = sext i32 [[TMP7]] to i64 -; CHECK-NEXT: [[DOTIDX:%.*]] = add nsw i64 [[TMP5]], [[TMP8]] -; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, i64* [[B]], i64 [[DOTIDX]] -; CHECK-NEXT: [[TMP10:%.*]] = bitcast i64* [[TMP9]] to <vscale x 8 x i64>* -; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 8 x i64>, <vscale x 8 x i64>* [[TMP10]], align 8, !alias.scope !9 +; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, i64* [[B]], i64 [[TMP5]] +; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vscale.i32() +; CHECK-NEXT: [[DOTNEG:%.*]] = mul i32 [[TMP7]], -8 +; CHECK-NEXT: [[TMP8:%.*]] = or i32 [[DOTNEG]], 1 +; CHECK-NEXT: 
[[TMP9:%.*]] = sext i32 [[TMP8]] to i64 +; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, i64* [[TMP6]], i64 [[TMP9]] +; CHECK-NEXT: [[TMP11:%.*]] = bitcast i64* [[TMP10]] to <vscale x 8 x i64>* +; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 8 x i64>, <vscale x 8 x i64>* [[TMP11]], align 8, !alias.scope !9 ; CHECK-NEXT: [[REVERSE:%.*]] = call <vscale x 8 x i64> @llvm.experimental.vector.reverse.nxv8i64(<vscale x 8 x i64> [[WIDE_LOAD]]) -; CHECK-NEXT: [[TMP11:%.*]] = add <vscale x 8 x i64> [[REVERSE]], shufflevector (<vscale x 8 x i64> insertelement (<vscale x 8 x i64> poison, i64 1, i32 0), <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer) -; CHECK-NEXT: [[REVERSE6:%.*]] = call <vscale x 8 x i64> @llvm.experimental.vector.reverse.nxv8i64(<vscale x 8 x i64> [[TMP11]]) -; CHECK-NEXT: [[TMP12:%.*]] = call i32 @llvm.vscale.i32() -; CHECK-NEXT: [[DOTNEG7:%.*]] = mul i32 [[TMP12]], -8 -; CHECK-NEXT: [[TMP13:%.*]] = or i32 [[DOTNEG7]], 1 -; CHECK-NEXT: [[TMP14:%.*]] = sext i32 [[TMP13]] to i64 -; CHECK-NEXT: [[DOTIDX8:%.*]] = add nsw i64 [[TMP5]], [[TMP14]] -; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 [[DOTIDX8]] -; CHECK-NEXT: [[TMP16:%.*]] = bitcast i64* [[TMP15]] to <vscale x 8 x i64>* -; CHECK-NEXT: store <vscale x 8 x i64> [[REVERSE6]], <vscale x 8 x i64>* [[TMP16]], align 8, !alias.scope !12, !noalias !9 -; CHECK-NEXT: [[TMP17:%.*]] = call i64 @llvm.vscale.i64() -; CHECK-NEXT: [[TMP18:%.*]] = shl i64 [[TMP17]], 3 -; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP18]] -; CHECK-NEXT: [[TMP19:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] -; CHECK-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]] +; CHECK-NEXT: [[TMP12:%.*]] = add <vscale x 8 x i64> [[REVERSE]], shufflevector (<vscale x 8 x i64> insertelement (<vscale x 8 x i64> poison, i64 1, i32 0), <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer) +; CHECK-NEXT: [[TMP13:%.*]] = 
getelementptr inbounds i64, i64* [[A]], i64 [[TMP5]] +; CHECK-NEXT: [[REVERSE6:%.*]] = call <vscale x 8 x i64> @llvm.experimental.vector.reverse.nxv8i64(<vscale x 8 x i64> [[TMP12]]) +; CHECK-NEXT: [[TMP14:%.*]] = call i32 @llvm.vscale.i32() +; CHECK-NEXT: [[DOTNEG7:%.*]] = mul i32 [[TMP14]], -8 +; CHECK-NEXT: [[TMP15:%.*]] = or i32 [[DOTNEG7]], 1 +; CHECK-NEXT: [[TMP16:%.*]] = sext i32 [[TMP15]] to i64 +; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds i64, i64* [[TMP13]], i64 [[TMP16]] +; CHECK-NEXT: [[TMP18:%.*]] = bitcast i64* [[TMP17]] to <vscale x 8 x i64>* +; CHECK-NEXT: store <vscale x 8 x i64> [[REVERSE6]], <vscale x 8 x i64>* [[TMP18]], align 8, !alias.scope !12, !noalias !9 +; CHECK-NEXT: [[TMP19:%.*]] = call i64 @llvm.vscale.i64() +; CHECK-NEXT: [[TMP20:%.*]] = shl i64 [[TMP19]], 3 +; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP20]] +; CHECK-NEXT: [[TMP21:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] +; CHECK-NEXT: br i1 [[TMP21]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]] ; CHECK: middle.block: ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0 ; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]] @@ -164,8 +164,8 @@ define void @vector_reverse_i64(i64 %N, i64* %a, i64* %b) #0 { ; CHECK-NEXT: [[I_09_IN:%.*]] = phi i64 [ [[I_09:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ] ; CHECK-NEXT: [[I_09]] = add nsw i64 [[I_09_IN]], -1 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, i64* [[B]], i64 [[I_09]] -; CHECK-NEXT: [[TMP20:%.*]] = load i64, i64* [[ARRAYIDX]], align 8 -; CHECK-NEXT: [[ADD:%.*]] = add i64 [[TMP20]], 1 +; CHECK-NEXT: [[TMP22:%.*]] = load i64, i64* [[ARRAYIDX]], align 8 +; CHECK-NEXT: [[ADD:%.*]] = add i64 [[TMP22]], 1 ; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 [[I_09]] ; CHECK-NEXT: store i64 [[ADD]], i64* [[ARRAYIDX2]], align 8 ; CHECK-NEXT: [[CMP:%.*]] = icmp sgt i64 [[I_09_IN]], 1 
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll b/llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll index 4233760333ac..077d3c1f71b3 100644 --- a/llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll +++ b/llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll @@ -44,30 +44,32 @@ define void @vector_reverse_mask_v4i1(double* %a, double* %cond, i64 %N) #0 { ; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[TMP3]] to <4 x double>* ; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, <4 x double>* [[TMP4]], align 8, !alias.scope !0 ; CHECK-NEXT: [[REVERSE:%.*]] = shufflevector <4 x double> [[WIDE_LOAD]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0> -; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds double, double* [[TMP2]], i64 -7 -; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[TMP5]] to <4 x double>* -; CHECK-NEXT: [[WIDE_LOAD6:%.*]] = load <4 x double>, <4 x double>* [[TMP6]], align 8, !alias.scope !0 +; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds double, double* [[TMP2]], i64 -4 +; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[TMP5]], i64 -3 +; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[TMP6]] to <4 x double>* +; CHECK-NEXT: [[WIDE_LOAD6:%.*]] = load <4 x double>, <4 x double>* [[TMP7]], align 8, !alias.scope !0 ; CHECK-NEXT: [[REVERSE7:%.*]] = shufflevector <4 x double> [[WIDE_LOAD6]], <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0> -; CHECK-NEXT: [[TMP7:%.*]] = fcmp une <4 x double> [[REVERSE]], zeroinitializer -; CHECK-NEXT: [[TMP8:%.*]] = fcmp une <4 x double> [[REVERSE7]], zeroinitializer -; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds double, double* [[A]], i64 [[TMP1]] -; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds double, double* [[TMP9]], i64 -3 -; CHECK-NEXT: [[REVERSE8:%.*]] = shufflevector <4 x i1> [[TMP7]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0> -; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* 
[[TMP10]] to <4 x double>* -; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* nonnull [[TMP11]], i32 8, <4 x i1> [[REVERSE8]], <4 x double> poison), !alias.scope !3, !noalias !0 -; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds double, double* [[TMP9]], i64 -7 -; CHECK-NEXT: [[REVERSE10:%.*]] = shufflevector <4 x i1> [[TMP8]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0> -; CHECK-NEXT: [[TMP13:%.*]] = bitcast double* [[TMP12]] to <4 x double>* -; CHECK-NEXT: [[WIDE_MASKED_LOAD11:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* nonnull [[TMP13]], i32 8, <4 x i1> [[REVERSE10]], <4 x double> poison), !alias.scope !3, !noalias !0 -; CHECK-NEXT: [[TMP14:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD]], <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00> -; CHECK-NEXT: [[TMP15:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD11]], <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00> -; CHECK-NEXT: [[TMP16:%.*]] = bitcast double* [[TMP10]] to <4 x double>* -; CHECK-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP14]], <4 x double>* [[TMP16]], i32 8, <4 x i1> [[REVERSE8]]), !alias.scope !3, !noalias !0 -; CHECK-NEXT: [[TMP17:%.*]] = bitcast double* [[TMP12]] to <4 x double>* -; CHECK-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP15]], <4 x double>* [[TMP17]], i32 8, <4 x i1> [[REVERSE10]]), !alias.scope !3, !noalias !0 +; CHECK-NEXT: [[TMP8:%.*]] = fcmp une <4 x double> [[REVERSE]], zeroinitializer +; CHECK-NEXT: [[TMP9:%.*]] = fcmp une <4 x double> [[REVERSE7]], zeroinitializer +; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds double, double* [[A]], i64 [[TMP1]] +; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds double, double* [[TMP10]], i64 -3 +; CHECK-NEXT: [[REVERSE8:%.*]] = shufflevector <4 x i1> [[TMP8]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0> +; 
CHECK-NEXT: [[TMP12:%.*]] = bitcast double* [[TMP11]] to <4 x double>* +; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* nonnull [[TMP12]], i32 8, <4 x i1> [[REVERSE8]], <4 x double> poison), !alias.scope !3, !noalias !0 +; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds double, double* [[TMP10]], i64 -4 +; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds double, double* [[TMP13]], i64 -3 +; CHECK-NEXT: [[REVERSE10:%.*]] = shufflevector <4 x i1> [[TMP9]], <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0> +; CHECK-NEXT: [[TMP15:%.*]] = bitcast double* [[TMP14]] to <4 x double>* +; CHECK-NEXT: [[WIDE_MASKED_LOAD11:%.*]] = call <4 x double> @llvm.masked.load.v4f64.p0v4f64(<4 x double>* nonnull [[TMP15]], i32 8, <4 x i1> [[REVERSE10]], <4 x double> poison), !alias.scope !3, !noalias !0 +; CHECK-NEXT: [[TMP16:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD]], <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00> +; CHECK-NEXT: [[TMP17:%.*]] = fadd <4 x double> [[WIDE_MASKED_LOAD11]], <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00> +; CHECK-NEXT: [[TMP18:%.*]] = bitcast double* [[TMP11]] to <4 x double>* +; CHECK-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP16]], <4 x double>* [[TMP18]], i32 8, <4 x i1> [[REVERSE8]]), !alias.scope !3, !noalias !0 +; CHECK-NEXT: [[TMP19:%.*]] = bitcast double* [[TMP14]] to <4 x double>* +; CHECK-NEXT: call void @llvm.masked.store.v4f64.p0v4f64(<4 x double> [[TMP17]], <4 x double>* [[TMP19]], i32 8, <4 x i1> [[REVERSE10]]), !alias.scope !3, !noalias !0 ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8 -; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] -; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]] +; CHECK-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] +; CHECK-NEXT: br i1 [[TMP20]], 
label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]] ; CHECK: middle.block: ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]] ; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]] @@ -82,13 +84,13 @@ define void @vector_reverse_mask_v4i1(double* %a, double* %cond, i64 %N) #0 { ; CHECK-NEXT: [[I_08_IN:%.*]] = phi i64 [ [[I_08:%.*]], [[FOR_INC:%.*]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ] ; CHECK-NEXT: [[I_08]] = add nsw i64 [[I_08_IN]], -1 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[COND]], i64 [[I_08]] -; CHECK-NEXT: [[TMP19:%.*]] = load double, double* [[ARRAYIDX]], align 8 -; CHECK-NEXT: [[TOBOOL:%.*]] = fcmp une double [[TMP19]], 0.000000e+00 +; CHECK-NEXT: [[TMP21:%.*]] = load double, double* [[ARRAYIDX]], align 8 +; CHECK-NEXT: [[TOBOOL:%.*]] = fcmp une double [[TMP21]], 0.000000e+00 ; CHECK-NEXT: br i1 [[TOBOOL]], label [[IF_THEN:%.*]], label [[FOR_INC]] ; CHECK: if.then: ; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds double, double* [[A]], i64 [[I_08]] -; CHECK-NEXT: [[TMP20:%.*]] = load double, double* [[ARRAYIDX1]], align 8 -; CHECK-NEXT: [[ADD:%.*]] = fadd double [[TMP20]], 1.000000e+00 +; CHECK-NEXT: [[TMP22:%.*]] = load double, double* [[ARRAYIDX1]], align 8 +; CHECK-NEXT: [[ADD:%.*]] = fadd double [[TMP22]], 1.000000e+00 ; CHECK-NEXT: store double [[ADD]], double* [[ARRAYIDX1]], align 8 ; CHECK-NEXT: br label [[FOR_INC]] ; CHECK: for.inc: diff --git a/llvm/test/Transforms/LoopVectorize/ARM/mve-reductions.ll b/llvm/test/Transforms/LoopVectorize/ARM/mve-reductions.ll index 3e8ac1bad93c..e66fbede57b7 100644 --- a/llvm/test/Transforms/LoopVectorize/ARM/mve-reductions.ll +++ b/llvm/test/Transforms/LoopVectorize/ARM/mve-reductions.ll @@ -1367,26 +1367,28 @@ define i32 @reduction_interleave_group(i32 %n, i32* %arr) #0 { ; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] ; CHECK: vector.body: ; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ 
[[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; CHECK-NEXT: [[VEC_PHI:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP8:%.*]], [[VECTOR_BODY]] ] +; CHECK-NEXT: [[VEC_PHI:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP10:%.*]], [[VECTOR_BODY]] ] ; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i32 [[INDEX]], 1 -; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i32 [[OFFSET_IDX]] -; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <8 x i32>* -; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <8 x i32>, <8 x i32>* [[TMP4]], align 4 +; CHECK-NEXT: [[TMP3:%.*]] = or i32 [[OFFSET_IDX]], 1 +; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i32 -1 +; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 [[TMP3]] +; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[TMP5]] to <8 x i32>* +; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <8 x i32>, <8 x i32>* [[TMP6]], align 4 ; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6> ; CHECK-NEXT: [[STRIDED_VEC1:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7> -; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[STRIDED_VEC1]]) -; CHECK-NEXT: [[TMP6:%.*]] = add i32 [[TMP5]], [[VEC_PHI]] -; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[STRIDED_VEC]]) -; CHECK-NEXT: [[TMP8]] = add i32 [[TMP7]], [[TMP6]] +; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[STRIDED_VEC1]]) +; CHECK-NEXT: [[TMP8:%.*]] = add i32 [[TMP7]], [[VEC_PHI]] +; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[STRIDED_VEC]]) +; CHECK-NEXT: [[TMP10]] = add i32 [[TMP9]], [[TMP8]] ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4 -; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]] -; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop 
[[LOOP26:![0-9]+]] +; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]] +; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP30:![0-9]+]] ; CHECK: middle.block: ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP2]], [[N_VEC]] ; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT]], label [[SCALAR_PH]] ; CHECK: scalar.ph: ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ] -; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP8]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ] +; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP10]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ] ; CHECK-NEXT: br label [[FOR_BODY:%.*]] ; CHECK: for.body: ; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ] @@ -1402,7 +1404,7 @@ define i32 @reduction_interleave_group(i32 %n, i32* %arr) #0 { ; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[IV_NEXT]], [[N]] ; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[EXIT]], !llvm.loop [[LOOP31:![0-9]+]] ; CHECK: exit: -; CHECK-NEXT: [[RET_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[RED_2]], [[FOR_BODY]] ], [ [[TMP8]], [[MIDDLE_BLOCK]] ] +; CHECK-NEXT: [[RET_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[RED_2]], [[FOR_BODY]] ], [ [[TMP10]], [[MIDDLE_BLOCK]] ] ; CHECK-NEXT: ret i32 [[RET_LCSSA]] ; entry: diff --git a/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll b/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll index a80140fea413..884d743a1bad 100644 --- a/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll +++ b/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll @@ -1439,17 +1439,19 @@ define dso_local void @masked_strided2(i8* noalias nocapture readonly %p, i8* no ; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_VEC:%.*]] = call <16 x i8> 
@llvm.masked.load.v16i8.p0v16i8(<16 x i8>* [[TMP3]], i32 1, <16 x i1> [[INTERLEAVED_MASK]], <16 x i8> poison) ; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14> ; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC1:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15> -; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = icmp slt <8 x i8> [[STRIDED_VEC]], [[STRIDED_VEC1]] -; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = select <8 x i1> [[TMP4]], <8 x i8> [[STRIDED_VEC1]], <8 x i8> [[STRIDED_VEC]] -; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = sub <8 x i8> zeroinitializer, [[TMP5]] -; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, i8* [[Q:%.*]], i32 [[TMP1]] -; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = bitcast i8* [[TMP7]] to <16 x i8>* -; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP5]], <8 x i8> [[TMP6]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15> -; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[INTERLEAVED_VEC]], <16 x i8>* [[TMP8]], i32 1, <16 x i1> [[INTERLEAVED_MASK]]) +; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = or i32 [[TMP1]], 1 +; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = icmp slt <8 x i8> [[STRIDED_VEC]], [[STRIDED_VEC1]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = select <8 x i1> [[TMP5]], <8 x i8> [[STRIDED_VEC1]], <8 x i8> [[STRIDED_VEC]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = sub <8 x i8> zeroinitializer, [[TMP6]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = getelementptr inbounds i8, i8* [[Q:%.*]], i32 -1 +; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = getelementptr inbounds i8, i8* [[TMP8]], i32 [[TMP4]] +; ENABLED_MASKED_STRIDED-NEXT: 
[[TMP10:%.*]] = bitcast i8* [[TMP9]] to <16 x i8>* +; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP6]], <8 x i8> [[TMP7]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15> +; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[INTERLEAVED_VEC]], <16 x i8>* [[TMP10]], i32 1, <16 x i1> [[INTERLEAVED_MASK]]) ; ENABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8 ; ENABLED_MASKED_STRIDED-NEXT: [[VEC_IND_NEXT]] = add <8 x i32> [[VEC_IND]], <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8> -; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1024 -; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP9]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP11:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1024 +; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP11]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]] ; ENABLED_MASKED_STRIDED: for.end: ; ENABLED_MASKED_STRIDED-NEXT: ret void ; @@ -1875,17 +1877,19 @@ define dso_local void @masked_strided2_unknown_tc(i8* noalias nocapture readonly ; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_VEC:%.*]] = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>* [[TMP5]], i32 1, <16 x i1> [[INTERLEAVED_MASK]], <16 x i8> poison) ; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14> ; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15> -; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = icmp slt <8 x i8> [[STRIDED_VEC]], [[STRIDED_VEC3]] -; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = select <8 x i1> [[TMP6]], <8 x i8> 
[[STRIDED_VEC3]], <8 x i8> [[STRIDED_VEC]] -; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = sub <8 x i8> zeroinitializer, [[TMP7]] -; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = getelementptr inbounds i8, i8* [[Q:%.*]], i32 [[TMP2]] -; ENABLED_MASKED_STRIDED-NEXT: [[TMP10:%.*]] = bitcast i8* [[TMP9]] to <16 x i8>* -; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP7]], <8 x i8> [[TMP8]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15> -; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[INTERLEAVED_VEC]], <16 x i8>* [[TMP10]], i32 1, <16 x i1> [[INTERLEAVED_MASK]]) +; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = or i32 [[TMP2]], 1 +; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = icmp slt <8 x i8> [[STRIDED_VEC]], [[STRIDED_VEC3]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = select <8 x i1> [[TMP7]], <8 x i8> [[STRIDED_VEC3]], <8 x i8> [[STRIDED_VEC]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = sub <8 x i8> zeroinitializer, [[TMP8]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP10:%.*]] = getelementptr inbounds i8, i8* [[Q:%.*]], i32 -1 +; ENABLED_MASKED_STRIDED-NEXT: [[TMP11:%.*]] = getelementptr inbounds i8, i8* [[TMP10]], i32 [[TMP6]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP12:%.*]] = bitcast i8* [[TMP11]] to <16 x i8>* +; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP8]], <8 x i8> [[TMP9]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15> +; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[INTERLEAVED_VEC]], <16 x i8>* [[TMP12]], i32 1, <16 x i1> [[INTERLEAVED_MASK]]) ; ENABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8 ; ENABLED_MASKED_STRIDED-NEXT: [[VEC_IND_NEXT]] = add <8 x i32> [[VEC_IND]], <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8> -; 
ENABLED_MASKED_STRIDED-NEXT: [[TMP11:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]] -; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP11]], label [[FOR_END]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP13:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]] +; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP13]], label [[FOR_END]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]] ; ENABLED_MASKED_STRIDED: for.end: ; ENABLED_MASKED_STRIDED-NEXT: ret void ; @@ -2311,16 +2315,18 @@ define dso_local void @unconditional_masked_strided2_unknown_tc(i8* noalias noca ; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_VEC:%.*]] = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>* [[TMP3]], i32 1, <16 x i1> [[INTERLEAVED_MASK]], <16 x i8> poison) ; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14> ; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15> -; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = icmp slt <8 x i8> [[STRIDED_VEC]], [[STRIDED_VEC3]] -; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = select <8 x i1> [[TMP4]], <8 x i8> [[STRIDED_VEC3]], <8 x i8> [[STRIDED_VEC]] -; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = sub <8 x i8> zeroinitializer, [[TMP5]] -; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, i8* [[Q:%.*]], i32 [[TMP1]] -; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = bitcast i8* [[TMP7]] to <16 x i8>* -; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP5]], <8 x i8> [[TMP6]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15> -; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[INTERLEAVED_VEC]], <16 x i8>* 
[[TMP8]], i32 1, <16 x i1> [[INTERLEAVED_MASK]]) +; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = or i32 [[TMP1]], 1 +; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = icmp slt <8 x i8> [[STRIDED_VEC]], [[STRIDED_VEC3]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = select <8 x i1> [[TMP5]], <8 x i8> [[STRIDED_VEC3]], <8 x i8> [[STRIDED_VEC]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = sub <8 x i8> zeroinitializer, [[TMP6]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = getelementptr inbounds i8, i8* [[Q:%.*]], i32 -1 +; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = getelementptr inbounds i8, i8* [[TMP8]], i32 [[TMP4]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP10:%.*]] = bitcast i8* [[TMP9]] to <16 x i8>* +; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP6]], <8 x i8> [[TMP7]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15> +; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> [[INTERLEAVED_VEC]], <16 x i8>* [[TMP10]], i32 1, <16 x i1> [[INTERLEAVED_MASK]]) ; ENABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8 -; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]] -; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP9]], label [[FOR_END]], label [[VECTOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP11:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]] +; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP11]], label [[FOR_END]], label [[VECTOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]] ; ENABLED_MASKED_STRIDED: for.end: ; ENABLED_MASKED_STRIDED-NEXT: ret void ; diff --git a/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-store-accesses-with-gaps.ll b/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-store-accesses-with-gaps.ll index 65838c1f4b02..24bedad51ae1 100644 --- a/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-store-accesses-with-gaps.ll +++ 
b/llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-store-accesses-with-gaps.ll @@ -74,23 +74,25 @@ define dso_local void @test1(i16* noalias nocapture %points, i16* noalias nocapt ; ; ENABLED_MASKED_STRIDED-LABEL: @test1( ; ENABLED_MASKED_STRIDED-NEXT: entry: +; ENABLED_MASKED_STRIDED-NEXT: [[TMP0:%.*]] = getelementptr inbounds i16, i16* [[POINTS:%.*]], i64 -1 ; ENABLED_MASKED_STRIDED-NEXT: br label [[VECTOR_BODY:%.*]] ; ENABLED_MASKED_STRIDED: vector.body: ; ENABLED_MASKED_STRIDED-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] -; ENABLED_MASKED_STRIDED-NEXT: [[TMP0:%.*]] = getelementptr inbounds i16, i16* [[X:%.*]], i64 [[INDEX]] -; ENABLED_MASKED_STRIDED-NEXT: [[TMP1:%.*]] = bitcast i16* [[TMP0]] to <4 x i16>* -; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, <4 x i16>* [[TMP1]], align 2 -; ENABLED_MASKED_STRIDED-NEXT: [[TMP2:%.*]] = shl nuw nsw i64 [[INDEX]], 2 -; ENABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = getelementptr inbounds i16, i16* [[Y:%.*]], i64 [[INDEX]] -; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = bitcast i16* [[TMP3]] to <4 x i16>* -; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_LOAD1:%.*]] = load <4 x i16>, <4 x i16>* [[TMP4]], align 2 -; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = getelementptr inbounds i16, i16* [[POINTS:%.*]], i64 [[TMP2]] -; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = bitcast i16* [[TMP5]] to <16 x i16>* +; ENABLED_MASKED_STRIDED-NEXT: [[TMP1:%.*]] = getelementptr inbounds i16, i16* [[X:%.*]], i64 [[INDEX]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP2:%.*]] = bitcast i16* [[TMP1]] to <4 x i16>* +; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, <4 x i16>* [[TMP2]], align 2 +; ENABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = shl nuw nsw i64 [[INDEX]], 2 +; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[Y:%.*]], i64 [[INDEX]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = bitcast i16* [[TMP4]] to <4 x i16>* +; 
ENABLED_MASKED_STRIDED-NEXT: [[WIDE_LOAD1:%.*]] = load <4 x i16>, <4 x i16>* [[TMP5]], align 2 +; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = or i64 [[TMP3]], 1 +; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, i16* [[TMP0]], i64 [[TMP6]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = bitcast i16* [[TMP7]] to <16 x i16>* ; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i16> [[WIDE_LOAD]], <4 x i16> [[WIDE_LOAD1]], <16 x i32> <i32 0, i32 4, i32 undef, i32 undef, i32 1, i32 5, i32 undef, i32 undef, i32 2, i32 6, i32 undef, i32 undef, i32 3, i32 7, i32 undef, i32 undef> -; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i16.p0v16i16(<16 x i16> [[INTERLEAVED_VEC]], <16 x i16>* [[TMP6]], i32 2, <16 x i1> <i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false>) +; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i16.p0v16i16(<16 x i16> [[INTERLEAVED_VEC]], <16 x i16>* [[TMP8]], i32 2, <16 x i1> <i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false>) ; ENABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4 -; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024 -; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP7]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024 +; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP9]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]] ; ENABLED_MASKED_STRIDED: for.end: ; ENABLED_MASKED_STRIDED-NEXT: ret void ; @@ -244,29 +246,31 @@ define dso_local void @test2(i16* noalias nocapture %points, i32 %numPoints, i16 ; ENABLED_MASKED_STRIDED-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = add nsw i64 [[WIDE_TRIP_COUNT]], -1 ; 
ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TRIP_COUNT_MINUS_1]], i32 0 ; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer +; ENABLED_MASKED_STRIDED-NEXT: [[TMP0:%.*]] = getelementptr inbounds i16, i16* [[POINTS:%.*]], i64 -1 ; ENABLED_MASKED_STRIDED-NEXT: br label [[VECTOR_BODY:%.*]] ; ENABLED_MASKED_STRIDED: vector.body: ; ENABLED_MASKED_STRIDED-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[FOR_BODY_PREHEADER]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] ; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <4 x i64> poison, i64 [[INDEX]], i32 0 ; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT1]], <4 x i64> poison, <4 x i32> zeroinitializer ; ENABLED_MASKED_STRIDED-NEXT: [[INDUCTION:%.*]] = or <4 x i64> [[BROADCAST_SPLAT2]], <i64 0, i64 1, i64 2, i64 3> -; ENABLED_MASKED_STRIDED-NEXT: [[TMP0:%.*]] = icmp ule <4 x i64> [[INDUCTION]], [[BROADCAST_SPLAT]] -; ENABLED_MASKED_STRIDED-NEXT: [[TMP1:%.*]] = getelementptr inbounds i16, i16* [[X:%.*]], i64 [[INDEX]] -; ENABLED_MASKED_STRIDED-NEXT: [[TMP2:%.*]] = bitcast i16* [[TMP1]] to <4 x i16>* -; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* [[TMP2]], i32 2, <4 x i1> [[TMP0]], <4 x i16> poison) -; ENABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = shl nsw i64 [[INDEX]], 2 -; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[Y:%.*]], i64 [[INDEX]] -; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = bitcast i16* [[TMP4]] to <4 x i16>* -; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_LOAD3:%.*]] = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* [[TMP5]], i32 2, <4 x i1> [[TMP0]], <4 x i16> poison) -; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = getelementptr inbounds i16, i16* [[POINTS:%.*]], i64 [[TMP3]] -; 
ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = bitcast i16* [[TMP6]] to <16 x i16>* +; ENABLED_MASKED_STRIDED-NEXT: [[TMP1:%.*]] = icmp ule <4 x i64> [[INDUCTION]], [[BROADCAST_SPLAT]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP2:%.*]] = getelementptr inbounds i16, i16* [[X:%.*]], i64 [[INDEX]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = bitcast i16* [[TMP2]] to <4 x i16>* +; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* [[TMP3]], i32 2, <4 x i1> [[TMP1]], <4 x i16> poison) +; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = shl nsw i64 [[INDEX]], 2 +; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = getelementptr inbounds i16, i16* [[Y:%.*]], i64 [[INDEX]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = bitcast i16* [[TMP5]] to <4 x i16>* +; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_LOAD3:%.*]] = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* [[TMP6]], i32 2, <4 x i1> [[TMP1]], <4 x i16> poison) +; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = or i64 [[TMP4]], 1 +; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = getelementptr inbounds i16, i16* [[TMP0]], i64 [[TMP7]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = bitcast i16* [[TMP8]] to <16 x i16>* ; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i16> [[WIDE_MASKED_LOAD]], <4 x i16> [[WIDE_MASKED_LOAD3]], <16 x i32> <i32 0, i32 4, i32 undef, i32 undef, i32 1, i32 5, i32 undef, i32 undef, i32 2, i32 6, i32 undef, i32 undef, i32 3, i32 7, i32 undef, i32 undef> -; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_MASK:%.*]] = shufflevector <4 x i1> [[TMP0]], <4 x i1> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1, i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3> -; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = and <16 x i1> [[INTERLEAVED_MASK]], <i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false> -; 
ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i16.p0v16i16(<16 x i16> [[INTERLEAVED_VEC]], <16 x i16>* [[TMP7]], i32 2, <16 x i1> [[TMP8]]) +; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_MASK:%.*]] = shufflevector <4 x i1> [[TMP1]], <4 x i1> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1, i32 2, i32 2, i32 2, i32 2, i32 3, i32 3, i32 3, i32 3> +; ENABLED_MASKED_STRIDED-NEXT: [[TMP10:%.*]] = and <16 x i1> [[INTERLEAVED_MASK]], <i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 true, i1 false, i1 false> +; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i16.p0v16i16(<16 x i16> [[INTERLEAVED_VEC]], <16 x i16>* [[TMP9]], i32 2, <16 x i1> [[TMP10]]) ; ENABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4 -; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] -; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP9]], label [[FOR_END_LOOPEXIT:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]] +; ENABLED_MASKED_STRIDED-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] +; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP11]], label [[FOR_END_LOOPEXIT:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]] ; ENABLED_MASKED_STRIDED: for.end.loopexit: ; ENABLED_MASKED_STRIDED-NEXT: br label [[FOR_END]] ; ENABLED_MASKED_STRIDED: for.end: diff --git a/llvm/test/Transforms/LoopVectorize/consecutive-ptr-uniforms.ll b/llvm/test/Transforms/LoopVectorize/consecutive-ptr-uniforms.ll index 89c6efa6945c..0a127ad4ef88 100644 --- a/llvm/test/Transforms/LoopVectorize/consecutive-ptr-uniforms.ll +++ b/llvm/test/Transforms/LoopVectorize/consecutive-ptr-uniforms.ll @@ -50,8 +50,8 @@ for.end: ; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] ; CHECK: %offset.idx = sub i64 %n, %index ; CHECK-NOT: getelementptr -; CHECK: %[[G0IDX:.+]] = add nsw i64 %offset.idx, -3 -; CHECK: getelementptr inbounds 
i32, i32* %a, i64 %[[G0IDX]]
+; CHECK: %[[G0:.+]] = getelementptr inbounds i32, i32* %a, i64 -3
+; CHECK: getelementptr inbounds i32, i32* %[[G0]], i64 %offset.idx
 ; CHECK-NOT: getelementptr
 ; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body
 ;
diff --git a/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll b/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
index e56b607342e6..3e77d76a26a7 100644
--- a/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
+++ b/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
@@ -686,17 +686,19 @@ define void @mixed_load2_store2(i32* noalias nocapture readonly %A, i32* noalias
 ; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <8 x i32>, <8 x i32>* [[TMP1]], align 4
 ; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
 ; CHECK-NEXT: [[STRIDED_VEC1:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
-; CHECK-NEXT: [[TMP2:%.*]] = mul nsw <4 x i32> [[STRIDED_VEC1]], [[STRIDED_VEC]]
+; CHECK-NEXT: [[TMP2:%.*]] = or i64 [[OFFSET_IDX]], 1
+; CHECK-NEXT: [[TMP3:%.*]] = mul nsw <4 x i32> [[STRIDED_VEC1]], [[STRIDED_VEC]]
 ; CHECK-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
 ; CHECK-NEXT: [[STRIDED_VEC4:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
-; CHECK-NEXT: [[TMP3:%.*]] = add nsw <4 x i32> [[STRIDED_VEC4]], [[STRIDED_VEC3]]
-; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[OFFSET_IDX]]
-; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[TMP4]] to <8 x i32>*
-; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
-; CHECK-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[TMP5]], align 4
+; CHECK-NEXT: [[TMP4:%.*]]
= add nsw <4 x i32> [[STRIDED_VEC4]], [[STRIDED_VEC3]] +; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 -1 +; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP5]], i64 [[TMP2]] +; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <8 x i32>* +; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7> +; CHECK-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[TMP7]], align 4 ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4 -; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 512 -; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]] +; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 512 +; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]] ; CHECK: middle.block: ; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]] ; CHECK: scalar.ph: @@ -760,17 +762,19 @@ define void @mixed_load3_store3(i32* nocapture %A) { ; CHECK-NEXT: [[STRIDED_VEC2:%.*]] = shufflevector <12 x i32> [[WIDE_VEC]], <12 x i32> poison, <4 x i32> <i32 1, i32 4, i32 7, i32 10> ; CHECK-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <12 x i32> [[WIDE_VEC]], <12 x i32> poison, <4 x i32> <i32 2, i32 5, i32 8, i32 11> ; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i32> [[STRIDED_VEC]], [[VEC_IND]] -; CHECK-NEXT: [[TMP3:%.*]] = add <4 x i32> [[STRIDED_VEC2]], [[VEC_IND]] -; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[STRIDED_VEC3]], [[VEC_IND]] -; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[NEXT_GEP]] to <12 x i32>* -; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7> -; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, 
i32 undef> -; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> [[TMP7]], <12 x i32> <i32 0, i32 4, i32 8, i32 1, i32 5, i32 9, i32 2, i32 6, i32 10, i32 3, i32 7, i32 11> -; CHECK-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[TMP5]], align 4 +; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[NEXT_GEP]], i64 2 +; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[STRIDED_VEC2]], [[VEC_IND]] +; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[STRIDED_VEC3]], [[VEC_IND]] +; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP3]], i64 -2 +; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <12 x i32>* +; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7> +; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef> +; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP9]], <12 x i32> <i32 0, i32 4, i32 8, i32 1, i32 5, i32 9, i32 2, i32 6, i32 10, i32 3, i32 7, i32 11> +; CHECK-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[TMP7]], align 4 ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4 ; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4> -; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024 -; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]] +; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024 +; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]] ; CHECK: middle.block: ; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]] ; CHECK: scalar.ph: @@ -1315,21 +1319,23 @@ define void @PR27626_4(i32 *%a, i32 %x, i32 %y, i32 %z, i64 %n) { ; CHECK-NEXT: [[TMP3:%.*]] = or 
i64 [[OFFSET_IDX]], 2 ; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[OFFSET_IDX]], 4 ; CHECK-NEXT: [[TMP5:%.*]] = or i64 [[OFFSET_IDX]], 6 -; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[OFFSET_IDX]] -; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP3]] -; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP4]] -; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP5]] -; CHECK-NEXT: store i32 [[X:%.*]], i32* [[TMP6]], align 4 -; CHECK-NEXT: store i32 [[X]], i32* [[TMP7]], align 4 +; CHECK-NEXT: [[TMP6:%.*]] = or i64 [[OFFSET_IDX]], 1 +; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[OFFSET_IDX]] +; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP3]] +; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP4]] +; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP5]] +; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 -1 +; CHECK-NEXT: store i32 [[X:%.*]], i32* [[TMP7]], align 4 ; CHECK-NEXT: store i32 [[X]], i32* [[TMP8]], align 4 ; CHECK-NEXT: store i32 [[X]], i32* [[TMP9]], align 4 -; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[OFFSET_IDX]] -; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <8 x i32>* +; CHECK-NEXT: store i32 [[X]], i32* [[TMP10]], align 4 +; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP11]], i64 [[TMP6]] +; CHECK-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <8 x i32>* ; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLAT]], <4 x i32> [[BROADCAST_SPLAT2]], <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7> -; CHECK-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[TMP11]], align 4 +; CHECK-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[TMP13]], align 4 ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 
[[INDEX]], 4 -; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] -; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP32:![0-9]+]] +; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] +; CHECK-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP32:![0-9]+]] ; CHECK: middle.block: ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]] ; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]] </cut>

4 years, 5 months


