linaro-toolchain

linaro-toolchain@lists.linaro.org


by Peter Maydell

Progress (a report covering two half-weeks)

* UM-2 [QEMU upstream maintainership]
  - lots of code review
  - fixed another bug in the armv7m clock framework code
  - refactoring patchset to trim some fat from a header that gets included by every C file in the build
* QEMU-420 [GICv4 emulation]
  - CPU interface parts of GICv4 work are code-complete
  - started on the redistributor work

-- PMM


[ACTIVITY] week ending Feb. 13 2022

by Alex Bennée

Project Stratos
===============

- posted Metadata and signalling channels for Zephyr virtio-backends on Xen
  Message-Id: <87h79bgd1m.fsf(a)linaro.org>
- spent some time troubleshooting Xen builds with Viresh

vhost-device maintainer effort ([UM-196])

- posted [a pull request in rust-vmm/community]

[a pull request in rust-vmm/community] <elfeed:github.com#tag:github.com,2008:PullRequestEvent/20180885703>

QEMU Upstream Work ([UM-2])
===========================

- posted [RFC PATCH] tcg/optimize: only read val after const check
  Message-Id: <20220209112142.3367525-1-alex.bennee(a)linaro.org>
- posted [PULL 00/28] testing and plugin updates
  Message-Id: <20220209141529.3418384-1-alex.bennee(a)linaro.org>
- triage for [qemu-x86_64 uses host libraries instead of emulated system libraries]
- triage for [linux-user: substantial memory leak when threads are created and destroyed]
- posted [RFC PATCH] linux-user: trap internal SIGABRT's
  Message-Id: <20220209112207.3368139-1-alex.bennee(a)linaro.org>
- posted [PATCH v5 0/2] semihosting/next (SYS_HEAPINFO)
  Message-Id: <20220210113021.3799514-2-alex.bennee(a)linaro.org>
- posted [PATCH v1 00/11] testing/next (docker, lcitool, ci, tcg)
  Message-Id: <20220211160309.335014-1-alex.bennee(a)linaro.org>

[UM-2] <https://linaro.atlassian.net/browse/UM-2>

[qemu-x86_64 uses host libraries instead of emulated system libraries] <elfeed:gitlab.com#https://gitlab.com/qemu-project/qemu/-/issues/857>

[linux-user: substantial memory leak when threads are created and destroyed] <elfeed:gitlab.com#https://gitlab.com/qemu-project/qemu/-/issues/866>

Upstream MTTCG tests ([QEMU-52])

- still waiting final review of [kvm-unit-tests PATCH v9 0/9] MTTCG sanity tests for ARM
  Message-Id: <20211202115352.951548-1-alex.bennee(a)linaro.org>

[QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52>

Completed Reviews [0/0]
=======================

Absences
========

Current Review Queue
====================

TODO [PATCH v6 00/43] CXl 2.0 emulation Support Message-Id: <20220211120747.3074-1-Jonathan.Cameron(a)huawei.com>
=================================================================================================================

TODO [PATCH v2 00/15] target/arm: Implement LVA, LPA, LPA2 features Message-Id: <20220210040423.95120-1-richard.henderson(a)linaro.org>
=====================================================================================================================================

TODO [PATCH v4 00/41] linux-user: Streamline handling of SIGSEGV Message-Id: <20211006172307.780893-1-richard.henderson(a)linaro.org>
===================================================================================================================================

--
Alex Bennée


Re: [TCWG CI] 401.bzip2 grew in size by 9% after llvm: [LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`

by Maxim Kuvyrkov

Hi Roman,

Your below patch increased code-size of 401.bzip2 by 9% on 32-bit ARM when compiled with -Os. That’s quite a lot, would you please investigate whether this regression can be avoided?

Please let me know if this doesn’t reproduce for you and I’ll try to help.

Thank you,

--
Maxim Kuvyrkov
https://www.linaro.org

> On 9 Feb 2022, at 17:10, ci_notify(a)linaro.org wrote:
>
> After llvm commit 77a0da926c9ea86afa9baf28158d79c7678fc6b9
> Author: Roman Lebedev <lebedev.ri(a)gmail.com>
>
> [LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`
>
> the following benchmarks grew in size by more than 1%:
> - 401.bzip2 grew in size by 9% from 37909 to 41405 bytes
> - 401.bzip2:[.] BZ2_decompress grew in size by 42% from 7664 to 10864 bytes
> - 429.mcf grew in size by 2% from 7732 to 7908 bytes
>
> Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI.
>
> For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
> - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
> - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
> - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
>
> Configuration:
> - Benchmark: SPEC CPU2006
> - Toolchain: Clang + Glibc + LLVM Linker
> - Version: all components were built from their tip of trunk
> - Target: arm-linux-gnueabihf
> - Compiler flags: -Os -mthumb
> - Hardware: APM Mustang 8x X-Gene1
>
> This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports.
>
> THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
>
> This commit has regressed these CI configurations:
> - tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Os_LTO
> - tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Os
> - tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Os_LTO
>
> First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
> Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
> Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
> Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
>
> Reproduce builds:
> <cut>
> mkdir investigate-llvm-77a0da926c9ea86afa9baf28158d79c7678fc6b9
> cd investigate-llvm-77a0da926c9ea86afa9baf28158d79c7678fc6b9
>
> # Fetch scripts
> git clone https://git.linaro.org/toolchain/jenkins-scripts
>
> # Fetch manifests and test.sh script
> mkdir -p artifacts/manifests
> curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
> curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
> curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
> chmod +x artifacts/test.sh
>
> # Reproduce the baseline build (build all pre-requisites)
> ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
>
> # Save baseline build state (which is then restored in artifacts/test.sh)
> mkdir -p ./bisect
> rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
>
> cd llvm
>
> # Reproduce first_bad build
> git checkout --detach 77a0da926c9ea86afa9baf28158d79c7678fc6b9
> ../artifacts/test.sh
>
> # Reproduce last_good build
> git checkout --detach f59787084e09aeb787cb3be3103b2419ccd14163
> ../artifacts/test.sh
>
> cd ..
> </cut>
>
> Full commit (up to 1000 lines):
> <cut>
> commit 77a0da926c9ea86afa9baf28158d79c7678fc6b9
> Author: Roman Lebedev <lebedev.ri(a)gmail.com>
> Date: Mon Feb 7 16:03:40 2022 +0300
>
>     [LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`
>
>     D43208 extracted `useEmulatedMaskMemRefHack()` from legality into cost model.
>     What it essentially does is prevents scalarized vectorization of masked memory operations:
>     ```
>         // TODO: Cost model for emulated masked load/store is completely
>         // broken. This hack guides the cost model to use an artificially
>         // high enough value to practically disable vectorization with such
>         // operations, except where previously deployed legality hack allowed
>         // using very low cost values. This is to avoid regressions coming simply
>         // from moving "masked load/store" check from legality to cost model.
>         // Masked Load/Gather emulation was previously never allowed.
>         // Limited number of Masked Store/Scatter emulation was allowed.
>     ```
>
>     While i don't really understand about what specifically `is completely broken`
>     was talking about, i believe that at least on X86 with AVX2-or-later,
>     this is no longer true. (or at least, i would like to know what is still broken).
>     So i would like to follow suit after D111460, and like wise disable that hack for AVX2+.
>
>     But since this was added for X86 specifically, let's just instead completely remove this hack.
>
>     Reviewed By: RKSimon
>
>     Differential Revision: https://reviews.llvm.org/D114779
> ---
>  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 34 +-
>  .../X86/masked-gather-i32-with-i8-index.ll | 40 +-
>  .../X86/masked-gather-i64-with-i8-index.ll | 40 +-
>  .../CostModel/X86/masked-interleaved-load-i16.ll | 36 +-
>  .../CostModel/X86/masked-interleaved-store-i16.ll | 24 +-
>  .../test/Analysis/CostModel/X86/masked-load-i16.ll | 46 +-
>  .../test/Analysis/CostModel/X86/masked-load-i32.ll | 16 +-
>  .../test/Analysis/CostModel/X86/masked-load-i64.ll | 16 +-
>  llvm/test/Analysis/CostModel/X86/masked-load-i8.ll | 46 +-
>  .../AArch64/tail-fold-uniform-memops.ll | 159 ++-
>  .../Transforms/LoopVectorize/X86/gather_scatter.ll | 1176 ++++++++++++++++----
>  .../X86/x86-interleaved-accesses-masked-group.ll | 1041 ++++++++---------
>  .../Transforms/LoopVectorize/if-pred-stores.ll | 6 +-
>  .../Transforms/LoopVectorize/memdep-fold-tail.ll | 6 +-
>  llvm/test/Transforms/LoopVectorize/optsize.ll | 837 +++++++++++---
>  llvm/test/Transforms/LoopVectorize/tripcount.ll | 673 ++++++++++-
>  .../LoopVectorize/vplan-sink-scalars-and-merge.ll | 4 +-
>  17 files changed, 3064 insertions(+), 1136 deletions(-)
>
> diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
> index bfe08d42c883..ccce2c2a7b15 100644
> --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
> +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
> @@ -307,11 +307,6 @@ static cl::opt<bool> InterleaveSmallLoopScalarReduction(
>      cl::desc("Enable interleaving for loops with small iteration counts that "
>               "contain scalar reductions to expose ILP."));
>
> -/// The number of stores in a loop that are allowed to need predication.
> -static cl::opt<unsigned> NumberOfStoresToPredicate(
> -    "vectorize-num-stores-pred", cl::init(1), cl::Hidden,
> -    cl::desc("Max number of stores to be predicated behind an if."));
> -
>  static cl::opt<bool> EnableIndVarRegisterHeur(
>      "enable-ind-var-reg-heur", cl::init(true), cl::Hidden,
>      cl::desc("Count the induction variable only once when interleaving"));
> @@ -1797,10 +1792,6 @@ private:
>    /// as a vector operation.
>    bool isConsecutiveLoadOrStore(Instruction *I);
>
> -  /// Returns true if an artificially high cost for emulated masked memrefs
> -  /// should be used.
> -  bool useEmulatedMaskMemRefHack(Instruction *I, ElementCount VF);
> -
>    /// Map of scalar integer values to the smallest bitwidth they can be legally
>    /// represented as. The vector equivalents of these values should be truncated
>    /// to this type.
> @@ -6437,22 +6428,6 @@ LoopVectorizationCostModel::calculateRegisterUsage(ArrayRef<ElementCount> VFs) {
>    return RUs;
>  }
>
> -bool LoopVectorizationCostModel::useEmulatedMaskMemRefHack(Instruction *I,
> -                                                           ElementCount VF) {
> -  // TODO: Cost model for emulated masked load/store is completely
> -  // broken. This hack guides the cost model to use an artificially
> -  // high enough value to practically disable vectorization with such
> -  // operations, except where previously deployed legality hack allowed
> -  // using very low cost values. This is to avoid regressions coming simply
> -  // from moving "masked load/store" check from legality to cost model.
> -  // Masked Load/Gather emulation was previously never allowed.
> -  // Limited number of Masked Store/Scatter emulation was allowed.
> -  assert(isPredicatedInst(I, VF) && "Expecting a scalar emulated instruction");
> -  return isa<LoadInst>(I) ||
> -         (isa<StoreInst>(I) &&
> -          NumPredStores > NumberOfStoresToPredicate);
> -}
> -
>  void LoopVectorizationCostModel::collectInstsToScalarize(ElementCount VF) {
>    // If we aren't vectorizing the loop, or if we've already collected the
>    // instructions to scalarize, there's nothing to do. Collection may already
> @@ -6478,9 +6453,7 @@ void LoopVectorizationCostModel::collectInstsToScalarize(ElementCount VF) {
>          ScalarCostsTy ScalarCosts;
>          // Do not apply discount if scalable, because that would lead to
>          // invalid scalarization costs.
> -        // Do not apply discount logic if hacked cost is needed
> -        // for emulated masked memrefs.
> -        if (!VF.isScalable() && !useEmulatedMaskMemRefHack(&I, VF) &&
> +        if (!VF.isScalable() &&
>              computePredInstDiscount(&I, ScalarCosts, VF) >= 0)
>            ScalarCostsVF.insert(ScalarCosts.begin(), ScalarCosts.end());
>          // Remember that BB will remain after vectorization.
> @@ -6736,11 +6709,6 @@ LoopVectorizationCostModel::getMemInstScalarizationCost(Instruction *I,
>          Vec_i1Ty, APInt::getAllOnes(VF.getKnownMinValue()),
>          /*Insert=*/false, /*Extract=*/true);
>      Cost += TTI.getCFInstrCost(Instruction::Br, TTI::TCK_RecipThroughput);
> -
> -    if (useEmulatedMaskMemRefHack(I, VF))
> -      // Artificially setting to a high enough value to practically disable
> -      // vectorization with such operations.
> -      Cost = 3000000;
>    }
>
>    return Cost;
> diff --git a/llvm/test/Analysis/CostModel/X86/masked-gather-i32-with-i8-index.ll b/llvm/test/Analysis/CostModel/X86/masked-gather-i32-with-i8-index.ll
> index 62412a5d1af0..c52755b7d65c 100644
> --- a/llvm/test/Analysis/CostModel/X86/masked-gather-i32-with-i8-index.ll
> +++ b/llvm/test/Analysis/CostModel/X86/masked-gather-i32-with-i8-index.ll
> @@ -17,30 +17,30 @@ target triple = "x86_64-unknown-linux-gnu"
>  ; CHECK: LV: Checking a loop in "test"
>  ;
>  ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; SSE2: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; SSE2: LV: Found an estimated cost of 22 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
>  ;
>  ; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; SSE42: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; SSE42: LV: Found an estimated cost of 22 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
>  ;
>  ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; AVX1: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; AVX1: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; AVX1: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; AVX1: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; AVX1: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; AVX1: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; AVX1: LV: Found an estimated cost of 9 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; AVX1: LV: Found an estimated cost of 18 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; AVX1: LV: Found an estimated cost of 36 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4
>  ;
>  ; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; AVX2-SLOWGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; AVX2-SLOWGATHER: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; AVX2-SLOWGATHER: LV: Found an estimated cost of 9 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; AVX2-SLOWGATHER: LV: Found an estimated cost of 18 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; AVX2-SLOWGATHER: LV: Found an estimated cost of 36 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4
>  ;
>  ; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4
>  ; AVX2-FASTGATHER: LV: Found an estimated cost of 4 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> @@ -50,8 +50,8 @@ target triple = "x86_64-unknown-linux-gnu"
>  ; AVX2-FASTGATHER: LV: Found an estimated cost of 48 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4
>  ;
>  ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; AVX512: LV: Found an estimated cost of 10 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; AVX512: LV: Found an estimated cost of 22 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; AVX512: LV: Found an estimated cost of 5 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; AVX512: LV: Found an estimated cost of 11 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
>  ; AVX512: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
>  ; AVX512: LV: Found an estimated cost of 18 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
>  ; AVX512: LV: Found an estimated cost of 36 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> diff --git a/llvm/test/Analysis/CostModel/X86/masked-gather-i64-with-i8-index.ll b/llvm/test/Analysis/CostModel/X86/masked-gather-i64-with-i8-index.ll
> index b8eba8b0327b..b38026c824b5 100644
> --- a/llvm/test/Analysis/CostModel/X86/masked-gather-i64-with-i8-index.ll
> +++ b/llvm/test/Analysis/CostModel/X86/masked-gather-i64-with-i8-index.ll
> @@ -17,30 +17,30 @@ target triple = "x86_64-unknown-linux-gnu"
>  ; CHECK: LV: Checking a loop in "test"
>  ;
>  ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; SSE2: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; SSE2: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
>  ;
>  ; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; SSE42: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; SSE42: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
>  ;
>  ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; AVX1: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; AVX1: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; AVX1: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; AVX1: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; AVX1: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; AVX1: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; AVX1: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; AVX1: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; AVX1: LV: Found an estimated cost of 40 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8
>  ;
>  ; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; AVX2-SLOWGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; AVX2-SLOWGATHER: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; AVX2-SLOWGATHER: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; AVX2-SLOWGATHER: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; AVX2-SLOWGATHER: LV: Found an estimated cost of 40 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8
>  ;
>  ; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8
>  ; AVX2-FASTGATHER: LV: Found an estimated cost of 4 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> @@ -50,8 +50,8 @@ target triple = "x86_64-unknown-linux-gnu"
>  ; AVX2-FASTGATHER: LV: Found an estimated cost of 48 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8
>  ;
>  ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; AVX512: LV: Found an estimated cost of 10 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; AVX512: LV: Found an estimated cost of 24 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; AVX512: LV: Found an estimated cost of 5 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; AVX512: LV: Found an estimated cost of 12 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
>  ; AVX512: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
>  ; AVX512: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
>  ; AVX512: LV: Found an estimated cost of 40 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> diff --git a/llvm/test/Analysis/CostModel/X86/masked-interleaved-load-i16.ll b/llvm/test/Analysis/CostModel/X86/masked-interleaved-load-i16.ll
> index d6bfdf9d3848..184e23a0128b 100644
> --- a/llvm/test/Analysis/CostModel/X86/masked-interleaved-load-i16.ll
> +++ b/llvm/test/Analysis/CostModel/X86/masked-interleaved-load-i16.ll
> @@ -89,30 +89,30 @@ for.end:
>  ; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
>  ; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
>  ;
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
>  ;
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
>  ;
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
>  ;
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 17 for VF 16 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 17 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
>
>  ; ENABLED_MASKED_STRIDED: LV: Checking a loop in "test2"
>  ;
>  ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
>  ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
>  ;
> -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 2 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
> +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
>  ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
>  ;
> -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 11 for VF 4 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
> +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
>  ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
>  ;
> -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 11 for VF 8 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
> +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
>  ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
>  ;
>  ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 17 for VF 16 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
> @@ -164,17 +164,17 @@ for.end:
>  ; DISABLED_MASKED_STRIDED: LV: Checking a loop in "test"
>  ;
>  ; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 17 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
>
>  ; ENABLED_MASKED_STRIDED: LV: Checking a loop in "test"
>  ;
>  ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
> -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 7 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
> -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 9 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
> -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 9 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
> +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
> +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
> +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
>  ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 14 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
>
>  define void @test(i16* noalias nocapture %points, i16* noalias nocapture readonly %x, i16* noalias nocapture readnone %y) {
> diff --git a/llvm/test/Analysis/CostModel/X86/masked-interleaved-store-i16.ll b/llvm/test/Analysis/CostModel/X86/masked-interleaved-store-i16.ll
> index 5f67026737fc..224dd75a4dc5 100644
> --- a/llvm/test/Analysis/CostModel/X86/masked-interleaved-store-i16.ll
> +++ b/llvm/test/Analysis/CostModel/X86/masked-interleaved-store-i16.ll
> @@ -89,17 +89,17 @@ for.end:
>  ; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: store i16 %0, i16* %arrayidx2, align 2
>  ; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: store i16 %2, i16* %arrayidx7, align 2
>  ;
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 5 for VF 2 For instruction: store i16 %0, i16* %arrayidx2, align 2
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 2 For instruction: store i16 %2, i16* %arrayidx7, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: store i16 %0, i16* %arrayidx2, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: store i16 %2, i16* %arrayidx7, align 2
>  ;
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 11 for VF 4 For instruction: store i16 %0, i16* %arrayidx2, align 2
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 4 For instruction: store i16 %2, i16* %arrayidx7, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: store i16 %0, i16* %arrayidx2, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: store i16 %2, i16* %arrayidx7, align 2
>  ;
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 23 for VF 8 For instruction: store i16 %0, i16* %arrayidx2, align 2
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 8 For instruction: store i16 %2, i16* %arrayidx7, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: store i16 %0, i16* %arrayidx2, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: store i16 %2, i16* %arrayidx7, align 2
>  ;
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 50 for VF 16 For instruction: store i16 %0, i16* %arrayidx2, align 2
> -; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 16 For instruction: store i16 %2, i16* %arrayidx7, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 20 for VF 16 For instruction: store i16 %0, i16* %arrayidx2, align 2
> +; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 20 for VF 16 For instruction: store i16 %2, i16* %arrayidx7, align 2
>
>  ; ENABLED_MASKED_STRIDED: LV: Checking a loop in "test2"
>  ;
> @@ -107,16 +107,16 @@ for.end:
>  ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: store i16 %2, i16* %arrayidx7, align 2
>  ;
>  ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 2 For instruction: store i16 %0, i16* %arrayidx2, align 2
> -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 10 for VF 2 For instruction: store i16 %2, i16* %arrayidx7, align 2
> +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: store i16 %2, i16* %arrayidx7, align 2
>  ;
>  ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 4 For instruction: store i16 %0, i16* %arrayidx2, align 2
> -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 14 for VF 4 For instruction: store i16 %2, i16* %arrayidx7, align 2
> +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction:
store i16 %2, i16* %arrayidx7, align 2 > ; > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 8 For instruction: store i16 %0, i16* %arrayidx2, align 2 > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 14 for VF 8 For instruction: store i16 %2, i16* %arrayidx7, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: store i16 %2, i16* %arrayidx7, align 2 > ; > ; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 16 For instruction: store i16 %0, i16* %arrayidx2, align 2 > -; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 27 for VF 16 For instruction: store i16 %2, i16* %arrayidx7, align 2 > +; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 20 for VF 16 For instruction: store i16 %2, i16* %arrayidx7, align 2 > > define void @test2(i16* noalias nocapture %points, i32 %numPoints, i16* noalias nocapture readonly %x, i16* noalias nocapture readonly %y) { > entry: > diff --git a/llvm/test/Analysis/CostModel/X86/masked-load-i16.ll b/llvm/test/Analysis/CostModel/X86/masked-load-i16.ll > index c8c3078f1625..2722a52c3d96 100644 > --- a/llvm/test/Analysis/CostModel/X86/masked-load-i16.ll > +++ b/llvm/test/Analysis/CostModel/X86/masked-load-i16.ll > @@ -16,37 +16,37 @@ target triple = "x86_64-unknown-linux-gnu" > ; CHECK: LV: Checking a loop in "test" > ; > ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE2: LV: Found an estimated cost of 2 
for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE2: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE2: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE2: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > ; > ; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE42: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE42: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; SSE42: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > ; > ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX1: LV: Found an estimated cost of 
3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX1: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX1: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX1: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX1: LV: Found an estimated cost of 17 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX1: LV: Found an estimated cost of 34 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > ; > ; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 17 for VF 16 
For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-SLOWGATHER: LV: Found an estimated cost of 34 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > ; > ; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 17 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > +; AVX2-FASTGATHER: LV: Found an estimated cost of 34 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > ; > ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > ; AVX512: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2 > diff --git a/llvm/test/Analysis/CostModel/X86/masked-load-i32.ll b/llvm/test/Analysis/CostModel/X86/masked-load-i32.ll > index f74c9f044d0b..16c00cfc03b5 100644 
> --- a/llvm/test/Analysis/CostModel/X86/masked-load-i32.ll
> +++ b/llvm/test/Analysis/CostModel/X86/masked-load-i32.ll
> @@ -16,16 +16,16 @@ target triple = "x86_64-unknown-linux-gnu"
> ; CHECK: LV: Checking a loop in "test"
> ;
> ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; SSE2: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; SSE2: LV: Found an estimated cost of 22 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> ;
> ; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; SSE42: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> +; SSE42: LV: Found an estimated cost of 22 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> ;
> ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> ; AVX1: LV: Found an estimated cost of 3 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
> diff --git a/llvm/test/Analysis/CostModel/X86/masked-load-i64.ll b/llvm/test/Analysis/CostModel/X86/masked-load-i64.ll
> index c5a7825348e9..1baeff242304 100644
> --- a/llvm/test/Analysis/CostModel/X86/masked-load-i64.ll
> +++ b/llvm/test/Analysis/CostModel/X86/masked-load-i64.ll
> @@ -16,16 +16,16 @@ target triple = "x86_64-unknown-linux-gnu"
> ; CHECK: LV: Checking a loop in "test"
> ;
> ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; SSE2: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; SSE2: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> ;
> ; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; SSE42: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> +; SSE42: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> ;
> ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> ; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
> diff --git a/llvm/test/Analysis/CostModel/X86/masked-load-i8.ll b/llvm/test/Analysis/CostModel/X86/masked-load-i8.ll
> index fc540da58700..99d0f28a03f8 100644
> --- a/llvm/test/Analysis/CostModel/X86/masked-load-i8.ll
> +++ b/llvm/test/Analysis/CostModel/X86/masked-load-i8.ll
> @@ -16,37 +16,37 @@ target triple = "x86_64-unknown-linux-gnu"
> ; CHECK: LV: Checking a loop in "test"
> ;
> ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; SSE2: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; SSE2: LV: Found an estimated cost of 23 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> ;
> ; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; SSE42: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; SSE42: LV: Found an estimated cost of 23 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> ;
> ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; AVX1: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; AVX1: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; AVX1: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; AVX1: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; AVX1: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; AVX1: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; AVX1: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; AVX1: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; AVX1: LV: Found an estimated cost of 33 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> ;
> ; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; AVX2-SLOWGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; AVX2-SLOWGATHER: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; AVX2-SLOWGATHER: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; AVX2-SLOWGATHER: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; AVX2-SLOWGATHER: LV: Found an estimated cost of 33 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> ;
> ; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> -; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; AVX2-FASTGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; AVX2-FASTGATHER: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; AVX2-FASTGATHER: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; AVX2-FASTGATHER: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> +; AVX2-FASTGATHER: LV: Found an estimated cost of 33 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> ;
> ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> ; AVX512: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1
> diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll b/llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll
> index bf0aba1931d1..8ce310962b48 100644
> --- a/llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll
> +++ b/llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll
> @@ -1,3 +1,4 @@
> +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
> ; RUN: opt -loop-vectorize -scalable-vectorization=off -force-vector-width=4 -prefer-predicate-over-epilogue=predicate-dont-vectorize -S < %s | FileCheck %s
>
> ; NOTE: These tests aren't really target-specific, but it's convenient to target AArch64
> @@ -9,21 +10,43 @@ target triple = "aarch64-linux-gnu"
> ; we don't artificially create new predicated blocks for the load.
> define void @uniform_load(i32* noalias %dst, i32* noalias readonly %src, i64 %n) #0 {
> ; CHECK-LABEL: @uniform_load(
> +; CHECK-NEXT: entry:
> +; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
> +; CHECK: vector.ph:
> +; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[N:%.*]], 3
> +; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 4
> +; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
> +; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
> ; CHECK: vector.body:
> -; CHECK-NEXT: [[IDX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[IDX_NEXT:%.*]], %vector.body ]
> -; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[IDX]], 0
> -; CHECK-NEXT: [[LOOP_PRED:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP3]], i64 %n)
> -; CHECK-NEXT: [[LOAD_VAL:%.*]] = load i32, i32* %src, align 4
> -; CHECK-NOT: load i32, i32* %src, align 4
> -; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> poison, i32 [[LOAD_VAL]], i32 0
> -; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <4 x i32> zeroinitializer
> -; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* %dst, i64 [[TMP3]]
> -; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP6]], i32 0
> -; CHECK-NEXT: [[STORE_PTR:%.*]] = bitcast i32* [[TMP7]] to <4 x i32>*
> -; CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[TMP5]], <4 x i32>* [[STORE_PTR]], i32 4, <4 x i1> [[LOOP_PRED]])
> -; CHECK-NEXT: [[IDX_NEXT]] = add i64 [[IDX]], 4
> -; CHECK-NEXT: [[CMP:%.*]] = icmp eq i64 [[IDX_NEXT]], %n.vec
> -; CHECK-NEXT: br i1 [[CMP]], label %middle.block, label %vector.body
> +; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
> +; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
> +; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP0]], i64 [[N]])
> +; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[SRC:%.*]], align 4
> +; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[TMP1]], i32 0
> +; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
> +; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 [[TMP0]]
> +; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP2]], i32 0
> +; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <4 x i32>*
> +; CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[BROADCAST_SPLAT]], <4 x i32>* [[TMP4]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]])
> +; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
> +; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
> +; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
> +; CHECK: middle.block:
> +; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
> +; CHECK: scalar.ph:
> +; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
> +; CHECK-NEXT: br label [[FOR_BODY:%.*]]
> +; CHECK: for.body:
> +; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
> +; CHECK-NEXT: [[VAL:%.*]] = load i32, i32* [[SRC]], align 4
> +; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 [[INDVARS_IV]]
> +; CHECK-NEXT: store i32 [[VAL]], i32* [[ARRAYIDX]], align 4
> +; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
> +; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
> +; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]
> +; CHECK: for.end:
> +; CHECK-NEXT: ret void
> +;
>
> entry:
> br label %for.body
> @@ -47,18 +70,108 @@ for.end: ; preds = %for.body, %entry
> ; and the original condition.
> define void @cond_uniform_load(i32* nocapture %dst, i32* nocapture readonly %src, i32* nocapture readonly %cond, i64 %n) #0 {
> ; CHECK-LABEL: @cond_uniform_load(
> +; CHECK-NEXT: entry:
> +; CHECK-NEXT: [[DST1:%.*]] = bitcast i32* [[DST:%.*]] to i8*
> +; CHECK-NEXT: [[COND3:%.*]] = bitcast i32* [[COND:%.*]] to i8*
> +; CHECK-NEXT: [[SRC6:%.*]] = bitcast i32* [[SRC:%.*]] to i8*
> +; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
> +; CHECK: vector.memcheck:
> +; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[DST]], i64 [[N:%.*]]
> +; CHECK-NEXT: [[SCEVGEP2:%.*]] = bitcast i32* [[SCEVGEP]] to i8*
> +; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[COND]], i64 [[N]]
> +; CHECK-NEXT: [[SCEVGEP45:%.*]] = bitcast i32* [[SCEVGEP4]] to i8*
> +; CHECK-NEXT: [[SCEVGEP7:%.*]] = getelementptr i32, i32* [[SRC]], i64 1
> +; CHECK-NEXT: [[SCEVGEP78:%.*]] = bitcast i32* [[SCEVGEP7]] to i8*
> +; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[DST1]], [[SCEVGEP45]]
> +; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[COND3]], [[SCEVGEP2]]
> +; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
> +; CHECK-NEXT: [[BOUND09:%.*]] = icmp ult i8* [[DST1]], [[SCEVGEP78]]
> +; CHECK-NEXT: [[BOUND110:%.*]] = icmp ult i8* [[SRC6]], [[SCEVGEP2]]
> +; CHECK-NEXT: [[FOUND_CONFLICT11:%.*]] = and i1 [[BOUND09]], [[BOUND110]]
> +; CHECK-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT11]]
> +; CHECK-NEXT: br i1 [[CONFLICT_RDX]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
> ; CHECK: vector.ph:
> -; CHECK: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* %src, i32 0
> -; CHECK-NEXT: [[SRC_SPLAT:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> poison, <4 x i32> zeroinitializer
> +; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[N]], 3
> +; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 4
> +; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
> +; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
> ; CHECK: vector.body:
> -; CHECK-NEXT: [[IDX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[IDX_NEXT:%.*]], %vector.body ]
> -; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[IDX]], 0
> -; CHECK-NEXT: [[LOOP_PRED:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP3]], i64 %n)
> -; CHECK: [[COND_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* {{%.*}}, i32 4, <4 x i1> [[LOOP_PRED]], <4 x i32> poison)
> -; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <4 x i32> [[COND_LOAD]], zeroinitializer
> +; CHECK-NEXT: [[INDEX12:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT19:%.*]], [[PRED_LOAD_CONTINUE18:%.*]] ]
> +; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX12]], 0
> +; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP0]], i64 [[N]])
> +; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[COND]], i64 [[TMP0]]
> +; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0
> +; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>*
> +; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> poison), !alias.scope !4
> +; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <4 x i32> [[WIDE_MASKED_LOAD]], zeroinitializer
> ; CHECK-NEXT: [[TMP5:%.*]] = xor <4 x i1> [[TMP4]], <i1 true, i1 true, i1 true, i1 true>
> -; CHECK-NEXT: [[MASK:%.*]] = select <4 x i1> [[LOOP_PRED]], <4 x i1> [[TMP5]], <4 x i1> zeroinitializer
> -; CHECK-NEXT: call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[SRC_SPLAT]], i32 4, <4 x i1> [[MASK]], <4 x i32> undef)
> +; CHECK-NEXT: [[TMP6:%.*]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i1> [[TMP5]], <4 x i1> zeroinitializer
> +; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i1> [[TMP6]], i32 0
> +; CHECK-NEXT: br i1 [[TMP7]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
> +; CHECK: pred.load.if:
> +; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[SRC]], align 4, !alias.scope !7
> +; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x i32> poison, i32 [[TMP8]], i32 0
> +; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE]]
> +; CHECK: pred.load.continue:
> +; CHECK-NEXT: [[TMP10:%.*]] = phi <4 x i32> [ poison, [[VECTOR_BODY]] ], [ [[TMP9]], [[PRED_LOAD_IF]] ]
> +; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x i1> [[TMP6]], i32 1
> +; CHECK-NEXT: br i1 [[TMP11]], label [[PRED_LOAD_IF13:%.*]], label [[PRED_LOAD_CONTINUE14:%.*]]
> +; CHECK: pred.load.if13:
> +; CHECK-NEXT: [[TMP12:%.*]] = load i32, i32* [[SRC]], align 4, !alias.scope !7
> +; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[TMP12]], i32 1
> +; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE14]]
> +; CHECK: pred.load.continue14:
> +; CHECK-NEXT: [[TMP14:%.*]] = phi <4 x i32> [ [[TMP10]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP13]], [[PRED_LOAD_IF13]] ]
> +; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i1> [[TMP6]], i32 2
> +; CHECK-NEXT: br i1 [[TMP15]], label [[PRED_LOAD_IF15:%.*]], label [[PRED_LOAD_CONTINUE16:%.*]]
> +; CHECK: pred.load.if15:
> +; CHECK-NEXT: [[TMP16:%.*]] = load i32, i32* [[SRC]], align 4, !alias.scope !7
> +; CHECK-NEXT: [[TMP17:%.*]] = insertelement <4 x i32> [[TMP14]], i32 [[TMP16]], i32 2
> +; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE16]]
> +; CHECK: pred.load.continue16:
> +; CHECK-NEXT: [[TMP18:%.*]] = phi <4 x i32> [ [[TMP14]], [[PRED_LOAD_CONTINUE14]] ], [ [[TMP17]], [[PRED_LOAD_IF15]] ]
> +; CHECK-NEXT: [[TMP19:%.*]] = extractelement <4 x i1> [[TMP6]], i32 3
> +; CHECK-NEXT: br i1 [[TMP19]], label [[PRED_LOAD_IF17:%.*]], label [[PRED_LOAD_CONTINUE18]]
> +; CHECK: pred.load.if17:
> +; CHECK-NEXT: [[TMP20:%.*]] = load i32, i32* [[SRC]], align 4, !alias.scope !7
> +; CHECK-NEXT: [[TMP21:%.*]] = insertelement <4 x i32> [[TMP18]], i32 [[TMP20]], i32 3
> +; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE18]]
> +; CHECK: pred.load.continue18:
> +; CHECK-NEXT: [[TMP22:%.*]] = phi <4 x i32> [ [[TMP18]], [[PRED_LOAD_CONTINUE16]] ], [ [[TMP21]], [[PRED_LOAD_IF17]] ]
> +; CHECK-NEXT: [[TMP23:%.*]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i1> [[TMP4]], <4 x i1> zeroinitializer
> +; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP23]], <4 x i32> zeroinitializer, <4 x i32> [[TMP22]]
> +; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 [[TMP0]]
> +; CHECK-NEXT: [[TMP25:%.*]] = or <4 x i1> [[TMP6]], [[TMP23]]
> +; CHECK-NEXT: [[TMP26:%.*]] = getelementptr inbounds i32, i32* [[TMP24]], i32 0
> +; CHECK-NEXT: [[TMP27:%.*]] = bitcast i32* [[TMP26]] to <4 x i32>*
> +; CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[PREDPHI]], <4 x i32>* [[TMP27]], i32 4, <4 x i1> [[TMP25]]), !alias.scope !9, !noalias !11
> +; CHECK-NEXT: [[INDEX_NEXT19]] = add i64 [[INDEX12]], 4
> +; CHECK-NEXT: [[TMP28:%.*]] = icmp eq i64 [[INDEX_NEXT19]], [[N_VEC]]
> +; CHECK-NEXT: br i1 [[TMP28]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
> +; CHECK: middle.block:
> +; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
> +; CHECK: scalar.ph:
> +; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
> +; CHECK-NEXT: br label [[FOR_BODY:%.*]]
> +; CHECK: for.body:
> +; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ [[INDEX_NEXT:%.*]], [[IF_END:%.*]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
> +; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[COND]], i64 [[INDEX]]
> +; CHECK-NEXT: [[TMP29:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
> +; CHECK-NEXT: [[TOBOOL_NOT:%.*]] = icmp eq i32 [[TMP29]], 0
> +; CHECK-NEXT: br i1 [[TOBOOL_NOT]], label [[IF_END]], label [[IF_THEN:%.*]]
> +; CHECK: if.then:
> +; CHECK-NEXT: [[TMP30:%.*]] = load i32, i32* [[SRC]], align 4
> +; CHECK-NEXT: br label [[IF_END]]
> +; CHECK: if.end:
> +; CHECK-NEXT: [[VAL_0:%.*]] = phi i32 [ [[TMP30]], [[IF_THEN]] ], [ 0, [[FOR_BODY]] ]
> +; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 [[INDEX]]
> +; CHECK-NEXT: store i32 [[VAL_0]], i32* [[ARRAYIDX1]], align 4
> +; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 1
> +; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N]]
> +; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP13:![0-9]+]]
> +; CHECK: for.end:
> +; CHECK-NEXT: ret void
> +;
> entry:
> br label %for.body
>
> diff --git a/llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll b/llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll
> index def98e03030f..d13942e85466 100644
> --- a/llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll
> +++ b/llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll
> @@ -25,22 +25,22 @@ define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger
> ; AVX512-NEXT: iter.check:
> ; AVX512-NEXT: br label [[VECTOR_BODY:%.*]]
> ; AVX512: vector.body:
> -; AVX512-NEXT: [[INDEX8:%.*]] = phi i64 [ 0, [[ITER_CHECK:%.*]] ], [ [[INDEX_NEXT_3:%.*]], [[VECTOR_BODY]] ]
> -; AVX512-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64
[[INDEX8]] > +; AVX512-NEXT: [[INDEX7:%.*]] = phi i64 [ 0, [[ITER_CHECK:%.*]] ], [ [[INDEX_NEXT_3:%.*]], [[VECTOR_BODY]] ] > +; AVX512-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[INDEX7]] > ; AVX512-NEXT: [[TMP1:%.*]] = bitcast i32* [[TMP0]] to <16 x i32>* > ; AVX512-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i32>, <16 x i32>* [[TMP1]], align 4 > ; AVX512-NEXT: [[TMP2:%.*]] = icmp sgt <16 x i32> [[WIDE_LOAD]], zeroinitializer > -; AVX512-NEXT: [[TMP3:%.*]] = getelementptr i32, i32* [[INDEX:%.*]], i64 [[INDEX8]] > +; AVX512-NEXT: [[TMP3:%.*]] = getelementptr i32, i32* [[INDEX:%.*]], i64 [[INDEX7]] > ; AVX512-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <16 x i32>* > ; AVX512-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* [[TMP4]], i32 4, <16 x i1> [[TMP2]], <16 x i32> poison) > ; AVX512-NEXT: [[TMP5:%.*]] = sext <16 x i32> [[WIDE_MASKED_LOAD]] to <16 x i64> > ; AVX512-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[IN:%.*]], <16 x i64> [[TMP5]] > ; AVX512-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <16 x float> @llvm.masked.gather.v16f32.v16p0f32(<16 x float*> [[TMP6]], i32 4, <16 x i1> [[TMP2]], <16 x float> undef) > ; AVX512-NEXT: [[TMP7:%.*]] = fadd <16 x float> [[WIDE_MASKED_GATHER]], <float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01> > -; AVX512-NEXT: [[TMP8:%.*]] = getelementptr float, float* [[OUT:%.*]], i64 [[INDEX8]] > +; AVX512-NEXT: [[TMP8:%.*]] = getelementptr float, float* [[OUT:%.*]], i64 [[INDEX7]] > ; AVX512-NEXT: [[TMP9:%.*]] = bitcast float* [[TMP8]] to <16 x float>* > ; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP7]], <16 x float>* [[TMP9]], i32 4, <16 x i1> [[TMP2]]) 
> -; AVX512-NEXT: [[INDEX_NEXT:%.*]] = or i64 [[INDEX8]], 16 > +; AVX512-NEXT: [[INDEX_NEXT:%.*]] = or i64 [[INDEX7]], 16 > ; AVX512-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDEX_NEXT]] > ; AVX512-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <16 x i32>* > ; AVX512-NEXT: [[WIDE_LOAD_1:%.*]] = load <16 x i32>, <16 x i32>* [[TMP11]], align 4 > @@ -55,7 +55,7 @@ define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; AVX512-NEXT: [[TMP18:%.*]] = getelementptr float, float* [[OUT]], i64 [[INDEX_NEXT]] > ; AVX512-NEXT: [[TMP19:%.*]] = bitcast float* [[TMP18]] to <16 x float>* > ; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP17]], <16 x float>* [[TMP19]], i32 4, <16 x i1> [[TMP12]]) > -; AVX512-NEXT: [[INDEX_NEXT_1:%.*]] = or i64 [[INDEX8]], 32 > +; AVX512-NEXT: [[INDEX_NEXT_1:%.*]] = or i64 [[INDEX7]], 32 > ; AVX512-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDEX_NEXT_1]] > ; AVX512-NEXT: [[TMP21:%.*]] = bitcast i32* [[TMP20]] to <16 x i32>* > ; AVX512-NEXT: [[WIDE_LOAD_2:%.*]] = load <16 x i32>, <16 x i32>* [[TMP21]], align 4 > @@ -70,7 +70,7 @@ define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; AVX512-NEXT: [[TMP28:%.*]] = getelementptr float, float* [[OUT]], i64 [[INDEX_NEXT_1]] > ; AVX512-NEXT: [[TMP29:%.*]] = bitcast float* [[TMP28]] to <16 x float>* > ; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP27]], <16 x float>* [[TMP29]], i32 4, <16 x i1> [[TMP22]]) > -; AVX512-NEXT: [[INDEX_NEXT_2:%.*]] = or i64 [[INDEX8]], 48 > +; AVX512-NEXT: [[INDEX_NEXT_2:%.*]] = or i64 [[INDEX7]], 48 > ; AVX512-NEXT: [[TMP30:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDEX_NEXT_2]] > ; AVX512-NEXT: [[TMP31:%.*]] = bitcast i32* [[TMP30]] to <16 x i32>* > ; AVX512-NEXT: [[WIDE_LOAD_3:%.*]] = load <16 x i32>, <16 x i32>* [[TMP31]], align 4 > @@ -85,7 +85,7 @@ define void 
@foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; AVX512-NEXT: [[TMP38:%.*]] = getelementptr float, float* [[OUT]], i64 [[INDEX_NEXT_2]] > ; AVX512-NEXT: [[TMP39:%.*]] = bitcast float* [[TMP38]] to <16 x float>* > ; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP37]], <16 x float>* [[TMP39]], i32 4, <16 x i1> [[TMP32]]) > -; AVX512-NEXT: [[INDEX_NEXT_3]] = add nuw nsw i64 [[INDEX8]], 64 > +; AVX512-NEXT: [[INDEX_NEXT_3]] = add nuw nsw i64 [[INDEX7]], 64 > ; AVX512-NEXT: [[TMP40:%.*]] = icmp eq i64 [[INDEX_NEXT_3]], 4096 > ; AVX512-NEXT: br i1 [[TMP40]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]] > ; AVX512: for.end: > @@ -95,8 +95,8 @@ define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; FVW2-NEXT: entry: > ; FVW2-NEXT: br label [[VECTOR_BODY:%.*]] > ; FVW2: vector.body: > -; FVW2-NEXT: [[INDEX17:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] > -; FVW2-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[INDEX17]] > +; FVW2-NEXT: [[INDEX7:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_LOAD_CONTINUE27:%.*]] ] > +; FVW2-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[INDEX7]] > ; FVW2-NEXT: [[TMP1:%.*]] = bitcast i32* [[TMP0]] to <2 x i32>* > ; FVW2-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 4 > ; FVW2-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 2 > @@ -112,7 +112,7 @@ define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; FVW2-NEXT: [[TMP9:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD8]], zeroinitializer > ; FVW2-NEXT: [[TMP10:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD9]], zeroinitializer > ; FVW2-NEXT: [[TMP11:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD10]], zeroinitializer > -; FVW2-NEXT: [[TMP12:%.*]] = getelementptr i32, i32* [[INDEX:%.*]], i64 [[INDEX17]] > +; FVW2-NEXT: 
[[TMP12:%.*]] = getelementptr i32, i32* [[INDEX:%.*]], i64 [[INDEX7]] > ; FVW2-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <2 x i32>* > ; FVW2-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* [[TMP13]], i32 4, <2 x i1> [[TMP8]], <2 x i32> poison) > ; FVW2-NEXT: [[TMP14:%.*]] = getelementptr i32, i32* [[TMP12]], i64 2 > @@ -128,33 +128,105 @@ define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger > ; FVW2-NEXT: [[TMP21:%.*]] = sext <2 x i32> [[WIDE_MASKED_LOAD11]] to <2 x i64> > ; FVW2-NEXT: [[TMP22:%.*]] = sext <2 x i32> [[WIDE_MASKED_LOAD12]] to <2 x i64> > ; FVW2-NEXT: [[TMP23:%.*]] = sext <2 x i32> [[WIDE_MASKED_LOAD13]] to <2 x i64> > -; FVW2-NEXT: [[TMP24:%.*]] = getelementptr inbounds float, float* [[IN:%.*]], <2 x i64> [[TMP20]] > -; FVW2-NEXT: [[TMP25:%.*]] = getelementptr inbounds float, float* [[IN]], <2 x i64> [[TMP21]] > -; FVW2-NEXT: [[TMP26:%.*]] = getelementptr inbounds float, float* [[IN]], <2 x i64> [[TMP22]] > -; FVW2-NEXT: [[TMP27:%.*]] = getelementptr inbounds float, float* [[IN]], <2 x i64> [[TMP23]] > -; FVW2-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP24]], i32 4, <2 x i1> [[TMP8]], <2 x float> undef) > -; FVW2-NEXT: [[WIDE_MASKED_GATHER14:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP25]], i32 4, <2 x i1> [[TMP9]], <2 x float> undef) > -; FVW2-NEXT: [[WIDE_MASKED_GATHER15:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP26]], i32 4, <2 x i1> [[TMP10]], <2 x float> undef) > -; FVW2-NEXT: [[WIDE_MASKED_GATHER16:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP27]], i32 4, <2 x i1> [[TMP11]], <2 x float> undef) > -; FVW2-NEXT: [[TMP28:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER]], <float 5.000000e-01, float 5.000000e-01> > -; FVW2-NEXT: [[TMP29:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER14]], <float 5.000000e-01, float 
5.000000e-01> > -; FVW2-NEXT: [[TMP30:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER15]], <float 5.000000e-01, float 5.000000e-01> > -; FVW2-NEXT: [[TMP31:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER16]], <float 5.000000e-01, float 5.000000e-01> > -; FVW2-NEXT: [[TMP32:%.*]] = getelementptr float, float* [[OUT:%.*]], i64 [[INDEX17]] > -; FVW2-NEXT: [[TMP33:%.*]] = bitcast float* [[TMP32]] to <2 x float>* > -; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP28]], <2 x float>* [[TMP33]], i32 4, <2 x i1> [[TMP8]]) > -; FVW2-NEXT: [[TMP34:%.*]] = getelementptr float, float* [[TMP32]], i64 2 > -; FVW2-NEXT: [[TMP35:%.*]] = bitcast float* [[TMP34]] to <2 x float>* > -; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP29]], <2 x float>* [[TMP35]], i32 4, <2 x i1> [[TMP9]]) > -; FVW2-NEXT: [[TMP36:%.*]] = getelementptr float, float* [[TMP32]], i64 4 > -; FVW2-NEXT: [[TMP37:%.*]] = bitcast float* [[TMP36]] to <2 x float>* > -; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP30]], <2 x float>* [[TMP37]], i32 4, <2 x i1> [[TMP10]]) > -; FVW2-NEXT: [[TMP38:%.*]] = getelementptr float, float* [[TMP32]], i64 6 > -; FVW2-NEXT: [[TMP39:%.*]] = bitcast float* [[TMP38]] to <2 x float>* > -; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP31]], <2 x float>* [[TMP39]], i32 4, <2 x i1> [[TMP11]]) > -; FVW2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX17]], 8 > -; FVW2-NEXT: [[TMP40:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096 > -; FVW2-NEXT: br i1 [[TMP40]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]] > +; FVW2-NEXT: [[TMP24:%.*]] = extractelement <2 x i1> [[TMP8]], i64 0 > +; FVW2-NEXT: br i1 [[TMP24]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]] > +; FVW2: pred.load.if: > +; FVW2-NEXT: [[TMP25:%.*]] = extractelement <2 x i64> [[TMP20]], i64 0 > +; FVW2-NEXT: [[TMP26:%.*]] = getelementptr inbounds float, float* [[IN:%.*]], i64 [[TMP25]] > +; FVW2-NEXT: 
[[TMP27:%.*]] = load float, float* [[TMP26]], align 4 > +; FVW2-NEXT: [[TMP28:%.*]] = insertelement <2 x float> poison, float [[TMP27]], i64 0 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE]] > +; FVW2: pred.load.continue: > +; FVW2-NEXT: [[TMP29:%.*]] = phi <2 x float> [ poison, [[VECTOR_BODY]] ], [ [[TMP28]], [[PRED_LOAD_IF]] ] > +; FVW2-NEXT: [[TMP30:%.*]] = extractelement <2 x i1> [[TMP8]], i64 1 > +; FVW2-NEXT: br i1 [[TMP30]], label [[PRED_LOAD_IF14:%.*]], label [[PRED_LOAD_CONTINUE15:%.*]] > +; FVW2: pred.load.if14: > +; FVW2-NEXT: [[TMP31:%.*]] = extractelement <2 x i64> [[TMP20]], i64 1 > +; FVW2-NEXT: [[TMP32:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP31]] > +; FVW2-NEXT: [[TMP33:%.*]] = load float, float* [[TMP32]], align 4 > +; FVW2-NEXT: [[TMP34:%.*]] = insertelement <2 x float> [[TMP29]], float [[TMP33]], i64 1 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE15]] > +; FVW2: pred.load.continue15: > +; FVW2-NEXT: [[TMP35:%.*]] = phi <2 x float> [ [[TMP29]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP34]], [[PRED_LOAD_IF14]] ] > +; FVW2-NEXT: [[TMP36:%.*]] = extractelement <2 x i1> [[TMP9]], i64 0 > +; FVW2-NEXT: br i1 [[TMP36]], label [[PRED_LOAD_IF16:%.*]], label [[PRED_LOAD_CONTINUE17:%.*]] > +; FVW2: pred.load.if16: > +; FVW2-NEXT: [[TMP37:%.*]] = extractelement <2 x i64> [[TMP21]], i64 0 > +; FVW2-NEXT: [[TMP38:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP37]] > +; FVW2-NEXT: [[TMP39:%.*]] = load float, float* [[TMP38]], align 4 > +; FVW2-NEXT: [[TMP40:%.*]] = insertelement <2 x float> poison, float [[TMP39]], i64 0 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE17]] > +; FVW2: pred.load.continue17: > +; FVW2-NEXT: [[TMP41:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE15]] ], [ [[TMP40]], [[PRED_LOAD_IF16]] ] > +; FVW2-NEXT: [[TMP42:%.*]] = extractelement <2 x i1> [[TMP9]], i64 1 > +; FVW2-NEXT: br i1 [[TMP42]], label [[PRED_LOAD_IF18:%.*]], label [[PRED_LOAD_CONTINUE19:%.*]] > +; FVW2: pred.load.if18: > +; FVW2-NEXT: 
[[TMP43:%.*]] = extractelement <2 x i64> [[TMP21]], i64 1 > +; FVW2-NEXT: [[TMP44:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP43]] > +; FVW2-NEXT: [[TMP45:%.*]] = load float, float* [[TMP44]], align 4 > +; FVW2-NEXT: [[TMP46:%.*]] = insertelement <2 x float> [[TMP41]], float [[TMP45]], i64 1 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE19]] > +; FVW2: pred.load.continue19: > +; FVW2-NEXT: [[TMP47:%.*]] = phi <2 x float> [ [[TMP41]], [[PRED_LOAD_CONTINUE17]] ], [ [[TMP46]], [[PRED_LOAD_IF18]] ] > +; FVW2-NEXT: [[TMP48:%.*]] = extractelement <2 x i1> [[TMP10]], i64 0 > +; FVW2-NEXT: br i1 [[TMP48]], label [[PRED_LOAD_IF20:%.*]], label [[PRED_LOAD_CONTINUE21:%.*]] > +; FVW2: pred.load.if20: > +; FVW2-NEXT: [[TMP49:%.*]] = extractelement <2 x i64> [[TMP22]], i64 0 > +; FVW2-NEXT: [[TMP50:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP49]] > +; FVW2-NEXT: [[TMP51:%.*]] = load float, float* [[TMP50]], align 4 > +; FVW2-NEXT: [[TMP52:%.*]] = insertelement <2 x float> poison, float [[TMP51]], i64 0 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE21]] > +; FVW2: pred.load.continue21: > +; FVW2-NEXT: [[TMP53:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE19]] ], [ [[TMP52]], [[PRED_LOAD_IF20]] ] > +; FVW2-NEXT: [[TMP54:%.*]] = extractelement <2 x i1> [[TMP10]], i64 1 > +; FVW2-NEXT: br i1 [[TMP54]], label [[PRED_LOAD_IF22:%.*]], label [[PRED_LOAD_CONTINUE23:%.*]] > +; FVW2: pred.load.if22: > +; FVW2-NEXT: [[TMP55:%.*]] = extractelement <2 x i64> [[TMP22]], i64 1 > +; FVW2-NEXT: [[TMP56:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP55]] > +; FVW2-NEXT: [[TMP57:%.*]] = load float, float* [[TMP56]], align 4 > +; FVW2-NEXT: [[TMP58:%.*]] = insertelement <2 x float> [[TMP53]], float [[TMP57]], i64 1 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE23]] > +; FVW2: pred.load.continue23: > +; FVW2-NEXT: [[TMP59:%.*]] = phi <2 x float> [ [[TMP53]], [[PRED_LOAD_CONTINUE21]] ], [ [[TMP58]], [[PRED_LOAD_IF22]] ] > +; FVW2-NEXT: 
[[TMP60:%.*]] = extractelement <2 x i1> [[TMP11]], i64 0 > +; FVW2-NEXT: br i1 [[TMP60]], label [[PRED_LOAD_IF24:%.*]], label [[PRED_LOAD_CONTINUE25:%.*]] > +; FVW2: pred.load.if24: > +; FVW2-NEXT: [[TMP61:%.*]] = extractelement <2 x i64> [[TMP23]], i64 0 > +; FVW2-NEXT: [[TMP62:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP61]] > +; FVW2-NEXT: [[TMP63:%.*]] = load float, float* [[TMP62]], align 4 > +; FVW2-NEXT: [[TMP64:%.*]] = insertelement <2 x float> poison, float [[TMP63]], i64 0 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE25]] > +; FVW2: pred.load.continue25: > +; FVW2-NEXT: [[TMP65:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE23]] ], [ [[TMP64]], [[PRED_LOAD_IF24]] ] > +; FVW2-NEXT: [[TMP66:%.*]] = extractelement <2 x i1> [[TMP11]], i64 1 > +; FVW2-NEXT: br i1 [[TMP66]], label [[PRED_LOAD_IF26:%.*]], label [[PRED_LOAD_CONTINUE27]] > +; FVW2: pred.load.if26: > +; FVW2-NEXT: [[TMP67:%.*]] = extractelement <2 x i64> [[TMP23]], i64 1 > +; FVW2-NEXT: [[TMP68:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP67]] > +; FVW2-NEXT: [[TMP69:%.*]] = load float, float* [[TMP68]], align 4 > +; FVW2-NEXT: [[TMP70:%.*]] = insertelement <2 x float> [[TMP65]], float [[TMP69]], i64 1 > +; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE27]] > +; FVW2: pred.load.continue27: > +; FVW2-NEXT: [[TMP71:%.*]] = phi <2 x float> [ [[TMP65]], [[PRED_LOAD_CONTINUE25]] ], [ [[TMP70]], [[PRED_LOAD_IF26]] ] > +; FVW2-NEXT: [[TMP72:%.*]] = fadd <2 x float> [[TMP35]], <float 5.000000e-01, float 5.000000e-01> > +; FVW2-NEXT: [[TMP73:%.*]] = fadd <2 x float> [[TMP47]], <float 5.000000e-01, float 5.000000e-01> > +; FVW2-NEXT: [[TMP74:%.*]] = fadd <2 x float> [[TMP59]], <float 5.000000e-01, float 5.000000e-01> > +; FVW2-NEXT: [[TMP75:%.*]] = fadd <2 x float> [[TMP71]], <float 5.000000e-01, float 5.000000e-01> > +; FVW2-NEXT: [[TMP76:%.*]] = getelementptr float, float* [[OUT:%.*]], i64 [[INDEX7]] > +; FVW2-NEXT: [[TMP77:%.*]] = bitcast float* [[TMP76]] to <2 x 
float>* > +; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP72]], <2 x float>* [[TMP77]], i32 4, <2 x i1> [[TMP8]]) > +; FVW2-NEXT: [[TMP78:%.*]] = getelementptr float, float* [[TMP76]], i64 2 > +; FVW2-NEXT: [[TMP79:%.*]] = bitcast float* [[TMP78]] to <2 x float>* > +; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP73]], <2 x float>* [[TMP79]], i32 4, <2 x i1> [[TMP9]]) > +; FVW2-NEXT: [[TMP80:%.*]] = getelementptr float, float* [[TMP76]], i64 4 > +; FVW2-NEXT: [[TMP81:%.*]] = bitcast float* [[TMP80]] to <2 x float>* > +; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP74]], <2 x float>* [[TMP81]], i32 4, <2 x i1> [[TMP10]]) > +; FVW2-NEXT: [[TMP82:%.*]] = getelementptr float, float* [[TMP76]], i64 6 > +; FVW2-NEXT: [[TMP83:%.*]] = bitcast float* [[TMP82]] to <2 x float>* > +; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP75]], <2 x float>* [[TMP83]], i32 4, <2 x i1> [[TMP11]]) > +; FVW2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX7]], 8 > +; FVW2-NEXT: [[TMP84:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096 > +; FVW2-NEXT: br i1 [[TMP84]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]] > ; FVW2: for.end: > ; FVW2-NEXT: ret void > ; > @@ -365,40 +437,186 @@ define void @foo2(%struct.In* noalias %in, float* noalias %out, i32* noalias %tr > ; FVW2-NEXT: entry: > ; FVW2-NEXT: br label [[VECTOR_BODY:%.*]] > ; FVW2: vector.body: > -; FVW2-NEXT: [[INDEX10:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE9:%.*]] ] > -; FVW2-NEXT: [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 16>, [[ENTRY]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE9]] ] > -; FVW2-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX10]], 4 > +; FVW2-NEXT: [[INDEX7:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE35:%.*]] ] > +; FVW2-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX7]], 4 > ; FVW2-NEXT: [[TMP0:%.*]] = or i64 [[OFFSET_IDX]], 
16 > -; FVW2-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[OFFSET_IDX]] > -; FVW2-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]] > -; FVW2-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1]], align 4 > -; FVW2-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]], align 4 > -; FVW2-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[TMP3]], i64 0 > -; FVW2-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[TMP4]], i64 1 > -; FVW2-NEXT: [[TMP7:%.*]] = icmp sgt <2 x i32> [[TMP6]], zeroinitializer > -; FVW2-NEXT: [[TMP8:%.*]] = getelementptr inbounds [[STRUCT_IN:%.*]], %struct.In* [[IN:%.*]], <2 x i64> [[VEC_IND]], i32 1 > -; FVW2-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP8]], i32 4, <2 x i1> [[TMP7]], <2 x float> undef) > -; FVW2-NEXT: [[TMP9:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER]], <float 5.000000e-01, float 5.000000e-01> > -; FVW2-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP7]], i64 0 > -; FVW2-NEXT: br i1 [[TMP10]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]] > +; FVW2-NEXT: [[TMP1:%.*]] = or i64 [[OFFSET_IDX]], 32 > +; FVW2-NEXT: [[TMP2:%.*]] = or i64 [[OFFSET_IDX]], 48 > +; FVW2-NEXT: [[TMP3:%.*]] = or i64 [[OFFSET_IDX]], 64 > +; FVW2-NEXT: [[TMP4:%.*]] = or i64 [[OFFSET_IDX]], 80 > +; FVW2-NEXT: [[TMP5:%.*]] = or i64 [[OFFSET_IDX]], 96 > +; FVW2-NEXT: [[TMP6:%.*]] = or i64 [[OFFSET_IDX]], 112 > +; FVW2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[OFFSET_IDX]] > +; FVW2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]] > +; FVW2-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]] > +; FVW2-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]] > +; FVW2-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]] > +; FVW2-NEXT: [[TMP12:%.*]] = getelementptr inbounds 
i32, i32* [[TRIGGER]], i64 [[TMP4]] > +; FVW2-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP5]] > +; FVW2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP6]] > +; FVW2-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP7]], align 4 > +; FVW2-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP8]], align 4 > +; FVW2-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> poison, i32 [[TMP15]], i64 0 > +; FVW2-NEXT: [[TMP18:%.*]] = insertelement <2 x i32> [[TMP17]], i32 [[TMP16]], i64 1 > +; FVW2-NEXT: [[TMP19:%.*]] = load i32, i32* [[TMP9]], align 4 > +; FVW2-NEXT: [[TMP20:%.*]] = load i32, i32* [[TMP10]], align 4 > +; FVW2-NEXT: [[TMP21:%.*]] = insertelement <2 x i32> poison, i32 [[TMP19]], i64 0 > </cut>

4 years

[ACTIVITY] week ending Feb. 6 2022

by Alex Bennée

Project Stratos
===============

  - posted [RFC PATCH] tests/qtest: attempt to enable tests for virtio-gpio (!working)
    Message-Id: <20220121151534.3654562-1-alex.bennee(a)linaro.org>
  - need to increase coverage of the QEMU boilerplate to get it merged
  - discussions on next steps with SCMI backend with Vincent (moving
    from the QEMU->QEMU PoC)

QEMU Upstream Work ([UM-2])
===========================

  - posted [PATCH v2 00/25] testing and plugin updates
    Message-Id: <20220201182050.15087-1-alex.bennee(a)linaro.org>
  - posted [RFC PATCH 0/4] improve coverage of vector backend
    Message-Id: <20220202191242.652607-2-alex.bennee(a)linaro.org>
  - posted [PATCH v3 00/26] testing and plugins pre-PR
    Message-Id: <20220204204335.1689602-1-alex.bennee(a)linaro.org>
  - posted [RFC PATCH] arm: force flag recalculation when messing with DAIF
    Message-Id: <20220202122353.457084-1-alex.bennee(a)linaro.org>
  - trying to track down a weird TLS bug:
    <https://gitlab.com/stsquad/qemu/-/jobs/2056025874#L3532>
    - on aarch64 HW, running qemu-s390x with a simple test case fails
      about once in every 100 to 200 runs
    - TLS memory seems to be made non-accessible (rw-p -> ---p, except
      to gdb)
    - strace doesn't show a culprit; possibly a kernel bug?

[UM-2] <https://linaro.atlassian.net/browse/UM-2>

Upstream MTTCG tests ([QEMU-52])

  - still waiting for final review of [kvm-unit-tests PATCH v9 0/9] MTTCG
    sanity tests for ARM
    Message-Id: <20211202115352.951548-1-alex.bennee(a)linaro.org>

[QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52>

Other
=====

  - planning and brainstorming for Linaro Tech Day

Completed Reviews [5/5]
=======================

[PATCH v4 00/42] CXl 2.0 emulation Support
  Message-Id: <20220124171705.10432-1-Jonathan.Cameron(a)huawei.com>
[PATCH] gitlab: fall back to commit hash in qemu-setup filename
  Message-Id: <20220125173454.10381-1-stefanha(a)redhat.com>
[PATCH for-7.0] gitlab-ci: Add cirrus-ci based tests for NetBSD and OpenBSD
  Message-Id: <20211209103124.121942-1-thuth(a)redhat.com>
[PATCH 00/20] tcg: vector improvements
  Message-Id: <20211218194250.247633-1-richard.henderson(a)linaro.org>

Absences
========

Current Review Queue
====================

TODO [PATCH 0/4] target/arm: SVE fixes versus VHE
  Message-Id: <20220127063428.30212-1-richard.henderson(a)linaro.org>
TODO [PATCH 00/14] arm_gicv3_its: Implement MOVI and MOVALL commands
  Message-Id: <20220122182444.724087-1-peter.maydell(a)linaro.org>
TODO [PATCH v11 0/8] hmp,qmp: Add commands to introspect virtio devices
  Message-Id: <1642678168-20447-1-git-send-email-jonah.palmer(a)oracle.com>
TODO [PATCH v2 00/13] arm gicv3 ITS: Various bug fixes and refactorings
  Message-Id: <20220111171048.3545974-1-peter.maydell(a)linaro.org>

-- 
Alex Bennée
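
The permission flip mentioned in the TLS item above (rw-p -> ---p) is the kind of change that can be spotted by diffing two snapshots of /proc/<pid>/maps. What follows is only a rough sketch of that idea, not a tool used in the report: the helper names and the sample addresses are made up for illustration, and it assumes the affected region keeps the same start and end addresses between snapshots (a partial mprotect() that splits a mapping would need range-overlap matching instead).

```python
from typing import Dict, List, Tuple

# Illustrative snapshots in /proc/<pid>/maps format:
#   address-range perms offset dev inode [pathname]
# The addresses are invented for this example.
BEFORE = """\
7f8e40000000-7f8e40021000 rw-p 00000000 00:00 0
7f8e44000000-7f8e44400000 r-xp 00000000 08:01 1234 /usr/bin/example
"""
AFTER = """\
7f8e40000000-7f8e40021000 ---p 00000000 00:00 0
7f8e44000000-7f8e44400000 r-xp 00000000 08:01 1234 /usr/bin/example
"""


def parse_maps(text: str) -> Dict[str, str]:
    """Map each 'start-end' address range to its permission string."""
    regions: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            regions[fields[0]] = fields[1]
    return regions


def lost_access(before: str, after: str) -> List[Tuple[str, str, str]]:
    """Return (range, old_perms, new_perms) for ranges that lost all of r/w/x."""
    old = parse_maps(before)
    new = parse_maps(after)
    return [
        (rng, old[rng], perms)
        for rng, perms in new.items()
        if rng in old and old[rng] != perms and perms.startswith("---")
    ]


if __name__ == "__main__":
    for rng, was, now in lost_access(BEFORE, AFTER):
        print(f"{rng}: {was} -> {now}")
```

In practice the two snapshots would be read from /proc/<pid>/maps of the qemu-s390x process before and after the failure; that step is left out here because it depends on how the failing run is captured.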

4 years, 1 month

[ACTIVITY] report week ending 4 Feb

by Peter Maydell

Progress:
 * UM-2 [QEMU upstream maintainership]
   - Fixed some minor issues with the hvf accelerator and sent out a
     patchset
     + '-cpu max' didn't act like '-cpu host'
     + we weren't exposing PAuth to the guest
 * QEMU-420 [GICv4 emulation]
   - Sent out a patchset with more cleanups and fixes to the existing
     ITS code
   - The ITS parts of the GICv4 work are now code-complete; moving on
     to the redistributor end of things next week.

-- PMM

4 years, 1 month

[ACTIVITY] report week ending 28 Jan

by Peter Maydell

Progress:
 * UM-2 [QEMU upstream maintainership]
   - Before the 6.2 release we tried to land a bug fix which corrected
     the handling in our PSCI emulation of calls where the function ID
     is unrecognized -- these are supposed to return an error code.
     The bugfix turned out to cause regressions for some boards when
     running guest code at EL3 (because those boards were incorrectly
     enabling PSCI emulation in that situation). Sent a patchset that
     fixed those boards so we don't enable PSCI when running EL3
     guests, and re-introduced the original PSCI bugfix.
   - Fixed various bugs in the highbank/midway boards discovered in
     the process of writing and testing the above patchset. (These two
     boards were the most complicated to fix.)
   - More code review, and sent out an arm pull request
   - Small handful of other minor patches

-- PMM
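
The PSCI behaviour described in the report above can be illustrated with a toy model. This is a hedged sketch only, not QEMU's implementation: the dispatcher, the handler table and the `psci_enabled` flag are hypothetical names invented for the example. The only values taken from the PSCI specification are the PSCI_VERSION function ID and the NOT_SUPPORTED return code (-1).

```python
from typing import Callable, Dict, Optional

# Values from the Arm PSCI specification.
PSCI_RET_NOT_SUPPORTED = -1
PSCI_VERSION = 0x84000000  # SMC32 function ID for PSCI_VERSION

# Hypothetical handler table; a real emulation would cover many more calls.
# The returned version (major 1, minor 1) is purely illustrative.
HANDLERS: Dict[int, Callable[[], int]] = {
    PSCI_VERSION: lambda: (1 << 16) | 1,
}


def emulate_psci_call(function_id: int, psci_enabled: bool) -> Optional[int]:
    """Toy model of a PSCI conduit call.

    Returns None when PSCI emulation is disabled (for example because the
    guest runs its own firmware at EL3 and must handle the call itself).
    Otherwise returns the handler's result, or NOT_SUPPORTED for a
    function ID that is not recognized.
    """
    if not psci_enabled:
        return None
    handler = HANDLERS.get(function_id)
    if handler is None:
        return PSCI_RET_NOT_SUPPORTED
    return handler()


if __name__ == "__main__":
    print(emulate_psci_call(PSCI_VERSION, psci_enabled=True))
    print(emulate_psci_call(0x8400FFFF, psci_enabled=True))
    print(emulate_psci_call(PSCI_VERSION, psci_enabled=False))
```

The point of the third case is the one the report makes about boards: if emulation is switched on while the guest starts at EL3, the emulated handler answers calls that the guest's own firmware was meant to receive.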

4 years, 1 month

tsan buildbot failure possibly due to DWARFv5 switch

by David Blaikie

Seems like my change to make Clang default to DWARFv5 might've caused a buildbot failure on your build worker here: https://lab.llvm.org/buildbot/#/builders/185/builds/1295 But I seem to be able to run this test successfully locally on my Linux machine - so I'm wondering if you can offer any help diagnosing the issue showing up on your builder/worker?

4 years, 1 month

[ACTIVITY] week ending Jan. 23 2022

by Alex Bennée

Project Stratos
===============

  - [RFC PATCH] tests/qtest: attempt to enable tests for virtio-gpio (!working)
    Message-Id: <20220121151534.3654562-1-alex.bennee(a)linaro.org>
  - trying to clear the way for merging virtio-gpio to QEMU

vhost-device maintainer effort ([UM-196])

  - reviewed vhost-device [pr7 with the vm-virtio vsock abstraction]

[UM-196] <https://linaro.atlassian.net/browse/UM-196>

[pr7 with the vm-virtio vsock abstraction]
<https://github.com/stsquad/vhost-device/tree/review/pr7-with-laurat-abstrac…>

QEMU Upstream Work ([UM-2])
===========================

  - posted [PULL v2 00/31] testing/next and other misc fixes
    Message-Id: <20220118190043.1427303-1-alex.bennee(a)linaro.org>

Upstream MTTCG tests ([QEMU-52])

  - still waiting for final review of [kvm-unit-tests PATCH v9 0/9] MTTCG
    sanity tests for ARM
    Message-Id: <20211202115352.951548-1-alex.bennee(a)linaro.org>

Completed Reviews [2/2]
=======================

[PATCH v2 00/13] arm gicv3 ITS: Various bug fixes and refactorings
  Message-Id: <20220111171048.3545974-1-peter.maydell(a)linaro.org>
[PATCH v2 0/6] qtests/libqos: Allow PCI tests to be run with virt-machine
  Message-Id: <20220118203833.316741-7-eric.auger(a)redhat.com>

Absences
========

Current Review Queue
====================

TODO [PATCH v11 0/8] hmp,qmp: Add commands to introspect virtio devices
  Message-Id: <1642678168-20447-1-git-send-email-jonah.palmer(a)oracle.com>
TODO [PATCH v2 00/13] arm gicv3 ITS: Various bug fixes and refactorings
  Message-Id: <20220111171048.3545974-1-peter.maydell(a)linaro.org>
TODO [PATCH v2 00/11] Atomic cleanup + clang-12 build fix
  Message-Id: <20210717014121.1784956-1-richard.henderson(a)linaro.org>
TODO [PATCH 0/7] tcg: some small towards more modular tcg
  Message-Id: <20210804143826.3402872-1-kraxel(a)redhat.com>

-- 
Alex Bennée

4 years, 1 month

[ACTIVITY] report week ending 21 Jan

by Peter Maydell

Progress:
 * UM-2 [QEMU upstream maintainership]
   - Sent patches for some reported bugs to do with state save/load
 * QEMU-420 [GICv4 emulation]
   - Wrote patches to implement the missing MOVALL and MOVI commands
   - Fixed a few minor bugs noticed along the way
   - Should be able to send out a patchset early next week and then
     can get back to the new-in-GICv4 work

-- PMM

4 years, 1 month

[TCWG CI] Regression caused by gcc: Add -Wdangling-pointer [PR63272].

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: Add -Wdangling-pointer [PR63272].:
commit 9d6a0f388eb048f8d87f47af78f07b5ce513bfe6
Author: Martin Sebor <msebor(a)redhat.com>

    Add -Wdangling-pointer [PR63272].

Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
21324
# First few build errors in logs:
# 00:03:31 sound/core/oss/mixer_oss.c:1057:21: error: ‘slot’ is used uninitialized [-Werror=uninitialized]
# 00:03:32 sound/core/oss/pcm_oss.c:108:29: error: ‘t’ is used uninitialized [-Werror=uninitialized]
# 00:03:32 sound/core/oss/pcm_oss.c:2488:34: error: ‘setup’ is used uninitialized [-Werror=uninitialized]
# 00:03:32 sound/core/oss/pcm_oss.c:2998:51: error: ‘template’ is used uninitialized [-Werror=uninitialized]
# 00:03:35 make[3]: *** [scripts/Makefile.build:277: sound/core/oss/mixer_oss.o] Error 1
# 00:03:35 sound/core/seq/oss/seq_oss_init.c:350:35: error: ‘qinfo’ is used uninitialized [-Werror=uninitialized]
# 00:03:35 sound/core/seq/oss/seq_oss_init.c:370:35: error: ‘qinfo’ is used uninitialized [-Werror=uninitialized]
# 00:03:36 make[4]: *** [scripts/Makefile.build:277: sound/core/seq/oss/seq_oss_init.o] Error 1
# 00:03:40 make[3]: *** [scripts/Makefile.build:277: sound/core/oss/pcm_oss.o] Error 1
# 00:03:50 make[3]: *** [scripts/Makefile.build:540: sound/core/seq/oss] Error 2

from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
21354

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
 - tcwg_kernel/gnu-master-aarch64-stable-allmodconfig

First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-…
Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-…
Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-…
Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-…

Reproduce builds:
<cut>
mkdir investigate-gcc-9d6a0f388eb048f8d87f47af78f07b5ce513bfe6
cd investigate-gcc-9d6a0f388eb048f8d87f47af78f07b5ce513bfe6

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-stable-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 9d6a0f388eb048f8d87f47af78f07b5ce513bfe6
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 671a283636de75f7ed638ee6b01ed2d44361b8b6
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 9d6a0f388eb048f8d87f47af78f07b5ce513bfe6
Author: Martin Sebor <msebor(a)redhat.com>
Date:   Sat Jan 15 16:41:40 2022 -0700

    Add -Wdangling-pointer [PR63272].
    Resolves:
    PR c/63272 - GCC should warn when using pointer to dead scoped variable with in the same function

    gcc/c-family/ChangeLog:

            PR c/63272
            * c.opt (-Wdangling-pointer): New option.

    gcc/ChangeLog:

            PR c/63272
            * diagnostic-spec.c (nowarn_spec_t::nowarn_spec_t): Handle
            -Wdangling-pointer.
            * doc/invoke.texi (-Wdangling-pointer): Document new option.
            * gimple-ssa-warn-access.cc (pass_waccess::clone): Set new member.
            (pass_waccess::check_pointer_uses): New function.
            (pass_waccess::gimple_call_return_arg): New function.
            (pass_waccess::gimple_call_return_arg_ref): New function.
            (pass_waccess::check_call_dangling): New function.
            (pass_waccess::check_dangling_uses): New function overloads.
            (pass_waccess::check_dangling_stores): New function.
            (pass_waccess::check_dangling_stores): New function.
            (pass_waccess::m_clobbers): New data member.
            (pass_waccess::m_func): New data member.
            (pass_waccess::m_run_number): New data member.
            (pass_waccess::m_check_dangling_p): New data member.
            (pass_waccess::check_alloca): Check m_early_checks_p.
            (pass_waccess::check_alloc_size_call): Same.
            (pass_waccess::check_strcat): Same.
            (pass_waccess::check_strncat): Same.
            (pass_waccess::check_stxcpy): Same.
            (pass_waccess::check_stxncpy): Same.
            (pass_waccess::check_strncmp): Same.
            (pass_waccess::check_memop_access): Same.
            (pass_waccess::check_read_access): Same.
            (pass_waccess::check_builtin): Call check_pointer_uses.
            (pass_waccess::warn_invalid_pointer): Add arguments.
            (is_auto_decl): New function.
            (pass_waccess::check_stmt): New function.
            (pass_waccess::check_block): Call check_stmt.
            (pass_waccess::execute): Call check_dangling_uses,
            check_dangling_stores. Empty m_clobbers.
            * passes.def (pass_warn_access): Invoke pass two more times.

    gcc/testsuite/ChangeLog:

            PR c/63272
            * g++.dg/warn/Wfree-nonheap-object-6.C: Disable valid warnings.
            * g++.dg/warn/ref-temp1.C: Prune expected warning.
            * gcc.dg/uninit-pr50476.c: Expect a new warning.
            * c-c++-common/Wdangling-pointer-2.c: New test.
* c-c++-common/Wdangling-pointer-3.c: New test. * c-c++-common/Wdangling-pointer-4.c: New test. * c-c++-common/Wdangling-pointer-5.c: New test. * c-c++-common/Wdangling-pointer-6.c: New test. * c-c++-common/Wdangling-pointer.c: New test. * g++.dg/warn/Wdangling-pointer-2.C: New test. * g++.dg/warn/Wdangling-pointer.C: New test. * gcc.dg/Wdangling-pointer-2.c: New test. * gcc.dg/Wdangling-pointer.c: New test. --- gcc/c-family/c.opt | 8 + gcc/diagnostic-spec.c | 1 + gcc/doc/invoke.texi | 62 +- gcc/gimple-ssa-warn-access.cc | 635 +++++++++++++++++++-- gcc/passes.def | 5 +- gcc/testsuite/c-c++-common/Wdangling-pointer-2.c | 437 ++++++++++++++ gcc/testsuite/c-c++-common/Wdangling-pointer-3.c | 64 +++ gcc/testsuite/c-c++-common/Wdangling-pointer-4.c | 73 +++ gcc/testsuite/c-c++-common/Wdangling-pointer-5.c | 90 +++ gcc/testsuite/c-c++-common/Wdangling-pointer-6.c | 32 ++ gcc/testsuite/c-c++-common/Wdangling-pointer.c | 434 ++++++++++++++ gcc/testsuite/g++.dg/warn/Wdangling-pointer-2.C | 23 + gcc/testsuite/g++.dg/warn/Wdangling-pointer.C | 74 +++ gcc/testsuite/g++.dg/warn/Wfree-nonheap-object-6.C | 4 +- gcc/testsuite/g++.dg/warn/ref-temp1.C | 3 + gcc/testsuite/gcc.dg/Wdangling-pointer-2.c | 82 +++ gcc/testsuite/gcc.dg/Wdangling-pointer.c | 75 +++ gcc/testsuite/gcc.dg/uninit-pr50476.c | 2 +- 18 files changed, 2043 insertions(+), 61 deletions(-) diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 28363643664..db65c14a7a5 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -548,6 +548,14 @@ Wdangling-else C ObjC C++ ObjC++ Var(warn_dangling_else) Warning LangEnabledBy(C ObjC C++ ObjC++,Wparentheses) Warn about dangling else. +Wdangling-pointer +C ObjC C++ LTO ObjC++ Alias(Wdangling-pointer=, 2, 0) Warning +Warn for uses of pointers to auto variables whose lifetime has ended. 
+ +Wdangling-pointer= +C ObjC C++ ObjC++ Joined RejectNegative UInteger Var(warn_dangling_pointer) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall, 2, 0) IntegerRange(0, 2) +Warn for uses of pointers to auto variables whose lifetime has ended. + Wdate-time C ObjC C++ ObjC++ CPP(warn_date_time) CppReason(CPP_W_DATE_TIME) Var(cpp_warn_date_time) Init(0) Warning Warn about __TIME__, __DATE__ and __TIMESTAMP__ usage. diff --git a/gcc/diagnostic-spec.c b/gcc/diagnostic-spec.c index c9e1c1be91d..a8af229d677 100644 --- a/gcc/diagnostic-spec.c +++ b/gcc/diagnostic-spec.c @@ -99,6 +99,7 @@ nowarn_spec_t::nowarn_spec_t (opt_code opt) m_bits = NW_UNINIT; break; + case OPT_Wdangling_pointer_: case OPT_Wreturn_local_addr: case OPT_Wuse_after_free_: m_bits = NW_DANGLING; diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 121c8ea827f..7f2205e4a85 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -341,7 +341,8 @@ Objective-C and Objective-C++ Dialects}. -Wchar-subscripts @gol -Wclobbered -Wcomment @gol -Wconversion -Wno-coverage-mismatch -Wno-cpp @gol --Wdangling-else -Wdate-time @gol +-Wdangling-else -Wdangling-pointer -Wdangling-pointer=@var{n} @gol +-Wdate-time @gol -Wno-deprecated -Wno-deprecated-declarations -Wno-designated-init @gol -Wdisabled-optimization @gol -Wno-discarded-array-qualifiers -Wno-discarded-qualifiers @gol @@ -4389,6 +4390,8 @@ Warn about overriding virtual functions that are not marked with the @opindex Wno-use-after-free Warn about uses of pointers to dynamically allocated objects that have been rendered indeterminate by a call to a deallocation function. +The warning is enabled at all optimization levels but may yield different +results with optimization than without. @table @gcctabopt @item -Wuse-after-free=1 @@ -5714,6 +5717,7 @@ Options} and @ref{Objective-C and Objective-C++ Dialect Options}. 
-Wcatch-value @r{(C++ and Objective-C++ only)} @gol -Wchar-subscripts @gol -Wcomment @gol +-Wdangling-pointer=2 @gol -Wduplicate-decl-specifier @r{(C and Objective-C only)} @gol -Wenum-compare @r{(in C/ObjC; this is on by default in C++)} @gol -Wformat @gol @@ -8587,6 +8591,62 @@ looks like this: This warning is enabled by @option{-Wparentheses}. +@item -Wdangling-pointer +@itemx -Wdangling-pointer=@var{n} +@opindex Wdangling-pointer +@opindex Wno-dangling-pointer +Warn about uses of pointers (or C++ references) to objects with automatic +storage duration after their lifetime has ended. This includes local +variables declared in nested blocks, compound literals and other unnamed +temporary objects. In addition, warn about storing the address of such +objects in escaped pointers. The warning is enabled at all optimization +levels but may yield different results with optimization than without. + +@table @gcctabopt +@item -Wdangling-pointer=1 +At level 1 the warning diagnoses only unconditional uses of dangling pointers. +For example +@smallexample +int f (int c1, int c2, x) +@{ + char *p = strchr ((char[])@{ c1, c2 @}, c3); + return p ? *p : 'x'; // warning: dangling pointer to a compound literal +@} +@end smallexample +In the following function the store of the address of the local variable +@code{x} in the escaped pointer @code{*p} also triggers the warning. +@smallexample +void g (int **p) +@{ + int x = 7; + *p = &x; // warning: storing the address of a local variable in *p +@} +@end smallexample + +@item -Wdangling-pointer=2 +At level 2, in addition to unconditional uses the warning also diagnoses +conditional uses of dangling pointers. + +For example, because the array @var{a} in the following function is out of +scope when the pointer @var{s} that was set to point is used, the warning +triggers at this level. 
+ +@smallexample +void f (char *s) +@{ + if (!s) + @{ + char a[12] = "tmpname"; + s = a; + @} + strcat (s, ".tmp"); // warning: dangling pointer to a may be used + ... +@} +@end smallexample +@end table + +@option{-Wdangling-pointer=2} is included in @option{-Wall}. + @item -Wdate-time @opindex Wdate-time @opindex Wno-date-time diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc index 882129143a1..f639807a78a 100644 --- a/gcc/gimple-ssa-warn-access.cc +++ b/gcc/gimple-ssa-warn-access.cc @@ -2069,10 +2069,12 @@ class pass_waccess : public gimple_opt_pass ~pass_waccess (); - opt_pass *clone () { return new pass_waccess (m_ctxt); } + opt_pass *clone (); virtual bool gate (function *); + void set_pass_param (unsigned, bool); + virtual unsigned int execute (function *); private: @@ -2089,6 +2091,9 @@ private: /* Check a call to an ordinary function for invalid accesses. */ bool check_call_access (gcall *); + /* Check a non-call statement. */ + void check_stmt (gimple *); + /* Check statements in a basic block. */ void check_block (basic_block); @@ -2112,26 +2117,41 @@ private: void check_atomic_memmodel (gimple *, tree, tree, const unsigned char *); /* Check for uses of indeterminate pointers. */ - void check_pointer_uses (gimple *, tree); + void check_pointer_uses (gimple *, tree, tree = NULL_TREE, bool = false); /* Return the argument that a call returns. */ tree gimple_call_return_arg (gcall *); + tree gimple_call_return_arg_ref (gcall *); + + /* Check a call for uses of a dangling pointer arguments. */ + void check_call_dangling (gcall *); + + /* Check uses of a dangling pointer or those derived from it. 
*/ + void check_dangling_uses (tree, tree, bool = false, bool = false); + void check_dangling_uses (); + void check_dangling_stores (); + void check_dangling_stores (basic_block, hash_set<tree> &, auto_bitmap &); - void warn_invalid_pointer (tree, gimple *, gimple *, bool, bool = false); + void warn_invalid_pointer (tree, gimple *, gimple *, tree, bool, bool = false); /* Return true if use follows an invalidating statement. */ - bool use_after_inval_p (gimple *, gimple *); + bool use_after_inval_p (gimple *, gimple *, bool = false); /* A pointer_query object and its cache to store information about pointers and their targets in. */ pointer_query m_ptr_qry; pointer_query::cache_type m_var_cache; - + /* Mapping from DECLs and their clobber statements in the function. */ + hash_map<tree, gimple *> m_clobbers; /* A bit is set for each basic block whose statements have been assigned valid UIDs. */ bitmap m_bb_uids_set; /* The current function. */ function *m_func; + /* True to run checks for uses of dangling pointers. */ + bool m_check_dangling_p; + /* True to run checks early on in the optimization pipeline. */ + bool m_early_checks_p; }; /* Construct the pass. */ @@ -2140,11 +2160,22 @@ pass_waccess::pass_waccess (gcc::context *ctxt) : gimple_opt_pass (pass_data_waccess, ctxt), m_ptr_qry (NULL, &m_var_cache), m_var_cache (), + m_clobbers (), m_bb_uids_set (), - m_func () + m_func (), + m_check_dangling_p (), + m_early_checks_p () { } +/* Return a copy of the pass with RUN_NUMBER one greater than THIS. */ + +opt_pass* +pass_waccess::clone () +{ + return new pass_waccess (m_ctxt); +} + /* Release pointer_query cache. */ pass_waccess::~pass_waccess () @@ -2152,6 +2183,14 @@ pass_waccess::~pass_waccess () m_ptr_qry.flush_cache (); } +void +pass_waccess::set_pass_param (unsigned int n, bool early) +{ + gcc_assert (n == 0); + + m_early_checks_p = early; +} + /* Return true when any checks performed by the pass are enabled. 
*/ bool @@ -2340,6 +2379,9 @@ maybe_warn_alloc_args_overflow (gimple *stmt, const tree args[2], void pass_waccess::check_alloca (gcall *stmt) { + if (m_early_checks_p) + return; + if ((warn_vla_limit >= HOST_WIDE_INT_MAX && warn_alloc_size_limit < warn_vla_limit) || (warn_alloca_limit >= HOST_WIDE_INT_MAX @@ -2361,6 +2403,13 @@ pass_waccess::check_alloca (gcall *stmt) void pass_waccess::check_alloc_size_call (gcall *stmt) { + if (m_early_checks_p) + return; + + if (gimple_call_num_args (stmt) < 1) + /* Avoid invalid calls to functions without a prototype. */ + return; + tree fndecl = gimple_call_fndecl (stmt); if (fndecl && gimple_call_builtin_p (stmt, BUILT_IN_NORMAL)) { @@ -2413,6 +2462,9 @@ pass_waccess::check_alloc_size_call (gcall *stmt) void pass_waccess::check_strcat (gcall *stmt) { + if (m_early_checks_p) + return; + if (!warn_stringop_overflow && !warn_stringop_overread) return; @@ -2438,6 +2490,9 @@ pass_waccess::check_strcat (gcall *stmt) void pass_waccess::check_strncat (gcall *stmt) { + if (m_early_checks_p) + return; + if (!warn_stringop_overflow && !warn_stringop_overread) return; @@ -2507,6 +2562,9 @@ pass_waccess::check_strncat (gcall *stmt) void pass_waccess::check_stxcpy (gcall *stmt) { + if (m_early_checks_p) + return; + tree dst = call_arg (stmt, 0); tree src = call_arg (stmt, 1); @@ -2545,7 +2603,7 @@ pass_waccess::check_stxcpy (gcall *stmt) void pass_waccess::check_stxncpy (gcall *stmt) { - if (!warn_stringop_overflow) + if (m_early_checks_p || !warn_stringop_overflow) return; tree dst = call_arg (stmt, 0); @@ -2569,7 +2627,7 @@ pass_waccess::check_stxncpy (gcall *stmt) void pass_waccess::check_strncmp (gcall *stmt) { - if (!warn_stringop_overread) + if (m_early_checks_p || !warn_stringop_overread) return; tree arg1 = call_arg (stmt, 0); @@ -2674,6 +2732,9 @@ pass_waccess::check_strncmp (gcall *stmt) void pass_waccess::check_memop_access (gimple *stmt, tree dest, tree src, tree size) { + if (m_early_checks_p) + return; + /* For functions like 
memset and memcpy that operate on raw memory try to determine the size of the largest source and destination object using type-0 Object Size regardless of the object size @@ -2695,7 +2756,7 @@ pass_waccess::check_read_access (gimple *stmt, tree src, tree bound /* = NULL_TREE */, int ost /* = 1 */) { - if (!warn_stringop_overread) + if (m_early_checks_p || !warn_stringop_overread) return; if (bound && !useless_type_conversion_p (size_type_node, TREE_TYPE (bound))) @@ -2938,7 +2999,7 @@ pass_waccess::check_atomic_memmodel (gimple *stmt, tree ord_sucs, if (warning_suppressed_p (stmt, OPT_Winvalid_memory_model)) return; - if (maybe_warn_memmodel (stmt, ord_sucs, ord_fail, valid)) + if (!maybe_warn_memmodel (stmt, ord_sucs, ord_fail, valid)) return; suppress_warning (stmt, OPT_Winvalid_memory_model); @@ -3094,11 +3155,12 @@ pass_waccess::check_builtin (gcall *stmt) case BUILT_IN_FREE: case BUILT_IN_REALLOC: - { - tree arg = call_arg (stmt, 0); - if (TREE_CODE (arg) == SSA_NAME) - check_pointer_uses (stmt, arg); - } + if (!m_early_checks_p) + { + tree arg = call_arg (stmt, 0); + if (TREE_CODE (arg) == SSA_NAME) + check_pointer_uses (stmt, arg); + } return true; case BUILT_IN_GETTEXT: @@ -3725,16 +3787,67 @@ pass_waccess::maybe_check_dealloc_call (gcall *call) /* Return true if either USE_STMT's basic block (that of a pointer's use) is dominated by INVAL_STMT's (that of a pointer's invalidating statement, - or if they're in the same block, USE_STMT follows INVAL_STMT. */ + which is either a clobber or a deallocation call), or if they're in + the same block, USE_STMT follows INVAL_STMT. */ bool -pass_waccess::use_after_inval_p (gimple *inval_stmt, gimple *use_stmt) +pass_waccess::use_after_inval_p (gimple *inval_stmt, gimple *use_stmt, + bool last_block /* = false */) { + tree clobvar = + gimple_clobber_p (inval_stmt) ? 
gimple_assign_lhs (inval_stmt) : NULL_TREE; + basic_block inval_bb = gimple_bb (inval_stmt); basic_block use_bb = gimple_bb (use_stmt); + if (!inval_bb || !use_bb) + return false; + if (inval_bb != use_bb) - return dominated_by_p (CDI_DOMINATORS, use_bb, inval_bb); + { + if (dominated_by_p (CDI_DOMINATORS, use_bb, inval_bb)) + return true; + + if (!clobvar || !last_block) + return false; + + /* Proceed only when looking for uses of dangling pointers. */ + auto gsi = gsi_for_stmt (use_stmt); + + auto_bitmap visited; + + /* A use statement in the last basic block in a function or one that + falls through to it is after any other prior clobber of the used + variable unless it's followed by a clobber of the same variable. */ + basic_block bb = use_bb; + while (bb != inval_bb + && single_succ_p (bb) + && !(single_succ_edge (bb)->flags & (EDGE_EH|EDGE_DFS_BACK))) + { + if (!bitmap_set_bit (visited, bb->index)) + /* Avoid cycles. */ + return true; + + for (; !gsi_end_p (gsi); gsi_next_nondebug (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + if (gimple_clobber_p (stmt)) + { + if (clobvar == gimple_assign_lhs (stmt)) + /* The use is followed by a clobber. */ + return false; + } + } + + bb = single_succ (bb); + gsi = gsi_start_bb (bb); + } + + /* The use is one of a dangling pointer if a clobber of the variable + [the pointer points to] has not been found before the function exit + point. */ + return bb == EXIT_BLOCK_PTR_FOR_FN (cfun); + } if (bitmap_set_bit (m_bb_uids_set, inval_bb->index)) /* The first time this basic block is visited assign increasing ids @@ -3752,27 +3865,30 @@ pass_waccess::use_after_inval_p (gimple *inval_stmt, gimple *use_stmt) return gimple_uid (inval_stmt) < gimple_uid (use_stmt); } -/* Issue a warning for the USE_STMT of pointer PTR rendered invalid - by INVAL_STMT. PTR may be null when it's been optimized away. - MAYBE is true to issue the "maybe" kind of warning. EQUALITY is - true when the pointer is used in an equality expression. 
*/ +/* Issue a warning for the USE_STMT of pointer or reference REF rendered + invalid by INVAL_STMT. REF may be null when it's been optimized away. + When nonnull, INVAL_STMT is the deallocation function that rendered + the pointer or reference dangling. Otherwise, VAR is the auto variable + (including an unnamed temporary such as a compound literal) whose + lifetime's rended it dangling. MAYBE is true to issue the "maybe" + kind of warning. EQUALITY is true when the pointer is used in + an equality expression. */ void -pass_waccess::warn_invalid_pointer (tree ptr, gimple *use_stmt, - gimple *inval_stmt, - bool maybe, - bool equality /* = false */) +pass_waccess::warn_invalid_pointer (tree ref, gimple *use_stmt, + gimple *inval_stmt, tree var, + bool maybe, bool equality /* = false */) { /* Avoid printing the unhelpful "<unknown>" in the diagnostics. */ - if (ptr && TREE_CODE (ptr) == SSA_NAME - && (!SSA_NAME_VAR (ptr) || DECL_ARTIFICIAL (SSA_NAME_VAR (ptr)))) - ptr = NULL_TREE; + if (ref && TREE_CODE (ref) == SSA_NAME + && (!SSA_NAME_VAR (ref) || DECL_ARTIFICIAL (SSA_NAME_VAR (ref)))) + ref = NULL_TREE; location_t use_loc = gimple_location (use_stmt); if (use_loc == UNKNOWN_LOCATION) { - use_loc = cfun->function_end_locus; - if (!ptr) + use_loc = m_func->function_end_locus; + if (!ref) /* Avoid issuing a warning with no context other than the function. That would make it difficult to debug in any but very simple cases. */ @@ -3788,12 +3904,12 @@ pass_waccess::warn_invalid_pointer (tree ptr, gimple *use_stmt, const tree inval_decl = gimple_call_fndecl (inval_stmt); - if ((ptr && warning_at (use_loc, OPT_Wuse_after_free, + if ((ref && warning_at (use_loc, OPT_Wuse_after_free, (maybe ? G_("pointer %qE may be used after %qD") : G_("pointer %qE used after %qD")), - ptr, inval_decl)) - || (!ptr && warning_at (use_loc, OPT_Wuse_after_free, + ref, inval_decl)) + || (!ref && warning_at (use_loc, OPT_Wuse_after_free, (maybe ? 
G_("pointer may be used after %qD") : G_("pointer used after %qD")), @@ -3805,6 +3921,52 @@ pass_waccess::warn_invalid_pointer (tree ptr, gimple *use_stmt, } return; } + + if ((maybe && warn_dangling_pointer < 2) + || warning_suppressed_p (use_stmt, OPT_Wdangling_pointer_)) + return; + + if (DECL_NAME (var)) + { + if ((ref + && warning_at (use_loc, OPT_Wdangling_pointer_, + (maybe + ? G_("dangling pointer %qE to %qD may be used") + : G_("using dangling pointer %qE to %qD")), + ref, var)) + || (!ref + && warning_at (use_loc, OPT_Wdangling_pointer_, + (maybe + ? G_("dangling pointer to %qD may be used") + : G_("using a dangling pointer to %qD")), + var))) + inform (DECL_SOURCE_LOCATION (var), + "%qD declared here", var); + suppress_warning (use_stmt, OPT_Wdangling_pointer_); + return; + } + + if ((ref + && warning_at (use_loc, OPT_Wdangling_pointer_, + (maybe + ? G_("dangling pointer %qE to an unnamed temporary " + "may be used") + : G_("using dangling pointer %qE to an unnamed " + "temporary")), + ref, var)) + || (!ref + && warning_at (use_loc, OPT_Wdangling_pointer_, + (maybe + ? G_("dangling pointer to an unnamed temporary " + "may be used") + : G_("using a dangling pointer to an unnamed " + "temporary")), + var))) + { + inform (DECL_SOURCE_LOCATION (var), + "unnamed temporary defined here"); + suppress_warning (use_stmt, OPT_Wdangling_pointer_); + } } /* If STMT is a call to either the standard realloc or to a user-defined @@ -3927,10 +4089,14 @@ pointers_related_p (gimple *stmt, tree p, tree q, pointer_query &qry) /* For a STMT either a call to a deallocation function or a clobber, warn for uses of the pointer PTR it was called with (including its copies - or others derived from it by pointer arithmetic). */ + or others derived from it by pointer arithmetic). If STMT is a clobber, + VAR is the decl of the clobbered variable. When MAYBE is true use + a "maybe" form of diagnostic. 
*/ void -pass_waccess::check_pointer_uses (gimple *stmt, tree ptr) +pass_waccess::check_pointer_uses (gimple *stmt, tree ptr, + tree var /* = NULL_TREE */, + bool maybe /* = false */) { gcc_assert (TREE_CODE (ptr) == SSA_NAME); @@ -4013,18 +4179,25 @@ pass_waccess::check_pointer_uses (gimple *stmt, tree ptr) /* Warn if USE_STMT is dominated by the deallocation STMT. Otherwise, add the pointer to POINTERS so that the uses of any other pointers derived from it can be checked. */ - if (use_after_inval_p (stmt, use_stmt)) + if (use_after_inval_p (stmt, use_stmt, check_dangling)) { - /* TODO: Handle PHIs but careful of false positives. */ - if (gimple_code (use_stmt) != GIMPLE_PHI) + if (gimple_code (use_stmt) == GIMPLE_PHI) { - basic_block use_bb = gimple_bb (use_stmt); - bool this_maybe - = !dominated_by_p (CDI_POST_DOMINATORS, use_bb, stmt_bb); - warn_invalid_pointer (*use_p->use, use_stmt, stmt, - this_maybe, equality); - continue; + tree lhs = gimple_phi_result (use_stmt); + if (TREE_CODE (lhs) == SSA_NAME) + { + pointers.safe_push (lhs); + continue; + } } + + basic_block use_bb = gimple_bb (use_stmt); + bool this_maybe + = (maybe + || !dominated_by_p (CDI_POST_DOMINATORS, use_bb, stmt_bb)); + warn_invalid_pointer (*use_p->use, use_stmt, stmt, var, + this_maybe, equality); + continue; } if (is_gimple_assign (use_stmt)) @@ -4059,26 +4232,100 @@ pass_waccess::check_call (gcall *stmt) if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL)) check_builtin (stmt); - if (tree callee = gimple_call_fndecl (stmt)) - { - /* Check for uses of the pointer passed to either a standard - or a user-defined deallocation function. 
*/ - unsigned argno = fndecl_dealloc_argno (callee); - if (argno < (unsigned) call_nargs (stmt)) - { - tree arg = call_arg (stmt, argno); - if (TREE_CODE (arg) == SSA_NAME) - check_pointer_uses (stmt, arg); - } - } + if (!m_early_checks_p) + if (tree callee = gimple_call_fndecl (stmt)) + { + /* Check for uses of the pointer passed to either a standard + or a user-defined deallocation function. */ + unsigned argno = fndecl_dealloc_argno (callee); + if (argno < (unsigned) call_nargs (stmt)) + { + tree arg = call_arg (stmt, argno); + if (TREE_CODE (arg) == SSA_NAME) + check_pointer_uses (stmt, arg); + } + } check_call_access (stmt); + check_call_dangling (stmt); + + if (m_early_checks_p) + return; maybe_check_dealloc_call (stmt); check_nonstring_args (stmt); } +/* Return true of X is a DECL with automatic storage duration. */ + +static inline bool +is_auto_decl (tree x) +{ + return DECL_P (x) && !DECL_EXTERNAL (x) && !TREE_STATIC (x); +} + +/* Check non-call STMT for invalid accesses. */ + +void +pass_waccess::check_stmt (gimple *stmt) +{ + if (m_check_dangling_p && gimple_clobber_p (stmt)) + { + /* Ignore clobber statemts in blocks with exceptional edges. */ + basic_block bb = gimple_bb (stmt); + edge e = EDGE_PRED (bb, 0); + if (e->flags & EDGE_EH) + return; + + tree var = gimple_assign_lhs (stmt); + m_clobbers.put (var, stmt); + return; + } + + if (is_gimple_assign (stmt)) + { + /* Clobbered unnamed temporaries such as compound literals can be + revived. Check for an assignment to one and remove it from + M_CLOBBERS. */ + tree lhs = gimple_assign_lhs (stmt); + while (handled_component_p (lhs)) + lhs = TREE_OPERAND (lhs, 0); + + if (is_auto_decl (lhs)) + m_clobbers.remove (lhs); + return; + } + + if (greturn *ret = dyn_cast <greturn *> (stmt)) + { + if (optimize && flag_isolate_erroneous_paths_dereference) + /* Avoid interfering with -Wreturn-local-addr (which runs only + with optimization enabled). 
*/ + return; + + tree arg = gimple_return_retval (ret); + if (!arg || TREE_CODE (arg) != ADDR_EXPR) + return; + + arg = TREE_OPERAND (arg, 0); + while (handled_component_p (arg)) + arg = TREE_OPERAND (arg, 0); + + if (!is_auto_decl (arg)) + return; + + gimple **pclobber = m_clobbers.get (arg); + if (!pclobber) + return; + + if (!use_after_inval_p (*pclobber, stmt)) + return; + + warn_invalid_pointer (NULL_TREE, stmt, *pclobber, arg, false); + } +} + /* Check basic block BB for invalid accesses. */ void @@ -4091,6 +4338,8 @@ pass_waccess::check_block (basic_block bb) gimple *stmt = gsi_stmt (si); if (gcall *call = dyn_cast <gcall *> (stmt)) check_call (call); + else + check_stmt (stmt); } } @@ -4139,6 +4388,262 @@ pass_waccess::gimple_call_return_arg (gcall *call) return gimple_call_arg (call, argno); } +/* Return the decl referenced by the argument that the call STMT to + a built-in function returns (including with an offset) or null if + it doesn't. */ + +tree +pass_waccess::gimple_call_return_arg_ref (gcall *call) +{ + if (tree arg = gimple_call_return_arg (call)) + { + access_ref aref; + if (m_ptr_qry.get_ref (arg, call, &aref, 0) + && DECL_P (aref.ref)) + return aref.ref; + } + + return NULL_TREE; +} + +/* Check for and diagnose all uses of the dangling pointer VAR to the auto + object DECL whose lifetime has ended. OBJREF is true when VAR denotes + an access to a DECL that may have been clobbered. 
*/ + +void +pass_waccess::check_dangling_uses (tree var, tree decl, bool maybe /* = false */, + bool objref /* = false */) +{ + if (!decl || !is_auto_decl (decl)) + return; + + gimple **pclob = m_clobbers.get (decl); + if (!pclob) + return; + + if (!objref) + { + check_pointer_uses (*pclob, var, decl, maybe); + return; + } + + gimple *use_stmt = SSA_NAME_DEF_STMT (var); + if (!use_after_inval_p (*pclob, use_stmt, true)) + return; + + basic_block use_bb = gimple_bb (use_stmt); + basic_block clob_bb = gimple_bb (*pclob); + maybe = maybe || !dominated_by_p (CDI_POST_DOMINATORS, use_bb, clob_bb); + warn_invalid_pointer (var, use_stmt, *pclob, decl, maybe, false); +} + +/* Diagnose stores in BB and (recursively) its predecessors of the addresses + of local variables into nonlocal pointers that are left dangling after + the function returns. BBS is a bitmap of basic blocks visited. */ + +void +pass_waccess::check_dangling_stores (basic_block bb, + hash_set<tree> &stores, + auto_bitmap &bbs) +{ + if (!bitmap_set_bit (bbs, bb->index)) + /* Avoid cycles. */ + return; + + /* Iterate backwards over the statements looking for a store of + the address of a local variable into a nonlocal pointer. */ + for (auto gsi = gsi_last_nondebug_bb (bb); ; gsi_prev_nondebug (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + if (!stmt) + break; + + if (is_gimple_call (stmt) + && !(gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE))) + /* Avoid looking before nonconst, nonpure calls since those might + use the escaped locals. 
*/ + return; + + if (!is_gimple_assign (stmt) || gimple_clobber_p (stmt)) + continue; + + access_ref lhs_ref; + tree lhs = gimple_assign_lhs (stmt); + if (!m_ptr_qry.get_ref (lhs, stmt, &lhs_ref, 0)) + continue; + + if (is_auto_decl (lhs_ref.ref)) + continue; + + if (DECL_P (lhs_ref.ref)) + { + if (!POINTER_TYPE_P (TREE_TYPE (lhs_ref.ref)) + || lhs_ref.deref > 0) + continue; + } + else if (TREE_CODE (lhs_ref.ref) == SSA_NAME) + { + /* Avoid looking at or before stores into unknown objects. */ + gimple *def_stmt = SSA_NAME_DEF_STMT (lhs_ref.ref); + if (!gimple_nop_p (def_stmt)) + return; + } + else if (TREE_CODE (lhs_ref.ref) == MEM_REF) + { + tree arg = TREE_OPERAND (lhs_ref.ref, 0); + if (TREE_CODE (arg) == SSA_NAME) + { + gimple *def_stmt = SSA_NAME_DEF_STMT (arg); + if (!gimple_nop_p (def_stmt)) + return; + } + } + else + continue; + + if (stores.add (lhs_ref.ref)) + continue; + + /* FIXME: Handle stores of alloca() and VLA. */ + access_ref rhs_ref; + tree rhs = gimple_assign_rhs1 (stmt); + if (!m_ptr_qry.get_ref (rhs, stmt, &rhs_ref, 0) + || rhs_ref.deref != -1) + continue; + + if (!is_auto_decl (rhs_ref.ref)) + continue; + + location_t loc = gimple_location (stmt); + if (warning_at (loc, OPT_Wdangling_pointer_, + "storing the address of local variable %qD in %qE", + rhs_ref.ref, lhs)) + { + location_t loc = DECL_SOURCE_LOCATION (rhs_ref.ref); + inform (loc, "%qD declared here", rhs_ref.ref); + + if (DECL_P (lhs_ref.ref)) + loc = DECL_SOURCE_LOCATION (lhs_ref.ref); + else if (EXPR_HAS_LOCATION (lhs_ref.ref)) + loc = EXPR_LOCATION (lhs_ref.ref); + + if (loc != UNKNOWN_LOCATION) + inform (loc, "%qE declared here", lhs_ref.ref); + } + } + + edge e; + edge_iterator ei; + FOR_EACH_EDGE (e, ei, bb->preds) + { + basic_block pred = e->src; + check_dangling_stores (pred, stores, bbs); + } +} + +/* Diagnose stores of the addresses of local variables into nonlocal + pointers that are left dangling after the function returns. 
*/ + +void +pass_waccess::check_dangling_stores () +{ + auto_bitmap bbs; + hash_set<tree> stores; + check_dangling_stores (EXIT_BLOCK_PTR_FOR_FN (m_func), stores, bbs); +} + +/* Check for and diagnose uses of dangling pointers to auto objects + whose lifetime has ended. */ + +void +pass_waccess::check_dangling_uses () +{ + tree var; + unsigned i; + FOR_EACH_SSA_NAME (i, var, m_func) + { + /* For each SSA_NAME pointer VAR find the DECL it points to. + If the DECL is a clobbered local variable, check to see + if any of VAR's uses (or those of other pointers derived + from VAR) happens after the clobber. If so, warn. */ + tree decl = NULL_TREE; + + gimple *def_stmt = SSA_NAME_DEF_STMT (var); + if (is_gimple_assign (def_stmt)) + { + tree rhs = gimple_assign_rhs1 (def_stmt); + if (TREE_CODE (rhs) == ADDR_EXPR) + { + if (!POINTER_TYPE_P (TREE_TYPE (var))) + continue; + decl = TREE_OPERAND (rhs, 0); + } + else + { + /* For other expressions, check the base DECL to see + if it's been clobbered, most likely as a result of </cut>
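The two patterns described in the commit's invoke.texi text (a conditional use of an out-of-scope array at level 2, and a store of a local's address through an escaped pointer at level 1) can be illustrated with a small standalone sketch. This is not code from the commit or from the kernel sources named in the build log. The flagged patterns appear only in comments, and `make_tmp_name` and `store_value` are hypothetical names chosen here to show one possible way of restructuring such code so that no pointer outlives the object it refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Pattern the documentation gives for -Wdangling-pointer=2 (kept in a
 * comment so this file compiles cleanly):
 *
 *     void f (char *s)
 *     {
 *       if (!s)
 *         {
 *           char a[12] = "tmpname";
 *           s = a;               // a goes out of scope at the closing brace
 *         }
 *       strcat (s, ".tmp");      // s may point to the dead array a
 *     }
 *
 * Hypothetical rewrite: the caller owns the output buffer, so the default
 * name never needs to live in a nested block.
 * Returns 0 on success, -1 if the result does not fit in OUT.
 */
int make_tmp_name(const char *s, char *out, size_t outsz)
{
    assert(out != NULL && outsz > 0);

    const char *base = s ? s : "tmpname";
    int n = snprintf(out, outsz, "%s.tmp", base);
    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}

/*
 * Pattern the documentation gives for -Wdangling-pointer=1:
 *
 *     void g (int **p)
 *     {
 *       int x = 7;
 *       *p = &x;                 // address of a local escapes via *p
 *     }
 *
 * Hypothetical rewrite: store the value into an object owned by the
 * caller instead of publishing the address of a local variable.
 */
void store_value(int *p)
{
    assert(p != NULL);
    *p = 7;
}
```

A caller would pass its own storage, for example `char buf[32]; make_tmp_name(NULL, buf, sizeof buf);`, which leaves `buf` holding `tmpname.tmp`. Whether a given GCC build diagnoses the commented-out patterns depends on the compiler version and optimization level, as the documentation text in the diff notes.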

4 years, 1 month
