Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O2.

So far, this commit has regressed CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O2
Culprit:
<cut>
commit e00f189d392dd9bf95f6a98f05f2d341d06cd65c
Author: Roman Lebedev <lebedev.ri@gmail.com>
Date:   Mon Oct 5 22:35:59 2020 +0300
[InstCombine] Revert rL226781 "Teach InstCombine to canonicalize loads which are only ever stored to always use a legal integer type if one is available." (PR47592)
(it was introduced in https://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html)
This canonicalization seems dubious.
Most importantly, while it does not create `inttoptr` casts by itself, it may cause them to appear later; see e.g. D88788.
I think it's pretty obvious that this is an undesirable outcome: by now we've established that seemingly no-op `inttoptr`/`ptrtoint` casts are not no-ops, and we are no longer eager to look past them. Which e.g. means that given
```
%a = load i32
%b = inttoptr %a
%c = inttoptr %a
```
we likely won't be able to tell that `%b` and `%c` are the same thing.
As we can see in D88789 / D88788 / D88806 / D75505, we can't really teach SCEV about this (not without https://bugs.llvm.org/show_bug.cgi?id=47592 at least). And we can't recover the situation post-inlining in InstCombine.
So it really does look like this fold is actively breaking otherwise-good IR, in a way that is not recoverable. And that means this fold isn't helpful in exposing the patterns it produces to passes that are otherwise unaware of them.
Thus, I propose to simply not perform such a canonicalization. The original motivational RFC does not state what larger problem that canonicalization was trying to solve, so I'm not sure how this plays out in the larger picture.
On the vanilla llvm test-suite + RawSpeed, this results in an increase of asm instructions and final object size by ~+0.05%, and a decrease of the final count of bitcasts by -4.79% (-28990), of ptrtoint casts by -15.41% (-3423), and of inttoptr casts by -25.59% (-6919, *sic*). Overall, there are -0.04% fewer IR blocks and -0.39% fewer instructions.
See https://bugs.llvm.org/show_bug.cgi?id=47592
Differential Revision: https://reviews.llvm.org/D88789
</cut>
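To make the reverted fold concrete, here is a minimal LLVM IR sketch of what it did. The function names and bodies are illustrative only (they mirror the shape of the test updates in atomic.ll and load.ll further down) and are not taken from the commit itself:

```
; Input: a float load whose only user is a store.
define void @copy_float_before(float* %src, float* %dst) {
  %v = load float, float* %src, align 4
  store float %v, float* %dst, align 4
  ret void
}

; With the now-reverted canonicalization, InstCombine rewrote such
; load/store pairs to use the legal integer type of the same store size.
define void @copy_float_after(float* %src, float* %dst) {
  %s = bitcast float* %src to i32*
  %v = load i32, i32* %s, align 4
  %d = bitcast float* %dst to i32*
  store i32 %v, i32* %d, align 4
  ret void
}
```

For pointer-typed loads the same rewrite produced `load i64`/`store i64` (see pr27490a in atomic.ll below), and when the value was later needed as a pointer again it had to be recovered with `inttoptr` (see the PhaseOrdering/instcombine-sroa-inttoptr.ll changes below), which is the outcome the commit message objects to.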
Results regressed to (for first_bad == e00f189d392dd9bf95f6a98f05f2d341d06cd65c)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -O2 -- artifacts/build-e00f189d392dd9bf95f6a98f05f2d341d06cd65c/results_id: 1
# 470.lbm,lbm_base.default regressed by 111
# 470.lbm,[.] LBM_performStreamCollide regressed by 111
from (for last_good == 567462b48eba1c2d286ce97117994463f4535d2e)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -O2 -- artifacts/build-567462b48eba1c2d286ce97117994463f4535d2e/results_id: 1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release-...
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O2/1124
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release-...
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O2/1119
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release-...
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-e00f189d392dd9bf95f6a98f05f2d341d06cd65c
cd investigate-llvm-e00f189d392dd9bf95f6a98f05f2d341d06cd65c
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release-... --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release-... --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release-... --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
cd llvm
# Reproduce first_bad build
git checkout --detach e00f189d392dd9bf95f6a98f05f2d341d06cd65c
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 567462b48eba1c2d286ce97117994463f4535d2e
../artifacts/test.sh
cd ..
</cut>
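For a wider regression window, the same downloaded test script can in principle drive the bisection itself. This is a hypothetical sketch, assuming ../artifacts/test.sh exits 0 for a good build and non-zero for a bad one (the report does not state its exit-code convention):

```
cd llvm
# Mark the known-bad and known-good commits, then let git run the test script on each candidate.
git bisect start e00f189d392dd9bf95f6a98f05f2d341d06cd65c 567462b48eba1c2d286ce97117994463f4535d2e
git bisect run ../artifacts/test.sh
git bisect reset
cd ..
```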
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/c...
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release-...
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release-...
Full commit (up to 1000 lines):
<cut>
commit e00f189d392dd9bf95f6a98f05f2d341d06cd65c
Author: Roman Lebedev <lebedev.ri@gmail.com>
Date:   Mon Oct 5 22:35:59 2020 +0300
[InstCombine] Revert rL226781 "Teach InstCombine to canonicalize loads which are only ever stored to always use a legal integer type if one is available." (PR47592)
(it was introduced in https://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html)
This canonicalization seems dubious.
Most importantly, while it does not create `inttoptr` casts by itself, it may cause them to appear later; see e.g. D88788.
I think it's pretty obvious that this is an undesirable outcome: by now we've established that seemingly no-op `inttoptr`/`ptrtoint` casts are not no-ops, and we are no longer eager to look past them. Which e.g. means that given
```
%a = load i32
%b = inttoptr %a
%c = inttoptr %a
```
we likely won't be able to tell that `%b` and `%c` are the same thing.
As we can see in D88789 / D88788 / D88806 / D75505, we can't really teach SCEV about this (not without https://bugs.llvm.org/show_bug.cgi?id=47592 at least). And we can't recover the situation post-inlining in InstCombine.
So it really does look like this fold is actively breaking otherwise-good IR, in a way that is not recoverable. And that means this fold isn't helpful in exposing the patterns it produces to passes that are otherwise unaware of them.
Thus, I propose to simply not perform such a canonicalization. The original motivational RFC does not state what larger problem that canonicalization was trying to solve, so I'm not sure how this plays out in the larger picture.
On the vanilla llvm test-suite + RawSpeed, this results in an increase of asm instructions and final object size by ~+0.05%, and a decrease of the final count of bitcasts by -4.79% (-28990), of ptrtoint casts by -15.41% (-3423), and of inttoptr casts by -25.59% (-6919, *sic*). Overall, there are -0.04% fewer IR blocks and -0.39% fewer instructions.
See https://bugs.llvm.org/show_bug.cgi?id=47592
Differential Revision: https://reviews.llvm.org/D88789
---
 .../CodeGen/attr-arm-sve-vector-bits-bitcast.c     | 18 +++--
 clang/test/CodeGen/attr-arm-sve-vector-bits-call.c | 78 ++++++++++------------
 clang/test/CodeGen/attr-arm-sve-vector-bits-cast.c | 24 +++----
 .../CodeGen/attr-arm-sve-vector-bits-globals.c     |  6 +-
 .../InstCombine/InstCombineLoadStoreAlloca.cpp     | 34 ----------
 llvm/test/Transforms/InstCombine/atomic.ll         | 18 ++---
 llvm/test/Transforms/InstCombine/load.ll           | 44 ++++++------
 .../Transforms/InstCombine/loadstore-metadata.ll   | 19 +-----
 .../InstCombine/non-integral-pointers.ll           | 16 ++---
 .../PhaseOrdering/instcombine-sroa-inttoptr.ll     | 32 ++++-----
 10 files changed, 107 insertions(+), 182 deletions(-)
diff --git a/clang/test/CodeGen/attr-arm-sve-vector-bits-bitcast.c b/clang/test/CodeGen/attr-arm-sve-vector-bits-bitcast.c index 3a5628d7f57e..5df1e83b13bf 100644 --- a/clang/test/CodeGen/attr-arm-sve-vector-bits-bitcast.c +++ b/clang/test/CodeGen/attr-arm-sve-vector-bits-bitcast.c @@ -255,22 +255,20 @@ svbool_t read_bool(struct struct_bool *s) { // CHECK-256-NEXT: entry: // CHECK-256-NEXT: [[X_ADDR:%.*]] = alloca <vscale x 16 x i1>, align 16 // CHECK-256-NEXT: store <vscale x 16 x i1> [[X:%.*]], <vscale x 16 x i1>* [[X_ADDR]], align 16, [[TBAA15:!tbaa !.*]] -// CHECK-256-NEXT: [[TMP0:%.*]] = bitcast <vscale x 16 x i1>* [[X_ADDR]] to i32* -// CHECK-256-NEXT: [[TMP1:%.*]] = load i32, i32* [[TMP0]], align 16, [[TBAA6]] -// CHECK-256-NEXT: [[Y:%.*]] = getelementptr inbounds [[STRUCT_STRUCT_BOOL:%.*]], %struct.struct_bool* [[S:%.*]], i64 0, i32 1 -// CHECK-256-NEXT: [[TMP2:%.*]] = bitcast [3 x <4 x i8>]* [[Y]] to i32* -// CHECK-256-NEXT: store i32 [[TMP1]], i32* [[TMP2]], align 2, [[TBAA6]] +// CHECK-256-NEXT: [[TMP0:%.*]] = bitcast <vscale x 16 x i1>* [[X_ADDR]] to <4 x i8>* +// CHECK-256-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* [[TMP0]], align 16, [[TBAA6]] +// CHECK-256-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [[STRUCT_STRUCT_BOOL:%.*]], %struct.struct_bool* [[S:%.*]], i64 0, i32 1, i64 0 +// CHECK-256-NEXT: store <4 x i8> [[TMP1]], <4 x i8>* [[ARRAYIDX]], align 2, [[TBAA6]] // CHECK-256-NEXT: ret void // // CHECK-512-LABEL: @write_bool( // CHECK-512-NEXT: entry: // CHECK-512-NEXT: [[X_ADDR:%.*]] = alloca <vscale x 16 x i1>, align 16 // CHECK-512-NEXT: store <vscale x 16 x i1> [[X:%.*]], <vscale x 16 x i1>* [[X_ADDR]], align 16, [[TBAA15:!tbaa !.*]] -// CHECK-512-NEXT: [[TMP0:%.*]] = bitcast <vscale x 16 x i1>* [[X_ADDR]] to i64* -// CHECK-512-NEXT: [[TMP1:%.*]] = load i64, i64* [[TMP0]], align 16, [[TBAA6]] -// CHECK-512-NEXT: [[Y:%.*]] = getelementptr inbounds [[STRUCT_STRUCT_BOOL:%.*]], %struct.struct_bool* [[S:%.*]], i64 0, i32 1 -// CHECK-512-NEXT: [[TMP2:%.*]] = bitcast [3 x <8 x i8>]* [[Y]] to i64* -// CHECK-512-NEXT: store i64 [[TMP1]], i64* [[TMP2]], align 2, [[TBAA6]] +// CHECK-512-NEXT: [[TMP0:%.*]] = bitcast <vscale x 16 x i1>* [[X_ADDR]] to <8 x i8>* +// CHECK-512-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[TMP0]], align 16, [[TBAA6]] +// CHECK-512-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [[STRUCT_STRUCT_BOOL:%.*]], %struct.struct_bool* [[S:%.*]], i64 0, i32 1, i64 0 +// CHECK-512-NEXT: store <8 x i8> [[TMP1]], <8 x i8>* [[ARRAYIDX]], align 2, [[TBAA6]] // CHECK-512-NEXT: ret void // void write_bool(struct struct_bool *s, svbool_t x) { diff --git a/clang/test/CodeGen/attr-arm-sve-vector-bits-call.c b/clang/test/CodeGen/attr-arm-sve-vector-bits-call.c index 5442d58e96be..13979197999e 100644 --- a/clang/test/CodeGen/attr-arm-sve-vector-bits-call.c +++ b/clang/test/CodeGen/attr-arm-sve-vector-bits-call.c @@ -169,28 +169,24 @@ fixed_float64_t call_float64_ff(svbool_t pg, fixed_float64_t op1, fixed_float64_ // CHECK-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 16 x i1>, align 16 // CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i8>* [[OP1]] to <vscale x 16 x i1>* // CHECK-NEXT: store <vscale x 16 x i1> [[OP1_COERCE:%.*]], <vscale x 16 x i1>* [[TMP0]], align 16 -// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i8>* [[OP1]] to i64* -// CHECK-NEXT: [[OP113:%.*]] = load i64, i64* [[TMP1]], align 16, [[TBAA6]] -// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8>* [[OP2]] to <vscale x 16 x i1>* -// CHECK-NEXT: store <vscale x 16 x i1> [[OP2_COERCE:%.*]], <vscale x 16 x i1>* 
[[TMP2]], align 16 -// CHECK-NEXT: [[TMP3:%.*]] = bitcast <8 x i8>* [[OP2]] to i64* -// CHECK-NEXT: [[OP224:%.*]] = load i64, i64* [[TMP3]], align 16, [[TBAA6]] -// CHECK-NEXT: [[TMP4:%.*]] = bitcast <8 x i8>* [[OP1_ADDR]] to i64* -// CHECK-NEXT: store i64 [[OP113]], i64* [[TMP4]], align 16, [[TBAA6]] -// CHECK-NEXT: [[TMP5:%.*]] = bitcast <8 x i8>* [[OP2_ADDR]] to i64* -// CHECK-NEXT: store i64 [[OP224]], i64* [[TMP5]], align 16, [[TBAA6]] -// CHECK-NEXT: [[TMP6:%.*]] = bitcast <8 x i8>* [[OP1_ADDR]] to <vscale x 16 x i1>* -// CHECK-NEXT: [[TMP7:%.*]] = load <vscale x 16 x i1>, <vscale x 16 x i1>* [[TMP6]], align 16, [[TBAA6]] -// CHECK-NEXT: [[TMP8:%.*]] = bitcast <8 x i8>* [[OP2_ADDR]] to <vscale x 16 x i1>* -// CHECK-NEXT: [[TMP9:%.*]] = load <vscale x 16 x i1>, <vscale x 16 x i1>* [[TMP8]], align 16, [[TBAA6]] -// CHECK-NEXT: [[TMP10:%.*]] = call <vscale x 16 x i1> @llvm.aarch64.sve.sel.nxv16i1(<vscale x 16 x i1> [[PG:%.*]], <vscale x 16 x i1> [[TMP7]], <vscale x 16 x i1> [[TMP9]]) -// CHECK-NEXT: store <vscale x 16 x i1> [[TMP10]], <vscale x 16 x i1>* [[SAVED_CALL_RVALUE]], align 16, [[TBAA13:!tbaa !.*]] -// CHECK-NEXT: [[TMP11:%.*]] = bitcast <vscale x 16 x i1>* [[SAVED_CALL_RVALUE]] to i64* -// CHECK-NEXT: [[TMP12:%.*]] = load i64, i64* [[TMP11]], align 16, [[TBAA6]] -// CHECK-NEXT: [[TMP13:%.*]] = bitcast <vscale x 16 x i1>* [[RETVAL_COERCE]] to i64* -// CHECK-NEXT: store i64 [[TMP12]], i64* [[TMP13]], align 16 -// CHECK-NEXT: [[TMP14:%.*]] = load <vscale x 16 x i1>, <vscale x 16 x i1>* [[RETVAL_COERCE]], align 16 -// CHECK-NEXT: ret <vscale x 16 x i1> [[TMP14]] +// CHECK-NEXT: [[OP11:%.*]] = load <8 x i8>, <8 x i8>* [[OP1]], align 16, [[TBAA6]] +// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i8>* [[OP2]] to <vscale x 16 x i1>* +// CHECK-NEXT: store <vscale x 16 x i1> [[OP2_COERCE:%.*]], <vscale x 16 x i1>* [[TMP1]], align 16 +// CHECK-NEXT: [[OP22:%.*]] = load <8 x i8>, <8 x i8>* [[OP2]], align 16, [[TBAA6]] +// CHECK-NEXT: store <8 x i8> [[OP11]], <8 x i8>* [[OP1_ADDR]], align 16, [[TBAA6]] +// CHECK-NEXT: store <8 x i8> [[OP22]], <8 x i8>* [[OP2_ADDR]], align 16, [[TBAA6]] +// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8>* [[OP1_ADDR]] to <vscale x 16 x i1>* +// CHECK-NEXT: [[TMP3:%.*]] = load <vscale x 16 x i1>, <vscale x 16 x i1>* [[TMP2]], align 16, [[TBAA6]] +// CHECK-NEXT: [[TMP4:%.*]] = bitcast <8 x i8>* [[OP2_ADDR]] to <vscale x 16 x i1>* +// CHECK-NEXT: [[TMP5:%.*]] = load <vscale x 16 x i1>, <vscale x 16 x i1>* [[TMP4]], align 16, [[TBAA6]] +// CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 16 x i1> @llvm.aarch64.sve.sel.nxv16i1(<vscale x 16 x i1> [[PG:%.*]], <vscale x 16 x i1> [[TMP3]], <vscale x 16 x i1> [[TMP5]]) +// CHECK-NEXT: store <vscale x 16 x i1> [[TMP6]], <vscale x 16 x i1>* [[SAVED_CALL_RVALUE]], align 16, [[TBAA13:!tbaa !.*]] +// CHECK-NEXT: [[CASTFIXEDSVE:%.*]] = bitcast <vscale x 16 x i1>* [[SAVED_CALL_RVALUE]] to <8 x i8>* +// CHECK-NEXT: [[TMP7:%.*]] = load <8 x i8>, <8 x i8>* [[CASTFIXEDSVE]], align 16, [[TBAA6]] +// CHECK-NEXT: [[RETVAL_0__SROA_CAST:%.*]] = bitcast <vscale x 16 x i1>* [[RETVAL_COERCE]] to <8 x i8>* +// CHECK-NEXT: store <8 x i8> [[TMP7]], <8 x i8>* [[RETVAL_0__SROA_CAST]], align 16 +// CHECK-NEXT: [[TMP8:%.*]] = load <vscale x 16 x i1>, <vscale x 16 x i1>* [[RETVAL_COERCE]], align 16 +// CHECK-NEXT: ret <vscale x 16 x i1> [[TMP8]] // fixed_bool_t call_bool_ff(svbool_t pg, fixed_bool_t op1, fixed_bool_t op2) { return svsel(pg, op1, op2); @@ -260,20 +256,18 @@ fixed_float64_t call_float64_fs(svbool_t pg, fixed_float64_t op1, svfloat64_t op // 
CHECK-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 16 x i1>, align 16 // CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i8>* [[OP1]] to <vscale x 16 x i1>* // CHECK-NEXT: store <vscale x 16 x i1> [[OP1_COERCE:%.*]], <vscale x 16 x i1>* [[TMP0]], align 16 -// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i8>* [[OP1]] to i64* -// CHECK-NEXT: [[OP112:%.*]] = load i64, i64* [[TMP1]], align 16, [[TBAA6]] -// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8>* [[OP1_ADDR]] to i64* -// CHECK-NEXT: store i64 [[OP112]], i64* [[TMP2]], align 16, [[TBAA6]] -// CHECK-NEXT: [[TMP3:%.*]] = bitcast <8 x i8>* [[OP1_ADDR]] to <vscale x 16 x i1>* -// CHECK-NEXT: [[TMP4:%.*]] = load <vscale x 16 x i1>, <vscale x 16 x i1>* [[TMP3]], align 16, [[TBAA6]] -// CHECK-NEXT: [[TMP5:%.*]] = call <vscale x 16 x i1> @llvm.aarch64.sve.sel.nxv16i1(<vscale x 16 x i1> [[PG:%.*]], <vscale x 16 x i1> [[TMP4]], <vscale x 16 x i1> [[OP2:%.*]]) -// CHECK-NEXT: store <vscale x 16 x i1> [[TMP5]], <vscale x 16 x i1>* [[SAVED_CALL_RVALUE]], align 16, [[TBAA13]] -// CHECK-NEXT: [[TMP6:%.*]] = bitcast <vscale x 16 x i1>* [[SAVED_CALL_RVALUE]] to i64* -// CHECK-NEXT: [[TMP7:%.*]] = load i64, i64* [[TMP6]], align 16, [[TBAA6]] -// CHECK-NEXT: [[TMP8:%.*]] = bitcast <vscale x 16 x i1>* [[RETVAL_COERCE]] to i64* -// CHECK-NEXT: store i64 [[TMP7]], i64* [[TMP8]], align 16 -// CHECK-NEXT: [[TMP9:%.*]] = load <vscale x 16 x i1>, <vscale x 16 x i1>* [[RETVAL_COERCE]], align 16 -// CHECK-NEXT: ret <vscale x 16 x i1> [[TMP9]] +// CHECK-NEXT: [[OP11:%.*]] = load <8 x i8>, <8 x i8>* [[OP1]], align 16, [[TBAA6]] +// CHECK-NEXT: store <8 x i8> [[OP11]], <8 x i8>* [[OP1_ADDR]], align 16, [[TBAA6]] +// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i8>* [[OP1_ADDR]] to <vscale x 16 x i1>* +// CHECK-NEXT: [[TMP2:%.*]] = load <vscale x 16 x i1>, <vscale x 16 x i1>* [[TMP1]], align 16, [[TBAA6]] +// CHECK-NEXT: [[TMP3:%.*]] = call <vscale x 16 x i1> @llvm.aarch64.sve.sel.nxv16i1(<vscale x 16 x i1> [[PG:%.*]], <vscale x 16 x i1> [[TMP2]], <vscale x 16 x i1> [[OP2:%.*]]) +// CHECK-NEXT: store <vscale x 16 x i1> [[TMP3]], <vscale x 16 x i1>* [[SAVED_CALL_RVALUE]], align 16, [[TBAA13]] +// CHECK-NEXT: [[CASTFIXEDSVE:%.*]] = bitcast <vscale x 16 x i1>* [[SAVED_CALL_RVALUE]] to <8 x i8>* +// CHECK-NEXT: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[CASTFIXEDSVE]], align 16, [[TBAA6]] +// CHECK-NEXT: [[RETVAL_0__SROA_CAST:%.*]] = bitcast <vscale x 16 x i1>* [[RETVAL_COERCE]] to <8 x i8>* +// CHECK-NEXT: store <8 x i8> [[TMP4]], <8 x i8>* [[RETVAL_0__SROA_CAST]], align 16 +// CHECK-NEXT: [[TMP5:%.*]] = load <vscale x 16 x i1>, <vscale x 16 x i1>* [[RETVAL_COERCE]], align 16 +// CHECK-NEXT: ret <vscale x 16 x i1> [[TMP5]] // fixed_bool_t call_bool_fs(svbool_t pg, fixed_bool_t op1, svbool_t op2) { return svsel(pg, op1, op2); @@ -325,12 +319,12 @@ fixed_float64_t call_float64_ss(svbool_t pg, svfloat64_t op1, svfloat64_t op2) { // CHECK-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 16 x i1>, align 16 // CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 16 x i1> @llvm.aarch64.sve.sel.nxv16i1(<vscale x 16 x i1> [[PG:%.*]], <vscale x 16 x i1> [[OP1:%.*]], <vscale x 16 x i1> [[OP2:%.*]]) // CHECK-NEXT: store <vscale x 16 x i1> [[TMP0]], <vscale x 16 x i1>* [[SAVED_CALL_RVALUE]], align 16, [[TBAA13]] -// CHECK-NEXT: [[TMP1:%.*]] = bitcast <vscale x 16 x i1>* [[SAVED_CALL_RVALUE]] to i64* -// CHECK-NEXT: [[TMP2:%.*]] = load i64, i64* [[TMP1]], align 16, [[TBAA6]] -// CHECK-NEXT: [[TMP3:%.*]] = bitcast <vscale x 16 x i1>* [[RETVAL_COERCE]] to i64* -// CHECK-NEXT: store i64 [[TMP2]], i64* [[TMP3]], align 16 
-// CHECK-NEXT: [[TMP4:%.*]] = load <vscale x 16 x i1>, <vscale x 16 x i1>* [[RETVAL_COERCE]], align 16 -// CHECK-NEXT: ret <vscale x 16 x i1> [[TMP4]] +// CHECK-NEXT: [[CASTFIXEDSVE:%.*]] = bitcast <vscale x 16 x i1>* [[SAVED_CALL_RVALUE]] to <8 x i8>* +// CHECK-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[CASTFIXEDSVE]], align 16, [[TBAA6]] +// CHECK-NEXT: [[RETVAL_0__SROA_CAST:%.*]] = bitcast <vscale x 16 x i1>* [[RETVAL_COERCE]] to <8 x i8>* +// CHECK-NEXT: store <8 x i8> [[TMP1]], <8 x i8>* [[RETVAL_0__SROA_CAST]], align 16 +// CHECK-NEXT: [[TMP2:%.*]] = load <vscale x 16 x i1>, <vscale x 16 x i1>* [[RETVAL_COERCE]], align 16 +// CHECK-NEXT: ret <vscale x 16 x i1> [[TMP2]] // fixed_bool_t call_bool_ss(svbool_t pg, svbool_t op1, svbool_t op2) { return svsel(pg, op1, op2); diff --git a/clang/test/CodeGen/attr-arm-sve-vector-bits-cast.c b/clang/test/CodeGen/attr-arm-sve-vector-bits-cast.c index 17267d6038e4..8568c820ae6f 100644 --- a/clang/test/CodeGen/attr-arm-sve-vector-bits-cast.c +++ b/clang/test/CodeGen/attr-arm-sve-vector-bits-cast.c @@ -81,13 +81,11 @@ fixed_float64_t from_svfloat64_t(svfloat64_t type) { // CHECK-NEXT: [[TYPE_ADDR:%.*]] = alloca <8 x i8>, align 16 // CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i8>* [[TYPE]] to <vscale x 16 x i1>* // CHECK-NEXT: store <vscale x 16 x i1> [[TYPE_COERCE:%.*]], <vscale x 16 x i1>* [[TMP0]], align 16 -// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i8>* [[TYPE]] to i64* -// CHECK-NEXT: [[TYPE12:%.*]] = load i64, i64* [[TMP1]], align 16, [[TBAA6]] -// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8>* [[TYPE_ADDR]] to i64* -// CHECK-NEXT: store i64 [[TYPE12]], i64* [[TMP2]], align 16, [[TBAA6]] -// CHECK-NEXT: [[TMP3:%.*]] = bitcast <8 x i8>* [[TYPE_ADDR]] to <vscale x 16 x i1>* -// CHECK-NEXT: [[TMP4:%.*]] = load <vscale x 16 x i1>, <vscale x 16 x i1>* [[TMP3]], align 16, [[TBAA6]] -// CHECK-NEXT: ret <vscale x 16 x i1> [[TMP4]] +// CHECK-NEXT: [[TYPE1:%.*]] = load <8 x i8>, <8 x i8>* [[TYPE]], align 16, [[TBAA6]] +// CHECK-NEXT: store <8 x i8> [[TYPE1]], <8 x i8>* [[TYPE_ADDR]], align 16, [[TBAA6]] +// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i8>* [[TYPE_ADDR]] to <vscale x 16 x i1>* +// CHECK-NEXT: [[TMP2:%.*]] = load <vscale x 16 x i1>, <vscale x 16 x i1>* [[TMP1]], align 16, [[TBAA6]] +// CHECK-NEXT: ret <vscale x 16 x i1> [[TMP2]] // svbool_t to_svbool_t(fixed_bool_t type) { return type; @@ -98,12 +96,12 @@ svbool_t to_svbool_t(fixed_bool_t type) { // CHECK-NEXT: [[TYPE_ADDR:%.*]] = alloca <vscale x 16 x i1>, align 16 // CHECK-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 16 x i1>, align 16 // CHECK-NEXT: store <vscale x 16 x i1> [[TYPE:%.*]], <vscale x 16 x i1>* [[TYPE_ADDR]], align 16, [[TBAA13:!tbaa !.*]] -// CHECK-NEXT: [[TMP0:%.*]] = bitcast <vscale x 16 x i1>* [[TYPE_ADDR]] to i64* -// CHECK-NEXT: [[TMP1:%.*]] = load i64, i64* [[TMP0]], align 16, [[TBAA6]] -// CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 16 x i1>* [[RETVAL_COERCE]] to i64* -// CHECK-NEXT: store i64 [[TMP1]], i64* [[TMP2]], align 16 -// CHECK-NEXT: [[TMP3:%.*]] = load <vscale x 16 x i1>, <vscale x 16 x i1>* [[RETVAL_COERCE]], align 16 -// CHECK-NEXT: ret <vscale x 16 x i1> [[TMP3]] +// CHECK-NEXT: [[TMP0:%.*]] = bitcast <vscale x 16 x i1>* [[TYPE_ADDR]] to <8 x i8>* +// CHECK-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[TMP0]], align 16, [[TBAA6]] +// CHECK-NEXT: [[RETVAL_0__SROA_CAST:%.*]] = bitcast <vscale x 16 x i1>* [[RETVAL_COERCE]] to <8 x i8>* +// CHECK-NEXT: store <8 x i8> [[TMP1]], <8 x i8>* [[RETVAL_0__SROA_CAST]], align 16 +// CHECK-NEXT: [[TMP2:%.*]] = load 
<vscale x 16 x i1>, <vscale x 16 x i1>* [[RETVAL_COERCE]], align 16 +// CHECK-NEXT: ret <vscale x 16 x i1> [[TMP2]] // fixed_bool_t from_svbool_t(svbool_t type) { return type; diff --git a/clang/test/CodeGen/attr-arm-sve-vector-bits-globals.c b/clang/test/CodeGen/attr-arm-sve-vector-bits-globals.c index 5babb9c7c410..8f0d3c9f97e7 100644 --- a/clang/test/CodeGen/attr-arm-sve-vector-bits-globals.c +++ b/clang/test/CodeGen/attr-arm-sve-vector-bits-globals.c @@ -72,9 +72,9 @@ void write_global_bf16(svbfloat16_t v) { global_bf16 = v; } // CHECK-512-NEXT: entry: // CHECK-512-NEXT: [[V_ADDR:%.*]] = alloca <vscale x 16 x i1>, align 16 // CHECK-512-NEXT: store <vscale x 16 x i1> [[V:%.*]], <vscale x 16 x i1>* [[V_ADDR]], align 16, [[TBAA13:!tbaa !.*]] -// CHECK-512-NEXT: [[TMP0:%.*]] = bitcast <vscale x 16 x i1>* [[V_ADDR]] to i64* -// CHECK-512-NEXT: [[TMP1:%.*]] = load i64, i64* [[TMP0]], align 16, [[TBAA10]] -// CHECK-512-NEXT: store i64 [[TMP1]], i64* bitcast (<8 x i8>* @global_bool to i64*), align 2, [[TBAA10]] +// CHECK-512-NEXT: [[TMP0:%.*]] = bitcast <vscale x 16 x i1>* [[V_ADDR]] to <8 x i8>* +// CHECK-512-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[TMP0]], align 16, [[TBAA10]] +// CHECK-512-NEXT: store <8 x i8> [[TMP1]], <8 x i8>* @global_bool, align 2, [[TBAA10]] // CHECK-512-NEXT: ret void // void write_global_bool(svbool_t v) { global_bool = v; } diff --git a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp index 328cbde4da67..b7bb34022ecc 100644 --- a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp +++ b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp @@ -554,42 +554,8 @@ static Instruction *combineLoadToOperationType(InstCombinerImpl &IC, if (LI.getPointerOperand()->isSwiftError()) return nullptr;
- Type *Ty = LI.getType(); const DataLayout &DL = IC.getDataLayout();
- // Try to canonicalize loads which are only ever stored to operate over - // integers instead of any other type. We only do this when the loaded type - // is sized and has a size exactly the same as its store size and the store - // size is a legal integer type. - // Do not perform canonicalization if minmax pattern is found (to avoid - // infinite loop). - Type *Dummy; - if (!Ty->isIntegerTy() && Ty->isSized() && !isa<ScalableVectorType>(Ty) && - DL.isLegalInteger(DL.getTypeStoreSizeInBits(Ty)) && - DL.typeSizeEqualsStoreSize(Ty) && !DL.isNonIntegralPointerType(Ty) && - !isMinMaxWithLoads(InstCombiner::peekThroughBitcast( - LI.getPointerOperand(), /*OneUseOnly=*/true), - Dummy)) { - if (all_of(LI.users(), [&LI](User *U) { - auto *SI = dyn_cast<StoreInst>(U); - return SI && SI->getPointerOperand() != &LI && - !SI->getPointerOperand()->isSwiftError(); - })) { - LoadInst *NewLoad = IC.combineLoadToNewType( - LI, Type::getIntNTy(LI.getContext(), DL.getTypeStoreSizeInBits(Ty))); - // Replace all the stores with stores of the newly loaded value. - for (auto UI = LI.user_begin(), UE = LI.user_end(); UI != UE;) { - auto *SI = cast<StoreInst>(*UI++); - IC.Builder.SetInsertPoint(SI); - combineStoreToNewValue(IC, *SI, NewLoad); - IC.eraseInstFromFunction(*SI); - } - assert(LI.use_empty() && "Failed to remove all users of the load!"); - // Return the old load so the combiner can delete it safely. - return &LI; - } - } - // Fold away bit casts of the loaded value by loading the desired type. // We can do this for BitCastInsts as well as casts from and to pointer types, // as long as those are noops (i.e., the source or dest type have the same diff --git a/llvm/test/Transforms/InstCombine/atomic.ll b/llvm/test/Transforms/InstCombine/atomic.ll index 382d23d15309..fc134bcd7e4b 100644 --- a/llvm/test/Transforms/InstCombine/atomic.ll +++ b/llvm/test/Transforms/InstCombine/atomic.ll @@ -325,11 +325,9 @@ declare void @clobber()
define i32 @test18(float* %p) { ; CHECK-LABEL: @test18( -; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[P:%.*]] to i32* -; CHECK-NEXT: [[X1:%.*]] = load atomic i32, i32* [[TMP1]] unordered, align 4 +; CHECK-NEXT: [[X:%.*]] = load atomic float, float* [[P:%.*]] unordered, align 4 ; CHECK-NEXT: call void @clobber() -; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[P]] to i32* -; CHECK-NEXT: store atomic i32 [[X1]], i32* [[TMP2]] unordered, align 4 +; CHECK-NEXT: store atomic float [[X]], float* [[P]] unordered, align 4 ; CHECK-NEXT: ret i32 0 ; %x = load atomic float, float* %p unordered, align 4 @@ -376,10 +374,8 @@ define i32 @test21(i32** %p, i8* %v) {
define void @pr27490a(i8** %p1, i8** %p2) { ; CHECK-LABEL: @pr27490a( -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8** [[P1:%.*]] to i64* -; CHECK-NEXT: [[L1:%.*]] = load i64, i64* [[TMP1]], align 8 -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8** [[P2:%.*]] to i64* -; CHECK-NEXT: store volatile i64 [[L1]], i64* [[TMP2]], align 8 +; CHECK-NEXT: [[L:%.*]] = load i8*, i8** [[P1:%.*]], align 8 +; CHECK-NEXT: store volatile i8* [[L]], i8** [[P2:%.*]], align 8 ; CHECK-NEXT: ret void ; %l = load i8*, i8** %p1 @@ -389,10 +385,8 @@ define void @pr27490a(i8** %p1, i8** %p2) {
define void @pr27490b(i8** %p1, i8** %p2) { ; CHECK-LABEL: @pr27490b( -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8** [[P1:%.*]] to i64* -; CHECK-NEXT: [[L1:%.*]] = load i64, i64* [[TMP1]], align 8 -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8** [[P2:%.*]] to i64* -; CHECK-NEXT: store atomic i64 [[L1]], i64* [[TMP2]] seq_cst, align 8 +; CHECK-NEXT: [[L:%.*]] = load i8*, i8** [[P1:%.*]], align 8 +; CHECK-NEXT: store atomic i8* [[L]], i8** [[P2:%.*]] seq_cst, align 8 ; CHECK-NEXT: ret void ; %l = load i8*, i8** %p1 diff --git a/llvm/test/Transforms/InstCombine/load.ll b/llvm/test/Transforms/InstCombine/load.ll index 4fc87219fa83..032da41de6a5 100644 --- a/llvm/test/Transforms/InstCombine/load.ll +++ b/llvm/test/Transforms/InstCombine/load.ll @@ -205,18 +205,16 @@ define i8 @test15(i8 %x, i32 %y) { define void @test16(i8* %x, i8* %a, i8* %b, i8* %c) { ; CHECK-LABEL: @test16( ; CHECK-NEXT: entry: -; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[X:%.*]] to i32* -; CHECK-NEXT: [[X11:%.*]] = load i32, i32* [[TMP0]], align 4 -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[A:%.*]] to i32* -; CHECK-NEXT: store i32 [[X11]], i32* [[TMP1]], align 4 -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[B:%.*]] to i32* -; CHECK-NEXT: store i32 [[X11]], i32* [[TMP2]], align 4 -; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[X]] to i32* -; CHECK-NEXT: [[X22:%.*]] = load i32, i32* [[TMP3]], align 4 -; CHECK-NEXT: [[TMP4:%.*]] = bitcast i8* [[B]] to i32* -; CHECK-NEXT: store i32 [[X22]], i32* [[TMP4]], align 4 -; CHECK-NEXT: [[TMP5:%.*]] = bitcast i8* [[C:%.*]] to i32* -; CHECK-NEXT: store i32 [[X22]], i32* [[TMP5]], align 4 +; CHECK-NEXT: [[X_CAST:%.*]] = bitcast i8* [[X:%.*]] to float* +; CHECK-NEXT: [[A_CAST:%.*]] = bitcast i8* [[A:%.*]] to float* +; CHECK-NEXT: [[B_CAST:%.*]] = bitcast i8* [[B:%.*]] to float* +; CHECK-NEXT: [[X1:%.*]] = load float, float* [[X_CAST]], align 4 +; CHECK-NEXT: store float [[X1]], float* [[A_CAST]], align 4 +; CHECK-NEXT: store float [[X1]], float* [[B_CAST]], align 4 +; CHECK-NEXT: [[X2:%.*]] = load float, float* [[X_CAST]], align 4 +; CHECK-NEXT: store float [[X2]], float* [[B_CAST]], align 4 +; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[C:%.*]] to float* +; CHECK-NEXT: store float [[X2]], float* [[TMP0]], align 4 ; CHECK-NEXT: ret void ; entry: @@ -240,18 +238,16 @@ entry: define void @test16-vect(i8* %x, i8* %a, i8* %b, i8* %c) { ; CHECK-LABEL: @test16-vect( ; CHECK-NEXT: entry: -; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[X:%.*]] to i32* -; CHECK-NEXT: [[X11:%.*]] = load i32, i32* [[TMP0]], align 4 -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[A:%.*]] to i32* -; CHECK-NEXT: store i32 [[X11]], i32* [[TMP1]], align 4 -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[B:%.*]] to i32* -; CHECK-NEXT: store i32 [[X11]], i32* [[TMP2]], align 4 -; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[X]] to i32* -; CHECK-NEXT: [[X22:%.*]] = load i32, i32* [[TMP3]], align 4 -; CHECK-NEXT: [[TMP4:%.*]] = bitcast i8* [[B]] to i32* -; CHECK-NEXT: store i32 [[X22]], i32* [[TMP4]], align 4 -; CHECK-NEXT: [[TMP5:%.*]] = bitcast i8* [[C:%.*]] to i32* -; CHECK-NEXT: store i32 [[X22]], i32* [[TMP5]], align 4 +; CHECK-NEXT: [[X_CAST:%.*]] = bitcast i8* [[X:%.*]] to <4 x i8>* +; CHECK-NEXT: [[A_CAST:%.*]] = bitcast i8* [[A:%.*]] to <4 x i8>* +; CHECK-NEXT: [[B_CAST:%.*]] = bitcast i8* [[B:%.*]] to <4 x i8>* +; CHECK-NEXT: [[X1:%.*]] = load <4 x i8>, <4 x i8>* [[X_CAST]], align 4 +; CHECK-NEXT: store <4 x i8> [[X1]], <4 x i8>* [[A_CAST]], align 4 +; CHECK-NEXT: store <4 x i8> [[X1]], <4 x i8>* [[B_CAST]], align 4 +; CHECK-NEXT: [[X2:%.*]] = load <4 
x i8>, <4 x i8>* [[X_CAST]], align 4 +; CHECK-NEXT: store <4 x i8> [[X2]], <4 x i8>* [[B_CAST]], align 4 +; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[C:%.*]] to <4 x i8>* +; CHECK-NEXT: store <4 x i8> [[X2]], <4 x i8>* [[TMP0]], align 4 ; CHECK-NEXT: ret void ; entry: diff --git a/llvm/test/Transforms/InstCombine/loadstore-metadata.ll b/llvm/test/Transforms/InstCombine/loadstore-metadata.ll index e443c6ed0075..42f6900c2ab5 100644 --- a/llvm/test/Transforms/InstCombine/loadstore-metadata.ll +++ b/llvm/test/Transforms/InstCombine/loadstore-metadata.ll @@ -161,24 +161,11 @@ exit: }
define void @test_load_cast_combine_nonnull(float** %ptr) { -; We can't preserve nonnull metadata when converting a load of a pointer to -; a load of an integer. Instead, we translate it to range metadata. -; FIXME: We should also transform range metadata back into nonnull metadata. -; FIXME: This test is very fragile. If any LABEL lines are added after -; this point, the test will fail, because this test depends on a metadata tuple, -; which is always emitted at the end of the file. At some point, we should -; consider an option to the IR printer to emit MD tuples after the function -; that first uses them--this will allow us to refer to them like this and not -; have the tests break. For now, this function must always come last in this -; file, and no LABEL lines are to be added after this point. -; ; CHECK-LABEL: @test_load_cast_combine_nonnull( ; CHECK-NEXT: entry: -; CHECK-NEXT: [[TMP0:%.*]] = bitcast float** [[PTR:%.*]] to i64* -; CHECK-NEXT: [[P1:%.*]] = load i64, i64* [[TMP0]], align 8, !range ![[MD:[0-9]+]] +; CHECK-NEXT: [[P:%.*]] = load float*, float** [[PTR:%.*]], align 8, !nonnull !7 ; CHECK-NEXT: [[GEP:%.*]] = getelementptr float*, float** [[PTR]], i64 42 -; CHECK-NEXT: [[TMP1:%.*]] = bitcast float** [[GEP]] to i64* -; CHECK-NEXT: store i64 [[P1]], i64* [[TMP1]], align 8 +; CHECK-NEXT: store float* [[P]], float** [[GEP]], align 8 ; CHECK-NEXT: ret void ; entry: @@ -188,8 +175,6 @@ entry: ret void }
-; This is the metadata tuple that we reference above: -; CHECK: ![[MD]] = !{i64 1, i64 0} !0 = !{!1, !1, i64 0} !1 = !{!"scalar type", !2} !2 = !{!"root"} diff --git a/llvm/test/Transforms/InstCombine/non-integral-pointers.ll b/llvm/test/Transforms/InstCombine/non-integral-pointers.ll index e8f0013604a9..a1166bf49193 100644 --- a/llvm/test/Transforms/InstCombine/non-integral-pointers.ll +++ b/llvm/test/Transforms/InstCombine/non-integral-pointers.ll @@ -41,10 +41,8 @@ define void @f_3(i8 addrspace(3)** %ptr0, i8 addrspace(3)** %ptr1) { ; integers, since pointers in address space 3 are integral. ; CHECK-LABEL: @f_3( ; CHECK-NEXT: entry: -; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8 addrspace(3)** [[PTR0:%.*]] to i64* -; CHECK-NEXT: [[VAL1:%.*]] = load i64, i64* [[TMP0]], align 8 -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 addrspace(3)** [[PTR1:%.*]] to i64* -; CHECK-NEXT: store i64 [[VAL1]], i64* [[TMP1]], align 8 +; CHECK-NEXT: [[VAL:%.*]] = load i8 addrspace(3)*, i8 addrspace(3)** [[PTR0:%.*]], align 8 +; CHECK-NEXT: store i8 addrspace(3)* [[VAL]], i8 addrspace(3)** [[PTR1:%.*]], align 8 ; CHECK-NEXT: ret void ; entry: @@ -79,13 +77,13 @@ define i64 @g(i8 addrspace(4)** %gp) {
define i64 @g2(i8* addrspace(4)* %gp) { ; CHECK-LABEL: @g2( -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* addrspace(4)* [[GP:%.*]] to i64 addrspace(4)* -; CHECK-NEXT: [[DOTPRE1:%.*]] = load i64, i64 addrspace(4)* [[TMP1]], align 8 +; CHECK-NEXT: [[DOTPRE:%.*]] = load i8*, i8* addrspace(4)* [[GP:%.*]], align 8 ; CHECK-NEXT: [[V74:%.*]] = call i8 addrspace(4)* @alloc() ; CHECK-NEXT: [[V77:%.*]] = getelementptr i8, i8 addrspace(4)* [[V74]], i64 -8 -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 addrspace(4)* [[V77]] to i64 addrspace(4)* -; CHECK-NEXT: store i64 [[DOTPRE1]], i64 addrspace(4)* [[TMP2]], align 8 -; CHECK-NEXT: ret i64 [[DOTPRE1]] +; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 addrspace(4)* [[V77]] to i8* addrspace(4)* +; CHECK-NEXT: store i8* [[DOTPRE]], i8* addrspace(4)* [[TMP1]], align 8 +; CHECK-NEXT: [[V81_CAST:%.*]] = ptrtoint i8* [[DOTPRE]] to i64 +; CHECK-NEXT: ret i64 [[V81_CAST]] ; %.pre = load i8*, i8* addrspace(4)* %gp, align 8 %v74 = call i8 addrspace(4)* @alloc() diff --git a/llvm/test/Transforms/PhaseOrdering/instcombine-sroa-inttoptr.ll b/llvm/test/Transforms/PhaseOrdering/instcombine-sroa-inttoptr.ll index 6de0282e9448..3308a0ecc7fa 100644 --- a/llvm/test/Transforms/PhaseOrdering/instcombine-sroa-inttoptr.ll +++ b/llvm/test/Transforms/PhaseOrdering/instcombine-sroa-inttoptr.ll @@ -50,10 +50,10 @@ target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16 define dso_local void @_Z3gen1S(%0* noalias sret align 8 %arg, %0* byval(%0) align 8 %arg1) { ; CHECK-LABEL: @_Z3gen1S( ; CHECK-NEXT: bb: -; CHECK-NEXT: [[TMP0:%.*]] = bitcast %0* [[ARG1:%.*]] to i64* -; CHECK-NEXT: [[I21:%.*]] = load i64, i64* [[TMP0]], align 8 -; CHECK-NEXT: [[TMP1:%.*]] = bitcast %0* [[ARG:%.*]] to i64* -; CHECK-NEXT: store i64 [[I21]], i64* [[TMP1]], align 8 +; CHECK-NEXT: [[I:%.*]] = getelementptr inbounds [[TMP0:%.*]], %0* [[ARG1:%.*]], i64 0, i32 0 +; CHECK-NEXT: [[I2:%.*]] = load i32*, i32** [[I]], align 8 +; CHECK-NEXT: [[I3:%.*]] = getelementptr inbounds [[TMP0]], %0* [[ARG:%.*]], i64 0, i32 0 +; CHECK-NEXT: store i32* [[I2]], i32** [[I3]], align 8 ; CHECK-NEXT: ret void ; bb: @@ -68,13 +68,12 @@ define dso_local i32* @_Z3foo1S(%0* byval(%0) align 8 %arg) { ; CHECK-LABEL: @_Z3foo1S( ; CHECK-NEXT: bb: ; CHECK-NEXT: [[I2:%.*]] = alloca [[TMP0:%.*]], align 8 -; CHECK-NEXT: [[I1_SROA_0_0_I5_SROA_CAST:%.*]] = bitcast %0* [[ARG:%.*]] to i64* -; CHECK-NEXT: [[I1_SROA_0_0_COPYLOAD:%.*]] = load i64, i64* [[I1_SROA_0_0_I5_SROA_CAST]], align 8 -; CHECK-NEXT: [[I_SROA_0_0_I6_SROA_CAST:%.*]] = bitcast %0* [[I2]] to i64* -; CHECK-NEXT: store i64 [[I1_SROA_0_0_COPYLOAD]], i64* [[I_SROA_0_0_I6_SROA_CAST]], align 8 +; CHECK-NEXT: [[I1_SROA_0_0_I5_SROA_IDX:%.*]] = getelementptr inbounds [[TMP0]], %0* [[ARG:%.*]], i64 0, i32 0 +; CHECK-NEXT: [[I1_SROA_0_0_COPYLOAD:%.*]] = load i32*, i32** [[I1_SROA_0_0_I5_SROA_IDX]], align 8 +; CHECK-NEXT: [[I_SROA_0_0_I6_SROA_IDX:%.*]] = getelementptr inbounds [[TMP0]], %0* [[I2]], i64 0, i32 0 +; CHECK-NEXT: store i32* [[I1_SROA_0_0_COPYLOAD]], i32** [[I_SROA_0_0_I6_SROA_IDX]], align 8 ; CHECK-NEXT: tail call void @_Z7escape01S(%0* nonnull byval(%0) align 8 [[I2]]) -; CHECK-NEXT: [[TMP0]] = inttoptr i64 [[I1_SROA_0_0_COPYLOAD]] to i32* -; CHECK-NEXT: ret i32* [[TMP0]] +; CHECK-NEXT: ret i32* [[I1_SROA_0_0_COPYLOAD]] ; bb: %i = alloca %0, align 8 @@ -108,24 +107,21 @@ declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) define dso_local i32* @_Z3bar1S(%0* byval(%0) align 8 %arg) { ; CHECK-LABEL: @_Z3bar1S( ; CHECK-NEXT: bb: -; CHECK-NEXT: 
[[I1_SROA_0_0_I4_SROA_CAST:%.*]] = bitcast %0* [[ARG:%.*]] to i64* -; CHECK-NEXT: [[I1_SROA_0_0_COPYLOAD:%.*]] = load i64, i64* [[I1_SROA_0_0_I4_SROA_CAST]], align 8 +; CHECK-NEXT: [[I1_SROA_0_0_I4_SROA_IDX:%.*]] = getelementptr inbounds [[TMP0:%.*]], %0* [[ARG:%.*]], i64 0, i32 0 +; CHECK-NEXT: [[I1_SROA_0_0_COPYLOAD:%.*]] = load i32*, i32** [[I1_SROA_0_0_I4_SROA_IDX]], align 8 ; CHECK-NEXT: [[I5:%.*]] = tail call i32 @_Z4condv() ; CHECK-NEXT: [[I6_NOT:%.*]] = icmp eq i32 [[I5]], 0 ; CHECK-NEXT: br i1 [[I6_NOT]], label [[BB10:%.*]], label [[BB7:%.*]] ; CHECK: bb7: ; CHECK-NEXT: tail call void @_Z5sync0v() -; CHECK-NEXT: [[TMP0:%.*]] = inttoptr i64 [[I1_SROA_0_0_COPYLOAD]] to i32* -; CHECK-NEXT: tail call void @_Z7escape0Pi(i32* [[TMP0]]) +; CHECK-NEXT: tail call void @_Z7escape0Pi(i32* [[I1_SROA_0_0_COPYLOAD]]) ; CHECK-NEXT: br label [[BB13:%.*]] ; CHECK: bb10: ; CHECK-NEXT: tail call void @_Z5sync1v() -; CHECK-NEXT: [[TMP1:%.*]] = inttoptr i64 [[I1_SROA_0_0_COPYLOAD]] to i32* -; CHECK-NEXT: tail call void @_Z7escape1Pi(i32* [[TMP1]]) +; CHECK-NEXT: tail call void @_Z7escape1Pi(i32* [[I1_SROA_0_0_COPYLOAD]]) ; CHECK-NEXT: br label [[BB13]] ; CHECK: bb13: -; CHECK-NEXT: [[DOTPRE_PHI:%.*]] = phi i32* [ [[TMP1]], [[BB10]] ], [ [[TMP0]], [[BB7]] ] -; CHECK-NEXT: ret i32* [[DOTPRE_PHI]] +; CHECK-NEXT: ret i32* [[I1_SROA_0_0_COPYLOAD]] ; bb: %i = alloca %0, align 8 </cut>