linaro-toolchain August 2021

linaro-toolchain@lists.linaro.org

12 participants
87 discussions

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tk1/llvm-release-arm-spec2k6-O2_LTO - Build # 10 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O2_LTO. So far, this commit has regressed CI configurations: - tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O2_LTO Culprit: <cut> commit 34eb0adaa9cd76f5bb0549f5a32d6ef452fe977e Author: peter klausler <pklausler(a)nvidia.com> Date: Tue Feb 2 10:51:14 2021 -0800 [flang] Add -fsyntax-only to f18; retain -fparse-only synonym Now that semantics is working, the standard -fsyntax-only option of GNU and Clang should be used as the name of the option that causes f18 to just run the front-end. Support both options in f18, silently accepting the old option as a synonym for the new one (as preferred by the code owner), and replace all instances of the old -fparse-only option with -fsyntax-only throughout the source base. Differential Revision: https://reviews.llvm.org/D95887 </cut> Results regressed to (for first_bad == 34eb0adaa9cd76f5bb0549f5a32d6ef452fe977e) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O2_LTO_marm artifacts/build-34eb0adaa9cd76f5bb0549f5a32d6ef452fe977e/results_id: 1 # 445.gobmk,gobmk_base.default regressed by 103 from (for last_good == 81b69879c946533c71cc484bd8d9202bf1e34bfe) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O2_LTO_marm artifacts/build-81b69879c946533c71cc484bd8d9202bf1e34bfe/results_id: 1 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O2_LTO/3985 Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O2_LTO/3981 Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Configuration details: Reproduce builds: <cut> mkdir investigate-llvm-34eb0adaa9cd76f5bb0549f5a32d6ef452fe977e cd investigate-llvm-34eb0adaa9cd76f5bb0549f5a32d6ef452fe977e git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach 34eb0adaa9cd76f5bb0549f5a32d6ef452fe977e ../artifacts/test.sh # Reproduce last_good build git checkout --detach 81b69879c946533c71cc484bd8d9202bf1e34bfe ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Full commit (up to 1000 lines): <cut> commit 34eb0adaa9cd76f5bb0549f5a32d6ef452fe977e Author: peter klausler <pklausler(a)nvidia.com> Date: Tue Feb 2 10:51:14 2021 -0800 [flang] Add -fsyntax-only to f18; retain -fparse-only synonym Now that semantics is working, the standard -fsyntax-only option of GNU and Clang should be used as the name of the option that causes f18 to just run the front-end. Support both options in f18, silently accepting the old option as a synonym for the new one (as preferred by the code owner), and replace all instances of the old -fparse-only option with -fsyntax-only throughout the source base. Differential Revision: https://reviews.llvm.org/D95887 --- flang/docs/ImplementingASemanticCheck.md | 2 +- flang/docs/Overview.md | 4 ++-- flang/test/Evaluate/test_folding.sh | 2 +- flang/test/Flang-Driver/parse-error.f95 | 2 +- flang/test/Flang-Driver/syntax-only.f90 | 2 +- flang/test/Frontend/prescanner-diag.f90 | 2 +- flang/test/Lower/pre-fir-tree01.f90 | 2 +- flang/test/Lower/pre-fir-tree02.f90 | 2 +- flang/test/Lower/pre-fir-tree03.f90 | 2 +- flang/test/Lower/pre-fir-tree04.f90 | 2 +- flang/test/Lower/pre-fir-tree05.f90 | 2 +- flang/test/Semantics/call17.f90 | 2 +- flang/test/Semantics/data05.f90 | 2 +- flang/test/Semantics/data08.f90 | 2 +- flang/test/Semantics/data09.f90 | 2 +- flang/test/Semantics/empty.f90 | 4 ++-- flang/test/Semantics/final02.f90 | 2 +- flang/test/Semantics/getdefinition01.f90 | 8 ++++---- flang/test/Semantics/getdefinition02.f | 6 +++--- flang/test/Semantics/getdefinition03-a.f90 | 4 ++-- flang/test/Semantics/getdefinition04.f90 | 2 +- flang/test/Semantics/getdefinition05.f90 | 4 ++-- flang/test/Semantics/getsymbols01.f90 | 2 +- flang/test/Semantics/getsymbols02.f90 | 6 +++--- flang/test/Semantics/getsymbols03-a.f90 | 2 +- flang/test/Semantics/getsymbols04.f90 | 2 +- flang/test/Semantics/getsymbols05.f90 | 2 +- flang/test/Semantics/missing_newline.f90 | 4 ++-- flang/test/Semantics/mod-file-rewriter.f90 | 8 ++++---- flang/test/Semantics/modifiable01.f90 | 2 +- flang/test/Semantics/offsets01.f90 | 2 +- flang/test/Semantics/offsets02.f90 | 2 +- flang/test/Semantics/offsets03.f90 | 2 +- flang/test/Semantics/oldparam01.f90 | 2 +- flang/test/Semantics/oldparam02.f90 | 2 +- flang/test/Semantics/oldparam03.f90 | 2 +- flang/test/Semantics/resolve100.f90 | 2 +- flang/test/Semantics/rewrite01.f90 | 2 +- flang/test/Semantics/test_errors.sh | 2 +- flang/test/Semantics/test_modfile.sh | 2 +- flang/test/Semantics/typeinfo01.f90 | 2 +- flang/tools/f18-parse-demo/f18-parse-demo.cpp | 10 +++++----- flang/tools/f18/CMakeLists.txt | 2 +- flang/tools/f18/f18.cpp | 10 +++++----- 44 files changed, 67 insertions(+), 67 deletions(-) diff --git a/flang/docs/ImplementingASemanticCheck.md b/flang/docs/ImplementingASemanticCheck.md index 35b107e4988e..4a6fe133f21f 100644 --- a/flang/docs/ImplementingASemanticCheck.md +++ b/flang/docs/ImplementingASemanticCheck.md @@ -67,7 +67,7 @@ of the call to `intentOutFunc()`: I also used this program to produce a parse tree for the program using the command: ```bash - f18 -fdebug-dump-parse-tree -fparse-only testfun.f90 + f18 -fdebug-dump-parse-tree -fsyntax-only testfun.f90 ``` Here's the relevant fragment of the parse tree produced by the compiler: diff --git a/flang/docs/Overview.md b/flang/docs/Overview.md index 987858943845..4ef04f865083 100644 --- a/flang/docs/Overview.md +++ b/flang/docs/Overview.md @@ -46,7 +46,7 @@ See: [Preprocessing.md](Preprocessing.md). **Entry point:** `parser::Parsing::Parse` **Command:** - - `f18 -fdebug-dump-parse-tree -fparse-only src.f90` dumps the parse tree + - `f18 -fdebug-dump-parse-tree -fsyntax-only src.f90` dumps the parse tree - `f18 -funparse src.f90` converts the parse tree to normalized Fortran ## Validate Labels and Canonicalize Do Statements @@ -74,7 +74,7 @@ See: [Preprocessing.md](Preprocessing.md). **Entry points:** `semantics::ResolveNames`, `semantics::RewriteParseTree` -**Command:** `f18 -fdebug-dump-symbols -fparse-only src.f90` dumps the +**Command:** `f18 -fdebug-dump-symbols -fsyntax-only src.f90` dumps the tree of scopes and symbols in each scope ## Check DO CONCURRENT Constraints diff --git a/flang/test/Evaluate/test_folding.sh b/flang/test/Evaluate/test_folding.sh index 81ecbea55274..951ef36ecd2c 100755 --- a/flang/test/Evaluate/test_folding.sh +++ b/flang/test/Evaluate/test_folding.sh @@ -32,7 +32,7 @@ temp=$1 mkdir -p $temp shift -CMD="$* -fdebug-dump-symbols -fparse-only" +CMD="$* -fdebug-dump-symbols -fsyntax-only" # Check if tests should assume folding is using libpgmath if [[ $LIBPGMATH ]]; then diff --git a/flang/test/Flang-Driver/parse-error.f95 b/flang/test/Flang-Driver/parse-error.f95 index 84a63665659d..8266d042df02 100644 --- a/flang/test/Flang-Driver/parse-error.f95 +++ b/flang/test/Flang-Driver/parse-error.f95 @@ -1,5 +1,5 @@ ! RUN: not %flang-new -fc1 -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix=ERROR -! RUN: not %f18 -parse-only %s 2>&1 | FileCheck %s --check-prefix=ERROR +! RUN: not %f18 -syntax-only %s 2>&1 | FileCheck %s --check-prefix=ERROR ! REQUIRES: new-flang-driver diff --git a/flang/test/Flang-Driver/syntax-only.f90 b/flang/test/Flang-Driver/syntax-only.f90 index f04dd713aeab..0eed3cca2fe9 100644 --- a/flang/test/Flang-Driver/syntax-only.f90 +++ b/flang/test/Flang-Driver/syntax-only.f90 @@ -1,5 +1,5 @@ ! RUN: not %flang-new -fc1 -fsyntax-only %s 2>&1 | FileCheck %s -! RUN: not %f18 -fparse-only %s 2>&1 | FileCheck %s +! RUN: not %f18 -fsyntax-only %s 2>&1 | FileCheck %s ! REQUIRES: new-flang-driver diff --git a/flang/test/Frontend/prescanner-diag.f90 b/flang/test/Frontend/prescanner-diag.f90 index 4c7e6e369beb..26536092b41b 100644 --- a/flang/test/Frontend/prescanner-diag.f90 +++ b/flang/test/Frontend/prescanner-diag.f90 @@ -6,7 +6,7 @@ ! RUN: %flang-new -fc1 -E -I %S/Inputs/ %s 2>&1 | FileCheck %s ! Test with -fsyntax-only (i.e. ParseSyntaxOnlyAction, stops after semantic checks) -! RUN: %f18 -fparse-only -I %S/Inputs/ %s 2>&1 | FileCheck %s +! RUN: %f18 -fsyntax-only -I %S/Inputs/ %s 2>&1 | FileCheck %s ! RUN: %flang-new -fsyntax-only -I %S/Inputs/ %s 2>&1 | FileCheck %s ! RUN: %flang-new -fc1 -fsyntax-only -I %S/Inputs/ %s 2>&1 | FileCheck %s diff --git a/flang/test/Lower/pre-fir-tree01.f90 b/flang/test/Lower/pre-fir-tree01.f90 index 6b27add4659f..5e59ff784f97 100644 --- a/flang/test/Lower/pre-fir-tree01.f90 +++ b/flang/test/Lower/pre-fir-tree01.f90 @@ -1,4 +1,4 @@ -! RUN: %f18 -fdebug-pre-fir-tree -fparse-only %s | FileCheck %s +! RUN: %f18 -fdebug-pre-fir-tree -fsyntax-only %s | FileCheck %s ! Test structure of the Pre-FIR tree diff --git a/flang/test/Lower/pre-fir-tree02.f90 b/flang/test/Lower/pre-fir-tree02.f90 index 2d50a9394985..1db49605d98d 100644 --- a/flang/test/Lower/pre-fir-tree02.f90 +++ b/flang/test/Lower/pre-fir-tree02.f90 @@ -1,4 +1,4 @@ -! RUN: %f18 -fdebug-pre-fir-tree -fparse-only %s | FileCheck %s +! RUN: %f18 -fdebug-pre-fir-tree -fsyntax-only %s | FileCheck %s ! Test Pre-FIR Tree captures all the intended nodes from the parse-tree ! Coarray and OpenMP related nodes are tested in other files. diff --git a/flang/test/Lower/pre-fir-tree03.f90 b/flang/test/Lower/pre-fir-tree03.f90 index 1c8651b64f83..efc923a3fe84 100644 --- a/flang/test/Lower/pre-fir-tree03.f90 +++ b/flang/test/Lower/pre-fir-tree03.f90 @@ -1,4 +1,4 @@ -! RUN: %f18 -fdebug-pre-fir-tree -fparse-only -fopenmp %s | FileCheck %s +! RUN: %f18 -fdebug-pre-fir-tree -fsyntax-only -fopenmp %s | FileCheck %s ! Test Pre-FIR Tree captures OpenMP related constructs diff --git a/flang/test/Lower/pre-fir-tree04.f90 b/flang/test/Lower/pre-fir-tree04.f90 index 34212fbb1ff0..3f2beaf5fb47 100644 --- a/flang/test/Lower/pre-fir-tree04.f90 +++ b/flang/test/Lower/pre-fir-tree04.f90 @@ -1,4 +1,4 @@ -! RUN: %f18 -fdebug-pre-fir-tree -fparse-only %s | FileCheck %s +! RUN: %f18 -fdebug-pre-fir-tree -fsyntax-only %s | FileCheck %s ! Test Pre-FIR Tree captures all the coarray related statements diff --git a/flang/test/Lower/pre-fir-tree05.f90 b/flang/test/Lower/pre-fir-tree05.f90 index 98af5c2de944..3acc38b20d35 100644 --- a/flang/test/Lower/pre-fir-tree05.f90 +++ b/flang/test/Lower/pre-fir-tree05.f90 @@ -1,4 +1,4 @@ -! RUN: %f18 -fdebug-pre-fir-tree -fparse-only -fopenacc %s | FileCheck %s +! RUN: %f18 -fdebug-pre-fir-tree -fsyntax-only -fopenacc %s | FileCheck %s ! Test structure of the Pre-FIR tree with OpenACC construct diff --git a/flang/test/Semantics/call17.f90 b/flang/test/Semantics/call17.f90 index 1f4d2c4d9186..ace626037dd8 100644 --- a/flang/test/Semantics/call17.f90 +++ b/flang/test/Semantics/call17.f90 @@ -1,4 +1,4 @@ -! RUN: %f18 -fparse-only $s 2>&1 | FileCheck %s +! RUN: %f18 -fsyntax-only $s 2>&1 | FileCheck %s ! Regression test: don't emit a bogus error about an invalid specification expression ! in the declaration of a binding diff --git a/flang/test/Semantics/data05.f90 b/flang/test/Semantics/data05.f90 index a138b067942e..f3c7aa9be6e3 100644 --- a/flang/test/Semantics/data05.f90 +++ b/flang/test/Semantics/data05.f90 @@ -1,4 +1,4 @@ -!RUN: %f18 -fdebug-dump-symbols -fparse-only %s | FileCheck %s +!RUN: %f18 -fdebug-dump-symbols -fsyntax-only %s | FileCheck %s module m interface integer function ifunc(n) diff --git a/flang/test/Semantics/data08.f90 b/flang/test/Semantics/data08.f90 index 86a87f90163f..4040ac8e148d 100644 --- a/flang/test/Semantics/data08.f90 +++ b/flang/test/Semantics/data08.f90 @@ -1,4 +1,4 @@ -! RUN: %f18 -fdebug-dump-symbols -fparse-only %s 2>&1 | FileCheck %s +! RUN: %f18 -fdebug-dump-symbols -fsyntax-only %s 2>&1 | FileCheck %s ! CHECK: DATA statement value initializes 'jx' of type 'INTEGER(4)' with CHARACTER ! CHECK: DATA statement value initializes 'jy' of type 'INTEGER(4)' with CHARACTER ! CHECK: DATA statement value initializes 'jz' of type 'INTEGER(4)' with CHARACTER diff --git a/flang/test/Semantics/data09.f90 b/flang/test/Semantics/data09.f90 index 1550787b0d4c..4510b830ef7f 100644 --- a/flang/test/Semantics/data09.f90 +++ b/flang/test/Semantics/data09.f90 @@ -1,4 +1,4 @@ -! RUN: %f18 -fparse-only -fdebug-dump-symbols %s 2>&1 | FileCheck %s +! RUN: %f18 -fsyntax-only -fdebug-dump-symbols %s 2>&1 | FileCheck %s ! CHECK: init:[INTEGER(4)::1065353216_4,1073741824_4,1077936128_4,1082130432_4] ! Verify that the closure of EQUIVALENCE'd symbols with any DATA ! initialization produces a combined initializer. diff --git a/flang/test/Semantics/empty.f90 b/flang/test/Semantics/empty.f90 index e47c2e65342c..ff8f64258741 100644 --- a/flang/test/Semantics/empty.f90 +++ b/flang/test/Semantics/empty.f90 @@ -1,4 +1,4 @@ -! RUN: %f18 -fparse-only %s +! RUN: %f18 -fsyntax-only %s ! RUN: rm -rf %t && mkdir %t ! RUN: touch %t/empty.f90 -! RUN: %f18 -fparse-only %t/empty.f90 +! RUN: %f18 -fsyntax-only %t/empty.f90 diff --git a/flang/test/Semantics/final02.f90 b/flang/test/Semantics/final02.f90 index b58f91f3a228..f613a4228481 100644 --- a/flang/test/Semantics/final02.f90 +++ b/flang/test/Semantics/final02.f90 @@ -1,4 +1,4 @@ -!RUN: %f18 -fparse-only %s 2>&1 | FileCheck %s +!RUN: %f18 -fsyntax-only %s 2>&1 | FileCheck %s module m type :: t1 integer :: n diff --git a/flang/test/Semantics/getdefinition01.f90 b/flang/test/Semantics/getdefinition01.f90 index 57141b671754..06f2cc0f8dda 100644 --- a/flang/test/Semantics/getdefinition01.f90 +++ b/flang/test/Semantics/getdefinition01.f90 @@ -16,12 +16,12 @@ contains end module ! RUN and CHECK lines at the bottom as this test is sensitive to line numbers -! RUN: %f18 -fget-definition 6 17 18 -fparse-only %s | FileCheck --check-prefix=CHECK1 %s -! RUN: %f18 -fget-definition 7 20 23 -fparse-only %s | FileCheck --check-prefix=CHECK2 %s -! RUN: %f18 -fget-definition 14 3 4 -fparse-only %s | FileCheck --check-prefix=CHECK3 %s +! RUN: %f18 -fget-definition 6 17 18 -fsyntax-only %s | FileCheck --check-prefix=CHECK1 %s +! RUN: %f18 -fget-definition 7 20 23 -fsyntax-only %s | FileCheck --check-prefix=CHECK2 %s +! RUN: %f18 -fget-definition 14 3 4 -fsyntax-only %s | FileCheck --check-prefix=CHECK3 %s ! CHECK1: x:{{.*}}getdefinition01.f90, 5, 21-22 ! CHECK2: yyy:{{.*}}getdefinition01.f90, 5, 24-27 ! CHECK3: x:{{.*}}getdefinition01.f90, 13, 24-25 -! RUN: not %f18 -fget-definition -fparse-only %s 2>&1 | FileCheck --check-prefix=CHECK-ERROR %s +! RUN: not %f18 -fget-definition -fsyntax-only %s 2>&1 | FileCheck --check-prefix=CHECK-ERROR %s ! CHECK-ERROR: Invalid argument to -fget-definitions diff --git a/flang/test/Semantics/getdefinition02.f b/flang/test/Semantics/getdefinition02.f index ee7a96750da5..b32536968dc0 100644 --- a/flang/test/Semantics/getdefinition02.f +++ b/flang/test/Semantics/getdefinition02.f @@ -17,9 +17,9 @@ end module ! RUN and CHECK lines here as test is sensitive to line numbers -! RUN: %f18 -fget-definition 7 9 10 -fparse-only %s 2>&1 | FileCheck --check-prefix=CHECK1 %s -! RUN: %f18 -fget-definition 8 26 29 -fparse-only %s 2>&1 | FileCheck --check-prefix=CHECK2 %s -! RUN: %f18 -fget-definition 15 9 10 -fparse-only %s 2>&1 | FileCheck --check-prefix=CHECK3 %s +! RUN: %f18 -fget-definition 7 9 10 -fsyntax-only %s 2>&1 | FileCheck --check-prefix=CHECK1 %s +! RUN: %f18 -fget-definition 8 26 29 -fsyntax-only %s 2>&1 | FileCheck --check-prefix=CHECK2 %s +! RUN: %f18 -fget-definition 15 9 10 -fsyntax-only %s 2>&1 | FileCheck --check-prefix=CHECK3 %s ! CHECK1: x:{{.*}}getdefinition02.f, 5, 27-28 ! CHECK2: yyy:{{.*}}getdefinition02.f, 5, 30-33 ! CHECK3: x:{{.*}}getdefinition02.f, 14, 30-31 diff --git a/flang/test/Semantics/getdefinition03-a.f90 b/flang/test/Semantics/getdefinition03-a.f90 index ecf8c9b48389..6e61637b3546 100644 --- a/flang/test/Semantics/getdefinition03-a.f90 +++ b/flang/test/Semantics/getdefinition03-a.f90 @@ -7,7 +7,7 @@ program main x = f end program -! RUN: %f18 -fget-definition 7 6 7 -fparse-only %s | FileCheck --check-prefix=CHECK1 %s -! RUN: %f18 -fget-definition 7 2 3 -fparse-only %s | FileCheck --check-prefix=CHECK2 %s +! RUN: %f18 -fget-definition 7 6 7 -fsyntax-only %s | FileCheck --check-prefix=CHECK1 %s +! RUN: %f18 -fget-definition 7 2 3 -fsyntax-only %s | FileCheck --check-prefix=CHECK2 %s ! CHECK1: f:{{.*}}getdefinition03-b.f90, 2, 12-13 ! CHECK2: x:{{.*}}getdefinition03-a.f90, 6, 13-14 diff --git a/flang/test/Semantics/getdefinition04.f90 b/flang/test/Semantics/getdefinition04.f90 index 72429599595b..bc01f7979b42 100644 --- a/flang/test/Semantics/getdefinition04.f90 +++ b/flang/test/Semantics/getdefinition04.f90 @@ -6,5 +6,5 @@ program main x = y end program -! RUN: %f18 -fget-definition 6 3 4 -fparse-only %s | FileCheck %s +! RUN: %f18 -fget-definition 6 3 4 -fsyntax-only %s | FileCheck %s ! CHECK: x:{{.*}}getdefinition04.f90, 3, 14-15 diff --git a/flang/test/Semantics/getdefinition05.f90 b/flang/test/Semantics/getdefinition05.f90 index 315e9570d352..91952bb7fcc3 100644 --- a/flang/test/Semantics/getdefinition05.f90 +++ b/flang/test/Semantics/getdefinition05.f90 @@ -12,8 +12,8 @@ program main end program !! Inner x -! RUN: %f18 -fget-definition 9 5 6 -fparse-only %s | FileCheck --check-prefix=CHECK1 %s +! RUN: %f18 -fget-definition 9 5 6 -fsyntax-only %s | FileCheck --check-prefix=CHECK1 %s ! CHECK1: x:{{.*}}getdefinition05.f90, 7, 16-17 !! Outer y -! RUN: %f18 -fget-definition 11 7 8 -fparse-only %s | FileCheck --check-prefix=CHECK2 %s +! RUN: %f18 -fget-definition 11 7 8 -fsyntax-only %s | FileCheck --check-prefix=CHECK2 %s ! CHECK2: y:{{.*}}getdefinition05.f90, 5, 14-15 diff --git a/flang/test/Semantics/getsymbols01.f90 b/flang/test/Semantics/getsymbols01.f90 index bdb7bf053823..d26aa774ace4 100644 --- a/flang/test/Semantics/getsymbols01.f90 +++ b/flang/test/Semantics/getsymbols01.f90 @@ -15,7 +15,7 @@ contains end function end module -! RUN: %f18 -fget-symbols-sources -fparse-only %s 2>&1 | FileCheck %s +! RUN: %f18 -fget-symbols-sources -fsyntax-only %s 2>&1 | FileCheck %s ! CHECK-COUNT-1:f:{{.*}}getsymbols01.f90, 12, 26-27 ! CHECK-COUNT-1:mm1:{{.*}}getsymbols01.f90, 2, 8-11 ! CHECK-COUNT-1:s:{{.*}}getsymbols01.f90, 5, 18-19 diff --git a/flang/test/Semantics/getsymbols02.f90 b/flang/test/Semantics/getsymbols02.f90 index 0119ab16daa8..1667548f81c3 100644 --- a/flang/test/Semantics/getsymbols02.f90 +++ b/flang/test/Semantics/getsymbols02.f90 @@ -7,8 +7,8 @@ PROGRAM helloworld i = callget5() ENDPROGRAM -! RUN: %f18 -fparse-only %S/Inputs/getsymbols02-a.f90 -! RUN: %f18 -fparse-only %S/Inputs/getsymbols02-b.f90 -! RUN: %f18 -fget-symbols-sources -fparse-only %s 2>&1 | FileCheck %s +! RUN: %f18 -fsyntax-only %S/Inputs/getsymbols02-a.f90 +! RUN: %f18 -fsyntax-only %S/Inputs/getsymbols02-b.f90 +! RUN: %f18 -fget-symbols-sources -fsyntax-only %s 2>&1 | FileCheck %s ! CHECK: callget5: .{{[/\\]}}mm2b.mod, ! CHECK: get5: .{{[/\\]}}mm2a.mod, diff --git a/flang/test/Semantics/getsymbols03-a.f90 b/flang/test/Semantics/getsymbols03-a.f90 index 3cbba425d875..fddf513bcc51 100644 --- a/flang/test/Semantics/getsymbols03-a.f90 +++ b/flang/test/Semantics/getsymbols03-a.f90 @@ -7,7 +7,7 @@ program main x = f end program -! RUN: %f18 -fget-symbols-sources -fparse-only %s 2>&1 | FileCheck %s +! RUN: %f18 -fget-symbols-sources -fsyntax-only %s 2>&1 | FileCheck %s ! CHECK:f:{{.*}}getsymbols03-b.f90, 2, 12-13 ! CHECK:main:{{.*}}getsymbols03-a.f90, 4, 9-13 ! CHECK:mm3:{{.*}}getsymbols03-a.f90, 5, 6-9 diff --git a/flang/test/Semantics/getsymbols04.f90 b/flang/test/Semantics/getsymbols04.f90 index fc9b177abd90..ac8f2d0a7e44 100644 --- a/flang/test/Semantics/getsymbols04.f90 +++ b/flang/test/Semantics/getsymbols04.f90 @@ -6,7 +6,7 @@ program main x = y end program -! RUN: %f18 -fget-symbols-sources -fparse-only %s 2>&1 | FileCheck %s +! RUN: %f18 -fget-symbols-sources -fsyntax-only %s 2>&1 | FileCheck %s ! CHECK:x:{{.*}}getsymbols04.f90, 3, 14-15 ! CHECK:x:{{.*}}getsymbols04.f90, 5, 11-12 ! CHECK:y:{{.*}}getsymbols04.f90, 4, 14-15 diff --git a/flang/test/Semantics/getsymbols05.f90 b/flang/test/Semantics/getsymbols05.f90 index 624f37a74b76..6b07678e42d0 100644 --- a/flang/test/Semantics/getsymbols05.f90 +++ b/flang/test/Semantics/getsymbols05.f90 @@ -9,7 +9,7 @@ program main x = y end program -! RUN: %f18 -fget-symbols-sources -fparse-only %s 2>&1 | FileCheck %s +! RUN: %f18 -fget-symbols-sources -fsyntax-only %s 2>&1 | FileCheck %s ! CHECK:x:{{.*}}getsymbols05.f90, 3, 14-15 ! CHECK:x:{{.*}}getsymbols05.f90, 6, 16-17 ! CHECK:y:{{.*}}getsymbols05.f90, 4, 14-15 diff --git a/flang/test/Semantics/missing_newline.f90 b/flang/test/Semantics/missing_newline.f90 index 6dfafba7db86..82f9c9ceb612 100644 --- a/flang/test/Semantics/missing_newline.f90 +++ b/flang/test/Semantics/missing_newline.f90 @@ -1,4 +1,4 @@ ! RUN: echo -n "end program" > %t.f90 -! RUN: %f18 -fparse-only %t.f90 +! RUN: %f18 -fsyntax-only %t.f90 ! RUN: echo -ne "\rend program" > %t.f90 -! RUN: %f18 -fparse-only %t.f90 +! RUN: %f18 -fsyntax-only %t.f90 diff --git a/flang/test/Semantics/mod-file-rewriter.f90 b/flang/test/Semantics/mod-file-rewriter.f90 index 81252910e690..2856dd6dbdf3 100644 --- a/flang/test/Semantics/mod-file-rewriter.f90 +++ b/flang/test/Semantics/mod-file-rewriter.f90 @@ -1,8 +1,8 @@ ! RUN: rm -fr %t && mkdir %t && cd %t -! RUN: %f18 -fparse-only -fdebug-module-writer %s 2>&1 | FileCheck %s --check-prefix CHECK_CHANGED -! RUN: %f18 -fparse-only -fdebug-module-writer %s 2>&1 | FileCheck %s --check-prefix CHECK_UNCHANGED -! RUN: %f18 -fparse-only -fdebug-module-writer %p/Inputs/mod-file-unchanged.f90 2>&1 | FileCheck %s --check-prefix CHECK_UNCHANGED -! RUN: %f18 -fparse-only -fdebug-module-writer %p/Inputs/mod-file-changed.f90 2>&1 | FileCheck %s --check-prefix CHECK_CHANGED +! RUN: %f18 -fsyntax-only -fdebug-module-writer %s 2>&1 | FileCheck %s --check-prefix CHECK_CHANGED +! RUN: %f18 -fsyntax-only -fdebug-module-writer %s 2>&1 | FileCheck %s --check-prefix CHECK_UNCHANGED +! RUN: %f18 -fsyntax-only -fdebug-module-writer %p/Inputs/mod-file-unchanged.f90 2>&1 | FileCheck %s --check-prefix CHECK_UNCHANGED +! RUN: %f18 -fsyntax-only -fdebug-module-writer %p/Inputs/mod-file-changed.f90 2>&1 | FileCheck %s --check-prefix CHECK_CHANGED module m real :: x(10) diff --git a/flang/test/Semantics/modifiable01.f90 b/flang/test/Semantics/modifiable01.f90 index 391a643e3368..dfa9396565e0 100644 --- a/flang/test/Semantics/modifiable01.f90 +++ b/flang/test/Semantics/modifiable01.f90 @@ -1,4 +1,4 @@ -! RUN: not %f18 -fparse-only %s 2>&1 | FileCheck %s +! RUN: not %f18 -fsyntax-only %s 2>&1 | FileCheck %s ! Test WhyNotModifiable() explanations module prot diff --git a/flang/test/Semantics/offsets01.f90 b/flang/test/Semantics/offsets01.f90 index f5491f7b9438..78183d5cc0fc 100644 --- a/flang/test/Semantics/offsets01.f90 +++ b/flang/test/Semantics/offsets01.f90 @@ -1,4 +1,4 @@ -!RUN: %f18 -fdebug-dump-symbols -fparse-only %s | FileCheck %s +!RUN: %f18 -fdebug-dump-symbols -fsyntax-only %s | FileCheck %s ! Size and alignment of intrinsic types subroutine s1 diff --git a/flang/test/Semantics/offsets02.f90 b/flang/test/Semantics/offsets02.f90 index f2ed1d1dbe71..b76572e1761d 100644 --- a/flang/test/Semantics/offsets02.f90 +++ b/flang/test/Semantics/offsets02.f90 @@ -1,4 +1,4 @@ -!RUN: %f18 -fdebug-dump-symbols -fparse-only %s | FileCheck %s +!RUN: %f18 -fdebug-dump-symbols -fsyntax-only %s | FileCheck %s ! Size and alignment of derived types diff --git a/flang/test/Semantics/offsets03.f90 b/flang/test/Semantics/offsets03.f90 index d28b3694bddd..c1c2de464a01 100644 --- a/flang/test/Semantics/offsets03.f90 +++ b/flang/test/Semantics/offsets03.f90 @@ -1,4 +1,4 @@ -!RUN: %f18 -fdebug-dump-symbols -fparse-only %s | FileCheck %s +!RUN: %f18 -fdebug-dump-symbols -fsyntax-only %s | FileCheck %s ! Size and alignment with EQUIVALENCE and COMMON diff --git a/flang/test/Semantics/oldparam01.f90 b/flang/test/Semantics/oldparam01.f90 index 43f33a52f364..b78869ff4d90 100644 --- a/flang/test/Semantics/oldparam01.f90 +++ b/flang/test/Semantics/oldparam01.f90 @@ -1,4 +1,4 @@ -! RUN: %f18 -falternative-parameter-statement -fdebug-dump-symbols -fparse-only %s 2>&1 | FileCheck %s +! RUN: %f18 -falternative-parameter-statement -fdebug-dump-symbols -fsyntax-only %s 2>&1 | FileCheck %s ! Non-error tests for "old style" PARAMETER statements diff --git a/flang/test/Semantics/oldparam02.f90 b/flang/test/Semantics/oldparam02.f90 index 72ea5c410c7a..fd58988ba0b0 100644 --- a/flang/test/Semantics/oldparam02.f90 +++ b/flang/test/Semantics/oldparam02.f90 @@ -1,4 +1,4 @@ -! RUN: not %f18 -falternative-parameter-statement -fdebug-dump-symbols -fparse-only %s 2>&1 | FileCheck %s +! RUN: not %f18 -falternative-parameter-statement -fdebug-dump-symbols -fsyntax-only %s 2>&1 | FileCheck %s ! Error tests for "old style" PARAMETER statements subroutine subr(x1,x2,x3,x4,x5) diff --git a/flang/test/Semantics/oldparam03.f90 b/flang/test/Semantics/oldparam03.f90 index cbdb07057226..bc80f00a1966 100644 --- a/flang/test/Semantics/oldparam03.f90 +++ b/flang/test/Semantics/oldparam03.f90 @@ -1,4 +1,4 @@ -! RUN: not %f18 -fparse-only %s 2>&1 | FileCheck %s +! RUN: not %f18 -fsyntax-only %s 2>&1 | FileCheck %s ! Ensure that old-style PARAMETER statements are disabled by default. diff --git a/flang/test/Semantics/resolve100.f90 b/flang/test/Semantics/resolve100.f90 index 1e84be24c5f2..52fca54b94f7 100644 --- a/flang/test/Semantics/resolve100.f90 +++ b/flang/test/Semantics/resolve100.f90 @@ -1,4 +1,4 @@ -!RUN: %f18 -fdebug-dump-symbols -fparse-only %s | FileCheck %s +!RUN: %f18 -fdebug-dump-symbols -fsyntax-only %s | FileCheck %s program p ! CHECK: a size=4 offset=0: ObjectEntity type: LOGICAL(4) diff --git a/flang/test/Semantics/rewrite01.f90 b/flang/test/Semantics/rewrite01.f90 index 221994593422..cd5453eee6be 100644 --- a/flang/test/Semantics/rewrite01.f90 +++ b/flang/test/Semantics/rewrite01.f90 @@ -1,4 +1,4 @@ -! RUN: %f18 -fparse-only -fdebug-dump-parse-tree %s 2>&1 | FileCheck %s +! RUN: %f18 -fsyntax-only -fdebug-dump-parse-tree %s 2>&1 | FileCheck %s ! Ensure that READ(CVAR) [, item-list] is corrected when CVAR is a ! character variable so as to be a formatted read from the default ! unit, not an unformatted read from an internal unit (which is not diff --git a/flang/test/Semantics/test_errors.sh b/flang/test/Semantics/test_errors.sh index 5411482e4d3b..10feccb2f9f1 100755 --- a/flang/test/Semantics/test_errors.sh +++ b/flang/test/Semantics/test_errors.sh @@ -2,7 +2,7 @@ # Compile a source file and check errors against those listed in the file. # Change the compiler by setting the F18 environment variable. -F18_OPTIONS="-fparse-only" +F18_OPTIONS="-fsyntax-only" srcdir=$(dirname $0) source $srcdir/common.sh [[ ! -f $src ]] && die "File not found: $src" diff --git a/flang/test/Semantics/test_modfile.sh b/flang/test/Semantics/test_modfile.sh index 9205451c176d..a2aef65a101b 100755 --- a/flang/test/Semantics/test_modfile.sh +++ b/flang/test/Semantics/test_modfile.sh @@ -2,7 +2,7 @@ # Compile a source file and compare generated .mod files against expected. set -e -F18_OPTIONS="-fdebug-resolve-names -fparse-only" +F18_OPTIONS="-fdebug-resolve-names -fsyntax-only" srcdir=$(dirname $0) source $srcdir/common.sh diff --git a/flang/test/Semantics/typeinfo01.f90 b/flang/test/Semantics/typeinfo01.f90 index 834120ccb430..2274896fcd68 100644 --- a/flang/test/Semantics/typeinfo01.f90 +++ b/flang/test/Semantics/typeinfo01.f90 @@ -1,4 +1,4 @@ -!RUN: %f18 -fdebug-dump-symbols -fparse-only %s | FileCheck %s +!RUN: %f18 -fdebug-dump-symbols -fsyntax-only %s | FileCheck %s ! Tests for derived type runtime descriptions module m01 diff --git a/flang/tools/f18-parse-demo/f18-parse-demo.cpp b/flang/tools/f18-parse-demo/f18-parse-demo.cpp index 4ccc65e0631d..2033ef6c3bc2 100644 --- a/flang/tools/f18-parse-demo/f18-parse-demo.cpp +++ b/flang/tools/f18-parse-demo/f18-parse-demo.cpp @@ -87,7 +87,7 @@ struct DriverOptions { bool warnOnNonstandardUsage{false}; // -Mstandard bool warningsAreErrors{false}; // -Werror Fortran::parser::Encoding encoding{Fortran::parser::Encoding::LATIN_1}; - bool parseOnly{false}; + bool syntaxOnly{false}; bool dumpProvenance{false}; bool dumpCookedChars{false}; bool dumpUnparse{false}; @@ -217,7 +217,7 @@ std::string CompileFortran( Fortran::common::LanguageFeature::BackslashEscapes)); return {}; } - if (driver.parseOnly) { + if (driver.syntaxOnly) { return {}; } @@ -369,8 +369,8 @@ int main(int argc, char *const argv[]) { driver.dumpUnparse = true; } else if (arg == "-ftime-parse") { driver.timeParse = true; - } else if (arg == "-fparse-only") { - driver.parseOnly = true; + } else if (arg == "-fparse-only" || arg == "-fsyntax-only") { + driver.syntaxOnly = true; } else if (arg == "-c") { driver.compileOnly = true; } else if (arg == "-o") { @@ -405,7 +405,7 @@ int main(int argc, char *const argv[]) { << " -ed enable fixed form D lines\n" << " -E prescan & preprocess only\n" << " -ftime-parse measure parsing time\n" - << " -fparse-only parse only, no output except messages\n" + << " -fsyntax-only parse only, no output except messages\n" << " -funparse parse & reformat only, no code " "generation\n" << " -fdump-provenance dump the provenance table (no code)\n" diff --git a/flang/tools/f18/CMakeLists.txt b/flang/tools/f18/CMakeLists.txt index 2e5350aecdc6..41237fe06003 100644 --- a/flang/tools/f18/CMakeLists.txt +++ b/flang/tools/f18/CMakeLists.txt @@ -46,7 +46,7 @@ foreach(filename ${MODULES}) set(depends ${include}/__fortran_builtins.mod) endif() add_custom_command(OUTPUT ${include}/${filename}.mod - COMMAND f18 -fparse-only -I${include} + COMMAND f18 -fsyntax-only -I${include} ${FLANG_SOURCE_DIR}/module/${filename}.f90 WORKING_DIRECTORY ${include} DEPENDS f18 ${FLANG_SOURCE_DIR}/module/${filename}.f90 ${depends} diff --git a/flang/tools/f18/f18.cpp b/flang/tools/f18/f18.cpp index fecd37d49936..4546353fe010 100644 --- a/flang/tools/f18/f18.cpp +++ b/flang/tools/f18/f18.cpp @@ -92,7 +92,7 @@ struct DriverOptions { bool warningsAreErrors{false}; // -Werror bool byteswapio{false}; // -byteswapio Fortran::parser::Encoding encoding{Fortran::parser::Encoding::UTF_8}; - bool parseOnly{false}; + bool syntaxOnly{false}; bool dumpProvenance{false}; bool dumpCookedChars{false}; bool dumpUnparse{false}; @@ -327,7 +327,7 @@ std::string CompileFortran(std::string path, Fortran::parser::Options options, exitStatus = EXIT_FAILURE; } } - if (driver.parseOnly) { + if (driver.syntaxOnly) { return {}; } @@ -544,8 +544,8 @@ int main(int argc, char *const argv[]) { driver.dumpUnparseWithSymbols = true; } else if (arg == "-funparse-typed-exprs-to-f18-fc") { driver.unparseTypedExprsToF18_FC = true; - } else if (arg == "-fparse-only") { - driver.parseOnly = true; + } else if (arg == "-fparse-only" || arg == "-fsyntax-only") { + driver.syntaxOnly = true; } else if (arg == "-c") { driver.compileOnly = true; } else if (arg == "-o") { @@ -649,7 +649,7 @@ int main(int argc, char *const argv[]) { << " -module dir module output directory (default .)\n" << " -flatin interpret source as Latin-1 (ISO 8859-1) " "rather than UTF-8\n" - << " -fparse-only parse only, no output except messages\n" + << " -fsyntax-only parsing and semantics only, no output except messages\n" << " -funparse parse & reformat only, no code " "generation\n" << " -funparse-with-symbols parse, resolve symbols, and unparse\n" </cut>

3 years, 10 months

[CI-NOTIFY]: TCWG Bisect tcwg_gcc_bootstrap/master-arm-bootstrap - Build # 1 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *gcc* in CI configuration tcwg_gcc_bootstrap/master-arm-bootstrap. So far, this commit has regressed CI configurations: - tcwg_gcc_bootstrap/master-arm-bootstrap Culprit: <cut> commit e4f16e9f357a38ec702fb69a0ffab9d292a6af9b Author: Thomas Schwinge <thomas(a)codesourcery.com> Date: Fri Aug 13 17:53:12 2021 +0200 Add more self-tests for 'hash_map' with Value type with non-trivial constructor/destructor ... to document the current behavior. gcc/ * hash-map-tests.c (test_map_of_type_with_ctor_and_dtor): Extend. (test_map_of_type_with_ctor_and_dtor_expand): Add function. (hash_map_tests_c_tests): Call it. </cut> Results regressed to (for first_bad == e4f16e9f357a38ec702fb69a0ffab9d292a6af9b) # reset_artifacts: -10 # true: 0 # build_abe binutils: 1 # First few build errors in logs: # 00:02:51 cc1: internal compiler error: in fail, at selftest.c:47 # 00:02:51 cc1plus: internal compiler error: in fail, at selftest.c:47 # 00:02:51 make[3]: *** [s-selftest-c] Error 1 # 00:02:51 make[3]: *** [s-selftest-c++] Error 1 # 00:02:51 make[2]: *** [all-stage1-gcc] Error 2 # 00:02:51 make[1]: *** [stage1-bubble] Error 2 # 00:02:51 make: *** [all] Error 2 from (for last_good == 602fca427df6c5f7452677cfcdd16a5b9a3ca86a) # reset_artifacts: -10 # true: 0 # build_abe binutils: 1 # build_abe bootstrap: 2 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap/1/… Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap/1/… Build top page/logs: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap/1/ Configuration details: Reproduce builds: <cut> mkdir investigate-gcc-e4f16e9f357a38ec702fb69a0ffab9d292a6af9b cd investigate-gcc-e4f16e9f357a38ec702fb69a0ffab9d292a6af9b git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap/1/… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap/1/… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap/1/… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach e4f16e9f357a38ec702fb69a0ffab9d292a6af9b ../artifacts/test.sh # Reproduce last_good build git checkout --detach 602fca427df6c5f7452677cfcdd16a5b9a3ca86a ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap/1/… Build log: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-arm-bootstrap/1/… Full commit (up to 1000 lines): <cut> commit e4f16e9f357a38ec702fb69a0ffab9d292a6af9b Author: Thomas Schwinge <thomas(a)codesourcery.com> Date: Fri Aug 13 17:53:12 2021 +0200 Add more self-tests for 'hash_map' with Value type with non-trivial constructor/destructor ... to document the current behavior. gcc/ * hash-map-tests.c (test_map_of_type_with_ctor_and_dtor): Extend. (test_map_of_type_with_ctor_and_dtor_expand): Add function. (hash_map_tests_c_tests): Call it. --- gcc/hash-map-tests.c | 152 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 152 insertions(+) diff --git a/gcc/hash-map-tests.c b/gcc/hash-map-tests.c index 5b6b192cd28..257f2be0c26 100644 --- a/gcc/hash-map-tests.c +++ b/gcc/hash-map-tests.c @@ -278,6 +278,156 @@ test_map_of_type_with_ctor_and_dtor () ASSERT_TRUE (val_t::ndefault + val_t::ncopy == val_t::ndtor); } + + + /* Verify basic construction and destruction of Value objects. */ + { + /* Configure, arbitrary. */ + const size_t N_init = 0; + const int N_elem = 28; + + void *a[N_elem]; + for (size_t i = 0; i < N_elem; ++i) + a[i] = &a[i]; + + val_t::ndefault = 0; + val_t::ncopy = 0; + val_t::nassign = 0; + val_t::ndtor = 0; + Map m (N_init); + ASSERT_EQ (val_t::ndefault + + val_t::ncopy + + val_t::nassign + + val_t::ndtor, 0); + + for (int i = 0; i < N_elem; ++i) + { + m.get_or_insert (a[i]); + ASSERT_EQ (val_t::ndefault, 1 + i); + ASSERT_EQ (val_t::ncopy, 0); + ASSERT_EQ (val_t::nassign, 0); + ASSERT_EQ (val_t::ndtor, i); + + m.remove (a[i]); + ASSERT_EQ (val_t::ndefault, 1 + i); + ASSERT_EQ (val_t::ncopy, 0); + ASSERT_EQ (val_t::nassign, 0); + ASSERT_EQ (val_t::ndtor, 1 + i); + } + } +} + +/* Verify aspects of 'hash_table::expand'. */ + +static void +test_map_of_type_with_ctor_and_dtor_expand (bool remove_some_inline) +{ + /* Configure, so that hash table expansion triggers a few times. */ + const size_t N_init = 0; + const int N_elem = 70; + size_t expand_c_expected = 4; + size_t expand_c = 0; + + void *a[N_elem]; + for (size_t i = 0; i < N_elem; ++i) + a[i] = &a[i]; + + typedef hash_map <void *, val_t> Map; + + /* Note that we are starting with a fresh 'Map'. Even if an existing one has + been cleared out completely, there remain 'deleted' elements, and these + would disturb the following logic, where we don't have access to the + actual 'm_n_deleted' value. */ + size_t m_n_deleted = 0; + + val_t::ndefault = 0; + val_t::ncopy = 0; + val_t::nassign = 0; + val_t::ndtor = 0; + Map m (N_init); + + /* In the following, in particular related to 'expand', we're adapting from + the internal logic of 'hash_table', glossing over "some details" not + relevant for this testing here. */ + + /* Per 'hash_table::hash_table'. */ + size_t m_size; + { + unsigned int size_prime_index_ = hash_table_higher_prime_index (N_init); + m_size = prime_tab[size_prime_index_].prime; + } + + int n_expand_moved = 0; + + for (int i = 0; i < N_elem; ++i) + { + size_t elts = m.elements (); + + /* Per 'hash_table::find_slot_with_hash'. */ + size_t m_n_elements = elts + m_n_deleted; + bool expand = m_size * 3 <= m_n_elements * 4; + + m.get_or_insert (a[i]); + if (expand) + { + ++expand_c; + + /* Per 'hash_table::expand'. */ + { + unsigned int nindex = hash_table_higher_prime_index (elts * 2); + m_size = prime_tab[nindex].prime; + } + m_n_deleted = 0; + + /* All non-deleted elements have been moved. */ + n_expand_moved += i; + if (remove_some_inline) + n_expand_moved -= (i + 2) / 3; + } + + ASSERT_EQ (val_t::ndefault, 1 + i); + ASSERT_EQ (val_t::ncopy, n_expand_moved); + ASSERT_EQ (val_t::nassign, 0); + if (remove_some_inline) + ASSERT_EQ (val_t::ndtor, (i + 2) / 3); + else + ASSERT_EQ (val_t::ndtor, 0); + + /* Remove some inline. This never triggers an 'expand' here, but via + 'm_n_deleted' does influence any following one. */ + if (remove_some_inline + && !(i % 3)) + { + m.remove (a[i]); + /* Per 'hash_table::remove_elt_with_hash'. */ + m_n_deleted++; + + ASSERT_EQ (val_t::ndefault, 1 + i); + ASSERT_EQ (val_t::ncopy, n_expand_moved); + ASSERT_EQ (val_t::nassign, 0); + ASSERT_EQ (val_t::ndtor, 1 + (i + 2) / 3); + } + } + ASSERT_EQ (expand_c, expand_c_expected); + + int ndefault = val_t::ndefault; + int ncopy = val_t::ncopy; + int nassign = val_t::nassign; + int ndtor = val_t::ndtor; + + for (int i = 0; i < N_elem; ++i) + { + if (remove_some_inline + && !(i % 3)) + continue; + + m.remove (a[i]); + ++ndtor; + ASSERT_EQ (val_t::ndefault, ndefault); + ASSERT_EQ (val_t::ncopy, ncopy); + ASSERT_EQ (val_t::nassign, nassign); + ASSERT_EQ (val_t::ndtor, ndtor); + } } /* Test calling empty on a hash_map that has a key type with non-zero @@ -309,6 +459,8 @@ hash_map_tests_c_tests () test_map_of_strings_to_int (); test_map_of_int_to_strings (); test_map_of_type_with_ctor_and_dtor (); + test_map_of_type_with_ctor_and_dtor_expand (false); + test_map_of_type_with_ctor_and_dtor_expand (true); test_nonzero_empty_key (); } </cut>

3 years, 10 months

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tk1/llvm-release-arm-spec2k6-O3_LTO - Build # 6 - Fixed!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3_LTO. So far, this commit has regressed CI configurations: - tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3_LTO Culprit: <cut> commit 310b35304cdf5a230c042904655583c5532d3e91 Author: Rong Xu <xur(a)google.com> Date: Tue Feb 16 10:53:38 2021 -0800 [SampleFDO][NFC] Refactor SampleProfile.cpp Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker to a head file and a cpp file. (3) Move the common codes (common options and callsiteIsHot()) to the common cpp file. Differential Revision: https://reviews.llvm.org/D96455 </cut> Results regressed to (for first_bad == 310b35304cdf5a230c042904655583c5532d3e91) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O3_LTO_marm artifacts/build-310b35304cdf5a230c042904655583c5532d3e91/results_id: 1 # 482.sphinx3,sphinx_livepretend_base.default regressed by 106 from (for last_good == cddc53ef088b68586094c9841a76b41bee3994a4) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O3_LTO_marm artifacts/build-cddc53ef088b68586094c9841a76b41bee3994a4/results_id: 1 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3_LTO/3917 Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3_LTO/3930 Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Configuration details: Reproduce builds: <cut> mkdir investigate-llvm-310b35304cdf5a230c042904655583c5532d3e91 cd investigate-llvm-310b35304cdf5a230c042904655583c5532d3e91 git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach 310b35304cdf5a230c042904655583c5532d3e91 ../artifacts/test.sh # Reproduce last_good build git checkout --detach cddc53ef088b68586094c9841a76b41bee3994a4 ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Full commit (up to 1000 lines): <cut> commit 310b35304cdf5a230c042904655583c5532d3e91 Author: Rong Xu <xur(a)google.com> Date: Tue Feb 16 10:53:38 2021 -0800 [SampleFDO][NFC] Refactor SampleProfile.cpp Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker to a head file and a cpp file. (3) Move the common codes (common options and callsiteIsHot()) to the common cpp file. Differential Revision: https://reviews.llvm.org/D96455 --- .../llvm/ProfileData/SampleProfileLoaderBaseImpl.h | 862 +++++++++++++++++ .../llvm/ProfileData/SampleProfileLoaderBaseUtil.h | 97 ++ llvm/lib/ProfileData/CMakeLists.txt | 1 + .../ProfileData/SampleProfileLoaderBaseUtil.cpp | 192 ++++ llvm/lib/Transforms/IPO/SampleProfile.cpp | 1001 +------------------- 5 files changed, 1161 insertions(+), 992 deletions(-) diff --git a/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseImpl.h b/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseImpl.h new file mode 100644 index 000000000000..f02bacb6edc3 --- /dev/null +++ b/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseImpl.h @@ -0,0 +1,862 @@ +////===- SampleProfileLoadBaseImpl.h - Profile loader base impl --*- C++-*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +/// \file +/// This file provides the interface for the sampled PGO profile loader base +/// implementation. +// +//===----------------------------------------------------------------------===// + +#ifndef LLVM_TRANSFORMS_IPO_SAMPLEPROFILELOADERIMPL_H +#define LLVM_TRANSFORMS_IPO_SAMPLEPROFILELOADERIMPL_H + +#include "llvm/ADT/ArrayRef.h" +#include "llvm/ADT/DenseMap.h" +#include "llvm/ADT/DenseSet.h" +#include "llvm/ADT/SmallPtrSet.h" +#include "llvm/ADT/SmallSet.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/Analysis/LoopInfo.h" +#include "llvm/Analysis/OptimizationRemarkEmitter.h" +#include "llvm/Analysis/PostDominators.h" +#include "llvm/Analysis/ProfileSummaryInfo.h" +#include "llvm/IR/BasicBlock.h" +#include "llvm/IR/CFG.h" +#include "llvm/IR/DebugInfoMetadata.h" +#include "llvm/IR/DebugLoc.h" +#include "llvm/IR/Dominators.h" +#include "llvm/IR/Function.h" +#include "llvm/IR/Instruction.h" +#include "llvm/IR/Instructions.h" +#include "llvm/IR/Module.h" +#include "llvm/ProfileData/SampleProf.h" +#include "llvm/ProfileData/SampleProfReader.h" +#include "llvm/ProfileData/SampleProfileLoaderBaseUtil.h" +#include "llvm/Support/CommandLine.h" +#include "llvm/Support/GenericDomTree.h" +#include "llvm/Support/raw_ostream.h" + +namespace llvm { +using namespace llvm; +using namespace sampleprof; +using ProfileCount = Function::ProfileCount; +namespace sampleprofutil { +bool callsiteIsHot(const SampleCoverageTracker *CT, + const FunctionSamples *CallsiteFS, ProfileSummaryInfo *PSI, + bool ProfAccForSymsInList); +} // namespace sampleprofutil +using namespace sampleprofutil; + +#define DEBUG_TYPE "sample-profile-impl" + +using BlockWeightMap = DenseMap<const BasicBlock *, uint64_t>; +using EquivalenceClassMap = DenseMap<const BasicBlock *, const BasicBlock *>; +using Edge = std::pair<const BasicBlock *, const BasicBlock *>; +using EdgeWeightMap = DenseMap<Edge, uint64_t>; +using BlockEdgeMap = + DenseMap<const BasicBlock *, SmallVector<const BasicBlock *, 8>>; + +extern cl::opt<unsigned> SampleProfileMaxPropagateIterations; +extern cl::opt<unsigned> SampleProfileRecordCoverage; +extern cl::opt<unsigned> SampleProfileSampleCoverage; +extern cl::opt<bool> NoWarnSampleUnused; + +class SampleProfileLoaderBaseImpl { +public: + SampleProfileLoaderBaseImpl(std::string Name) : Filename(Name) {} + void dump() { Reader->dump(); } + +protected: + friend class SampleCoverageTracker; + + unsigned getFunctionLoc(Function &F); + virtual ErrorOr<uint64_t> getInstWeight(const Instruction &Inst); + ErrorOr<uint64_t> getInstWeightImpl(const Instruction &Inst); + ErrorOr<uint64_t> getBlockWeight(const BasicBlock *BB); + mutable DenseMap<const DILocation *, const FunctionSamples *> + DILocation2SampleMap; + virtual const FunctionSamples * + findFunctionSamples(const Instruction &I) const; + void printEdgeWeight(raw_ostream &OS, Edge E); + void printBlockWeight(raw_ostream &OS, const BasicBlock *BB) const; + void printBlockEquivalence(raw_ostream &OS, const BasicBlock *BB); + bool computeBlockWeights(Function &F); + void findEquivalenceClasses(Function &F); + template <bool IsPostDom> + void findEquivalencesFor(BasicBlock *BB1, ArrayRef<BasicBlock *> Descendants, + DominatorTreeBase<BasicBlock, IsPostDom> *DomTree); + + void propagateWeights(Function &F); + uint64_t visitEdge(Edge E, unsigned *NumUnknownEdges, Edge *UnknownEdge); + void buildEdges(Function &F); + bool propagateThroughEdges(Function &F, bool UpdateBlockCount); + void clearFunctionData(); + void computeDominanceAndLoopInfo(Function &F); + bool + computeAndPropagateWeights(Function &F, + const DenseSet<GlobalValue::GUID> &InlinedGUIDs); + void emitCoverageRemarks(Function &F); + + /// Map basic blocks to their computed weights. + /// + /// The weight of a basic block is defined to be the maximum + /// of all the instruction weights in that block. + BlockWeightMap BlockWeights; + + /// Map edges to their computed weights. + /// + /// Edge weights are computed by propagating basic block weights in + /// SampleProfile::propagateWeights. + EdgeWeightMap EdgeWeights; + + /// Set of visited blocks during propagation. + SmallPtrSet<const BasicBlock *, 32> VisitedBlocks; + + /// Set of visited edges during propagation. + SmallSet<Edge, 32> VisitedEdges; + + /// Equivalence classes for block weights. + /// + /// Two blocks BB1 and BB2 are in the same equivalence class if they + /// dominate and post-dominate each other, and they are in the same loop + /// nest. When this happens, the two blocks are guaranteed to execute + /// the same number of times. + EquivalenceClassMap EquivalenceClass; + + /// Dominance, post-dominance and loop information. + std::unique_ptr<DominatorTree> DT; + std::unique_ptr<PostDominatorTree> PDT; + std::unique_ptr<LoopInfo> LI; + + /// Predecessors for each basic block in the CFG. + BlockEdgeMap Predecessors; + + /// Successors for each basic block in the CFG. + BlockEdgeMap Successors; + + /// Profile coverage tracker. + SampleCoverageTracker CoverageTracker; + + /// Profile reader object. + std::unique_ptr<SampleProfileReader> Reader; + + /// Samples collected for the body of this function. + FunctionSamples *Samples = nullptr; + + /// Name of the profile file to load. + std::string Filename; + + /// Profile Summary Info computed from sample profile. + ProfileSummaryInfo *PSI = nullptr; + + /// Optimization Remark Emitter used to emit diagnostic remarks. + OptimizationRemarkEmitter *ORE = nullptr; +}; + +/// Clear all the per-function data used to load samples and propagate weights. +void SampleProfileLoaderBaseImpl::clearFunctionData() { + BlockWeights.clear(); + EdgeWeights.clear(); + VisitedBlocks.clear(); + VisitedEdges.clear(); + EquivalenceClass.clear(); + DT = nullptr; + PDT = nullptr; + LI = nullptr; + Predecessors.clear(); + Successors.clear(); + CoverageTracker.clear(); +} + +#ifndef NDEBUG +/// Print the weight of edge \p E on stream \p OS. +/// +/// \param OS Stream to emit the output to. +/// \param E Edge to print. +void SampleProfileLoaderBaseImpl::printEdgeWeight(raw_ostream &OS, Edge E) { + OS << "weight[" << E.first->getName() << "->" << E.second->getName() + << "]: " << EdgeWeights[E] << "\n"; +} + +/// Print the equivalence class of block \p BB on stream \p OS. +/// +/// \param OS Stream to emit the output to. +/// \param BB Block to print. +void SampleProfileLoaderBaseImpl::printBlockEquivalence(raw_ostream &OS, + const BasicBlock *BB) { + const BasicBlock *Equiv = EquivalenceClass[BB]; + OS << "equivalence[" << BB->getName() + << "]: " << ((Equiv) ? EquivalenceClass[BB]->getName() : "NONE") << "\n"; +} + +/// Print the weight of block \p BB on stream \p OS. +/// +/// \param OS Stream to emit the output to. +/// \param BB Block to print. +void SampleProfileLoaderBaseImpl::printBlockWeight(raw_ostream &OS, + const BasicBlock *BB) const { + const auto &I = BlockWeights.find(BB); + uint64_t W = (I == BlockWeights.end() ? 0 : I->second); + OS << "weight[" << BB->getName() << "]: " << W << "\n"; +} +#endif + +/// Get the weight for an instruction. +/// +/// The "weight" of an instruction \p Inst is the number of samples +/// collected on that instruction at runtime. To retrieve it, we +/// need to compute the line number of \p Inst relative to the start of its +/// function. We use HeaderLineno to compute the offset. We then +/// look up the samples collected for \p Inst using BodySamples. +/// +/// \param Inst Instruction to query. +/// +/// \returns the weight of \p Inst. +ErrorOr<uint64_t> +SampleProfileLoaderBaseImpl::getInstWeight(const Instruction &Inst) { + return getInstWeightImpl(Inst); +} + +ErrorOr<uint64_t> +SampleProfileLoaderBaseImpl::getInstWeightImpl(const Instruction &Inst) { + const FunctionSamples *FS = findFunctionSamples(Inst); + if (!FS) + return std::error_code(); + + const DebugLoc &DLoc = Inst.getDebugLoc(); + if (!DLoc) + return std::error_code(); + + const DILocation *DIL = DLoc; + uint32_t LineOffset = FunctionSamples::getOffset(DIL); + uint32_t Discriminator = DIL->getBaseDiscriminator(); + ErrorOr<uint64_t> R = FS->findSamplesAt(LineOffset, Discriminator); + if (R) { + bool FirstMark = + CoverageTracker.markSamplesUsed(FS, LineOffset, Discriminator, R.get()); + if (FirstMark) { + ORE->emit([&]() { + OptimizationRemarkAnalysis Remark(DEBUG_TYPE, "AppliedSamples", &Inst); + Remark << "Applied " << ore::NV("NumSamples", *R); + Remark << " samples from profile (offset: "; + Remark << ore::NV("LineOffset", LineOffset); + if (Discriminator) { + Remark << "."; + Remark << ore::NV("Discriminator", Discriminator); + } + Remark << ")"; + return Remark; + }); + } + LLVM_DEBUG(dbgs() << " " << DLoc.getLine() << "." + << DIL->getBaseDiscriminator() << ":" << Inst + << " (line offset: " << LineOffset << "." + << DIL->getBaseDiscriminator() << " - weight: " << R.get() + << ")\n"); + } + return R; +} + +/// Compute the weight of a basic block. +/// +/// The weight of basic block \p BB is the maximum weight of all the +/// instructions in BB. +/// +/// \param BB The basic block to query. +/// +/// \returns the weight for \p BB. +ErrorOr<uint64_t> +SampleProfileLoaderBaseImpl::getBlockWeight(const BasicBlock *BB) { + uint64_t Max = 0; + bool HasWeight = false; + for (auto &I : BB->getInstList()) { + const ErrorOr<uint64_t> &R = getInstWeight(I); + if (R) { + Max = std::max(Max, R.get()); + HasWeight = true; + } + } + return HasWeight ? ErrorOr<uint64_t>(Max) : std::error_code(); +} + +/// Compute and store the weights of every basic block. +/// +/// This populates the BlockWeights map by computing +/// the weights of every basic block in the CFG. +/// +/// \param F The function to query. +bool SampleProfileLoaderBaseImpl::computeBlockWeights(Function &F) { + bool Changed = false; + LLVM_DEBUG(dbgs() << "Block weights\n"); + for (const auto &BB : F) { + ErrorOr<uint64_t> Weight = getBlockWeight(&BB); + if (Weight) { + BlockWeights[&BB] = Weight.get(); + VisitedBlocks.insert(&BB); + Changed = true; + } + LLVM_DEBUG(printBlockWeight(dbgs(), &BB)); + } + + return Changed; +} + +/// Get the FunctionSamples for an instruction. +/// +/// The FunctionSamples of an instruction \p Inst is the inlined instance +/// in which that instruction is coming from. We traverse the inline stack +/// of that instruction, and match it with the tree nodes in the profile. +/// +/// \param Inst Instruction to query. +/// +/// \returns the FunctionSamples pointer to the inlined instance. +const FunctionSamples *SampleProfileLoaderBaseImpl::findFunctionSamples( + const Instruction &Inst) const { + const DILocation *DIL = Inst.getDebugLoc(); + if (!DIL) + return Samples; + + auto it = DILocation2SampleMap.try_emplace(DIL, nullptr); + if (it.second) { + it.first->second = Samples->findFunctionSamples(DIL, Reader->getRemapper()); + } + return it.first->second; +} + +/// Find equivalence classes for the given block. +/// +/// This finds all the blocks that are guaranteed to execute the same +/// number of times as \p BB1. To do this, it traverses all the +/// descendants of \p BB1 in the dominator or post-dominator tree. +/// +/// A block BB2 will be in the same equivalence class as \p BB1 if +/// the following holds: +/// +/// 1- \p BB1 is a descendant of BB2 in the opposite tree. So, if BB2 +/// is a descendant of \p BB1 in the dominator tree, then BB2 should +/// dominate BB1 in the post-dominator tree. +/// +/// 2- Both BB2 and \p BB1 must be in the same loop. +/// +/// For every block BB2 that meets those two requirements, we set BB2's +/// equivalence class to \p BB1. +/// +/// \param BB1 Block to check. +/// \param Descendants Descendants of \p BB1 in either the dom or pdom tree. +/// \param DomTree Opposite dominator tree. If \p Descendants is filled +/// with blocks from \p BB1's dominator tree, then +/// this is the post-dominator tree, and vice versa. +template <bool IsPostDom> +void SampleProfileLoaderBaseImpl::findEquivalencesFor( + BasicBlock *BB1, ArrayRef<BasicBlock *> Descendants, + DominatorTreeBase<BasicBlock, IsPostDom> *DomTree) { + const BasicBlock *EC = EquivalenceClass[BB1]; + uint64_t Weight = BlockWeights[EC]; + for (const auto *BB2 : Descendants) { + bool IsDomParent = DomTree->dominates(BB2, BB1); + bool IsInSameLoop = LI->getLoopFor(BB1) == LI->getLoopFor(BB2); + if (BB1 != BB2 && IsDomParent && IsInSameLoop) { + EquivalenceClass[BB2] = EC; + // If BB2 is visited, then the entire EC should be marked as visited. + if (VisitedBlocks.count(BB2)) { + VisitedBlocks.insert(EC); + } + + // If BB2 is heavier than BB1, make BB2 have the same weight + // as BB1. + // + // Note that we don't worry about the opposite situation here + // (when BB2 is lighter than BB1). We will deal with this + // during the propagation phase. Right now, we just want to + // make sure that BB1 has the largest weight of all the + // members of its equivalence set. + Weight = std::max(Weight, BlockWeights[BB2]); + } + } + if (EC == &EC->getParent()->getEntryBlock()) { + BlockWeights[EC] = Samples->getHeadSamples() + 1; + } else { + BlockWeights[EC] = Weight; + } +} + +/// Find equivalence classes. +/// +/// Since samples may be missing from blocks, we can fill in the gaps by setting +/// the weights of all the blocks in the same equivalence class to the same +/// weight. To compute the concept of equivalence, we use dominance and loop +/// information. Two blocks B1 and B2 are in the same equivalence class if B1 +/// dominates B2, B2 post-dominates B1 and both are in the same loop. +/// +/// \param F The function to query. +void SampleProfileLoaderBaseImpl::findEquivalenceClasses(Function &F) { + SmallVector<BasicBlock *, 8> DominatedBBs; + LLVM_DEBUG(dbgs() << "\nBlock equivalence classes\n"); + // Find equivalence sets based on dominance and post-dominance information. + for (auto &BB : F) { + BasicBlock *BB1 = &BB; + + // Compute BB1's equivalence class once. + if (EquivalenceClass.count(BB1)) { + LLVM_DEBUG(printBlockEquivalence(dbgs(), BB1)); + continue; + } + + // By default, blocks are in their own equivalence class. + EquivalenceClass[BB1] = BB1; + + // Traverse all the blocks dominated by BB1. We are looking for + // every basic block BB2 such that: + // + // 1- BB1 dominates BB2. + // 2- BB2 post-dominates BB1. + // 3- BB1 and BB2 are in the same loop nest. + // + // If all those conditions hold, it means that BB2 is executed + // as many times as BB1, so they are placed in the same equivalence + // class by making BB2's equivalence class be BB1. + DominatedBBs.clear(); + DT->getDescendants(BB1, DominatedBBs); + findEquivalencesFor(BB1, DominatedBBs, PDT.get()); + + LLVM_DEBUG(printBlockEquivalence(dbgs(), BB1)); + } + + // Assign weights to equivalence classes. + // + // All the basic blocks in the same equivalence class will execute + // the same number of times. Since we know that the head block in + // each equivalence class has the largest weight, assign that weight + // to all the blocks in that equivalence class. + LLVM_DEBUG( + dbgs() << "\nAssign the same weight to all blocks in the same class\n"); + for (auto &BI : F) { + const BasicBlock *BB = &BI; + const BasicBlock *EquivBB = EquivalenceClass[BB]; + if (BB != EquivBB) + BlockWeights[BB] = BlockWeights[EquivBB]; + LLVM_DEBUG(printBlockWeight(dbgs(), BB)); + } +} + +/// Visit the given edge to decide if it has a valid weight. +/// +/// If \p E has not been visited before, we copy to \p UnknownEdge +/// and increment the count of unknown edges. +/// +/// \param E Edge to visit. +/// \param NumUnknownEdges Current number of unknown edges. +/// \param UnknownEdge Set if E has not been visited before. +/// +/// \returns E's weight, if known. Otherwise, return 0. +uint64_t SampleProfileLoaderBaseImpl::visitEdge(Edge E, + unsigned *NumUnknownEdges, + Edge *UnknownEdge) { + if (!VisitedEdges.count(E)) { + (*NumUnknownEdges)++; + *UnknownEdge = E; + return 0; + } + + return EdgeWeights[E]; +} + +/// Propagate weights through incoming/outgoing edges. +/// +/// If the weight of a basic block is known, and there is only one edge +/// with an unknown weight, we can calculate the weight of that edge. +/// +/// Similarly, if all the edges have a known count, we can calculate the +/// count of the basic block, if needed. +/// +/// \param F Function to process. +/// \param UpdateBlockCount Whether we should update basic block counts that +/// has already been annotated. +/// +/// \returns True if new weights were assigned to edges or blocks. +bool SampleProfileLoaderBaseImpl::propagateThroughEdges(Function &F, + bool UpdateBlockCount) { + bool Changed = false; + LLVM_DEBUG(dbgs() << "\nPropagation through edges\n"); + for (const auto &BI : F) { + const BasicBlock *BB = &BI; + const BasicBlock *EC = EquivalenceClass[BB]; + + // Visit all the predecessor and successor edges to determine + // which ones have a weight assigned already. Note that it doesn't + // matter that we only keep track of a single unknown edge. The + // only case we are interested in handling is when only a single + // edge is unknown (see setEdgeOrBlockWeight). + for (unsigned i = 0; i < 2; i++) { + uint64_t TotalWeight = 0; + unsigned NumUnknownEdges = 0, NumTotalEdges = 0; + Edge UnknownEdge, SelfReferentialEdge, SingleEdge; + + if (i == 0) { + // First, visit all predecessor edges. + NumTotalEdges = Predecessors[BB].size(); + for (auto *Pred : Predecessors[BB]) { + Edge E = std::make_pair(Pred, BB); + TotalWeight += visitEdge(E, &NumUnknownEdges, &UnknownEdge); + if (E.first == E.second) + SelfReferentialEdge = E; + } + if (NumTotalEdges == 1) { + SingleEdge = std::make_pair(Predecessors[BB][0], BB); + } + } else { + // On the second round, visit all successor edges. + NumTotalEdges = Successors[BB].size(); + for (auto *Succ : Successors[BB]) { + Edge E = std::make_pair(BB, Succ); + TotalWeight += visitEdge(E, &NumUnknownEdges, &UnknownEdge); + } + if (NumTotalEdges == 1) { + SingleEdge = std::make_pair(BB, Successors[BB][0]); + } + } + + // After visiting all the edges, there are three cases that we + // can handle immediately: + // + // - All the edge weights are known (i.e., NumUnknownEdges == 0). + // In this case, we simply check that the sum of all the edges + // is the same as BB's weight. If not, we change BB's weight + // to match. Additionally, if BB had not been visited before, + // we mark it visited. + // + // - Only one edge is unknown and BB has already been visited. + // In this case, we can compute the weight of the edge by + // subtracting the total block weight from all the known + // edge weights. If the edges weight more than BB, then the + // edge of the last remaining edge is set to zero. + // + // - There exists a self-referential edge and the weight of BB is + // known. In this case, this edge can be based on BB's weight. + // We add up all the other known edges and set the weight on + // the self-referential edge as we did in the previous case. + // + // In any other case, we must continue iterating. Eventually, + // all edges will get a weight, or iteration will stop when + // it reaches SampleProfileMaxPropagateIterations. + if (NumUnknownEdges <= 1) { + uint64_t &BBWeight = BlockWeights[EC]; + if (NumUnknownEdges == 0) { + if (!VisitedBlocks.count(EC)) { + // If we already know the weight of all edges, the weight of the + // basic block can be computed. It should be no larger than the sum + // of all edge weights. + if (TotalWeight > BBWeight) { + BBWeight = TotalWeight; + Changed = true; + LLVM_DEBUG(dbgs() << "All edge weights for " << BB->getName() + << " known. Set weight for block: "; + printBlockWeight(dbgs(), BB);); + } + } else if (NumTotalEdges == 1 && + EdgeWeights[SingleEdge] < BlockWeights[EC]) { + // If there is only one edge for the visited basic block, use the + // block weight to adjust edge weight if edge weight is smaller. + EdgeWeights[SingleEdge] = BlockWeights[EC]; + Changed = true; + } + } else if (NumUnknownEdges == 1 && VisitedBlocks.count(EC)) { + // If there is a single unknown edge and the block has been + // visited, then we can compute E's weight. + if (BBWeight >= TotalWeight) + EdgeWeights[UnknownEdge] = BBWeight - TotalWeight; + else + EdgeWeights[UnknownEdge] = 0; + const BasicBlock *OtherEC; + if (i == 0) + OtherEC = EquivalenceClass[UnknownEdge.first]; + else + OtherEC = EquivalenceClass[UnknownEdge.second]; + // Edge weights should never exceed the BB weights it connects. + if (VisitedBlocks.count(OtherEC) && + EdgeWeights[UnknownEdge] > BlockWeights[OtherEC]) + EdgeWeights[UnknownEdge] = BlockWeights[OtherEC]; + VisitedEdges.insert(UnknownEdge); + Changed = true; + LLVM_DEBUG(dbgs() << "Set weight for edge: "; + printEdgeWeight(dbgs(), UnknownEdge)); + } + } else if (VisitedBlocks.count(EC) && BlockWeights[EC] == 0) { + // If a block Weights 0, all its in/out edges should weight 0. + if (i == 0) { + for (auto *Pred : Predecessors[BB]) { + Edge E = std::make_pair(Pred, BB); + EdgeWeights[E] = 0; + VisitedEdges.insert(E); + } + } else { + for (auto *Succ : Successors[BB]) { + Edge E = std::make_pair(BB, Succ); + EdgeWeights[E] = 0; + VisitedEdges.insert(E); + } + } + } else if (SelfReferentialEdge.first && VisitedBlocks.count(EC)) { + uint64_t &BBWeight = BlockWeights[BB]; + // We have a self-referential edge and the weight of BB is known. + if (BBWeight >= TotalWeight) + EdgeWeights[SelfReferentialEdge] = BBWeight - TotalWeight; + else + EdgeWeights[SelfReferentialEdge] = 0; + VisitedEdges.insert(SelfReferentialEdge); + Changed = true; + LLVM_DEBUG(dbgs() << "Set self-referential edge weight to: "; + printEdgeWeight(dbgs(), SelfReferentialEdge)); + } + if (UpdateBlockCount && !VisitedBlocks.count(EC) && TotalWeight > 0) { + BlockWeights[EC] = TotalWeight; + VisitedBlocks.insert(EC); + Changed = true; + } + } + } + + return Changed; +} + +/// Build in/out edge lists for each basic block in the CFG. +/// +/// We are interested in unique edges. If a block B1 has multiple +/// edges to another block B2, we only add a single B1->B2 edge. +void SampleProfileLoaderBaseImpl::buildEdges(Function &F) { + for (auto &BI : F) { + BasicBlock *B1 = &BI; + + // Add predecessors for B1. + SmallPtrSet<BasicBlock *, 16> Visited; + if (!Predecessors[B1].empty()) + llvm_unreachable("Found a stale predecessors list in a basic block."); + for (BasicBlock *B2 : predecessors(B1)) + if (Visited.insert(B2).second) + Predecessors[B1].push_back(B2); + + // Add successors for B1. + Visited.clear(); + if (!Successors[B1].empty()) + llvm_unreachable("Found a stale successors list in a basic block."); + for (BasicBlock *B2 : successors(B1)) + if (Visited.insert(B2).second) + Successors[B1].push_back(B2); + } +} + +/// Propagate weights into edges +/// +/// The following rules are applied to every block BB in the CFG: +/// +/// - If BB has a single predecessor/successor, then the weight +/// of that edge is the weight of the block. +/// +/// - If all incoming or outgoing edges are known except one, and the +/// weight of the block is already known, the weight of the unknown +/// edge will be the weight of the block minus the sum of all the known +/// edges. If the sum of all the known edges is larger than BB's weight, +/// we set the unknown edge weight to zero. +/// +/// - If there is a self-referential edge, and the weight of the block is +/// known, the weight for that edge is set to the weight of the block +/// minus the weight of the other incoming edges to that block (if +/// known). +void SampleProfileLoaderBaseImpl::propagateWeights(Function &F) { + bool Changed = true; + unsigned I = 0; + + // If BB weight is larger than its corresponding loop's header BB weight, + // use the BB weight to replace the loop header BB weight. + for (auto &BI : F) { + BasicBlock *BB = &BI; + Loop *L = LI->getLoopFor(BB); + if (!L) { + continue; + } + BasicBlock *Header = L->getHeader(); + if (Header && BlockWeights[BB] > BlockWeights[Header]) { + BlockWeights[Header] = BlockWeights[BB]; + } + } + + // Before propagation starts, build, for each block, a list of + // unique predecessors and successors. This is necessary to handle + // identical edges in multiway branches. Since we visit all blocks and all + // edges of the CFG, it is cleaner to build these lists once at the start + // of the pass. + buildEdges(F); + + // Propagate until we converge or we go past the iteration limit. + while (Changed && I++ < SampleProfileMaxPropagateIterations) { + Changed = propagateThroughEdges(F, false); + } + + // The first propagation propagates BB counts from annotated BBs to unknown + // BBs. The 2nd propagation pass resets edges weights, and use all BB weights + // to propagate edge weights. + VisitedEdges.clear(); + Changed = true; + while (Changed && I++ < SampleProfileMaxPropagateIterations) { + Changed = propagateThroughEdges(F, false); + } + + // The 3rd propagation pass allows adjust annotated BB weights that are + // obviously wrong. + Changed = true; + while (Changed && I++ < SampleProfileMaxPropagateIterations) { + Changed = propagateThroughEdges(F, true); + } +} + +/// Generate branch weight metadata for all branches in \p F. +/// +/// Branch weights are computed out of instruction samples using a +/// propagation heuristic. Propagation proceeds in 3 phases: +/// +/// 1- Assignment of block weights. All the basic blocks in the function +/// are initial assigned the same weight as their most frequently +/// executed instruction. +/// +/// 2- Creation of equivalence classes. Since samples may be missing from +/// blocks, we can fill in the gaps by setting the weights of all the +/// blocks in the same equivalence class to the same weight. To compute +/// the concept of equivalence, we use dominance and loop information. +/// Two blocks B1 and B2 are in the same equivalence class if B1 +/// dominates B2, B2 post-dominates B1 and both are in the same loop. +/// +/// 3- Propagation of block weights into edges. This uses a simple +/// propagation heuristic. The following rules are applied to every +/// block BB in the CFG: +/// +/// - If BB has a single predecessor/successor, then the weight +/// of that edge is the weight of the block. +/// +/// - If all the edges are known except one, and the weight of the +/// block is already known, the weight of the unknown edge will +/// be the weight of the block minus the sum of all the known +/// edges. If the sum of all the known edges is larger than BB's weight, +/// we set the unknown edge weight to zero. +/// +/// - If there is a self-referential edge, and the weight of the block is +/// known, the weight for that edge is set to the weight of the block +/// minus the weight of the other incoming edges to that block (if +/// known). +/// +/// Since this propagation is not guaranteed to finalize for every CFG, we +/// only allow it to proceed for a limited number of iterations (controlled +/// by -sample-profile-max-propagate-iterations). +/// +/// FIXME: Try to replace this propagation heuristic with a scheme +/// that is guaranteed to finalize. A work-list approach similar to +/// the standard value propagation algorithm used by SSA-CCP might +/// work here. +/// +/// \param F The function to query. +/// +/// \returns true if \p F was modified. Returns false, otherwise. +bool SampleProfileLoaderBaseImpl::computeAndPropagateWeights( + Function &F, const DenseSet<GlobalValue::GUID> &InlinedGUIDs) { + bool Changed = (InlinedGUIDs.size() != 0); + + // Compute basic block weights. + Changed |= computeBlockWeights(F); + + if (Changed) { + // Add an entry count to the function using the samples gathered at the + // function entry. + // Sets the GUIDs that are inlined in the profiled binary. This is used + // for ThinLink to make correct liveness analysis, and also make the IR + // match the profiled binary before annotation. + F.setEntryCount( + ProfileCount(Samples->getHeadSamples() + 1, Function::PCT_Real), + &InlinedGUIDs); + + // Compute dominance and loop info needed for propagation. + computeDominanceAndLoopInfo(F); + + // Find equivalence classes. + findEquivalenceClasses(F); + + // Propagate weights to all edges. + propagateWeights(F); + } + + return Changed; +} + +void SampleProfileLoaderBaseImpl::emitCoverageRemarks(Function &F) { + // If coverage checking was requested, compute it now. + if (SampleProfileRecordCoverage) { + unsigned Used = CoverageTracker.countUsedRecords(Samples, PSI); + unsigned Total = CoverageTracker.countBodyRecords(Samples, PSI); + unsigned Coverage = CoverageTracker.computeCoverage(Used, Total); + if (Coverage < SampleProfileRecordCoverage) { + F.getContext().diagnose(DiagnosticInfoSampleProfile( + F.getSubprogram()->getFilename(), getFunctionLoc(F), + Twine(Used) + " of " + Twine(Total) + " available profile records (" + + Twine(Coverage) + "%) were applied", + DS_Warning)); + } + } + + if (SampleProfileSampleCoverage) { + uint64_t Used = CoverageTracker.getTotalUsedSamples(); + uint64_t Total = CoverageTracker.countBodySamples(Samples, PSI); + unsigned Coverage = CoverageTracker.computeCoverage(Used, Total); + if (Coverage < SampleProfileSampleCoverage) { + F.getContext().diagnose(DiagnosticInfoSampleProfile( + F.getSubprogram()->getFilename(), getFunctionLoc(F), + Twine(Used) + " of " + Twine(Total) + " available profile samples (" + + Twine(Coverage) + "%) were applied", + DS_Warning)); + } + } +} + +/// Get the line number for the function header. +/// +/// This looks up function \p F in the current compilation unit and +/// retrieves the line number where the function is defined. This is +/// line 0 for all the samples read from the profile file. Every line +/// number is relative to this line. +/// +/// \param F Function object to query. +/// +/// \returns the line number where \p F is defined. If it returns 0, +/// it means that there is no debug information available for \p F. +unsigned SampleProfileLoaderBaseImpl::getFunctionLoc(Function &F) { + if (DISubprogram *S = F.getSubprogram()) + return S->getLine(); + + if (NoWarnSampleUnused) + return 0; + + // If the start of \p F is missing, emit a diagnostic to inform the user + // about the missed opportunity. + F.getContext().diagnose(DiagnosticInfoSampleProfile( + "No debug information found in function " + F.getName() + + ": Function profile not used", + DS_Warning)); + return 0; +} + +void SampleProfileLoaderBaseImpl::computeDominanceAndLoopInfo(Function &F) { + DT.reset(new DominatorTree); + DT->recalculate(F); + + PDT.reset(new PostDominatorTree(F)); + + LI.reset(new LoopInfo); + LI->analyze(*DT); +} + +#undef DEBUG_TYPE + +} // namespace llvm +#endif // LLVM_TRANSFORMS_IPO_SAMPLEPROFILELOADERIMPL_H diff --git a/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseUtil.h b/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseUtil.h new file mode 100644 index 000000000000..37dc8d8187d9 --- /dev/null +++ b/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseUtil.h @@ -0,0 +1,97 @@ +////===- SampleProfileLoadBaseUtil.h - Profile loader util func --*- C++-*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +/// \file +/// This file provides the utility functions for the sampled PGO loader base +/// implementation. +// +//===----------------------------------------------------------------------===// + +#ifndef LLVM_TRANSFORMS_IPO_SAMPLEPROFILELOADERUTIL_H +#define LLVM_TRANSFORMS_IPO_SAMPLEPROFILELOADERUTIL_H + +#include "llvm/ADT/DenseMap.h" +#include "llvm/Analysis/ProfileSummaryInfo.h" +#include "llvm/IR/BasicBlock.h" +#include "llvm/IR/CFG.h" +#include "llvm/IR/DebugLoc.h" +#include "llvm/IR/Function.h" +#include "llvm/ProfileData/SampleProf.h" +#include "llvm/Support/CommandLine.h" + +namespace llvm { +using namespace sampleprof; + +extern cl::opt<unsigned> SampleProfileMaxPropagateIterations; +extern cl::opt<unsigned> SampleProfileRecordCoverage; +extern cl::opt<unsigned> SampleProfileSampleCoverage; +extern cl::opt<bool> NoWarnSampleUnused; + +namespace sampleprofutil { + +class SampleCoverageTracker { +public: + bool markSamplesUsed(const FunctionSamples *FS, uint32_t LineOffset, + uint32_t Discriminator, uint64_t Samples); + unsigned computeCoverage(unsigned Used, unsigned Total) const; + unsigned countUsedRecords(const FunctionSamples *FS, + ProfileSummaryInfo *PSI) const; + unsigned countBodyRecords(const FunctionSamples *FS, + ProfileSummaryInfo *PSI) const; + uint64_t getTotalUsedSamples() const { return TotalUsedSamples; } + uint64_t countBodySamples(const FunctionSamples *FS, + ProfileSummaryInfo *PSI) const; + + void clear() { + SampleCoverage.clear(); + TotalUsedSamples = 0; + } + void setProfAccForSymsInList(bool V) { ProfAccForSymsInList = V; } + +private: + using BodySampleCoverageMap = std::map<LineLocation, unsigned>; + using FunctionSamplesCoverageMap = + DenseMap<const FunctionSamples *, BodySampleCoverageMap>; + + /// Coverage map for sampling records. + /// + /// This map keeps a record of sampling records that have been matched to + /// an IR instruction. This is used to detect some form of staleness in + /// profiles (see flag -sample-profile-check-coverage). + /// + /// Each entry in the map corresponds to a FunctionSamples instance. This is + /// another map that counts how many times the sample record at the + /// given location has been used. + FunctionSamplesCoverageMap SampleCoverage; + + /// Number of samples used from the profile. + /// + /// When a sampling record is used for the first time, the samples from + /// that record are added to this accumulator. Coverage is later computed + /// based on the total number of samples available in this function and + /// its callsites. + /// + /// Note that this accumulator tracks samples used from a single function + /// and all the inlined callsites. Strictly, we should have a map of counters + /// keyed by FunctionSamples pointers, but these stats are cleared after + /// every function, so we just need to keep a single counter. + uint64_t TotalUsedSamples = 0; + + // For symbol in profile symbol list, whether to regard their profiles + // to be accurate. This is passed from the SampleLoader instance. + bool ProfAccForSymsInList = false; +}; + +/// Return true if the given callsite is hot wrt to hot cutoff threshold. +bool callsiteIsHot(const FunctionSamples *CallsiteFS, ProfileSummaryInfo *PSI, + bool ProfAccForSymsInList); + +} // end of namespace sampleprofutil +} // end of namespace llvm + +#endif // LLVM_TRANSFORMS_IPO_SAMPLEPROFILELOADERUTIL_H diff --git a/llvm/lib/ProfileData/CMakeLists.txt b/llvm/lib/ProfileData/CMakeLists.txt index 2a377e4d74d3..4125fac918ab 100644 --- a/llvm/lib/ProfileData/CMakeLists.txt +++ b/llvm/lib/ProfileData/CMakeLists.txt @@ -5,6 +5,7 @@ add_llvm_component_library(LLVMProfileData InstrProfWriter.cpp ProfileSummaryBuilder.cpp </cut>

3 years, 10 months

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tx1/llvm-master-aarch64-spec2k6-O2 - Build # 15 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2. So far, this commit has regressed CI configurations: - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2 Culprit: <cut> commit a0a9c9e188f5b97ff8b74287d1536f57ec5dda54 Author: Sanjay Patel <spatel(a)rotateright.com> Date: Wed Aug 11 12:41:47 2021 -0400 [InstCombine] avoid breaking up min/max (cmp+sel) idioms This is a quick fix for a motivating case that looks like this: https://godbolt.org/z/GeMqzMc38 As noted, we might be able to restore the min/max patterns with select folds, or we just wait for this to become easier with canonicalization to min/max intrinsics. </cut> Results regressed to (for first_bad == a0a9c9e188f5b97ff8b74287d1536f57ec5dda54) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O2 artifacts/build-a0a9c9e188f5b97ff8b74287d1536f57ec5dda54/results_id: 1 # 464.h264ref,h264ref_base.default regressed by 106 # 464.h264ref,[.] FastFullPelBlockMotionSearch regressed by 146 from (for last_good == 5bf4ab0e79e1a8552019918a662bdf7af8b3825a) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O2 artifacts/build-5bf4ab0e79e1a8552019918a662bdf7af8b3825a/results_id: 1 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2/3875 Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2/3877 Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Configuration details: Reproduce builds: <cut> mkdir investigate-llvm-a0a9c9e188f5b97ff8b74287d1536f57ec5dda54 cd investigate-llvm-a0a9c9e188f5b97ff8b74287d1536f57ec5dda54 git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach a0a9c9e188f5b97ff8b74287d1536f57ec5dda54 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 5bf4ab0e79e1a8552019918a662bdf7af8b3825a ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Full commit (up to 1000 lines): <cut> commit a0a9c9e188f5b97ff8b74287d1536f57ec5dda54 Author: Sanjay Patel <spatel(a)rotateright.com> Date: Wed Aug 11 12:41:47 2021 -0400 [InstCombine] avoid breaking up min/max (cmp+sel) idioms This is a quick fix for a motivating case that looks like this: https://godbolt.org/z/GeMqzMc38 As noted, we might be able to restore the min/max patterns with select folds, or we just wait for this to become easier with canonicalization to min/max intrinsics. --- llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp | 12 +++++++++--- llvm/test/Transforms/InstCombine/icmp-add.ll | 13 ++++++------- 2 files changed, 15 insertions(+), 10 deletions(-) diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp index 2e20bca300d3..71037616585c 100644 --- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp +++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp @@ -5755,9 +5755,6 @@ Instruction *InstCombinerImpl::visitICmpInst(ICmpInst &I) { if (Instruction *Res = foldICmpWithDominatingICmp(I)) return Res; - if (Instruction *Res = foldICmpBinOp(I, Q)) - return Res; - if (Instruction *Res = foldICmpUsingKnownBits(I)) return Res; @@ -5803,6 +5800,15 @@ Instruction *InstCombinerImpl::visitICmpInst(ICmpInst &I) { } } + // The folds in here may rely on wrapping flags and special constants, so + // they can break up min/max idioms in some cases but not seemingly similar + // patterns. + // FIXME: It may be possible to enhance select folding to make this + // unnecessary. It may also be moot if we canonicalize to min/max + // intrinsics. + if (Instruction *Res = foldICmpBinOp(I, Q)) + return Res; + if (Instruction *Res = foldICmpInstWithConstant(I)) return Res; diff --git a/llvm/test/Transforms/InstCombine/icmp-add.ll b/llvm/test/Transforms/InstCombine/icmp-add.ll index 187e0ad1a31b..1750b5685c50 100644 --- a/llvm/test/Transforms/InstCombine/icmp-add.ll +++ b/llvm/test/Transforms/InstCombine/icmp-add.ll @@ -972,7 +972,6 @@ define i1 @slt_offset_nsw(i8 %a, i8 %c) { ret i1 %ov } -; FIXME: ; In the following 4 tests, we could push the inc/dec ; through the min/max, but we should not break up the ; min/max idiom by using different icmp and select @@ -980,9 +979,9 @@ define i1 @slt_offset_nsw(i8 %a, i8 %c) { define i32 @increment_max(i32 %x) { ; CHECK-LABEL: @increment_max( -; CHECK-NEXT: [[A:%.*]] = add nsw i32 [[X:%.*]], 1 -; CHECK-NEXT: [[C_INV:%.*]] = icmp slt i32 [[X]], 0 -; CHECK-NEXT: [[S:%.*]] = select i1 [[C_INV]], i32 0, i32 [[A]] +; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt i32 [[X:%.*]], -1 +; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 [[X]], i32 -1 +; CHECK-NEXT: [[S:%.*]] = add nsw i32 [[TMP2]], 1 ; CHECK-NEXT: ret i32 [[S]] ; %a = add nsw i32 %x, 1 @@ -1019,9 +1018,9 @@ define i32 @increment_min(i32 %x) { define i32 @decrement_min(i32 %x) { ; CHECK-LABEL: @decrement_min( -; CHECK-NEXT: [[A:%.*]] = add nsw i32 [[X:%.*]], -1 -; CHECK-NEXT: [[C_INV:%.*]] = icmp sgt i32 [[X]], 0 -; CHECK-NEXT: [[S:%.*]] = select i1 [[C_INV]], i32 0, i32 [[A]] +; CHECK-NEXT: [[TMP1:%.*]] = icmp slt i32 [[X:%.*]], 1 +; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 [[X]], i32 1 +; CHECK-NEXT: [[S:%.*]] = add nsw i32 [[TMP2]], -1 ; CHECK-NEXT: ret i32 [[S]] ; %a = add nsw i32 %x, -1 </cut>

3 years, 10 months

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_apm/llvm-master-aarch64-spec2k6-Os - Build # 4 - Fixed!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Os. So far, this commit has regressed CI configurations: - tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Os Culprit: <cut> commit dae7adda949993bd96aa50c551dc64ddebba7923 Author: Matt Jacobson <mhjacobson(a)me.com> Date: Fri Aug 6 10:12:00 2021 +0800 [AVR][clang] Pass '-fno-use-init-array' to cc1 as default On AVR, '.ctors' is used, not '.init_array'. Make this the default unless specifically overridden by driver argument. This matches gcc, and it matches the behavior in (e.g.) the NetBSD driver (for certain OS variants). Reviewed by: MaskRay Differential Revision: https://reviews.llvm.org/D107610 </cut> Results regressed to (for first_bad == dae7adda949993bd96aa50c551dc64ddebba7923) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -Os artifacts/build-dae7adda949993bd96aa50c551dc64ddebba7923/results_id: 1 # 400.perlbench,perlbench_base.default regressed by 94393 # 453.povray,povray_base.default regressed by 102 # 470.lbm,lbm_base.default regressed by 103 # 470.lbm,[.] LBM_performStreamCollide regressed by 118 from (for last_good == 66b1e629d89543cb7542c184f7dfb32deee732e1) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -Os artifacts/build-66b1e629d89543cb7542c184f7dfb32deee732e1/results_id: 1 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… Results ID of last_good: apm_64/tcwg_bmk_llvm_apm/bisect-llvm-master-aarch64-spec2k6-Os/3855 Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… Results ID of first_bad: apm_64/tcwg_bmk_llvm_apm/bisect-llvm-master-aarch64-spec2k6-Os/3852 Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… Configuration details: Reproduce builds: <cut> mkdir investigate-llvm-dae7adda949993bd96aa50c551dc64ddebba7923 cd investigate-llvm-dae7adda949993bd96aa50c551dc64ddebba7923 git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach dae7adda949993bd96aa50c551dc64ddebba7923 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 66b1e629d89543cb7542c184f7dfb32deee732e1 ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… Full commit (up to 1000 lines): <cut> commit dae7adda949993bd96aa50c551dc64ddebba7923 Author: Matt Jacobson <mhjacobson(a)me.com> Date: Fri Aug 6 10:12:00 2021 +0800 [AVR][clang] Pass '-fno-use-init-array' to cc1 as default On AVR, '.ctors' is used, not '.init_array'. Make this the default unless specifically overridden by driver argument. This matches gcc, and it matches the behavior in (e.g.) the NetBSD driver (for certain OS variants). Reviewed by: MaskRay Differential Revision: https://reviews.llvm.org/D107610 --- clang/lib/Driver/ToolChains/AVR.cpp | 10 ++++++++++ clang/lib/Driver/ToolChains/AVR.h | 7 ++++++- clang/test/Driver/avr-toolchain.c | 2 +- 3 files changed, 17 insertions(+), 2 deletions(-) diff --git a/clang/lib/Driver/ToolChains/AVR.cpp b/clang/lib/Driver/ToolChains/AVR.cpp index 5b097f9b2ed9..18c6f41e22b1 100644 --- a/clang/lib/Driver/ToolChains/AVR.cpp +++ b/clang/lib/Driver/ToolChains/AVR.cpp @@ -370,6 +370,16 @@ void AVRToolChain::AddClangSystemIncludeArgs(const ArgList &DriverArgs, addSystemInclude(DriverArgs, CC1Args, AVRInc); } +void AVRToolChain::addClangTargetOptions( + const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args, + Action::OffloadKind DeviceOffloadKind) const { + // By default, use `.ctors` (not `.init_array`), as required by libgcc, which + // runs constructors/destructors on AVR. + if (!DriverArgs.hasFlag(options::OPT_fuse_init_array, + options::OPT_fno_use_init_array, false)) + CC1Args.push_back("-fno-use-init-array"); +} + Tool *AVRToolChain::buildLinker() const { return new tools::AVR::Linker(getTriple(), *this, LinkStdlib); } diff --git a/clang/lib/Driver/ToolChains/AVR.h b/clang/lib/Driver/ToolChains/AVR.h index f612aa691182..2d027957ed76 100644 --- a/clang/lib/Driver/ToolChains/AVR.h +++ b/clang/lib/Driver/ToolChains/AVR.h @@ -11,8 +11,8 @@ #include "Gnu.h" #include "clang/Driver/InputInfo.h" -#include "clang/Driver/ToolChain.h" #include "clang/Driver/Tool.h" +#include "clang/Driver/ToolChain.h" namespace clang { namespace driver { @@ -26,6 +26,11 @@ public: AddClangSystemIncludeArgs(const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args) const override; + void + addClangTargetOptions(const llvm::opt::ArgList &DriverArgs, + llvm::opt::ArgStringList &CC1Args, + Action::OffloadKind DeviceOffloadKind) const override; + protected: Tool *buildLinker() const override; diff --git a/clang/test/Driver/avr-toolchain.c b/clang/test/Driver/avr-toolchain.c index 692063dc2c34..877f650a3d02 100644 --- a/clang/test/Driver/avr-toolchain.c +++ b/clang/test/Driver/avr-toolchain.c @@ -1,7 +1,7 @@ // A basic clang -cc1 command-line. // RUN: %clang %s -### -no-canonical-prefixes -target avr 2>&1 | FileCheck -check-prefix=CC1 %s -// CC1: clang{{.*}} "-cc1" "-triple" "avr" +// CC1: clang{{.*}} "-cc1" "-triple" "avr" {{.*}} "-fno-use-init-array" // RUN: %clang %s -### -no-canonical-prefixes -target avr --sysroot %S/Inputs/basic_avr_tree 2>&1 | FileCheck -check-prefix CC1A %s // CC1A: clang{{.*}} "-cc1" "-triple" "avr" {{.*}} "-internal-isystem" {{".*avr/include"}} </cut>

3 years, 10 months

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tk1/llvm-release-arm-spec2k6-O3 - Build # 9 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3. So far, this commit has regressed CI configurations: - tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3 Culprit: <cut> commit cd6de0e8de4a5fd558580be4b1a07116914fc8ed Author: Sjoerd Meijer <sjoerd.meijer(a)arm.com> Date: Fri Feb 12 15:15:05 2021 +0000 [TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode This refactors shouldFavorPostInc() and shouldFavorBackedgeIndex() into getPreferredAddressingMode() so that we have one interface to steer LSR in generating the preferred addressing mode. Differential Revision: https://reviews.llvm.org/D96600 </cut> Results regressed to (for first_bad == cd6de0e8de4a5fd558580be4b1a07116914fc8ed) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O3_marm artifacts/build-cd6de0e8de4a5fd558580be4b1a07116914fc8ed/results_id: 1 # 482.sphinx3,sphinx_livepretend_base.default regressed by 103 # 482.sphinx3,[.] vector_gautbl_eval_logs3 regressed by 111 from (for last_good == 4bd5bd40094c7b8b691cf394d813efc48d82acfd) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O3_marm artifacts/build-4bd5bd40094c7b8b691cf394d813efc48d82acfd/results_id: 1 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3/3835 Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3/3840 Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Configuration details: Reproduce builds: <cut> mkdir investigate-llvm-cd6de0e8de4a5fd558580be4b1a07116914fc8ed cd investigate-llvm-cd6de0e8de4a5fd558580be4b1a07116914fc8ed git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach cd6de0e8de4a5fd558580be4b1a07116914fc8ed ../artifacts/test.sh # Reproduce last_good build git checkout --detach 4bd5bd40094c7b8b691cf394d813efc48d82acfd ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… Full commit (up to 1000 lines): <cut> commit cd6de0e8de4a5fd558580be4b1a07116914fc8ed Author: Sjoerd Meijer <sjoerd.meijer(a)arm.com> Date: Fri Feb 12 15:15:05 2021 +0000 [TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode This refactors shouldFavorPostInc() and shouldFavorBackedgeIndex() into getPreferredAddressingMode() so that we have one interface to steer LSR in generating the preferred addressing mode. Differential Revision: https://reviews.llvm.org/D96600 --- llvm/include/llvm/Analysis/TargetTransformInfo.h | 25 ++++++++++++---------- .../llvm/Analysis/TargetTransformInfoImpl.h | 7 +++--- llvm/lib/Analysis/TargetTransformInfo.cpp | 10 ++++----- llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | 22 ++++++++++--------- llvm/lib/Target/ARM/ARMTargetTransformInfo.h | 4 ++-- .../Target/Hexagon/HexagonTargetTransformInfo.cpp | 5 +++-- .../Target/Hexagon/HexagonTargetTransformInfo.h | 3 ++- llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp | 24 ++++++++++++--------- 8 files changed, 55 insertions(+), 45 deletions(-) diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h index c3d7d2cc80a4..79303dab92a2 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfo.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h @@ -638,13 +638,15 @@ public: DominatorTree *DT, AssumptionCache *AC, TargetLibraryInfo *LibInfo) const; - /// \return True is LSR should make efforts to create/preserve post-inc - /// addressing mode expressions. - bool shouldFavorPostInc() const; + enum AddressingModeKind { + AMK_PreIndexed, + AMK_PostIndexed, + AMK_None + }; - /// Return true if LSR should make efforts to generate indexed addressing - /// modes that operate across loop iterations. - bool shouldFavorBackedgeIndex(const Loop *L) const; + /// Return the preferred addressing mode LSR should make efforts to generate. + AddressingModeKind getPreferredAddressingMode(const Loop *L, + ScalarEvolution *SE) const; /// Return true if the target supports masked store. bool isLegalMaskedStore(Type *DataType, Align Alignment) const; @@ -1454,8 +1456,8 @@ public: virtual bool canSaveCmp(Loop *L, BranchInst **BI, ScalarEvolution *SE, LoopInfo *LI, DominatorTree *DT, AssumptionCache *AC, TargetLibraryInfo *LibInfo) = 0; - virtual bool shouldFavorPostInc() const = 0; - virtual bool shouldFavorBackedgeIndex(const Loop *L) const = 0; + virtual AddressingModeKind + getPreferredAddressingMode(const Loop *L, ScalarEvolution *SE) const = 0; virtual bool isLegalMaskedStore(Type *DataType, Align Alignment) = 0; virtual bool isLegalMaskedLoad(Type *DataType, Align Alignment) = 0; virtual bool isLegalNTStore(Type *DataType, Align Alignment) = 0; @@ -1796,9 +1798,10 @@ public: TargetLibraryInfo *LibInfo) override { return Impl.canSaveCmp(L, BI, SE, LI, DT, AC, LibInfo); } - bool shouldFavorPostInc() const override { return Impl.shouldFavorPostInc(); } - bool shouldFavorBackedgeIndex(const Loop *L) const override { - return Impl.shouldFavorBackedgeIndex(L); + AddressingModeKind + getPreferredAddressingMode(const Loop *L, + ScalarEvolution *SE) const override { + return Impl.getPreferredAddressingMode(L, SE); } bool isLegalMaskedStore(Type *DataType, Align Alignment) override { return Impl.isLegalMaskedStore(DataType, Alignment); diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h index 84de5038df42..a9c9d3cb9f4f 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h @@ -209,9 +209,10 @@ public: return false; } - bool shouldFavorPostInc() const { return false; } - - bool shouldFavorBackedgeIndex(const Loop *L) const { return false; } + TTI::AddressingModeKind + getPreferredAddressingMode(const Loop *L, ScalarEvolution *SE) const { + return TTI::AMK_None; + } bool isLegalMaskedStore(Type *DataType, Align Alignment) const { return false; diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp index 16992d099e0a..3db4b0b0d553 100644 --- a/llvm/lib/Analysis/TargetTransformInfo.cpp +++ b/llvm/lib/Analysis/TargetTransformInfo.cpp @@ -409,12 +409,10 @@ bool TargetTransformInfo::canSaveCmp(Loop *L, BranchInst **BI, return TTIImpl->canSaveCmp(L, BI, SE, LI, DT, AC, LibInfo); } -bool TargetTransformInfo::shouldFavorPostInc() const { - return TTIImpl->shouldFavorPostInc(); -} - -bool TargetTransformInfo::shouldFavorBackedgeIndex(const Loop *L) const { - return TTIImpl->shouldFavorBackedgeIndex(L); +TTI::AddressingModeKind +TargetTransformInfo::getPreferredAddressingMode(const Loop *L, + ScalarEvolution *SE) const { + return TTIImpl->getPreferredAddressingMode(L, SE); } bool TargetTransformInfo::isLegalMaskedStore(Type *DataType, diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp index 80f1f2a2a8f7..8c2a79efc674 100644 --- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp +++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp @@ -100,18 +100,20 @@ bool ARMTTIImpl::areInlineCompatible(const Function *Caller, return MatchExact && MatchSubset; } -bool ARMTTIImpl::shouldFavorBackedgeIndex(const Loop *L) const { - if (L->getHeader()->getParent()->hasOptSize()) - return false; +TTI::AddressingModeKind +ARMTTIImpl::getPreferredAddressingMode(const Loop *L, + ScalarEvolution *SE) const { if (ST->hasMVEIntegerOps()) - return false; - return ST->isMClass() && ST->isThumb2() && L->getNumBlocks() == 1; -} + return TTI::AMK_PostIndexed; -bool ARMTTIImpl::shouldFavorPostInc() const { - if (ST->hasMVEIntegerOps()) - return true; - return false; + if (L->getHeader()->getParent()->hasOptSize()) + return TTI::AMK_None; + + if (ST->isMClass() && ST->isThumb2() && + L->getNumBlocks() == 1) + return TTI::AMK_PreIndexed; + + return TTI::AMK_None; } Optional<Instruction *> diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h index b8de27101a61..808128929000 100644 --- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h +++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h @@ -103,8 +103,8 @@ public: bool enableInterleavedAccessVectorization() { return true; } - bool shouldFavorBackedgeIndex(const Loop *L) const; - bool shouldFavorPostInc() const; + TTI::AddressingModeKind + getPreferredAddressingMode(const Loop *L, ScalarEvolution *SE) const; /// Floating-point computation using ARMv8 AArch32 Advanced /// SIMD instructions remains unchanged from ARMv7. Only AArch64 SIMD diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp index af7bc4682249..89e7df0aa27e 100644 --- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp +++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp @@ -80,8 +80,9 @@ void HexagonTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE, } } -bool HexagonTTIImpl::shouldFavorPostInc() const { - return true; +AddressingModeKind::getPreferredAddressingMode(const Loop *L, + ScalarEvolution *SE) const { + return AMK_PostIndexed; } /// --- Vector TTI begin --- diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h index dc075d6147b6..ebaa619837f0 100644 --- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h +++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h @@ -67,7 +67,8 @@ public: TTI::PeelingPreferences &PP); /// Bias LSR towards creating post-increment opportunities. - bool shouldFavorPostInc() const; + AddressingModeKind getPreferredAddressingMode(const Loop *L, + ScalarEvolution *SE) const; // L1 cache prefetch. unsigned getPrefetchDistance() const override; diff --git a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp index 5dec9b542076..2f90df70a3c3 100644 --- a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp +++ b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp @@ -1227,13 +1227,15 @@ static unsigned getSetupCost(const SCEV *Reg, unsigned Depth) { /// Tally up interesting quantities from the given register. void Cost::RateRegister(const Formula &F, const SCEV *Reg, SmallPtrSetImpl<const SCEV *> &Regs) { + TTI::AddressingModeKind AMK = TTI->getPreferredAddressingMode(L, SE); + if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Reg)) { // If this is an addrec for another loop, it should be an invariant // with respect to L since L is the innermost loop (at least // for now LSR only handles innermost loops). if (AR->getLoop() != L) { // If the AddRec exists, consider it's register free and leave it alone. - if (isExistingPhi(AR, *SE) && !TTI->shouldFavorPostInc()) + if (isExistingPhi(AR, *SE) && AMK != TTI::AMK_PostIndexed) return; // It is bad to allow LSR for current loop to add induction variables @@ -1254,13 +1256,11 @@ void Cost::RateRegister(const Formula &F, const SCEV *Reg, // If the step size matches the base offset, we could use pre-indexed // addressing. - if (TTI->shouldFavorBackedgeIndex(L)) { + if (AMK == TTI::AMK_PreIndexed) { if (auto *Step = dyn_cast<SCEVConstant>(AR->getStepRecurrence(*SE))) if (Step->getAPInt() == F.BaseOffset) LoopCost = 0; - } - - if (TTI->shouldFavorPostInc()) { + } else if (AMK == TTI::AMK_PostIndexed) { const SCEV *LoopStep = AR->getStepRecurrence(*SE); if (isa<SCEVConstant>(LoopStep)) { const SCEV *LoopStart = AR->getStart(); @@ -3575,7 +3575,8 @@ void LSRInstance::GenerateReassociationsImpl(LSRUse &LU, unsigned LUIdx, // may generate a post-increment operator. The reason is that the // reassociations cause extra base+register formula to be created, // and possibly chosen, but the post-increment is more efficient. - if (TTI.shouldFavorPostInc() && mayUsePostIncMode(TTI, LU, BaseReg, L, SE)) + TTI::AddressingModeKind AMK = TTI.getPreferredAddressingMode(L, &SE); + if (AMK == TTI::AMK_PostIndexed && mayUsePostIncMode(TTI, LU, BaseReg, L, SE)) return; SmallVector<const SCEV *, 8> AddOps; const SCEV *Remainder = CollectSubexprs(BaseReg, nullptr, AddOps, L, SE); @@ -4239,7 +4240,8 @@ void LSRInstance::GenerateCrossUseConstantOffsets() { NewF.BaseOffset = (uint64_t)NewF.BaseOffset + Imm; if (!isLegalUse(TTI, LU.MinOffset, LU.MaxOffset, LU.Kind, LU.AccessTy, NewF)) { - if (TTI.shouldFavorPostInc() && + if (TTI.getPreferredAddressingMode(this->L, &SE) == + TTI::AMK_PostIndexed && mayUsePostIncMode(TTI, LU, OrigReg, this->L, SE)) continue; if (!TTI.isLegalAddImmediate((uint64_t)NewF.UnfoldedOffset + Imm)) @@ -4679,7 +4681,7 @@ void LSRInstance::NarrowSearchSpaceByFilterFormulaWithSameScaledReg() { /// If we are over the complexity limit, filter out any post-inc prefering /// variables to only post-inc values. void LSRInstance::NarrowSearchSpaceByFilterPostInc() { - if (!TTI.shouldFavorPostInc()) + if (TTI.getPreferredAddressingMode(L, &SE) != TTI::AMK_PostIndexed) return; if (EstimateSearchSpaceComplexity() < ComplexityLimit) return; @@ -4978,7 +4980,8 @@ void LSRInstance::SolveRecurse(SmallVectorImpl<const Formula *> &Solution, // This can sometimes (notably when trying to favour postinc) lead to // sub-optimial decisions. There it is best left to the cost modelling to // get correct. - if (!TTI.shouldFavorPostInc() || LU.Kind != LSRUse::Address) { + if (TTI.getPreferredAddressingMode(L, &SE) != TTI::AMK_PostIndexed || + LU.Kind != LSRUse::Address) { int NumReqRegsToFind = std::min(F.getNumRegs(), ReqRegs.size()); for (const SCEV *Reg : ReqRegs) { if ((F.ScaledReg && F.ScaledReg == Reg) || @@ -5560,7 +5563,8 @@ LSRInstance::LSRInstance(Loop *L, IVUsers &IU, ScalarEvolution &SE, TargetLibraryInfo &TLI, MemorySSAUpdater *MSSAU) : IU(IU), SE(SE), DT(DT), LI(LI), AC(AC), TLI(TLI), TTI(TTI), L(L), MSSAU(MSSAU), FavorBackedgeIndex(EnableBackedgeIndexing && - TTI.shouldFavorBackedgeIndex(L)) { + TTI.getPreferredAddressingMode(L, &SE) == + TTI::AMK_PreIndexed) { // If LoopSimplify form is not available, stay out of trouble. if (!L->isLoopSimplifyForm()) return; </cut>

3 years, 10 months

[CI-NOTIFY]: TCWG Bisect tcwg_kernel/llvm-master-aarch64-lts-allmodconfig - Build # 6 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-master-aarch64-lts-allmodconfig. So far, this commit has regressed CI configurations: - tcwg_kernel/llvm-master-aarch64-lts-allmodconfig Culprit: <cut> commit 132a8267adabd645476b542b3b132c1b91988fe8 Author: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Date: Thu Aug 12 13:22:21 2021 +0200 Linux 5.10.58 Link: https://lore.kernel.org/r/20210810172955.660225700@linuxfoundation.org Tested-by: Fox Chen <foxhlchen(a)gmail.com> Tested-by: Hulk Robot <hulkrobot(a)huawei.com> Tested-by: Sudip Mukherjee <sudip.mukherjee(a)codethink.co.uk> Tested-by: Linux Kernel Functional Testing <lkft(a)linaro.org> Tested-by: Guenter Roeck <linux(a)roeck-us.net> Tested-by: Shuah Khan <skhan(a)linuxfoundation.org> Tested-by: Aakash Hemadri <aakashhemadri123(a)gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> </cut> Results regressed to (for first_bad == 132a8267adabd645476b542b3b132c1b91988fe8) # reset_artifacts: -10 # build_abe binutils: -9 # build_llvm: -5 # build_abe qemu: -2 # linux_n_obj: 28702 # linux build successful: all # First few build errors in logs: from (for last_good == 3d7d1b0f5f41d66a2d177f9fdcdb32e23a4b2513) # reset_artifacts: -10 # build_abe binutils: -9 # build_llvm: -5 # build_abe qemu: -2 # linux_n_obj: 28702 # linux build successful: all # linux boot successful: boot Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-lts-a… Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-lts-a… Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-lts-a… Configuration details: Reproduce builds: <cut> mkdir investigate-linux-132a8267adabd645476b542b3b132c1b91988fe8 cd investigate-linux-132a8267adabd645476b542b3b132c1b91988fe8 git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-lts-a… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-lts-a… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-lts-a… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/ cd linux # Reproduce first_bad build git checkout --detach 132a8267adabd645476b542b3b132c1b91988fe8 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 3d7d1b0f5f41d66a2d177f9fdcdb32e23a4b2513 ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-lts-a… Build log: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-lts-a… Full commit (up to 1000 lines): <cut> commit 132a8267adabd645476b542b3b132c1b91988fe8 Author: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Date: Thu Aug 12 13:22:21 2021 +0200 Linux 5.10.58 Link: https://lore.kernel.org/r/20210810172955.660225700@linuxfoundation.org Tested-by: Fox Chen <foxhlchen(a)gmail.com> Tested-by: Hulk Robot <hulkrobot(a)huawei.com> Tested-by: Sudip Mukherjee <sudip.mukherjee(a)codethink.co.uk> Tested-by: Linux Kernel Functional Testing <lkft(a)linaro.org> Tested-by: Guenter Roeck <linux(a)roeck-us.net> Tested-by: Shuah Khan <skhan(a)linuxfoundation.org> Tested-by: Aakash Hemadri <aakashhemadri123(a)gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index e9621a90e752..232dee1140c1 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 5 PATCHLEVEL = 10 -SUBLEVEL = 57 +SUBLEVEL = 58 EXTRAVERSION = NAME = Dare mighty things </cut>

3 years, 10 months

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tx1/llvm-master-aarch64-spec2k6-O2_LTO - Build # 22 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO. So far, this commit has regressed CI configurations: - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO Culprit: <cut> commit 4389a413e2129d7d55ee779638b649aa852b6f8a Author: Zahira Ammarguellat <zahira.ammarguellat(a)intel.com> Date: Fri Aug 6 12:01:47 2021 -0700 Revert "[clang][fpenv][patch] Change clang option -ffp-model=precise to select ffp-contract=on" This reverts commit 48ad446a0fb2c9b98cb7047e4daf8a84c29cef8f. </cut> Results regressed to (for first_bad == 4389a413e2129d7d55ee779638b649aa852b6f8a) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O2_LTO artifacts/build-4389a413e2129d7d55ee779638b649aa852b6f8a/results_id: 1 # 444.namd,namd_base.default regressed by 104 # 447.dealII,dealII_base.default regressed by 105 from (for last_good == dfce2909ee1ea1523ec27b834a0e56429e9c2beb) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -- -O2_LTO artifacts/build-dfce2909ee1ea1523ec27b834a0e56429e9c2beb/results_id: 1 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2_LTO/3823 Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2_LTO/3819 Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Configuration details: Reproduce builds: <cut> mkdir investigate-llvm-4389a413e2129d7d55ee779638b649aa852b6f8a cd investigate-llvm-4389a413e2129d7d55ee779638b649aa852b6f8a git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach 4389a413e2129d7d55ee779638b649aa852b6f8a ../artifacts/test.sh # Reproduce last_good build git checkout --detach dfce2909ee1ea1523ec27b834a0e56429e9c2beb ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Full commit (up to 1000 lines): <cut> commit 4389a413e2129d7d55ee779638b649aa852b6f8a Author: Zahira Ammarguellat <zahira.ammarguellat(a)intel.com> Date: Fri Aug 6 12:01:47 2021 -0700 Revert "[clang][fpenv][patch] Change clang option -ffp-model=precise to select ffp-contract=on" This reverts commit 48ad446a0fb2c9b98cb7047e4daf8a84c29cef8f. --- clang/docs/UsersManual.rst | 48 ++----------------------- clang/lib/Driver/ToolChains/Clang.cpp | 33 ++++++++--------- clang/test/CodeGen/ffp-contract-option.c | 47 +++--------------------- clang/test/CodeGen/ppc-emmintrin.c | 4 +-- clang/test/CodeGen/ppc-xmmintrin.c | 4 +-- clang/test/Driver/fp-model.c | 61 +++++++++++++++----------------- 6 files changed, 58 insertions(+), 139 deletions(-) diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst index 838669794ea8..980d0ab45975 100644 --- a/clang/docs/UsersManual.rst +++ b/clang/docs/UsersManual.rst @@ -1260,50 +1260,8 @@ installed. Controlling Floating Point Behavior ----------------------------------- -Clang provides a number of ways to control floating point behavior, including -with command line options and source pragmas. This section -describes the various floating point semantic modes and the corresponding options. - -.. csv-table:: Floating Point Semantic Modes - :header: "Mode", "Values" - :widths: 15, 30, 30 - - "except_behavior", "{ignore, strict, may_trap}", "ffp-exception-behavior" - "fenv_access", "{off, on}", "(none)" - "rounding_mode", "{dynamic, tonearest, downward, upward, towardzero}", "frounding-math" - "contract", "{on, off, fast}", "ffp-contract" - "denormal_fp_math", "{IEEE, PreserveSign, PositiveZero}", "fdenormal-fp-math" - "denormal_fp32_math", "{IEEE, PreserveSign, PositiveZero}", "fdenormal-fp-math-fp32" - "support_math_errno", "{on, off}", "fmath-errno" - "no_honor_nans", "{on, off}", "fhonor-nans" - "no_honor_infinities", "{on, off}", "fhonor-infinities" - "no_signed_zeros", "{on, off}", "fsigned-zeros" - "allow_reciprocal", "{on, off}", "freciprocal-math" - "allow_approximate_fns", "{on, off}", "(none)" - "allow_reassociation", "{on, off}", "fassociative-math" - - -This table describes the option settings that correspond to the three -floating point semantic models: precise (the default), strict, and fast. - - -.. csv-table:: Floating Point Models - :header: "Mode", "Precise", "Strict", "Fast" - :widths: 25, 15, 15, 15 - - "except_behavior", "ignore", "strict", "ignore" - "fenv_access", "off", "on", "off" - "rounding_mode", "tonearest", "dynamic", "tonearest" - "contract", "on", "off", "fast" - "denormal_fp_math", "IEEE", "IEEE", "PreserveSign" - "denormal_fp32_math", "IEEE","IEEE", "PreserveSign" - "support_math_errno", "on", "on", "off" - "no_honor_nans", "off", "off", "on" - "no_honor_infinities", "off", "off", "on" - "no_signed_zeros", "off", "off", "on" - "allow_reciprocal", "off", "off", "on" - "allow_approximate_fns", "off", "off", "on" - "allow_reassociation", "off", "off", "on" +Clang provides a number of ways to control floating point behavior. The options +are listed below. .. option:: -ffast-math @@ -1498,7 +1456,7 @@ Note that floating-point operations performed as part of constant initialization and ``fast``. Details: - * ``precise`` Disables optimizations that are not value-safe on floating-point data, although FP contraction (FMA) is enabled (``-ffp-contract=on``). This is the default behavior. + * ``precise`` Disables optimizations that are not value-safe on floating-point data, although FP contraction (FMA) is enabled (``-ffp-contract=fast``). This is the default behavior. * ``strict`` Enables ``-frounding-math`` and ``-ffp-exception-behavior=strict``, and disables contractions (FMA). All of the ``-ffast-math`` enablements are disabled. Enables ``STDC FENV_ACCESS``: by default ``FENV_ACCESS`` is disabled. This option setting behaves as though ``#pragma STDC FENV_ACESS ON`` appeared at the top of the source file. * ``fast`` Behaves identically to specifying both ``-ffast-math`` and ``ffp-contract=fast`` diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index 1c79640be80f..96bbc0250126 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -2641,7 +2641,7 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, llvm::DenormalMode DenormalFPMath = DefaultDenormalFPMath; llvm::DenormalMode DenormalFP32Math = DefaultDenormalFP32Math; - StringRef FPContract = "on"; + StringRef FPContract = ""; bool StrictFPModel = false; @@ -2666,7 +2666,7 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, ReciprocalMath = false; SignedZeros = true; // -fno_fast_math restores default denormal and fpcontract handling - FPContract = "on"; + FPContract = ""; DenormalFPMath = llvm::DenormalMode::getIEEE(); // FIXME: The target may have picked a non-IEEE default mode here based on @@ -2686,18 +2686,20 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, // ffp-model= is a Driver option, it is entirely rewritten into more // granular options before being passed into cc1. // Use the gcc option in the switch below. - if (!FPModel.empty() && !FPModel.equals(Val)) + if (!FPModel.empty() && !FPModel.equals(Val)) { D.Diag(clang::diag::warn_drv_overriding_flag_option) << Args.MakeArgString("-ffp-model=" + FPModel) << Args.MakeArgString("-ffp-model=" + Val); + FPContract = ""; + } if (Val.equals("fast")) { optID = options::OPT_ffast_math; FPModel = Val; - FPContract = Val; + FPContract = "fast"; } else if (Val.equals("precise")) { optID = options::OPT_ffp_contract; FPModel = Val; - FPContract = "on"; + FPContract = "fast"; PreciseFPModel = true; } else if (Val.equals("strict")) { StrictFPModel = true; @@ -2783,11 +2785,9 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, case options::OPT_ffp_contract: { StringRef Val = A->getValue(); if (PreciseFPModel) { - // When -ffp-model=precise is seen on the command line, - // the boolean PreciseFPModel is set to true which indicates - // "the current option is actually PreciseFPModel". The optID - // is changed to OPT_ffp_contract and FPContract is set to "on". - // the argument Val string is "precise": it shouldn't be checked. + // -ffp-model=precise enables ffp-contract=fast as a side effect + // the FPContract value has already been set to a string literal + // and the Val string isn't a pertinent value. ; } else if (Val.equals("fast") || Val.equals("on") || Val.equals("off")) FPContract = Val; @@ -2897,17 +2897,18 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D, // -fno_fast_math restores default denormal and fpcontract handling DenormalFPMath = DefaultDenormalFPMath; DenormalFP32Math = llvm::DenormalMode::getIEEE(); - FPContract = "on"; + FPContract = ""; break; } if (StrictFPModel) { // If -ffp-model=strict has been specified on command line but // subsequent options conflict then emit warning diagnostic. - if (HonorINFs && HonorNaNs && !AssociativeMath && !ReciprocalMath && - SignedZeros && TrappingMath && RoundingFPMath && - DenormalFPMath == llvm::DenormalMode::getIEEE() && - DenormalFP32Math == llvm::DenormalMode::getIEEE() && - FPContract.equals("off")) + if (HonorINFs && HonorNaNs && + !AssociativeMath && !ReciprocalMath && + SignedZeros && TrappingMath && RoundingFPMath && + (FPContract.equals("off") || FPContract.empty()) && + DenormalFPMath == llvm::DenormalMode::getIEEE() && + DenormalFP32Math == llvm::DenormalMode::getIEEE()) // OK: Current Arg doesn't conflict with -ffp-model=strict ; else { diff --git a/clang/test/CodeGen/ffp-contract-option.c b/clang/test/CodeGen/ffp-contract-option.c index efc72c2b5461..52b750795940 100644 --- a/clang/test/CodeGen/ffp-contract-option.c +++ b/clang/test/CodeGen/ffp-contract-option.c @@ -1,46 +1,9 @@ -// RUN: %clang_cc1 -O3 -ffp-contract=fast -triple=aarch64-apple-darwin -S -o - %s | FileCheck --check-prefix=CHECK-FMADD %s +// RUN: %clang_cc1 -O3 -ffp-contract=fast -triple=aarch64-apple-darwin -S -o - %s | FileCheck %s // REQUIRES: aarch64-registered-target float fma_test1(float a, float b, float c) { -// CHECK-FMADD: fmadd - float x = a * b; - float y = x + c; - return y; -} - -// RUN: %clang_cc1 -triple=x86_64 %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-DEFAULT %s -// -// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=off %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-DEFAULT %s -// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=on %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-ON %s -// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=fast %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-CONTRACTFAST %s -// -// RUN: %clang_cc1 -triple=x86_64 -ffast-math %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-DEFAULTFAST %s -// RUN: %clang_cc1 -triple=x86_64 -ffast-math -ffp-contract=off %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-DEFAULTFAST %s -// RUN: %clang_cc1 -triple=x86_64 -ffast-math -ffp-contract=on %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-ONFAST %s -// RUN: %clang_cc1 -triple=x86_64 -ffast-math -ffp-contract=fast %s -emit-llvm -o - \ -// RUN:| FileCheck --check-prefix=CHECK-FASTFAST %s -float mymuladd( float x, float y, float z ) { - return x * y + z; - // CHECK-DEFAULT: = fmul float - // CHECK-DEFAULT: = fadd float - - // CHECK-ON: = call float @llvm.fmuladd.f32 - - // CHECK-CONTRACTFAST: = fmul contract float - // CHECK-CONTRACTFAST: = fadd contract float - - // CHECK-DEFAULTFAST: = fmul reassoc nnan ninf nsz arcp afn float - // CHECK-DEFAULTFAST: = fadd reassoc nnan ninf nsz arcp afn float - - // CHECK-ONFAST: = call reassoc nnan ninf nsz arcp afn float @llvm.fmuladd.f32 - - // CHECK-FASTFAST: = fmul fast float - // CHECK-FASTFAST: = fadd fast float +// CHECK: fmadd + float x = a * b; + float y = x + c; + return y; } diff --git a/clang/test/CodeGen/ppc-emmintrin.c b/clang/test/CodeGen/ppc-emmintrin.c index 4a246ff92d76..fa3801f50a01 100644 --- a/clang/test/CodeGen/ppc-emmintrin.c +++ b/clang/test/CodeGen/ppc-emmintrin.c @@ -2,9 +2,9 @@ // REQUIRES: powerpc-registered-target // RUN: %clang -S -emit-llvm -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ -// RUN: -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE +// RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE // RUN: %clang -S -emit-llvm -target powerpc64le-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ -// RUN: -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE +// RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE // CHECK-BE-DAG: @_mm_movemask_pd.perm_mask = internal constant <4 x i32> <i32 -2139062144, i32 -2139062144, i32 -2139062144, i32 -2139078656>, align 16 // CHECK-BE-DAG: @_mm_shuffle_epi32.permute_selectors = internal constant [4 x i32] [i32 66051, i32 67438087, i32 134810123, i32 202182159], align 4 diff --git a/clang/test/CodeGen/ppc-xmmintrin.c b/clang/test/CodeGen/ppc-xmmintrin.c index a7f6ed6e0e67..d3f18bfbb1e5 100644 --- a/clang/test/CodeGen/ppc-xmmintrin.c +++ b/clang/test/CodeGen/ppc-xmmintrin.c @@ -2,11 +2,11 @@ // REQUIRES: powerpc-registered-target // RUN: %clang -S -emit-llvm -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ -// RUN: -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE +// RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE // RUN: %clang -x c++ -fsyntax-only -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ // RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns // RUN: %clang -S -emit-llvm -target powerpc64le-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ -// RUN: -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE +// RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE // RUN: %clang -x c++ -fsyntax-only -target powerpc64le-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \ // RUN: -fno-discard-value-names -mllvm -disable-llvm-optzns diff --git a/clang/test/Driver/fp-model.c b/clang/test/Driver/fp-model.c index c6d683e25c0b..5fa9d110dd83 100644 --- a/clang/test/Driver/fp-model.c +++ b/clang/test/Driver/fp-model.c @@ -1,90 +1,88 @@ // Test that incompatible combinations of -ffp-model= options // and other floating point options get a warning diagnostic. +// +// REQUIRES: clang-driver -// RUN: %clang -target x86_64 -### -ffp-model=fast -ffp-contract=off -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=fast -ffp-contract=off -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN %s // WARN: warning: overriding '-ffp-model=fast' option with '-ffp-contract=off' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=fast -ffp-contract=on -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=fast -ffp-contract=on -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN1 %s // WARN1: warning: overriding '-ffp-model=fast' option with '-ffp-contract=on' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fassociative-math -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fassociative-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN2 %s // WARN2: warning: overriding '-ffp-model=strict' option with '-fassociative-math' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -ffast-math -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -ffast-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN3 %s // WARN3: warning: overriding '-ffp-model=strict' option with '-ffast-math' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -ffinite-math-only -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -ffinite-math-only -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN4 %s // WARN4: warning: overriding '-ffp-model=strict' option with '-ffinite-math-only' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -ffp-contract=fast -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -ffp-contract=fast -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN5 %s // WARN5: warning: overriding '-ffp-model=strict' option with '-ffp-contract=fast' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -ffp-contract=fast -c %s 2>&1 \ -// RUN: | FileCheck --check-prefix=WARN6 %s -// WARN6: warning: overriding '-ffp-model=strict' option with '-ffp-contract=fast' [-Woverriding-t-option] - -// RUN: %clang -target x86_64 -### -ffp-model=strict -ffp-contract=on -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -ffp-contract=on -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN7 %s // WARN7: warning: overriding '-ffp-model=strict' option with '-ffp-contract=on' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-honor-infinities -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fno-honor-infinities -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN8 %s // WARN8: warning: overriding '-ffp-model=strict' option with '-fno-honor-infinities' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-honor-nans -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fno-honor-nans -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN9 %s // WARN9: warning: overriding '-ffp-model=strict' option with '-fno-honor-nans' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-rounding-math -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fno-rounding-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARNa %s // WARNa: warning: overriding '-ffp-model=strict' option with '-fno-rounding-math' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-signed-zeros -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fno-signed-zeros -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARNb %s // WARNb: warning: overriding '-ffp-model=strict' option with '-fno-signed-zeros' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fno-trapping-math -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fno-trapping-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARNc %s // WARNc: warning: overriding '-ffp-model=strict' option with '-fno-trapping-math' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -freciprocal-math -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -freciprocal-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARNd %s // WARNd: warning: overriding '-ffp-model=strict' option with '-freciprocal-math' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -funsafe-math-optimizations -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -funsafe-math-optimizations -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARNe %s // WARNe: warning: overriding '-ffp-model=strict' option with '-funsafe-math-optimizations' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -Ofast -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -Ofast -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARNf %s // WARNf: warning: overriding '-ffp-model=strict' option with '-Ofast' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -ffp-model=strict -fdenormal-fp-math=preserve-sign,preserve-sign -c %s 2>&1 \ +// RUN: %clang -### -ffp-model=strict -fdenormal-fp-math=preserve-sign,preserve-sign -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=WARN10 %s // WARN10: warning: overriding '-ffp-model=strict' option with '-fdenormal-fp-math=preserve-sign,preserve-sign' [-Woverriding-t-option] -// RUN: %clang -target x86_64 -### -c %s 2>&1 \ +// RUN: %clang -### -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-NOROUND %s // CHECK-NOROUND: "-cc1" // CHECK-NOROUND: "-fno-rounding-math" -// RUN: %clang -target x86_64 -### -frounding-math -c %s 2>&1 \ +// RUN: %clang -### -frounding-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-ROUND --implicit-check-not ffp-exception-behavior=strict %s // CHECK-ROUND: "-cc1" // CHECK-ROUND: "-frounding-math" -// RUN: %clang -target x86_64 -### -ftrapping-math -c %s 2>&1 \ +// RUN: %clang -### -ftrapping-math -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-TRAP %s // CHECK-TRAP: "-cc1" // CHECK-TRAP: "-ffp-exception-behavior=strict" -// RUN: %clang -target x86_64 -### -nostdinc -ffp-model=fast -c %s 2>&1 \ +// RUN: %clang -### -nostdinc -ffp-model=fast -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-FPM-FAST %s // CHECK-FPM-FAST: "-cc1" // CHECK-FPM-FAST: "-menable-no-infs" @@ -98,35 +96,34 @@ // CHECK-FPM-FAST: "-ffast-math" // CHECK-FPM-FAST: "-ffinite-math-only" -// RUN: %clang -target x86_64 -### -nostdinc -ffp-model=precise -c %s 2>&1 \ +// RUN: %clang -### -nostdinc -ffp-model=precise -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-FPM-PRECISE %s // CHECK-FPM-PRECISE: "-cc1" -// CHECK-FPM-PRECISE: "-ffp-contract=on" +// CHECK-FPM-PRECISE: "-ffp-contract=fast" // CHECK-FPM-PRECISE: "-fno-rounding-math" -// RUN: %clang -target x86_64 -### -nostdinc -ffp-model=strict -c %s 2>&1 \ +// RUN: %clang -### -nostdinc -ffp-model=strict -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-FPM-STRICT %s // CHECK-FPM-STRICT: "-cc1" -// CHECK-FPM-STRICT: "-fmath-errno" -// CHECK-FPM-STRICT: "-ffp-contract=off" // CHECK-FPM-STRICT: "-frounding-math" // CHECK-FPM-STRICT: "-ffp-exception-behavior=strict" -// RUN: %clang -target x86_64 -### -nostdinc -ffp-exception-behavior=strict -c %s 2>&1 \ +// RUN: %clang -### -nostdinc -ffp-exception-behavior=strict -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-FEB-STRICT %s // CHECK-FEB-STRICT: "-cc1" // CHECK-FEB-STRICT: "-fno-rounding-math" // CHECK-FEB-STRICT: "-ffp-exception-behavior=strict" -// RUN: %clang -target x86_64 -### -nostdinc -ffp-exception-behavior=maytrap -c %s 2>&1 \ +// RUN: %clang -### -nostdinc -ffp-exception-behavior=maytrap -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-FEB-MAYTRAP %s // CHECK-FEB-MAYTRAP: "-cc1" // CHECK-FEB-MAYTRAP: "-fno-rounding-math" // CHECK-FEB-MAYTRAP: "-ffp-exception-behavior=maytrap" -// RUN: %clang -target x86_64 -### -nostdinc -ffp-exception-behavior=ignore -c %s 2>&1 \ +// RUN: %clang -### -nostdinc -ffp-exception-behavior=ignore -c %s 2>&1 \ // RUN: | FileCheck --check-prefix=CHECK-FEB-IGNORE %s // CHECK-FEB-IGNORE: "-cc1" // CHECK-FEB-IGNORE: "-fno-rounding-math" // CHECK-FEB-IGNORE: "-ffp-exception-behavior=ignore" + </cut>

3 years, 10 months

[CI-NOTIFY]: TCWG Bisect tcwg_gcc_bootstrap/master-aarch64-bootstrap_ubsan - Build # 1 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *gcc* in CI configuration tcwg_gcc_bootstrap/master-aarch64-bootstrap_ubsan. So far, this commit has regressed CI configurations: - tcwg_gcc_bootstrap/master-aarch64-bootstrap_ubsan Culprit: <cut> commit d1819df86fbe42125cccb2fc2959a0bf51e524d6 Author: Jonathan Wright <jonathan.wright(a)arm.com> Date: Mon Aug 16 14:37:18 2021 +0100 aarch64: Remove macros for vld4[q]_lane Neon intrinsics Remove macros for vld4[q]_lane Neon intrinsics. This is a preparatory step before adding new modes for structures of Advanced SIMD vectors. gcc/ChangeLog: 2021-08-16 Jonathan Wright <jonathan.wright(a)arm.com> * config/aarch64/arm_neon.h (__LD4_LANE_FUNC): Delete. (__LD4Q_LANE_FUNC): Likewise. (vld4_lane_u8): Define without macro. (vld4_lane_u16): Likewise. (vld4_lane_u32): Likewise. (vld4_lane_u64): Likewise. (vld4_lane_s8): Likewise. (vld4_lane_s16): Likewise. (vld4_lane_s32): Likewise. (vld4_lane_s64): Likewise. (vld4_lane_f16): Likewise. (vld4_lane_f32): Likewise. (vld4_lane_f64): Likewise. (vld4_lane_p8): Likewise. (vld4_lane_p16): Likewise. (vld4_lane_p64): Likewise. (vld4q_lane_u8): Likewise. (vld4q_lane_u16): Likewise. (vld4q_lane_u32): Likewise. (vld4q_lane_u64): Likewise. (vld4q_lane_s8): Likewise. (vld4q_lane_s16): Likewise. (vld4q_lane_s32): Likewise. (vld4q_lane_s64): Likewise. (vld4q_lane_f16): Likewise. (vld4q_lane_f32): Likewise. (vld4q_lane_f64): Likewise. (vld4q_lane_p8): Likewise. (vld4q_lane_p16): Likewise. (vld4q_lane_p64): Likewise. (vld4_lane_bf16): Likewise. (vld4q_lane_bf16): Likewise. </cut> Results regressed to (for first_bad == d1819df86fbe42125cccb2fc2959a0bf51e524d6) # reset_artifacts: -10 # true: 0 # build_abe binutils: 1 # First few build errors in logs: # 00:10:53 make[3]: [Makefile:1769: aarch64-unknown-linux-gnu/bits/largefile-config.h] Error 1 (ignored) # 00:10:53 make[3]: [Makefile:1770: aarch64-unknown-linux-gnu/bits/largefile-config.h] Error 1 (ignored) # 00:26:15 /home/tcwg-buildslave/workspace/tcwg_gnu_2/abe/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc-gcc.git~master-stage2/prev-gcc/include/arm_neon.h:21081:11: error: cannot convert ‘float*’ to ‘const int*’ # 00:26:15 /home/tcwg-buildslave/workspace/tcwg_gnu_2/abe/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc-gcc.git~master-stage2/prev-gcc/include/arm_neon.h:21384:9: error: cannot convert ‘long int*’ to ‘const double*’ # 00:26:16 make[3]: *** [Makefile:226: lex.o] Error 1 # 00:26:30 make[2]: *** [Makefile:9758: all-stage2-libcpp] Error 2 # 00:28:15 make[1]: *** [Makefile:25899: stage2-bubble] Error 2 # 00:28:15 make: *** [Makefile:1010: all] Error 2 from (for last_good == 08f83812e5c5fdd9a7a4a1b9e46bb33725185c5a) # reset_artifacts: -10 # true: 0 # build_abe binutils: 1 # build_abe bootstrap_ubsan: 2 Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… Build top page/logs: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… Configuration details: Reproduce builds: <cut> mkdir investigate-gcc-d1819df86fbe42125cccb2fc2959a0bf51e524d6 cd investigate-gcc-d1819df86fbe42125cccb2fc2959a0bf51e524d6 git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach d1819df86fbe42125cccb2fc2959a0bf51e524d6 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 08f83812e5c5fdd9a7a4a1b9e46bb33725185c5a ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… Build log: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… Full commit (up to 1000 lines): <cut> commit d1819df86fbe42125cccb2fc2959a0bf51e524d6 Author: Jonathan Wright <jonathan.wright(a)arm.com> Date: Mon Aug 16 14:37:18 2021 +0100 aarch64: Remove macros for vld4[q]_lane Neon intrinsics Remove macros for vld4[q]_lane Neon intrinsics. This is a preparatory step before adding new modes for structures of Advanced SIMD vectors. gcc/ChangeLog: 2021-08-16 Jonathan Wright <jonathan.wright(a)arm.com> * config/aarch64/arm_neon.h (__LD4_LANE_FUNC): Delete. (__LD4Q_LANE_FUNC): Likewise. (vld4_lane_u8): Define without macro. (vld4_lane_u16): Likewise. (vld4_lane_u32): Likewise. (vld4_lane_u64): Likewise. (vld4_lane_s8): Likewise. (vld4_lane_s16): Likewise. (vld4_lane_s32): Likewise. (vld4_lane_s64): Likewise. (vld4_lane_f16): Likewise. (vld4_lane_f32): Likewise. (vld4_lane_f64): Likewise. (vld4_lane_p8): Likewise. (vld4_lane_p16): Likewise. (vld4_lane_p64): Likewise. (vld4q_lane_u8): Likewise. (vld4q_lane_u16): Likewise. (vld4q_lane_u32): Likewise. (vld4q_lane_u64): Likewise. (vld4q_lane_s8): Likewise. (vld4q_lane_s16): Likewise. (vld4q_lane_s32): Likewise. (vld4q_lane_s64): Likewise. (vld4q_lane_f16): Likewise. (vld4q_lane_f32): Likewise. (vld4q_lane_f64): Likewise. (vld4q_lane_p8): Likewise. (vld4q_lane_p16): Likewise. (vld4q_lane_p64): Likewise. (vld4_lane_bf16): Likewise. (vld4q_lane_bf16): Likewise. --- gcc/config/aarch64/arm_neon.h | 728 ++++++++++++++++++++++++++++++++++++------ 1 file changed, 624 insertions(+), 104 deletions(-) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 29b62988a91..d8b29706a20 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -20856,110 +20856,595 @@ vld3q_lane_p64 (const poly64_t * __ptr, poly64x2x3_t __b, const int __c) /* vld4_lane */ -#define __LD4_LANE_FUNC(intype, vectype, largetype, ptrtype, mode, \ - qmode, ptrmode, funcsuffix, signedtype) \ -__extension__ extern __inline intype \ -__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) \ -vld4_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \ -{ \ - __builtin_aarch64_simd_xi __o; \ - largetype __temp; \ - __temp.val[0] = \ - vcombine_##funcsuffix (__b.val[0], vcreate_##funcsuffix (0)); \ - __temp.val[1] = \ - vcombine_##funcsuffix (__b.val[1], vcreate_##funcsuffix (0)); \ - __temp.val[2] = \ - vcombine_##funcsuffix (__b.val[2], vcreate_##funcsuffix (0)); \ - __temp.val[3] = \ - vcombine_##funcsuffix (__b.val[3], vcreate_##funcsuffix (0)); \ - __o = __builtin_aarch64_set_qregxi##qmode (__o, \ - (signedtype) __temp.val[0], \ - 0); \ - __o = __builtin_aarch64_set_qregxi##qmode (__o, \ - (signedtype) __temp.val[1], \ - 1); \ - __o = __builtin_aarch64_set_qregxi##qmode (__o, \ - (signedtype) __temp.val[2], \ - 2); \ - __o = __builtin_aarch64_set_qregxi##qmode (__o, \ - (signedtype) __temp.val[3], \ - 3); \ - __o = __builtin_aarch64_ld4_lane##mode ( \ - (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c); \ - __b.val[0] = (vectype) __builtin_aarch64_get_dregxidi (__o, 0); \ - __b.val[1] = (vectype) __builtin_aarch64_get_dregxidi (__o, 1); \ - __b.val[2] = (vectype) __builtin_aarch64_get_dregxidi (__o, 2); \ - __b.val[3] = (vectype) __builtin_aarch64_get_dregxidi (__o, 3); \ - return __b; \ +__extension__ extern __inline uint8x8x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4_lane_u8 (const uint8_t * __ptr, uint8x8x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + uint8x16x4_t __temp; + __temp.val[0] = vcombine_u8 (__b.val[0], vcreate_u8 (0)); + __temp.val[1] = vcombine_u8 (__b.val[1], vcreate_u8 (0)); + __temp.val[2] = vcombine_u8 (__b.val[2], vcreate_u8 (0)); + __temp.val[3] = vcombine_u8 (__b.val[3], vcreate_u8 (0)); + __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[0], 0); + __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[1], 1); + __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[2], 2); + __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[3], 3); + __o = __builtin_aarch64_ld4_lanev8qi ( + (__builtin_aarch64_simd_qi *) __ptr, __o, __c); + __b.val[0] = (uint8x8_t) __builtin_aarch64_get_dregxidi (__o, 0); + __b.val[1] = (uint8x8_t) __builtin_aarch64_get_dregxidi (__o, 1); + __b.val[2] = (uint8x8_t) __builtin_aarch64_get_dregxidi (__o, 2); + __b.val[3] = (uint8x8_t) __builtin_aarch64_get_dregxidi (__o, 3); + return __b; } -/* vld4q_lane */ +__extension__ extern __inline uint16x4x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4_lane_u16 (const uint16_t * __ptr, uint16x4x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + uint16x8x4_t __temp; + __temp.val[0] = vcombine_u16 (__b.val[0], vcreate_u16 (0)); + __temp.val[1] = vcombine_u16 (__b.val[1], vcreate_u16 (0)); + __temp.val[2] = vcombine_u16 (__b.val[2], vcreate_u16 (0)); + __temp.val[3] = vcombine_u16 (__b.val[3], vcreate_u16 (0)); + __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[0], 0); + __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[1], 1); + __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[2], 2); + __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[3], 3); + __o = __builtin_aarch64_ld4_lanev4hi ( + (__builtin_aarch64_simd_hi *) __ptr, __o, __c); + __b.val[0] = (uint16x4_t) __builtin_aarch64_get_dregxidi (__o, 0); + __b.val[1] = (uint16x4_t) __builtin_aarch64_get_dregxidi (__o, 1); + __b.val[2] = (uint16x4_t) __builtin_aarch64_get_dregxidi (__o, 2); + __b.val[3] = (uint16x4_t) __builtin_aarch64_get_dregxidi (__o, 3); + return __b; +} + +__extension__ extern __inline uint32x2x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4_lane_u32 (const uint32_t * __ptr, uint32x2x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + uint32x4x4_t __temp; + __temp.val[0] = vcombine_u32 (__b.val[0], vcreate_u32 (0)); + __temp.val[1] = vcombine_u32 (__b.val[1], vcreate_u32 (0)); + __temp.val[2] = vcombine_u32 (__b.val[2], vcreate_u32 (0)); + __temp.val[3] = vcombine_u32 (__b.val[3], vcreate_u32 (0)); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __temp.val[0], 0); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __temp.val[1], 1); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __temp.val[2], 2); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __temp.val[3], 3); + __o = __builtin_aarch64_ld4_lanev2si ( + (__builtin_aarch64_simd_si *) __ptr, __o, __c); + __b.val[0] = (uint32x2_t) __builtin_aarch64_get_dregxidi (__o, 0); + __b.val[1] = (uint32x2_t) __builtin_aarch64_get_dregxidi (__o, 1); + __b.val[2] = (uint32x2_t) __builtin_aarch64_get_dregxidi (__o, 2); + __b.val[3] = (uint32x2_t) __builtin_aarch64_get_dregxidi (__o, 3); + return __b; +} + +__extension__ extern __inline uint64x1x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4_lane_u64 (const uint64_t * __ptr, uint64x1x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + uint64x2x4_t __temp; + __temp.val[0] = vcombine_u64 (__b.val[0], vcreate_u64 (0)); + __temp.val[1] = vcombine_u64 (__b.val[1], vcreate_u64 (0)); + __temp.val[2] = vcombine_u64 (__b.val[2], vcreate_u64 (0)); + __temp.val[3] = vcombine_u64 (__b.val[3], vcreate_u64 (0)); + __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[0], 0); + __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[1], 1); + __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[2], 2); + __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[3], 3); + __o = __builtin_aarch64_ld4_lanedi ( + (__builtin_aarch64_simd_di *) __ptr, __o, __c); + __b.val[0] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 0); + __b.val[1] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 1); + __b.val[2] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 2); + __b.val[3] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 3); + return __b; +} + +__extension__ extern __inline int8x8x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4_lane_s8 (const int8_t * __ptr, int8x8x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + int8x16x4_t __temp; + __temp.val[0] = vcombine_s8 (__b.val[0], vcreate_s8 (0)); + __temp.val[1] = vcombine_s8 (__b.val[1], vcreate_s8 (0)); + __temp.val[2] = vcombine_s8 (__b.val[2], vcreate_s8 (0)); + __temp.val[3] = vcombine_s8 (__b.val[3], vcreate_s8 (0)); + __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[0], 0); + __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[1], 1); + __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[2], 2); + __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[3], 3); + __o = __builtin_aarch64_ld4_lanev8qi ( + (__builtin_aarch64_simd_qi *) __ptr, __o, __c); + __b.val[0] = (int8x8_t) __builtin_aarch64_get_dregxidi (__o, 0); + __b.val[1] = (int8x8_t) __builtin_aarch64_get_dregxidi (__o, 1); + __b.val[2] = (int8x8_t) __builtin_aarch64_get_dregxidi (__o, 2); + __b.val[3] = (int8x8_t) __builtin_aarch64_get_dregxidi (__o, 3); + return __b; +} + +__extension__ extern __inline int16x4x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4_lane_s16 (const int16_t * __ptr, int16x4x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + int16x8x4_t __temp; + __temp.val[0] = vcombine_s16 (__b.val[0], vcreate_s16 (0)); + __temp.val[1] = vcombine_s16 (__b.val[1], vcreate_s16 (0)); + __temp.val[2] = vcombine_s16 (__b.val[2], vcreate_s16 (0)); + __temp.val[3] = vcombine_s16 (__b.val[3], vcreate_s16 (0)); + __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[0], 0); + __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[1], 1); + __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[2], 2); + __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[3], 3); + __o = __builtin_aarch64_ld4_lanev4hi ( + (__builtin_aarch64_simd_hi *) __ptr, __o, __c); + __b.val[0] = (int16x4_t) __builtin_aarch64_get_dregxidi (__o, 0); + __b.val[1] = (int16x4_t) __builtin_aarch64_get_dregxidi (__o, 1); + __b.val[2] = (int16x4_t) __builtin_aarch64_get_dregxidi (__o, 2); + __b.val[3] = (int16x4_t) __builtin_aarch64_get_dregxidi (__o, 3); + return __b; +} + +__extension__ extern __inline int32x2x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4_lane_s32 (const int32_t * __ptr, int32x2x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + int32x4x4_t __temp; + __temp.val[0] = vcombine_s32 (__b.val[0], vcreate_s32 (0)); + __temp.val[1] = vcombine_s32 (__b.val[1], vcreate_s32 (0)); + __temp.val[2] = vcombine_s32 (__b.val[2], vcreate_s32 (0)); + __temp.val[3] = vcombine_s32 (__b.val[3], vcreate_s32 (0)); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __temp.val[0], 0); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __temp.val[1], 1); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __temp.val[2], 2); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __temp.val[3], 3); + __o = __builtin_aarch64_ld4_lanev2si ( + (__builtin_aarch64_simd_si *) __ptr, __o, __c); + __b.val[0] = (int32x2_t) __builtin_aarch64_get_dregxidi (__o, 0); + __b.val[1] = (int32x2_t) __builtin_aarch64_get_dregxidi (__o, 1); + __b.val[2] = (int32x2_t) __builtin_aarch64_get_dregxidi (__o, 2); + __b.val[3] = (int32x2_t) __builtin_aarch64_get_dregxidi (__o, 3); + return __b; +} + +__extension__ extern __inline int64x1x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4_lane_s64 (const int64_t * __ptr, int64x1x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + int64x2x4_t __temp; + __temp.val[0] = vcombine_s64 (__b.val[0], vcreate_s64 (0)); + __temp.val[1] = vcombine_s64 (__b.val[1], vcreate_s64 (0)); + __temp.val[2] = vcombine_s64 (__b.val[2], vcreate_s64 (0)); + __temp.val[3] = vcombine_s64 (__b.val[3], vcreate_s64 (0)); + __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[0], 0); + __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[1], 1); + __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[2], 2); + __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[3], 3); + __o = __builtin_aarch64_ld4_lanedi ( + (__builtin_aarch64_simd_di *) __ptr, __o, __c); + __b.val[0] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 0); + __b.val[1] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 1); + __b.val[2] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 2); + __b.val[3] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 3); + return __b; +} + +__extension__ extern __inline float16x4x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4_lane_f16 (const float16_t * __ptr, float16x4x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + float16x8x4_t __temp; + __temp.val[0] = vcombine_f16 (__b.val[0], vcreate_f16 (0)); + __temp.val[1] = vcombine_f16 (__b.val[1], vcreate_f16 (0)); + __temp.val[2] = vcombine_f16 (__b.val[2], vcreate_f16 (0)); + __temp.val[3] = vcombine_f16 (__b.val[3], vcreate_f16 (0)); + __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) __temp.val[0], 0); + __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) __temp.val[1], 1); + __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) __temp.val[2], 2); + __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) __temp.val[3], 3); + __o = __builtin_aarch64_ld4_lanev4hf ( + (__builtin_aarch64_simd_hf *) __ptr, __o, __c); + __b.val[0] = (float16x4_t) __builtin_aarch64_get_dregxidi (__o, 0); + __b.val[1] = (float16x4_t) __builtin_aarch64_get_dregxidi (__o, 1); + __b.val[2] = (float16x4_t) __builtin_aarch64_get_dregxidi (__o, 2); + __b.val[3] = (float16x4_t) __builtin_aarch64_get_dregxidi (__o, 3); + return __b; +} + +__extension__ extern __inline float32x2x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4_lane_f32 (const float32_t * __ptr, float32x2x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + float32x4x4_t __temp; + __temp.val[0] = vcombine_f32 (__b.val[0], vcreate_f32 (0)); + __temp.val[1] = vcombine_f32 (__b.val[1], vcreate_f32 (0)); + __temp.val[2] = vcombine_f32 (__b.val[2], vcreate_f32 (0)); + __temp.val[3] = vcombine_f32 (__b.val[3], vcreate_f32 (0)); + __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) __temp.val[0], 0); + __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) __temp.val[1], 1); + __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) __temp.val[2], 2); + __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) __temp.val[3], 3); + __o = __builtin_aarch64_ld4_lanev2si ( + (__builtin_aarch64_simd_sf *) __ptr, __o, __c); + __b.val[0] = (float32x2_t) __builtin_aarch64_get_dregxidi (__o, 0); + __b.val[1] = (float32x2_t) __builtin_aarch64_get_dregxidi (__o, 1); + __b.val[2] = (float32x2_t) __builtin_aarch64_get_dregxidi (__o, 2); + __b.val[3] = (float32x2_t) __builtin_aarch64_get_dregxidi (__o, 3); + return __b; +} + +__extension__ extern __inline float64x1x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4_lane_f64 (const float64_t * __ptr, float64x1x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + float64x2x4_t __temp; + __temp.val[0] = vcombine_f64 (__b.val[0], vcreate_f64 (0)); + __temp.val[1] = vcombine_f64 (__b.val[1], vcreate_f64 (0)); + __temp.val[2] = vcombine_f64 (__b.val[2], vcreate_f64 (0)); + __temp.val[3] = vcombine_f64 (__b.val[3], vcreate_f64 (0)); + __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) __temp.val[0], 0); + __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) __temp.val[1], 1); + __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) __temp.val[2], 2); + __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) __temp.val[3], 3); + __o = __builtin_aarch64_ld4_lanedf ( + (__builtin_aarch64_simd_df *) __ptr, __o, __c); + __b.val[0] = (float64x1_t) __builtin_aarch64_get_dregxidi (__o, 0); + __b.val[1] = (float64x1_t) __builtin_aarch64_get_dregxidi (__o, 1); + __b.val[2] = (float64x1_t) __builtin_aarch64_get_dregxidi (__o, 2); + __b.val[3] = (float64x1_t) __builtin_aarch64_get_dregxidi (__o, 3); + return __b; +} + +__extension__ extern __inline poly8x8x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4_lane_p8 (const poly8_t * __ptr, poly8x8x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + poly8x16x4_t __temp; + __temp.val[0] = vcombine_p8 (__b.val[0], vcreate_p8 (0)); + __temp.val[1] = vcombine_p8 (__b.val[1], vcreate_p8 (0)); + __temp.val[2] = vcombine_p8 (__b.val[2], vcreate_p8 (0)); + __temp.val[3] = vcombine_p8 (__b.val[3], vcreate_p8 (0)); + __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[0], 0); + __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[1], 1); + __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[2], 2); + __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[3], 3); + __o = __builtin_aarch64_ld4_lanev8qi ( + (__builtin_aarch64_simd_qi *) __ptr, __o, __c); + __b.val[0] = (poly8x8_t) __builtin_aarch64_get_dregxidi (__o, 0); + __b.val[1] = (poly8x8_t) __builtin_aarch64_get_dregxidi (__o, 1); + __b.val[2] = (poly8x8_t) __builtin_aarch64_get_dregxidi (__o, 2); + __b.val[3] = (poly8x8_t) __builtin_aarch64_get_dregxidi (__o, 3); + return __b; +} -__LD4_LANE_FUNC (float16x4x4_t, float16x4_t, float16x8x4_t, float16_t, v4hf, - v8hf, hf, f16, float16x8_t) -__LD4_LANE_FUNC (float32x2x4_t, float32x2_t, float32x4x4_t, float32_t, v2sf, v4sf, - sf, f32, float32x4_t) -__LD4_LANE_FUNC (float64x1x4_t, float64x1_t, float64x2x4_t, float64_t, df, v2df, - df, f64, float64x2_t) -__LD4_LANE_FUNC (poly8x8x4_t, poly8x8_t, poly8x16x4_t, poly8_t, v8qi, v16qi, qi, p8, - int8x16_t) -__LD4_LANE_FUNC (poly16x4x4_t, poly16x4_t, poly16x8x4_t, poly16_t, v4hi, v8hi, hi, - p16, int16x8_t) -__LD4_LANE_FUNC (poly64x1x4_t, poly64x1_t, poly64x2x4_t, poly64_t, di, - v2di_ssps, di, p64, poly64x2_t) -__LD4_LANE_FUNC (int8x8x4_t, int8x8_t, int8x16x4_t, int8_t, v8qi, v16qi, qi, s8, - int8x16_t) -__LD4_LANE_FUNC (int16x4x4_t, int16x4_t, int16x8x4_t, int16_t, v4hi, v8hi, hi, s16, - int16x8_t) -__LD4_LANE_FUNC (int32x2x4_t, int32x2_t, int32x4x4_t, int32_t, v2si, v4si, si, s32, - int32x4_t) -__LD4_LANE_FUNC (int64x1x4_t, int64x1_t, int64x2x4_t, int64_t, di, v2di, di, s64, - int64x2_t) -__LD4_LANE_FUNC (uint8x8x4_t, uint8x8_t, uint8x16x4_t, uint8_t, v8qi, v16qi, qi, u8, - int8x16_t) -__LD4_LANE_FUNC (uint16x4x4_t, uint16x4_t, uint16x8x4_t, uint16_t, v4hi, v8hi, hi, - u16, int16x8_t) -__LD4_LANE_FUNC (uint32x2x4_t, uint32x2_t, uint32x4x4_t, uint32_t, v2si, v4si, si, - u32, int32x4_t) -__LD4_LANE_FUNC (uint64x1x4_t, uint64x1_t, uint64x2x4_t, uint64_t, di, v2di, di, - u64, int64x2_t) +__extension__ extern __inline poly16x4x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4_lane_p16 (const poly16_t * __ptr, poly16x4x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + poly16x8x4_t __temp; + __temp.val[0] = vcombine_p16 (__b.val[0], vcreate_p16 (0)); + __temp.val[1] = vcombine_p16 (__b.val[1], vcreate_p16 (0)); + __temp.val[2] = vcombine_p16 (__b.val[2], vcreate_p16 (0)); + __temp.val[3] = vcombine_p16 (__b.val[3], vcreate_p16 (0)); + __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[0], 0); + __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[1], 1); + __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[2], 2); + __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[3], 3); + __o = __builtin_aarch64_ld4_lanev4hi ( + (__builtin_aarch64_simd_hi *) __ptr, __o, __c); + __b.val[0] = (poly16x4_t) __builtin_aarch64_get_dregxidi (__o, 0); + __b.val[1] = (poly16x4_t) __builtin_aarch64_get_dregxidi (__o, 1); + __b.val[2] = (poly16x4_t) __builtin_aarch64_get_dregxidi (__o, 2); + __b.val[3] = (poly16x4_t) __builtin_aarch64_get_dregxidi (__o, 3); + return __b; +} + +__extension__ extern __inline poly64x1x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4_lane_p64 (const poly64_t * __ptr, poly64x1x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + poly64x2x4_t __temp; + __temp.val[0] = vcombine_p64 (__b.val[0], vcreate_p64 (0)); + __temp.val[1] = vcombine_p64 (__b.val[1], vcreate_p64 (0)); + __temp.val[2] = vcombine_p64 (__b.val[2], vcreate_p64 (0)); + __temp.val[3] = vcombine_p64 (__b.val[3], vcreate_p64 (0)); + __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[0], 0); + __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[1], 1); + __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[2], 2); + __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[3], 3); + __o = __builtin_aarch64_ld4_lanedi ( + (__builtin_aarch64_simd_di *) __ptr, __o, __c); + __b.val[0] = (poly64x1_t) __builtin_aarch64_get_dregxidi (__o, 0); + __b.val[1] = (poly64x1_t) __builtin_aarch64_get_dregxidi (__o, 1); + __b.val[2] = (poly64x1_t) __builtin_aarch64_get_dregxidi (__o, 2); + __b.val[3] = (poly64x1_t) __builtin_aarch64_get_dregxidi (__o, 3); + return __b; +} /* vld4q_lane */ -#define __LD4Q_LANE_FUNC(intype, vtype, ptrtype, mode, ptrmode, funcsuffix) \ -__extension__ extern __inline intype \ -__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) \ -vld4q_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \ -{ \ - __builtin_aarch64_simd_xi __o; \ - intype ret; \ - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); \ - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); \ - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); \ - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); \ - __o = __builtin_aarch64_ld4_lane##mode ( \ - (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c); \ - ret.val[0] = (vtype) __builtin_aarch64_get_qregxiv4si (__o, 0); \ - ret.val[1] = (vtype) __builtin_aarch64_get_qregxiv4si (__o, 1); \ - ret.val[2] = (vtype) __builtin_aarch64_get_qregxiv4si (__o, 2); \ - ret.val[3] = (vtype) __builtin_aarch64_get_qregxiv4si (__o, 3); \ - return ret; \ -} - -__LD4Q_LANE_FUNC (float16x8x4_t, float16x8_t, float16_t, v8hf, hf, f16) -__LD4Q_LANE_FUNC (float32x4x4_t, float32x4_t, float32_t, v4sf, sf, f32) -__LD4Q_LANE_FUNC (float64x2x4_t, float64x2_t, float64_t, v2df, df, f64) -__LD4Q_LANE_FUNC (poly8x16x4_t, poly8x16_t, poly8_t, v16qi, qi, p8) -__LD4Q_LANE_FUNC (poly16x8x4_t, poly16x8_t, poly16_t, v8hi, hi, p16) -__LD4Q_LANE_FUNC (poly64x2x4_t, poly64x2_t, poly64_t, v2di, di, p64) -__LD4Q_LANE_FUNC (int8x16x4_t, int8x16_t, int8_t, v16qi, qi, s8) -__LD4Q_LANE_FUNC (int16x8x4_t, int16x8_t, int16_t, v8hi, hi, s16) -__LD4Q_LANE_FUNC (int32x4x4_t, int32x4_t, int32_t, v4si, si, s32) -__LD4Q_LANE_FUNC (int64x2x4_t, int64x2_t, int64_t, v2di, di, s64) -__LD4Q_LANE_FUNC (uint8x16x4_t, uint8x16_t, uint8_t, v16qi, qi, u8) -__LD4Q_LANE_FUNC (uint16x8x4_t, uint16x8_t, uint16_t, v8hi, hi, u16) -__LD4Q_LANE_FUNC (uint32x4x4_t, uint32x4_t, uint32_t, v4si, si, u32) -__LD4Q_LANE_FUNC (uint64x2x4_t, uint64x2_t, uint64_t, v2di, di, u64) +__extension__ extern __inline uint8x16x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4q_lane_u8 (const uint8_t * __ptr, uint8x16x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + uint8x16x4_t ret; + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); + __o = __builtin_aarch64_ld4_lanev16qi ( + (__builtin_aarch64_simd_qi *) __ptr, __o, __c); + ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 0); + ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 1); + ret.val[2] = (uint8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 2); + ret.val[3] = (uint8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 3); + return ret; +} + +__extension__ extern __inline uint16x8x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4q_lane_u16 (const uint16_t * __ptr, uint16x8x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + uint16x8x4_t ret; + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); + __o = __builtin_aarch64_ld4_lanev8hi ( + (__builtin_aarch64_simd_hi *) __ptr, __o, __c); + ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 0); + ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 1); + ret.val[2] = (uint16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 2); + ret.val[3] = (uint16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 3); + return ret; +} + +__extension__ extern __inline uint32x4x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4q_lane_u32 (const uint32_t * __ptr, uint32x4x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + uint32x4x4_t ret; + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); + __o = __builtin_aarch64_ld4_lanev4si ( + (__builtin_aarch64_simd_si *) __ptr, __o, __c); + ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0); + ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1); + ret.val[2] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2); + ret.val[3] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3); + return ret; +} + +__extension__ extern __inline uint64x2x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4q_lane_u64 (const uint64_t * __ptr, uint64x2x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + uint64x2x4_t ret; + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); + __o = __builtin_aarch64_ld4_lanev2di ( + (__builtin_aarch64_simd_di *) __ptr, __o, __c); + ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 0); + ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 1); + ret.val[2] = (uint64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 2); + ret.val[3] = (uint64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 3); + return ret; +} + +__extension__ extern __inline int8x16x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4q_lane_s8 (const int8_t * __ptr, int8x16x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + int8x16x4_t ret; + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); + __o = __builtin_aarch64_ld4_lanev16qi ( + (__builtin_aarch64_simd_qi *) __ptr, __o, __c); + ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 0); + ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 1); + ret.val[2] = (int8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 2); + ret.val[3] = (int8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 3); + return ret; +} + +__extension__ extern __inline int16x8x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4q_lane_s16 (const int16_t * __ptr, int16x8x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + int16x8x4_t ret; + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); + __o = __builtin_aarch64_ld4_lanev8hi ( + (__builtin_aarch64_simd_hi *) __ptr, __o, __c); + ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 0); + ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 1); + ret.val[2] = (int16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 2); + ret.val[3] = (int16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 3); + return ret; +} + +__extension__ extern __inline int32x4x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4q_lane_s32 (const int32_t * __ptr, int32x4x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + int32x4x4_t ret; + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); + __o = __builtin_aarch64_ld4_lanev4si ( + (__builtin_aarch64_simd_si *) __ptr, __o, __c); + ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0); + ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1); + ret.val[2] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2); + ret.val[3] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3); + return ret; +} + +__extension__ extern __inline int64x2x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4q_lane_s64 (const int64_t * __ptr, int64x2x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + int64x2x4_t ret; + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); + __o = __builtin_aarch64_ld4_lanev2di ( + (__builtin_aarch64_simd_di *) __ptr, __o, __c); + ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 0); + ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 1); + ret.val[2] = (int64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 2); + ret.val[3] = (int64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 3); + return ret; +} + +__extension__ extern __inline float16x8x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4q_lane_f16 (const float16_t * __ptr, float16x8x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + float16x8x4_t ret; + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); + __o = __builtin_aarch64_ld4_lanev8hf ( + (__builtin_aarch64_simd_hf *) __ptr, __o, __c); + ret.val[0] = (float16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 0); + ret.val[1] = (float16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 1); + ret.val[2] = (float16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 2); + ret.val[3] = (float16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 3); + return ret; +} + +__extension__ extern __inline float32x4x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4q_lane_f32 (const float32_t * __ptr, float32x4x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + float32x4x4_t ret; + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); + __o = __builtin_aarch64_ld4_lanev4sf ( + (__builtin_aarch64_simd_sf *) __ptr, __o, __c); + ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0); + ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1); + ret.val[2] = (float32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2); + ret.val[3] = (float32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3); + return ret; +} + +__extension__ extern __inline float64x2x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4q_lane_f64 (const float64_t * __ptr, float64x2x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + float64x2x4_t ret; + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); + __o = __builtin_aarch64_ld4_lanev2df ( + (__builtin_aarch64_simd_di *) __ptr, __o, __c); + ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 0); + ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 1); + ret.val[2] = (float64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 2); + ret.val[3] = (float64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 3); + return ret; +} + +__extension__ extern __inline poly8x16x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4q_lane_p8 (const poly8_t * __ptr, poly8x16x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + poly8x16x4_t ret; + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); + __o = __builtin_aarch64_ld4_lanev16qi ( + (__builtin_aarch64_simd_qi *) __ptr, __o, __c); + ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 0); + ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 1); + ret.val[2] = (poly8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 2); + ret.val[3] = (poly8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 3); + return ret; +} + +__extension__ extern __inline poly16x8x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4q_lane_p16 (const poly16_t * __ptr, poly16x8x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + poly16x8x4_t ret; + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); + __o = __builtin_aarch64_ld4_lanev8hi ( + (__builtin_aarch64_simd_hi *) __ptr, __o, __c); + ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 0); + ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 1); + ret.val[2] = (poly16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 2); + ret.val[3] = (poly16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 3); + return ret; +} + +__extension__ extern __inline poly64x2x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4q_lane_p64 (const poly64_t * __ptr, poly64x2x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + poly64x2x4_t ret; + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); + __o = __builtin_aarch64_ld4_lanev2di ( + (__builtin_aarch64_simd_di *) __ptr, __o, __c); + ret.val[0] = (poly64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 0); + ret.val[1] = (poly64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 1); + ret.val[2] = (poly64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 2); + ret.val[3] = (poly64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 3); + return ret; +} /* vmax */ @@ -35441,9 +35926,47 @@ vld3q_lane_bf16 (const bfloat16_t * __ptr, bfloat16x8x3_t __b, const int __c) return ret; } -__LD4_LANE_FUNC (bfloat16x4x4_t, bfloat16x4_t, bfloat16x8x4_t, bfloat16_t, v4bf, - v8bf, bf, bf16, bfloat16x8_t) -__LD4Q_LANE_FUNC (bfloat16x8x4_t, bfloat16x8_t, bfloat16_t, v8bf, bf, bf16) +__extension__ extern __inline bfloat16x4x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4_lane_bf16 (const bfloat16_t * __ptr, bfloat16x4x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + bfloat16x8x4_t __temp; + __temp.val[0] = vcombine_bf16 (__b.val[0], vcreate_bf16 (0)); + __temp.val[1] = vcombine_bf16 (__b.val[1], vcreate_bf16 (0)); + __temp.val[2] = vcombine_bf16 (__b.val[2], vcreate_bf16 (0)); + __temp.val[3] = vcombine_bf16 (__b.val[3], vcreate_bf16 (0)); + __o = __builtin_aarch64_set_qregxiv8bf (__o, (bfloat16x8_t) __temp.val[0], 0); + __o = __builtin_aarch64_set_qregxiv8bf (__o, (bfloat16x8_t) __temp.val[1], 1); + __o = __builtin_aarch64_set_qregxiv8bf (__o, (bfloat16x8_t) __temp.val[2], 2); + __o = __builtin_aarch64_set_qregxiv8bf (__o, (bfloat16x8_t) __temp.val[3], 3); + __o = __builtin_aarch64_ld4_lanev4bf ( + (__builtin_aarch64_simd_bf *) __ptr, __o, __c); + __b.val[0] = (bfloat16x4_t) __builtin_aarch64_get_dregxidi (__o, 0); + __b.val[1] = (bfloat16x4_t) __builtin_aarch64_get_dregxidi (__o, 1); + __b.val[2] = (bfloat16x4_t) __builtin_aarch64_get_dregxidi (__o, 2); + __b.val[3] = (bfloat16x4_t) __builtin_aarch64_get_dregxidi (__o, 3); + return __b; +} + +__extension__ extern __inline bfloat16x8x4_t +__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) +vld4q_lane_bf16 (const bfloat16_t * __ptr, bfloat16x8x4_t __b, const int __c) +{ + __builtin_aarch64_simd_xi __o; + bfloat16x8x4_t ret; + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); + __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); + __o = __builtin_aarch64_ld4_lanev8bf ( + (__builtin_aarch64_simd_bf *) __ptr, __o, __c); + ret.val[0] = (bfloat16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 0); + ret.val[1] = (bfloat16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 1); + ret.val[2] = (bfloat16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 2); + ret.val[3] = (bfloat16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 3); + return ret; +} __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) @@ -35739,7 +36262,4 @@ vaddq_p128 (poly128_t __a, poly128_t __b) #undef __aarch64_vdupq_laneq_u32 #undef __aarch64_vdupq_laneq_u64 -#undef __LD4_LANE_FUNC -#undef __LD4Q_LANE_FUNC - #endif </cut>

3 years, 10 months

[CI-NOTIFY]: TCWG Bisect tcwg_kernel/llvm-release-arm-next-allmodconfig - Build # 31 - Successful!

by ci_notify＠linaro.org

Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-release-arm-next-allmodconfig. So far, this commit has regressed CI configurations: - tcwg_kernel/llvm-release-arm-next-allmodconfig Culprit: <cut> commit fad7cd3310db3099f95dd34312c77740fbc455e5 Author: Baokun Li <libaokun1(a)huawei.com> Date: Wed Aug 4 10:12:12 2021 +0800 nbd: add the check to prevent overflow in __nbd_ioctl() If user specify a large enough value of NBD blocks option, it may trigger signed integer overflow which may lead to nbd->config->bytesize becomes a large or small value, zero in particular. UBSAN: Undefined behaviour in drivers/block/nbd.c:325:31 signed integer overflow: 1024 * 4611686155866341414 cannot be represented in type 'long long int' [...] Call trace: [...] handle_overflow+0x188/0x1dc lib/ubsan.c:192 __ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213 nbd_size_set drivers/block/nbd.c:325 [inline] __nbd_ioctl drivers/block/nbd.c:1342 [inline] nbd_ioctl+0x998/0xa10 drivers/block/nbd.c:1395 __blkdev_driver_ioctl block/ioctl.c:311 [inline] [...] Although it is not a big deal, still silence the UBSAN by limit the input value. Reported-by: Hulk Robot <hulkci(a)huawei.com> Signed-off-by: Baokun Li <libaokun1(a)huawei.com> Reviewed-by: Josef Bacik <josef(a)toxicpanda.com> Link: https://lore.kernel.org/r/20210804021212.990223-1-libaokun1@huawei.com [axboe: dropped unlikely()] Signed-off-by: Jens Axboe <axboe(a)kernel.dk> </cut> Results regressed to (for first_bad == fad7cd3310db3099f95dd34312c77740fbc455e5) # reset_artifacts: -10 # build_abe binutils: -9 # build_llvm: -5 # build_abe qemu: -2 # linux_n_obj: 21709 # First few build errors in logs: # 00:07:12 make[1]: *** [modules-only.symvers] Error 1 # 00:07:12 make: *** [modules] Error 2 from (for last_good == da20b58d5bbbb0d23ae9530992a37d0f0d1787a4) # reset_artifacts: -10 # build_abe binutils: -9 # build_llvm: -5 # build_abe qemu: -2 # linux_n_obj: 29751 # linux build successful: all Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all… Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all… Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all… Configuration details: rr[linux_git]="https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git#ecf9343…" Reproduce builds: <cut> mkdir investigate-linux-fad7cd3310db3099f95dd34312c77740fbc455e5 cd investigate-linux-fad7cd3310db3099f95dd34312c77740fbc455e5 git clone https://git.linaro.org/toolchain/jenkins-scripts mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/ cd linux # Reproduce first_bad build git checkout --detach fad7cd3310db3099f95dd34312c77740fbc455e5 ../artifacts/test.sh # Reproduce last_good build git checkout --detach da20b58d5bbbb0d23ae9530992a37d0f0d1787a4 ../artifacts/test.sh cd .. </cut> History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/… Artifacts: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all… Build log: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-release-arm-next-all… Full commit (up to 1000 lines): <cut> commit fad7cd3310db3099f95dd34312c77740fbc455e5 Author: Baokun Li <libaokun1(a)huawei.com> Date: Wed Aug 4 10:12:12 2021 +0800 nbd: add the check to prevent overflow in __nbd_ioctl() If user specify a large enough value of NBD blocks option, it may trigger signed integer overflow which may lead to nbd->config->bytesize becomes a large or small value, zero in particular. UBSAN: Undefined behaviour in drivers/block/nbd.c:325:31 signed integer overflow: 1024 * 4611686155866341414 cannot be represented in type 'long long int' [...] Call trace: [...] handle_overflow+0x188/0x1dc lib/ubsan.c:192 __ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213 nbd_size_set drivers/block/nbd.c:325 [inline] __nbd_ioctl drivers/block/nbd.c:1342 [inline] nbd_ioctl+0x998/0xa10 drivers/block/nbd.c:1395 __blkdev_driver_ioctl block/ioctl.c:311 [inline] [...] Although it is not a big deal, still silence the UBSAN by limit the input value. Reported-by: Hulk Robot <hulkci(a)huawei.com> Signed-off-by: Baokun Li <libaokun1(a)huawei.com> Reviewed-by: Josef Bacik <josef(a)toxicpanda.com> Link: https://lore.kernel.org/r/20210804021212.990223-1-libaokun1@huawei.com [axboe: dropped unlikely()] Signed-off-by: Jens Axboe <axboe(a)kernel.dk> --- drivers/block/nbd.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index c38317979f74..f82264835794 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -1384,6 +1384,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, unsigned int cmd, unsigned long arg) { struct nbd_config *config = nbd->config; + loff_t bytesize; switch (cmd) { case NBD_DISCONNECT: @@ -1398,8 +1399,9 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, case NBD_SET_SIZE: return nbd_set_size(nbd, arg, config->blksize); case NBD_SET_SIZE_BLOCKS: - return nbd_set_size(nbd, arg * config->blksize, - config->blksize); + if (check_mul_overflow((loff_t)arg, config->blksize, &bytesize)) + return -EINVAL; + return nbd_set_size(nbd, bytesize, config->blksize); case NBD_SET_TIMEOUT: nbd_set_cmd_timeout(nbd, arg); return 0; </cut>

3 years, 10 months

← Newer
1
2
3
4
5
6
7
8
9
Older →

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

linaro-toolchain August 2021