[TCWG CI] 482.sphinx3 slowed down by 6% after llvm: Making the code compliant to the documentation about Floating Point support default values for C/C++. FPP-MODEL=PRECISE enables FFP-CONTRACT(FMA is enabled). - linaro-toolchain

12 Nov 2021

After llvm commit f04e387055e495e3e14570087d68e93593cf2918
Author: Zahira Ammarguellat <zahira.ammarguellat@intel.com>
Making the code compliant to the documentation about Floating Point
    support default values for C/C++. FPP-MODEL=PRECISE enables
    FFP-CONTRACT(FMA is enabled).
the following benchmarks slowed down by more than 2%:
- 482.sphinx3 slowed down by 6% from 25875 to 27484 perf samples
  - 482.sphinx3:[.] mgau_eval slowed down by 12% from 9996 to 11165 perf samples
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection.  Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a...
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a...
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a...
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2
- Hardware: NVIDIA TX1 4x Cortex-A57
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain@lists.linaro.org.  Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a...
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a...
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a...
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a...
Reproduce builds:
<cut>
mkdir investigate-llvm-f04e387055e495e3e14570087d68e93593cf2918
cd investigate-llvm-f04e387055e495e3e14570087d68e93593cf2918
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a... --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a... --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a... --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach f04e387055e495e3e14570087d68e93593cf2918
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 491beae71d69960a3bb0298b17d4ef1f3119b767
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit f04e387055e495e3e14570087d68e93593cf2918
Author: Zahira Ammarguellat <zahira.ammarguellat@intel.com>
Date:   Tue Nov 9 09:35:25 2021 -0500
Making the code compliant to the documentation about Floating Point
    support default values for C/C++. FPP-MODEL=PRECISE enables
    FFP-CONTRACT(FMA is enabled).
Fix for https://bugs.llvm.org/show_bug.cgi?id=50222
---
 clang/docs/ReleaseNotes.rst              |  10 +++
 clang/docs/UsersManual.rst               |  47 +++++++++++-
 clang/lib/Driver/ToolChains/Clang.cpp    |  48 +++++++-----
 clang/test/CodeGen/ffp-contract-option.c | 127 +++++++++++++++++++++++++++++--
 clang/test/CodeGen/ffp-model.c           |  48 ++++++++++++
 clang/test/CodeGen/ppc-emmintrin.c       |   4 +-
 clang/test/CodeGen/ppc-xmmintrin.c       |   4 +-
 clang/test/Driver/fp-model.c             |   2 +-
 clang/test/Misc/ffp-contract.c           |  10 +++
 9 files changed, 263 insertions(+), 37 deletions(-)

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 57c5150becae..00582b689862 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -202,6 +202,16 @@ Arm and AArch64 Support in Clang
   architecture features, but will enable certain optimizations specific to
   Cortex-A57 CPUs and enable the use of a more accurate scheduling model.
+
+Floating Point Support in Clang
+-------------------------------
+- The -ffp-model=precise now implies -ffp-contract=on rather than
+  -ffp-contract=fast, and the documentation of these features has been
+  clarified. Previously, the documentation claimed that -ffp-model=precise was
+  the default, but this was incorrect because the precise model implied
+  -ffp-contract=fast, whereas the default behavior is -ffp-contract=on.
+  -ffp-model=precise is now exactly the default mode of the compiler.
+
 Internal API Changes
 --------------------
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index 8c6922db6b37..406efb093d55 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -1260,8 +1260,49 @@ installed.
 Controlling Floating Point Behavior
 -----------------------------------
-Clang provides a number of ways to control floating point behavior. The options
-are listed below.
+Clang provides a number of ways to control floating point behavior, including
+with command line options and source pragmas. This section
+describes the various floating point semantic modes and the corresponding options.
+
+.. csv-table:: Floating Point Semantic Modes
+  :header: "Mode", "Values"
+  :widths: 15, 30, 30
+
+  "ffp-exception-behavior", "{ignore, strict, may_trap}",
+  "fenv_access", "{off, on}", "(none)"
+  "frounding-math", "{dynamic, tonearest, downward, upward, towardzero}"
+  "ffp-contract", "{on, off, fast, fast-honor-pragmas}"
+  "fdenormal-fp-math", "{IEEE, PreserveSign, PositiveZero}"
+  "fdenormal-fp-math-fp32", "{IEEE, PreserveSign, PositiveZero}"
+  "fmath-errno", "{on, off}"
+  "fhonor-nans", "{on, off}"
+  "fhonor-infinities", "{on, off}"
+  "fsigned-zeros", "{on, off}"
+  "freciprocal-math", "{on, off}"
+  "allow_approximate_fns", "{on, off}"
+  "fassociative-math", "{on, off}"
+
+This table describes the option settings that correspond to the three
+floating point semantic models: precise (the default), strict, and fast.
+
+
+.. csv-table:: Floating Point Models
+  :header: "Mode", "Precise", "Strict", "Fast"
+  :widths: 25, 15, 15, 15
+
+  "except_behavior", "ignore", "strict", "ignore"
+  "fenv_access", "off", "on", "off"
+  "rounding_mode", "tonearest", "dynamic", "tonearest"
+  "contract", "on", "off", "fast"
+  "denormal_fp_math", "IEEE", "IEEE", "PreserveSign"
+  "denormal_fp32_math", "IEEE","IEEE", "PreserveSign"
+  "support_math_errno", "on", "on", "off"
+  "no_honor_nans", "off", "off", "on"
+  "no_honor_infinities", "off", "off", "on"
+  "no_signed_zeros", "off", "off", "on"
+  "allow_reciprocal", "off", "off", "on"
+  "allow_approximate_fns", "off", "off", "on"
+  "allow_reassociation", "off", "off", "on"
 .. option:: -ffast-math
@@ -1467,7 +1508,7 @@ Note that floating-point operations performed as part of constant initialization
    and ``fast``.
    Details:
-   * ``precise`` Disables optimizations that are not value-safe on floating-point data, although FP contraction (FMA) is enabled (``-ffp-contract=fast``).  This is the default behavior.
+   * ``precise`` Disables optimizations that are not value-safe on floating-point data, although FP contraction (FMA) is enabled (``-ffp-contract=on``).  This is the default behavior.
    * ``strict`` Enables ``-frounding-math`` and ``-ffp-exception-behavior=strict``, and disables contractions (FMA).  All of the ``-ffast-math`` enablements are disabled. Enables ``STDC FENV_ACCESS``: by default ``FENV_ACCESS`` is disabled. This option setting behaves as though ``#pragma STDC FENV_ACESS ON`` appeared at the top of the source file.
    * ``fast`` Behaves identically to specifying both ``-ffast-math`` and ``ffp-contract=fast``
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index e8ad105a7829..5d6f8e9fba0e 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -2666,10 +2666,14 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
   llvm::DenormalMode DenormalFPMath = DefaultDenormalFPMath;
   llvm::DenormalMode DenormalFP32Math = DefaultDenormalFP32Math;
-  StringRef FPContract = "";
+  // CUDA and HIP don't rely on the frontend to pass an ffp-contract option.
+  // If one wasn't given by the user, don't pass it here.
+  StringRef FPContract;
+  if (!JA.isDeviceOffloading(Action::OFK_Cuda) &&
+      !JA.isOffloading(Action::OFK_HIP))
+    FPContract = "on";
   bool StrictFPModel = false;
-
   if (const Arg *A = Args.getLastArg(options::OPT_flimited_precision_EQ)) {
     CmdArgs.push_back("-mlimit-float-precision");
     CmdArgs.push_back(A->getValue());
@@ -2691,7 +2695,7 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
       ReciprocalMath = false;
       SignedZeros = true;
       // -fno_fast_math restores default denormal and fpcontract handling
-      FPContract = "";
+      FPContract = "on";
       DenormalFPMath = llvm::DenormalMode::getIEEE();
       // FIXME: The target may have picked a non-IEEE default mode here based on
@@ -2711,12 +2715,10 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
       // ffp-model= is a Driver option, it is entirely rewritten into more
       // granular options before being passed into cc1.
       // Use the gcc option in the switch below.
-      if (!FPModel.empty() && !FPModel.equals(Val)) {
+      if (!FPModel.empty() && !FPModel.equals(Val))
         D.Diag(clang::diag::warn_drv_overriding_flag_option)
-          << Args.MakeArgString("-ffp-model=" + FPModel)
-          << Args.MakeArgString("-ffp-model=" + Val);
-        FPContract = "";
-      }
+            << Args.MakeArgString("-ffp-model=" + FPModel)
+            << Args.MakeArgString("-ffp-model=" + Val);
       if (Val.equals("fast")) {
         optID = options::OPT_ffast_math;
         FPModel = Val;
@@ -2724,7 +2726,7 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
       } else if (Val.equals("precise")) {
         optID = options::OPT_ffp_contract;
         FPModel = Val;
-        FPContract = "fast";
+        FPContract = "on";
         PreciseFPModel = true;
       } else if (Val.equals("strict")) {
         StrictFPModel = true;
@@ -2812,9 +2814,9 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
     case options::OPT_ffp_contract: {
       StringRef Val = A->getValue();
       if (PreciseFPModel) {
-        // -ffp-model=precise enables ffp-contract=fast as a side effect
-        // the FPContract value has already been set to a string literal
-        // and the Val string isn't a pertinent value.
+        // -ffp-model=precise enables ffp-contract=on.
+        // -ffp-model=precise sets PreciseFPModel to on and Val to
+        // "precise". FPContract is set.
         ;
       } else if (Val.equals("fast") || Val.equals("on") || Val.equals("off"))
         FPContract = Val;
@@ -2907,23 +2909,27 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
       AssociativeMath = false;
       ReciprocalMath = false;
       SignedZeros = true;
-      TrappingMath = false;
-      RoundingFPMath = false;
       // -fno_fast_math restores default denormal and fpcontract handling
       DenormalFPMath = DefaultDenormalFPMath;
       DenormalFP32Math = llvm::DenormalMode::getIEEE();
-      FPContract = "";
+      if (!JA.isDeviceOffloading(Action::OFK_Cuda) &&
+          !JA.isOffloading(Action::OFK_HIP))
+        if (FPContract == "fast") {
+          FPContract = "on";
+          D.Diag(clang::diag::warn_drv_overriding_flag_option)
+              << "-ffp-contract=fast"
+              << "-ffp-contract=on";
+        }
       break;
     }
     if (StrictFPModel) {
       // If -ffp-model=strict has been specified on command line but
       // subsequent options conflict then emit warning diagnostic.
-      if (HonorINFs && HonorNaNs &&
-        !AssociativeMath && !ReciprocalMath &&
-        SignedZeros && TrappingMath && RoundingFPMath &&
-        (FPContract.equals("off") || FPContract.empty()) &&
-        DenormalFPMath == llvm::DenormalMode::getIEEE() &&
-        DenormalFP32Math == llvm::DenormalMode::getIEEE())
+      if (HonorINFs && HonorNaNs && !AssociativeMath && !ReciprocalMath &&
+          SignedZeros && TrappingMath && RoundingFPMath &&
+          DenormalFPMath == llvm::DenormalMode::getIEEE() &&
+          DenormalFP32Math == llvm::DenormalMode::getIEEE() &&
+          FPContract.equals("off"))
         // OK: Current Arg doesn't conflict with -ffp-model=strict
         ;
       else {
diff --git a/clang/test/CodeGen/ffp-contract-option.c b/clang/test/CodeGen/ffp-contract-option.c
index 52b750795940..857a9c2369ef 100644
--- a/clang/test/CodeGen/ffp-contract-option.c
+++ b/clang/test/CodeGen/ffp-contract-option.c
@@ -1,9 +1,120 @@
-// RUN: %clang_cc1 -O3 -ffp-contract=fast -triple=aarch64-apple-darwin -S -o - %s | FileCheck %s
-// REQUIRES: aarch64-registered-target
-
-float fma_test1(float a, float b, float c) {
-// CHECK: fmadd
-  float x = a * b;
-  float y = x + c;
-  return y;
+// REQUIRES: x86-registered-target
+// RUN: %clang_cc1 -triple=x86_64 %s -emit-llvm -o - \
+// RUN:| FileCheck --check-prefixes CHECK,CHECK-DEFAULT  %s
+
+// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=off %s -emit-llvm -o - \
+// RUN:| FileCheck --check-prefixes CHECK,CHECK-DEFAULT  %s
+
+// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=on %s -emit-llvm -o - \
+// RUN:| FileCheck --check-prefixes CHECK,CHECK-ON  %s
+
+// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=fast %s -emit-llvm -o - \
+// RUN:| FileCheck --check-prefixes CHECK,CHECK-CONTRACTFAST  %s
+
+// RUN: %clang_cc1 -triple=x86_64 -ffast-math %s -emit-llvm -o - \
+// RUN:| FileCheck --check-prefixes CHECK,CHECK-CONTRACTOFF %s
+
+// RUN: %clang_cc1 -triple=x86_64 -ffast-math -ffp-contract=off %s -emit-llvm \
+// RUN: -o - | FileCheck --check-prefixes CHECK,CHECK-CONTRACTOFF %s
+
+// RUN: %clang_cc1 -triple=x86_64 -ffast-math -ffp-contract=on %s -emit-llvm \
+// RUN: -o - | FileCheck --check-prefixes CHECK,CHECK-ONFAST %s
+
+// RUN: %clang_cc1 -triple=x86_64 -ffast-math -ffp-contract=fast %s -emit-llvm \
+// RUN:  -o - | FileCheck --check-prefixes CHECK,CHECK-FASTFAST %s
+
+// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=fast -ffast-math  %s \
+// RUN: -emit-llvm \
+// RUN:  -o - | FileCheck --check-prefixes CHECK,CHECK-FASTFAST %s
+
+// RUN: %clang_cc1 -triple=x86_64 -ffp-contract=off -fmath-errno \
+// RUN: -fno-rounding-math %s -emit-llvm -o - \
+// RUN:  -o - | FileCheck --check-prefixes CHECK,CHECK-NOFAST %s
+
+// RUN: %clang -S -emit-llvm -fno-fast-math %s -o - \
+// RUN: | FileCheck %s --check-prefixes=CHECK,CHECK-FPC-ON
+
+// RUN: %clang -S -emit-llvm -ffp-contract=fast -fno-fast-math \
+// RUN: %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-FPC-ON
+
+// RUN: %clang -S -emit-llvm -ffp-contract=on -fno-fast-math \
+// RUN: %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-FPC-ON
+
+// RUN: %clang -S -emit-llvm -ffp-contract=off -fno-fast-math \
+// RUN: %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-FPC-OFF
+
+// RUN: %clang -S -emit-llvm -ffp-model=fast -fno-fast-math \
+// RUN: %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-FPC-ON
+
+// RUN: %clang -S -emit-llvm -ffp-model=precise -fno-fast-math \
+// RUN: %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-FPC-ON
+
+// RUN: %clang -S -emit-llvm -ffp-model=strict -fno-fast-math \
+// RUN: -target x86_64 %s -o - | FileCheck %s \
+// RUN: --check-prefixes=CHECK,CHECK-FPSC-OFF
+
+// RUN: %clang -S -emit-llvm -ffast-math -fno-fast-math \
+// RUN: %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-FPC-ON
+
+float mymuladd(float x, float y, float z) {
+  // CHECK: define{{.*}} float @mymuladd
+  return x * y + z;
+  // expected-warning{{overriding '-ffp-contract=fast' option with '-ffp-contract=on'}}
+
+  // CHECK-DEFAULT: load float, float*
+  // CHECK-DEFAULT: fmul float
+  // CHECK-DEFAULT: load float, float*
+  // CHECK-DEFAULT: fadd float
+
+  // CHECK-ON: load float, float*
+  // CHECK-ON: load float, float*
+  // CHECK-ON: load float, float*
+  // CHECK-ON: call float @llvm.fmuladd.f32(float {{.*}}, float {{.*}}, float {{.*}})
+
+  // CHECK-CONTRACTFAST: load float, float*
+  // CHECK-CONTRACTFAST: load float, float*
+  // CHECK-CONTRACTFAST: fmul contract float
+  // CHECK-CONTRACTFAST: load float, float*
+  // CHECK-CONTRACTFAST: fadd contract float
+
+  // CHECK-CONTRACTOFF: load float, float*
+  // CHECK-CONTRACTOFF: load float, float*
+  // CHECK-CONTRACTOFF: fmul reassoc nnan ninf nsz arcp afn float
+  // CHECK-CONTRACTOFF: load float, float*
+  // CHECK-CONTRACTOFF: fadd reassoc nnan ninf nsz arcp afn float {{.*}}, {{.*}}
+
+  // CHECK-ONFAST: load float, float*
+  // CHECK-ONFAST: load float, float*
+  // CHECK-ONFAST: load float, float*
+  // CHECK-ONFAST: call reassoc nnan ninf nsz arcp afn float @llvm.fmuladd.f32(float {{.*}}, float {{.*}}, float {{.*}})
+
+  // CHECK-FASTFAST: load float, float*
+  // CHECK-FASTFAST: load float, float*
+  // CHECK-FASTFAST: fmul fast float
+  // CHECK-FASTFAST: load float, float*
+  // CHECK-FASTFAST: fadd fast float {{.*}}, {{.*}}
+
+  // CHECK-NOFAST: load float, float*
+  // CHECK-NOFAST: load float, float*
+  // CHECK-NOFAST: fmul float {{.*}}, {{.*}}
+  // CHECK-NOFAST: load float, float*
+  // CHECK-NOFAST: fadd float {{.*}}, {{.*}}
+
+  // CHECK-FPC-ON: load float, float*
+  // CHECK-FPC-ON: load float, float*
+  // CHECK-FPC-ON: load float, float*
+  // CHECK-FPC-ON: call float @llvm.fmuladd.f32(float {{.*}}, float {{.*}}, float {{.*}})
+
+  // CHECK-FPC-OFF: load float, float*
+  // CHECK-FPC-OFF: load float, float*
+  // CHECK-FPC-OFF: fmul float
+  // CHECK-FPC-OFF: load float, float*
+  // CHECK-FPC-OFF: fadd float {{.*}}, {{.*}}
+
+  // CHECK-FFPC-OFF: load float, float*
+  // CHECK-FFPC-OFF: load float, float*
+  // CHECK-FPSC-OFF: call float @llvm.experimental.constrained.fmul.f32(float {{.*}}, float {{.*}}, {{.*}})
+  // CHECK-FPSC-OFF: load float, float*
+  // CHECK-FPSC-OFF: [[RES:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, {{.*}})
+
 }
diff --git a/clang/test/CodeGen/ffp-model.c b/clang/test/CodeGen/ffp-model.c
new file mode 100644
index 000000000000..21dbc8de99aa
--- /dev/null
+++ b/clang/test/CodeGen/ffp-model.c
@@ -0,0 +1,48 @@
+// REQUIRES: x86-registered-target
+// RUN: %clang -S -emit-llvm -ffp-model=fast -emit-llvm %s -o - \
+// RUN: | FileCheck %s --check-prefixes=CHECK,CHECK-FAST
+
+// RUN: %clang -S -emit-llvm -ffp-model=precise %s -o - \
+// RUN: | FileCheck %s --check-prefixes=CHECK,CHECK-PRECISE
+
+// RUN: %clang -S -emit-llvm -ffp-model=strict %s -o - \
+// RUN: -target x86_64 | FileCheck %s --check-prefixes=CHECK,CHECK-STRICT
+
+// RUN: %clang -S -emit-llvm -ffp-model=strict -ffast-math \
+// RUN: -target x86_64 %s -o - | FileCheck %s \
+// RUN: --check-prefixes CHECK,CHECK-STRICT-FAST
+
+// RUN: %clang -S -emit-llvm -ffp-model=precise -ffast-math \
+// RUN: %s -o - | FileCheck %s --check-prefixes CHECK,CHECK-FAST1
+
+float mymuladd(float x, float y, float z) {
+  // CHECK: define{{.*}} float @mymuladd
+  return x * y + z;
+
+  // CHECK-FAST: fmul fast float
+  // CHECK-FAST: load float, float*
+  // CHECK-FAST: fadd fast float
+
+  // CHECK-PRECISE: load float, float*
+  // CHECK-PRECISE: load float, float*
+  // CHECK-PRECISE: load float, float*
+  // CHECK-PRECISE: call float @llvm.fmuladd.f32(float {{.*}}, float {{.*}}, float {{.*}})
+
+  // CHECK-STRICT: load float, float*
+  // CHECK-STRICT: load float, float*
+  // CHECK-STRICT: call float @llvm.experimental.constrained.fmul.f32(float {{.*}}, float {{.*}}, {{.*}})
+  // CHECK-STRICT: load float, float*
+  // CHECK-STRICT: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, {{.*}})
+
+  // CHECK-STRICT-FAST: load float, float*
+  // CHECK-STRICT-FAST: load float, float*
+  // CHECK-STRICT-FAST: call fast float @llvm.experimental.constrained.fmul.f32(float {{.*}}, float {{.*}}, {{.*}})
+  // CHECK-STRICT-FAST: load float, float*
+  // CHECK-STRICT-FAST: call fast float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, {{.*}}
+
+  // CHECK-FAST1: load float, float*
+  // CHECK-FAST1: load float, float*
+  // CHECK-FAST1: fmul fast float {{.*}}, {{.*}}
+  // CHECK-FAST1: load float, float* {{.*}}
+  // CHECK-FAST1: fadd fast float {{.*}}, {{.*}}
+}
diff --git a/clang/test/CodeGen/ppc-emmintrin.c b/clang/test/CodeGen/ppc-emmintrin.c
index fa3801f50a01..4a246ff92d76 100644
--- a/clang/test/CodeGen/ppc-emmintrin.c
+++ b/clang/test/CodeGen/ppc-emmintrin.c
@@ -2,9 +2,9 @@
 // REQUIRES: powerpc-registered-target
 // RUN: %clang -S -emit-llvm -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \
-// RUN:  -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE
+// RUN:  -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE
 // RUN: %clang -S -emit-llvm -target powerpc64le-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \
-// RUN:   -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE
+// RUN:   -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE
 // CHECK-BE-DAG: @_mm_movemask_pd.perm_mask = internal constant <4 x i32> <i32 -2139062144, i32 -2139062144, i32 -2139062144, i32 -2139078656>, align 16
 // CHECK-BE-DAG: @_mm_shuffle_epi32.permute_selectors = internal constant [4 x i32] [i32 66051, i32 67438087, i32 134810123, i32 202182159], align 4
diff --git a/clang/test/CodeGen/ppc-xmmintrin.c b/clang/test/CodeGen/ppc-xmmintrin.c
index d3f18bfbb1e5..4aff7a7c3dda 100644
--- a/clang/test/CodeGen/ppc-xmmintrin.c
+++ b/clang/test/CodeGen/ppc-xmmintrin.c
@@ -2,11 +2,11 @@
 // REQUIRES: powerpc-registered-target
 // RUN: %clang -S -emit-llvm -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \
-// RUN:   -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE
+// RUN: -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-BE
 // RUN: %clang -x c++ -fsyntax-only -target powerpc64-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \
 // RUN:   -fno-discard-value-names -mllvm -disable-llvm-optzns
 // RUN: %clang -S -emit-llvm -target powerpc64le-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \
-// RUN:   -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE
+// RUN: -ffp-contract=off -fno-discard-value-names -mllvm -disable-llvm-optzns -o - | llvm-cxxfilt -n | FileCheck %s --check-prefixes=CHECK,CHECK-LE
 // RUN: %clang -x c++ -fsyntax-only -target powerpc64le-unknown-linux-gnu -mcpu=pwr8 -ffreestanding -DNO_WARN_X86_INTRINSICS %s \
 // RUN:   -fno-discard-value-names -mllvm -disable-llvm-optzns
diff --git a/clang/test/Driver/fp-model.c b/clang/test/Driver/fp-model.c
index 5fa9d110dd83..0824b3e2c596 100644
--- a/clang/test/Driver/fp-model.c
+++ b/clang/test/Driver/fp-model.c
@@ -99,7 +99,7 @@
 // RUN: %clang -### -nostdinc -ffp-model=precise -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=CHECK-FPM-PRECISE %s
 // CHECK-FPM-PRECISE: "-cc1"
-// CHECK-FPM-PRECISE: "-ffp-contract=fast"
+// CHECK-FPM-PRECISE: "-ffp-contract=on"
 // CHECK-FPM-PRECISE: "-fno-rounding-math"
 // RUN: %clang -### -nostdinc -ffp-model=strict -c %s 2>&1 \
diff --git a/clang/test/Misc/ffp-contract.c b/clang/test/Misc/ffp-contract.c
new file mode 100644
index 000000000000..0d26905d4ef2
--- /dev/null
+++ b/clang/test/Misc/ffp-contract.c
@@ -0,0 +1,10 @@
+// RUN: %clang_cc1 -O3 -ffp-contract=fast -triple=aarch64-apple-darwin \
+// RUN: -S -o - %s | FileCheck --check-prefix=CHECK-FMADD %s
+// REQUIRES: aarch64-registered-target
+
+float fma_test1(float a, float b, float c) {
+  // CHECK-FMADD: fmadd
+  float x = a * b;
+  float y = x + c;
+  return y;
+}
</cut>