Progress
* UM-2 [QEMU upstream maintainership]
+ Still looking at the mess that is non-unique bus names. Worked
through exactly which devices and machine types are affected for
the i2c bus.
+ Sent a patchset which tries to make the "create a bus" function
names a bit more regular across different bus types.
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
+ Luis figured out why GDB was crashing when fed the MVE XML by
QEMU's gdbstub; this was a combination of QEMU giving GDB some
non-standard extra registers in its "vfp" XML feature and GDB
not being robust enough against those unexpected extras. Sent
out a patchset which cleans up QEMU's XML in this area and also
implements the extra XML for MVE. (This will only go into QEMU
once the GDB patches have landed and the XML format is nailed down.)
-- PMM
After gcc commit f92901a508305f291fcf2acae0825379477724de
Author: Richard Biener <rguenther(a)suse.de>
tree-optimization/65206 - dependence analysis on mixed pointer/array
the following benchmarks slowed down by more than 2%:
- 482.sphinx3 slowed down by 4% from 20816 to 21661 perf samples
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_tx1/gnu-master-aarch64-spec2k6-O3
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa…
Reproduce builds:
<cut>
mkdir investigate-gcc-f92901a508305f291fcf2acae0825379477724de
cd investigate-gcc-f92901a508305f291fcf2acae0825379477724de
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aa… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach f92901a508305f291fcf2acae0825379477724de
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach abdf63d782cba82b5ecf264248518cbb065650ed
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit f92901a508305f291fcf2acae0825379477724de
Author: Richard Biener <rguenther(a)suse.de>
Date: Wed Sep 8 14:42:31 2021 +0200
tree-optimization/65206 - dependence analysis on mixed pointer/array
This adds the capability to analyze the dependence of mixed
pointer/array accesses. The example is from where using a masked
load/store creates the pointer-based access when an otherwise
unconditional access is array based. Other examples would include
accesses to an array mixed with accesses from inlined helpers
that work on pointers.
The idea is quite simple and old - analyze the data-ref indices
as if the reference was pointer-based. The following change does
this by changing dr_analyze_indices to work on the indices
sub-structure and storing an alternate indices substructure in
each data reference. That alternate set of indices is analyzed
lazily by initialize_data_dependence_relation when it fails to
match-up the main set of indices of two data references.
initialize_data_dependence_relation is refactored into a head
and a tail worker and changed to work on one of the indices
structures and thus away from using DR_* access macros which
continue to reference the main indices substructure.
There are quite some vectorization and loop distribution opportunities
unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r,
510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
544.nab_r see amendments in what they report with -fopt-info-loop while
the rest of the specrate set sees no changes there. Measuring runtime
for the set where changes were reported reveals nothing off-noise
besides 511.povray_r which seems to regress slightly for me
(on a Zen2 machine with -Ofast -march=native).
2021-09-08 Richard Biener <rguenther(a)suse.de>
PR tree-optimization/65206
* tree-data-ref.h (struct data_reference): Add alt_indices,
order it last.
* tree-data-ref.c (free_data_ref): Release alt_indices.
(dr_analyze_indices): Work on struct indices and get DR_REF as tree.
(create_data_ref): Adjust.
(initialize_data_dependence_relation): Split into head
and tail. When the base objects fail to match up try
again with pointer-based analysis of indices.
* tree-vectorizer.c (vec_info_shared::check_datarefs): Do
not compare the lazily computed alternate set of indices.
* gcc.dg/torture/20210916.c: New testcase.
* gcc.dg/vect/pr65206.c: Likewise.
---
gcc/testsuite/gcc.dg/torture/20210916.c | 20 ++++
gcc/testsuite/gcc.dg/vect/pr65206.c | 22 ++++
gcc/tree-data-ref.c | 174 +++++++++++++++++++++-----------
gcc/tree-data-ref.h | 9 +-
gcc/tree-vectorizer.c | 3 +-
5 files changed, 168 insertions(+), 60 deletions(-)
diff --git a/gcc/testsuite/gcc.dg/torture/20210916.c b/gcc/testsuite/gcc.dg/torture/20210916.c
new file mode 100644
index 00000000000..0ea6d45e463
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/20210916.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+typedef union tree_node *tree;
+struct tree_base {
+ unsigned : 1;
+ unsigned lang_flag_2 : 1;
+};
+struct tree_type {
+ tree main_variant;
+};
+union tree_node {
+ struct tree_base base;
+ struct tree_type type;
+};
+tree finish_struct_t, finish_struct_x;
+void finish_struct()
+{
+ for (; finish_struct_t->type.main_variant;)
+ finish_struct_x->base.lang_flag_2 = 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr65206.c b/gcc/testsuite/gcc.dg/vect/pr65206.c
new file mode 100644
index 00000000000..3b6262622c0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65206.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-additional-options "-fno-trapping-math -fno-allow-store-data-races" } */
+/* { dg-additional-options "-mavx" { target avx } } */
+
+#define N 1024
+
+double a[N], b[N];
+
+void foo ()
+{
+ for (int i = 0; i < N; ++i)
+ if (b[i] < 3.)
+ a[i] += b[i];
+}
+
+/* We get a .MASK_STORE because while the load of a[i] does not trap
+ the store would introduce store data races. Make sure we still
+ can handle the data dependence with zero distance. */
+
+/* { dg-final { scan-tree-dump-not "versioning for alias required" "vect" { target { vect_masked_store || avx } } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target { vect_masked_store || avx } } } } */
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index e061baa7c20..18307a554fc 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -99,6 +99,7 @@ along with GCC; see the file COPYING3. If not see
#include "internal-fn.h"
#include "vr-values.h"
#include "range-op.h"
+#include "tree-ssa-loop-ivopts.h"
static struct datadep_stats
{
@@ -1300,22 +1301,18 @@ base_supports_access_fn_components_p (tree base)
DR, analyzed in LOOP and instantiated before NEST. */
static void
-dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
+dr_analyze_indices (struct indices *dri, tree ref, edge nest, loop_p loop)
{
- vec<tree> access_fns = vNULL;
- tree ref, op;
- tree base, off, access_fn;
-
/* If analyzing a basic-block there are no indices to analyze
and thus no access functions. */
if (!nest)
{
- DR_BASE_OBJECT (dr) = DR_REF (dr);
- DR_ACCESS_FNS (dr).create (0);
+ dri->base_object = ref;
+ dri->access_fns.create (0);
return;
}
- ref = DR_REF (dr);
+ vec<tree> access_fns = vNULL;
/* REALPART_EXPR and IMAGPART_EXPR can be handled like accesses
into a two element array with a constant index. The base is
@@ -1338,8 +1335,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
{
if (TREE_CODE (ref) == ARRAY_REF)
{
- op = TREE_OPERAND (ref, 1);
- access_fn = analyze_scalar_evolution (loop, op);
+ tree op = TREE_OPERAND (ref, 1);
+ tree access_fn = analyze_scalar_evolution (loop, op);
access_fn = instantiate_scev (nest, loop, access_fn);
access_fns.safe_push (access_fn);
}
@@ -1370,16 +1367,16 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
analyzed nest, add it as an additional independent access-function. */
if (TREE_CODE (ref) == MEM_REF)
{
- op = TREE_OPERAND (ref, 0);
- access_fn = analyze_scalar_evolution (loop, op);
+ tree op = TREE_OPERAND (ref, 0);
+ tree access_fn = analyze_scalar_evolution (loop, op);
access_fn = instantiate_scev (nest, loop, access_fn);
if (TREE_CODE (access_fn) == POLYNOMIAL_CHREC)
{
- tree orig_type;
tree memoff = TREE_OPERAND (ref, 1);
- base = initial_condition (access_fn);
- orig_type = TREE_TYPE (base);
+ tree base = initial_condition (access_fn);
+ tree orig_type = TREE_TYPE (base);
STRIP_USELESS_TYPE_CONVERSION (base);
+ tree off;
split_constant_offset (base, &base, &off);
STRIP_USELESS_TYPE_CONVERSION (base);
/* Fold the MEM_REF offset into the evolutions initial
@@ -1424,7 +1421,7 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
base, memoff);
MR_DEPENDENCE_CLIQUE (ref) = MR_DEPENDENCE_CLIQUE (old);
MR_DEPENDENCE_BASE (ref) = MR_DEPENDENCE_BASE (old);
- DR_UNCONSTRAINED_BASE (dr) = true;
+ dri->unconstrained_base = true;
access_fns.safe_push (access_fn);
}
}
@@ -1436,8 +1433,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
build_int_cst (reference_alias_ptr_type (ref), 0));
}
- DR_BASE_OBJECT (dr) = ref;
- DR_ACCESS_FNS (dr) = access_fns;
+ dri->base_object = ref;
+ dri->access_fns = access_fns;
}
/* Extracts the alias analysis information from the memory reference DR. */
@@ -1463,6 +1460,8 @@ void
free_data_ref (data_reference_p dr)
{
DR_ACCESS_FNS (dr).release ();
+ if (dr->alt_indices.base_object)
+ dr->alt_indices.access_fns.release ();
free (dr);
}
@@ -1497,7 +1496,7 @@ create_data_ref (edge nest, loop_p loop, tree memref, gimple *stmt,
dr_analyze_innermost (&DR_INNERMOST (dr), memref,
nest != NULL ? loop : NULL, stmt);
- dr_analyze_indices (dr, nest, loop);
+ dr_analyze_indices (&dr->indices, DR_REF (dr), nest, loop);
dr_analyze_alias (dr);
if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3066,41 +3065,30 @@ access_fn_components_comparable_p (tree ref_a, tree ref_b)
TREE_TYPE (TREE_OPERAND (ref_b, 0)));
}
-/* Initialize a data dependence relation between data accesses A and
- B. NB_LOOPS is the number of loops surrounding the references: the
- size of the classic distance/direction vectors. */
+/* Initialize a data dependence relation RES in LOOP_NEST. USE_ALT_INDICES
+ is true when the main indices of A and B were not comparable so we try again
+ with alternate indices computed on an indirect reference. */
struct data_dependence_relation *
-initialize_data_dependence_relation (struct data_reference *a,
- struct data_reference *b,
- vec<loop_p> loop_nest)
+initialize_data_dependence_relation (struct data_dependence_relation *res,
+ vec<loop_p> loop_nest,
+ bool use_alt_indices)
{
- struct data_dependence_relation *res;
+ struct data_reference *a = DDR_A (res);
+ struct data_reference *b = DDR_B (res);
unsigned int i;
- res = XCNEW (struct data_dependence_relation);
- DDR_A (res) = a;
- DDR_B (res) = b;
- DDR_LOOP_NEST (res).create (0);
- DDR_SUBSCRIPTS (res).create (0);
- DDR_DIR_VECTS (res).create (0);
- DDR_DIST_VECTS (res).create (0);
-
- if (a == NULL || b == NULL)
+ struct indices *indices_a = &a->indices;
+ struct indices *indices_b = &b->indices;
+ if (use_alt_indices)
{
- DDR_ARE_DEPENDENT (res) = chrec_dont_know;
- return res;
+ if (TREE_CODE (DR_REF (a)) != MEM_REF)
+ indices_a = &a->alt_indices;
+ if (TREE_CODE (DR_REF (b)) != MEM_REF)
+ indices_b = &b->alt_indices;
}
-
- /* If the data references do not alias, then they are independent. */
- if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
- {
- DDR_ARE_DEPENDENT (res) = chrec_known;
- return res;
- }
-
- unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a);
- unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b);
+ unsigned int num_dimensions_a = indices_a->access_fns.length ();
+ unsigned int num_dimensions_b = indices_b->access_fns.length ();
if (num_dimensions_a == 0 || num_dimensions_b == 0)
{
DDR_ARE_DEPENDENT (res) = chrec_dont_know;
@@ -3125,9 +3113,9 @@ initialize_data_dependence_relation (struct data_reference *a,
the a and b accesses have a single ARRAY_REF component reference [0]
but have two subscripts. */
- if (DR_UNCONSTRAINED_BASE (a))
+ if (indices_a->unconstrained_base)
num_dimensions_a -= 1;
- if (DR_UNCONSTRAINED_BASE (b))
+ if (indices_b->unconstrained_base)
num_dimensions_b -= 1;
/* These structures describe sequences of component references in
@@ -3210,6 +3198,10 @@ initialize_data_dependence_relation (struct data_reference *a,
B: [3, 4] (i.e. s.e) */
while (index_a < num_dimensions_a && index_b < num_dimensions_b)
{
+ /* The alternate indices form always has a single dimension
+ with unconstrained base. */
+ gcc_assert (!use_alt_indices);
+
/* REF_A and REF_B must be one of the component access types
allowed by dr_analyze_indices. */
gcc_checking_assert (access_fn_component_p (ref_a));
@@ -3280,11 +3272,12 @@ initialize_data_dependence_relation (struct data_reference *a,
/* See whether FULL_SEQ ends at the base and whether the two bases
are equal. We do not care about TBAA or alignment info so we can
use OEP_ADDRESS_OF to avoid false negatives. */
- tree base_a = DR_BASE_OBJECT (a);
- tree base_b = DR_BASE_OBJECT (b);
+ tree base_a = indices_a->base_object;
+ tree base_b = indices_b->base_object;
bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a
&& full_seq.start_b + full_seq.length == num_dimensions_b
- && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b)
+ && (indices_a->unconstrained_base
+ == indices_b->unconstrained_base)
&& operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
&& (types_compatible_p (TREE_TYPE (base_a),
TREE_TYPE (base_b))
@@ -3323,7 +3316,7 @@ initialize_data_dependence_relation (struct data_reference *a,
both lvalues are distinct from the object's declared type. */
if (same_base_p)
{
- if (DR_UNCONSTRAINED_BASE (a))
+ if (indices_a->unconstrained_base)
full_seq.length += 1;
}
else
@@ -3332,8 +3325,41 @@ initialize_data_dependence_relation (struct data_reference *a,
/* Punt if we didn't find a suitable sequence. */
if (full_seq.length == 0)
{
- DDR_ARE_DEPENDENT (res) = chrec_dont_know;
- return res;
+ if (use_alt_indices
+ || (TREE_CODE (DR_REF (a)) == MEM_REF
+ && TREE_CODE (DR_REF (b)) == MEM_REF)
+ || may_be_nonaddressable_p (DR_REF (a))
+ || may_be_nonaddressable_p (DR_REF (b)))
+ {
+ /* Fully exhausted possibilities. */
+ DDR_ARE_DEPENDENT (res) = chrec_dont_know;
+ return res;
+ }
+
+ /* Try evaluating both DRs as dereferences of pointers. */
+ if (!a->alt_indices.base_object
+ && TREE_CODE (DR_REF (a)) != MEM_REF)
+ {
+ tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (a)),
+ build1 (ADDR_EXPR, ptr_type_node, DR_REF (a)),
+ build_int_cst
+ (reference_alias_ptr_type (DR_REF (a)), 0));
+ dr_analyze_indices (&a->alt_indices, alt_ref,
+ loop_preheader_edge (loop_nest[0]),
+ loop_containing_stmt (DR_STMT (a)));
+ }
+ if (!b->alt_indices.base_object
+ && TREE_CODE (DR_REF (b)) != MEM_REF)
+ {
+ tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (b)),
+ build1 (ADDR_EXPR, ptr_type_node, DR_REF (b)),
+ build_int_cst
+ (reference_alias_ptr_type (DR_REF (b)), 0));
+ dr_analyze_indices (&b->alt_indices, alt_ref,
+ loop_preheader_edge (loop_nest[0]),
+ loop_containing_stmt (DR_STMT (b)));
+ }
+ return initialize_data_dependence_relation (res, loop_nest, true);
}
if (!same_base_p)
@@ -3381,8 +3407,8 @@ initialize_data_dependence_relation (struct data_reference *a,
struct subscript *subscript;
subscript = XNEW (struct subscript);
- SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i);
- SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i);
+ SUB_ACCESS_FN (subscript, 0) = indices_a->access_fns[full_seq.start_a + i];
+ SUB_ACCESS_FN (subscript, 1) = indices_b->access_fns[full_seq.start_b + i];
SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known ();
SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known ();
SUB_LAST_CONFLICT (subscript) = chrec_dont_know;
@@ -3393,6 +3419,40 @@ initialize_data_dependence_relation (struct data_reference *a,
return res;
}
+/* Initialize a data dependence relation between data accesses A and
+ B. NB_LOOPS is the number of loops surrounding the references: the
+ size of the classic distance/direction vectors. */
+
+struct data_dependence_relation *
+initialize_data_dependence_relation (struct data_reference *a,
+ struct data_reference *b,
+ vec<loop_p> loop_nest)
+{
+ data_dependence_relation *res = XCNEW (struct data_dependence_relation);
+ DDR_A (res) = a;
+ DDR_B (res) = b;
+ DDR_LOOP_NEST (res).create (0);
+ DDR_SUBSCRIPTS (res).create (0);
+ DDR_DIR_VECTS (res).create (0);
+ DDR_DIST_VECTS (res).create (0);
+
+ if (a == NULL || b == NULL)
+ {
+ DDR_ARE_DEPENDENT (res) = chrec_dont_know;
+ return res;
+ }
+
+ /* If the data references do not alias, then they are independent. */
+ if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
+ {
+ DDR_ARE_DEPENDENT (res) = chrec_known;
+ return res;
+ }
+
+ return initialize_data_dependence_relation (res, loop_nest, false);
+}
+
+
/* Frees memory used by the conflict function F. */
static void
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 685f33d85ae..74f579c9f3f 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -166,14 +166,19 @@ struct data_reference
and runs to completion. */
bool is_conditional_in_stmt;
+ /* Alias information for the data reference. */
+ struct dr_alias alias;
+
/* Behavior of the memory reference in the innermost loop. */
struct innermost_loop_behavior innermost;
/* Subscripts of this data reference. */
struct indices indices;
- /* Alias information for the data reference. */
- struct dr_alias alias;
+ /* Alternate subscripts initialized lazily and used by data-dependence
+ analysis only when the main indices of two DRs are not comparable.
+ Keep last to keep vec_info_shared::check_datarefs happy. */
+ struct indices alt_indices;
};
#define DR_STMT(DR) (DR)->stmt
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 3aa3e2a6783..20daa31187d 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -507,7 +507,8 @@ vec_info_shared::check_datarefs ()
return;
gcc_assert (datarefs.length () == datarefs_copy.length ());
for (unsigned i = 0; i < datarefs.length (); ++i)
- if (memcmp (&datarefs_copy[i], datarefs[i], sizeof (data_reference)) != 0)
+ if (memcmp (&datarefs_copy[i], datarefs[i],
+ offsetof (data_reference, alt_indices)) != 0)
gcc_unreachable ();
}
</cut>
After llvm commit 669ddd1e9b1226432b003dbba05b99f8e992285b
Author: Arthur Eubanks <aeubanks(a)google.com>
Turn on the new pass manager by default
the following benchmarks grew in size by more than 1%:
- 403.gcc grew in size by 2% from 2586180 to 2648252 bytes
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their latest release branch
- Target: aarch64-linux-gnu
- Compiler flags: -Os -flto
- Hardware: APM Mustang 8x X-Gene1
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-aarch64-spec2k6-Os_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Reproduce builds:
<cut>
mkdir investigate-llvm-669ddd1e9b1226432b003dbba05b99f8e992285b
cd investigate-llvm-669ddd1e9b1226432b003dbba05b99f8e992285b
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 669ddd1e9b1226432b003dbba05b99f8e992285b
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach b15cbaf5a03d0b32dbc32c37766e32ccf66e6c87
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 669ddd1e9b1226432b003dbba05b99f8e992285b
Author: Arthur Eubanks <aeubanks(a)google.com>
Date: Mon Jan 25 11:00:56 2021 -0800
Turn on the new pass manager by default
This turns on the new pass manager by default for the optimization pipeline in
Clang and ThinLTO in various LLD backends. This also makes uses of `opt
-instcombine` use the new pass manager (unless specifically opted out).
This does not affect the backend target-dependent codegen pipeline.
If this causes regressions, you can opt out of the new pass manager
either via the -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=OFF CMake flag
while building LLVM, or via various compiler flags, e.g.
-flegacy-pass-manager for Clang or -Wl,--lto-legacy-pass-manager for
ELF LLD. Please file bugs for any regressions.
Major differences:
* The inliner works slightly differently
* -O1 does some amount of inlining
* LCSSA and LoopSimplify are run before all loop passes
* Loop unswitching is implemented slightly differently
* A new SpeculateAroundPHIs pass is added to the pipeline
https://lists.llvm.org/pipermail/llvm-dev/2021-January/148098.html
Reviewed By: asbirlea, ychen, MaskRay, echristo
Differential Revision: https://reviews.llvm.org/D95380
---
llvm/CMakeLists.txt | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index 1affc289e64b..f5298de9f7ca 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -688,8 +688,8 @@ else()
endif()
option(LLVM_ENABLE_PLUGINS "Enable plugin support" ${LLVM_ENABLE_PLUGINS_default})
-set(ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER FALSE CACHE BOOL
- "Enable the experimental new pass manager by default.")
+set(ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER TRUE CACHE BOOL
+ "Enable the new pass manager by default.")
include(HandleLLVMOptions)
</cut>
After gcc commit 3c57e692357c79ee7623dfc1586652aee2aefb8f
Author: Patrick Palka <ppalka(a)redhat.com>
libstdc++: Add floating-point std::to_chars implementation
the following hot functions grew in size by more than 10% (but their benchmarks grew in size by less than 1%):
- 447.dealII:libstdc++.so.6.0.29 grew in size by 12% from 1245370 to 1391240 bytes
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their latest release branch
- Target: arm-linux-gnueabihf
- Compiler flags: -Os -mthumb
- Hardware: APM Mustang 8x X-Gene1
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-arm-spec2k6-Os
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Reproduce builds:
<cut>
mkdir investigate-gcc-3c57e692357c79ee7623dfc1586652aee2aefb8f
cd investigate-gcc-3c57e692357c79ee7623dfc1586652aee2aefb8f
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 3c57e692357c79ee7623dfc1586652aee2aefb8f
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 5033506993ef92589373270a8e8dbbf50e3ebef1
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 3c57e692357c79ee7623dfc1586652aee2aefb8f
Author: Patrick Palka <ppalka(a)redhat.com>
Date: Thu Dec 17 23:11:34 2020 -0500
libstdc++: Add floating-point std::to_chars implementation
This implements the floating-point std::to_chars overloads for float,
double and long double. We use the Ryu library to compute the shortest
round-trippable fixed and scientific forms for float, double and long
double. We also use Ryu for performing explicit-precision fixed and
scientific formatting for float and double. For explicit-precision
formatting for long double we fall back to using printf. Hexadecimal
formatting for float, double and long double is implemented from
scratch.
The supported long double binary formats are binary64, binary80 (x86
80-bit extended precision), binary128 and ibm128.
Much of the complexity of the implementation is in computing the exact
output length before handing it off to Ryu (which doesn't do bounds
checking). In some cases it's hard to compute the output length
beforehand, so in these cases we instead compute an upper bound on the
output length and use a sufficiently-sized intermediate buffer only if
necessary.
Another source of complexity is in the general-with-precision formatting
mode, where we need to do zero-trimming of the string returned by Ryu,
and where we also take care to avoid having to format the number through
Ryu a second time when the general formatting mode resolves to fixed
(which we determine by doing a scientific formatting first and
inspecting the scientific exponent). We avoid going through Ryu twice
by instead transforming the scientific form to the corresponding fixed
form via in-place string manipulation.
This implementation is non-conforming in a couple of ways:
1. For the shortest hexadecimal formatting, we currently follow the
Microsoft implementation's decision to be consistent with the
output of printf's '%a' specifier at the expense of sometimes not
printing the shortest representation. For example, the shortest hex
form for the number 1.08p+0 is 2.1p-1, but we output the former
instead of the latter, as does printf.
2. The Ryu routine generic_binary_to_decimal that we use for performing
shortest formatting for large floating point types is implemented
using the __int128 type, but some targets with a large long double
type lack __int128 (e.g. i686), so we can't perform shortest
formatting of long double on such targets through Ryu. As a
temporary stopgap this patch makes the long double to_chars overloads
just dispatch to the double overloads on these targets, which means
we lose precision in the output. (We could potentially fix this by
writing a specialized version of Ryu's generic_binary_to_decimal
routine that uses uint64_t instead of __int128.) [Though I wonder if
there's a better way to work around the lack of __int128 on i686
specifically?]
3. Our shortest formatting for __ibm128 doesn't guarantee the round-trip
property if the difference between the high- and low-order exponent
is large. This is because we treat __ibm128 as if it has a
contiguous 105-bit mantissa by merging the mantissas of the high-
and low-order parts (using code extracted from glibc), so we
potentially lose precision from the low-order part. This seems to be
consistent with how glibc printf formats __ibm128.
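To illustrate point 1, the following small check confirms that 1.08p+0 and 2.1p-1 denote the same value, so the second spelling is the shorter one for that number. It is a sketch that relies on std::strtod accepting C99-style hexadecimal floating-point input with a "0x" prefix; the helper name parse_hex is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Parse a complete hexadecimal floating-point string such as "0x1.08p+0".
double parse_hex(const char* text)
{
  char* end = nullptr;
  const double value = std::strtod(text, &end);
  assert(end != text && *end == '\0');  // the whole string must be consumed
  return value;
}
```

Both parse_hex("0x1.08p+0") and parse_hex("0x2.1p-1") yield 1.03125. The printf-style form keeps the leading hexit at 1 for normal numbers, which costs one extra character here.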
libstdc++-v3/ChangeLog:
* config/abi/pre/gnu.ver: Add new exports.
* include/std/charconv (to_chars): Declare the floating-point
overloads for float, double and long double.
* src/c++17/Makefile.am (sources): Add floating_to_chars.cc.
* src/c++17/Makefile.in: Regenerate.
* src/c++17/floating_to_chars.cc: New file.
(to_chars): Define for float, double and long double.
* testsuite/20_util/to_chars/long_double.cc: New test.
---
libstdc++-v3/config/abi/pre/gnu.ver | 7 +
libstdc++-v3/include/std/charconv | 24 +
libstdc++-v3/src/c++17/Makefile.am | 1 +
libstdc++-v3/src/c++17/Makefile.in | 3 +-
libstdc++-v3/src/c++17/floating_to_chars.cc | 1563 ++++++++++++++++++++
.../testsuite/20_util/to_chars/long_double.cc | 199 +++
6 files changed, 1796 insertions(+), 1 deletion(-)
diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 4b4bd8ab6da..05e0a512247 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2393,6 +2393,13 @@ GLIBCXX_3.4.29 {
# std::once_flag::_M_finish(bool)
_ZNSt9once_flag9_M_finishEb;
+ # std::to_chars(char*, char*, [float|double|long double])
+ _ZSt8to_charsPcS_[defg];
+ # std::to_chars(char*, char*, [float|double|long double], chars_format)
+ _ZSt8to_charsPcS_[defg]St12chars_format;
+ # std::to_chars(char*, char*, [float|double|long double], chars_format, int)
+ _ZSt8to_charsPcS_[defg]St12chars_formati;
+
} GLIBCXX_3.4.28;
# Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv
index dd1ebdf8322..b57b0a16db2 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -702,6 +702,30 @@ namespace __detail
chars_format __fmt = chars_format::general) noexcept;
#endif
+ // Floating-point std::to_chars
+
+ // Overloads for float.
+ to_chars_result to_chars(char* __first, char* __last, float __value) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, float __value,
+ chars_format __fmt) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, float __value,
+ chars_format __fmt, int __precision) noexcept;
+
+ // Overloads for double.
+ to_chars_result to_chars(char* __first, char* __last, double __value) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, double __value,
+ chars_format __fmt) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, double __value,
+ chars_format __fmt, int __precision) noexcept;
+
+ // Overloads for long double.
+ to_chars_result to_chars(char* __first, char* __last, long double __value)
+ noexcept;
+ to_chars_result to_chars(char* __first, char* __last, long double __value,
+ chars_format __fmt) noexcept;
+ to_chars_result to_chars(char* __first, char* __last, long double __value,
+ chars_format __fmt, int __precision) noexcept;
+
_GLIBCXX_END_NAMESPACE_VERSION
} // namespace std
#endif // C++14
diff --git a/libstdc++-v3/src/c++17/Makefile.am b/libstdc++-v3/src/c++17/Makefile.am
index 37cdb53c076..2ec5ed621ca 100644
--- a/libstdc++-v3/src/c++17/Makefile.am
+++ b/libstdc++-v3/src/c++17/Makefile.am
@@ -51,6 +51,7 @@ endif
sources = \
floating_from_chars.cc \
+ floating_to_chars.cc \
fs_dir.cc \
fs_ops.cc \
fs_path.cc \
diff --git a/libstdc++-v3/src/c++17/Makefile.in b/libstdc++-v3/src/c++17/Makefile.in
index ccae721ab3f..9b36b7a916c 100644
--- a/libstdc++-v3/src/c++17/Makefile.in
+++ b/libstdc++-v3/src/c++17/Makefile.in
@@ -124,7 +124,7 @@ LTLIBRARIES = $(noinst_LTLIBRARIES)
libc__17convenience_la_LIBADD =
@ENABLE_DUAL_ABI_TRUE@am__objects_1 = cow-fs_dir.lo cow-fs_ops.lo \
@ENABLE_DUAL_ABI_TRUE@ cow-fs_path.lo
-am__objects_2 = floating_from_chars.lo fs_dir.lo fs_ops.lo fs_path.lo \
+am__objects_2 = floating_from_chars.lo floating_to_chars.lo fs_dir.lo fs_ops.lo fs_path.lo \
memory_resource.lo $(am__objects_1)
@ENABLE_DUAL_ABI_TRUE@am__objects_3 = cow-string-inst.lo
@ENABLE_EXTERN_TEMPLATE_TRUE@am__objects_4 = ostream-inst.lo \
@@ -440,6 +440,7 @@ headers =
sources = \
floating_from_chars.cc \
+ floating_to_chars.cc \
fs_dir.cc \
fs_ops.cc \
fs_path.cc \
diff --git a/libstdc++-v3/src/c++17/floating_to_chars.cc b/libstdc++-v3/src/c++17/floating_to_chars.cc
new file mode 100644
index 00000000000..dd83f5eea93
--- /dev/null
+++ b/libstdc++-v3/src/c++17/floating_to_chars.cc
@@ -0,0 +1,1563 @@
+// std::to_chars implementation for floating-point types -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+// <http://www.gnu.org/licenses/>.
+
+// Activate __glibcxx_assert within this file to shake out any bugs.
+#define _GLIBCXX_ASSERTIONS 1
+
+#include <charconv>
+
+#include <bit>
+#include <cfenv>
+#include <cassert>
+#include <cmath>
+#include <cstdio>
+#include <cstring>
+#include <langinfo.h>
+#include <optional>
+#include <string_view>
+#include <type_traits>
+
+// Determine the binary format of 'long double'.
+
+// We support the binary64, float80 (i.e. x86 80-bit extended precision),
+// binary128, and ibm128 formats.
+#define LDK_UNSUPPORTED 0
+#define LDK_BINARY64 1
+#define LDK_FLOAT80 2
+#define LDK_BINARY128 3
+#define LDK_IBM128 4
+
+#if __LDBL_MANT_DIG__ == __DBL_MANT_DIG__
+# define LONG_DOUBLE_KIND LDK_BINARY64
+#elif defined(__SIZEOF_INT128__)
+// The Ryu routines need a 128-bit integer type in order to do shortest
+// formatting of types larger than 64-bit double, so without __int128 we can't
+// support any large long double format. This is the case for e.g. i386.
+# if __LDBL_MANT_DIG__ == 64
+# define LONG_DOUBLE_KIND LDK_FLOAT80
+# elif __LDBL_MANT_DIG__ == 113
+# define LONG_DOUBLE_KIND LDK_BINARY128
+# elif __LDBL_MANT_DIG__ == 106
+# define LONG_DOUBLE_KIND LDK_IBM128
+# endif
+#endif
+#if !defined(LONG_DOUBLE_KIND)
+# define LONG_DOUBLE_KIND LDK_UNSUPPORTED
+#endif
+
+namespace
+{
+ namespace ryu
+ {
+#include "ryu/common.h"
+#include "ryu/digit_table.h"
+#include "ryu/d2s_intrinsics.h"
+#include "ryu/d2s_full_table.h"
+#include "ryu/d2fixed_full_table.h"
+#include "ryu/f2s_intrinsics.h"
+#include "ryu/d2s.c"
+#include "ryu/d2fixed.c"
+#include "ryu/f2s.c"
+
+#ifdef __SIZEOF_INT128__
+ namespace generic128
+ {
+ // Put the generic Ryu bits in their own namespace to avoid name conflicts.
+# include "ryu/generic_128.h"
+# include "ryu/ryu_generic_128.h"
+# include "ryu/generic_128.c"
+ } // namespace generic128
+
+ using generic128::floating_decimal_128;
+ using generic128::generic_binary_to_decimal;
+
+ int
+ to_chars(const floating_decimal_128 v, char* const result)
+ { return generic128::generic_to_chars(v, result); }
+#endif
+ } // namespace ryu
+
+ // A traits class that contains pertinent information about the binary
+ // format of each of the floating-point types we support.
+ template<typename T>
+ struct floating_type_traits
+ { };
+
+ template<>
+ struct floating_type_traits<float>
+ {
+ // We (and Ryu) assume float has the IEEE binary32 format.
+ static_assert(__FLT_MANT_DIG__ == 24);
+ static constexpr int mantissa_bits = 23;
+ static constexpr int exponent_bits = 8;
+ static constexpr bool has_implicit_leading_bit = true;
+ using mantissa_t = uint32_t;
+ using shortest_scientific_t = ryu::floating_decimal_32;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000011101011100110101100101101110000000000000000000000000 };
+ };
+
+ template<>
+ struct floating_type_traits<double>
+ {
+ // We (and Ryu) assume double has the IEEE binary64 format.
+ static_assert(__DBL_MANT_DIG__ == 53);
+ static constexpr int mantissa_bits = 52;
+ static constexpr int exponent_bits = 11;
+ static constexpr bool has_implicit_leading_bit = true;
+ using mantissa_t = uint64_t;
+ using shortest_scientific_t = ryu::floating_decimal_64;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000000000000000011000110101110111000001100101110000111100,
+ 0b0111100011110101011000011110000000110110010101011000001110011111,
+ 0b0101101100000000011100100100111100110110110100010001010101110000,
+ 0b0011110010111000101111110101100011101100010001010000000101100111,
+ 0b0001010000011001011100100001010000010101101000001101000000000000 };
+ };
+
+#if LONG_DOUBLE_KIND == LDK_BINARY64
+ // When long double is equivalent to double, we just forward the long double
+ // overloads to the double overloads, so we don't need to define a
+ // floating_type_traits<long double> specialization in this case.
+#elif LONG_DOUBLE_KIND == LDK_FLOAT80
+ template<>
+ struct floating_type_traits<long double>
+ {
+ static constexpr int mantissa_bits = 64;
+ static constexpr int exponent_bits = 15;
+ static constexpr bool has_implicit_leading_bit = false;
+ using mantissa_t = uint64_t;
+ using shortest_scientific_t = ryu::floating_decimal_128;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000000000000000000000110101011111110100010100110000011101,
+ 0b1001100101001111010011011111101000101111110001011001011101110000,
+ 0b0000101111111011110010001000001010111101011110111111010100011001,
+ 0b0011100000011111001101101011111001111100100010000101001111101001,
+ 0b0100100100000000100111010010101110011000110001101101110011001010,
+ 0b0111100111100010100000010011000010010110101111110101000011110100,
+ 0b1010100111100010011110000011011101101100010110000110101010101010,
+ 0b0000001111001111000000101100111011011000101000110011101100110010,
+ 0b0111000011100100101101010100001101111110101111001000010011111111,
+ 0b0010111000100110100100100010101100111010110001101010010111001000,
+ 0b0000100000010110000011001001000111000001111010100101101000001111,
+ 0b0010101011101000111100001011000010011101000101010010010000101111,
+ 0b1011111011101101110010101011010001111000101000101101011001100011,
+ 0b1010111011011011110111110011001010000010011001110100101101000101,
+ 0b0011000001110110011010010000011100100011001011001100001101010110,
+ 0b0100011111011000111111101000011110000010111110101001000000001001,
+ 0b1110000001110001001101101110011000100000001010000111100010111010,
+ 0b1110001001010011101000111000001000010100110000010110100011110000,
+ 0b0000011010110000110001111000011111000011001101001101001001000110,
+ 0b1010010111001000101001100101010110100100100010010010000101000010,
+ 0b1011001110000111100010100110000011100011111001110111001100000101,
+ 0b0110101001001000010110001000010001010101110101100001111100011001,
+ 0b1111100011110101011110011010101001010010100011000010110001101001,
+ 0b0100000100001000111101011100010011011111011001000000001100011000,
+ 0b1110111111000111100101110111110000000011001110011100011011011001,
+ 0b1100001100100000010001100011011000111011110000110011010101000011,
+ 0b1111111011100111011101001111111000010000001111010111110010000100,
+ 0b1110111001111110101111000101000000001010001110011010001000111010,
+ 0b1000010001011000101111111010110011111101110101101001111000111010,
+ 0b0100000111101001000111011001101000001010111011101001101111000100,
+ 0b0000011100110001000111011100111100110001101111111010110111100000,
+ 0b0000011101011100100110010011110101010100010011110010010111010000,
+ 0b0011011001100111110101111100001001101110101101001110110011110110,
+ 0b1011000101000001110100111001100100111100110011110000000001101000,
+ 0b1011100011110100001001110101010110111001000000001011101001011110,
+ 0b1111001010010010100000010110101010101011101000101000000000001100,
+ 0b1000001111100100111001110101100001010011111111000001000011110000,
+ 0b0001011101001000010000101101111000001110101100110011001100110111,
+ 0b1110011100000010101011011111001010111101111110100000011100000011,
+ 0b1001110110011100101010011110100010110001001110110000101011100110,
+ 0b1001101000100011100111010000011011100001000000110101100100001001,
+ 0b1010111000101000101101010111000010001100001010100011111100000100,
+ 0b0111101000100011000101101011111011100010001101110111001111001011,
+ 0b1110100111010110001110110110000000010110100011110000010001111100,
+ 0b1100010100011010001011001000111001010101011110100101011001000000,
+ 0b0000110001111001100110010110111010101101001101000000000010010101,
+ 0b0001110111101000001111101010110010010000111110111100000111110100,
+ 0b0111110111001001111000110001101101001010101110110101111110000100,
+ 0b0000111110111010101111100010111010011100010110011011011001000001,
+ 0b1010010100100100101110111111111000101100000010111111101101000110,
+ 0b1000100111111101100011001101000110001000000100010101010100001101,
+ 0b1100101010101000111100101100001000110001110010100000000010110101,
+ 0b1010000100111101100100101010010110100010000000110101101110000100,
+ 0b1011111011110001110000100100000000001010111010001101100000100100,
+ 0b0111101101100011001110011100000001000101101101111000100111011111,
+ 0b0100111010010011011001010011110100001100111010010101111111100011,
+ 0b0010001001011000111000001100110111110111110010100011000110110110,
+ 0b0101010110000000010000100000110100111011111101000100000111010010,
+ 0b0110000011011101000001010100110101101110011100110101000000001001,
+ 0b1101100110100000011000001111000100100100110001100110101010101100,
+ 0b0010100101010110010010001010101000011111111111001011001010001111,
+ 0b0111001010001111001100111001010101001000110101000011110000001000,
+ 0b0110010011001001001111110001010010001011010010001101110110110011,
+ 0b0110010100111011000100111000001001101011111001110010111110111111,
+ 0b0101110111001001101100110100101001110010101110011001101110001000,
+ 0b0100110101010111011010001100010111100011010011111001010100111000,
+ 0b0111000110110111011110100100010111000110000110110110110001111110,
+ 0b1000101101010100100100111110100011110110110010011001110011110101,
+ 0b1001101110101001010100111101101011000101000010110101101111110000,
+ 0b0100100101001011011001001011000010001101001010010001010110101000,
+ 0b0010100001001011100110101000010110000111000111000011100101011011,
+ 0b0110111000011001111101101011111010001000000010101000101010011110,
+ 0b1000110110100001111011000001111100001001000000010110010100100100,
+ 0b1001110100011111100111101011010000010101011100101000010010100110,
+ 0b0001010110101110100010101010001110110110100011101010001001111100,
+ 0b1010100101101100000010110011100110100010010000100100001110000100,
+ 0b0001000000010000001010000010100110000001110100111001110111101101,
+ 0b1100000000000000000000000000000000000000000000000000000000000000 };
+ };
+#elif LONG_DOUBLE_KIND == LDK_BINARY128
+ template<>
+ struct floating_type_traits<long double>
+ {
+ static constexpr int mantissa_bits = 112;
+ static constexpr int exponent_bits = 15;
+ static constexpr bool has_implicit_leading_bit = true;
+ using mantissa_t = unsigned __int128;
+ using shortest_scientific_t = ryu::floating_decimal_128;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000000000000000000000000000000000000000000100000010000000,
+ 0b1011001111110100000100010101101110011100100110000110010110011000,
+ 0b1010100010001101111111000000001101010010100010010000111011110111,
+ 0b1011111001110001111000011111000010110111000111110100101010100101,
+ 0b0110100110011110011011000011000010011001110001001001010011100011,
+ 0b0000011111110010101111101011101010000110011111100111001110100111,
+ 0b0100010101010110000010111011110100000010011001001010001110111101,
+ 0b1101110111000010001101100000110100000111001001101011000101011011,
+ 0b0100111011101101010000001101011000101100101110010010110000101011,
+ 0b0100000110111000000110101000010011101000110100010110000011101101,
+ 0b1011001101001000100001010001100100001111011101010101110001010110,
+ 0b1000000001000000101001110010110010001111101101010101001100000110,
+ 0b0101110110100110000110000001001010111110001110010000111111010011,
+ 0b1010001111100111000100011100100100111100100101000001011001000111,
+ 0b1010011000011100110101100111001011100101111111100001110100000100,
+ 0b1100011100100010100000110001001010000000100000001001010111011101,
+ 0b0101110000100011001111101101000000100110000010010111010001111010,
+ 0b0100111100011010110111101000100110000111001001101100000001111100,
+ 0b1100100100111110101011000100000101011010110111000111110100110101,
+ 0b0110010000010111010100110011000000111010000010111011010110000100,
+ 0b0101001001010010110111010111000101011100000111100111000001110010,
+ 0b1101111111001011101010110001000111011010111101001011010110100100,
+ 0b0001000100110000011111101011001101110010110110010000000011100100,
+ 0b0001000000000101001001001000000000011000100011001110101001001110,
+ 0b0010010010001000111010011011100001000110011011011110110100111000,
+ 0b0000100110101100000111100010100100011100110111011100001111001100,
+ 0b1011111010001110001100000011110111111111100000001011111111101100,
+ 0b0000011100001111010101110000100110111100101101110111101001000001,
+ 0b1100010001110110111100001001001101101000011100000010110101001011,
+ 0b0100101001101011111001011110101101100011011111011100101010101111,
+ 0b0001101001111001110000101101101100001011010001011110011101000010,
+ 0b1111000000101001101111011010110011101110100001011011001011100010,
+ 0b0101001010111101101100001111100010010110001101001000001101100100,
+ 0b0101100101011110001100101011111000111001111001001001101101100001,
+ 0b1111001101010010100100011011000110110010001111000111010001001101,
+ 0b0001110010011000000001000110110111011000011100001000011001110111,
+ 0b0100001011011011011011110011101100100101111111101100101000001110,
+ 0b0101011110111101010111100111101111000101111111111110100011011010,
+ 0b1110101010001001110100000010110111010111111010111110100110010110,
+ 0b1010001111100001001100101000110100001100011100110010000011010111,
+ 0b1111111101101111000100111100000101011000001110011011101010111001,
+ 0b1111101100001110100101111101011001000100000101110000110010100011,
+ 0b1001010110110101101101000101010001010000101011011111010011010000,
+ 0b0111001110110011101001100111000001000100001010110000010000001101,
+ 0b0101111100111110100111011001111001111011011110010111010011101010,
+ 0b1110111000000001100100111001100100110001011011001110101111110111,
+ 0b0001010001001101010111101010011111000011110001101101011001111111,
+ 0b0101000011100011010010001101100001011101011010100110101100100010,
+ 0b0001000101011000100101111100110110000101101101111000110001001011,
+ 0b0101100101001011011000010101000000010100011100101101000010011111,
+ 0b1000010010001011101001011010100010111011110100110011011000100111,
+ 0b1000011011100001010111010111010011101100100010010010100100101001,
+ 0b1001001001010111110101000010111010000000101111010100001010010010,
+ 0b0011011110110010010101111011000001000000000011011111000011111011,
+ 0b1011000110100011001110000001000100000001011100010111010010011110,
+ 0b0111101110110101110111110000011000000100011100011000101101101110,
+ 0b1001100101111011011100011110101011001111100111101010101010110111,
+ 0b1100110010010001100011001111010000000100011101001111011101001111,
+ 0b1000111001111010100101000010000100000001001100101010001011001101,
+ 0b0011101011110000110010100101010100110010100001000010101011111101,
+ 0b1100000000000110000010101011000000011101000110011111100010111111,
+ 0b0010100110000011011100010110111100010110101100110011101110001101,
+ 0b0010111101010011111000111001111100110111111100100011110001101110,
+ 0b1001110111001001101001001001011000010100110001000000100011010110,
+ 0b0011110101100111011011111100001000011001010100111100100101111010,
+ 0b0010001101000011000010100101110000010101101000100110000100001010,
+ 0b0010000010100110010101100101110011101111000111111111001001100001,
+ 0b0100111111011011011011100111111011000010011101101111011111110110,
+ 0b1111111111010110101011101000100101110100001110001001101011100111,
+ 0b1011111101000101110000111100100010111010100001010000010010110010,
+ 0b1111010101001011101011101010000100110110001110111100100110111111,
+ 0b1011001101000001001101000010101010010110010001100001011100011010,
+ 0b0101001011011101010001110100010000010001111100100100100001001101,
+ 0b0010100000111001100011000101100101000001111100111001101000000010,
+ 0b1011001111010101011001000100100110100100110111110100000110111000,
+ 0b0101011111010011100011010010111101110010100001111111100010001001,
+ 0b0010111011101100100000000000001111111010011101100111100001001101,
+ 0b1101000000000000000000000000000000000000000000000000000000000000 };
+ };
+#elif LONG_DOUBLE_KIND == LDK_IBM128
+ template<>
+ struct floating_type_traits<long double>
+ {
+ static constexpr int mantissa_bits = 105;
+ static constexpr int exponent_bits = 11;
+ static constexpr bool has_implicit_leading_bit = true;
+ using mantissa_t = unsigned __int128;
+ using shortest_scientific_t = ryu::floating_decimal_128;
+
+ static constexpr uint64_t pow10_adjustment_tab[]
+ = { 0b0000000000000000000000000000000000000000000000001000000100000000,
+ 0b0000000000000000000100000000000000000000001000000000000000000010,
+ 0b0000100000000000000000001001000000000000000001100100000000000000,
+ 0b0011000000000000000000000000000001110000010000000000000000000000,
+ 0b0000100000000000001000000000000000000000000000100000000000000000 };
+ };
+#endif
+
+ // An IEEE-style decomposition of a floating-point value of type T.
+ template<typename T>
+ struct ieee_t
+ {
+ typename floating_type_traits<T>::mantissa_t mantissa;
+ uint32_t biased_exponent;
+ bool sign;
+ };
+
+ // Decompose the floating-point value into its IEEE components.
+ template<typename T>
+ ieee_t<T>
+ get_ieee_repr(const T value)
+ {
+ constexpr int mantissa_bits = floating_type_traits<T>::mantissa_bits;
+ constexpr int exponent_bits = floating_type_traits<T>::exponent_bits;
+ constexpr int total_bits = mantissa_bits + exponent_bits + 1;
+
+ constexpr auto get_uint_t = [] {
+ if constexpr (total_bits <= 32)
+ return uint32_t{};
+ else if constexpr (total_bits <= 64)
+ return uint64_t{};
+#ifdef __SIZEOF_INT128__
+ else if constexpr (total_bits <= 128)
+ return (unsigned __int128){};
+#endif
+ };
+ using uint_t = decltype(get_uint_t());
+ uint_t value_bits = 0;
+ memcpy(&value_bits, &value, sizeof(value));
+
+ ieee_t<T> ieee_repr;
+ ieee_repr.mantissa = value_bits & ((uint_t{1} << mantissa_bits) - 1u);
+ ieee_repr.biased_exponent
+ = (value_bits >> mantissa_bits) & ((uint_t{1} << exponent_bits) - 1u);
+ ieee_repr.sign = (value_bits >> (mantissa_bits + exponent_bits)) & 1;
+ return ieee_repr;
+ }
+
+#if LONG_DOUBLE_KIND == LDK_IBM128
+ template<>
+ ieee_t<long double>
+ get_ieee_repr(const long double value)
+ {
+ // The layout of __ibm128 isn't compatible with the standard IEEE format.
+ // So we transform it into an IEEE-compatible format, suitable for
+ // consumption by the generic Ryu API, with an 11-bit exponent and 105-bit
+ // mantissa (plus an implicit leading bit). We use the exponent and sign
+ // of the high part, and we merge the mantissa of the high part with the
+ // mantissa (and the implicit leading bit) of the low part.
+ using uint_t = unsigned __int128;
+ uint_t value_bits = 0;
+ memcpy(&value_bits, &value, sizeof(value_bits));
+
+ const uint64_t value_hi = value_bits;
+ const uint64_t value_lo = value_bits >> 64;
+
+ uint64_t mantissa_hi = value_hi & ((1ull << 52) - 1);
+ unsigned exponent_hi = (value_hi >> 52) & ((1ull << 11) - 1);
+ const int sign_hi = (value_hi >> 63) & 1;
+
+ uint64_t mantissa_lo = value_lo & ((1ull << 52) - 1);
+ const unsigned exponent_lo = (value_lo >> 52) & ((1ull << 11) - 1);
+ const int sign_lo = (value_lo >> 63) & 1;
+
+ {
+ // The following code for adjusting the low-part mantissa to combine
+ // it with the high-part mantissa is taken from the glibc source file
+ // sysdeps/ieee754/ldbl-128ibm/printf_fphex.c.
+ mantissa_lo <<= 7;
+ if (exponent_lo != 0)
+ mantissa_lo |= (1ull << (52 + 7));
+ else
+ mantissa_lo <<= 1;
+
+ const int ediff = exponent_hi - exponent_lo - 53;
+ if (ediff > 63)
+ mantissa_lo = 0;
+ else if (ediff > 0)
+ mantissa_lo >>= ediff;
+ else if (ediff < 0)
+ mantissa_lo <<= -ediff;
+
+ if (sign_lo != sign_hi && mantissa_lo != 0)
+ {
+ mantissa_lo = (1ull << 60) - mantissa_lo;
+ if (mantissa_hi == 0)
+ {
+ mantissa_hi = 0xffffffffffffeLL | (mantissa_lo >> 59);
+ mantissa_lo = 0xfffffffffffffffLL & (mantissa_lo << 1);
+ exponent_hi--;
+ }
+ else
+ mantissa_hi--;
+ }
+ }
+
+ ieee_t<long double> ieee_repr;
+ ieee_repr.mantissa = ((uint_t{mantissa_hi} << 64)
+ | (uint_t{mantissa_lo} << 4)) >> 11;
+ ieee_repr.biased_exponent = exponent_hi;
+ ieee_repr.sign = sign_hi;
+ return ieee_repr;
+ }
+#endif
+
+ // Invoke Ryu to obtain the shortest scientific form for the given
+ // floating-point number.
+ template<typename T>
+ typename floating_type_traits<T>::shortest_scientific_t
+ floating_to_shortest_scientific(const T value)
+ {
+ if constexpr (std::is_same_v<T, float>)
+ return ryu::floating_to_fd32(value);
+ else if constexpr (std::is_same_v<T, double>)
+ return ryu::floating_to_fd64(value);
+#ifdef __SIZEOF_INT128__
+ else if constexpr (std::is_same_v<T, long double>)
+ {
+ constexpr int mantissa_bits
+ = floating_type_traits<T>::mantissa_bits;
+ constexpr int exponent_bits
+ = floating_type_traits<T>::exponent_bits;
+ constexpr bool has_implicit_leading_bit
+ = floating_type_traits<T>::has_implicit_leading_bit;
+
+ const auto [mantissa, exponent, sign] = get_ieee_repr(value);
+ return ryu::generic_binary_to_decimal(mantissa, exponent, sign,
+ mantissa_bits, exponent_bits,
+ !has_implicit_leading_bit);
+ }
+#endif
+ }
+
+ // This subroutine returns true if the shortest scientific form fd is a
+ // positive power of 10, and the floating-point number that has this shortest
+ // scientific form is smaller than this power of 10.
+ //
+ // For instance, the exactly-representable 64-bit number
+ // 99999999999999991611392.0 has the shortest scientific form 1e23, so its
+ // exact value is smaller than its shortest scientific form.
+ //
+ // For these powers of 10 the length of the fixed form is one digit less
+ // than what the scientific exponent suggests.
+ //
+ // This subroutine inspects a lookup table to detect when fd is such a
+ // "rounded up" power of 10.
+ template<typename T>
+ bool
+ is_rounded_up_pow10_p(const typename
+ floating_type_traits<T>::shortest_scientific_t fd)
+ {
+ if (fd.exponent < 0 || fd.mantissa != 1) [[likely]]
+ return false;
+
+ constexpr auto& pow10_adjustment_tab
+ = floating_type_traits<T>::pow10_adjustment_tab;
+ __glibcxx_assert(fd.exponent/64 < (int)std::size(pow10_adjustment_tab));
+ return (pow10_adjustment_tab[fd.exponent/64]
+ & (1ull << (63 - fd.exponent%64)));
+ }
+
+ int
+ get_mantissa_length(const ryu::floating_decimal_32 fd)
+ { return ryu::decimalLength9(fd.mantissa); }
+
+ int
+ get_mantissa_length(const ryu::floating_decimal_64 fd)
+ { return ryu::decimalLength17(fd.mantissa); }
+
+#ifdef __SIZEOF_INT128__
+ int
+ get_mantissa_length(const ryu::floating_decimal_128 fd)
+ { return ryu::generic128::decimalLength(fd.mantissa); }
+#endif
+} // anon namespace
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+// This subroutine of __floating_to_chars_* handles writing nan, inf and 0 in
+// all formatting modes.
+template<typename T>
+ static optional<to_chars_result>
+ __handle_special_value(char* first, char* const last, const T value,
+ const chars_format fmt, const int precision)
+ {
+ __glibcxx_assert(precision >= 0);
+
+ string_view str;
+ switch (__builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL,
+ FP_ZERO, value))
+ {
+ case FP_INFINITE:
+ str = "-inf";
+ break;
+
+ case FP_NAN:
+ str = "-nan";
+ break;
+
+ case FP_ZERO:
+ break;
+
+ default:
+ case FP_SUBNORMAL:
+ case FP_NORMAL: [[likely]]
+ return nullopt;
+ }
+
+ if (!str.empty())
+ {
+ // We're formatting +-inf or +-nan.
+ if (!__builtin_signbit(value))
+ str.remove_prefix(strlen("-"));
+
+ if (last - first < (int)str.length())
+ return {{last, errc::value_too_large}};
+
+ memcpy(first, &str[0], str.length());
+ first += str.length();
+ return {{first, errc{}}};
+ }
+
+ // We're formatting 0.
+ __glibcxx_assert(value == 0);
+ const auto orig_first = first;
+ const bool sign = __builtin_signbit(value);
+ int expected_output_length;
+ switch (fmt)
+ {
+ case chars_format::fixed:
+ case chars_format::scientific:
+ case chars_format::hex:
+ expected_output_length = sign + 1;
+ if (precision)
+ expected_output_length += strlen(".") + precision;
+ if (fmt == chars_format::scientific)
+ expected_output_length += strlen("e+00");
+ else if (fmt == chars_format::hex)
+ expected_output_length += strlen("p+0");
+ if (last - first < expected_output_length)
+ return {{last, errc::value_too_large}};
+
+ if (sign)
+ *first++ = '-';
+ *first++ = '0';
+ if (precision)
+ {
+ *first++ = '.';
+ memset(first, '0', precision);
+ first += precision;
+ }
+ if (fmt == chars_format::scientific)
+ {
+ memcpy(first, "e+00", 4);
+ first += 4;
+ }
+ else if (fmt == chars_format::hex)
+ {
+ memcpy(first, "p+0", 3);
+ first += 3;
+ }
+ break;
+
+ case chars_format::general:
+ default: // case chars_format{}:
+ expected_output_length = sign + 1;
+ if (last - first < expected_output_length)
+ return {{last, errc::value_too_large}};
+
+ if (sign)
+ *first++ = '-';
+ *first++ = '0';
+ break;
+ }
+ __glibcxx_assert(first - orig_first == expected_output_length);
+ return {{first, errc{}}};
+ }
+
+// This subroutine of the floating-point to_chars overloads performs
+// hexadecimal formatting.
+template<typename T>
+ static to_chars_result
+ __floating_to_chars_hex(char* first, char* const last, const T value,
+ const optional<int> precision)
+ {
+ if (precision.has_value() && precision.value() < 0) [[unlikely]]
+ // A negative precision argument is treated as if it were omitted.
+ return __floating_to_chars_hex(first, last, value, nullopt);
+
+ __glibcxx_requires_valid_range(first, last);
+
+ constexpr int mantissa_bits = floating_type_traits<T>::mantissa_bits;
+ constexpr bool has_implicit_leading_bit
+ = floating_type_traits<T>::has_implicit_leading_bit;
+ constexpr int exponent_bits = floating_type_traits<T>::exponent_bits;
+ constexpr int exponent_bias = (1u << (exponent_bits - 1)) - 1;
+ using mantissa_t = typename floating_type_traits<T>::mantissa_t;
+ constexpr int mantissa_t_width = sizeof(mantissa_t) * __CHAR_BIT__;
+
+ if (auto result = __handle_special_value(first, last, value,
+ chars_format::hex,
+ precision.value_or(0)))
+ return *result;
+
+ // Extract the sign, mantissa and exponent from the value.
+ const auto [ieee_mantissa, biased_exponent, sign] = get_ieee_repr(value);
+ const bool is_normal_number = (biased_exponent != 0);
+
+ // Calculate the unbiased exponent.
+ const int32_t unbiased_exponent = (is_normal_number
+ ? biased_exponent - exponent_bias
+ : 1 - exponent_bias);
+
+ // Shift the mantissa so that its bitwidth is a multiple of 4.
+ constexpr unsigned rounded_mantissa_bits = (mantissa_bits + 3) / 4 * 4;
+ static_assert(mantissa_t_width >= rounded_mantissa_bits);
+ mantissa_t effective_mantissa
+ = ieee_mantissa << (rounded_mantissa_bits - mantissa_bits);
+ if (is_normal_number)
+ {
+ if constexpr (has_implicit_leading_bit)
+ // Restore the mantissa's implicit leading bit.
+ effective_mantissa |= mantissa_t{1} << rounded_mantissa_bits;
+ else
+ // The explicit mantissa bit should already be set.
+ __glibcxx_assert(effective_mantissa & (mantissa_t{1} << (mantissa_bits
+ - 1u)));
+ }
+
+ // Compute the shortest precision needed to print this value exactly,
+ // disregarding trailing zeros.
+ constexpr int full_hex_precision = (has_implicit_leading_bit
+ ? (mantissa_bits + 3) / 4
+ // With an explicit leading bit, we
+ // use the four leading bits as the
+ // hexit before the decimal point.
+ : (mantissa_bits - 4 + 3) / 4);
+ const int trailing_zeros = __countr_zero(effective_mantissa) / 4;
+ const int shortest_full_precision = full_hex_precision - trailing_zeros;
+ __glibcxx_assert(shortest_full_precision >= 0);
+
+ int written_exponent = unbiased_exponent;
+ const int effective_precision = precision.value_or(shortest_full_precision);
+ if (effective_precision < shortest_full_precision)
+ {
+ // When limiting the precision, we need to determine how to round the
+ // least significant printed hexit. The following branchless
+ // bit-level-parallel technique computes whether to round up the
+ // mantissa bit at index N (according to round-to-nearest rules) when
+ // dropping N bits of precision, for each index N in the bit vector.
+ // This technique is borrowed from the MSVC implementation.
+ using bitvec = mantissa_t;
+ const bitvec round_bit = effective_mantissa << 1;
+ const bitvec has_tail_bits = round_bit - 1;
+ const bitvec lsb_bit = effective_mantissa;
+ const bitvec should_round = round_bit & (has_tail_bits | lsb_bit);
+
+ const int dropped_bits = 4*(full_hex_precision - effective_precision);
+ // Mask out the dropped nibbles.
+ effective_mantissa >>= dropped_bits;
+ effective_mantissa <<= dropped_bits;
+ if (should_round & (mantissa_t{1} << dropped_bits))
+ {
+ // Round up the least significant nibble.
+ effective_mantissa += mantissa_t{1} << dropped_bits;
+ // Check and adjust for overflow of the leading nibble. When the
+ // type has an implicit leading bit, then the leading nibble
+ // before rounding is either 0 or 1, so it can't overflow.
+ if constexpr (!has_implicit_leading_bit)
+ {
+ // The only supported floating-point type with an explicit
+ // leading mantissa bit is LDK_FLOAT80, i.e. x86 80-bit
+ // extended precision, and so we hardcode the below overflow
+ // check+adjustment for this type.
+ static_assert(mantissa_t_width == 64
+ && rounded_mantissa_bits == 64);
+ if (effective_mantissa == 0)
+ {
+ // We rounded up the least significant nibble and the
+ // mantissa overflowed, e.g. f.fcp+10 with precision=1
+ // became 10.0p+10. Absorb this extra hexit into the
+ // exponent to obtain 1.0p+14.
+ effective_mantissa
+ = mantissa_t{1} << (rounded_mantissa_bits - 4);
+ written_exponent += 4;
+ }
+ }
+ }
+ }
+
+ // Compute the leading hexit and mask it out from the mantissa.
+ char leading_hexit;
+ if constexpr (has_implicit_leading_bit)
+ {
+ const unsigned nibble = effective_mantissa >> rounded_mantissa_bits;
+ __glibcxx_assert(nibble <= 2);
+ leading_hexit = '0' + nibble;
+ effective_mantissa &= ~(mantissa_t{0b11} << rounded_mantissa_bits);
+ }
+ else
+ {
+ const unsigned nibble = effective_mantissa >> (rounded_mantissa_bits-4);
+ __glibcxx_assert(nibble < 16);
+ leading_hexit = "0123456789abcdef"[nibble];
+ effective_mantissa &= ~(mantissa_t{0b1111} << (rounded_mantissa_bits-4));
+ written_exponent -= 3;
+ }
+
+ // Now before we start writing the string, determine the total length of
+ // the output string and perform a single bounds check.
+ int expected_output_length = sign + 1;
+ if (effective_precision != 0)
+ expected_output_length += strlen(".") + effective_precision;
+ const int abs_written_exponent = abs(written_exponent);
+ expected_output_length += (abs_written_exponent >= 10000 ? strlen("p+ddddd")
+ : abs_written_exponent >= 1000 ? strlen("p+dddd")
+ : abs_written_exponent >= 100 ? strlen("p+ddd")
+ : abs_written_exponent >= 10 ? strlen("p+dd")
+ : strlen("p+d"));
+ if (last - first < expected_output_length)
+ return {last, errc::value_too_large};
+
+ const auto saved_first = first;
+ // Write the negative sign and the leading hexit.
+ if (sign)
+ *first++ = '-';
+ *first++ = leading_hexit;
+
+ if (effective_precision > 0)
+ {
+ *first++ = '.';
+ int written_hexits = 0;
+ // Extract and mask out the leading nibble after the decimal point,
+ // write its corresponding hexit, and repeat until the mantissa is
+ // empty.
+ int nibble_offset = rounded_mantissa_bits;
+ if constexpr (!has_implicit_leading_bit)
+ // We already printed the entire leading hexit.
+ nibble_offset -= 4;
+ while (effective_mantissa != 0)
+ {
+ nibble_offset -= 4;
+ const unsigned nibble = effective_mantissa >> nibble_offset;
+ __glibcxx_assert(nibble < 16);
+ *first++ = "0123456789abcdef"[nibble];
+ ++written_hexits;
+ effective_mantissa &= ~(mantissa_t{0b1111} << nibble_offset);
+ }
+ __glibcxx_assert(nibble_offset >= 0);
+ __glibcxx_assert(written_hexits <= effective_precision);
+ // Since the mantissa is now empty, every hexit hereafter must be '0'.
+ if (int remaining_hexits = effective_precision - written_hexits)
+ {
+ memset(first, '0', remaining_hexits);
+ first += remaining_hexits;
+ }
+ }
+
+ // Finally, write the exponent.
+ *first++ = 'p';
+ if (written_exponent >= 0)
+ *first++ = '+';
+ const to_chars_result result = to_chars(first, last, written_exponent);
+ __glibcxx_assert(result.ec == errc{}
+ && result.ptr == saved_first + expected_output_length);
+ return result;
+ }
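
The rounding step in __floating_to_chars_hex packs a round-to-nearest, ties-to-even decision for every possible number of dropped bits into a single bit vector. The following is a minimal standalone sketch of that idea. It assumes a plain 64-bit integer in place of mantissa_t, and the helper name drop_bits_rounded is hypothetical; unlike the patch, which masks the dropped nibbles in place, the sketch shifts them out, purely to keep the example short.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the patch): drop the low n bits of m,
// rounding to nearest with ties going to even.
// Assumes 0 <= n <= 62 and m < 2^63, so that no shift overflows.
std::uint64_t drop_bits_rounded(std::uint64_t m, int n)
{
  // Bit N of round_bit is bit N-1 of m, i.e. the most significant bit
  // that would be dropped when discarding N bits.
  const std::uint64_t round_bit = m << 1;
  // Where round_bit has bit N set, bit N of (round_bit - 1) is set iff
  // some lower bit of round_bit is set, i.e. the dropped tail below the
  // round bit is nonzero.
  const std::uint64_t has_tail_bits = round_bit - 1;
  // Bit N of m is the least significant bit that is kept.
  const std::uint64_t lsb_bit = m;
  // Round up iff the round bit is set and either the tail is nonzero
  // (more than halfway) or the kept LSB is odd (a tie, rounded to even).
  const std::uint64_t should_round = round_bit & (has_tail_bits | lsb_bit);

  std::uint64_t kept = m >> n;
  if (should_round & (std::uint64_t{1} << n))
    ++kept;
  return kept;
}
```

For example, dropping two bits from 0b1011 (2.75 in units of four) gives 3, while the ties 0b1010 (2.5) and 0b1110 (3.5) give 2 and 4 respectively, matching ties-to-even.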
+
+template<typename T>
+ static to_chars_result
+ __floating_to_chars_shortest(char* first, char* const last, const T value,
+ chars_format fmt)
+ {
+ if (fmt == chars_format::hex)
+ return __floating_to_chars_hex(first, last, value, nullopt);
+
+ __glibcxx_assert(fmt == chars_format::fixed
+ || fmt == chars_format::scientific
</cut>