== Progress ==
* Out of office on Thursday
* LLVM 9.0.1
- Uploaded ARM & AArch64 binaries for rc1
- ARM: opened 2 bug reports (asan and cfi tests failing)
* Triaging check-lldb failures on AArch64 [LLVM-512]
- Opened a few more bug reports
- Got one nasty failure that I want to look into a bit more before
committing a patch XFAIL-ing everything so far
* Morello
- Got a VM working, built the toolchain, currently trying to build Android
- Setting up all sorts of Gerrit accounts and other minutiae
== Plan ==
* More of the same
Hi!
I've attempted to study the implementation of memcpy for 32-bit Arm cores in
Glibc (which is also found in arm-optimized-routines and first appeared in
Linaro's cortex-strings project), and I came across a peculiar snippet:
#ifdef USE_VFP
/* Magic dust alert! Force VFP on Cortex-A9. Experiments show
that the FP pipeline is much better at streaming loads and
stores. This is outside the critical loop. */
vmov.f32 s0, s0
#endif
This seems to imply that this NOP-like instruction affects CPU state and makes
the vldr/vstr instructions that follow use different datapaths than they might
otherwise? Can anyone shed more light on this, please?
I was able to trace the history of this code back to revision 100 in the
cortex-strings repository, where it appeared as part of a large rewrite by
Will Newton:
https://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/rev…
The entire memcpy.S file in the Arm optimized-routines repo can be found here:
https://github.com/ARM-software/optimized-routines/blob/master/string/arm/m…
Thanks!
Alexander
Hi Arnd,
I took a look at the stack usage issue in the kernel snippet you provided [1],
and as you have noted most of the impact indeed comes from the -ftree-ch
optimization. It is enabled at all optimization levels besides -Os (since,
besides possibly increasing stack usage, it might also increase code size).
I am still working out exactly what the -ftree-ch optimization does, but my
understanding so far is that it tries to reorganize the loop for later loop
optimization phases. More specifically, what it ends up doing on this snippet
is creating extra stack variables for the internal member accesses in the inner
loop (which in turn increases stack usage).
This is also why adding the compiler barrier inhibits the optimization: it
prevents -ftree-ch from reorganizing the inner loop, which is then passed as is
to later optimization phases.
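For readers unfamiliar with the pass, here is a minimal hand-written sketch of
the general idea behind loop header copying, which is what -ftree-ch performs.
It is not taken from the kernel snippet and is not GCC's actual output; the
function names are made up for illustration. The exit test is duplicated once
in front of the loop so that the loop body can become a do-while:

```c
#include <stddef.h>

/* Original shape: the exit test lives in the loop header. */
static long sum_while(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;

    while (i < n) {
        total += v[i];
        i++;
    }
    return total;
}

/* Hand-written equivalent of the header-copied shape: the test is
   evaluated once before entering the loop, and again at the bottom
   of each iteration. */
static long sum_header_copied(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;

    if (i < n) {
        do {
            total += v[i];
            i++;
        } while (i < n);
    }
    return total;
}
```

Both functions compute the same result; only the control-flow shape handed to
later passes differs.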
It is also a generic pass that affects all architectures, although the
resulting stack usage will depend on later passes. With GCC 9.2.1 I see the
following stack usage, as reported by -fstack-usage with -O2:
arm          632
aarch64      448
powerpc      912
powerpc64le  560
s390         600
s390x        632
i386        1376
x86_64       784
Also, -fconserve-stack does not really help with this pass, since -ftree-ch
does not check that flag. Currently -fconserve-stack only seems to affect
the inliner, by setting both large-stack-frame and large-stack-frame-growth to
some conservative values.
The straightforward change I am checking is simply to disable the tree-ch
optimization if -fconserve-stack is also enabled:
diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index b894a7e0918..b14dd66257c 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -291,7 +291,8 @@ public:
{}
/* opt_pass methods: */
- virtual bool gate (function *) { return flag_tree_ch != 0; }
+ virtual bool gate (function *) { return flag_tree_ch != 0
+ && flag_conserve_stack == 0; }
/* Initialize and finalize loop structures, copying headers inbetween. */
virtual unsigned int execute (function *);
On powerpc64le with gcc master:
$ /home/azanella/gcc/gcc-git-build/gcc/xgcc -B /home/azanella/gcc/gcc-git-build/gcc -O2 ../stack_usage.c -c -fstack-usage && cat stack_usage.su
../stack_usage.c:157:6:mlx5e_grp_sw_update_stats 496 static
$ /home/azanella/gcc/gcc-git-build/gcc/xgcc -B /home/azanella/gcc/gcc-git-build/gcc -O2 ../stack_usage.c -c -fstack-usage -fconserve-stack && cat stack_usage.su
../stack_usage.c:157:6:mlx5e_grp_sw_update_stats 176 static
The reference for minimal stack usage is with -Os:
$ /home/azanella/gcc/gcc-git-build/gcc/xgcc -B /home/azanella/gcc/gcc-git-build/gcc -Os ../stack_usage.c -c -fstack-usage && cat stack_usage.su
../stack_usage.c:157:6:mlx5e_grp_sw_update_stats 32 static
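To compare such numbers across many builds, the .su lines can be parsed
mechanically. The following is only a sketch under assumptions: the struct and
function names are hypothetical, and it assumes each line has the shape shown
above (location ending in the function name, then the size, then a qualifier,
separated by whitespace) with no spaces in the function name:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one line of a -fstack-usage .su file. */
struct su_entry {
    char function[128];
    unsigned long bytes;
    char qualifier[32];
};

/* Parse "file:line:col:function <size> <qualifier>".
   Returns 0 on success, -1 if the line does not match that shape. */
static int parse_su_line(const char *line, struct su_entry *out)
{
    char location[256];
    const char *name;

    if (sscanf(line, "%255s %lu %31s",
               location, &out->bytes, out->qualifier) != 3)
        return -1;

    /* The function name follows the last ':' of the location field. */
    name = strrchr(location, ':');
    if (name == NULL)
        return -1;

    snprintf(out->function, sizeof out->function, "%s", name + 1);
    return 0;
}
```

Fed the three lines above, this would yield 496, 176 and 32 for
mlx5e_grp_sw_update_stats.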
I will also check whether adding the same test for -fgcse and -ftree-ter
makes sense.
[1] https://godbolt.org/z/WKa-Bd
# Progress #
o Upstream GDB
* Make remote packet length in debugging output adjustable (as
opposed to fixed at 512 bytes).
* Investigated ARM sim build issues with the GCC default moving to
-fno-common.
o GDB:
* GNU-644 - [GDB, AArch64] gdb.base/step-over-syscalls.exp failures
- No progress yet. Waiting for Kernel feedback.
* [RESOLVED] GNU-645 - gdbserver is not using SVE register
descriptions properly
- Pushed a fix upstream.
* GNU-170 - GDB BZ #21221 - gdb hangs while stepping an empty loop
- On hold for now.
o Friday off
# Plan #
o Upstream GDB
* Fix -fno-common build issues with ARM sim.
o GDB
* GNU-644 - [GDB, AArch64] gdb.base/step-over-syscalls.exp failures
- Continue working on a fix.
== This Week ==
* GCC
- PR92554: Spent some time triaging the issue, but gave up after
Richard posted a better fix.
- PR89007: Addressing upstream suggestions.
- PR92608: Committed fix to trunk.
- GNU-583: Looking through Kugan's patch and upstream discussion.
* Validation
- Submitted patch to add --gcc_patch_file option to abe.
- Submitted patch to remove --interactive from abe.
== Next Week ==
- Continue ongoing tasks
== Progress ==
* GCC:
- -mpure-code on v6m: sent an updated patch, waiting for approval.
* BFD Linker:
- non-contiguous memory support: partial prototype working on the
use-case, but causes regressions.
* GCC upstream validation:
- reported several issues
* misc:
- infra fixes / troubleshooting / reviews
== Next ==
* GCC: pure-code/v6m, handle feedback
* Binutils: support non-contiguous memory regions in linker
QEMU Tooling ([VIRT-252])
=========================
Extend gdbstub for SVE ([VIRT-281])
- worked on [v2 rebase addressing comments]
[VIRT-281] https://projects.linaro.org/browse/VIRT-281
[v2 rebase addressing comments]
https://github.com/stsquad/qemu/tree/gdbstub/sve-registers-v2
Upstream Work ([VIRT-109])
==========================
- general poking around and stress testing in the run-up to release
- documented some outstanding issues [on the planning page]
- posted {PULL for rc3 0/5} a few doc and testing tweaks Message-Id:
<20191120105801.2735-1-alex.bennee(a)linaro.org>
- posted {PATCH for 4.2 v1 0/3} some tests/vm fixes Message-Id:
<20191122112231.18431-1-alex.bennee(a)linaro.org>
[VIRT-109] https://projects.linaro.org/browse/VIRT-109
[on the planning page] https://wiki.qemu.org/Planning/4.2
Other Activities
================
- finalising [draft of KVM Forum conference report]
- will publish on Monday once Beata adds the last note
[draft of KVM Forum conference report]
https://collaborate.linaro.org/pages/resumedraft.action?draftId=128647720
Completed Reviews [3/3]
=======================
{PATCH v2 0/6} Make the qemu_logfile handle thread safe.
Message-Id: <20191115131040.2834-1-robert.foley(a)linaro.org>
- CLOSING NOTE [2019-11-22 Fri 17:15]
As of v3 this is ready to go, just awaiting the tree to open again
Added: <2019-11-15 Fri>
{PATCH v3 0/6} Make the qemu_logfile handle thread safe.
Message-Id: <20191118211528.3221-1-robert.foley(a)linaro.org>
{PATCH 0/6} Enable Travis builds on arm64, ppc64le and s390x
Message-Id: <20191119170822.45649-1-thuth(a)redhat.com>
Current Review Queue
====================
* {PATCH 0/1} tests/vm: Allow to set path to qemu-img
Message-Id: <20191114134246.12073-1-wainersm(a)redhat.com>
Added: <2019-11-14 Thu>
* {PATCH v7 0/8} Acceptance test: Add "boot_linux" acceptance test
Message-Id: <20191104151323.9883-1-crosa(a)redhat.com>
Added: <2019-11-04 Mon>
* {RFC 0/3} tests/vhost-user-fs-test: add vhost-user-fs test case
Message-Id: <20191025100152.6638-1-stefanha(a)redhat.com>
Added: <2019-10-25 Fri>
* {PATCH v5 00/22} target/arm: Implement ARMv8.5-MemTag, system mode
Message-Id: <20191011134744.2477-1-richard.henderson(a)linaro.org>
Added: <2019-10-11 Fri>
--
Alex Bennée
[Morello]
- LLD finished pre-review refactoring and splitting up into reviewable chunks
- Implemented range-extension and interworking thunks to test
interaction with aligning .text to comply with Cheri Concentrate
- Answered some questions from Linaro tech-leads about Morello
Plans
- Rebase once CUCL merge has been completed and submit for review.
Planned Absences:
Christmas Holiday 16th December - 3rd January inclusive
Progress:
* VIRT-65 [QEMU upstream maintainership]
- code review:
+ Marc-Andre's series trying to get rid of QOM pointer properties
+ various minor bits for rc2
- unsuccessfully tried to work out why one of QEMU's test
cases asserts on BSD hosts only
- some time consumed by office-move
thanks
-- PMM