All,
During Connect the suggestion was made that each working group should have
its own IRC Channel for discussions and topics relating to the group in
particular (as opposed to #linaro which is 'generic' Linaro conversations).
Therefore I have just set up #linaro-tcwg on Freenode for the Toolchain
Working Group.
This channel is public and open to anyone who wants to talk with the TCWG
group about anything toolchain related.
Thanks,
Matt
--
Matthew Gretton-Dann
Toolchain Working Group, Linaro
7 working days, then Thanksgiving.
[VIRT-262 # ARMv8.1-PAN Privileged Access Never]
Finished, still need to post.
[VIRT-273 # ARMv8.2-ATS1E1, AT S1E1R and AT S1E1W instruction variants ]
Finished, still need to post.
[VIRT-276 # ARMv8.2-UAO, PSTATE override of Unprivileged Load/Store ]
Finished, still need to post.
[VIRT-263 # ARMv8.1-VHE Virtual Host Extensions ]
FIXED! Welsh sprint with AJB; found and fixed two bugs.
Final bug causing guest kernel crash while booting fixed
upstream by Marc Zyngier vs ptrauth.
Will do some more thorough testing during rc4 and post
once the development phase opens up again.
[VIRT-327 # Richard's upstream QEMU work ]
Review of target/hexagon skeleton.
Review of arm dcpop patch set for beata.
Fixed a couple of arm translator bugs for clyon.
Some investigation into a reported hppa-linux-user bug.
While I can reproduce locally, so far I have not tracked
down anything that I can prove is a translation bug.
r~
QEMU Tooling ([VIRT-252])
=========================
Extend gdbstub for SVE ([VIRT-281])
- worked on [v2 rebase addressing comments]
- posted {PATCH v2 00/14} gdbstub refactor and SVE support Message-Id:
<20191130084602.10818-1-alex.bennee(a)linaro.org>
[VIRT-281] https://projects.linaro.org/browse/VIRT-281
[working prototype]
https://github.com/stsquad/qemu/tree/gdbstub/sve-registers
[v2 rebase addressing comments]
https://github.com/stsquad/qemu/tree/gdbstub/sve-registers-v2
QEMU ARMv8.1 VHE ([VIRT_263])
=============================
- inaugural Welsh code sprint with rth
- found some new bugs, squashed some old bugs
- together with recent upstream fixes SUCCESS!
- can now boot a guest from a VHE enabled kernel :-)
[VIRT_263] https://projects.linaro.org/browse/VIRT-263
Upstream Work ([VIRT-109])
==========================
- posted {PULL for 4.2 0/3} a few vm-test fixes Message-Id:
<20191126120339.18059-1-alex.bennee(a)linaro.org>
- there are still niggling netbsd failures
- posted {PATCH for 4.2?} .travis.yml: drop xcode9.4 from build matrix
Message-Id: <20191127132430.3681-1-alex.bennee(a)linaro.org>
- investigation into [ARM HPC compiler triggered linux-user bug]
- may be 64k page related as couldn't reproduce on Ubuntu
- posted {PATCH v1 0/5} linux-user mmap debug cleanup Message-Id:
<20191128194603.24818-1-alex.bennee(a)linaro.org>
[ARM HPC compiler triggered linux-user bug]
https://bugs.launchpad.net/qemu/+bug/1853826
Other Activities
================
- published [QEMU Summit and KVM Forum trip report]
[QEMU Summit and KVM Forum trip report]
https://collaborate.linaro.org/display/CR/20191030+QEMU+Summit+and+KVM+Foru…
Absences
========
- 2nd Dec Holiday
Current Review Queue
====================
* {PATCH 0/4} python/qemu: New accel module and improvements
Message-Id: <20191115180829.10275-1-wainersm(a)redhat.com>
Added: <2019-11-28 Thu>
* {PATCH v2 0/2} Run tcg tests with tci on Travis
Message-Id: <20191128153525.2646-1-thuth(a)redhat.com>
Added: <2019-11-28 Thu>
* {PATCH 0/2} flush CPU TB cache in breakpoint_invalidate
Message-Id: <20191127220602.10827-1-jcmvbkbc(a)gmail.com>
Added: <2019-11-28 Thu>
* {RFC PATCH 00/10} hw/avr: Introduce the Arduino board
Message-Id: <20191128015030.27543-1-f4bug(a)amsat.org>
Added: <2019-11-28 Thu>
--
Alex Bennée
Progress:
* VIRT-65 [QEMU upstream maintainership]
- code review:
+ finally got back to the reset-refactoring patchset
and gave review on v5 of that. This is very nearly ready.
+ reviewed and got into 4.2 rc3 some patches from Marc Z
fixing some missing emulation/bugs that newer Linux
guest kernels trip over
+ rc3 out of the door; we will need an rc4, though
- more time consumed by office-move
thanks
-- PMM
[Morello]
Rebase of LLD against September CUCL update complete
- Painful due to LLD changing address layout (every test expected
value shifted), and a naming convention change.
- No functional changes needed to patch.
- Submitted static linking patches for review. Will send the dynamic
ones after all static linking has been merged.
Wrote up notes of Linaro Tech-leads Morello Q&A.
Misc:
Upstream LLD reviews
== Progress ==
* Out of office on Thursday
* LLVM 9.0.1
- Uploaded ARM & AArch64 binaries for rc1
- ARM: opened 2 bug reports (asan and cfi tests failing)
* Triaging check-lldb failures on AArch64 [LLVM-512]
- Opened a few more bug reports
- Got one nasty failure that I want to look into a bit more before
committing a patch XFAIL-ing everything so far
* Morello
- Got a VM working, built the toolchain, currently trying to build android
- Setting up all sorts of gerrit accounts and other minutiae
== Plan ==
* More of the same
Hi!
I've attempted to study the implementation of memcpy for 32-bit Arm cores in
Glibc (which is also found in arm-optimized-routines and first appeared in
Linaro's cortex-strings project), and I came across a peculiar snippet:
    #ifdef USE_VFP
        /* Magic dust alert! Force VFP on Cortex-A9. Experiments show
           that the FP pipeline is much better at streaming loads and
           stores. This is outside the critical loop. */
        vmov.f32 s0, s0
    #endif
This seems to imply that this NOP-like instruction affects CPU state and makes
the vldr/vstr instructions that follow use different datapaths than they might
otherwise? Can anyone shed more light on this, please?
I was able to trace the history of this code back to revision 100 in the
cortex-strings repository, where it appeared as part of a large rewrite by
Will Newton:
https://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/rev…
The entire memcpy.S file in Arm optimized-routines repo can be found here:
https://github.com/ARM-software/optimized-routines/blob/master/string/arm/m…
Thanks!
Alexander
Hi Arnd,
I took a look at the stack usage issue in the kernel snippet you provided [1],
and as you noted, most of the impact indeed comes from the -ftree-ch optimization.
It is enabled at all optimization levels except -Os (since, besides possibly
increasing stack usage, it might also increase code size).
I am still working to fully grasp what the -ftree-ch optimization does, but my
understanding so far is that it tries to reorganize the loop for later loop
optimization phases. More specifically, what it ends up doing on this specific
snippet is creating extra stack variables for the internal member accesses in
the inner loop (which in turn increases stack usage).
This is also why adding the compiler barrier inhibits the optimization: it
prevents -ftree-ch from reorganizing the internal loop, which is then
passed as is to the later optimization phases.
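As an illustration only (not taken from the kernel snippet or from GCC itself):
loop header copying, which is what -ftree-ch performs, can be pictured at the C
level as turning a top-tested loop into a guarded bottom-tested one. The
function names below are hypothetical, and the real pass operates on GIMPLE
rather than on source code, so this is just a sketch of the shape of the
rewrite.

```c
#include <stddef.h>

/* Original shape: the exit test sits at the top of the loop. */
long sum_while(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;

    while (i < n) {
        total += v[i];
        i++;
    }
    return total;
}

/* Conceptual result of header copying: the test is duplicated in front
   of the loop and the loop itself becomes a do-while, so the body is
   known to execute at least once when the loop is entered. */
long sum_header_copied(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;

    if (i < n) {
        do {
            total += v[i];
            i++;
        } while (i < n);
    }
    return total;
}
```

Both functions compute the same result; the second form simply gives later
passes a loop whose body is guaranteed to run, which is the property they
rely on.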
It is also a generic pass that affects all architectures, although the resulting
stack usage depends on later passes. With GCC 9.2.1, building with -O2 and
-fstack-usage, I see the following stack usage:
arch          stack usage (bytes)
arm            632
aarch64        448
powerpc        912
powerpc64le    560
s390           600
s390x          632
i386          1376
x86_64         784
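For anyone wanting to reproduce this kind of measurement on their own code, a
minimal sketch follows. The function and the file name demo.c are hypothetical;
only the -fstack-usage flag and the .su report come from the discussion above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical function with a sizeable automatic array, so that its
   frame is easy to spot in the stack usage report.

   Build with, for example:
       gcc -O2 -c -fstack-usage demo.c
   GCC then writes demo.su with one line per function: the source
   location and function name, the frame size in bytes, and a qualifier
   such as "static". */
size_t count_nonzero(const unsigned char *src, size_t len)
{
    unsigned char scratch[256];
    size_t n = len < sizeof scratch ? len : sizeof scratch;
    size_t count = 0;

    /* Copy at most sizeof scratch bytes, then count the non-zero ones. */
    memcpy(scratch, src, n);
    for (size_t i = 0; i < n; i++)
        count += scratch[i] != 0;
    return count;
}
```

The reported size depends on the target, the compiler version and the
optimization level, so no expected value is given here.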
Also, -fconserve-stack does not really help with this pass, since -ftree-ch does
not check that flag. Currently -fconserve-stack only seems to affect
the inliner, by setting both large-stack-frame and large-stack-frame-growth to
some conservative values.
The straightforward change I am checking is simply to disable the tree-ch
optimization if -fconserve-stack is also enabled:
diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index b894a7e0918..b14dd66257c 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -291,7 +291,8 @@ public:
   {}
 
   /* opt_pass methods: */
-  virtual bool gate (function *) { return flag_tree_ch != 0; }
+  virtual bool gate (function *) { return flag_tree_ch != 0
+                                   && flag_conserve_stack == 0; }
 
   /* Initialize and finalize loop structures, copying headers inbetween. */
   virtual unsigned int execute (function *);
On powerpc64le with gcc master:
$ /home/azanella/gcc/gcc-git-build/gcc/xgcc -B /home/azanella/gcc/gcc-git-build/gcc -O2 ../stack_usage.c -c -fstack-usage && cat stack_usage.su
../stack_usage.c:157:6:mlx5e_grp_sw_update_stats 496 static
$ /home/azanella/gcc/gcc-git-build/gcc/xgcc -B /home/azanella/gcc/gcc-git-build/gcc -O2 ../stack_usage.c -c -fstack-usage -fconserve-stack && cat stack_usage.su
../stack_usage.c:157:6:mlx5e_grp_sw_update_stats 176 static
The reference for minimal stack usage is with -Os:
$ /home/azanella/gcc/gcc-git-build/gcc/xgcc -B /home/azanella/gcc/gcc-git-build/gcc -Os ../stack_usage.c -c -fstack-usage && cat stack_usage.su
../stack_usage.c:157:6:mlx5e_grp_sw_update_stats 32 static
I will also check whether applying the same test to -fgcse and -ftree-ter
makes sense.
[1] https://godbolt.org/z/WKa-Bd
# Progress #
o Upstream GDB
* Make remote packet length in debugging output adjustable (as
opposed to fixed at 512 bytes).
* Investigated ARM sim build issues with the GCC default moving to
-fno-common.
o GDB:
* GNU-644 - [GDB, AArch64] gdb.base/step-over-syscalls.exp failures
- No progress yet. Waiting for Kernel feedback.
* [RESOLVED] GNU-645 - gdbserver is not using SVE register
descriptions properly
- Pushed a fix upstream.
* GNU-170 - GDB BZ #21221 - gdb hangs while stepping an empty loop
- On hold for now.
o Friday off
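
For context on the -fno-common item above, a minimal sketch of the usual
pattern and its fix. The variable and file names are hypothetical and are not
taken from the ARM sim sources. With -fcommon, an uninitialized global such as
"int sim_trace_level;" placed in a header and included by several .c files
links fine, because the tentative definitions are merged. With -fno-common each
copy is a real definition and the link fails with a multiple-definition error.
The fix is one extern declaration in the header and exactly one definition.

```c
/* sim-state.h (hypothetical header): declaration only, no storage. */
extern int sim_trace_level;

/* sim-state.c (hypothetical): the one and only definition. */
int sim_trace_level = 0;

/* Any other file includes the header and just uses the variable. */
int raise_trace_level(void)
{
    return ++sim_trace_level;
}
```

The three pieces are shown in one block for brevity; in a real tree they would
live in separate files.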
# Plan #
o Upstream GDB
* Fix -fno-common build issues with ARM sim.
o GDB
* GNU-644 - [GDB, AArch64] gdb.base/step-over-syscalls.exp failures
- Continue working on a fix.