Hey,
Regarding the GCC ABI 5 issue, I was wondering what's the policy
behind updating packages on stable updates for both Debian and Ubuntu.
Our time frame is a bit constrained, and we definitely will have to
take some hard decisions in the next six months, so I'd like to
understand everything that is at stake before I have my own opinion.
LLVM has a 6 month major cycle, releasing around February / August.
Major releases are allowed to break the ABI. Major breakages need one
release warning period.
Ubuntu has a 6 month release cycle, around April / October. IIUC,
major releases are allowed to have new versions of packages, but
updates for the next few years have to keep within the same major
release.
Debian has a -1 years release cycle (heh), and has the same major /
minor policy, which makes it a lot harder to update major versions.
However, I believe unstable is still not closed, nor will be in August
this year, so updating to LLVM 3.9 will not be a problem, but it will
mean users will have to wait a bit more to get a working LLVM.
The time frame is then:
3.8.0 released March (without the fix)
Ubuntu X released April
3.9.0 releases August (hopefully with a fix)
Ubuntu X+1 released October
Debian freezes ??
LLVM 3.8.1 ??
If we don't back-port GCC ABI 5 into 3.8.1, Ubuntu users will not have
the fix ever, unless you *can* update to 3.9.0 in August.
Ubuntu X+1 will be fine using 3.9, as will Debian after August, unless
you guys freeze before that.
I believe both Debian and Ubuntu have a trunk-based LLVM package for
experimental use only, and it would be bad, but not completely broken,
to recommend users to use that meanwhile.
If Debian freezes *before* 3.9.0 is out, or if Ubuntu can't update to
3.9.0 on April's release, then we'll have a strong reason to back-port
the change to 3.8.x. If not, even though it will be uncomfortable for
users until August, the argument is not that strong and will be hard
to get it through.
Any comments? Ideas? Does any of that make sense?
cheers,
--renato
Hi,
I have been comparing the stock gcc 5.2 and the Linaro 5.2 (Linaro GCC
5.2-2015.11-1) and have noticed a difference with the __sync
intrinsics.
Here is the simple test case
--- cut here ---
int add_int(int add_value, int *dest)
{
return __sync_add_and_fetch(dest, add_value);
}
--- cut here ---
Compiling with the stock gcc 5.2 (-S -O3) I get
---------
add_int:
.L2:
ldaxr w2, [x1]
add w2, w2, w0
stlxr w3, w2, [x1]
cbnz w3, .L2
mov w0, w2
ret
---------
Wheras with Linaro gcc 5.2 I get
---------
add_int:
.L2:
ldxr w2, [x1]
add w2, w2, w0
stlxr w3, w2, [x1]
cbnz w3, .L2
dmb ish
mov w0, w2
ret
---------
Why the extra (unnecessary?) memory barrier?
Also, is it worthwhile putting a prfm before the ldaxr. EG
add_int:
prfm pst1strm, [x1]
.L2:
ldaxr w2, [x1]
See the following thread
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355996.html
All the best,
Ed
== Progress ==
o GCC dev. (7/10)
* Remote validation sanitizing:
- fixed last issues in dejagnu patch and submitted it uptsream
- 2 more cleanup/fix dejagnu patches submitted and merged upstream
- proposed a fix/workaround for the output pattern issues (>400
failures removed with this patch)
o Misc (3/10)
* Various meetings
* internal discussions
== Plan ==
o Try to follow connect remotely
o Extended validation work
== Progress ==
* GCC bugs:
- #2073 tried to reproduce it with a manually-built toolchain. No luck
* GCC validation:
- added support to choose simulated cpu (different from --with-cpu)
* GCC:
- completing Neon intrinsics tests, to prepare cleanup
* Validation:
- small improvements
* Misc (conf calls, meetings, emails, ...)
== Next ==
Remote Connect
== Progress ==
* Support (5/10)
- Working on PR17193
- Continue review on D17141
* Background (5/10)
- Code review, meetings, discussions, general support, etc.
- Connect preparations
- GCC ABI 5 discussions
- Assessing Swift calling convention impact ARM back-end
- Interviews
# Progress #
* TCWG-545, Handle "branch-to-self" instruction in single stepping.
[5/10] Patches are posted upstream for review.
* TCWG-532, one patch is committed and one patch is posted for review.
[2/10]
* Tweak ARM process record. [2/10]
Two patches are pushed in. Many test fails are fixed.
* FSF patches review. [1/10].
# Plan #
* Linaro Connect.
--
Yao
Hi,
I have just switched to gcc 5.2 from 4.9.2 and the code quality does seem to have improved significantly. For example, it now seems much better at using ldp/stp and it seems to has stopped gratuitous use of the SIMD registers.
However, I still have a few whinges:-)
See attached copy.c / copy.s (This is a performance critical function from OpenJDK)
pd_disjoint_words:
cmp x2, 8 <<< (1)
sub sp, sp, #64 <<< (2)
bhi .L2
cmp w2, 8 <<< (1)
bls .L15
.L2:
add sp, sp, 64 <<< (2)
(1) If count as a 64 bit unsigned is <= 8 then it is probably still <= 8 as a 32 bit unsigned.
(2) Nowhere in the function does it store anything on the stack, so why
drop and restore the stack every time. Also, minor quibble in the
disass, why does sub use #64 whereas add uses just '64' (appreciate this
is probably binutils, not gcc).
.L15:
adrp x3, .L4
add x3, x3, :lo12:.L4
ldrb w2, [x3,w2,uxtw] <<< (3)
adr x3, .Lrtx4
add x2, x3, w2, sxtb #2
br x2
(3) Why use a byte table, this is not some sort of embedded system. Use
a word table and this becomes.
.L15:
adrp x3, .L4
add x3, x3, :lo12:.L4
ldr x2, [x3, x2, lsl #3]
br x2
An aligned word load takes exactly the same time as a byte load and we
save the faffing about calculating the address.
.L10:
ldp x6, x7, [x0]
ldp x4, x5, [x0, 16]
ldp x2, x3, [x0, 32] <<< (4)
stp x2, x3, [x1, 32] <<< (4)
stp x6, x7, [x1]
stp x4, x5, [x1, 16]
(4) Seems to be something wrong with the load scheduler here? Why not
move the stp x2, x3 to the end. It does this repeatedly.
Unfortunately as this function is performance critical it means I will
probably end up doing it in inline assembler which is time consuming,
error prone and non portable.
* Whinge mode off
Ed