== Progress ==
* Android LLVM
- Discussions on progress, trying to line up kernel+AOSP together
- Google has bailed on Clang/LLVM for the L release, will consider it for the next one
* Vectorizer
- Progressing on the implementation of the pragma parser
- http://llvm.org/PR18086
- Discussions about introduction of generic function vectorizer (ARM)
* Release 3.4
- Tested RC3, no regressions on tests or benchmarks
- http://people.linaro.org/~rengolin/llvm/
- http://llvm.org/pre-releases/3.4/rc3/
- Looked at a bug in the vectorizer for pentium3/freebsd
- Workaround found, but not simple enough to get it into RC4
* Background
- Many discussions, many support requests, many patch reviews
- Adding BOF notes to dev meeting site
- Booking train and hotel for FOSDEM 14
* Time
- CARD-862 8/10
- Others 2/10
* Happy Holidays! And see you in January!
== Issues ==
* Running benchmarks on my Chromebook is very unstable.
- Even though the standard deviation is small in two different moments,
the two results are statistically incompatible.
- The wireless network on the Chromebook, as widely known,
is unstable and unpredictable.
- I need a graphical interface, so I can do stuff during Connects,
or to see Phoronix results, and that is probably responsible
for all the instability
- Next release, I'll use an ODroid (or Arndale) for benchmarks
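The "statistically incompatible" observation above can be checked mechanically. A minimal sketch, assuming each benchmark run is summarised by a mean, a standard deviation and a sample count; the function name and the threshold parameter `k` are hypothetical and not taken from the report:

```c
/*
 * Returns 1 when two benchmark means differ by more than k combined
 * standard errors, 0 otherwise. Squared quantities are compared so
 * that no square root is needed.
 */
int means_differ(double mean_a, double sd_a, unsigned n_a,
                 double mean_b, double sd_b, unsigned n_b, double k)
{
    double diff = mean_a - mean_b;
    double se2 = (sd_a * sd_a) / n_a + (sd_b * sd_b) / n_b;

    return diff * diff > k * k * se2;
}
```

Two runs with small standard deviations can still fail this check when something outside the benchmark (network, GUI activity) shifts the mean between the two moments.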
== Plan ==
* Holidays!
== Progress ==
- 2013.12 releases (4/10)
* Handover to Michael
* Committed remaining backports/branch merges
* Unexpected regression in 4.7 branch narrowed to a linker bug, now fixed.
- cross validations (2/10)
* stabilized armeb+qemu validations
- misc (4/10): misc conf-calls and meetings; internal meetings
== Next ==
Next 2 weeks off (Dec 23rd-Jan 3rd)
Merry Christmas and happy new year to all of you.
Hello,
I am using the pre-built toolchain gcc-arm-none-eabi-4_6-2012q2 from Linaro
to compile u-boot (u-boot-linaro-stable) and my standalone applications
to run on the target (PandaBoard ES rev b2).
The hello_world standalone application that comes with u-boot executes
fine on the target when I disable CONFIG_SYS_THUMB_BUILD, but when I enable it,
the target resets with the following output:
Panda # go 82000000 hello
## Starting application at 0x82000000 ...
undefined instruction
pc : [<8200000c>] lr : [<bff83147>]
sp : bfeffe40 ip : bfeffc10 fp : 00000000
r10: 00000003 r9 : bffac954 r8 : bfefff68
r7 : bff01d88 r6 : 82000000 r5 : bff01d8c r4 : 00000003
r3 : 82000000 r2 : bff01d8c r1 : bff01d8c r0 : 00000002
Flags: nzCv IRQs off FIQs off Mode SVC_32
Resetting CPU ...
resetting ...
U-Boot SPL 2013.01.-rc1-g0f45941 (Dec 17 2013 - 14:23:41)
OMAP4460 ES1.1
OMAP SD/MMC: 0
reading u-boot.img
reading u-boot.bin
reading u-boot.bin
......
Can anyone please help me understand why the Thumb-mode build is failing?
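One thing that may be worth checking is sketched below. It assumes the `go` command reaches the application through an interworking branch (BX/BLX), where bit 0 of the target address selects the instruction set. The helper names are hypothetical and this is not code from u-boot. With a Thumb build, hello_world would contain Thumb code, but 0x82000000 has bit 0 clear, so the branch would enter it in ARM state. The odd lr in the dump (bff83147) is consistent with the caller running in Thumb state.

```c
#include <stdint.h>

/*
 * Address suitable for an interworking branch (BX/BLX) into Thumb code:
 * bit 0 set selects Thumb state, bit 0 clear selects ARM state.
 */
uintptr_t thumb_entry(uintptr_t load_addr)
{
    return load_addr | (uintptr_t)1;
}

/* Nonzero if an interworking branch to addr would enter Thumb state. */
int enters_thumb_state(uintptr_t addr)
{
    return (int)(addr & 1u);
}
```

Under that assumption, starting the application with `go 82000001 hello` instead of `go 82000000 hello` would be the experiment to try.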
On 18/12/13 05:06, Jonathan S. Shapiro wrote:
> At the risk of sticking my nose in, this isn't a startup code issue.
> It's a contract issue.
>
> First, I don't buy Richard's argument about memcpy() startup costs and
> hard-to-predict branches. We do those tests on essentially every
> *other* RISC platform without complaint, and it's very easy to order
> those branches so that the currently efficient cases run well. Perhaps
> more to the point, I haven't seen anybody put forward quantitative
> data that using the MMU for unaligned references is any better than
> executing those branches. Speaking as a recovering processor
> architect, that assumption needs to be validated quantitatively. My
> guess is that the branches are faster if properly arranged.
>
> Second, this is a contract issue. If newlib intends to support
> embedded platforms, then it needs to implement algorithms that are
> functionally correct without relying on an MMU. By all means use
> simpler or smarter algorithms when an MMU can be assumed to be
> available in a given configuration, but provide an algorithm that is
> functionally correct when no MMU is available. "Good overall
> performance in memcpy" is a fine thing, but it is subject to the
> requirement of meeting functional specifications. As Jochen Liedtke
> famously put it (read this in a heavy German accent): "Fast, ya. But
> correct? (shrug) Eh!"
>
> So: we need a normative statement saying what the contract is. The
> rest of the answer will fall out from that.
>
> I do agree with Richard that startup code is special. I've built
> deeply embedded runtimes of one form or another for 25 years now, and
> I have yet to see a system where optimizing a simplistic byte-wise
> memcpy during bootstrap would have made any difference in anything
> overall. That said, if the specification of memcpy requires it to
> handle incompatibly aligned pointers (and it does), and the contract
> for newlib requires it to operate in MMU-less scenarios in a given
> configuration (which, at least in some cases, it does), it's
> completely legitimate to expect that bootstrap code can call memcpy()
> and expect behavior that meets specifications.
>
> So what's the contract?
>
I disagree with your assertion that newlib *requires* it to operate in
an MMU-less scenario for all targets; it only does so when the target
can reasonably be expected to not have an MMU.
The only contract that exists is the one written in the C standard:
7.23.2.1#2 The memcpy function copies n characters from the object
pointed to by s2 into the object pointed to by s1. If copying takes
place between objects that overlap, the behavior is undefined.
But that is written on the assumption that we're in a normal execution
environment, not in some special case.
What you're missing is that AArch64 is (in ARM ARM terms) an A-profile
only environment where an MMU is mandated in the system. Furthermore,
processors implementing the architecture will *expect* that the MMU be
turned on as soon as possible after boot, since without this the caches
cannot be used and without those the performance will be truly horrible.
Once the caches are enabled, it's perfectly reasonable to assume that
memcpy will only be used for copies to and from NORMAL memory, since
other types of memory have potential side effects, which means that use
of memcpy would be unsafe.
If you want to write an MMU-less memcpy, then feel free to write one;
but please install it with a different interface -- something like
__memcpy_nommu(). Don't penalise the standard case for the non-standard
exceptional one.
R.
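A minimal sketch of what such a separate entry point could look like, assuming a plain byte-wise copy is acceptable for the MMU-off case. The name `memcpy_nommu` is hypothetical (the double-underscore spelling above is reserved for the implementation). The `volatile` qualifiers are an assumption made here to discourage the compiler from merging the byte accesses into wider ones or from turning the loop back into a call to memcpy():

```c
#include <stddef.h>

/*
 * Hypothetical byte-wise copy for use before the MMU is enabled.
 * Every access is a single byte, so no access can be unaligned.
 * Like memcpy, the behaviour is undefined for overlapping objects.
 */
void *memcpy_nommu(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const volatile unsigned char *s = src;

    while (n--)
        *d++ = *s++;

    return dst;
}
```

Keeping this behind its own interface leaves the standard memcpy free to use wide, possibly unaligned accesses on NORMAL memory.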
Hi all,
I have a bit of a strange one. I'm not after a full solution, just any
hints that quickly come to mind :)
After a few simple patches I have a build of mongodb for aarch64 (built
with gcc-4.8). However, all of the test binaries that the build spits
out immediately segfault. gdb-ing shows that they segfault inside this
macro:
TSP_DECLARE(OwnedOstreamVector, threadOstreamCache);
This expands to:
# define TSP_DECLARE(T,p) \
extern __thread T* _ ## p; \
template<> inline T* TSP<T>::get() const { return _ ## p; } \
extern TSP<T> p;
And indeed, it's mongo::TSP<mongo::OwnedPointerVector<...> >::get()
const that we're segfaulting in. This is the disassembly of this
function (at -O0) with the faulting instruction marked:
0x00000000004b4b6c <+0>: stp x29, x30, [sp,#-32]!
0x00000000004b4b70 <+4>: mov x29, sp
0x00000000004b4b74 <+8>: str x0, [x29,#16]
0x00000000004b4b78 <+12>: adrp x0, 0x64c000
0x00000000004b4b7c <+16>: ldr x0, [x0,#776]
0x00000000004b4b80 <+20>: nop
0x00000000004b4b84 <+24>: nop
0x00000000004b4b88 <+28>: mrs x1, tpidr_el0
0x00000000004b4b8c <+32>: add x0, x1, x0
=> 0x00000000004b4b90 <+36>: ldr x0, [x0]
0x00000000004b4b94 <+40>: ldp x29, x30, [sp],#32
0x00000000004b4b98 <+44>: ret
And the registers:
(gdb) info registers
x0 0x7fb863fd70 548554407280
x1 0x7fb7ff76f0 548547819248
x2 0x0 0
x3 0x7fb7fc11b8 548547596728
x4 0x1 1
x5 0x0 0
x6 0x50 80
x7 0x0 0
x8 0x0 0
x9 0x6165727473676f4c 7018141438804717388
x10 0x0 0
x11 0x0 0
x12 0x2 2
x13 0x10 16
x14 0x0 0
x15 0x7fb7e5e590 548546143632
x16 0x64b3d8 6599640
x17 0x7fb7f667d0 548547225552
x18 0x7fffffdab0 549755804336
x19 0x7fffffed50 549755809104
x20 0xb 11
x21 0xb 11
x22 0x6500b0 6619312
x23 0x650070 6619248
x24 0x7fffffff 2147483647
x25 0x64db40 6609728
x26 0x7fffffeda0 549755809184
x27 0x653d00 6634752
x28 0x7fffffe750 549755807568
x29 0x7fffffe4d0 549755806928
x30 0x4b4ed4 4935380
sp 0x7fffffe4d0 0x7fffffe4d0
pc 0x4b4b90 0x4b4b90 <mongo::TSP<mongo::OwnedPointerVector<std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> > > >::get() const+36>
cpsr 0x20000000 536870912
fpsr 0x0 0
fpcr 0x0 0
If I recompile this object file without -fPIC, it works.
I guess I see three things that could be wrong:
1) The operand to "adrp x0, 0x64c000"[1]
2) The operand to "ldr x0, [x0,#776]"
3) The value of tpidr_el0
Oh, and I guess:
4) The setup of tls has gone wrong and the address in x0 _ought_ to be
accessible but isn't for some reason.
Any hints on which of these seems most likely to be the culprit?
Cheers,
mwh
[1] FWIW, objdump reports 0x64c000 as "_GLOBAL_OFFSET_TABLE_+0x2d0" (not
sure why that doesn't show up in gdb's disassembly).
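The arithmetic behind candidates 1) to 3) can be worked out from the dump itself. A small sketch using only the values quoted above; the macro and function names are made up for illustration:

```c
#include <stdint.h>

/* Values copied from the register dump and disassembly above. */
#define TPIDR_EL0     UINT64_C(0x7fb7ff76f0) /* x1: thread pointer               */
#define FAULT_ADDR    UINT64_C(0x7fb863fd70) /* x0 at the faulting ldr           */
#define ADRP_PAGE     UINT64_C(0x64c000)     /* operand of adrp                  */
#define LDR_IMM       UINT64_C(776)          /* immediate of ldr x0, [x0,#776]   */
#define GOT_PAGE_OFF  UINT64_C(0x2d0)        /* _GLOBAL_OFFSET_TABLE_+0x2d0      */

/* Offset that was loaded from the GOT and added to the thread pointer. */
uint64_t tls_offset_loaded(void)
{
    return FAULT_ADDR - TPIDR_EL0;
}

/* Address of the GOT slot read by the ldr at <+16>. */
uint64_t got_slot_address(void)
{
    return ADRP_PAGE + LDR_IMM;
}

/* The same slot, expressed as a byte offset from _GLOBAL_OFFSET_TABLE_. */
uint64_t got_slot_offset(void)
{
    return GOT_PAGE_OFF + LDR_IMM;
}
```

The loaded offset works out to 0x648680, read from the slot at 0x64c308 (_GLOBAL_OFFSET_TABLE_+0x5d8). That value lies in the same range as the binary's own data addresses (compare x16 and x25 in the dump) rather than looking like a small thread-relative offset. This may point at the contents of that GOT slot rather than at tpidr_el0, though it does not prove it.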
== Progress ==
* Bugfixing and testing QEMU AArch64 FP patches (3/10, VIRT-183)
* Debugging and submitting a patch for ARM gdb ifunc test failures (1/10)
* Two day week due to holidays
== Issues ==
* None
== Plan ==
* Back on the 9th January, have a good Christmas and New Year everybody!
--
Will Newton
Toolchain Working Group, Linaro
Hi,
We've noticed an issue trying to use the Linaro AArch64 binary bare metal
toolchain release with the MMU turned off for some low-level tests.
Anytime puts, sprintf, etc. gets called, a reent structure gets created with
references to STDIN, STDOUT, STDERR FILE types. A member in the __sFile
struct, _mbstate, is an 8 byte struct, but is not aligned on an 8 byte
boundary. This means that when memset (or a similar function) gets called on
this struct, and doesn't operate one byte at a time, a data alignment fault
will be generated when operating out of device memory, such as on a system
where the MMU has not yet been turned on.
I'm still examining possible fixes (I'll probably look at building with
-mstrict-align first), but I wanted to check if anyone had thoughts on the
subject and if Newlib upstream or Linaro consider using Newlib with the MMU
turned off to be a valid use case or if running the code that turns on the MMU
is considered a prerequisite to everything else.
Thanks,
Christopher
--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.
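The alignment problem described above can be illustrated with a small sketch. The struct layouts here are simplified stand-ins, not newlib's actual definitions, and all names are hypothetical. The point is only that an 8-byte struct whose members need 4-byte alignment can legitimately land on a 4-byte boundary:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for an 8-byte multibyte-state struct whose members only
 * require 4-byte alignment (an assumption made for illustration).
 */
struct mbstate_like {
    int32_t count;
    unsigned char bytes[4];
};

/* Hypothetical, heavily simplified FILE-like layout. */
struct file_like {
    int32_t flags;
    struct mbstate_like mbstate; /* lands at offset 4 */
};

/* Nonzero when p is aligned to align bytes (align must be a power of two). */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Offset of the 8-byte member within its container. */
size_t mbstate_offset(void)
{
    return offsetof(struct file_like, mbstate);
}
```

A memset that clears such a member with a single 8-byte store would then issue an unaligned access, which is harmless on NORMAL memory but faults when memory is treated as Device memory.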
== Progress ==
TCWG-293 (9/10)
- wrote and tested 64bit division code
- it seems to work
- still need to do performance testing
TCWG-347 Fix PR59142 (1/10)
- split into series of 3 patches
- patch almost ready, was held up by non-availability of the lab
- need to bootstrap on Thumb-1 to prove change made in response to
review comments
TCWG-346 AArch64 Benchmarking: CoreMark & Dhrystone
- no significant progress, no access to the lab
== Next ==
Pick up aarch64 benchmarking when the board becomes accessible again
Submit PR59142
== Progress ==
- 2013.12 releases (4/10):
* stalled due to lab unavailability.
* A couple of backports are waiting for approval, another one is
being debugged.
- cross-validation (4/10): fixed armeb+qemu validations.
- misc (2/10): misc conf-calls and meetings
== Next ==
- Make 2013.12 releases
- cbuild2: continue testing, try to make 4.7 source release
- libsanitizer on AArch64: resume work
== Future ==
Next 2 weeks off (Dec 23rd-Jan 3rd)
== Issues ==
* 1.5 days off due to a car issue. (3/10)
* Calxedas are down after lab maintenance.
== Progress ==
* LRA on AArch32:
o TCWG-343 : Make LRA the default for the ARM backend (5/10)
- Turn LRA on by default committed as rev205887
http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01088.html
- New Thumb regressions reported (Cortex-m0 and bootstrap),
analysis ongoing.
- Analysed last week's regressions and reported them upstream,
Vladimir fixed them at rev205974.
- iWMMXT issue : work ongoing.
o TCWG-345 : Analyse performance of LRA for ARM. (0/10)
- No progress this week.
* Reviewed some merge requests. (1/10)
* Various meetings. (1/10)
== Next ==
* Continue LRA, merge and patch reviews.