On 6 November 2012 02:48, Rob Herring <robherring2(a)gmail.com> wrote:
>
> On 11/05/2012 05:13 AM, Russell King - ARM Linux wrote:
> > On Mon, Nov 05, 2012 at 10:48:50AM +0000, Dave Martin wrote:
> >> On Thu, Oct 25, 2012 at 05:08:16PM +0200, Johannes Stezenbach wrote:
> >>> On Thu, Oct 25, 2012 at 09:25:06AM -0500, Rob Herring wrote:
> >>>> On 10/25/2012 09:16 AM, Johannes Stezenbach wrote:
> >>>>> On Thu, Oct 25, 2012 at 07:41:45AM -0500, Rob Herring wrote:
> >>>>>> On 10/25/2012 04:34 AM, Johannes Stezenbach wrote:
> >>>>>>> On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
> >>>>>>>
> >>>>>>>> While v6 can support unaligned accesses, it is optional and current
> >>>>>>>> compilers won't emit unaligned accesses. So we don't clear the A bit for
> >>>>>>>> v6.
> >>>>>>>
> >>>>>>> not true according to the gcc changes page
> >>>>>>
> >>>>>> What are you going to believe: documentation or what the compiler
> >>>>>> emitted? At least for ubuntu/linaro 4.6.3 which has the unaligned access
> >>>>>> support backported and 4.7.2, unaligned accesses are emitted for v7
> >>>>>> only. I guess default here means it is the default unless you change the
> >>>>>> default in your build of gcc.
> >>>>>
> >>>>> Since ARMv6 can handle unaligned access in the same way as ARMv7
> >>>>> it seems a clear bug in gcc which might hopefully get fixed.
> >>>>> Thus in this case I think it is reasonable to follow the
> >>>>> gcc documentation, otherwise the code would break for ARMv6
> >>>>> when gcc gets fixed.
> >>>>
> >>>> But the compiler can't assume the state of the U bit. I think it is
> >>>> still legal on v6 to not support unaligned accesses, but on v7 it is
> >>>> required. All the standard v6 ARM cores support it, but I'm not sure
> >>>> about custom cores or if there are SOCs with buses that don't support
> >>>> unaligned accesses properly.
> >>>
> >>> Well, I read the "...since Linux version 2.6.28" comment
> >>> on the gcc changes page as meaning that they assume the
> >>> U-bit is set. (Although I'm not sure it really is???)
> >>
> >> Actually, the kernel checks the arch version and the U bit on boot,
> >> and chooses the appropriate setting for the A bit depending on the
> >> result. (See arch/arm/mm/alignment.c:alignment_init().)
> >
> > That is in the kernel itself, _after_ the decompressor has run. It is
> > not relevant to any discussion about the decompressor.
> >
> >> Currently, we depend on the CPU reset behaviour or firmware/
> >> bootloader to set the U bit for v6, but the behaviour should be
> >> correct either way, though unaligned accesses will obviously
> >> perform (much) better with U=1.
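[For reference, a minimal sketch (not the actual arch/arm/mm/alignment.c
code) of the kind of check being described: read SCTLR, look at the U bit,
and pick the A bit setting accordingly. SCTLR.A is bit 1 and SCTLR.U is
bit 22 on ARMv6/v7.]

#define SCTLR_A (1u << 1)   /* alignment fault checking */
#define SCTLR_U (1u << 22)  /* unaligned access support (ARMv6+) */

static inline unsigned int read_sctlr(void)
{
    unsigned int v;
    asm volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(v));
    return v;
}

static inline void write_sctlr(unsigned int v)
{
    asm volatile("mcr p15, 0, %0, c1, c0, 0" : : "r"(v));
}

static void choose_alignment_handling(void)
{
    unsigned int sctlr = read_sctlr();

    if (sctlr & SCTLR_U)
        sctlr &= ~SCTLR_A;  /* U=1: let unaligned accesses through */
    else
        sctlr |= SCTLR_A;   /* U=0: fault so software can fix up */

    write_sctlr(sctlr);     /* (real code would follow with an ISB) */
}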
> >
> > Will someone _PLEASE_ address my initial comments against this patch
> > in light of the fact that it's now been proven _NOT_ to be just a V7
> > issue, rather than everyone seemingly burying their heads in the sand
> > over this.
>
> I tried adding -munaligned-access on a v6 build and still get byte
> accesses rather than unaligned word accesses. So this does seem to be a
> v7 only issue based on what gcc will currently produce. Copying Michael
> Hope who can hopefully provide some insight on why v6 unaligned accesses
> are not enabled.
This looks like a bug. Unaligned access is enabled for armv6 but
seems to only take effect for cores with Thumb-2. Here's a test case
with both an unaligned field access and an unaligned block copy:
struct foo
{
    char a;
    int b;
    struct
    {
        int x[3];
    } c;
} __attribute__((packed));

/* unaligned field access */
int bar(struct foo *p)
{
    return p->b;
}

/* unaligned block copy */
void baz(struct foo *p, struct foo *q)
{
    p->c = q->c;
}
With -march=armv7-a you get the correct unaligned accesses:
bar:
ldr r0, [r0, #1] @ unaligned @ 11 unaligned_loadsi/2 [length = 4]
bx lr @ 21 *arm_return [length = 12]
baz:
str r4, [sp, #-4]! @ 25 *push_multi [length = 4]
mov r2, r0 @ 2 *arm_movsi_vfp/1 [length = 4]
ldr r4, [r1, #5]! @ unaligned @ 9 unaligned_loadsi/2 [length = 4]
ldr ip, [r1, #4] @ unaligned @ 10 unaligned_loadsi/2 [length = 4]
ldr r1, [r1, #8] @ unaligned @ 11 unaligned_loadsi/2 [length = 4]
str r4, [r2, #5] @ unaligned @ 12 unaligned_storesi/2 [length = 4]
str ip, [r2, #9] @ unaligned @ 13 unaligned_storesi/2 [length = 4]
str r1, [r2, #13] @ unaligned @ 14 unaligned_storesi/2 [length = 4]
ldmfd sp!, {r4}
bx lr
With -march=armv6 you get a byte-by-byte field access and a correct
unaligned block copy:
bar:
ldrb r1, [r0, #2] @ zero_extendqisi2
ldrb r3, [r0, #1] @ zero_extendqisi2
ldrb r2, [r0, #3] @ zero_extendqisi2
ldrb r0, [r0, #4] @ zero_extendqisi2
orr r3, r3, r1, asl #8
orr r3, r3, r2, asl #16
orr r0, r3, r0, asl #24
bx lr
baz:
str r4, [sp, #-4]!
mov r2, r0
ldr r4, [r1, #5]! @ unaligned
ldr ip, [r1, #4] @ unaligned
ldr r1, [r1, #8] @ unaligned
str r4, [r2, #5] @ unaligned
str ip, [r2, #9] @ unaligned
str r1, [r2, #13] @ unaligned
ldmfd sp!, {r4}
bx lr
readelf -A shows that the compiler planned to use unaligned accesses in
both cases. My suspicion is that GCC is using the extv pattern to extract
the field from memory, and that pattern is only enabled for Thumb-2
capable cores.
I've logged PR55218. We'll discuss it at our next meeting.
-- Michael
[Short week: 3 days]
* looked at (but failed to reproduce) a hang in QEMU reported
by Christoffer when shutting down a KVM ARM guest using TUN/TAP
networking
* investigated LP:1084148 (segfault in qemu usermode) sufficiently
to diagnose it as probably another of qemu's "can't handle
multithreaded guest programs" bugs
* fixed some problems with QEMU's secondary CPU boot code which
were masked by errors in QEMU's GIC model but revealed by
real hardware (i.e. KVM); fixed the GIC model bugs as well
* investigated LP:955379 (cmake hangs under qemu-arm-static).
Tracked down to a race condition involving signal delivery,
the fix to which would require the significant redesign I
sketched out here a year or so ago:
http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg00384.html
KVM blueprint progress tracker:
http://ex.seabright.co.nz/helpers/backlog?group_by=topic&colour_by=state&pr…
-- PMM
== Blueprints ==
                             Initial      Current      Actual
initial-aarch64-backport     31 Oct 2012  7 Dec 2012*
aarch64-baremetal-testing    31 Oct 2012  7 Dec 2012*
fix-gcc-multiarch-testing    31 Dec 2012  31 Dec 2012
backport-fma-intrinsic       31 Dec 2012  31 Dec 2012
fused-multiply-add-support   31 Dec 2012  31 Dec 2012
gcc-investigate-lra-for-arm  31 Dec 2012  31 Dec 2012
== Progress ==
* Admin
  * Interviewing
  * Preparation for taking over from Michael
* Investigate patches for literal pool layout bug
  * Applied
* PINGed triplet backport patches upstream
* Other bug issues
  * Including an issue running SPEC2K on x86 with recent trunk
  * And a 4.6 gcc-linaro only issue
== Next Week ==
* Start leading Toolchain team
* Run Hot/Cold partitioning benchmarks
  * Analyse ARM results
  * Run on x86_64 to see what actual benefit we could get
* initial-aarch64-backport & aarch64-baremetal-testing
  * Finish documentation
* gcc-investigate-lra-for-arm
  * Analyse benchmarks
* fix-gcc-multiarch-testing
  * Come up with a strawman proposal for updating the testsuite to handle
    testing with varying command-line options.
== Future ==
* backport-fma-intrinsic & fused-multiply-add-support
  * Backport patches once fix-gcc-multiarch-testing has been done.
== Planned Leave ==
* Monday 24 December - Monday 31 December
--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-dann(a)linaro.org
Hi,
I think I have identified some issues with the atomic builtins, but I would
like your advice.
For instance:
A: __atomic_store_n (addr, val, __ATOMIC_SEQ_CST);
gives the armv7 code:
DMB sy
STR r1, [r0]
DMB sy
but if I have understood correctly, the DMB instructions only ensure that
the code is sequentially consistent; they do not provide the atomicity for
which we have to use the LDREX/STREX instructions. Thus I think that the
code should be:
DMB sy
1: LDREX r2, [r0]
   STREX r3, r1, [r0]
   TEQ r3, #0
   BNE 1b
B: __atomic_load_n (addr, __ATOMIC_ACQUIRE);
gives the armv7 code:
DMB sy
LDR r0, [r0]
but the load-acquire semantics specify that all loads and stores appearing
in program order after the load-acquire will be observed after it, so the
DMB should come after the LDR, no?
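Here is a minimal test case for both cases (function names are just
illustrative); compiling it with something like -O2 -march=armv7-a -S
shows the sequences quoted above:

/* Case A: currently emits DMB sy; STR; DMB sy on ARMv7. */
void store_seq_cst(int *addr, int val)
{
    __atomic_store_n(addr, val, __ATOMIC_SEQ_CST);
}

/* Case B: currently emits DMB sy; LDR on ARMv7. */
int load_acquire_int(int *addr)
{
    return __atomic_load_n(addr, __ATOMIC_ACQUIRE);
}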
--
Yvan
Hi,
I'm working on AArch64 support for libatomic-ops (part of the Boehm GC).
I mainly use GCC's __atomic builtins to do this, but in our 4.7 version
they don't use the load-acquire / store-release instructions now available
in the ARMv8 ISA. These instructions are used in mainline GCC
(in atomic.md), but not in their exclusive form; I understand that this may
be due to the performance penalty, but I would like your opinion on that
point, as I don't find the ARMv8 ISA really clear.
If we want to implement an atomic load acquire, is
LDAR x1, [x0]
sufficient, or do we have to write it like this:
L: LDAXR x1, [x0]
   STXR w2, x1, [x0]
   CBNZ w2, L
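For comparison, a minimal sketch of the wrapper I have in mind (the name
is illustrative, not the actual libatomic-ops entry point), which I would
expect to compile down to a single LDAR on a compiler that uses the ARMv8
acquire/release instructions:

#include <stdint.h>

/* Illustrative wrapper; expected to become a single "ldar" on an
   ARMv8 compiler that emits the acquire/release instructions. */
static inline uintptr_t my_load_acquire(const volatile uintptr_t *addr)
{
    return __atomic_load_n(addr, __ATOMIC_ACQUIRE);
}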
Thanks
Yvan
All,
[Editorial: Michael & I discussed making what we do as a working
group more visible at Connect. One thing we discussed was making our
meeting minutes more visible by emailing actions out after each
meeting. This will be part of the job of the 'minuter' - a job I plan
to spread around as I am useless at it whilst also running a call -
more info on the Wiki:
https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings]
The minutes of the performance call held on 27 November 2012 can be found at:
https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2012-11-27
In summary, the actions from the meeting are:
* mgrettondann to split the LRA blueprint
* Christophe to update the Hot/Cold partitioning bugzilla
* mgrettondann to benchmark Hot/Cold partitioning
* Michael to log a ticket to improve reporting of benchmarks when the
runs complete.
* Ramana to log EEMBC failure with Hot/Cold partitioning into bugzilla.
* Christophe to backport bswap16 builtin, except for the testcase
which fails in one of our configurations (Thumb1 + hard FP ABI)
The next performance call will be on 11 December 2012 and the agenda
can be found at:
https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2012-12-11
Thanks,
Matt
--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-dann(a)linaro.org
The Linaro Toolchain Working Group is pleased to announce the 2012.11
release of the Linaro Toolchain Binaries, a pre-built version of
Linaro GCC and Linaro GDB that runs on generic Linux or Windows and
targets the glibc Linaro Evaluation Build.
Uses include:
* Cross compiling ARM applications from your laptop
* Remote debugging
* Building the Linux kernel for your board
What's included:
* Linaro GCC 4.7 2012.11
* Linaro GDB 7.5 2012.09
* A statically linked gdbserver
* A system root
* Manuals under share/doc/
The system root contains the basic header files and libraries to link
your programs against.
The Linux version is supported on Ubuntu 10.04.3 and 12.04, Debian
6.0.2, Fedora 16, openSUSE 12.1, Red Hat Enterprise Linux Workstation
5.7 and later, and should run on any Linux Standard Base 3.0
compatible distribution. Please see the README about running on
x86_64 hosts.
The Windows version is supported on Windows XP Pro SP3, Windows Vista
Business SP2, and Windows 7 Pro SP1.
The binaries and build scripts are available from:
https://launchpad.net/linaro-toolchain-binaries/trunk/2012.11
Need help? Ask a question on https://ask.linaro.org/
Already on Launchpad? Submit a bug at
https://bugs.launchpad.net/linaro-toolchain-binaries
On IRC? See us on #linaro on Freenode.
Other ways that you can contact us or get involved are listed at
https://wiki.linaro.org/GettingInvolved.
Summary:
* Investigate shrink-wrap result.
* Prepare for Linaro toolchain binary release, script merge and aarch64 test.
Details:
1. Investigate the shrink-wrap result for the function Ray_In_Bound (a toy
example of the shrink-wrapping pattern is sketched after this list). By
default, the ARM/MIPS/PPC/X86 toolchains cannot shrink-wrap the function.
For ARM, there is a copy "r6 = r1" which blocks the optimization. By
hand-editing the assembly code, I got a ~3% performance improvement on the
453.povray benchmark.
2. Set up an AArch64 simulation environment by following
http://www.linaro.org/engineering/armv8.
3. Write scripts to collect branch cost performance data. It will take
weeks to get all the benchmark results.
4. Smoke test the Linaro toolchain binaries 2012.11 release.
5. Try exporting crosstool-ng trunk to a bzr project. bzr fast-import
always fails on Ubuntu 10.04, but it works on 12.04.
6. RM toolchain related work.
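For context on item 1, a toy illustration (do_work is a hypothetical
helper) of the pattern shrink-wrapping targets: the early-return path needs
no saved registers, so the prologue/epilogue can be sunk into the slow path
instead of being executed unconditionally. A stray register copy on the
fast path, like the "r6 = r1" above, can defeat this.

extern int do_work(int *data);   /* hypothetical helper */

int maybe_work(int x, int *data)
{
    if (x < 0)                /* fast path: no stack frame needed */
        return 0;
    return do_work(data);     /* prologue/epilogue only on this path
                                 once the function is shrink-wrapped */
}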
Plans:
* Collect performance data for branch cost tuning.
* Linaro binary toolchain 2012.11 release.
* Verify shrink-wrap bugs.
Best regards!
-Zhenqiang