linaro-toolchain February 2012

linaro-toolchain@lists.linaro.org

21 participants
51 discussions

by Barry Song

Hi guys,

I compiled a native gdb using Linaro 2011.10 with "./configure --host=arm-none-linux-gnueabi --target=arm-none-linux-gnueabi", and the gdb runs on ARM target boards directly.

# gdb
GNU gdb (Linaro GDB) 7.3-2011.10
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "arm-none-linux-gnueabi".
For bug reporting instructions, please see: <http://bugs.launchpad.net/gdb-linaro/>.
(gdb)

I can use it to debug native programs on target boards directly: for example, attach to a process, set breakpoints, and check registers and memory.

One issue is that I can't see multiple threads. For example, PID 646 is system_server according to ps:

646 1000 159m S system_server

Then I use gdb to attach to it:

# gdb attach 646
(gdb) info threads
  Id   Target Id         Frame
* 1    process 646 "system_server" __ioctl () at bionic/libc/arch-arm/syscalls/__ioctl.S:15

As you can see, "info threads" shows only one thread, but there are several threads in system_server. If I compile a new program based on glibc and gnu libthread, however, the same gdb shows all of its threads.

So my questions are:

1. Should I compile the native gdb using the Android toolchain and the Android bionic/libthread libraries?
2. Why can't the current gdb see multiple threads in Android processes? This question is really about how gdb learns about threads in principle. As I understand it, both GNU and Android use clone() to create threads, all threads in one process have the same tgid in the kernel, and all threads return the same getpid() value. Why doesn't gdb just walk the process list to find the threads?

Thanks
Barry
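On the closing question about walking the process list: on Linux, each thread of a process appears as a numeric directory under /proc/<pid>/task, so thread IDs can be listed without any help from the thread library. The following is a minimal sketch in Python, assuming a Linux-style procfs. The function name list_thread_ids and its proc_root parameter are made up for illustration, and nothing here describes how gdb itself discovers threads.

```python
import os


def list_thread_ids(pid, proc_root="/proc"):
    """Return the sorted kernel thread IDs (TIDs) of process `pid`.

    Assumes a Linux-style procfs in which every thread has a numeric
    directory under <proc_root>/<pid>/task. `proc_root` is a hypothetical
    parameter that lets the function be pointed at a test directory.
    """
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sorted(int(name) for name in os.listdir(task_dir) if name.isdigit())


if __name__ == "__main__":
    # The main thread's TID equals the PID, so it always appears in the list.
    print(list_thread_ids(os.getpid()))
```

Listing the IDs only covers discovery. A ptrace-based debugger still has to attach to each thread individually before it can stop it or read its registers.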

13 years, 5 months

[ACTIVITY] Jan 30 - Feb 3

by Thiago Jung Bauermann

Hi,

* Learned the basics of bzr and examined the gdb-linaro repository.
* Went through Michael Hope's steps to import upstream's 7.4 branch into bzr.
* Explored gdb-linaro bugs and blueprints in Launchpad to familiarize myself with what has been done and is planned or proposed to be done.
* Went through the gdb-linaro/7.3 branch to verify what needs to be forward-ported to gdb-linaro/7.4. Forward-ported 10 patches.
* Checked which Linaro Connect sessions would be of interest for me to attend remotely, but found out that only one will be available for remote participation.
* Worked very little on Wednesday since my laptop refused to turn on again after I hibernated it. I found out on the next day that plugging in an external monitor makes it happy again (I didn't have a monitor on Wed to try this out so I was stuck). Apparently the LCD screen died.

--
[]'s
Thiago Jung Bauermann
Linaro Toolchain Working Group

13 years, 5 months

[ACTIVITY] weekly status

by Ken Werner

Hi,

libunwind
* reviewed a small patch from T. R. of Nokia that fixes a bug in the search for the unwind table entry for an IP

OpenEmbedded
* built the OE-core images (minimal, sato and qt4e) with -O1 and -O0
* collected the ELF size and memory footprint and updated the charts
* encountered an issue when compiling Qt 4.8.0 using -O0: qdbusviewer fails to link because an .LTHUNK symbol survives
  * tested various compilers and optimization levels and noticed that the .LTHUNK symbols also survive with higher optimization levels
  * only the Linaro and ARM CSL toolchains seem to be affected (FSF trunk, 46branch and 46release seem to work)
  * provided a reduced testcase and opened lp #924726
* Linaro cc1 emits an undefined label when using -fPIC -Os (lp #924889)
  * already fixed upstream, Ramana is backporting to Linaro GCC
* looked into the external-toolchain branch from C. Larson: https://github.com/kergoth/oe-core/tree/external-toolchain and tested it against CSL 2011.03 -> works fine
* started to document: https://wiki.linaro.org/KenWerner/Sandbox/OpenEmbedded-Core

Regards,
Ken

13 years, 5 months

[ACTIVITY] Jan 30 - Feb 3

by Ulrich Weigand

== GCC ==

* Benchmarking the 4.6 backport of subreg forward-propagation confirmed that this is a net loss. On 4.7, microbenchmarks suggest a different outcome (due to register allocator enhancements), so I've created a 4.7 Bazaar branch including the patch and submitted it for benchmarking.
* Implemented a patch to allow memory operands with vec_set and vec_extract to avoid excessive vmov generation in the PR 51819 test case. Patch shows no regression on microbenchmarks; full testing and benchmarking still outstanding.

Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294

13 years, 5 months

[ACTIVITY] report week 05

by Peter Maydell

RAG:
Red:
Amber:
Green:

Current Milestones:
||                          || Planned     || Estimate    || Actual     ||
||cp15-rework               || 2012-01-06  || 2012-02-20  ||            ||
||qemu-kvm-getting-started  || 2012-03-04? || 2012-03-04? || 2012-02-01 ||

(for blueprint definitions: https://wiki.linaro.org/PeterMaydell/QemuKVM)

Historical Milestones:
||a15-usermode-support      || 2011-11-10  || 2011-11-10  || 2011-10-27 ||
||upstream-omap3-cleanup    || 2011-11-10  || 2011-12-15  || 2011-12-12 ||
||initial-a15-system-model  || 2012-01-27  || 2012-01-27  || 2012-01-17 ||

== cp15-rework ==
* Since this isn't in the critical path any more we'll work on it next quarter, and adjust its priority/due dates post Connect.

== qemu-kvm-getting-started ==
* finished writing up HowTo documents for reproducing the KVM prototype and running it on a Fast Model: https://wiki.linaro.org/PeterMaydell/KVM/HowTo
* did the last bits of testing, enough to be able to say we've done the initial prototype work for TCWG2011-A15-KVM, which means we can close out the qemu-kvm-getting-started blueprint.

== other ==
* more upstream patch review, etc
* LP:926012: patches to support prctl(PR_SET_NAME, ...) in linux-user mode, for the benefit of perl 5.14.

13 years, 5 months

[ACTIVITY] w05

by Asa Sandahl

Hi!

* -O2/-O3 benchmarking: Did some final work on the charts for the -O3 write-up.
* gcc-4.7 benchmarking: Worked on the charts for gcc-4.7 benchmarking. The Fortran tests in SPEC 2000 were missing from my runs. The reason was that I had messed with the LD_LIBRARY_PATH in the cbuild make file on a previous occasion. :-( Created an internal wiki page for the results, modeled after Michael's page about -O3: https://wiki.linaro.org/Internal/ToolChain/Benchmarks/gcc-4.7_benchmarking
* v8: No progress on JavaScript benchmarks this week. https://wiki.linaro.org/AsaSandahl/Sandbox/JavaScriptBenchmarks
* Bug triaging: Had a look at this bug: https://bugs.launchpad.net/gcc-linaro/+bug/923397

Regards
Åsa

13 years, 5 months

[Activity] Week 05

by Zhenqiang Chen

Summary:
* Analyzed PATH issues for the win32 binary toolchain.

Details:
1. Analyzed PATH issues for the win32 binary toolchain.
   * gcc cannot find its install dir if the user sets PATH="INSTALL_DIR\bin" on the DOS command line. Compilation then fails because gcc cannot find ../lib, ../libexec, etc.
   * Root cause: in DOS, the " characters are taken as part of the directory name when PATH is set, unlike in Cygwin or Linux. A directory name containing " is invalid in DOS.
   * Worked out a patch to filter " in function make_relative_prefix_1 and discussed it with Michael.
   * Created a Windows install package, so users do not need to set PATH. Will discuss the details with Michael for the following releases.
2. Read documentation to ramp up on gcc.

Plans:
* Feb 6-10: Linaro Connect Q1.12

Best regards!
-Zhenqiang
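The quote filtering described above can be illustrated outside of gcc. This is a minimal sketch in Python of the general idea, assuming a Windows-style PATH whose entries are separated by ';'. The function name strip_quotes_from_path is made up for illustration; the actual patch is C code in make_relative_prefix_1 and may well work differently.

```python
def strip_quotes_from_path(path_value, sep=";"):
    """Remove double quotes from each entry of a PATH-style string.

    `sep` defaults to ';', the Windows PATH separator. Entries that are
    empty after cleaning are dropped.
    """
    entries = (entry.replace('"', "").strip() for entry in path_value.split(sep))
    return sep.join(entry for entry in entries if entry)


if __name__ == "__main__":
    # A PATH that was set as PATH="C:\toolchain\bin";C:\Windows
    print(strip_quotes_from_path('"C:\\toolchain\\bin";C:\\Windows'))
```

The sketch splits on the separator before removing quotes, so a quoted entry that itself contains ';' would be split in the wrong place. Handling that case would need a small quote-aware parser.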

13 years, 5 months

Benchmark investigation at Connect

by Michael Hope

I've set up a new user on my laptop that we can use for experimenting with benchmarks during Connect. Here's what we've got:

* A user called 'connect'
* Ramana, Ulrich, Åsa, and myself can log in via SSH
* bzr repo for trunk, 4.6, and gcc-linaro 4.6
* Tarballs of all gcc-linaro 4.4, 4.5, and 4.6 releases
* Tarballs of the recent CSL, Google 4.6, and Android 4.4 compilers
* A sysroot in ~/sysroot
* Cross-binutils etc in ~/cross
* A shared ccache in ~/ccache primed with these
* A ~/env.sh that sets up the right paths etc
* A ~/builds/doconfigure.sh that configures for our standard ARMv7-A C/C++/Fortran configuration
* EEMBC, DENBench, and SPEC 2000 under ~/benchmarks

The trees under build/ build into build/$ver/build and install into build/$ver/install.

The benchmarks are set up to cross build and cross run. I've pulled ursa2 out of the farm and nuked it, so we can do something like:

  make CROSS_COMPILE=arm-linux-gnueabi- RUN="$PWD/sshrun ursa2"

to run the benchmarks.

I'll see about perf. We still need to re-baseline the benchmarks and get perf traces.

Anything else needed? You can log in and try things if you want. Go through the bounce host and try connecting as connect@crucis.

-- Michael

13 years, 5 months

Improved use of vst1 via vec_extract (PR 51819 test case)

by Ulrich Weigand

Hi Ramana,

as you pointed out, in the gcc.dg/vect/vect-double-reduc-6.c test case, using compiler options as described in PR 51819, we see the following inefficient code generation:

        vmov.32 r2, d28[0]              @ 57    vec_extractv4si      [length = 4]
        vmov.32 r1, d22[0]              @ 84    vec_extractv4si      [length = 4]
        str     r2, [r0, #4]            @ 58    *thumb2_movsi_vfp/7  [length = 4]
        vmov.32 r3, d0[0]               @ 111   vec_extractv4si      [length = 4]
        str     r1, [r0, #8]            @ 85    *thumb2_movsi_vfp/7  [length = 4]
        vst1.32 {d2[0]}, [r0:64]        @ 31    neon_vst1_lanev4si   [length = 4]
        str     r3, [r0, #12]           @ 112   *thumb2_movsi_vfp/7  [length = 4]
        bx      lr                      @ 120   *thumb2_return       [length = 12]

(The :64 alignment in vst1.32 is incorrect; that is the actual problem in PR 51819, which is now fixed.)

The reason for this particular code sequence turns out to be as follows: The middle end tries to store the LSB vector lane to memory, and uses the vec_extract named pattern to do so. This pattern currently only supports an "s_register_operand" destination, and is implemented via vmov to a core register. The contents of that register are then stored to memory.

Now why does any vst1 instruction show up? This is because combine is able to merge the vec_extract back into the store pattern and ends up with a pattern that matches neon_vst1_lanev4si. Note that the latter pattern is actually intended to implement NEON built-ins (vst1_lane_... etc).

Now there seem to be two problems with this scenario:

First of all, the neon_vst1_lane<mode> patterns seem to be actually incorrect on big-endian systems due to lane-numbering problems. As I understand it, all NEON intrinsics are supposed to take lane numbers according to the NEON ordering scheme, while the vec_select RTX pattern is defined to take lane numbers according to the in-memory order. Those disagree in the big-endian case. All other patterns implementing NEON intrinsics therefore avoid using vec_select, and instead resort to using special UNSPEC codes -- the sole exception to this happens to be neon_vst1_lane<mode>. It would appear that this is actually incorrect, and the pattern ought to use a UNSPEC_VST1_LANE unspec instead (that UNSPEC code is already defined, but nowhere used).

Now if we make that change, then the above code sequence will contain no vst1 any more. But in any case, expanding first into a vec_extract followed by a store pattern, only to rely on combine to merge them back together, is a suboptimal approach. One obvious drawback is that the auto-inc-dec pass runs before reload, and therefore only sees plain stores -- with no reason whatsoever to attempt to introduce post-inc operations. Also, just in general it normally works out best to allow the final choice between register and memory operands to be made in reload ...

Therefore, I think the vec_extract patterns ought to support *both* register and memory destination operands, and implement those via vmov or vst1 in final code generation, as appropriate. This means that we can make the choice in reload, guided as usual by alternative ordering and/or penalties -- for example, we can choose to reload the address and still use vst1 over reloading the contents to a core register and then using an offsetted store.

Finally, this sequence will also allow the auto-inc-dec pass to do a better job. The current in-tree pass unfortunately doesn't manage this, but with Richard's proposed replacement, assisted by a tweak to the cost function to make sure the (future) address reload is "priced in" correctly, I'm now seeing what appears to be the optimal code sequence:

        vst1.32 {d6[0]}, [r0:64]!       @ 30    vec_extractv4si/1    [length = 4]
        vst1.32 {d22[0]}, [r0]!         @ 56    vec_extractv4si/1    [length = 4]
        vst1.32 {d2[0]}, [r0:64]!       @ 82    vec_extractv4si/1    [length = 4]
        vst1.32 {d4[0]}, [r0]           @ 108   vec_extractv4si/1    [length = 4]
        bx      lr                      @ 116   *thumb2_return       [length = 12]

(Again the :64 is wrong; it's already fixed on mainline but I haven't pulled that change in yet.)

The attached patch implements the above-mentioned changes. Any comments? I'll try to get some performance numbers as well before moving forward with the patch ...

(As an aside, it might likewise be helpful to update the vec_set patterns to allow for memory operands, implemented via vld1.)

(See attached file: diff-gcc-arm-vecextractmem)

B.t.w. I'm wondering how I can properly test:
- that the NEON intrinsics still work
- that everything works on big-endian

Any suggestions would be appreciated!

Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
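The two instruction sequences in this message write the same four words to memory; they differ only in how the store address is formed. The following is a toy model in Python, not compiler code: the function names and the dictionary standing in for memory are made up for illustration, and lane values are plain integers.

```python
def store_with_offsets(lane_values, base=0):
    """Model the first listing: each value is stored at base + 4 * index.

    This corresponds to 'str rN, [r0, #offset]' after the lane has been
    moved to a core register with vmov.
    """
    memory = {}
    for index, value in enumerate(lane_values):
        memory[base + 4 * index] = value
    return memory


def store_with_post_increment(lane_values, base=0):
    """Model the second listing: store at the pointer, then advance it.

    This corresponds to 'vst1.32 {dN[0]}, [r0]!'. The real sequence omits
    the writeback on the last store; advancing the pointer once more here
    does not change the memory contents.
    """
    memory = {}
    pointer = base
    for value in lane_values:
        memory[pointer] = value
        pointer += 4  # post-increment by the size of one 32-bit lane
    return memory


if __name__ == "__main__":
    lanes = [10, 20, 30, 40]
    print(store_with_offsets(lanes) == store_with_post_increment(lanes))
```

In the model both functions fill the same addresses. The second form needs neither a core register per value nor a separate offset per store, which is why post-increment addressing is attractive once vec_extract can target memory directly.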

13 years, 5 months

Building at -O3 write up

by Michael Hope

One of our cards this quarter was -O3 as a performance theme, which included doing a write up on the advantages and usability of -O3. This write up is at:

  https://wiki.linaro.org/Internal/ToolChain/BuildingAtO3

A sanitised version with non-sharable benchmark data is at:

  https://wiki.linaro.org/Internal/ToolChain/BuildingAtO3

Could someone review it for me please? Both facts and style. Peter, care to nit it?

-- Michael

13 years, 5 months
