Hi Alexandros,
Could you use the linaro-toolchain list for stuff like this please? You're more likely to find somebody who knows the answer that way.
I'm pretty sure the problem is not the compiler because, as far as I can see, both architectures' compilers emit ".weak" directives. If there is a problem, I'd say it's in the linker.
Your test case gives two different addresses on Lucid x86, and on ARM (so you say, I've not tested it), but the same address twice on Precise. This is a surprising result. *I* would have expected that static values in different dlopen'd libraries would not be unified, but apparently they are ... sometimes.
I'm afraid I don't really have any insight here. :(
Anyway, regardless of whether one is correct, or not, I'd suggest *not* relying on this behaviour - it's clearly not portable. I say leave it at arm's length in production software for a few years.
Andrew
On 06/03/12 14:27, Alexandros Frantzis wrote:
On Tue, Mar 06, 2012 at 09:51:01AM +0800, Sam Spilsbury wrote:
On Mon, Mar 5, 2012 at 11:50 PM, Alexandros Frantzis alexandros.frantzis@linaro.org wrote:
Hi all,
this is an update on my progress with the updated compiz branches.
I have been trying to run our updated compiz branches (compiz-*/linaro-gles2-update) on ARM (precise armhf), but I have stumbled onto the same issue Marc reported some days ago. In particular, I get:
/usr/bin/compiz (core) - Fatal: Private index value "15CompositeWindow_index_4" already stored in screen.
/usr/bin/compiz (core) - Fatal: Private index value "15CompositeScreen_index_4" already stored in screen.
and then a segfault when I try to run compiz.
Note that I *don't* have this problem when running on x86_64 precise.
The issue can be recreated with:
$ compiz composite opengl
I added some debugging messages to pluginclasshandler.h to get a better feeling of what is going on, and ran on both my desktop and on ARM. This is the output near the point when GLScreen gets initialized:
...
compiz (core) - Info: get(): mIndex.initiated for "8GLScreen_index_4" : 0
compiz (core) - Info: initializeIndex(): Initializining index value "8GLScreen_index_4"
compiz (core) - Info: initializeIndex(): Private index value added for "8GLScreen_index_4"
compiz (core) - Info: getInstance(): Get instance for "8GLScreen_index_4"
compiz (core) - Info: getInstance(): Spawning new class for "8GLScreen_index_4"
compiz (core) - Info: ctor(): mIndex.initiated for "8GLScreen_index_4" : 1
compiz (core) - Info: ctor(): Increasing reference count for "8GLScreen_index_4": 1
--- x86_64 ---
compiz (core) - Info: get(): mIndex.initiated for "15CompositeScreen_index_4" : 1
--- armhf ---
compiz (core) - Info: get(): mIndex.initiated for "15CompositeScreen_index_4" : 0
compiz (core) - Info: initializeIndex(): Initializining index value "15CompositeScreen_index_4"
compiz (core) - Fatal: initializeIndex(): Private index value "15CompositeScreen_index_4" already stored in screen.
After the composite plugin loads and mIndex.initiated is set to 1, place a watchpoint on mIndex.initiated (it should be a separate template instantiation for each different class) and check if it changes, or check if we are reading mIndex.initiated from a different address, and if so, check the addresses of this for each constructor and destructor being called. (could be a compiler bug, I've hit these on this part of the code before).
In the armhf case, CompositeScreen is erroneously considered not initialized, and is initialized again, therefore messing up the plugin system.
I am trying to figure out if this is a manifestation of some kind of memory corruption that doesn't affect us on x86_64 for whatever reason (alignment, integer size etc), or something completely different.
Thoughts?
Thanks, Alexandros
-- Sam Spilsbury
Hi all,
(I have also added Michael, Andrew and Ulrich from the Linaro toolchain group to the recipients. Hi!)
Checking the addresses, as Sam suggested, I found that there are two different PluginClassHandler<CompositeScreen, CompScreen, 4>::mIndex and PluginClassHandler<CompositeWindow, CompWindow, 4>::mIndex objects.
After a bit of investigation, objdump gave an explanation:
objdump -t /usr/lib/compiz/libcomposite.so | c++filt | grep mIndex
-- x86_64 --
0000000000277a80 u O .bss 0000000000000010 PluginClassHandler<CompositeWindow, CompWindow, 4>::mIndex
0000000000277a70 u O .bss 0000000000000010 PluginClassHandler<CompositeScreen, CompScreen, 4>::mIndex
-- armhf --
00065648 w O .bss 00000010 PluginClassHandler<CompositeWindow, CompWindow, 4>::mIndex
00065658 w O .bss 00000010 PluginClassHandler<CompositeScreen, CompScreen, 4>::mIndex
And the same kind of output for libopengl.so
On x86_64 the symbols are marked 'u': 'unique global', whereas on armhf they are marked 'w': 'weak'. This seems to be causing our troubles.
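For readers unfamiliar with these flags: the letter objdump shows reflects the binding stored in the upper four bits of each ELF symbol's st_info byte. The sketch below decodes it. The numeric values are the ones defined in <elf.h> on GNU/Linux; the helper names (make_st_info, binding_of, objdump_flag) are made up for illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstdint>

// ELF symbol binding values, matching STB_* in <elf.h> on GNU/Linux.
constexpr unsigned kStbLocal = 0;
constexpr unsigned kStbGlobal = 1;
constexpr unsigned kStbWeak = 2;
constexpr unsigned kStbGnuUnique = 10;  // GNU extension (STB_GNU_UNIQUE)

// ELF symbol type for data objects, matching STT_OBJECT.
constexpr unsigned kSttObject = 1;

// Hypothetical helper: pack binding and type the way ELF_ST_INFO does.
inline std::uint8_t make_st_info(unsigned binding, unsigned type) {
    assert(binding < 16 && type < 16);
    return static_cast<std::uint8_t>((binding << 4) | (type & 0xf));
}

// The binding is the upper four bits of st_info.
inline unsigned binding_of(std::uint8_t st_info) { return st_info >> 4; }

// Hypothetical helper: the letter objdump -t shows for a given binding.
inline char objdump_flag(std::uint8_t st_info) {
    switch (binding_of(st_info)) {
        case kStbLocal:     return 'l';
        case kStbGlobal:    return 'g';
        case kStbWeak:      return 'w';
        case kStbGnuUnique: return 'u';
        default:            return '?';
    }
}
```

With these definitions, objdump_flag(make_st_info(kStbGnuUnique, kSttObject)) corresponds to the x86_64 listing above and objdump_flag(make_st_info(kStbWeak, kSttObject)) to the armhf one.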
I have produced a small test case for this:
http://people.linaro.org/~afrantzis/cpp_unique_global.tar.gz
Building and running 'LD_LIBRARY_PATH=. ./main' on x86_64 prints out f1 and f2 with the same address, whereas on armhf the addresses are different (i.e. two different objects). On x86_64 the symbol A<int>::a is 'u', on armhf it is 'w'.
For completeness, when running without templates (edit a.h to change) the two printed addresses are different on both x86_64 and armhf. Also A::a is 'g': 'normal global' for both.
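For anyone who has not downloaded the tarball, the pattern under test looks roughly like the sketch below. This is an illustrative single-file reconstruction, not the actual contents of the tarball: the names A, f1 and f2 follow the description above, everything else (including check_single_instance) is assumed. In the real test case f1 and f2 live in separate shared libraries (libf1.so and libf2.so), which is what makes the result depend on the symbol binding.

```cpp
#include <cassert>

// Stand-in for a.h: a class template with a static data member.
template <typename T>
struct A {
    static T a;
};

// Each translation unit that instantiates A<int>::a emits its own copy
// of this definition; the toolchain is expected to merge the copies.
template <typename T>
T A<T>::a;

// Stand-ins for the functions exported by libf1.so and libf2.so.
// Each returns the address of the A<int>::a it sees.
const void* f1() { return &A<int>::a; }
const void* f2() { return &A<int>::a; }

// Inside a single executable there is exactly one A<int>::a, so the
// two addresses must compare equal.
void check_single_instance() { assert(f1() == f2()); }
```

Within one executable the two addresses always match. Whether they still match when f1 and f2 come from different shared objects is exactly what differs between the 'u' and 'w' builds described above.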
Michael, Andrew, Ulrich, can you please give us some insight into the situation? Does this seem like a compiler or linker bug on ARM, or is the code depending on undefined behavior, or something different? I have pasted the g++ versions used at the end of the email.
Thanks, Alexandros
--- g++ x86_64 --
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.3-1ubuntu2' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc --disable-werror --with-arch-32=i686 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu2)
--- g++ armhf --
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/4.6/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.3-1ubuntu2' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc --enable-multilib --disable-sjlj-exceptions --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu2)
(Hit send too soon on my last mail and appear to have removed linaro-toolchain. Apologies to those who get duplicates.)
On Tue, Mar 06, 2012 at 04:00:36PM +0000, Andrew Stubbs wrote:
Hi Alexandros,
Could you use the linaro-toolchain list for stuff like this please? You're more likely to find somebody who knows the answer that way.
I'm pretty sure the problem is not the compiler because, as far as I can see, both architectures' compilers emit ".weak" directives. If there is a problem, I'd say it's in the linker.
Your test case gives two different addresses on Lucid x86, and on ARM (so you say, I've not tested it), but the same address twice on Precise. This is a surprising result. *I* would have expected that static values in different dlopen'd libraries would not be unified, but apparently they are ... sometimes.
I suspect this is a compiler bug around the handling of STB_GNU_UNIQUE (the gnu_unique_object symbol type), something I suspect was invented to solve exactly this kind of problem. It should all just work in the GNU/Linux world.
The assembler files on x86_64 from the small testcase have
.type _ZN1AIiE1aE, @gnu_unique_object
while the one in case of ARM doesn't have this.
However, my suspicion is that the problem comes from GCC's build process, which emits .type x, @gnu_unique_object to check whether the GNU assembler supports this feature. Historically `@' has been a comment character on ARM, so the check fails, the compiler never learns that gnu_unique_object is supported by the assembler, and it therefore doesn't generate such code. ...
The quickest workaround IMHO is to rebuild the compiler with --enable-gnu-unique-object. Given that this feature went into binutils some time ago, I would expect most recent assemblers to support it and for this to just work (TM). I would expect this configure option to be turned on for cross-compilers as well. It might also be the fastest way of testing this feature.
Thoughts ? I would like another set of eyes on this.
I verified this works on an armel box by editing the generated assembly by hand:
(natty)lp-ramana@jenipapo:~/cpp_unique_global$ diff -au f12.s f1.s | less
--- f12.s 2012-03-07 00:47:32.000000000 +0000
+++ f1.s 2012-03-06 23:25:54.000000000 +0000
@@ -130,7 +130,7 @@
  .weak _ZN1AIiE1aE
  .section .bss._ZN1AIiE1aE,"awG",%nobits,_ZN1AIiE1aE,comdat
  .align 2
- .type _ZN1AIiE1aE, %object
+ .type _ZN1AIiE1aE, %gnu_unique_object
  .size _ZN1AIiE1aE, 4
 _ZN1AIiE1aE:
  .space 4
I did the same for f2.s and regenerated libf1.so and libf2.so by hand. The output is now:
(natty)lp-ramana@jenipapo:~/cpp_unique_global$ LD_LIBRARY_PATH=. ./main
f1 0x40028034
f2 0x40028034
regards, Ramana
linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
On Wed, Mar 07, 2012 at 12:57:55AM +0000, Ramana Radhakrishnan wrote:
Hi Ramana,
thanks for the analysis. I have filed:
https://bugs.launchpad.net/gcc-linaro/+bug/949805
for this issue.
Thanks, Alexandros
Matthias,
This affects ubuntu-gcc as well, and the workaround is to try the --enable-gnu-unique-object compiler configure-time flag. Could you try to rebuild a toolchain with the configure option --enable-gnu-unique-object and check that tests don't regress with the feature turned on? With that workaround you should be able to get a toolchain into the archive that allows compiz to be rebuilt.
regards, Ramana
On 08.03.2012 16:14, Ramana Radhakrishnan wrote:
Matthias,
This affects ubuntu-gcc as well, and the workaround is to try the --enable-gnu-unique-object compiler configure-time flag. Could you try to rebuild a toolchain with the configure option --enable-gnu-unique-object and check that tests don't regress with the feature turned on? With that workaround you should be able to get a toolchain into the archive that allows compiz to be rebuilt.
committed. will be part of the next upload.
Matthias
Great news. Thanks to everyone for moving so quickly on this.
cheers, Jesse
On Fri, Mar 09, 2012 at 09:16:02AM +0000, Jesse Barker wrote:
On Thu, Mar 8, 2012 at 8:42 PM, Matthias Klose doko@ubuntu.com wrote:
On 08.03.2012 16:14, Ramana Radhakrishnan wrote:
Matthias,
This affects ubuntu-gcc as well, and the workaround is to try the --enable-gnu-unique-object compiler configure-time flag. Could you try to rebuild a toolchain with the configure option --enable-gnu-unique-object and check that tests don't regress with the feature turned on? With that workaround you should be able to get a toolchain into the archive that allows compiz to be rebuilt.
committed. will be part of the next upload.
Matthias
Great news. Thanks to everyone for moving so quickly on this.
cheers, Jesse
+1
I rebuilt compiz with the new armhf gcc packages and the compiz crash is gone.
Thanks, Alexandros