Hi.
First.
#include <Im-doing-something-somewhat-odd.h>
I'm trying to use a current clang/llvm (current as in git checkout from just the other day) to build an opencl kernel and then link that with some code which has been compiled with gcc/g++.
When the clang .o is linked with the gcc/g++ objects, I get:
/home/tgall/opencl/SNU/tmp2/cl_temp_1.tkl uses VFP register arguments, /home/tgall/opencl/SNU/tmp2/cl_temp_1.o does not
cl_temp_1.o was produced with clang, cl_temp_1.tkl via gcc/g++.
Let's dive into details.
This is following in the footsteps of an open source framework called SNU which implements OpenCL. Within SNU they had a fairly old version of clang+llvm which wouldn't even build on ARM so step one has been to figure out what SNU was doing with clang and replicate this using latest clang.
So given the following minimal test kernel placed into cl_temp_1.cl:

/* Header to make Clang compatible with OpenCL */
#define __global __attribute__((address_space(1)))

int get_global_id(int index);

/* Test kernel */
__kernel void test(__global float *in, __global float *out)
{
    int index = get_global_id(0);
    out[index] = 3.14159f * in[index] + in[index];
}
then we follow these steps:
clang -mfloat-abi=hard -mfpu=neon -S -emit-llvm -x cl -I/home/tgall/opencl/SNU/src/compiler/tools/clang/lib/Headers -I/home/tgall/opencl/SNU/inc -include /home/tgall/opencl/SNU/inc/comp/cl_kernel.h /home/tgall/opencl/SNU/tmp2/cl_temp_1.cl -o /home/tgall/opencl/SNU/tmp2/cl_temp_1.ll
with the resulting cl_temp_1.ll we:
llc /home/tgall/opencl/SNU/tmp2/cl_temp_1.ll
which results in cl_temp_1.s. Then:
clang -c -mfloat-abi=hard -mfpu=neon -o /home/tgall/opencl/SNU/tmp2/cl_temp_1.o /home/tgall/opencl/SNU/tmp2/cl_temp_1.s
So now in theory we should have a perfectly good cl_temp_1.o ready for linking.
But first let's get the bits ready that will be built by the traditional GNU toolchain. We have:
gcc -shared -fPIC -O3 -o /home/tgall/opencl/SNU/tmp2/cl_temp_1_info.so /home/tgall/opencl/SNU/tmp2/cl_temp_1_info.c
and
gcc -shared -fPIC -march=armv7-a -mtune=cortex-a9 -mfloat-abi=hard -mfpu=neon -fsigned-char -DDEF_INCLUDE_ARM -I. -I /home/tgall/opencl/SNU/src/runtime/hal/device/cpu -I /home/tgall/opencl/SNU/src/runtime/include -I /home/tgall/opencl/SNU/src/runtime/core -I /home/tgall/opencl/SNU/src/runtime/core/device -I /home/tgall/opencl/SNU/src/runtime/hal -I /home/tgall/opencl/SNU/src/runtime/hal/device -DTARGET_MACH_CPU -O3 -c /home/tgall/opencl/SNU/src/runtime/hal/device/cpu/hal.c -o /home/tgall/opencl/SNU/tmp2/hal.o
And here we try to link it all together.
g++ -shared -fPIC -march=armv7-a -mtune=cortex-a9 -mfloat-abi=hard -mfpu=neon -fsigned-char -DDEF_INCLUDE_ARM -O3 -o /home/tgall/opencl/SNU/tmp2/cl_temp_1.tkl /home/tgall/opencl/SNU/tmp2/hal.o /home/tgall/opencl/SNU/tmp2/cl_temp_1.o -L/home/tgall/opencl/SNU/lib/lnx_arm -lsnusamsung_opencl_builtin_lnx_arm -lpthread -lm
and bang we're back to the error I first mentioned: /usr/bin/ld: error: /home/tgall/opencl/SNU/tmp2/cl_temp_1.tkl uses VFP register arguments, /home/tgall/opencl/SNU/tmp2/cl_temp_1.o does not
So the first obvious question: is -mfloat-abi=hard -mfpu=neon correct for clang?
tgall@miranda:~/opencl/SNU/tmp2$ clang --version
clang version 3.3
Target: armv7l-unknown-linux-gnueabihf
Thread model: posix
Thanks for any suggestions!
-- Regards, Tom
"Where's the kaboom!? There was supposed to be an earth-shattering kaboom!" Marvin Martian Tech Lead, Graphics Working Group | Linaro.org │ Open source software for ARM SoCs w) tom.gall att linaro.org h) tom_gall att mac.com
Hi Tom,
On 17 April 2013 21:13, Tom Gall tom.gall@linaro.org wrote:
> When the clang .o is linked to the gcc/g++, I'm getting /home/tgall/opencl/SNU/tmp2/cl_temp_1.tkl uses VFP register arguments, /home/tgall/opencl/SNU/tmp2/cl_temp_1.o does not
This is pretty common. Clang assumes ARMv4 unless you're pretty specific about your core.
> clang -mfloat-abi=hard -mfpu=neon -S -emit-llvm -x cl -I/home/tgall/opencl/SNU/src/compiler/tools/clang/lib/Headers -I/home/tgall/opencl/SNU/inc -include /home/tgall/opencl/SNU/inc/comp/cl_kernel.h /home/tgall/opencl/SNU/tmp2/cl_temp_1.cl -o /home/tgall/opencl/SNU/tmp2/cl_temp_1.ll
What target triple do you see when you run:
$ head /home/tgall/opencl/SNU/tmp2/cl_temp_1.ll
If it's "arm-blah", then it'll default to ARMv4. It has to be "armv7*" to default to Cortex-A8, but it would be good to specify the CPU as well. Clang doesn't yet detect the CPU from the hardware you're running on.
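A minimal sketch of that check, assuming a textual IR file (.ll) that carries a `target triple = "..."` line. The helper names `ir_triple` and `describe_triple` are made up for illustration, and the defaults they report simply restate the behaviour described above rather than anything queried from clang:

```shell
# Print the target triple recorded in a textual LLVM IR file ($1).
ir_triple() {
    sed -n 's/^target triple = "\(.*\)"$/\1/p' "$1"
}

# Restate, for a given triple, the default described in this thread.
# These strings are a summary of the explanation above, not clang output.
describe_triple() {
    case "$1" in
        armv7*) echo "armv7 triple: defaults to Cortex-A8" ;;
        arm-*)  echo "generic arm triple: defaults to ARMv4" ;;
        *)      echo "not an ARM triple covered here" ;;
    esac
}

# Hypothetical usage:
#   describe_triple "$(ir_triple /home/tgall/opencl/SNU/tmp2/cl_temp_1.ll)"
```

Running it on the .ll file right after the first clang step shows which triple the later llc step will inherit.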
> so first obvious question is -mfloat-abi=hard -mfpu=neon correct for clang?
Neither required, nor sufficient. ;)
When you choose an "armv7l-*" triple, it'll default to A8, Neon, hard-float. If you specify hard-float and Neon, it won't default to A8 and the parameters will be ignored further in. It doesn't make sense, I agree, and it's a problem not just for cross-compilation, but for native builds too.
The best bet is to specify the triple AND the CPU, so that you're sure you're getting what you want:
$ clang -target arm-linux-gnueabihf -mcpu=cortex-a9 -mfpu=neon -mthumb
As you noticed, Thumb2 is not the default for Cortex-A*, but hard-float is. You can always see what hidden options you got by adding -v to the command line. Also, the triple here is "arm-*", but Clang will notice the A9 option, change the triple accordingly in the IR and pass the correct options to the assembler. If you do it in two steps, you still have to pass the CPU option yourself, because "armv7-*" in the IR will turn out as Cortex-A8 by default.
Two other options that I encourage you to try:
-integrated-as : the experimental (on ARM) integrated assembler. You won't be using GAS, so if your code depends on GAS' idiosyncrasies, don't use this option.
-O3 : Apart from the usual, this will turn on auto-vectorization (like GCC), which is also kind of experimental. Just be aware of that.
Hope that helps, --renato
PS: If you're cross compiling, you'll have to manually specify the include paths.
Hi Renato,
I was trying to experiment with what you mentioned; here: https://pastebin.linaro.org/view/35ea66c1 - it still seems to fail with the same error.
Would you please have a look at it, and let me know if you spot something absolutely basic and idiotic? (Sorry, I am a n00b to the llvm / clang world!)
Best regards, ~Sumit.
On 23 April 2013 10:03, Sumit Semwal sumit.semwal@linaro.org wrote:
> Would you please have a look at it, and let me know if you spot something absolutely basic and idiotic? (sorry, and a n00b to llvm / clang world !)
I can't see anything wrong with it.
What I suggest is to start investigating the IR files for clues (especially the target triple and the aapcs_vfp function attributes), the assembly files (for build attributes and function prologue) and the objects (for correctly linked libraries).
If you use -v on all command lines (clang, gcc, g++, ld) you might spot what assumptions are being taken and what to do instead.
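One way to check the objects, sketched under the assumption that GNU binutils' `readelf` is available: `readelf -A` prints the ARM build attributes, and an object built for the hard-float calling convention carries the line `Tag_ABI_VFP_args: VFP registers`. The helper name `vfp_args` is hypothetical.

```shell
# Classify an ARM object by its build attributes.
# Reads the output of `readelf -A <object>` on standard input and prints
# "hard" if the VFP-register calling convention is recorded, "not-hard"
# otherwise (soft and softfp are not distinguished here).
vfp_args() {
    if grep -q 'Tag_ABI_VFP_args: VFP registers'; then
        echo hard
    else
        echo not-hard
    fi
}

# Hypothetical usage, comparing the two objects that fail to link:
#   readelf -A /home/tgall/opencl/SNU/tmp2/hal.o       | vfp_args
#   readelf -A /home/tgall/opencl/SNU/tmp2/cl_temp_1.o | vfp_args
```

If the two objects disagree, the linker error about VFP register arguments is the expected result.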
cheers, --renato
Sure, Thanks Renato!
-- Thanks and regards, Sumit Semwal
Hi Renato!
Apologies again for troubling you; attached is a log file from one of the builds I am doing (trying to get pocl working on chromebook) - I 'think' I have provided the right values for target etc, but maybe you could have a quick look to spot any quickly visible mismatches?
Thanks a bunch! Best regards, ~Sumit.
Hi Sumit,
I think I've found it!
You compiled to IR (.bc) by specifying hard-float, but when you convert it to assembly (where the AAPCS will be lowered), you don't:
[pocl] executing [/usr/lib/llvm-3.2/bin/llc -relocation-model=pic -o /tmp/pthread/test_as_type/1-1-1.0-0-0/parallel.s /tmp/pthread/test_as_type/1-1-1.0-0-0/parallel.bc]
Later on you pass the hard-float argument to the assembler (clang, which passes to as), but that's too late.
If you want to compile in separate steps, you'll have to provide consistent flags on each step, to make sure nothing is left behind. All tools, clang, as, llc etc will have to have the same set of flags (or similar flags, if they accept slightly different syntax).
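As a rough sketch of what "consistent flags on each step" could look like for a split build, the snippet below only prints the three commands instead of running them. The CPU and float settings are the ones discussed earlier in this thread; the llc spellings (`-mcpu`, `-mattr=+neon`, `-float-abi=hard`) are llc's own option names to the best of my knowledge, so verify them against `llc --help` for your version. The function name `build_cmds` is hypothetical.

```shell
# Target settings shared by every step (assumed: Cortex-A9, NEON, hard-float).
CPU=cortex-a9
CLANG_FLAGS="-target arm-linux-gnueabihf -mcpu=$CPU -mfpu=neon -mfloat-abi=hard"
LLC_FLAGS="-mcpu=$CPU -mattr=+neon -float-abi=hard"

# Print the commands of a three-step build for kernel base name $1.
build_cmds() {
    # Step 1: OpenCL C to LLVM IR
    echo "clang $CLANG_FLAGS -S -emit-llvm -x cl $1.cl -o $1.ll"
    # Step 2: IR to assembly; the calling convention is lowered here
    echo "llc $LLC_FLAGS $1.ll -o $1.s"
    # Step 3: assembly to object
    echo "clang $CLANG_FLAGS -c $1.s -o $1.o"
}

build_cmds cl_temp_1
```

Keeping the flags in one place makes it harder for a single step, such as the llc invocation quoted above, to silently fall back to its defaults.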
Hope that helps!
--renato
Awesome Renato! Thanks a bunch! I'll try it out and let you know!
Best, ~Sumit.
-- Thanks and regards,
Sumit Semwal
Linaro Kernel Engineer - Graphics working group
Linaro.org │ Open source software for ARM SoCs
linaro-toolchain@lists.linaro.org