Hi Tom,

> When the clang .o is linked to the gcc/gcc+, I'm getting
> /home/tgall/opencl/SNU/tmp2/cl_temp_1.tkl uses VFP register arguments,
> /home/tgall/opencl/SNU/tmp2/cl_temp_1.o does not

This is pretty common. Clang assumes ARMv4 unless you're pretty specific about your core.

> clang -mfloat-abi=hard -mfpu=neon -S -emit-llvm -x cl
> -I/home/tgall/opencl/SNU/src/compiler/tools/clang/lib/Headers
> -I/home/tgall/opencl/SNU/inc -include
> /home/tgall/opencl/SNU/inc/comp/cl_kernel.h
> /home/tgall/opencl/SNU/tmp2/cl_temp_1.cl -o
> /home/tgall/opencl/SNU/tmp2/cl_temp_1.ll

What target triple do you see when you run:

$ head /home/tgall/opencl/SNU/tmp2/cl_temp_1.ll

If it's "arm-blah", then it'll default to ARMv4. It has to be "armv7*" to default to Cortex-A8, but it would be good to specify the CPU as well. It won't detect it from the hardware you're on yet.

> so first obvious question is -mfloat-abi=hard -mfpu=neon correct for clang?

Neither required, nor sufficient. ;)

When you choose the triple "armv7l-*", it'll default to A8, Neon, hard-float. If you specify hard-float and Neon, it won't default to A8 and the parameters will be ignored further in. It doesn't make sense, I agree, and it's a problem not just for cross-compilation, but for native builds too.

The best bet is to specify the triple AND the CPU, so that you're sure you're getting what you want:

$ clang -target arm-linux-gnueabihf -mcpu=cortex-a9 -mfpu=neon -mthumb

As you noticed, Thumb2 is not the default for Cortex-A*, but hard-float is. You can always see what hidden options you got by adding -v to the command line. Also, the triple here is "arm-*", but Clang will notice the A9 option, change the IR accordingly and pass the correct options to the assembler. If you do it in two steps, you still have to pass it yourself, because "armv7-*" in the IR will turn out as Cortex-A8 by default.

Two other options that I encourage you to try:

-integrated-as : the experimental (on ARM) integrated assembler. You won't be using GAS, so if your code depends on GAS' idiosyncrasies, don't use this option.

-O3 : Apart from the usual, this will turn on auto-vectorization (like GCC), which is also kind of experimental. Just be aware of that.

Hope that helps,
--renato

PS: If you're cross compiling, you'll have to manually specify the include paths.
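
The triple check suggested above could be scripted roughly as follows. This is only a sketch: ir_triple and triple_default are hypothetical helper names, not part of Clang, and the mapping just restates the defaults described in this message ("arm-*" falls back to ARMv4, "armv7*" to Cortex-A8), which may differ in other Clang versions.

```shell
#!/bin/sh
# Sketch only: both helpers below are hypothetical, not Clang tools.

# Print the triple from the 'target triple = "..."' line of an LLVM IR file.
ir_triple() {
    sed -n 's/^target triple = "\(.*\)"$/\1/p' "$1" | head -n 1
}

# Restate the defaults described in this thread for a given triple.
# Assumption: only the two cases mentioned in the message are covered.
triple_default() {
    case "$1" in
        armv7*) echo "Cortex-A8" ;;
        arm-*)  echo "ARMv4" ;;
        *)      echo "unknown" ;;
    esac
}
```

For example, triple_default "$(ir_triple cl_temp_1.ll)" would print which default applies to the IR file, which makes it easy to spot an "arm-*" triple before the object is linked against hard-float code.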
Thanks and regards,
Sumit Semwal
Linaro Kernel Engineer - Graphics working group
Linaro.org │ Open source software for ARM SoCs