Hi,
On Thu, Dec 2, 2010 at 9:49 PM, Michael Hope <michael.hope@linaro.org> wrote:
> Hi there. Currently you can't use NEON instructions in inline assembly if the compiler is set to a VFP-only -mfpu, such as Ubuntu's -mfpu=vfpv3-d16. Trying code like this:
> int main()
> {
>     asm("veor d1, d2, d3");
>     return 0;
> }
> gives an error message like:
> test.s: Assembler messages:
> test.s:29: Error: selected processor does not support Thumb mode `veor d1,d2,d3'
> The problem is that -mfpu=vfpv3-d16 has two jobs: it tells the compiler what instructions to use, and also tells the assembler what instructions are valid. We might want the compiler to use the VFP for compatibility or power reasons, but still be able to use NEON instructions in inline assembler without passing extra flags.
We came across a similar case in the kernel just recently... and it's likely to recur as we try to move toward more unified kernels.
The problem is that the toolchain considers:
a) the architecture baseline _needed_ by an object, and the architectural features it _may use_ to be one and the same thing. This is true for C code, but not universally true for assembler (either inline or not).
b) the architecture baseline _needed_ by the output of the linker to be the union of all the architecture baselines _needed_ by all the individual objects linked together. This is not necessarily true even for C code.
These conservative assumptions only really support the fixed-configuration use case; they don't accommodate the concept of run-time adaptation to CPU features.
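To make assumption (b) concrete, here is a minimal C model of a link step that marks the output with the per-feature maximum of its inputs. It is a sketch, not binutils code: the enums, struct obj_attrs and link_baseline() are hypothetical names, and real EABI attribute merging has many more cases.

```c
#include <stddef.h>

/* Hypothetical, simplified feature levels: a higher value is more demanding. */
enum cpu_arch  { CPU_V4T = 1, CPU_V7 = 2 };
enum simd_arch { SIMD_NONE = 0, SIMD_NEONV1 = 1 };

/* What one object is marked as *needing* (not merely able to use). */
struct obj_attrs {
    enum cpu_arch  cpu;
    enum simd_arch simd;
};

/* Assumption (b): the linked output needs the per-feature maximum of its inputs. */
static struct obj_attrs link_baseline(const struct obj_attrs *objs, size_t n)
{
    struct obj_attrs out = { CPU_V4T, SIMD_NONE };
    for (size_t i = 0; i < n; i++) {
        if (objs[i].cpu > out.cpu)
            out.cpu = objs[i].cpu;
        if (objs[i].simd > out.simd)
            out.simd = objs[i].simd;
    }
    return out;
}
```

With a v4T/no-NEON main program and a v7/NEONv1 helper that is only entered after a run-time NEON check, link_baseline() reports v7/NEONv1 for the whole program, even though it would run correctly on a v4T core without NEON.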
For background on the kernel discussion, see this thread -- you'll have to follow it a bit:
http://ns.spinics.net/lists/arm-kernel/msg105325.html
> Inserting ".fpu neon" at the start of the inline assembly fixes the problem. Is this valid? Are assembly files with multiple .fpu statements allowed? Passing '-Wa,-mfpu=neon' to GCC doesn't work, as gas seems to ignore the second -mfpu.
Strictly speaking, no, because many, many points of gas behaviour are not specified, and there's no definition of what should happen if there are multiple conflicting .arch or .fpu directives. We'd need a toolchain expert to pass judgement on this.
Also, changing the arch partway through the file means you're no longer protected: if the compiler generates invalid code from that point onwards, the assembler may not detect it.
Worse, if there were to be a "neonv2" in the future, you would now unexpectedly downgrade the architecture halfway through the file, so the assembler may barf on subsequent compiler-generated code... so to avoid future maintenance problems, a way to restore the "true" architecture is definitely needed.
In principle, you could change and restore the architecture with the help of some build system hacks:
gcc -DASM_DEFAULT_ARCH='".arch $(ARCH_VERSION); .fpu $(FP_ARCH_VERSION);"'
asm(".fpu neon\n\t"
    "veor d0, d1, d2\n\t"
    ASM_DEFAULT_ARCH
    : : : "d0");
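A slightly fuller sketch of that hack, assuming GCC-style inline asm and a VFP-enabled ARM configuration. The fallback armv7-a/vfpv3-d16 values are an assumption (they match the Ubuntu defaults mentioned earlier) and would normally come from the build system via -DASM_DEFAULT_ARCH; veor_with_restore() is a hypothetical name. Outside 32-bit ARM the asm is compiled out, so the file builds anywhere.

```c
/* Restore directives for the file's "true" architecture. Normally supplied
 * by the build system, e.g.
 *   gcc -DASM_DEFAULT_ARCH='".arch armv7-a; .fpu vfpv3-d16;"' ...
 * The fallback below is an assumption (Ubuntu's armv7-a/vfpv3-d16 defaults). */
#ifndef ASM_DEFAULT_ARCH
#define ASM_DEFAULT_ARCH ".arch armv7-a\n\t.fpu vfpv3-d16\n\t"
#endif

/* Hypothetical helper: returns 1 if the NEON asm was compiled in, 0 otherwise. */
static int veor_with_restore(void)
{
#if defined(__arm__)
    __asm__ volatile(
        ".fpu neon\n\t"        /* temporarily allow NEON mnemonics */
        "veor d0, d1, d2\n\t"  /* NEON instruction from the example above */
        ASM_DEFAULT_ARCH       /* put the assembler back for compiler output */
        : /* no outputs */
        : /* no inputs */
        : "d0");               /* d0 is overwritten */
    return 1;
#else
    return 0;
#endif
}
```

In real code the call would sit behind a run-time NEON check, since executing veor on a core without NEON raises an illegal-instruction fault. The restore string is still hard-coded per build, which is the maintenance burden described next.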
This doesn't sit well with the Debian/Ubuntu way of building things, where we have to build options into the compiler as defaults for there to be any hope of them taking effect, because package build systems clobber CFLAGS/CPPFLAGS all over the place and in practice can't be overridden globally. So you'd have to tweak the build scripts for each affected package.
Also, the architecture feature requirements recorded in the object can look a bit weird, presumably because the ".fpu" directive is overloaded to describe two different architectural features (VFP and NEON).
If I do this:
.arch armv7-a
.fpu neon
veor d0, d1, d2
.fpu vfp
.arch armv4t
then fromelf lists the following attributes for the object:
Attribute Section: aeabi
File Attributes
  Tag_CPU_name: "4T"
  Tag_CPU_arch: v4T
  Tag_ARM_ISA_use: Yes
  Tag_THUMB_ISA_use: Thumb-1
  Tag_FP_arch: VFPv2
  Tag_Advanced_SIMD_arch: NEONv1
  Tag_DIV_use: Not allowed
i.e., the baseline for each architectural feature is whatever the last applicable .arch or .fpu directive in the file specified, or the arch required by the instructions present in the file, whichever is the higher.
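The rule read off the fromelf output can be written down as a small C model. This models the stated rule only, not how gas is implemented: the level constants, struct item and compute_attrs() are hypothetical, and the model assumes ".fpu neon" selects VFPv3 plus NEONv1, ".fpu vfp" selects VFPv2 with no NEON, and veor needs only NEONv1.

```c
#include <stddef.h>

/* Hypothetical levels for three attribute tags: a higher value is more demanding. */
enum { CPU_V4T = 1, CPU_V7 = 2 };
enum { FP_NONE = 0, FP_VFPV2 = 2, FP_VFPV3 = 3 };
enum { SIMD_NONE = 0, SIMD_NEONV1 = 1 };

#define KEEP (-1)   /* a directive that does not mention this feature */

struct levels { int cpu, fp, simd; };

/* One line of input: a directive selects levels (KEEP = unchanged); an
 * instruction states the minimum levels it needs (0 = no requirement). */
struct item {
    int is_directive;
    struct levels lv;
};

static int max_int(int a, int b) { return a > b ? a : b; }

/* Per feature: the higher of the level selected by the last applicable
 * directive and the level required by the instructions in the file. */
static struct levels compute_attrs(const struct item *items, size_t n,
                                   struct levels selected)
{
    struct levels need = { 0, 0, 0 };
    for (size_t i = 0; i < n; i++) {
        const struct levels *lv = &items[i].lv;
        if (items[i].is_directive) {
            if (lv->cpu != KEEP)  selected.cpu = lv->cpu;
            if (lv->fp != KEEP)   selected.fp = lv->fp;
            if (lv->simd != KEEP) selected.simd = lv->simd;
        } else {
            need.cpu  = max_int(need.cpu, lv->cpu);
            need.fp   = max_int(need.fp, lv->fp);
            need.simd = max_int(need.simd, lv->simd);
        }
    }
    struct levels out = {
        max_int(selected.cpu, need.cpu),
        max_int(selected.fp, need.fp),
        max_int(selected.simd, need.simd),
    };
    return out;
}
```

Feeding it the five lines of the first example (.arch armv7-a, .fpu neon, veor, .fpu vfp, .arch armv4t) yields v4T, VFPv2 and NEONv1, the same three values fromelf reports.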
However, the assembler checks instruction validity line by line, so this:
.arch armv7-a
.fpu neon
veor d0, d1, d2
.fpu vfp
veor d0, d1, d2
.arch armv4t
gives an assembler error, which is sort of what we expect/want:
tst.s: Assembler messages:
tst.s:5: Error: selected processor does not support ARM mode `veor d0,d1,d2'
note - only the second veor causes the error here, because NEON instructions are no longer permitted after the ".fpu vfp" directive.
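The line-by-line check can be modelled the same way. Again this is a hypothetical sketch, not gas internals: it tracks only whether NEON is enabled, treats veor as the only NEON mnemonic, assumes any .fpu other than "neon" (such as vfp) switches NEON off, and check_lines() returns the 1-based line number of the first rejected instruction, or 0 if every line is accepted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical line-by-line checker that tracks only whether NEON is enabled. */
static size_t check_lines(const char *const *lines, size_t n, int neon_enabled)
{
    for (size_t i = 0; i < n; i++) {
        const char *l = lines[i];
        if (strcmp(l, ".fpu neon") == 0) {
            neon_enabled = 1;          /* NEON mnemonics allowed from here on */
        } else if (strncmp(l, ".fpu ", 5) == 0) {
            neon_enabled = 0;          /* in this model, any other .fpu drops NEON */
        } else if (strncmp(l, "veor ", 5) == 0 && !neon_enabled) {
            return i + 1;              /* report the offending line, like tst.s:5 */
        }
    }
    return 0;                          /* every line accepted */
}
```

Run over the six-line example, it accepts the first veor and reports line 5; run over the five-line example without the second veor, it accepts everything, even though that object still ends up marked NEONv1.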
While these tricks might be of some use in practice, I'd be cautious about relying on them.
...so...
> What's the best way to handle this? Some options are:
> * Add '.fpu neon' directives to the start of any inline assembly
May work for now, but probably not a great idea, as above; plus there is no easy way to restore the correct architecture afterwards.
> * Separate out the features, so you can specify the capabilities with one option and restrict the compiler to a subset with another. Something like '-mfpu=neon -mfpu-tune=vfpv3-d16'
Could work, but might be controversial. I guess it's for the toolchain guys to comment.
> * Relax the assembler so that any instructions are accepted. We'd lose some checking of GCC's output though.
Likely to be controversial? This could be a straightforward fix, but it certainly should never be the default behaviour. And the question of what architecture version requirement attributes get written into the resulting object remains.
What I'd really like on my wishlist is to be able to write something like:
.pusharch
.arch armv7-a
.fpu neon
/* fancy stuff */
.poparch
Where .poparch restores whatever architecture version was in force before .pusharch, and everything between the outermost .pusharch ... .poparch pair is ignored for the purpose of setting the attributes on the object file. This should be safe to use inside inline asm, and appears to fit well with the linux kernel use case and with what you're trying to do.
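As a rough illustration of the proposed semantics, here is a hypothetical C model of .pusharch/.poparch. These are proposed directives, so every name below is made up; the model tracks only the NEON level, uses an arbitrary fixed nesting limit of 16, validates instructions inside a pair against the temporary selection, and counts only top-level instructions toward the object's attributes.

```c
#include <stddef.h>

enum { SIMD_NONE = 0, SIMD_NEONV1 = 1 };

#define MAX_DEPTH 16   /* arbitrary nesting limit for this sketch */

/* Hypothetical assembler state; only the NEON level is tracked. */
struct assembler {
    int cur_simd;               /* level selected by the current .fpu */
    int saved[MAX_DEPTH];       /* stack filled by .pusharch */
    size_t depth;               /* current .pusharch nesting depth */
    int attr_simd;              /* NEON level needed by top-level instructions */
};

/* .pusharch: save the current selection. Returns -1 on overflow. */
static int push_arch(struct assembler *as)
{
    if (as->depth == MAX_DEPTH)
        return -1;
    as->saved[as->depth++] = as->cur_simd;
    return 0;
}

/* .poparch: restore the selection saved by the matching .pusharch. */
static int pop_arch(struct assembler *as)
{
    if (as->depth == 0)
        return -1;                      /* unbalanced .poparch */
    as->cur_simd = as->saved[--as->depth];
    return 0;
}

/* .fpu: change the selection (only until the next .poparch, if nested). */
static void set_fpu(struct assembler *as, int simd)
{
    as->cur_simd = simd;
}

/* Validate one instruction against the current selection. Only instructions
 * outside every .pusharch/.poparch pair count toward the object's attributes. */
static int emit_insn(struct assembler *as, int simd_needed)
{
    if (simd_needed > as->cur_simd)
        return -1;                      /* "selected processor does not support" */
    if (as->depth == 0 && simd_needed > as->attr_simd)
        as->attr_simd = simd_needed;
    return 0;
}

/* Attribute written to the object: top-level selection vs top-level needs.
 * Meant to be called once every .pusharch has been closed. */
static int final_simd_attr(const struct assembler *as)
{
    return as->cur_simd > as->attr_simd ? as->cur_simd : as->attr_simd;
}
```

Walking the wishlist example through a file assembled for VFP only: the NEON instruction inside the pair is accepted, .poparch restores the VFP-only selection so a later top-level NEON instruction is rejected again, and the object is still marked as not needing NEON.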
That doesn't feel like rocket science, but then I'm not a toolchain hacker ;)
Cheers
---Dave