Re: Using inline NEON code

3 Dec 2010


Hi,
On Thu, Dec 2, 2010 at 9:49 PM, Michael Hope michael.hope@linaro.org wrote:
...
Hi there.  Currently you can't use NEON instructions in inline
assembly if the compiler is set to -mfpu=vfp such as Ubuntu's
-mfpu=vfpv3-d16.  Trying code like this:
int main()
{
  asm("veor d1, d2, d3");
  return 0;
}
gives an error message like:
test.s: Assembler messages:
test.s:29: Error: selected processor does not support Thumb mode `veor d1,d2,d3'
The problem is that -mfpu=vfpv3-d16 has two jobs:  it tells the
compiler what instructions to use, and also tells the assembler what
instructions are valid.  We might want the compiler to use the VFP for
compatibility or power reasons, but still be able to use NEON
instructions in inline assembler without passing extra flags.
We came across a similar case in the kernel just recently... and it's
likely to recur as we try to move toward more unified kernels.
The problem is that the toolchain considers:
a) the architecture baseline _needed_ by an object, and the
architectural features it _may use_ to be one and the same thing.
This is true for C code, but not universally true for assembler
(either inline or not).
b) the architecture baseline _needed_ by the output of the linker to
be the union of all the architecture baselines _needed_ by all the
individual objects linked together.  This is not necessarily true even
for C code.
These conservative assumptions only really support the
fixed-configuration use case; they don't accommodate the concept of
run-time adaptation to CPU features.
For background on the kernel discussion, see this thread -- you'll
have to follow it a bit:
http://ns.spinics.net/lists/arm-kernel/msg105325.html
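To make "run-time adaptation to CPU features" concrete, here is a minimal sketch of the pattern: the object is built for the lowest baseline it _needs_, and a routine for a feature it _may use_ is chosen at run time. It assumes 32-bit ARM Linux, where NEON is reported in bit 12 of AT_HWCAP, and a C library that provides getauxval(). The names select_xor, read_hwcap and xor_neon_placeholder are made up for illustration, and the "NEON" routine is only a stand-in so that the sketch builds on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#endif

/* Assumption: NEON is bit 12 of AT_HWCAP on 32-bit ARM Linux. */
#define SKETCH_HWCAP_NEON (1UL << 12)

typedef void (*xor_fn)(uint8_t *dst, const uint8_t *a,
                       const uint8_t *b, size_t n);

/* Baseline routine: valid on every CPU the object may run on. */
void xor_generic(uint8_t *dst, const uint8_t *a,
                 const uint8_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] ^ b[i];
}

/* Stand-in for a routine written with NEON inline asm. It simply
 * calls the baseline here so the sketch compiles everywhere. */
void xor_neon_placeholder(uint8_t *dst, const uint8_t *a,
                          const uint8_t *b, size_t n)
{
    xor_generic(dst, a, b, n);
}

/* Pick an implementation from the capability bits of the running CPU. */
xor_fn select_xor(unsigned long hwcap)
{
    return (hwcap & SKETCH_HWCAP_NEON) ? xor_neon_placeholder
                                       : xor_generic;
}

/* Read the capability bits; report "no optional features" elsewhere. */
unsigned long read_hwcap(void)
{
#if defined(__linux__) && defined(__arm__)
    return getauxval(AT_HWCAP);
#else
    return 0;
#endif
}
```

A caller would do something like select_xor(read_hwcap())(dst, a, b, n). The point is that the object can contain NEON code while only _needing_ the VFP baseline, which is exactly the distinction the toolchain does not currently make.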
...
Inserting ".fpu neon" to the start of the inline assembly fixes the
problem.  Is this valid?  Are assembly files with multiple .fpu
statements allowed?  Passing '-Wa,-mfpu=neon' to GCC doesn't work as
gas seems to ignore the second -mfpu.
Strictly speaking, no, because many many points of gas behaviour are
not specified and there's no definition of what should happen if there
are multiple conflicting .arch or .fpu directives.  We'd need a
toolchain expert to pass judgement on this.
Also, changing the arch part way through the file means you're no
longer protected: incorrect code generation by the compiler from that
point onwards may not be detected if it occurs.
Worse, if there were to be a "neonv2" in the future, you would now
unexpectedly downgrade the architecture halfway through the file, so
the assembler may barf on subsequent compiler-generated code... so to
avoid future maintenance problems, a way to restore the "true"
architecture is definitely needed.
In principle, you could change and restore the architecture with the
help of some build system hacks:
gcc -DASM_DEFAULT_ARCH='".arch $(ARCH_VERSION); .fpu $(FP_ARCH_VERSION);"'
asm(
        ".fpu neon\n\t"
        "veor d0, d1, d2\n\t"
        ASM_DEFAULT_ARCH
);
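The hack above relies on C's concatenation of adjacent string literals: the macro expands to a string that is glued onto the end of the asm template. A small host-side sketch of that mechanism, which only builds the template string and does not assemble anything, might look like this. The fallback value of ASM_DEFAULT_ARCH and the NEON_ASM wrapper are assumptions for illustration; the real values would come from the build system.

```c
#include <assert.h>
#include <string.h>

/* Normally passed by the build system with -DASM_DEFAULT_ARCH=...
 * The fallback values below are assumptions for this sketch. */
#ifndef ASM_DEFAULT_ARCH
#define ASM_DEFAULT_ARCH ".arch armv7-a; .fpu vfpv3-d16;"
#endif

/* Hypothetical wrapper: enable NEON, emit the body, then restore the
 * default architecture. Adjacent string literals are concatenated by
 * the compiler into one asm template. */
#define NEON_ASM(body) ".fpu neon\n\t" body "\n\t" ASM_DEFAULT_ARCH

/* The template that would be handed to asm(). */
const char *neon_template(void)
{
    return NEON_ASM("veor d0, d1, d2");
}

/* Check that a template ends by restoring the default architecture. */
int restores_default(const char *tmpl)
{
    size_t len, tail = strlen(ASM_DEFAULT_ARCH);

    assert(tmpl != NULL);
    len = strlen(tmpl);
    return len >= tail && strcmp(tmpl + len - tail, ASM_DEFAULT_ARCH) == 0;
}
```

In real use the macro would appear as asm(NEON_ASM("veor d0, d1, d2")); and every such block would depend on the build system defining ASM_DEFAULT_ARCH correctly.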
This doesn't sit well with the Debian/Ubuntu way of building things
where we have to build options into the compiler as defaults for there
to be any hope of them taking effect ... because of the way package
build systems clobber CFLAGS/CPPFLAGS all over the place and in
practice can't be overridden globally.  So, you'd have to tweak the
build scripts for each affected package.
Also, the architecture feature requirements put in the object can look
a bit weird, presumably because the ".fpu" directive is overloaded
to describe two different architectural features (VFP and NEON).
If I do this:
        .arch armv7-a
        .fpu neon
        veor d0, d1, d2
        .fpu vfp
        .arch armv4t
then fromelf lists the following attributes for the object:
Attribute Section: aeabi
File Attributes
  Tag_CPU_name: "4T"
  Tag_CPU_arch: v4T
  Tag_ARM_ISA_use: Yes
  Tag_THUMB_ISA_use: Thumb-1
  Tag_FP_arch: VFPv2
  Tag_Advanced_SIMD_arch: NEONv1
  Tag_DIV_use: Not allowed
i.e., the baseline for each architectural feature is whatever the last
applicable .arch or .fpu directive in the file specified, or the arch
required by the instructions present in the file, whichever is the
higher.
However, the assembler checks instruction validity line by line, so this:
        .arch armv7-a
        .fpu neon
        veor d0, d1, d2
        .fpu vfp
        veor d0, d1, d2
        .arch armv4t
gives an assembler error, which is sort of what we expect/want:
tst.s: Assembler messages:
tst.s:5: Error: selected processor does not support ARM mode `veor d0,d1,d2'
Note that only the second veor causes the error here, because NEON
instructions are no longer permitted after the ".fpu vfp" directive.
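The two behaviours observed above (instructions checked against the directive currently in force, attributes taken as the higher of the last directive and what the instructions used) can be captured in a toy model. This is only a sketch of the rules as I've described them, restricted to the SIMD feature; it is not how gas is actually implemented, and all the names in it are made up.

```c
#include <assert.h>
#include <stddef.h>

/* SIMD levels, ordered so that a higher value means more capable. */
enum simd_level { SIMD_NONE = 0, SIMD_NEONV1 = 1 };

enum line_kind {
    LINE_OTHER,    /* e.g. a .arch directive: no effect on SIMD here */
    LINE_SET_FPU,  /* a .fpu directive selecting 'level' */
    LINE_INSN      /* an instruction requiring 'level' */
};

struct line {
    enum line_kind kind;
    enum simd_level level;
};

struct result {
    int error_line;            /* 1-based line of first error, 0 if none */
    enum simd_level attribute; /* SIMD level recorded in the object */
};

struct result assemble(const struct line *src, size_t n)
{
    enum simd_level current = SIMD_NONE; /* last .fpu directive seen */
    enum simd_level used = SIMD_NONE;    /* highest level any insn needed */
    struct result r = { 0, SIMD_NONE };

    assert(src != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (src[i].kind == LINE_SET_FPU) {
            current = src[i].level;
        } else if (src[i].kind == LINE_INSN) {
            if (src[i].level > current) {
                /* Checked against the directive in force on this line. */
                r.error_line = (int)i + 1;
                return r;
            }
            if (src[i].level > used)
                used = src[i].level;
        }
    }
    /* Attribute: last directive or instructions used, whichever is higher. */
    r.attribute = current > used ? current : used;
    return r;
}
```

Feeding the first listing through this model gives no error and a NEONv1 attribute despite the trailing ".fpu vfp"; the second listing is rejected at line 5, matching the tst.s:5 error.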
While these tricks might be of some use in practice, I'd be cautious
about relying on them.
...so...
...
What's the best way to handle this? Some options are:
 * Add '.fpu neon' directives to the start of any inline assembly
May work for now, but probably not a great idea, as above; plus there
is no easy way to restore the correct architecture afterwards.
...
* Separate out the features, so you can specify the capabilities with
one option and restrict the compiler to a subset with another.
Something like '-mfpu=neon -mfpu-tune=vfpv3-d16'
Could work, but might be controversial.  I guess it's for toolchain
guys to comment.
...
* Relax the assembler so that any instructions are accepted.  We'd
lose some checking of GCC's output though.
Likely to be controversial? -- this could be a straightforward fix,
but it certainly should never be the default behaviour.  And the
question of what architecture version requirement attributes get
written into the resulting object remains.
What I'd really like on my wishlist is to be able to write something like:
        .pusharch
        .arch armv7-a
        .fpu neon
            /* fancy stuff */
        .poparch
Where .poparch restores whatever architecture version was in force
before .pusharch, and everything between the outermost .pusharch ...
.poparch pair is ignored for the purpose of setting the attributes on
the object file.  This should be safe to use inside inline asm, and
appears to fit well with the linux kernel use case and with what
you're trying to do.
That doesn't feel like rocket science, but then I'm not a toolchain hacker ;)
Cheers
---Dave
