On May 10, 2020 4:59:17 AM PDT, David Laight <David.Laight@ACULAB.COM> wrote:
From: Peter Anvin
Sent: 08 May 2020 18:32
On 2020-05-08 10:21, Nick Desaulniers wrote:
One last suggestion. Add the "b" modifier to the mask operand: "orb %b1, %0". That forces the compiler to use the 8-bit register name instead of trying to deduce the width from the input.
Ah right:
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#x86Operandmodifiers
Looks like that works for both compilers. In that case, we can likely drop the `& 0xff`, too. Let me play with that, then I'll hopefully send a v3 today.
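A minimal userspace sketch of the idea, assuming x86 with GCC or Clang. `sketch_set_bit()`, `SKETCH_BYTE_ADDR()` and `SKETCH_CONST_MASK()` are hypothetical names, not the kernel's actual helpers. The split on `__builtin_constant_p()` is an assumption modeled on the constant and variable paths discussed in this thread, and the `__atomic_fetch_or()` fallback exists only so the sketch also builds on other architectures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins, NOT the kernel's real macros. */
#define SKETCH_BYTE_ADDR(nr, addr) ((volatile unsigned char *)(addr) + ((nr) >> 3))
#define SKETCH_CONST_MASK(nr)      (1 << ((nr) & 7))   /* plain int, no "& 0xff" */
#define SKETCH_BITS_PER_LONG       (8 * (long)sizeof(unsigned long))

/* Atomically set bit nr (assumed >= 0) in the bitmap starting at addr. */
static inline void sketch_set_bit(long nr, volatile unsigned long *addr)
{
    /* The base address must be aligned; the derived byte address need not be. */
    assert(((uintptr_t)addr % sizeof(unsigned long)) == 0);
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_constant_p(nr)) {
        /* Byte-sized locked RMW. "%b1" makes the compiler print the 8-bit
           register name (or a bare immediate), so an int-typed mask works. */
        __asm__ __volatile__("lock orb %b1, %0"
                             : "+m" (*SKETCH_BYTE_ADDR(nr, addr))
                             : "iq" (SKETCH_CONST_MASK(nr))
                             : "memory");
    } else {
        /* Variable bit number: with a register offset, BTS lets the CPU
           add the word offset to the base address itself. */
        __asm__ __volatile__("lock bts %1, %0"
                             : "+m" (*addr)
                             : "r" (nr)
                             : "memory");
    }
#else
    /* Portable fallback so the sketch still builds off x86. */
    (void)__atomic_fetch_or(&addr[nr / SKETCH_BITS_PER_LONG],
                            1UL << (nr % SKETCH_BITS_PER_LONG),
                            __ATOMIC_SEQ_CST);
#endif
}
```

With optimization enabled, a literal bit number typically takes the `orb` path after inlining, while a bit number held in a variable takes the `bts` path; both produce the same bitmap contents.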
Good idea. I requested a while ago that they document these modifiers; they chose not to document them all, which in some ways is good: it shows what they are willing to commit to indefinitely.
I thought the intention here was to explicitly do a byte access. If the constant bit number has had a div/mod by 8 done on it, then the address can be misaligned, so you mustn't do a non-byte-sized locked access.
OTOH the original base address must be aligned.
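The div/mod-by-8 split can be illustrated in plain C. `bit_to_byte()` and `access_is_aligned()` are hypothetical helpers, and the sketch assumes an aligned base address: the byte offset it produces is always fine for a 1-byte access, but generally not for a 4-byte one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte that holds bit nr, and the single-bit mask inside that byte. */
struct byte_pos {
    size_t offset;   /* byte offset from the (aligned) base address */
    uint8_t mask;
};

static struct byte_pos bit_to_byte(unsigned long nr)
{
    /* div by 8 selects the byte, mod by 8 selects the bit within it */
    struct byte_pos p = { nr >> 3, (uint8_t)(1u << (nr & 7)) };
    return p;
}

/* Is a width-byte access at (aligned base + offset) naturally aligned? */
static int access_is_aligned(size_t offset, size_t width)
{
    assert(width != 0 && (width & (width - 1)) == 0);  /* power of two */
    return (offset & (width - 1)) == 0;
}
```

For example, bit 13 lives in byte 1 with mask `0x20`; a byte access there is fine, but a 4-byte locked access at that address would be misaligned.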
Looking at some instruction timings, BTS/BTR aren't too bad if the bit number is a constant, but they are 6 or 7 clocks slower if it is in %cl. Given these are locked RMW bus cycles, they'll always be slow!
How about a multi-part asm alternative that uses a byte offset and byte constant if the compiler thinks the mask is constant, or a 4-byte offset and 32-bit mask if it doesn't?
The other alternative is to just use BTS/BTR and (maybe) rely on the assembler to add the word offset to the base address.
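The 4-byte-offset/32-bit-mask variant, and what BTS does when the bit number is in a register, can be modeled in C. `bts32_model()` is a hypothetical, non-atomic illustration assuming a 32-bit operand and a non-negative bit number. With an immediate bit number, that word offset would instead have to be folded into the address displacement, which is the assembler's job in the suggestion above.

```c
#include <assert.h>
#include <stdint.h>

/* Non-atomic model of "bts %reg, mem" with a 32-bit operand: the dword
   offset (nr >> 5) is added to the base address, then bit (nr & 31) of
   that dword is tested and set. Assumes nr >= 0. */
static int bts32_model(uint32_t *base, long nr)
{
    assert(nr >= 0);
    uint32_t *word = base + (nr >> 5);          /* 4-byte word offset */
    uint32_t mask = UINT32_C(1) << (nr & 31);   /* 32-bit mask */
    int old = (*word & mask) != 0;              /* BTS reports this in CF */
    *word |= mask;
    return old;
}
```

Here bit 40 selects the second dword (offset 4 bytes) and bit 8 within it; calling it twice returns the old bit value 0 and then 1.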
David
I don't understand what you are getting at here.
The intent is to do a byte access. The "multi-part asm" you are talking about is also already there...