On September 1, 2020 9:18:57 AM PDT, Nadav Amit <nadav.amit@gmail.com> wrote:
From: Nadav Amit <namit@vmware.com>
The __force_order logic seems to be inverted. __force_order is supposedly used to get the compiler's memory-dependency analysis to enforce ordering between CR writes and CR reads. The memory should therefore behave like the CR itself: when the CR is read, the memory should be "read" by the inline assembly, so __force_order should be an input; when the CR is written, the memory should be "written", so __force_order should be an output.
This change should allow the "volatile" qualifier to be removed from CR reads in a later patch.
While at it, remove the extra newline from the inline assembly, as it only skews GCC's estimate of the inline assembly's cost.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@vmware.com>
---
Unless I misunderstand the logic, __force_order should also be used by rdpkru() and wrpkru(), which currently have no dependency on __force_order. I also did not understand why native_write_cr0() has a read/write dependency on __force_order, or why native_write_cr4() no longer has any dependency on __force_order.
arch/x86/include/asm/special_insns.h | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 5999b0b3dd4a..dff5e5b01a3c 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -24,32 +24,32 @@ void native_write_cr0(unsigned long val);
 static inline unsigned long native_read_cr0(void)
 {
 	unsigned long val;
-	asm volatile("mov %%cr0,%0\n\t" : "=r" (val), "=m" (__force_order));
+	asm volatile("mov %%cr0,%0" : "=r" (val) : "m" (__force_order));
 	return val;
 }
 
 static __always_inline unsigned long native_read_cr2(void)
 {
 	unsigned long val;
-	asm volatile("mov %%cr2,%0\n\t" : "=r" (val), "=m" (__force_order));
+	asm volatile("mov %%cr2,%0" : "=r" (val) : "m" (__force_order));
 	return val;
 }
 
 static __always_inline void native_write_cr2(unsigned long val)
 {
-	asm volatile("mov %0,%%cr2": : "r" (val), "m" (__force_order));
+	asm volatile("mov %1,%%cr2" : "=m" (__force_order) : "r" (val));
 }
 
 static inline unsigned long __native_read_cr3(void)
 {
 	unsigned long val;
-	asm volatile("mov %%cr3,%0\n\t" : "=r" (val), "=m" (__force_order));
+	asm volatile("mov %%cr3,%0" : "=r" (val) : "m" (__force_order));
 	return val;
 }
 
 static inline void native_write_cr3(unsigned long val)
 {
-	asm volatile("mov %0,%%cr3": : "r" (val), "m" (__force_order));
+	asm volatile("mov %1,%%cr3" : "=m" (__force_order) : "r" (val));
 }
 
 static inline unsigned long native_read_cr4(void)
@@ -64,10 +64,10 @@ static inline unsigned long native_read_cr4(void)
 	asm volatile("1: mov %%cr4, %0\n"
 		     "2:\n"
 		     _ASM_EXTABLE(1b, 2b)
-		     : "=r" (val), "=m" (__force_order) : "0" (0));
+		     : "=r" (val) : "m" (__force_order), "0" (0));
 #else
 	/* CR4 always exists on x86_64. */
-	asm volatile("mov %%cr4,%0\n\t" : "=r" (val), "=m" (__force_order));
+	asm volatile("mov %%cr4,%0" : "=r" (val) : "m" (__force_order));
 #endif
 	return val;
 }
This seems reasonable to me, and I fully agree with you that the logic seems to be just plain wrong (and has gone unnoticed since all volatile operations are strictly ordered with respect to one another anyway). But you had better not remove the volatile from the CR2 read: unlike the other CRs, CR2 is written by hardware (on a page fault), and so it genuinely is volatile.
However, I do believe "=m" is at least theoretically wrong: you are only writing *part* of the total state represented by this token (you can think of it as equivalent to a bitop), so it should be "+m".