2025-03-14T14:39:29-07:00, Deepak Gupta debug@rivosinc.com:
As discussed extensively in the changelog for the addition of this syscall on x86 ("x86/shstk: Introduce map_shadow_stack syscall") the existing mmap() and madvise() syscalls do not map entirely well onto the security requirements for shadow stack memory since they lead to windows where memory is allocated but not yet protected or stacks which are not properly and safely initialised. Instead a new syscall map_shadow_stack() has been defined which allocates and initialises a shadow stack page.
This patch implements this syscall for riscv. riscv doesn't require token to be setup by kernel because user mode can do that by itself. However to provide compatibility and portability with other architectues, user mode can specify token set flag.
RISC-V shadow stack could use mmap() and madvise() perfectly well. Userspace can always initialize the shadow stack properly and the shadow stack memory is never protected from other malicious threads.
I think that the compatibility argument is reasonable. We'd need to modify the other syscalls to allow a write-only mapping anyway.
diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c +static noinline unsigned long amo_user_shstk(unsigned long *addr, unsigned long val) +{
- /*
* Never expect -1 on shadow stack. Expect return addresses and zero
*/
- unsigned long swap = -1;
- __enable_user_access();
- asm goto(
".option push\n"
".option arch, +zicfiss\n"
Shouldn't compiler accept ssamoswap.d opcode even without zicfiss arch?
"1: ssamoswap.d %[swap], %[val], %[addr]\n"
_ASM_EXTABLE(1b, %l[fault])
RISCV_ACQUIRE_BARRIER
Why is the barrier here?
".option pop\n"
: [swap] "=r" (swap), [addr] "+A" (*addr)
: [val] "r" (val)
: "memory"
: fault
);
- __disable_user_access();
- return swap;
+fault:
- __disable_user_access();
- return -1;
I think we should return 0 and -EFAULT. We can ignore the swapped value, or return it through a pointer.
+}
+static unsigned long allocate_shadow_stack(unsigned long addr, unsigned long size,
unsigned long token_offset, bool set_tok)
+{
- int flags = MAP_ANONYMOUS | MAP_PRIVATE;
Is MAP_GROWSDOWN pointless?
- struct mm_struct *mm = current->mm;
- unsigned long populate, tok_loc = 0;
- if (addr)
flags |= MAP_FIXED_NOREPLACE;
- mmap_write_lock(mm);
- addr = do_mmap(NULL, addr, size, PROT_READ, flags,
PROT_READ implies VM_READ, so won't this select PAGE_COPY in the protection_map instead of PAGE_SHADOWSTACK?
Wouldn't avoiding VM_READ also allow us to get rid of the ugly hack in pte_mkwrite? (VM_WRITE would naturally select the right XWR flags.)
VM_SHADOW_STACK | VM_WRITE, 0, &populate, NULL);
- mmap_write_unlock(mm);
+SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags) +{ [...]
- if (addr && (addr & (PAGE_SIZE - 1)))
if (!PAGE_ALIGNED(addr))