On Thu, Aug 15, 2024 at 2:18 AM Edgecombe, Rick P rick.p.edgecombe@intel.com wrote:
On Thu, 2024-08-08 at 09:15 +0100, Mark Brown wrote:
if (access_remote_vm(mm, addr, &val, sizeof(val),
FOLL_FORCE | FOLL_WRITE) != sizeof(val))
goto out;
The GUPs still seem a bit unfortunate for a couple reasons:
- We could do a CMPXCHG version and are just not (I see ARM has identical code
in gcs_consume_token()). It's not the only race like this though FWIW.
- I *think* this is the only unprivileged FOLL_FORCE that can write to the
current process in the kernel. As is, it could be used on normal RO mappings, at least in a limited way. Maybe another point for the VMA check. We'd want to check that it is normal shadow stack?
Yeah, having a FOLL_FORCE write in clone3 would be a weakness for userspace CFI and probably make it possible to violate mseal() restrictions that are supposed to enforce that address space regions are read-only.
- Lingering doubts about the wisdom of doing GUPs during task creation.
I don't think they are show stoppers, but the VMA check would be nice to have in the first upstream support.
[...]
+static void shstk_post_fork(struct task_struct *p,
struct kernel_clone_args *args)
+{
if (!IS_ENABLED(CONFIG_ARCH_HAS_USER_SHADOW_STACK))
return;
if (!args->shadow_stack)
return;
if (arch_shstk_post_fork(p, args) != 0)
force_sig_fault_to_task(SIGSEGV, SEGV_CPERR, NULL, p);
+}
Hmm, is this forcing the signal on the new task, which is set up on a user provided shadow stack that failed the token check? It would handle the signal with an arbitrary SSP then I think. We should probably fail the clone call in the parent instead, which can be done by doing the work in copy_process(). Do you see a problem with doing it at the end of copy_process()? I don't know if there could be ordering constraints.
FWIW I think we have things like force_fatal_sig() and force_exit_sig() to send signals that userspace can't catch with signal handlers - if you have to do the copying after the new task has been set up, something along those lines might be the right way to kill the child.
Though, did anyone in the thread yet suggest that you could do this before the child process has fully materialized but after the child MM has been set up? Somewhere in copy_process() between copy_mm() and the "/* No more failure paths after this point. */" comment?