On Fri, Aug 16, 2024 at 05:52:20PM +0200, Jann Horn wrote:
> As a heads-up so you don't get surprised by this in the future:
> Because clone3() does not pass the flags in a register like clone() does, it is not available in places like docker containers that use the default Docker seccomp policy (https://github.com/moby/moby/blob/master/profiles/seccomp/default.json). Docker uses seccomp to filter clone() arguments (to prevent stuff like namespace creation), and that's not possible with clone3(), so clone3() is blocked.

This is probably fine: the existing shadow stack ABI provides a sensible default behaviour for things that just use regular clone(). This series just adds more control for things using clone3(); the main issue would be anything that *needs* to specify shadow stack size/placement and can't use clone3(). That would need a separate userspace API if required, and we'd still want to extend clone3() anyway.
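
For illustration only, here is a rough sketch of the pattern userspace tends to end up with in such environments: try clone3() first, and fall back to a plain fork() (which gets the default shadow stack behaviour) when a seccomp policy rejects the syscall. The names demo_clone_args and demo_spawn are made up for this example, the struct is a local copy of only the first 64-byte version of the kernel's struct clone_args, and the assumption that a filtered clone3() fails with ENOSYS or EPERM depends on the policy in use. Note that the flags live in memory behind a pointer, which is why a seccomp filter can't inspect them.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_clone3
#define SYS_clone3 435 /* clone3() uses the same number on all architectures */
#endif

/*
 * Hypothetical local copy of the original (64 byte) layout of the
 * kernel's struct clone_args, so the sketch does not depend on the
 * installed kernel headers.
 */
struct demo_clone_args {
	uint64_t flags;       /* seccomp only sees the pointer, not this field */
	uint64_t pidfd;
	uint64_t child_tid;
	uint64_t parent_tid;
	uint64_t exit_signal;
	uint64_t stack;
	uint64_t stack_size;
	uint64_t tls;
};

/*
 * Create a fork-like child.  Returns the child PID in the parent, 0 in
 * the child and -1 on error.  *used_clone3 reports which path was taken.
 */
static pid_t demo_spawn(int *used_clone3)
{
	struct demo_clone_args args;
	long ret;

	memset(&args, 0, sizeof(args));
	args.exit_signal = SIGCHLD; /* no flags, no new stack: behaves like fork() */

	ret = syscall(SYS_clone3, &args, sizeof(args));
	if (ret >= 0) {
		*used_clone3 = 1;
		return (pid_t)ret;
	}

	/* Assumption: a seccomp policy blocking clone3() reports one of these. */
	if (errno != ENOSYS && errno != EPERM)
		return -1;

	/* Fall back to the older interface and accept the default behaviour. */
	*used_clone3 = 0;
	return fork();
}
```

A caller would treat the result exactly like fork(): _exit() promptly in the child when demo_spawn() returns 0, and waitpid() on the returned PID in the parent. Anything that needs the extra control this series adds has no equivalent on the fallback path, which is the gap mentioned above.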