4.19-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Herrmann dh.herrmann@gmail.com
commit 7b55851367136b1efd84d98fea81ba57a98304cf upstream.
This changes the fork(2) syscall to record the process start_time after initializing the basic task structure but still before making the new process visible to user-space.
Technically, we could record the start_time anytime during fork(2). But this might lead to scenarios where a start_time is recorded long before a process becomes visible to user-space. For instance, with userfaultfd(2) and TLS, user-space can delay the execution of fork(2) for an indefinite amount of time (and will, if this causes network access, or similar).
By recording the start_time late, it much closer reflects the point in time where the process becomes live and can be observed by other processes.
Lastly, this makes it much harder for user-space to predict and control the start_time they get assigned. Previously, user-space could fork a process and stall it in copy_thread_tls() before its pid is allocated, but after its start_time is recorded. This can be misused to later-on cycle through PIDs and resume the stalled fork(2) yielding a process that has the same pid and start_time as a process that existed before. This can be used to circumvent security systems that identify processes by their pid+start_time combination.
Even though user-space was always aware that start_time recording is flaky (but several projects are known to still rely on start_time-based identification), changing the start_time to be recorded late will help mitigate existing attacks and make it much harder for user-space to control the start_time a process gets assigned.
Reported-by: Jann Horn jannh@google.com Signed-off-by: Tom Gundersen teg@jklm.no Signed-off-by: David Herrmann dh.herrmann@gmail.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/fork.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
--- a/kernel/fork.c +++ b/kernel/fork.c @@ -1784,8 +1784,6 @@ static __latent_entropy struct task_stru
posix_cpu_timers_init(p);
- p->start_time = ktime_get_ns(); - p->real_start_time = ktime_get_boot_ns(); p->io_context = NULL; audit_set_context(p, NULL); cgroup_fork(p); @@ -1950,6 +1948,17 @@ static __latent_entropy struct task_stru goto bad_fork_free_pid;
/* + * From this point on we must avoid any synchronous user-space + * communication until we take the tasklist-lock. In particular, we do + * not want user-space to be able to predict the process start-time by + * stalling fork(2) after we recorded the start_time but before it is + * visible to the system. + */ + + p->start_time = ktime_get_ns(); + p->real_start_time = ktime_get_boot_ns(); + + /* * Make it visible to the rest of the system, but dont wake it up yet. * Need tasklist lock for parent etc handling! */