Re: [PATCH 5.4 32/80] taskstats: Cleanup the use of task->exit_code

25 Feb 2022


      "Dr. Thomas Orgis" thomas.orgis@uni-hamburg.de writes:
...
On Tue, 22 Feb 2022 17:53:12 -0600,
"Eric W. Biederman" ebiederm@xmission.com wrote:
...
How do you figure?
I admit that I am struggling with understanding where exit codes come
from in the non-usual cases. During my taskstats tests, I played with
writing a multithreaded application that does call pthread_exit() in
the main thread (pid==tgid), for example. I slowly had to learn just
how messy this can be …
Is it clearly defined what the exitcode of a task as part of a process
is/should/can mean, as opposed to the process as a whole?
In the code it is clearly defined.  The decoding is exactly the same
as from an entire process and for a single threaded process there is no
difference.
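
A minimal sketch of that decoding, assuming the per-task value (as reported in ac_exitcode) uses the same layout as a wait(2) status word, as the paragraph above states. The struct exit_info type and the decode_exit_code() helper are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <sys/wait.h>

/* Hypothetical container for a decoded exit code. */
struct exit_info {
    int exited;   /* nonzero if the task ended through an exit call */
    int status;   /* exit status, valid when exited is nonzero */
    int signaled; /* nonzero if a signal terminated the task */
    int signo;    /* terminating signal, valid when signaled is nonzero */
};

/* Decode a raw exit code with the standard wait(2) macros. */
static struct exit_info decode_exit_code(unsigned int code)
{
    struct exit_info info = {0, 0, 0, 0};
    int status = (int)code;

    if (WIFEXITED(status)) {
        info.exited = 1;
        info.status = WEXITSTATUS(status);
    } else if (WIFSIGNALED(status)) {
        info.signaled = 1;
        info.signo = WTERMSIG(status);
    }
    /* A code can describe a normal exit or a signal death, never both. */
    assert(!(info.exited && info.signaled));
    return info;
}
```

For example, decode_exit_code(3u << 8) describes a normal exit with status 3, and decode_exit_code(9) describes death by signal 9.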
Linux has two system calls, exit(2) and exit_group(2). If a
thread exits by itself, whatever is passed to exit(2) is the exit code.
What pthread_exit passes to exit(2) I don't know.  I have not been able
to trace glibc that far, and I have not instrumented up a kernel to see.
For threads that are alive when exit_group(2) is called they all get the
same final exit code.
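
The exit_group(2) case can be observed from userspace with a small sketch like the following. It is an illustration under assumptions, not how glibc or pthreads do it: it creates the second task with the raw clone() wrapper and CLONE_THREAD rather than pthread_create(), assumes a downward-growing stack, and uses hypothetical helper names (second_task, status_after_exit_group). The group leader never exits on its own; the code the parent sees comes from the non-leader task's exit_group() call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stack for the extra task; static so it stays valid after clone(). */
static _Alignas(16) char thread_stack[64 * 1024];

/* Runs in the second task: end the whole thread group with the given code. */
static int second_task(void *arg)
{
    syscall(SYS_exit_group, *(int *)arg);
    return 0; /* not reached */
}

/*
 * Fork a child, let a non-leader task in it call exit_group(code), and
 * return the wait status the parent observes, or -1 on error.
 */
static int status_after_exit_group(int code)
{
    int status;
    pid_t pid;

    assert(code >= 0 && code <= 255);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int flags = CLONE_VM | CLONE_FS | CLONE_FILES |
                    CLONE_SIGHAND | CLONE_THREAD;

        /* Assumes the stack grows down, so pass the top of the buffer. */
        if (clone(second_task, thread_stack + sizeof(thread_stack),
                  flags, &code) < 0)
            _exit(99);
        for (;;)
            pause(); /* the leader never calls exit itself */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

Calling status_after_exit_group(7) should yield a status for which WIFEXITED() is true and WEXITSTATUS() is 7, even though the leader itself never passed 7 to anything.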
...
...
For single-threaded processes ac_exitcode would always be reasonable,
and be what userspace passed to exit(3).
Yes. That is the one case where we all know what we are dealing with;-)
...
For multi-threaded processes ac_exitcode before my change was set to
some completely arbitrary value for the thread whose tgid == tid.
Isn't the only place where it really makes sense to set the exitcode
when the last task of the process exits? I guess that was the intention
of the earlier code — with the same wrong assumption that I fell victim
to for quite some time: That the group leader (first task, tgid == pid)
always exits last.
I do not know in which cases group member threads have meaningful exit
codes different from the last one (which is the one returned for the
process in whole … ?). I'd love to see the exact reasoning on how
multithreading got mapped into kernel tasks which used to track only
single-threaded processes before.
The internal model in the kernel is there are tasks (which pthreads are
mapped to in a 1-1 fashion).  These tasks were the original process
abstraction.  In the case of CLONE_THREAD these tasks are glued together
into a POSIX process, with shared signal handling.
So from a kernel standpoint, since a task is basically the original process
abstraction, it is well defined what happens when an individual task
exits.
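
The task/process relationship is visible from userspace. The sketch below assumes a Linux system with /proc mounted; is_group_leader() and count_tasks() are hypothetical helper names. Each task has its own ID (returned by the gettid system call), all tasks of one POSIX process share the thread group ID that getpid() returns, and /proc/<pid>/task lists one entry per task.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Nonzero if the calling task is the thread-group leader (tid == tgid). */
static int is_group_leader(void)
{
    pid_t tid = (pid_t)syscall(SYS_gettid); /* per-task ID */
    pid_t tgid = getpid();                  /* shared by all tasks of the process */

    return tid == tgid;
}

/* Count the tasks of a process by listing /proc/<pid>/task; -1 on error. */
static int count_tasks(pid_t pid)
{
    char path[64];
    struct dirent *entry;
    int count = 0;
    DIR *dir;

    assert(pid > 0);
    snprintf(path, sizeof(path), "/proc/%ld/task", (long)pid);
    dir = opendir(path);
    if (dir == NULL)
        return -1;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] != '.') /* skip "." and ".." */
            count++;
    }
    closedir(dir);
    return count;
}
```

In the initial thread of a program is_group_leader() is true. For a single-threaded process count_tasks(getpid()) is 1, and each additional CLONE_THREAD task adds one entry.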
...
...
With my change the value returned
is at least well defined.
But defined to what?
See above.
...
...
Now maybe it would have been better to flag the bug fix with a version
number.  Unfortunately I did not even realize taskstats had a version
number.  I just know the code made no sense.
Well, fixing a bug that has been there from the beginning (of adding
multithreading, at least) is a significant change that one might want
to know about. And I do think that it fits to thoroughly fix these
issues that relate to identifying threads and processes (the shameless
plug of my taskstats patch that I'm working on since 2018, and only got
right in 2022, finally — I hope), while at that.
It looks like the bug was in commit f3cef7a99469 ("[PATCH] csa: basic
accounting over taskstats") in 2006 in 2.6.19-rc1 when taskstats were
added.  That is long after CLONE_THREAD support was added in the 2.5
development kernel.
I have been working to get a single place that code can look to find the
process exit status.  AKA so that the code can always set
SIGNAL_GROUP_EXIT, and look at signal->group_exit_code.  Fixing this was
just part of sorting out the misconceptions, and I didn't realize there
was anyone who was paying attention and cared.
I will see if I can find some time to give your taskstats patch a
review.
Eric
