On Fri, Oct 11, 2024 at 12:05:55PM +0100, Lorenzo Stoakes wrote:
The means by which a pid is determined from a pidfd is duplicated, with some callers holding a reference to the (pid)fd, and others explicitly pinning the pid.
Introduce __pidfd_get_pid() which abstracts both approaches and provides optional output parameters for file->f_flags and the fd (the latter of which, if provided, prevents the function from decrementing the fd's reference count).
Additionally, allow a pid to be looked up from an fd obtained by opening a /proc/<pid> directory, as utilised by the pidfd_send_signal() system call, providing a pidfd_to_pid_proc() helper function to do so.
Doing this allows us to eliminate open-coded pidfd pid lookup and to consistently handle this in one place.
This lays the groundwork for a subsequent patch which adds a new sentinel pidfd to explicitly reference the current process (i.e. thread group leader) without the need for a pidfd.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
 include/linux/pid.h | 42 +++++++++++++++++++++++++++++++-
 kernel/pid.c        | 58 ++++++++++++++++++++++++++++++---------------
 kernel/signal.c     | 22 ++++-------------
 3 files changed, 84 insertions(+), 38 deletions(-)
diff --git a/include/linux/pid.h b/include/linux/pid.h
index a3aad9b4074c..68b02eab7509 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -2,6 +2,7 @@
 #ifndef _LINUX_PID_H
 #define _LINUX_PID_H
 
+#include <linux/file.h>
 #include <linux/pid_types.h>
 #include <linux/rculist.h>
 #include <linux/rcupdate.h>
@@ -72,8 +73,47 @@ extern struct pid init_struct_pid;
 
 struct file;
 
+/**
+ * __pidfd_get_pid() - Retrieve a pid associated with the specified pidfd.
+ * @pidfd:      The pidfd whose pid we want, or the fd of a /proc/<pid> file if
+ *              @allow_proc is also set.
+ * @pin_pid:    If set, then the reference counter of the returned pid is
+ *              incremented. If not set, then @fd should be provided to pin the
+ *              pidfd.
+ * @allow_proc: If set, then an fd of a /proc/<pid> file can be passed instead
+ *              of a pidfd, and this will be used to determine the pid.
+ * @flags:      Output variable, if non-NULL, then the file->f_flags of the
+ *              pidfd will be set here.
+ * @fd:         Output variable, if non-NULL, then the pidfd reference will
+ *              remain elevated and the caller will need to decrement it
+ *              themselves.
+ * Returns: If successful, the pid associated with the pidfd, otherwise an
+ *          error.
+ */
+struct pid *__pidfd_get_pid(unsigned int pidfd, bool pin_pid,
+			    bool allow_proc, unsigned int *flags,
+			    struct fd *fd);
+
+static inline struct pid *pidfd_get_pid(unsigned int pidfd, unsigned int *flags)
+{
+	return __pidfd_get_pid(pidfd, /* pin_pid = */ true,
+			       /* allow_proc = */ false,
+			       flags, /* fd = */ NULL);
+}
+
+static inline struct pid *pidfd_to_pid_proc(unsigned int pidfd,
+					    unsigned int *flags,
+					    struct fd *fd)
+{
+	return __pidfd_get_pid(pidfd, /* pin_pid = */ false,
+			       /* allow_proc = */ true,
+			       flags, fd);
+}
+
 struct pid *pidfd_pid(const struct file *file);
-struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags);
 struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags);
 int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret);
 void do_notify_pidfd(struct task_struct *task);
diff --git a/kernel/pid.c b/kernel/pid.c
index 2715afb77eab..25cc1c36a1b1 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -36,6 +36,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/init_task.h>
 #include <linux/syscalls.h>
+#include <linux/proc_fs.h>
 #include <linux/proc_ns.h>
 #include <linux/refcount.h>
 #include <linux/anon_inodes.h>
@@ -534,22 +535,46 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
 }
 EXPORT_SYMBOL_GPL(find_ge_pid);
 
-struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
+struct pid *__pidfd_get_pid(unsigned int pidfd, bool pin_pid,
+			    bool allow_proc, unsigned int *flags,
+			    struct fd *fd)

Hm, we should never return a struct fd. A struct fd is an inherently scope-bound concept, or at least aims to be. Simply put, we always want to have the fdget() and the fdput() in the same scope, as the file pointer you can access via fd_file() is only valid as long as we're in the syscall.
Ideally we mostly use CLASS(fd/fd_raw) and nearly never fdget(). The point is that this is the wrong API to expose.
It would probably be wiser if you added a pidfd based fdget() inspired primitive.
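
To make the scoping argument concrete, here is a rough userspace sketch of the pattern. It is not kernel code: every mock_* type and helper and the MOCK_FD_GUARD macro are hypothetical stand-ins invented for illustration, and it relies on the GCC/Clang cleanup attribute, which is (as far as I understand) what the kernel's CLASS() helpers are built on. The lookup takes and drops the file reference within a single scope and hands back only a pinned pid, never the fd itself.

```c
/* Userspace mock only: none of these types or helpers are the kernel's. */
#include <assert.h>
#include <stddef.h>

struct mock_pid  { int nr; int refs; };
struct mock_file { int refs; struct mock_pid *pid; };
struct mock_fd   { struct mock_file *file; };

/* Stand-in for fdget(): take a reference on the file. */
static struct mock_fd mock_fdget(struct mock_file *file)
{
	if (file)
		file->refs++;
	return (struct mock_fd){ .file = file };
}

/* Stand-in for fdput(), shaped as a cleanup handler. */
static void mock_fdput(struct mock_fd *f)
{
	if (f->file)
		f->file->refs--;
}

/* Scope guard: the file reference is dropped when 'name' leaves scope. */
#define MOCK_FD_GUARD(name, file) \
	struct mock_fd name __attribute__((cleanup(mock_fdput))) = mock_fdget(file)

/*
 * Look up the pid behind a mock pidfd. The file reference never escapes
 * this function; the caller only receives a pinned pid (or NULL).
 */
static struct mock_pid *mock_pidfd_get_pid(struct mock_file *file)
{
	MOCK_FD_GUARD(f, file);

	if (!f.file || !f.file->pid)
		return NULL;

	f.file->pid->refs++;	/* pin the pid on behalf of the caller */
	return f.file->pid;	/* the guard drops the file reference on exit */
}

/* The caller drops its pid reference once it is done with the pid. */
static void mock_put_pid(struct mock_pid *pid)
{
	if (pid)
		pid->refs--;
}
```

The caller then only ever balances the pid reference (mock_put_pid() here), and there is no code path on which a file reference can outlive the scope that took it.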