Re: [PATCH v2 28/33] nsfs: support file handles

15 Sep 2025

On Mon, Sep 15, 2025 at 03:25:20PM +0200, Jan Kara wrote:
...
On Fri 12-09-25 13:52:51, Christian Brauner wrote:
...
A while ago we added support for file handles to pidfs so pidfds can be
encoded and decoded as file handles. Userspace has adopted this quickly
and it's proven very useful. Implement file handles for namespaces as
well.

A process is not always able to open /proc/self/ns/. That requires
procfs to be mounted and for /proc/self/ or /proc/self/ns/ to not be
overmounted. However, userspace can always derive a namespace fd from
a pidfd. And that always works for a task's own namespace.

There's no need to introduce unnecessary behavioral differences between
/proc/self/ns/ fds, pidfd-derived namespace fds, and file-handle-derived
namespace fds. So namespace file handles are always decodable if the
caller is located in the namespace the file handle refers to.

This also allows a task to, e.g., store a set of file handles to its
namespaces in a file on disk so it can verify when it gets re-exec'ed
that they're still valid and so on. This is akin to the pidfd use-case.

Or just plainly for namespace comparison, where a file handle to the
task's own namespace can be easily compared against others.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
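
For illustration, the namespace-comparison use case described in the commit message could look roughly like this from userspace. This is only a sketch: fd_to_handle() and handles_equal() are hypothetical helper names, and treating two handles as the same namespace when they match byte for byte is an assumption, not something the patch text spells out. name_to_handle_at() with AT_EMPTY_PATH is the regular glibc interface; on kernels without nsfs file handle support it is expected to fail.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: encode the object behind @fd (e.g. a namespace fd)
 * as a file handle. Returns a malloc()ed handle or NULL on failure.
 */
struct file_handle *fd_to_handle(int fd)
{
	struct file_handle *fh;
	int mount_id;

	fh = calloc(1, sizeof(*fh) + MAX_HANDLE_SZ);
	if (!fh)
		return NULL;
	fh->handle_bytes = MAX_HANDLE_SZ;

	/* An empty path plus AT_EMPTY_PATH means "the fd itself". */
	if (name_to_handle_at(fd, "", fh, &mount_id, AT_EMPTY_PATH) < 0) {
		free(fh);
		return NULL;
	}
	return fh;
}

/*
 * Hypothetical helper: byte-wise comparison of two handles.
 * Assumption: identical type, length and payload mean the same object.
 */
bool handles_equal(const struct file_handle *a, const struct file_handle *b)
{
	assert(a && b);
	return a->handle_type == b->handle_type &&
	       a->handle_bytes == b->handle_bytes &&
	       memcmp(a->f_handle, b->f_handle, a->handle_bytes) == 0;
}
```

A caller would typically run fd_to_handle() on two namespace fds (or on one fd and a handle read back from disk) and pass both results to handles_equal().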
...
...

	switch (ns->ops->type) {
+#ifdef CONFIG_CGROUPS
	case CLONE_NEWCGROUP:
		if (!current_in_namespace(to_cg_ns(ns)))
			owning_ns = to_cg_ns(ns)->user_ns;
		break;
+#endif
+#ifdef CONFIG_IPC_NS
	case CLONE_NEWIPC:
		if (!current_in_namespace(to_ipc_ns(ns)))
			owning_ns = to_ipc_ns(ns)->user_ns;
		break;
+#endif
	case CLONE_NEWNS:
		if (!current_in_namespace(to_mnt_ns(ns)))
			owning_ns = to_mnt_ns(ns)->user_ns;
		break;
+#ifdef CONFIG_NET_NS
	case CLONE_NEWNET:
		if (!current_in_namespace(to_net_ns(ns)))
			owning_ns = to_net_ns(ns)->user_ns;
		break;
+#endif
+#ifdef CONFIG_PID_NS
	case CLONE_NEWPID:
		if (!current_in_namespace(to_pid_ns(ns))) {
			owning_ns = to_pid_ns(ns)->user_ns;
		} else if (!READ_ONCE(to_pid_ns(ns)->child_reaper)) {
			ns->ops->put(ns);
			return ERR_PTR(-EPERM);
		}
		break;
+#endif
+#ifdef CONFIG_TIME_NS
	case CLONE_NEWTIME:
		if (!current_in_namespace(to_time_ns(ns)))
			owning_ns = to_time_ns(ns)->user_ns;
		break;
+#endif
+#ifdef CONFIG_USER_NS
	case CLONE_NEWUSER:
		if (!current_in_namespace(to_user_ns(ns)))
			owning_ns = to_user_ns(ns);
		break;
+#endif
+#ifdef CONFIG_UTS_NS
	case CLONE_NEWUTS:
		if (!current_in_namespace(to_uts_ns(ns)))
			owning_ns = to_uts_ns(ns)->user_ns;
		break;
+#endif

Frankly, switches like these are asking for more _Generic usage ;) But ok
for now.
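
Assuming the remark refers to C11 _Generic selection (the message does not elaborate), the per-type "who owns this namespace" lookup could be folded into a single macro along these lines. This is a userspace sketch with stand-in types; ns_owner() and the owner_of_*() helpers are hypothetical names, not kernel API. Note that _Generic dispatches on the static type at compile time, so it can tidy up the typed helpers but cannot by itself replace a runtime switch on ns->ops->type.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types carrying only the field the sketch needs. */
struct user_ns { struct user_ns *parent; };
struct net_ns  { struct user_ns *user_ns; };
struct uts_ns  { struct user_ns *user_ns; };

static inline struct user_ns *owner_of_net(struct net_ns *ns)
{
	assert(ns);
	return ns->user_ns;
}

static inline struct user_ns *owner_of_uts(struct uts_ns *ns)
{
	assert(ns);
	return ns->user_ns;
}

/* Mirrors the CLONE_NEWUSER arm: the user namespace itself is used. */
static inline struct user_ns *owner_of_user(struct user_ns *ns)
{
	assert(ns);
	return ns;
}

/* Pick the helper from the static type of @ns, then call it. */
#define ns_owner(ns) _Generic((ns),		\
	struct net_ns *:  owner_of_net,		\
	struct uts_ns *:  owner_of_uts,		\
	struct user_ns *: owner_of_user)(ns)
```

With a macro like this, each arm of the switch would reduce to the same two lines once the namespace pointer has been converted to its concrete type.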
...

	default:
		return ERR_PTR(-EOPNOTSUPP);
	}

	if (owning_ns && !ns_capable(owning_ns, CAP_SYS_ADMIN)) {
		ns->ops->put(ns);
		return ERR_PTR(-EPERM);
	}

	/* path_from_stashed() unconditionally consumes the reference. */
	ret = path_from_stashed(&ns->stashed, nsfs_mnt, ns, &path);
	if (ret)
		return ERR_PTR(ret);

	return no_free_ptr(path.dentry);

Ugh, so IMO this is very subtle because we declare
	struct path path __free(path_put)
but then do no_free_ptr(path.dentry). I really had to look up the
implementation of no_free_ptr() to check whether we are leaking the mnt
reference here or not (we are not). But that seems like an implementation
detail we had better not rely on. Wouldn't
	return dget(path.dentry);
be much clearer (and slightly less efficient, I know, but who cares)?

Fine by me as well!
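
To make the subtlety concrete, here is a small userspace model of the two variants discussed above. It is a sketch under stated assumptions: the structs and refcounts are mocks, take_ptr() is a hypothetical stand-in that returns the pointer and sets the variable to NULL (the behaviour the review attributes to no_free_ptr()), and fill_path() merely imitates a helper that hands back one reference on each object. It relies on the GCC/Clang cleanup attribute and statement expressions.

```c
#include <assert.h>
#include <stddef.h>

/* Mock refcounted objects; the names mirror the kernel's but are stand-ins. */
struct dentry   { int refs; };
struct vfsmount { int refs; };
struct path     { struct vfsmount *mnt; struct dentry *dentry; };

static struct dentry *dget(struct dentry *d) { if (d) d->refs++; return d; }
static void dput(struct dentry *d)           { if (d) d->refs--; }
static void mntput(struct vfsmount *m)       { if (m) m->refs--; }

/* Scope-exit handler: drops both references, skipping NULL members. */
static void path_put_cleanup(struct path *p)
{
	dput(p->dentry);
	mntput(p->mnt);
}

/* Hypothetical stand-in: hand out the pointer and NULL the variable. */
#define take_ptr(p) ({ __typeof__(p) val_ = (p); (p) = NULL; val_; })

/* Mock lookup: takes one reference on each object and stores them. */
static void fill_path(struct path *p, struct vfsmount *m, struct dentry *d)
{
	assert(p && m && d);
	m->refs++;
	d->refs++;
	p->mnt = m;
	p->dentry = d;
}

/* Variant 1: steal the dentry reference out of the auto-cleaned path. */
struct dentry *decode_steal(struct vfsmount *m, struct dentry *d)
{
	struct path path __attribute__((cleanup(path_put_cleanup))) = { 0 };

	fill_path(&path, m, d);
	/* path.dentry is now NULL, so cleanup only drops the mnt reference. */
	return take_ptr(path.dentry);
}

/* Variant 2: take an extra reference and let cleanup drop the path's own. */
struct dentry *decode_dget(struct vfsmount *m, struct dentry *d)
{
	struct path path __attribute__((cleanup(path_put_cleanup))) = { 0 };

	fill_path(&path, m, d);
	return dget(path.dentry);
}
```

In this model both variants leave the caller holding exactly one dentry reference and no mount reference. The dget() form costs one extra increment/decrement pair but does not depend on the cleanup handler tolerating a NULL dentry.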
