On Thu, Sep 11, 2025 at 01:36:28PM +0200, Amir Goldstein wrote:
On Thu, Sep 11, 2025 at 11:31 AM Christian Brauner brauner@kernel.org wrote:
On Wed, Sep 10, 2025 at 07:21:22PM +0200, Amir Goldstein wrote:
On Wed, Sep 10, 2025 at 4:39 PM Christian Brauner brauner@kernel.org wrote:
A while ago we added support for file handles to pidfs so pidfds can be encoded and decoded as file handles. Userspace has adopted this quickly and it's proven very useful.
Pidfd file handles are exhaustive, meaning they can be decoded without passing another pidfd to open_by_handle_at() for it to derive the filesystem to decode in.
Implement the exhaustive file handles for namespaces as well.
I think you decided to split the "exhaustive" part into another patch, so better to drop this paragraph?
Yes, good point. I've done that.
I am missing an explanation about the permissions for opening these file handles.
My understanding of the code is that the opener needs to meet one of the conditions:
- user has CAP_SYS_ADMIN in the userns owning the opened namespace
- current task is in the opened namespace
Yes.
But I do not fully understand the rationale behind the 2nd condition, that is, when is it useful?
A caller is always able to open a file descriptor to its own set of namespaces. File handles will behave the same way.
I understand why it's safe, and I do not object to it at all. I just feel that I do not fully understand the use case of how ns file handles are expected to be used. A process can always open /proc/self/ns/mnt. What's the use case where a process may need to open its own ns by handle?
I will explain. For CAP_SYS_ADMIN I can see why keeping handles that do not hold an elevated refcount on the ns object could be useful, in the same way that an NFS client keeps file handles without keeping the file object alive.
But if you do not have CAP_SYS_ADMIN and can only open your own ns by handle, what application could make use of this? And what's the benefit of such an application keeping a file handle instead of an ns fd?
A process is not always able to open /proc/self/ns/. That requires procfs to be mounted and for /proc/self/ or /proc/self/ns/ to not be overmounted. However, they can derive a namespace fd from their own pidfd. And that also always works if it's their own namespace.
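To make the pidfd route concrete, here is a minimal sketch of getting a mount namespace fd from your own pidfd without touching procfs. It assumes a kernel that has the PIDFD_GET_*_NAMESPACE ioctls; the fallback values for SYS_pidfd_open and PIDFD_GET_MNT_NAMESPACE are assumptions copied from recent uapi headers, and own_mnt_ns_fd() is just a name made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older headers. Both values are assumptions taken from
 * recent uapi headers; check them against your own linux/pidfd.h. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef PIDFD_GET_MNT_NAMESPACE
#define PIDFD_GET_MNT_NAMESPACE _IO(0xFF, 3)
#endif

/* Return an fd for the caller's mount namespace without using procfs,
 * or -1 with errno set (e.g. on kernels without the pidfd ioctls). */
static int own_mnt_ns_fd(void)
{
	int pidfd, nsfd, saved;

	pidfd = (int)syscall(SYS_pidfd_open, getpid(), 0);
	if (pidfd < 0)
		return -1;

	nsfd = ioctl(pidfd, PIDFD_GET_MNT_NAMESPACE, 0);
	saved = errno;		/* keep the ioctl's errno across close() */
	close(pidfd);
	errno = saved;
	return nsfd;
}
```

The returned fd can be used like one opened from /proc/self/ns/mnt, e.g. passed to setns() or fstat().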
There's no need to introduce unnecessary behavioral differences between /proc/self/ns/, pidfd-derived namespace fds, and file-handle-derived namespace fds. That's just going to be confusing.
The other thing is that there are legitimate use-cases for encoding your own namespace. For example, you might store file handles to your set of namespaces in a file on disk so that, when you get re-exec'ed, you can verify that they're still valid and so on. This is akin to the pidfd use-case.
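A rough sketch of the encoding half of that, assuming a kernel where name_to_handle_at() is supported on the fd you pass in (for namespace fds that means a kernel with this series applied). encode_handle() is a hypothetical helper name; the resulting buffer is what you would write to disk.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>

/* Encode the object behind fd as a file handle.
 * Returns a malloc()ed handle the caller must free(), or NULL with
 * errno set if the fd is invalid or the filesystem cannot encode it. */
static struct file_handle *encode_handle(int fd)
{
	struct file_handle *fh;
	int mount_id, saved;

	fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
	if (!fh)
		return NULL;

	/* Tell the kernel how much room f_handle[] has. */
	fh->handle_bytes = MAX_HANDLE_SZ;

	/* AT_EMPTY_PATH with "" encodes the fd itself, not a path under it. */
	if (name_to_handle_at(fd, "", fh, &mount_id, AT_EMPTY_PATH) < 0) {
		saved = errno;
		free(fh);
		errno = saved;
		return NULL;
	}
	return fh;
}
```

After a re-exec, the stored handle_type, handle_bytes and f_handle bytes can be read back and handed to open_by_handle_at(); a failure there tells you the namespace is gone or no longer accessible.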
Or just plainly for namespace comparison reasons where you keep a file handle to your own namespaces and can then easily check against others.
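For the comparison case, a minimal sketch: two handles are treated as naming the same namespace if their type, length and payload bytes match. This assumes the kernel encodes a given namespace to the same bytes every time; handles_equal() is a made-up helper name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>

/* Compare two file handles field by field. The payload is opaque, so a
 * byte-wise comparison is all userspace can do with it. */
static bool handles_equal(const struct file_handle *a,
			  const struct file_handle *b)
{
	return a->handle_type == b->handle_type &&
	       a->handle_bytes == b->handle_bytes &&
	       memcmp(a->f_handle, b->f_handle, a->handle_bytes) == 0;
}
```

This needs no fd and no reference on the namespace, which is the point of keeping a handle instead of an ns fd.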