On Mon, Dec 15, 2025 at 09:46:19AM +0100, Jan Kara wrote:
On Sat 13-12-25 02:03:56, Chen Linxuan via B4 Relay wrote:
From: Chen Linxuan <me@black-desk.cn>
When using fsconfig(..., FSCONFIG_CMD_CREATE, ...), the filesystem context is retrieved from the file descriptor. Since the file structure persists across syscall restarts, the context state is preserved:
    // fs/fsopen.c
    SYSCALL_DEFINE5(fsconfig, ...)
    {
        ...
        fc = fd_file(f)->private_data;
        ...
        ret = vfs_fsconfig_locked(fc, cmd, &param);
        ...
    }
In vfs_cmd_create(), the context phase is transitioned to FS_CONTEXT_CREATING before calling vfs_get_tree():
    // fs/fsopen.c
    static int vfs_cmd_create(struct fs_context *fc, bool exclusive)
    {
        ...
        fc->phase = FS_CONTEXT_CREATING;
        ...
        ret = vfs_get_tree(fc);
        ...
    }
However, vfs_get_tree() may return -ERESTARTNOINTR if the filesystem implementation needs to restart the syscall. For example, cgroup v1 does this when it encounters a race condition where the root is dying:
    // kernel/cgroup/cgroup-v1.c
    int cgroup1_get_tree(struct fs_context *fc)
    {
        ...
        if (unlikely(ret > 0)) {
            msleep(10);
            return restart_syscall();
        }
        return ret;
    }
If the syscall is restarted, fsconfig() is called again and retrieves the *same* fs_context. However, vfs_cmd_create() rejects the call because the phase was left as FS_CONTEXT_CREATING during the first attempt:
Well, not quite. The phase is actually set to FS_CONTEXT_FAILED if vfs_get_tree() returns any error. Still, the effect is the same.
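To make the failure mode concrete, here is a minimal user-space sketch of the phase handling discussed above. It is a model under stated assumptions, not the kernel code: every `model_*` / `MODEL_*` name is hypothetical, the `races_left` counter is invented to stand in for the cgroup v1 race, and the -EBUSY return for a context that is not in the parameter-setting phase reflects my reading of fs/fsopen.c rather than anything quoted in this thread.

```c
#include <assert.h>
#include <errno.h>

/* Simplified stand-ins for the fs_context phases (hypothetical names). */
enum model_phase {
	MODEL_CREATE_PARAMS,	/* parameters may still be supplied */
	MODEL_CREATING,		/* the get_tree step is running */
	MODEL_AWAITING_MOUNT,	/* superblock created */
	MODEL_FAILED,		/* an operation failed */
};

/* Kernel-internal restart code; user-space <errno.h> does not define it. */
#define MODEL_ERESTARTNOINTR 513

struct model_fs_context {
	enum model_phase phase;
	int races_left;		/* invented: how often get_tree hits the race */
};

/* Stand-in for a get_tree() that requests a restart, like cgroup1_get_tree(). */
static int model_get_tree(struct model_fs_context *fc)
{
	if (fc->races_left > 0) {
		fc->races_left--;
		return -MODEL_ERESTARTNOINTR;
	}
	return 0;
}

/* Mirrors the phase transitions described for vfs_cmd_create(). */
static int model_cmd_create(struct model_fs_context *fc)
{
	int ret;

	if (fc->phase != MODEL_CREATE_PARAMS)
		return -EBUSY;		/* assumed error code, see lead-in */

	fc->phase = MODEL_CREATING;
	ret = model_get_tree(fc);
	if (ret) {
		fc->phase = MODEL_FAILED;	/* any error leaves the context failed */
		return ret;
	}
	fc->phase = MODEL_AWAITING_MOUNT;
	return 0;
}

/* Models the syscall being re-entered with the same context after a restart. */
static int model_fsconfig_create(struct model_fs_context *fc)
{
	int ret;

	do {
		ret = model_cmd_create(fc);
	} while (ret == -MODEL_ERESTARTNOINTR);
	return ret;
}

/* Usage: one simulated race turns the restarted call into -EBUSY. */
static void model_demo(void)
{
	struct model_fs_context fc = { MODEL_CREATE_PARAMS, 1 };

	assert(model_fsconfig_create(&fc) == -EBUSY);
	assert(fc.phase == MODEL_FAILED);
}
```

In this model the restarted call never reaches the get_tree step a second time, because the context has already left the parameter phase by the time the syscall is re-entered.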
Uh, I'm not sure we should do this. If this only affects cgroup v1, then I say we should simply not care at all. It's a deprecated API: anyone still using it relies on something that is inherently broken, and a big portion of userspace has already migrated. The current or upcoming systemd release has dropped all cgroup v1 support.
Generally, making fsconfig() restartable is not as trivial as it looks, because once you have called into the filesystem, the config that was set up might already have been consumed. That's definitely the case for stuff in overlayfs and others. So no, that patch won't work and, btw, I remembered that we already had that discussion a few years ago and I was right:
https://lore.kernel.org/20200923201958.b27ecda5a1e788fb5f472bcd@virtuozzo.co...
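The "config already consumed" concern can be illustrated with a small user-space sketch. This is purely hypothetical and does not reproduce overlayfs or any other filesystem: the `sketch_*` names, the single `source` option, and the `fail_late` flag are all invented for illustration. The only point it makes is that if a get_tree-style step takes ownership of a parsed option before it fails, a plain retry on the same context finds that option gone.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Kernel-internal restart code; user-space <errno.h> does not define it. */
#define SKETCH_ERESTARTNOINTR 513

struct sketch_fs_context {
	char *source;	/* option stored by an earlier configuration step */
};

struct sketch_super {
	char *source;	/* owns the option once creation has started */
};

/* Hypothetical configuration step: store a copy of the option string. */
static int sketch_set_source(struct sketch_fs_context *fc, const char *val)
{
	size_t len = strlen(val) + 1;
	char *copy = malloc(len);

	if (!copy)
		return -ENOMEM;
	memcpy(copy, val, len);
	free(fc->source);
	fc->source = copy;
	return 0;
}

/*
 * Hypothetical get_tree step: it moves the option out of the context
 * first, and only afterwards may hit a condition that asks for a restart.
 */
static int sketch_get_tree(struct sketch_fs_context *fc,
			   struct sketch_super *sb, int fail_late)
{
	if (!fc->source)
		return -EINVAL;		/* nothing left to build from */

	sb->source = fc->source;	/* ownership moves to the superblock */
	fc->source = NULL;

	if (fail_late) {
		free(sb->source);	/* tear down the half-built state */
		sb->source = NULL;
		return -SKETCH_ERESTARTNOINTR;
	}
	return 0;
}

/* Usage: the retry after a late failure no longer sees the option. */
static void sketch_demo(void)
{
	struct sketch_fs_context fc = { NULL };
	struct sketch_super sb = { NULL };

	assert(sketch_set_source(&fc, "example") == 0);
	assert(sketch_get_tree(&fc, &sb, 1) == -SKETCH_ERESTARTNOINTR);
	assert(sketch_get_tree(&fc, &sb, 0) == -EINVAL);
}
```

Resetting the phase alone would not help in a case like this; the retry would also need the consumed options to be restored or re-supplied.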