Hi,
This patch series is mainly a rebase on top of v5.12-rc3 and a synchronization with the new mount_setattr(2). A light cleanup of hook_sb_delete() and new tests are also included.
The SLOC count is 1329 for security/landlock/ and 2556 for tools/testing/selftest/landlock/ . Test coverage for security/landlock/ is 93.6% of lines. The code not covered only deals with internal kernel errors (e.g. memory allocation) and race conditions. This series is being fuzzed by syzkaller (which may cover internal kernel errors), and patches are on their way: https://github.com/google/syzkaller/pull/2380
The compiled documentation is available here: https://landlock.io/linux-doc/landlock-v30/userspace-api/landlock.html
This series can be applied on top of v5.12-rc3 . This can be tested with CONFIG_SECURITY_LANDLOCK, CONFIG_SAMPLE_LANDLOCK and by prepending "landlock," to CONFIG_LSM. This patch series can be found in a Git repository here: https://github.com/landlock-lsm/linux/commits/landlock-v30 This patch series seems ready for upstream and I would really appreciate final reviews.
Landlock LSM ------------
The goal of Landlock is to enable to restrict ambient rights (e.g. global filesystem access) for a set of processes. Because Landlock is a stackable LSM [1], it makes possible to create safe security sandboxes as new security layers in addition to the existing system-wide access-controls. This kind of sandbox is expected to help mitigate the security impact of bugs or unexpected/malicious behaviors in user-space applications. Landlock empowers any process, including unprivileged ones, to securely restrict themselves.
Landlock is inspired by seccomp-bpf but instead of filtering syscalls and their raw arguments, a Landlock rule can restrict the use of kernel objects like file hierarchies, according to the kernel semantic. Landlock also takes inspiration from other OS sandbox mechanisms: XNU Sandbox, FreeBSD Capsicum or OpenBSD Pledge/Unveil.
In this current form, Landlock misses some access-control features. This enables to minimize this patch series and ease review. This series still addresses multiple use cases, especially with the combined use of seccomp-bpf: applications with built-in sandboxing, init systems, security sandbox tools and security-oriented APIs [2].
[1] https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler-... [2] https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad0468cd@digikod.ne...
Previous versions: v29: https://lore.kernel.org/lkml/20210225190614.2181147-1-mic@digikod.net/ v28: https://lore.kernel.org/lkml/20210202162710.657398-1-mic@digikod.net/ v27: https://lore.kernel.org/lkml/20210121205119.793296-1-mic@digikod.net/ v26: https://lore.kernel.org/lkml/20201209192839.1396820-1-mic@digikod.net/ v25: https://lore.kernel.org/lkml/20201201192322.213239-1-mic@digikod.net/ v24: https://lore.kernel.org/lkml/20201112205141.775752-1-mic@digikod.net/ v23: https://lore.kernel.org/lkml/20201103182109.1014179-1-mic@digikod.net/ v22: https://lore.kernel.org/lkml/20201027200358.557003-1-mic@digikod.net/ v21: https://lore.kernel.org/lkml/20201008153103.1155388-1-mic@digikod.net/ v20: https://lore.kernel.org/lkml/20200802215903.91936-1-mic@digikod.net/ v19: https://lore.kernel.org/lkml/20200707180955.53024-1-mic@digikod.net/ v18: https://lore.kernel.org/lkml/20200526205322.23465-1-mic@digikod.net/ v17: https://lore.kernel.org/lkml/20200511192156.1618284-1-mic@digikod.net/ v16: https://lore.kernel.org/lkml/20200416103955.145757-1-mic@digikod.net/ v15: https://lore.kernel.org/lkml/20200326202731.693608-1-mic@digikod.net/ v14: https://lore.kernel.org/lkml/20200224160215.4136-1-mic@digikod.net/ v13: https://lore.kernel.org/lkml/20191104172146.30797-1-mic@digikod.net/ v12: https://lore.kernel.org/lkml/20191031164445.29426-1-mic@digikod.net/ v11: https://lore.kernel.org/lkml/20191029171505.6650-1-mic@digikod.net/ v10: https://lore.kernel.org/lkml/20190721213116.23476-1-mic@digikod.net/ v9: https://lore.kernel.org/lkml/20190625215239.11136-1-mic@digikod.net/ v8: https://lore.kernel.org/lkml/20180227004121.3633-1-mic@digikod.net/ v7: https://lore.kernel.org/lkml/20170821000933.13024-1-mic@digikod.net/ v6: https://lore.kernel.org/lkml/20170328234650.19695-1-mic@digikod.net/ v5: https://lore.kernel.org/lkml/20170222012632.4196-1-mic@digikod.net/ v4: https://lore.kernel.org/lkml/20161026065654.19166-1-mic@digikod.net/ v3: https://lore.kernel.org/lkml/20160914072415.26021-1-mic@digikod.net/ v2: https://lore.kernel.org/lkml/1472121165-29071-1-git-send-email-mic@digikod.n... v1: https://lore.kernel.org/kernel-hardening/1458784008-16277-1-git-send-email-m...
Casey Schaufler (1): LSM: Infrastructure management of the superblock
Mickaël Salaün (11): landlock: Add object management landlock: Add ruleset and domain management landlock: Set up the security framework and manage credentials landlock: Add ptrace restrictions fs,security: Add sb_delete hook landlock: Support filesystem access-control landlock: Add syscall implementations arch: Wire up Landlock syscalls selftests/landlock: Add user space tests samples/landlock: Add a sandbox manager example landlock: Add user and kernel documentation
Documentation/security/index.rst | 1 + Documentation/security/landlock.rst | 79 + Documentation/userspace-api/index.rst | 1 + Documentation/userspace-api/landlock.rst | 307 ++ MAINTAINERS | 15 + arch/Kconfig | 7 + arch/alpha/kernel/syscalls/syscall.tbl | 3 + arch/arm/tools/syscall.tbl | 3 + arch/arm64/include/asm/unistd.h | 2 +- arch/arm64/include/asm/unistd32.h | 6 + arch/ia64/kernel/syscalls/syscall.tbl | 3 + arch/m68k/kernel/syscalls/syscall.tbl | 3 + arch/microblaze/kernel/syscalls/syscall.tbl | 3 + arch/mips/kernel/syscalls/syscall_n32.tbl | 3 + arch/mips/kernel/syscalls/syscall_n64.tbl | 3 + arch/mips/kernel/syscalls/syscall_o32.tbl | 3 + arch/parisc/kernel/syscalls/syscall.tbl | 3 + arch/powerpc/kernel/syscalls/syscall.tbl | 3 + arch/s390/kernel/syscalls/syscall.tbl | 3 + arch/sh/kernel/syscalls/syscall.tbl | 3 + arch/sparc/kernel/syscalls/syscall.tbl | 3 + arch/um/Kconfig | 1 + arch/x86/entry/syscalls/syscall_32.tbl | 3 + arch/x86/entry/syscalls/syscall_64.tbl | 3 + arch/xtensa/kernel/syscalls/syscall.tbl | 3 + fs/super.c | 1 + include/linux/lsm_hook_defs.h | 1 + include/linux/lsm_hooks.h | 4 + include/linux/security.h | 4 + include/linux/syscalls.h | 7 + include/uapi/asm-generic/unistd.h | 8 +- include/uapi/linux/landlock.h | 128 + kernel/sys_ni.c | 5 + samples/Kconfig | 7 + samples/Makefile | 1 + samples/landlock/.gitignore | 1 + samples/landlock/Makefile | 13 + samples/landlock/sandboxer.c | 238 ++ security/Kconfig | 11 +- security/Makefile | 2 + security/landlock/Kconfig | 21 + security/landlock/Makefile | 4 + security/landlock/common.h | 20 + security/landlock/cred.c | 46 + security/landlock/cred.h | 58 + security/landlock/fs.c | 687 ++++ security/landlock/fs.h | 56 + security/landlock/limits.h | 21 + security/landlock/object.c | 67 + security/landlock/object.h | 91 + security/landlock/ptrace.c | 120 + security/landlock/ptrace.h | 14 + security/landlock/ruleset.c | 473 +++ security/landlock/ruleset.h | 165 + security/landlock/setup.c | 40 + security/landlock/setup.h | 18 + security/landlock/syscalls.c | 445 +++ security/security.c | 51 +- security/selinux/hooks.c | 58 +- security/selinux/include/objsec.h | 6 + security/selinux/ss/services.c | 3 +- security/smack/smack.h | 6 + security/smack/smack_lsm.c | 35 +- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/landlock/.gitignore | 2 + tools/testing/selftests/landlock/Makefile | 24 + tools/testing/selftests/landlock/base_test.c | 219 ++ tools/testing/selftests/landlock/common.h | 183 ++ tools/testing/selftests/landlock/config | 7 + tools/testing/selftests/landlock/fs_test.c | 2792 +++++++++++++++++ .../testing/selftests/landlock/ptrace_test.c | 337 ++ tools/testing/selftests/landlock/true.c | 5 + 72 files changed, 6896 insertions(+), 77 deletions(-) create mode 100644 Documentation/security/landlock.rst create mode 100644 Documentation/userspace-api/landlock.rst create mode 100644 include/uapi/linux/landlock.h create mode 100644 samples/landlock/.gitignore create mode 100644 samples/landlock/Makefile create mode 100644 samples/landlock/sandboxer.c create mode 100644 security/landlock/Kconfig create mode 100644 security/landlock/Makefile create mode 100644 security/landlock/common.h create mode 100644 security/landlock/cred.c create mode 100644 security/landlock/cred.h create mode 100644 security/landlock/fs.c create mode 100644 security/landlock/fs.h create mode 100644 security/landlock/limits.h create mode 100644 security/landlock/object.c create mode 100644 security/landlock/object.h create mode 100644 security/landlock/ptrace.c create mode 100644 security/landlock/ptrace.h create mode 100644 security/landlock/ruleset.c create mode 100644 security/landlock/ruleset.h create mode 100644 security/landlock/setup.c create mode 100644 security/landlock/setup.h create mode 100644 security/landlock/syscalls.c create mode 100644 tools/testing/selftests/landlock/.gitignore create mode 100644 tools/testing/selftests/landlock/Makefile create mode 100644 tools/testing/selftests/landlock/base_test.c create mode 100644 tools/testing/selftests/landlock/common.h create mode 100644 tools/testing/selftests/landlock/config create mode 100644 tools/testing/selftests/landlock/fs_test.c create mode 100644 tools/testing/selftests/landlock/ptrace_test.c create mode 100644 tools/testing/selftests/landlock/true.c
base-commit: 1e28eed17697bcf343c6743f0028cc3b5dd88bf0
From: Mickaël Salaün mic@linux.microsoft.com
A Landlock object enables to identify a kernel object (e.g. an inode). A Landlock rule is a set of access rights allowed on an object. Rules are grouped in rulesets that may be tied to a set of processes (i.e. subjects) to enforce a scoped access-control (i.e. a domain).
Because Landlock's goal is to empower any process (especially unprivileged ones) to sandbox themselves, we cannot rely on a system-wide object identification such as file extended attributes. Indeed, we need innocuous, composable and modular access-controls.
The main challenge with these constraints is to identify kernel objects while this identification is useful (i.e. when a security policy makes use of this object). But this identification data should be freed once no policy is using it. This ephemeral tagging should not and may not be written in the filesystem. We then need to manage the lifetime of a rule according to the lifetime of its objects. To avoid a global lock, this implementation make use of RCU and counters to safely reference objects.
A following commit uses this generic object management for inodes.
Cc: James Morris jmorris@namei.org Cc: Kees Cook keescook@chromium.org Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Reviewed-by: Jann Horn jannh@google.com Acked-by: Serge Hallyn serge@hallyn.com Link: https://lore.kernel.org/r/20210316204252.427806-2-mic@digikod.net ---
Changes since v28: * Improve Kconfig description (suggested by Serge Hallyn). * Add Acked-by Serge Hallyn. * Clean up comment.
Changes since v27: * Update Kconfig for landlock_restrict_self(2). * Cosmetic fixes: use 80 columns in Kconfig and align Makefile declarations.
Changes since v26: * Update Kconfig for landlock_enforce_ruleset_self(2). * Fix spelling.
Changes since v24: * Fix typo in comment (spotted by Jann Horn). * Add Reviewed-by: Jann Horn jannh@google.com
Changes since v23: * Update landlock_create_object() to return error codes instead of NULL. This help error handling in callers. * When using make oldconfig with a previous configuration already including the CONFIG_LSM variable, no question is asked to update its content. Update the Kconfig help to warn about LSM stacking configuration. * Constify variable (spotted by Vincent Dagonneau).
Changes since v22: * Fix spelling (spotted by Jann Horn).
Changes since v21: * Update Kconfig help. * Clean up comments.
Changes since v18: * Account objects to kmemcg.
Changes since v14: * Simplify the object, rule and ruleset management at the expense of a less aggressive memory freeing (contributed by Jann Horn, with additional modifications): - Remove object->list aggregating the rules tied to an object. - Remove landlock_get_object(), landlock_drop_object(), {get,put}_object_cleaner() and landlock_rule_is_disabled(). - Rewrite landlock_put_object() to use a more simple mechanism (no tricky RCU). - Replace enum landlock_object_type and landlock_release_object() with landlock_object_underops->release() - Adjust unions and Sparse annotations. Cf. https://lore.kernel.org/lkml/CAG48ez21bEn0wL1bbmTiiu8j9jP5iEWtHOwz4tURUJ+ki0... * Merge struct landlock_rule into landlock_ruleset_elem to simplify the rule management. * Constify variables. * Improve kernel documentation. * Cosmetic variable renames. * Remove the "default" in the Kconfig (suggested by Jann Horn). * Only use refcount_inc() through getter helpers. * Update Kconfig description.
Changes since v13: * New dedicated implementation, removing the need for eBPF.
Previous changes: https://lore.kernel.org/lkml/20190721213116.23476-6-mic@digikod.net/ --- MAINTAINERS | 10 +++++ security/Kconfig | 1 + security/Makefile | 2 + security/landlock/Kconfig | 21 +++++++++ security/landlock/Makefile | 3 ++ security/landlock/object.c | 67 ++++++++++++++++++++++++++++ security/landlock/object.h | 91 ++++++++++++++++++++++++++++++++++++++ 7 files changed, 195 insertions(+) create mode 100644 security/landlock/Kconfig create mode 100644 security/landlock/Makefile create mode 100644 security/landlock/object.c create mode 100644 security/landlock/object.h
diff --git a/MAINTAINERS b/MAINTAINERS index aa84121c5611..87a2738dfdec 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9997,6 +9997,16 @@ F: net/core/sock_map.c F: net/ipv4/tcp_bpf.c F: net/ipv4/udp_bpf.c
+LANDLOCK SECURITY MODULE +M: Mickaël Salaün mic@digikod.net +L: linux-security-module@vger.kernel.org +S: Supported +W: https://landlock.io +T: git https://github.com/landlock-lsm/linux.git +F: security/landlock/ +K: landlock +K: LANDLOCK + LANTIQ / INTEL Ethernet drivers M: Hauke Mehrtens hauke@hauke-m.de L: netdev@vger.kernel.org diff --git a/security/Kconfig b/security/Kconfig index 7561f6f99f1d..15a4342b5d01 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -238,6 +238,7 @@ source "security/loadpin/Kconfig" source "security/yama/Kconfig" source "security/safesetid/Kconfig" source "security/lockdown/Kconfig" +source "security/landlock/Kconfig"
source "security/integrity/Kconfig"
diff --git a/security/Makefile b/security/Makefile index 3baf435de541..47e432900e24 100644 --- a/security/Makefile +++ b/security/Makefile @@ -13,6 +13,7 @@ subdir-$(CONFIG_SECURITY_LOADPIN) += loadpin subdir-$(CONFIG_SECURITY_SAFESETID) += safesetid subdir-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown subdir-$(CONFIG_BPF_LSM) += bpf +subdir-$(CONFIG_SECURITY_LANDLOCK) += landlock
# always enable default capabilities obj-y += commoncap.o @@ -32,6 +33,7 @@ obj-$(CONFIG_SECURITY_SAFESETID) += safesetid/ obj-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown/ obj-$(CONFIG_CGROUPS) += device_cgroup.o obj-$(CONFIG_BPF_LSM) += bpf/ +obj-$(CONFIG_SECURITY_LANDLOCK) += landlock/
# Object integrity file lists subdir-$(CONFIG_INTEGRITY) += integrity diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig new file mode 100644 index 000000000000..c1e862a38410 --- /dev/null +++ b/security/landlock/Kconfig @@ -0,0 +1,21 @@ +# SPDX-License-Identifier: GPL-2.0-only + +config SECURITY_LANDLOCK + bool "Landlock support" + depends on SECURITY + select SECURITY_PATH + help + Landlock is a sandboxing mechanism that enables processes to restrict + themselves (and their future children) by gradually enforcing + tailored access control policies. A Landlock security policy is a + set of access rights (e.g. open a file in read-only, make a + directory, etc.) tied to a file hierarchy. Such policy can be + configured and enforced by any processes for themselves using the + dedicated system calls: landlock_create_ruleset(), + landlock_add_rule(), and landlock_restrict_self(). + + See Documentation/userspace-api/landlock.rst for further information. + + If you are unsure how to answer this question, answer N. Otherwise, + you should also prepend "landlock," to the content of CONFIG_LSM to + enable Landlock at boot time. diff --git a/security/landlock/Makefile b/security/landlock/Makefile new file mode 100644 index 000000000000..cb6deefbf4c0 --- /dev/null +++ b/security/landlock/Makefile @@ -0,0 +1,3 @@ +obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o + +landlock-y := object.o diff --git a/security/landlock/object.c b/security/landlock/object.c new file mode 100644 index 000000000000..d674fdf9ff04 --- /dev/null +++ b/security/landlock/object.c @@ -0,0 +1,67 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Landlock LSM - Object management + * + * Copyright © 2016-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#include <linux/bug.h> +#include <linux/compiler_types.h> +#include <linux/err.h> +#include <linux/kernel.h> +#include <linux/rcupdate.h> +#include <linux/refcount.h> +#include <linux/slab.h> +#include <linux/spinlock.h> + +#include "object.h" + +struct landlock_object *landlock_create_object( + const struct landlock_object_underops *const underops, + void *const underobj) +{ + struct landlock_object *new_object; + + if (WARN_ON_ONCE(!underops || !underobj)) + return ERR_PTR(-ENOENT); + new_object = kzalloc(sizeof(*new_object), GFP_KERNEL_ACCOUNT); + if (!new_object) + return ERR_PTR(-ENOMEM); + refcount_set(&new_object->usage, 1); + spin_lock_init(&new_object->lock); + new_object->underops = underops; + new_object->underobj = underobj; + return new_object; +} + +/* + * The caller must own the object (i.e. thanks to object->usage) to safely put + * it. + */ +void landlock_put_object(struct landlock_object *const object) +{ + /* + * The call to @object->underops->release(object) might sleep, e.g. + * because of iput(). + */ + might_sleep(); + if (!object) + return; + + /* + * If the @object's refcount cannot drop to zero, we can just decrement + * the refcount without holding a lock. Otherwise, the decrement must + * happen under @object->lock for synchronization with things like + * get_inode_object(). + */ + if (refcount_dec_and_lock(&object->usage, &object->lock)) { + __acquire(&object->lock); + /* + * With @object->lock initially held, remove the reference from + * @object->underobj to @object (if it still exists). + */ + object->underops->release(object); + kfree_rcu(object, rcu_free); + } +} diff --git a/security/landlock/object.h b/security/landlock/object.h new file mode 100644 index 000000000000..3e5d5b6941c3 --- /dev/null +++ b/security/landlock/object.h @@ -0,0 +1,91 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Landlock LSM - Object management + * + * Copyright © 2016-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#ifndef _SECURITY_LANDLOCK_OBJECT_H +#define _SECURITY_LANDLOCK_OBJECT_H + +#include <linux/compiler_types.h> +#include <linux/refcount.h> +#include <linux/spinlock.h> + +struct landlock_object; + +/** + * struct landlock_object_underops - Operations on an underlying object + */ +struct landlock_object_underops { + /** + * @release: Releases the underlying object (e.g. iput() for an inode). + */ + void (*release)(struct landlock_object *const object) + __releases(object->lock); +}; + +/** + * struct landlock_object - Security blob tied to a kernel object + * + * The goal of this structure is to enable to tie a set of ephemeral access + * rights (pertaining to different domains) to a kernel object (e.g an inode) + * in a safe way. This implies to handle concurrent use and modification. + * + * The lifetime of a &struct landlock_object depends of the rules referring to + * it. + */ +struct landlock_object { + /** + * @usage: This counter is used to tie an object to the rules matching + * it or to keep it alive while adding a new rule. If this counter + * reaches zero, this struct must not be modified, but this counter can + * still be read from within an RCU read-side critical section. When + * adding a new rule to an object with a usage counter of zero, we must + * wait until the pointer to this object is set to NULL (or recycled). + */ + refcount_t usage; + /** + * @lock: Protects against concurrent modifications. This lock must be + * held from the time @usage drops to zero until any weak references + * from @underobj to this object have been cleaned up. + * + * Lock ordering: inode->i_lock nests inside this. + */ + spinlock_t lock; + /** + * @underobj: Used when cleaning up an object and to mark an object as + * tied to its underlying kernel structure. This pointer is protected + * by @lock. Cf. landlock_release_inodes() and release_inode(). + */ + void *underobj; + union { + /** + * @rcu_free: Enables lockless use of @usage, @lock and + * @underobj from within an RCU read-side critical section. + * @rcu_free and @underops are only used by + * landlock_put_object(). + */ + struct rcu_head rcu_free; + /** + * @underops: Enables landlock_put_object() to release the + * underlying object (e.g. inode). + */ + const struct landlock_object_underops *underops; + }; +}; + +struct landlock_object *landlock_create_object( + const struct landlock_object_underops *const underops, + void *const underobj); + +void landlock_put_object(struct landlock_object *const object); + +static inline void landlock_get_object(struct landlock_object *const object) +{ + if (object) + refcount_inc(&object->usage); +} + +#endif /* _SECURITY_LANDLOCK_OBJECT_H */
On Tue, Mar 16, 2021 at 09:42:41PM +0100, Mickaël Salaün wrote:
From: Mickaël Salaün mic@linux.microsoft.com
A Landlock object enables to identify a kernel object (e.g. an inode). A Landlock rule is a set of access rights allowed on an object. Rules are grouped in rulesets that may be tied to a set of processes (i.e. subjects) to enforce a scoped access-control (i.e. a domain).
Because Landlock's goal is to empower any process (especially unprivileged ones) to sandbox themselves, we cannot rely on a system-wide object identification such as file extended attributes. Indeed, we need innocuous, composable and modular access-controls.
The main challenge with these constraints is to identify kernel objects while this identification is useful (i.e. when a security policy makes use of this object). But this identification data should be freed once no policy is using it. This ephemeral tagging should not and may not be written in the filesystem. We then need to manage the lifetime of a rule according to the lifetime of its objects. To avoid a global lock, this implementation make use of RCU and counters to safely reference objects.
A following commit uses this generic object management for inodes.
Cc: James Morris jmorris@namei.org Cc: Kees Cook keescook@chromium.org Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Reviewed-by: Jann Horn jannh@google.com Acked-by: Serge Hallyn serge@hallyn.com Link: https://lore.kernel.org/r/20210316204252.427806-2-mic@digikod.net
Changes since v28:
- Improve Kconfig description (suggested by Serge Hallyn).
- Add Acked-by Serge Hallyn.
- Clean up comment.
Changes since v27:
- Update Kconfig for landlock_restrict_self(2).
- Cosmetic fixes: use 80 columns in Kconfig and align Makefile declarations.
Changes since v26:
- Update Kconfig for landlock_enforce_ruleset_self(2).
- Fix spelling.
Changes since v24:
- Fix typo in comment (spotted by Jann Horn).
- Add Reviewed-by: Jann Horn jannh@google.com
Changes since v23:
- Update landlock_create_object() to return error codes instead of NULL. This help error handling in callers.
- When using make oldconfig with a previous configuration already including the CONFIG_LSM variable, no question is asked to update its content. Update the Kconfig help to warn about LSM stacking configuration.
- Constify variable (spotted by Vincent Dagonneau).
Changes since v22:
- Fix spelling (spotted by Jann Horn).
Changes since v21:
- Update Kconfig help.
- Clean up comments.
Changes since v18:
- Account objects to kmemcg.
Changes since v14:
- Simplify the object, rule and ruleset management at the expense of a less aggressive memory freeing (contributed by Jann Horn, with additional modifications):
Cf. https://lore.kernel.org/lkml/CAG48ez21bEn0wL1bbmTiiu8j9jP5iEWtHOwz4tURUJ+ki0...
- Remove object->list aggregating the rules tied to an object.
- Remove landlock_get_object(), landlock_drop_object(), {get,put}_object_cleaner() and landlock_rule_is_disabled().
- Rewrite landlock_put_object() to use a more simple mechanism (no tricky RCU).
- Replace enum landlock_object_type and landlock_release_object() with landlock_object_underops->release()
- Adjust unions and Sparse annotations.
- Merge struct landlock_rule into landlock_ruleset_elem to simplify the rule management.
- Constify variables.
- Improve kernel documentation.
- Cosmetic variable renames.
- Remove the "default" in the Kconfig (suggested by Jann Horn).
- Only use refcount_inc() through getter helpers.
- Update Kconfig description.
Changes since v13:
- New dedicated implementation, removing the need for eBPF.
Previous changes: https://lore.kernel.org/lkml/20190721213116.23476-6-mic@digikod.net/
MAINTAINERS | 10 +++++ security/Kconfig | 1 + security/Makefile | 2 + security/landlock/Kconfig | 21 +++++++++ security/landlock/Makefile | 3 ++ security/landlock/object.c | 67 ++++++++++++++++++++++++++++ security/landlock/object.h | 91 ++++++++++++++++++++++++++++++++++++++ 7 files changed, 195 insertions(+) create mode 100644 security/landlock/Kconfig create mode 100644 security/landlock/Makefile create mode 100644 security/landlock/object.c create mode 100644 security/landlock/object.h
diff --git a/MAINTAINERS b/MAINTAINERS index aa84121c5611..87a2738dfdec 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9997,6 +9997,16 @@ F: net/core/sock_map.c F: net/ipv4/tcp_bpf.c F: net/ipv4/udp_bpf.c +LANDLOCK SECURITY MODULE +M: Mickaël Salaün mic@digikod.net +L: linux-security-module@vger.kernel.org +S: Supported +W: https://landlock.io +T: git https://github.com/landlock-lsm/linux.git +F: security/landlock/ +K: landlock +K: LANDLOCK
LANTIQ / INTEL Ethernet drivers M: Hauke Mehrtens hauke@hauke-m.de L: netdev@vger.kernel.org diff --git a/security/Kconfig b/security/Kconfig index 7561f6f99f1d..15a4342b5d01 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -238,6 +238,7 @@ source "security/loadpin/Kconfig" source "security/yama/Kconfig" source "security/safesetid/Kconfig" source "security/lockdown/Kconfig" +source "security/landlock/Kconfig" source "security/integrity/Kconfig" diff --git a/security/Makefile b/security/Makefile index 3baf435de541..47e432900e24 100644 --- a/security/Makefile +++ b/security/Makefile @@ -13,6 +13,7 @@ subdir-$(CONFIG_SECURITY_LOADPIN) += loadpin subdir-$(CONFIG_SECURITY_SAFESETID) += safesetid subdir-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown subdir-$(CONFIG_BPF_LSM) += bpf +subdir-$(CONFIG_SECURITY_LANDLOCK) += landlock # always enable default capabilities obj-y += commoncap.o @@ -32,6 +33,7 @@ obj-$(CONFIG_SECURITY_SAFESETID) += safesetid/ obj-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown/ obj-$(CONFIG_CGROUPS) += device_cgroup.o obj-$(CONFIG_BPF_LSM) += bpf/ +obj-$(CONFIG_SECURITY_LANDLOCK) += landlock/ # Object integrity file lists subdir-$(CONFIG_INTEGRITY) += integrity diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig new file mode 100644 index 000000000000..c1e862a38410 --- /dev/null +++ b/security/landlock/Kconfig @@ -0,0 +1,21 @@ +# SPDX-License-Identifier: GPL-2.0-only
+config SECURITY_LANDLOCK
- bool "Landlock support"
- depends on SECURITY
- select SECURITY_PATH
- help
Landlock is a sandboxing mechanism that enables processes to restrict
themselves (and their future children) by gradually enforcing
tailored access control policies. A Landlock security policy is a
set of access rights (e.g. open a file in read-only, make a
directory, etc.) tied to a file hierarchy. Such policy can be
configured and enforced by any processes for themselves using the
dedicated system calls: landlock_create_ruleset(),
landlock_add_rule(), and landlock_restrict_self().
See Documentation/userspace-api/landlock.rst for further information.
If you are unsure how to answer this question, answer N. Otherwise,
you should also prepend "landlock," to the content of CONFIG_LSM to
enable Landlock at boot time.
diff --git a/security/landlock/Makefile b/security/landlock/Makefile new file mode 100644 index 000000000000..cb6deefbf4c0 --- /dev/null +++ b/security/landlock/Makefile @@ -0,0 +1,3 @@ +obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
+landlock-y := object.o diff --git a/security/landlock/object.c b/security/landlock/object.c new file mode 100644 index 000000000000..d674fdf9ff04 --- /dev/null +++ b/security/landlock/object.c @@ -0,0 +1,67 @@ +// SPDX-License-Identifier: GPL-2.0-only +/*
- Landlock LSM - Object management
- Copyright © 2016-2020 Mickaël Salaün mic@digikod.net
- Copyright © 2018-2020 ANSSI
- */
+#include <linux/bug.h> +#include <linux/compiler_types.h> +#include <linux/err.h> +#include <linux/kernel.h> +#include <linux/rcupdate.h> +#include <linux/refcount.h> +#include <linux/slab.h> +#include <linux/spinlock.h>
+#include "object.h"
+struct landlock_object *landlock_create_object(
const struct landlock_object_underops *const underops,
void *const underobj)
+{
- struct landlock_object *new_object;
- if (WARN_ON_ONCE(!underops || !underobj))
return ERR_PTR(-ENOENT);
- new_object = kzalloc(sizeof(*new_object), GFP_KERNEL_ACCOUNT);
Is there any benefit to using a dedicated kmem_cache instead of kmalloc? I see later that you end up with variable-sized allocations, so this might be fine as-is.
- if (!new_object)
return ERR_PTR(-ENOMEM);
- refcount_set(&new_object->usage, 1);
- spin_lock_init(&new_object->lock);
- new_object->underops = underops;
- new_object->underobj = underobj;
- return new_object;
+}
+/*
- The caller must own the object (i.e. thanks to object->usage) to safely put
- it.
- */
+void landlock_put_object(struct landlock_object *const object) +{
- /*
* The call to @object->underops->release(object) might sleep, e.g.
* because of iput().
*/
- might_sleep();
- if (!object)
return;
- /*
* If the @object's refcount cannot drop to zero, we can just decrement
* the refcount without holding a lock. Otherwise, the decrement must
* happen under @object->lock for synchronization with things like
* get_inode_object().
*/
- if (refcount_dec_and_lock(&object->usage, &object->lock)) {
__acquire(&object->lock);
/*
* With @object->lock initially held, remove the reference from
* @object->underobj to @object (if it still exists).
*/
object->underops->release(object);
kfree_rcu(object, rcu_free);
- }
+} diff --git a/security/landlock/object.h b/security/landlock/object.h new file mode 100644 index 000000000000..3e5d5b6941c3 --- /dev/null +++ b/security/landlock/object.h @@ -0,0 +1,91 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/*
- Landlock LSM - Object management
- Copyright © 2016-2020 Mickaël Salaün mic@digikod.net
- Copyright © 2018-2020 ANSSI
- */
+#ifndef _SECURITY_LANDLOCK_OBJECT_H +#define _SECURITY_LANDLOCK_OBJECT_H
+#include <linux/compiler_types.h> +#include <linux/refcount.h> +#include <linux/spinlock.h>
+struct landlock_object;
+/**
- struct landlock_object_underops - Operations on an underlying object
- */
+struct landlock_object_underops {
- /**
* @release: Releases the underlying object (e.g. iput() for an inode).
*/
- void (*release)(struct landlock_object *const object)
__releases(object->lock);
+};
+/**
- struct landlock_object - Security blob tied to a kernel object
- The goal of this structure is to enable to tie a set of ephemeral access
- rights (pertaining to different domains) to a kernel object (e.g an inode)
- in a safe way. This implies to handle concurrent use and modification.
- The lifetime of a &struct landlock_object depends of the rules referring to
- it.
- */
+struct landlock_object {
- /**
* @usage: This counter is used to tie an object to the rules matching
* it or to keep it alive while adding a new rule. If this counter
* reaches zero, this struct must not be modified, but this counter can
* still be read from within an RCU read-side critical section. When
* adding a new rule to an object with a usage counter of zero, we must
* wait until the pointer to this object is set to NULL (or recycled).
*/
- refcount_t usage;
- /**
* @lock: Protects against concurrent modifications. This lock must be
* held from the time @usage drops to zero until any weak references
* from @underobj to this object have been cleaned up.
*
* Lock ordering: inode->i_lock nests inside this.
*/
- spinlock_t lock;
- /**
* @underobj: Used when cleaning up an object and to mark an object as
* tied to its underlying kernel structure. This pointer is protected
* by @lock. Cf. landlock_release_inodes() and release_inode().
*/
- void *underobj;
- union {
/**
* @rcu_free: Enables lockless use of @usage, @lock and
* @underobj from within an RCU read-side critical section.
* @rcu_free and @underops are only used by
* landlock_put_object().
*/
struct rcu_head rcu_free;
/**
* @underops: Enables landlock_put_object() to release the
* underlying object (e.g. inode).
*/
const struct landlock_object_underops *underops;
- };
+};
+struct landlock_object *landlock_create_object(
const struct landlock_object_underops *const underops,
void *const underobj);
+void landlock_put_object(struct landlock_object *const object);
+static inline void landlock_get_object(struct landlock_object *const object) +{
- if (object)
refcount_inc(&object->usage);
+}
+#endif /* _SECURITY_LANDLOCK_OBJECT_H */
2.30.2
Either way, this object lifetime management looks okay to me:
Reviewed-by: Kees Cook keescook@chromium.org
On 19/03/2021 19:13, Kees Cook wrote:
On Tue, Mar 16, 2021 at 09:42:41PM +0100, Mickaël Salaün wrote:
From: Mickaël Salaün mic@linux.microsoft.com
A Landlock object enables to identify a kernel object (e.g. an inode). A Landlock rule is a set of access rights allowed on an object. Rules are grouped in rulesets that may be tied to a set of processes (i.e. subjects) to enforce a scoped access-control (i.e. a domain).
Because Landlock's goal is to empower any process (especially unprivileged ones) to sandbox themselves, we cannot rely on a system-wide object identification such as file extended attributes. Indeed, we need innocuous, composable and modular access-controls.
The main challenge with these constraints is to identify kernel objects while this identification is useful (i.e. when a security policy makes use of this object). But this identification data should be freed once no policy is using it. This ephemeral tagging should not and may not be written in the filesystem. We then need to manage the lifetime of a rule according to the lifetime of its objects. To avoid a global lock, this implementation make use of RCU and counters to safely reference objects.
A following commit uses this generic object management for inodes.
Cc: James Morris jmorris@namei.org Cc: Kees Cook keescook@chromium.org Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Reviewed-by: Jann Horn jannh@google.com Acked-by: Serge Hallyn serge@hallyn.com Link: https://lore.kernel.org/r/20210316204252.427806-2-mic@digikod.net
Changes since v28:
- Improve Kconfig description (suggested by Serge Hallyn).
- Add Acked-by Serge Hallyn.
- Clean up comment.
Changes since v27:
- Update Kconfig for landlock_restrict_self(2).
- Cosmetic fixes: use 80 columns in Kconfig and align Makefile declarations.
Changes since v26:
- Update Kconfig for landlock_enforce_ruleset_self(2).
- Fix spelling.
Changes since v24:
- Fix typo in comment (spotted by Jann Horn).
- Add Reviewed-by: Jann Horn jannh@google.com
Changes since v23:
- Update landlock_create_object() to return error codes instead of NULL. This help error handling in callers.
- When using make oldconfig with a previous configuration already including the CONFIG_LSM variable, no question is asked to update its content. Update the Kconfig help to warn about LSM stacking configuration.
- Constify variable (spotted by Vincent Dagonneau).
Changes since v22:
- Fix spelling (spotted by Jann Horn).
Changes since v21:
- Update Kconfig help.
- Clean up comments.
Changes since v18:
- Account objects to kmemcg.
Changes since v14:
- Simplify the object, rule and ruleset management at the expense of a less aggressive memory freeing (contributed by Jann Horn, with additional modifications):
Cf. https://lore.kernel.org/lkml/CAG48ez21bEn0wL1bbmTiiu8j9jP5iEWtHOwz4tURUJ+ki0...
- Remove object->list aggregating the rules tied to an object.
- Remove landlock_get_object(), landlock_drop_object(), {get,put}_object_cleaner() and landlock_rule_is_disabled().
- Rewrite landlock_put_object() to use a more simple mechanism (no tricky RCU).
- Replace enum landlock_object_type and landlock_release_object() with landlock_object_underops->release()
- Adjust unions and Sparse annotations.
- Merge struct landlock_rule into landlock_ruleset_elem to simplify the rule management.
- Constify variables.
- Improve kernel documentation.
- Cosmetic variable renames.
- Remove the "default" in the Kconfig (suggested by Jann Horn).
- Only use refcount_inc() through getter helpers.
- Update Kconfig description.
Changes since v13:
- New dedicated implementation, removing the need for eBPF.
Previous changes: https://lore.kernel.org/lkml/20190721213116.23476-6-mic@digikod.net/
MAINTAINERS | 10 +++++ security/Kconfig | 1 + security/Makefile | 2 + security/landlock/Kconfig | 21 +++++++++ security/landlock/Makefile | 3 ++ security/landlock/object.c | 67 ++++++++++++++++++++++++++++ security/landlock/object.h | 91 ++++++++++++++++++++++++++++++++++++++ 7 files changed, 195 insertions(+) create mode 100644 security/landlock/Kconfig create mode 100644 security/landlock/Makefile create mode 100644 security/landlock/object.c create mode 100644 security/landlock/object.h
diff --git a/MAINTAINERS b/MAINTAINERS index aa84121c5611..87a2738dfdec 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9997,6 +9997,16 @@ F: net/core/sock_map.c F: net/ipv4/tcp_bpf.c F: net/ipv4/udp_bpf.c +LANDLOCK SECURITY MODULE +M: Mickaël Salaün mic@digikod.net +L: linux-security-module@vger.kernel.org +S: Supported +W: https://landlock.io +T: git https://github.com/landlock-lsm/linux.git +F: security/landlock/ +K: landlock +K: LANDLOCK
LANTIQ / INTEL Ethernet drivers M: Hauke Mehrtens hauke@hauke-m.de L: netdev@vger.kernel.org diff --git a/security/Kconfig b/security/Kconfig index 7561f6f99f1d..15a4342b5d01 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -238,6 +238,7 @@ source "security/loadpin/Kconfig" source "security/yama/Kconfig" source "security/safesetid/Kconfig" source "security/lockdown/Kconfig" +source "security/landlock/Kconfig" source "security/integrity/Kconfig" diff --git a/security/Makefile b/security/Makefile index 3baf435de541..47e432900e24 100644 --- a/security/Makefile +++ b/security/Makefile @@ -13,6 +13,7 @@ subdir-$(CONFIG_SECURITY_LOADPIN) += loadpin subdir-$(CONFIG_SECURITY_SAFESETID) += safesetid subdir-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown subdir-$(CONFIG_BPF_LSM) += bpf +subdir-$(CONFIG_SECURITY_LANDLOCK) += landlock # always enable default capabilities obj-y += commoncap.o @@ -32,6 +33,7 @@ obj-$(CONFIG_SECURITY_SAFESETID) += safesetid/ obj-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown/ obj-$(CONFIG_CGROUPS) += device_cgroup.o obj-$(CONFIG_BPF_LSM) += bpf/ +obj-$(CONFIG_SECURITY_LANDLOCK) += landlock/ # Object integrity file lists subdir-$(CONFIG_INTEGRITY) += integrity diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig new file mode 100644 index 000000000000..c1e862a38410 --- /dev/null +++ b/security/landlock/Kconfig @@ -0,0 +1,21 @@ +# SPDX-License-Identifier: GPL-2.0-only
+config SECURITY_LANDLOCK
- bool "Landlock support"
- depends on SECURITY
- select SECURITY_PATH
- help
Landlock is a sandboxing mechanism that enables processes to restrict
themselves (and their future children) by gradually enforcing
tailored access control policies. A Landlock security policy is a
set of access rights (e.g. open a file in read-only, make a
directory, etc.) tied to a file hierarchy. Such policy can be
configured and enforced by any processes for themselves using the
dedicated system calls: landlock_create_ruleset(),
landlock_add_rule(), and landlock_restrict_self().
See Documentation/userspace-api/landlock.rst for further information.
If you are unsure how to answer this question, answer N. Otherwise,
you should also prepend "landlock," to the content of CONFIG_LSM to
enable Landlock at boot time.
diff --git a/security/landlock/Makefile b/security/landlock/Makefile new file mode 100644 index 000000000000..cb6deefbf4c0 --- /dev/null +++ b/security/landlock/Makefile @@ -0,0 +1,3 @@ +obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
+landlock-y := object.o diff --git a/security/landlock/object.c b/security/landlock/object.c new file mode 100644 index 000000000000..d674fdf9ff04 --- /dev/null +++ b/security/landlock/object.c @@ -0,0 +1,67 @@ +// SPDX-License-Identifier: GPL-2.0-only +/*
- Landlock LSM - Object management
- Copyright © 2016-2020 Mickaël Salaün mic@digikod.net
- Copyright © 2018-2020 ANSSI
- */
+#include <linux/bug.h> +#include <linux/compiler_types.h> +#include <linux/err.h> +#include <linux/kernel.h> +#include <linux/rcupdate.h> +#include <linux/refcount.h> +#include <linux/slab.h> +#include <linux/spinlock.h>
+#include "object.h"
+struct landlock_object *landlock_create_object(
const struct landlock_object_underops *const underops,
void *const underobj)
+{
- struct landlock_object *new_object;
- if (WARN_ON_ONCE(!underops || !underobj))
return ERR_PTR(-ENOENT);
- new_object = kzalloc(sizeof(*new_object), GFP_KERNEL_ACCOUNT);
Is there any benefit to using a dedicated kmem_cache instead of kmalloc? I see later that you end up with variable-sized allocations, so this might be fine as-is.
I though about that, but for the sake of simplicity I choose to not use that in this series. It may comes with a future update if there is a visible performance improvement though.
- if (!new_object)
return ERR_PTR(-ENOMEM);
- refcount_set(&new_object->usage, 1);
- spin_lock_init(&new_object->lock);
- new_object->underops = underops;
- new_object->underobj = underobj;
- return new_object;
+}
+/*
- The caller must own the object (i.e. thanks to object->usage) to safely put
- it.
- */
+void landlock_put_object(struct landlock_object *const object) +{
- /*
* The call to @object->underops->release(object) might sleep, e.g.
* because of iput().
*/
- might_sleep();
- if (!object)
return;
- /*
* If the @object's refcount cannot drop to zero, we can just decrement
* the refcount without holding a lock. Otherwise, the decrement must
* happen under @object->lock for synchronization with things like
* get_inode_object().
*/
- if (refcount_dec_and_lock(&object->usage, &object->lock)) {
__acquire(&object->lock);
/*
* With @object->lock initially held, remove the reference from
* @object->underobj to @object (if it still exists).
*/
object->underops->release(object);
kfree_rcu(object, rcu_free);
- }
+} diff --git a/security/landlock/object.h b/security/landlock/object.h new file mode 100644 index 000000000000..3e5d5b6941c3 --- /dev/null +++ b/security/landlock/object.h @@ -0,0 +1,91 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/*
- Landlock LSM - Object management
- Copyright © 2016-2020 Mickaël Salaün mic@digikod.net
- Copyright © 2018-2020 ANSSI
- */
+#ifndef _SECURITY_LANDLOCK_OBJECT_H +#define _SECURITY_LANDLOCK_OBJECT_H
+#include <linux/compiler_types.h> +#include <linux/refcount.h> +#include <linux/spinlock.h>
+struct landlock_object;
+/**
- struct landlock_object_underops - Operations on an underlying object
- */
+struct landlock_object_underops {
- /**
* @release: Releases the underlying object (e.g. iput() for an inode).
*/
- void (*release)(struct landlock_object *const object)
__releases(object->lock);
+};
+/**
- struct landlock_object - Security blob tied to a kernel object
- The goal of this structure is to enable to tie a set of ephemeral access
- rights (pertaining to different domains) to a kernel object (e.g an inode)
- in a safe way. This implies to handle concurrent use and modification.
- The lifetime of a &struct landlock_object depends of the rules referring to
- it.
- */
+struct landlock_object {
- /**
* @usage: This counter is used to tie an object to the rules matching
* it or to keep it alive while adding a new rule. If this counter
* reaches zero, this struct must not be modified, but this counter can
* still be read from within an RCU read-side critical section. When
* adding a new rule to an object with a usage counter of zero, we must
* wait until the pointer to this object is set to NULL (or recycled).
*/
- refcount_t usage;
- /**
* @lock: Protects against concurrent modifications. This lock must be
* held from the time @usage drops to zero until any weak references
* from @underobj to this object have been cleaned up.
*
* Lock ordering: inode->i_lock nests inside this.
*/
- spinlock_t lock;
- /**
* @underobj: Used when cleaning up an object and to mark an object as
* tied to its underlying kernel structure. This pointer is protected
* by @lock. Cf. landlock_release_inodes() and release_inode().
*/
- void *underobj;
- union {
/**
* @rcu_free: Enables lockless use of @usage, @lock and
* @underobj from within an RCU read-side critical section.
* @rcu_free and @underops are only used by
* landlock_put_object().
*/
struct rcu_head rcu_free;
/**
* @underops: Enables landlock_put_object() to release the
* underlying object (e.g. inode).
*/
const struct landlock_object_underops *underops;
- };
+};
+struct landlock_object *landlock_create_object(
const struct landlock_object_underops *const underops,
void *const underobj);
+void landlock_put_object(struct landlock_object *const object);
+static inline void landlock_get_object(struct landlock_object *const object) +{
- if (object)
refcount_inc(&object->usage);
+}
+#endif /* _SECURITY_LANDLOCK_OBJECT_H */
2.30.2
Either way, this object lifetime management looks okay to me:
Reviewed-by: Kees Cook keescook@chromium.org
From: Mickaël Salaün mic@linux.microsoft.com
A Landlock ruleset is mainly a red-black tree with Landlock rules as nodes. This enables quick update and lookup to match a requested access, e.g. to a file. A ruleset is usable through a dedicated file descriptor (cf. following commit implementing syscalls) which enables a process to create and populate a ruleset with new rules.
A domain is a ruleset tied to a set of processes. This group of rules defines the security policy enforced on these processes and their future children. A domain can transition to a new domain which is the intersection of all its constraints and those of a ruleset provided by the current process. This modification only impact the current process. This means that a process can only gain more constraints (i.e. lose accesses) over time.
Cc: James Morris jmorris@namei.org Cc: Jann Horn jannh@google.com Cc: Kees Cook keescook@chromium.org Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Acked-by: Serge Hallyn serge@hallyn.com Link: https://lore.kernel.org/r/20210316204252.427806-3-mic@digikod.net ---
Changes since v28: * Add Acked-by Serge Hallyn. * Clean up comment.
Changes since v27: * Fix domains with layers of non-overlapping access rights. * Add stricter limit checks (same semantic). * Change the grow direction of a rule layer stack to make it the same as the new ruleset fs_access_masks stack (cosmetic change). * Cosmetic fix for a comment block.
Changes since v26: * Fix spelling.
Changes since v25: * Add build-time checks for the num_layers and num_rules variables according to LANDLOCK_MAX_NUM_LAYERS and LANDLOCK_MAX_NUM_RULES, and move these limits to a dedicated file. * Cosmetic variable renames.
Changes since v24: * Update struct landlock_rule with a layer stack. This reverts "Always intersect access rights" from v24 and also adds the ability to tie access rights with their policy layer. As noted by Jann Horn, always intersecting access rights made some use cases uselessly more difficult to handle in user space. Thanks to this new stack, we still have a deterministic policy behavior whatever their level in the stack of policies, while using a "union" of accesses when building a ruleset. The implementation use a FAM to keep the access checks quick and memory efficient (4 bytes per layer per inode). Update insert_rule() accordingly.
Changes since v23: * Always intersect access rights. Following the filesystem change logic, make ruleset updates more consistent by always intersecting access rights (boolean AND) instead of combining them (boolean OR) for the same layer. This defensive approach could also help avoid user space to inadvertently allow multiple access rights for the same object (e.g. write and execute access on a path hierarchy) instead of dealing with such inconsistency. This can happen when there is no deduplication of objects (e.g. paths and underlying inodes) whereas they get different access rights with landlock_add_rule(2). * Add extra checks to make sure that: - there is always an (allocated) object in each used rules; - when updating a ruleset with a new rule (i.e. not merging two rulesets), the ruleset doesn't contain multiple layers. * Hide merge parameter from the public landlock_insert_rule() API. This helps avoid misuse of this function. * Replace a remaining hardcoded 1 with SINGLE_DEPTH_NESTING.
Changes since v22: * Explicitely use RB_ROOT and SINGLE_DEPTH_NESTING (suggested by Jann Horn). * Improve comments and fix spelling (suggested by Jann Horn).
Changes since v21: * Add and clean up comments.
Changes since v18: * Account rulesets to kmemcg. * Remove struct holes. * Cosmetic changes.
Changes since v17: * Move include/uapi/linux/landlock.h and _LANDLOCK_ACCESS_FS_* to a following patch.
Changes since v16: * Allow enforcement of empty ruleset, which enables deny-all policies.
Changes since v15: * Replace layer_levels and layer_depth with a bitfield of layers, cf. filesystem commit. * Rename the LANDLOCK_ACCESS_FS_{UNLINK,RMDIR} with LANDLOCK_ACCESS_FS_REMOVE_{FILE,DIR} because it makes sense to use them for the action of renaming a file or a directory, which may lead to the removal of the source file or directory. Removes the LANDLOCK_ACCESS_FS_{LINK_TO,RENAME_FROM,RENAME_TO} which are now replaced with LANDLOCK_ACCESS_FS_REMOVE_{FILE,DIR} and LANDLOCK_ACCESS_FS_MAKE_* . * Update the documentation accordingly and highlight how the access rights are taken into account. * Change nb_rules from atomic_t to u32 because it is not use anymore by show_fdinfo(). * Add safeguard for level variables types. * Check max number of rules. * Replace struct landlock_access (self and beneath bitfields) with one bitfield. * Remove useless variable. * Add comments.
Changes since v14: * Simplify the object, rule and ruleset management at the expense of a less aggressive memory freeing (contributed by Jann Horn, with additional modifications): - Make a domain immutable (remove the opportunistic cleaning). - Remove RCU pointers. - Merge struct landlock_ref and struct landlock_ruleset_elem into landlock_rule: get ride of rule's RCU. - Adjust union. - Remove the landlock_insert_rule() check about a new object with the same address as a previously disabled one, because it is not possible to disable a rule anymore. Cf. https://lore.kernel.org/lkml/CAG48ez21bEn0wL1bbmTiiu8j9jP5iEWtHOwz4tURUJ+ki0... * Fix nested domains by implementing a notion of layer level and depth: - Update landlock_insert_rule() to manage such layers. - Add an inherit_ruleset() helper to properly create a new domain. - Rename landlock_find_access() to landlock_find_rule() and return a full rule reference. - Add a layer_level and a layer_depth fields to struct landlock_rule. - Add a top_layer_level field to struct landlock_ruleset. * Remove access rights that may be required for FD-only requests: truncate, getattr, lock, chmod, chown, chgrp, ioctl. This will be handle in a future evolution of Landlock, but right now the goal is to lighten the code to ease review. * Remove LANDLOCK_ACCESS_FS_OPEN and rename LANDLOCK_ACCESS_FS_{READ,WRITE} with a FILE suffix. * Rename LANDLOCK_ACCESS_FS_READDIR to match the *_FILE pattern. * Remove LANDLOCK_ACCESS_FS_MAP which was useless. * Fix memory leak in put_hierarchy() (reported by Jann Horn). * Fix user-after-free and rename free_ruleset() (reported by Jann Horn). * Replace the for loops with rbtree_postorder_for_each_entry_safe(). * Constify variables. * Only use refcount_inc() through getter helpers. * Change Landlock_insert_ruleset_access() to Landlock_insert_ruleset_rule(). * Rename landlock_put_ruleset_enqueue() to landlock_put_ruleset_deferred(). * Improve kernel documentation and add a warning about the unhandled access/syscall families. * Move ABI check to syscall.c .
Changes since v13: * New implementation, inspired by the previous inode eBPF map, but agnostic to the underlying kernel object.
Previous changes: https://lore.kernel.org/lkml/20190721213116.23476-7-mic@digikod.net/ --- security/landlock/Makefile | 2 +- security/landlock/limits.h | 17 ++ security/landlock/ruleset.c | 469 ++++++++++++++++++++++++++++++++++++ security/landlock/ruleset.h | 165 +++++++++++++ 4 files changed, 652 insertions(+), 1 deletion(-) create mode 100644 security/landlock/limits.h create mode 100644 security/landlock/ruleset.c create mode 100644 security/landlock/ruleset.h
diff --git a/security/landlock/Makefile b/security/landlock/Makefile index cb6deefbf4c0..d846eba445bb 100644 --- a/security/landlock/Makefile +++ b/security/landlock/Makefile @@ -1,3 +1,3 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
-landlock-y := object.o +landlock-y := object.o ruleset.o diff --git a/security/landlock/limits.h b/security/landlock/limits.h new file mode 100644 index 000000000000..b734f597bb0e --- /dev/null +++ b/security/landlock/limits.h @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Landlock LSM - Limits for different components + * + * Copyright © 2016-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#ifndef _SECURITY_LANDLOCK_LIMITS_H +#define _SECURITY_LANDLOCK_LIMITS_H + +#include <linux/limits.h> + +#define LANDLOCK_MAX_NUM_LAYERS 64 +#define LANDLOCK_MAX_NUM_RULES U32_MAX + +#endif /* _SECURITY_LANDLOCK_LIMITS_H */ diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c new file mode 100644 index 000000000000..59c86126ea1c --- /dev/null +++ b/security/landlock/ruleset.c @@ -0,0 +1,469 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Landlock LSM - Ruleset management + * + * Copyright © 2016-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#include <linux/bits.h> +#include <linux/bug.h> +#include <linux/compiler_types.h> +#include <linux/err.h> +#include <linux/errno.h> +#include <linux/kernel.h> +#include <linux/lockdep.h> +#include <linux/overflow.h> +#include <linux/rbtree.h> +#include <linux/refcount.h> +#include <linux/slab.h> +#include <linux/spinlock.h> +#include <linux/workqueue.h> + +#include "limits.h" +#include "object.h" +#include "ruleset.h" + +static struct landlock_ruleset *create_ruleset(const u32 num_layers) +{ + struct landlock_ruleset *new_ruleset; + + new_ruleset = kzalloc(struct_size(new_ruleset, fs_access_masks, + num_layers), GFP_KERNEL_ACCOUNT); + if (!new_ruleset) + return ERR_PTR(-ENOMEM); + refcount_set(&new_ruleset->usage, 1); + mutex_init(&new_ruleset->lock); + new_ruleset->root = RB_ROOT; + new_ruleset->num_layers = num_layers; + /* + * hierarchy = NULL + * num_rules = 0 + * fs_access_masks[] = 0 + */ + return new_ruleset; +} + +struct landlock_ruleset *landlock_create_ruleset(const u32 fs_access_mask) +{ + struct landlock_ruleset *new_ruleset; + + /* Informs about useless ruleset. */ + if (!fs_access_mask) + return ERR_PTR(-ENOMSG); + new_ruleset = create_ruleset(1); + if (!IS_ERR(new_ruleset)) + new_ruleset->fs_access_masks[0] = fs_access_mask; + return new_ruleset; +} + +static void build_check_rule(void) +{ + const struct landlock_rule rule = { + .num_layers = ~0, + }; + + BUILD_BUG_ON(rule.num_layers < LANDLOCK_MAX_NUM_LAYERS); +} + +static struct landlock_rule *create_rule( + struct landlock_object *const object, + const struct landlock_layer (*const layers)[], + const u32 num_layers, + const struct landlock_layer *const new_layer) +{ + struct landlock_rule *new_rule; + u32 new_num_layers; + + build_check_rule(); + if (new_layer) { + /* Should already be checked by landlock_merge_ruleset(). */ + if (WARN_ON_ONCE(num_layers >= LANDLOCK_MAX_NUM_LAYERS)) + return ERR_PTR(-E2BIG); + new_num_layers = num_layers + 1; + } else { + new_num_layers = num_layers; + } + new_rule = kzalloc(struct_size(new_rule, layers, new_num_layers), + GFP_KERNEL_ACCOUNT); + if (!new_rule) + return ERR_PTR(-ENOMEM); + RB_CLEAR_NODE(&new_rule->node); + landlock_get_object(object); + new_rule->object = object; + new_rule->num_layers = new_num_layers; + /* Copies the original layer stack. */ + memcpy(new_rule->layers, layers, + flex_array_size(new_rule, layers, num_layers)); + if (new_layer) + /* Adds a copy of @new_layer on the layer stack. */ + new_rule->layers[new_rule->num_layers - 1] = *new_layer; + return new_rule; +} + +static void put_rule(struct landlock_rule *const rule) +{ + might_sleep(); + if (!rule) + return; + landlock_put_object(rule->object); + kfree(rule); +} + +static void build_check_ruleset(void) +{ + const struct landlock_ruleset ruleset = { + .num_rules = ~0, + .num_layers = ~0, + }; + + BUILD_BUG_ON(ruleset.num_rules < LANDLOCK_MAX_NUM_RULES); + BUILD_BUG_ON(ruleset.num_layers < LANDLOCK_MAX_NUM_LAYERS); +} + +/** + * insert_rule - Create and insert a rule in a ruleset + * + * @ruleset: The ruleset to be updated. + * @object: The object to build the new rule with. The underlying kernel + * object must be held by the caller. + * @layers: One or multiple layers to be copied into the new rule. + * @num_layers: The number of @layers entries. + * + * When user space requests to add a new rule to a ruleset, @layers only + * contains one entry and this entry is not assigned to any level. In this + * case, the new rule will extend @ruleset, similarly to a boolean OR between + * access rights. + * + * When merging a ruleset in a domain, or copying a domain, @layers will be + * added to @ruleset as new constraints, similarly to a boolean AND between + * access rights. + */ +static int insert_rule(struct landlock_ruleset *const ruleset, + struct landlock_object *const object, + const struct landlock_layer (*const layers)[], + size_t num_layers) +{ + struct rb_node **walker_node; + struct rb_node *parent_node = NULL; + struct landlock_rule *new_rule; + + might_sleep(); + lockdep_assert_held(&ruleset->lock); + if (WARN_ON_ONCE(!object || !layers)) + return -ENOENT; + walker_node = &(ruleset->root.rb_node); + while (*walker_node) { + struct landlock_rule *const this = rb_entry(*walker_node, + struct landlock_rule, node); + + if (this->object != object) { + parent_node = *walker_node; + if (this->object < object) + walker_node = &((*walker_node)->rb_right); + else + walker_node = &((*walker_node)->rb_left); + continue; + } + + /* Only a single-level layer should match an existing rule. */ + if (WARN_ON_ONCE(num_layers != 1)) + return -EINVAL; + + /* If there is a matching rule, updates it. */ + if ((*layers)[0].level == 0) { + /* + * Extends access rights when the request comes from + * landlock_add_rule(2), i.e. @ruleset is not a domain. + */ + if (WARN_ON_ONCE(this->num_layers != 1)) + return -EINVAL; + if (WARN_ON_ONCE(this->layers[0].level != 0)) + return -EINVAL; + this->layers[0].access |= (*layers)[0].access; + return 0; + } + + if (WARN_ON_ONCE(this->layers[0].level == 0)) + return -EINVAL; + + /* + * Intersects access rights when it is a merge between a + * ruleset and a domain. + */ + new_rule = create_rule(object, &this->layers, this->num_layers, + &(*layers)[0]); + if (IS_ERR(new_rule)) + return PTR_ERR(new_rule); + rb_replace_node(&this->node, &new_rule->node, &ruleset->root); + put_rule(this); + return 0; + } + + /* There is no match for @object. */ + build_check_ruleset(); + if (ruleset->num_rules >= LANDLOCK_MAX_NUM_RULES) + return -E2BIG; + new_rule = create_rule(object, layers, num_layers, NULL); + if (IS_ERR(new_rule)) + return PTR_ERR(new_rule); + rb_link_node(&new_rule->node, parent_node, walker_node); + rb_insert_color(&new_rule->node, &ruleset->root); + ruleset->num_rules++; + return 0; +} + +static void build_check_layer(void) +{ + const struct landlock_layer layer = { + .level = ~0, + }; + + BUILD_BUG_ON(layer.level < LANDLOCK_MAX_NUM_LAYERS); +} + +/* @ruleset must be locked by the caller. */ +int landlock_insert_rule(struct landlock_ruleset *const ruleset, + struct landlock_object *const object, const u32 access) +{ + struct landlock_layer layers[] = {{ + .access = access, + /* When @level is zero, insert_rule() extends @ruleset. */ + .level = 0, + }}; + + build_check_layer(); + return insert_rule(ruleset, object, &layers, ARRAY_SIZE(layers)); +} + +static inline void get_hierarchy(struct landlock_hierarchy *const hierarchy) +{ + if (hierarchy) + refcount_inc(&hierarchy->usage); +} + +static void put_hierarchy(struct landlock_hierarchy *hierarchy) +{ + while (hierarchy && refcount_dec_and_test(&hierarchy->usage)) { + const struct landlock_hierarchy *const freeme = hierarchy; + + hierarchy = hierarchy->parent; + kfree(freeme); + } +} + +static int merge_ruleset(struct landlock_ruleset *const dst, + struct landlock_ruleset *const src) +{ + struct landlock_rule *walker_rule, *next_rule; + int err = 0; + + might_sleep(); + /* Should already be checked by landlock_merge_ruleset() */ + if (WARN_ON_ONCE(!src)) + return 0; + /* Only merge into a domain. */ + if (WARN_ON_ONCE(!dst || !dst->hierarchy)) + return -EINVAL; + + /* Locks @dst first because we are its only owner. */ + mutex_lock(&dst->lock); + mutex_lock_nested(&src->lock, SINGLE_DEPTH_NESTING); + + /* Stacks the new layer. */ + if (WARN_ON_ONCE(src->num_layers != 1 || dst->num_layers < 1)) { + err = -EINVAL; + goto out_unlock; + } + dst->fs_access_masks[dst->num_layers - 1] = src->fs_access_masks[0]; + + /* Merges the @src tree. */ + rbtree_postorder_for_each_entry_safe(walker_rule, next_rule, + &src->root, node) { + struct landlock_layer layers[] = {{ + .level = dst->num_layers, + }}; + + if (WARN_ON_ONCE(walker_rule->num_layers != 1)) { + err = -EINVAL; + goto out_unlock; + } + if (WARN_ON_ONCE(walker_rule->layers[0].level != 0)) { + err = -EINVAL; + goto out_unlock; + } + layers[0].access = walker_rule->layers[0].access; + err = insert_rule(dst, walker_rule->object, &layers, + ARRAY_SIZE(layers)); + if (err) + goto out_unlock; + } + +out_unlock: + mutex_unlock(&src->lock); + mutex_unlock(&dst->lock); + return err; +} + +static int inherit_ruleset(struct landlock_ruleset *const parent, + struct landlock_ruleset *const child) +{ + struct landlock_rule *walker_rule, *next_rule; + int err = 0; + + might_sleep(); + if (!parent) + return 0; + + /* Locks @child first because we are its only owner. */ + mutex_lock(&child->lock); + mutex_lock_nested(&parent->lock, SINGLE_DEPTH_NESTING); + + /* Copies the @parent tree. */ + rbtree_postorder_for_each_entry_safe(walker_rule, next_rule, + &parent->root, node) { + err = insert_rule(child, walker_rule->object, + &walker_rule->layers, walker_rule->num_layers); + if (err) + goto out_unlock; + } + + if (WARN_ON_ONCE(child->num_layers <= parent->num_layers)) { + err = -EINVAL; + goto out_unlock; + } + /* Copies the parent layer stack and leaves a space for the new layer. */ + memcpy(child->fs_access_masks, parent->fs_access_masks, + flex_array_size(parent, fs_access_masks, parent->num_layers)); + + if (WARN_ON_ONCE(!parent->hierarchy)) { + err = -EINVAL; + goto out_unlock; + } + get_hierarchy(parent->hierarchy); + child->hierarchy->parent = parent->hierarchy; + +out_unlock: + mutex_unlock(&parent->lock); + mutex_unlock(&child->lock); + return err; +} + +static void free_ruleset(struct landlock_ruleset *const ruleset) +{ + struct landlock_rule *freeme, *next; + + might_sleep(); + rbtree_postorder_for_each_entry_safe(freeme, next, &ruleset->root, + node) + put_rule(freeme); + put_hierarchy(ruleset->hierarchy); + kfree(ruleset); +} + +void landlock_put_ruleset(struct landlock_ruleset *const ruleset) +{ + might_sleep(); + if (ruleset && refcount_dec_and_test(&ruleset->usage)) + free_ruleset(ruleset); +} + +static void free_ruleset_work(struct work_struct *const work) +{ + struct landlock_ruleset *ruleset; + + ruleset = container_of(work, struct landlock_ruleset, work_free); + free_ruleset(ruleset); +} + +void landlock_put_ruleset_deferred(struct landlock_ruleset *const ruleset) +{ + if (ruleset && refcount_dec_and_test(&ruleset->usage)) { + INIT_WORK(&ruleset->work_free, free_ruleset_work); + schedule_work(&ruleset->work_free); + } +} + +/** + * landlock_merge_ruleset - Merge a ruleset with a domain + * + * @parent: Parent domain. + * @ruleset: New ruleset to be merged. + * + * Returns the intersection of @parent and @ruleset, or returns @parent if + * @ruleset is empty, or returns a duplicate of @ruleset if @parent is empty. + */ +struct landlock_ruleset *landlock_merge_ruleset( + struct landlock_ruleset *const parent, + struct landlock_ruleset *const ruleset) +{ + struct landlock_ruleset *new_dom; + u32 num_layers; + int err; + + might_sleep(); + if (WARN_ON_ONCE(!ruleset || parent == ruleset)) + return ERR_PTR(-EINVAL); + + if (parent) { + if (parent->num_layers >= LANDLOCK_MAX_NUM_LAYERS) + return ERR_PTR(-E2BIG); + num_layers = parent->num_layers + 1; + } else { + num_layers = 1; + } + + /* Creates a new domain... */ + new_dom = create_ruleset(num_layers); + if (IS_ERR(new_dom)) + return new_dom; + new_dom->hierarchy = kzalloc(sizeof(*new_dom->hierarchy), + GFP_KERNEL_ACCOUNT); + if (!new_dom->hierarchy) { + err = -ENOMEM; + goto out_put_dom; + } + refcount_set(&new_dom->hierarchy->usage, 1); + + /* ...as a child of @parent... */ + err = inherit_ruleset(parent, new_dom); + if (err) + goto out_put_dom; + + /* ...and including @ruleset. */ + err = merge_ruleset(new_dom, ruleset); + if (err) + goto out_put_dom; + + return new_dom; + +out_put_dom: + landlock_put_ruleset(new_dom); + return ERR_PTR(err); +} + +/* + * The returned access has the same lifetime as @ruleset. + */ +const struct landlock_rule *landlock_find_rule( + const struct landlock_ruleset *const ruleset, + const struct landlock_object *const object) +{ + const struct rb_node *node; + + if (!object) + return NULL; + node = ruleset->root.rb_node; + while (node) { + struct landlock_rule *this = rb_entry(node, + struct landlock_rule, node); + + if (this->object == object) + return this; + if (this->object < object) + node = node->rb_right; + else + node = node->rb_left; + } + return NULL; +} diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h new file mode 100644 index 000000000000..2d3ed7ec5a0a --- /dev/null +++ b/security/landlock/ruleset.h @@ -0,0 +1,165 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Landlock LSM - Ruleset management + * + * Copyright © 2016-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#ifndef _SECURITY_LANDLOCK_RULESET_H +#define _SECURITY_LANDLOCK_RULESET_H + +#include <linux/mutex.h> +#include <linux/rbtree.h> +#include <linux/refcount.h> +#include <linux/workqueue.h> + +#include "object.h" + +/** + * struct landlock_layer - Access rights for a given layer + */ +struct landlock_layer { + /** + * @level: Position of this layer in the layer stack. + */ + u16 level; + /** + * @access: Bitfield of allowed actions on the kernel object. They are + * relative to the object type (e.g. %LANDLOCK_ACTION_FS_READ). + */ + u16 access; +}; + +/** + * struct landlock_rule - Access rights tied to an object + */ +struct landlock_rule { + /** + * @node: Node in the ruleset's red-black tree. + */ + struct rb_node node; + /** + * @object: Pointer to identify a kernel object (e.g. an inode). This + * is used as a key for this ruleset element. This pointer is set once + * and never modified. It always points to an allocated object because + * each rule increments the refcount of its object. + */ + struct landlock_object *object; + /** + * @num_layers: Number of entries in @layers. + */ + u32 num_layers; + /** + * @layers: Stack of layers, from the latest to the newest, implemented + * as a flexible array member (FAM). + */ + struct landlock_layer layers[]; +}; + +/** + * struct landlock_hierarchy - Node in a ruleset hierarchy + */ +struct landlock_hierarchy { + /** + * @parent: Pointer to the parent node, or NULL if it is a root + * Landlock domain. + */ + struct landlock_hierarchy *parent; + /** + * @usage: Number of potential children domains plus their parent + * domain. + */ + refcount_t usage; +}; + +/** + * struct landlock_ruleset - Landlock ruleset + * + * This data structure must contain unique entries, be updatable, and quick to + * match an object. + */ +struct landlock_ruleset { + /** + * @root: Root of a red-black tree containing &struct landlock_rule + * nodes. Once a ruleset is tied to a process (i.e. as a domain), this + * tree is immutable until @usage reaches zero. + */ + struct rb_root root; + /** + * @hierarchy: Enables hierarchy identification even when a parent + * domain vanishes. This is needed for the ptrace protection. + */ + struct landlock_hierarchy *hierarchy; + union { + /** + * @work_free: Enables to free a ruleset within a lockless + * section. This is only used by + * landlock_put_ruleset_deferred() when @usage reaches zero. + * The fields @lock, @usage, @num_rules, @num_layers and + * @fs_access_masks are then unused. + */ + struct work_struct work_free; + struct { + /** + * @lock: Protects against concurrent modifications of + * @root, if @usage is greater than zero. + */ + struct mutex lock; + /** + * @usage: Number of processes (i.e. domains) or file + * descriptors referencing this ruleset. + */ + refcount_t usage; + /** + * @num_rules: Number of non-overlapping (i.e. not for + * the same object) rules in this ruleset. + */ + u32 num_rules; + /** + * @num_layers: Number of layers that are used in this + * ruleset. This enables to check that all the layers + * allow an access request. A value of 0 identifies a + * non-merged ruleset (i.e. not a domain). + */ + u32 num_layers; + /** + * @fs_access_masks: Contains the subset of filesystem + * actions that are restricted by a ruleset. A domain + * saves all layers of merged rulesets in a stack + * (FAM), starting from the first layer to the last + * one. These layers are used when merging rulesets, + * for user space backward compatibility (i.e. + * future-proof), and to properly handle merged + * rulesets without overlapping access rights. These + * layers are set once and never changed for the + * lifetime of the ruleset. + */ + u16 fs_access_masks[]; + }; + }; +}; + +struct landlock_ruleset *landlock_create_ruleset(const u32 fs_access_mask); + +void landlock_put_ruleset(struct landlock_ruleset *const ruleset); +void landlock_put_ruleset_deferred(struct landlock_ruleset *const ruleset); + +int landlock_insert_rule(struct landlock_ruleset *const ruleset, + struct landlock_object *const object, const u32 access); + +struct landlock_ruleset *landlock_merge_ruleset( + struct landlock_ruleset *const parent, + struct landlock_ruleset *const ruleset); + +const struct landlock_rule *landlock_find_rule( + const struct landlock_ruleset *const ruleset, + const struct landlock_object *const object); + +static inline void landlock_get_ruleset(struct landlock_ruleset *const ruleset) +{ + if (ruleset) + refcount_inc(&ruleset->usage); +} + +#endif /* _SECURITY_LANDLOCK_RULESET_H */
On Tue, Mar 16, 2021 at 09:42:42PM +0100, Mickaël Salaün wrote:
From: Mickaël Salaün mic@linux.microsoft.com
A Landlock ruleset is mainly a red-black tree with Landlock rules as nodes. This enables quick update and lookup to match a requested access, e.g. to a file. A ruleset is usable through a dedicated file descriptor (cf. following commit implementing syscalls) which enables a process to create and populate a ruleset with new rules.
A domain is a ruleset tied to a set of processes. This group of rules defines the security policy enforced on these processes and their future children. A domain can transition to a new domain which is the intersection of all its constraints and those of a ruleset provided by the current process. This modification only impact the current process. This means that a process can only gain more constraints (i.e. lose accesses) over time.
Cc: James Morris jmorris@namei.org Cc: Jann Horn jannh@google.com Cc: Kees Cook keescook@chromium.org Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Acked-by: Serge Hallyn serge@hallyn.com Link: https://lore.kernel.org/r/20210316204252.427806-3-mic@digikod.net
(Aside: you appear to be self-adding your Link: tags -- AIUI, this is normally done by whoever pulls your series. I've only seen Link: tags added when needing to refer to something else not included in the series.)
[...] +static void put_rule(struct landlock_rule *const rule) +{
- might_sleep();
- if (!rule)
return;
- landlock_put_object(rule->object);
- kfree(rule);
+}
I'd expect this to be named "release" rather than "put" since it doesn't do any lifetime reference counting.
+static void build_check_ruleset(void) +{
- const struct landlock_ruleset ruleset = {
.num_rules = ~0,
.num_layers = ~0,
- };
- BUILD_BUG_ON(ruleset.num_rules < LANDLOCK_MAX_NUM_RULES);
- BUILD_BUG_ON(ruleset.num_layers < LANDLOCK_MAX_NUM_LAYERS);
+}
This is checking that the largest possible stored value is correctly within the LANDLOCK_MAX_* macro value?
[...]
The locking all looks right, and given your test coverage and syzkaller work, it's hard for me to think of ways to prove it out any better. :)
Reviewed-by: Kees Cook keescook@chromium.org
On 19/03/2021 19:40, Kees Cook wrote:
On Tue, Mar 16, 2021 at 09:42:42PM +0100, Mickaël Salaün wrote:
From: Mickaël Salaün mic@linux.microsoft.com
A Landlock ruleset is mainly a red-black tree with Landlock rules as nodes. This enables quick update and lookup to match a requested access, e.g. to a file. A ruleset is usable through a dedicated file descriptor (cf. following commit implementing syscalls) which enables a process to create and populate a ruleset with new rules.
A domain is a ruleset tied to a set of processes. This group of rules defines the security policy enforced on these processes and their future children. A domain can transition to a new domain which is the intersection of all its constraints and those of a ruleset provided by the current process. This modification only impact the current process. This means that a process can only gain more constraints (i.e. lose accesses) over time.
Cc: James Morris jmorris@namei.org Cc: Jann Horn jannh@google.com Cc: Kees Cook keescook@chromium.org Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Acked-by: Serge Hallyn serge@hallyn.com Link: https://lore.kernel.org/r/20210316204252.427806-3-mic@digikod.net
(Aside: you appear to be self-adding your Link: tags -- AIUI, this is normally done by whoever pulls your series. I've only seen Link: tags added when needing to refer to something else not included in the series.)
It is an insurance to not lose history. :)
[...] +static void put_rule(struct landlock_rule *const rule) +{
- might_sleep();
- if (!rule)
return;
- landlock_put_object(rule->object);
- kfree(rule);
+}
I'd expect this to be named "release" rather than "put" since it doesn't do any lifetime reference counting.
It does decrement rule->object->usage .
+static void build_check_ruleset(void) +{
- const struct landlock_ruleset ruleset = {
.num_rules = ~0,
.num_layers = ~0,
- };
- BUILD_BUG_ON(ruleset.num_rules < LANDLOCK_MAX_NUM_RULES);
- BUILD_BUG_ON(ruleset.num_layers < LANDLOCK_MAX_NUM_LAYERS);
+}
This is checking that the largest possible stored value is correctly within the LANDLOCK_MAX_* macro value?
Yes, there is builtin checks for all Landlock limits.
[...]
The locking all looks right, and given your test coverage and syzkaller work, it's hard for me to think of ways to prove it out any better. :)
Thanks!
Reviewed-by: Kees Cook keescook@chromium.org
On Fri, Mar 19, 2021 at 08:03:22PM +0100, Mickaël Salaün wrote:
On 19/03/2021 19:40, Kees Cook wrote:
On Tue, Mar 16, 2021 at 09:42:42PM +0100, Mickaël Salaün wrote:
[...] +static void put_rule(struct landlock_rule *const rule) +{
- might_sleep();
- if (!rule)
return;
- landlock_put_object(rule->object);
- kfree(rule);
+}
I'd expect this to be named "release" rather than "put" since it doesn't do any lifetime reference counting.
It does decrement rule->object->usage .
Well, landlock_put_object() decrements rule->object's lifetime. It seems "rule" doesn't have a lifetime. (There is no refcounter on rule.) I just find it strange to see "put" without a matching "get". Not a big deal.
On Fri, 19 Mar 2021, Mickaël Salaün wrote:
Cc: Kees Cook keescook@chromium.org Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Acked-by: Serge Hallyn serge@hallyn.com Link: https://lore.kernel.org/r/20210316204252.427806-3-mic@digikod.net
(Aside: you appear to be self-adding your Link: tags -- AIUI, this is normally done by whoever pulls your series. I've only seen Link: tags added when needing to refer to something else not included in the series.)
It is an insurance to not lose history. :)
How will history be lost? The code is in the repo and discussions can easily be found by searching for subjects or message IDs.
Is anyone else doing this self linking?
On 24/03/2021 21:31, James Morris wrote:
On Fri, 19 Mar 2021, Mickaël Salaün wrote:
Cc: Kees Cook keescook@chromium.org Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Acked-by: Serge Hallyn serge@hallyn.com Link: https://lore.kernel.org/r/20210316204252.427806-3-mic@digikod.net
(Aside: you appear to be self-adding your Link: tags -- AIUI, this is normally done by whoever pulls your series. I've only seen Link: tags added when needing to refer to something else not included in the series.)
It is an insurance to not lose history. :)
How will history be lost? The code is in the repo and discussions can easily be found by searching for subjects or message IDs.
The (full and ordered) history may be hard to find without any Message-ID in commit messages. The Lore links keep that information (in the commit message) and redirect to the related archived email thread, which is very handy. For instance, Linus can rely on those links to judge the quality of a patch: https://lore.kernel.org/lkml/CAHk-=wh7xY3UF7zEc0BNVNjOox59jYBW-Gfi7=emm+BXPW...
Is anyone else doing this self linking?
I don't know, but it doesn't hurt. This way, if you're using git am without b4 am -l (or forgot to add links manually), the history is still pointed out by these self-reference links. I find it convenient and it is a safeguard to not forget them, no matter who takes the patches.
On Tue, Mar 16, 2021 at 9:43 PM Mickaël Salaün mic@digikod.net wrote:
A Landlock ruleset is mainly a red-black tree with Landlock rules as nodes. This enables quick update and lookup to match a requested access, e.g. to a file. A ruleset is usable through a dedicated file descriptor (cf. following commit implementing syscalls) which enables a process to create and populate a ruleset with new rules.
A domain is a ruleset tied to a set of processes. This group of rules defines the security policy enforced on these processes and their future children. A domain can transition to a new domain which is the intersection of all its constraints and those of a ruleset provided by the current process. This modification only impact the current process. This means that a process can only gain more constraints (i.e. lose accesses) over time.
Cc: James Morris jmorris@namei.org Cc: Jann Horn jannh@google.com Cc: Kees Cook keescook@chromium.org Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Acked-by: Serge Hallyn serge@hallyn.com Link: https://lore.kernel.org/r/20210316204252.427806-3-mic@digikod.net
Reviewed-by: Jann Horn jannh@google.com
From: Mickaël Salaün mic@linux.microsoft.com
Process's credentials point to a Landlock domain, which is underneath implemented with a ruleset. In the following commits, this domain is used to check and enforce the ptrace and filesystem security policies. A domain is inherited from a parent to its child the same way a thread inherits a seccomp policy.
Cc: James Morris jmorris@namei.org Cc: Kees Cook keescook@chromium.org Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Reviewed-by: Jann Horn jannh@google.com Acked-by: Serge Hallyn serge@hallyn.com Link: https://lore.kernel.org/r/20210316204252.427806-4-mic@digikod.net ---
Changes since v28: * Add Acked-by Serge Hallyn.
Changes since v25: * Rename function to landlock_add_cred_hooks().
Changes since v23: * Add an early check for the current domain in hook_cred_free() to avoid superfluous call. * Cosmetic cleanup to make the code more readable.
Changes since v22: * Add Reviewed-by Jann Horn.
Changes since v21: * Fix copyright dates.
Changes since v17: * Constify returned domain pointers from landlock_get_current_domain() and landlock_get_task_domain() helpers.
Changes since v15: * Optimize landlocked() for current thread. * Display the greeting message when everything is initialized.
Changes since v14: * Uses pr_fmt from common.h . * Constify variables. * Remove useless NULL initialization.
Changes since v13: * totally get ride of the seccomp dependency * only keep credential management and LSM setup.
Previous changes: https://lore.kernel.org/lkml/20191104172146.30797-4-mic@digikod.net/ --- security/Kconfig | 10 +++---- security/landlock/Makefile | 3 +- security/landlock/common.h | 20 +++++++++++++ security/landlock/cred.c | 46 ++++++++++++++++++++++++++++++ security/landlock/cred.h | 58 ++++++++++++++++++++++++++++++++++++++ security/landlock/setup.c | 31 ++++++++++++++++++++ security/landlock/setup.h | 16 +++++++++++ 7 files changed, 178 insertions(+), 6 deletions(-) create mode 100644 security/landlock/common.h create mode 100644 security/landlock/cred.c create mode 100644 security/landlock/cred.h create mode 100644 security/landlock/setup.c create mode 100644 security/landlock/setup.h
diff --git a/security/Kconfig b/security/Kconfig index 15a4342b5d01..0ced7fd33e4d 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -278,11 +278,11 @@ endchoice
config LSM string "Ordered list of enabled LSMs" - default "lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK - default "lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR - default "lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO - default "lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC - default "lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf" + default "landlock,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK + default "landlock,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR + default "landlock,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO + default "landlock,lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC + default "landlock,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf" help A comma-separated list of LSMs, in initialization order. Any LSMs left off this list will be ignored. This can be diff --git a/security/landlock/Makefile b/security/landlock/Makefile index d846eba445bb..041ea242e627 100644 --- a/security/landlock/Makefile +++ b/security/landlock/Makefile @@ -1,3 +1,4 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
-landlock-y := object.o ruleset.o +landlock-y := setup.o object.o ruleset.o \ + cred.o diff --git a/security/landlock/common.h b/security/landlock/common.h new file mode 100644 index 000000000000..5dc0fe15707d --- /dev/null +++ b/security/landlock/common.h @@ -0,0 +1,20 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Landlock LSM - Common constants and helpers + * + * Copyright © 2016-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#ifndef _SECURITY_LANDLOCK_COMMON_H +#define _SECURITY_LANDLOCK_COMMON_H + +#define LANDLOCK_NAME "landlock" + +#ifdef pr_fmt +#undef pr_fmt +#endif + +#define pr_fmt(fmt) LANDLOCK_NAME ": " fmt + +#endif /* _SECURITY_LANDLOCK_COMMON_H */ diff --git a/security/landlock/cred.c b/security/landlock/cred.c new file mode 100644 index 000000000000..6725af24c684 --- /dev/null +++ b/security/landlock/cred.c @@ -0,0 +1,46 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Landlock LSM - Credential hooks + * + * Copyright © 2017-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#include <linux/cred.h> +#include <linux/lsm_hooks.h> + +#include "common.h" +#include "cred.h" +#include "ruleset.h" +#include "setup.h" + +static int hook_cred_prepare(struct cred *const new, + const struct cred *const old, const gfp_t gfp) +{ + struct landlock_ruleset *const old_dom = landlock_cred(old)->domain; + + if (old_dom) { + landlock_get_ruleset(old_dom); + landlock_cred(new)->domain = old_dom; + } + return 0; +} + +static void hook_cred_free(struct cred *const cred) +{ + struct landlock_ruleset *const dom = landlock_cred(cred)->domain; + + if (dom) + landlock_put_ruleset_deferred(dom); +} + +static struct security_hook_list landlock_hooks[] __lsm_ro_after_init = { + LSM_HOOK_INIT(cred_prepare, hook_cred_prepare), + LSM_HOOK_INIT(cred_free, hook_cred_free), +}; + +__init void landlock_add_cred_hooks(void) +{ + security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks), + LANDLOCK_NAME); +} diff --git a/security/landlock/cred.h b/security/landlock/cred.h new file mode 100644 index 000000000000..5f99d3decade --- /dev/null +++ b/security/landlock/cred.h @@ -0,0 +1,58 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Landlock LSM - Credential hooks + * + * Copyright © 2019-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2019-2020 ANSSI + */ + +#ifndef _SECURITY_LANDLOCK_CRED_H +#define _SECURITY_LANDLOCK_CRED_H + +#include <linux/cred.h> +#include <linux/init.h> +#include <linux/rcupdate.h> + +#include "ruleset.h" +#include "setup.h" + +struct landlock_cred_security { + struct landlock_ruleset *domain; +}; + +static inline struct landlock_cred_security *landlock_cred( + const struct cred *cred) +{ + return cred->security + landlock_blob_sizes.lbs_cred; +} + +static inline const struct landlock_ruleset *landlock_get_current_domain(void) +{ + return landlock_cred(current_cred())->domain; +} + +/* + * The call needs to come from an RCU read-side critical section. + */ +static inline const struct landlock_ruleset *landlock_get_task_domain( + const struct task_struct *const task) +{ + return landlock_cred(__task_cred(task))->domain; +} + +static inline bool landlocked(const struct task_struct *const task) +{ + bool has_dom; + + if (task == current) + return !!landlock_get_current_domain(); + + rcu_read_lock(); + has_dom = !!landlock_get_task_domain(task); + rcu_read_unlock(); + return has_dom; +} + +__init void landlock_add_cred_hooks(void); + +#endif /* _SECURITY_LANDLOCK_CRED_H */ diff --git a/security/landlock/setup.c b/security/landlock/setup.c new file mode 100644 index 000000000000..8661112fb238 --- /dev/null +++ b/security/landlock/setup.c @@ -0,0 +1,31 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Landlock LSM - Security framework setup + * + * Copyright © 2016-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#include <linux/init.h> +#include <linux/lsm_hooks.h> + +#include "common.h" +#include "cred.h" +#include "setup.h" + +struct lsm_blob_sizes landlock_blob_sizes __lsm_ro_after_init = { + .lbs_cred = sizeof(struct landlock_cred_security), +}; + +static int __init landlock_init(void) +{ + landlock_add_cred_hooks(); + pr_info("Up and running.\n"); + return 0; +} + +DEFINE_LSM(LANDLOCK_NAME) = { + .name = LANDLOCK_NAME, + .init = landlock_init, + .blobs = &landlock_blob_sizes, +}; diff --git a/security/landlock/setup.h b/security/landlock/setup.h new file mode 100644 index 000000000000..9fdbf33fcc33 --- /dev/null +++ b/security/landlock/setup.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Landlock LSM - Security framework setup + * + * Copyright © 2016-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#ifndef _SECURITY_LANDLOCK_SETUP_H +#define _SECURITY_LANDLOCK_SETUP_H + +#include <linux/lsm_hooks.h> + +extern struct lsm_blob_sizes landlock_blob_sizes; + +#endif /* _SECURITY_LANDLOCK_SETUP_H */
On Tue, Mar 16, 2021 at 09:42:43PM +0100, Mickaël Salaün wrote:
config LSM string "Ordered list of enabled LSMs"
- default "lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK
- default "lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR
- default "lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO
- default "lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC
- default "lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"
- default "landlock,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK
- default "landlock,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR
- default "landlock,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO
- default "landlock,lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC
- default "landlock,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf" help A comma-separated list of LSMs, in initialization order. Any LSMs left off this list will be ignored. This can be
There was some discussion long ago about landlock needing to be last in the list because it was unprivileged. Is that no longer true? (And what is the justification for its position in the list?)
diff --git a/security/landlock/common.h b/security/landlock/common.h new file mode 100644 index 000000000000..5dc0fe15707d --- /dev/null +++ b/security/landlock/common.h @@ -0,0 +1,20 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/*
- Landlock LSM - Common constants and helpers
- Copyright © 2016-2020 Mickaël Salaün mic@digikod.net
- Copyright © 2018-2020 ANSSI
- */
+#ifndef _SECURITY_LANDLOCK_COMMON_H +#define _SECURITY_LANDLOCK_COMMON_H
+#define LANDLOCK_NAME "landlock"
+#ifdef pr_fmt +#undef pr_fmt +#endif
When I see "#undef pr_fmt" I think there is a header ordering problem.
[...]
Everything else looks like regular boilerplate for an LSM. :)
Reviewed-by: Kees Cook keescook@chromium.org
On 19/03/2021 19:45, Kees Cook wrote:
On Tue, Mar 16, 2021 at 09:42:43PM +0100, Mickaël Salaün wrote:
config LSM string "Ordered list of enabled LSMs"
- default "lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK
- default "lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR
- default "lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO
- default "lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC
- default "lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"
- default "landlock,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK
- default "landlock,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR
- default "landlock,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO
- default "landlock,lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC
- default "landlock,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf" help A comma-separated list of LSMs, in initialization order. Any LSMs left off this list will be ignored. This can be
There was some discussion long ago about landlock needing to be last in the list because it was unprivileged. Is that no longer true? (And what is the justification for its position in the list?)
Indeed, I wanted to put Landlock last because it was an unprivileged programmable access-control, which could lead to side-channel attacks against other access-controls (e.g. to infer enforced policies). This is not valid anymore because Landlock is not using eBPF, only the BPF LSM does that (which is not the only reason why it is the last stacked).
diff --git a/security/landlock/common.h b/security/landlock/common.h new file mode 100644 index 000000000000..5dc0fe15707d --- /dev/null +++ b/security/landlock/common.h @@ -0,0 +1,20 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/*
- Landlock LSM - Common constants and helpers
- Copyright © 2016-2020 Mickaël Salaün mic@digikod.net
- Copyright © 2018-2020 ANSSI
- */
+#ifndef _SECURITY_LANDLOCK_COMMON_H +#define _SECURITY_LANDLOCK_COMMON_H
+#define LANDLOCK_NAME "landlock"
+#ifdef pr_fmt +#undef pr_fmt +#endif
When I see "#undef pr_fmt" I think there is a header ordering problem.
Not is this case, it's a "namespace" definition. :)
[...]
Everything else looks like regular boilerplate for an LSM. :)
Reviewed-by: Kees Cook keescook@chromium.org
From: Mickaël Salaün mic@linux.microsoft.com
Using ptrace(2) and related debug features on a target process can lead to a privilege escalation. Indeed, ptrace(2) can be used by an attacker to impersonate another task and to remain undetected while performing malicious activities. Thanks to ptrace_may_access(), various part of the kernel can check if a tracer is more privileged than a tracee.
A landlocked process has fewer privileges than a non-landlocked process and must then be subject to additional restrictions when manipulating processes. To be allowed to use ptrace(2) and related syscalls on a target process, a landlocked process must have a subset of the target process's rules (i.e. the tracee must be in a sub-domain of the tracer).
Cc: James Morris jmorris@namei.org Cc: Kees Cook keescook@chromium.org Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Reviewed-by: Jann Horn jannh@google.com Acked-by: Serge Hallyn serge@hallyn.com Link: https://lore.kernel.org/r/20210316204252.427806-5-mic@digikod.net ---
Changes since v28: * Add Acked-by Serge Hallyn.
Changes since v25: * Rename function to landlock_add_ptrace_hooks().
Changes since v22: * Add Reviewed-by: Jann Horn jannh@google.com
Changes since v21: * Fix copyright dates.
Changes since v14: * Constify variables.
Changes since v13: * Make the ptrace restriction mandatory, like in the v10. * Remove the eBPF dependency.
Previous changes: https://lore.kernel.org/lkml/20191104172146.30797-5-mic@digikod.net/ --- security/landlock/Makefile | 2 +- security/landlock/ptrace.c | 120 +++++++++++++++++++++++++++++++++++++ security/landlock/ptrace.h | 14 +++++ security/landlock/setup.c | 2 + 4 files changed, 137 insertions(+), 1 deletion(-) create mode 100644 security/landlock/ptrace.c create mode 100644 security/landlock/ptrace.h
diff --git a/security/landlock/Makefile b/security/landlock/Makefile index 041ea242e627..f1d1eb72fa76 100644 --- a/security/landlock/Makefile +++ b/security/landlock/Makefile @@ -1,4 +1,4 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
landlock-y := setup.o object.o ruleset.o \ - cred.o + cred.o ptrace.o diff --git a/security/landlock/ptrace.c b/security/landlock/ptrace.c new file mode 100644 index 000000000000..f55b82446de2 --- /dev/null +++ b/security/landlock/ptrace.c @@ -0,0 +1,120 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Landlock LSM - Ptrace hooks + * + * Copyright © 2017-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2019-2020 ANSSI + */ + +#include <asm/current.h> +#include <linux/cred.h> +#include <linux/errno.h> +#include <linux/kernel.h> +#include <linux/lsm_hooks.h> +#include <linux/rcupdate.h> +#include <linux/sched.h> + +#include "common.h" +#include "cred.h" +#include "ptrace.h" +#include "ruleset.h" +#include "setup.h" + +/** + * domain_scope_le - Checks domain ordering for scoped ptrace + * + * @parent: Parent domain. + * @child: Potential child of @parent. + * + * Checks if the @parent domain is less or equal to (i.e. an ancestor, which + * means a subset of) the @child domain. + */ +static bool domain_scope_le(const struct landlock_ruleset *const parent, + const struct landlock_ruleset *const child) +{ + const struct landlock_hierarchy *walker; + + if (!parent) + return true; + if (!child) + return false; + for (walker = child->hierarchy; walker; walker = walker->parent) { + if (walker == parent->hierarchy) + /* @parent is in the scoped hierarchy of @child. */ + return true; + } + /* There is no relationship between @parent and @child. */ + return false; +} + +static bool task_is_scoped(const struct task_struct *const parent, + const struct task_struct *const child) +{ + bool is_scoped; + const struct landlock_ruleset *dom_parent, *dom_child; + + rcu_read_lock(); + dom_parent = landlock_get_task_domain(parent); + dom_child = landlock_get_task_domain(child); + is_scoped = domain_scope_le(dom_parent, dom_child); + rcu_read_unlock(); + return is_scoped; +} + +static int task_ptrace(const struct task_struct *const parent, + const struct task_struct *const child) +{ + /* Quick return for non-landlocked tasks. */ + if (!landlocked(parent)) + return 0; + if (task_is_scoped(parent, child)) + return 0; + return -EPERM; +} + +/** + * hook_ptrace_access_check - Determines whether the current process may access + * another + * + * @child: Process to be accessed. + * @mode: Mode of attachment. + * + * If the current task has Landlock rules, then the child must have at least + * the same rules. Else denied. + * + * Determines whether a process may access another, returning 0 if permission + * granted, -errno if denied. + */ +static int hook_ptrace_access_check(struct task_struct *const child, + const unsigned int mode) +{ + return task_ptrace(current, child); +} + +/** + * hook_ptrace_traceme - Determines whether another process may trace the + * current one + * + * @parent: Task proposed to be the tracer. + * + * If the parent has Landlock rules, then the current task must have the same + * or more rules. Else denied. + * + * Determines whether the nominated task is permitted to trace the current + * process, returning 0 if permission is granted, -errno if denied. + */ +static int hook_ptrace_traceme(struct task_struct *const parent) +{ + return task_ptrace(parent, current); +} + +static struct security_hook_list landlock_hooks[] __lsm_ro_after_init = { + LSM_HOOK_INIT(ptrace_access_check, hook_ptrace_access_check), + LSM_HOOK_INIT(ptrace_traceme, hook_ptrace_traceme), +}; + +__init void landlock_add_ptrace_hooks(void) +{ + security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks), + LANDLOCK_NAME); +} diff --git a/security/landlock/ptrace.h b/security/landlock/ptrace.h new file mode 100644 index 000000000000..265b220ae3bf --- /dev/null +++ b/security/landlock/ptrace.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Landlock LSM - Ptrace hooks + * + * Copyright © 2017-2019 Mickaël Salaün mic@digikod.net + * Copyright © 2019 ANSSI + */ + +#ifndef _SECURITY_LANDLOCK_PTRACE_H +#define _SECURITY_LANDLOCK_PTRACE_H + +__init void landlock_add_ptrace_hooks(void); + +#endif /* _SECURITY_LANDLOCK_PTRACE_H */ diff --git a/security/landlock/setup.c b/security/landlock/setup.c index 8661112fb238..a5d6ef334991 100644 --- a/security/landlock/setup.c +++ b/security/landlock/setup.c @@ -11,6 +11,7 @@
#include "common.h" #include "cred.h" +#include "ptrace.h" #include "setup.h"
struct lsm_blob_sizes landlock_blob_sizes __lsm_ro_after_init = { @@ -20,6 +21,7 @@ struct lsm_blob_sizes landlock_blob_sizes __lsm_ro_after_init = { static int __init landlock_init(void) { landlock_add_cred_hooks(); + landlock_add_ptrace_hooks(); pr_info("Up and running.\n"); return 0; }
On Tue, Mar 16, 2021 at 09:42:44PM +0100, Mickaël Salaün wrote:
From: Mickaël Salaün mic@linux.microsoft.com
Using ptrace(2) and related debug features on a target process can lead to a privilege escalation. Indeed, ptrace(2) can be used by an attacker to impersonate another task and to remain undetected while performing malicious activities. Thanks to ptrace_may_access(), various part of the kernel can check if a tracer is more privileged than a tracee.
A landlocked process has fewer privileges than a non-landlocked process and must then be subject to additional restrictions when manipulating processes. To be allowed to use ptrace(2) and related syscalls on a target process, a landlocked process must have a subset of the target process's rules (i.e. the tracee must be in a sub-domain of the tracer).
Cc: James Morris jmorris@namei.org Cc: Kees Cook keescook@chromium.org Signed-off-by: Mickaël Salaün mic@linux.microsoft.com
Reviewed-by: Kees Cook keescook@chromium.org
From: Casey Schaufler casey@schaufler-ca.com
Move management of the superblock->sb_security blob out of the individual security modules and into the security infrastructure. Instead of allocating the blobs from within the modules, the modules tell the infrastructure how much space is required, and the space is allocated there.
Cc: Kees Cook keescook@chromium.org Cc: John Johansen john.johansen@canonical.com Signed-off-by: Casey Schaufler casey@schaufler-ca.com Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Reviewed-by: Stephen Smalley stephen.smalley.work@gmail.com Acked-by: Serge Hallyn serge@hallyn.com Link: https://lore.kernel.org/r/20210316204252.427806-6-mic@digikod.net ---
Changes since v28: * Add Acked-by Serge Hallyn.
Changes since v26: * Rebase on commit b159e86b5a2a ("selinux: drop super_block backpointer from superblock_security_struct"). No change in the patch itself, only a trivial conflict because of an updated nearby line in selinux_set_mnt_opts() variable declarations.
Changes since v20: * Remove all Reviewed-by except Stephen Smalley: https://lore.kernel.org/lkml/CAEjxPJ7ARJO57MBW66=xsBzMMRb=9uLgqocK5eskHCaiVM... * Cosmetic fix in the commit message.
Changes since v17: * Rebase the original LSM stacking patch from v5.3 to v5.7: I fixed some diff conflicts caused by code moves and function renames in selinux/include/objsec.h and selinux/hooks.c . I checked that it builds but I didn't test the changes for SELinux nor SMACK. https://lore.kernel.org/r/20190829232935.7099-2-casey@schaufler-ca.com --- include/linux/lsm_hooks.h | 1 + security/security.c | 46 ++++++++++++++++++++---- security/selinux/hooks.c | 58 ++++++++++++------------------- security/selinux/include/objsec.h | 6 ++++ security/selinux/ss/services.c | 3 +- security/smack/smack.h | 6 ++++ security/smack/smack_lsm.c | 35 +++++-------------- 7 files changed, 85 insertions(+), 70 deletions(-)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index fb7f3193753d..75715998a95f 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -1573,6 +1573,7 @@ struct lsm_blob_sizes { int lbs_cred; int lbs_file; int lbs_inode; + int lbs_superblock; int lbs_ipc; int lbs_msg_msg; int lbs_task; diff --git a/security/security.c b/security/security.c index 5ac96b16f8fa..e9c29480eb18 100644 --- a/security/security.c +++ b/security/security.c @@ -203,6 +203,7 @@ static void __init lsm_set_blob_sizes(struct lsm_blob_sizes *needed) lsm_set_blob_size(&needed->lbs_inode, &blob_sizes.lbs_inode); lsm_set_blob_size(&needed->lbs_ipc, &blob_sizes.lbs_ipc); lsm_set_blob_size(&needed->lbs_msg_msg, &blob_sizes.lbs_msg_msg); + lsm_set_blob_size(&needed->lbs_superblock, &blob_sizes.lbs_superblock); lsm_set_blob_size(&needed->lbs_task, &blob_sizes.lbs_task); }
@@ -333,12 +334,13 @@ static void __init ordered_lsm_init(void) for (lsm = ordered_lsms; *lsm; lsm++) prepare_lsm(*lsm);
- init_debug("cred blob size = %d\n", blob_sizes.lbs_cred); - init_debug("file blob size = %d\n", blob_sizes.lbs_file); - init_debug("inode blob size = %d\n", blob_sizes.lbs_inode); - init_debug("ipc blob size = %d\n", blob_sizes.lbs_ipc); - init_debug("msg_msg blob size = %d\n", blob_sizes.lbs_msg_msg); - init_debug("task blob size = %d\n", blob_sizes.lbs_task); + init_debug("cred blob size = %d\n", blob_sizes.lbs_cred); + init_debug("file blob size = %d\n", blob_sizes.lbs_file); + init_debug("inode blob size = %d\n", blob_sizes.lbs_inode); + init_debug("ipc blob size = %d\n", blob_sizes.lbs_ipc); + init_debug("msg_msg blob size = %d\n", blob_sizes.lbs_msg_msg); + init_debug("superblock blob size = %d\n", blob_sizes.lbs_superblock); + init_debug("task blob size = %d\n", blob_sizes.lbs_task);
/* * Create any kmem_caches needed for blobs @@ -670,6 +672,27 @@ static void __init lsm_early_task(struct task_struct *task) panic("%s: Early task alloc failed.\n", __func__); }
+/** + * lsm_superblock_alloc - allocate a composite superblock blob + * @sb: the superblock that needs a blob + * + * Allocate the superblock blob for all the modules + * + * Returns 0, or -ENOMEM if memory can't be allocated. + */ +static int lsm_superblock_alloc(struct super_block *sb) +{ + if (blob_sizes.lbs_superblock == 0) { + sb->s_security = NULL; + return 0; + } + + sb->s_security = kzalloc(blob_sizes.lbs_superblock, GFP_KERNEL); + if (sb->s_security == NULL) + return -ENOMEM; + return 0; +} + /* * The default value of the LSM hook is defined in linux/lsm_hook_defs.h and * can be accessed with: @@ -867,12 +890,21 @@ int security_fs_context_parse_param(struct fs_context *fc, struct fs_parameter *
int security_sb_alloc(struct super_block *sb) { - return call_int_hook(sb_alloc_security, 0, sb); + int rc = lsm_superblock_alloc(sb); + + if (unlikely(rc)) + return rc; + rc = call_int_hook(sb_alloc_security, 0, sb); + if (unlikely(rc)) + security_sb_free(sb); + return rc; }
void security_sb_free(struct super_block *sb) { call_void_hook(sb_free_security, sb); + kfree(sb->s_security); + sb->s_security = NULL; }
void security_free_mnt_opts(void **mnt_opts) diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index ddd097790d47..2ed9c995263a 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -322,7 +322,7 @@ static void inode_free_security(struct inode *inode)
if (!isec) return; - sbsec = inode->i_sb->s_security; + sbsec = selinux_superblock(inode->i_sb); /* * As not all inode security structures are in a list, we check for * empty list outside of the lock to make sure that we won't waste @@ -340,13 +340,6 @@ static void inode_free_security(struct inode *inode) } }
-static void superblock_free_security(struct super_block *sb) -{ - struct superblock_security_struct *sbsec = sb->s_security; - sb->s_security = NULL; - kfree(sbsec); -} - struct selinux_mnt_opts { const char *fscontext, *context, *rootcontext, *defcontext; }; @@ -458,7 +451,7 @@ static int selinux_is_genfs_special_handling(struct super_block *sb)
static int selinux_is_sblabel_mnt(struct super_block *sb) { - struct superblock_security_struct *sbsec = sb->s_security; + struct superblock_security_struct *sbsec = selinux_superblock(sb);
/* * IMPORTANT: Double-check logic in this function when adding a new @@ -535,7 +528,7 @@ static int sb_check_xattr_support(struct super_block *sb)
static int sb_finish_set_opts(struct super_block *sb) { - struct superblock_security_struct *sbsec = sb->s_security; + struct superblock_security_struct *sbsec = selinux_superblock(sb); struct dentry *root = sb->s_root; struct inode *root_inode = d_backing_inode(root); int rc = 0; @@ -626,7 +619,7 @@ static int selinux_set_mnt_opts(struct super_block *sb, unsigned long *set_kern_flags) { const struct cred *cred = current_cred(); - struct superblock_security_struct *sbsec = sb->s_security; + struct superblock_security_struct *sbsec = selinux_superblock(sb); struct dentry *root = sb->s_root; struct selinux_mnt_opts *opts = mnt_opts; struct inode_security_struct *root_isec; @@ -863,8 +856,8 @@ static int selinux_set_mnt_opts(struct super_block *sb, static int selinux_cmp_sb_context(const struct super_block *oldsb, const struct super_block *newsb) { - struct superblock_security_struct *old = oldsb->s_security; - struct superblock_security_struct *new = newsb->s_security; + struct superblock_security_struct *old = selinux_superblock(oldsb); + struct superblock_security_struct *new = selinux_superblock(newsb); char oldflags = old->flags & SE_MNTMASK; char newflags = new->flags & SE_MNTMASK;
@@ -896,8 +889,9 @@ static int selinux_sb_clone_mnt_opts(const struct super_block *oldsb, unsigned long *set_kern_flags) { int rc = 0; - const struct superblock_security_struct *oldsbsec = oldsb->s_security; - struct superblock_security_struct *newsbsec = newsb->s_security; + const struct superblock_security_struct *oldsbsec = + selinux_superblock(oldsb); + struct superblock_security_struct *newsbsec = selinux_superblock(newsb);
int set_fscontext = (oldsbsec->flags & FSCONTEXT_MNT); int set_context = (oldsbsec->flags & CONTEXT_MNT); @@ -1076,7 +1070,7 @@ static int show_sid(struct seq_file *m, u32 sid)
static int selinux_sb_show_options(struct seq_file *m, struct super_block *sb) { - struct superblock_security_struct *sbsec = sb->s_security; + struct superblock_security_struct *sbsec = selinux_superblock(sb); int rc;
if (!(sbsec->flags & SE_SBINITIALIZED)) @@ -1427,7 +1421,7 @@ static int inode_doinit_with_dentry(struct inode *inode, struct dentry *opt_dent if (isec->sclass == SECCLASS_FILE) isec->sclass = inode_mode_to_security_class(inode->i_mode);
- sbsec = inode->i_sb->s_security; + sbsec = selinux_superblock(inode->i_sb); if (!(sbsec->flags & SE_SBINITIALIZED)) { /* Defer initialization until selinux_complete_init, after the initial policy is loaded and the security @@ -1778,7 +1772,8 @@ selinux_determine_inode_label(const struct task_security_struct *tsec, const struct qstr *name, u16 tclass, u32 *_new_isid) { - const struct superblock_security_struct *sbsec = dir->i_sb->s_security; + const struct superblock_security_struct *sbsec = + selinux_superblock(dir->i_sb);
if ((sbsec->flags & SE_SBINITIALIZED) && (sbsec->behavior == SECURITY_FS_USE_MNTPOINT)) { @@ -1809,7 +1804,7 @@ static int may_create(struct inode *dir, int rc;
dsec = inode_security(dir); - sbsec = dir->i_sb->s_security; + sbsec = selinux_superblock(dir->i_sb);
sid = tsec->sid;
@@ -1958,7 +1953,7 @@ static int superblock_has_perm(const struct cred *cred, struct superblock_security_struct *sbsec; u32 sid = cred_sid(cred);
- sbsec = sb->s_security; + sbsec = selinux_superblock(sb); return avc_has_perm(&selinux_state, sid, sbsec->sid, SECCLASS_FILESYSTEM, perms, ad); } @@ -2587,11 +2582,7 @@ static void selinux_bprm_committed_creds(struct linux_binprm *bprm)
static int selinux_sb_alloc_security(struct super_block *sb) { - struct superblock_security_struct *sbsec; - - sbsec = kzalloc(sizeof(struct superblock_security_struct), GFP_KERNEL); - if (!sbsec) - return -ENOMEM; + struct superblock_security_struct *sbsec = selinux_superblock(sb);
mutex_init(&sbsec->lock); INIT_LIST_HEAD(&sbsec->isec_head); @@ -2599,16 +2590,10 @@ static int selinux_sb_alloc_security(struct super_block *sb) sbsec->sid = SECINITSID_UNLABELED; sbsec->def_sid = SECINITSID_FILE; sbsec->mntpoint_sid = SECINITSID_UNLABELED; - sb->s_security = sbsec;
return 0; }
-static void selinux_sb_free_security(struct super_block *sb) -{ - superblock_free_security(sb); -} - static inline int opt_len(const char *s) { bool open_quote = false; @@ -2687,7 +2672,7 @@ static int selinux_sb_eat_lsm_opts(char *options, void **mnt_opts) static int selinux_sb_remount(struct super_block *sb, void *mnt_opts) { struct selinux_mnt_opts *opts = mnt_opts; - struct superblock_security_struct *sbsec = sb->s_security; + struct superblock_security_struct *sbsec = selinux_superblock(sb); u32 sid; int rc;
@@ -2925,7 +2910,7 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir, int rc; char *context;
- sbsec = dir->i_sb->s_security; + sbsec = selinux_superblock(dir->i_sb);
newsid = tsec->create_sid;
@@ -3227,7 +3212,7 @@ static int selinux_inode_setxattr(struct user_namespace *mnt_userns, if (!selinux_initialized(&selinux_state)) return (inode_owner_or_capable(mnt_userns, inode) ? 0 : -EPERM);
- sbsec = inode->i_sb->s_security; + sbsec = selinux_superblock(inode->i_sb); if (!(sbsec->flags & SBLABEL_MNT)) return -EOPNOTSUPP;
@@ -3472,13 +3457,14 @@ static int selinux_inode_setsecurity(struct inode *inode, const char *name, const void *value, size_t size, int flags) { struct inode_security_struct *isec = inode_security_novalidate(inode); - struct superblock_security_struct *sbsec = inode->i_sb->s_security; + struct superblock_security_struct *sbsec; u32 newsid; int rc;
if (strcmp(name, XATTR_SELINUX_SUFFIX)) return -EOPNOTSUPP;
+ sbsec = selinux_superblock(inode->i_sb); if (!(sbsec->flags & SBLABEL_MNT)) return -EOPNOTSUPP;
@@ -6975,6 +6961,7 @@ struct lsm_blob_sizes selinux_blob_sizes __lsm_ro_after_init = { .lbs_inode = sizeof(struct inode_security_struct), .lbs_ipc = sizeof(struct ipc_security_struct), .lbs_msg_msg = sizeof(struct msg_security_struct), + .lbs_superblock = sizeof(struct superblock_security_struct), };
#ifdef CONFIG_PERF_EVENTS @@ -7075,7 +7062,6 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(bprm_committing_creds, selinux_bprm_committing_creds), LSM_HOOK_INIT(bprm_committed_creds, selinux_bprm_committed_creds),
- LSM_HOOK_INIT(sb_free_security, selinux_sb_free_security), LSM_HOOK_INIT(sb_free_mnt_opts, selinux_free_mnt_opts), LSM_HOOK_INIT(sb_remount, selinux_sb_remount), LSM_HOOK_INIT(sb_kern_mount, selinux_sb_kern_mount), diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h index ca4d7ab6a835..2953132408bf 100644 --- a/security/selinux/include/objsec.h +++ b/security/selinux/include/objsec.h @@ -188,4 +188,10 @@ static inline u32 current_sid(void) return tsec->sid; }
+static inline struct superblock_security_struct *selinux_superblock( + const struct super_block *superblock) +{ + return superblock->s_security + selinux_blob_sizes.lbs_superblock; +} + #endif /* _SELINUX_OBJSEC_H_ */ diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c index 3438d0130378..9cea2e6c809f 100644 --- a/security/selinux/ss/services.c +++ b/security/selinux/ss/services.c @@ -47,6 +47,7 @@ #include <linux/sched.h> #include <linux/audit.h> #include <linux/vmalloc.h> +#include <linux/lsm_hooks.h> #include <net/netlabel.h>
#include "flask.h" @@ -2875,7 +2876,7 @@ int security_fs_use(struct selinux_state *state, struct super_block *sb) struct sidtab *sidtab; int rc = 0; struct ocontext *c; - struct superblock_security_struct *sbsec = sb->s_security; + struct superblock_security_struct *sbsec = selinux_superblock(sb); const char *fstype = sb->s_type->name;
if (!selinux_initialized(state)) { diff --git a/security/smack/smack.h b/security/smack/smack.h index a9768b12716b..7077b18c79ec 100644 --- a/security/smack/smack.h +++ b/security/smack/smack.h @@ -357,6 +357,12 @@ static inline struct smack_known **smack_ipc(const struct kern_ipc_perm *ipc) return ipc->security + smack_blob_sizes.lbs_ipc; }
+static inline struct superblock_smack *smack_superblock( + const struct super_block *superblock) +{ + return superblock->s_security + smack_blob_sizes.lbs_superblock; +} + /* * Is the directory transmuting? */ diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c index 12a45e61c1a5..ee3e29603c9c 100644 --- a/security/smack/smack_lsm.c +++ b/security/smack/smack_lsm.c @@ -535,12 +535,7 @@ static int smack_syslog(int typefrom_file) */ static int smack_sb_alloc_security(struct super_block *sb) { - struct superblock_smack *sbsp; - - sbsp = kzalloc(sizeof(struct superblock_smack), GFP_KERNEL); - - if (sbsp == NULL) - return -ENOMEM; + struct superblock_smack *sbsp = smack_superblock(sb);
sbsp->smk_root = &smack_known_floor; sbsp->smk_default = &smack_known_floor; @@ -549,22 +544,10 @@ static int smack_sb_alloc_security(struct super_block *sb) /* * SMK_SB_INITIALIZED will be zero from kzalloc. */ - sb->s_security = sbsp;
return 0; }
-/** - * smack_sb_free_security - free a superblock blob - * @sb: the superblock getting the blob - * - */ -static void smack_sb_free_security(struct super_block *sb) -{ - kfree(sb->s_security); - sb->s_security = NULL; -} - struct smack_mnt_opts { const char *fsdefault, *fsfloor, *fshat, *fsroot, *fstransmute; }; @@ -772,7 +755,7 @@ static int smack_set_mnt_opts(struct super_block *sb, { struct dentry *root = sb->s_root; struct inode *inode = d_backing_inode(root); - struct superblock_smack *sp = sb->s_security; + struct superblock_smack *sp = smack_superblock(sb); struct inode_smack *isp; struct smack_known *skp; struct smack_mnt_opts *opts = mnt_opts; @@ -871,7 +854,7 @@ static int smack_set_mnt_opts(struct super_block *sb, */ static int smack_sb_statfs(struct dentry *dentry) { - struct superblock_smack *sbp = dentry->d_sb->s_security; + struct superblock_smack *sbp = smack_superblock(dentry->d_sb); int rc; struct smk_audit_info ad;
@@ -905,7 +888,7 @@ static int smack_bprm_creds_for_exec(struct linux_binprm *bprm) if (isp->smk_task == NULL || isp->smk_task == bsp->smk_task) return 0;
- sbsp = inode->i_sb->s_security; + sbsp = smack_superblock(inode->i_sb); if ((sbsp->smk_flags & SMK_SB_UNTRUSTED) && isp->smk_task != sbsp->smk_root) return 0; @@ -1157,7 +1140,7 @@ static int smack_inode_rename(struct inode *old_inode, */ static int smack_inode_permission(struct inode *inode, int mask) { - struct superblock_smack *sbsp = inode->i_sb->s_security; + struct superblock_smack *sbsp = smack_superblock(inode->i_sb); struct smk_audit_info ad; int no_block = mask & MAY_NOT_BLOCK; int rc; @@ -1400,7 +1383,7 @@ static int smack_inode_removexattr(struct user_namespace *mnt_userns, */ if (strcmp(name, XATTR_NAME_SMACK) == 0) { struct super_block *sbp = dentry->d_sb; - struct superblock_smack *sbsp = sbp->s_security; + struct superblock_smack *sbsp = smack_superblock(sbp);
isp->smk_inode = sbsp->smk_default; } else if (strcmp(name, XATTR_NAME_SMACKEXEC) == 0) @@ -1670,7 +1653,7 @@ static int smack_mmap_file(struct file *file, isp = smack_inode(file_inode(file)); if (isp->smk_mmap == NULL) return 0; - sbsp = file_inode(file)->i_sb->s_security; + sbsp = smack_superblock(file_inode(file)->i_sb); if (sbsp->smk_flags & SMK_SB_UNTRUSTED && isp->smk_mmap != sbsp->smk_root) return -EACCES; @@ -3285,7 +3268,7 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode) return;
sbp = inode->i_sb; - sbsp = sbp->s_security; + sbsp = smack_superblock(sbp); /* * We're going to use the superblock default label * if there's no label on the file. @@ -4700,6 +4683,7 @@ struct lsm_blob_sizes smack_blob_sizes __lsm_ro_after_init = { .lbs_inode = sizeof(struct inode_smack), .lbs_ipc = sizeof(struct smack_known *), .lbs_msg_msg = sizeof(struct smack_known *), + .lbs_superblock = sizeof(struct superblock_smack), };
static struct security_hook_list smack_hooks[] __lsm_ro_after_init = { @@ -4711,7 +4695,6 @@ static struct security_hook_list smack_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(fs_context_parse_param, smack_fs_context_parse_param),
LSM_HOOK_INIT(sb_alloc_security, smack_sb_alloc_security), - LSM_HOOK_INIT(sb_free_security, smack_sb_free_security), LSM_HOOK_INIT(sb_free_mnt_opts, smack_free_mnt_opts), LSM_HOOK_INIT(sb_eat_lsm_opts, smack_sb_eat_lsm_opts), LSM_HOOK_INIT(sb_statfs, smack_sb_statfs),
On Tue, Mar 16, 2021 at 09:42:45PM +0100, Mickaël Salaün wrote:
From: Casey Schaufler casey@schaufler-ca.com
Move management of the superblock->sb_security blob out of the individual security modules and into the security infrastructure. Instead of allocating the blobs from within the modules, the modules tell the infrastructure how much space is required, and the space is allocated there.
Cc: Kees Cook keescook@chromium.org Cc: John Johansen john.johansen@canonical.com Signed-off-by: Casey Schaufler casey@schaufler-ca.com
Reviewed-by: Kees Cook keescook@chromium.org
From: Mickaël Salaün mic@linux.microsoft.com
The sb_delete security hook is called when shutting down a superblock, which may be useful to release kernel objects tied to the superblock's lifetime (e.g. inodes).
This new hook is needed by Landlock to release (ephemerally) tagged struct inodes. This comes from the unprivileged nature of Landlock described in the next commit.
Cc: Al Viro viro@zeniv.linux.org.uk Cc: James Morris jmorris@namei.org Cc: Kees Cook keescook@chromium.org Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Reviewed-by: Jann Horn jannh@google.com Acked-by: Serge Hallyn serge@hallyn.com Link: https://lore.kernel.org/r/20210316204252.427806-7-mic@digikod.net ---
Changes since v28: * Extend hook description (suggested by Serge Hallyn). * Add Acked-by Serge Hallyn.
Changes since v22: * Add Reviewed-by: Jann Horn jannh@google.com
Changes since v17: * Initial patch to replace the direct call to landlock_release_inodes() (requested by James Morris). https://lore.kernel.org/lkml/alpine.LRH.2.21.2005150536440.7929@namei.org/ --- fs/super.c | 1 + include/linux/lsm_hook_defs.h | 1 + include/linux/lsm_hooks.h | 3 +++ include/linux/security.h | 4 ++++ security/security.c | 5 +++++ 5 files changed, 14 insertions(+)
diff --git a/fs/super.c b/fs/super.c index 8c1baca35c16..11b7e7213fd1 100644 --- a/fs/super.c +++ b/fs/super.c @@ -454,6 +454,7 @@ void generic_shutdown_super(struct super_block *sb) evict_inodes(sb); /* only nonzero refcount inodes can have marks */ fsnotify_sb_delete(sb); + security_sb_delete(sb);
if (sb->s_dio_done_wq) { destroy_workqueue(sb->s_dio_done_wq); diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index 477a597db013..e8adadbf9581 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -59,6 +59,7 @@ LSM_HOOK(int, 0, fs_context_dup, struct fs_context *fc, LSM_HOOK(int, -ENOPARAM, fs_context_parse_param, struct fs_context *fc, struct fs_parameter *param) LSM_HOOK(int, 0, sb_alloc_security, struct super_block *sb) +LSM_HOOK(void, LSM_RET_VOID, sb_delete, struct super_block *sb) LSM_HOOK(void, LSM_RET_VOID, sb_free_security, struct super_block *sb) LSM_HOOK(void, LSM_RET_VOID, sb_free_mnt_opts, void *mnt_opts) LSM_HOOK(int, 0, sb_eat_lsm_opts, char *orig, void **mnt_opts) diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 75715998a95f..cc2eaaaec0e4 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -108,6 +108,9 @@ * allocated. * @sb contains the super_block structure to be modified. * Return 0 if operation was successful. + * @sb_delete: + * Release objects tied to a superblock (e.g. inodes). + * @sb contains the super_block structure being released. * @sb_free_security: * Deallocate and clear the sb->s_security field. * @sb contains the super_block structure to be modified. diff --git a/include/linux/security.h b/include/linux/security.h index 8aeebd6646dc..90298baa4551 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -291,6 +291,7 @@ void security_bprm_committed_creds(struct linux_binprm *bprm); int security_fs_context_dup(struct fs_context *fc, struct fs_context *src_fc); int security_fs_context_parse_param(struct fs_context *fc, struct fs_parameter *param); int security_sb_alloc(struct super_block *sb); +void security_sb_delete(struct super_block *sb); void security_sb_free(struct super_block *sb); void security_free_mnt_opts(void **mnt_opts); int security_sb_eat_lsm_opts(char *options, void **mnt_opts); @@ -631,6 +632,9 @@ static inline int security_sb_alloc(struct super_block *sb) return 0; }
+static inline void security_sb_delete(struct super_block *sb) +{ } + static inline void security_sb_free(struct super_block *sb) { }
diff --git a/security/security.c b/security/security.c index e9c29480eb18..bb666f992497 100644 --- a/security/security.c +++ b/security/security.c @@ -900,6 +900,11 @@ int security_sb_alloc(struct super_block *sb) return rc; }
+void security_sb_delete(struct super_block *sb) +{ + call_void_hook(sb_delete, sb); +} + void security_sb_free(struct super_block *sb) { call_void_hook(sb_free_security, sb);
On Tue, Mar 16, 2021 at 09:42:46PM +0100, Mickaël Salaün wrote:
From: Mickaël Salaün mic@linux.microsoft.com
The sb_delete security hook is called when shutting down a superblock, which may be useful to release kernel objects tied to the superblock's lifetime (e.g. inodes).
This new hook is needed by Landlock to release (ephemerally) tagged struct inodes. This comes from the unprivileged nature of Landlock described in the next commit.
Cc: Al Viro viro@zeniv.linux.org.uk Cc: James Morris jmorris@namei.org Cc: Kees Cook keescook@chromium.org Signed-off-by: Mickaël Salaün mic@linux.microsoft.com
Reviewed-by: Kees Cook keescook@chromium.org
From: Mickaël Salaün mic@linux.microsoft.com
Using Landlock objects and ruleset, it is possible to tag inodes according to a process's domain. To enable an unprivileged process to express a file hierarchy, it first needs to open a directory (or a file) and pass this file descriptor to the kernel through landlock_add_rule(2). When checking if a file access request is allowed, we walk from the requested dentry to the real root, following the different mount layers. The access to each "tagged" inodes are collected according to their rule layer level, and ANDed to create access to the requested file hierarchy. This makes possible to identify a lot of files without tagging every inodes nor modifying the filesystem, while still following the view and understanding the user has from the filesystem.
Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not keep the same struct inodes for the same inodes whereas these inodes are in use.
This commit adds a minimal set of supported filesystem access-control which doesn't enable to restrict all file-related actions. This is the result of multiple discussions to minimize the code of Landlock to ease review. Thanks to the Landlock design, extending this access-control without breaking user space will not be a problem. Moreover, seccomp filters can be used to restrict the use of syscall families which may not be currently handled by Landlock.
Cc: Al Viro viro@zeniv.linux.org.uk Cc: Anton Ivanov anton.ivanov@cambridgegreys.com Cc: James Morris jmorris@namei.org Cc: Jann Horn jannh@google.com Cc: Jeff Dike jdike@addtoit.com Cc: Kees Cook keescook@chromium.org Cc: Richard Weinberger richard@nod.at Cc: Serge E. Hallyn serge@hallyn.com Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Link: https://lore.kernel.org/r/20210316204252.427806-8-mic@digikod.net ---
Changes since v29: * Remove a useless unlock/lock for the first loop walk in hook_sb_delete(). This also makes the code clearer but doesn't change the garantees for iput(). * Rename iput_inode to prev_inode, which shows its origin.
Changes since v28: * Fix race conditions that could be caused by concurrent calls to release_inode() and hook_sb_delete(). * Avoid livelock when a lot of inodes are tagged. * Improve concurrency locking and add comments to explain the specific lock rules. * Add an inode_free_security hook to check that release_inode() and hook_sb_delete() do their job. * Add early return to check_access_path() to check if the access request is empty. This doesn't change the semantic. * Reword the first description sentence (suggested by Serge Hallyn).
Changes since v27: * Fix domains with layers of non-overlapping access rights (cf. layout1.non_overlapping_accesses test) thanks to a stack of access rights per layer (replacing ORed access rights). This avoids too-restrictive domains. * Cosmetic fixes and updates in comments and Kconfig.
Changes since v26: * Check each rule of a path to enable a more permissive and pragmatic access control per layer. Suggested by Jann Horn: https://lore.kernel.org/lkml/CAG48ez1O0VTwEiRd3KqexoF78WR+cmP5bGk5Kh5Cs7aPep... * Rename check_access_path_continue() to unmask_layers() and make it return the new layer mask. * Avoid double domain check in hook_file_open(). * In the documentation, add utime(2) as another example of unhandled syscalls. Indeed, using `touch` to test write access may be tempting. * Remove outdated comment about OverlayFS. * Rename the landlock.h ifdef to align with most similar files. * Fix spelling.
Changes since v25: * Move build_check_layer() to ruleset.c, and add built-time checks for the fs_access_mask and access variables according to _LANDLOCK_ACCESS_FS_MASK. * Move limits to a dedicated file and rename them: _LANDLOCK_ACCESS_FS_LAST and _LANDLOCK_ACCESS_FS_MASK. * Set build_check_layer() as non-inline to trigger a warning if it is not called. * Use BITS_PER_TYPE() macro. * Rename function to landlock_add_fs_hooks(). * Cosmetic variable renames.
Changes since v24: * Use the new struct landlock_rule and landlock_layer to not mix accesses from different layers. Revert "Enforce deterministic interleaved path rules" from v24, and fix the layer check. This enables to follow a sane semantic: an access is granted if, for each policy layer, at least one rule encountered on the pathwalk grants the access, regardless of their position in the layer stack (suggested by Jann Horn). See layout1.interleaved_masked_accesses tests from tools/testing/selftests/landlock/fs_test.c for corner cases. * Add build-time checks for layers. * Use the new landlock_insert_rule() API.
Changes since v23: * Enforce deterministic interleaved path rules. To have consistent layered rules, granting access to a path implies that all accesses tied to inodes, from the requested file to the real root, must be checked. Otherwise, stacked rules may result to overzealous restrictions. By excluding the ability to add exceptions in the same layer (e.g. /a allowed, /a/b denied, and /a/b/c allowed), we get deterministic interleaved path rules. This removes an optimization which could be replaced by a proper cache mechanism. This also further simplifies and explain check_access_path_continue(). * Fix memory allocation error handling in landlock_create_object() calls. This prevent to inadvertently hold an inode. * In get_inode_object(), improve comments, make code more readable and move kfree() call out of the lock window. * Use the simplified landlock_insert_rule() API.
Changes since v22: * Simplify check_access_path_continue() (suggested by Jann Horn). * Remove prefetch() call for now (suggested by Jann Horn). * Fix spelling and remove superfluous comment (spotted by Jann Horn). * Cosmetic variable renaming.
Changes since v21: * Rename ARCH_EPHEMERAL_STATES to ARCH_EPHEMERAL_INODES (suggested by James Morris). * Remove the LANDLOCK_ACCESS_FS_CHROOT right because chroot(2) (which requires CAP_SYS_CHROOT) doesn't enable to bypass Landlock (as tests demonstrate it), and because it is often used by sandboxes, it would be counterproductive to forbid it. This also reduces the code size. * Clean up documentation.
Changes since v19: * Fix spelling (spotted by Randy Dunlap).
Changes since v18: * Remove useless include. * Fix spelling.
Changes since v17: * Replace landlock_release_inodes() with security_sb_delete() (requested by James Morris). * Replace struct super_block->s_landlock_inode_refs with the LSM infrastructure management of the superblock (requested by James Morris). * Fix mknod restriction with a zero mode (spotted by Vincent Dagonneau). * Minimize executed code in path_mknod and file_open hooks when the current tasks is not sandboxed. * Remove useless checks on the file pointer and inode in hook_file_open() . * Constify domain pointers. * Rename inode_landlock() to landlock_inode(). * Import include/uapi/linux/landlock.h and _LANDLOCK_ACCESS_FS_* from the ruleset and domain management patch. * Explain the rational of this minimal set of access-control. https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad0468cd@digikod.ne...
Changes since v16: * Add ARCH_EPHEMERAL_STATES and enable it for UML.
Changes since v15: * Replace layer_levels and layer_depth with a bitfield of layers: this enables to properly manage superset and subset of access rights, whatever their order in the stack of layers. Cf. https://lore.kernel.org/lkml/e07fe473-1801-01cc-12ae-b3167f95250e@digikod.ne... * Allow to open pipes and similar special files through /proc/self/fd/. * Properly handle internal filesystems such as nsfs: always allow these kind of roots because disconnected path cannot be evaluated. * Remove the LANDLOCK_ACCESS_FS_LINK_TO and LANDLOCK_ACCESS_FS_RENAME_{TO,FROM}, but use the LANDLOCK_ACCESS_FS_REMOVE_{FILE,DIR} and LANDLOCK_ACCESS_FS_MAKE_* instead. Indeed, it is not possible for now (and not really useful) to express the semantic of a source and a destination. * Check access rights to remove a directory or a file with rename(2). * Forbid reparenting when linking or renaming. This is needed to easily protect against possible privilege escalation by changing the place of a file or directory in relation to an enforced access policy (from the set of layers). This will be relaxed in the future. * Update hooks to take into account replacement of the object's self and beneath access bitfields with one. Simplify the code. * Check file related access rights. * Check d_is_negative() instead of !d_backing_inode() in check_access_path_continue(), and continue the path walk while there is no mapped inode e.g., with rename(2). * Check private inode in check_access_path(). * Optimize get_file_access() when dealing with a directory. * Add missing atomic.h .
Changes since v14: * Simplify the object, rule and ruleset management at the expense of a less aggressive memory freeing (contributed by Jann Horn, with additional modifications): - Rewrite release_inode() to use inode->sb->s_landlock_inode_refs. - Remove useless checks in landlock_release_inodes(), clean object pointer according to the new struct landlock_object and wait for all iput() to complete. - Rewrite get_inode_object() according to the new struct landlock_object. If there is a race-condition when cleaning up an object, we retry until the concurrent thread finished the object cleaning. Cf. https://lore.kernel.org/lkml/CAG48ez21bEn0wL1bbmTiiu8j9jP5iEWtHOwz4tURUJ+ki0... * Fix nested domains by implementing a notion of layer level and depth: - Check for matching level ranges when walking through a file path. - Only allow access if every layer granted the access request. * Handles files without mount points (e.g. pipes). * Hardens path walk by checking inode pointer values. * Prefetches d_parent when walking to the root directory. * Remove useless inode_alloc_security hook() (suggested by Jann Horn): already initialized by lsm_inode_alloc(). * Remove the inode_free_security hook. * Remove access checks that may be required for FD-only requests: truncate, getattr, lock, chmod, chown, chgrp, ioctl. This will be handle in a future evolution of Landlock, but right now the goal is to lighten the code to ease review. * Constify variables. * Move ABI checks into syscall.c . * Cosmetic variable renames.
Changes since v11: * Add back, revamp and make a fully working filesystem access-control based on paths and inodes. * Remove the eBPF dependency.
Previous changes: https://lore.kernel.org/lkml/20190721213116.23476-6-mic@digikod.net/ --- MAINTAINERS | 1 + arch/Kconfig | 7 + arch/um/Kconfig | 1 + include/uapi/linux/landlock.h | 75 ++++ security/landlock/Kconfig | 2 +- security/landlock/Makefile | 2 +- security/landlock/fs.c | 687 ++++++++++++++++++++++++++++++++++ security/landlock/fs.h | 56 +++ security/landlock/limits.h | 4 + security/landlock/ruleset.c | 4 + security/landlock/setup.c | 7 + security/landlock/setup.h | 2 + 12 files changed, 846 insertions(+), 2 deletions(-) create mode 100644 include/uapi/linux/landlock.h create mode 100644 security/landlock/fs.c create mode 100644 security/landlock/fs.h
diff --git a/MAINTAINERS b/MAINTAINERS index 87a2738dfdec..70ec117efa8a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -10003,6 +10003,7 @@ L: linux-security-module@vger.kernel.org S: Supported W: https://landlock.io T: git https://github.com/landlock-lsm/linux.git +F: include/uapi/linux/landlock.h F: security/landlock/ K: landlock K: LANDLOCK diff --git a/arch/Kconfig b/arch/Kconfig index ecfd3520b676..8160ab7e3e03 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1013,6 +1013,13 @@ config COMPAT_32BIT_TIME config ARCH_NO_PREEMPT bool
+config ARCH_EPHEMERAL_INODES + def_bool n + help + An arch should select this symbol if it doesn't keep track of inode + instances on its own, but instead relies on something else (e.g. the + host kernel for an UML kernel). + config ARCH_SUPPORTS_RT bool
diff --git a/arch/um/Kconfig b/arch/um/Kconfig index c3030db3325f..57cfd9a1c082 100644 --- a/arch/um/Kconfig +++ b/arch/um/Kconfig @@ -5,6 +5,7 @@ menu "UML-specific options" config UML bool default y + select ARCH_EPHEMERAL_INODES select ARCH_HAS_KCOV select ARCH_NO_PREEMPT select HAVE_ARCH_AUDITSYSCALL diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h new file mode 100644 index 000000000000..f69877099c8e --- /dev/null +++ b/include/uapi/linux/landlock.h @@ -0,0 +1,75 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * Landlock - User space API + * + * Copyright © 2017-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#ifndef _UAPI_LINUX_LANDLOCK_H +#define _UAPI_LINUX_LANDLOCK_H + +/** + * DOC: fs_access + * + * A set of actions on kernel objects may be defined by an attribute (e.g. + * &struct landlock_path_beneath_attr) including a bitmask of access. + * + * Filesystem flags + * ~~~~~~~~~~~~~~~~ + * + * These flags enable to restrict a sandboxed process to a set of actions on + * files and directories. Files or directories opened before the sandboxing + * are not subject to these restrictions. + * + * A file can only receive these access rights: + * + * - %LANDLOCK_ACCESS_FS_EXECUTE: Execute a file. + * - %LANDLOCK_ACCESS_FS_WRITE_FILE: Open a file with write access. + * - %LANDLOCK_ACCESS_FS_READ_FILE: Open a file with read access. + * + * A directory can receive access rights related to files or directories. The + * following access right is applied to the directory itself, and the + * directories beneath it: + * + * - %LANDLOCK_ACCESS_FS_READ_DIR: Open a directory or list its content. + * + * However, the following access rights only apply to the content of a + * directory, not the directory itself: + * + * - %LANDLOCK_ACCESS_FS_REMOVE_DIR: Remove an empty directory or rename one. + * - %LANDLOCK_ACCESS_FS_REMOVE_FILE: Unlink (or rename) a file. + * - %LANDLOCK_ACCESS_FS_MAKE_CHAR: Create (or rename or link) a character + * device. + * - %LANDLOCK_ACCESS_FS_MAKE_DIR: Create (or rename) a directory. + * - %LANDLOCK_ACCESS_FS_MAKE_REG: Create (or rename or link) a regular file. + * - %LANDLOCK_ACCESS_FS_MAKE_SOCK: Create (or rename or link) a UNIX domain + * socket. + * - %LANDLOCK_ACCESS_FS_MAKE_FIFO: Create (or rename or link) a named pipe. + * - %LANDLOCK_ACCESS_FS_MAKE_BLOCK: Create (or rename or link) a block device. + * - %LANDLOCK_ACCESS_FS_MAKE_SYM: Create (or rename or link) a symbolic link. + * + * .. warning:: + * + * It is currently not possible to restrict some file-related actions + * accessible through these syscall families: :manpage:`chdir(2)`, + * :manpage:`truncate(2)`, :manpage:`stat(2)`, :manpage:`flock(2)`, + * :manpage:`chmod(2)`, :manpage:`chown(2)`, :manpage:`setxattr(2)`, + * :manpage:`utime(2)`, :manpage:`ioctl(2)`, :manpage:`fcntl(2)`. + * Future Landlock evolutions will enable to restrict them. + */ +#define LANDLOCK_ACCESS_FS_EXECUTE (1ULL << 0) +#define LANDLOCK_ACCESS_FS_WRITE_FILE (1ULL << 1) +#define LANDLOCK_ACCESS_FS_READ_FILE (1ULL << 2) +#define LANDLOCK_ACCESS_FS_READ_DIR (1ULL << 3) +#define LANDLOCK_ACCESS_FS_REMOVE_DIR (1ULL << 4) +#define LANDLOCK_ACCESS_FS_REMOVE_FILE (1ULL << 5) +#define LANDLOCK_ACCESS_FS_MAKE_CHAR (1ULL << 6) +#define LANDLOCK_ACCESS_FS_MAKE_DIR (1ULL << 7) +#define LANDLOCK_ACCESS_FS_MAKE_REG (1ULL << 8) +#define LANDLOCK_ACCESS_FS_MAKE_SOCK (1ULL << 9) +#define LANDLOCK_ACCESS_FS_MAKE_FIFO (1ULL << 10) +#define LANDLOCK_ACCESS_FS_MAKE_BLOCK (1ULL << 11) +#define LANDLOCK_ACCESS_FS_MAKE_SYM (1ULL << 12) + +#endif /* _UAPI_LINUX_LANDLOCK_H */ diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig index c1e862a38410..8e33c4e8ffb8 100644 --- a/security/landlock/Kconfig +++ b/security/landlock/Kconfig @@ -2,7 +2,7 @@
config SECURITY_LANDLOCK bool "Landlock support" - depends on SECURITY + depends on SECURITY && !ARCH_EPHEMERAL_INODES select SECURITY_PATH help Landlock is a sandboxing mechanism that enables processes to restrict diff --git a/security/landlock/Makefile b/security/landlock/Makefile index f1d1eb72fa76..92e3d80ab8ed 100644 --- a/security/landlock/Makefile +++ b/security/landlock/Makefile @@ -1,4 +1,4 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
landlock-y := setup.o object.o ruleset.o \ - cred.o ptrace.o + cred.o ptrace.o fs.o diff --git a/security/landlock/fs.c b/security/landlock/fs.c new file mode 100644 index 000000000000..e3b710d6f56d --- /dev/null +++ b/security/landlock/fs.c @@ -0,0 +1,687 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Landlock LSM - Filesystem management and hooks + * + * Copyright © 2016-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#include <linux/atomic.h> +#include <linux/bitops.h> +#include <linux/bits.h> +#include <linux/compiler_types.h> +#include <linux/dcache.h> +#include <linux/err.h> +#include <linux/fs.h> +#include <linux/init.h> +#include <linux/kernel.h> +#include <linux/limits.h> +#include <linux/list.h> +#include <linux/lsm_hooks.h> +#include <linux/mount.h> +#include <linux/namei.h> +#include <linux/path.h> +#include <linux/rcupdate.h> +#include <linux/spinlock.h> +#include <linux/stat.h> +#include <linux/types.h> +#include <linux/wait_bit.h> +#include <linux/workqueue.h> +#include <uapi/linux/landlock.h> + +#include "common.h" +#include "cred.h" +#include "fs.h" +#include "limits.h" +#include "object.h" +#include "ruleset.h" +#include "setup.h" + +/* Underlying object management */ + +static void release_inode(struct landlock_object *const object) + __releases(object->lock) +{ + struct inode *const inode = object->underobj; + struct super_block *sb; + + if (!inode) { + spin_unlock(&object->lock); + return; + } + + /* + * Protects against concurrent use by hook_sb_delete() of the reference + * to the underlying inode. + */ + object->underobj = NULL; + /* + * Makes sure that if the filesystem is concurrently unmounted, + * hook_sb_delete() will wait for us to finish iput(). + */ + sb = inode->i_sb; + atomic_long_inc(&landlock_superblock(sb)->inode_refs); + spin_unlock(&object->lock); + /* + * Because object->underobj was not NULL, hook_sb_delete() and + * get_inode_object() guarantee that it is safe to reset + * landlock_inode(inode)->object while it is not NULL. It is therefore + * not necessary to lock inode->i_lock. + */ + rcu_assign_pointer(landlock_inode(inode)->object, NULL); + /* + * Now, new rules can safely be tied to @inode with get_inode_object(). + */ + + iput(inode); + if (atomic_long_dec_and_test(&landlock_superblock(sb)->inode_refs)) + wake_up_var(&landlock_superblock(sb)->inode_refs); +} + +static const struct landlock_object_underops landlock_fs_underops = { + .release = release_inode +}; + +/* Ruleset management */ + +static struct landlock_object *get_inode_object(struct inode *const inode) +{ + struct landlock_object *object, *new_object; + struct landlock_inode_security *inode_sec = landlock_inode(inode); + + rcu_read_lock(); +retry: + object = rcu_dereference(inode_sec->object); + if (object) { + if (likely(refcount_inc_not_zero(&object->usage))) { + rcu_read_unlock(); + return object; + } + /* + * We are racing with release_inode(), the object is going + * away. Wait for release_inode(), then retry. + */ + spin_lock(&object->lock); + spin_unlock(&object->lock); + goto retry; + } + rcu_read_unlock(); + + /* + * If there is no object tied to @inode, then create a new one (without + * holding any locks). + */ + new_object = landlock_create_object(&landlock_fs_underops, inode); + if (IS_ERR(new_object)) + return new_object; + + /* Protects against concurrent get_inode_object() calls. */ + spin_lock(&inode->i_lock); + object = rcu_dereference_protected(inode_sec->object, + lockdep_is_held(&inode->i_lock)); + if (unlikely(object)) { + /* Someone else just created the object, bail out and retry. */ + spin_unlock(&inode->i_lock); + kfree(new_object); + + rcu_read_lock(); + goto retry; + } + + rcu_assign_pointer(inode_sec->object, new_object); + /* + * @inode will be released by hook_sb_delete() on its superblock + * shutdown. + */ + ihold(inode); + spin_unlock(&inode->i_lock); + return new_object; +} + +/* All access rights that can be tied to files. */ +#define ACCESS_FILE ( \ + LANDLOCK_ACCESS_FS_EXECUTE | \ + LANDLOCK_ACCESS_FS_WRITE_FILE | \ + LANDLOCK_ACCESS_FS_READ_FILE) + +/* + * @path: Should have been checked by get_path_from_fd(). + */ +int landlock_append_fs_rule(struct landlock_ruleset *const ruleset, + const struct path *const path, u32 access_rights) +{ + int err; + struct landlock_object *object; + + /* Files only get access rights that make sense. */ + if (!d_is_dir(path->dentry) && (access_rights | ACCESS_FILE) != + ACCESS_FILE) + return -EINVAL; + if (WARN_ON_ONCE(ruleset->num_layers != 1)) + return -EINVAL; + + /* Transforms relative access rights to absolute ones. */ + access_rights |= LANDLOCK_MASK_ACCESS_FS & ~ruleset->fs_access_masks[0]; + object = get_inode_object(d_backing_inode(path->dentry)); + if (IS_ERR(object)) + return PTR_ERR(object); + mutex_lock(&ruleset->lock); + err = landlock_insert_rule(ruleset, object, access_rights); + mutex_unlock(&ruleset->lock); + /* + * No need to check for an error because landlock_insert_rule() + * increments the refcount for the new object if needed. + */ + landlock_put_object(object); + return err; +} + +/* Access-control management */ + +static inline u64 unmask_layers( + const struct landlock_ruleset *const domain, + const struct path *const path, const u32 access_request, + u64 layer_mask) +{ + const struct landlock_rule *rule; + const struct inode *inode; + size_t i; + + if (d_is_negative(path->dentry)) + /* Continues to walk while there is no mapped inode. */ + return layer_mask; + inode = d_backing_inode(path->dentry); + rcu_read_lock(); + rule = landlock_find_rule(domain, + rcu_dereference(landlock_inode(inode)->object)); + rcu_read_unlock(); + if (!rule) + return layer_mask; + + /* + * An access is granted if, for each policy layer, at least one rule + * encountered on the pathwalk grants the requested accesses, + * regardless of their position in the layer stack. We must then check + * the remaining layers for each inode, from the first added layer to + * the last one. + */ + for (i = 0; i < rule->num_layers; i++) { + const struct landlock_layer *const layer = &rule->layers[i]; + const u64 layer_level = BIT_ULL(layer->level - 1); + + /* Checks that the layer grants access to the full request. */ + if ((layer->access & access_request) == access_request) { + layer_mask &= ~layer_level; + + if (layer_mask == 0) + return layer_mask; + } + } + return layer_mask; +} + +static int check_access_path(const struct landlock_ruleset *const domain, + const struct path *const path, u32 access_request) +{ + bool allowed = false; + struct path walker_path; + u64 layer_mask; + size_t i; + + /* Make sure all layers can be checked. */ + BUILD_BUG_ON(BITS_PER_TYPE(layer_mask) < LANDLOCK_MAX_NUM_LAYERS); + + if (!access_request) + return 0; + if (WARN_ON_ONCE(!domain || !path)) + return 0; + /* + * Allows access to pseudo filesystems that will never be mountable + * (e.g. sockfs, pipefs), but can still be reachable through + * /proc/self/fd . + */ + if ((path->dentry->d_sb->s_flags & SB_NOUSER) || + (d_is_positive(path->dentry) && + unlikely(IS_PRIVATE(d_backing_inode(path->dentry))))) + return 0; + if (WARN_ON_ONCE(domain->num_layers < 1)) + return -EACCES; + + /* Saves all layers handling a subset of requested accesses. */ + layer_mask = 0; + for (i = 0; i < domain->num_layers; i++) { + if (domain->fs_access_masks[i] & access_request) + layer_mask |= BIT_ULL(i); + } + /* An access request not handled by the domain is allowed. */ + if (layer_mask == 0) + return 0; + + walker_path = *path; + path_get(&walker_path); + /* + * We need to walk through all the hierarchy to not miss any relevant + * restriction. + */ + while (true) { + struct dentry *parent_dentry; + + layer_mask = unmask_layers(domain, &walker_path, + access_request, layer_mask); + if (layer_mask == 0) { + /* Stops when a rule from each layer grants access. */ + allowed = true; + break; + } + +jump_up: + if (walker_path.dentry == walker_path.mnt->mnt_root) { + if (follow_up(&walker_path)) { + /* Ignores hidden mount points. */ + goto jump_up; + } else { + /* + * Stops at the real root. Denies access + * because not all layers have granted access. + */ + allowed = false; + break; + } + } + if (unlikely(IS_ROOT(walker_path.dentry))) { + /* + * Stops at disconnected root directories. Only allows + * access to internal filesystems (e.g. nsfs, which is + * reachable through /proc/self/ns). + */ + allowed = !!(walker_path.mnt->mnt_flags & MNT_INTERNAL); + break; + } + parent_dentry = dget_parent(walker_path.dentry); + dput(walker_path.dentry); + walker_path.dentry = parent_dentry; + } + path_put(&walker_path); + return allowed ? 0 : -EACCES; +} + +static inline int current_check_access_path(const struct path *const path, + const u32 access_request) +{ + const struct landlock_ruleset *const dom = + landlock_get_current_domain(); + + if (!dom) + return 0; + return check_access_path(dom, path, access_request); +} + +/* Inode hooks */ + +static void hook_inode_free_security(struct inode *const inode) +{ + /* + * All inodes must already have been untied from their object by + * release_inode() or hook_sb_delete(). + */ + WARN_ON_ONCE(landlock_inode(inode)->object); +} + +/* Super-block hooks */ + +/* + * Release the inodes used in a security policy. + * + * Cf. fsnotify_unmount_inodes() and invalidate_inodes() + */ +static void hook_sb_delete(struct super_block *const sb) +{ + struct inode *inode, *prev_inode = NULL; + + if (!landlock_initialized) + return; + + spin_lock(&sb->s_inode_list_lock); + list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { + struct landlock_object *object; + + /* Only handles referenced inodes. */ + if (!atomic_read(&inode->i_count)) + continue; + + /* + * Checks I_FREEING and I_WILL_FREE to protect against a race + * condition when release_inode() just called iput(), which + * could lead to a NULL dereference of inode->security or a + * second call to iput() for the same Landlock object. Also + * checks I_NEW because such inode cannot be tied to an object. + */ + spin_lock(&inode->i_lock); + if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) { + spin_unlock(&inode->i_lock); + continue; + } + + rcu_read_lock(); + object = rcu_dereference(landlock_inode(inode)->object); + if (!object) { + rcu_read_unlock(); + spin_unlock(&inode->i_lock); + continue; + } + /* Keeps a reference to this inode until the next loop walk. */ + __iget(inode); + spin_unlock(&inode->i_lock); + + /* + * If there is no concurrent release_inode() ongoing, then we + * are in charge of calling iput() on this inode, otherwise we + * will just wait for it to finish. + */ + spin_lock(&object->lock); + if (object->underobj == inode) { + object->underobj = NULL; + spin_unlock(&object->lock); + rcu_read_unlock(); + + /* + * Because object->underobj was not NULL, + * release_inode() and get_inode_object() guarantee + * that it is safe to reset + * landlock_inode(inode)->object while it is not NULL. + * It is therefore not necessary to lock inode->i_lock. + */ + rcu_assign_pointer(landlock_inode(inode)->object, NULL); + /* + * At this point, we own the ihold() reference that was + * originally set up by get_inode_object() and the + * __iget() reference that we just set in this loop + * walk. Therefore the following call to iput() will + * not sleep nor drop the inode because there is now at + * least two references to it. + */ + iput(inode); + } else { + spin_unlock(&object->lock); + rcu_read_unlock(); + } + + if (prev_inode) { + /* + * At this point, we still own the __iget() reference + * that we just set in this loop walk. Therefore we + * can drop the list lock and know that the inode won't + * disappear from under us until the next loop walk. + */ + spin_unlock(&sb->s_inode_list_lock); + /* + * We can now actually put the inode reference from the + * previous loop walk, which is not needed anymore. + */ + iput(prev_inode); + cond_resched(); + spin_lock(&sb->s_inode_list_lock); + } + prev_inode = inode; + } + spin_unlock(&sb->s_inode_list_lock); + + /* Puts the inode reference from the last loop walk, if any. */ + if (prev_inode) + iput(prev_inode); + /* Waits for pending iput() in release_inode(). */ + wait_var_event(&landlock_superblock(sb)->inode_refs, !atomic_long_read( + &landlock_superblock(sb)->inode_refs)); +} + +/* + * Because a Landlock security policy is defined according to the filesystem + * layout (i.e. the mount namespace), changing it may grant access to files not + * previously allowed. + * + * To make it simple, deny any filesystem layout modification by landlocked + * processes. Non-landlocked processes may still change the namespace of a + * landlocked process, but this kind of threat must be handled by a system-wide + * access-control security policy. + * + * This could be lifted in the future if Landlock can safely handle mount + * namespace updates requested by a landlocked process. Indeed, we could + * update the current domain (which is currently read-only) by taking into + * account the accesses of the source and the destination of a new mount point. + * However, it would also require to make all the child domains dynamically + * inherit these new constraints. Anyway, for backward compatibility reasons, + * a dedicated user space option would be required (e.g. as a ruleset command + * option). + */ +static int hook_sb_mount(const char *const dev_name, + const struct path *const path, const char *const type, + const unsigned long flags, void *const data) +{ + if (!landlock_get_current_domain()) + return 0; + return -EPERM; +} + +static int hook_move_mount(const struct path *const from_path, + const struct path *const to_path) +{ + if (!landlock_get_current_domain()) + return 0; + return -EPERM; +} + +/* + * Removing a mount point may reveal a previously hidden file hierarchy, which + * may then grant access to files, which may have previously been forbidden. + */ +static int hook_sb_umount(struct vfsmount *const mnt, const int flags) +{ + if (!landlock_get_current_domain()) + return 0; + return -EPERM; +} + +static int hook_sb_remount(struct super_block *const sb, void *const mnt_opts) +{ + if (!landlock_get_current_domain()) + return 0; + return -EPERM; +} + +/* + * pivot_root(2), like mount(2), changes the current mount namespace. It must + * then be forbidden for a landlocked process. + * + * However, chroot(2) may be allowed because it only changes the relative root + * directory of the current process. Moreover, it can be used to restrict the + * view of the filesystem. + */ +static int hook_sb_pivotroot(const struct path *const old_path, + const struct path *const new_path) +{ + if (!landlock_get_current_domain()) + return 0; + return -EPERM; +} + +/* Path hooks */ + +static inline u32 get_mode_access(const umode_t mode) +{ + switch (mode & S_IFMT) { + case S_IFLNK: + return LANDLOCK_ACCESS_FS_MAKE_SYM; + case 0: + /* A zero mode translates to S_IFREG. */ + case S_IFREG: + return LANDLOCK_ACCESS_FS_MAKE_REG; + case S_IFDIR: + return LANDLOCK_ACCESS_FS_MAKE_DIR; + case S_IFCHR: + return LANDLOCK_ACCESS_FS_MAKE_CHAR; + case S_IFBLK: + return LANDLOCK_ACCESS_FS_MAKE_BLOCK; + case S_IFIFO: + return LANDLOCK_ACCESS_FS_MAKE_FIFO; + case S_IFSOCK: + return LANDLOCK_ACCESS_FS_MAKE_SOCK; + default: + WARN_ON_ONCE(1); + return 0; + } +} + +/* + * Creating multiple links or renaming may lead to privilege escalations if not + * handled properly. Indeed, we must be sure that the source doesn't gain more + * privileges by being accessible from the destination. This is getting more + * complex when dealing with multiple layers. The whole picture can be seen as + * a multilayer partial ordering problem. A future version of Landlock will + * deal with that. + */ +static int hook_path_link(struct dentry *const old_dentry, + const struct path *const new_dir, + struct dentry *const new_dentry) +{ + const struct landlock_ruleset *const dom = + landlock_get_current_domain(); + + if (!dom) + return 0; + /* The mount points are the same for old and new paths, cf. EXDEV. */ + if (old_dentry->d_parent != new_dir->dentry) + /* For now, forbids reparenting. */ + return -EACCES; + if (unlikely(d_is_negative(old_dentry))) + return -EACCES; + return check_access_path(dom, new_dir, + get_mode_access(d_backing_inode(old_dentry)->i_mode)); +} + +static inline u32 maybe_remove(const struct dentry *const dentry) +{ + if (d_is_negative(dentry)) + return 0; + return d_is_dir(dentry) ? LANDLOCK_ACCESS_FS_REMOVE_DIR : + LANDLOCK_ACCESS_FS_REMOVE_FILE; +} + +static int hook_path_rename(const struct path *const old_dir, + struct dentry *const old_dentry, + const struct path *const new_dir, + struct dentry *const new_dentry) +{ + const struct landlock_ruleset *const dom = + landlock_get_current_domain(); + + if (!dom) + return 0; + /* The mount points are the same for old and new paths, cf. EXDEV. */ + if (old_dir->dentry != new_dir->dentry) + /* For now, forbids reparenting. */ + return -EACCES; + if (WARN_ON_ONCE(d_is_negative(old_dentry))) + return -EACCES; + /* RENAME_EXCHANGE is handled because directories are the same. */ + return check_access_path(dom, old_dir, maybe_remove(old_dentry) | + maybe_remove(new_dentry) | + get_mode_access(d_backing_inode(old_dentry)->i_mode)); +} + +static int hook_path_mkdir(const struct path *const dir, + struct dentry *const dentry, const umode_t mode) +{ + return current_check_access_path(dir, LANDLOCK_ACCESS_FS_MAKE_DIR); +} + +static int hook_path_mknod(const struct path *const dir, + struct dentry *const dentry, const umode_t mode, + const unsigned int dev) +{ + const struct landlock_ruleset *const dom = + landlock_get_current_domain(); + + if (!dom) + return 0; + return check_access_path(dom, dir, get_mode_access(mode)); +} + +static int hook_path_symlink(const struct path *const dir, + struct dentry *const dentry, const char *const old_name) +{ + return current_check_access_path(dir, LANDLOCK_ACCESS_FS_MAKE_SYM); +} + +static int hook_path_unlink(const struct path *const dir, + struct dentry *const dentry) +{ + return current_check_access_path(dir, LANDLOCK_ACCESS_FS_REMOVE_FILE); +} + +static int hook_path_rmdir(const struct path *const dir, + struct dentry *const dentry) +{ + return current_check_access_path(dir, LANDLOCK_ACCESS_FS_REMOVE_DIR); +} + +/* File hooks */ + +static inline u32 get_file_access(const struct file *const file) +{ + u32 access = 0; + + if (file->f_mode & FMODE_READ) { + /* A directory can only be opened in read mode. */ + if (S_ISDIR(file_inode(file)->i_mode)) + return LANDLOCK_ACCESS_FS_READ_DIR; + access = LANDLOCK_ACCESS_FS_READ_FILE; + } + if (file->f_mode & FMODE_WRITE) + access |= LANDLOCK_ACCESS_FS_WRITE_FILE; + /* __FMODE_EXEC is indeed part of f_flags, not f_mode. */ + if (file->f_flags & __FMODE_EXEC) + access |= LANDLOCK_ACCESS_FS_EXECUTE; + return access; +} + +static int hook_file_open(struct file *const file) +{ + const struct landlock_ruleset *const dom = + landlock_get_current_domain(); + + if (!dom) + return 0; + /* + * Because a file may be opened with O_PATH, get_file_access() may + * return 0. This case will be handled with a future Landlock + * evolution. + */ + return check_access_path(dom, &file->f_path, get_file_access(file)); +} + +static struct security_hook_list landlock_hooks[] __lsm_ro_after_init = { + LSM_HOOK_INIT(inode_free_security, hook_inode_free_security), + + LSM_HOOK_INIT(sb_delete, hook_sb_delete), + LSM_HOOK_INIT(sb_mount, hook_sb_mount), + LSM_HOOK_INIT(move_mount, hook_move_mount), + LSM_HOOK_INIT(sb_umount, hook_sb_umount), + LSM_HOOK_INIT(sb_remount, hook_sb_remount), + LSM_HOOK_INIT(sb_pivotroot, hook_sb_pivotroot), + + LSM_HOOK_INIT(path_link, hook_path_link), + LSM_HOOK_INIT(path_rename, hook_path_rename), + LSM_HOOK_INIT(path_mkdir, hook_path_mkdir), + LSM_HOOK_INIT(path_mknod, hook_path_mknod), + LSM_HOOK_INIT(path_symlink, hook_path_symlink), + LSM_HOOK_INIT(path_unlink, hook_path_unlink), + LSM_HOOK_INIT(path_rmdir, hook_path_rmdir), + + LSM_HOOK_INIT(file_open, hook_file_open), +}; + +__init void landlock_add_fs_hooks(void) +{ + security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks), + LANDLOCK_NAME); +} diff --git a/security/landlock/fs.h b/security/landlock/fs.h new file mode 100644 index 000000000000..9f14ec4d8d48 --- /dev/null +++ b/security/landlock/fs.h @@ -0,0 +1,56 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Landlock LSM - Filesystem management and hooks + * + * Copyright © 2017-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#ifndef _SECURITY_LANDLOCK_FS_H +#define _SECURITY_LANDLOCK_FS_H + +#include <linux/fs.h> +#include <linux/init.h> +#include <linux/rcupdate.h> + +#include "ruleset.h" +#include "setup.h" + +struct landlock_inode_security { + /* + * @object: Weak pointer to an allocated object. All writes (i.e. + * creating a new object or removing one) are protected by the + * underlying inode->i_lock. Disassociating @object from the inode is + * additionally protected by @object->lock, from the time @object's + * usage refcount drops to zero to the time this pointer is nulled out. + * Cf. release_inode(). + */ + struct landlock_object __rcu *object; +}; + +struct landlock_superblock_security { + /* + * @inode_refs: References to Landlock underlying objects. + * Cf. struct super_block->s_fsnotify_inode_refs . + */ + atomic_long_t inode_refs; +}; + +static inline struct landlock_inode_security *landlock_inode( + const struct inode *const inode) +{ + return inode->i_security + landlock_blob_sizes.lbs_inode; +} + +static inline struct landlock_superblock_security *landlock_superblock( + const struct super_block *const superblock) +{ + return superblock->s_security + landlock_blob_sizes.lbs_superblock; +} + +__init void landlock_add_fs_hooks(void); + +int landlock_append_fs_rule(struct landlock_ruleset *const ruleset, + const struct path *const path, u32 access_hierarchy); + +#endif /* _SECURITY_LANDLOCK_FS_H */ diff --git a/security/landlock/limits.h b/security/landlock/limits.h index b734f597bb0e..2a0a1095ee27 100644 --- a/security/landlock/limits.h +++ b/security/landlock/limits.h @@ -10,8 +10,12 @@ #define _SECURITY_LANDLOCK_LIMITS_H
#include <linux/limits.h> +#include <uapi/linux/landlock.h>
#define LANDLOCK_MAX_NUM_LAYERS 64 #define LANDLOCK_MAX_NUM_RULES U32_MAX
+#define LANDLOCK_LAST_ACCESS_FS LANDLOCK_ACCESS_FS_MAKE_SYM +#define LANDLOCK_MASK_ACCESS_FS ((LANDLOCK_LAST_ACCESS_FS << 1) - 1) + #endif /* _SECURITY_LANDLOCK_LIMITS_H */ diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c index 59c86126ea1c..9c15aea180a1 100644 --- a/security/landlock/ruleset.c +++ b/security/landlock/ruleset.c @@ -116,9 +116,11 @@ static void build_check_ruleset(void) .num_rules = ~0, .num_layers = ~0, }; + typeof(ruleset.fs_access_masks[0]) fs_access_mask = ~0;
BUILD_BUG_ON(ruleset.num_rules < LANDLOCK_MAX_NUM_RULES); BUILD_BUG_ON(ruleset.num_layers < LANDLOCK_MAX_NUM_LAYERS); + BUILD_BUG_ON(fs_access_mask < LANDLOCK_MASK_ACCESS_FS); }
/** @@ -217,9 +219,11 @@ static void build_check_layer(void) { const struct landlock_layer layer = { .level = ~0, + .access = ~0, };
BUILD_BUG_ON(layer.level < LANDLOCK_MAX_NUM_LAYERS); + BUILD_BUG_ON(layer.access < LANDLOCK_MASK_ACCESS_FS); }
/* @ruleset must be locked by the caller. */ diff --git a/security/landlock/setup.c b/security/landlock/setup.c index a5d6ef334991..f8e8e980454c 100644 --- a/security/landlock/setup.c +++ b/security/landlock/setup.c @@ -11,17 +11,24 @@
#include "common.h" #include "cred.h" +#include "fs.h" #include "ptrace.h" #include "setup.h"
+bool landlock_initialized __lsm_ro_after_init = false; + struct lsm_blob_sizes landlock_blob_sizes __lsm_ro_after_init = { .lbs_cred = sizeof(struct landlock_cred_security), + .lbs_inode = sizeof(struct landlock_inode_security), + .lbs_superblock = sizeof(struct landlock_superblock_security), };
static int __init landlock_init(void) { landlock_add_cred_hooks(); landlock_add_ptrace_hooks(); + landlock_add_fs_hooks(); + landlock_initialized = true; pr_info("Up and running.\n"); return 0; } diff --git a/security/landlock/setup.h b/security/landlock/setup.h index 9fdbf33fcc33..1daffab1ab4b 100644 --- a/security/landlock/setup.h +++ b/security/landlock/setup.h @@ -11,6 +11,8 @@
#include <linux/lsm_hooks.h>
+extern bool landlock_initialized; + extern struct lsm_blob_sizes landlock_blob_sizes;
#endif /* _SECURITY_LANDLOCK_SETUP_H */
This commit adds a minimal set of supported filesystem access-control which doesn't enable to restrict all file-related actions.
It would be great to get some more review/acks on this patch, particularly from VFS/FS folk.
On Tue, Mar 16, 2021 at 09:42:47PM +0100, Mickaël Salaün wrote:
From: Mickaël Salaün mic@linux.microsoft.com
Using Landlock objects and ruleset, it is possible to tag inodes according to a process's domain. To enable an unprivileged process to express a file hierarchy, it first needs to open a directory (or a file) and pass this file descriptor to the kernel through landlock_add_rule(2). When checking if a file access request is allowed, we walk from the requested dentry to the real root, following the different mount layers. The access to each "tagged" inodes are collected according to their rule layer level, and ANDed to create access to the requested file hierarchy. This makes possible to identify a lot of files without tagging every inodes nor modifying the filesystem, while still following the view and understanding the user has from the filesystem.
Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not keep the same struct inodes for the same inodes whereas these inodes are in use.
This commit adds a minimal set of supported filesystem access-control which doesn't enable to restrict all file-related actions. This is the result of multiple discussions to minimize the code of Landlock to ease review. Thanks to the Landlock design, extending this access-control without breaking user space will not be a problem. Moreover, seccomp filters can be used to restrict the use of syscall families which may not be currently handled by Landlock.
Cc: Al Viro viro@zeniv.linux.org.uk Cc: Anton Ivanov anton.ivanov@cambridgegreys.com Cc: James Morris jmorris@namei.org Cc: Jann Horn jannh@google.com Cc: Jeff Dike jdike@addtoit.com Cc: Kees Cook keescook@chromium.org Cc: Richard Weinberger richard@nod.at Cc: Serge E. Hallyn serge@hallyn.com Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Link: https://lore.kernel.org/r/20210316204252.427806-8-mic@digikod.net [...]
- spin_lock(&sb->s_inode_list_lock);
- list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
struct landlock_object *object;
/* Only handles referenced inodes. */
if (!atomic_read(&inode->i_count))
continue;
/*
* Checks I_FREEING and I_WILL_FREE to protect against a race
* condition when release_inode() just called iput(), which
* could lead to a NULL dereference of inode->security or a
* second call to iput() for the same Landlock object. Also
* checks I_NEW because such inode cannot be tied to an object.
*/
spin_lock(&inode->i_lock);
if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) {
spin_unlock(&inode->i_lock);
continue;
}
This (and elsewhere here) seems like a lot of inode internals getting exposed. Can any of this be repurposed into helpers? I see this test scattered around the kernel a fair bit:
$ git grep I_FREEING | grep I_WILL_FREE | grep I_NEW | wc -l 9
+static inline u32 get_mode_access(const umode_t mode) +{
- switch (mode & S_IFMT) {
- case S_IFLNK:
return LANDLOCK_ACCESS_FS_MAKE_SYM;
- case 0:
/* A zero mode translates to S_IFREG. */
- case S_IFREG:
return LANDLOCK_ACCESS_FS_MAKE_REG;
- case S_IFDIR:
return LANDLOCK_ACCESS_FS_MAKE_DIR;
- case S_IFCHR:
return LANDLOCK_ACCESS_FS_MAKE_CHAR;
- case S_IFBLK:
return LANDLOCK_ACCESS_FS_MAKE_BLOCK;
- case S_IFIFO:
return LANDLOCK_ACCESS_FS_MAKE_FIFO;
- case S_IFSOCK:
return LANDLOCK_ACCESS_FS_MAKE_SOCK;
- default:
WARN_ON_ONCE(1);
return 0;
- }
I'm assuming this won't be reachable from userspace.
[...] index a5d6ef334991..f8e8e980454c 100644 --- a/security/landlock/setup.c +++ b/security/landlock/setup.c @@ -11,17 +11,24 @@ #include "common.h" #include "cred.h" +#include "fs.h" #include "ptrace.h" #include "setup.h" +bool landlock_initialized __lsm_ro_after_init = false;
struct lsm_blob_sizes landlock_blob_sizes __lsm_ro_after_init = { .lbs_cred = sizeof(struct landlock_cred_security),
- .lbs_inode = sizeof(struct landlock_inode_security),
- .lbs_superblock = sizeof(struct landlock_superblock_security),
}; static int __init landlock_init(void) { landlock_add_cred_hooks(); landlock_add_ptrace_hooks();
- landlock_add_fs_hooks();
- landlock_initialized = true;
I think this landlock_initialized is logically separate from the optional DEFINE_LSM "enabled" variable, but I thought I'd double check. :)
It seems like it's used here to avoid releasing superblocks before landlock_init() is called? What is the scenario where that happens?
pr_info("Up and running.\n"); return 0; } diff --git a/security/landlock/setup.h b/security/landlock/setup.h index 9fdbf33fcc33..1daffab1ab4b 100644 --- a/security/landlock/setup.h +++ b/security/landlock/setup.h @@ -11,6 +11,8 @@ #include <linux/lsm_hooks.h> +extern bool landlock_initialized;
extern struct lsm_blob_sizes landlock_blob_sizes;
#endif /* _SECURITY_LANDLOCK_SETUP_H */
2.30.2
The locking and inode semantics are pretty complex, but since, again, it's got significant test and syzkaller coverage, it looks good to me.
With the inode helper cleanup:
Reviewed-by: Kees Cook keescook@chromium.org
On 19/03/2021 19:57, Kees Cook wrote:
On Tue, Mar 16, 2021 at 09:42:47PM +0100, Mickaël Salaün wrote:
From: Mickaël Salaün mic@linux.microsoft.com
Using Landlock objects and ruleset, it is possible to tag inodes according to a process's domain. To enable an unprivileged process to express a file hierarchy, it first needs to open a directory (or a file) and pass this file descriptor to the kernel through landlock_add_rule(2). When checking if a file access request is allowed, we walk from the requested dentry to the real root, following the different mount layers. The access to each "tagged" inodes are collected according to their rule layer level, and ANDed to create access to the requested file hierarchy. This makes possible to identify a lot of files without tagging every inodes nor modifying the filesystem, while still following the view and understanding the user has from the filesystem.
Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not keep the same struct inodes for the same inodes whereas these inodes are in use.
This commit adds a minimal set of supported filesystem access-control which doesn't enable to restrict all file-related actions. This is the result of multiple discussions to minimize the code of Landlock to ease review. Thanks to the Landlock design, extending this access-control without breaking user space will not be a problem. Moreover, seccomp filters can be used to restrict the use of syscall families which may not be currently handled by Landlock.
Cc: Al Viro viro@zeniv.linux.org.uk Cc: Anton Ivanov anton.ivanov@cambridgegreys.com Cc: James Morris jmorris@namei.org Cc: Jann Horn jannh@google.com Cc: Jeff Dike jdike@addtoit.com Cc: Kees Cook keescook@chromium.org Cc: Richard Weinberger richard@nod.at Cc: Serge E. Hallyn serge@hallyn.com Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Link: https://lore.kernel.org/r/20210316204252.427806-8-mic@digikod.net [...]
- spin_lock(&sb->s_inode_list_lock);
- list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
struct landlock_object *object;
/* Only handles referenced inodes. */
if (!atomic_read(&inode->i_count))
continue;
/*
* Checks I_FREEING and I_WILL_FREE to protect against a race
* condition when release_inode() just called iput(), which
* could lead to a NULL dereference of inode->security or a
* second call to iput() for the same Landlock object. Also
* checks I_NEW because such inode cannot be tied to an object.
*/
spin_lock(&inode->i_lock);
if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) {
spin_unlock(&inode->i_lock);
continue;
}
This (and elsewhere here) seems like a lot of inode internals getting exposed. Can any of this be repurposed into helpers? I see this test scattered around the kernel a fair bit:
$ git grep I_FREEING | grep I_WILL_FREE | grep I_NEW | wc -l 9
Dealing with the filesystem is complex. Some helpers could probably be added, but with a series dedicated to the filesystem. I can work on that once this series is merged.
+static inline u32 get_mode_access(const umode_t mode) +{
- switch (mode & S_IFMT) {
- case S_IFLNK:
return LANDLOCK_ACCESS_FS_MAKE_SYM;
- case 0:
/* A zero mode translates to S_IFREG. */
- case S_IFREG:
return LANDLOCK_ACCESS_FS_MAKE_REG;
- case S_IFDIR:
return LANDLOCK_ACCESS_FS_MAKE_DIR;
- case S_IFCHR:
return LANDLOCK_ACCESS_FS_MAKE_CHAR;
- case S_IFBLK:
return LANDLOCK_ACCESS_FS_MAKE_BLOCK;
- case S_IFIFO:
return LANDLOCK_ACCESS_FS_MAKE_FIFO;
- case S_IFSOCK:
return LANDLOCK_ACCESS_FS_MAKE_SOCK;
- default:
WARN_ON_ONCE(1);
return 0;
- }
I'm assuming this won't be reachable from userspace.
It should not, only a bogus kernel code could.
[...] index a5d6ef334991..f8e8e980454c 100644 --- a/security/landlock/setup.c +++ b/security/landlock/setup.c @@ -11,17 +11,24 @@ #include "common.h" #include "cred.h" +#include "fs.h" #include "ptrace.h" #include "setup.h" +bool landlock_initialized __lsm_ro_after_init = false;
struct lsm_blob_sizes landlock_blob_sizes __lsm_ro_after_init = { .lbs_cred = sizeof(struct landlock_cred_security),
- .lbs_inode = sizeof(struct landlock_inode_security),
- .lbs_superblock = sizeof(struct landlock_superblock_security),
}; static int __init landlock_init(void) { landlock_add_cred_hooks(); landlock_add_ptrace_hooks();
- landlock_add_fs_hooks();
- landlock_initialized = true;
I think this landlock_initialized is logically separate from the optional DEFINE_LSM "enabled" variable, but I thought I'd double check. :)
An LSM can be marked as enabled (at boot) but not yet initialized.
It seems like it's used here to avoid releasing superblocks before landlock_init() is called? What is the scenario where that happens?
It is a condition for LSM hooks, syscalls and superblock management.
pr_info("Up and running.\n"); return 0; } diff --git a/security/landlock/setup.h b/security/landlock/setup.h index 9fdbf33fcc33..1daffab1ab4b 100644 --- a/security/landlock/setup.h +++ b/security/landlock/setup.h @@ -11,6 +11,8 @@ #include <linux/lsm_hooks.h> +extern bool landlock_initialized;
extern struct lsm_blob_sizes landlock_blob_sizes;
#endif /* _SECURITY_LANDLOCK_SETUP_H */
2.30.2
The locking and inode semantics are pretty complex, but since, again, it's got significant test and syzkaller coverage, it looks good to me.
With the inode helper cleanup:
Reviewed-by: Kees Cook keescook@chromium.org
On 19/03/2021 20:19, Mickaël Salaün wrote:
On 19/03/2021 19:57, Kees Cook wrote:
On Tue, Mar 16, 2021 at 09:42:47PM +0100, Mickaël Salaün wrote:
From: Mickaël Salaün mic@linux.microsoft.com
Using Landlock objects and ruleset, it is possible to tag inodes according to a process's domain. To enable an unprivileged process to express a file hierarchy, it first needs to open a directory (or a file) and pass this file descriptor to the kernel through landlock_add_rule(2). When checking if a file access request is allowed, we walk from the requested dentry to the real root, following the different mount layers. The access to each "tagged" inodes are collected according to their rule layer level, and ANDed to create access to the requested file hierarchy. This makes possible to identify a lot of files without tagging every inodes nor modifying the filesystem, while still following the view and understanding the user has from the filesystem.
Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not keep the same struct inodes for the same inodes whereas these inodes are in use.
This commit adds a minimal set of supported filesystem access-control which doesn't enable to restrict all file-related actions. This is the result of multiple discussions to minimize the code of Landlock to ease review. Thanks to the Landlock design, extending this access-control without breaking user space will not be a problem. Moreover, seccomp filters can be used to restrict the use of syscall families which may not be currently handled by Landlock.
Cc: Al Viro viro@zeniv.linux.org.uk Cc: Anton Ivanov anton.ivanov@cambridgegreys.com Cc: James Morris jmorris@namei.org Cc: Jann Horn jannh@google.com Cc: Jeff Dike jdike@addtoit.com Cc: Kees Cook keescook@chromium.org Cc: Richard Weinberger richard@nod.at Cc: Serge E. Hallyn serge@hallyn.com Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Link: https://lore.kernel.org/r/20210316204252.427806-8-mic@digikod.net [...]
- spin_lock(&sb->s_inode_list_lock);
- list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
struct landlock_object *object;
/* Only handles referenced inodes. */
if (!atomic_read(&inode->i_count))
continue;
/*
* Checks I_FREEING and I_WILL_FREE to protect against a race
* condition when release_inode() just called iput(), which
* could lead to a NULL dereference of inode->security or a
* second call to iput() for the same Landlock object. Also
* checks I_NEW because such inode cannot be tied to an object.
*/
spin_lock(&inode->i_lock);
if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) {
spin_unlock(&inode->i_lock);
continue;
}
This (and elsewhere here) seems like a lot of inode internals getting exposed. Can any of this be repurposed into helpers? I see this test scattered around the kernel a fair bit:
$ git grep I_FREEING | grep I_WILL_FREE | grep I_NEW | wc -l 9
Dealing with the filesystem is complex. Some helpers could probably be added, but with a series dedicated to the filesystem. I can work on that once this series is merged.
+static inline u32 get_mode_access(const umode_t mode) +{
- switch (mode & S_IFMT) {
- case S_IFLNK:
return LANDLOCK_ACCESS_FS_MAKE_SYM;
- case 0:
/* A zero mode translates to S_IFREG. */
- case S_IFREG:
return LANDLOCK_ACCESS_FS_MAKE_REG;
- case S_IFDIR:
return LANDLOCK_ACCESS_FS_MAKE_DIR;
- case S_IFCHR:
return LANDLOCK_ACCESS_FS_MAKE_CHAR;
- case S_IFBLK:
return LANDLOCK_ACCESS_FS_MAKE_BLOCK;
- case S_IFIFO:
return LANDLOCK_ACCESS_FS_MAKE_FIFO;
- case S_IFSOCK:
return LANDLOCK_ACCESS_FS_MAKE_SOCK;
- default:
WARN_ON_ONCE(1);
return 0;
- }
I'm assuming this won't be reachable from userspace.
It should not, only a bogus kernel code could.
[...] index a5d6ef334991..f8e8e980454c 100644 --- a/security/landlock/setup.c +++ b/security/landlock/setup.c @@ -11,17 +11,24 @@ #include "common.h" #include "cred.h" +#include "fs.h" #include "ptrace.h" #include "setup.h" +bool landlock_initialized __lsm_ro_after_init = false;
struct lsm_blob_sizes landlock_blob_sizes __lsm_ro_after_init = { .lbs_cred = sizeof(struct landlock_cred_security),
- .lbs_inode = sizeof(struct landlock_inode_security),
- .lbs_superblock = sizeof(struct landlock_superblock_security),
}; static int __init landlock_init(void) { landlock_add_cred_hooks(); landlock_add_ptrace_hooks();
- landlock_add_fs_hooks();
- landlock_initialized = true;
I think this landlock_initialized is logically separate from the optional DEFINE_LSM "enabled" variable, but I thought I'd double check. :)
An LSM can be marked as enabled (at boot) but not yet initialized.
It seems like it's used here to avoid releasing superblocks before landlock_init() is called? What is the scenario where that happens?
It is a condition for LSM hooks, syscalls and superblock management.
pr_info("Up and running.\n"); return 0; } diff --git a/security/landlock/setup.h b/security/landlock/setup.h index 9fdbf33fcc33..1daffab1ab4b 100644 --- a/security/landlock/setup.h +++ b/security/landlock/setup.h @@ -11,6 +11,8 @@ #include <linux/lsm_hooks.h> +extern bool landlock_initialized;
extern struct lsm_blob_sizes landlock_blob_sizes;
#endif /* _SECURITY_LANDLOCK_SETUP_H */
2.30.2
The locking and inode semantics are pretty complex, but since, again, it's got significant test and syzkaller coverage, it looks good to me.
With the inode helper cleanup:
I think the inode helper would have to be in a separate patch focused on fs/ (like all matches of your greps, except Landlock). Are you OK if I send a patch for that once Landlock is merged?
Reviewed-by: Kees Cook keescook@chromium.org
On Tue, Mar 16, 2021 at 9:43 PM Mickaël Salaün mic@digikod.net wrote:
Using Landlock objects and ruleset, it is possible to tag inodes according to a process's domain.
[...]
+static void release_inode(struct landlock_object *const object)
__releases(object->lock)
+{
struct inode *const inode = object->underobj;
struct super_block *sb;
if (!inode) {
spin_unlock(&object->lock);
return;
}
/*
* Protects against concurrent use by hook_sb_delete() of the reference
* to the underlying inode.
*/
object->underobj = NULL;
/*
* Makes sure that if the filesystem is concurrently unmounted,
* hook_sb_delete() will wait for us to finish iput().
*/
sb = inode->i_sb;
atomic_long_inc(&landlock_superblock(sb)->inode_refs);
spin_unlock(&object->lock);
/*
* Because object->underobj was not NULL, hook_sb_delete() and
* get_inode_object() guarantee that it is safe to reset
* landlock_inode(inode)->object while it is not NULL. It is therefore
* not necessary to lock inode->i_lock.
*/
rcu_assign_pointer(landlock_inode(inode)->object, NULL);
/*
* Now, new rules can safely be tied to @inode with get_inode_object().
*/
iput(inode);
if (atomic_long_dec_and_test(&landlock_superblock(sb)->inode_refs))
wake_up_var(&landlock_superblock(sb)->inode_refs);
+}
[...]
+static struct landlock_object *get_inode_object(struct inode *const inode) +{
struct landlock_object *object, *new_object;
struct landlock_inode_security *inode_sec = landlock_inode(inode);
rcu_read_lock();
+retry:
object = rcu_dereference(inode_sec->object);
if (object) {
if (likely(refcount_inc_not_zero(&object->usage))) {
rcu_read_unlock();
return object;
}
/*
* We are racing with release_inode(), the object is going
* away. Wait for release_inode(), then retry.
*/
spin_lock(&object->lock);
spin_unlock(&object->lock);
goto retry;
}
rcu_read_unlock();
/*
* If there is no object tied to @inode, then create a new one (without
* holding any locks).
*/
new_object = landlock_create_object(&landlock_fs_underops, inode);
if (IS_ERR(new_object))
return new_object;
/* Protects against concurrent get_inode_object() calls. */
spin_lock(&inode->i_lock);
object = rcu_dereference_protected(inode_sec->object,
lockdep_is_held(&inode->i_lock));
rcu_dereference_protected() requires that inode_sec->object is not concurrently changed, but I think another thread could call get_inode_object() while we're in landlock_create_object(), and then we could race with the NULL write in release_inode() here? (It wouldn't actually be a UAF though because we're not actually accessing `object` here.) Or am I missing a lock that prevents this?
In v28 this wasn't an issue because release_inode() was holding inode->i_lock (and object->lock) during the NULL store; but in v29 and this version the NULL store in release_inode() moved out of the locked region. I think you could just move the NULL store in release_inode() back up (and maybe add a comment explaining the locking rules for landlock_inode(...)->object)?
(Or alternatively you could use rcu_dereference_raw() with a comment explaining that the read pointer is only used to check for NULL-ness, and that it is guaranteed that the pointer can't change if it is NULL and we're holding the lock. But that'd be needlessly complicated, I think.)
if (unlikely(object)) {
/* Someone else just created the object, bail out and retry. */
spin_unlock(&inode->i_lock);
kfree(new_object);
rcu_read_lock();
goto retry;
}
rcu_assign_pointer(inode_sec->object, new_object);
/*
* @inode will be released by hook_sb_delete() on its superblock
* shutdown.
*/
ihold(inode);
spin_unlock(&inode->i_lock);
return new_object;
+}
On 23/03/2021 01:13, Jann Horn wrote:
On Tue, Mar 16, 2021 at 9:43 PM Mickaël Salaün mic@digikod.net wrote:
Using Landlock objects and ruleset, it is possible to tag inodes according to a process's domain.
[...]
+static void release_inode(struct landlock_object *const object)
__releases(object->lock)
+{
struct inode *const inode = object->underobj;
struct super_block *sb;
if (!inode) {
spin_unlock(&object->lock);
return;
}
/*
* Protects against concurrent use by hook_sb_delete() of the reference
* to the underlying inode.
*/
object->underobj = NULL;
/*
* Makes sure that if the filesystem is concurrently unmounted,
* hook_sb_delete() will wait for us to finish iput().
*/
sb = inode->i_sb;
atomic_long_inc(&landlock_superblock(sb)->inode_refs);
spin_unlock(&object->lock);
/*
* Because object->underobj was not NULL, hook_sb_delete() and
* get_inode_object() guarantee that it is safe to reset
* landlock_inode(inode)->object while it is not NULL. It is therefore
* not necessary to lock inode->i_lock.
*/
rcu_assign_pointer(landlock_inode(inode)->object, NULL);
/*
* Now, new rules can safely be tied to @inode with get_inode_object().
*/
iput(inode);
if (atomic_long_dec_and_test(&landlock_superblock(sb)->inode_refs))
wake_up_var(&landlock_superblock(sb)->inode_refs);
+}
[...]
+static struct landlock_object *get_inode_object(struct inode *const inode) +{
struct landlock_object *object, *new_object;
struct landlock_inode_security *inode_sec = landlock_inode(inode);
rcu_read_lock();
+retry:
object = rcu_dereference(inode_sec->object);
if (object) {
if (likely(refcount_inc_not_zero(&object->usage))) {
rcu_read_unlock();
return object;
}
/*
* We are racing with release_inode(), the object is going
* away. Wait for release_inode(), then retry.
*/
spin_lock(&object->lock);
spin_unlock(&object->lock);
goto retry;
}
rcu_read_unlock();
/*
* If there is no object tied to @inode, then create a new one (without
* holding any locks).
*/
new_object = landlock_create_object(&landlock_fs_underops, inode);
if (IS_ERR(new_object))
return new_object;
/* Protects against concurrent get_inode_object() calls. */
spin_lock(&inode->i_lock);
object = rcu_dereference_protected(inode_sec->object,
lockdep_is_held(&inode->i_lock));
rcu_dereference_protected() requires that inode_sec->object is not concurrently changed, but I think another thread could call get_inode_object() while we're in landlock_create_object(), and then we could race with the NULL write in release_inode() here? (It wouldn't actually be a UAF though because we're not actually accessing `object` here.) Or am I missing a lock that prevents this?
In v28 this wasn't an issue because release_inode() was holding inode->i_lock (and object->lock) during the NULL store; but in v29 and this version the NULL store in release_inode() moved out of the locked region. I think you could just move the NULL store in release_inode() back up (and maybe add a comment explaining the locking rules for landlock_inode(...)->object)?
(Or alternatively you could use rcu_dereference_raw() with a comment explaining that the read pointer is only used to check for NULL-ness, and that it is guaranteed that the pointer can't change if it is NULL and we're holding the lock. But that'd be needlessly complicated, I think.)
To reach rcu_assign_pointer(landlock_inode(inode)->object, NULL) in release_inode() or in hook_sb_delete(), the landlock_inode(inode)->object need to be non-NULL, which implies that a call to get_inode_object(inode) either "retry" (because release_inode is only called by landlock_put_object, which set object->usage to 0) until it creates a new object, or reuses the existing referenced object (and increments object->usage). The worse case would be if get_inode_object(inode) is called just before the rcu_assign_pointer(landlock_inode(inode)->object, NULL) from hook_sb_delete(), which would result in an object with a NULL underobj, which is the expected behavior (and checked by release_inode).
The line rcu_assign_pointer(inode_sec->object, new_object) from get_inode_object() can only be reached if the underlying inode doesn't reference an object, in which case hook_sb_delete() will not reach the rcu_assign_pointer(landlock_inode(inode)->object, NULL) line for this same inode.
This works because get_inode_object(inode) is mutually exclusive to itself with the same inode (i.e. an inode can only point to an object that references this same inode).
I tried to explain this with the comment "Protects against concurrent get_inode_object() calls" in get_inode_object(), and the comments just before both rcu_assign_pointer(landlock_inode(inode)->object, NULL).
if (unlikely(object)) {
/* Someone else just created the object, bail out and retry. */
spin_unlock(&inode->i_lock);
kfree(new_object);
rcu_read_lock();
goto retry;
}
rcu_assign_pointer(inode_sec->object, new_object);
/*
* @inode will be released by hook_sb_delete() on its superblock
* shutdown.
*/
ihold(inode);
spin_unlock(&inode->i_lock);
return new_object;
+}
On Tue, Mar 23, 2021 at 4:54 PM Mickaël Salaün mic@digikod.net wrote:
On 23/03/2021 01:13, Jann Horn wrote:
On Tue, Mar 16, 2021 at 9:43 PM Mickaël Salaün mic@digikod.net wrote:
Using Landlock objects and ruleset, it is possible to tag inodes according to a process's domain.
[...]
+static void release_inode(struct landlock_object *const object)
__releases(object->lock)
+{
struct inode *const inode = object->underobj;
struct super_block *sb;
if (!inode) {
spin_unlock(&object->lock);
return;
}
/*
* Protects against concurrent use by hook_sb_delete() of the reference
* to the underlying inode.
*/
object->underobj = NULL;
/*
* Makes sure that if the filesystem is concurrently unmounted,
* hook_sb_delete() will wait for us to finish iput().
*/
sb = inode->i_sb;
atomic_long_inc(&landlock_superblock(sb)->inode_refs);
spin_unlock(&object->lock);
/*
* Because object->underobj was not NULL, hook_sb_delete() and
* get_inode_object() guarantee that it is safe to reset
* landlock_inode(inode)->object while it is not NULL. It is therefore
* not necessary to lock inode->i_lock.
*/
rcu_assign_pointer(landlock_inode(inode)->object, NULL);
/*
* Now, new rules can safely be tied to @inode with get_inode_object().
*/
iput(inode);
if (atomic_long_dec_and_test(&landlock_superblock(sb)->inode_refs))
wake_up_var(&landlock_superblock(sb)->inode_refs);
+}
[...]
+static struct landlock_object *get_inode_object(struct inode *const inode) +{
struct landlock_object *object, *new_object;
struct landlock_inode_security *inode_sec = landlock_inode(inode);
rcu_read_lock();
+retry:
object = rcu_dereference(inode_sec->object);
if (object) {
if (likely(refcount_inc_not_zero(&object->usage))) {
rcu_read_unlock();
return object;
}
/*
* We are racing with release_inode(), the object is going
* away. Wait for release_inode(), then retry.
*/
spin_lock(&object->lock);
spin_unlock(&object->lock);
goto retry;
}
rcu_read_unlock();
/*
* If there is no object tied to @inode, then create a new one (without
* holding any locks).
*/
new_object = landlock_create_object(&landlock_fs_underops, inode);
if (IS_ERR(new_object))
return new_object;
/* Protects against concurrent get_inode_object() calls. */
spin_lock(&inode->i_lock);
object = rcu_dereference_protected(inode_sec->object,
lockdep_is_held(&inode->i_lock));
rcu_dereference_protected() requires that inode_sec->object is not concurrently changed, but I think another thread could call get_inode_object() while we're in landlock_create_object(), and then we could race with the NULL write in release_inode() here? (It wouldn't actually be a UAF though because we're not actually accessing `object` here.) Or am I missing a lock that prevents this?
In v28 this wasn't an issue because release_inode() was holding inode->i_lock (and object->lock) during the NULL store; but in v29 and this version the NULL store in release_inode() moved out of the locked region. I think you could just move the NULL store in release_inode() back up (and maybe add a comment explaining the locking rules for landlock_inode(...)->object)?
(Or alternatively you could use rcu_dereference_raw() with a comment explaining that the read pointer is only used to check for NULL-ness, and that it is guaranteed that the pointer can't change if it is NULL and we're holding the lock. But that'd be needlessly complicated, I think.)
To reach rcu_assign_pointer(landlock_inode(inode)->object, NULL) in release_inode() or in hook_sb_delete(), the landlock_inode(inode)->object need to be non-NULL,
Yes.
which implies that a call to get_inode_object(inode) either "retry" (because release_inode is only called by landlock_put_object, which set object->usage to 0) until it creates a new object, or reuses the existing referenced object (and increments object->usage).
But it can be that landlock_inode(inode)->object only becomes non-NULL after get_inode_object() has checked rcu_dereference(inode_sec->object).
The worse case would be if get_inode_object(inode) is called just before the rcu_assign_pointer(landlock_inode(inode)->object, NULL) from hook_sb_delete(), which would result in an object with a NULL underobj, which is the expected behavior (and checked by release_inode).
The scenario I'm talking about doesn't involve hook_sb_delete().
The line rcu_assign_pointer(inode_sec->object, new_object) from get_inode_object() can only be reached if the underlying inode doesn't reference an object,
Yes.
in which case hook_sb_delete() will not reach the rcu_assign_pointer(landlock_inode(inode)->object, NULL) line for this same inode.
This works because get_inode_object(inode) is mutually exclusive to itself with the same inode (i.e. an inode can only point to an object that references this same inode).
To clarify: You can concurrently call get_inode_object() multiple times on the same inode, right? There are no locks held on entry to that function.
I tried to explain this with the comment "Protects against concurrent get_inode_object() calls" in get_inode_object(), and the comments just before both rcu_assign_pointer(landlock_inode(inode)->object, NULL).
The scenario I'm talking about is:
Initially the inode does not have an associated landlock_object. There are two threads A and B. Thread A is going to execute get_inode_object(). Thread B is going to execute get_inode_object() followed immediately by landlock_put_object().
thread A: enters get_inode_object() thread A: rcu_dereference(inode_sec->object) returns NULL thread A: enters landlock_create_object() thread B: enters get_inode_object() thread B: rcu_dereference(inode_sec->object) returns NULL thread B: calls landlock_create_object() thread B: sets inode_sec->object while holding inode->i_lock thread B: leaves get_inode_object() thread B: enters landlock_put_object() thread B: object->usage drops to 0, object->lock is taken thread B: calls release_inode() thread B: drops object->lock thread A: returns from landlock_create_object() thread A: takes inode->i_lock
At this point, thread B will run:
rcu_assign_pointer(landlock_inode(inode)->object, NULL);
while thread A runs:
rcu_dereference_protected(inode_sec->object, lockdep_is_held(&inode->i_lock));
meaning there is a (theoretical) data race, since rcu_dereference_protected() doesn't use READ_ONCE().
if (unlikely(object)) {
/* Someone else just created the object, bail out and retry. */
spin_unlock(&inode->i_lock);
kfree(new_object);
rcu_read_lock();
goto retry;
}
rcu_assign_pointer(inode_sec->object, new_object);
/*
* @inode will be released by hook_sb_delete() on its superblock
* shutdown.
*/
ihold(inode);
spin_unlock(&inode->i_lock);
return new_object;
+}
On 23/03/2021 18:49, Jann Horn wrote:
On Tue, Mar 23, 2021 at 4:54 PM Mickaël Salaün mic@digikod.net wrote:
On 23/03/2021 01:13, Jann Horn wrote:
On Tue, Mar 16, 2021 at 9:43 PM Mickaël Salaün mic@digikod.net wrote:
Using Landlock objects and ruleset, it is possible to tag inodes according to a process's domain.
[...]
+static void release_inode(struct landlock_object *const object)
__releases(object->lock)
+{
struct inode *const inode = object->underobj;
struct super_block *sb;
if (!inode) {
spin_unlock(&object->lock);
return;
}
/*
* Protects against concurrent use by hook_sb_delete() of the reference
* to the underlying inode.
*/
object->underobj = NULL;
/*
* Makes sure that if the filesystem is concurrently unmounted,
* hook_sb_delete() will wait for us to finish iput().
*/
sb = inode->i_sb;
atomic_long_inc(&landlock_superblock(sb)->inode_refs);
spin_unlock(&object->lock);
/*
* Because object->underobj was not NULL, hook_sb_delete() and
* get_inode_object() guarantee that it is safe to reset
* landlock_inode(inode)->object while it is not NULL. It is therefore
* not necessary to lock inode->i_lock.
*/
rcu_assign_pointer(landlock_inode(inode)->object, NULL);
/*
* Now, new rules can safely be tied to @inode with get_inode_object().
*/
iput(inode);
if (atomic_long_dec_and_test(&landlock_superblock(sb)->inode_refs))
wake_up_var(&landlock_superblock(sb)->inode_refs);
+}
[...]
+static struct landlock_object *get_inode_object(struct inode *const inode) +{
struct landlock_object *object, *new_object;
struct landlock_inode_security *inode_sec = landlock_inode(inode);
rcu_read_lock();
+retry:
object = rcu_dereference(inode_sec->object);
if (object) {
if (likely(refcount_inc_not_zero(&object->usage))) {
rcu_read_unlock();
return object;
}
/*
* We are racing with release_inode(), the object is going
* away. Wait for release_inode(), then retry.
*/
spin_lock(&object->lock);
spin_unlock(&object->lock);
goto retry;
}
rcu_read_unlock();
/*
* If there is no object tied to @inode, then create a new one (without
* holding any locks).
*/
new_object = landlock_create_object(&landlock_fs_underops, inode);
if (IS_ERR(new_object))
return new_object;
/* Protects against concurrent get_inode_object() calls. */
spin_lock(&inode->i_lock);
object = rcu_dereference_protected(inode_sec->object,
lockdep_is_held(&inode->i_lock));
rcu_dereference_protected() requires that inode_sec->object is not concurrently changed, but I think another thread could call get_inode_object() while we're in landlock_create_object(), and then we could race with the NULL write in release_inode() here? (It wouldn't actually be a UAF though because we're not actually accessing `object` here.) Or am I missing a lock that prevents this?
In v28 this wasn't an issue because release_inode() was holding inode->i_lock (and object->lock) during the NULL store; but in v29 and this version the NULL store in release_inode() moved out of the locked region. I think you could just move the NULL store in release_inode() back up (and maybe add a comment explaining the locking rules for landlock_inode(...)->object)?
(Or alternatively you could use rcu_dereference_raw() with a comment explaining that the read pointer is only used to check for NULL-ness, and that it is guaranteed that the pointer can't change if it is NULL and we're holding the lock. But that'd be needlessly complicated, I think.)
To reach rcu_assign_pointer(landlock_inode(inode)->object, NULL) in release_inode() or in hook_sb_delete(), the landlock_inode(inode)->object need to be non-NULL,
Yes.
which implies that a call to get_inode_object(inode) either "retry" (because release_inode is only called by landlock_put_object, which set object->usage to 0) until it creates a new object, or reuses the existing referenced object (and increments object->usage).
But it can be that landlock_inode(inode)->object only becomes non-NULL after get_inode_object() has checked rcu_dereference(inode_sec->object).
The worse case would be if get_inode_object(inode) is called just before the rcu_assign_pointer(landlock_inode(inode)->object, NULL) from hook_sb_delete(), which would result in an object with a NULL underobj, which is the expected behavior (and checked by release_inode).
The scenario I'm talking about doesn't involve hook_sb_delete().
The line rcu_assign_pointer(inode_sec->object, new_object) from get_inode_object() can only be reached if the underlying inode doesn't reference an object,
Yes.
in which case hook_sb_delete() will not reach the rcu_assign_pointer(landlock_inode(inode)->object, NULL) line for this same inode.
This works because get_inode_object(inode) is mutually exclusive to itself with the same inode (i.e. an inode can only point to an object that references this same inode).
To clarify: You can concurrently call get_inode_object() multiple times on the same inode, right? There are no locks held on entry to that function.
I tried to explain this with the comment "Protects against concurrent get_inode_object() calls" in get_inode_object(), and the comments just before both rcu_assign_pointer(landlock_inode(inode)->object, NULL).
The scenario I'm talking about is:
Initially the inode does not have an associated landlock_object. There are two threads A and B. Thread A is going to execute get_inode_object(). Thread B is going to execute get_inode_object() followed immediately by landlock_put_object().
thread A: enters get_inode_object() thread A: rcu_dereference(inode_sec->object) returns NULL thread A: enters landlock_create_object() thread B: enters get_inode_object() thread B: rcu_dereference(inode_sec->object) returns NULL thread B: calls landlock_create_object() thread B: sets inode_sec->object while holding inode->i_lock thread B: leaves get_inode_object() thread B: enters landlock_put_object() thread B: object->usage drops to 0, object->lock is taken thread B: calls release_inode() thread B: drops object->lock thread A: returns from landlock_create_object() thread A: takes inode->i_lock
At this point, thread B will run:
rcu_assign_pointer(landlock_inode(inode)->object, NULL);
while thread A runs:
rcu_dereference_protected(inode_sec->object, lockdep_is_held(&inode->i_lock));
meaning there is a (theoretical) data race, since rcu_dereference_protected() doesn't use READ_ONCE().
Hum, I see, that is what I was missing. And that explain why there is (in practice) no impact on winning the race.
I would prefer to use rcu_access_pointer() instead of rcu_dereference_protected() to avoid pitfall, and it reflects what I was expecting:
--- a/security/landlock/fs.c +++ b/security/landlock/fs.c @@ -117,9 +117,7 @@ static struct landlock_object *get_inode_object(struct inode *const inode)
/* Protects against concurrent get_inode_object() calls. */ spin_lock(&inode->i_lock); - object = rcu_dereference_protected(inode_sec->object, - lockdep_is_held(&inode->i_lock)); - if (unlikely(object)) { + if (unlikely(rcu_access_pointer(inode_sec->object))) { /* Someone else just created the object, bail out and retry. */ spin_unlock(&inode->i_lock); kfree(new_object);
But I'm not sure about your proposition to move the NULL store in release_inode() back up. Do you mean to add back the inode lock in release_inode() like this?
--- a/security/landlock/fs.c +++ b/security/landlock/fs.c @@ -59,16 +59,12 @@ static void release_inode(struct landlock_object *const object) * Makes sure that if the filesystem is concurrently unmounted, * hook_sb_delete() will wait for us to finish iput(). */ + spin_lock(&inode->i_lock); sb = inode->i_sb; atomic_long_inc(&landlock_superblock(sb)->inode_refs); spin_unlock(&object->lock); - /* - * Because object->underobj was not NULL, hook_sb_delete() and - * get_inode_object() guarantee that it is safe to reset - * landlock_inode(inode)->object while it is not NULL. It is therefore - * not necessary to lock inode->i_lock. - */ rcu_assign_pointer(landlock_inode(inode)->object, NULL); + spin_unlock(&inode->i_lock); /* * Now, new rules can safely be tied to @inode with get_inode_object(). */
I would prefer to avoid nested locks if it is not necessary though.
if (unlikely(object)) {
/* Someone else just created the object, bail out and retry. */
spin_unlock(&inode->i_lock);
kfree(new_object);
rcu_read_lock();
goto retry;
}
rcu_assign_pointer(inode_sec->object, new_object);
/*
* @inode will be released by hook_sb_delete() on its superblock
* shutdown.
*/
ihold(inode);
spin_unlock(&inode->i_lock);
return new_object;
+}
On Tue, Mar 23, 2021 at 8:22 PM Mickaël Salaün mic@digikod.net wrote:
On 23/03/2021 18:49, Jann Horn wrote:
On Tue, Mar 23, 2021 at 4:54 PM Mickaël Salaün mic@digikod.net wrote:
On 23/03/2021 01:13, Jann Horn wrote:
On Tue, Mar 16, 2021 at 9:43 PM Mickaël Salaün mic@digikod.net wrote:
Using Landlock objects and ruleset, it is possible to tag inodes according to a process's domain.
[...]
+static void release_inode(struct landlock_object *const object)
__releases(object->lock)
+{
struct inode *const inode = object->underobj;
struct super_block *sb;
if (!inode) {
spin_unlock(&object->lock);
return;
}
/*
* Protects against concurrent use by hook_sb_delete() of the reference
* to the underlying inode.
*/
object->underobj = NULL;
/*
* Makes sure that if the filesystem is concurrently unmounted,
* hook_sb_delete() will wait for us to finish iput().
*/
sb = inode->i_sb;
atomic_long_inc(&landlock_superblock(sb)->inode_refs);
spin_unlock(&object->lock);
/*
* Because object->underobj was not NULL, hook_sb_delete() and
* get_inode_object() guarantee that it is safe to reset
* landlock_inode(inode)->object while it is not NULL. It is therefore
* not necessary to lock inode->i_lock.
*/
rcu_assign_pointer(landlock_inode(inode)->object, NULL);
/*
* Now, new rules can safely be tied to @inode with get_inode_object().
*/
iput(inode);
if (atomic_long_dec_and_test(&landlock_superblock(sb)->inode_refs))
wake_up_var(&landlock_superblock(sb)->inode_refs);
+}
[...]
+static struct landlock_object *get_inode_object(struct inode *const inode) +{
struct landlock_object *object, *new_object;
struct landlock_inode_security *inode_sec = landlock_inode(inode);
rcu_read_lock();
+retry:
object = rcu_dereference(inode_sec->object);
if (object) {
if (likely(refcount_inc_not_zero(&object->usage))) {
rcu_read_unlock();
return object;
}
/*
* We are racing with release_inode(), the object is going
* away. Wait for release_inode(), then retry.
*/
spin_lock(&object->lock);
spin_unlock(&object->lock);
goto retry;
}
rcu_read_unlock();
/*
* If there is no object tied to @inode, then create a new one (without
* holding any locks).
*/
new_object = landlock_create_object(&landlock_fs_underops, inode);
if (IS_ERR(new_object))
return new_object;
/* Protects against concurrent get_inode_object() calls. */
spin_lock(&inode->i_lock);
object = rcu_dereference_protected(inode_sec->object,
lockdep_is_held(&inode->i_lock));
rcu_dereference_protected() requires that inode_sec->object is not concurrently changed, but I think another thread could call get_inode_object() while we're in landlock_create_object(), and then we could race with the NULL write in release_inode() here? (It wouldn't actually be a UAF though because we're not actually accessing `object` here.) Or am I missing a lock that prevents this?
In v28 this wasn't an issue because release_inode() was holding inode->i_lock (and object->lock) during the NULL store; but in v29 and this version the NULL store in release_inode() moved out of the locked region. I think you could just move the NULL store in release_inode() back up (and maybe add a comment explaining the locking rules for landlock_inode(...)->object)?
(Or alternatively you could use rcu_dereference_raw() with a comment explaining that the read pointer is only used to check for NULL-ness, and that it is guaranteed that the pointer can't change if it is NULL and we're holding the lock. But that'd be needlessly complicated, I think.)
To reach rcu_assign_pointer(landlock_inode(inode)->object, NULL) in release_inode() or in hook_sb_delete(), the landlock_inode(inode)->object need to be non-NULL,
Yes.
which implies that a call to get_inode_object(inode) either "retry" (because release_inode is only called by landlock_put_object, which set object->usage to 0) until it creates a new object, or reuses the existing referenced object (and increments object->usage).
But it can be that landlock_inode(inode)->object only becomes non-NULL after get_inode_object() has checked rcu_dereference(inode_sec->object).
The worse case would be if get_inode_object(inode) is called just before the rcu_assign_pointer(landlock_inode(inode)->object, NULL) from hook_sb_delete(), which would result in an object with a NULL underobj, which is the expected behavior (and checked by release_inode).
The scenario I'm talking about doesn't involve hook_sb_delete().
The line rcu_assign_pointer(inode_sec->object, new_object) from get_inode_object() can only be reached if the underlying inode doesn't reference an object,
Yes.
in which case hook_sb_delete() will not reach the rcu_assign_pointer(landlock_inode(inode)->object, NULL) line for this same inode.
This works because get_inode_object(inode) is mutually exclusive to itself with the same inode (i.e. an inode can only point to an object that references this same inode).
To clarify: You can concurrently call get_inode_object() multiple times on the same inode, right? There are no locks held on entry to that function.
I tried to explain this with the comment "Protects against concurrent get_inode_object() calls" in get_inode_object(), and the comments just before both rcu_assign_pointer(landlock_inode(inode)->object, NULL).
The scenario I'm talking about is:
Initially the inode does not have an associated landlock_object. There are two threads A and B. Thread A is going to execute get_inode_object(). Thread B is going to execute get_inode_object() followed immediately by landlock_put_object().
thread A: enters get_inode_object() thread A: rcu_dereference(inode_sec->object) returns NULL thread A: enters landlock_create_object() thread B: enters get_inode_object() thread B: rcu_dereference(inode_sec->object) returns NULL thread B: calls landlock_create_object() thread B: sets inode_sec->object while holding inode->i_lock thread B: leaves get_inode_object() thread B: enters landlock_put_object() thread B: object->usage drops to 0, object->lock is taken thread B: calls release_inode() thread B: drops object->lock thread A: returns from landlock_create_object() thread A: takes inode->i_lock
At this point, thread B will run:
rcu_assign_pointer(landlock_inode(inode)->object, NULL);
while thread A runs:
rcu_dereference_protected(inode_sec->object, lockdep_is_held(&inode->i_lock));
meaning there is a (theoretical) data race, since rcu_dereference_protected() doesn't use READ_ONCE().
Hum, I see, that is what I was missing. And that explain why there is (in practice) no impact on winning the race.
I would prefer to use rcu_access_pointer() instead of rcu_dereference_protected() to avoid pitfall, and it reflects what I was expecting:
--- a/security/landlock/fs.c +++ b/security/landlock/fs.c @@ -117,9 +117,7 @@ static struct landlock_object *get_inode_object(struct inode *const inode)
/* Protects against concurrent get_inode_object() calls. */ spin_lock(&inode->i_lock);
object = rcu_dereference_protected(inode_sec->object,
lockdep_is_held(&inode->i_lock));
if (unlikely(object)) {
if (unlikely(rcu_access_pointer(inode_sec->object))) { /* Someone else just created the object, bail out and
retry. */ spin_unlock(&inode->i_lock); kfree(new_object);
Ah, yeah, that should work. I had forgotten about rcu_access_pointer().
But I'm not sure about your proposition to move the NULL store in release_inode() back up. Do you mean to add back the inode lock in release_inode() like this?
--- a/security/landlock/fs.c +++ b/security/landlock/fs.c @@ -59,16 +59,12 @@ static void release_inode(struct landlock_object *const object) * Makes sure that if the filesystem is concurrently unmounted, * hook_sb_delete() will wait for us to finish iput(). */
spin_lock(&inode->i_lock); sb = inode->i_sb; atomic_long_inc(&landlock_superblock(sb)->inode_refs); spin_unlock(&object->lock);
/*
* Because object->underobj was not NULL, hook_sb_delete() and
* get_inode_object() guarantee that it is safe to reset
* landlock_inode(inode)->object while it is not NULL. It is therefore
* not necessary to lock inode->i_lock.
*/ rcu_assign_pointer(landlock_inode(inode)->object, NULL);
spin_unlock(&inode->i_lock); /* * Now, new rules can safely be tied to @inode with get_inode_object(). */
I would prefer to avoid nested locks if it is not necessary though.
Hm, yeah, you have a point there.
Doing it locklessly does make the locking rules a little complicated though, and you'll have to update the comment inside struct landlock_inode_security. At the moment, it says:
* @object: Weak pointer to an allocated object. All writes (i.e. * creating a new object or removing one) are protected by the * underlying inode->i_lock. Disassociating @object from the inode is * additionally protected by @object->lock, from the time @object's * usage refcount drops to zero to the time this pointer is nulled out.
which isn't true anymore.
From: Mickaël Salaün mic@linux.microsoft.com
These 3 system calls are designed to be used by unprivileged processes to sandbox themselves: * landlock_create_ruleset(2): Creates a ruleset and returns its file descriptor. * landlock_add_rule(2): Adds a rule (e.g. file hierarchy access) to a ruleset, identified by the dedicated file descriptor. * landlock_restrict_self(2): Enforces a ruleset on the calling thread and its future children (similar to seccomp). This syscall has the same usage restrictions as seccomp(2): the caller must have the no_new_privs attribute set or have CAP_SYS_ADMIN in the current user namespace.
All these syscalls have a "flags" argument (not currently used) to enable extensibility.
Here are the motivations for these new syscalls: * A sandboxed process may not have access to file systems, including /dev, /sys or /proc, but it should still be able to add more restrictions to itself. * Neither prctl(2) nor seccomp(2) (which was used in a previous version) fit well with the current definition of a Landlock security policy.
All passed structs (attributes) are checked at build time to ensure that they don't contain holes and that they are aligned the same way for each architecture.
See the user and kernel documentation for more details (provided by a following commit): * Documentation/userspace-api/landlock.rst * Documentation/security/landlock.rst
Cc: Arnd Bergmann arnd@arndb.de Cc: James Morris jmorris@namei.org Cc: Jann Horn jannh@google.com Cc: Kees Cook keescook@chromium.org Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Acked-by: Serge Hallyn serge@hallyn.com Link: https://lore.kernel.org/r/20210316204252.427806-9-mic@digikod.net ---
Changes since v29: * Rebase on v5.12-rc3 and fix trivial conflict with mount_setattr(2).
Changes since v28: * Add Acked-by Serge Hallyn. * Add extra check to be sure to not include ruleset FDs in a dependency of a ruleset (e.g. a path_beneath rule). This case was already forbiden thanks to FMODE_PATH and other flags check, but it makes it explicit now.
Changes since v27: * Forbid creation of rules with an empty allowed_access value because they are now ignored (since v26) in path walks. * Rename landlock_enforce_ruleset_self(2) to landlock_restrict_self(2): shorter and consistent with the two other syscalls (i.e. verb + direct object). * Update ruleset access check according to the new access stack. * Improve landlock_add_rule(2) documentation. * Fix comment. * Remove Reviewed-by Jann Horn because of the above changes.
Changes since v26: * Rename landlock_enforce_ruleset_current(2) to landlock_enforce_ruleset_self(2). "current" makes sense for a kernel developer, but much less from a user space developer stand point. "self" is widely used to refer to the current task (e.g. /proc/self). "current" may refer to temporal properties, which could be added later to this syscall flags (cf. /proc/self/attr/{current,exec}). * Simplify build_check_abi(). * Rename syscall.c to syscalls.c . * Use less ambiguous comments. * Fix spelling.
Changes since v25: * Revert build_check_abi() as non-inline to trigger a warning if it is not called. * Use the new limit names.
Changes since v24: * Add Reviewed-by: Jann Horn jannh@google.com * Set build_check_abi() as inline.
Changes since v23: * Rewrite get_ruleset_from_fd() to please the 0-DAY CI Kernel Test Service that reported an uninitialized variable (false positive): https://lore.kernel.org/linux-security-module/202011101854.zGbWwusK-lkp@inte... Anyway, it is cleaner like this. * Add a comment about E2BIG which can be returned by landlock_enforce_ruleset_current(2) when there is no more room for another stacked ruleset (i.e. domain).
Changes since v22: * Replace security_capable() with ns_capable_noaudit() (suggested by Jann Horn) and explicitly return EPERM. * Fix landlock_enforce_ruleset_current(2)'s out_put_creds (spotted by Jann Horn). * Add __always_inline to copy_min_struct_from_user() to make its BUILD_BUG_ON() checks reliable (suggested by Jann Horn). * Simplify path assignation in get_path_from_fd() (suggested by Jann Horn). * Fix spelling (spotted by Jann Horn).
Changes since v21: * Fix and improve comments.
Changes since v20: * Remove two arguments to landlock_enforce_ruleset(2) (requested by Arnd Bergmann) and rename it to landlock_enforce_ruleset_current(2): remove the enum landlock_target_type and the target file descriptor (not used for now). A ruleset can only be enforced on the current thread. * Remove the size argument in landlock_add_rule() (requested by Arnd Bergmann). * Remove landlock_get_features(2) (suggested by Arnd Bergmann). * Simplify and rename copy_struct_if_any_from_user() to copy_min_struct_from_user(). * Rename "options" to "flags" to allign with current syscalls. * Rename some types and variables in a more consistent way. * Fix missing type declarations in syscalls.h .
Changes since v19: * Replace the landlock(2) syscall with 4 syscalls (one for each command): landlock_get_features(2), landlock_create_ruleset(2), landlock_add_rule(2) and landlock_enforce_ruleset(2) (suggested by Arnd Bergmann). https://lore.kernel.org/lkml/56d15841-e2c1-2d58-59b8-3a6a09b23b4a@digikod.ne... * Return EOPNOTSUPP (instead of ENOPKG) when Landlock is disabled. * Add two new fields to landlock_attr_features to fit with the new syscalls: last_rule_type and last_target_type. This enable to easily identify which types are supported. * Pack landlock_attr_path_beneath struct because of the removed ruleset_fd. * Update documentation and fix spelling.
Changes since v18: * Remove useless include. * Remove LLATTR_SIZE() which was only used to shorten lines. Cf. commit bdc48fa11e46 ("checkpatch/coding-style: deprecate 80-column warning").
Changes since v17: * Synchronize syscall declaration. * Fix comment.
Changes since v16: * Add a size_attr_features field to struct landlock_attr_features for self-introspection, and move the access_fs field to be more consistent. * Replace __aligned_u64 types of attribute fields with __u16, __s32, __u32 and __u64, and check at build time that these structures does not contain hole and that they are aligned the same way (8-bits) on all architectures. This shrinks the size of the userspace ABI, which may be appreciated especially for struct landlock_attr_features which could grow a lot in the future. For instance, struct landlock_attr_features shrinks from 72 bytes to 32 bytes. This change also enables to remove 64-bits to 32-bits conversion checks. * Switch syscall attribute pointer and size arguments to follow similar syscall argument order (e.g. bpf, clone3, openat2). * Set LANDLOCK_OPT_* types to 32-bits. * Allow enforcement of empty ruleset, which enables deny-all policies. * Fix documentation inconsistency.
Changes since v15: * Do not add file descriptors referring to internal filesystems (e.g. nsfs) in a ruleset. * Replace is_user_mountable() with in-place clean checks. * Replace EBADR with EBADFD in get_ruleset_from_fd() and get_path_from_fd(). * Remove ruleset's show_fdinfo() for now.
Changes since v14: * Remove the security_file_open() check in get_path_from_fd(): an opened FD should not be restricted here, and even less with this hook. As a result, it is now allowed to add a path to a ruleset even if the access to this path is not allowed (without O_PATH). This doesn't change the fact that enforcing a ruleset can't grant any right, only remove some rights. The new layer levels add more consistent restrictions. * Check minimal landlock_attr_* size/content. This fix the case when no data was provided and e.g., FD 0 was interpreted as ruleset_fd. Now this leads to a returned -EINVAL. * Fix credential double-free error case. * Complete struct landlock_attr_size with size_attr_enforce. * Fix undefined reference to syscall when Landlock is not selected. * Remove f.file->f_path.mnt check (suggested by Al Viro). * Add build-time checks. * Move ABI checks from fs.c . * Constify variables. * Fix spelling. * Add comments.
Changes since v13: * New implementation, replacing the dependency on seccomp(2) and bpf(2). --- include/linux/syscalls.h | 7 + include/uapi/linux/landlock.h | 53 ++++ kernel/sys_ni.c | 5 + security/landlock/Makefile | 2 +- security/landlock/syscalls.c | 445 ++++++++++++++++++++++++++++++++++ 5 files changed, 511 insertions(+), 1 deletion(-) create mode 100644 security/landlock/syscalls.c
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 2839dc9a7c01..fa3971012e1c 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -69,6 +69,8 @@ struct io_uring_params; struct clone_args; struct open_how; struct mount_attr; +struct landlock_ruleset_attr; +enum landlock_rule_type;
#include <linux/types.h> #include <linux/aio_abi.h> @@ -1041,6 +1043,11 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig, siginfo_t __user *info, unsigned int flags); asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags); +asmlinkage long sys_landlock_create_ruleset(const struct landlock_ruleset_attr __user *attr, + size_t size, __u32 flags); +asmlinkage long sys_landlock_add_rule(int ruleset_fd, enum landlock_rule_type rule_type, + const void __user *rule_attr, __u32 flags); +asmlinkage long sys_landlock_restrict_self(int ruleset_fd, __u32 flags);
/* * Architecture-specific system calls diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h index f69877099c8e..d1fc6af3381e 100644 --- a/include/uapi/linux/landlock.h +++ b/include/uapi/linux/landlock.h @@ -9,6 +9,59 @@ #ifndef _UAPI_LINUX_LANDLOCK_H #define _UAPI_LINUX_LANDLOCK_H
+#include <linux/types.h> + +/** + * struct landlock_ruleset_attr - Ruleset definition + * + * Argument of sys_landlock_create_ruleset(). This structure can grow in + * future versions. + */ +struct landlock_ruleset_attr { + /** + * @handled_access_fs: Bitmask of actions (cf. `Filesystem flags`_) + * that is handled by this ruleset and should then be forbidden if no + * rule explicitly allow them. This is needed for backward + * compatibility reasons. + */ + __u64 handled_access_fs; +}; + +/** + * enum landlock_rule_type - Landlock rule type + * + * Argument of sys_landlock_add_rule(). + */ +enum landlock_rule_type { + /** + * @LANDLOCK_RULE_PATH_BENEATH: Type of a &struct + * landlock_path_beneath_attr . + */ + LANDLOCK_RULE_PATH_BENEATH = 1, +}; + +/** + * struct landlock_path_beneath_attr - Path hierarchy definition + * + * Argument of sys_landlock_add_rule(). + */ +struct landlock_path_beneath_attr { + /** + * @allowed_access: Bitmask of allowed actions for this file hierarchy + * (cf. `Filesystem flags`_). + */ + __u64 allowed_access; + /** + * @parent_fd: File descriptor, open with ``O_PATH``, which identifies + * the parent directory of a file hierarchy, or just a file. + */ + __s32 parent_fd; + /* + * This struct is packed to avoid trailing reserved members. + * Cf. security/landlock/syscalls.c:build_check_abi() + */ +} __attribute__((packed)); + /** * DOC: fs_access * diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 19aa806890d5..cce430cf2ff2 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -266,6 +266,11 @@ COND_SYSCALL(request_key); COND_SYSCALL(keyctl); COND_SYSCALL_COMPAT(keyctl);
+/* security/landlock/syscalls.c */ +COND_SYSCALL(landlock_create_ruleset); +COND_SYSCALL(landlock_add_rule); +COND_SYSCALL(landlock_restrict_self); + /* arch/example/kernel/sys_example.c */
/* mm/fadvise.c */ diff --git a/security/landlock/Makefile b/security/landlock/Makefile index 92e3d80ab8ed..7bbd2f413b3e 100644 --- a/security/landlock/Makefile +++ b/security/landlock/Makefile @@ -1,4 +1,4 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
-landlock-y := setup.o object.o ruleset.o \ +landlock-y := setup.o syscalls.o object.o ruleset.o \ cred.o ptrace.o fs.o diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c new file mode 100644 index 000000000000..48d0073e3ffd --- /dev/null +++ b/security/landlock/syscalls.c @@ -0,0 +1,445 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Landlock LSM - System call implementations and user space interfaces + * + * Copyright © 2016-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#include <asm/current.h> +#include <linux/anon_inodes.h> +#include <linux/build_bug.h> +#include <linux/capability.h> +#include <linux/compiler_types.h> +#include <linux/dcache.h> +#include <linux/err.h> +#include <linux/errno.h> +#include <linux/fs.h> +#include <linux/limits.h> +#include <linux/mount.h> +#include <linux/path.h> +#include <linux/sched.h> +#include <linux/security.h> +#include <linux/stddef.h> +#include <linux/syscalls.h> +#include <linux/types.h> +#include <linux/uaccess.h> +#include <uapi/linux/landlock.h> + +#include "cred.h" +#include "fs.h" +#include "limits.h" +#include "ruleset.h" +#include "setup.h" + +/** + * copy_min_struct_from_user - Safe future-proof argument copying + * + * Extend copy_struct_from_user() to check for consistent user buffer. + * + * @dst: Kernel space pointer or NULL. + * @ksize: Actual size of the data pointed to by @dst. + * @ksize_min: Minimal required size to be copied. + * @src: User space pointer or NULL. + * @usize: (Alleged) size of the data pointed to by @src. + */ +static __always_inline int copy_min_struct_from_user(void *const dst, + const size_t ksize, const size_t ksize_min, + const void __user *const src, const size_t usize) +{ + /* Checks buffer inconsistencies. */ + BUILD_BUG_ON(!dst); + if (!src) + return -EFAULT; + + /* Checks size ranges. */ + BUILD_BUG_ON(ksize <= 0); + BUILD_BUG_ON(ksize < ksize_min); + if (usize < ksize_min) + return -EINVAL; + if (usize > PAGE_SIZE) + return -E2BIG; + + /* Copies user buffer and fills with zeros. */ + return copy_struct_from_user(dst, ksize, src, usize); +} + +/* + * This function only contains arithmetic operations with constants, leading to + * BUILD_BUG_ON(). The related code is evaluated and checked at build time, + * but it is then ignored thanks to compiler optimizations. + */ +static void build_check_abi(void) +{ + struct landlock_ruleset_attr ruleset_attr; + struct landlock_path_beneath_attr path_beneath_attr; + size_t ruleset_size, path_beneath_size; + + /* + * For each user space ABI structures, first checks that there is no + * hole in them, then checks that all architectures have the same + * struct size. + */ + ruleset_size = sizeof(ruleset_attr.handled_access_fs); + BUILD_BUG_ON(sizeof(ruleset_attr) != ruleset_size); + BUILD_BUG_ON(sizeof(ruleset_attr) != 8); + + path_beneath_size = sizeof(path_beneath_attr.allowed_access); + path_beneath_size += sizeof(path_beneath_attr.parent_fd); + BUILD_BUG_ON(sizeof(path_beneath_attr) != path_beneath_size); + BUILD_BUG_ON(sizeof(path_beneath_attr) != 12); +} + +/* Ruleset handling */ + +static int fop_ruleset_release(struct inode *const inode, + struct file *const filp) +{ + struct landlock_ruleset *ruleset = filp->private_data; + + landlock_put_ruleset(ruleset); + return 0; +} + +static ssize_t fop_dummy_read(struct file *const filp, char __user *const buf, + const size_t size, loff_t *const ppos) +{ + /* Dummy handler to enable FMODE_CAN_READ. */ + return -EINVAL; +} + +static ssize_t fop_dummy_write(struct file *const filp, + const char __user *const buf, const size_t size, + loff_t *const ppos) +{ + /* Dummy handler to enable FMODE_CAN_WRITE. */ + return -EINVAL; +} + +/* + * A ruleset file descriptor enables to build a ruleset by adding (i.e. + * writing) rule after rule, without relying on the task's context. This + * reentrant design is also used in a read way to enforce the ruleset on the + * current task. + */ +static const struct file_operations ruleset_fops = { + .release = fop_ruleset_release, + .read = fop_dummy_read, + .write = fop_dummy_write, +}; + +/** + * sys_landlock_create_ruleset - Create a new ruleset + * + * @attr: Pointer to a &struct landlock_ruleset_attr identifying the scope of + * the new ruleset. + * @size: Size of the pointed &struct landlock_ruleset_attr (needed for + * backward and forward compatibility). + * @flags: Must be 0. + * + * This system call enables to create a new Landlock ruleset, and returns the + * related file descriptor on success. + * + * Possible returned errors are: + * + * - EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time; + * - EINVAL: @flags is not 0, or unknown access, or too small @size; + * - E2BIG or EFAULT: @attr or @size inconsistencies; + * - ENOMSG: empty &landlock_ruleset_attr.handled_access_fs. + */ +SYSCALL_DEFINE3(landlock_create_ruleset, + const struct landlock_ruleset_attr __user *const, attr, + const size_t, size, const __u32, flags) +{ + struct landlock_ruleset_attr ruleset_attr; + struct landlock_ruleset *ruleset; + int err, ruleset_fd; + + /* Build-time checks. */ + build_check_abi(); + + if (!landlock_initialized) + return -EOPNOTSUPP; + + /* No flag for now. */ + if (flags) + return -EINVAL; + + /* Copies raw user space buffer. */ + err = copy_min_struct_from_user(&ruleset_attr, sizeof(ruleset_attr), + offsetofend(typeof(ruleset_attr), handled_access_fs), + attr, size); + if (err) + return err; + + /* Checks content (and 32-bits cast). */ + if ((ruleset_attr.handled_access_fs | LANDLOCK_MASK_ACCESS_FS) != + LANDLOCK_MASK_ACCESS_FS) + return -EINVAL; + + /* Checks arguments and transforms to kernel struct. */ + ruleset = landlock_create_ruleset(ruleset_attr.handled_access_fs); + if (IS_ERR(ruleset)) + return PTR_ERR(ruleset); + + /* Creates anonymous FD referring to the ruleset. */ + ruleset_fd = anon_inode_getfd("landlock-ruleset", &ruleset_fops, + ruleset, O_RDWR | O_CLOEXEC); + if (ruleset_fd < 0) + landlock_put_ruleset(ruleset); + return ruleset_fd; +} + +/* + * Returns an owned ruleset from a FD. It is thus needed to call + * landlock_put_ruleset() on the return value. + */ +static struct landlock_ruleset *get_ruleset_from_fd(const int fd, + const fmode_t mode) +{ + struct fd ruleset_f; + struct landlock_ruleset *ruleset; + + ruleset_f = fdget(fd); + if (!ruleset_f.file) + return ERR_PTR(-EBADF); + + /* Checks FD type and access right. */ + if (ruleset_f.file->f_op != &ruleset_fops) { + ruleset = ERR_PTR(-EBADFD); + goto out_fdput; + } + if (!(ruleset_f.file->f_mode & mode)) { + ruleset = ERR_PTR(-EPERM); + goto out_fdput; + } + ruleset = ruleset_f.file->private_data; + if (WARN_ON_ONCE(ruleset->num_layers != 1)) { + ruleset = ERR_PTR(-EINVAL); + goto out_fdput; + } + landlock_get_ruleset(ruleset); + +out_fdput: + fdput(ruleset_f); + return ruleset; +} + +/* Path handling */ + +/* + * @path: Must call put_path(@path) after the call if it succeeded. + */ +static int get_path_from_fd(const s32 fd, struct path *const path) +{ + struct fd f; + int err = 0; + + BUILD_BUG_ON(!__same_type(fd, + ((struct landlock_path_beneath_attr *)NULL)->parent_fd)); + + /* Handles O_PATH. */ + f = fdget_raw(fd); + if (!f.file) + return -EBADF; + /* + * Only allows O_PATH file descriptor: enables to restrict ambient + * filesystem access without requiring to open and risk leaking or + * misusing a file descriptor. Forbids ruleset FDs, internal + * filesystems (e.g. nsfs), including pseudo filesystems that will + * never be mountable (e.g. sockfs, pipefs). + */ + if (!(f.file->f_mode & FMODE_PATH) || + (f.file->f_op == &ruleset_fops) || + (f.file->f_path.mnt->mnt_flags & MNT_INTERNAL) || + (f.file->f_path.dentry->d_sb->s_flags & SB_NOUSER) || + d_is_negative(f.file->f_path.dentry) || + IS_PRIVATE(d_backing_inode(f.file->f_path.dentry))) { + err = -EBADFD; + goto out_fdput; + } + *path = f.file->f_path; + path_get(path); + +out_fdput: + fdput(f); + return err; +} + +/** + * sys_landlock_add_rule - Add a new rule to a ruleset + * + * @ruleset_fd: File descriptor tied to the ruleset that should be extended + * with the new rule. + * @rule_type: Identify the structure type pointed to by @rule_attr (only + * LANDLOCK_RULE_PATH_BENEATH for now). + * @rule_attr: Pointer to a rule (only of type &struct + * landlock_path_beneath_attr for now). + * @flags: Must be 0. + * + * This system call enables to define a new rule and add it to an existing + * ruleset. + * + * Possible returned errors are: + * + * - EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time; + * - EINVAL: @flags is not 0, or inconsistent access in the rule (i.e. + * &landlock_path_beneath_attr.allowed_access is not a subset of the rule's + * accesses); + * - ENOMSG: Empty accesses (e.g. &landlock_path_beneath_attr.allowed_access); + * - EBADF: @ruleset_fd is not a file descriptor for the current thread, or a + * member of @rule_attr is not a file descriptor as expected; + * - EBADFD: @ruleset_fd is not a ruleset file descriptor, or a member of + * @rule_attr is not the expected file descriptor type (e.g. file open + * without O_PATH); + * - EPERM: @ruleset_fd has no write access to the underlying ruleset; + * - EFAULT: @rule_attr inconsistency. + */ +SYSCALL_DEFINE4(landlock_add_rule, + const int, ruleset_fd, const enum landlock_rule_type, rule_type, + const void __user *const, rule_attr, const __u32, flags) +{ + struct landlock_path_beneath_attr path_beneath_attr; + struct path path; + struct landlock_ruleset *ruleset; + int res, err; + + if (!landlock_initialized) + return -EOPNOTSUPP; + + /* No flag for now. */ + if (flags) + return -EINVAL; + + if (rule_type != LANDLOCK_RULE_PATH_BENEATH) + return -EINVAL; + + /* Copies raw user space buffer, only one type for now. */ + res = copy_from_user(&path_beneath_attr, rule_attr, + sizeof(path_beneath_attr)); + if (res) + return -EFAULT; + + /* Gets and checks the ruleset. */ + ruleset = get_ruleset_from_fd(ruleset_fd, FMODE_CAN_WRITE); + if (IS_ERR(ruleset)) + return PTR_ERR(ruleset); + + /* + * Informs about useless rule: empty allowed_access (i.e. deny rules) + * are ignored in path walks. + */ + if (!path_beneath_attr.allowed_access) { + err = -ENOMSG; + goto out_put_ruleset; + } + /* + * Checks that allowed_access matches the @ruleset constraints + * (ruleset->fs_access_masks[0] is automatically upgraded to 64-bits). + */ + if ((path_beneath_attr.allowed_access | ruleset->fs_access_masks[0]) != + ruleset->fs_access_masks[0]) { + err = -EINVAL; + goto out_put_ruleset; + } + + /* Gets and checks the new rule. */ + err = get_path_from_fd(path_beneath_attr.parent_fd, &path); + if (err) + goto out_put_ruleset; + + /* Imports the new rule. */ + err = landlock_append_fs_rule(ruleset, &path, + path_beneath_attr.allowed_access); + path_put(&path); + +out_put_ruleset: + landlock_put_ruleset(ruleset); + return err; +} + +/* Enforcement */ + +/** + * sys_landlock_restrict_self - Enforce a ruleset on the calling thread + * + * @ruleset_fd: File descriptor tied to the ruleset to merge with the target. + * @flags: Must be 0. + * + * This system call enables to enforce a Landlock ruleset on the current + * thread. Enforcing a ruleset requires that the task has CAP_SYS_ADMIN in its + * namespace or is running with no_new_privs. This avoids scenarios where + * unprivileged tasks can affect the behavior of privileged children. + * + * Possible returned errors are: + * + * - EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time; + * - EINVAL: @flags is not 0. + * - EBADF: @ruleset_fd is not a file descriptor for the current thread; + * - EBADFD: @ruleset_fd is not a ruleset file descriptor; + * - EPERM: @ruleset_fd has no read access to the underlying ruleset, or the + * current thread is not running with no_new_privs, or it doesn't have + * CAP_SYS_ADMIN in its namespace. + * - E2BIG: The maximum number of stacked rulesets is reached for the current + * thread. + */ +SYSCALL_DEFINE2(landlock_restrict_self, + const int, ruleset_fd, const __u32, flags) +{ + struct landlock_ruleset *new_dom, *ruleset; + struct cred *new_cred; + struct landlock_cred_security *new_llcred; + int err; + + if (!landlock_initialized) + return -EOPNOTSUPP; + + /* No flag for now. */ + if (flags) + return -EINVAL; + + /* + * Similar checks as for seccomp(2), except that an -EPERM may be + * returned. + */ + if (!task_no_new_privs(current) && + !ns_capable_noaudit(current_user_ns(), CAP_SYS_ADMIN)) + return -EPERM; + + /* Gets and checks the ruleset. */ + ruleset = get_ruleset_from_fd(ruleset_fd, FMODE_CAN_READ); + if (IS_ERR(ruleset)) + return PTR_ERR(ruleset); + + /* Prepares new credentials. */ + new_cred = prepare_creds(); + if (!new_cred) { + err = -ENOMEM; + goto out_put_ruleset; + } + new_llcred = landlock_cred(new_cred); + + /* + * There is no possible race condition while copying and manipulating + * the current credentials because they are dedicated per thread. + */ + new_dom = landlock_merge_ruleset(new_llcred->domain, ruleset); + if (IS_ERR(new_dom)) { + err = PTR_ERR(new_dom); + goto out_put_creds; + } + + /* Replaces the old (prepared) domain. */ + landlock_put_ruleset(new_llcred->domain); + new_llcred->domain = new_dom; + + landlock_put_ruleset(ruleset); + return commit_creds(new_cred); + +out_put_creds: + abort_creds(new_cred); + +out_put_ruleset: + landlock_put_ruleset(ruleset); + return err; +}
On Tue, Mar 16, 2021 at 09:42:48PM +0100, Mickaël Salaün wrote:
From: Mickaël Salaün mic@linux.microsoft.com
These 3 system calls are designed to be used by unprivileged processes to sandbox themselves:
- landlock_create_ruleset(2): Creates a ruleset and returns its file descriptor.
- landlock_add_rule(2): Adds a rule (e.g. file hierarchy access) to a ruleset, identified by the dedicated file descriptor.
- landlock_restrict_self(2): Enforces a ruleset on the calling thread and its future children (similar to seccomp). This syscall has the same usage restrictions as seccomp(2): the caller must have the no_new_privs attribute set or have CAP_SYS_ADMIN in the current user namespace.
All these syscalls have a "flags" argument (not currently used) to enable extensibility.
For the new-style extensible syscalls, you want only a "size" argument; "flags" should be within the argument structure.
(And to this end, why 3 syscalls instead of 1, if you're going to use extensibility anyway?)
+/**
- copy_min_struct_from_user - Safe future-proof argument copying
- Extend copy_struct_from_user() to check for consistent user buffer.
- @dst: Kernel space pointer or NULL.
- @ksize: Actual size of the data pointed to by @dst.
- @ksize_min: Minimal required size to be copied.
- @src: User space pointer or NULL.
- @usize: (Alleged) size of the data pointed to by @src.
- */
+static __always_inline int copy_min_struct_from_user(void *const dst,
const size_t ksize, const size_t ksize_min,
const void __user *const src, const size_t usize)
+{
- /* Checks buffer inconsistencies. */
- BUILD_BUG_ON(!dst);
- if (!src)
return -EFAULT;
- /* Checks size ranges. */
- BUILD_BUG_ON(ksize <= 0);
- BUILD_BUG_ON(ksize < ksize_min);
- if (usize < ksize_min)
return -EINVAL;
- if (usize > PAGE_SIZE)
return -E2BIG;
- /* Copies user buffer and fills with zeros. */
- return copy_struct_from_user(dst, ksize, src, usize);
+}
I still wish this was built into copy_struct_from_user(). :) But yes, this appears to be the way to protect one's self when using copy_struct_from_user().
+/**
- sys_landlock_create_ruleset - Create a new ruleset
- @attr: Pointer to a &struct landlock_ruleset_attr identifying the scope of
the new ruleset.
- @size: Size of the pointed &struct landlock_ruleset_attr (needed for
backward and forward compatibility).
- @flags: Must be 0.
- This system call enables to create a new Landlock ruleset, and returns the
- related file descriptor on success.
- Possible returned errors are:
- EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;
- EINVAL: @flags is not 0, or unknown access, or too small @size;
- E2BIG or EFAULT: @attr or @size inconsistencies;
- ENOMSG: empty &landlock_ruleset_attr.handled_access_fs.
- */
+SYSCALL_DEFINE3(landlock_create_ruleset,
const struct landlock_ruleset_attr __user *const, attr,
const size_t, size, const __u32, flags)
+{
- struct landlock_ruleset_attr ruleset_attr;
- struct landlock_ruleset *ruleset;
- int err, ruleset_fd;
- /* Build-time checks. */
- build_check_abi();
- if (!landlock_initialized)
return -EOPNOTSUPP;
- /* No flag for now. */
- if (flags)
return -EINVAL;
- /* Copies raw user space buffer. */
- err = copy_min_struct_from_user(&ruleset_attr, sizeof(ruleset_attr),
offsetofend(typeof(ruleset_attr), handled_access_fs),
The use of offsetofend() here appears to be kind of the "V1", "V2", ... sizes used in other extensible syscall implementations?
attr, size);
- if (err)
return err;
- /* Checks content (and 32-bits cast). */
- if ((ruleset_attr.handled_access_fs | LANDLOCK_MASK_ACCESS_FS) !=
LANDLOCK_MASK_ACCESS_FS)
return -EINVAL;
- /* Checks arguments and transforms to kernel struct. */
- ruleset = landlock_create_ruleset(ruleset_attr.handled_access_fs);
- if (IS_ERR(ruleset))
return PTR_ERR(ruleset);
- /* Creates anonymous FD referring to the ruleset. */
- ruleset_fd = anon_inode_getfd("landlock-ruleset", &ruleset_fops,
ruleset, O_RDWR | O_CLOEXEC);
- if (ruleset_fd < 0)
landlock_put_ruleset(ruleset);
- return ruleset_fd;
+}
+/*
- Returns an owned ruleset from a FD. It is thus needed to call
- landlock_put_ruleset() on the return value.
- */
+static struct landlock_ruleset *get_ruleset_from_fd(const int fd,
const fmode_t mode)
+{
- struct fd ruleset_f;
- struct landlock_ruleset *ruleset;
- ruleset_f = fdget(fd);
- if (!ruleset_f.file)
return ERR_PTR(-EBADF);
- /* Checks FD type and access right. */
- if (ruleset_f.file->f_op != &ruleset_fops) {
ruleset = ERR_PTR(-EBADFD);
goto out_fdput;
- }
- if (!(ruleset_f.file->f_mode & mode)) {
ruleset = ERR_PTR(-EPERM);
goto out_fdput;
- }
- ruleset = ruleset_f.file->private_data;
- if (WARN_ON_ONCE(ruleset->num_layers != 1)) {
ruleset = ERR_PTR(-EINVAL);
goto out_fdput;
- }
- landlock_get_ruleset(ruleset);
+out_fdput:
- fdput(ruleset_f);
- return ruleset;
+}
+/* Path handling */
+/*
- @path: Must call put_path(@path) after the call if it succeeded.
- */
+static int get_path_from_fd(const s32 fd, struct path *const path) +{
- struct fd f;
- int err = 0;
- BUILD_BUG_ON(!__same_type(fd,
((struct landlock_path_beneath_attr *)NULL)->parent_fd));
- /* Handles O_PATH. */
- f = fdget_raw(fd);
- if (!f.file)
return -EBADF;
- /*
* Only allows O_PATH file descriptor: enables to restrict ambient
* filesystem access without requiring to open and risk leaking or
* misusing a file descriptor. Forbids ruleset FDs, internal
* filesystems (e.g. nsfs), including pseudo filesystems that will
* never be mountable (e.g. sockfs, pipefs).
*/
- if (!(f.file->f_mode & FMODE_PATH) ||
(f.file->f_op == &ruleset_fops) ||
(f.file->f_path.mnt->mnt_flags & MNT_INTERNAL) ||
(f.file->f_path.dentry->d_sb->s_flags & SB_NOUSER) ||
d_is_negative(f.file->f_path.dentry) ||
IS_PRIVATE(d_backing_inode(f.file->f_path.dentry))) {
err = -EBADFD;
goto out_fdput;
- }
- *path = f.file->f_path;
- path_get(path);
+out_fdput:
- fdput(f);
- return err;
+}
+/**
- sys_landlock_add_rule - Add a new rule to a ruleset
- @ruleset_fd: File descriptor tied to the ruleset that should be extended
with the new rule.
- @rule_type: Identify the structure type pointed to by @rule_attr (only
LANDLOCK_RULE_PATH_BENEATH for now).
- @rule_attr: Pointer to a rule (only of type &struct
landlock_path_beneath_attr for now).
- @flags: Must be 0.
- This system call enables to define a new rule and add it to an existing
- ruleset.
- Possible returned errors are:
- EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;
- EINVAL: @flags is not 0, or inconsistent access in the rule (i.e.
- &landlock_path_beneath_attr.allowed_access is not a subset of the rule's
- accesses);
- ENOMSG: Empty accesses (e.g. &landlock_path_beneath_attr.allowed_access);
- EBADF: @ruleset_fd is not a file descriptor for the current thread, or a
- member of @rule_attr is not a file descriptor as expected;
- EBADFD: @ruleset_fd is not a ruleset file descriptor, or a member of
- @rule_attr is not the expected file descriptor type (e.g. file open
- without O_PATH);
- EPERM: @ruleset_fd has no write access to the underlying ruleset;
- EFAULT: @rule_attr inconsistency.
- */
+SYSCALL_DEFINE4(landlock_add_rule,
const int, ruleset_fd, const enum landlock_rule_type, rule_type,
const void __user *const, rule_attr, const __u32, flags)
+{
If this is an extensible syscall, I'd expect a structure holding rule_type, rule_attr, and flags.
- struct landlock_path_beneath_attr path_beneath_attr;
- struct path path;
- struct landlock_ruleset *ruleset;
- int res, err;
- if (!landlock_initialized)
return -EOPNOTSUPP;
- /* No flag for now. */
- if (flags)
return -EINVAL;
- if (rule_type != LANDLOCK_RULE_PATH_BENEATH)
return -EINVAL;
- /* Copies raw user space buffer, only one type for now. */
- res = copy_from_user(&path_beneath_attr, rule_attr,
sizeof(path_beneath_attr));
- if (res)
return -EFAULT;
- /* Gets and checks the ruleset. */
- ruleset = get_ruleset_from_fd(ruleset_fd, FMODE_CAN_WRITE);
- if (IS_ERR(ruleset))
return PTR_ERR(ruleset);
- /*
* Informs about useless rule: empty allowed_access (i.e. deny rules)
* are ignored in path walks.
*/
- if (!path_beneath_attr.allowed_access) {
err = -ENOMSG;
goto out_put_ruleset;
- }
- /*
* Checks that allowed_access matches the @ruleset constraints
* (ruleset->fs_access_masks[0] is automatically upgraded to 64-bits).
*/
- if ((path_beneath_attr.allowed_access | ruleset->fs_access_masks[0]) !=
ruleset->fs_access_masks[0]) {
err = -EINVAL;
goto out_put_ruleset;
- }
- /* Gets and checks the new rule. */
- err = get_path_from_fd(path_beneath_attr.parent_fd, &path);
- if (err)
goto out_put_ruleset;
- /* Imports the new rule. */
- err = landlock_append_fs_rule(ruleset, &path,
path_beneath_attr.allowed_access);
- path_put(&path);
+out_put_ruleset:
- landlock_put_ruleset(ruleset);
- return err;
+}
+/* Enforcement */
+/**
- sys_landlock_restrict_self - Enforce a ruleset on the calling thread
- @ruleset_fd: File descriptor tied to the ruleset to merge with the target.
- @flags: Must be 0.
- This system call enables to enforce a Landlock ruleset on the current
- thread. Enforcing a ruleset requires that the task has CAP_SYS_ADMIN in its
- namespace or is running with no_new_privs. This avoids scenarios where
- unprivileged tasks can affect the behavior of privileged children.
- Possible returned errors are:
- EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;
- EINVAL: @flags is not 0.
- EBADF: @ruleset_fd is not a file descriptor for the current thread;
- EBADFD: @ruleset_fd is not a ruleset file descriptor;
- EPERM: @ruleset_fd has no read access to the underlying ruleset, or the
- current thread is not running with no_new_privs, or it doesn't have
- CAP_SYS_ADMIN in its namespace.
- E2BIG: The maximum number of stacked rulesets is reached for the current
- thread.
- */
+SYSCALL_DEFINE2(landlock_restrict_self,
const int, ruleset_fd, const __u32, flags)
+{
Same observation about new style syscalls.
- struct landlock_ruleset *new_dom, *ruleset;
- struct cred *new_cred;
- struct landlock_cred_security *new_llcred;
- int err;
- if (!landlock_initialized)
return -EOPNOTSUPP;
- /* No flag for now. */
- if (flags)
return -EINVAL;
- /*
* Similar checks as for seccomp(2), except that an -EPERM may be
* returned.
*/
- if (!task_no_new_privs(current) &&
!ns_capable_noaudit(current_user_ns(), CAP_SYS_ADMIN))
return -EPERM;
- /* Gets and checks the ruleset. */
- ruleset = get_ruleset_from_fd(ruleset_fd, FMODE_CAN_READ);
- if (IS_ERR(ruleset))
return PTR_ERR(ruleset);
- /* Prepares new credentials. */
- new_cred = prepare_creds();
- if (!new_cred) {
err = -ENOMEM;
goto out_put_ruleset;
- }
- new_llcred = landlock_cred(new_cred);
- /*
* There is no possible race condition while copying and manipulating
* the current credentials because they are dedicated per thread.
*/
- new_dom = landlock_merge_ruleset(new_llcred->domain, ruleset);
- if (IS_ERR(new_dom)) {
err = PTR_ERR(new_dom);
goto out_put_creds;
- }
- /* Replaces the old (prepared) domain. */
- landlock_put_ruleset(new_llcred->domain);
- new_llcred->domain = new_dom;
- landlock_put_ruleset(ruleset);
- return commit_creds(new_cred);
+out_put_creds:
- abort_creds(new_cred);
+out_put_ruleset:
- landlock_put_ruleset(ruleset);
- return err;
+}
2.30.2
On 19/03/2021 20:06, Kees Cook wrote:
On Tue, Mar 16, 2021 at 09:42:48PM +0100, Mickaël Salaün wrote:
From: Mickaël Salaün mic@linux.microsoft.com
These 3 system calls are designed to be used by unprivileged processes to sandbox themselves:
- landlock_create_ruleset(2): Creates a ruleset and returns its file descriptor.
- landlock_add_rule(2): Adds a rule (e.g. file hierarchy access) to a ruleset, identified by the dedicated file descriptor.
- landlock_restrict_self(2): Enforces a ruleset on the calling thread and its future children (similar to seccomp). This syscall has the same usage restrictions as seccomp(2): the caller must have the no_new_privs attribute set or have CAP_SYS_ADMIN in the current user namespace.
All these syscalls have a "flags" argument (not currently used) to enable extensibility.
For the new-style extensible syscalls, you want only a "size" argument; "flags" should be within the argument structure.
Not necessarily, we should avoid complexity as much as possible. A flag argument is simple, and extensible arguments should be avoided as much as possible. Here it is only used for an extensible list of properties that can't be expressed otherwise.
(And to this end, why 3 syscalls instead of 1, if you're going to use extensibility anyway?)
This was discussed with people such as Arnd and Jann. Multiplexer syscalls should be avoided. Here are some background: https://lore.kernel.org/lkml/CAG48ez2V-eSH2+HL9zrYYD4QMpP4a5y8=mTQtk20PB0wUz...
+/**
- copy_min_struct_from_user - Safe future-proof argument copying
- Extend copy_struct_from_user() to check for consistent user buffer.
- @dst: Kernel space pointer or NULL.
- @ksize: Actual size of the data pointed to by @dst.
- @ksize_min: Minimal required size to be copied.
- @src: User space pointer or NULL.
- @usize: (Alleged) size of the data pointed to by @src.
- */
+static __always_inline int copy_min_struct_from_user(void *const dst,
const size_t ksize, const size_t ksize_min,
const void __user *const src, const size_t usize)
+{
- /* Checks buffer inconsistencies. */
- BUILD_BUG_ON(!dst);
- if (!src)
return -EFAULT;
- /* Checks size ranges. */
- BUILD_BUG_ON(ksize <= 0);
- BUILD_BUG_ON(ksize < ksize_min);
- if (usize < ksize_min)
return -EINVAL;
- if (usize > PAGE_SIZE)
return -E2BIG;
- /* Copies user buffer and fills with zeros. */
- return copy_struct_from_user(dst, ksize, src, usize);
+}
I still wish this was built into copy_struct_from_user(). :) But yes, this appears to be the way to protect one's self when using copy_struct_from_user().
+/**
- sys_landlock_create_ruleset - Create a new ruleset
- @attr: Pointer to a &struct landlock_ruleset_attr identifying the scope of
the new ruleset.
- @size: Size of the pointed &struct landlock_ruleset_attr (needed for
backward and forward compatibility).
- @flags: Must be 0.
- This system call enables to create a new Landlock ruleset, and returns the
- related file descriptor on success.
- Possible returned errors are:
- EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;
- EINVAL: @flags is not 0, or unknown access, or too small @size;
- E2BIG or EFAULT: @attr or @size inconsistencies;
- ENOMSG: empty &landlock_ruleset_attr.handled_access_fs.
- */
+SYSCALL_DEFINE3(landlock_create_ruleset,
const struct landlock_ruleset_attr __user *const, attr,
const size_t, size, const __u32, flags)
+{
- struct landlock_ruleset_attr ruleset_attr;
- struct landlock_ruleset *ruleset;
- int err, ruleset_fd;
- /* Build-time checks. */
- build_check_abi();
- if (!landlock_initialized)
return -EOPNOTSUPP;
- /* No flag for now. */
- if (flags)
return -EINVAL;
- /* Copies raw user space buffer. */
- err = copy_min_struct_from_user(&ruleset_attr, sizeof(ruleset_attr),
offsetofend(typeof(ruleset_attr), handled_access_fs),
The use of offsetofend() here appears to be kind of the "V1", "V2", ... sizes used in other extensible syscall implementations?
ruleset_attr is an extensible argument.
attr, size);
- if (err)
return err;
- /* Checks content (and 32-bits cast). */
- if ((ruleset_attr.handled_access_fs | LANDLOCK_MASK_ACCESS_FS) !=
LANDLOCK_MASK_ACCESS_FS)
return -EINVAL;
- /* Checks arguments and transforms to kernel struct. */
- ruleset = landlock_create_ruleset(ruleset_attr.handled_access_fs);
- if (IS_ERR(ruleset))
return PTR_ERR(ruleset);
- /* Creates anonymous FD referring to the ruleset. */
- ruleset_fd = anon_inode_getfd("landlock-ruleset", &ruleset_fops,
ruleset, O_RDWR | O_CLOEXEC);
- if (ruleset_fd < 0)
landlock_put_ruleset(ruleset);
- return ruleset_fd;
+}
+/*
- Returns an owned ruleset from a FD. It is thus needed to call
- landlock_put_ruleset() on the return value.
- */
+static struct landlock_ruleset *get_ruleset_from_fd(const int fd,
const fmode_t mode)
+{
- struct fd ruleset_f;
- struct landlock_ruleset *ruleset;
- ruleset_f = fdget(fd);
- if (!ruleset_f.file)
return ERR_PTR(-EBADF);
- /* Checks FD type and access right. */
- if (ruleset_f.file->f_op != &ruleset_fops) {
ruleset = ERR_PTR(-EBADFD);
goto out_fdput;
- }
- if (!(ruleset_f.file->f_mode & mode)) {
ruleset = ERR_PTR(-EPERM);
goto out_fdput;
- }
- ruleset = ruleset_f.file->private_data;
- if (WARN_ON_ONCE(ruleset->num_layers != 1)) {
ruleset = ERR_PTR(-EINVAL);
goto out_fdput;
- }
- landlock_get_ruleset(ruleset);
+out_fdput:
- fdput(ruleset_f);
- return ruleset;
+}
+/* Path handling */
+/*
- @path: Must call put_path(@path) after the call if it succeeded.
- */
+static int get_path_from_fd(const s32 fd, struct path *const path) +{
- struct fd f;
- int err = 0;
- BUILD_BUG_ON(!__same_type(fd,
((struct landlock_path_beneath_attr *)NULL)->parent_fd));
- /* Handles O_PATH. */
- f = fdget_raw(fd);
- if (!f.file)
return -EBADF;
- /*
* Only allows O_PATH file descriptor: enables to restrict ambient
* filesystem access without requiring to open and risk leaking or
* misusing a file descriptor. Forbids ruleset FDs, internal
* filesystems (e.g. nsfs), including pseudo filesystems that will
* never be mountable (e.g. sockfs, pipefs).
*/
- if (!(f.file->f_mode & FMODE_PATH) ||
(f.file->f_op == &ruleset_fops) ||
(f.file->f_path.mnt->mnt_flags & MNT_INTERNAL) ||
(f.file->f_path.dentry->d_sb->s_flags & SB_NOUSER) ||
d_is_negative(f.file->f_path.dentry) ||
IS_PRIVATE(d_backing_inode(f.file->f_path.dentry))) {
err = -EBADFD;
goto out_fdput;
- }
- *path = f.file->f_path;
- path_get(path);
+out_fdput:
- fdput(f);
- return err;
+}
+/**
- sys_landlock_add_rule - Add a new rule to a ruleset
- @ruleset_fd: File descriptor tied to the ruleset that should be extended
with the new rule.
- @rule_type: Identify the structure type pointed to by @rule_attr (only
LANDLOCK_RULE_PATH_BENEATH for now).
- @rule_attr: Pointer to a rule (only of type &struct
landlock_path_beneath_attr for now).
- @flags: Must be 0.
- This system call enables to define a new rule and add it to an existing
- ruleset.
- Possible returned errors are:
- EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;
- EINVAL: @flags is not 0, or inconsistent access in the rule (i.e.
- &landlock_path_beneath_attr.allowed_access is not a subset of the rule's
- accesses);
- ENOMSG: Empty accesses (e.g. &landlock_path_beneath_attr.allowed_access);
- EBADF: @ruleset_fd is not a file descriptor for the current thread, or a
- member of @rule_attr is not a file descriptor as expected;
- EBADFD: @ruleset_fd is not a ruleset file descriptor, or a member of
- @rule_attr is not the expected file descriptor type (e.g. file open
- without O_PATH);
- EPERM: @ruleset_fd has no write access to the underlying ruleset;
- EFAULT: @rule_attr inconsistency.
- */
+SYSCALL_DEFINE4(landlock_add_rule,
const int, ruleset_fd, const enum landlock_rule_type, rule_type,
const void __user *const, rule_attr, const __u32, flags)
+{
If this is an extensible syscall, I'd expect a structure holding rule_type, rule_attr, and flags.
This does not use an extensible argument. rule_type specifies a type and rule_attr specifies the content of this type. This is simpler and sufficient.
- struct landlock_path_beneath_attr path_beneath_attr;
- struct path path;
- struct landlock_ruleset *ruleset;
- int res, err;
- if (!landlock_initialized)
return -EOPNOTSUPP;
- /* No flag for now. */
- if (flags)
return -EINVAL;
- if (rule_type != LANDLOCK_RULE_PATH_BENEATH)
return -EINVAL;
- /* Copies raw user space buffer, only one type for now. */
- res = copy_from_user(&path_beneath_attr, rule_attr,
sizeof(path_beneath_attr));
- if (res)
return -EFAULT;
- /* Gets and checks the ruleset. */
- ruleset = get_ruleset_from_fd(ruleset_fd, FMODE_CAN_WRITE);
- if (IS_ERR(ruleset))
return PTR_ERR(ruleset);
- /*
* Informs about useless rule: empty allowed_access (i.e. deny rules)
* are ignored in path walks.
*/
- if (!path_beneath_attr.allowed_access) {
err = -ENOMSG;
goto out_put_ruleset;
- }
- /*
* Checks that allowed_access matches the @ruleset constraints
* (ruleset->fs_access_masks[0] is automatically upgraded to 64-bits).
*/
- if ((path_beneath_attr.allowed_access | ruleset->fs_access_masks[0]) !=
ruleset->fs_access_masks[0]) {
err = -EINVAL;
goto out_put_ruleset;
- }
- /* Gets and checks the new rule. */
- err = get_path_from_fd(path_beneath_attr.parent_fd, &path);
- if (err)
goto out_put_ruleset;
- /* Imports the new rule. */
- err = landlock_append_fs_rule(ruleset, &path,
path_beneath_attr.allowed_access);
- path_put(&path);
+out_put_ruleset:
- landlock_put_ruleset(ruleset);
- return err;
+}
+/* Enforcement */
+/**
- sys_landlock_restrict_self - Enforce a ruleset on the calling thread
- @ruleset_fd: File descriptor tied to the ruleset to merge with the target.
- @flags: Must be 0.
- This system call enables to enforce a Landlock ruleset on the current
- thread. Enforcing a ruleset requires that the task has CAP_SYS_ADMIN in its
- namespace or is running with no_new_privs. This avoids scenarios where
- unprivileged tasks can affect the behavior of privileged children.
- Possible returned errors are:
- EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;
- EINVAL: @flags is not 0.
- EBADF: @ruleset_fd is not a file descriptor for the current thread;
- EBADFD: @ruleset_fd is not a ruleset file descriptor;
- EPERM: @ruleset_fd has no read access to the underlying ruleset, or the
- current thread is not running with no_new_privs, or it doesn't have
- CAP_SYS_ADMIN in its namespace.
- E2BIG: The maximum number of stacked rulesets is reached for the current
- thread.
- */
+SYSCALL_DEFINE2(landlock_restrict_self,
const int, ruleset_fd, const __u32, flags)
+{
Same observation about new style syscalls.
This design is simpler.
- struct landlock_ruleset *new_dom, *ruleset;
- struct cred *new_cred;
- struct landlock_cred_security *new_llcred;
- int err;
- if (!landlock_initialized)
return -EOPNOTSUPP;
- /* No flag for now. */
- if (flags)
return -EINVAL;
- /*
* Similar checks as for seccomp(2), except that an -EPERM may be
* returned.
*/
- if (!task_no_new_privs(current) &&
!ns_capable_noaudit(current_user_ns(), CAP_SYS_ADMIN))
return -EPERM;
- /* Gets and checks the ruleset. */
- ruleset = get_ruleset_from_fd(ruleset_fd, FMODE_CAN_READ);
- if (IS_ERR(ruleset))
return PTR_ERR(ruleset);
- /* Prepares new credentials. */
- new_cred = prepare_creds();
- if (!new_cred) {
err = -ENOMEM;
goto out_put_ruleset;
- }
- new_llcred = landlock_cred(new_cred);
- /*
* There is no possible race condition while copying and manipulating
* the current credentials because they are dedicated per thread.
*/
- new_dom = landlock_merge_ruleset(new_llcred->domain, ruleset);
- if (IS_ERR(new_dom)) {
err = PTR_ERR(new_dom);
goto out_put_creds;
- }
- /* Replaces the old (prepared) domain. */
- landlock_put_ruleset(new_llcred->domain);
- new_llcred->domain = new_dom;
- landlock_put_ruleset(ruleset);
- return commit_creds(new_cred);
+out_put_creds:
- abort_creds(new_cred);
+out_put_ruleset:
- landlock_put_ruleset(ruleset);
- return err;
+}
2.30.2
On 19/03/2021 22:53, Mickaël Salaün wrote:
On 19/03/2021 20:06, Kees Cook wrote:
On Tue, Mar 16, 2021 at 09:42:48PM +0100, Mickaël Salaün wrote:
From: Mickaël Salaün mic@linux.microsoft.com
[...]
+/**
- sys_landlock_create_ruleset - Create a new ruleset
- @attr: Pointer to a &struct landlock_ruleset_attr identifying the scope of
the new ruleset.
- @size: Size of the pointed &struct landlock_ruleset_attr (needed for
backward and forward compatibility).
- @flags: Must be 0.
- This system call enables to create a new Landlock ruleset, and returns the
- related file descriptor on success.
- Possible returned errors are:
- EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;
- EINVAL: @flags is not 0, or unknown access, or too small @size;
- E2BIG or EFAULT: @attr or @size inconsistencies;
- ENOMSG: empty &landlock_ruleset_attr.handled_access_fs.
- */
+SYSCALL_DEFINE3(landlock_create_ruleset,
const struct landlock_ruleset_attr __user *const, attr,
const size_t, size, const __u32, flags)
+{
- struct landlock_ruleset_attr ruleset_attr;
- struct landlock_ruleset *ruleset;
- int err, ruleset_fd;
- /* Build-time checks. */
- build_check_abi();
- if (!landlock_initialized)
return -EOPNOTSUPP;
- /* No flag for now. */
- if (flags)
return -EINVAL;
- /* Copies raw user space buffer. */
- err = copy_min_struct_from_user(&ruleset_attr, sizeof(ruleset_attr),
offsetofend(typeof(ruleset_attr), handled_access_fs),
The use of offsetofend() here appears to be kind of the "V1", "V2", ... sizes used in other extensible syscall implementations?
ruleset_attr is an extensible argument.
offsetofen() is used to set the minimum size of a valid argument. This code will then not change with future extended ruleset_attr.
From: Mickaël Salaün mic@linux.microsoft.com
Wire up the following system calls for all architectures: * landlock_create_ruleset(2) * landlock_add_rule(2) * landlock_restrict_self(2)
Cc: Arnd Bergmann arnd@arndb.de Cc: James Morris jmorris@namei.org Cc: Jann Horn jannh@google.com Cc: Kees Cook keescook@chromium.org Cc: Serge E. Hallyn serge@hallyn.com Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Link: https://lore.kernel.org/r/20210316204252.427806-10-mic@digikod.net ---
Changes since v29: * Rebase on v5.12-rc3 and fix trivial conflict with mount_setattr(2). * Synchronize syscall numbers with -next, which are the same as for v5.12-rc3.
Changes since v27: * Rename landlock_enforce_ruleset_self(2) to landlock_restrict_self(2). * Cosmetic fix: align TBL enries.
Changes since v26: * Rename landlock_enforce_ruleset_current(2) to landlock_enforce_ruleset_self(2).
Changes since v25: * Rebase and leave space for the new epoll_pwait2(2) and memfd_secret(2) from -next.
Changes since v21: * Rebase and leave space for watch_mount(2) from -next.
Changes since v20: * Remove landlock_get_features(2). * Decrease syscall numbers to stick to process_madvise(2) in -next. * Rename landlock_enforce_ruleset(2) to landlock_enforce_ruleset_current(2).
Changes since v19: * Increase syscall numbers by 4 to leave space for new ones (in linux-next): watch_mount(2), watch_sb(2), fsinfo(2) and process_madvise(2) (requested by Arnd Bergmann). * Replace the previous multiplexor landlock(2) with 4 syscalls: landlock_get_features(2), landlock_create_ruleset(2), landlock_add_rule(2) and landlock_enforce_ruleset(2).
Changes since v18: * Increase the syscall number because of the new faccessat2(2).
Changes since v14: * Add all architectures.
Changes since v13: * New implementation. --- arch/alpha/kernel/syscalls/syscall.tbl | 3 +++ arch/arm/tools/syscall.tbl | 3 +++ arch/arm64/include/asm/unistd.h | 2 +- arch/arm64/include/asm/unistd32.h | 6 ++++++ arch/ia64/kernel/syscalls/syscall.tbl | 3 +++ arch/m68k/kernel/syscalls/syscall.tbl | 3 +++ arch/microblaze/kernel/syscalls/syscall.tbl | 3 +++ arch/mips/kernel/syscalls/syscall_n32.tbl | 3 +++ arch/mips/kernel/syscalls/syscall_n64.tbl | 3 +++ arch/mips/kernel/syscalls/syscall_o32.tbl | 3 +++ arch/parisc/kernel/syscalls/syscall.tbl | 3 +++ arch/powerpc/kernel/syscalls/syscall.tbl | 3 +++ arch/s390/kernel/syscalls/syscall.tbl | 3 +++ arch/sh/kernel/syscalls/syscall.tbl | 3 +++ arch/sparc/kernel/syscalls/syscall.tbl | 3 +++ arch/x86/entry/syscalls/syscall_32.tbl | 3 +++ arch/x86/entry/syscalls/syscall_64.tbl | 3 +++ arch/xtensa/kernel/syscalls/syscall.tbl | 3 +++ include/uapi/asm-generic/unistd.h | 8 +++++++- 19 files changed, 62 insertions(+), 2 deletions(-)
diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index 02f0244e005c..0924a3ac7bd9 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -482,3 +482,6 @@ 550 common process_madvise sys_process_madvise 551 common epoll_pwait2 sys_epoll_pwait2 552 common mount_setattr sys_mount_setattr +553 common landlock_create_ruleset sys_landlock_create_ruleset +554 common landlock_add_rule sys_landlock_add_rule +555 common landlock_restrict_self sys_landlock_restrict_self diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index dcc1191291a2..dc1134b34cea 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -456,3 +456,6 @@ 440 common process_madvise sys_process_madvise 441 common epoll_pwait2 sys_epoll_pwait2 442 common mount_setattr sys_mount_setattr +443 common landlock_create_ruleset sys_landlock_create_ruleset +444 common landlock_add_rule sys_landlock_add_rule +445 common landlock_restrict_self sys_landlock_restrict_self diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h index 949788f5ba40..d1cc2849dc00 100644 --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -38,7 +38,7 @@ #define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5) #define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)
-#define __NR_compat_syscalls 443 +#define __NR_compat_syscalls 446 #endif
#define __ARCH_WANT_SYS_CLONE diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index 3d874f624056..54e11bce7677 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -893,6 +893,12 @@ __SYSCALL(__NR_process_madvise, sys_process_madvise) __SYSCALL(__NR_epoll_pwait2, compat_sys_epoll_pwait2) #define __NR_mount_setattr 442 __SYSCALL(__NR_mount_setattr, sys_mount_setattr) +#define __NR_landlock_create_ruleset 443 +__SYSCALL(__NR_landlock_create_ruleset, sys_landlock_create_ruleset) +#define __NR_landlock_add_rule 444 +__SYSCALL(__NR_landlock_add_rule, sys_landlock_add_rule) +#define __NR_landlock_restrict_self 445 +__SYSCALL(__NR_landlock_restrict_self, sys_landlock_restrict_self)
/* * Please add new compat syscalls above this comment and update diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl index d89231166e19..1bb35159561a 100644 --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -363,3 +363,6 @@ 440 common process_madvise sys_process_madvise 441 common epoll_pwait2 sys_epoll_pwait2 442 common mount_setattr sys_mount_setattr +443 common landlock_create_ruleset sys_landlock_create_ruleset +444 common landlock_add_rule sys_landlock_add_rule +445 common landlock_restrict_self sys_landlock_restrict_self diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl index 72bde6707dd3..e06e224523bb 100644 --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -442,3 +442,6 @@ 440 common process_madvise sys_process_madvise 441 common epoll_pwait2 sys_epoll_pwait2 442 common mount_setattr sys_mount_setattr +443 common landlock_create_ruleset sys_landlock_create_ruleset +444 common landlock_add_rule sys_landlock_add_rule +445 common landlock_restrict_self sys_landlock_restrict_self diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl index d603a5ec9338..9994a43eafb2 100644 --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -448,3 +448,6 @@ 440 common process_madvise sys_process_madvise 441 common epoll_pwait2 sys_epoll_pwait2 442 common mount_setattr sys_mount_setattr +443 common landlock_create_ruleset sys_landlock_create_ruleset +444 common landlock_add_rule sys_landlock_add_rule +445 common landlock_restrict_self sys_landlock_restrict_self diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index 8fd8c1790941..834333d84d3e 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -381,3 +381,6 @@ 440 n32 process_madvise sys_process_madvise 441 n32 epoll_pwait2 compat_sys_epoll_pwait2 442 n32 mount_setattr sys_mount_setattr +443 n32 landlock_create_ruleset sys_landlock_create_ruleset +444 n32 landlock_add_rule sys_landlock_add_rule +445 n32 landlock_restrict_self sys_landlock_restrict_self diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl index 169f21438065..935024e0f49b 100644 --- a/arch/mips/kernel/syscalls/syscall_n64.tbl +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl @@ -357,3 +357,6 @@ 440 n64 process_madvise sys_process_madvise 441 n64 epoll_pwait2 sys_epoll_pwait2 442 n64 mount_setattr sys_mount_setattr +443 n64 landlock_create_ruleset sys_landlock_create_ruleset +444 n64 landlock_add_rule sys_landlock_add_rule +445 n64 landlock_restrict_self sys_landlock_restrict_self diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 090d29ca80ff..f3f8bea8ce99 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -430,3 +430,6 @@ 440 o32 process_madvise sys_process_madvise 441 o32 epoll_pwait2 sys_epoll_pwait2 compat_sys_epoll_pwait2 442 o32 mount_setattr sys_mount_setattr +443 o32 landlock_create_ruleset sys_landlock_create_ruleset +444 o32 landlock_add_rule sys_landlock_add_rule +445 o32 landlock_restrict_self sys_landlock_restrict_self diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index 271a92519683..1bddfeffdebd 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -440,3 +440,6 @@ 440 common process_madvise sys_process_madvise 441 common epoll_pwait2 sys_epoll_pwait2 compat_sys_epoll_pwait2 442 common mount_setattr sys_mount_setattr +443 common landlock_create_ruleset sys_landlock_create_ruleset +444 common landlock_add_rule sys_landlock_add_rule +445 common landlock_restrict_self sys_landlock_restrict_self diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 0b2480cf3e47..98548b8da879 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -522,3 +522,6 @@ 440 common process_madvise sys_process_madvise 441 common epoll_pwait2 sys_epoll_pwait2 compat_sys_epoll_pwait2 442 common mount_setattr sys_mount_setattr +443 common landlock_create_ruleset sys_landlock_create_ruleset +444 common landlock_add_rule sys_landlock_add_rule +445 common landlock_restrict_self sys_landlock_restrict_self diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index 3abef2144dac..ecb697fee2f3 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl +++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -445,3 +445,6 @@ 440 common process_madvise sys_process_madvise sys_process_madvise 441 common epoll_pwait2 sys_epoll_pwait2 compat_sys_epoll_pwait2 442 common mount_setattr sys_mount_setattr sys_mount_setattr +443 common landlock_create_ruleset sys_landlock_create_ruleset sys_landlock_create_ruleset +444 common landlock_add_rule sys_landlock_add_rule sys_landlock_add_rule +445 common landlock_restrict_self sys_landlock_restrict_self sys_landlock_restrict_self diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl index d08eebad6b7f..440c053eada5 100644 --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -445,3 +445,6 @@ 440 common process_madvise sys_process_madvise 441 common epoll_pwait2 sys_epoll_pwait2 442 common mount_setattr sys_mount_setattr +443 common landlock_create_ruleset sys_landlock_create_ruleset +444 common landlock_add_rule sys_landlock_add_rule +445 common landlock_restrict_self sys_landlock_restrict_self diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index 84403a99039c..f5f5d165c8c1 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -488,3 +488,6 @@ 440 common process_madvise sys_process_madvise 441 common epoll_pwait2 sys_epoll_pwait2 compat_sys_epoll_pwait2 442 common mount_setattr sys_mount_setattr +443 common landlock_create_ruleset sys_landlock_create_ruleset +444 common landlock_add_rule sys_landlock_add_rule +445 common landlock_restrict_self sys_landlock_restrict_self diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index a1c9f496fca6..995dc5b46dfc 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -447,3 +447,6 @@ 440 i386 process_madvise sys_process_madvise 441 i386 epoll_pwait2 sys_epoll_pwait2 compat_sys_epoll_pwait2 442 i386 mount_setattr sys_mount_setattr +443 i386 landlock_create_ruleset sys_landlock_create_ruleset +444 i386 landlock_add_rule sys_landlock_add_rule +445 i386 landlock_restrict_self sys_landlock_restrict_self diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 7bf01cbe582f..a5207ee2b67b 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -364,6 +364,9 @@ 440 common process_madvise sys_process_madvise 441 common epoll_pwait2 sys_epoll_pwait2 442 common mount_setattr sys_mount_setattr +443 common landlock_create_ruleset sys_landlock_create_ruleset +444 common landlock_add_rule sys_landlock_add_rule +445 common landlock_restrict_self sys_landlock_restrict_self
# # Due to a historical design error, certain syscalls are numbered differently diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl index 365a9b849224..b43b96a862cd 100644 --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -413,3 +413,6 @@ 440 common process_madvise sys_process_madvise 441 common epoll_pwait2 sys_epoll_pwait2 442 common mount_setattr sys_mount_setattr +443 common landlock_create_ruleset sys_landlock_create_ruleset +444 common landlock_add_rule sys_landlock_add_rule +445 common landlock_restrict_self sys_landlock_restrict_self diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index ce58cff99b66..02d8d7804a29 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -863,9 +863,15 @@ __SYSCALL(__NR_process_madvise, sys_process_madvise) __SC_COMP(__NR_epoll_pwait2, sys_epoll_pwait2, compat_sys_epoll_pwait2) #define __NR_mount_setattr 442 __SYSCALL(__NR_mount_setattr, sys_mount_setattr) +#define __NR_landlock_create_ruleset 443 +__SYSCALL(__NR_landlock_create_ruleset, sys_landlock_create_ruleset) +#define __NR_landlock_add_rule 444 +__SYSCALL(__NR_landlock_add_rule, sys_landlock_add_rule) +#define __NR_landlock_restrict_self 445 +__SYSCALL(__NR_landlock_restrict_self, sys_landlock_restrict_self)
#undef __NR_syscalls -#define __NR_syscalls 443 +#define __NR_syscalls 446
/* * 32 bit systems traditionally used different
From: Mickaël Salaün mic@linux.microsoft.com
Add a basic sandbox tool to launch a command which can only access a list of file hierarchies in a read-only or read-write way.
Cc: James Morris jmorris@namei.org Cc: Kees Cook keescook@chromium.org Cc: Serge E. Hallyn serge@hallyn.com Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Reviewed-by: Jann Horn jannh@google.com Link: https://lore.kernel.org/r/20210316204252.427806-12-mic@digikod.net ---
Changes since v28: * Simplify Kconfig option title.
Changes since v27: * Add samples/landlock/ to MAINTAINERS. * Update landlock_restrict_self(2). * Tweak Kconfig title and description.
Changes since v25: * Improve comments and fix help (suggested by Jann Horn). * Add a safeguard for errno check (suggested by Jann Horn). * Allows users to not use all possible restrictions (e.g. use LL_FS_RO without LL_FS_RW). * Update syscall names. * Improve Makefile: - Replace hostprogs/always-y with userprogs-always-y, available since commit faabed295ccc ("kbuild: introduce hostprogs-always-y and userprogs-always-y"). - Depends on CC_CAN_LINK. * Add Reviewed-by Jann Horn.
Changes since v25: * Remove useless errno set in the syscall wrappers. * Cosmetic variable renames.
Changes since v23: * Re-add hints to help users understand the required kernel configuration. This was removed with the removal of landlock_get_features(2).
Changes since v21: * Remove LANDLOCK_ACCESS_FS_CHROOT. * Clean up help.
Changes since v20: * Update with new syscalls and type names. * Update errno check for EOPNOTSUPP. * Use the full syscall interfaces: explicitely set the "flags" field to zero.
Changes since v19: * Update with the new Landlock syscalls. * Comply with commit 5f2fb52fac15 ("kbuild: rename hostprogs-y/always to hostprogs/always-y").
Changes since v16: * Switch syscall attribute pointer and size arguments.
Changes since v15: * Update access right names. * Properly assign access right to files according to the new related syscall restriction. * Replace "select" with "depends on" HEADERS_INSTALL (suggested by Randy Dunlap).
Changes since v14: * Fix Kconfig dependency. * Remove access rights that may be required for FD-only requests: mmap, truncate, getattr, lock, chmod, chown, chgrp, ioctl. * Fix useless hardcoded syscall number. * Use execvpe(). * Follow symlinks. * Extend help with common file paths. * Constify variables. * Clean up comments. * Improve error message.
Changes since v11: * Add back the filesystem sandbox manager and update it to work with the new Landlock syscall.
Previous changes: https://lore.kernel.org/lkml/20190721213116.23476-9-mic@digikod.net/ --- MAINTAINERS | 1 + samples/Kconfig | 7 ++ samples/Makefile | 1 + samples/landlock/.gitignore | 1 + samples/landlock/Makefile | 13 ++ samples/landlock/sandboxer.c | 238 +++++++++++++++++++++++++++++++++++ 6 files changed, 261 insertions(+) create mode 100644 samples/landlock/.gitignore create mode 100644 samples/landlock/Makefile create mode 100644 samples/landlock/sandboxer.c
diff --git a/MAINTAINERS b/MAINTAINERS index 8cab5854844e..88175ed1f315 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -10004,6 +10004,7 @@ S: Supported W: https://landlock.io T: git https://github.com/landlock-lsm/linux.git F: include/uapi/linux/landlock.h +F: samples/landlock/ F: security/landlock/ F: tools/testing/selftests/landlock/ K: landlock diff --git a/samples/Kconfig b/samples/Kconfig index e76cdfc50e25..b5a1a7aa7e23 100644 --- a/samples/Kconfig +++ b/samples/Kconfig @@ -124,6 +124,13 @@ config SAMPLE_HIDRAW bool "hidraw sample" depends on CC_CAN_LINK && HEADERS_INSTALL
+config SAMPLE_LANDLOCK + bool "Landlock example" + depends on CC_CAN_LINK && HEADERS_INSTALL + help + Build a simple Landlock sandbox manager able to start a process + restricted by a user-defined filesystem access control policy. + config SAMPLE_PIDFD bool "pidfd sample" depends on CC_CAN_LINK && HEADERS_INSTALL diff --git a/samples/Makefile b/samples/Makefile index c3392a595e4b..087e0988ccc5 100644 --- a/samples/Makefile +++ b/samples/Makefile @@ -11,6 +11,7 @@ obj-$(CONFIG_SAMPLE_KDB) += kdb/ obj-$(CONFIG_SAMPLE_KFIFO) += kfifo/ obj-$(CONFIG_SAMPLE_KOBJECT) += kobject/ obj-$(CONFIG_SAMPLE_KPROBES) += kprobes/ +subdir-$(CONFIG_SAMPLE_LANDLOCK) += landlock obj-$(CONFIG_SAMPLE_LIVEPATCH) += livepatch/ subdir-$(CONFIG_SAMPLE_PIDFD) += pidfd obj-$(CONFIG_SAMPLE_QMI_CLIENT) += qmi/ diff --git a/samples/landlock/.gitignore b/samples/landlock/.gitignore new file mode 100644 index 000000000000..f43668b2d318 --- /dev/null +++ b/samples/landlock/.gitignore @@ -0,0 +1 @@ +/sandboxer diff --git a/samples/landlock/Makefile b/samples/landlock/Makefile new file mode 100644 index 000000000000..5d601e51c2eb --- /dev/null +++ b/samples/landlock/Makefile @@ -0,0 +1,13 @@ +# SPDX-License-Identifier: BSD-3-Clause + +userprogs-always-y := sandboxer + +userccflags += -I usr/include + +.PHONY: all clean + +all: + $(MAKE) -C ../.. samples/landlock/ + +clean: + $(MAKE) -C ../.. M=samples/landlock/ clean diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c new file mode 100644 index 000000000000..7a15910d2171 --- /dev/null +++ b/samples/landlock/sandboxer.c @@ -0,0 +1,238 @@ +// SPDX-License-Identifier: BSD-3-Clause +/* + * Simple Landlock sandbox manager able to launch a process restricted by a + * user-defined filesystem access control policy. + * + * Copyright © 2017-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2020 ANSSI + */ + +#define _GNU_SOURCE +#include <errno.h> +#include <fcntl.h> +#include <linux/landlock.h> +#include <linux/prctl.h> +#include <stddef.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/prctl.h> +#include <sys/stat.h> +#include <sys/syscall.h> +#include <unistd.h> + +#ifndef landlock_create_ruleset +static inline int landlock_create_ruleset( + const struct landlock_ruleset_attr *const attr, + const size_t size, const __u32 flags) +{ + return syscall(__NR_landlock_create_ruleset, attr, size, flags); +} +#endif + +#ifndef landlock_add_rule +static inline int landlock_add_rule(const int ruleset_fd, + const enum landlock_rule_type rule_type, + const void *const rule_attr, const __u32 flags) +{ + return syscall(__NR_landlock_add_rule, ruleset_fd, rule_type, + rule_attr, flags); +} +#endif + +#ifndef landlock_restrict_self +static inline int landlock_restrict_self(const int ruleset_fd, + const __u32 flags) +{ + return syscall(__NR_landlock_restrict_self, ruleset_fd, flags); +} +#endif + +#define ENV_FS_RO_NAME "LL_FS_RO" +#define ENV_FS_RW_NAME "LL_FS_RW" +#define ENV_PATH_TOKEN ":" + +static int parse_path(char *env_path, const char ***const path_list) +{ + int i, num_paths = 0; + + if (env_path) { + num_paths++; + for (i = 0; env_path[i]; i++) { + if (env_path[i] == ENV_PATH_TOKEN[0]) + num_paths++; + } + } + *path_list = malloc(num_paths * sizeof(**path_list)); + for (i = 0; i < num_paths; i++) + (*path_list)[i] = strsep(&env_path, ENV_PATH_TOKEN); + + return num_paths; +} + +#define ACCESS_FILE ( \ + LANDLOCK_ACCESS_FS_EXECUTE | \ + LANDLOCK_ACCESS_FS_WRITE_FILE | \ + LANDLOCK_ACCESS_FS_READ_FILE) + +static int populate_ruleset( + const char *const env_var, const int ruleset_fd, + const __u64 allowed_access) +{ + int num_paths, i, ret = 1; + char *env_path_name; + const char **path_list = NULL; + struct landlock_path_beneath_attr path_beneath = { + .parent_fd = -1, + }; + + env_path_name = getenv(env_var); + if (!env_path_name) { + /* Prevents users to forget a setting. */ + fprintf(stderr, "Missing environment variable %s\n", env_var); + return 1; + } + env_path_name = strdup(env_path_name); + unsetenv(env_var); + num_paths = parse_path(env_path_name, &path_list); + if (num_paths == 1 && path_list[0][0] == '\0') { + /* + * Allows to not use all possible restrictions (e.g. use + * LL_FS_RO without LL_FS_RW). + */ + ret = 0; + goto out_free_name; + } + + for (i = 0; i < num_paths; i++) { + struct stat statbuf; + + path_beneath.parent_fd = open(path_list[i], O_PATH | + O_CLOEXEC); + if (path_beneath.parent_fd < 0) { + fprintf(stderr, "Failed to open "%s": %s\n", + path_list[i], + strerror(errno)); + goto out_free_name; + } + if (fstat(path_beneath.parent_fd, &statbuf)) { + close(path_beneath.parent_fd); + goto out_free_name; + } + path_beneath.allowed_access = allowed_access; + if (!S_ISDIR(statbuf.st_mode)) + path_beneath.allowed_access &= ACCESS_FILE; + if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH, + &path_beneath, 0)) { + fprintf(stderr, "Failed to update the ruleset with "%s": %s\n", + path_list[i], strerror(errno)); + close(path_beneath.parent_fd); + goto out_free_name; + } + close(path_beneath.parent_fd); + } + ret = 0; + +out_free_name: + free(env_path_name); + return ret; +} + +#define ACCESS_FS_ROUGHLY_READ ( \ + LANDLOCK_ACCESS_FS_EXECUTE | \ + LANDLOCK_ACCESS_FS_READ_FILE | \ + LANDLOCK_ACCESS_FS_READ_DIR) + +#define ACCESS_FS_ROUGHLY_WRITE ( \ + LANDLOCK_ACCESS_FS_WRITE_FILE | \ + LANDLOCK_ACCESS_FS_REMOVE_DIR | \ + LANDLOCK_ACCESS_FS_REMOVE_FILE | \ + LANDLOCK_ACCESS_FS_MAKE_CHAR | \ + LANDLOCK_ACCESS_FS_MAKE_DIR | \ + LANDLOCK_ACCESS_FS_MAKE_REG | \ + LANDLOCK_ACCESS_FS_MAKE_SOCK | \ + LANDLOCK_ACCESS_FS_MAKE_FIFO | \ + LANDLOCK_ACCESS_FS_MAKE_BLOCK | \ + LANDLOCK_ACCESS_FS_MAKE_SYM) + +int main(const int argc, char *const argv[], char *const *const envp) +{ + const char *cmd_path; + char *const *cmd_argv; + int ruleset_fd; + struct landlock_ruleset_attr ruleset_attr = { + .handled_access_fs = ACCESS_FS_ROUGHLY_READ | + ACCESS_FS_ROUGHLY_WRITE, + }; + + if (argc < 2) { + fprintf(stderr, "usage: %s="..." %s="..." %s <cmd> [args]...\n\n", + ENV_FS_RO_NAME, ENV_FS_RW_NAME, argv[0]); + fprintf(stderr, "Launch a command in a restricted environment.\n\n"); + fprintf(stderr, "Environment variables containing paths, " + "each separated by a colon:\n"); + fprintf(stderr, "* %s: list of paths allowed to be used in a read-only way.\n", + ENV_FS_RO_NAME); + fprintf(stderr, "* %s: list of paths allowed to be used in a read-write way.\n", + ENV_FS_RW_NAME); + fprintf(stderr, "\nexample:\n" + "%s="/bin:/lib:/usr:/proc:/etc:/dev/urandom" " + "%s="/dev/null:/dev/full:/dev/zero:/dev/pts:/tmp" " + "%s bash -i\n", + ENV_FS_RO_NAME, ENV_FS_RW_NAME, argv[0]); + return 1; + } + + ruleset_fd = landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0); + if (ruleset_fd < 0) { + const int err = errno; + + perror("Failed to create a ruleset"); + switch (err) { + case ENOSYS: + fprintf(stderr, "Hint: Landlock is not supported by the current kernel. " + "To support it, build the kernel with " + "CONFIG_SECURITY_LANDLOCK=y and prepend " + ""landlock," to the content of CONFIG_LSM.\n"); + break; + case EOPNOTSUPP: + fprintf(stderr, "Hint: Landlock is currently disabled. " + "It can be enabled in the kernel configuration by " + "prepending "landlock," to the content of CONFIG_LSM, " + "or at boot time by setting the same content to the " + ""lsm" kernel parameter.\n"); + break; + } + return 1; + } + if (populate_ruleset(ENV_FS_RO_NAME, ruleset_fd, + ACCESS_FS_ROUGHLY_READ)) { + goto err_close_ruleset; + } + if (populate_ruleset(ENV_FS_RW_NAME, ruleset_fd, + ACCESS_FS_ROUGHLY_READ | ACCESS_FS_ROUGHLY_WRITE)) { + goto err_close_ruleset; + } + if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) { + perror("Failed to restrict privileges"); + goto err_close_ruleset; + } + if (landlock_restrict_self(ruleset_fd, 0)) { + perror("Failed to enforce ruleset"); + goto err_close_ruleset; + } + close(ruleset_fd); + + cmd_path = argv[1]; + cmd_argv = argv + 1; + execvpe(cmd_path, cmd_argv, envp); + fprintf(stderr, "Failed to execute "%s": %s\n", cmd_path, + strerror(errno)); + fprintf(stderr, "Hint: access to the binary, the interpreter or " + "shared libraries may be denied.\n"); + return 1; + +err_close_ruleset: + close(ruleset_fd); + return 1; +}
On Tue, Mar 16, 2021 at 09:42:51PM +0100, Mickaël Salaün wrote:
From: Mickaël Salaün mic@linux.microsoft.com
Add a basic sandbox tool to launch a command which can only access a list of file hierarchies in a read-only or read-write way.
Cc: James Morris jmorris@namei.org Cc: Kees Cook keescook@chromium.org Cc: Serge E. Hallyn serge@hallyn.com Signed-off-by: Mickaël Salaün mic@linux.microsoft.com
I'm very happy to see any example!
Reviewed-by: Kees Cook keescook@chromium.org
From: Mickaël Salaün mic@linux.microsoft.com
This documentation can be built with the Sphinx framework.
Cc: James Morris jmorris@namei.org Cc: Jann Horn jannh@google.com Cc: Kees Cook keescook@chromium.org Cc: Serge E. Hallyn serge@hallyn.com Signed-off-by: Mickaël Salaün mic@linux.microsoft.com Reviewed-by: Vincent Dagonneau vincent.dagonneau@ssi.gouv.fr Link: https://lore.kernel.org/r/20210316204252.427806-13-mic@digikod.net ---
Changes since v28: * Reorder subsections by importance in the "Current limitations" section.
Changes since v27: * Update landlock_restrict_self(2). * Update date and copyright.
Changes since v25: * Explain the behavior of layered access rights. * Explain how bind mounts and overayfs mounts are handled by Landlock: merged overlayfs mount points have their own inodes, which makes these hierarchies independent from its upper and lower layers, unlike bind mounts which share the same inodes between the source hierarchy and the mount point hierarchy. New overlayfs mount and bind mount tests check these behaviors. * Synchronize with the new syscalls.c file and update syscall names. * Fix spelling. * Remove Reviewed-by Jann Horn because of the above changes.
Changes since v24: * Add Reviewed-by Jann Horn. * Add a paragraph to explain how the ruleset layers work. * Bump date.
Changes since v23: * Explain limitations for the maximum number of stacked ruleset, and the memory usage restrictions.
Changes since v22: * Fix spelling and remove obsolete sentence (spotted by Jann Horn). * Bump date.
Changes since v21: * Move the user space documentation to userspace-api/landlock.rst and the kernel documentation to security/landlock.rst . * Add license headers. * Add last update dates. * Update MAINTAINERS file. * Add (back) links to git.kernel.org . * Fix spelling.
Changes since v20: * Update examples and documentation with the new syscalls.
Changes since v19: * Update examples and documentation with the new syscalls.
Changes since v15: * Add current limitations.
Changes since v14: * Fix spelling (contributed by Randy Dunlap). * Extend documentation about inheritance and explain layer levels. * Remove the use of now-removed access rights. * Use GitHub links. * Improve kernel documentation. * Add section for tests. * Update example.
Changes since v13: * Rewrote the documentation according to the major revamp.
Previous changes: https://lore.kernel.org/lkml/20191104172146.30797-8-mic@digikod.net/ --- Documentation/security/index.rst | 1 + Documentation/security/landlock.rst | 79 ++++++ Documentation/userspace-api/index.rst | 1 + Documentation/userspace-api/landlock.rst | 307 +++++++++++++++++++++++ MAINTAINERS | 2 + 5 files changed, 390 insertions(+) create mode 100644 Documentation/security/landlock.rst create mode 100644 Documentation/userspace-api/landlock.rst
diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst index 8129405eb2cc..16335de04e8c 100644 --- a/Documentation/security/index.rst +++ b/Documentation/security/index.rst @@ -16,3 +16,4 @@ Security Documentation siphash tpm/index digsig + landlock diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst new file mode 100644 index 000000000000..ff145a661abd --- /dev/null +++ b/Documentation/security/landlock.rst @@ -0,0 +1,79 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. Copyright © 2017-2020 Mickaël Salaün mic@digikod.net +.. Copyright © 2019-2020 ANSSI + +================================== +Landlock LSM: kernel documentation +================================== + +:Author: Mickaël Salaün +:Date: February 2021 + +Landlock's goal is to create scoped access-control (i.e. sandboxing). To +harden a whole system, this feature should be available to any process, +including unprivileged ones. Because such process may be compromised or +backdoored (i.e. untrusted), Landlock's features must be safe to use from the +kernel and other processes point of view. Landlock's interface must therefore +expose a minimal attack surface. + +Landlock is designed to be usable by unprivileged processes while following the +system security policy enforced by other access control mechanisms (e.g. DAC, +LSM). Indeed, a Landlock rule shall not interfere with other access-controls +enforced on the system, only add more restrictions. + +Any user can enforce Landlock rulesets on their processes. They are merged and +evaluated according to the inherited ones in a way that ensures that only more +constraints can be added. + +User space documentation can be found here: :doc:`/userspace-api/landlock`. + +Guiding principles for safe access controls +=========================================== + +* A Landlock rule shall be focused on access control on kernel objects instead + of syscall filtering (i.e. syscall arguments), which is the purpose of + seccomp-bpf. +* To avoid multiple kinds of side-channel attacks (e.g. leak of security + policies, CPU-based attacks), Landlock rules shall not be able to + programmatically communicate with user space. +* Kernel access check shall not slow down access request from unsandboxed + processes. +* Computation related to Landlock operations (e.g. enforcing a ruleset) shall + only impact the processes requesting them. + +Tests +===== + +Userspace tests for backward compatibility, ptrace restrictions and filesystem +support can be found here: `tools/testing/selftests/landlock/`_. + +Kernel structures +================= + +Object +------ + +.. kernel-doc:: security/landlock/object.h + :identifiers: + +Ruleset and domain +------------------ + +A domain is a read-only ruleset tied to a set of subjects (i.e. tasks' +credentials). Each time a ruleset is enforced on a task, the current domain is +duplicated and the ruleset is imported as a new layer of rules in the new +domain. Indeed, once in a domain, each rule is tied to a layer level. To +grant access to an object, at least one rule of each layer must allow the +requested action on the object. A task can then only transit to a new domain +that is the intersection of the constraints from the current domain and those +of a ruleset provided by the task. + +The definition of a subject is implicit for a task sandboxing itself, which +makes the reasoning much easier and helps avoid pitfalls. + +.. kernel-doc:: security/landlock/ruleset.h + :identifiers: + +.. Links +.. _tools/testing/selftests/landlock/: + https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/... diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst index d29b020e5622..744c6491610c 100644 --- a/Documentation/userspace-api/index.rst +++ b/Documentation/userspace-api/index.rst @@ -18,6 +18,7 @@ place where this information is gathered.
no_new_privs seccomp_filter + landlock unshare spec_ctrl accelerators/ocxl diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst new file mode 100644 index 000000000000..affe8b1d9c6e --- /dev/null +++ b/Documentation/userspace-api/landlock.rst @@ -0,0 +1,307 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. Copyright © 2017-2020 Mickaël Salaün mic@digikod.net +.. Copyright © 2019-2020 ANSSI +.. Copyright © 2021 Microsoft Corporation + +===================================== +Landlock: unprivileged access control +===================================== + +:Author: Mickaël Salaün +:Date: February 2021 + +The goal of Landlock is to enable to restrict ambient rights (e.g. global +filesystem access) for a set of processes. Because Landlock is a stackable +LSM, it makes possible to create safe security sandboxes as new security layers +in addition to the existing system-wide access-controls. This kind of sandbox +is expected to help mitigate the security impact of bugs or +unexpected/malicious behaviors in user space applications. Landlock empowers +any process, including unprivileged ones, to securely restrict themselves. + +Landlock rules +============== + +A Landlock rule describes an action on an object. An object is currently a +file hierarchy, and the related filesystem actions are defined in `Access +rights`_. A set of rules is aggregated in a ruleset, which can then restrict +the thread enforcing it, and its future children. + +Defining and enforcing a security policy +---------------------------------------- + +We first need to create the ruleset that will contain our rules. For this +example, the ruleset will contain rules that only allow read actions, but write +actions will be denied. The ruleset then needs to handle both of these kind of +actions. + +.. code-block:: c + + int ruleset_fd; + struct landlock_ruleset_attr ruleset_attr = { + .handled_access_fs = + LANDLOCK_ACCESS_FS_EXECUTE | + LANDLOCK_ACCESS_FS_WRITE_FILE | + LANDLOCK_ACCESS_FS_READ_FILE | + LANDLOCK_ACCESS_FS_READ_DIR | + LANDLOCK_ACCESS_FS_REMOVE_DIR | + LANDLOCK_ACCESS_FS_REMOVE_FILE | + LANDLOCK_ACCESS_FS_MAKE_CHAR | + LANDLOCK_ACCESS_FS_MAKE_DIR | + LANDLOCK_ACCESS_FS_MAKE_REG | + LANDLOCK_ACCESS_FS_MAKE_SOCK | + LANDLOCK_ACCESS_FS_MAKE_FIFO | + LANDLOCK_ACCESS_FS_MAKE_BLOCK | + LANDLOCK_ACCESS_FS_MAKE_SYM, + }; + + ruleset_fd = landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0); + if (ruleset_fd < 0) { + perror("Failed to create a ruleset"); + return 1; + } + +We can now add a new rule to this ruleset thanks to the returned file +descriptor referring to this ruleset. The rule will only allow reading the +file hierarchy ``/usr``. Without another rule, write actions would then be +denied by the ruleset. To add ``/usr`` to the ruleset, we open it with the +``O_PATH`` flag and fill the &struct landlock_path_beneath_attr with this file +descriptor. + +.. code-block:: c + + int err; + struct landlock_path_beneath_attr path_beneath = { + .allowed_access = + LANDLOCK_ACCESS_FS_EXECUTE | + LANDLOCK_ACCESS_FS_READ_FILE | + LANDLOCK_ACCESS_FS_READ_DIR, + }; + + path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC); + if (path_beneath.parent_fd < 0) { + perror("Failed to open file"); + close(ruleset_fd); + return 1; + } + err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH, + &path_beneath, 0); + close(path_beneath.parent_fd); + if (err) { + perror("Failed to update ruleset"); + close(ruleset_fd); + return 1; + } + +We now have a ruleset with one rule allowing read access to ``/usr`` while +denying all other handled accesses for the filesystem. The next step is to +restrict the current thread from gaining more privileges (e.g. thanks to a SUID +binary). + +.. code-block:: c + + if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) { + perror("Failed to restrict privileges"); + close(ruleset_fd); + return 1; + } + +The current thread is now ready to sandbox itself with the ruleset. + +.. code-block:: c + + if (landlock_restrict_self(ruleset_fd, 0)) { + perror("Failed to enforce ruleset"); + close(ruleset_fd); + return 1; + } + close(ruleset_fd); + +If the `landlock_restrict_self` system call succeeds, the current thread is now +restricted and this policy will be enforced on all its subsequently created +children as well. Once a thread is landlocked, there is no way to remove its +security policy; only adding more restrictions is allowed. These threads are +now in a new Landlock domain, merge of their parent one (if any) with the new +ruleset. + +Full working code can be found in `samples/landlock/sandboxer.c`_. + +Layers of file path access rights +--------------------------------- + +Each time a thread enforces a ruleset on itself, it updates its Landlock domain +with a new layer of policy. Indeed, this complementary policy is stacked with +the potentially other rulesets already restricting this thread. A sandboxed +thread can then safely add more constraints to itself with a new enforced +ruleset. + +One policy layer grants access to a file path if at least one of its rules +encountered on the path grants the access. A sandboxed thread can only access +a file path if all its enforced policy layers grant the access as well as all +the other system access controls (e.g. filesystem DAC, other LSM policies, +etc.). + +Bind mounts and OverlayFS +------------------------- + +Landlock enables to restrict access to file hierarchies, which means that these +access rights can be propagated with bind mounts (cf. +:doc:`/filesystems/sharedsubtree`) but not with :doc:`/filesystems/overlayfs`. + +A bind mount mirrors a source file hierarchy to a destination. The destination +hierarchy is then composed of the exact same files, on which Landlock rules can +be tied, either via the source or the destination path. These rules restrict +access when they are encountered on a path, which means that they can restrict +access to multiple file hierarchies at the same time, whether these hierarchies +are the result of bind mounts or not. + +An OverlayFS mount point consists of upper and lower layers. These layers are +combined in a merge directory, result of the mount point. This merge hierarchy +may include files from the upper and lower layers, but modifications performed +on the merge hierarchy only reflects on the upper layer. From a Landlock +policy point of view, each OverlayFS layers and merge hierarchies are +standalone and contains their own set of files and directories, which is +different from bind mounts. A policy restricting an OverlayFS layer will not +restrict the resulted merged hierarchy, and vice versa. + +Inheritance +----------- + +Every new thread resulting from a :manpage:`clone(2)` inherits Landlock domain +restrictions from its parent. This is similar to the seccomp inheritance (cf. +:doc:`/userspace-api/seccomp_filter`) or any other LSM dealing with task's +:manpage:`credentials(7)`. For instance, one process's thread may apply +Landlock rules to itself, but they will not be automatically applied to other +sibling threads (unlike POSIX thread credential changes, cf. +:manpage:`nptl(7)`). + +When a thread sandboxes itself, we have the guarantee that the related security +policy will stay enforced on all this thread's descendants. This allows +creating standalone and modular security policies per application, which will +automatically be composed between themselves according to their runtime parent +policies. + +Ptrace restrictions +------------------- + +A sandboxed process has less privileges than a non-sandboxed process and must +then be subject to additional restrictions when manipulating another process. +To be allowed to use :manpage:`ptrace(2)` and related syscalls on a target +process, a sandboxed process should have a subset of the target process rules, +which means the tracee must be in a sub-domain of the tracer. + +Kernel interface +================ + +Access rights +------------- + +.. kernel-doc:: include/uapi/linux/landlock.h + :identifiers: fs_access + +Creating a new ruleset +---------------------- + +.. kernel-doc:: security/landlock/syscalls.c + :identifiers: sys_landlock_create_ruleset + +.. kernel-doc:: include/uapi/linux/landlock.h + :identifiers: landlock_ruleset_attr + +Extending a ruleset +------------------- + +.. kernel-doc:: security/landlock/syscalls.c + :identifiers: sys_landlock_add_rule + +.. kernel-doc:: include/uapi/linux/landlock.h + :identifiers: landlock_rule_type landlock_path_beneath_attr + +Enforcing a ruleset +------------------- + +.. kernel-doc:: security/landlock/syscalls.c + :identifiers: sys_landlock_restrict_self + +Current limitations +=================== + +File renaming and linking +------------------------- + +Because Landlock targets unprivileged access controls, it is needed to properly +handle composition of rules. Such property also implies rules nesting. +Properly handling multiple layers of ruleset, each one of them able to restrict +access to files, also implies to inherit the ruleset restrictions from a parent +to its hierarchy. Because files are identified and restricted by their +hierarchy, moving or linking a file from one directory to another implies to +propagate the hierarchy constraints. To protect against privilege escalations +through renaming or linking, and for the sack of simplicity, Landlock currently +limits linking and renaming to the same directory. Future Landlock evolutions +will enable more flexibility for renaming and linking, with dedicated ruleset +flags. + +Filesystem layout modification +------------------------------ + +As for file renaming and linking, a sandboxed thread cannot modify its +filesystem layout, whether via :manpage:`mount(2)` or :manpage:`pivot_root(2)`. +However, :manpage:`chroot(2)` calls are not denied. + +Special filesystems +------------------- + +Access to regular files and directories can be restricted by Landlock, +according to the handled accesses of a ruleset. However, files that do not +come from a user-visible filesystem (e.g. pipe, socket), but can still be +accessed through /proc/self/fd/, cannot currently be restricted. Likewise, +some special kernel filesystems such as nsfs, which can be accessed through +/proc/self/ns/, cannot currently be restricted. For now, these kind of special +paths are then always allowed. Future Landlock evolutions will enable to +restrict such paths with dedicated ruleset flags. + +Ruleset layers +-------------- + +There is a limit of 64 layers of stacked rulesets. This can be an issue for a +task willing to enforce a new ruleset in complement to its 64 inherited +rulesets. Once this limit is reached, sys_landlock_restrict_self() returns +E2BIG. It is then strongly suggested to carefully build rulesets once in the +life of a thread, especially for applications able to launch other applications +that may also want to sandbox themselves (e.g. shells, container managers, +etc.). + +Memory usage +------------ + +Kernel memory allocated to create rulesets is accounted and can be restricted +by the :doc:`/admin-guide/cgroup-v1/memory`. + +Questions and answers +===================== + +What about user space sandbox managers? +--------------------------------------- + +Using user space process to enforce restrictions on kernel resources can lead +to race conditions or inconsistent evaluations (i.e. `Incorrect mirroring of +the OS code and state +https://www.ndss-symposium.org/ndss2003/traps-and-pitfalls-practical-problems-system-call-interposition-based-security-tools/`_). + +What about namespaces and containers? +------------------------------------- + +Namespaces can help create sandboxes but they are not designed for +access-control and then miss useful features for such use case (e.g. no +fine-grained restrictions). Moreover, their complexity can lead to security +issues, especially when untrusted processes can manipulate them (cf. +`Controlling access to user namespaces https://lwn.net/Articles/673597/`_). + +Additional documentation +======================== + +* :doc:`/security/landlock` +* https://landlock.io + +.. Links +.. _samples/landlock/sandboxer.c: + https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/sample... diff --git a/MAINTAINERS b/MAINTAINERS index 88175ed1f315..7b0c6de5946a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -10003,6 +10003,8 @@ L: linux-security-module@vger.kernel.org S: Supported W: https://landlock.io T: git https://github.com/landlock-lsm/linux.git +F: Documentation/security/landlock.rst +F: Documentation/userspace-api/landlock.rst F: include/uapi/linux/landlock.h F: samples/landlock/ F: security/landlock/
On Tue, Mar 16, 2021 at 09:42:52PM +0100, Mickaël Salaün wrote:
From: Mickaël Salaün mic@linux.microsoft.com
This documentation can be built with the Sphinx framework.
Well, yes. :) Maybe describe what the documentation covers instead here. Regardless: yay docs! This is great.
[...] +Bind mounts and OverlayFS +-------------------------
+Landlock enables to restrict access to file hierarchies, which means that these +access rights can be propagated with bind mounts (cf. +:doc:`/filesystems/sharedsubtree`) but not with :doc:`/filesystems/overlayfs`.
+A bind mount mirrors a source file hierarchy to a destination. The destination +hierarchy is then composed of the exact same files, on which Landlock rules can +be tied, either via the source or the destination path. These rules restrict +access when they are encountered on a path, which means that they can restrict +access to multiple file hierarchies at the same time, whether these hierarchies +are the result of bind mounts or not.
+An OverlayFS mount point consists of upper and lower layers. These layers are +combined in a merge directory, result of the mount point. This merge hierarchy +may include files from the upper and lower layers, but modifications performed +on the merge hierarchy only reflects on the upper layer. From a Landlock +policy point of view, each OverlayFS layers and merge hierarchies are +standalone and contains their own set of files and directories, which is +different from bind mounts. A policy restricting an OverlayFS layer will not +restrict the resulted merged hierarchy, and vice versa.
Can you include some examples about what a user of landlock should do? i.e. what are some examples of unexpected results when trying to write policy that runs on top of overlayfs, etc?
[...] +File renaming and linking +-------------------------
+Because Landlock targets unprivileged access controls, it is needed to properly +handle composition of rules. Such property also implies rules nesting. +Properly handling multiple layers of ruleset, each one of them able to restrict +access to files, also implies to inherit the ruleset restrictions from a parent +to its hierarchy. Because files are identified and restricted by their +hierarchy, moving or linking a file from one directory to another implies to +propagate the hierarchy constraints. To protect against privilege escalations +through renaming or linking, and for the sack of simplicity, Landlock currently
typo: sack -> sake
[...] +Special filesystems +-------------------
+Access to regular files and directories can be restricted by Landlock, +according to the handled accesses of a ruleset. However, files that do not +come from a user-visible filesystem (e.g. pipe, socket), but can still be +accessed through /proc/self/fd/, cannot currently be restricted. Likewise, +some special kernel filesystems such as nsfs, which can be accessed through +/proc/self/ns/, cannot currently be restricted. For now, these kind of special +paths are then always allowed. Future Landlock evolutions will enable to +restrict such paths with dedicated ruleset flags.
With this series, can /proc (at the top level) be blocked? (i.e. can a landlock user avoid the weirdness by making /proc/$pid/ unavailable?)
+Ruleset layers +--------------
+There is a limit of 64 layers of stacked rulesets. This can be an issue for a +task willing to enforce a new ruleset in complement to its 64 inherited +rulesets. Once this limit is reached, sys_landlock_restrict_self() returns +E2BIG. It is then strongly suggested to carefully build rulesets once in the +life of a thread, especially for applications able to launch other applications +that may also want to sandbox themselves (e.g. shells, container managers, +etc.).
How was this value (64) chosen?
On 19/03/2021 19:03, Kees Cook wrote:
On Tue, Mar 16, 2021 at 09:42:52PM +0100, Mickaël Salaün wrote:
From: Mickaël Salaün mic@linux.microsoft.com
This documentation can be built with the Sphinx framework.
Well, yes. :) Maybe describe what the documentation covers instead here. Regardless: yay docs! This is great.
Well, I don't know what to describe other than the subject, the rest is in the patch. :)
[...] +Bind mounts and OverlayFS +-------------------------
+Landlock enables to restrict access to file hierarchies, which means that these +access rights can be propagated with bind mounts (cf. +:doc:`/filesystems/sharedsubtree`) but not with :doc:`/filesystems/overlayfs`.
+A bind mount mirrors a source file hierarchy to a destination. The destination +hierarchy is then composed of the exact same files, on which Landlock rules can +be tied, either via the source or the destination path. These rules restrict +access when they are encountered on a path, which means that they can restrict +access to multiple file hierarchies at the same time, whether these hierarchies +are the result of bind mounts or not.
+An OverlayFS mount point consists of upper and lower layers. These layers are +combined in a merge directory, result of the mount point. This merge hierarchy +may include files from the upper and lower layers, but modifications performed +on the merge hierarchy only reflects on the upper layer. From a Landlock +policy point of view, each OverlayFS layers and merge hierarchies are +standalone and contains their own set of files and directories, which is +different from bind mounts. A policy restricting an OverlayFS layer will not +restrict the resulted merged hierarchy, and vice versa.
Can you include some examples about what a user of landlock should do? i.e. what are some examples of unexpected results when trying to write policy that runs on top of overlayfs, etc?
Landlock works well with overlayfs, at least from my point of view. It may be a bit disturbing with bind mount though, but it is the same with other access-controls (e.g. DAC). Landlock users should just think about file hierarchies and create their policies accordingly. A user should not try to adapt a policy according to his/her understanding of overlayfs, but just think about file hierarchies.
[...] +File renaming and linking +-------------------------
+Because Landlock targets unprivileged access controls, it is needed to properly +handle composition of rules. Such property also implies rules nesting. +Properly handling multiple layers of ruleset, each one of them able to restrict +access to files, also implies to inherit the ruleset restrictions from a parent +to its hierarchy. Because files are identified and restricted by their +hierarchy, moving or linking a file from one directory to another implies to +propagate the hierarchy constraints. To protect against privilege escalations +through renaming or linking, and for the sack of simplicity, Landlock currently
typo: sack -> sake
Indeed
[...] +Special filesystems +-------------------
+Access to regular files and directories can be restricted by Landlock, +according to the handled accesses of a ruleset. However, files that do not +come from a user-visible filesystem (e.g. pipe, socket), but can still be +accessed through /proc/self/fd/, cannot currently be restricted. Likewise, +some special kernel filesystems such as nsfs, which can be accessed through +/proc/self/ns/, cannot currently be restricted. For now, these kind of special +paths are then always allowed. Future Landlock evolutions will enable to +restrict such paths with dedicated ruleset flags.
With this series, can /proc (at the top level) be blocked? (i.e. can a landlock user avoid the weirdness by making /proc/$pid/ unavailable?)
/proc can be blocked, but not /proc/*/ns/* because of disconnected roots. I plan to address this.
+Ruleset layers +--------------
+There is a limit of 64 layers of stacked rulesets. This can be an issue for a +task willing to enforce a new ruleset in complement to its 64 inherited +rulesets. Once this limit is reached, sys_landlock_restrict_self() returns +E2BIG. It is then strongly suggested to carefully build rulesets once in the +life of a thread, especially for applications able to launch other applications +that may also want to sandbox themselves (e.g. shells, container managers, +etc.).
How was this value (64) chosen?
It was first an implementation constraint to limit the memory allocated for each rule, but it could now be increased if needed. The implementation still uses u64 for the sake of simplicity, but we could switch to a bitmap type.
On 19/03/2021 19:54, Mickaël Salaün wrote:
On 19/03/2021 19:03, Kees Cook wrote:
On Tue, Mar 16, 2021 at 09:42:52PM +0100, Mickaël Salaün wrote:
From: Mickaël Salaün mic@linux.microsoft.com
This documentation can be built with the Sphinx framework.
Well, yes. :) Maybe describe what the documentation covers instead here. Regardless: yay docs! This is great.
What about that?
Add a first document describing userspace API: how to define and enforce a Landlock security policy. This is explained with a simple example. The Landlock system calls are described with their expected behavior and current limitations.
Another document is dedicated to kernel developers, describing guiding principles and some important kernel structures.
This documentation can be built with the Sphinx framework.
[...] +Bind mounts and OverlayFS +-------------------------
+Landlock enables to restrict access to file hierarchies, which means that these +access rights can be propagated with bind mounts (cf. +:doc:`/filesystems/sharedsubtree`) but not with :doc:`/filesystems/overlayfs`.
+A bind mount mirrors a source file hierarchy to a destination. The destination +hierarchy is then composed of the exact same files, on which Landlock rules can +be tied, either via the source or the destination path. These rules restrict +access when they are encountered on a path, which means that they can restrict +access to multiple file hierarchies at the same time, whether these hierarchies +are the result of bind mounts or not.
+An OverlayFS mount point consists of upper and lower layers. These layers are +combined in a merge directory, result of the mount point. This merge hierarchy +may include files from the upper and lower layers, but modifications performed +on the merge hierarchy only reflects on the upper layer. From a Landlock +policy point of view, each OverlayFS layers and merge hierarchies are +standalone and contains their own set of files and directories, which is +different from bind mounts. A policy restricting an OverlayFS layer will not +restrict the resulted merged hierarchy, and vice versa.
Can you include some examples about what a user of landlock should do? i.e. what are some examples of unexpected results when trying to write policy that runs on top of overlayfs, etc?
Landlock works well with overlayfs, at least from my point of view. It may be a bit disturbing with bind mount though, but it is the same with other access-controls (e.g. DAC). Landlock users should just think about file hierarchies and create their policies accordingly. A user should not try to adapt a policy according to his/her understanding of overlayfs, but just think about file hierarchies.
[...] +File renaming and linking +-------------------------
+Because Landlock targets unprivileged access controls, it is needed to properly +handle composition of rules. Such property also implies rules nesting. +Properly handling multiple layers of ruleset, each one of them able to restrict +access to files, also implies to inherit the ruleset restrictions from a parent +to its hierarchy. Because files are identified and restricted by their +hierarchy, moving or linking a file from one directory to another implies to +propagate the hierarchy constraints. To protect against privilege escalations +through renaming or linking, and for the sack of simplicity, Landlock currently
typo: sack -> sake
Indeed
[...] +Special filesystems +-------------------
+Access to regular files and directories can be restricted by Landlock, +according to the handled accesses of a ruleset. However, files that do not +come from a user-visible filesystem (e.g. pipe, socket), but can still be +accessed through /proc/self/fd/, cannot currently be restricted. Likewise, +some special kernel filesystems such as nsfs, which can be accessed through +/proc/self/ns/, cannot currently be restricted. For now, these kind of special +paths are then always allowed. Future Landlock evolutions will enable to +restrict such paths with dedicated ruleset flags.
With this series, can /proc (at the top level) be blocked? (i.e. can a landlock user avoid the weirdness by making /proc/$pid/ unavailable?)
/proc can be blocked, but not /proc/*/ns/* because of disconnected roots. I plan to address this.
+Ruleset layers +--------------
+There is a limit of 64 layers of stacked rulesets. This can be an issue for a +task willing to enforce a new ruleset in complement to its 64 inherited +rulesets. Once this limit is reached, sys_landlock_restrict_self() returns +E2BIG. It is then strongly suggested to carefully build rulesets once in the +life of a thread, especially for applications able to launch other applications +that may also want to sandbox themselves (e.g. shells, container managers, +etc.).
How was this value (64) chosen?
It was first an implementation constraint to limit the memory allocated for each rule, but it could now be increased if needed. The implementation still uses u64 for the sake of simplicity, but we could switch to a bitmap type.
On 19/03/2021 19:54, Mickaël Salaün wrote:
On 19/03/2021 19:03, Kees Cook wrote:
On Tue, Mar 16, 2021 at 09:42:52PM +0100, Mickaël Salaün wrote:
From: Mickaël Salaün mic@linux.microsoft.com
[...]
[...] +Special filesystems +-------------------
+Access to regular files and directories can be restricted by Landlock, +according to the handled accesses of a ruleset. However, files that do not +come from a user-visible filesystem (e.g. pipe, socket), but can still be +accessed through /proc/self/fd/, cannot currently be restricted. Likewise, +some special kernel filesystems such as nsfs, which can be accessed through +/proc/self/ns/, cannot currently be restricted. For now, these kind of special +paths are then always allowed. Future Landlock evolutions will enable to +restrict such paths with dedicated ruleset flags.
With this series, can /proc (at the top level) be blocked? (i.e. can a landlock user avoid the weirdness by making /proc/$pid/ unavailable?)
/proc can be blocked, but not /proc/*/ns/* because of disconnected roots. I plan to address this.
It is important to note that access to sensitive /proc files such as ns/* and fd/* are automatically restricted according to domain hierarchies. I'll add this detail to the documentation. :)
I've queued this patchset here:
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git landlock_lsm
and pulled it into next-testing, which will get it coverage in linux-next.
All going well, I'll aim to push this to Linus in the next merge window. More review and testing during that time will be helpful.
On 19/03/2021 00:26, James Morris wrote:
I've queued this patchset here:
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git landlock_lsm
and pulled it into next-testing, which will get it coverage in linux-next.
All going well, I'll aim to push this to Linus in the next merge window. More review and testing during that time will be helpful.
Good, thanks! The syzkaller changes are now merged and up-to-date with linux-next: https://github.com/google/syzkaller/commits/3d01c4de549b4e4bddba6102715c212b...
linux-kselftest-mirror@lists.linaro.org