The patch titled
Subject: proc/vmcore: fix clearing user buffer by properly using clear_user()
has been added to the -mm tree. Its filename is
proc-vmcore-fix-clearing-user-buffer-by-properly-using-clear_user.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/proc-vmcore-fix-clearing-user-buf…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/proc-vmcore-fix-clearing-user-buf…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: proc/vmcore: fix clearing user buffer by properly using clear_user()
To clear a user buffer we cannot simply use memset(); we have to use
clear_user(). With a virtio-mem device that registers a vmcore_cb and has
some logically unplugged memory inside an added Linux memory block, I can
easily trigger a BUG by copying the vmcore via "cp":
[ 11.327580] systemd[1]: Starting Kdump Vmcore Save Service...
[ 11.339697] kdump[420]: Kdump is using the default log level(3).
[ 11.370964] kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
[ 11.373997] kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
[ 11.385357] kdump[465]: saving vmcore-dmesg.txt complete
[ 11.386722] kdump[467]: saving vmcore
[ 16.531275] BUG: unable to handle page fault for address: 00007f2374e01000
[ 16.531705] #PF: supervisor write access in kernel mode
[ 16.532037] #PF: error_code(0x0003) - permissions violation
[ 16.532396] PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867
[ 16.532872] Oops: 0003 [#1] PREEMPT SMP NOPTI
[ 16.533154] CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6
[ 16.533513] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
[ 16.534198] RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86
[ 16.534552] Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81
[ 16.535670] RSP: 0018:ffffc9000073be08 EFLAGS: 00010212
[ 16.535998] RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000
[ 16.536441] RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008
[ 16.536878] RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50
[ 16.537315] R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000
[ 16.537755] R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8
[ 16.538200] FS: 00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000
[ 16.538696] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 16.539055] CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0
[ 16.539510] Call Trace:
[ 16.539679] <TASK>
[ 16.539828] read_vmcore+0x236/0x2c0
[ 16.540063] ? enqueue_hrtimer+0x2f/0x80
[ 16.540323] ? inode_security+0x22/0x60
[ 16.540572] proc_reg_read+0x55/0xa0
[ 16.540807] vfs_read+0x95/0x190
[ 16.541022] ksys_read+0x4f/0xc0
[ 16.541238] do_syscall_64+0x3b/0x90
[ 16.541475] entry_SYSCALL_64_after_hwframe+0x44/0xae
Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access
Prevention (SMAP)", which is used to detect wrong access from the kernel
to user buffers like this: SMAP triggers a permissions violation on wrong
access. In the x86-64 variant of clear_user(), SMAP is properly handled
by wrapping the access in stac()+clac().
To fix, properly use clear_user() when we're dealing with a user buffer.
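
As an illustration only, the dispatch the fix introduces can be modelled in
userspace. This is a minimal sketch, not the kernel code: mock_clear_user(),
mock_fault_after and zero_fill() are hypothetical names, and the mock only
imitates the return convention of clear_user() (the number of bytes that
could not be cleared, 0 on success).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical knob: the mock "faults" after this many bytes. */
static unsigned long mock_fault_after = ~0UL;

/*
 * Userspace stand-in for clear_user() (assumption: not the real helper).
 * Returns the number of bytes that could NOT be cleared; 0 means success.
 */
unsigned long mock_clear_user(void *to, unsigned long n)
{
	unsigned long ok = n < mock_fault_after ? n : mock_fault_after;

	memset(to, 0, ok);
	return n - ok;
}

/*
 * Zero nr_bytes of buf. Kernel buffers may be cleared with memset();
 * user buffers go through the (mocked) clear_user() path, and a partial
 * clear is reported as -EFAULT, mirroring the patched branch.
 */
int zero_fill(char *buf, size_t nr_bytes, int userbuf)
{
	assert(buf != NULL);

	if (!userbuf) {
		memset(buf, 0, nr_bytes);
		return 0;
	}
	return mock_clear_user(buf, nr_bytes) ? -EFAULT : 0;
}
```

Calling zero_fill(buf, len, 1) with mock_fault_after set below len returns
-EFAULT, which corresponds to the error that the patched code propagates to
the caller instead of faulting in kernel mode.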
Link: https://lkml.kernel.org/r/20211112092750.6921-1-david@redhat.com
Fixes: 997c136f518c ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Baoquan He <bhe(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: Philipp Rudo <prudo(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/vmcore.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
--- a/fs/proc/vmcore.c~proc-vmcore-fix-clearing-user-buffer-by-properly-using-clear_user
+++ a/fs/proc/vmcore.c
@@ -154,9 +154,13 @@ ssize_t read_from_oldmem(char *buf, size
 			nr_bytes = count;
 
 		/* If pfn is not ram, return zeros for sparse dump files */
-		if (!pfn_is_ram(pfn))
-			memset(buf, 0, nr_bytes);
-		else {
+		if (!pfn_is_ram(pfn)) {
+			tmp = 0;
+			if (!userbuf)
+				memset(buf, 0, nr_bytes);
+			else if (clear_user(buf, nr_bytes))
+				tmp = -EFAULT;
+		} else {
 			if (encrypted)
 				tmp = copy_oldmem_page_encrypted(pfn, buf,
 								 nr_bytes,
@@ -165,12 +169,12 @@ ssize_t read_from_oldmem(char *buf, size
 			else
 				tmp = copy_oldmem_page(pfn, buf, nr_bytes,
 						       offset, userbuf);
-
-			if (tmp < 0) {
-				up_read(&vmcore_cb_rwsem);
-				return tmp;
-			}
 		}
+		if (tmp < 0) {
+			up_read(&vmcore_cb_rwsem);
+			return tmp;
+		}
+
 		*ppos += nr_bytes;
 		count -= nr_bytes;
 		buf += nr_bytes;
_
Patches currently in -mm which might be from david(a)redhat.com are
proc-vmcore-fix-clearing-user-buffer-by-properly-using-clear_user.patch
Booting to Android userspace on 5.14 or newer triggers the following
SELinux denial:
avc: denied { sys_nice } for comm="init" capability=23
scontext=u:r:init:s0 tcontext=u:r:init:s0 tclass=capability
permissive=0
Init is PID 1 running as root, so it already has CAP_SYS_ADMIN. For
better compatibility with older SEPolicy, check ADMIN before NICE.
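
Why the order matters can be shown with a small userspace model. This is a
sketch under stated assumptions, not kernel code: mock_capable(),
granted_caps and denials_logged are hypothetical, and the model simply
assumes that every failed capability check produces one audit record.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Capability numbers as defined in linux/capability.h. */
enum { CAP_SYS_ADMIN = 21, CAP_SYS_NICE = 23 };

static unsigned int granted_caps;   /* bitmask allowed by the mock policy */
static unsigned int denials_logged; /* count of simulated audit records */

/* Hypothetical mock of capable(): each failed check logs one denial. */
bool mock_capable(int cap)
{
	if (granted_caps & (1u << cap))
		return true;
	denials_logged++;
	return false;
}

/* Old order: NICE is asked first, so an ADMIN-only task logs a denial. */
int check_rt_nice_first(void)
{
	if (!mock_capable(CAP_SYS_NICE) && !mock_capable(CAP_SYS_ADMIN))
		return -EPERM;
	return 0;
}

/* New order: ADMIN first; && short-circuits, so NICE is never asked. */
int check_rt_admin_first(void)
{
	if (!mock_capable(CAP_SYS_ADMIN) && !mock_capable(CAP_SYS_NICE))
		return -EPERM;
	return 0;
}
```

With only CAP_SYS_ADMIN granted, both functions return 0, but only
check_rt_nice_first() increments denials_logged. Both orderings grant and
deny the same set of tasks; the reordering only changes which check is
attempted first.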
Fixes: 9d3a39a5f1e4 ("block: grant IOPRIO_CLASS_RT to CAP_SYS_NICE")
Signed-off-by: Alistair Delva <adelva(a)google.com>
Cc: Khazhismel Kumykov <khazhy(a)google.com>
Cc: Bart Van Assche <bvanassche(a)acm.org>
Cc: Serge Hallyn <serge(a)hallyn.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Paul Moore <paul(a)paul-moore.com>
Cc: selinux(a)vger.kernel.org
Cc: linux-security-module(a)vger.kernel.org
Cc: kernel-team(a)android.com
Cc: stable(a)vger.kernel.org # v5.14+
---
v2: added comment requested by Jens
block/ioprio.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/block/ioprio.c b/block/ioprio.c
index 0e4ff245f2bf..313c14a70bbd 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -69,7 +69,14 @@ int ioprio_check_cap(int ioprio)
 
 	switch (class) {
 		case IOPRIO_CLASS_RT:
-			if (!capable(CAP_SYS_NICE) && !capable(CAP_SYS_ADMIN))
+			/*
+			 * Originally this only checked for CAP_SYS_ADMIN,
+			 * which was implicitly allowed for pid 0 by security
+			 * modules such as SELinux. Make sure we check
+			 * CAP_SYS_ADMIN first to avoid a denial/avc for
+			 * possibly missing CAP_SYS_NICE permission.
+			 */
+			if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_NICE))
 				return -EPERM;
 			fallthrough;
 			/* rt has prio field too */
--
2.34.0.rc1.387.gb447b232ab-goog