The patch titled
Subject: rapidio: fix an error in get_user_pages_fast() error handling
has been added to the -mm tree. Its filename is
rapidio-fix-an-error-in-get_user_pages_fast-error-handling.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/rapidio-fix-an-error-in-get_user_p…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/rapidio-fix-an-error-in-get_user_p…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: John Hubbard <jhubbard(a)nvidia.com>
Subject: rapidio: fix an error in get_user_pages_fast() error handling
In the case of get_user_pages_fast() returning fewer pages than requested,
rio_dma_transfer() does not quite do the right thing. It attempts to
release all the pages that were requested, rather than just the pages that
were pinned.
Fix the error handling so that only the pages that were successfully
pinned are released.
Link: http://lkml.kernel.org/r/20200517235620.205225-2-jhubbard@nvidia.com
Fixes: e8de370188d0 ("rapidio: add mport char device driver")
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Matt Porter <mporter(a)kernel.crashing.org>
Cc: Alexandre Bounine <alex.bou9(a)gmail.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Dan Carpenter <dan.carpenter(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/rapidio/devices/rio_mport_cdev.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/drivers/rapidio/devices/rio_mport_cdev.c~rapidio-fix-an-error-in-get_user_pages_fast-error-handling
+++ a/drivers/rapidio/devices/rio_mport_cdev.c
@@ -877,6 +877,11 @@ rio_dma_transfer(struct file *filp, u32
rmcd_error("pinned %ld out of %ld pages",
pinned, nr_pages);
ret = -EFAULT;
+ /*
+ * Set nr_pages up to mean "how many pages to unpin, in
+ * the error handler:
+ */
+ nr_pages = pinned;
goto err_pg;
}
_
Patches currently in -mm which might be from jhubbard(a)nvidia.com are
rapidio-fix-an-error-in-get_user_pages_fast-error-handling.patch
mm-gup-introduce-pin_user_pages_unlocked.patch
ivtv-convert-get_user_pages-pin_user_pages.patch
rapidio-convert-get_user_pages-pin_user_pages.patch
In the case of get_user_pages_fast() returning fewer pages than
requested, rio_dma_transfer() does not quite do the right thing.
It attempts to release all the pages that were requested, rather
than just the pages that were pinned.
Fix the error handling so that only the pages that were successfully
pinned are released.
Fixes: e8de370188d0 ("rapidio: add mport char device driver")
Cc: Matt Porter <mporter(a)kernel.crashing.org>
Cc: Alexandre Bounine <alex.bou9(a)gmail.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Dan Carpenter <dan.carpenter(a)oracle.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: linux-media(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
---
drivers/rapidio/devices/rio_mport_cdev.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/rapidio/devices/rio_mport_cdev.c b/drivers/rapidio/devices/rio_mport_cdev.c
index 8155f59ece38..10af330153b5 100644
--- a/drivers/rapidio/devices/rio_mport_cdev.c
+++ b/drivers/rapidio/devices/rio_mport_cdev.c
@@ -877,6 +877,11 @@ rio_dma_transfer(struct file *filp, u32 transfer_mode,
rmcd_error("pinned %ld out of %ld pages",
pinned, nr_pages);
ret = -EFAULT;
+ /*
+ * Set nr_pages up to mean "how many pages to unpin, in
+ * the error handler:
+ */
+ nr_pages = pinned;
goto err_pg;
}
--
2.26.2
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 37486135d3a7b03acc7755b63627a130437f066a Mon Sep 17 00:00:00 2001
From: Babu Moger <babu.moger(a)amd.com>
Date: Tue, 12 May 2020 18:59:06 -0500
Subject: [PATCH] KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it
to x86.c
Though rdpkru and wrpkru are contingent upon CR4.PKE, the PKRU
resource isn't. It can be read with XSAVE and written with XRSTOR.
So, if we don't set the guest PKRU value here(kvm_load_guest_xsave_state),
the guest can read the host value.
In case of kvm_load_host_xsave_state, guest with CR4.PKE clear could
potentially use XRSTOR to change the host PKRU value.
While at it, move pkru state save/restore to common code and the
host_pkru field to kvm_vcpu_arch. This will let SVM support protection keys.
Cc: stable(a)vger.kernel.org
Reported-by: Jim Mattson <jmattson(a)google.com>
Signed-off-by: Babu Moger <babu.moger(a)amd.com>
Message-Id: <158932794619.44260.14508381096663848853.stgit(a)naples-babu.amd.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9e8263b1e6fe..0a6b35353fc7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -578,6 +578,7 @@ struct kvm_vcpu_arch {
unsigned long cr4;
unsigned long cr4_guest_owned_bits;
unsigned long cr8;
+ u32 host_pkru;
u32 pkru;
u32 hflags;
u64 efer;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e45cf89c5821..89c766fad889 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1372,7 +1372,6 @@ void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
vmx_vcpu_pi_load(vcpu, cpu);
- vmx->host_pkru = read_pkru();
vmx->host_debugctlmsr = get_debugctlmsr();
}
@@ -6564,11 +6563,6 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
kvm_load_guest_xsave_state(vcpu);
- if (static_cpu_has(X86_FEATURE_PKU) &&
- kvm_read_cr4_bits(vcpu, X86_CR4_PKE) &&
- vcpu->arch.pkru != vmx->host_pkru)
- __write_pkru(vcpu->arch.pkru);
-
pt_guest_enter(vmx);
if (vcpu_to_pmu(vcpu)->version)
@@ -6658,18 +6652,6 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
pt_guest_exit(vmx);
- /*
- * eager fpu is enabled if PKEY is supported and CR4 is switched
- * back on host, so it is safe to read guest PKRU from current
- * XSAVE.
- */
- if (static_cpu_has(X86_FEATURE_PKU) &&
- kvm_read_cr4_bits(vcpu, X86_CR4_PKE)) {
- vcpu->arch.pkru = rdpkru();
- if (vcpu->arch.pkru != vmx->host_pkru)
- __write_pkru(vmx->host_pkru);
- }
-
kvm_load_host_xsave_state(vcpu);
vmx->nested.nested_run_pending = 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 98176b80c481..d11eba8b85c6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -837,11 +837,25 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
vcpu->arch.ia32_xss != host_xss)
wrmsrl(MSR_IA32_XSS, vcpu->arch.ia32_xss);
}
+
+ if (static_cpu_has(X86_FEATURE_PKU) &&
+ (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) ||
+ (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU)) &&
+ vcpu->arch.pkru != vcpu->arch.host_pkru)
+ __write_pkru(vcpu->arch.pkru);
}
EXPORT_SYMBOL_GPL(kvm_load_guest_xsave_state);
void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
{
+ if (static_cpu_has(X86_FEATURE_PKU) &&
+ (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) ||
+ (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU))) {
+ vcpu->arch.pkru = rdpkru();
+ if (vcpu->arch.pkru != vcpu->arch.host_pkru)
+ __write_pkru(vcpu->arch.host_pkru);
+ }
+
if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)) {
if (vcpu->arch.xcr0 != host_xcr0)
@@ -3549,6 +3563,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
kvm_x86_ops.vcpu_load(vcpu, cpu);
+ /* Save host pkru register if supported */
+ vcpu->arch.host_pkru = read_pkru();
+
/* Apply any externally detected TSC adjustments (due to suspend) */
if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
adjust_tsc_offset_host(vcpu, vcpu->arch.tsc_offset_adjustment);
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: d8e36b7ce54e - Makefile: disallow data races on gcc-10 as well
The results of these automated tests are provided below.
Overall result: FAILED (see details below)
Merge: OK
Compile: FAILED
All kernel binaries, config files, and logs are available for download here:
https://cki-artifacts.s3.us-east-2.amazonaws.com/index.html?prefix=dataware…
We attempted to compile the kernel for multiple architectures, but the compile
failed on one or more architectures:
aarch64: FAILED (see build-aarch64.log.xz attachment)
ppc64le: FAILED (see build-ppc64le.log.xz attachment)
s390x: FAILED (see build-s390x.log.xz attachment)
x86_64: FAILED (see build-x86_64.log.xz attachment)
We hope that these logs can help you find the problem quickly. For the full
detail on our testing procedures, please scroll to the bottom of this message.
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b2a5212fb634561bb734c6356904e37f6665b955 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel(a)iogearbox.net>
Date: Fri, 15 May 2020 12:11:18 +0200
Subject: [PATCH] bpf: Restrict bpf_trace_printk()'s %s usage and add %pks,
%pus specifier
Usage of plain %s conversion specifier in bpf_trace_printk() suffers from the
very same issue as bpf_probe_read{,str}() helpers, that is, it is broken on
archs with overlapping address ranges.
While the helpers have been addressed through work in 6ae08ae3dea2 ("bpf: Add
probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers"), we need
an option for bpf_trace_printk() as well to fix it.
Similarly as with the helpers, force users to make an explicit choice by adding
%pks and %pus specifier to bpf_trace_printk() which will then pick the corresponding
strncpy_from_unsafe*() variant to perform the access under KERNEL_DS or USER_DS.
The %pk* (kernel specifier) and %pu* (user specifier) can later also be extended
for other objects aside strings that are probed and printed under tracing, and
reused out of other facilities like bpf_seq_printf() or BTF based type printing.
Existing behavior of %s for current users is still kept working for archs where it
is not broken and therefore gated through CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.
For archs not having this property we fall-back to pick probing under KERNEL_DS as
a sensible default.
Fixes: 8d3b7dce8622 ("bpf: add support for %s specifier to bpf_trace_printk()")
Reported-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Reported-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Brendan Gregg <brendan.d.gregg(a)gmail.com>
Link: https://lore.kernel.org/bpf/20200515101118.6508-4-daniel@iogearbox.net
diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst
index 8ebe46b1af39..5dfcc4592b23 100644
--- a/Documentation/core-api/printk-formats.rst
+++ b/Documentation/core-api/printk-formats.rst
@@ -112,6 +112,20 @@ used when printing stack backtraces. The specifier takes into
consideration the effect of compiler optimisations which may occur
when tail-calls are used and marked with the noreturn GCC attribute.
+Probed Pointers from BPF / tracing
+----------------------------------
+
+::
+
+ %pks kernel string
+ %pus user string
+
+The ``k`` and ``u`` specifiers are used for printing prior probed memory from
+either kernel memory (k) or user memory (u). The subsequent ``s`` specifier
+results in printing a string. For direct use in regular vsnprintf() the (k)
+and (u) annotation is ignored, however, when used out of BPF's bpf_trace_printk(),
+for example, it reads the memory it is pointing to without faulting.
+
Kernel Pointers
---------------
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index b83bdaa31c7b..a010edc37ee0 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -323,17 +323,15 @@ static const struct bpf_func_proto *bpf_get_probe_write_proto(void)
/*
* Only limited trace_printk() conversion specifiers allowed:
- * %d %i %u %x %ld %li %lu %lx %lld %lli %llu %llx %p %s
+ * %d %i %u %x %ld %li %lu %lx %lld %lli %llu %llx %p %pks %pus %s
*/
BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
u64, arg2, u64, arg3)
{
+ int i, mod[3] = {}, fmt_cnt = 0;
+ char buf[64], fmt_ptype;
+ void *unsafe_ptr = NULL;
bool str_seen = false;
- int mod[3] = {};
- int fmt_cnt = 0;
- u64 unsafe_addr;
- char buf[64];
- int i;
/*
* bpf_check()->check_func_arg()->check_stack_boundary()
@@ -359,40 +357,71 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
if (fmt[i] == 'l') {
mod[fmt_cnt]++;
i++;
- } else if (fmt[i] == 'p' || fmt[i] == 's') {
+ } else if (fmt[i] == 'p') {
mod[fmt_cnt]++;
+ if ((fmt[i + 1] == 'k' ||
+ fmt[i + 1] == 'u') &&
+ fmt[i + 2] == 's') {
+ fmt_ptype = fmt[i + 1];
+ i += 2;
+ goto fmt_str;
+ }
+
/* disallow any further format extensions */
if (fmt[i + 1] != 0 &&
!isspace(fmt[i + 1]) &&
!ispunct(fmt[i + 1]))
return -EINVAL;
- fmt_cnt++;
- if (fmt[i] == 's') {
- if (str_seen)
- /* allow only one '%s' per fmt string */
- return -EINVAL;
- str_seen = true;
-
- switch (fmt_cnt) {
- case 1:
- unsafe_addr = arg1;
- arg1 = (long) buf;
- break;
- case 2:
- unsafe_addr = arg2;
- arg2 = (long) buf;
- break;
- case 3:
- unsafe_addr = arg3;
- arg3 = (long) buf;
- break;
- }
- buf[0] = 0;
- strncpy_from_unsafe(buf,
- (void *) (long) unsafe_addr,
+
+ goto fmt_next;
+ } else if (fmt[i] == 's') {
+ mod[fmt_cnt]++;
+ fmt_ptype = fmt[i];
+fmt_str:
+ if (str_seen)
+ /* allow only one '%s' per fmt string */
+ return -EINVAL;
+ str_seen = true;
+
+ if (fmt[i + 1] != 0 &&
+ !isspace(fmt[i + 1]) &&
+ !ispunct(fmt[i + 1]))
+ return -EINVAL;
+
+ switch (fmt_cnt) {
+ case 0:
+ unsafe_ptr = (void *)(long)arg1;
+ arg1 = (long)buf;
+ break;
+ case 1:
+ unsafe_ptr = (void *)(long)arg2;
+ arg2 = (long)buf;
+ break;
+ case 2:
+ unsafe_ptr = (void *)(long)arg3;
+ arg3 = (long)buf;
+ break;
+ }
+
+ buf[0] = 0;
+ switch (fmt_ptype) {
+ case 's':
+#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
+ strncpy_from_unsafe(buf, unsafe_ptr,
sizeof(buf));
+ break;
+#endif
+ case 'k':
+ strncpy_from_unsafe_strict(buf, unsafe_ptr,
+ sizeof(buf));
+ break;
+ case 'u':
+ strncpy_from_unsafe_user(buf,
+ (__force void __user *)unsafe_ptr,
+ sizeof(buf));
+ break;
}
- continue;
+ goto fmt_next;
}
if (fmt[i] == 'l') {
@@ -403,6 +432,7 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
if (fmt[i] != 'i' && fmt[i] != 'd' &&
fmt[i] != 'u' && fmt[i] != 'x')
return -EINVAL;
+fmt_next:
fmt_cnt++;
}
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 7c488a1ce318..532b6606a18a 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -2168,6 +2168,10 @@ char *fwnode_string(char *buf, char *end, struct fwnode_handle *fwnode,
* f full name
* P node name, including a possible unit address
* - 'x' For printing the address. Equivalent to "%lx".
+ * - '[ku]s' For a BPF/tracing related format specifier, e.g. used out of
+ * bpf_trace_printk() where [ku] prefix specifies either kernel (k)
+ * or user (u) memory to probe, and:
+ * s a string, equivalent to "%s" on direct vsnprintf() use
*
* ** When making changes please also update:
* Documentation/core-api/printk-formats.rst
@@ -2251,6 +2255,14 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
if (!IS_ERR(ptr))
break;
return err_ptr(buf, end, ptr, spec);
+ case 'u':
+ case 'k':
+ switch (fmt[1]) {
+ case 's':
+ return string(buf, end, ptr, spec);
+ default:
+ return error_string(buf, end, "(einval)", spec);
+ }
}
/* default is to _not_ leak addresses, hash before printing */