This is a note to let you know that I've just added the patch titled
fm10k: Use smp_rmb rather than read_barrier_depends
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
fm10k-use-smp_rmb-rather-than-read_barrier_depends.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 7b8edcc685b5e2c3c37aa13dc50a88e84a5bfef8 Mon Sep 17 00:00:00 2001
From: Brian King <brking(a)linux.vnet.ibm.com>
Date: Fri, 17 Nov 2017 11:05:48 -0600
Subject: fm10k: Use smp_rmb rather than read_barrier_depends
From: Brian King <brking(a)linux.vnet.ibm.com>
commit 7b8edcc685b5e2c3c37aa13dc50a88e84a5bfef8 upstream.
The original issue being fixed in this patch was seen with the ixgbe
driver, but the same issue exists with fm10k as well, as the code is
very similar. read_barrier_depends is not sufficient to ensure
loads following it are not speculatively loaded out of order
by the CPU, which can result in stale data being loaded, causing
potential system crashes.
Signed-off-by: Brian King <brking(a)linux.vnet.ibm.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -1218,7 +1218,7 @@ static bool fm10k_clean_tx_irq(struct fm
break;
/* prevent any other reads prior to eop_desc */
- read_barrier_depends();
+ smp_rmb();
/* if DD is not set pending work has not been completed */
if (!(eop_desc->flags & FM10K_TXD_FLAG_DONE))
Patches currently in stable-queue which might be from brking(a)linux.vnet.ibm.com are
queue-3.18/i40evf-use-smp_rmb-rather-than-read_barrier_depends.patch
queue-3.18/igb-use-smp_rmb-rather-than-read_barrier_depends.patch
queue-3.18/ixgbevf-use-smp_rmb-rather-than-read_barrier_depends.patch
queue-3.18/igbvf-use-smp_rmb-rather-than-read_barrier_depends.patch
queue-3.18/fm10k-use-smp_rmb-rather-than-read_barrier_depends.patch
queue-3.18/ixgbe-fix-skb-list-corruption-on-power-systems.patch
queue-3.18/i40e-use-smp_rmb-rather-than-read_barrier_depends.patch
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6c3b047fa2d2286d5e438bcb470c7b1a49f415f6 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Thu, 21 Sep 2017 05:40:18 -0300
Subject: [PATCH] [media] cx231xx-cards: fix NULL-deref on missing association
descriptor
Make sure to check that we actually have an Interface Association
Descriptor before dereferencing it during probe to avoid dereferencing a
NULL-pointer.
Fixes: e0d3bafd0258 ("V4L/DVB (10954): Add cx231xx USB driver")
Cc: stable <stable(a)vger.kernel.org> # 2.6.30
Reported-by: Andrey Konovalov <andreyknvl(a)google.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Tested-by: Andrey Konovalov <andreyknvl(a)google.com>
Signed-off-by: Hans Verkuil <hans.verkuil(a)cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab(a)osg.samsung.com>
diff --git a/drivers/media/usb/cx231xx/cx231xx-cards.c b/drivers/media/usb/cx231xx/cx231xx-cards.c
index e0daa9b6c2a0..9b742d569fb5 100644
--- a/drivers/media/usb/cx231xx/cx231xx-cards.c
+++ b/drivers/media/usb/cx231xx/cx231xx-cards.c
@@ -1684,7 +1684,7 @@ static int cx231xx_usb_probe(struct usb_interface *interface,
nr = dev->devno;
assoc_desc = udev->actconfig->intf_assoc[0];
- if (assoc_desc->bFirstInterface != ifnum) {
+ if (!assoc_desc || assoc_desc->bFirstInterface != ifnum) {
dev_err(d, "Not found matching IAD interface\n");
retval = -ENODEV;
goto err_if;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f23ab3efb1b30cc5c5ef5ae4ef294ed467f30675 Mon Sep 17 00:00:00 2001
From: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
Date: Fri, 10 Nov 2017 12:15:00 +1100
Subject: [PATCH] powerpc: Fix DABR match on hash based systems
Commit 398a719d34a1 ("powerpc/mm: Update bits used to skip hash_page")
mistakenly dropped the DSISR_DABRMATCH bit from the mask of bit tested
to skip trying to hash a page.
As a result, the DABR matches would no longer be detected.
This adds it back. We open code it in the 2 places where it matters
rather than fold it into DSISR_BAD_FAULT_32S/64S because this isn't
technically a bad fault and while we would never hit it with the
current code, I prefer if page_fault_is_bad() didn't trigger on these.
Fixes: 398a719d34a1 ("powerpc/mm: Update bits used to skip hash_page")
Cc: stable(a)vger.kernel.org # v4.14
Tested-by: Pedro Miraglia Franco de Carvalho <pedromfc(a)br.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 6879aed47377..445137e2d0ca 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1498,7 +1498,7 @@ USE_TEXT_SECTION()
.balign IFETCH_ALIGN_BYTES
do_hash_page:
#ifdef CONFIG_PPC_BOOK3S_64
- lis r0,DSISR_BAD_FAULT_64S@h
+ lis r0,(DSISR_BAD_FAULT_64S|DSISR_DABRMATCH)@h
ori r0,r0,DSISR_BAD_FAULT_64S@l
and. r0,r4,r0 /* weird error? */
bne- handle_page_fault /* if not, try to insert a HPTE */
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 8c54166491e7..29b2fed93289 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -388,7 +388,7 @@ DataAccess:
EXCEPTION_PROLOG
mfspr r10,SPRN_DSISR
stw r10,_DSISR(r11)
- andis. r0,r10,DSISR_BAD_FAULT_32S@h
+ andis. r0,r10,(DSISR_BAD_FAULT_32S|DSISR_DABRMATCH)@h
bne 1f /* if not, try to put a PTE */
mfspr r4,SPRN_DAR /* into the hash table */
rlwinm r3,r10,32-15,21,21 /* DSISR_STORE -> _PAGE_RW */
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7a06c66835f75fe2be4f154a93cc30cb81734b81 Mon Sep 17 00:00:00 2001
From: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.vnet.ibm.com>
Date: Fri, 10 Nov 2017 10:25:07 +0530
Subject: [PATCH] powerpc/64s/slice: Use addr limit when computing slice mask
While computing slice mask for the free area we need make sure we only
search in the addr limit applicable for this mmap. We update the
slb_addr_limit after we request for a mmap above 128TB. But the
following mmap request with hint addr below 128TB should still limit
its search to below 128TB. ie. we should not use slb_addr_limit to
compute slice mask in this case. Instead, we should derive high addr
limit based on the mmap hint addr value.
Fixes: f4ea6dcb08ea ("powerpc/mm: Enable mappings above 128TB")
Cc: stable(a)vger.kernel.org # v4.12+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 564fff06f5c1..23ec2c5e3b78 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -122,7 +122,8 @@ static int slice_high_has_vma(struct mm_struct *mm, unsigned long slice)
return !slice_area_is_free(mm, start, end - start);
}
-static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret)
+static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret,
+ unsigned long high_limit)
{
unsigned long i;
@@ -133,15 +134,16 @@ static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret)
if (!slice_low_has_vma(mm, i))
ret->low_slices |= 1u << i;
- if (mm->context.slb_addr_limit <= SLICE_LOW_TOP)
+ if (high_limit <= SLICE_LOW_TOP)
return;
- for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++)
+ for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++)
if (!slice_high_has_vma(mm, i))
__set_bit(i, ret->high_slices);
}
-static void slice_mask_for_size(struct mm_struct *mm, int psize, struct slice_mask *ret)
+static void slice_mask_for_size(struct mm_struct *mm, int psize, struct slice_mask *ret,
+ unsigned long high_limit)
{
unsigned char *hpsizes;
int index, mask_index;
@@ -156,8 +158,11 @@ static void slice_mask_for_size(struct mm_struct *mm, int psize, struct slice_ma
if (((lpsizes >> (i * 4)) & 0xf) == psize)
ret->low_slices |= 1u << i;
+ if (high_limit <= SLICE_LOW_TOP)
+ return;
+
hpsizes = mm->context.high_slices_psize;
- for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++) {
+ for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++) {
mask_index = i & 0x1;
index = i >> 1;
if (((hpsizes[index] >> (mask_index * 4)) & 0xf) == psize)
@@ -169,6 +174,10 @@ static int slice_check_fit(struct mm_struct *mm,
struct slice_mask mask, struct slice_mask available)
{
DECLARE_BITMAP(result, SLICE_NUM_HIGH);
+ /*
+ * Make sure we just do bit compare only to the max
+ * addr limit and not the full bit map size.
+ */
unsigned long slice_count = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
bitmap_and(result, mask.high_slices,
@@ -472,7 +481,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
/* First make up a "good" mask of slices that have the right size
* already
*/
- slice_mask_for_size(mm, psize, &good_mask);
+ slice_mask_for_size(mm, psize, &good_mask, high_limit);
slice_print_mask(" good_mask", good_mask);
/*
@@ -497,7 +506,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
#ifdef CONFIG_PPC_64K_PAGES
/* If we support combo pages, we can allow 64k pages in 4k slices */
if (psize == MMU_PAGE_64K) {
- slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask);
+ slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask, high_limit);
if (fixed)
slice_or_mask(&good_mask, &compat_mask);
}
@@ -530,11 +539,11 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
return newaddr;
}
}
-
- /* We don't fit in the good mask, check what other slices are
+ /*
+ * We don't fit in the good mask, check what other slices are
* empty and thus can be converted
*/
- slice_mask_for_free(mm, &potential_mask);
+ slice_mask_for_free(mm, &potential_mask, high_limit);
slice_or_mask(&potential_mask, &good_mask);
slice_print_mask(" potential", potential_mask);
@@ -744,17 +753,18 @@ int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
{
struct slice_mask mask, available;
unsigned int psize = mm->context.user_psize;
+ unsigned long high_limit = mm->context.slb_addr_limit;
if (radix_enabled())
return 0;
slice_range_to_mask(addr, len, &mask);
- slice_mask_for_size(mm, psize, &available);
+ slice_mask_for_size(mm, psize, &available, high_limit);
#ifdef CONFIG_PPC_64K_PAGES
/* We need to account for 4k slices too */
if (psize == MMU_PAGE_64K) {
struct slice_mask compat_mask;
- slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask);
+ slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask, high_limit);
slice_or_mask(&available, &compat_mask);
}
#endif
This is a note to let you know that I've just added the patch titled
bcache: only permit to recovery read error when cache device is clean
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
bcache-only-permit-to-recovery-read-error-when-cache-device-is-clean.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d59b23795933678c9638fd20c942d2b4f3cd6185 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Mon, 30 Oct 2017 14:46:31 -0700
Subject: bcache: only permit to recovery read error when cache device is clean
From: Coly Li <colyli(a)suse.de>
commit d59b23795933678c9638fd20c942d2b4f3cd6185 upstream.
When bcache does read I/Os, for example in writeback or writethrough mode,
if a read request on cache device is failed, bcache will try to recovery
the request by reading from cached device. If the data on cached device is
not synced with cache device, then requester will get a stale data.
For critical storage system like database, providing stale data from
recovery may result an application level data corruption, which is
unacceptible.
With this patch, for a failed read request in writeback or writethrough
mode, recovery a recoverable read request only happens when cache device
is clean. That is to say, all data on cached device is up to update.
For other cache modes in bcache, read request will never hit
cached_dev_read_error(), they don't need this patch.
Please note, because cache mode can be switched arbitrarily in run time, a
writethrough mode might be switched from a writeback mode. Therefore
checking dc->has_data in writethrough mode still makes sense.
Changelog:
V4: Fix parens error pointed by Michael Lyle.
v3: By response from Kent Oversteet, he thinks recovering stale data is a
bug to fix, and option to permit it is unnecessary. So this version
the sysfs file is removed.
v2: rename sysfs entry from allow_stale_data_on_failure to
allow_stale_data_on_failure, and fix the confusing commit log.
v1: initial patch posted.
[small change to patch comment spelling by mlyle]
Signed-off-by: Coly Li <colyli(a)suse.de>
Signed-off-by: Michael Lyle <mlyle(a)lyle.org>
Reported-by: Arne Wolf <awolf(a)lenovo.com>
Reviewed-by: Michael Lyle <mlyle(a)lyle.org>
Cc: Kent Overstreet <kent.overstreet(a)gmail.com>
Cc: Nix <nix(a)esperi.org.uk>
Cc: Kai Krakow <hurikhan77(a)gmail.com>
Cc: Eric Wheeler <bcache(a)lists.ewheeler.net>
Cc: Junhui Tang <tang.junhui(a)zte.com.cn>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/bcache/request.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -702,8 +702,16 @@ static void cached_dev_read_error(struct
{
struct search *s = container_of(cl, struct search, cl);
struct bio *bio = &s->bio.bio;
+ struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
- if (s->recoverable) {
+ /*
+ * If cache device is dirty (dc->has_dirty is non-zero), then
+ * recovery a failed read request from cached device may get a
+ * stale data back. So read failure recovery is only permitted
+ * when cache device is clean.
+ */
+ if (s->recoverable &&
+ (dc && !atomic_read(&dc->has_dirty))) {
/* Retry from the backing device: */
trace_bcache_read_retry(s->orig_bio);
Patches currently in stable-queue which might be from colyli(a)suse.de are
queue-4.9/bcache-only-permit-to-recovery-read-error-when-cache-device-is-clean.patch
queue-4.9/bcache-check-ca-alloc_thread-initialized-before-wake-up-it.patch
This is a note to let you know that I've just added the patch titled
time: Always make sure wall_to_monotonic isn't positive
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
time-always-make-sure-wall_to_monotonic-isn-t-positive.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From e1d7ba8735551ed79c7a0463a042353574b96da3 Mon Sep 17 00:00:00 2001
From: Wang YanQing <udknight(a)gmail.com>
Date: Tue, 23 Jun 2015 18:38:54 +0800
Subject: time: Always make sure wall_to_monotonic isn't positive
From: Wang YanQing <udknight(a)gmail.com>
commit e1d7ba8735551ed79c7a0463a042353574b96da3 upstream.
Two issues were found on an IMX6 development board without an
enabled RTC device(resulting in the boot time and monotonic
time being initialized to 0).
Issue 1:exportfs -a generate:
"exportfs: /opt/nfs/arm does not support NFS export"
Issue 2:cat /proc/stat:
"btime 4294967236"
The same issues can be reproduced on x86 after running the
following code:
int main(void)
{
struct timeval val;
int ret;
val.tv_sec = 0;
val.tv_usec = 0;
ret = settimeofday(&val, NULL);
return 0;
}
Two issues are different symptoms of same problem:
The reason is a positive wall_to_monotonic pushes boot time back
to the time before Epoch, and getboottime will return negative
value.
In symptom 1:
negative boot time cause get_expiry() to overflow time_t
when input expire time is 2147483647, then cache_flush()
always clears entries just added in ip_map_parse.
In symptom 2:
show_stat() uses "unsigned long" to print negative btime
value returned by getboottime.
This patch fix the problem by prohibiting time from being set to a value which
would cause a negative boot time. As a result one can't set the CLOCK_REALTIME
time prior to (1970 + system uptime).
Cc: Prarit Bhargava <prarit(a)redhat.com>
Cc: Richard Cochran <richardcochran(a)gmail.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Wang YanQing <udknight(a)gmail.com>
[jstultz: reworded commit message]
[msfjarvis: Backport to 3.18 as we are missing the do_settimeofday64
function the upstream commit patches, so we apply the changes to
do_settimeofday]
Signed-off-by: John Stultz <john.stultz(a)linaro.org>
Signed-off-by: Harsh Shandilya <msfjarvis(a)gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/time/timekeeping.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -712,6 +712,7 @@ int do_settimeofday(const struct timespe
struct timekeeper *tk = &tk_core.timekeeper;
struct timespec64 ts_delta, xt, tmp;
unsigned long flags;
+ int ret = 0;
if (!timespec_valid_strict(tv))
return -EINVAL;
@@ -725,11 +726,16 @@ int do_settimeofday(const struct timespe
ts_delta.tv_sec = tv->tv_sec - xt.tv_sec;
ts_delta.tv_nsec = tv->tv_nsec - xt.tv_nsec;
+ if (timespec64_compare(&tk->wall_to_monotonic, &ts_delta) > 0) {
+ ret = -EINVAL;
+ goto out;
+ }
+
tk_set_wall_to_mono(tk, timespec64_sub(tk->wall_to_monotonic, ts_delta));
tmp = timespec_to_timespec64(*tv);
tk_set_xtime(tk, &tmp);
-
+out:
timekeeping_update(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);
write_seqcount_end(&tk_core.seq);
@@ -738,7 +744,7 @@ int do_settimeofday(const struct timespe
/* signal hrtimers about time change */
clock_was_set();
- return 0;
+ return ret;
}
EXPORT_SYMBOL(do_settimeofday);
@@ -767,7 +773,8 @@ int timekeeping_inject_offset(struct tim
/* Make sure the proposed value is valid */
tmp = timespec64_add(tk_xtime(tk), ts64);
- if (!timespec64_valid_strict(&tmp)) {
+ if (timespec64_compare(&tk->wall_to_monotonic, &ts64) > 0 ||
+ !timespec64_valid_strict(&tmp)) {
ret = -EINVAL;
goto error;
}
Patches currently in stable-queue which might be from udknight(a)gmail.com are
queue-3.18/time-always-make-sure-wall_to_monotonic-isn-t-positive.patch
From: Wang YanQing <udknight(a)gmail.com>
commit e1d7ba8735551ed79c7a0463a042353574b96da3 upstream.
Two issues were found on an IMX6 development board without an
enabled RTC device(resulting in the boot time and monotonic
time being initialized to 0).
Issue 1:exportfs -a generate:
"exportfs: /opt/nfs/arm does not support NFS export"
Issue 2:cat /proc/stat:
"btime 4294967236"
The same issues can be reproduced on x86 after running the
following code:
int main(void)
{
struct timeval val;
int ret;
val.tv_sec = 0;
val.tv_usec = 0;
ret = settimeofday(&val, NULL);
return 0;
}
Two issues are different symptoms of same problem:
The reason is a positive wall_to_monotonic pushes boot time back
to the time before Epoch, and getboottime will return negative
value.
In symptom 1:
negative boot time cause get_expiry() to overflow time_t
when input expire time is 2147483647, then cache_flush()
always clears entries just added in ip_map_parse.
In symptom 2:
show_stat() uses "unsigned long" to print negative btime
value returned by getboottime.
This patch fix the problem by prohibiting time from being set to a value which
would cause a negative boot time. As a result one can't set the CLOCK_REALTIME
time prior to (1970 + system uptime).
Change-Id: I19acf5df5cc34dd388de0dc633723fe73adc077e
Cc: Prarit Bhargava <prarit(a)redhat.com>
Cc: Richard Cochran <richardcochran(a)gmail.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Wang YanQing <udknight(a)gmail.com>
[jstultz: reworded commit message]
[msfjarvis: Backport to 3.18 as we are missing the do_settimeofday64
function the upstream commit patches, so we apply the changes to
do_settimeofday]
Signed-off-by: John Stultz <john.stultz(a)linaro.org>
Signed-off-by: Harsh Shandilya <msfjarvis(a)gmail.com>
---
kernel/time/timekeeping.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 07b1c31898bd..caece7c8e44c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -713,6 +713,7 @@ int do_settimeofday(const struct timespec *tv)
struct timekeeper *tk = &tk_core.timekeeper;
struct timespec64 ts_delta, xt, tmp;
unsigned long flags;
+ int ret = 0;
if (!timespec_valid_strict(tv))
return -EINVAL;
@@ -726,11 +727,16 @@ int do_settimeofday(const struct timespec *tv)
ts_delta.tv_sec = tv->tv_sec - xt.tv_sec;
ts_delta.tv_nsec = tv->tv_nsec - xt.tv_nsec;
+ if (timespec64_compare(&tk->wall_to_monotonic, &ts_delta) > 0) {
+ ret = -EINVAL;
+ goto out;
+ }
+
tk_set_wall_to_mono(tk, timespec64_sub(tk->wall_to_monotonic, ts_delta));
tmp = timespec_to_timespec64(*tv);
tk_set_xtime(tk, &tmp);
-
+out:
timekeeping_update(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);
write_seqcount_end(&tk_core.seq);
@@ -739,7 +745,7 @@ int do_settimeofday(const struct timespec *tv)
/* signal hrtimers about time change */
clock_was_set();
- return 0;
+ return ret;
}
EXPORT_SYMBOL(do_settimeofday);
@@ -768,7 +774,8 @@ int timekeeping_inject_offset(struct timespec *ts)
/* Make sure the proposed value is valid */
tmp = timespec64_add(tk_xtime(tk), ts64);
- if (!timespec64_valid_strict(&tmp)) {
+ if (timespec64_compare(&tk->wall_to_monotonic, &ts64) > 0 ||
+ !timespec64_valid_strict(&tmp)) {
ret = -EINVAL;
goto error;
}
--
2.15.0.631.g7ddcec0
This is a note to let you know that I've just added the patch titled
svcrdma: Preserve CB send buffer across retransmits
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
svcrdma-preserve-cb-send-buffer-across-retransmits.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 0bad47cada5defba13e98827d22d06f13258dfb3 Mon Sep 17 00:00:00 2001
From: Chuck Lever <chuck.lever(a)oracle.com>
Date: Mon, 16 Oct 2017 12:14:33 -0400
Subject: svcrdma: Preserve CB send buffer across retransmits
From: Chuck Lever <chuck.lever(a)oracle.com>
commit 0bad47cada5defba13e98827d22d06f13258dfb3 upstream.
During each NFSv4 callback Call, an RDMA Send completion frees the
page that contains the RPC Call message. If the upper layer
determines that a retransmit is necessary, this is too soon.
One possible symptom: after a GARBAGE_ARGS response an NFSv4.1
callback request, the following BUG fires on the NFS server:
kernel: BUG: Bad page state in process kworker/0:2H pfn:7d3ce2
kernel: page:ffffea001f4f3880 count:-2 mapcount:0 mapping: (null) index:0x0
kernel: flags: 0x2fffff80000000()
kernel: raw: 002fffff80000000 0000000000000000 0000000000000000 fffffffeffffffff
kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
kernel: page dumped because: nonzero _refcount
kernel: Modules linked in: cts rpcsec_gss_krb5 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
ocfs2_nodemanager ocfs2_stackglue rpcrdm a ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
rdma_cm ib_cm iw_cm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
kvm irqbypass crct10dif_pc lmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt
iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd pcspkr lpc_ich i2c_i801
mei_me mf d_core mei raid0 sg wmi ioatdma ipmi_si ipmi_devintf ipmi_msghandler shpchp
acpi_power_meter acpi_pad nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs
libcrc32c mlx4_en mlx4_ib mlx5_ib ib_core sd_mod sr_mod cdrom ast drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crc32c_intel libahci drm
mlx5_core igb libata mlx4_core dca i2c_algo_bit i2c_core nvme
kernel: ptp nvme_core pps_core dm_mirror dm_region_hash dm_log dm_mod dax
kernel: CPU: 0 PID: 11495 Comm: kworker/0:2H Not tainted 4.14.0-rc3-00001-g577ce48 #811
kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
kernel: Call Trace:
kernel: dump_stack+0x62/0x80
kernel: bad_page+0xfe/0x11a
kernel: free_pages_check_bad+0x76/0x78
kernel: free_pcppages_bulk+0x364/0x441
kernel: ? ttwu_do_activate.isra.61+0x71/0x78
kernel: free_hot_cold_page+0x1c5/0x202
kernel: __put_page+0x2c/0x36
kernel: svc_rdma_put_context+0xd9/0xe4 [rpcrdma]
kernel: svc_rdma_wc_send+0x50/0x98 [rpcrdma]
This issue exists all the way back to v4.5, but refactoring and code
re-organization prevents this simple patch from applying to kernels
older than v4.12. The fix is the same, however, if someone needs to
backport it.
Reported-by: Ben Coddington <bcodding(a)redhat.com>
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=314
Fixes: 5d252f90a800 ('svcrdma: Add class for RDMA backwards ... ')
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
Reviewed-by: Jeff Layton <jlayton(a)redhat.com>
Signed-off-by: J. Bruce Fields <bfields(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -133,6 +133,10 @@ static int svc_rdma_bc_sendto(struct svc
if (ret)
goto out_err;
+ /* Bump page refcnt so Send completion doesn't release
+ * the rq_buffer before all retransmits are complete.
+ */
+ get_page(virt_to_page(rqst->rq_buffer));
ret = svc_rdma_post_send_wr(rdma, ctxt, 1, 0);
if (ret)
goto out_unmap;
@@ -165,7 +169,6 @@ xprt_rdma_bc_allocate(struct rpc_task *t
return -EINVAL;
}
- /* svc_rdma_sendto releases this page */
page = alloc_page(RPCRDMA_DEF_GFP);
if (!page)
return -ENOMEM;
@@ -184,6 +187,7 @@ xprt_rdma_bc_free(struct rpc_task *task)
{
struct rpc_rqst *rqst = task->tk_rqstp;
+ put_page(virt_to_page(rqst->rq_buffer));
kfree(rqst->rq_rbuffer);
}
Patches currently in stable-queue which might be from chuck.lever(a)oracle.com are
queue-4.14/nfs-fix-ugly-referral-attributes.patch
queue-4.14/svcrdma-preserve-cb-send-buffer-across-retransmits.patch
This is a note to let you know that I've just added the patch titled
SUNRPC: Fix tracepoint storage issues with svc_recv and svc_rqst_status
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
sunrpc-fix-tracepoint-storage-issues-with-svc_recv-and-svc_rqst_status.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From e9d4bf219c83d09579bc62512fea2ca10f025d93 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust(a)primarydata.com>
Date: Tue, 10 Oct 2017 17:31:42 -0400
Subject: SUNRPC: Fix tracepoint storage issues with svc_recv and svc_rqst_status
From: Trond Myklebust <trond.myklebust(a)primarydata.com>
commit e9d4bf219c83d09579bc62512fea2ca10f025d93 upstream.
There is no guarantee that either the request or the svc_xprt exist
by the time we get round to printing the trace message.
Signed-off-by: Trond Myklebust <trond.myklebust(a)primarydata.com>
Signed-off-by: J. Bruce Fields <bfields(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/trace/events/sunrpc.h | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -456,20 +456,22 @@ TRACE_EVENT(svc_recv,
TP_ARGS(rqst, status),
TP_STRUCT__entry(
- __field(struct sockaddr *, addr)
__field(__be32, xid)
__field(int, status)
__field(unsigned long, flags)
+ __dynamic_array(unsigned char, addr, rqst->rq_addrlen)
),
TP_fast_assign(
- __entry->addr = (struct sockaddr *)&rqst->rq_addr;
__entry->xid = status > 0 ? rqst->rq_xid : 0;
__entry->status = status;
__entry->flags = rqst->rq_flags;
+ memcpy(__get_dynamic_array(addr),
+ &rqst->rq_addr, rqst->rq_addrlen);
),
- TP_printk("addr=%pIScp xid=0x%x status=%d flags=%s", __entry->addr,
+ TP_printk("addr=%pIScp xid=0x%x status=%d flags=%s",
+ (struct sockaddr *)__get_dynamic_array(addr),
be32_to_cpu(__entry->xid), __entry->status,
show_rqstp_flags(__entry->flags))
);
@@ -514,22 +516,23 @@ DECLARE_EVENT_CLASS(svc_rqst_status,
TP_ARGS(rqst, status),
TP_STRUCT__entry(
- __field(struct sockaddr *, addr)
__field(__be32, xid)
- __field(int, dropme)
__field(int, status)
__field(unsigned long, flags)
+ __dynamic_array(unsigned char, addr, rqst->rq_addrlen)
),
TP_fast_assign(
- __entry->addr = (struct sockaddr *)&rqst->rq_addr;
__entry->xid = rqst->rq_xid;
__entry->status = status;
__entry->flags = rqst->rq_flags;
+ memcpy(__get_dynamic_array(addr),
+ &rqst->rq_addr, rqst->rq_addrlen);
),
TP_printk("addr=%pIScp rq_xid=0x%x status=%d flags=%s",
- __entry->addr, be32_to_cpu(__entry->xid),
+ (struct sockaddr *)__get_dynamic_array(addr),
+ be32_to_cpu(__entry->xid),
__entry->status, show_rqstp_flags(__entry->flags))
);
Patches currently in stable-queue which might be from trond.myklebust(a)primarydata.com are
queue-4.14/nfsd-deal-with-revoked-delegations-appropriately.patch
queue-4.14/sunrpc-fix-tracepoint-storage-issues-with-svc_recv-and-svc_rqst_status.patch
This is a note to let you know that I've just added the patch titled
spi-nor: intel-spi: Fix broken software sequencing codes
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
spi-nor-intel-spi-fix-broken-software-sequencing-codes.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 9d63f17661e25fd28714dac94bdebc4ff5b75f09 Mon Sep 17 00:00:00 2001
From: Bin Meng <bmeng.cn(a)gmail.com>
Date: Mon, 11 Sep 2017 02:41:53 -0700
Subject: spi-nor: intel-spi: Fix broken software sequencing codes
From: Bin Meng <bmeng.cn(a)gmail.com>
commit 9d63f17661e25fd28714dac94bdebc4ff5b75f09 upstream.
There are two bugs in current intel_spi_sw_cycle():
- The 'data byte count' field should be the number of bytes
transferred minus 1
- SSFSTS_CTL is the offset from ispi->sregs, not ispi->base
Signed-off-by: Bin Meng <bmeng.cn(a)gmail.com>
Acked-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Signed-off-by: Cyrille Pitchen <cyrille.pitchen(a)wedev4u.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/mtd/spi-nor/intel-spi.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/mtd/spi-nor/intel-spi.c
+++ b/drivers/mtd/spi-nor/intel-spi.c
@@ -422,7 +422,7 @@ static int intel_spi_sw_cycle(struct int
if (ret < 0)
return ret;
- val = (len << SSFSTS_CTL_DBC_SHIFT) | SSFSTS_CTL_DS;
+ val = ((len - 1) << SSFSTS_CTL_DBC_SHIFT) | SSFSTS_CTL_DS;
val |= ret << SSFSTS_CTL_COP_SHIFT;
val |= SSFSTS_CTL_FCERR | SSFSTS_CTL_FDONE;
val |= SSFSTS_CTL_SCGO;
@@ -432,7 +432,7 @@ static int intel_spi_sw_cycle(struct int
if (ret)
return ret;
- status = readl(ispi->base + SSFSTS_CTL);
+ status = readl(ispi->sregs + SSFSTS_CTL);
if (status & SSFSTS_CTL_FCERR)
return -EIO;
else if (status & SSFSTS_CTL_AEL)
Patches currently in stable-queue which might be from bmeng.cn(a)gmail.com are
queue-4.14/spi-nor-intel-spi-fix-broken-software-sequencing-codes.patch
This is a note to let you know that I've just added the patch titled
NFC: fix device-allocation error return
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
nfc-fix-device-allocation-error-return.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From c45e3e4c5b134b081e8af362109905427967eb19 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Sun, 9 Jul 2017 13:08:58 +0200
Subject: NFC: fix device-allocation error return
From: Johan Hovold <johan(a)kernel.org>
commit c45e3e4c5b134b081e8af362109905427967eb19 upstream.
A recent change fixing NFC device allocation itself introduced an
error-handling bug by returning an error pointer in case device-id
allocation failed. This is clearly broken as the callers still expected
NULL to be returned on errors as detected by Dan's static checker.
Fix this up by returning NULL in the event that we've run out of memory
when allocating a new device id.
Note that the offending commit is marked for stable (3.8) so this fix
needs to be backported along with it.
Fixes: 20777bc57c34 ("NFC: fix broken device allocation")
Reported-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Signed-off-by: Samuel Ortiz <sameo(a)linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/nfc/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/nfc/core.c
+++ b/net/nfc/core.c
@@ -1106,7 +1106,7 @@ struct nfc_dev *nfc_allocate_device(stru
err_free_dev:
kfree(dev);
- return ERR_PTR(rc);
+ return NULL;
}
EXPORT_SYMBOL(nfc_allocate_device);
Patches currently in stable-queue which might be from johan(a)kernel.org are
queue-4.14/serdev-fix-registration-of-second-slave.patch
queue-4.14/clk-ti-dra7-atl-clock-fix-child-node-lookups.patch
queue-4.14/irqchip-gic-v3-fix-ppi-partitions-lookup.patch
queue-4.14/nfc-fix-device-allocation-error-return.patch
This is a note to let you know that I've just added the patch titled
libnvdimm, region : make 'resource' attribute only readable by root
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
libnvdimm-region-make-resource-attribute-only-readable-by-root.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From b8ff981f88df03c72a4de2f6eaa9ce447a10ac03 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Tue, 26 Sep 2017 11:17:52 -0700
Subject: libnvdimm, region : make 'resource' attribute only readable by root
From: Dan Williams <dan.j.williams(a)intel.com>
commit b8ff981f88df03c72a4de2f6eaa9ce447a10ac03 upstream.
For the same reason that /proc/iomem returns 0's for non-root readers
and acpi tables are root-only, make the 'resource' attribute for region
devices only readable by root. Otherwise we disclose physical address
information.
Fixes: 802f4be6feee ("libnvdimm: Add 'resource' sysfs attribute to regions")
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Johannes Thumshirn <jthumshirn(a)suse.de>
Reported-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/nvdimm/region_devs.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -562,8 +562,12 @@ static umode_t region_visible(struct kob
if (!is_nd_pmem(dev) && a == &dev_attr_badblocks.attr)
return 0;
- if (!is_nd_pmem(dev) && a == &dev_attr_resource.attr)
- return 0;
+ if (a == &dev_attr_resource.attr) {
+ if (is_nd_pmem(dev))
+ return 0400;
+ else
+ return 0;
+ }
if (a == &dev_attr_deep_flush.attr) {
int has_flush = nvdimm_has_flush(nd_region);
Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are
queue-4.14/libnvdimm-dimm-clear-locked-status-on-successful-dimm-enable.patch
queue-4.14/libnvdimm-pfn-make-resource-attribute-only-readable-by-root.patch
queue-4.14/libnvdimm-region-make-resource-attribute-only-readable-by-root.patch
queue-4.14/dax-fix-pmd-faults-on-zero-length-files.patch
queue-4.14/libnvdimm-namespace-make-resource-attribute-only-readable-by-root.patch
queue-4.14/dax-fix-general-protection-fault-in-dax_alloc_inode.patch
queue-4.14/libnvdimm-namespace-fix-label-initialization-to-use-valid-seq-numbers.patch
This is a note to let you know that I've just added the patch titled
libnvdimm, pfn: make 'resource' attribute only readable by root
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
libnvdimm-pfn-make-resource-attribute-only-readable-by-root.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 26417ae4fc6108f8db436f24108b08f68bdc520e Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Tue, 26 Sep 2017 13:07:06 -0700
Subject: libnvdimm, pfn: make 'resource' attribute only readable by root
From: Dan Williams <dan.j.williams(a)intel.com>
commit 26417ae4fc6108f8db436f24108b08f68bdc520e upstream.
For the same reason that /proc/iomem returns 0's for non-root readers
and acpi tables are root-only, make the 'resource' attribute for pfn
devices only readable by root. Otherwise we disclose physical address
information.
Fixes: f6ed58c70d14 ("libnvdimm, pfn: 'resource'-address and 'size'...")
Reported-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/nvdimm/pfn_devs.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -282,8 +282,16 @@ static struct attribute *nd_pfn_attribut
NULL,
};
+static umode_t pfn_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+ if (a == &dev_attr_resource.attr)
+ return 0400;
+ return a->mode;
+}
+
struct attribute_group nd_pfn_attribute_group = {
.attrs = nd_pfn_attributes,
+ .is_visible = pfn_visible,
};
static const struct attribute_group *nd_pfn_attribute_groups[] = {
Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are
queue-4.14/libnvdimm-dimm-clear-locked-status-on-successful-dimm-enable.patch
queue-4.14/libnvdimm-pfn-make-resource-attribute-only-readable-by-root.patch
queue-4.14/libnvdimm-region-make-resource-attribute-only-readable-by-root.patch
queue-4.14/dax-fix-pmd-faults-on-zero-length-files.patch
queue-4.14/libnvdimm-namespace-make-resource-attribute-only-readable-by-root.patch
queue-4.14/dax-fix-general-protection-fault-in-dax_alloc_inode.patch
queue-4.14/libnvdimm-namespace-fix-label-initialization-to-use-valid-seq-numbers.patch
This is a note to let you know that I've just added the patch titled
libnvdimm, namespace: make 'resource' attribute only readable by root
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
libnvdimm-namespace-make-resource-attribute-only-readable-by-root.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From c1fb3542074fd0c4d901d778bd52455111e4eb6f Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Tue, 26 Sep 2017 11:21:24 -0700
Subject: libnvdimm, namespace: make 'resource' attribute only readable by root
From: Dan Williams <dan.j.williams(a)intel.com>
commit c1fb3542074fd0c4d901d778bd52455111e4eb6f upstream.
For the same reason that /proc/iomem returns 0's for non-root readers
and acpi tables are root-only, make the 'resource' attribute for
namespace devices only readable by root. Otherwise we disclose physical
address information.
Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation")
Reported-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/nvdimm/namespace_devs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -1620,7 +1620,7 @@ static umode_t namespace_visible(struct
if (a == &dev_attr_resource.attr) {
if (is_namespace_blk(dev))
return 0;
- return a->mode;
+ return 0400;
}
if (is_namespace_pmem(dev) || is_namespace_blk(dev)) {
Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are
queue-4.14/libnvdimm-dimm-clear-locked-status-on-successful-dimm-enable.patch
queue-4.14/libnvdimm-pfn-make-resource-attribute-only-readable-by-root.patch
queue-4.14/libnvdimm-region-make-resource-attribute-only-readable-by-root.patch
queue-4.14/dax-fix-pmd-faults-on-zero-length-files.patch
queue-4.14/libnvdimm-namespace-make-resource-attribute-only-readable-by-root.patch
queue-4.14/dax-fix-general-protection-fault-in-dax_alloc_inode.patch
queue-4.14/libnvdimm-namespace-fix-label-initialization-to-use-valid-seq-numbers.patch
This is a note to let you know that I've just added the patch titled
libnvdimm, namespace: fix label initialization to use valid seq numbers
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
libnvdimm-namespace-fix-label-initialization-to-use-valid-seq-numbers.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From b18d4b8a25af6fe83d7692191d6ff962ea611c4f Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Tue, 26 Sep 2017 11:41:28 -0700
Subject: libnvdimm, namespace: fix label initialization to use valid seq numbers
From: Dan Williams <dan.j.williams(a)intel.com>
commit b18d4b8a25af6fe83d7692191d6ff962ea611c4f upstream.
The set of valid sequence numbers is {1,2,3}. The specification
indicates that an implementation should consider 0 a sign of a critical
error:
UEFI 2.7: 13.19 NVDIMM Label Protocol
Software never writes the sequence number 00, so a correctly
check-summed Index Block with this sequence number probably indicates a
critical error. When software discovers this case it treats it as an
invalid Index Block indication.
While the expectation is that the invalid block is just thrown away, the
Robustness Principle says we should fix this to make both sequence
numbers valid.
Fixes: f524bf271a5c ("libnvdimm: write pmem label set")
Reported-by: Juston Li <juston.li(a)intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/nvdimm/label.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/nvdimm/label.c
+++ b/drivers/nvdimm/label.c
@@ -1050,7 +1050,7 @@ static int init_labels(struct nd_mapping
nsindex = to_namespace_index(ndd, 0);
memset(nsindex, 0, ndd->nsarea.config_size);
for (i = 0; i < 2; i++) {
- int rc = nd_label_write_index(ndd, i, i*2, ND_NSINDEX_INIT);
+ int rc = nd_label_write_index(ndd, i, 3 - i, ND_NSINDEX_INIT);
if (rc)
return rc;
Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are
queue-4.14/libnvdimm-dimm-clear-locked-status-on-successful-dimm-enable.patch
queue-4.14/libnvdimm-pfn-make-resource-attribute-only-readable-by-root.patch
queue-4.14/libnvdimm-region-make-resource-attribute-only-readable-by-root.patch
queue-4.14/dax-fix-pmd-faults-on-zero-length-files.patch
queue-4.14/libnvdimm-namespace-make-resource-attribute-only-readable-by-root.patch
queue-4.14/dax-fix-general-protection-fault-in-dax_alloc_inode.patch
queue-4.14/libnvdimm-namespace-fix-label-initialization-to-use-valid-seq-numbers.patch
This is a note to let you know that I've just added the patch titled
libnvdimm, dimm: clear 'locked' status on successful DIMM enable
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
libnvdimm-dimm-clear-locked-status-on-successful-dimm-enable.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d34cb808402898e53b9a9bcbbedd01667a78723b Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Mon, 25 Sep 2017 11:01:31 -0700
Subject: libnvdimm, dimm: clear 'locked' status on successful DIMM enable
From: Dan Williams <dan.j.williams(a)intel.com>
commit d34cb808402898e53b9a9bcbbedd01667a78723b upstream.
If we successfully enable a DIMM then it must not be locked and we can
clear the label-read failure condition. Otherwise, we need to reload the
entire bus provider driver to achieve the same effect, and that can
disrupt unrelated DIMMs and namespaces.
Fixes: 9d62ed965118 ("libnvdimm: handle locked label storage areas")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/nvdimm/dimm.c | 1 +
drivers/nvdimm/dimm_devs.c | 7 +++++++
drivers/nvdimm/nd.h | 1 +
3 files changed, 9 insertions(+)
--- a/drivers/nvdimm/dimm.c
+++ b/drivers/nvdimm/dimm.c
@@ -68,6 +68,7 @@ static int nvdimm_probe(struct device *d
rc = nd_label_reserve_dpa(ndd);
if (ndd->ns_current >= 0)
nvdimm_set_aliasing(dev);
+ nvdimm_clear_locked(dev);
nvdimm_bus_unlock(dev);
if (rc)
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -200,6 +200,13 @@ void nvdimm_set_locked(struct device *de
set_bit(NDD_LOCKED, &nvdimm->flags);
}
+void nvdimm_clear_locked(struct device *dev)
+{
+ struct nvdimm *nvdimm = to_nvdimm(dev);
+
+ clear_bit(NDD_LOCKED, &nvdimm->flags);
+}
+
static void nvdimm_release(struct device *dev)
{
struct nvdimm *nvdimm = to_nvdimm(dev);
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -254,6 +254,7 @@ long nvdimm_clear_poison(struct device *
unsigned int len);
void nvdimm_set_aliasing(struct device *dev);
void nvdimm_set_locked(struct device *dev);
+void nvdimm_clear_locked(struct device *dev);
struct nd_btt *to_nd_btt(struct device *dev);
struct nd_gen_sb {
Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are
queue-4.14/libnvdimm-dimm-clear-locked-status-on-successful-dimm-enable.patch
queue-4.14/libnvdimm-pfn-make-resource-attribute-only-readable-by-root.patch
queue-4.14/libnvdimm-region-make-resource-attribute-only-readable-by-root.patch
queue-4.14/dax-fix-pmd-faults-on-zero-length-files.patch
queue-4.14/libnvdimm-namespace-make-resource-attribute-only-readable-by-root.patch
queue-4.14/dax-fix-general-protection-fault-in-dax_alloc_inode.patch
queue-4.14/libnvdimm-namespace-fix-label-initialization-to-use-valid-seq-numbers.patch
This is a note to let you know that I've just added the patch titled
kvm: vmx: Reinstate support for CPUs without virtual NMI
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kvm-vmx-reinstate-support-for-cpus-without-virtual-nmi.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 8a1b43922d0d1279e7936ba85c4c2a870403c95f Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Mon, 6 Nov 2017 13:31:12 +0100
Subject: kvm: vmx: Reinstate support for CPUs without virtual NMI
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Paolo Bonzini <pbonzini(a)redhat.com>
commit 8a1b43922d0d1279e7936ba85c4c2a870403c95f upstream.
This is more or less a revert of commit 2c82878b0cb3 ("KVM: VMX: require
virtual NMI support", 2017-03-27); it turns out that Core 2 Duo machines
only had virtual NMIs in some SKUs.
The revert is not trivial because in the meanwhile there have been several
fixes to nested NMI injection. Therefore, the entire vNMI state is moved
to struct loaded_vmcs.
Another change compared to before the patch is a simplification here:
if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked &&
!(is_guest_mode(vcpu) && nested_cpu_has_virtual_nmis(
get_vmcs12(vcpu))))) {
The final condition here is always true (because nested_cpu_has_virtual_nmis
is always false) and is removed.
Fixes: 2c82878b0cb38fd516fd612c67852a6bbf282003
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1490803
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kvm/vmx.c | 150 +++++++++++++++++++++++++++++++++++++----------------
1 file changed, 106 insertions(+), 44 deletions(-)
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -202,6 +202,10 @@ struct loaded_vmcs {
bool nmi_known_unmasked;
unsigned long vmcs_host_cr3; /* May not match real cr3 */
unsigned long vmcs_host_cr4; /* May not match real cr4 */
+ /* Support for vnmi-less CPUs */
+ int soft_vnmi_blocked;
+ ktime_t entry_time;
+ s64 vnmi_blocked_time;
struct list_head loaded_vmcss_on_cpu_link;
};
@@ -1286,6 +1290,11 @@ static inline bool cpu_has_vmx_invpcid(v
SECONDARY_EXEC_ENABLE_INVPCID;
}
+static inline bool cpu_has_virtual_nmis(void)
+{
+ return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS;
+}
+
static inline bool cpu_has_vmx_wbinvd_exit(void)
{
return vmcs_config.cpu_based_2nd_exec_ctrl &
@@ -1343,11 +1352,6 @@ static inline bool nested_cpu_has2(struc
(vmcs12->secondary_vm_exec_control & bit);
}
-static inline bool nested_cpu_has_virtual_nmis(struct vmcs12 *vmcs12)
-{
- return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS;
-}
-
static inline bool nested_cpu_has_preemption_timer(struct vmcs12 *vmcs12)
{
return vmcs12->pin_based_vm_exec_control &
@@ -3699,9 +3703,9 @@ static __init int setup_vmcs_config(stru
&_vmexit_control) < 0)
return -EIO;
- min = PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING |
- PIN_BASED_VIRTUAL_NMIS;
- opt = PIN_BASED_POSTED_INTR | PIN_BASED_VMX_PREEMPTION_TIMER;
+ min = PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING;
+ opt = PIN_BASED_VIRTUAL_NMIS | PIN_BASED_POSTED_INTR |
+ PIN_BASED_VMX_PREEMPTION_TIMER;
if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PINBASED_CTLS,
&_pin_based_exec_control) < 0)
return -EIO;
@@ -5667,7 +5671,8 @@ static void enable_irq_window(struct kvm
static void enable_nmi_window(struct kvm_vcpu *vcpu)
{
- if (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & GUEST_INTR_STATE_STI) {
+ if (!cpu_has_virtual_nmis() ||
+ vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & GUEST_INTR_STATE_STI) {
enable_irq_window(vcpu);
return;
}
@@ -5707,6 +5712,19 @@ static void vmx_inject_nmi(struct kvm_vc
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
+ if (!cpu_has_virtual_nmis()) {
+ /*
+ * Tracking the NMI-blocked state in software is built upon
+ * finding the next open IRQ window. This, in turn, depends on
+ * well-behaving guests: They have to keep IRQs disabled at
+ * least as long as the NMI handler runs. Otherwise we may
+ * cause NMI nesting, maybe breaking the guest. But as this is
+ * highly unlikely, we can live with the residual risk.
+ */
+ vmx->loaded_vmcs->soft_vnmi_blocked = 1;
+ vmx->loaded_vmcs->vnmi_blocked_time = 0;
+ }
+
++vcpu->stat.nmi_injections;
vmx->loaded_vmcs->nmi_known_unmasked = false;
@@ -5725,6 +5743,8 @@ static bool vmx_get_nmi_mask(struct kvm_
struct vcpu_vmx *vmx = to_vmx(vcpu);
bool masked;
+ if (!cpu_has_virtual_nmis())
+ return vmx->loaded_vmcs->soft_vnmi_blocked;
if (vmx->loaded_vmcs->nmi_known_unmasked)
return false;
masked = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & GUEST_INTR_STATE_NMI;
@@ -5736,13 +5756,20 @@ static void vmx_set_nmi_mask(struct kvm_
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
- vmx->loaded_vmcs->nmi_known_unmasked = !masked;
- if (masked)
- vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
- GUEST_INTR_STATE_NMI);
- else
- vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO,
- GUEST_INTR_STATE_NMI);
+ if (!cpu_has_virtual_nmis()) {
+ if (vmx->loaded_vmcs->soft_vnmi_blocked != masked) {
+ vmx->loaded_vmcs->soft_vnmi_blocked = masked;
+ vmx->loaded_vmcs->vnmi_blocked_time = 0;
+ }
+ } else {
+ vmx->loaded_vmcs->nmi_known_unmasked = !masked;
+ if (masked)
+ vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
+ GUEST_INTR_STATE_NMI);
+ else
+ vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO,
+ GUEST_INTR_STATE_NMI);
+ }
}
static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
@@ -5750,6 +5777,10 @@ static int vmx_nmi_allowed(struct kvm_vc
if (to_vmx(vcpu)->nested.nested_run_pending)
return 0;
+ if (!cpu_has_virtual_nmis() &&
+ to_vmx(vcpu)->loaded_vmcs->soft_vnmi_blocked)
+ return 0;
+
return !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
(GUEST_INTR_STATE_MOV_SS | GUEST_INTR_STATE_STI
| GUEST_INTR_STATE_NMI));
@@ -6478,6 +6509,7 @@ static int handle_ept_violation(struct k
* AAK134, BY25.
*/
if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
+ cpu_has_virtual_nmis() &&
(exit_qualification & INTR_INFO_UNBLOCK_NMI))
vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI);
@@ -6961,7 +6993,7 @@ static struct loaded_vmcs *nested_get_cu
}
/* Create a new VMCS */
- item = kmalloc(sizeof(struct vmcs02_list), GFP_KERNEL);
+ item = kzalloc(sizeof(struct vmcs02_list), GFP_KERNEL);
if (!item)
return NULL;
item->vmcs02.vmcs = alloc_vmcs();
@@ -7978,6 +8010,7 @@ static int handle_pml_full(struct kvm_vc
* "blocked by NMI" bit has to be set before next VM entry.
*/
if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
+ cpu_has_virtual_nmis() &&
(exit_qualification & INTR_INFO_UNBLOCK_NMI))
vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
GUEST_INTR_STATE_NMI);
@@ -8822,6 +8855,25 @@ static int vmx_handle_exit(struct kvm_vc
return 0;
}
+ if (unlikely(!cpu_has_virtual_nmis() &&
+ vmx->loaded_vmcs->soft_vnmi_blocked)) {
+ if (vmx_interrupt_allowed(vcpu)) {
+ vmx->loaded_vmcs->soft_vnmi_blocked = 0;
+ } else if (vmx->loaded_vmcs->vnmi_blocked_time > 1000000000LL &&
+ vcpu->arch.nmi_pending) {
+ /*
+ * This CPU don't support us in finding the end of an
+ * NMI-blocked window if the guest runs with IRQs
+ * disabled. So we pull the trigger after 1 s of
+ * futile waiting, but inform the user about this.
+ */
+ printk(KERN_WARNING "%s: Breaking out of NMI-blocked "
+ "state on VCPU %d after 1 s timeout\n",
+ __func__, vcpu->vcpu_id);
+ vmx->loaded_vmcs->soft_vnmi_blocked = 0;
+ }
+ }
+
if (exit_reason < kvm_vmx_max_exit_handlers
&& kvm_vmx_exit_handlers[exit_reason])
return kvm_vmx_exit_handlers[exit_reason](vcpu);
@@ -9104,33 +9156,38 @@ static void vmx_recover_nmi_blocking(str
idtv_info_valid = vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK;
- if (vmx->loaded_vmcs->nmi_known_unmasked)
- return;
- /*
- * Can't use vmx->exit_intr_info since we're not sure what
- * the exit reason is.
- */
- exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
- unblock_nmi = (exit_intr_info & INTR_INFO_UNBLOCK_NMI) != 0;
- vector = exit_intr_info & INTR_INFO_VECTOR_MASK;
- /*
- * SDM 3: 27.7.1.2 (September 2008)
- * Re-set bit "block by NMI" before VM entry if vmexit caused by
- * a guest IRET fault.
- * SDM 3: 23.2.2 (September 2008)
- * Bit 12 is undefined in any of the following cases:
- * If the VM exit sets the valid bit in the IDT-vectoring
- * information field.
- * If the VM exit is due to a double fault.
- */
- if ((exit_intr_info & INTR_INFO_VALID_MASK) && unblock_nmi &&
- vector != DF_VECTOR && !idtv_info_valid)
- vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
- GUEST_INTR_STATE_NMI);
- else
- vmx->loaded_vmcs->nmi_known_unmasked =
- !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO)
- & GUEST_INTR_STATE_NMI);
+ if (cpu_has_virtual_nmis()) {
+ if (vmx->loaded_vmcs->nmi_known_unmasked)
+ return;
+ /*
+ * Can't use vmx->exit_intr_info since we're not sure what
+ * the exit reason is.
+ */
+ exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+ unblock_nmi = (exit_intr_info & INTR_INFO_UNBLOCK_NMI) != 0;
+ vector = exit_intr_info & INTR_INFO_VECTOR_MASK;
+ /*
+ * SDM 3: 27.7.1.2 (September 2008)
+ * Re-set bit "block by NMI" before VM entry if vmexit caused by
+ * a guest IRET fault.
+ * SDM 3: 23.2.2 (September 2008)
+ * Bit 12 is undefined in any of the following cases:
+ * If the VM exit sets the valid bit in the IDT-vectoring
+ * information field.
+ * If the VM exit is due to a double fault.
+ */
+ if ((exit_intr_info & INTR_INFO_VALID_MASK) && unblock_nmi &&
+ vector != DF_VECTOR && !idtv_info_valid)
+ vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
+ GUEST_INTR_STATE_NMI);
+ else
+ vmx->loaded_vmcs->nmi_known_unmasked =
+ !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO)
+ & GUEST_INTR_STATE_NMI);
+ } else if (unlikely(vmx->loaded_vmcs->soft_vnmi_blocked))
+ vmx->loaded_vmcs->vnmi_blocked_time +=
+ ktime_to_ns(ktime_sub(ktime_get(),
+ vmx->loaded_vmcs->entry_time));
}
static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
@@ -9247,6 +9304,11 @@ static void __noclone vmx_vcpu_run(struc
struct vcpu_vmx *vmx = to_vmx(vcpu);
unsigned long debugctlmsr, cr3, cr4;
+ /* Record the guest's net vcpu time for enforced NMI injections. */
+ if (unlikely(!cpu_has_virtual_nmis() &&
+ vmx->loaded_vmcs->soft_vnmi_blocked))
+ vmx->loaded_vmcs->entry_time = ktime_get();
+
/* Don't enter VMX if guest state is invalid, let the exit handler
start emulation until we arrive back to a valid state */
if (vmx->emulation_required)
Patches currently in stable-queue which might be from pbonzini(a)redhat.com are
queue-4.14/kvm-nvmx-set-idtr-and-gdtr-limits-when-loading-l1-host-state.patch
queue-4.14/kvm-svm-obey-guest-pat.patch
queue-4.14/kvm-vmx-reinstate-support-for-cpus-without-virtual-nmi.patch
This is a note to let you know that I've just added the patch titled
KVM: SVM: obey guest PAT
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kvm-svm-obey-guest-pat.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 15038e14724799b8c205beb5f20f9e54896013c3 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Thu, 26 Oct 2017 09:13:27 +0200
Subject: KVM: SVM: obey guest PAT
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Paolo Bonzini <pbonzini(a)redhat.com>
commit 15038e14724799b8c205beb5f20f9e54896013c3 upstream.
For many years some users of assigned devices have reported worse
performance on AMD processors with NPT than on AMD without NPT,
Intel or bare metal.
The reason turned out to be that SVM is discarding the guest PAT
setting and uses the default (PA0=PA4=WB, PA1=PA5=WT, PA2=PA6=UC-,
PA3=UC). The guest might be using a different setting, and
especially might want write combining but isn't getting it
(instead getting slow UC or UC- accesses).
Thanks a lot to geoff(a)hostfission.com for noticing the relation
to the g_pat setting. The patch has been tested also by a bunch
of people on VFIO users forums.
Fixes: 709ddebf81cb40e3c36c6109a7892e8b93a09464
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=196409
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Tested-by: Nick Sarnie <commendsarnex(a)gmail.com>
Signed-off-by: Radim Krčmář <rkrcmar(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kvm/svm.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3657,6 +3657,13 @@ static int svm_set_msr(struct kvm_vcpu *
u32 ecx = msr->index;
u64 data = msr->data;
switch (ecx) {
+ case MSR_IA32_CR_PAT:
+ if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
+ return 1;
+ vcpu->arch.pat = data;
+ svm->vmcb->save.g_pat = data;
+ mark_dirty(svm->vmcb, VMCB_NPT);
+ break;
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr);
break;
Patches currently in stable-queue which might be from pbonzini(a)redhat.com are
queue-4.14/kvm-nvmx-set-idtr-and-gdtr-limits-when-loading-l1-host-state.patch
queue-4.14/kvm-svm-obey-guest-pat.patch
queue-4.14/kvm-vmx-reinstate-support-for-cpus-without-virtual-nmi.patch
This is a note to let you know that I've just added the patch titled
KVM: PPC: Book3S HV: Don't call real-mode XICS hypercall handlers if not enabled
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kvm-ppc-book3s-hv-don-t-call-real-mode-xics-hypercall-handlers-if-not-enabled.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 00bb6ae5006205e041ce9784c819460562351d47 Mon Sep 17 00:00:00 2001
From: Paul Mackerras <paulus(a)ozlabs.org>
Date: Thu, 26 Oct 2017 17:00:22 +1100
Subject: KVM: PPC: Book3S HV: Don't call real-mode XICS hypercall handlers if not enabled
From: Paul Mackerras <paulus(a)ozlabs.org>
commit 00bb6ae5006205e041ce9784c819460562351d47 upstream.
When running a guest on a POWER9 system with the in-kernel XICS
emulation disabled (for example by running QEMU with the parameter
"-machine pseries,kernel_irqchip=off"), the kernel does not pass
the XICS-related hypercalls such as H_CPPR up to userspace for
emulation there as it should.
The reason for this is that the real-mode handlers for these
hypercalls don't check whether a XICS device has been instantiated
before calling the xics-on-xive code. That code doesn't check
either, leading to potential NULL pointer dereferences because
vcpu->arch.xive_vcpu is NULL. Those dereferences won't cause an
exception in real mode but will lead to kernel memory corruption.
This fixes it by adding kvmppc_xics_enabled() checks before calling
the XICS functions.
Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Signed-off-by: Paul Mackerras <paulus(a)ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/powerpc/kvm/book3s_hv_builtin.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -529,6 +529,8 @@ static inline bool is_rm(void)
unsigned long kvmppc_rm_h_xirr(struct kvm_vcpu *vcpu)
{
+ if (!kvmppc_xics_enabled(vcpu))
+ return H_TOO_HARD;
if (xive_enabled()) {
if (is_rm())
return xive_rm_h_xirr(vcpu);
@@ -541,6 +543,8 @@ unsigned long kvmppc_rm_h_xirr(struct kv
unsigned long kvmppc_rm_h_xirr_x(struct kvm_vcpu *vcpu)
{
+ if (!kvmppc_xics_enabled(vcpu))
+ return H_TOO_HARD;
vcpu->arch.gpr[5] = get_tb();
if (xive_enabled()) {
if (is_rm())
@@ -554,6 +558,8 @@ unsigned long kvmppc_rm_h_xirr_x(struct
unsigned long kvmppc_rm_h_ipoll(struct kvm_vcpu *vcpu, unsigned long server)
{
+ if (!kvmppc_xics_enabled(vcpu))
+ return H_TOO_HARD;
if (xive_enabled()) {
if (is_rm())
return xive_rm_h_ipoll(vcpu, server);
@@ -567,6 +573,8 @@ unsigned long kvmppc_rm_h_ipoll(struct k
int kvmppc_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
unsigned long mfrr)
{
+ if (!kvmppc_xics_enabled(vcpu))
+ return H_TOO_HARD;
if (xive_enabled()) {
if (is_rm())
return xive_rm_h_ipi(vcpu, server, mfrr);
@@ -579,6 +587,8 @@ int kvmppc_rm_h_ipi(struct kvm_vcpu *vcp
int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
{
+ if (!kvmppc_xics_enabled(vcpu))
+ return H_TOO_HARD;
if (xive_enabled()) {
if (is_rm())
return xive_rm_h_cppr(vcpu, cppr);
@@ -591,6 +601,8 @@ int kvmppc_rm_h_cppr(struct kvm_vcpu *vc
int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
{
+ if (!kvmppc_xics_enabled(vcpu))
+ return H_TOO_HARD;
if (xive_enabled()) {
if (is_rm())
return xive_rm_h_eoi(vcpu, xirr);
Patches currently in stable-queue which might be from paulus(a)ozlabs.org are
queue-4.14/kvm-ppc-book3s-hv-don-t-call-real-mode-xics-hypercall-handlers-if-not-enabled.patch
This is a note to let you know that I've just added the patch titled
KVM: nVMX: set IDTR and GDTR limits when loading L1 host state
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kvm-nvmx-set-idtr-and-gdtr-limits-when-loading-l1-host-state.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 21f2d551183847bc7fbe8d866151d00cdad18752 Mon Sep 17 00:00:00 2001
From: Ladi Prosek <lprosek(a)redhat.com>
Date: Wed, 11 Oct 2017 16:54:42 +0200
Subject: KVM: nVMX: set IDTR and GDTR limits when loading L1 host state
From: Ladi Prosek <lprosek(a)redhat.com>
commit 21f2d551183847bc7fbe8d866151d00cdad18752 upstream.
Intel SDM 27.5.2 Loading Host Segment and Descriptor-Table Registers:
"The GDTR and IDTR limits are each set to FFFFH."
Signed-off-by: Ladi Prosek <lprosek(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kvm/vmx.c | 2 ++
1 file changed, 2 insertions(+)
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11325,6 +11325,8 @@ static void load_vmcs12_host_state(struc
vmcs_writel(GUEST_SYSENTER_EIP, vmcs12->host_ia32_sysenter_eip);
vmcs_writel(GUEST_IDTR_BASE, vmcs12->host_idtr_base);
vmcs_writel(GUEST_GDTR_BASE, vmcs12->host_gdtr_base);
+ vmcs_write32(GUEST_IDTR_LIMIT, 0xFFFF);
+ vmcs_write32(GUEST_GDTR_LIMIT, 0xFFFF);
/* If not VM_EXIT_CLEAR_BNDCFGS, the L2 value propagates to L1. */
if (vmcs12->vm_exit_controls & VM_EXIT_CLEAR_BNDCFGS)
Patches currently in stable-queue which might be from lprosek(a)redhat.com are
queue-4.14/kvm-nvmx-set-idtr-and-gdtr-limits-when-loading-l1-host-state.patch
This is a note to let you know that I've just added the patch titled
IB/srpt: Do not accept invalid initiator port names
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-srpt-do-not-accept-invalid-initiator-port-names.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From c70ca38960399a63d5c048b7b700612ea321d17e Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bart.vanassche(a)wdc.com>
Date: Wed, 11 Oct 2017 10:27:22 -0700
Subject: IB/srpt: Do not accept invalid initiator port names
From: Bart Van Assche <bart.vanassche(a)wdc.com>
commit c70ca38960399a63d5c048b7b700612ea321d17e upstream.
Make srpt_parse_i_port_id() return a negative value if hex2bin()
fails.
Fixes: commit a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: Bart Van Assche <bart.vanassche(a)wdc.com>
Signed-off-by: Doug Ledford <dledford(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/ulp/srpt/ib_srpt.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -2777,7 +2777,7 @@ static int srpt_parse_i_port_id(u8 i_por
{
const char *p;
unsigned len, count, leading_zero_bytes;
- int ret, rc;
+ int ret;
p = name;
if (strncasecmp(p, "0x", 2) == 0)
@@ -2789,10 +2789,9 @@ static int srpt_parse_i_port_id(u8 i_por
count = min(len / 2, 16U);
leading_zero_bytes = 16 - count;
memset(i_port_id, 0, leading_zero_bytes);
- rc = hex2bin(i_port_id + leading_zero_bytes, p, count);
- if (rc < 0)
- pr_debug("hex2bin failed for srpt_parse_i_port_id: %d\n", rc);
- ret = 0;
+ ret = hex2bin(i_port_id + leading_zero_bytes, p, count);
+ if (ret < 0)
+ pr_debug("hex2bin failed for srpt_parse_i_port_id: %d\n", ret);
out:
return ret;
}
Patches currently in stable-queue which might be from bart.vanassche(a)wdc.com are
queue-4.14/block-fix-a-race-between-blk_cleanup_queue-and-timeout-handling.patch
queue-4.14/scsi-qla2xxx-suppress-a-kernel-complaint-in-qla_init_base_qpair.patch
queue-4.14/ib-srp-avoid-that-a-cable-pull-can-trigger-a-kernel-crash.patch
queue-4.14/ib-srpt-do-not-accept-invalid-initiator-port-names.patch
This is a note to let you know that I've just added the patch titled
IB/srp: Avoid that a cable pull can trigger a kernel crash
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-srp-avoid-that-a-cable-pull-can-trigger-a-kernel-crash.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 8a0d18c62121d3c554a83eb96e2752861d84d937 Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bart.vanassche(a)wdc.com>
Date: Wed, 11 Oct 2017 10:27:26 -0700
Subject: IB/srp: Avoid that a cable pull can trigger a kernel crash
From: Bart Van Assche <bart.vanassche(a)wdc.com>
commit 8a0d18c62121d3c554a83eb96e2752861d84d937 upstream.
This patch fixes the following kernel crash:
general protection fault: 0000 [#1] PREEMPT SMP
Workqueue: ib_mad2 timeout_sends [ib_core]
Call Trace:
ib_sa_path_rec_callback+0x1c4/0x1d0 [ib_core]
send_handler+0xb2/0xd0 [ib_core]
timeout_sends+0x14d/0x220 [ib_core]
process_one_work+0x200/0x630
worker_thread+0x4e/0x3b0
kthread+0x113/0x150
Fixes: commit aef9ec39c47f ("IB: Add SCSI RDMA Protocol (SRP) initiator")
Signed-off-by: Bart Van Assche <bart.vanassche(a)wdc.com>
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Signed-off-by: Doug Ledford <dledford(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/ulp/srp/ib_srp.c | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -665,12 +665,19 @@ static void srp_path_rec_completion(int
static int srp_lookup_path(struct srp_rdma_ch *ch)
{
struct srp_target_port *target = ch->target;
- int ret;
+ int ret = -ENODEV;
ch->path.numb_path = 1;
init_completion(&ch->done);
+ /*
+ * Avoid that the SCSI host can be removed by srp_remove_target()
+ * before srp_path_rec_completion() is called.
+ */
+ if (!scsi_host_get(target->scsi_host))
+ goto out;
+
ch->path_query_id = ib_sa_path_rec_get(&srp_sa_client,
target->srp_host->srp_dev->dev,
target->srp_host->port,
@@ -684,18 +691,24 @@ static int srp_lookup_path(struct srp_rd
GFP_KERNEL,
srp_path_rec_completion,
ch, &ch->path_query);
- if (ch->path_query_id < 0)
- return ch->path_query_id;
+ ret = ch->path_query_id;
+ if (ret < 0)
+ goto put;
ret = wait_for_completion_interruptible(&ch->done);
if (ret < 0)
- return ret;
+ goto put;
- if (ch->status < 0)
+ ret = ch->status;
+ if (ret < 0)
shost_printk(KERN_WARNING, target->scsi_host,
PFX "Path record query failed\n");
- return ch->status;
+put:
+ scsi_host_put(target->scsi_host);
+
+out:
+ return ret;
}
static int srp_send_req(struct srp_rdma_ch *ch, bool multich)
Patches currently in stable-queue which might be from bart.vanassche(a)wdc.com are
queue-4.14/block-fix-a-race-between-blk_cleanup_queue-and-timeout-handling.patch
queue-4.14/scsi-qla2xxx-suppress-a-kernel-complaint-in-qla_init_base_qpair.patch
queue-4.14/ib-srp-avoid-that-a-cable-pull-can-trigger-a-kernel-crash.patch
queue-4.14/ib-srpt-do-not-accept-invalid-initiator-port-names.patch
This is a note to let you know that I've just added the patch titled
IB/hfi1: Fix incorrect available receive user context count
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-hfi1-fix-incorrect-available-receive-user-context-count.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d7d626179fb283aba73699071af0df6d00e32138 Mon Sep 17 00:00:00 2001
From: "Michael J. Ruhl" <michael.j.ruhl(a)intel.com>
Date: Mon, 2 Oct 2017 11:04:19 -0700
Subject: IB/hfi1: Fix incorrect available receive user context count
From: Michael J. Ruhl <michael.j.ruhl(a)intel.com>
commit d7d626179fb283aba73699071af0df6d00e32138 upstream.
The addition of the VNIC contexts to num_rcv_contexts changes the
meaning of the sysfs value nctxts from available user contexts, to
user contexts + reserved VNIC contexts.
User applications that use nctxts are now broken.
Update the calculation so that VNIC contexts are used only if there are
hardware contexts available, and do not silently affect nctxts.
Update code to use the calculated VNIC context number.
Update the sysfs value nctxts to be available user contexts only.
Fixes: 2280740f01ae ("IB/hfi1: Virtual Network Interface Controller (VNIC) HW support")
Reviewed-by: Ira Weiny <ira.weiny(a)intel.com>
Reviewed-by: Niranjana Vishwanathapura <Niranjana.Vishwanathapura(a)intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn(a)intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl(a)intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro(a)intel.com>
Signed-off-by: Doug Ledford <dledford(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/hw/hfi1/chip.c | 35 +++++++++++++++++++--------------
drivers/infiniband/hw/hfi1/hfi.h | 2 +
drivers/infiniband/hw/hfi1/sysfs.c | 2 -
drivers/infiniband/hw/hfi1/vnic_main.c | 7 ++++--
4 files changed, 29 insertions(+), 17 deletions(-)
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -13074,7 +13074,7 @@ static int request_msix_irqs(struct hfi1
first_sdma = last_general;
last_sdma = first_sdma + dd->num_sdma;
first_rx = last_sdma;
- last_rx = first_rx + dd->n_krcv_queues + HFI1_NUM_VNIC_CTXT;
+ last_rx = first_rx + dd->n_krcv_queues + dd->num_vnic_contexts;
/* VNIC MSIx interrupts get mapped when VNIC contexts are created */
dd->first_dyn_msix_idx = first_rx + dd->n_krcv_queues;
@@ -13294,8 +13294,9 @@ static int set_up_interrupts(struct hfi1
* slow source, SDMACleanupDone)
* N interrupts - one per used SDMA engine
* M interrupt - one per kernel receive context
+ * V interrupt - one for each VNIC context
*/
- total = 1 + dd->num_sdma + dd->n_krcv_queues + HFI1_NUM_VNIC_CTXT;
+ total = 1 + dd->num_sdma + dd->n_krcv_queues + dd->num_vnic_contexts;
/* ask for MSI-X interrupts */
request = request_msix(dd, total);
@@ -13356,10 +13357,12 @@ fail:
* in array of contexts
* freectxts - number of free user contexts
* num_send_contexts - number of PIO send contexts being used
+ * num_vnic_contexts - number of contexts reserved for VNIC
*/
static int set_up_context_variables(struct hfi1_devdata *dd)
{
unsigned long num_kernel_contexts;
+ u16 num_vnic_contexts = HFI1_NUM_VNIC_CTXT;
int total_contexts;
int ret;
unsigned ngroups;
@@ -13393,6 +13396,14 @@ static int set_up_context_variables(stru
num_kernel_contexts);
num_kernel_contexts = dd->chip_send_contexts - num_vls - 1;
}
+
+ /* Accommodate VNIC contexts if possible */
+ if ((num_kernel_contexts + num_vnic_contexts) > dd->chip_rcv_contexts) {
+ dd_dev_err(dd, "No receive contexts available for VNIC\n");
+ num_vnic_contexts = 0;
+ }
+ total_contexts = num_kernel_contexts + num_vnic_contexts;
+
/*
* User contexts:
* - default to 1 user context per real (non-HT) CPU core if
@@ -13402,19 +13413,16 @@ static int set_up_context_variables(stru
num_user_contexts =
cpumask_weight(&node_affinity.real_cpu_mask);
- total_contexts = num_kernel_contexts + num_user_contexts;
-
/*
* Adjust the counts given a global max.
*/
- if (total_contexts > dd->chip_rcv_contexts) {
+ if (total_contexts + num_user_contexts > dd->chip_rcv_contexts) {
dd_dev_err(dd,
"Reducing # user receive contexts to: %d, from %d\n",
- (int)(dd->chip_rcv_contexts - num_kernel_contexts),
+ (int)(dd->chip_rcv_contexts - total_contexts),
(int)num_user_contexts);
- num_user_contexts = dd->chip_rcv_contexts - num_kernel_contexts;
/* recalculate */
- total_contexts = num_kernel_contexts + num_user_contexts;
+ num_user_contexts = dd->chip_rcv_contexts - total_contexts;
}
/* each user context requires an entry in the RMT */
@@ -13427,25 +13435,24 @@ static int set_up_context_variables(stru
user_rmt_reduced);
/* recalculate */
num_user_contexts = user_rmt_reduced;
- total_contexts = num_kernel_contexts + num_user_contexts;
}
- /* Accommodate VNIC contexts */
- if ((total_contexts + HFI1_NUM_VNIC_CTXT) <= dd->chip_rcv_contexts)
- total_contexts += HFI1_NUM_VNIC_CTXT;
+ total_contexts += num_user_contexts;
/* the first N are kernel contexts, the rest are user/vnic contexts */
dd->num_rcv_contexts = total_contexts;
dd->n_krcv_queues = num_kernel_contexts;
dd->first_dyn_alloc_ctxt = num_kernel_contexts;
+ dd->num_vnic_contexts = num_vnic_contexts;
dd->num_user_contexts = num_user_contexts;
dd->freectxts = num_user_contexts;
dd_dev_info(dd,
- "rcv contexts: chip %d, used %d (kernel %d, user %d)\n",
+ "rcv contexts: chip %d, used %d (kernel %d, vnic %u, user %u)\n",
(int)dd->chip_rcv_contexts,
(int)dd->num_rcv_contexts,
(int)dd->n_krcv_queues,
- (int)dd->num_rcv_contexts - dd->n_krcv_queues);
+ dd->num_vnic_contexts,
+ dd->num_user_contexts);
/*
* Receive array allocation:
--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -1047,6 +1047,8 @@ struct hfi1_devdata {
u64 z_send_schedule;
u64 __percpu *send_schedule;
+ /* number of reserved contexts for VNIC usage */
+ u16 num_vnic_contexts;
/* number of receive contexts in use by the driver */
u32 num_rcv_contexts;
/* number of pio send contexts in use by the driver */
--- a/drivers/infiniband/hw/hfi1/sysfs.c
+++ b/drivers/infiniband/hw/hfi1/sysfs.c
@@ -543,7 +543,7 @@ static ssize_t show_nctxts(struct device
* give a more accurate picture of total contexts available.
*/
return scnprintf(buf, PAGE_SIZE, "%u\n",
- min(dd->num_rcv_contexts - dd->first_dyn_alloc_ctxt,
+ min(dd->num_user_contexts,
(u32)dd->sc_sizes[SC_USER].count));
}
--- a/drivers/infiniband/hw/hfi1/vnic_main.c
+++ b/drivers/infiniband/hw/hfi1/vnic_main.c
@@ -840,6 +840,9 @@ struct net_device *hfi1_vnic_alloc_rn(st
struct rdma_netdev *rn;
int i, size, rc;
+ if (!dd->num_vnic_contexts)
+ return ERR_PTR(-ENOMEM);
+
if (!port_num || (port_num > dd->num_pports))
return ERR_PTR(-EINVAL);
@@ -848,7 +851,7 @@ struct net_device *hfi1_vnic_alloc_rn(st
size = sizeof(struct opa_vnic_rdma_netdev) + sizeof(*vinfo);
netdev = alloc_netdev_mqs(size, name, name_assign_type, setup,
- dd->chip_sdma_engines, HFI1_NUM_VNIC_CTXT);
+ dd->chip_sdma_engines, dd->num_vnic_contexts);
if (!netdev)
return ERR_PTR(-ENOMEM);
@@ -856,7 +859,7 @@ struct net_device *hfi1_vnic_alloc_rn(st
vinfo = opa_vnic_dev_priv(netdev);
vinfo->dd = dd;
vinfo->num_tx_q = dd->chip_sdma_engines;
- vinfo->num_rx_q = HFI1_NUM_VNIC_CTXT;
+ vinfo->num_rx_q = dd->num_vnic_contexts;
vinfo->netdev = netdev;
rn->free_rdma_netdev = hfi1_vnic_free_rn;
rn->set_id = hfi1_vnic_set_vesw_id;
Patches currently in stable-queue which might be from michael.j.ruhl(a)intel.com are
queue-4.14/ib-hfi1-fix-incorrect-available-receive-user-context-count.patch
This is a note to let you know that I've just added the patch titled
IB/core: Only maintain real QPs in the security lists
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-core-only-maintain-real-qps-in-the-security-lists.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 877add28178a7fa3c68f29c450d050a8e6513f08 Mon Sep 17 00:00:00 2001
From: Daniel Jurgens <danielj(a)mellanox.com>
Date: Tue, 7 Nov 2017 18:33:26 +0200
Subject: IB/core: Only maintain real QPs in the security lists
From: Daniel Jurgens <danielj(a)mellanox.com>
commit 877add28178a7fa3c68f29c450d050a8e6513f08 upstream.
When modify QP is called on a shared QP update the security context for
the real QP. When security is subsequently enforced the shared QP
handles will be checked as well.
Without this change shared QP handles get added to the port/pkey lists,
which is a bug, because not all shared QP handles will be checked for
access. Also the shared QP security context wouldn't get removed from
the port/pkey lists causing access to free memory and list corruption
when they are destroyed.
Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs")
Signed-off-by: Daniel Jurgens <danielj(a)mellanox.com>
Reviewed-by: Parav Pandit <parav(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
Signed-off-by: Doug Ledford <dledford(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/core/security.c | 51 ++++++++++++++++++++-----------------
1 file changed, 28 insertions(+), 23 deletions(-)
--- a/drivers/infiniband/core/security.c
+++ b/drivers/infiniband/core/security.c
@@ -87,16 +87,14 @@ static int enforce_qp_pkey_security(u16
if (ret)
return ret;
- if (qp_sec->qp == qp_sec->qp->real_qp) {
- list_for_each_entry(shared_qp_sec,
- &qp_sec->shared_qp_list,
- shared_qp_list) {
- ret = security_ib_pkey_access(shared_qp_sec->security,
- subnet_prefix,
- pkey);
- if (ret)
- return ret;
- }
+ list_for_each_entry(shared_qp_sec,
+ &qp_sec->shared_qp_list,
+ shared_qp_list) {
+ ret = security_ib_pkey_access(shared_qp_sec->security,
+ subnet_prefix,
+ pkey);
+ if (ret)
+ return ret;
}
return 0;
}
@@ -560,15 +558,22 @@ int ib_security_modify_qp(struct ib_qp *
int ret = 0;
struct ib_ports_pkeys *tmp_pps;
struct ib_ports_pkeys *new_pps;
- bool special_qp = (qp->qp_type == IB_QPT_SMI ||
- qp->qp_type == IB_QPT_GSI ||
- qp->qp_type >= IB_QPT_RESERVED1);
+ struct ib_qp *real_qp = qp->real_qp;
+ bool special_qp = (real_qp->qp_type == IB_QPT_SMI ||
+ real_qp->qp_type == IB_QPT_GSI ||
+ real_qp->qp_type >= IB_QPT_RESERVED1);
bool pps_change = ((qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) ||
(qp_attr_mask & IB_QP_ALT_PATH));
+ /* The port/pkey settings are maintained only for the real QP. Open
+ * handles on the real QP will be in the shared_qp_list. When
+ * enforcing security on the real QP all the shared QPs will be
+ * checked as well.
+ */
+
if (pps_change && !special_qp) {
- mutex_lock(&qp->qp_sec->mutex);
- new_pps = get_new_pps(qp,
+ mutex_lock(&real_qp->qp_sec->mutex);
+ new_pps = get_new_pps(real_qp,
qp_attr,
qp_attr_mask);
@@ -586,14 +591,14 @@ int ib_security_modify_qp(struct ib_qp *
if (!ret)
ret = check_qp_port_pkey_settings(new_pps,
- qp->qp_sec);
+ real_qp->qp_sec);
}
if (!ret)
- ret = qp->device->modify_qp(qp->real_qp,
- qp_attr,
- qp_attr_mask,
- udata);
+ ret = real_qp->device->modify_qp(real_qp,
+ qp_attr,
+ qp_attr_mask,
+ udata);
if (pps_change && !special_qp) {
/* Clean up the lists and free the appropriate
@@ -602,8 +607,8 @@ int ib_security_modify_qp(struct ib_qp *
if (ret) {
tmp_pps = new_pps;
} else {
- tmp_pps = qp->qp_sec->ports_pkeys;
- qp->qp_sec->ports_pkeys = new_pps;
+ tmp_pps = real_qp->qp_sec->ports_pkeys;
+ real_qp->qp_sec->ports_pkeys = new_pps;
}
if (tmp_pps) {
@@ -611,7 +616,7 @@ int ib_security_modify_qp(struct ib_qp *
port_pkey_list_remove(&tmp_pps->alt);
}
kfree(tmp_pps);
- mutex_unlock(&qp->qp_sec->mutex);
+ mutex_unlock(&real_qp->qp_sec->mutex);
}
return ret;
}
Patches currently in stable-queue which might be from danielj(a)mellanox.com are
queue-4.14/ib-core-only-maintain-real-qps-in-the-security-lists.patch
queue-4.14/ib-core-avoid-crash-on-pkey-enforcement-failed-in-received-mads.patch