This is a note to let you know that I've just added the patch titled
selftests/x86: Add tests for the STR and SLDT instructions
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
selftests-x86-add-tests-for-the-str-and-sldt-instructions.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From a9e017d5619eb371460c8e516f4684def62bef3a Mon Sep 17 00:00:00 2001
From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Date: Sun, 5 Nov 2017 18:27:57 -0800
Subject: selftests/x86: Add tests for the STR and SLDT instructions
From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
commit a9e017d5619eb371460c8e516f4684def62bef3a upstream.
The STR and SLDT instructions are not valid when running in virtual-8086
mode and generate an invalid operand exception. These two instructions are
also protected by the Intel User-Mode Instruction Prevention (UMIP) security
feature. In protected mode, if UMIP is enabled, these instructions generate
a general protection fault when called from CPL > 0. Linux traps the general
protection fault and emulates the instructions sgdt, sidt and smsw, but not
str and sldt.
Add tests to verify that the emulation code does not emulate these two
instructions and that the expected invalid operand exception is seen
instead.
The tests fall back to exiting with INT3 in case emulation does happen.
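For illustration only (this sketch is not part of the patch, and all names
in it are made up): a user-space probe that executes STR and reports
whether it was blocked. Without UMIP, STR at CPL 3 simply returns the task
register selector; with UMIP enforced it raises #GP (SIGSEGV) in protected
mode, while in virtual-8086 mode the instruction raises #UD (SIGILL), which
is what the selftest below expects.

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>

static sigjmp_buf jbuf;

static void trap(int sig)
{
        /* Jump out from under the faulting instruction. */
        siglongjmp(jbuf, sig);
}

int main(void)
{
        unsigned short tr = 0;
        int sig;

        signal(SIGILL, trap);
        signal(SIGSEGV, trap);

        sig = sigsetjmp(jbuf, 1);
        if (sig == 0) {
                /* STR stores the task register selector. */
                asm volatile ("str %0" : "=r" (tr));
                printf("STR ran, selector %#hx (UMIP not enforced)\n", tr);
        } else {
                printf("STR raised signal %d (UMIP likely active)\n", sig);
        }
        return 0;
}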
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: ricardo.neri@intel.com
Link: http://lkml.kernel.org/r/1509935277-22138-13-git-send-email-ricardo.neri-ca…
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
tools/testing/selftests/x86/entry_from_vm86.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
--- a/tools/testing/selftests/x86/entry_from_vm86.c
+++ b/tools/testing/selftests/x86/entry_from_vm86.c
@@ -111,6 +111,11 @@ asm (
"smsw %ax\n\t"
"mov %ax, (2080)\n\t"
"int3\n\t"
+ "vmcode_umip_str:\n\t"
+ "str %eax\n\t"
+ "vmcode_umip_sldt:\n\t"
+ "sldt %eax\n\t"
+ "int3\n\t"
".size vmcode, . - vmcode\n\t"
"end_vmcode:\n\t"
".code32\n\t"
@@ -119,7 +124,8 @@ asm (
extern unsigned char vmcode[], end_vmcode[];
extern unsigned char vmcode_bound[], vmcode_sysenter[], vmcode_syscall[],
- vmcode_sti[], vmcode_int3[], vmcode_int80[], vmcode_umip[];
+ vmcode_sti[], vmcode_int3[], vmcode_int80[], vmcode_umip[],
+ vmcode_umip_str[], vmcode_umip_sldt[];
/* Returns false if the test was skipped. */
static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
@@ -226,6 +232,16 @@ void do_umip_tests(struct vm86plus_struc
printf("[FAIL]\tAll the results of SIDT should be the same.\n");
else
printf("[PASS]\tAll the results from SIDT are identical.\n");
+
+ sethandler(SIGILL, sighandler, 0);
+ do_test(vm86, vmcode_umip_str - vmcode, VM86_SIGNAL, 0,
+ "STR instruction");
+ clearhandler(SIGILL);
+
+ sethandler(SIGILL, sighandler, 0);
+ do_test(vm86, vmcode_umip_sldt - vmcode, VM86_SIGNAL, 0,
+ "SLDT instruction");
+ clearhandler(SIGILL);
}
int main(void)
Patches currently in stable-queue which might be from ricardo.neri-calderon@linux.intel.com are
queue-4.14/selftests-x86-add-tests-for-the-str-and-sldt-instructions.patch
queue-4.14/selftests-x86-add-tests-for-user-mode-instruction-prevention.patch
This is a note to let you know that I've just added the patch titled
parisc: Handle case where flush_cache_range is called with no context
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
parisc-handle-case-where-flush_cache_range-is-called-with-no-context.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From 9ef0f88fe5466c2ca1d2975549ba6be502c464c1 Mon Sep 17 00:00:00 2001
From: John David Anglin <dave.anglin@bell.net>
Date: Wed, 7 Mar 2018 08:18:05 -0500
Subject: parisc: Handle case where flush_cache_range is called with no context
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: John David Anglin <dave.anglin@bell.net>
commit 9ef0f88fe5466c2ca1d2975549ba6be502c464c1 upstream.
Just when I had decided that flush_cache_range() was always called with
a valid context, Helge reported two cases where the
"BUG_ON(!vma->vm_mm->context);" was hit on the phantom buildd:
kernel BUG at /mnt/sdb6/linux/linux-4.15.4/arch/parisc/kernel/cache.c:587!
CPU: 1 PID: 3254 Comm: kworker/1:2 Tainted: G D 4.15.0-1-parisc64-smp #1 Debian 4.15.4-1+b1
Workqueue: events free_ioctx
IAOQ[0]: flush_cache_range+0x164/0x168
IAOQ[1]: flush_cache_page+0x0/0x1c8
RP(r2): unmap_page_range+0xae8/0xb88
Backtrace:
[<00000000404a6980>] unmap_page_range+0xae8/0xb88
[<00000000404a6ae0>] unmap_single_vma+0xc0/0x188
[<00000000404a6cdc>] zap_page_range_single+0x134/0x1f8
[<00000000404a702c>] unmap_mapping_range+0x1cc/0x208
[<0000000040461518>] truncate_pagecache+0x98/0x108
[<0000000040461624>] truncate_setsize+0x9c/0xb8
[<00000000405d7f30>] put_aio_ring_file+0x80/0x100
[<00000000405d803c>] aio_free_ring+0x8c/0x290
[<00000000405d82c0>] free_ioctx+0x80/0x180
[<0000000040284e6c>] process_one_work+0x21c/0x668
[<00000000402854c4>] worker_thread+0x20c/0x778
[<0000000040291d44>] kthread+0x2d4/0x2e0
[<0000000040204020>] end_fault_vector+0x20/0xc0
This indicates that we need to handle the no context case in
flush_cache_range() as we do in flush_cache_mm().
In thinking about this, I realized that we don't need to flush the TLB
when there is no context. So, I added context checks to the large flush
cases in flush_cache_mm() and flush_cache_range(). The large flush case
occurs frequently in flush_cache_mm() and the change should improve fork
performance.
The v2 version of this change removes the BUG_ON from flush_cache_page()
by skipping the TLB flush when there is no context. I also added code
to flush the TLB in flush_cache_mm() and flush_cache_range() when we
have a context that's not current. Now all three routines handle TLB
flushes in a similar manner.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/parisc/kernel/cache.c | 41 ++++++++++++++++++++++++++++++++---------
1 file changed, 32 insertions(+), 9 deletions(-)
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -543,7 +543,8 @@ void flush_cache_mm(struct mm_struct *mm
rp3440, etc. So, avoid it if the mm isn't too big. */
if ((!IS_ENABLED(CONFIG_SMP) || !arch_irqs_disabled()) &&
mm_total_size(mm) >= parisc_cache_flush_threshold) {
- flush_tlb_all();
+ if (mm->context)
+ flush_tlb_all();
flush_cache_all();
return;
}
@@ -571,6 +572,8 @@ void flush_cache_mm(struct mm_struct *mm
pfn = pte_pfn(*ptep);
if (!pfn_valid(pfn))
continue;
+ if (unlikely(mm->context))
+ flush_tlb_page(vma, addr);
__flush_cache_page(vma, addr, PFN_PHYS(pfn));
}
}
@@ -579,26 +582,46 @@ void flush_cache_mm(struct mm_struct *mm
void flush_cache_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
+ pgd_t *pgd;
+ unsigned long addr;
+
if ((!IS_ENABLED(CONFIG_SMP) || !arch_irqs_disabled()) &&
end - start >= parisc_cache_flush_threshold) {
- flush_tlb_range(vma, start, end);
+ if (vma->vm_mm->context)
+ flush_tlb_range(vma, start, end);
flush_cache_all();
return;
}
- flush_user_dcache_range_asm(start, end);
- if (vma->vm_flags & VM_EXEC)
- flush_user_icache_range_asm(start, end);
- flush_tlb_range(vma, start, end);
+ if (vma->vm_mm->context == mfsp(3)) {
+ flush_user_dcache_range_asm(start, end);
+ if (vma->vm_flags & VM_EXEC)
+ flush_user_icache_range_asm(start, end);
+ flush_tlb_range(vma, start, end);
+ return;
+ }
+
+ pgd = vma->vm_mm->pgd;
+ for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
+ unsigned long pfn;
+ pte_t *ptep = get_ptep(pgd, addr);
+ if (!ptep)
+ continue;
+ pfn = pte_pfn(*ptep);
+ if (pfn_valid(pfn)) {
+ if (unlikely(vma->vm_mm->context))
+ flush_tlb_page(vma, addr);
+ __flush_cache_page(vma, addr, PFN_PHYS(pfn));
+ }
+ }
}
void
flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, unsigned long pfn)
{
- BUG_ON(!vma->vm_mm->context);
-
if (pfn_valid(pfn)) {
- flush_tlb_page(vma, vmaddr);
+ if (likely(vma->vm_mm->context))
+ flush_tlb_page(vma, vmaddr);
__flush_cache_page(vma, vmaddr, PFN_PHYS(pfn));
}
}
Patches currently in stable-queue which might be from dave.anglin@bell.net are
queue-4.14/parisc-handle-case-where-flush_cache_range-is-called-with-no-context.patch
This is a note to let you know that I've just added the patch titled
RDMAVT: Fix synchronization around percpu_ref
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
rdmavt-fix-synchronization-around-percpu_ref.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From 74b44bbe80b4c62113ac1501482ea1ee40eb9d67 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Wed, 14 Mar 2018 12:10:18 -0700
Subject: RDMAVT: Fix synchronization around percpu_ref
From: Tejun Heo <tj@kernel.org>
commit 74b44bbe80b4c62113ac1501482ea1ee40eb9d67 upstream.
rvt_mregion uses percpu_ref for reference counting and RCU to protect
accesses from lkey_table. When a rvt_mregion needs to be freed, it
first gets unregistered from lkey_table and then rvt_check_refs() is
called to wait for in-flight usages before the rvt_mregion is freed.
rvt_check_refs() has a couple of issues.
* It has a fast exit path which tests percpu_ref_is_zero(). However,
a percpu_ref reading zero doesn't mean that the object can be
released. In fact, the ->release() callback might not even have
started executing yet. Proceeding with freeing can lead to
use-after-free.
* lkey_table is RCU protected but there is no RCU grace period in the
free path. percpu_ref uses RCU internally but it's sched-RCU whose
grace periods are different from regular RCU. Also, it generally
isn't a good idea to depend on internal behaviors like this.
To address the above issues, this patch removes the fast exit and adds
an explicit synchronize_rcu().
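To make the resulting control flow easier to follow, here is the body of
rvt_check_refs() as it looks after this patch, rearranged from the hunk
below with explanatory comments added (a readability sketch, not a
drop-in replacement):

        if (mr->lkey) {
                /* avoid dma mr */
                rvt_dereg_clean_qps(mr);
                /*
                 * @mr was indexed on the RCU-protected @lkey_table, so
                 * readers that found it via rcu_dereference() may still
                 * be running: wait out a full grace period first.
                 */
                synchronize_rcu();
        }

        /*
         * No percpu_ref_is_zero() fast path: only completion of
         * &mr->comp, signalled from the ->release() callback, proves
         * that the ref is truly dead and the object can be freed.
         */
        timeout = wait_for_completion_timeout(&mr->comp, 5 * HZ);
        if (!timeout) {
                /* error reporting via rvt_pr_err() follows here */
        }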
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: linux-rdma@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/infiniband/sw/rdmavt/mr.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
--- a/drivers/infiniband/sw/rdmavt/mr.c
+++ b/drivers/infiniband/sw/rdmavt/mr.c
@@ -489,11 +489,13 @@ static int rvt_check_refs(struct rvt_mre
unsigned long timeout;
struct rvt_dev_info *rdi = ib_to_rvt(mr->pd->device);
- if (percpu_ref_is_zero(&mr->refcount))
- return 0;
- /* avoid dma mr */
- if (mr->lkey)
+ if (mr->lkey) {
+ /* avoid dma mr */
rvt_dereg_clean_qps(mr);
+ /* @mr was indexed on rcu protected @lkey_table */
+ synchronize_rcu();
+ }
+
timeout = wait_for_completion_timeout(&mr->comp, 5 * HZ);
if (!timeout) {
rvt_pr_err(rdi,
Patches currently in stable-queue which might be from tj@kernel.org are
queue-4.14/rdmavt-fix-synchronization-around-percpu_ref.patch
queue-4.14/fs-aio-add-explicit-rcu-grace-period-when-freeing-kioctx.patch
queue-4.14/fs-aio-use-rcu-accessors-for-kioctx_table-table.patch
This is a note to let you know that I've just added the patch titled
lock_parent() needs to recheck if dentry got __dentry_kill'ed under it
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
lock_parent-needs-to-recheck-if-dentry-got-__dentry_kill-ed-under-it.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From 3b821409632ab778d46e807516b457dfa72736ed Mon Sep 17 00:00:00 2001
From: Al Viro <viro@zeniv.linux.org.uk>
Date: Fri, 23 Feb 2018 20:47:17 -0500
Subject: lock_parent() needs to recheck if dentry got __dentry_kill'ed under it
From: Al Viro <viro@zeniv.linux.org.uk>
commit 3b821409632ab778d46e807516b457dfa72736ed upstream.
In the case when the dentry passed to lock_parent() is protected from
freeing only by the fact that it's on a shrink list, and the trylock of the
parent fails, we could get hit by __dentry_kill() (and a subsequent
dentry_kill(parent)) between unlocking the dentry and locking the presumed
parent. We need to recheck that the dentry is alive once we lock both it
and the parent *and* postpone rcu_read_unlock() until after that point.
Otherwise we could return a pointer to a struct dentry that has already
been RCU-scheduled for freeing, with ->d_lock held on it; the caller's
subsequent attempt to unlock it can end up with memory corruption.
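The same discipline, reduced to a user-space analogue (illustrative names
only, not the fs/dcache.c code; the RCU half of the fix, which keeps the
child's memory valid across the unlocked window, has no equivalent here):
when lock ordering forces us to drop the child's lock to take the
parent's, the child's liveness must be rechecked only after *both* locks
are held again.

#include <pthread.h>
#include <stddef.h>

struct node {
        pthread_mutex_t lock;
        struct node *parent;    /* NULL for a root */
        int count;              /* < 0 once the node has been killed */
};

/* Called with n->lock held; returns the locked parent, or NULL. */
static struct node *lock_parent(struct node *n)
{
        struct node *parent;

again:
        parent = n->parent;
        if (!parent)
                return NULL;
        if (pthread_mutex_trylock(&parent->lock) == 0)
                return parent;

        /* Lock order is parent before child: drop, retake in order. */
        pthread_mutex_unlock(&n->lock);
        pthread_mutex_lock(&parent->lock);
        pthread_mutex_lock(&n->lock);

        if (parent != n->parent) {
                /* Reparented while we held no locks; start over. */
                pthread_mutex_unlock(&parent->lock);
                goto again;
        }
        if (n->count < 0) {
                /* Killed in the unlocked window: the recheck added
                 * by this patch.  Report no parent to the caller. */
                pthread_mutex_unlock(&parent->lock);
                return NULL;
        }
        return parent;
}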
Cc: stable@vger.kernel.org # 3.12+, counting backports
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/dcache.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -644,11 +644,16 @@ again:
spin_unlock(&parent->d_lock);
goto again;
}
- rcu_read_unlock();
- if (parent != dentry)
+ if (parent != dentry) {
spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
- else
+ if (unlikely(dentry->d_lockref.count < 0)) {
+ spin_unlock(&parent->d_lock);
+ parent = NULL;
+ }
+ } else {
parent = NULL;
+ }
+ rcu_read_unlock();
return parent;
}
Patches currently in stable-queue which might be from viro@zeniv.linux.org.uk are
queue-4.14/fs-teach-path_connected-to-handle-nfs-filesystems-with-multiple-roots.patch
queue-4.14/lock_parent-needs-to-recheck-if-dentry-got-__dentry_kill-ed-under-it.patch
This is a note to let you know that I've just added the patch titled
kvm: arm/arm64: vgic-v3: Tighten synchronization for guests using v2 on v3
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kvm-arm-arm64-vgic-v3-tighten-synchronization-for-guests-using-v2-on-v3.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From 27e91ad1e746e341ca2312f29bccb9736be7b476 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <marc.zyngier@arm.com>
Date: Tue, 6 Mar 2018 21:44:37 +0000
Subject: kvm: arm/arm64: vgic-v3: Tighten synchronization for guests using v2 on v3
From: Marc Zyngier <marc.zyngier@arm.com>
commit 27e91ad1e746e341ca2312f29bccb9736be7b476 upstream.
On guest exit, and when using GICv2 on GICv3, we use a dsb(st) to
force synchronization between the memory-mapped guest view and
the system-register view that the hypervisor uses.
This is incorrect, as the spec calls out the need for "a DSB whose
required access type is both loads and stores with any Shareability
attribute", while we're only synchronizing stores.
We also lack an isb after the dsb to ensure that the latter has
actually been executed before we start reading stuff from the sysregs.
The fix is pretty easy: turn dsb(st) into dsb(sy), and slap an isb()
just after.
Cc: stable@vger.kernel.org
Fixes: f68d2b1b73cc ("arm64: KVM: Implement vgic-v3 save/restore")
Acked-by: Christoffer Dall <cdall@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
virt/kvm/arm/hyp/vgic-v3-sr.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/virt/kvm/arm/hyp/vgic-v3-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v3-sr.c
@@ -215,7 +215,8 @@ void __hyp_text __vgic_v3_save_state(str
* are now visible to the system register interface.
*/
if (!cpu_if->vgic_sre) {
- dsb(st);
+ dsb(sy);
+ isb();
cpu_if->vgic_vmcr = read_gicreg(ICH_VMCR_EL2);
}
Patches currently in stable-queue which might be from marc.zyngier@arm.com are
queue-4.14/kvm-arm-arm64-vgic-don-t-populate-multiple-lrs-with-the-same-vintid.patch
queue-4.14/kvm-arm-arm64-vgic-v3-tighten-synchronization-for-guests-using-v2-on-v3.patch
queue-4.14/kvm-arm-arm64-reduce-verbosity-of-kvm-init-log.patch
This is a note to let you know that I've just added the patch titled
KVM: x86: Fix device passthrough when SME is active
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kvm-x86-fix-device-passthrough-when-sme-is-active.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From daaf216c06fba4ee4dc3f62715667da929d68774 Mon Sep 17 00:00:00 2001
From: Tom Lendacky <thomas.lendacky@amd.com>
Date: Thu, 8 Mar 2018 17:17:31 -0600
Subject: KVM: x86: Fix device passthrough when SME is active
From: Tom Lendacky <thomas.lendacky@amd.com>
commit daaf216c06fba4ee4dc3f62715667da929d68774 upstream.
When using device passthrough with SME active, the MMIO range that is
mapped for the device should not be mapped encrypted. Add a check in
set_spte() to ensure that a page is not mapped encrypted if that page
is a device MMIO page, as indicated by kvm_is_mmio_pfn().
Cc: <stable@vger.kernel.org> # 4.14.x-
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/kvm/mmu.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2758,8 +2758,10 @@ static int set_spte(struct kvm_vcpu *vcp
else
pte_access &= ~ACC_WRITE_MASK;
+ if (!kvm_is_mmio_pfn(pfn))
+ spte |= shadow_me_mask;
+
spte |= (u64)pfn << PAGE_SHIFT;
- spte |= shadow_me_mask;
if (pte_access & ACC_WRITE_MASK) {
Patches currently in stable-queue which might be from thomas.lendacky@amd.com are
queue-4.14/kvm-x86-fix-device-passthrough-when-sme-is-active.patch
queue-4.14/x86-cpufeatures-add-intel-pconfig-cpufeature.patch
queue-4.14/x86-cpufeatures-add-intel-total-memory-encryption-cpufeature.patch
This is a note to let you know that I've just added the patch titled
KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kvm-arm-arm64-vgic-don-t-populate-multiple-lrs-with-the-same-vintid.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From 16ca6a607d84bef0129698d8d808f501afd08d43 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <marc.zyngier@arm.com>
Date: Tue, 6 Mar 2018 21:48:01 +0000
Subject: KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid
From: Marc Zyngier <marc.zyngier@arm.com>
commit 16ca6a607d84bef0129698d8d808f501afd08d43 upstream.
The vgic code is trying to be clever when injecting GICv2 SGIs,
and will happily populate LRs with the same interrupt number if
they come from multiple vcpus (after all, they are distinct
interrupt sources).
Unfortunately, this is against the letter of the architecture,
and the GICv2 architecture spec says "Each valid interrupt stored
in the List registers must have a unique VirtualID for that
virtual CPU interface.". GICv3 has similar (although slightly
ambiguous) restrictions.
This results in guests locking up when using GICv2-on-GICv3, for
example. The obvious fix is to stop trying so hard, and inject
a single vcpu per SGI per guest entry. After all, pending SGIs
with multiple source vcpus are pretty rare, and are mostly seen
in scenarios where the physical CPUs are severely overcommitted.
But as we now only inject a single instance of a multi-source SGI per
vcpu entry, we may delay those interrupts for longer than strictly
necessary, and run the risk of injecting lower priority interrupts
in the meantime.
In order to address this, we adopt a three stage strategy:
- If we encounter a multi-source SGI in the AP list while computing
its depth, we force the list to be sorted
- When populating the LRs, we prevent the injection of any interrupt
of lower priority than that of the first multi-source SGI we've
injected.
- Finally, the injection of a multi-source SGI triggers the request
of a maintenance interrupt when there will be no pending interrupt
in the LRs (HCR_NPIE).
At the point where the last pending interrupt in the LRs switches
from Pending to Active, the maintenance interrupt will be delivered,
allowing us to add the remaining SGIs using the same process.
Cc: stable@vger.kernel.org
Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
Acked-by: Christoffer Dall <cdall@kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/irqchip/arm-gic-v3.h | 1
include/linux/irqchip/arm-gic.h | 1
virt/kvm/arm/vgic/vgic-v2.c | 9 ++++-
virt/kvm/arm/vgic/vgic-v3.c | 9 ++++-
virt/kvm/arm/vgic/vgic.c | 61 ++++++++++++++++++++++++++++---------
virt/kvm/arm/vgic/vgic.h | 2 +
6 files changed, 67 insertions(+), 16 deletions(-)
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -501,6 +501,7 @@
#define ICH_HCR_EN (1 << 0)
#define ICH_HCR_UIE (1 << 1)
+#define ICH_HCR_NPIE (1 << 3)
#define ICH_HCR_TC (1 << 10)
#define ICH_HCR_TALL0 (1 << 11)
#define ICH_HCR_TALL1 (1 << 12)
--- a/include/linux/irqchip/arm-gic.h
+++ b/include/linux/irqchip/arm-gic.h
@@ -84,6 +84,7 @@
#define GICH_HCR_EN (1 << 0)
#define GICH_HCR_UIE (1 << 1)
+#define GICH_HCR_NPIE (1 << 3)
#define GICH_LR_VIRTUALID (0x3ff << 0)
#define GICH_LR_PHYSID_CPUID_SHIFT (10)
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -37,6 +37,13 @@ void vgic_v2_init_lrs(void)
vgic_v2_write_lr(i, 0);
}
+void vgic_v2_set_npie(struct kvm_vcpu *vcpu)
+{
+ struct vgic_v2_cpu_if *cpuif = &vcpu->arch.vgic_cpu.vgic_v2;
+
+ cpuif->vgic_hcr |= GICH_HCR_NPIE;
+}
+
void vgic_v2_set_underflow(struct kvm_vcpu *vcpu)
{
struct vgic_v2_cpu_if *cpuif = &vcpu->arch.vgic_cpu.vgic_v2;
@@ -63,7 +70,7 @@ void vgic_v2_fold_lr_state(struct kvm_vc
struct vgic_v2_cpu_if *cpuif = &vgic_cpu->vgic_v2;
int lr;
- cpuif->vgic_hcr &= ~GICH_HCR_UIE;
+ cpuif->vgic_hcr &= ~(GICH_HCR_UIE | GICH_HCR_NPIE);
for (lr = 0; lr < vgic_cpu->used_lrs; lr++) {
u32 val = cpuif->vgic_lr[lr];
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -25,6 +25,13 @@ static bool group0_trap;
static bool group1_trap;
static bool common_trap;
+void vgic_v3_set_npie(struct kvm_vcpu *vcpu)
+{
+ struct vgic_v3_cpu_if *cpuif = &vcpu->arch.vgic_cpu.vgic_v3;
+
+ cpuif->vgic_hcr |= ICH_HCR_NPIE;
+}
+
void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
{
struct vgic_v3_cpu_if *cpuif = &vcpu->arch.vgic_cpu.vgic_v3;
@@ -45,7 +52,7 @@ void vgic_v3_fold_lr_state(struct kvm_vc
u32 model = vcpu->kvm->arch.vgic.vgic_model;
int lr;
- cpuif->vgic_hcr &= ~ICH_HCR_UIE;
+ cpuif->vgic_hcr &= ~(ICH_HCR_UIE | ICH_HCR_NPIE);
for (lr = 0; lr < vgic_cpu->used_lrs; lr++) {
u64 val = cpuif->vgic_lr[lr];
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -610,22 +610,37 @@ static inline void vgic_set_underflow(st
vgic_v3_set_underflow(vcpu);
}
+static inline void vgic_set_npie(struct kvm_vcpu *vcpu)
+{
+ if (kvm_vgic_global_state.type == VGIC_V2)
+ vgic_v2_set_npie(vcpu);
+ else
+ vgic_v3_set_npie(vcpu);
+}
+
/* Requires the ap_list_lock to be held. */
-static int compute_ap_list_depth(struct kvm_vcpu *vcpu)
+static int compute_ap_list_depth(struct kvm_vcpu *vcpu,
+ bool *multi_sgi)
{
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
struct vgic_irq *irq;
int count = 0;
+ *multi_sgi = false;
+
DEBUG_SPINLOCK_BUG_ON(!spin_is_locked(&vgic_cpu->ap_list_lock));
list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
spin_lock(&irq->irq_lock);
/* GICv2 SGIs can count for more than one... */
- if (vgic_irq_is_sgi(irq->intid) && irq->source)
- count += hweight8(irq->source);
- else
+ if (vgic_irq_is_sgi(irq->intid) && irq->source) {
+ int w = hweight8(irq->source);
+
+ count += w;
+ *multi_sgi |= (w > 1);
+ } else {
count++;
+ }
spin_unlock(&irq->irq_lock);
}
return count;
@@ -636,28 +651,43 @@ static void vgic_flush_lr_state(struct k
{
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
struct vgic_irq *irq;
- int count = 0;
+ int count;
+ bool npie = false;
+ bool multi_sgi;
+ u8 prio = 0xff;
DEBUG_SPINLOCK_BUG_ON(!spin_is_locked(&vgic_cpu->ap_list_lock));
- if (compute_ap_list_depth(vcpu) > kvm_vgic_global_state.nr_lr)
+ count = compute_ap_list_depth(vcpu, &multi_sgi);
+ if (count > kvm_vgic_global_state.nr_lr || multi_sgi)
vgic_sort_ap_list(vcpu);
+ count = 0;
+
list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
spin_lock(&irq->irq_lock);
- if (unlikely(vgic_target_oracle(irq) != vcpu))
- goto next;
-
/*
- * If we get an SGI with multiple sources, try to get
- * them in all at once.
+ * If we have multi-SGIs in the pipeline, we need to
+ * guarantee that they are all seen before any IRQ of
+ * lower priority. In that case, we need to filter out
+ * these interrupts by exiting early. This is easy as
+ * the AP list has been sorted already.
*/
- do {
+ if (multi_sgi && irq->priority > prio) {
+ spin_unlock(&irq->irq_lock);
+ break;
+ }
+
+ if (likely(vgic_target_oracle(irq) == vcpu)) {
vgic_populate_lr(vcpu, irq, count++);
- } while (irq->source && count < kvm_vgic_global_state.nr_lr);
-next:
+ if (irq->source) {
+ npie = true;
+ prio = irq->priority;
+ }
+ }
+
spin_unlock(&irq->irq_lock);
if (count == kvm_vgic_global_state.nr_lr) {
@@ -668,6 +698,9 @@ next:
}
}
+ if (npie)
+ vgic_set_npie(vcpu);
+
vcpu->arch.vgic_cpu.used_lrs = count;
/* Nuke remaining LRs */
--- a/virt/kvm/arm/vgic/vgic.h
+++ b/virt/kvm/arm/vgic/vgic.h
@@ -150,6 +150,7 @@ void vgic_v2_fold_lr_state(struct kvm_vc
void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr);
void vgic_v2_clear_lr(struct kvm_vcpu *vcpu, int lr);
void vgic_v2_set_underflow(struct kvm_vcpu *vcpu);
+void vgic_v2_set_npie(struct kvm_vcpu *vcpu);
int vgic_v2_has_attr_regs(struct kvm_device *dev, struct kvm_device_attr *attr);
int vgic_v2_dist_uaccess(struct kvm_vcpu *vcpu, bool is_write,
int offset, u32 *val);
@@ -179,6 +180,7 @@ void vgic_v3_fold_lr_state(struct kvm_vc
void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr);
void vgic_v3_clear_lr(struct kvm_vcpu *vcpu, int lr);
void vgic_v3_set_underflow(struct kvm_vcpu *vcpu);
+void vgic_v3_set_npie(struct kvm_vcpu *vcpu);
void vgic_v3_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
void vgic_v3_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
void vgic_v3_enable(struct kvm_vcpu *vcpu);
Patches currently in stable-queue which might be from marc.zyngier@arm.com are
queue-4.14/kvm-arm-arm64-vgic-don-t-populate-multiple-lrs-with-the-same-vintid.patch
queue-4.14/kvm-arm-arm64-vgic-v3-tighten-synchronization-for-guests-using-v2-on-v3.patch
queue-4.14/kvm-arm-arm64-reduce-verbosity-of-kvm-init-log.patch
This is a note to let you know that I've just added the patch titled
KVM: arm/arm64: Reduce verbosity of KVM init log
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kvm-arm-arm64-reduce-verbosity-of-kvm-init-log.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From 76600428c3677659e3c3633bb4f2ea302220a275 Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Date: Fri, 2 Mar 2018 08:16:30 +0000
Subject: KVM: arm/arm64: Reduce verbosity of KVM init log
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
commit 76600428c3677659e3c3633bb4f2ea302220a275 upstream.
On my GICv3 system, the following is printed to the kernel log at boot:
kvm [1]: 8-bit VMID
kvm [1]: IDMAP page: d20e35000
kvm [1]: HYP VA range: 800000000000:ffffffffffff
kvm [1]: vgic-v2@2c020000
kvm [1]: GIC system register CPU interface enabled
kvm [1]: vgic interrupt IRQ1
kvm [1]: virtual timer IRQ4
kvm [1]: Hyp mode initialized successfully
The KVM IDMAP is a mapping of a statically allocated kernel structure,
and so printing its physical address leaks the physical placement of
the kernel when physical KASLR is in effect. So change the kvm_info() to
kvm_debug() to remove it from the log output.
While at it, trim the output a bit more: IRQ numbers can be found in
/proc/interrupts, and the HYP VA and vgic-v2 lines are not highly
informational either.
Cc: <stable@vger.kernel.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Christoffer Dall <cdall@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
virt/kvm/arm/arch_timer.c | 2 +-
virt/kvm/arm/mmu.c | 6 +++---
virt/kvm/arm/vgic/vgic-v2.c | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -602,7 +602,7 @@ int kvm_timer_hyp_init(void)
return err;
}
- kvm_info("virtual timer IRQ%d\n", host_vtimer_irq);
+ kvm_debug("virtual timer IRQ%d\n", host_vtimer_irq);
cpuhp_setup_state(CPUHP_AP_KVM_ARM_TIMER_STARTING,
"kvm/arm/timer:starting", kvm_timer_starting_cpu,
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1760,9 +1760,9 @@ int kvm_mmu_init(void)
*/
BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
- kvm_info("IDMAP page: %lx\n", hyp_idmap_start);
- kvm_info("HYP VA range: %lx:%lx\n",
- kern_hyp_va(PAGE_OFFSET), kern_hyp_va(~0UL));
+ kvm_debug("IDMAP page: %lx\n", hyp_idmap_start);
+ kvm_debug("HYP VA range: %lx:%lx\n",
+ kern_hyp_va(PAGE_OFFSET), kern_hyp_va(~0UL));
if (hyp_idmap_start >= kern_hyp_va(PAGE_OFFSET) &&
hyp_idmap_start < kern_hyp_va(~0UL) &&
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -380,7 +380,7 @@ int vgic_v2_probe(const struct gic_kvm_i
kvm_vgic_global_state.type = VGIC_V2;
kvm_vgic_global_state.max_gic_vcpus = VGIC_V2_MAX_CPUS;
- kvm_info("vgic-v2@%llx\n", info->vctrl.start);
+ kvm_debug("vgic-v2@%llx\n", info->vctrl.start);
return 0;
out:
Patches currently in stable-queue which might be from ard.biesheuvel@linaro.org are
queue-4.14/kvm-arm-arm64-reduce-verbosity-of-kvm-init-log.patch
This is a note to let you know that I've just added the patch titled
fs: Teach path_connected to handle nfs filesystems with multiple roots.
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
fs-teach-path_connected-to-handle-nfs-filesystems-with-multiple-roots.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From 95dd77580ccd66a0da96e6d4696945b8cea39431 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Wed, 14 Mar 2018 18:20:29 -0500
Subject: fs: Teach path_connected to handle nfs filesystems with multiple roots.
From: Eric W. Biederman <ebiederm@xmission.com>
commit 95dd77580ccd66a0da96e6d4696945b8cea39431 upstream.
On nfsv2 and nfsv3 the nfs server can export subsets of the same
filesystem and report the same filesystem identifier, so that the nfs
client can know they are the same filesystem. The subsets can be from
disjoint directory trees. The nfsv2 and nfsv3 filesystems provide no
way to find the common root of all directory trees exported from the
server with the same filesystem identifier.
The practical result is that, for nfs, the s_root in struct super_block
is not necessarily the root of the filesystem. The nfs mount code sets
s_root to the root of the first subset of the nfs filesystem that the
kernel mounts.
This affects the dcache invalidation code in generic_shutdown_super,
currently called shrink_dcache_for_umount, and that code has for years
gone through an additional list of dentries that might be dentry
trees that need to be freed to accommodate nfs.
When I wrote path_connected I did not realize nfs was so special, and
its heuristic for avoiding a call to is_subdir can fail.
The practical case where this fails is when a directory is moved from
the subtree exposed by one nfs mount to the subtree exposed by another
nfs mount. This move can happen either locally or remotely. The remote
case requires that the moved directory be cached before the move, and
that after the move someone walks the path to where the moved directory
now exists, in so doing causing the already cached directory to be
moved in the dcache through the magic of d_splice_alias.
If someone whose working directory is in the moved directory or a
subdirectory now starts calling .. from the initial mount of nfs
(where s_root == mnt_root), then path_connected as a heuristic will
not bother with the is_subdir check. As s_root really is not the root
of the nfs filesystem, this heuristic is wrong, and the path may
actually not be connected, so path_connected can fail.
The is_subdir function might be cheap enough that we can call it
unconditionally. Verifying that will take some benchmarking and
the result may not be the same on all kernels this fix needs
to be backported to. So I am avoiding that for now.
Filesystems with snapshots such as nilfs and btrfs do something
similar. But as the directory trees of the snapshots are disjoint
from one another and from the main directory tree, rename won't move
things between them, and this problem will not occur.
Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Fixes: 397d425dc26d ("vfs: Test for and handle paths that are unreachable from their mnt_root")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/namei.c | 5 +++--
fs/nfs/super.c | 2 ++
include/linux/fs.h | 1 +
3 files changed, 6 insertions(+), 2 deletions(-)
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -578,9 +578,10 @@ static int __nd_alloc_stack(struct namei
static bool path_connected(const struct path *path)
{
struct vfsmount *mnt = path->mnt;
+ struct super_block *sb = mnt->mnt_sb;
- /* Only bind mounts can have disconnected paths */
- if (mnt->mnt_root == mnt->mnt_sb->s_root)
+ /* Bind mounts and multi-root filesystems can have disconnected paths */
+ if (!(sb->s_iflags & SB_I_MULTIROOT) && (mnt->mnt_root == sb->s_root))
return true;
return is_subdir(path->dentry, mnt->mnt_root);
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -2623,6 +2623,8 @@ struct dentry *nfs_fs_mount_common(struc
/* initial superblock/root creation */
mount_info->fill_super(s, mount_info);
nfs_get_cache_cookie(s, mount_info->parsed, mount_info->cloned);
+ if (!(server->flags & NFS_MOUNT_UNSHARED))
+ s->s_iflags |= SB_I_MULTIROOT;
}
mntroot = nfs_get_root(s, mount_info->mntfh, dev_name);
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1312,6 +1312,7 @@ extern int send_sigurg(struct fown_struc
#define SB_I_CGROUPWB 0x00000001 /* cgroup-aware writeback enabled */
#define SB_I_NOEXEC 0x00000002 /* Ignore executables on this fs */
#define SB_I_NODEV 0x00000004 /* Ignore devices on this fs */
+#define SB_I_MULTIROOT 0x00000008 /* Multiple roots to the dentry tree */
/* sb->s_iflags to limit user namespace mounts */
#define SB_I_USERNS_VISIBLE 0x00000010 /* fstype already mounted */
Patches currently in stable-queue which might be from ebiederm@xmission.com are
queue-4.14/fs-teach-path_connected-to-handle-nfs-filesystems-with-multiple-roots.patch
This is a note to let you know that I've just added the patch titled
fs/aio: Use RCU accessors for kioctx_table->table[]
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
fs-aio-use-rcu-accessors-for-kioctx_table-table.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From d0264c01e7587001a8c4608a5d1818dba9a4c11a Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Wed, 14 Mar 2018 12:10:17 -0700
Subject: fs/aio: Use RCU accessors for kioctx_table->table[]
From: Tejun Heo <tj@kernel.org>
commit d0264c01e7587001a8c4608a5d1818dba9a4c11a upstream.
While converting ioctx index from a list to a table, db446a08c23d
("aio: convert the ioctx list to table lookup v3") missed tagging
kioctx_table->table[] as an array of RCU pointers and using the
appropriate RCU accessors. This introduces a small window in the
lookup path where init and access may race.
Mark kioctx_table->table[] with __rcu and use the appropriate RCU
accessors when using the field.
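As a reference for the accessor pattern being applied (an illustrative
sketch with made-up names, not the fs/aio.c code): once a pointer field
is tagged __rcu, every access goes through the RCU helper that matches
its context.

#include <linux/rcupdate.h>
#include <linux/types.h>

struct obj;

struct obj_table {
        struct rcu_head rcu;
        unsigned nr;
        struct obj __rcu *slot[];       /* array of RCU-protected pointers */
};

/* Writer, under the table's lock: publish with the needed barrier. */
static void table_install(struct obj_table *t, unsigned i, struct obj *o)
{
        rcu_assign_pointer(t->slot[i], o);
}

/* Writer, under the table's lock: clearing to NULL needs no barrier. */
static void table_clear(struct obj_table *t, unsigned i)
{
        RCU_INIT_POINTER(t->slot[i], NULL);
}

/* Reader, inside rcu_read_lock(): the result may be dereferenced. */
static struct obj *table_lookup(struct obj_table *t, unsigned i)
{
        return rcu_dereference(t->slot[i]);
}

/* NULL-ness test only; the returned value must not be dereferenced. */
static bool table_slot_empty(struct obj_table *t, unsigned i)
{
        return !rcu_access_pointer(t->slot[i]);
}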
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jann Horn <jannh@google.com>
Fixes: db446a08c23d ("aio: convert the ioctx list to table lookup v3")
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org # v3.12+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/aio.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -68,9 +68,9 @@ struct aio_ring {
#define AIO_RING_PAGES 8
struct kioctx_table {
- struct rcu_head rcu;
- unsigned nr;
- struct kioctx *table[];
+ struct rcu_head rcu;
+ unsigned nr;
+ struct kioctx __rcu *table[];
};
struct kioctx_cpu {
@@ -330,7 +330,7 @@ static int aio_ring_mremap(struct vm_are
for (i = 0; i < table->nr; i++) {
struct kioctx *ctx;
- ctx = table->table[i];
+ ctx = rcu_dereference(table->table[i]);
if (ctx && ctx->aio_ring_file == file) {
if (!atomic_read(&ctx->dead)) {
ctx->user_id = ctx->mmap_base = vma->vm_start;
@@ -666,9 +666,9 @@ static int ioctx_add_table(struct kioctx
while (1) {
if (table)
for (i = 0; i < table->nr; i++)
- if (!table->table[i]) {
+ if (!rcu_access_pointer(table->table[i])) {
ctx->id = i;
- table->table[i] = ctx;
+ rcu_assign_pointer(table->table[i], ctx);
spin_unlock(&mm->ioctx_lock);
/* While kioctx setup is in progress,
@@ -849,8 +849,8 @@ static int kill_ioctx(struct mm_struct *
}
table = rcu_dereference_raw(mm->ioctx_table);
- WARN_ON(ctx != table->table[ctx->id]);
- table->table[ctx->id] = NULL;
+ WARN_ON(ctx != rcu_access_pointer(table->table[ctx->id]));
+ RCU_INIT_POINTER(table->table[ctx->id], NULL);
spin_unlock(&mm->ioctx_lock);
/* free_ioctx_reqs() will do the necessary RCU synchronization */
@@ -895,7 +895,8 @@ void exit_aio(struct mm_struct *mm)
skipped = 0;
for (i = 0; i < table->nr; ++i) {
- struct kioctx *ctx = table->table[i];
+ struct kioctx *ctx =
+ rcu_dereference_protected(table->table[i], true);
if (!ctx) {
skipped++;
@@ -1084,7 +1085,7 @@ static struct kioctx *lookup_ioctx(unsig
if (!table || id >= table->nr)
goto out;
- ctx = table->table[id];
+ ctx = rcu_dereference(table->table[id]);
if (ctx && ctx->user_id == ctx_id) {
percpu_ref_get(&ctx->users);
ret = ctx;
Patches currently in stable-queue which might be from tj@kernel.org are
queue-4.14/rdmavt-fix-synchronization-around-percpu_ref.patch
queue-4.14/fs-aio-add-explicit-rcu-grace-period-when-freeing-kioctx.patch
queue-4.14/fs-aio-use-rcu-accessors-for-kioctx_table-table.patch