On Thu, Jan 28, 2021 at 10:48:34AM -0800, Paul E. McKenney wrote:
> On Thu, Jan 28, 2021 at 06:12:07PM +0100, Frederic Weisbecker wrote:
> > The "nocb_bypass_timer" ends up calling wake_nocb_gp() which deletes
> > the pending "nocb_timer" (note they are not the same timers) for the
> > given rdp without resetting the matching state stored in
> > rdp->nocb_defer_wakeup.
> >
> > As a result, a future call_rcu() on that rdp may be fooled and think the
> > timer is armed when it's not, missing a deferred nocb_gp wakeup.
> >
> > Fix this by resetting rdp->nocb_defer_wakeup when we disarm the timer.
> >
> > Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)
> > Cc: Stable <stable(a)vger.kernel.org>
> > Cc: Josh Triplett <josh(a)joshtriplett.org>
> > Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
> > Cc: Joel Fernandes <joel(a)joelfernandes.org>
> > Cc: Neeraj Upadhyay <neeraju(a)codeaurora.org>
> > Cc: Boqun Feng <boqun.feng(a)gmail.com>
> > Signed-off-by: Frederic Weisbecker <frederic(a)kernel.org>
> > ---
> > kernel/rcu/tree_plugin.h | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 7e33dae0e6ee..a44f80d7661b 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -1705,6 +1705,8 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
> > rcu_nocb_unlock_irqrestore(rdp, flags);
> > return false;
> > }
> > +
> > + rdp->nocb_defer_wakeup = RCU_NOCB_WAKE_NOT;
>
> Given this change, does it make sense to remove the
> setting of ->nocb_defer_wakeup to RCU_NOCB_WAKE_NOT from the
> do_nocb_deferred_wakeup_common() function?
I do it later in "[PATCH 09/16] rcu/nocb: Merge nocb_timer to the rdp leader"
> Does the above assignment need
> to be WRITE_ONCE(), in other words, are all reads of ->nocb_defer_wakeup
> done with either ->nocb_lock or ->nocb_gp_lock held? (I do not believe
> that this is the case.)
Ah indeed, it should probably be done with WRITE_ONCE() because it's read
locklessly in many places.
Thanks.
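
For readers following the WRITE_ONCE() point, here is a minimal userspace
sketch. The macros are simplified approximations of the kernel's (the real
ones are more elaborate), and struct rdp_model plus the model_*() helpers
are hypothetical names used only for illustration. The idea is that a field
which is read without a lock should be written with a single, untorn store,
and read the same way.

```c
/* Simplified userspace approximations of the kernel macros (assumption). */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))

/* Stand-ins for the kernel's deferred-wakeup states. */
enum { RCU_NOCB_WAKE_NOT, RCU_NOCB_WAKE };

/* Hypothetical model of the one field discussed; not the real struct rcu_data. */
struct rdp_model {
	int nocb_defer_wakeup;
};

/* Writer side: what the added assignment would look like with WRITE_ONCE(). */
static void model_reset_defer_wakeup(struct rdp_model *rdp)
{
	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
}

/* Lockless reader: returns nonzero if a deferred wakeup is recorded. */
static int model_needs_deferred_wakeup(struct rdp_model *rdp)
{
	return READ_ONCE(rdp->nocb_defer_wakeup) != RCU_NOCB_WAKE_NOT;
}
```

The volatile access only prevents the compiler from tearing, fusing, or
re-reading the access; it does not order it against other memory accesses.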
>
> Thanx, Paul
>
> > del_timer(&rdp->nocb_timer);
> > rcu_nocb_unlock_irqrestore(rdp, flags);
> > raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
> > --
> > 2.25.1
> >
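
The stale-state problem described in the changelog above can be sketched
with a small userspace model. This is only an illustration under stated
assumptions: struct rdp_model, timer_armed, and the model_*() functions are
hypothetical, and the real code takes locks and uses a real timer. It shows
why disarming the timer without resetting the state makes a later wakeup
request a no-op.

```c
#include <stdbool.h>

/* Stand-ins for the kernel's deferred-wakeup states. */
enum { RCU_NOCB_WAKE_NOT, RCU_NOCB_WAKE };

/* Hypothetical model of the two pieces of state involved. */
struct rdp_model {
	int nocb_defer_wakeup;	/* state consulted before arming the timer */
	bool timer_armed;	/* whether the modeled "nocb_timer" is pending */
};

/* Enqueue path: arm the timer only if no deferred wakeup is recorded. */
static bool model_defer_wakeup(struct rdp_model *rdp)
{
	if (rdp->nocb_defer_wakeup != RCU_NOCB_WAKE_NOT)
		return false;	/* assumes the timer is already pending */
	rdp->nocb_defer_wakeup = RCU_NOCB_WAKE;
	rdp->timer_armed = true;
	return true;
}

/* Wakeup path: disarm the timer; 'fixed' selects whether state is reset too. */
static void model_wake_gp(struct rdp_model *rdp, bool fixed)
{
	if (fixed)
		rdp->nocb_defer_wakeup = RCU_NOCB_WAKE_NOT;
	rdp->timer_armed = false;	/* models del_timer(&rdp->nocb_timer) */
}

/* Returns true if a later deferred wakeup request ends up with no timer. */
static bool model_wakeup_lost(bool fixed)
{
	struct rdp_model rdp = { RCU_NOCB_WAKE_NOT, false };

	model_defer_wakeup(&rdp);	/* first call_rcu() arms the timer */
	model_wake_gp(&rdp, fixed);	/* bypass timer path disarms it */
	model_defer_wakeup(&rdp);	/* later call_rcu() wants a wakeup */
	return !rdp.timer_armed;
}
```

In this model, model_wakeup_lost(false) reports a lost wakeup, while
model_wakeup_lost(true), which mirrors the added assignment, does not.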
WARNING: CPU: 0 PID: 21359 at fs/io_uring.c:9042
io_uring_cancel_task_requests+0xe55/0x10c0 fs/io_uring.c:9042
Call Trace:
io_uring_flush+0x47b/0x6e0 fs/io_uring.c:9227
filp_close+0xb4/0x170 fs/open.c:1295
close_files fs/file.c:403 [inline]
put_files_struct fs/file.c:418 [inline]
put_files_struct+0x1cc/0x350 fs/file.c:415
exit_files+0x7e/0xa0 fs/file.c:435
do_exit+0xc22/0x2ae0 kernel/exit.c:820
do_group_exit+0x125/0x310 kernel/exit.c:922
get_signal+0x427/0x20f0 kernel/signal.c:2773
arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811
handle_signal_work kernel/entry/common.c:147 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201
__syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Now that io_uring_cancel_task_requests() can be called directly rather
than only through file notes, remove the WARN_ON_ONCE() there, which
gives us false positives. That check is not very important and we catch
it in other places.
Fixes: 84965ff8a84f0 ("io_uring: if we see flush on exit, cancel related tasks")
Cc: stable(a)vger.kernel.org # 5.9+
Reported-by: syzbot+3e3d9bd0c6ce9efbc3ef(a)syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
---
fs/io_uring.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 39ae1f821cef..12bf7180c0f1 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8967,8 +8967,6 @@ static void io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
struct task_struct *task = current;
if ((ctx->flags & IORING_SETUP_SQPOLL) && ctx->sq_data) {
- /* for SQPOLL only sqo_task has task notes */
- WARN_ON_ONCE(ctx->sqo_task != current);
io_disable_sqo_submit(ctx);
task = ctx->sq_data->thread;
atomic_inc(&task->io_uring->in_idle);
--
2.24.0
On Thu, Jan 28, 2021, Paolo Bonzini wrote:
> On 28/01/21 18:56, Sean Christopherson wrote:
> > On Thu, Jan 28, 2021, Paolo Bonzini wrote:
> > > - vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
> > > + if (boot_cpu_has(X86_FEATURE_RTM))
> > > + vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
> > > + else
> > > + vmx->guest_uret_msrs[j].mask = 0;
> >
> > IMO, this is an unnecessarily confusing way to "remove" the user return MSR.
> > Changing the ordering to do a 'continue' would also provide a separate chunk of
> > code for the new comment. And maybe replace the switch with an if-statement to
> > avoid a 'continue' buried in a switch?
>
> You still need the slot in vmx->guest_uret_msrs to store the guest value,
> even though the two available bits are both no-ops. It's ugly but it makes
> sense: you don't want to ever re-enable TSX, so you ignore the guest
> value and run unconditionally with the host value.
Ugh, didn't think about the guest wanting to read back the value it wrote.
> I'll rephrase everything and resend.
Thanks!
Userspace that does not know about KVM_GET_MSR_FEATURE_INDEX_LIST will
generally use the default value for MSR_IA32_ARCH_CAPABILITIES.
When this happens and the host has tsx=on, it is possible to end up
with virtual machines that have HLE and RTM disabled, but TSX_CTRL
available.
If the fleet is then switched to tsx=off, kvm_get_arch_capabilities()
will clear the ARCH_CAP_TSX_CTRL_MSR bit and it will not be possible
to use the tsx=off hosts as migration destinations, even though the
guests indeed do not have TSX enabled.
When tsx=off is used, however, we know that guests will not have
HLE and RTM (or if userspace sets bogus CPUID data, we do not
expect HLE and RTM to work in guests). Therefore we can keep
TSX_CTRL_RTM_DISABLE set for the entire life of the guests and
save MSR reads and writes on KVM_RUN and in the user return
notifiers.
Cc: stable(a)vger.kernel.org
Fixes: cbbaa2727aa3 ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES")
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/vmx/vmx.c | 12 +++++++++++-
arch/x86/kvm/x86.c | 2 +-
2 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cc60b1fc3ee7..80491a729408 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6863,8 +6863,18 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
* No need to pass TSX_CTRL_CPUID_CLEAR through, so
* let's avoid changing CPUID bits under the host
* kernel's feet.
+ *
+ * If the host disabled RTM, we may still need TSX_CTRL
+ * to be supported in the guest; for example the guest
+ * could have been created on a tsx=on host with hle=0,
+ * rtm=0, tsx_ctrl=1 and later migrate to a tsx=off host.
+ * In that case however do not change the value on the host,
+ * so that TSX remains always disabled.
*/
- vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
+ if (boot_cpu_has(X86_FEATURE_RTM))
+ vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
+ else
+ vmx->guest_uret_msrs[j].mask = 0;
break;
default:
vmx->guest_uret_msrs[j].mask = -1ull;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 76bce832cade..15733013b266 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1401,7 +1401,7 @@ static u64 kvm_get_arch_capabilities(void)
* This lets the guest use VERW to clear CPU buffers.
*/
if (!boot_cpu_has(X86_FEATURE_RTM))
- data &= ~(ARCH_CAP_TAA_NO | ARCH_CAP_TSX_CTRL_MSR);
+ data &= ~ARCH_CAP_TAA_NO;
else if (!boot_cpu_has_bug(X86_BUG_TAA))
data |= ARCH_CAP_TAA_NO;
--
2.26.2
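
The effect of the mask chosen in the vmx.c hunk above can be sketched in
userspace C. This is a simplified model, assuming that a user return MSR is
loaded with the guest's bits where the mask is set and the host's bits
elsewhere; the model_*() names are hypothetical and the bit positions are
written out here for illustration.

```c
#include <stdint.h>

#define TSX_CTRL_RTM_DISABLE	(1ULL << 0)	/* disable RTM */
#define TSX_CTRL_CPUID_CLEAR	(1ULL << 1)	/* hide TSX bits in CPUID */

/* Value loaded while the guest runs: guest bits under mask, host bits elsewhere. */
static uint64_t model_uret_msr_value(uint64_t guest, uint64_t host, uint64_t mask)
{
	return (guest & mask) | (host & ~mask);
}

/* Mask selection as in the patch: no guest-controlled bits without host RTM. */
static uint64_t model_tsx_ctrl_mask(int host_has_rtm)
{
	return host_has_rtm ? ~(uint64_t)TSX_CTRL_CPUID_CLEAR : 0;
}
```

With a mask of 0 the guest's stored value never reaches the hardware, so a
tsx=off host keeps TSX disabled while the guest can still read back what it
wrote. With host RTM present, only TSX_CTRL_CPUID_CLEAR is withheld from
the guest.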
On 28.1.2021 at 12:05, Chris Clayton wrote:
>
> On 28/01/2021 09:34, Greg Kroah-Hartman wrote:
>> On Thu, Jan 28, 2021 at 09:17:10AM +0000, Chris Clayton wrote:
>>> Hi,
>>>
>>> Building 5.10.11 fails on my (x86-64) laptop thusly:
>>>
>>> ..
>>>
>>> AS arch/x86/entry/thunk_64.o
>>> CC arch/x86/entry/vsyscall/vsyscall_64.o
>>> AS arch/x86/realmode/rm/header.o
>>> CC arch/x86/mm/pat/set_memory.o
>>> CC arch/x86/events/amd/core.o
>>> CC arch/x86/kernel/fpu/init.o
>>> CC arch/x86/entry/vdso/vma.o
>>> CC kernel/sched/core.o
>>> arch/x86/entry/thunk_64.o: warning: objtool: missing symbol for insn at offset 0x3e
>>>
>>> AS arch/x86/realmode/rm/trampoline_64.o
>>> make[2]: *** [scripts/Makefile.build:360: arch/x86/entry/thunk_64.o] Error 255
>>> make[2]: *** Deleting file 'arch/x86/entry/thunk_64.o'
>>> make[2]: *** Waiting for unfinished jobs....
>>>
>>> ..
>>>
>>> Compiler is latest snapshot of gcc-10.
>>>
>>> Happy to test the fix but please cc me as I'm not subscribed
>>
>> Can you do 'git bisect' to track down the offending commit?
>>
>
> Sure, but I'll hold that request for a while. I updated to binutils-2.36 on Monday and I'm pretty sure that is a factor
> in this build failure. I've reverted binutils to 2.35.1, and the build succeeds. Updated to 2.36 again and, surprise,
> surprise, the kernel build fails again.
>
> I've had a glance at the binutils ML and there are all sorts of issues being reported, but it's beyond my knowledge to
> assess if this build error is related to any of them.
>
> I'll stick with binutils-2.35.1 for the time being.
>
>> And what exact gcc version are you using?
>>
>
> It's built from the 10-20210123 snapshot tarball.
>
> I can report this to the binutils folks, but might it be better if the objtool maintainer looks at it first? The
> binutils change might just have opened the gate to a bug in objtool.
>
>> thanks,
>>
>> greg k-h
>>
>
AFAIK you need this in stable trees:
From 1d489151e9f9d1647110277ff77282fe4d96d09b Mon Sep 17 00:00:00 2001
From: Josh Poimboeuf <jpoimboe(a)redhat.com>
Date: Thu, 14 Jan 2021 16:14:01 -0600
Subject: [PATCH] objtool: Don't fail on missing symbol table
--
Thomas