The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ca16d5bee59807bf04deaab0a8eccecd5061528c Mon Sep 17 00:00:00 2001
From: Yang Tao <yang.tao172(a)zte.com.cn>
Date: Wed, 6 Nov 2019 22:55:35 +0100
Subject: [PATCH] futex: Prevent robust futex exit race
Robust futexes utilize the robust_list mechanism to allow the kernel to
release futexes which are held when a task exits. The exit can be voluntary
or caused by a signal or fault. This prevents waiters from blocking forever.
The futex operations in user space store a pointer to the futex they are
either locking or unlocking in the op_pending member of the per-task robust
list.
After a lock operation has succeeded the futex is queued in the robust list
linked list and the op_pending pointer is cleared.
After an unlock operation has succeeded the futex is removed from the
robust list linked list and the op_pending pointer is cleared.
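For reference, a minimal user-space sketch of that bookkeeping (assuming the uapi <linux/futex.h> definitions and the raw set_robust_list() syscall; this is an illustration, not the glibc implementation):

#include <linux/futex.h>      /* struct robust_list_head (uapi) */
#include <sys/syscall.h>
#include <unistd.h>

/*
 * One robust_list_head per thread. The lock/unlock fast paths publish the
 * futex address in list_op_pending before touching the futex word and
 * clear it again afterwards; that is what the kernel's exit code inspects.
 */
static struct robust_list_head robust_head = {
	.list            = { &robust_head.list },  /* empty circular list */
	.futex_offset    = 0,  /* offset from a list entry to its futex word */
	.list_op_pending = NULL,
};

static void robust_list_register(void)
{
	syscall(SYS_set_robust_list, &robust_head, sizeof(robust_head));
}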
The robust list exit code checks for the pending operation and any futex
which is queued in the linked list. It carefully checks whether the futex
value is the TID of the exiting task. If so, it sets the OWNER_DIED bit and
tries to wake up a potential waiter.
This is race-free for the lock operation, but unlock has two race scenarios
where waiters might not be woken up. These issues can be observed with
regular robust pthread mutexes. PI-aware pthread mutexes are not affected.
(1) Unlocking task is killed after unlocking the futex value in user space
before being able to wake a waiter.
    pthread_mutex_unlock()
            |
            V
    atomic_exchange_rel (&mutex->__data.__lock, 0)
                      <------------------------ killed
    lll_futex_wake ()                          |
                                               |
                                               |(__lock = 0)
                                               |(enter kernel)
                                               |
                                               V
                                           do_exit()
                                           exit_mm()
                                           mm_release()
                                           exit_robust_list()
                                           handle_futex_death()
                                               |
                                               |(__lock = 0)
                                               |(uval = 0)
                                               |
                                               V
            if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
                    return 0;
The sanity check which ensures that the user space futex is owned by
the exiting task prevents the wakeup of waiters, which in consequence
block forever.
(2) Waiting task is killed after a wakeup and before it can acquire the
futex in user space.
    OWNER                                   WAITER

                                            futex_wait()
    pthread_mutex_unlock()                      |
            |                                   |
            |(__lock = 0)                       |
            |                                   |
            V                                   |
       futex_wake() ------------>           wakeup()
                                                |
                                                |(return to userspace)
                                                |(__lock = 0)
                                                |
                                                V
                        oldval = mutex->__data.__lock
                                              <----------------- killed
                        atomic_compare_and_exchange_val_acq (&mutex->__data.__lock,  |
                                id | assume_other_futex_waiters, 0)                  |
                                                                                      |
                                                                                      |
                                                                       (enter kernel) |
                                                                                      |
                                                                                      V
                                                                                  do_exit()
                                                                                      |
                                                                                      |
                                                                                      V
                                                                        handle_futex_death()
                                                                                      |
                                                                                      |(__lock = 0)
                                                                                      |(uval = 0)
                                                                                      |
                                                                                      V
                        if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
                                return 0;
The sanity check which ensures that the user space futex is owned
by the exiting task prevents the wakeup of waiters, which seems to
be correct as the exiting task does not own the futex value, but
the consequence is that other waiters won't be woken up and block
forever.
In both scenarios the following conditions are true:
- task->robust_list->list_op_pending != NULL
- user space futex value == 0
- Regular futex (not PI)
If these conditions are met then it is reasonably safe to wake up a
potential waiter in order to prevent the above problems.
As this might be a false positive it can cause spurious wakeups, but the
waiter side has to handle other types of unrelated wakeups, e.g. signals
gracefully anyway. So such a spurious wakeup will not affect the
correctness of these operations.
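That robustness comes from the usual waiter pattern: every return from FUTEX_WAIT simply leads back to re-checking the futex word. A simplified user-space sketch using raw syscalls (the helper name and the bare 0 -> TID protocol are illustrative assumptions, not the glibc lock path):

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdatomic.h>

/* Acquire a lock word that is free when it contains 0, otherwise wait. */
static void lock_slowpath(_Atomic unsigned int *lock, unsigned int tid)
{
	unsigned int expected;

	for (;;) {
		expected = 0;
		if (atomic_compare_exchange_strong(lock, &expected, tid))
			return;	/* got it: 0 -> TID */

		/*
		 * Sleep until the word changes. A signal, a spurious wakeup
		 * or the exit-time wakeup added by this patch all just bring
		 * us back to the compare-and-swap above.
		 */
		syscall(SYS_futex, lock, FUTEX_WAIT, expected, NULL, NULL, 0);
	}
}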
This workaround must not touch the user space futex value and cannot set
the OWNER_DIED bit because the lock value is 0, i.e. uncontended. Setting
OWNER_DIED in this case would result in inconsistent state and subsequently
in malfunction of the owner died handling in user space.
The rest of the user space state is still consistent as no other task can
observe the list_op_pending entry in the exiting task's robust list.
The eventually woken up waiter will observe the uncontended lock value and
take it over.
[ tglx: Massaged changelog and comment. Made the return explicit and not
depend on the subsequent check and added constants to hand into
handle_futex_death() instead of plain numbers. Fixed a few coding
style issues. ]
Fixes: 0771dfefc9e5 ("[PATCH] lightweight robust futexes: core")
Signed-off-by: Yang Tao <yang.tao172(a)zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59(a)zte.com.cn>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Ingo Molnar <mingo(a)kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/1573010582-35297-1-git-send-email-wang.yi59@zte.c…
Link: https://lkml.kernel.org/r/20191106224555.943191378@linutronix.de
diff --git a/kernel/futex.c b/kernel/futex.c
index 43229f8999fc..49eaf5be851a 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -3452,11 +3452,16 @@ SYSCALL_DEFINE3(get_robust_list, int, pid,
return ret;
}
+/* Constants for the pending_op argument of handle_futex_death */
+#define HANDLE_DEATH_PENDING true
+#define HANDLE_DEATH_LIST false
+
/*
* Process a futex-list entry, check whether it's owned by the
* dying task, and do notification if so:
*/
-static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr, int pi)
+static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr,
+ bool pi, bool pending_op)
{
u32 uval, uninitialized_var(nval), mval;
int err;
@@ -3469,6 +3474,42 @@ static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr, int p
if (get_user(uval, uaddr))
return -1;
+ /*
+ * Special case for regular (non PI) futexes. The unlock path in
+ * user space has two race scenarios:
+ *
+ * 1. The unlock path releases the user space futex value and
+ * before it can execute the futex() syscall to wake up
+ * waiters it is killed.
+ *
+ * 2. A woken up waiter is killed before it can acquire the
+ * futex in user space.
+ *
+ * In both cases the TID validation below prevents a wakeup of
+ * potential waiters which can cause these waiters to block
+ * forever.
+ *
+ * In both cases the following conditions are met:
+ *
+ * 1) task->robust_list->list_op_pending != NULL
+ * @pending_op == true
+ * 2) User space futex value == 0
+ * 3) Regular futex: @pi == false
+ *
+ * If these conditions are met, it is safe to attempt waking up a
+ * potential waiter without touching the user space futex value and
+ * trying to set the OWNER_DIED bit. The user space futex value is
+ * uncontended and the rest of the user space mutex state is
+ * consistent, so a woken waiter will just take over the
+ * uncontended futex. Setting the OWNER_DIED bit would create
+ * inconsistent state and malfunction of the user space owner died
+ * handling.
+ */
+ if (pending_op && !pi && !uval) {
+ futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
+ return 0;
+ }
+
if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
return 0;
@@ -3588,10 +3629,11 @@ void exit_robust_list(struct task_struct *curr)
* A pending lock might already be on the list, so
* don't process it twice:
*/
- if (entry != pending)
+ if (entry != pending) {
if (handle_futex_death((void __user *)entry + futex_offset,
- curr, pi))
+ curr, pi, HANDLE_DEATH_LIST))
return;
+ }
if (rc)
return;
entry = next_entry;
@@ -3605,9 +3647,10 @@ void exit_robust_list(struct task_struct *curr)
cond_resched();
}
- if (pending)
+ if (pending) {
handle_futex_death((void __user *)pending + futex_offset,
- curr, pip);
+ curr, pip, HANDLE_DEATH_PENDING);
+ }
}
long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
@@ -3784,7 +3827,8 @@ void compat_exit_robust_list(struct task_struct *curr)
if (entry != pending) {
void __user *uaddr = futex_uaddr(entry, futex_offset);
- if (handle_futex_death(uaddr, curr, pi))
+ if (handle_futex_death(uaddr, curr, pi,
+ HANDLE_DEATH_LIST))
return;
}
if (rc)
@@ -3803,7 +3847,7 @@ void compat_exit_robust_list(struct task_struct *curr)
if (pending) {
void __user *uaddr = futex_uaddr(pending, futex_offset);
- handle_futex_death(uaddr, curr, pip);
+ handle_futex_death(uaddr, curr, pip, HANDLE_DEATH_PENDING);
}
}
The patch above also does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
From: Chester Lin <clin(a)suse.com>
[ Upstream commit 1d31999cf04c21709f72ceb17e65b54a401330da ]
adjust_lowmem_bounds() checks every memblock in order to find the boundary
between lowmem and highmem. However, some memblocks can be marked as NOMAP,
which means they are not used by the kernel and should be skipped while
calculating the boundary.
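For context, a NOMAP region is typically created by an early firmware carve-out along the lines of the hedged sketch below (the wrapper and its use are made up for illustration; memblock_mark_nomap() is the in-kernel API):

#include <linux/memblock.h>
#include <linux/init.h>

/*
 * Keep a firmware-owned range out of the kernel's linear mapping. The range
 * still appears when iterating memblock.memory, which is why
 * adjust_lowmem_bounds() has to skip NOMAP entries.
 */
static void __init reserve_firmware_carveout(phys_addr_t base, phys_addr_t size)
{
	memblock_mark_nomap(base, size);
}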
Signed-off-by: Chester Lin <clin(a)suse.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com>
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Lee Jones <lee.jones(a)linaro.org>
---
arch/arm/mm/mmu.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 70e560cf8ca0..d8cbe772f690 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1195,6 +1195,9 @@ void __init adjust_lowmem_bounds(void)
phys_addr_t block_start = reg->base;
phys_addr_t block_end = reg->base + reg->size;
+ if (memblock_is_nomap(reg))
+ continue;
+
if (reg->base < vmalloc_limit) {
if (block_end > lowmem_limit)
/*
--
2.24.0
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a1a338e5b6fe9e0a39c57c232dc96c198bb53e47 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Wed, 20 Nov 2019 10:10:49 +0100
Subject: [PATCH] x86/entry/32: Unwind the ESPFIX stack earlier on exception
entry
Right now, we do some fancy parts of the exception entry path while SS
might have a nonzero base: we fill in regs->ss and regs->sp, and we
consider switching to the kernel stack. This results in regs->ss and
regs->sp referring to a non-flat stack and it may result in
overflowing the entry stack. The former issue means that we can try to
call iret_exc on a non-flat stack, which doesn't work.
Tested with selftests/x86/sigreturn_32.
Fixes: 45d7b255747c ("x86/entry/32: Enter the kernel via trampoline stack")
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: stable(a)kernel.org
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index d9f401995278..647e2a272d08 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -210,8 +210,6 @@
/*
* The high bits of the CS dword (__csh) are used for CS_FROM_*.
* Clear them in case hardware didn't do this for us.
- *
- * Be careful: we may have nonzero SS base due to ESPFIX.
*/
andl $0x0000ffff, 4*4(%esp)
@@ -307,12 +305,21 @@
.Lfinished_frame_\@:
.endm
-.macro SAVE_ALL pt_regs_ax=%eax switch_stacks=0 skip_gs=0
+.macro SAVE_ALL pt_regs_ax=%eax switch_stacks=0 skip_gs=0 unwind_espfix=0
cld
.if \skip_gs == 0
PUSH_GS
.endif
pushl %fs
+
+ pushl %eax
+ movl $(__KERNEL_PERCPU), %eax
+ movl %eax, %fs
+.if \unwind_espfix > 0
+ UNWIND_ESPFIX_STACK
+.endif
+ popl %eax
+
FIXUP_FRAME
pushl %es
pushl %ds
@@ -326,8 +333,6 @@
movl $(__USER_DS), %edx
movl %edx, %ds
movl %edx, %es
- movl $(__KERNEL_PERCPU), %edx
- movl %edx, %fs
.if \skip_gs == 0
SET_KERNEL_GS %edx
.endif
@@ -1153,18 +1158,17 @@ ENDPROC(entry_INT80_32)
lss (%esp), %esp /* switch to the normal stack segment */
#endif
.endm
+
.macro UNWIND_ESPFIX_STACK
+ /* It's safe to clobber %eax, all other regs need to be preserved */
#ifdef CONFIG_X86_ESPFIX32
movl %ss, %eax
/* see if on espfix stack */
cmpw $__ESPFIX_SS, %ax
- jne 27f
- movl $__KERNEL_DS, %eax
- movl %eax, %ds
- movl %eax, %es
+ jne .Lno_fixup_\@
/* switch to normal stack */
FIXUP_ESPFIX_STACK
-27:
+.Lno_fixup_\@:
#endif
.endm
@@ -1458,10 +1462,9 @@ END(page_fault)
common_exception_read_cr2:
/* the function address is in %gs's slot on the stack */
- SAVE_ALL switch_stacks=1 skip_gs=1
+ SAVE_ALL switch_stacks=1 skip_gs=1 unwind_espfix=1
ENCODE_FRAME_POINTER
- UNWIND_ESPFIX_STACK
/* fixup %gs */
GS_TO_REG %ecx
@@ -1483,9 +1486,8 @@ END(common_exception_read_cr2)
common_exception:
/* the function address is in %gs's slot on the stack */
- SAVE_ALL switch_stacks=1 skip_gs=1
+ SAVE_ALL switch_stacks=1 skip_gs=1 unwind_espfix=1
ENCODE_FRAME_POINTER
- UNWIND_ESPFIX_STACK
/* fixup %gs */
GS_TO_REG %ecx
8-letter strings representing ARC perf events are stored in two
32-bit registers as ASCII characters, like this: "IJMP", "IALL", "IJMPTAK", etc.
The same order of bytes in the word is used regardless of CPU endianness,
which means that on a big-endian CPU core we need to swap bytes to get
the same order as on a little-endian CPU.
Otherwise we see the following error message on boot:
------------------------->8----------------------
ARC perf : 8 counters (32 bits), 40 conditions, [overflow IRQ support]
sysfs: cannot create duplicate filename '/devices/arc_pct/events/pmji'
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.18 #3
Stack Trace:
arc_unwind_core+0xd4/0xfc
dump_stack+0x64/0x80
sysfs_warn_dup+0x46/0x58
sysfs_add_file_mode_ns+0xb2/0x168
create_files+0x70/0x2a0
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at kernel/events/core.c:12144 perf_event_sysfs_init+0x70/0xa0
Failed to register pmu: arc_pct, reason -17
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.18 #3
Stack Trace:
arc_unwind_core+0xd4/0xfc
dump_stack+0x64/0x80
__warn+0x9c/0xd4
warn_slowpath_fmt+0x22/0x2c
perf_event_sysfs_init+0x70/0xa0
---[ end trace a75fb9a9837bd1ec ]---
------------------------->8----------------------
What happens here is that we're trying to register more than one raw perf event
with the same name "PMJI". Why? Because ARC perf event names are 4 to 8 letters
long and are encoded into two 32-bit words. In this particular case we deal with
2 events:
* "IJMP____", which counts all jump & branch instructions
* "IJMPC___", which counts only conditional jumps & branches
Those strings are split into two 32-bit words as "IJMP" + "____" and
"IJMP" + "C___" respectively. Now if we read them byte-swapped, because the CPU
core is big-endian, we get "PMJI" + "____" and "PMJI" + "___C".
And since the array of ASCII letters we read is interpreted as a null-terminated
string, on a big-endian CPU we end up with 2 events of the same name, "PMJI".
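To make the byte-order effect concrete, here is a small host-side C demonstration (a hypothetical standalone program, not kernel code; the register value is just the "IJMP" example packed little-endian) of why a plain read yields "PMJI" on big-endian and how an le32_to_cpu()-style swap restores the name:

#include <stdint.h>
#include <stdio.h>

static void store_le(uint32_t v, char out[4])
{
	out[0] = v & 0xff;         out[1] = (v >> 8) & 0xff;
	out[2] = (v >> 16) & 0xff; out[3] = v >> 24;
}

static void store_be(uint32_t v, char out[4])
{
	out[0] = v >> 24;          out[1] = (v >> 16) & 0xff;
	out[2] = (v >> 8) & 0xff;  out[3] = v & 0xff;
}

int main(void)
{
	/* Assumed raw value of ARC_REG_CC_NAME0 for "IJMP":
	 * low byte 'I', then 'J', 'M', 'P'. */
	uint32_t reg = 0x504d4a49;
	char name[5] = { 0 };

	store_le(reg, name);
	printf("little-endian store: %s\n", name);  /* IJMP */

	store_be(reg, name);
	printf("big-endian store:    %s\n", name);  /* PMJI */

	/* le32_to_cpu() on a big-endian CPU is this byte swap: */
	reg = ((reg & 0x000000ff) << 24) | ((reg & 0x0000ff00) << 8) |
	      ((reg & 0x00ff0000) >> 8)  | ((reg & 0xff000000) >> 24);
	store_be(reg, name);
	printf("after swap:          %s\n", name);  /* IJMP again */
	return 0;
}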
Signed-off-by: Alexey Brodkin <abrodkin(a)synopsys.com>
Cc: stable(a)vger.kernel.org
---
Greg, Sasha, this is the same patch as
commit 5effc09c4907 ("ARC: perf: Accommodate big-endian CPU")
but fine-tuned to be applicable to kernels 4.19 and older.
arch/arc/kernel/perf_event.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 8aec462d90fb..30f66b123541 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -490,8 +490,8 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
/* loop thru all available h/w condition indexes */
for (j = 0; j < cc_bcr.c; j++) {
write_aux_reg(ARC_REG_CC_INDEX, j);
- cc_name.indiv.word0 = read_aux_reg(ARC_REG_CC_NAME0);
- cc_name.indiv.word1 = read_aux_reg(ARC_REG_CC_NAME1);
+ cc_name.indiv.word0 = le32_to_cpu(read_aux_reg(ARC_REG_CC_NAME0));
+ cc_name.indiv.word1 = le32_to_cpu(read_aux_reg(ARC_REG_CC_NAME1));
/* See if it has been mapped to a perf event_id */
for (i = 0; i < ARRAY_SIZE(arc_pmu_ev_hw_map); i++) {
--
2.16.2
From: Joerg Roedel <jroedel(a)suse.de>
When vmalloc_sync_all() iterates over the address space up to FIXADDR_TOP,
it syncs the whole kernel address space starting from VMALLOC_START.
This is not a problem when the kernel address range is identical in
all page-tables, but this is no longer the case when PTI is enabled on
x86-32. In that case the per-process LDT is mapped in the kernel
address range and vmalloc_sync_all() clears the LDT mapping for all
processes.
To make the LDT work again, vmalloc_sync_all() must only iterate over
the volatile parts of the kernel address range that are identical
between all processes. This includes the VMALLOC and the PKMAP areas
on x86-32.
The order of the ranges in the address space is:
VMALLOC -> PKMAP -> LDT -> CPU_ENTRY_AREA -> FIX_ADDR
So the right check in vmalloc_sync_all() is "address < LDT_BASE_ADDR"
to make sure the VMALLOC and PKMAP areas are synchronized and the LDT
mapping is not falsely overwritten. The CPU_ENTRY_AREA and
the FIXMAP area are no longer synced either, but these
ranges are synchronized at page-table creation time and do
not change at runtime.
This change fixes the ldt_gdt selftest in my setup.
Fixes: 7757d607c6b3 ("x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32")
Cc: stable(a)vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
---
arch/x86/mm/fault.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9ceacd1156db..144329c44436 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -197,7 +197,7 @@ void vmalloc_sync_all(void)
return;
for (address = VMALLOC_START & PMD_MASK;
- address >= TASK_SIZE_MAX && address < FIXADDR_TOP;
+ address >= TASK_SIZE_MAX && address < LDT_BASE_ADDR;
address += PMD_SIZE) {
struct page *page;
--
2.16.4
Hi Sasha, Greg,
> -----Original Message-----
> From: Sasha Levin <sashal(a)kernel.org>
> Sent: Saturday, October 26, 2019 4:11 PM
> To: Sasha Levin <sashal(a)kernel.org>; Alexey Brodkin <abrodkin(a)synopsys.com>; linux-snps-
> arc(a)lists.infradead.org
> Cc: linux-kernel(a)vger.kernel.org; stable(a)vger.kernel.org; stable(a)vger.kernel.org
> Subject: Re: [PATCH] ARC: perf: Accommodate big-endian CPU
>
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v5.3.7, v4.19.80, v4.14.150, v4.9.197, v4.4.197.
>
> v5.3.7: Build OK!
> v4.19.80: Failed to apply! Possible dependencies:
> 0e956150fe09f ("ARC: perf: introduce Kernel PMU events support")
> 14f81a91ad29a ("ARC: perf: trivial code cleanup")
> baf9cc85ba01f ("ARC: perf: move HW events mapping to separate function")
> v4.14.150: Failed to apply! Possible dependencies:
> v4.9.197: Failed to apply! Possible dependencies:
> v4.4.197: Failed to apply! Possible dependencies:
Indeed the clash is due to
commit baf9cc85ba01f ("ARC: perf: move HW events mapping to separate function"), as the tmp variable "j" was changed to "i". So here's the fixed hunk:
-------------------------------->8------------------------------
diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 8aec462d90fb..30f66b123541 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -490,8 +490,8 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
/* loop thru all available h/w condition indexes */
for (j = 0; j < cc_bcr.c; j++) {
write_aux_reg(ARC_REG_CC_INDEX, j);
- cc_name.indiv.word0 = read_aux_reg(ARC_REG_CC_NAME0);
- cc_name.indiv.word1 = read_aux_reg(ARC_REG_CC_NAME1);
+ cc_name.indiv.word0 = le32_to_cpu(read_aux_reg(ARC_REG_CC_NAME0));
+ cc_name.indiv.word1 = le32_to_cpu(read_aux_reg(ARC_REG_CC_NAME1));
/* See if it has been mapped to a perf event_id */
for (i = 0; i < ARRAY_SIZE(arc_pmu_ev_hw_map); i++) {
-------------------------------->8------------------------------
Should I send a formal patch with it, or is it OK as is for now?
-Alexey