This is a note to let you know that I've just added the patch titled
s390/guarded storage: fix possible memory corruption
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
s390-guarded-storage-fix-possible-memory-corruption.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From fa1edf3f63c05ca8eacafcd7048ed91e5360f1a8 Mon Sep 17 00:00:00 2001
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Date: Mon, 11 Sep 2017 11:24:22 +0200
Subject: s390/guarded storage: fix possible memory corruption
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
commit fa1edf3f63c05ca8eacafcd7048ed91e5360f1a8 upstream.
For PREEMPT enabled kernels the guarded storage (GS) code contains a
possible use-after-free bug. If a task that makes use of GS exits, it
will execute do_exit() while still enabled for preemption.
That function will call exit_thread_runtime_instr() via exit_thread().
If exit_thread_gs() gets preempted after the GS control block of the
task has been freed but before the pointer to it is set to NULL, then
save_gs_cb(), called from switch_to(), will write to already freed
memory.
Avoid this and simply disable preemption while freeing the control
block and setting the pointer to NULL.
Fixes: 916cda1aa1b4 ("s390: add a system call for guarded storage")
Reviewed-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/s390/kernel/guarded_storage.c | 2 ++
1 file changed, 2 insertions(+)
--- a/arch/s390/kernel/guarded_storage.c
+++ b/arch/s390/kernel/guarded_storage.c
@@ -14,9 +14,11 @@
void exit_thread_gs(void)
{
+ preempt_disable();
kfree(current->thread.gs_cb);
kfree(current->thread.gs_bc_cb);
current->thread.gs_cb = current->thread.gs_bc_cb = NULL;
+ preempt_enable();
}
static int gs_enable(void)
Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are
queue-4.14/s390-guarded-storage-fix-possible-memory-corruption.patch
queue-4.14/s390-disassembler-add-missing-end-marker-for-e7-table.patch
queue-4.14/s390-fix-transactional-execution-control-register-handling.patch
queue-4.14/s390-runtime-instrumention-fix-possible-memory-corruption.patch
queue-4.14/s390-noexec-execute-kexec-datamover-without-dat.patch
This is a note to let you know that I've just added the patch titled
s390/noexec: execute kexec datamover without DAT
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
s390-noexec-execute-kexec-datamover-without-dat.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d0e810eeb3d326978f248b8f0233a2f30f58c72d Mon Sep 17 00:00:00 2001
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Date: Thu, 9 Nov 2017 23:00:14 +0100
Subject: s390/noexec: execute kexec datamover without DAT
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
commit d0e810eeb3d326978f248b8f0233a2f30f58c72d upstream.
Rebooting into a new kernel with kexec fails (system dies) if tried on
a machine that has no-execute support. Reason for this is that the so
called datamover code gets executed with DAT on (MMU is active) and
the page that contains the datamover is marked as non-executable.
Therefore when branching into the datamover an unexpected program
check happens and afterwards the machine is dead.
This can be simply avoided by disabling DAT, which also disables any
no-execute checks, just before the datamover gets executed.
In fact the first thing done by the datamover is to disable DAT. The
code in the datamover that disables DAT can be removed as well.
Thanks to Michael Holzheu and Gerald Schaefer for tracking this down.
Reviewed-by: Michael Holzheu <holzheu(a)linux.vnet.ibm.com>
Reviewed-by: Philipp Rudo <prudo(a)linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer(a)de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Fixes: 57d7f939e7bd ("s390: add no-execute support")
Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/s390/kernel/machine_kexec.c | 1 +
arch/s390/kernel/relocate_kernel.S | 3 ---
2 files changed, 1 insertion(+), 3 deletions(-)
--- a/arch/s390/kernel/machine_kexec.c
+++ b/arch/s390/kernel/machine_kexec.c
@@ -269,6 +269,7 @@ static void __do_machine_kexec(void *dat
s390_reset_system();
data_mover = (relocate_kernel_t) page_to_phys(image->control_code_page);
+ __arch_local_irq_stnsm(0xfb); /* disable DAT - avoid no-execute */
/* Call the moving routine */
(*data_mover)(&image->head, image->start);
--- a/arch/s390/kernel/relocate_kernel.S
+++ b/arch/s390/kernel/relocate_kernel.S
@@ -29,7 +29,6 @@
ENTRY(relocate_kernel)
basr %r13,0 # base address
.base:
- stnsm sys_msk-.base(%r13),0xfb # disable DAT
stctg %c0,%c15,ctlregs-.base(%r13)
stmg %r0,%r15,gprregs-.base(%r13)
lghi %r0,3
@@ -103,8 +102,6 @@ ENTRY(relocate_kernel)
.align 8
load_psw:
.long 0x00080000,0x80000000
- sys_msk:
- .quad 0
ctlregs:
.rept 16
.quad 0
Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are
queue-4.14/s390-guarded-storage-fix-possible-memory-corruption.patch
queue-4.14/s390-disassembler-add-missing-end-marker-for-e7-table.patch
queue-4.14/s390-fix-transactional-execution-control-register-handling.patch
queue-4.14/s390-runtime-instrumention-fix-possible-memory-corruption.patch
queue-4.14/s390-noexec-execute-kexec-datamover-without-dat.patch
This is a note to let you know that I've just added the patch titled
s390: fix transactional execution control register handling
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
s390-fix-transactional-execution-control-register-handling.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From a1c5befc1c24eb9c1ee83f711e0f21ee79cbb556 Mon Sep 17 00:00:00 2001
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Date: Thu, 9 Nov 2017 12:29:34 +0100
Subject: s390: fix transactional execution control register handling
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
commit a1c5befc1c24eb9c1ee83f711e0f21ee79cbb556 upstream.
Dan Horák reported the following crash related to transactional execution:
User process fault: interruption code 0013 ilc:3 in libpthread-2.26.so[3ff93c00000+1b000]
CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1
Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
task: 00000000fafc8000 task.stack: 00000000fafc4000
User PSW : 0705200180000000 000003ff93c14e70
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
User GPRS: 0000000000000077 000003ff00000000 000003ff93144d48 000003ff93144d5e
0000000000000000 0000000000000002 0000000000000000 000003ff00000000
0000000000000000 0000000000000418 0000000000000000 000003ffcc9fe770
000003ff93d28f50 000003ff9310acf0 000003ff92b0319a 000003ffcc9fe6d0
User Code: 000003ff93c14e62: 60e0b030 std %f14,48(%r11)
000003ff93c14e66: 60f0b038 std %f15,56(%r11)
#000003ff93c14e6a: e5600000ff0e tbegin 0,65294
>000003ff93c14e70: a7740006 brc 7,3ff93c14e7c
000003ff93c14e74: a7080000 lhi %r0,0
000003ff93c14e78: a7f40023 brc 15,3ff93c14ebe
000003ff93c14e7c: b2220000 ipm %r0
000003ff93c14e80: 8800001c srl %r0,28
There are several bugs with control register handling with respect to
transactional execution:
- on task switch update_per_regs() is only called if the next task has
an mm (is not a kernel thread). This however is incorrect. This
breaks e.g. for user mode helper handling, where the kernel creates
a kernel thread and then execve's a user space program. Control
register contents related to transactional execution won't be
updated on execve. If the previous task ran with transactional
execution disabled then the new task will also run with
transactional execution disabled, which is incorrect. Therefore call
update_per_regs() unconditionally within switch_to().
- on startup the transactional execution facility is not enabled for
the idle thread. This is not really a bug, but an inconsistency to
other facilities. Therefore enable the facility if it is available.
- on fork the new thread's per_flags field is not cleared. This means
that a child process inherits the PER_FLAG_NO_TE flag. This flag can
be set with a ptrace request to disable transactional execution for
the current process. It should not be inherited by new child
processes in order to be consistent with the handling of all other
PER related debugging options. Therefore clear the per_flags field in
copy_thread_tls().
Reported-and-tested-by: Dan Horák <dan(a)danny.cz>
Fixes: d35339a42dd1 ("s390: add support for transactional memory")
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner(a)linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/s390/include/asm/switch_to.h | 2 +-
arch/s390/kernel/early.c | 4 +++-
arch/s390/kernel/process.c | 1 +
3 files changed, 5 insertions(+), 2 deletions(-)
--- a/arch/s390/include/asm/switch_to.h
+++ b/arch/s390/include/asm/switch_to.h
@@ -37,8 +37,8 @@ static inline void restore_access_regs(u
save_ri_cb(prev->thread.ri_cb); \
save_gs_cb(prev->thread.gs_cb); \
} \
+ update_cr_regs(next); \
if (next->mm) { \
- update_cr_regs(next); \
set_cpu_flag(CIF_FPU); \
restore_access_regs(&next->thread.acrs[0]); \
restore_ri_cb(next->thread.ri_cb, prev->thread.ri_cb); \
--- a/arch/s390/kernel/early.c
+++ b/arch/s390/kernel/early.c
@@ -375,8 +375,10 @@ static __init void detect_machine_facili
S390_lowcore.machine_flags |= MACHINE_FLAG_IDTE;
if (test_facility(40))
S390_lowcore.machine_flags |= MACHINE_FLAG_LPP;
- if (test_facility(50) && test_facility(73))
+ if (test_facility(50) && test_facility(73)) {
S390_lowcore.machine_flags |= MACHINE_FLAG_TE;
+ __ctl_set_bit(0, 55);
+ }
if (test_facility(51))
S390_lowcore.machine_flags |= MACHINE_FLAG_TLB_LC;
if (test_facility(129)) {
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -100,6 +100,7 @@ int copy_thread_tls(unsigned long clone_
memset(&p->thread.per_user, 0, sizeof(p->thread.per_user));
memset(&p->thread.per_event, 0, sizeof(p->thread.per_event));
clear_tsk_thread_flag(p, TIF_SINGLE_STEP);
+ p->thread.per_flags = 0;
/* Initialize per thread user and system timer values */
p->thread.user_timer = 0;
p->thread.guest_timer = 0;
Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are
queue-4.14/s390-guarded-storage-fix-possible-memory-corruption.patch
queue-4.14/s390-disassembler-add-missing-end-marker-for-e7-table.patch
queue-4.14/s390-fix-transactional-execution-control-register-handling.patch
queue-4.14/s390-runtime-instrumention-fix-possible-memory-corruption.patch
queue-4.14/s390-noexec-execute-kexec-datamover-without-dat.patch
This is a note to let you know that I've just added the patch titled
s390/disassembler: add missing end marker for e7 table
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
s390-disassembler-add-missing-end-marker-for-e7-table.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 5c50538752af7968f53924b22dede8ed4ce4cb3b Mon Sep 17 00:00:00 2001
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Date: Tue, 26 Sep 2017 09:16:48 +0200
Subject: s390/disassembler: add missing end marker for e7 table
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
commit 5c50538752af7968f53924b22dede8ed4ce4cb3b upstream.
The e7 opcode table does not have an end marker. Hence when trying to
find an unknown e7 instruction the code will access memory behind the
table until it finds something that matches the opcode, or the kernel
crashes, whatever comes first.
This affects not only the in-kernel disassembler but also uprobes and
kprobes which refuse to set a probe on unknown instructions, and
therefore search the opcode tables to figure out if instructions are
known or not.
Fixes: 3585cb0280654 ("s390/disassembler: add vector instructions")
Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/s390/kernel/dis.c | 1 +
1 file changed, 1 insertion(+)
--- a/arch/s390/kernel/dis.c
+++ b/arch/s390/kernel/dis.c
@@ -1548,6 +1548,7 @@ static struct s390_insn opcode_e7[] = {
{ "vfsq", 0xce, INSTR_VRR_VV000MM },
{ "vfs", 0xe2, INSTR_VRR_VVV00MM },
{ "vftci", 0x4a, INSTR_VRI_VVIMM },
+ { "", 0, INSTR_INVALID }
};
static struct s390_insn opcode_eb[] = {
Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are
queue-4.14/s390-guarded-storage-fix-possible-memory-corruption.patch
queue-4.14/s390-disassembler-add-missing-end-marker-for-e7-table.patch
queue-4.14/s390-fix-transactional-execution-control-register-handling.patch
queue-4.14/s390-runtime-instrumention-fix-possible-memory-corruption.patch
queue-4.14/s390-noexec-execute-kexec-datamover-without-dat.patch
This is a note to let you know that I've just added the patch titled
ACPI / PM: Fix acpi_pm_notifier_lock vs flush_workqueue() deadlock
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
acpi-pm-fix-acpi_pm_notifier_lock-vs-flush_workqueue-deadlock.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ff1656790b3a4caca94505c52fd0250f981ea187 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com>
Date: Tue, 7 Nov 2017 23:08:10 +0200
Subject: ACPI / PM: Fix acpi_pm_notifier_lock vs flush_workqueue() deadlock
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
commit ff1656790b3a4caca94505c52fd0250f981ea187 upstream.
acpi_remove_pm_notifier() ends up calling flush_workqueue() while
holding acpi_pm_notifier_lock, and that same lock is taken by
by the work via acpi_pm_notify_handler(). This can deadlock.
To fix the problem let's split the single lock into two: one to
protect the dev->wakeup between the work vs. add/remove, and
another one to handle notifier installation vs. removal.
After commit a1d14934ea4b "workqueue/lockdep: 'Fix' flush_work()
annotation" I was able to kill the machine (Intel Braswell)
very easily with 'powertop --auto-tune', runtime suspending i915,
and trying to wake it up via the USB keyboard. The cases when
it didn't die are presumably explained by lockdep getting disabled
by something else (cpu hotplug locking issues usually).
Fortunately I still got a lockdep report over netconsole
(trickling in very slowly), even though the machine was
otherwise practically dead:
[ 112.179806] ======================================================
[ 114.670858] WARNING: possible circular locking dependency detected
[ 117.155663] 4.13.0-rc6-bsw-bisect-00169-ga1d14934ea4b #119 Not tainted
[ 119.658101] ------------------------------------------------------
[ 121.310242] xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command.
[ 121.313294] xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead
[ 121.313346] xhci_hcd 0000:00:14.0: HC died; cleaning up
[ 121.313485] usb 1-6: USB disconnect, device number 3
[ 121.313501] usb 1-6.2: USB disconnect, device number 4
[ 134.747383] kworker/0:2/47 is trying to acquire lock:
[ 137.220790] (acpi_pm_notifier_lock){+.+.}, at: [<ffffffff813cafdf>] acpi_pm_notify_handler+0x2f/0x80
[ 139.721524]
[ 139.721524] but task is already holding lock:
[ 144.672922] ((&dpc->work)){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720
[ 147.184450]
[ 147.184450] which lock already depends on the new lock.
[ 147.184450]
[ 154.604711]
[ 154.604711] the existing dependency chain (in reverse order) is:
[ 159.447888]
[ 159.447888] -> #2 ((&dpc->work)){+.+.}:
[ 164.183486] __lock_acquire+0x1255/0x13f0
[ 166.504313] lock_acquire+0xb5/0x210
[ 168.778973] process_one_work+0x1b9/0x720
[ 171.030316] worker_thread+0x4c/0x440
[ 173.257184] kthread+0x154/0x190
[ 175.456143] ret_from_fork+0x27/0x40
[ 177.624348]
[ 177.624348] -> #1 ("kacpi_notify"){+.+.}:
[ 181.850351] __lock_acquire+0x1255/0x13f0
[ 183.941695] lock_acquire+0xb5/0x210
[ 186.046115] flush_workqueue+0xdd/0x510
[ 190.408153] acpi_os_wait_events_complete+0x31/0x40
[ 192.625303] acpi_remove_notify_handler+0x133/0x188
[ 194.820829] acpi_remove_pm_notifier+0x56/0x90
[ 196.989068] acpi_dev_pm_detach+0x5f/0xa0
[ 199.145866] dev_pm_domain_detach+0x27/0x30
[ 201.285614] i2c_device_probe+0x100/0x210
[ 203.411118] driver_probe_device+0x23e/0x310
[ 205.522425] __driver_attach+0xa3/0xb0
[ 207.634268] bus_for_each_dev+0x69/0xa0
[ 209.714797] driver_attach+0x1e/0x20
[ 211.778258] bus_add_driver+0x1bc/0x230
[ 213.837162] driver_register+0x60/0xe0
[ 215.868162] i2c_register_driver+0x42/0x70
[ 217.869551] 0xffffffffa0172017
[ 219.863009] do_one_initcall+0x45/0x170
[ 221.843863] do_init_module+0x5f/0x204
[ 223.817915] load_module+0x225b/0x29b0
[ 225.757234] SyS_finit_module+0xc6/0xd0
[ 227.661851] do_syscall_64+0x5c/0x120
[ 229.536819] return_from_SYSCALL_64+0x0/0x7a
[ 231.392444]
[ 231.392444] -> #0 (acpi_pm_notifier_lock){+.+.}:
[ 235.124914] check_prev_add+0x44e/0x8a0
[ 237.024795] __lock_acquire+0x1255/0x13f0
[ 238.937351] lock_acquire+0xb5/0x210
[ 240.840799] __mutex_lock+0x75/0x940
[ 242.709517] mutex_lock_nested+0x1c/0x20
[ 244.551478] acpi_pm_notify_handler+0x2f/0x80
[ 246.382052] acpi_ev_notify_dispatch+0x44/0x5c
[ 248.194412] acpi_os_execute_deferred+0x14/0x30
[ 250.003925] process_one_work+0x1ec/0x720
[ 251.803191] worker_thread+0x4c/0x440
[ 253.605307] kthread+0x154/0x190
[ 255.387498] ret_from_fork+0x27/0x40
[ 257.153175]
[ 257.153175] other info that might help us debug this:
[ 257.153175]
[ 262.324392] Chain exists of:
[ 262.324392] acpi_pm_notifier_lock --> "kacpi_notify" --> (&dpc->work)
[ 262.324392]
[ 267.391997] Possible unsafe locking scenario:
[ 267.391997]
[ 270.758262] CPU0 CPU1
[ 272.431713] ---- ----
[ 274.060756] lock((&dpc->work));
[ 275.646532] lock("kacpi_notify");
[ 277.260772] lock((&dpc->work));
[ 278.839146] lock(acpi_pm_notifier_lock);
[ 280.391902]
[ 280.391902] *** DEADLOCK ***
[ 280.391902]
[ 284.986385] 2 locks held by kworker/0:2/47:
[ 286.524895] #0: ("kacpi_notify"){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720
[ 288.112927] #1: ((&dpc->work)){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720
[ 289.727725]
Fixes: c072530f391e (ACPI / PM: Revork the handling of ACPI device wakeup notifications)
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/acpi/device_pm.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
--- a/drivers/acpi/device_pm.c
+++ b/drivers/acpi/device_pm.c
@@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable)
#ifdef CONFIG_PM
static DEFINE_MUTEX(acpi_pm_notifier_lock);
+static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
void acpi_pm_wakeup_event(struct device *dev)
{
@@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct
if (!dev && !func)
return AE_BAD_PARAMETER;
- mutex_lock(&acpi_pm_notifier_lock);
+ mutex_lock(&acpi_pm_notifier_install_lock);
if (adev->wakeup.flags.notifier_present)
goto out;
- adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
- adev->wakeup.context.dev = dev;
- adev->wakeup.context.func = func;
-
status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
acpi_pm_notify_handler, NULL);
if (ACPI_FAILURE(status))
goto out;
+ mutex_lock(&acpi_pm_notifier_lock);
+ adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
+ adev->wakeup.context.dev = dev;
+ adev->wakeup.context.func = func;
adev->wakeup.flags.notifier_present = true;
+ mutex_unlock(&acpi_pm_notifier_lock);
out:
- mutex_unlock(&acpi_pm_notifier_lock);
+ mutex_unlock(&acpi_pm_notifier_install_lock);
return status;
}
@@ -472,7 +474,7 @@ acpi_status acpi_remove_pm_notifier(stru
{
acpi_status status = AE_BAD_PARAMETER;
- mutex_lock(&acpi_pm_notifier_lock);
+ mutex_lock(&acpi_pm_notifier_install_lock);
if (!adev->wakeup.flags.notifier_present)
goto out;
@@ -483,14 +485,15 @@ acpi_status acpi_remove_pm_notifier(stru
if (ACPI_FAILURE(status))
goto out;
+ mutex_lock(&acpi_pm_notifier_lock);
adev->wakeup.context.func = NULL;
adev->wakeup.context.dev = NULL;
wakeup_source_unregister(adev->wakeup.ws);
-
adev->wakeup.flags.notifier_present = false;
+ mutex_unlock(&acpi_pm_notifier_lock);
out:
- mutex_unlock(&acpi_pm_notifier_lock);
+ mutex_unlock(&acpi_pm_notifier_install_lock);
return status;
}
Patches currently in stable-queue which might be from ville.syrjala(a)linux.intel.com are
queue-4.14/acpi-pm-fix-acpi_pm_notifier_lock-vs-flush_workqueue-deadlock.patch
This is a note to let you know that I've just added the patch titled
ACPI / EC: Fix regression related to triggering source of EC event handling
ACPI / EC: Fix an issue that SCI_EVT cannot be detected
ACPI / EC: Remove old CLEAR_ON_RESUME quirk
Revert "ACPI / EC: Enable event freeze mode..." to fix
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
acpi-ec-fix-regression-related-to-triggering-source-of-ec-event-handling.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 53c5eaabaea9a1b7a96f95ccc486d2ad721d95bb Mon Sep 17 00:00:00 2001
From: Lv Zheng <lv.zheng(a)intel.com>
Date: Tue, 26 Sep 2017 16:54:03 +0800
Subject: ACPI / EC: Fix regression related to triggering source of EC event handling
From: Lv Zheng <lv.zheng(a)intel.com>
commit 53c5eaabaea9a1b7a96f95ccc486d2ad721d95bb upstream.
Originally the Samsung quirks removed by commit 4c237371 can be covered
by commit e923e8e7 and ec_freeze_events=Y mode. But commit 9c40f956
changed ec_freeze_events=Y back to N, making this problem re-surface.
Actually, if commit e923e8e7 is robust enough, we can freely change
ec_freeze_events mode, so this patch fixes the issue by improving
commit e923e8e7.
Related commits listed in the merged order:
Commit: e923e8e79e18fd6be9162f1be6b99a002e9df2cb
Subject: ACPI / EC: Fix an issue that SCI_EVT cannot be detected
after event is enabled
Commit: 4c237371f290d1ed3b2071dd43554362137b1cce
Subject: ACPI / EC: Remove old CLEAR_ON_RESUME quirk
Commit: 9c40f956ce9b331493347d1b3cb7e384f7dc0581
Subject: Revert "ACPI / EC: Enable event freeze mode..." to fix
a regression
This patch not only fixes the reported post-resume EC event triggering
source issue, but also fixes an unreported similar issue related to the
driver bind by adding EC event triggering source in ec_install_handlers().
Fixes: e923e8e79e18 (ACPI / EC: Fix an issue that SCI_EVT cannot be detected after event is enabled)
Fixes: 4c237371f290 (ACPI / EC: Remove old CLEAR_ON_RESUME quirk)
Fixes: 9c40f956ce9b (Revert "ACPI / EC: Enable event freeze mode..." to fix a regression)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196833
Signed-off-by: Lv Zheng <lv.zheng(a)intel.com>
Reported-by: Alistair Hamilton <ahpatent(a)gmail.com>
Tested-by: Alistair Hamilton <ahpatent(a)gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/acpi/ec.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -486,8 +486,11 @@ static inline void __acpi_ec_enable_even
{
if (!test_and_set_bit(EC_FLAGS_QUERY_ENABLED, &ec->flags))
ec_log_drv("event unblocked");
- if (!test_bit(EC_FLAGS_QUERY_PENDING, &ec->flags))
- advance_transaction(ec);
+ /*
+ * Unconditionally invoke this once after enabling the event
+ * handling mechanism to detect the pending events.
+ */
+ advance_transaction(ec);
}
static inline void __acpi_ec_disable_event(struct acpi_ec *ec)
@@ -1456,11 +1459,10 @@ static int ec_install_handlers(struct ac
if (test_bit(EC_FLAGS_STARTED, &ec->flags) &&
ec->reference_count >= 1)
acpi_ec_enable_gpe(ec, true);
-
- /* EC is fully operational, allow queries */
- acpi_ec_enable_event(ec);
}
}
+ /* EC is fully operational, allow queries */
+ acpi_ec_enable_event(ec);
return 0;
}
Patches currently in stable-queue which might be from lv.zheng(a)intel.com are
queue-4.14/acpi-ec-fix-regression-related-to-triggering-source-of-ec-event-handling.patch
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4f89fa286f6729312e227e7c2d764e8e7b9d340e Mon Sep 17 00:00:00 2001
From: James Morse <james.morse(a)arm.com>
Date: Mon, 6 Nov 2017 18:44:24 +0000
Subject: [PATCH] ACPI / APEI: Replace ioremap_page_range() with fixmap
Replace ghes_io{re,un}map_pfn_{nmi,irq}()s use of ioremap_page_range()
with __set_fixmap() as ioremap_page_range() may sleep to allocate a new
level of page-table, even if its passed an existing final-address to
use in the mapping.
The GHES driver can only be enabled for architectures that select
HAVE_ACPI_APEI: Add fixmap entries to both x86 and arm64.
clear_fixmap() does the TLB invalidation in __set_fixmap() for arm64
and __set_pte_vaddr() for x86. In each case its the same as the
respective arch_apei_flush_tlb_one().
Reported-by: Fengguang Wu <fengguang.wu(a)intel.com>
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: James Morse <james.morse(a)arm.com>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org>
Tested-by: Toshi Kani <toshi.kani(a)hpe.com>
[ For the arm64 bits: ]
Acked-by: Will Deacon <will.deacon(a)arm.com>
[ For the x86 bits: ]
Acked-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: All applicable <stable(a)vger.kernel.org>
diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index caf86be815ba..4052ec39e8db 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -51,6 +51,13 @@ enum fixed_addresses {
FIX_EARLYCON_MEM_BASE,
FIX_TEXT_POKE0,
+
+#ifdef CONFIG_ACPI_APEI_GHES
+ /* Used for GHES mapping from assorted contexts */
+ FIX_APEI_GHES_IRQ,
+ FIX_APEI_GHES_NMI,
+#endif /* CONFIG_ACPI_APEI_GHES */
+
__end_of_permanent_fixed_addresses,
/*
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index dcd9fb55e679..b0c505fe9a95 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -104,6 +104,12 @@ enum fixed_addresses {
FIX_GDT_REMAP_BEGIN,
FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1,
+#ifdef CONFIG_ACPI_APEI_GHES
+ /* Used for GHES mapping from assorted contexts */
+ FIX_APEI_GHES_IRQ,
+ FIX_APEI_GHES_NMI,
+#endif
+
__end_of_permanent_fixed_addresses,
/*
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index cb7aceae3553..572b6c7303ed 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -51,6 +51,7 @@
#include <acpi/actbl1.h>
#include <acpi/ghes.h>
#include <acpi/apei.h>
+#include <asm/fixmap.h>
#include <asm/tlbflush.h>
#include <ras/ras_event.h>
@@ -112,7 +113,7 @@ static DEFINE_MUTEX(ghes_list_mutex);
* Because the memory area used to transfer hardware error information
* from BIOS to Linux can be determined only in NMI, IRQ or timer
* handler, but general ioremap can not be used in atomic context, so
- * a special version of atomic ioremap is implemented for that.
+ * the fixmap is used instead.
*/
/*
@@ -126,8 +127,8 @@ static DEFINE_MUTEX(ghes_list_mutex);
/* virtual memory area for atomic ioremap */
static struct vm_struct *ghes_ioremap_area;
/*
- * These 2 spinlock is used to prevent atomic ioremap virtual memory
- * area from being mapped simultaneously.
+ * These 2 spinlocks are used to prevent the fixmap entries from being used
+ * simultaneously.
*/
static DEFINE_RAW_SPINLOCK(ghes_ioremap_lock_nmi);
static DEFINE_SPINLOCK(ghes_ioremap_lock_irq);
@@ -159,53 +160,36 @@ static void ghes_ioremap_exit(void)
static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
{
- unsigned long vaddr;
phys_addr_t paddr;
pgprot_t prot;
- vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
-
paddr = pfn << PAGE_SHIFT;
prot = arch_apei_get_mem_attribute(paddr);
- ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
+ __set_fixmap(FIX_APEI_GHES_NMI, paddr, prot);
- return (void __iomem *)vaddr;
+ return (void __iomem *) fix_to_virt(FIX_APEI_GHES_NMI);
}
static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
{
- unsigned long vaddr;
phys_addr_t paddr;
pgprot_t prot;
- vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr);
-
paddr = pfn << PAGE_SHIFT;
prot = arch_apei_get_mem_attribute(paddr);
+ __set_fixmap(FIX_APEI_GHES_IRQ, paddr, prot);
- ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
-
- return (void __iomem *)vaddr;
+ return (void __iomem *) fix_to_virt(FIX_APEI_GHES_IRQ);
}
-static void ghes_iounmap_nmi(void __iomem *vaddr_ptr)
+static void ghes_iounmap_nmi(void)
{
- unsigned long vaddr = (unsigned long __force)vaddr_ptr;
- void *base = ghes_ioremap_area->addr;
-
- BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_NMI_PAGE(base));
- unmap_kernel_range_noflush(vaddr, PAGE_SIZE);
- arch_apei_flush_tlb_one(vaddr);
+ clear_fixmap(FIX_APEI_GHES_NMI);
}
-static void ghes_iounmap_irq(void __iomem *vaddr_ptr)
+static void ghes_iounmap_irq(void)
{
- unsigned long vaddr = (unsigned long __force)vaddr_ptr;
- void *base = ghes_ioremap_area->addr;
-
- BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_IRQ_PAGE(base));
- unmap_kernel_range_noflush(vaddr, PAGE_SIZE);
- arch_apei_flush_tlb_one(vaddr);
+ clear_fixmap(FIX_APEI_GHES_IRQ);
}
static int ghes_estatus_pool_init(void)
@@ -361,10 +345,10 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
paddr += trunk;
buffer += trunk;
if (in_nmi) {
- ghes_iounmap_nmi(vaddr);
+ ghes_iounmap_nmi();
raw_spin_unlock(&ghes_ioremap_lock_nmi);
} else {
- ghes_iounmap_irq(vaddr);
+ ghes_iounmap_irq();
spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags);
}
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4f89fa286f6729312e227e7c2d764e8e7b9d340e Mon Sep 17 00:00:00 2001
From: James Morse <james.morse(a)arm.com>
Date: Mon, 6 Nov 2017 18:44:24 +0000
Subject: [PATCH] ACPI / APEI: Replace ioremap_page_range() with fixmap
Replace ghes_io{re,un}map_pfn_{nmi,irq}()s use of ioremap_page_range()
with __set_fixmap() as ioremap_page_range() may sleep to allocate a new
level of page-table, even if its passed an existing final-address to
use in the mapping.
The GHES driver can only be enabled for architectures that select
HAVE_ACPI_APEI: Add fixmap entries to both x86 and arm64.
clear_fixmap() does the TLB invalidation in __set_fixmap() for arm64
and __set_pte_vaddr() for x86. In each case its the same as the
respective arch_apei_flush_tlb_one().
Reported-by: Fengguang Wu <fengguang.wu(a)intel.com>
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: James Morse <james.morse(a)arm.com>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org>
Tested-by: Toshi Kani <toshi.kani(a)hpe.com>
[ For the arm64 bits: ]
Acked-by: Will Deacon <will.deacon(a)arm.com>
[ For the x86 bits: ]
Acked-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: All applicable <stable(a)vger.kernel.org>
diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index caf86be815ba..4052ec39e8db 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -51,6 +51,13 @@ enum fixed_addresses {
FIX_EARLYCON_MEM_BASE,
FIX_TEXT_POKE0,
+
+#ifdef CONFIG_ACPI_APEI_GHES
+ /* Used for GHES mapping from assorted contexts */
+ FIX_APEI_GHES_IRQ,
+ FIX_APEI_GHES_NMI,
+#endif /* CONFIG_ACPI_APEI_GHES */
+
__end_of_permanent_fixed_addresses,
/*
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index dcd9fb55e679..b0c505fe9a95 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -104,6 +104,12 @@ enum fixed_addresses {
FIX_GDT_REMAP_BEGIN,
FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1,
+#ifdef CONFIG_ACPI_APEI_GHES
+ /* Used for GHES mapping from assorted contexts */
+ FIX_APEI_GHES_IRQ,
+ FIX_APEI_GHES_NMI,
+#endif
+
__end_of_permanent_fixed_addresses,
/*
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index cb7aceae3553..572b6c7303ed 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -51,6 +51,7 @@
#include <acpi/actbl1.h>
#include <acpi/ghes.h>
#include <acpi/apei.h>
+#include <asm/fixmap.h>
#include <asm/tlbflush.h>
#include <ras/ras_event.h>
@@ -112,7 +113,7 @@ static DEFINE_MUTEX(ghes_list_mutex);
* Because the memory area used to transfer hardware error information
* from BIOS to Linux can be determined only in NMI, IRQ or timer
* handler, but general ioremap can not be used in atomic context, so
- * a special version of atomic ioremap is implemented for that.
+ * the fixmap is used instead.
*/
/*
@@ -126,8 +127,8 @@ static DEFINE_MUTEX(ghes_list_mutex);
/* virtual memory area for atomic ioremap */
static struct vm_struct *ghes_ioremap_area;
/*
- * These 2 spinlock is used to prevent atomic ioremap virtual memory
- * area from being mapped simultaneously.
+ * These 2 spinlocks are used to prevent the fixmap entries from being used
+ * simultaneously.
*/
static DEFINE_RAW_SPINLOCK(ghes_ioremap_lock_nmi);
static DEFINE_SPINLOCK(ghes_ioremap_lock_irq);
@@ -159,53 +160,36 @@ static void ghes_ioremap_exit(void)
static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
{
- unsigned long vaddr;
phys_addr_t paddr;
pgprot_t prot;
- vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
-
paddr = pfn << PAGE_SHIFT;
prot = arch_apei_get_mem_attribute(paddr);
- ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
+ __set_fixmap(FIX_APEI_GHES_NMI, paddr, prot);
- return (void __iomem *)vaddr;
+ return (void __iomem *) fix_to_virt(FIX_APEI_GHES_NMI);
}
static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
{
- unsigned long vaddr;
phys_addr_t paddr;
pgprot_t prot;
- vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr);
-
paddr = pfn << PAGE_SHIFT;
prot = arch_apei_get_mem_attribute(paddr);
+ __set_fixmap(FIX_APEI_GHES_IRQ, paddr, prot);
- ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
-
- return (void __iomem *)vaddr;
+ return (void __iomem *) fix_to_virt(FIX_APEI_GHES_IRQ);
}
-static void ghes_iounmap_nmi(void __iomem *vaddr_ptr)
+static void ghes_iounmap_nmi(void)
{
- unsigned long vaddr = (unsigned long __force)vaddr_ptr;
- void *base = ghes_ioremap_area->addr;
-
- BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_NMI_PAGE(base));
- unmap_kernel_range_noflush(vaddr, PAGE_SIZE);
- arch_apei_flush_tlb_one(vaddr);
+ clear_fixmap(FIX_APEI_GHES_NMI);
}
-static void ghes_iounmap_irq(void __iomem *vaddr_ptr)
+static void ghes_iounmap_irq(void)
{
- unsigned long vaddr = (unsigned long __force)vaddr_ptr;
- void *base = ghes_ioremap_area->addr;
-
- BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_IRQ_PAGE(base));
- unmap_kernel_range_noflush(vaddr, PAGE_SIZE);
- arch_apei_flush_tlb_one(vaddr);
+ clear_fixmap(FIX_APEI_GHES_IRQ);
}
static int ghes_estatus_pool_init(void)
@@ -361,10 +345,10 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
paddr += trunk;
buffer += trunk;
if (in_nmi) {
- ghes_iounmap_nmi(vaddr);
+ ghes_iounmap_nmi();
raw_spin_unlock(&ghes_ioremap_lock_nmi);
} else {
- ghes_iounmap_irq(vaddr);
+ ghes_iounmap_irq();
spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags);
}
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4f89fa286f6729312e227e7c2d764e8e7b9d340e Mon Sep 17 00:00:00 2001
From: James Morse <james.morse(a)arm.com>
Date: Mon, 6 Nov 2017 18:44:24 +0000
Subject: [PATCH] ACPI / APEI: Replace ioremap_page_range() with fixmap
Replace ghes_io{re,un}map_pfn_{nmi,irq}()s use of ioremap_page_range()
with __set_fixmap() as ioremap_page_range() may sleep to allocate a new
level of page-table, even if its passed an existing final-address to
use in the mapping.
The GHES driver can only be enabled for architectures that select
HAVE_ACPI_APEI: Add fixmap entries to both x86 and arm64.
clear_fixmap() does the TLB invalidation in __set_fixmap() for arm64
and __set_pte_vaddr() for x86. In each case its the same as the
respective arch_apei_flush_tlb_one().
Reported-by: Fengguang Wu <fengguang.wu(a)intel.com>
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: James Morse <james.morse(a)arm.com>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org>
Tested-by: Toshi Kani <toshi.kani(a)hpe.com>
[ For the arm64 bits: ]
Acked-by: Will Deacon <will.deacon(a)arm.com>
[ For the x86 bits: ]
Acked-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: All applicable <stable(a)vger.kernel.org>
diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index caf86be815ba..4052ec39e8db 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -51,6 +51,13 @@ enum fixed_addresses {
FIX_EARLYCON_MEM_BASE,
FIX_TEXT_POKE0,
+
+#ifdef CONFIG_ACPI_APEI_GHES
+ /* Used for GHES mapping from assorted contexts */
+ FIX_APEI_GHES_IRQ,
+ FIX_APEI_GHES_NMI,
+#endif /* CONFIG_ACPI_APEI_GHES */
+
__end_of_permanent_fixed_addresses,
/*
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index dcd9fb55e679..b0c505fe9a95 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -104,6 +104,12 @@ enum fixed_addresses {
FIX_GDT_REMAP_BEGIN,
FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1,
+#ifdef CONFIG_ACPI_APEI_GHES
+ /* Used for GHES mapping from assorted contexts */
+ FIX_APEI_GHES_IRQ,
+ FIX_APEI_GHES_NMI,
+#endif
+
__end_of_permanent_fixed_addresses,
/*
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index cb7aceae3553..572b6c7303ed 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -51,6 +51,7 @@
#include <acpi/actbl1.h>
#include <acpi/ghes.h>
#include <acpi/apei.h>
+#include <asm/fixmap.h>
#include <asm/tlbflush.h>
#include <ras/ras_event.h>
@@ -112,7 +113,7 @@ static DEFINE_MUTEX(ghes_list_mutex);
* Because the memory area used to transfer hardware error information
* from BIOS to Linux can be determined only in NMI, IRQ or timer
* handler, but general ioremap can not be used in atomic context, so
- * a special version of atomic ioremap is implemented for that.
+ * the fixmap is used instead.
*/
/*
@@ -126,8 +127,8 @@ static DEFINE_MUTEX(ghes_list_mutex);
/* virtual memory area for atomic ioremap */
static struct vm_struct *ghes_ioremap_area;
/*
- * These 2 spinlock is used to prevent atomic ioremap virtual memory
- * area from being mapped simultaneously.
+ * These 2 spinlocks are used to prevent the fixmap entries from being used
+ * simultaneously.
*/
static DEFINE_RAW_SPINLOCK(ghes_ioremap_lock_nmi);
static DEFINE_SPINLOCK(ghes_ioremap_lock_irq);
@@ -159,53 +160,36 @@ static void ghes_ioremap_exit(void)
static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
{
- unsigned long vaddr;
phys_addr_t paddr;
pgprot_t prot;
- vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
-
paddr = pfn << PAGE_SHIFT;
prot = arch_apei_get_mem_attribute(paddr);
- ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
+ __set_fixmap(FIX_APEI_GHES_NMI, paddr, prot);
- return (void __iomem *)vaddr;
+ return (void __iomem *) fix_to_virt(FIX_APEI_GHES_NMI);
}
static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
{
- unsigned long vaddr;
phys_addr_t paddr;
pgprot_t prot;
- vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr);
-
paddr = pfn << PAGE_SHIFT;
prot = arch_apei_get_mem_attribute(paddr);
+ __set_fixmap(FIX_APEI_GHES_IRQ, paddr, prot);
- ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
-
- return (void __iomem *)vaddr;
+ return (void __iomem *) fix_to_virt(FIX_APEI_GHES_IRQ);
}
-static void ghes_iounmap_nmi(void __iomem *vaddr_ptr)
+static void ghes_iounmap_nmi(void)
{
- unsigned long vaddr = (unsigned long __force)vaddr_ptr;
- void *base = ghes_ioremap_area->addr;
-
- BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_NMI_PAGE(base));
- unmap_kernel_range_noflush(vaddr, PAGE_SIZE);
- arch_apei_flush_tlb_one(vaddr);
+ clear_fixmap(FIX_APEI_GHES_NMI);
}
-static void ghes_iounmap_irq(void __iomem *vaddr_ptr)
+static void ghes_iounmap_irq(void)
{
- unsigned long vaddr = (unsigned long __force)vaddr_ptr;
- void *base = ghes_ioremap_area->addr;
-
- BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_IRQ_PAGE(base));
- unmap_kernel_range_noflush(vaddr, PAGE_SIZE);
- arch_apei_flush_tlb_one(vaddr);
+ clear_fixmap(FIX_APEI_GHES_IRQ);
}
static int ghes_estatus_pool_init(void)
@@ -361,10 +345,10 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
paddr += trunk;
buffer += trunk;
if (in_nmi) {
- ghes_iounmap_nmi(vaddr);
+ ghes_iounmap_nmi();
raw_spin_unlock(&ghes_ioremap_lock_nmi);
} else {
- ghes_iounmap_irq(vaddr);
+ ghes_iounmap_irq();
spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags);
}
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4f89fa286f6729312e227e7c2d764e8e7b9d340e Mon Sep 17 00:00:00 2001
From: James Morse <james.morse(a)arm.com>
Date: Mon, 6 Nov 2017 18:44:24 +0000
Subject: [PATCH] ACPI / APEI: Replace ioremap_page_range() with fixmap
Replace ghes_io{re,un}map_pfn_{nmi,irq}()s use of ioremap_page_range()
with __set_fixmap() as ioremap_page_range() may sleep to allocate a new
level of page-table, even if its passed an existing final-address to
use in the mapping.
The GHES driver can only be enabled for architectures that select
HAVE_ACPI_APEI: Add fixmap entries to both x86 and arm64.
clear_fixmap() does the TLB invalidation in __set_fixmap() for arm64
and __set_pte_vaddr() for x86. In each case its the same as the
respective arch_apei_flush_tlb_one().
Reported-by: Fengguang Wu <fengguang.wu(a)intel.com>
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: James Morse <james.morse(a)arm.com>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org>
Tested-by: Toshi Kani <toshi.kani(a)hpe.com>
[ For the arm64 bits: ]
Acked-by: Will Deacon <will.deacon(a)arm.com>
[ For the x86 bits: ]
Acked-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: All applicable <stable(a)vger.kernel.org>
diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index caf86be815ba..4052ec39e8db 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -51,6 +51,13 @@ enum fixed_addresses {
FIX_EARLYCON_MEM_BASE,
FIX_TEXT_POKE0,
+
+#ifdef CONFIG_ACPI_APEI_GHES
+ /* Used for GHES mapping from assorted contexts */
+ FIX_APEI_GHES_IRQ,
+ FIX_APEI_GHES_NMI,
+#endif /* CONFIG_ACPI_APEI_GHES */
+
__end_of_permanent_fixed_addresses,
/*
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index dcd9fb55e679..b0c505fe9a95 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -104,6 +104,12 @@ enum fixed_addresses {
FIX_GDT_REMAP_BEGIN,
FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1,
+#ifdef CONFIG_ACPI_APEI_GHES
+ /* Used for GHES mapping from assorted contexts */
+ FIX_APEI_GHES_IRQ,
+ FIX_APEI_GHES_NMI,
+#endif
+
__end_of_permanent_fixed_addresses,
/*
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index cb7aceae3553..572b6c7303ed 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -51,6 +51,7 @@
#include <acpi/actbl1.h>
#include <acpi/ghes.h>
#include <acpi/apei.h>
+#include <asm/fixmap.h>
#include <asm/tlbflush.h>
#include <ras/ras_event.h>
@@ -112,7 +113,7 @@ static DEFINE_MUTEX(ghes_list_mutex);
* Because the memory area used to transfer hardware error information
* from BIOS to Linux can be determined only in NMI, IRQ or timer
* handler, but general ioremap can not be used in atomic context, so
- * a special version of atomic ioremap is implemented for that.
+ * the fixmap is used instead.
*/
/*
@@ -126,8 +127,8 @@ static DEFINE_MUTEX(ghes_list_mutex);
/* virtual memory area for atomic ioremap */
static struct vm_struct *ghes_ioremap_area;
/*
- * These 2 spinlock is used to prevent atomic ioremap virtual memory
- * area from being mapped simultaneously.
+ * These 2 spinlocks are used to prevent the fixmap entries from being used
+ * simultaneously.
*/
static DEFINE_RAW_SPINLOCK(ghes_ioremap_lock_nmi);
static DEFINE_SPINLOCK(ghes_ioremap_lock_irq);
@@ -159,53 +160,36 @@ static void ghes_ioremap_exit(void)
static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
{
- unsigned long vaddr;
phys_addr_t paddr;
pgprot_t prot;
- vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
-
paddr = pfn << PAGE_SHIFT;
prot = arch_apei_get_mem_attribute(paddr);
- ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
+ __set_fixmap(FIX_APEI_GHES_NMI, paddr, prot);
- return (void __iomem *)vaddr;
+ return (void __iomem *) fix_to_virt(FIX_APEI_GHES_NMI);
}
static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
{
- unsigned long vaddr;
phys_addr_t paddr;
pgprot_t prot;
- vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr);
-
paddr = pfn << PAGE_SHIFT;
prot = arch_apei_get_mem_attribute(paddr);
+ __set_fixmap(FIX_APEI_GHES_IRQ, paddr, prot);
- ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
-
- return (void __iomem *)vaddr;
+ return (void __iomem *) fix_to_virt(FIX_APEI_GHES_IRQ);
}
-static void ghes_iounmap_nmi(void __iomem *vaddr_ptr)
+static void ghes_iounmap_nmi(void)
{
- unsigned long vaddr = (unsigned long __force)vaddr_ptr;
- void *base = ghes_ioremap_area->addr;
-
- BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_NMI_PAGE(base));
- unmap_kernel_range_noflush(vaddr, PAGE_SIZE);
- arch_apei_flush_tlb_one(vaddr);
+ clear_fixmap(FIX_APEI_GHES_NMI);
}
-static void ghes_iounmap_irq(void __iomem *vaddr_ptr)
+static void ghes_iounmap_irq(void)
{
- unsigned long vaddr = (unsigned long __force)vaddr_ptr;
- void *base = ghes_ioremap_area->addr;
-
- BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_IRQ_PAGE(base));
- unmap_kernel_range_noflush(vaddr, PAGE_SIZE);
- arch_apei_flush_tlb_one(vaddr);
+ clear_fixmap(FIX_APEI_GHES_IRQ);
}
static int ghes_estatus_pool_init(void)
@@ -361,10 +345,10 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
paddr += trunk;
buffer += trunk;
if (in_nmi) {
- ghes_iounmap_nmi(vaddr);
+ ghes_iounmap_nmi();
raw_spin_unlock(&ghes_ioremap_lock_nmi);
} else {
- ghes_iounmap_irq(vaddr);
+ ghes_iounmap_irq();
spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags);
}
}
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 520e18a5080d2c444a03280d99c8a35cb667d321 Mon Sep 17 00:00:00 2001
From: James Morse <james.morse(a)arm.com>
Date: Mon, 6 Nov 2017 18:44:25 +0000
Subject: [PATCH] ACPI / APEI: Remove ghes_ioremap_area
Now that nothing is using the ghes_ioremap_area pages, rip them out.
Signed-off-by: James Morse <james.morse(a)arm.com>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: All applicable <stable(a)vger.kernel.org>
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 572b6c7303ed..f14695e744d0 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -114,19 +114,7 @@ static DEFINE_MUTEX(ghes_list_mutex);
* from BIOS to Linux can be determined only in NMI, IRQ or timer
* handler, but general ioremap can not be used in atomic context, so
* the fixmap is used instead.
- */
-
-/*
- * Two virtual pages are used, one for IRQ/PROCESS context, the other for
- * NMI context (optionally).
- */
-#define GHES_IOREMAP_PAGES 2
-#define GHES_IOREMAP_IRQ_PAGE(base) (base)
-#define GHES_IOREMAP_NMI_PAGE(base) ((base) + PAGE_SIZE)
-
-/* virtual memory area for atomic ioremap */
-static struct vm_struct *ghes_ioremap_area;
-/*
+ *
* These 2 spinlocks are used to prevent the fixmap entries from being used
* simultaneously.
*/
@@ -141,23 +129,6 @@ static atomic_t ghes_estatus_cache_alloced;
static int ghes_panic_timeout __read_mostly = 30;
-static int ghes_ioremap_init(void)
-{
- ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES,
- VM_IOREMAP, VMALLOC_START, VMALLOC_END);
- if (!ghes_ioremap_area) {
- pr_err(GHES_PFX "Failed to allocate virtual memory area for atomic ioremap.\n");
- return -ENOMEM;
- }
-
- return 0;
-}
-
-static void ghes_ioremap_exit(void)
-{
- free_vm_area(ghes_ioremap_area);
-}
-
static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
{
phys_addr_t paddr;
@@ -1247,13 +1218,9 @@ static int __init ghes_init(void)
ghes_nmi_init_cxt();
- rc = ghes_ioremap_init();
- if (rc)
- goto err;
-
rc = ghes_estatus_pool_init();
if (rc)
- goto err_ioremap_exit;
+ goto err;
rc = ghes_estatus_pool_expand(GHES_ESTATUS_CACHE_AVG_SIZE *
GHES_ESTATUS_CACHE_ALLOCED_MAX);
@@ -1277,8 +1244,6 @@ static int __init ghes_init(void)
return 0;
err_pool_exit:
ghes_estatus_pool_exit();
-err_ioremap_exit:
- ghes_ioremap_exit();
err:
return rc;
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 520e18a5080d2c444a03280d99c8a35cb667d321 Mon Sep 17 00:00:00 2001
From: James Morse <james.morse(a)arm.com>
Date: Mon, 6 Nov 2017 18:44:25 +0000
Subject: [PATCH] ACPI / APEI: Remove ghes_ioremap_area
Now that nothing is using the ghes_ioremap_area pages, rip them out.
Signed-off-by: James Morse <james.morse(a)arm.com>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: All applicable <stable(a)vger.kernel.org>
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 572b6c7303ed..f14695e744d0 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -114,19 +114,7 @@ static DEFINE_MUTEX(ghes_list_mutex);
* from BIOS to Linux can be determined only in NMI, IRQ or timer
* handler, but general ioremap can not be used in atomic context, so
* the fixmap is used instead.
- */
-
-/*
- * Two virtual pages are used, one for IRQ/PROCESS context, the other for
- * NMI context (optionally).
- */
-#define GHES_IOREMAP_PAGES 2
-#define GHES_IOREMAP_IRQ_PAGE(base) (base)
-#define GHES_IOREMAP_NMI_PAGE(base) ((base) + PAGE_SIZE)
-
-/* virtual memory area for atomic ioremap */
-static struct vm_struct *ghes_ioremap_area;
-/*
+ *
* These 2 spinlocks are used to prevent the fixmap entries from being used
* simultaneously.
*/
@@ -141,23 +129,6 @@ static atomic_t ghes_estatus_cache_alloced;
static int ghes_panic_timeout __read_mostly = 30;
-static int ghes_ioremap_init(void)
-{
- ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES,
- VM_IOREMAP, VMALLOC_START, VMALLOC_END);
- if (!ghes_ioremap_area) {
- pr_err(GHES_PFX "Failed to allocate virtual memory area for atomic ioremap.\n");
- return -ENOMEM;
- }
-
- return 0;
-}
-
-static void ghes_ioremap_exit(void)
-{
- free_vm_area(ghes_ioremap_area);
-}
-
static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
{
phys_addr_t paddr;
@@ -1247,13 +1218,9 @@ static int __init ghes_init(void)
ghes_nmi_init_cxt();
- rc = ghes_ioremap_init();
- if (rc)
- goto err;
-
rc = ghes_estatus_pool_init();
if (rc)
- goto err_ioremap_exit;
+ goto err;
rc = ghes_estatus_pool_expand(GHES_ESTATUS_CACHE_AVG_SIZE *
GHES_ESTATUS_CACHE_ALLOCED_MAX);
@@ -1277,8 +1244,6 @@ static int __init ghes_init(void)
return 0;
err_pool_exit:
ghes_estatus_pool_exit();
-err_ioremap_exit:
- ghes_ioremap_exit();
err:
return rc;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 520e18a5080d2c444a03280d99c8a35cb667d321 Mon Sep 17 00:00:00 2001
From: James Morse <james.morse(a)arm.com>
Date: Mon, 6 Nov 2017 18:44:25 +0000
Subject: [PATCH] ACPI / APEI: Remove ghes_ioremap_area
Now that nothing is using the ghes_ioremap_area pages, rip them out.
Signed-off-by: James Morse <james.morse(a)arm.com>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: All applicable <stable(a)vger.kernel.org>
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 572b6c7303ed..f14695e744d0 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -114,19 +114,7 @@ static DEFINE_MUTEX(ghes_list_mutex);
* from BIOS to Linux can be determined only in NMI, IRQ or timer
* handler, but general ioremap can not be used in atomic context, so
* the fixmap is used instead.
- */
-
-/*
- * Two virtual pages are used, one for IRQ/PROCESS context, the other for
- * NMI context (optionally).
- */
-#define GHES_IOREMAP_PAGES 2
-#define GHES_IOREMAP_IRQ_PAGE(base) (base)
-#define GHES_IOREMAP_NMI_PAGE(base) ((base) + PAGE_SIZE)
-
-/* virtual memory area for atomic ioremap */
-static struct vm_struct *ghes_ioremap_area;
-/*
+ *
* These 2 spinlocks are used to prevent the fixmap entries from being used
* simultaneously.
*/
@@ -141,23 +129,6 @@ static atomic_t ghes_estatus_cache_alloced;
static int ghes_panic_timeout __read_mostly = 30;
-static int ghes_ioremap_init(void)
-{
- ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES,
- VM_IOREMAP, VMALLOC_START, VMALLOC_END);
- if (!ghes_ioremap_area) {
- pr_err(GHES_PFX "Failed to allocate virtual memory area for atomic ioremap.\n");
- return -ENOMEM;
- }
-
- return 0;
-}
-
-static void ghes_ioremap_exit(void)
-{
- free_vm_area(ghes_ioremap_area);
-}
-
static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
{
phys_addr_t paddr;
@@ -1247,13 +1218,9 @@ static int __init ghes_init(void)
ghes_nmi_init_cxt();
- rc = ghes_ioremap_init();
- if (rc)
- goto err;
-
rc = ghes_estatus_pool_init();
if (rc)
- goto err_ioremap_exit;
+ goto err;
rc = ghes_estatus_pool_expand(GHES_ESTATUS_CACHE_AVG_SIZE *
GHES_ESTATUS_CACHE_ALLOCED_MAX);
@@ -1277,8 +1244,6 @@ static int __init ghes_init(void)
return 0;
err_pool_exit:
ghes_estatus_pool_exit();
-err_ioremap_exit:
- ghes_ioremap_exit();
err:
return rc;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 520e18a5080d2c444a03280d99c8a35cb667d321 Mon Sep 17 00:00:00 2001
From: James Morse <james.morse(a)arm.com>
Date: Mon, 6 Nov 2017 18:44:25 +0000
Subject: [PATCH] ACPI / APEI: Remove ghes_ioremap_area
Now that nothing is using the ghes_ioremap_area pages, rip them out.
Signed-off-by: James Morse <james.morse(a)arm.com>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: All applicable <stable(a)vger.kernel.org>
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 572b6c7303ed..f14695e744d0 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -114,19 +114,7 @@ static DEFINE_MUTEX(ghes_list_mutex);
* from BIOS to Linux can be determined only in NMI, IRQ or timer
* handler, but general ioremap can not be used in atomic context, so
* the fixmap is used instead.
- */
-
-/*
- * Two virtual pages are used, one for IRQ/PROCESS context, the other for
- * NMI context (optionally).
- */
-#define GHES_IOREMAP_PAGES 2
-#define GHES_IOREMAP_IRQ_PAGE(base) (base)
-#define GHES_IOREMAP_NMI_PAGE(base) ((base) + PAGE_SIZE)
-
-/* virtual memory area for atomic ioremap */
-static struct vm_struct *ghes_ioremap_area;
-/*
+ *
* These 2 spinlocks are used to prevent the fixmap entries from being used
* simultaneously.
*/
@@ -141,23 +129,6 @@ static atomic_t ghes_estatus_cache_alloced;
static int ghes_panic_timeout __read_mostly = 30;
-static int ghes_ioremap_init(void)
-{
- ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES,
- VM_IOREMAP, VMALLOC_START, VMALLOC_END);
- if (!ghes_ioremap_area) {
- pr_err(GHES_PFX "Failed to allocate virtual memory area for atomic ioremap.\n");
- return -ENOMEM;
- }
-
- return 0;
-}
-
-static void ghes_ioremap_exit(void)
-{
- free_vm_area(ghes_ioremap_area);
-}
-
static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
{
phys_addr_t paddr;
@@ -1247,13 +1218,9 @@ static int __init ghes_init(void)
ghes_nmi_init_cxt();
- rc = ghes_ioremap_init();
- if (rc)
- goto err;
-
rc = ghes_estatus_pool_init();
if (rc)
- goto err_ioremap_exit;
+ goto err;
rc = ghes_estatus_pool_expand(GHES_ESTATUS_CACHE_AVG_SIZE *
GHES_ESTATUS_CACHE_ALLOCED_MAX);
@@ -1277,8 +1244,6 @@ static int __init ghes_init(void)
return 0;
err_pool_exit:
ghes_estatus_pool_exit();
-err_ioremap_exit:
- ghes_ioremap_exit();
err:
return rc;
}
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5c50538752af7968f53924b22dede8ed4ce4cb3b Mon Sep 17 00:00:00 2001
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Date: Tue, 26 Sep 2017 09:16:48 +0200
Subject: [PATCH] s390/disassembler: add missing end marker for e7 table
The e7 opcode table does not have an end marker. Hence when trying to
find an unknown e7 instruction the code will access memory behind the
table until it finds something that matches the opcode, or the kernel
crashes, whatever comes first.
This affects not only the in-kernel disassembler but also uprobes and
kprobes which refuse to set a probe on unknown instructions, and
therefore search the opcode tables to figure out if instructions are
known or not.
Cc: <stable(a)vger.kernel.org> # v3.18+
Fixes: 3585cb0280654 ("s390/disassembler: add vector instructions")
Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
diff --git a/arch/s390/kernel/dis.c b/arch/s390/kernel/dis.c
index f7e82302a71e..d9970c15f79d 100644
--- a/arch/s390/kernel/dis.c
+++ b/arch/s390/kernel/dis.c
@@ -1548,6 +1548,7 @@ static struct s390_insn opcode_e7[] = {
{ "vfsq", 0xce, INSTR_VRR_VV000MM },
{ "vfs", 0xe2, INSTR_VRR_VVV00MM },
{ "vftci", 0x4a, INSTR_VRI_VVIMM },
+ { "", 0, INSTR_INVALID }
};
static struct s390_insn opcode_eb[] = {
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d6e646ad7cfa7034d280459b2b2546288f247144 Mon Sep 17 00:00:00 2001
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Date: Mon, 11 Sep 2017 11:24:22 +0200
Subject: [PATCH] s390/runtime instrumention: fix possible memory corruption
For PREEMPT enabled kernels the runtime instrumentation (RI) code
contains a possible use-after-free bug. If a task that makes use of RI
exits, it will execute do_exit() while still enabled for preemption.
That function will call exit_thread_runtime_instr() via
exit_thread(). If exit_thread_runtime_instr() gets preempted after the
RI control block of the task has been freed but before the pointer to
it is set to NULL, then save_ri_cb(), called from switch_to(), will
write to already freed memory.
Avoid this and simply disable preemption while freeing the control
block and setting the pointer to NULL.
Fixes: e4b8b3f33fca ("s390: add support for runtime instrumentation")
Cc: <stable(a)vger.kernel.org> # v3.7+
Reviewed-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
diff --git a/arch/s390/kernel/runtime_instr.c b/arch/s390/kernel/runtime_instr.c
index 429d3a782f1c..b9738ae2e1de 100644
--- a/arch/s390/kernel/runtime_instr.c
+++ b/arch/s390/kernel/runtime_instr.c
@@ -49,11 +49,13 @@ void exit_thread_runtime_instr(void)
{
struct task_struct *task = current;
+ preempt_disable();
if (!task->thread.ri_cb)
return;
disable_runtime_instr();
kfree(task->thread.ri_cb);
task->thread.ri_cb = NULL;
+ preempt_enable();
}
SYSCALL_DEFINE1(s390_runtime_instr, int, command)
@@ -64,9 +66,7 @@ SYSCALL_DEFINE1(s390_runtime_instr, int, command)
return -EOPNOTSUPP;
if (command == S390_RUNTIME_INSTR_STOP) {
- preempt_disable();
exit_thread_runtime_instr();
- preempt_enable();
return 0;
}
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a1c5befc1c24eb9c1ee83f711e0f21ee79cbb556 Mon Sep 17 00:00:00 2001
From: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Date: Thu, 9 Nov 2017 12:29:34 +0100
Subject: [PATCH] s390: fix transactional execution control register handling
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Dan Horák reported the following crash related to transactional execution:
User process fault: interruption code 0013 ilc:3 in libpthread-2.26.so[3ff93c00000+1b000]
CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1
Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
task: 00000000fafc8000 task.stack: 00000000fafc4000
User PSW : 0705200180000000 000003ff93c14e70
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
User GPRS: 0000000000000077 000003ff00000000 000003ff93144d48 000003ff93144d5e
0000000000000000 0000000000000002 0000000000000000 000003ff00000000
0000000000000000 0000000000000418 0000000000000000 000003ffcc9fe770
000003ff93d28f50 000003ff9310acf0 000003ff92b0319a 000003ffcc9fe6d0
User Code: 000003ff93c14e62: 60e0b030 std %f14,48(%r11)
000003ff93c14e66: 60f0b038 std %f15,56(%r11)
#000003ff93c14e6a: e5600000ff0e tbegin 0,65294
>000003ff93c14e70: a7740006 brc 7,3ff93c14e7c
000003ff93c14e74: a7080000 lhi %r0,0
000003ff93c14e78: a7f40023 brc 15,3ff93c14ebe
000003ff93c14e7c: b2220000 ipm %r0
000003ff93c14e80: 8800001c srl %r0,28
There are several bugs with control register handling with respect to
transactional execution:
- on task switch update_per_regs() is only called if the next task has
an mm (is not a kernel thread). This however is incorrect. This
breaks e.g. for user mode helper handling, where the kernel creates
a kernel thread and then execve's a user space program. Control
register contents related to transactional execution won't be
updated on execve. If the previous task ran with transactional
execution disabled then the new task will also run with
transactional execution disabled, which is incorrect. Therefore call
update_per_regs() unconditionally within switch_to().
- on startup the transactional execution facility is not enabled for
the idle thread. This is not really a bug, but an inconsistency to
other facilities. Therefore enable the facility if it is available.
- on fork the new thread's per_flags field is not cleared. This means
that a child process inherits the PER_FLAG_NO_TE flag. This flag can
be set with a ptrace request to disable transactional execution for
the current process. It should not be inherited by new child
processes in order to be consistent with the handling of all other
PER related debugging options. Therefore clear the per_flags field in
copy_thread_tls().
Reported-and-tested-by: Dan Horák <dan(a)danny.cz>
Fixes: d35339a42dd1 ("s390: add support for transactional memory")
Cc: <stable(a)vger.kernel.org> # v3.7+
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner(a)linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com>
diff --git a/arch/s390/include/asm/switch_to.h b/arch/s390/include/asm/switch_to.h
index f6c2b5814ab0..8e6b07609ff4 100644
--- a/arch/s390/include/asm/switch_to.h
+++ b/arch/s390/include/asm/switch_to.h
@@ -36,8 +36,8 @@ static inline void restore_access_regs(unsigned int *acrs)
save_ri_cb(prev->thread.ri_cb); \
save_gs_cb(prev->thread.gs_cb); \
} \
+ update_cr_regs(next); \
if (next->mm) { \
- update_cr_regs(next); \
set_cpu_flag(CIF_FPU); \
restore_access_regs(&next->thread.acrs[0]); \
restore_ri_cb(next->thread.ri_cb, prev->thread.ri_cb); \
diff --git a/arch/s390/kernel/early.c b/arch/s390/kernel/early.c
index 389e9f61c76f..5096875f7822 100644
--- a/arch/s390/kernel/early.c
+++ b/arch/s390/kernel/early.c
@@ -238,8 +238,10 @@ static __init void detect_machine_facilities(void)
S390_lowcore.machine_flags |= MACHINE_FLAG_IDTE;
if (test_facility(40))
S390_lowcore.machine_flags |= MACHINE_FLAG_LPP;
- if (test_facility(50) && test_facility(73))
+ if (test_facility(50) && test_facility(73)) {
S390_lowcore.machine_flags |= MACHINE_FLAG_TE;
+ __ctl_set_bit(0, 55);
+ }
if (test_facility(51))
S390_lowcore.machine_flags |= MACHINE_FLAG_TLB_LC;
if (test_facility(129)) {
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index 080c851dd9a5..cee658e27732 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -86,6 +86,7 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long new_stackp,
memset(&p->thread.per_user, 0, sizeof(p->thread.per_user));
memset(&p->thread.per_event, 0, sizeof(p->thread.per_event));
clear_tsk_thread_flag(p, TIF_SINGLE_STEP);
+ p->thread.per_flags = 0;
/* Initialize per thread user and system timer values */
p->thread.user_timer = 0;
p->thread.guest_timer = 0;
The skcipher_walk_aead_common function calls scatterwalk_copychunks on
the input and output walks to skip the associated data. If the AD end
at an SG list entry boundary, then after these calls the walks will
still be pointing to the end of the skipped region.
These offsets are later checked for alignment in skcipher_walk_next,
so the skcipher_walk may detect the alignment incorrectly.
This patch fixes it by calling scatterwalk_done after the copychunks
calls to ensure that the offsets refer to the right SG list entry.
Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ondrej Mosnacek <omosnacek(a)gmail.com>
---
crypto/skcipher.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 4faa0fd53b0c..6c45ed536664 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -517,6 +517,9 @@ static int skcipher_walk_aead_common(struct skcipher_walk *walk,
scatterwalk_copychunks(NULL, &walk->in, req->assoclen, 2);
scatterwalk_copychunks(NULL, &walk->out, req->assoclen, 2);
+ scatterwalk_done(&walk->in, 0, walk->total);
+ scatterwalk_done(&walk->out, 0, walk->total);
+
walk->iv = req->iv;
walk->oiv = req->iv;
--
2.14.1
From: Rui Hua <huarui.dev(a)gmail.com>
When we send a read request and hit the clean data in cache device, there
is a situation called cache read race in bcache(see the commit in the tail
of cache_look_up(), the following explaination just copy from there):
The bucket we're reading from might be reused while our bio is in flight,
and we could then end up reading the wrong data. We guard against this
by checking (in bch_cache_read_endio()) if the pointer is stale again;
if so, we treat it as an error (s->iop.error = -EINTR) and reread from
the backing device (but we don't pass that error up anywhere)
It should be noted that cache read race happened under normal
circumstances, not the circumstance when SSD failed, it was counted
and shown in /sys/fs/bcache/XXX/internal/cache_read_races.
Without this patch, when we use writeback mode, we will never reread from
the backing device when cache read race happened, until the whole cache
device is clean, because the condition
(s->recoverable && (dc && !atomic_read(&dc->has_dirty))) is false in
cached_dev_read_error(). In this situation, the s->iop.error(= -EINTR)
will be passed up, at last, user will receive -EINTR when it's bio end,
this is not suitable, and wield to up-application.
In this patch, we use s->read_dirty_data to judge whether the read
request hit dirty data in cache device, it is safe to reread data from
the backing device when the read request hit clean data. This can not
only handle cache read race, but also recover data when failed read
request from cache device.
[edited by mlyle to fix up whitespace, commit log title, comment
spelling]
Fixes: d59b23795933 ("bcache: only permit to recovery read error when cache device is clean")
Cc: <stable(a)vger.kernel.org> # 4.14
Signed-off-by: Hua Rui <huarui.dev(a)gmail.com>
Reviewed-by: Michael Lyle <mlyle(a)lyle.org>
Reviewed-by: Coly Li <colyli(a)suse.de>
Signed-off-by: Michael Lyle <mlyle(a)lyle.org>
---
drivers/md/bcache/request.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 3a7aed7282b2..643c3021624f 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -708,16 +708,15 @@ static void cached_dev_read_error(struct closure *cl)
{
struct search *s = container_of(cl, struct search, cl);
struct bio *bio = &s->bio.bio;
- struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
/*
- * If cache device is dirty (dc->has_dirty is non-zero), then
- * recovery a failed read request from cached device may get a
- * stale data back. So read failure recovery is only permitted
- * when cache device is clean.
+ * If read request hit dirty data (s->read_dirty_data is true),
+ * then recovery a failed read request from cached device may
+ * get a stale data back. So read failure recovery is only
+ * permitted when read request hit clean data in cache device,
+ * or when cache read race happened.
*/
- if (s->recoverable &&
- (dc && !atomic_read(&dc->has_dirty))) {
+ if (s->recoverable && !s->read_dirty_data) {
/* Retry from the backing device: */
trace_bcache_read_retry(s->orig_bio);
--
2.14.1
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
We're supposed to examine msgs[i] and msgs[i+1] to see if they
form a pair suitable for an indexed transfer. But in reality
we're examining msgs[0] and msgs[1]. Fix this.
Cc: stable(a)vger.kernel.org
Cc: Daniel Kurtz <djkurtz(a)chromium.org>
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Sean Paul <seanpaul(a)chromium.org>
Fixes: 56f9eac05489 ("drm/i915/intel_i2c: use INDEX cycles for i2c read transactions")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/intel_i2c.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/intel_i2c.c b/drivers/gpu/drm/i915/intel_i2c.c
index eb5827110d8f..165375cbef2f 100644
--- a/drivers/gpu/drm/i915/intel_i2c.c
+++ b/drivers/gpu/drm/i915/intel_i2c.c
@@ -484,7 +484,7 @@ do_gmbus_xfer(struct i2c_adapter *adapter, struct i2c_msg *msgs, int num)
for (; i < num; i += inc) {
inc = 1;
- if (gmbus_is_index_read(msgs, i, num)) {
+ if (gmbus_is_index_read(&msgs[i], i, num)) {
ret = gmbus_xfer_index_read(dev_priv, &msgs[i]);
inc = 2; /* an index read is two msgs */
} else if (msgs[i].flags & I2C_M_RD) {
--
2.13.6
This is a note to let you know that I've just added the patch titled
mm, hwpoison: fixup "mm: check the return value of lookup_page_ext for all call sites"
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-hwpoison-fixup-mm-check-the-return-value-of-lookup_page_ext-for-all-call-sites.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Fri Nov 24 11:13:07 CET 2017
Date: Fri, 24 Nov 2017 11:13:07 +0100
To: Greg KH <gregkh(a)linuxfoundation.org>
From: Michal Hocko <mhocko(a)suse.com>
Subject: mm, hwpoison: fixup "mm: check the return value of lookup_page_ext for all call sites"
From: Michal Hocko <mhocko(a)suse.com>
Backport of the upstream commit f86e4271978b ("mm: check the return
value of lookup_page_ext for all call sites") is wrong for hwpoison
pages. I have accidentally negated the condition for bailout. This
basically disables hwpoison pages tracking while the code still
might crash on unusual configurations when struct pages do not have
page_ext allocated. The fix is trivial to invert the condition.
Reported-by: Jiri Slaby <jslaby(a)suse.cz>
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/debug-pagealloc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/mm/debug-pagealloc.c
+++ b/mm/debug-pagealloc.c
@@ -34,7 +34,7 @@ static inline void set_page_poison(struc
struct page_ext *page_ext;
page_ext = lookup_page_ext(page);
- if (page_ext)
+ if (!page_ext)
return;
__set_bit(PAGE_EXT_DEBUG_POISON, &page_ext->flags);
}
@@ -44,7 +44,7 @@ static inline void clear_page_poison(str
struct page_ext *page_ext;
page_ext = lookup_page_ext(page);
- if (page_ext)
+ if (!page_ext)
return;
__clear_bit(PAGE_EXT_DEBUG_POISON, &page_ext->flags);
}
@@ -54,7 +54,7 @@ static inline bool page_poison(struct pa
struct page_ext *page_ext;
page_ext = lookup_page_ext(page);
- if (page_ext)
+ if (!page_ext)
return false;
return test_bit(PAGE_EXT_DEBUG_POISON, &page_ext->flags);
}
Patches currently in stable-queue which might be from gregkh(a)linuxfoundation.org are
queue-4.4/mm-hwpoison-fixup-mm-check-the-return-value-of-lookup_page_ext-for-all-call-sites.patch
During an eeh a kernel-oops is reported if no vPHB is allocated to the
AFU. This happens as during AFU init, an error in creation of vPHB is
a non-fatal error. Hence afu->phb should always be checked for NULL
before iterating over it for the virtual AFU pci devices.
This patch fixes the kenel-oops by adding a NULL pointer check for
afu->phb before it is dereferenced.
Fixes: 9e8df8a2196("cxl: EEH support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Vaibhav Jain <vaibhav(a)linux.vnet.ibm.com>
---
Changelog:
v3 -> Fixed a reverse NULL check [Fred]
Resend -> Added the 'Fixes' info and marking the patch to stable tree [Mpe]
v2 -> Added the vphb NULL check to cxl_vphb_error_detected() [Andrew]
---
drivers/misc/cxl/pci.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index bb7fd3f4edab..19969ee86d6f 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -2083,6 +2083,9 @@ static pci_ers_result_t cxl_vphb_error_detected(struct cxl_afu *afu,
/* There should only be one entry, but go through the list
* anyway
*/
+ if (afu->phb == NULL)
+ return result;
+
list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) {
if (!afu_dev->driver)
continue;
@@ -2124,8 +2127,7 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
* Tell the AFU drivers; but we don't care what they
* say, we're going away.
*/
- if (afu->phb != NULL)
- cxl_vphb_error_detected(afu, state);
+ cxl_vphb_error_detected(afu, state);
}
return PCI_ERS_RESULT_DISCONNECT;
}
@@ -2265,6 +2267,9 @@ static pci_ers_result_t cxl_pci_slot_reset(struct pci_dev *pdev)
if (cxl_afu_select_best_mode(afu))
goto err;
+ if (afu->phb == NULL)
+ continue;
+
list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) {
/* Reset the device context.
* TODO: make this less disruptive
@@ -2327,6 +2332,9 @@ static void cxl_pci_resume(struct pci_dev *pdev)
for (i = 0; i < adapter->slices; i++) {
afu = adapter->afu[i];
+ if (afu->phb == NULL)
+ continue;
+
list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) {
if (afu_dev->driver && afu_dev->driver->err_handler &&
afu_dev->driver->err_handler->resume)
--
2.14.3
Commit-ID: 12a78d43de767eaf8fb272facb7a7b6f2dc6a9df
Gitweb: https://git.kernel.org/tip/12a78d43de767eaf8fb272facb7a7b6f2dc6a9df
Author: Masami Hiramatsu <mhiramat(a)kernel.org>
AuthorDate: Fri, 24 Nov 2017 13:56:30 +0900
Committer: Ingo Molnar <mingo(a)kernel.org>
CommitDate: Fri, 24 Nov 2017 08:36:12 +0100
x86/decoder: Add new TEST instruction pattern
The kbuild test robot reported this build warning:
Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c
Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx)
Warning: objdump says 3 bytes, but insn_get_length() says 2
Warning: decoded and checked 1569014 instructions with 1 warnings
This sequence seems to be a new instruction not in the opcode map in the Intel SDM.
The instruction sequence is "F6 09 d8", means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8.
Intel SDM vol2 A.4 Table A-6 said the table index in the group is "Encoding of Bits 5,4,3 of
the ModR/M Byte (bits 2,1,0 in parenthesis)"
In that table, opcodes listed by the index REG bits as:
000 001 010 011 100 101 110 111
TEST Ib/Iz,(undefined),NOT,NEG,MUL AL/rAX,IMUL AL/rAX,DIV AL/rAX,IDIV AL/rAX
So, it seems TEST Ib is assigned to 001.
Add the new pattern.
Reported-by: kbuild test robot <fengguang.wu(a)intel.com>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
---
arch/x86/lib/x86-opcode-map.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 12e3771..c4d5591 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -896,7 +896,7 @@ EndTable
GrpTable: Grp3_1
0: TEST Eb,Ib
-1:
+1: TEST Eb,Ib
2: NOT Eb
3: NEG Eb
4: MUL AL,Eb
Commit-ID: 843c885f41b3c74212d8a85da95993075dd41415
Gitweb: https://git.kernel.org/tip/843c885f41b3c74212d8a85da95993075dd41415
Author: Masami Hiramatsu <mhiramat(a)kernel.org>
AuthorDate: Fri, 24 Nov 2017 13:56:30 +0900
Committer: Ingo Molnar <mingo(a)kernel.org>
CommitDate: Fri, 24 Nov 2017 08:16:09 +0100
x86/decoder: Add new TEST instruction pattern
The kbuild test robot reported this build warning:
Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c
Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx)
Warning: objdump says 3 bytes, but insn_get_length() says 2
Warning: decoded and checked 1569014 instructions with 1 warnings
This sequence seems to be a new instruction not in the opcode map in the Intel SDM.
The instruction sequence is "F6 09 d8", means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8.
Intel SDM vol2 A.4 Table A-6 said the table index in the group is "Encoding of Bits 5,4,3 of
the ModR/M Byte (bits 2,1,0 in parenthesis)"
In that table, opcodes listed by the index REG bits as:
000 001 010 011 100 101 110 111
TEST Ib/Iz,(undefined),NOT,NEG,MUL AL/rAX,IMUL AL/rAX,DIV AL/rAX,IDIV AL/rAX
So, it seems TEST Ib is assigned to 001.
Add the new pattern.
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Reported-by: kbuild test robot <fengguang.wu(a)intel.com>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
---
arch/x86/lib/x86-opcode-map.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 12e3771..c4d5591 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -896,7 +896,7 @@ EndTable
GrpTable: Grp3_1
0: TEST Eb,Ib
-1:
+1: TEST Eb,Ib
2: NOT Eb
3: NEG Eb
4: MUL AL,Eb
Changes since v1 [1]:
* fix arm64 compilation, add __HAVE_ARCH_PUD_WRITE
* fix sparc64 compilation, add __HAVE_ARCH_PUD_WRITE
* fix s390 compilation, add a pud_write() helper
---
Andrew,
Here is a third version to the pud_write() fix [2], and some follow-on
patches to use the '_access_permitted' helpers in fault and
get_user_pages() paths where we are checking if the thread has access to
write. I explicitly omit conversions for places where the kernel is
checking the _PAGE_RW flag for kernel purposes, not for userspace
access.
Beyond fixing the crash, this series also fixes get_user_pages() and
fault paths to honor protection keys in the same manner as
get_user_pages_fast(). Only the crash fix is tagged for -stable as the
protection key check is done just for consistency reasons since
userspace can change protection keys at will.
[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-November/013249.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2017-November/013237.html
---
Dan Williams (4):
mm: fix device-dax pud write-faults triggered by get_user_pages()
mm: replace pud_write with pud_access_permitted in fault + gup paths
mm: replace pmd_write with pmd_access_permitted in fault + gup paths
mm: replace pte_write with pte_access_permitted in fault + gup paths
arch/arm64/include/asm/pgtable.h | 1 +
arch/s390/include/asm/pgtable.h | 6 ++++++
arch/sparc/include/asm/pgtable_64.h | 1 +
arch/sparc/mm/gup.c | 4 ++--
arch/x86/include/asm/pgtable.h | 6 ++++++
fs/dax.c | 3 ++-
include/asm-generic/pgtable.h | 9 +++++++++
include/linux/hugetlb.h | 8 --------
mm/gup.c | 2 +-
mm/hmm.c | 8 ++++----
mm/huge_memory.c | 6 +++---
mm/memory.c | 8 ++++----
12 files changed, 39 insertions(+), 23 deletions(-)
This is the start of the stable review cycle for the 4.14.2 release.
There are 18 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri Nov 24 10:11:38 UTC 2017.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.2-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.2-rc1
Jan Harkes <jaharkes(a)cs.cmu.edu>
coda: fix 'kernel memory exposure attempt' in fsync
Jaewon Kim <jaewon31.kim(a)samsung.com>
mm/page_ext.c: check if page_ext is not prepared
Pavel Tatashin <pasha.tatashin(a)oracle.com>
mm/page_alloc.c: broken deferred calculation
Corey Minyard <cminyard(a)mvista.com>
ipmi: fix unsigned long underflow
alex chen <alex.chen(a)huawei.com>
ocfs2: should wait dio before inode lock in ocfs2_setattr()
Changwei Ge <ge.changwei(a)h3c.com>
ocfs2: fix cluster hang after a node dies
Jann Horn <jannh(a)google.com>
mm/pagewalk.c: report holes in hugetlb ranges
Neeraj Upadhyay <neeraju(a)codeaurora.org>
rcu: Fix up pending cbs check in rcu_prepare_for_idle
Alexander Steffen <Alexander.Steffen(a)infineon.com>
tpm-dev-common: Reject too short writes
Ji-Ze Hong (Peter Hong) <hpeter(a)gmail.com>
serial: 8250_fintek: Fix finding base_port with activated SuperIO
Lukas Wunner <lukas(a)wunner.de>
serial: omap: Fix EFR write on RTS deassertion
Roberto Sassu <roberto.sassu(a)huawei.com>
ima: do not update security.ima if appraisal status is not INTEGRITY_PASS
Eric W. Biederman <ebiederm(a)xmission.com>
net/sctp: Always set scope_id in sctp_inet6_skb_msgname
Huacai Chen <chenhc(a)lemote.com>
fealnx: Fix building error on MIPS
Bjørn Mork <bjorn(a)mork.no>
net: cdc_ncm: GetNtbFormat endian fix
Xin Long <lucien.xin(a)gmail.com>
vxlan: fix the issue that neigh proxy blocks all icmpv6 packets
Jason A. Donenfeld <Jason(a)zx2c4.com>
af_netlink: ensure that NLMSG_DONE never fails in dumps
Michael Lyle <mlyle(a)lyle.org>
bio: ensure __bio_clone_fast copies bi_partno
-------------
Diffstat:
Makefile | 4 ++--
block/bio.c | 1 +
drivers/char/ipmi/ipmi_msghandler.c | 10 ++++++----
drivers/char/tpm/tpm-dev-common.c | 6 ++++++
drivers/net/ethernet/fealnx.c | 6 +++---
drivers/net/usb/cdc_ncm.c | 4 ++--
drivers/net/vxlan.c | 31 +++++++++++++------------------
drivers/tty/serial/8250/8250_fintek.c | 3 +++
drivers/tty/serial/omap-serial.c | 2 +-
fs/coda/upcall.c | 3 +--
fs/ocfs2/dlm/dlmrecovery.c | 1 +
fs/ocfs2/file.c | 9 +++++++--
include/linux/mmzone.h | 3 ++-
kernel/rcu/tree_plugin.h | 2 +-
mm/page_alloc.c | 27 ++++++++++++++++++---------
mm/page_ext.c | 4 ----
mm/pagewalk.c | 6 +++++-
net/netlink/af_netlink.c | 17 +++++++++++------
net/netlink/af_netlink.h | 1 +
net/sctp/ipv6.c | 5 +++--
security/integrity/ima/ima_appraise.c | 3 +++
21 files changed, 90 insertions(+), 58 deletions(-)
This is the start of the stable review cycle for the 4.13.16 release.
There are 35 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri Nov 24 10:11:25 UTC 2017.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.13.16-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.13.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.13.16-rc1
Jan Harkes <jaharkes(a)cs.cmu.edu>
coda: fix 'kernel memory exposure attempt' in fsync
Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask
Jaewon Kim <jaewon31.kim(a)samsung.com>
mm/page_ext.c: check if page_ext is not prepared
Pavel Tatashin <pasha.tatashin(a)oracle.com>
mm/page_alloc.c: broken deferred calculation
Corey Minyard <cminyard(a)mvista.com>
ipmi: fix unsigned long underflow
alex chen <alex.chen(a)huawei.com>
ocfs2: should wait dio before inode lock in ocfs2_setattr()
Changwei Ge <ge.changwei(a)h3c.com>
ocfs2: fix cluster hang after a node dies
Jann Horn <jannh(a)google.com>
mm/pagewalk.c: report holes in hugetlb ranges
Neeraj Upadhyay <neeraju(a)codeaurora.org>
rcu: Fix up pending cbs check in rcu_prepare_for_idle
Alexander Steffen <Alexander.Steffen(a)infineon.com>
tpm-dev-common: Reject too short writes
Ji-Ze Hong (Peter Hong) <hpeter(a)gmail.com>
serial: 8250_fintek: Fix finding base_port with activated SuperIO
Lukas Wunner <lukas(a)wunner.de>
serial: omap: Fix EFR write on RTS deassertion
Roberto Sassu <roberto.sassu(a)huawei.com>
ima: do not update security.ima if appraisal status is not INTEGRITY_PASS
Eric W. Biederman <ebiederm(a)xmission.com>
net/sctp: Always set scope_id in sctp_inet6_skb_msgname
Huacai Chen <chenhc(a)lemote.com>
fealnx: Fix building error on MIPS
Xin Long <lucien.xin(a)gmail.com>
sctp: do not peel off an assoc from one netns to another one
Bjørn Mork <bjorn(a)mork.no>
net: cdc_ncm: GetNtbFormat endian fix
Xin Long <lucien.xin(a)gmail.com>
vxlan: fix the issue that neigh proxy blocks all icmpv6 packets
Jason A. Donenfeld <Jason(a)zx2c4.com>
af_netlink: ensure that NLMSG_DONE never fails in dumps
Inbar Karmy <inbark(a)mellanox.com>
net/mlx5e: Set page to null in case dma mapping fails
Huy Nguyen <huyn(a)mellanox.com>
net/mlx5: Cancel health poll before sending panic teardown command
Cong Wang <xiyou.wangcong(a)gmail.com>
vlan: fix a use-after-free in vlan_device_event()
Yuchung Cheng <ycheng(a)google.com>
tcp: fix tcp_fastretrans_alert warning
Eric Dumazet <edumazet(a)google.com>
tcp: gso: avoid refcount_t warning from tcp_gso_segment()
Andrey Konovalov <andreyknvl(a)google.com>
net: usb: asix: fill null-ptr-deref in asix_suspend
Kristian Evensen <kristian.evensen(a)gmail.com>
qmi_wwan: Add missing skb_reset_mac_header-call
Bjørn Mork <bjorn(a)mork.no>
net: qmi_wwan: fix divide by 0 on bad descriptors
Bjørn Mork <bjorn(a)mork.no>
net: cdc_ether: fix divide by 0 on bad descriptors
Hangbin Liu <liuhangbin(a)gmail.com>
bonding: discard lowest hash bit for 802.3ad layer3+4
Guillaume Nault <g.nault(a)alphalink.fr>
l2tp: don't use l2tp_tunnel_find() in l2tp_ip and l2tp_ip6
Ye Yin <hustcat(a)gmail.com>
netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed
Florian Fainelli <f.fainelli(a)gmail.com>
net: systemport: Correct IPG length settings
Eric Dumazet <edumazet(a)google.com>
tcp: do not mangle skb->cb[] in tcp_make_synack()
Jeff Barnhill <0xeffeff(a)gmail.com>
net: vrf: correct FRA_L3MDEV encode type
Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
tcp_nv: fix division by zero in tcpnv_acked()
-------------
Diffstat:
Makefile | 4 ++--
arch/x86/kernel/cpu/intel_cacheinfo.c | 32 ++++++++++++++-----------
drivers/char/ipmi/ipmi_msghandler.c | 10 ++++----
drivers/char/tpm/tpm-dev-common.c | 6 +++++
drivers/net/bonding/bond_main.c | 2 +-
drivers/net/ethernet/broadcom/bcmsysport.c | 10 ++++----
drivers/net/ethernet/fealnx.c | 6 ++---
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 12 ++++------
drivers/net/ethernet/mellanox/mlx5/core/main.c | 7 ++++++
drivers/net/usb/asix_devices.c | 4 ++--
drivers/net/usb/cdc_ether.c | 2 +-
drivers/net/usb/cdc_ncm.c | 4 ++--
drivers/net/usb/qmi_wwan.c | 3 ++-
drivers/net/vrf.c | 2 +-
drivers/net/vxlan.c | 31 ++++++++++--------------
drivers/tty/serial/8250/8250_fintek.c | 3 +++
drivers/tty/serial/omap-serial.c | 2 +-
fs/coda/upcall.c | 3 +--
fs/ocfs2/dlm/dlmrecovery.c | 1 +
fs/ocfs2/file.c | 9 +++++--
include/linux/mmzone.h | 3 ++-
include/linux/skbuff.h | 7 ++++++
kernel/rcu/tree_plugin.h | 2 +-
mm/page_alloc.c | 27 ++++++++++++++-------
mm/page_ext.c | 4 ----
mm/pagewalk.c | 6 ++++-
net/8021q/vlan.c | 6 ++---
net/core/skbuff.c | 1 +
net/ipv4/tcp_input.c | 3 +--
net/ipv4/tcp_nv.c | 2 +-
net/ipv4/tcp_offload.c | 12 ++++++++--
net/ipv4/tcp_output.c | 9 ++-----
net/l2tp/l2tp_ip.c | 24 +++++++------------
net/l2tp/l2tp_ip6.c | 24 +++++++------------
net/netlink/af_netlink.c | 17 ++++++++-----
net/netlink/af_netlink.h | 1 +
net/sctp/ipv6.c | 5 ++--
net/sctp/socket.c | 4 ++++
security/integrity/ima/ima_appraise.c | 3 +++
39 files changed, 179 insertions(+), 134 deletions(-)
From: James Hogan <jhogan(a)kernel.org>
Building 32-bit MIPS64r2 kernels produces warnings like the following
on certain toolchains (such as GNU assembler 2.24.90, but not GNU
assembler 2.28.51) since commit 22b8ba765a72 ("MIPS: Fix MIPS64 FP
save/restore on 32-bit kernels"), due to the exposure of fpu_save_16odd
from fpu_save_double and fpu_restore_16odd from fpu_restore_double:
arch/mips/kernel/r4k_fpu.S:47: Warning: float register should be even, was 1
...
arch/mips/kernel/r4k_fpu.S:59: Warning: float register should be even, was 1
...
This appears to be because .set mips64r2 does not change the FPU ABI to
64-bit when -march=mips64r2 (or e.g. -march=xlp) is provided on the
command line on that toolchain, from the default FPU ABI of 32-bit due
to the -mabi=32. This makes access to the odd FPU registers invalid.
Fix by explicitly changing the FPU ABI with .set fp=64 directives in
fpu_save_16odd and fpu_restore_16odd, and moving the undefine of fp up
in asmmacro.h so fp doesn't turn into $30.
Fixes: 22b8ba765a72 ("MIPS: Fix MIPS64 FP save/restore on 32-bit kernels")
Signed-off-by: James Hogan <jhogan(a)kernel.org>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: Paul Burton <paul.burton(a)imgtec.com>
Cc: linux-mips(a)linux-mips.org
Cc: <stable(a)vger.kernel.org> # 4.0+: 22b8ba765a72: MIPS: Fix MIPS64 FP save/restore on 32-bit kernels
Cc: <stable(a)vger.kernel.org> # 4.0+
---
arch/mips/include/asm/asmmacro.h | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/mips/include/asm/asmmacro.h b/arch/mips/include/asm/asmmacro.h
index b815d7b3bd27..feb069cbf44e 100644
--- a/arch/mips/include/asm/asmmacro.h
+++ b/arch/mips/include/asm/asmmacro.h
@@ -19,6 +19,9 @@
#include <asm/asmmacro-64.h>
#endif
+/* preprocessor replaces the fp in ".set fp=64" with $30 otherwise */
+#undef fp
+
/*
* Helper macros for generating raw instruction encodings.
*/
@@ -105,6 +108,7 @@
.macro fpu_save_16odd thread
.set push
.set mips64r2
+ .set fp=64
SET_HARDFLOAT
sdc1 $f1, THREAD_FPR1(\thread)
sdc1 $f3, THREAD_FPR3(\thread)
@@ -163,6 +167,7 @@
.macro fpu_restore_16odd thread
.set push
.set mips64r2
+ .set fp=64
SET_HARDFLOAT
ldc1 $f1, THREAD_FPR1(\thread)
ldc1 $f3, THREAD_FPR3(\thread)
@@ -234,9 +239,6 @@
.endm
#ifdef TOOLCHAIN_SUPPORTS_MSA
-/* preprocessor replaces the fp in ".set fp=64" with $30 otherwise */
-#undef fp
-
.macro _cfcmsa rd, cs
.set push
.set mips32r2
--
2.14.1
On Thu, 2017-11-23 at 13:08 +0000, Ben Hutchings wrote:
> On Tue, 2017-11-21 at 19:41 -0800, Joe Perches wrote:
> > On Wed, 2017-11-22 at 01:58 +0000, Ben Hutchings wrote:
> > > 3.16.51-rc1 review patch. If anyone has any objections, please let me know.
> > []
> > > --- a/drivers/md/bcache/writeback.h
> > > +++ b/drivers/md/bcache/writeback.h
> > > @@ -14,6 +14,25 @@ static inline uint64_t bcache_dev_sector
> > > return ret;
> > > }
> > >
> > > +static inline uint64_t bcache_flash_devs_sectors_dirty(struct cache_set *c)
> > > +{
> > > + uint64_t i, ret = 0;
> >
> > There's no reason i should be uint64_t
> > as nr_uuids is unsigned int.
>
> But this still works, right? That's a minor issue to deal with
> upstream, not in the backport.
correct
This is the start of the stable review cycle for the 4.9.65 release.
There are 25 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri Nov 24 10:11:07 UTC 2017.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.65-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.65-rc1
Jan Harkes <jaharkes(a)cs.cmu.edu>
coda: fix 'kernel memory exposure attempt' in fsync
Pavel Tatashin <pasha.tatashin(a)oracle.com>
mm/page_alloc.c: broken deferred calculation
Corey Minyard <cminyard(a)mvista.com>
ipmi: fix unsigned long underflow
alex chen <alex.chen(a)huawei.com>
ocfs2: should wait dio before inode lock in ocfs2_setattr()
Changwei Ge <ge.changwei(a)h3c.com>
ocfs2: fix cluster hang after a node dies
Adam Wallis <awallis(a)codeaurora.org>
dmaengine: dmatest: warn user when dma test times out
Ji-Ze Hong (Peter Hong) <hpeter(a)gmail.com>
serial: 8250_fintek: Fix finding base_port with activated SuperIO
Lukas Wunner <lukas(a)wunner.de>
serial: omap: Fix EFR write on RTS deassertion
Roberto Sassu <roberto.sassu(a)huawei.com>
ima: do not update security.ima if appraisal status is not INTEGRITY_PASS
Eric Biggers <ebiggers(a)google.com>
crypto: dh - Fix double free of ctx->p
Tudor-Dan Ambarus <tudor.ambarus(a)microchip.com>
crypto: dh - fix memleak in setkey
Eric W. Biederman <ebiederm(a)xmission.com>
net/sctp: Always set scope_id in sctp_inet6_skb_msgname
Huacai Chen <chenhc(a)lemote.com>
fealnx: Fix building error on MIPS
Xin Long <lucien.xin(a)gmail.com>
sctp: do not peel off an assoc from one netns to another one
Jason A. Donenfeld <Jason(a)zx2c4.com>
af_netlink: ensure that NLMSG_DONE never fails in dumps
Cong Wang <xiyou.wangcong(a)gmail.com>
vlan: fix a use-after-free in vlan_device_event()
Andrey Konovalov <andreyknvl(a)google.com>
net: usb: asix: fill null-ptr-deref in asix_suspend
Kristian Evensen <kristian.evensen(a)gmail.com>
qmi_wwan: Add missing skb_reset_mac_header-call
Bjørn Mork <bjorn(a)mork.no>
net: qmi_wwan: fix divide by 0 on bad descriptors
Bjørn Mork <bjorn(a)mork.no>
net: cdc_ether: fix divide by 0 on bad descriptors
Hangbin Liu <liuhangbin(a)gmail.com>
bonding: discard lowest hash bit for 802.3ad layer3+4
Ye Yin <hustcat(a)gmail.com>
netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed
Eric Dumazet <edumazet(a)google.com>
tcp: do not mangle skb->cb[] in tcp_make_synack()
Jeff Barnhill <0xeffeff(a)gmail.com>
net: vrf: correct FRA_L3MDEV encode type
Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
tcp_nv: fix division by zero in tcpnv_acked()
-------------
Diffstat:
Makefile | 4 ++--
crypto/dh.c | 34 +++++++++++++++-------------------
drivers/char/ipmi/ipmi_msghandler.c | 10 ++++++----
drivers/dma/dmatest.c | 1 +
drivers/net/bonding/bond_main.c | 2 +-
drivers/net/ethernet/fealnx.c | 6 +++---
drivers/net/usb/asix_devices.c | 4 ++--
drivers/net/usb/cdc_ether.c | 2 +-
drivers/net/usb/qmi_wwan.c | 3 ++-
drivers/net/vrf.c | 2 +-
drivers/tty/serial/8250/8250_fintek.c | 3 +++
drivers/tty/serial/omap-serial.c | 2 +-
fs/coda/upcall.c | 3 +--
fs/ocfs2/dlm/dlmrecovery.c | 1 +
fs/ocfs2/file.c | 9 +++++++--
include/linux/mmzone.h | 3 ++-
include/linux/skbuff.h | 7 +++++++
mm/page_alloc.c | 27 ++++++++++++++++++---------
net/8021q/vlan.c | 6 +++---
net/core/skbuff.c | 1 +
net/ipv4/tcp_nv.c | 2 +-
net/ipv4/tcp_output.c | 9 ++-------
net/netlink/af_netlink.c | 17 +++++++++++------
net/netlink/af_netlink.h | 1 +
net/sctp/ipv6.c | 5 +++--
net/sctp/socket.c | 4 ++++
security/integrity/ima/ima_appraise.c | 3 +++
27 files changed, 103 insertions(+), 68 deletions(-)
On Thu 23-11-17 13:05:10, Ben Hutchings wrote:
> On Wed, 2017-11-22 at 08:41 +0100, Vlastimil Babka wrote:
> > On 11/22/2017 02:58 AM, Ben Hutchings wrote:
> > > 3.16.51-rc1 review patch. If anyone has any objections, please let me know.
> >
> > I don't really care much in the end, but is "fix wrong comment" really a
> > stable patch material these days? :)
>
> It had a Fixes: field and it clearly won't do any harm.
Fixes tag is sometimes abused this way. It will not do any harm but the
fewer patch to backport the better IMHO
> Still, none of the other stable branches has it so I'll drop it.
Makes sense.
--
Michal Hocko
SUSE Labs
When no IOMMU is available, all GEM buffers allocated by Exynos DRM driver
are contiguous, because of the underlying dma_alloc_attrs() function
provides only such buffers. In such case it makes no sense to keep
BO_NONCONTIG flag for the allocated GEM buffers. This allows to avoid
failures for buffer contiguity checks in the subsequent operations on GEM
objects.
Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com>
CC: stable(a)vger.kernel.org # v4.4+
---
This issue is there since commit 0519f9a12d011 ("drm/exynos: add iommu
support for exynos drm framework"), but this patch applies cleanly
only to v4.4+ kernel releases due changes in the surrounding code.
Changelog:
v2:
- added warning message when buffer flags are updadated (requested by Inki)
v1: https://patchwork.kernel.org/patch/10034919/
- initial version
---
drivers/gpu/drm/exynos/exynos_drm_gem.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c
index 077de014d610..4400efe3974a 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_gem.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c
@@ -247,6 +247,15 @@ struct exynos_drm_gem *exynos_drm_gem_create(struct drm_device *dev,
if (IS_ERR(exynos_gem))
return exynos_gem;
+ if (!is_drm_iommu_supported(dev) && (flags & EXYNOS_BO_NONCONTIG)) {
+ /*
+ * when no IOMMU is available, all allocated buffers are
+ * contiguous anyway, so drop EXYNOS_BO_NONCONTIG flag
+ */
+ flags &= ~EXYNOS_BO_NONCONTIG;
+ DRM_WARN("Non-contiguous allocation is not supported without IOMMU, falling back to contiguous buffer\n");
+ }
+
/* set memory type and cache attribute from user side. */
exynos_gem->flags = flags;
--
2.14.2
We added crtc_id to the atomic ioctl, but forgot to add it for vblank
and page flip events. Commit bd386e518056 ("drm: Reorganize
drm_pending_event to support future event types [v2]") added it to
the vblank event, but page flip event was still missing.
Correct this and add a test for making sure we always set crtc_id correctly.
Fixes: bd386e518056 ("drm: Reorganize drm_pending_event to support future event types [v2]")
Fixes: 5db06a8a98f5 ("drm: Pass CRTC ID in userspace vblank events")
Cc: Daniel Stone <daniels(a)collabora.com>
Cc: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Gustavo Padovan <gustavo(a)padovan.org>
Cc: Sean Paul <seanpaul(a)chromium.org>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v4.12+
Signed-off-by: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
---
drivers/gpu/drm/drm_plane.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/drm_plane.c b/drivers/gpu/drm/drm_plane.c
index 19404e34cd59..37a93cdffb4a 100644
--- a/drivers/gpu/drm/drm_plane.c
+++ b/drivers/gpu/drm/drm_plane.c
@@ -1030,6 +1030,7 @@ int drm_mode_page_flip_ioctl(struct drm_device *dev,
e->event.base.type = DRM_EVENT_FLIP_COMPLETE;
e->event.base.length = sizeof(e->event);
e->event.vbl.user_data = page_flip->user_data;
+ e->event.vbl.crtc_id = crtc->base.id;
ret = drm_event_reserve_init(dev, file_priv, &e->base, &e->event.base);
if (ret) {
kfree(e);
--
2.15.0
From: Jan Kara <jack(a)suse.cz>
[ Upstream commit e3fce68cdbed297d927e993b3ea7b8b1cee545da ]
Currently dax_iomap_rw() takes care of invalidating page tables and
evicting hole pages from the radix tree when write(2) to the file
happens. This invalidation is only necessary when there is some block
allocation resulting from write(2). Furthermore in current place the
invalidation is racy wrt page fault instantiating a hole page just after
we have invalidated it.
So perform the page invalidation inside dax_iomap_actor() where we can
do it only when really necessary and after blocks have been allocated so
nobody will be instantiating new hole pages anymore.
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Signed-off-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
---
fs/dax.c | 28 +++++++++++-----------------
1 file changed, 11 insertions(+), 17 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index bf6218da7928..800748f10b3d 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1265,6 +1265,17 @@ iomap_dax_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED))
return -EIO;
+ /*
+ * Write can allocate block for an area which has a hole page mapped
+ * into page tables. We have to tear down these mappings so that data
+ * written by write(2) is visible in mmap.
+ */
+ if ((iomap->flags & IOMAP_F_NEW) && inode->i_mapping->nrpages) {
+ invalidate_inode_pages2_range(inode->i_mapping,
+ pos >> PAGE_SHIFT,
+ (end - 1) >> PAGE_SHIFT);
+ }
+
while (pos < end) {
unsigned offset = pos & (PAGE_SIZE - 1);
struct blk_dax_ctl dax = { 0 };
@@ -1329,23 +1340,6 @@ iomap_dax_rw(struct kiocb *iocb, struct iov_iter *iter,
if (iov_iter_rw(iter) == WRITE)
flags |= IOMAP_WRITE;
- /*
- * Yes, even DAX files can have page cache attached to them: A zeroed
- * page is inserted into the pagecache when we have to serve a write
- * fault on a hole. It should never be dirtied and can simply be
- * dropped from the pagecache once we get real data for the page.
- *
- * XXX: This is racy against mmap, and there's nothing we can do about
- * it. We'll eventually need to shift this down even further so that
- * we can check if we allocated blocks over a hole first.
- */
- if (mapping->nrpages) {
- ret = invalidate_inode_pages2_range(mapping,
- pos >> PAGE_SHIFT,
- (pos + iov_iter_count(iter) - 1) >> PAGE_SHIFT);
- WARN_ON_ONCE(ret);
- }
-
while (iov_iter_count(iter)) {
ret = iomap_apply(inode, pos, iov_iter_count(iter), flags, ops,
iter, iomap_dax_actor);
--
2.11.0
Running this code with IRQs enabled (where dummy_lock is a spinlock):
static void check_load_gs_index(void)
{
/* This will fail. */
load_gs_index(0xffff);
spin_lock(&dummy_lock);
spin_unlock(&dummy_lock);
}
Will generate a lockdep warning. The issue is that the actual write
to %gs would cause an exception with IRQs disabled, and the exception
handler would, as an inadvertent side effect, update irqflag tracing
to reflect the IRQs-off status. native_load_gs_index() would then
turn IRQs back on and return with irqflag tracing still thinking that
IRQs were off. The dummy lock-and-unlock causes lockdep to notice the
error and warn.
Fix it by adding the missing tracing.
Apparently nothing did this in a context where it mattered. I haven't
tried to find a code path that would actually exhibit the warning if
appropriately nasty user code were running.
I suspect that the security impact of this bug is very, very low --
production systems don't run with lockdep enabled, and the warning is
mostly harmless anyway.
Found during a quick audit of the entry code to try to track down an
unrelated bug that Ingo found in some still-in-development code.
Cc: stable(a)vger.kernel.org
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
---
Hi Ingo-
You asked me to look for an irqflag tracing bug, so I found one :)
arch/x86/entry/entry_64.S | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index a2b30ec69497..3c288f260fdf 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -51,15 +51,19 @@ ENTRY(native_usergs_sysret64)
END(native_usergs_sysret64)
#endif /* CONFIG_PARAVIRT */
-.macro TRACE_IRQS_IRETQ
+.macro TRACE_IRQS_FLAGS flags:req
#ifdef CONFIG_TRACE_IRQFLAGS
- bt $9, EFLAGS(%rsp) /* interrupts off? */
+ bt $9, \flags /* interrupts off? */
jnc 1f
TRACE_IRQS_ON
1:
#endif
.endm
+.macro TRACE_IRQS_IRETQ
+ TRACE_IRQS_FLAGS EFLAGS(%rsp)
+.endm
+
/*
* When dynamic function tracer is enabled it will add a breakpoint
* to all locations that it is about to modify, sync CPUs, update
@@ -943,11 +947,13 @@ ENTRY(native_load_gs_index)
FRAME_BEGIN
pushfq
DISABLE_INTERRUPTS(CLBR_ANY & ~CLBR_RDI)
+ TRACE_IRQS_OFF
SWAPGS
.Lgs_change:
movl %edi, %gs
2: ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE
SWAPGS
+ TRACE_IRQS_FLAGS (%rsp)
popfq
FRAME_END
ret
--
2.13.6
During an eeh a kernel-oops is reported if no vPHB to allocated to the
AFU. This happens as during AFU init, an error in creation of vPHB is
a non-fatal error. Hence afu->phb should always be checked for NULL
before iterating over it for the virtual AFU pci devices.
This patch fixes the kenel-oops by adding a NULL pointer check for
afu->phb before it is dereferenced.
Fixes: 9e8df8a2196("cxl: EEH support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Vaibhav Jain <vaibhav(a)linux.vnet.ibm.com>
---
Changelog:
Resend -> Added the 'Fixes' info and marking the patch to stable tree [Mpe]
v2 -> Added the vphb NULL check to cxl_vphb_error_detected() [Andrew]
---
drivers/misc/cxl/pci.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index bb7fd3f4edab..18773343ab3e 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -2083,6 +2083,9 @@ static pci_ers_result_t cxl_vphb_error_detected(struct cxl_afu *afu,
/* There should only be one entry, but go through the list
* anyway
*/
+ if (afu->phb == NULL)
+ return result;
+
list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) {
if (!afu_dev->driver)
continue;
@@ -2124,8 +2127,7 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
* Tell the AFU drivers; but we don't care what they
* say, we're going away.
*/
- if (afu->phb != NULL)
- cxl_vphb_error_detected(afu, state);
+ cxl_vphb_error_detected(afu, state);
}
return PCI_ERS_RESULT_DISCONNECT;
}
@@ -2265,6 +2267,9 @@ static pci_ers_result_t cxl_pci_slot_reset(struct pci_dev *pdev)
if (cxl_afu_select_best_mode(afu))
goto err;
+ if (afu->phb == NULL)
+ continue;
+
list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) {
/* Reset the device context.
* TODO: make this less disruptive
@@ -2327,6 +2332,9 @@ static void cxl_pci_resume(struct pci_dev *pdev)
for (i = 0; i < adapter->slices; i++) {
afu = adapter->afu[i];
+ if (afu->phb != NULL)
+ continue;
+
list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) {
if (afu_dev->driver && afu_dev->driver->err_handler &&
afu_dev->driver->err_handler->resume)
--
2.14.3
On Wed, Nov 15, 2017 at 03:39:22PM +0000, Moore, Robert wrote:
>> -----Original Message-----
>> From: alexander.levin(a)verizon.com [mailto:alexander.levin@verizon.com]
>> Sent: Tuesday, November 14, 2017 6:46 PM
>> To: linux-kernel(a)vger.kernel.org; stable(a)vger.kernel.org
>> Cc: Moore, Robert <robert.moore(a)intel.com>; Zheng, Lv
>> <lv.zheng(a)intel.com>; Wysocki, Rafael J <rafael.j.wysocki(a)intel.com>;
>> alexander.levin(a)verizon.com
>> Subject: [PATCH AUTOSEL for 4.9 01/56] ACPICA: Resources: Not a valid
>> resource if buffer length too long
>>
>> From: Bob Moore <robert.moore(a)intel.com>
>>
>> [ Upstream commit 57707a9a7780fab426b8ae9b4c7b65b912a748b3 ]
>>
>> ACPICA commit 9f76de2d249b18804e35fb55d14b1c2604d627a1
>> ACPICA commit b2e89d72ef1e9deefd63c3fd1dee90f893575b3a
>> ACPICA commit 23b5bbe6d78afd3c5abf3adb91a1b098a3000b2e
>>
>> The declared buffer length must be the same as the length of the byte
>> initializer list, otherwise not a valid resource descriptor.
[snip]
>[Moore, Robert]
>
>Please explain what you are doing here.
Proposing this commit for the 4.9 LTS tree.
--
Thanks,
Sasha
From: Peter Ujfalusi <peter.ujfalusi(a)ti.com>
[ Upstream commit 657279778af54f35e54b07b6687918f254a2992c ]
OMAP1510, OMAP5910 and OMAP310 have only 9 logical channels.
OMAP1610, OMAP5912, OMAP1710, OMAP730, and OMAP850 have 16 logical channels
available.
The wired 17 for the lch_count must have been used to cover the 16 + 1
dedicated LCD channel, in reality we can only use 9 or 16 channels.
The d->chan_count is not used by the omap-dma stack, so we can skip the
setup. chan_count was configured to the number of logical channels and not
the actual number of physical channels anyways.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi(a)ti.com>
Acked-by: Aaro Koskinen <aaro.koskinen(a)iki.fi>
Signed-off-by: Tony Lindgren <tony(a)atomide.com>
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
---
arch/arm/mach-omap1/dma.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/arch/arm/mach-omap1/dma.c b/arch/arm/mach-omap1/dma.c
index 4be601b638d7..8129e5f9c94d 100644
--- a/arch/arm/mach-omap1/dma.c
+++ b/arch/arm/mach-omap1/dma.c
@@ -31,7 +31,6 @@
#include <mach/irqs.h>
#define OMAP1_DMA_BASE (0xfffed800)
-#define OMAP1_LOGICAL_DMA_CH_COUNT 17
static u32 enable_1510_mode;
@@ -311,8 +310,6 @@ static int __init omap1_system_dma_init(void)
goto exit_iounmap;
}
- d->lch_count = OMAP1_LOGICAL_DMA_CH_COUNT;
-
/* Valid attributes for omap1 plus processors */
if (cpu_is_omap15xx())
d->dev_caps = ENABLE_1510_MODE;
@@ -329,13 +326,14 @@ static int __init omap1_system_dma_init(void)
d->dev_caps |= CLEAR_CSR_ON_READ;
d->dev_caps |= IS_WORD_16;
- if (cpu_is_omap15xx())
- d->chan_count = 9;
- else if (cpu_is_omap16xx() || cpu_is_omap7xx()) {
- if (!(d->dev_caps & ENABLE_1510_MODE))
- d->chan_count = 16;
+ /* available logical channels */
+ if (cpu_is_omap15xx()) {
+ d->lch_count = 9;
+ } else {
+ if (d->dev_caps & ENABLE_1510_MODE)
+ d->lch_count = 9;
else
- d->chan_count = 9;
+ d->lch_count = 16;
}
p = dma_plat_info;
--
2.11.0
From: Florian Fainelli <f.fainelli(a)gmail.com>
[ Upstream commit bb7da333d0a9f3bddc08f84187b7579a3f68fd24 ]
Since we need to pad our packets, utilize skb_put_padto() which
increases skb->len by how much we need to pad, allowing us to eliminate
the test on skb->len right below.
Signed-off-by: Florian Fainelli <f.fainelli(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
---
drivers/net/ethernet/broadcom/bcmsysport.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index 8860e74aa28f..fae1a1ff53ab 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1061,13 +1061,12 @@ static netdev_tx_t bcm_sysport_xmit(struct sk_buff *skb,
* (including FCS and tag) because the length verification is done after
* the Broadcom tag is stripped off the ingress packet.
*/
- if (skb_padto(skb, ETH_ZLEN + ENET_BRCM_TAG_LEN)) {
+ if (skb_put_padto(skb, ETH_ZLEN + ENET_BRCM_TAG_LEN)) {
ret = NETDEV_TX_OK;
goto out;
}
- skb_len = skb->len < ETH_ZLEN + ENET_BRCM_TAG_LEN ?
- ETH_ZLEN + ENET_BRCM_TAG_LEN : skb->len;
+ skb_len = skb->len;
mapping = dma_map_single(kdev, skb->data, skb_len, DMA_TO_DEVICE);
if (dma_mapping_error(kdev, mapping)) {
--
2.11.0
From: Bob Moore <robert.moore(a)intel.com>
[ Upstream commit 57707a9a7780fab426b8ae9b4c7b65b912a748b3 ]
ACPICA commit 9f76de2d249b18804e35fb55d14b1c2604d627a1
ACPICA commit b2e89d72ef1e9deefd63c3fd1dee90f893575b3a
ACPICA commit 23b5bbe6d78afd3c5abf3adb91a1b098a3000b2e
The declared buffer length must be the same as the length of the
byte initializer list, otherwise not a valid resource descriptor.
Link: https://github.com/acpica/acpica/commit/9f76de2d
Link: https://github.com/acpica/acpica/commit/b2e89d72
Link: https://github.com/acpica/acpica/commit/23b5bbe6
Signed-off-by: Bob Moore <robert.moore(a)intel.com>
Signed-off-by: Lv Zheng <lv.zheng(a)intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
---
drivers/acpi/acpica/utresrc.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/acpi/acpica/utresrc.c b/drivers/acpi/acpica/utresrc.c
index 1de3376da66a..2ad99ea3d496 100644
--- a/drivers/acpi/acpica/utresrc.c
+++ b/drivers/acpi/acpica/utresrc.c
@@ -421,8 +421,10 @@ acpi_ut_walk_aml_resources(struct acpi_walk_state *walk_state,
ACPI_FUNCTION_TRACE(ut_walk_aml_resources);
- /* The absolute minimum resource template is one end_tag descriptor */
-
+ /*
+ * The absolute minimum resource template is one end_tag descriptor.
+ * However, we will treat a lone end_tag as just a simple buffer.
+ */
if (aml_length < sizeof(struct aml_resource_end_tag)) {
return_ACPI_STATUS(AE_AML_NO_RESOURCE_END_TAG);
}
@@ -454,9 +456,8 @@ acpi_ut_walk_aml_resources(struct acpi_walk_state *walk_state,
/* Invoke the user function */
if (user_function) {
- status =
- user_function(aml, length, offset, resource_index,
- context);
+ status = user_function(aml, length, offset,
+ resource_index, context);
if (ACPI_FAILURE(status)) {
return_ACPI_STATUS(status);
}
@@ -480,6 +481,12 @@ acpi_ut_walk_aml_resources(struct acpi_walk_state *walk_state,
*context = aml;
}
+ /* Check if buffer is defined to be longer than the resource length */
+
+ if (aml_length > (offset + length)) {
+ return_ACPI_STATUS(AE_AML_NO_RESOURCE_END_TAG);
+ }
+
/* Normal exit */
return_ACPI_STATUS(AE_OK);
--
2.11.0
This is the start of the stable review cycle for the 3.18.84 release.
There are 12 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri Nov 24 10:10:45 UTC 2017.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.84-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 3.18.84-rc1
Jan Harkes <jaharkes(a)cs.cmu.edu>
coda: fix 'kernel memory exposure attempt' in fsync
Corey Minyard <cminyard(a)mvista.com>
ipmi: fix unsigned long underflow
alex chen <alex.chen(a)huawei.com>
ocfs2: should wait dio before inode lock in ocfs2_setattr()
Roberto Sassu <roberto.sassu(a)huawei.com>
ima: do not update security.ima if appraisal status is not INTEGRITY_PASS
Cong Wang <xiyou.wangcong(a)gmail.com>
vlan: fix a use-after-free in vlan_device_event()
Jason A. Donenfeld <Jason(a)zx2c4.com>
af_netlink: ensure that NLMSG_DONE never fails in dumps
Huacai Chen <chenhc(a)lemote.com>
fealnx: Fix building error on MIPS
Xin Long <lucien.xin(a)gmail.com>
sctp: do not peel off an assoc from one netns to another one
Ye Yin <hustcat(a)gmail.com>
netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed
Eric Dumazet <edumazet(a)google.com>
tcp: do not mangle skb->cb[] in tcp_make_synack()
Eric W. Biederman <ebiederm(a)xmission.com>
net/sctp: Always set scope_id in sctp_inet6_skb_msgname
WANG Cong <xiyou.wangcong(a)gmail.com>
ipv6/dccp: do not inherit ipv6_mc_list from parent
-------------
Diffstat:
Makefile | 4 ++--
drivers/char/ipmi/ipmi_msghandler.c | 10 ++++++----
drivers/net/ethernet/fealnx.c | 6 +++---
fs/coda/upcall.c | 3 +--
fs/ocfs2/file.c | 9 +++++++--
include/linux/skbuff.h | 7 +++++++
net/8021q/vlan.c | 6 +++---
net/core/skbuff.c | 1 +
net/dccp/ipv6.c | 7 +++++++
net/ipv4/tcp_output.c | 9 ++-------
net/ipv6/tcp_ipv6.c | 2 ++
net/netlink/af_netlink.c | 17 +++++++++++------
net/netlink/af_netlink.h | 1 +
net/sctp/ipv6.c | 2 ++
net/sctp/socket.c | 4 ++++
security/integrity/ima/ima_appraise.c | 3 +++
16 files changed, 62 insertions(+), 29 deletions(-)
This is the start of the stable review cycle for the 4.14.1 release.
There are 31 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Tue Nov 21 14:59:32 UTC 2017.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.1-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.1-rc1
Johan Hovold <johan(a)kernel.org>
spi: fix use-after-free at controller deregistration
Hans de Goede <hdegoede(a)redhat.com>
staging: rtl8188eu: Revert 4 commits breaking ARP
Hans de Goede <hdegoede(a)redhat.com>
staging: vboxvideo: Fix reporting invalid suggested-offset-properties
Johan Hovold <johan(a)kernel.org>
staging: greybus: spilib: fix use-after-free after deregistration
Gilad Ben-Yossef <gilad(a)benyossef.com>
staging: ccree: fix 64 bit scatter/gather DMA ops
Huacai Chen <chenhc(a)lemote.com>
staging: sm750fb: Fix parameter mistake in poke32
Aditya Shankar <aditya.shankar(a)microchip.com>
staging: wilc1000: Fix bssid buffer offset in Txq
Bjorn Andersson <bjorn.andersson(a)linaro.org>
rpmsg: glink: Add missing MODULE_LICENSE
Jason Gerecke <killertofu(a)gmail.com>
HID: wacom: generic: Recognize WACOM_HID_WD_PEN as a type of pen collection
Sébastien Szymanski <sebastien.szymanski(a)armadeus.com>
HID: cp2112: add HIDRAW dependency
Hans de Goede <hdegoede(a)redhat.com>
platform/x86: peaq_wmi: Fix missing terminating entry for peaq_dmi_table
Hans de Goede <hdegoede(a)redhat.com>
platform/x86: peaq-wmi: Add DMI check before binding to the WMI interface
Yazen Ghannam <yazen.ghannam(a)amd.com>
x86/MCE/AMD: Always give panic severity for UC errors in kernel context
Andy Lutomirski <luto(a)kernel.org>
selftests/x86/protection_keys: Fix syscall NR redefinition warnings
Johan Hovold <johan(a)kernel.org>
USB: serial: garmin_gps: fix memory leak on probe errors
Johan Hovold <johan(a)kernel.org>
USB: serial: garmin_gps: fix I/O after failed probe and remove
Douglas Fischer <douglas.fischer(a)outlook.com>
USB: serial: qcserial: add pid/vid for Sierra Wireless EM7355 fw update
Lu Baolu <baolu.lu(a)linux.intel.com>
USB: serial: Change DbC debug device binding ID
Johan Hovold <johan(a)kernel.org>
USB: serial: metro-usb: stop I/O after failed open
Andrew Gabbasov <andrew_gabbasov(a)mentor.com>
usb: gadget: f_fs: Fix use-after-free in ffs_free_inst
Bernhard Rosenkraenzer <bernhard.rosenkranzer(a)linaro.org>
USB: Add delay-init quirk for Corsair K70 LUX keyboards
Alan Stern <stern(a)rowland.harvard.edu>
USB: usbfs: compute urb->actual_length for isochronous
Lu Baolu <baolu.lu(a)linux.intel.com>
USB: early: Use new USB product ID and strings for DbC device
raveendra padasalagi <raveendra.padasalagi(a)broadcom.com>
crypto: brcm - Explicity ACK mailbox message
Eric Biggers <ebiggers(a)google.com>
crypto: dh - Don't permit 'key' or 'g' size longer than 'p'
Eric Biggers <ebiggers(a)google.com>
crypto: dh - Don't permit 'p' to be 0
Eric Biggers <ebiggers(a)google.com>
crypto: dh - Fix double free of ctx->p
Andrey Konovalov <andreyknvl(a)google.com>
media: dib0700: fix invalid dvb_detach argument
Arvind Yadav <arvind.yadav.cs(a)gmail.com>
media: imon: Fix null-ptr-deref in imon_probe
Adam Wallis <awallis(a)codeaurora.org>
dmaengine: dmatest: warn user when dma test times out
Qiuxu Zhuo <qiuxu.zhuo(a)intel.com>
EDAC, sb_edac: Don't create a second memory controller if HA1 is not present
-------------
Diffstat:
Makefile | 4 +-
arch/x86/kernel/cpu/mcheck/mce-severity.c | 7 +-
crypto/dh.c | 33 ++++-----
crypto/dh_helper.c | 16 ++++
drivers/crypto/bcm/cipher.c | 101 ++++++++++++--------------
drivers/dma/dmatest.c | 1 +
drivers/edac/sb_edac.c | 9 ++-
drivers/hid/Kconfig | 2 +-
drivers/hid/wacom_wac.h | 1 +
drivers/media/rc/imon.c | 5 ++
drivers/media/usb/dvb-usb/dib0700_devices.c | 24 +++---
drivers/platform/x86/peaq-wmi.c | 19 +++++
drivers/rpmsg/qcom_glink_native.c | 3 +
drivers/spi/spi.c | 5 +-
drivers/staging/ccree/cc_lli_defs.h | 2 +-
drivers/staging/greybus/spilib.c | 8 +-
drivers/staging/rtl8188eu/core/rtw_recv.c | 83 ++++++++++++---------
drivers/staging/rtl8188eu/os_dep/mon.c | 34 ++-------
drivers/staging/sm750fb/ddk750_chip.h | 2 +-
drivers/staging/vboxvideo/vbox_drv.h | 8 +-
drivers/staging/vboxvideo/vbox_irq.c | 4 +-
drivers/staging/vboxvideo/vbox_mode.c | 26 +++++--
drivers/staging/wilc1000/wilc_wlan.c | 2 +-
drivers/usb/core/devio.c | 14 ++++
drivers/usb/core/quirks.c | 3 +
drivers/usb/early/xhci-dbc.h | 6 +-
drivers/usb/gadget/function/f_fs.c | 1 +
drivers/usb/serial/garmin_gps.c | 22 +++++-
drivers/usb/serial/metro-usb.c | 11 ++-
drivers/usb/serial/qcserial.c | 1 +
drivers/usb/serial/usb_debug.c | 4 +-
tools/testing/selftests/x86/protection_keys.c | 24 ++++--
32 files changed, 289 insertions(+), 196 deletions(-)
Kthread function bch_allocator_thread() references allocator_wait(ca, cond)
and when kthread_should_stop() is true, this kthread exits.
The problem is, if kthread_should_stop() is true, macro allocator_wait()
calls "return 0" with current task state TASK_INTERRUPTIBLE. After function
bch_allocator_thread() returns to do_exit(), there are some blocking
operations are called, then a kenrel warning is popped up by __might_sleep
from kernel/sched/core.c,
"WARNING: do not call blocking ops when !TASK_RUNNING; state=1 set at [xxxx]"
If the task is interrupted and preempted out, since its status is
TASK_INTERRUPTIBLE, it means scheduler won't pick it back to run forever,
and the allocator thread may hang in do_exit().
This patch sets allocator kthread state back to TASK_RUNNING before it
returns to do_exit(), which avoids a potential deadlock.
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org
---
drivers/md/bcache/alloc.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c
index a27d85232ce1..996ebbabd819 100644
--- a/drivers/md/bcache/alloc.c
+++ b/drivers/md/bcache/alloc.c
@@ -286,9 +286,12 @@ do { \
if (cond) \
break; \
\
+ \
mutex_unlock(&(ca)->set->bucket_lock); \
- if (kthread_should_stop()) \
+ if (kthread_should_stop()) { \
+ __set_current_state(TASK_RUNNING); \
return 0; \
+ } \
\
schedule(); \
mutex_lock(&(ca)->set->bucket_lock); \
--
2.13.6
This is the start of the stable review cycle for the 4.4.100 release.
There are 59 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Tue Nov 21 14:31:34 UTC 2017.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.100-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.100-rc1
Johan Hovold <johan(a)kernel.org>
USB: serial: garmin_gps: fix memory leak on probe errors
Johan Hovold <johan(a)kernel.org>
USB: serial: garmin_gps: fix I/O after failed probe and remove
Douglas Fischer <douglas.fischer(a)outlook.com>
USB: serial: qcserial: add pid/vid for Sierra Wireless EM7355 fw update
Bernhard Rosenkraenzer <bernhard.rosenkranzer(a)linaro.org>
USB: Add delay-init quirk for Corsair K70 LUX keyboards
Alan Stern <stern(a)rowland.harvard.edu>
USB: usbfs: compute urb->actual_length for isochronous
Dmitry V. Levin <ldv(a)altlinux.org>
uapi: fix linux/rds.h userspace compilation errors
Dmitry V. Levin <ldv(a)altlinux.org>
uapi: fix linux/rds.h userspace compilation error
Sasha Levin <alexander.levin(a)verizon.com>
Revert "uapi: fix linux/rds.h userspace compilation errors"
Sasha Levin <alexander.levin(a)verizon.com>
Revert "crypto: xts - Add ECB dependency"
Paul Burton <paul.burton(a)imgtec.com>
MIPS: Netlogic: Exclude netlogic,xlp-pic code from XLR builds
Marcin Nowakowski <marcin.nowakowski(a)imgtec.com>
MIPS: init: Ensure reserved memory regions are not added to bootmem
Marcin Nowakowski <marcin.nowakowski(a)imgtec.com>
MIPS: init: Ensure bootmem does not corrupt reserved memory
Paul Burton <paul.burton(a)imgtec.com>
MIPS: End asm function prologue macros with .insn
Jannik Becher <becher.jannik(a)gmail.com>
staging: rtl8712: fixed little endian problem
Emil Tantilov <emil.s.tantilov(a)intel.com>
ixgbe: do not disable FEC from the driver
Emil Tantilov <emil.s.tantilov(a)intel.com>
ixgbe: add mask for 64 RSS queues
Tony Nguyen <anthony.l.nguyen(a)intel.com>
ixgbe: Reduce I2C retry count on X550 devices
Emil Tantilov <emil.s.tantilov(a)intel.com>
ixgbe: handle close/suspend race with netif_device_detach/present
Emil Tantilov <emil.s.tantilov(a)intel.com>
ixgbe: fix AER error handling
Jon Mason <jon.mason(a)broadcom.com>
arm64: dts: NS2: reserve memory for Nitro firmware
Kailang Yang <kailang(a)realtek.com>
ALSA: hda/realtek - Add new codec ID ALC299
Arvind Yadav <arvind.yadav.cs(a)gmail.com>
gpu: drm: mgag200: mgag200_main:- Handle error from pci_iomap
Alexey Khoroshilov <khoroshilov(a)ispras.ru>
backlight: adp5520: Fix error handling in adp5520_bl_probe()
Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
backlight: lcd: Fix race condition during register
Takashi Iwai <tiwai(a)suse.de>
ALSA: vx: Fix possible transfer overflow
Takashi Iwai <tiwai(a)suse.de>
ALSA: vx: Don't try to update capture stream before running
James Smart <james.smart(a)broadcom.com>
scsi: lpfc: Clear the VendorVersion in the PLOGI/PLOGI ACC payload
James Smart <james.smart(a)broadcom.com>
scsi: lpfc: Correct issue leading to oops during link reset
James Smart <james.smart(a)broadcom.com>
scsi: lpfc: Correct host name in symbolic_name field
James Smart <james.smart(a)broadcom.com>
scsi: lpfc: FCoE VPort enable-disable does not bring up the VPort
James Smart <james.smart(a)broadcom.com>
scsi: lpfc: Add missing memory barrier
Galo Navarro <anglorvaroa(a)gmail.com>
staging: rtl8188eu: fix incorrect ERROR tags from logs
subhashj(a)codeaurora.org <subhashj(a)codeaurora.org>
scsi: ufs: add capability to keep auto bkops always enabled
Javier Martinez Canillas <javier(a)osg.samsung.com>
scsi: ufs-qcom: Fix module autoload
Hannu Lounento <hannu.lounento(a)ge.com>
igb: Fix hw_dbg logging in igb_update_flash_i210
Todd Fujinaka <todd.fujinaka(a)intel.com>
igb: close/suspend race in netif_device_detach
Aaron Sierra <asierra(a)xes-inc.com>
igb: reset the PHY before reading the PHY ID
Arvind Yadav <arvind.yadav.cs(a)gmail.com>
drm/sti: sti_vtg: Handle return NULL error from devm_ioremap_nocache
Geert Uytterhoeven <geert(a)linux-m68k.org>
ata: SATA_MV should depend on HAS_DMA
Geert Uytterhoeven <geert(a)linux-m68k.org>
ata: SATA_HIGHBANK should depend on HAS_DMA
Geert Uytterhoeven <geert(a)linux-m68k.org>
ata: ATA_BMDMA should depend on HAS_DMA
Tony Lindgren <tony(a)atomide.com>
ARM: dts: Fix omap3 off mode pull defines
Tony Lindgren <tony(a)atomide.com>
ARM: OMAP2+: Fix init for multiple quirks for the same SoC
Tony Lindgren <tony(a)atomide.com>
ARM: dts: Fix am335x and dm814x scm syscon to probe children
Tony Lindgren <tony(a)atomide.com>
ARM: dts: Fix compatible for ti81xx uarts for 8250
Ngai-Mint Kwan <ngai-mint.kwan(a)intel.com>
fm10k: request reset when mbx->state changes
Roger Quadros <rogerq(a)ti.com>
extcon: palmas: Check the parent instance to prevent the NULL
Adam Wallis <awallis(a)codeaurora.org>
dmaengine: dmatest: warn user when dma test times out
Leif Liddy <leif.linux(a)gmail.com>
Bluetooth: btusb: fix QCA Rome suspend/resume
Eric Biggers <ebiggers(a)google.com>
arm: crypto: reduce priority of bit-sliced AES cipher
Bjørn Mork <bjorn(a)mork.no>
net: qmi_wwan: fix divide by 0 on bad descriptors
Bjørn Mork <bjorn(a)mork.no>
net: cdc_ether: fix divide by 0 on bad descriptors
Xin Long <lucien.xin(a)gmail.com>
sctp: do not peel off an assoc from one netns to another one
Jan Beulich <jbeulich(a)suse.com>
xen-blkback: don't leak stack data via response ring
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: don't let ldimm64 leak map addresses on unprivileged
Paolo Bonzini <pbonzini(a)redhat.com>
KVM: x86: fix singlestepping over syscall
Jan Kara <jack(a)suse.cz>
ext4: fix data exposure after a crash
Andrey Konovalov <andreyknvl(a)google.com>
media: dib0700: fix invalid dvb_detach argument
Arvind Yadav <arvind.yadav.cs(a)gmail.com>
media: imon: Fix null-ptr-deref in imon_probe
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/am33xx.dtsi | 3 +-
arch/arm/boot/dts/dm814x.dtsi | 9 ++-
arch/arm/boot/dts/dm816x.dtsi | 6 +-
arch/arm/crypto/aesbs-glue.c | 6 +-
arch/arm/mach-omap2/pdata-quirks.c | 1 -
arch/arm64/boot/dts/broadcom/ns2.dtsi | 2 +
arch/mips/include/asm/asm.h | 10 ++-
arch/mips/kernel/setup.c | 78 +++++++++++++++++++-
arch/mips/netlogic/common/irq.c | 4 +-
arch/x86/include/asm/kvm_emulate.h | 1 +
arch/x86/kvm/emulate.c | 1 +
arch/x86/kvm/x86.c | 52 ++++++-------
crypto/Kconfig | 1 -
drivers/ata/Kconfig | 3 +
drivers/block/xen-blkback/blkback.c | 23 +++---
drivers/block/xen-blkback/common.h | 25 ++-----
drivers/bluetooth/btusb.c | 6 ++
drivers/dma/dmatest.c | 1 +
drivers/extcon/extcon-palmas.c | 5 ++
drivers/gpu/drm/mgag200/mgag200_main.c | 2 +
drivers/gpu/drm/sti/sti_vtg.c | 4 +
drivers/media/rc/imon.c | 5 ++
drivers/media/usb/dvb-usb/dib0700_devices.c | 24 +++---
drivers/net/ethernet/intel/fm10k/fm10k_mbx.c | 10 ++-
drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 6 +-
drivers/net/ethernet/intel/igb/e1000_82575.c | 11 +++
drivers/net/ethernet/intel/igb/e1000_i210.c | 4 +-
drivers/net/ethernet/intel/igb/igb_main.c | 21 +++---
drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c | 8 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 23 +++---
drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c | 4 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 2 -
drivers/net/usb/cdc_ether.c | 2 +-
drivers/net/usb/qmi_wwan.c | 2 +-
drivers/scsi/lpfc/lpfc_attr.c | 17 +++++
drivers/scsi/lpfc/lpfc_els.c | 6 ++
drivers/scsi/lpfc/lpfc_hw.h | 6 ++
drivers/scsi/lpfc/lpfc_sli.c | 3 +
drivers/scsi/lpfc/lpfc_vport.c | 8 ++
drivers/scsi/ufs/ufs-qcom.c | 1 +
drivers/scsi/ufs/ufshcd.c | 33 ++++++---
drivers/scsi/ufs/ufshcd.h | 13 ++++
drivers/staging/rtl8188eu/include/rtw_debug.h | 2 +-
drivers/staging/rtl8712/rtl871x_ioctl_linux.c | 2 +-
drivers/usb/core/devio.c | 14 ++++
drivers/usb/core/quirks.c | 3 +
drivers/usb/serial/garmin_gps.c | 22 +++++-
drivers/usb/serial/qcserial.c | 1 +
drivers/video/backlight/adp5520_bl.c | 12 ++-
drivers/video/backlight/lcd.c | 4 +-
fs/ext4/inode.c | 23 +++---
include/dt-bindings/pinctrl/omap.h | 4 +-
include/uapi/linux/rds.h | 102 +++++++++++++-------------
kernel/bpf/verifier.c | 21 ++++--
net/sctp/socket.c | 4 +
sound/drivers/vx/vx_pcm.c | 8 +-
sound/pci/hda/patch_realtek.c | 10 +++
sound/pci/vx222/vx222_ops.c | 12 +--
sound/pcmcia/vx/vxp_ops.c | 12 +--
60 files changed, 481 insertions(+), 231 deletions(-)
On Wed 22-11-17 10:09:13, Zi Yan wrote:
>
>
> Michal Hocko wrote:
> > On Wed 22-11-17 09:43:46, Zi Yan wrote:
> >>
> >> Michal Hocko wrote:
[...]
> >>> but why is unsafe to enable the feature on other arches which support
> >>> THP? Is there any plan to do the next step and remove this config
> >>> option?
> >> Because different architectures have their own way of specifying a swap
> >> entry. This means, to support THP migration, each architecture needs to
> >> add its own __pmd_to_swp_entry() and __swp_entry_to_pmd(), which are
> >> used for arch-independent pmd_to_swp_entry() and swp_entry_to_pmd().
> >
> > I understand that part. But this smells like a matter of coding, no?
> > I was suprised to see the note about safety which didn't make much sense
> > to me.
>
> And testing as well. I had powerpc book3s support in my initial patch
> submission, but removed it because I do not have access to the powerpc
> machine any more. I also tried ARM64, which seems working by adding the
> code, but I have no hardware to test it now.
>
> Any suggestions?
Cc arch maintainers and mailing lists?
--
Michal Hocko
SUSE Labs
This is the start of the stable review cycle for the 3.2.96 release.
There are 61 patches in this series, which will be posted as responses
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri Nov 24 20:00:00 UTC 2017.
Anything received after that time might be too late.
A combined patch relative to 3.2.95 will be posted as an additional
response to this. A shortlog and diffstat can be found below.
Ben.
-------------
Aleksandr Bezzubikov (1):
PCI: shpchp: Enable bridge bus mastering if MSI is enabled
[48b79a14505349a29b3e20f03619ada9b33c4b17]
Amir Goldstein (1):
xfs: fix incorrect log_flushed on fsync
[47c7d0b19502583120c3f396c7559e7a77288a68]
Andrey Korolyov (1):
cs5536: add support for IDE controller variant
[591b6bb605785c12a21e8b07a08a277065b655a5]
Andy Lutomirski (1):
x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
[9584d98bed7a7a904d0702ad06bbcc94703cb5b4]
Arvind Yadav (1):
media: imon: Fix null-ptr-deref in imon_probe
[58fd55e838276a0c13d1dc7c387f90f25063cbf3]
Bart Van Assche (1):
block: Relax a check in blk_start_queue()
[4ddd56b003f251091a67c15ae3fe4a5c5c5e390a]
Ben Hutchings (1):
mac80211: Fix null dereference in ieee80211_key_link()
[not upstream; fixes a regression specific to 3.2-stable]
Benjamin Block (1):
scsi: zfcp: add handling for FCP_RESID_OVER to the fcp ingress path
[a099b7b1fc1f0418ab8d79ecf98153e1e134656e]
Bjørn Mork (1):
net: cdc_ether: fix divide by 0 on bad descriptors
[2cb80187ba065d7decad7c6614e35e07aec8a974]
Brian King (1):
scsi: aacraid: Fix command send race condition
[1ae948fa4f00f3a2823e7cb19a3049ef27dd6947]
Cameron Gutman (2):
Input: xpad - don't depend on endpoint order
[c01b5e7464f0cf20936d7467c7528163c4e2782d]
Input: xpad - validate USB endpoint type during probe
[122d6a347329818419b032c5a1776e6b3866d9b9]
Chad Dupuis (1):
[SCSI] qla2xxx: Add mutex around optrom calls to serialize accesses.
[7a8ab9c840b5dff9bb70328338a86444ed1c2415]
Christophe JAILLET (1):
driver core: bus: Fix a potential double free
[0f9b011d3321ca1079c7a46c18cb1956fbdb7bcb]
Colin Ian King (1):
media: em28xx: calculate left volume level correctly
[801e3659bf2c87c31b7024087d61e89e172b5651]
Dan Carpenter (2):
powerpc/44x: Fix mask and shift to zero bug
[8d046759f6ad75824fdf7b9c9a3da0272ea9ea92]
scsi: qla2xxx: Fix an integer overflow in sysfs code
[e6f77540c067b48dee10f1e33678415bfcc89017]
Dmitry Fleytman (1):
usb: Add device quirk for Logitech HD Pro Webcam C920-C
[a1279ef74eeeb5f627f091c71d80dd7ac766c99d]
Dmitry Torokhov (1):
Input: gtco - fix potential out-of-bound access
[a50829479f58416a013a4ccca791336af3c584c7]
Douglas Anderson (1):
USB: core: Avoid race of async_completed() w/ usbdev_release()
[ed62ca2f4f51c17841ea39d98c0c409cb53a3e10]
Edwin Török (1):
dlm: avoid double-free on error path in dlm_device_{register,unregister}
[55acdd926f6b21a5cdba23da98a48aedf19ac9c3]
Eric Dumazet (1):
ipv6: fix typo in fib6_net_exit()
[32a805baf0fb70b6dbedefcd7249ac7f580f9e3b]
Eric W. Biederman (1):
fcntl: Don't use ambiguous SIG_POLL si_codes
[d08477aa975e97f1dc64c0ae59cebf98520456ce]
Eryu Guan (1):
ext4: validate s_first_meta_bg at mount time
[3a4b77cd47bb837b8557595ec7425f281f2ca1fe]
Finn Thain (1):
scsi: mac_esp: Fix PIO transfers for MESSAGE IN phase
[7640d91d285893a5cf1e62b2cd00f0884c401d93]
Guenter Roeck (1):
media: uvcvideo: Prevent heap overflow when accessing mapped controls
[7e09f7d5c790278ab98e5f2c22307ebe8ad6e8ba]
Guillaume Nault (2):
l2tp: pass tunnel pointer to ->session_create()
[f026bc29a8e093edfbb2a77700454b285c97e8ad]
l2tp: prevent creation of sessions on terminated tunnels
[f3c66d4e144a0904ea9b95d23ed9f8eb38c11bfb]
Guillermo A. Amaral (1):
Input: xpad - add a few new VID/PID combinations
[540602a43ae5fa94064f8fae100f5ca75d4c002b]
Jan H . Schönherr (1):
KVM: SVM: Add a missing 'break' statement
[49a8afca386ee1775519a4aa80f8e121bd227dd4]
Joe Carnuccio (1):
[SCSI] qla2xxx: Corrections to returned sysfs error codes.
[71dfe9e776878d9583d004edade55edc2bdac5eb]
Johan Hovold (2):
USB: serial: console: fix use-after-free after failed setup
[299d7572e46f98534033a9e65973f13ad1ce9047]
[media] cx231xx-cards: fix NULL-deref on missing association descriptor
[6c3b047fa2d2286d5e438bcb470c7b1a49f415f6]
Johannes Berg (1):
mac80211: don't compare TKIP TX MIC key in reinstall prevention
[cfbb0d90a7abb289edc91833d0905931f8805f12]
Jonas Gorski (2):
MIPS: AR7: allow NULL clock for clk_get_rate
[585e0e9d02a690c29932b2fc0789835c7b91d448]
MIPS: BCM63XX: allow NULL clock for clk_get_rate
[1b495faec231980b6c719994b24044ccc04ae06c]
Kai-Heng Feng (2):
Input: i8042 - add Gigabyte P57 to the keyboard reset table
[697c5d8a36768b36729533fb44622b35d56d6ad0]
usb: quirks: add delay init quirk for Corsair Strafe RGB keyboard
[de3af5bf259d7a0bfaac70441c8568ab5998d80c]
Leon Romanovsky (1):
net/mlx4_core: Make explicit conversion to 64bit value
[187782eb58a89ea030731114c6ae37842a4472fe]
Mike Marciniszyn (1):
IB/{qib, hfi1}: Avoid flow control testing for RDMA write operation
[5b0ef650bd0f820e922fcc42f1985d4621ae19cf]
Nisar Sayed (1):
smsc95xx: Configure pause time to 0xffff when tx flow control enabled
[9c0827317f235865ae421293f8aecf6cb327a63e]
Noa Osherovich (1):
IB/core: Fix the validations of a multicast LID in attach or detach operations
[5236333592244557a19694a51337df6ac018f0a7]
Oleg Nesterov (1):
signal: move the "sig < SIGRTMIN" check into siginmask(sig)
[5c8ccefdf46c5f87d87b694c7fbc04941c2c99a5]
Paul Mackerras (1):
powerpc: Correct instruction code for xxlor instruction
[93b2d3cf3733b4060d3623161551f51ea1ab5499]
Rui Teng (1):
powerpc/mm: Fix check of multiple 16G pages from device tree
[23493c121912a39f0262e0dbeb236e1d39efa4d5]
Sabrina Dubroca (1):
ipv6: fix memory leak with multiple tables during netns destruction
[ba1cc08d9488c94cb8d94f545305688b72a2a300]
Sean Young (1):
media: lirc_zilog: driver only sends LIRCCODE
[89d8a2cc51d1f29ea24a0b44dde13253141190a0]
SeongJae Park (1):
mm/vmstat.c: fix wrong comment
[f113e64121ba9f4791332248b315d9f57ee33a6b]
Steffen Maier (6):
scsi: zfcp: fix capping of unsuccessful GPN_FT SAN response trace records
[975171b4461be296a35e83ebd748946b81cf0635]
scsi: zfcp: fix missing trace records for early returns in TMF eh handlers
[1a5d999ebfc7bfe28deb48931bb57faa8e4102b6]
scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA
[9fe5d2b2fd30aa8c7827ec62cbbe6d30df4fe3e3]
scsi: zfcp: fix payload with full FCP_RSP IU in SCSI trace records
[12c3e5754c8022a4f2fd1e9f00d19e99ee0d3cc1]
scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled
[71b8e45da51a7b64a23378221c0a5868bd79da4f]
scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response
[fdb7cee3b9e3c561502e58137a837341f10cbf8b]
Steven Rostedt (1):
ftrace: Fix selftest goto location on error
[46320a6acc4fb58f04bcf78c4c942cc43b20f986]
Ted Mielczarek (1):
Input: xpad - add support for Xbox One controllers
[1a48ff81b3912be5fadae3fafde6c2f632246a4c]
Theodore Ts'o (1):
ext4: fix fencepost in s_first_meta_bg validation
[2ba3e6e8afc9b6188b471f27cf2b5e3cf34e7af2]
Thomas Gleixner (1):
genirq: Make sparse_irq_lock protect what it should protect
[12ac1d0f6c3e95732d144ffa65c8b20fbd9aa462]
Wanpeng Li (1):
KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously
[9a6e7c39810e4a8bc7fc95056cefb40583fe07ef]
Xiangliang.Yu (1):
drm/ttm: Fix accounting error when fail to get pages for pool
[9afae2719273fa1d406829bf3498f82dbdba71c7]
Xin Long (1):
sctp: do not peel off an assoc from one netns to another one
[df80cd9b28b9ebaa284a41df611dbf3a2d05ca74]
Makefile | 4 +-
arch/mips/ar7/clock.c | 3 +
arch/mips/bcm63xx/clk.c | 3 +
arch/powerpc/boot/4xx.c | 2 +-
arch/powerpc/include/asm/ppc-opcode.h | 2 +-
arch/powerpc/mm/hash_utils_64.c | 2 +-
arch/x86/include/asm/elf.h | 5 +-
arch/x86/kvm/svm.c | 1 +
arch/x86/kvm/x86.c | 34 ++++-
block/blk-core.c | 2 +-
drivers/ata/pata_amd.c | 1 +
drivers/ata/pata_cs5536.c | 1 +
drivers/base/bus.c | 2 +-
drivers/gpu/drm/ttm/ttm_page_alloc.c | 2 +-
drivers/infiniband/core/verbs.c | 44 +++++-
drivers/infiniband/hw/qib/qib_rc.c | 3 +-
drivers/input/joystick/xpad.c | 218 ++++++++++++++++++++++++----
drivers/input/serio/i8042-x86ia64io.h | 7 +
drivers/input/tablet/gtco.c | 17 ++-
drivers/media/rc/imon.c | 5 +
drivers/media/video/cx231xx/cx231xx-cards.c | 2 +-
drivers/media/video/em28xx/em28xx-audio.c | 2 +-
drivers/media/video/uvc/uvc_ctrl.c | 7 +
drivers/net/ethernet/mellanox/mlx4/fw.c | 2 +-
drivers/net/usb/cdc_ether.c | 5 +-
drivers/net/usb/smsc95xx.c | 11 +-
drivers/pci/hotplug/shpchp_hpc.c | 2 +
drivers/s390/scsi/zfcp_dbf.c | 31 +++-
drivers/s390/scsi/zfcp_dbf.h | 13 +-
drivers/s390/scsi/zfcp_fc.h | 6 +-
drivers/s390/scsi/zfcp_fsf.c | 7 +-
drivers/s390/scsi/zfcp_scsi.c | 16 +-
drivers/scsi/aacraid/aachba.c | 48 +++---
drivers/scsi/mac_esp.c | 35 ++---
drivers/scsi/qla2xxx/qla_attr.c | 71 ++++++---
drivers/scsi/qla2xxx/qla_bsg.c | 12 +-
drivers/scsi/qla2xxx/qla_def.h | 1 +
drivers/scsi/qla2xxx/qla_os.c | 1 +
drivers/staging/media/lirc/lirc_zilog.c | 8 +-
drivers/usb/core/devio.c | 4 +-
drivers/usb/core/quirks.c | 6 +-
drivers/usb/serial/console.c | 1 +
fs/dlm/user.c | 4 +
fs/ext4/super.c | 9 ++
fs/fcntl.c | 13 +-
fs/xfs/xfs_log.c | 7 -
include/asm-generic/siginfo.h | 4 +-
include/linux/pci_ids.h | 1 +
include/linux/signal.h | 24 +--
kernel/irq/irqdesc.c | 24 +--
kernel/trace/trace_selftest.c | 2 +-
mm/vmstat.c | 2 +-
net/ipv6/ip6_fib.c | 17 ++-
net/l2tp/l2tp_core.c | 38 +++--
net/l2tp/l2tp_core.h | 8 +-
net/l2tp/l2tp_eth.c | 11 +-
net/l2tp/l2tp_netlink.c | 8 +-
net/l2tp/l2tp_ppp.c | 19 +--
net/mac80211/key.c | 38 ++++-
net/sctp/socket.c | 5 +
60 files changed, 632 insertions(+), 251 deletions(-)
--
Ben Hutchings
Beware of programmers who carry screwdrivers. - Leonard Brandwein
This is a note to let you know that I've just added the patch titled
mm/pagewalk.c: report holes in hugetlb ranges
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-pagewalk.c-report-holes-in-hugetlb-ranges.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 373c4557d2aa362702c4c2d41288fb1e54990b7c Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh(a)google.com>
Date: Tue, 14 Nov 2017 01:03:44 +0100
Subject: mm/pagewalk.c: report holes in hugetlb ranges
From: Jann Horn <jannh(a)google.com>
commit 373c4557d2aa362702c4c2d41288fb1e54990b7c upstream.
This matters at least for the mincore syscall, which will otherwise copy
uninitialized memory from the page allocator to userspace. It is
probably also a correctness error for /proc/$pid/pagemap, but I haven't
tested that.
Removing the `walk->hugetlb_entry` condition in walk_hugetlb_range() has
no effect because the caller already checks for that.
This only reports holes in hugetlb ranges to callers who have specified
a hugetlb_entry callback.
This issue was found using an AFL-based fuzzer.
v2:
- don't crash on ->pte_hole==NULL (Andrew Morton)
- add Cc stable (Andrew Morton)
Changed for 4.4/4.9 stable backport:
- fix up conflict in the huge_pte_offset() call
Fixes: 1e25a271c8ac ("mincore: apply page table walker on do_mincore()")
Signed-off-by: Jann Horn <jannh(a)google.com>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/pagewalk.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -142,8 +142,12 @@ static int walk_hugetlb_range(unsigned l
do {
next = hugetlb_entry_end(h, addr, end);
pte = huge_pte_offset(walk->mm, addr & hmask);
- if (pte && walk->hugetlb_entry)
+
+ if (pte)
err = walk->hugetlb_entry(pte, hmask, addr, next, walk);
+ else if (walk->pte_hole)
+ err = walk->pte_hole(addr, next, walk);
+
if (err)
break;
} while (addr = next, addr != end);
Patches currently in stable-queue which might be from jannh(a)google.com are
queue-4.9/mm-pagewalk.c-report-holes-in-hugetlb-ranges.patch
commit 373c4557d2aa362702c4c2d41288fb1e54990b7c upstream.
This matters at least for the mincore syscall, which will otherwise copy
uninitialized memory from the page allocator to userspace. It is
probably also a correctness error for /proc/$pid/pagemap, but I haven't
tested that.
Removing the `walk->hugetlb_entry` condition in walk_hugetlb_range() has
no effect because the caller already checks for that.
This only reports holes in hugetlb ranges to callers who have specified
a hugetlb_entry callback.
This issue was found using an AFL-based fuzzer.
v2:
- don't crash on ->pte_hole==NULL (Andrew Morton)
- add Cc stable (Andrew Morton)
Changed for 4.4/4.9 stable backport:
- fix up conflict in the huge_pte_offset() call
Fixes: 1e25a271c8ac ("mincore: apply page table walker on do_mincore()")
Signed-off-by: Jann Horn <jannh(a)google.com>
---
Please apply this patch to <=4.9 stable trees instead of the
original patch.
mm/pagewalk.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 29f2f8b853ae..c2cbd2620169 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -142,8 +142,12 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
do {
next = hugetlb_entry_end(h, addr, end);
pte = huge_pte_offset(walk->mm, addr & hmask);
- if (pte && walk->hugetlb_entry)
+
+ if (pte)
err = walk->hugetlb_entry(pte, hmask, addr, next, walk);
+ else if (walk->pte_hole)
+ err = walk->pte_hole(addr, next, walk);
+
if (err)
break;
} while (addr = next, addr != end);
--
2.15.0.448.gf294e3d99a-goog
This is a note to let you know that I've just added the patch titled
mm/pagewalk.c: report holes in hugetlb ranges
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-pagewalk.c-report-holes-in-hugetlb-ranges.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 373c4557d2aa362702c4c2d41288fb1e54990b7c Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh(a)google.com>
Date: Tue, 14 Nov 2017 01:03:44 +0100
Subject: mm/pagewalk.c: report holes in hugetlb ranges
From: Jann Horn <jannh(a)google.com>
commit 373c4557d2aa362702c4c2d41288fb1e54990b7c upstream.
This matters at least for the mincore syscall, which will otherwise copy
uninitialized memory from the page allocator to userspace. It is
probably also a correctness error for /proc/$pid/pagemap, but I haven't
tested that.
Removing the `walk->hugetlb_entry` condition in walk_hugetlb_range() has
no effect because the caller already checks for that.
This only reports holes in hugetlb ranges to callers who have specified
a hugetlb_entry callback.
This issue was found using an AFL-based fuzzer.
v2:
- don't crash on ->pte_hole==NULL (Andrew Morton)
- add Cc stable (Andrew Morton)
Changed for 4.4/4.9 stable backport:
- fix up conflict in the huge_pte_offset() call
Fixes: 1e25a271c8ac ("mincore: apply page table walker on do_mincore()")
Signed-off-by: Jann Horn <jannh(a)google.com>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/pagewalk.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -142,8 +142,12 @@ static int walk_hugetlb_range(unsigned l
do {
next = hugetlb_entry_end(h, addr, end);
pte = huge_pte_offset(walk->mm, addr & hmask);
- if (pte && walk->hugetlb_entry)
+
+ if (pte)
err = walk->hugetlb_entry(pte, hmask, addr, next, walk);
+ else if (walk->pte_hole)
+ err = walk->pte_hole(addr, next, walk);
+
if (err)
break;
} while (addr = next, addr != end);
Patches currently in stable-queue which might be from jannh(a)google.com are
queue-4.4/mm-pagewalk.c-report-holes-in-hugetlb-ranges.patch
On Wed 22-11-17 09:43:46, Zi Yan wrote:
>
>
> Michal Hocko wrote:
> > On Wed 22-11-17 09:54:16, Michal Hocko wrote:
> >> On Mon 20-11-17 21:18:55, Zi Yan wrote:
> > [...]
> >>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> >>> index 895ec0c4942e..a2246cf670ba 100644
> >>> --- a/include/linux/migrate.h
> >>> +++ b/include/linux/migrate.h
> >>> @@ -54,7 +54,7 @@ static inline struct page *new_page_nodemask(struct page *page,
> >>> new_page = __alloc_pages_nodemask(gfp_mask, order,
> >>> preferred_nid, nodemask);
> >>>
> >>> - if (new_page && PageTransHuge(page))
> >>> + if (new_page && PageTransHuge(new_page))
> >>> prep_transhuge_page(new_page);
> >> I would keep the two checks consistent. But that leads to a more
> >> interesting question. new_page_nodemask does
> >>
> >> if (thp_migration_supported() && PageTransHuge(page)) {
> >> order = HPAGE_PMD_ORDER;
> >> gfp_mask |= GFP_TRANSHUGE;
> >> }
> >
> > And one more question/note. Why do we need thp_migration_supported
> > in the first place? 9c670ea37947 ("mm: thp: introduce
> > CONFIG_ARCH_ENABLE_THP_MIGRATION") says
> > : Introduce CONFIG_ARCH_ENABLE_THP_MIGRATION to limit thp migration
> > : functionality to x86_64, which should be safer at the first step.
> >
> > but why is unsafe to enable the feature on other arches which support
> > THP? Is there any plan to do the next step and remove this config
> > option?
>
> Because different architectures have their own way of specifying a swap
> entry. This means, to support THP migration, each architecture needs to
> add its own __pmd_to_swp_entry() and __swp_entry_to_pmd(), which are
> used for arch-independent pmd_to_swp_entry() and swp_entry_to_pmd().
I understand that part. But this smells like a matter of coding, no?
I was suprised to see the note about safety which didn't make much sense
to me.
--
Michal Hocko
SUSE Labs
This is a note to let you know that I've just added the patch titled
ipmi: Prefer ACPI system interfaces over SMBIOS ones
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ipmi-prefer-acpi-system-interfaces-over-smbios-ones.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 7e030d6dff713250c7dcfb543cad2addaf479b0e Mon Sep 17 00:00:00 2001
From: Corey Minyard <cminyard(a)mvista.com>
Date: Fri, 8 Sep 2017 14:05:58 -0500
Subject: ipmi: Prefer ACPI system interfaces over SMBIOS ones
From: Corey Minyard <cminyard(a)mvista.com>
commit 7e030d6dff713250c7dcfb543cad2addaf479b0e upstream.
The recent changes to add SMBIOS (DMI) IPMI interfaces as platform
devices caused DMI to be selected before ACPI, causing ACPI type
of operations to not work.
Signed-off-by: Corey Minyard <cminyard(a)mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/char/ipmi/ipmi_si_intf.c | 33 +++++++++++++++++++++++----------
1 file changed, 23 insertions(+), 10 deletions(-)
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -3424,7 +3424,7 @@ static inline void wait_for_timer_and_th
del_timer_sync(&smi_info->si_timer);
}
-static int is_new_interface(struct smi_info *info)
+static struct smi_info *find_dup_si(struct smi_info *info)
{
struct smi_info *e;
@@ -3439,24 +3439,36 @@ static int is_new_interface(struct smi_i
*/
if (info->slave_addr && !e->slave_addr)
e->slave_addr = info->slave_addr;
- return 0;
+ return e;
}
}
- return 1;
+ return NULL;
}
static int add_smi(struct smi_info *new_smi)
{
int rv = 0;
+ struct smi_info *dup;
mutex_lock(&smi_infos_lock);
- if (!is_new_interface(new_smi)) {
- pr_info(PFX "%s-specified %s state machine: duplicate\n",
- ipmi_addr_src_to_str(new_smi->addr_source),
- si_to_str[new_smi->si_type]);
- rv = -EBUSY;
- goto out_err;
+ dup = find_dup_si(new_smi);
+ if (dup) {
+ if (new_smi->addr_source == SI_ACPI &&
+ dup->addr_source == SI_SMBIOS) {
+ /* We prefer ACPI over SMBIOS. */
+ dev_info(dup->dev,
+ "Removing SMBIOS-specified %s state machine in favor of ACPI\n",
+ si_to_str[new_smi->si_type]);
+ cleanup_one_si(dup);
+ } else {
+ dev_info(new_smi->dev,
+ "%s-specified %s state machine: duplicate\n",
+ ipmi_addr_src_to_str(new_smi->addr_source),
+ si_to_str[new_smi->si_type]);
+ rv = -EBUSY;
+ goto out_err;
+ }
}
pr_info(PFX "Adding %s-specified %s state machine\n",
@@ -3865,7 +3877,8 @@ static void cleanup_one_si(struct smi_in
poll(to_clean);
schedule_timeout_uninterruptible(1);
}
- disable_si_irq(to_clean, false);
+ if (to_clean->handlers)
+ disable_si_irq(to_clean, false);
while (to_clean->curr_msg || (to_clean->si_state != SI_NORMAL)) {
poll(to_clean);
schedule_timeout_uninterruptible(1);
Patches currently in stable-queue which might be from cminyard(a)mvista.com are
queue-4.14/ipmi-fix-unsigned-long-underflow.patch
queue-4.14/ipmi-prefer-acpi-system-interfaces-over-smbios-ones.patch
I'm requesting a backport of
7e030d6dff713250c7dcfb543cad2addaf479b0e ipmi: Prefer ACPI system
interfaces over SMBIOS ones
to the 4.14 stable kernel tree only. This was already staged for
Linus in my public tree before Andrew noticed an issue that this
patch fixes, where the system can oops when the ipmi_si module
is removed. Since it fixes an oops that likely can affect people, I'd
like it to be added to the stable tree,
Since this bug was added in 4.13 by:
0944d889a237b6107f9ceeee053fe7221cdd1089 ipmi: Convert DMI handling
over to a platform device
it should only require a backport to 4.14.
Thank you,
-corey
On 11/21/2017 09:14 PM, Jens Axboe wrote:
> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>
>>
>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>>> Bisect points to
>>>>>>>>>>
>>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>>> Author: Christoph Hellwig <hch(a)lst.de>
>>>>>>>>>> Date: Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>>
>>>>>>>>>> blk-mq: Create hctx for each present CPU
>>>>>>>>>>
>>>>>>>>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>>
>>>>>>>>>> Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>> of churn due to frequent soft offline / online operations. Instead
>>>>>>>>>> allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>> the code.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Christoph Hellwig <hch(a)lst.de>
>>>>>>>>>> Reviewed-by: Jens Axboe <axboe(a)kernel.dk>
>>>>>>>>>> Cc: Keith Busch <keith.busch(a)intel.com>
>>>>>>>>>> Cc: linux-block(a)vger.kernel.org
>>>>>>>>>> Cc: linux-nvme(a)lists.infradead.org
>>>>>>>>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
>>>>>>>>>> Cc: Oleksandr Natalenko <oleksandr(a)natalenko.name>
>>>>>>>>>> Cc: Mike Galbraith <efault(a)gmx.de>
>>>>>>>>>> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
>>>>>>>>>
>>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>>> take a look.
>>>>>>>>
>>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>>
>>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>>>> for a dead cpu and handle that.
>>>>>>>
>>>>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>>>>> not available CPU.
>>>>>>>
>>>>>>> in libvirt/virsh speak:
>>>>>>> <vcpu placement='static' current='1'>4</vcpu>
>>>>>>
>>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>>> but becomes present and online afterwards.
>>>>>>
>>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>>
>>>>>> I'll see if I can come up with an appropriate fix.
>>>>>
>>>>> Can you try the below?
>>>>
>>>>
>>>> It does prevent the crash but it seems that the new CPU is not "used " after the hotplug for mq:
>>>>
>>>>
>>>> output with 2 cpus:
>>>> /sys/kernel/debug/block/vda
>>>> /sys/kernel/debug/block/vda/hctx0
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>>> /sys/kernel/debug/block/vda/hctx0/active
>>>> /sys/kernel/debug/block/vda/hctx0/run
>>>> /sys/kernel/debug/block/vda/hctx0/queued
>>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>>> /sys/kernel/debug/block/vda/hctx0/tags
>>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>>> /sys/kernel/debug/block/vda/hctx0/busy
>>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>>> /sys/kernel/debug/block/vda/hctx0/flags
>>>> /sys/kernel/debug/block/vda/hctx0/state
>>>> /sys/kernel/debug/block/vda/sched
>>>> /sys/kernel/debug/block/vda/sched/dispatch
>>>> /sys/kernel/debug/block/vda/sched/starved
>>>> /sys/kernel/debug/block/vda/sched/batching
>>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>>> /sys/kernel/debug/block/vda/write_hints
>>>> /sys/kernel/debug/block/vda/state
>>>> /sys/kernel/debug/block/vda/requeue_list
>>>> /sys/kernel/debug/block/vda/poll_stat
>>>
>>> Try this, basically just a revert.
>>
>> Yes, seems to work.
>>
>> Tested-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
>
> Great, thanks for testing.
>
>> Do you know why the original commit made it into 4.12 stable? After all
>> it has no Fixes tag and no cc stable-
>
> I was wondering the same thing when you said it was in 4.12.stable and
> not in 4.12 release. That patch should absolutely not have gone into
> stable, it's not marked as such and it's not fixing a problem that is
> stable worthy. In fact, it's causing a regression...
>
> Greg? Upstream commit is mentioned higher up, start of the email.
>
Forgot to cc Greg?
Hi all,
Since I'm going slightly off-topic, I've tweaked the subject line and
trimmed some of the conversation.
I believe everyone in the CC list might be interested in the
following, yet feel free to adjust.
Above all, I'd kindly ask everyone to skim through and draw their conclusions.
If the ideas put forward have some value - great, if not - let my email rot.
On 17 November 2017 at 13:57, Greg KH <gregkh(a)linuxfoundation.org> wrote:
>>
>> I still have no idea how this autoselect picks up patches that do *not*
>> have cc: stable nor Fixes: from us. What information do you have that we
>> don't for making that call?
>
> I'll let Sasha describe how he's doing this, but in the end, does it
> really matter _how_ it is done, vs. the fact that it seems to at least
> one human reviewer that this is a patch that _should_ be included based
> on the changelog text and the code patch?
>
> By having this review process that Sasha is providing, he's saying
> "Here's a patch that I think might be good for stable, do you object?"
>
> If you do, great, no harm done, all is fine, the patch is dropped. If
> you don't object, just ignore the email and the patch gets merged.
>
> If you don't want any of this to happen for your subsystem at all, then
> also fine, just let us know and we will ignore it entirely.
>
Let me start with saying that I'm handling the releases for Mesa 3D -
the project providing OpenGL, Vulkan and many other userspace graphics
drivers.
I've been doing that for 3 years now, which admittedly is quite a
short time relative to the kernel.
There is a procedure quite similar to the kernel, with a few
differences - see below for details.
We also autoselect patches, hence my interest in the heuristics
applied for nominating patches ;-)
That aside, here are some things I've learned from my experience.
Some of those may not be applicable - hope you'll find them useful:
- Try to reference developers to existing documentation/procedure.
Was just reminded that even long standing developers can forget detail X or Y.
- CC developers for the important stuff - aka do not CC on each accepted patch.
Accepted patches are merged in pre-release branch and a email with
accepted/deferred/rejected list is sent.
Patches that had conflicts merging, and ones that are rejected have
their author in the CC list.
Rejected patches have brief description + developers are contacted beforehand.
- Autoselect patches are merged only with the approval from the
respective developers.
IMHO this engages developers into the process, without distracting
them too much.
It is by no means a perfect system - input and changes are always appreciated.
That said, here are some suggestions which should make autosel smoother:
- Document the autoselect process
Information about about What, Why, and [ideally] How - analogous to
the normal stable nominations.
Insert reference to the process in the patch notification email.
- Make the autoselect nominations _more_ distinct than the normal stable ones.
Maintainers will want to put more cognitive effort into the patches.
HTH
Emil
This is a note to let you know that I've just added the patch titled
mm/page_ext.c: check if page_ext is not prepared
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-page_ext.c-check-if-page_ext-is-not-prepared.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From e492080e640c2d1235ddf3441cae634cfffef7e1 Mon Sep 17 00:00:00 2001
From: Jaewon Kim <jaewon31.kim(a)samsung.com>
Date: Wed, 15 Nov 2017 17:39:07 -0800
Subject: mm/page_ext.c: check if page_ext is not prepared
From: Jaewon Kim <jaewon31.kim(a)samsung.com>
commit e492080e640c2d1235ddf3441cae634cfffef7e1 upstream.
online_page_ext() and page_ext_init() allocate page_ext for each
section, but they do not allocate if the first PFN is !pfn_present(pfn)
or !pfn_valid(pfn). Then section->page_ext remains as NULL.
lookup_page_ext checks NULL only if CONFIG_DEBUG_VM is enabled. For a
valid PFN, __set_page_owner will try to get page_ext through
lookup_page_ext. Without CONFIG_DEBUG_VM lookup_page_ext will misuse
NULL pointer as value 0. This incurrs invalid address access.
This is the panic example when PFN 0x100000 is not valid but PFN
0x13FC00 is being used for page_ext. section->page_ext is NULL,
get_entry returned invalid page_ext address as 0x1DFA000 for a PFN
0x13FC00.
To avoid this panic, CONFIG_DEBUG_VM should be removed so that page_ext
will be checked at all times.
Unable to handle kernel paging request at virtual address 01dfa014
------------[ cut here ]------------
Kernel BUG at ffffff80082371e0 [verbose debug info unavailable]
Internal error: Oops: 96000045 [#1] PREEMPT SMP
Modules linked in:
PC is at __set_page_owner+0x48/0x78
LR is at __set_page_owner+0x44/0x78
__set_page_owner+0x48/0x78
get_page_from_freelist+0x880/0x8e8
__alloc_pages_nodemask+0x14c/0xc48
__do_page_cache_readahead+0xdc/0x264
filemap_fault+0x2ac/0x550
ext4_filemap_fault+0x3c/0x58
__do_fault+0x80/0x120
handle_mm_fault+0x704/0xbb0
do_page_fault+0x2e8/0x394
do_mem_abort+0x88/0x124
Pre-4.7 kernels also need commit f86e4271978b ("mm: check the return
value of lookup_page_ext for all call sites").
Link: http://lkml.kernel.org/r/20171107094131.14621-1-jaewon31.kim@samsung.com
Fixes: eefa864b701d ("mm/page_ext: resurrect struct page extending code for debugging")
Signed-off-by: Jaewon Kim <jaewon31.kim(a)samsung.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Joonsoo Kim <js1304(a)gmail.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/page_ext.c | 4 ----
1 file changed, 4 deletions(-)
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -106,7 +106,6 @@ struct page_ext *lookup_page_ext(struct
struct page_ext *base;
base = NODE_DATA(page_to_nid(page))->node_page_ext;
-#ifdef CONFIG_DEBUG_VM
/*
* The sanity checks the page allocator does upon freeing a
* page can reach here before the page_ext arrays are
@@ -115,7 +114,6 @@ struct page_ext *lookup_page_ext(struct
*/
if (unlikely(!base))
return NULL;
-#endif
offset = pfn - round_down(node_start_pfn(page_to_nid(page)),
MAX_ORDER_NR_PAGES);
return base + offset;
@@ -180,7 +178,6 @@ struct page_ext *lookup_page_ext(struct
{
unsigned long pfn = page_to_pfn(page);
struct mem_section *section = __pfn_to_section(pfn);
-#ifdef CONFIG_DEBUG_VM
/*
* The sanity checks the page allocator does upon freeing a
* page can reach here before the page_ext arrays are
@@ -189,7 +186,6 @@ struct page_ext *lookup_page_ext(struct
*/
if (!section->page_ext)
return NULL;
-#endif
return section->page_ext + pfn;
}
Patches currently in stable-queue which might be from jaewon31.kim(a)samsung.com are
queue-4.4/mm-page_ext.c-check-if-page_ext-is-not-prepared.patch
From: Zi Yan <zi.yan(a)cs.rutgers.edu>
In [1], Andrea reported that during memory hotplug/hot remove
prep_transhuge_page() is called incorrectly on non-THP pages for
migration, when THP is on but THP migration is not enabled.
This leads to a bad state of target pages for migration.
This patch fixes it by only calling prep_transhuge_page() when we are
certain that the target page is THP.
[1] https://lkml.org/lkml/2017/11/20/411
Cc: stable(a)vger.kernel.org # v4.14
Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration")
Reported-by: Andrea Reale <ar(a)linux.vnet.ibm.com>
Signed-off-by: Zi Yan <zi.yan(a)cs.rutgers.edu>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
---
include/linux/migrate.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 895ec0c4942e..a2246cf670ba 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -54,7 +54,7 @@ static inline struct page *new_page_nodemask(struct page *page,
new_page = __alloc_pages_nodemask(gfp_mask, order,
preferred_nid, nodemask);
- if (new_page && PageTransHuge(page))
+ if (new_page && PageTransHuge(new_page))
prep_transhuge_page(new_page);
return new_page;
--
2.14.2
On Wed 22-11-17 07:29:38, Zi Yan wrote:
> On 22 Nov 2017, at 7:13, Zi Yan wrote:
>
> > On 22 Nov 2017, at 5:14, Michal Hocko wrote:
> >
> >> On Wed 22-11-17 10:35:10, Michal Hocko wrote:
> >> [...]
> >>> Moreover I am not really sure this is really working properly. Just look
> >>> at the split_huge_page. It moves all the tail pages to the LRU list
> >>> while migrate_pages has a list of pages to migrate. So we will migrate
> >>> the head page and all the rest will get back to the LRU list. What
> >>> guarantees that they will get migrated as well.
> >>
> >> OK, so this is as I've expected. It doesn't work! Some pfn walker based
> >> migration will just skip tail pages see madvise_inject_error.
> >> __alloc_contig_migrate_range will simply fail on THP page see
> >> isolate_migratepages_block so we even do not try to migrate it.
> >> do_move_page_to_node_array will simply migrate head and do not care
> >> about tail pages. do_mbind splits the page and then fall back to pte
> >> walk when thp migration is not supported but it doesn't handle tail
> >> pages if the THP migration path is not able to allocate a fresh THP
> >> AFAICS. Memory hotplug should be safe because it doesn't skip the whole
> >> THP when doing pfn walk.
> >>
> >> Unless I am missing something here this looks like a huge mess to me.
> >
> > +Kirill
> >
> > First, I agree with you that splitting a THP and only migrating its head page
> > is a mess. But what you describe is also the behavior of migrate_page()
> > _before_ THP migration support is added. I thought that was intended.
> >
> > Look at http://elixir.free-electrons.com/linux/v4.13.15/source/mm/migrate.c#L1091,
> > unmap_and_move() splits THPs and only migrates the head page in v4.13 before THP
> > migration is added. I think the behavior was introduced since v4.5 (I just skimmed
> > v4.0 to v4.13 code and did not have time to use git blame), before that THPs are
> > not migrated but shown as successfully migrated (at least from v4.4’s code).
>
> Sorry, I misread v4.4’s code, it also does ‘splitting a THP and migrating its head page’.
> This behavior was there for a long time, at least since v3.0.
>
> The code in unmap_and_move() is:
>
> if (unlikely(PageTransHuge(page)))
> if (unlikely(split_huge_page(page)))
> goto out;
I _think_ that this all should be handled at migrate_pages layer. Try to
migrate THP and fallback to split_huge_page into to the list when it
fails. I haven't checked whether there is something which would prevent
that though. THP tricks in specific paths then should be removed.
--
Michal Hocko
SUSE Labs
I've made mistake during converting hugetlb code to 5-level paging:
in huge_pte_alloc() we have to use p4d_alloc(), not p4d_offset().
Otherwise it leads to crash -- NULL-pointer dereference in pud_alloc()
if p4d table is not yet allocated.
It only can happen in 5-level paging mode. In 4-level paging mode
p4d_offset() always returns pgd, so we are fine.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Fixes: c2febafc6773 ("mm: convert generic code to 5-level paging")
Cc: <stable(a)vger.kernel.org> # v4.11+
---
mm/hugetlb.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2d2ff5e8bf2b..94a4c0b63580 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4617,7 +4617,9 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
pte_t *pte = NULL;
pgd = pgd_offset(mm, addr);
- p4d = p4d_offset(pgd, addr);
+ p4d = p4d_alloc(mm, pgd, addr);
+ if (!p4d)
+ return NULL;
pud = pud_alloc(mm, p4d, addr);
if (pud) {
if (sz == PUD_SIZE) {
--
2.15.0
Hi,
Is there a way to get notifications about patches queued up for the
-rc1 commits in linux-stable-rc.git? I currently use the push that is
associated with email to this list from Greg. But that only gives 48h
window to do the building and testing. If there is a way to get the
patches earlier, that would give more time for testing activities. Any
hints are much appreciated.
Best Regards,
Milosz
On Wed 22-11-17 04:18:35, Zi Yan wrote:
> On 22 Nov 2017, at 3:54, Michal Hocko wrote:
[...]
> > I would keep the two checks consistent. But that leads to a more
> > interesting question. new_page_nodemask does
> >
> > if (thp_migration_supported() && PageTransHuge(page)) {
> > order = HPAGE_PMD_ORDER;
> > gfp_mask |= GFP_TRANSHUGE;
> > }
> >
> > How come it is safe to allocate an order-0 page if
> > !thp_migration_supported() when we are about to migrate THP? This
> > doesn't make any sense to me. Are we working around this somewhere else?
> > Why shouldn't we simply return NULL here?
>
> If !thp_migration_supported(), we will first split a THP and migrate
> its head page. This process is done in unmap_and_move() after
> get_new_page() (the function pointer to this new_page_nodemask()) is
> called. The situation can be PageTransHuge(page) is true here, but the
> page is split in unmap_and_move(), so we want to return a order-0 page
> here.
This deserves a big fat comment in the code because this is not clear
from the code!
> I think the confusion comes from that there is no guarantee of THP
> allocation when we are doing THP migration. If we can allocate a THP
> during THP migration, we are good. Otherwise, we want to fallback to
> the old way, splitting the original THP and migrating the head page,
> to preserve the original code behavior.
I understand that but that should be done explicitly rather than relying
on two functions doing the right thing because this is just too fragile.
Moreover I am not really sure this is really working properly. Just look
at the split_huge_page. It moves all the tail pages to the LRU list
while migrate_pages has a list of pages to migrate. So we will migrate
the head page and all the rest will get back to the LRU list. What
guarantees that they will get migrated as well.
This all looks like a mess!
--
Michal Hocko
SUSE Labs
On Tue 21 Nov 2017, 17:35, Zi Yan wrote:
> On 21 Nov 2017, at 17:12, Andrew Morton wrote:
>
> > On Mon, 20 Nov 2017 21:18:55 -0500 Zi Yan <zi.yan(a)sent.com> wrote:
> >
> >> This patch fixes it by only calling prep_transhuge_page() when we are
> >> certain that the target page is THP.
> >
> > What are the user-visible effects of the bug?
>
> By inspecting the code, if called on a non-THP, prep_transhuge_page() will
> 1) change the value of the mapping of (page + 2), since it is used for THP deferred list;
> 2) change the lru value of (page + 1), since it is used for THP’s dtor.
>
> Both can lead to data corruption of these two pages.
Pragmatically and from the point of view of the memory_hotplug subsys,
the effect is a kernel crash when pages are being migrated during a memory
hot remove offline and migration target pages are found in a bad state.
Best,
Andrea
This is a note to let you know that I've just added the patch titled
coda: fix 'kernel memory exposure attempt' in fsync
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d337b66a4c52c7b04eec661d86c2ef6e168965a2 Mon Sep 17 00:00:00 2001
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
Date: Wed, 27 Sep 2017 15:52:12 -0400
Subject: coda: fix 'kernel memory exposure attempt' in fsync
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
commit d337b66a4c52c7b04eec661d86c2ef6e168965a2 upstream.
When an application called fsync on a file in Coda a small request with
just the file identifier was allocated, but the declared length was set
to the size of union of all possible upcall requests.
This bug has been around for a very long time and is now caught by the
extra checking in usercopy that was introduced in Linux-4.8.
The exposure happens when the Coda cache manager process reads the fsync
upcall request at which point it is killed. As a result there is nobody
servicing any further upcalls, trapping any processes that try to access
the mounted Coda filesystem.
Signed-off-by: Jan Harkes <jaharkes(a)cs.cmu.edu>
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/coda/upcall.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/coda/upcall.c
+++ b/fs/coda/upcall.c
@@ -446,8 +446,7 @@ int venus_fsync(struct super_block *sb,
UPARG(CODA_FSYNC);
inp->coda_fsync.VFid = *fid;
- error = coda_upcall(coda_vcp(sb), sizeof(union inputArgs),
- &outsize, inp);
+ error = coda_upcall(coda_vcp(sb), insize, &outsize, inp);
CODA_FREE(inp, insize);
return error;
Patches currently in stable-queue which might be from jaharkes(a)cs.cmu.edu are
queue-4.9/coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
This is a note to let you know that I've just added the patch titled
coda: fix 'kernel memory exposure attempt' in fsync
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d337b66a4c52c7b04eec661d86c2ef6e168965a2 Mon Sep 17 00:00:00 2001
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
Date: Wed, 27 Sep 2017 15:52:12 -0400
Subject: coda: fix 'kernel memory exposure attempt' in fsync
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
commit d337b66a4c52c7b04eec661d86c2ef6e168965a2 upstream.
When an application called fsync on a file in Coda a small request with
just the file identifier was allocated, but the declared length was set
to the size of union of all possible upcall requests.
This bug has been around for a very long time and is now caught by the
extra checking in usercopy that was introduced in Linux-4.8.
The exposure happens when the Coda cache manager process reads the fsync
upcall request at which point it is killed. As a result there is nobody
servicing any further upcalls, trapping any processes that try to access
the mounted Coda filesystem.
Signed-off-by: Jan Harkes <jaharkes(a)cs.cmu.edu>
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/coda/upcall.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/coda/upcall.c
+++ b/fs/coda/upcall.c
@@ -446,8 +446,7 @@ int venus_fsync(struct super_block *sb,
UPARG(CODA_FSYNC);
inp->coda_fsync.VFid = *fid;
- error = coda_upcall(coda_vcp(sb), sizeof(union inputArgs),
- &outsize, inp);
+ error = coda_upcall(coda_vcp(sb), insize, &outsize, inp);
CODA_FREE(inp, insize);
return error;
Patches currently in stable-queue which might be from jaharkes(a)cs.cmu.edu are
queue-4.4/coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
This is a note to let you know that I've just added the patch titled
coda: fix 'kernel memory exposure attempt' in fsync
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d337b66a4c52c7b04eec661d86c2ef6e168965a2 Mon Sep 17 00:00:00 2001
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
Date: Wed, 27 Sep 2017 15:52:12 -0400
Subject: coda: fix 'kernel memory exposure attempt' in fsync
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
commit d337b66a4c52c7b04eec661d86c2ef6e168965a2 upstream.
When an application called fsync on a file in Coda a small request with
just the file identifier was allocated, but the declared length was set
to the size of union of all possible upcall requests.
This bug has been around for a very long time and is now caught by the
extra checking in usercopy that was introduced in Linux-4.8.
The exposure happens when the Coda cache manager process reads the fsync
upcall request at which point it is killed. As a result there is nobody
servicing any further upcalls, trapping any processes that try to access
the mounted Coda filesystem.
Signed-off-by: Jan Harkes <jaharkes(a)cs.cmu.edu>
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/coda/upcall.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/coda/upcall.c
+++ b/fs/coda/upcall.c
@@ -447,8 +447,7 @@ int venus_fsync(struct super_block *sb,
UPARG(CODA_FSYNC);
inp->coda_fsync.VFid = *fid;
- error = coda_upcall(coda_vcp(sb), sizeof(union inputArgs),
- &outsize, inp);
+ error = coda_upcall(coda_vcp(sb), insize, &outsize, inp);
CODA_FREE(inp, insize);
return error;
Patches currently in stable-queue which might be from jaharkes(a)cs.cmu.edu are
queue-4.14/coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
This is a note to let you know that I've just added the patch titled
coda: fix 'kernel memory exposure attempt' in fsync
to the 4.13-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
and it can be found in the queue-4.13 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d337b66a4c52c7b04eec661d86c2ef6e168965a2 Mon Sep 17 00:00:00 2001
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
Date: Wed, 27 Sep 2017 15:52:12 -0400
Subject: coda: fix 'kernel memory exposure attempt' in fsync
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
commit d337b66a4c52c7b04eec661d86c2ef6e168965a2 upstream.
When an application called fsync on a file in Coda a small request with
just the file identifier was allocated, but the declared length was set
to the size of union of all possible upcall requests.
This bug has been around for a very long time and is now caught by the
extra checking in usercopy that was introduced in Linux-4.8.
The exposure happens when the Coda cache manager process reads the fsync
upcall request at which point it is killed. As a result there is nobody
servicing any further upcalls, trapping any processes that try to access
the mounted Coda filesystem.
Signed-off-by: Jan Harkes <jaharkes(a)cs.cmu.edu>
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/coda/upcall.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/coda/upcall.c
+++ b/fs/coda/upcall.c
@@ -446,8 +446,7 @@ int venus_fsync(struct super_block *sb,
UPARG(CODA_FSYNC);
inp->coda_fsync.VFid = *fid;
- error = coda_upcall(coda_vcp(sb), sizeof(union inputArgs),
- &outsize, inp);
+ error = coda_upcall(coda_vcp(sb), insize, &outsize, inp);
CODA_FREE(inp, insize);
return error;
Patches currently in stable-queue which might be from jaharkes(a)cs.cmu.edu are
queue-4.13/coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
This is a note to let you know that I've just added the patch titled
coda: fix 'kernel memory exposure attempt' in fsync
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d337b66a4c52c7b04eec661d86c2ef6e168965a2 Mon Sep 17 00:00:00 2001
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
Date: Wed, 27 Sep 2017 15:52:12 -0400
Subject: coda: fix 'kernel memory exposure attempt' in fsync
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
commit d337b66a4c52c7b04eec661d86c2ef6e168965a2 upstream.
When an application called fsync on a file in Coda a small request with
just the file identifier was allocated, but the declared length was set
to the size of union of all possible upcall requests.
This bug has been around for a very long time and is now caught by the
extra checking in usercopy that was introduced in Linux-4.8.
The exposure happens when the Coda cache manager process reads the fsync
upcall request at which point it is killed. As a result there is nobody
servicing any further upcalls, trapping any processes that try to access
the mounted Coda filesystem.
Signed-off-by: Jan Harkes <jaharkes(a)cs.cmu.edu>
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/coda/upcall.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/coda/upcall.c
+++ b/fs/coda/upcall.c
@@ -446,8 +446,7 @@ int venus_fsync(struct super_block *sb,
UPARG(CODA_FSYNC);
inp->coda_fsync.VFid = *fid;
- error = coda_upcall(coda_vcp(sb), sizeof(union inputArgs),
- &outsize, inp);
+ error = coda_upcall(coda_vcp(sb), insize, &outsize, inp);
CODA_FREE(inp, insize);
return error;
Patches currently in stable-queue which might be from jaharkes(a)cs.cmu.edu are
queue-3.18/coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
The patch below does not apply to the 4.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e9a6effa500526e2a19d5ad042cb758b55b1ef93 Mon Sep 17 00:00:00 2001
From: Huang Ying <huang.ying.caritas(a)gmail.com>
Date: Wed, 15 Nov 2017 17:33:15 -0800
Subject: [PATCH] mm, swap: fix false error message in __swp_swapcount()
When a page fault occurs for a swap entry, the physical swap readahead
(not the VMA base swap readahead) may readahead several swap entries
after the fault swap entry. The readahead algorithm calculates some of
the swap entries to readahead via increasing the offset of the fault
swap entry without checking whether they are beyond the end of the swap
device and it relys on the __swp_swapcount() and swapcache_prepare() to
check it. Although __swp_swapcount() checks for the swap entry passed
in, it will complain with the error message as follow for the expected
invalid swap entry. This may make the end users confused.
swap_info_get: Bad swap offset entry 0200f8a7
To fix the false error message, the swap entry checking is added in
swapin_readahead() to avoid to pass the out-of-bound swap entries and
the swap entry reserved for the swap header to __swp_swapcount() and
swapcache_prepare().
Link: http://lkml.kernel.org/r/20171102054225.22897-1-ying.huang@intel.com
Fixes: e8c26ab60598 ("mm/swap: skip readahead for unreferenced swap slots")
Signed-off-by: "Huang, Ying" <ying.huang(a)intel.com>
Reported-by: Christian Kujau <lists(a)nerdbynature.de>
Acked-by: Minchan Kim <minchan(a)kernel.org>
Suggested-by: Minchan Kim <minchan(a)kernel.org>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 326439428daf..f2face8b889e 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -559,6 +559,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
unsigned long offset = entry_offset;
unsigned long start_offset, end_offset;
unsigned long mask;
+ struct swap_info_struct *si = swp_swap_info(entry);
struct blk_plug plug;
bool do_poll = true, page_allocated;
@@ -572,6 +573,8 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
end_offset = offset | mask;
if (!start_offset) /* First page is swap header. */
start_offset++;
+ if (end_offset >= si->max)
+ end_offset = si->max - 1;
blk_start_plug(&plug);
for (offset = start_offset; offset <= end_offset ; offset++) {
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e9a6effa500526e2a19d5ad042cb758b55b1ef93 Mon Sep 17 00:00:00 2001
From: Huang Ying <huang.ying.caritas(a)gmail.com>
Date: Wed, 15 Nov 2017 17:33:15 -0800
Subject: [PATCH] mm, swap: fix false error message in __swp_swapcount()
When a page fault occurs for a swap entry, the physical swap readahead
(not the VMA base swap readahead) may readahead several swap entries
after the fault swap entry. The readahead algorithm calculates some of
the swap entries to readahead via increasing the offset of the fault
swap entry without checking whether they are beyond the end of the swap
device and it relys on the __swp_swapcount() and swapcache_prepare() to
check it. Although __swp_swapcount() checks for the swap entry passed
in, it will complain with the error message as follow for the expected
invalid swap entry. This may make the end users confused.
swap_info_get: Bad swap offset entry 0200f8a7
To fix the false error message, the swap entry checking is added in
swapin_readahead() to avoid to pass the out-of-bound swap entries and
the swap entry reserved for the swap header to __swp_swapcount() and
swapcache_prepare().
Link: http://lkml.kernel.org/r/20171102054225.22897-1-ying.huang@intel.com
Fixes: e8c26ab60598 ("mm/swap: skip readahead for unreferenced swap slots")
Signed-off-by: "Huang, Ying" <ying.huang(a)intel.com>
Reported-by: Christian Kujau <lists(a)nerdbynature.de>
Acked-by: Minchan Kim <minchan(a)kernel.org>
Suggested-by: Minchan Kim <minchan(a)kernel.org>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 326439428daf..f2face8b889e 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -559,6 +559,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
unsigned long offset = entry_offset;
unsigned long start_offset, end_offset;
unsigned long mask;
+ struct swap_info_struct *si = swp_swap_info(entry);
struct blk_plug plug;
bool do_poll = true, page_allocated;
@@ -572,6 +573,8 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
end_offset = offset | mask;
if (!start_offset) /* First page is swap header. */
start_offset++;
+ if (end_offset >= si->max)
+ end_offset = si->max - 1;
blk_start_plug(&plug);
for (offset = start_offset; offset <= end_offset ; offset++) {
This is a note to let you know that I've just added the patch titled
mm, swap: fix false error message in __swp_swapcount()
to the 4.13-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-swap-fix-false-error-message-in-__swp_swapcount.patch
and it can be found in the queue-4.13 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From e9a6effa500526e2a19d5ad042cb758b55b1ef93 Mon Sep 17 00:00:00 2001
From: Huang Ying <huang.ying.caritas(a)gmail.com>
Date: Wed, 15 Nov 2017 17:33:15 -0800
Subject: mm, swap: fix false error message in __swp_swapcount()
From: Huang Ying <huang.ying.caritas(a)gmail.com>
commit e9a6effa500526e2a19d5ad042cb758b55b1ef93 upstream.
When a page fault occurs for a swap entry, the physical swap readahead
(not the VMA base swap readahead) may readahead several swap entries
after the fault swap entry. The readahead algorithm calculates some of
the swap entries to readahead via increasing the offset of the fault
swap entry without checking whether they are beyond the end of the swap
device and it relys on the __swp_swapcount() and swapcache_prepare() to
check it. Although __swp_swapcount() checks for the swap entry passed
in, it will complain with the error message as follow for the expected
invalid swap entry. This may make the end users confused.
swap_info_get: Bad swap offset entry 0200f8a7
To fix the false error message, the swap entry checking is added in
swapin_readahead() to avoid to pass the out-of-bound swap entries and
the swap entry reserved for the swap header to __swp_swapcount() and
swapcache_prepare().
Link: http://lkml.kernel.org/r/20171102054225.22897-1-ying.huang@intel.com
Fixes: e8c26ab60598 ("mm/swap: skip readahead for unreferenced swap slots")
Signed-off-by: "Huang, Ying" <ying.huang(a)intel.com>
Reported-by: Christian Kujau <lists(a)nerdbynature.de>
Acked-by: Minchan Kim <minchan(a)kernel.org>
Suggested-by: Minchan Kim <minchan(a)kernel.org>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/swap_state.c | 3 +++
1 file changed, 3 insertions(+)
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -506,6 +506,7 @@ struct page *swapin_readahead(swp_entry_
unsigned long offset = entry_offset;
unsigned long start_offset, end_offset;
unsigned long mask;
+ struct swap_info_struct *si = swp_swap_info(entry);
struct blk_plug plug;
bool do_poll = true;
@@ -519,6 +520,8 @@ struct page *swapin_readahead(swp_entry_
end_offset = offset | mask;
if (!start_offset) /* First page is swap header. */
start_offset++;
+ if (end_offset >= si->max)
+ end_offset = si->max - 1;
blk_start_plug(&plug);
for (offset = start_offset; offset <= end_offset ; offset++) {
Patches currently in stable-queue which might be from huang.ying.caritas(a)gmail.com are
queue-4.13/mm-swap-fix-false-error-message-in-__swp_swapcount.patch
This is a note to let you know that I've just added the patch titled
x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask
to the 4.13-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-cpu-amd-derive-l3-shared_cpu_map-from-cpu_llc_shared_mask.patch
and it can be found in the queue-4.13 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 2b83809a5e6d619a780876fcaf68cdc42b50d28c Mon Sep 17 00:00:00 2001
From: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
Date: Mon, 31 Jul 2017 10:51:59 +0200
Subject: x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask
From: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
commit 2b83809a5e6d619a780876fcaf68cdc42b50d28c upstream.
For systems with X86_FEATURE_TOPOEXT, current logic uses the APIC ID
to calculate shared_cpu_map. However, APIC IDs are not guaranteed to
be contiguous for cores across different L3s (e.g. family17h system
w/ downcore configuration). This breaks the logic, and results in an
incorrect L3 shared_cpu_map.
Instead, always use the previously calculated cpu_llc_shared_mask of
each CPU to derive the L3 shared_cpu_map.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/20170731085159.9455-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/cpu/intel_cacheinfo.c | 32 ++++++++++++++++++--------------
1 file changed, 18 insertions(+), 14 deletions(-)
--- a/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ b/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -811,7 +811,24 @@ static int __cache_amd_cpumap_setup(unsi
struct cacheinfo *this_leaf;
int i, sibling;
- if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
+ /*
+ * For L3, always use the pre-calculated cpu_llc_shared_mask
+ * to derive shared_cpu_map.
+ */
+ if (index == 3) {
+ for_each_cpu(i, cpu_llc_shared_mask(cpu)) {
+ this_cpu_ci = get_cpu_cacheinfo(i);
+ if (!this_cpu_ci->info_list)
+ continue;
+ this_leaf = this_cpu_ci->info_list + index;
+ for_each_cpu(sibling, cpu_llc_shared_mask(cpu)) {
+ if (!cpu_online(sibling))
+ continue;
+ cpumask_set_cpu(sibling,
+ &this_leaf->shared_cpu_map);
+ }
+ }
+ } else if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
unsigned int apicid, nshared, first, last;
this_leaf = this_cpu_ci->info_list + index;
@@ -837,19 +854,6 @@ static int __cache_amd_cpumap_setup(unsi
continue;
cpumask_set_cpu(sibling,
&this_leaf->shared_cpu_map);
- }
- }
- } else if (index == 3) {
- for_each_cpu(i, cpu_llc_shared_mask(cpu)) {
- this_cpu_ci = get_cpu_cacheinfo(i);
- if (!this_cpu_ci->info_list)
- continue;
- this_leaf = this_cpu_ci->info_list + index;
- for_each_cpu(sibling, cpu_llc_shared_mask(cpu)) {
- if (!cpu_online(sibling))
- continue;
- cpumask_set_cpu(sibling,
- &this_leaf->shared_cpu_map);
}
}
} else
Patches currently in stable-queue which might be from suravee.suthikulpanit(a)amd.com are
queue-4.13/x86-cpu-amd-derive-l3-shared_cpu_map-from-cpu_llc_shared_mask.patch