The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d58c1834bf0d218a0bc00f8fb44874551b21da84 Mon Sep 17 00:00:00 2001
From: Kaike Wan <kaike.wan(a)intel.com>
Date: Thu, 15 Aug 2019 15:20:33 -0400
Subject: [PATCH] IB/hfi1: Drop stale TID RDMA packets
In a congested fabric with adaptive routing enabled, traces show that
the sender could receive stale TID RDMA NAK packets that contain newer
KDETH PSNs and older Verbs PSNs. If not dropped, these packets could
cause the incorrect rewinding of the software flows and the incorrect
completion of TID RDMA WRITE requests, and eventually leading to memory
corruption and kernel crash.
The current code drops stale TID RDMA ACK/NAK packets solely based
on KDETH PSNs, which may lead to erroneous processing. This patch
fixes the issue by also checking the Verbs PSN. Addition checks are
added before rewinding the TID RDMA WRITE DATA packets.
Fixes: 9e93e967f7b4 ("IB/hfi1: Add a function to receive TID RDMA ACK packet")
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn(a)intel.com>
Signed-off-by: Kaike Wan <kaike.wan(a)intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro(a)intel.com>
Link: https://lore.kernel.org/r/20190815192033.105923.44192.stgit@awfm-01.aw.inte…
Signed-off-by: Doug Ledford <dledford(a)redhat.com>
diff --git a/drivers/infiniband/hw/hfi1/tid_rdma.c b/drivers/infiniband/hw/hfi1/tid_rdma.c
index 996fc298207e..94070144fef5 100644
--- a/drivers/infiniband/hw/hfi1/tid_rdma.c
+++ b/drivers/infiniband/hw/hfi1/tid_rdma.c
@@ -4509,7 +4509,7 @@ void hfi1_rc_rcv_tid_rdma_ack(struct hfi1_packet *packet)
struct rvt_swqe *wqe;
struct tid_rdma_request *req;
struct tid_rdma_flow *flow;
- u32 aeth, psn, req_psn, ack_psn, resync_psn, ack_kpsn;
+ u32 aeth, psn, req_psn, ack_psn, flpsn, resync_psn, ack_kpsn;
unsigned long flags;
u16 fidx;
@@ -4538,6 +4538,9 @@ void hfi1_rc_rcv_tid_rdma_ack(struct hfi1_packet *packet)
ack_kpsn--;
}
+ if (unlikely(qp->s_acked == qp->s_tail))
+ goto ack_op_err;
+
wqe = rvt_get_swqe_ptr(qp, qp->s_acked);
if (wqe->wr.opcode != IB_WR_TID_RDMA_WRITE)
@@ -4550,7 +4553,8 @@ void hfi1_rc_rcv_tid_rdma_ack(struct hfi1_packet *packet)
trace_hfi1_tid_flow_rcv_tid_ack(qp, req->acked_tail, flow);
/* Drop stale ACK/NAK */
- if (cmp_psn(psn, full_flow_psn(flow, flow->flow_state.spsn)) < 0)
+ if (cmp_psn(psn, full_flow_psn(flow, flow->flow_state.spsn)) < 0 ||
+ cmp_psn(req_psn, flow->flow_state.resp_ib_psn) < 0)
goto ack_op_err;
while (cmp_psn(ack_kpsn,
@@ -4712,7 +4716,12 @@ done:
switch ((aeth >> IB_AETH_CREDIT_SHIFT) &
IB_AETH_CREDIT_MASK) {
case 0: /* PSN sequence error */
+ if (!req->flows)
+ break;
flow = &req->flows[req->acked_tail];
+ flpsn = full_flow_psn(flow, flow->flow_state.lpsn);
+ if (cmp_psn(psn, flpsn) > 0)
+ break;
trace_hfi1_tid_flow_rcv_tid_ack(qp, req->acked_tail,
flow);
req->r_ack_psn = mask_psn(be32_to_cpu(ohdr->bth[2]));
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c49a0a80137c7ca7d6ced4c812c9e07a949f6f24 Mon Sep 17 00:00:00 2001
From: Tom Lendacky <thomas.lendacky(a)amd.com>
Date: Mon, 19 Aug 2019 15:52:35 +0000
Subject: [PATCH] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h
There have been reports of RDRAND issues after resuming from suspend on
some AMD family 15h and family 16h systems. This issue stems from a BIOS
not performing the proper steps during resume to ensure RDRAND continues
to function properly.
RDRAND support is indicated by CPUID Fn00000001_ECX[30]. This bit can be
reset by clearing MSR C001_1004[62]. Any software that checks for RDRAND
support using CPUID, including the kernel, will believe that RDRAND is
not supported.
Update the CPU initialization to clear the RDRAND CPUID bit for any family
15h and 16h processor that supports RDRAND. If it is known that the family
15h or family 16h system does not have an RDRAND resume issue or that the
system will not be placed in suspend, the "rdrand=force" kernel parameter
can be used to stop the clearing of the RDRAND CPUID bit.
Additionally, update the suspend and resume path to save and restore the
MSR C001_1004 value to ensure that the RDRAND CPUID setting remains in
place after resuming from suspend.
Note, that clearing the RDRAND CPUID bit does not prevent a processor
that normally supports the RDRAND instruction from executing it. So any
code that determined the support based on family and model won't #UD.
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Cc: Andrew Cooper <andrew.cooper3(a)citrix.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Chen Yu <yu.c.chen(a)intel.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: "linux-doc(a)vger.kernel.org" <linux-doc(a)vger.kernel.org>
Cc: "linux-pm(a)vger.kernel.org" <linux-pm(a)vger.kernel.org>
Cc: Nathan Chancellor <natechancellor(a)gmail.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Pavel Machek <pavel(a)ucw.cz>
Cc: "Rafael J. Wysocki" <rjw(a)rjwysocki.net>
Cc: <stable(a)vger.kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: "x86(a)kernel.org" <x86(a)kernel.org>
Link: https://lkml.kernel.org/r/7543af91666f491547bd86cebb1e17c66824ab9f.15662299…
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 47d981a86e2f..4c1971960afa 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4090,6 +4090,13 @@
Run specified binary instead of /init from the ramdisk,
used for early userspace startup. See initrd.
+ rdrand= [X86]
+ force - Override the decision by the kernel to hide the
+ advertisement of RDRAND support (this affects
+ certain AMD processors because of buggy BIOS
+ support, specifically around the suspend/resume
+ path).
+
rdt= [HW,X86,RDT]
Turn on/off individual RDT features. List is:
cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 6b4fc2788078..271d837d69a8 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -381,6 +381,7 @@
#define MSR_AMD64_PATCH_LEVEL 0x0000008b
#define MSR_AMD64_TSC_RATIO 0xc0000104
#define MSR_AMD64_NB_CFG 0xc001001f
+#define MSR_AMD64_CPUID_FN_1 0xc0011004
#define MSR_AMD64_PATCH_LOADER 0xc0010020
#define MSR_AMD64_OSVW_ID_LENGTH 0xc0010140
#define MSR_AMD64_OSVW_STATUS 0xc0010141
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 8d4e50428b68..68c363c341bf 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -804,6 +804,64 @@ static void init_amd_ln(struct cpuinfo_x86 *c)
msr_set_bit(MSR_AMD64_DE_CFG, 31);
}
+static bool rdrand_force;
+
+static int __init rdrand_cmdline(char *str)
+{
+ if (!str)
+ return -EINVAL;
+
+ if (!strcmp(str, "force"))
+ rdrand_force = true;
+ else
+ return -EINVAL;
+
+ return 0;
+}
+early_param("rdrand", rdrand_cmdline);
+
+static void clear_rdrand_cpuid_bit(struct cpuinfo_x86 *c)
+{
+ /*
+ * Saving of the MSR used to hide the RDRAND support during
+ * suspend/resume is done by arch/x86/power/cpu.c, which is
+ * dependent on CONFIG_PM_SLEEP.
+ */
+ if (!IS_ENABLED(CONFIG_PM_SLEEP))
+ return;
+
+ /*
+ * The nordrand option can clear X86_FEATURE_RDRAND, so check for
+ * RDRAND support using the CPUID function directly.
+ */
+ if (!(cpuid_ecx(1) & BIT(30)) || rdrand_force)
+ return;
+
+ msr_clear_bit(MSR_AMD64_CPUID_FN_1, 62);
+
+ /*
+ * Verify that the CPUID change has occurred in case the kernel is
+ * running virtualized and the hypervisor doesn't support the MSR.
+ */
+ if (cpuid_ecx(1) & BIT(30)) {
+ pr_info_once("BIOS may not properly restore RDRAND after suspend, but hypervisor does not support hiding RDRAND via CPUID.\n");
+ return;
+ }
+
+ clear_cpu_cap(c, X86_FEATURE_RDRAND);
+ pr_info_once("BIOS may not properly restore RDRAND after suspend, hiding RDRAND via CPUID. Use rdrand=force to reenable.\n");
+}
+
+static void init_amd_jg(struct cpuinfo_x86 *c)
+{
+ /*
+ * Some BIOS implementations do not restore proper RDRAND support
+ * across suspend and resume. Check on whether to hide the RDRAND
+ * instruction support via CPUID.
+ */
+ clear_rdrand_cpuid_bit(c);
+}
+
static void init_amd_bd(struct cpuinfo_x86 *c)
{
u64 value;
@@ -818,6 +876,13 @@ static void init_amd_bd(struct cpuinfo_x86 *c)
wrmsrl_safe(MSR_F15H_IC_CFG, value);
}
}
+
+ /*
+ * Some BIOS implementations do not restore proper RDRAND support
+ * across suspend and resume. Check on whether to hide the RDRAND
+ * instruction support via CPUID.
+ */
+ clear_rdrand_cpuid_bit(c);
}
static void init_amd_zn(struct cpuinfo_x86 *c)
@@ -860,6 +925,7 @@ static void init_amd(struct cpuinfo_x86 *c)
case 0x10: init_amd_gh(c); break;
case 0x12: init_amd_ln(c); break;
case 0x15: init_amd_bd(c); break;
+ case 0x16: init_amd_jg(c); break;
case 0x17: init_amd_zn(c); break;
}
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 24b079e94bc2..c9ef6a7a4a1a 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -12,6 +12,7 @@
#include <linux/smp.h>
#include <linux/perf_event.h>
#include <linux/tboot.h>
+#include <linux/dmi.h>
#include <asm/pgtable.h>
#include <asm/proto.h>
@@ -23,7 +24,7 @@
#include <asm/debugreg.h>
#include <asm/cpu.h>
#include <asm/mmu_context.h>
-#include <linux/dmi.h>
+#include <asm/cpu_device_id.h>
#ifdef CONFIG_X86_32
__visible unsigned long saved_context_ebx;
@@ -397,15 +398,14 @@ static int __init bsp_pm_check_init(void)
core_initcall(bsp_pm_check_init);
-static int msr_init_context(const u32 *msr_id, const int total_num)
+static int msr_build_context(const u32 *msr_id, const int num)
{
- int i = 0;
+ struct saved_msrs *saved_msrs = &saved_context.saved_msrs;
struct saved_msr *msr_array;
+ int total_num;
+ int i, j;
- if (saved_context.saved_msrs.array || saved_context.saved_msrs.num > 0) {
- pr_err("x86/pm: MSR quirk already applied, please check your DMI match table.\n");
- return -EINVAL;
- }
+ total_num = saved_msrs->num + num;
msr_array = kmalloc_array(total_num, sizeof(struct saved_msr), GFP_KERNEL);
if (!msr_array) {
@@ -413,19 +413,30 @@ static int msr_init_context(const u32 *msr_id, const int total_num)
return -ENOMEM;
}
- for (i = 0; i < total_num; i++) {
- msr_array[i].info.msr_no = msr_id[i];
+ if (saved_msrs->array) {
+ /*
+ * Multiple callbacks can invoke this function, so copy any
+ * MSR save requests from previous invocations.
+ */
+ memcpy(msr_array, saved_msrs->array,
+ sizeof(struct saved_msr) * saved_msrs->num);
+
+ kfree(saved_msrs->array);
+ }
+
+ for (i = saved_msrs->num, j = 0; i < total_num; i++, j++) {
+ msr_array[i].info.msr_no = msr_id[j];
msr_array[i].valid = false;
msr_array[i].info.reg.q = 0;
}
- saved_context.saved_msrs.num = total_num;
- saved_context.saved_msrs.array = msr_array;
+ saved_msrs->num = total_num;
+ saved_msrs->array = msr_array;
return 0;
}
/*
- * The following section is a quirk framework for problematic BIOSen:
+ * The following sections are a quirk framework for problematic BIOSen:
* Sometimes MSRs are modified by the BIOSen after suspended to
* RAM, this might cause unexpected behavior after wakeup.
* Thus we save/restore these specified MSRs across suspend/resume
@@ -440,7 +451,7 @@ static int msr_initialize_bdw(const struct dmi_system_id *d)
u32 bdw_msr_id[] = { MSR_IA32_THERM_CONTROL };
pr_info("x86/pm: %s detected, MSR saving is needed during suspending.\n", d->ident);
- return msr_init_context(bdw_msr_id, ARRAY_SIZE(bdw_msr_id));
+ return msr_build_context(bdw_msr_id, ARRAY_SIZE(bdw_msr_id));
}
static const struct dmi_system_id msr_save_dmi_table[] = {
@@ -455,9 +466,58 @@ static const struct dmi_system_id msr_save_dmi_table[] = {
{}
};
+static int msr_save_cpuid_features(const struct x86_cpu_id *c)
+{
+ u32 cpuid_msr_id[] = {
+ MSR_AMD64_CPUID_FN_1,
+ };
+
+ pr_info("x86/pm: family %#hx cpu detected, MSR saving is needed during suspending.\n",
+ c->family);
+
+ return msr_build_context(cpuid_msr_id, ARRAY_SIZE(cpuid_msr_id));
+}
+
+static const struct x86_cpu_id msr_save_cpu_table[] = {
+ {
+ .vendor = X86_VENDOR_AMD,
+ .family = 0x15,
+ .model = X86_MODEL_ANY,
+ .feature = X86_FEATURE_ANY,
+ .driver_data = (kernel_ulong_t)msr_save_cpuid_features,
+ },
+ {
+ .vendor = X86_VENDOR_AMD,
+ .family = 0x16,
+ .model = X86_MODEL_ANY,
+ .feature = X86_FEATURE_ANY,
+ .driver_data = (kernel_ulong_t)msr_save_cpuid_features,
+ },
+ {}
+};
+
+typedef int (*pm_cpu_match_t)(const struct x86_cpu_id *);
+static int pm_cpu_check(const struct x86_cpu_id *c)
+{
+ const struct x86_cpu_id *m;
+ int ret = 0;
+
+ m = x86_match_cpu(msr_save_cpu_table);
+ if (m) {
+ pm_cpu_match_t fn;
+
+ fn = (pm_cpu_match_t)m->driver_data;
+ ret = fn(m);
+ }
+
+ return ret;
+}
+
static int pm_check_save_msr(void)
{
dmi_check_system(msr_save_dmi_table);
+ pm_cpu_check(msr_save_cpu_table);
+
return 0;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7c7cfdcf7f1777c7376fc9a239980de04b6b5ea1 Mon Sep 17 00:00:00 2001
From: Adrian Hunter <adrian.hunter(a)intel.com>
Date: Wed, 14 Aug 2019 15:59:50 +0300
Subject: [PATCH] scsi: ufs: Fix NULL pointer dereference in
ufshcd_config_vreg_hpm()
Fix the following BUG:
[ 187.065689] BUG: kernel NULL pointer dereference, address: 000000000000001c
[ 187.065790] RIP: 0010:ufshcd_vreg_set_hpm+0x3c/0x110 [ufshcd_core]
[ 187.065938] Call Trace:
[ 187.065959] ufshcd_resume+0x72/0x290 [ufshcd_core]
[ 187.065980] ufshcd_system_resume+0x54/0x140 [ufshcd_core]
[ 187.065993] ? pci_pm_restore+0xb0/0xb0
[ 187.066005] ufshcd_pci_resume+0x15/0x20 [ufshcd_pci]
[ 187.066017] pci_pm_thaw+0x4c/0x90
[ 187.066030] dpm_run_callback+0x5b/0x150
[ 187.066043] device_resume+0x11b/0x220
Voltage regulators are optional, so functions must check they exist
before dereferencing.
Note this issue is hidden if CONFIG_REGULATORS is not set, because the
offending code is optimised away.
Notes for stable:
The issue first appears in commit 57d104c153d3 ("ufs: add UFS power
management support") but is inadvertently fixed in commit 60f0187031c0
("scsi: ufs: disable vccq if it's not needed by UFS device") which in
turn was reverted by commit 730679817d83 ("Revert "scsi: ufs: disable vccq
if it's not needed by UFS device""). So fix applies v3.18 to v4.5 and
v5.1+
Fixes: 57d104c153d3 ("ufs: add UFS power management support")
Fixes: 730679817d83 ("Revert "scsi: ufs: disable vccq if it's not needed by UFS device"")
Cc: stable(a)vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter(a)intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index e274053109d0..029da74bb2f5 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -7062,6 +7062,9 @@ static inline int ufshcd_config_vreg_lpm(struct ufs_hba *hba,
static inline int ufshcd_config_vreg_hpm(struct ufs_hba *hba,
struct ufs_vreg *vreg)
{
+ if (!vreg)
+ return 0;
+
return ufshcd_config_vreg_load(hba->dev, vreg, vreg->max_uA);
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7c7cfdcf7f1777c7376fc9a239980de04b6b5ea1 Mon Sep 17 00:00:00 2001
From: Adrian Hunter <adrian.hunter(a)intel.com>
Date: Wed, 14 Aug 2019 15:59:50 +0300
Subject: [PATCH] scsi: ufs: Fix NULL pointer dereference in
ufshcd_config_vreg_hpm()
Fix the following BUG:
[ 187.065689] BUG: kernel NULL pointer dereference, address: 000000000000001c
[ 187.065790] RIP: 0010:ufshcd_vreg_set_hpm+0x3c/0x110 [ufshcd_core]
[ 187.065938] Call Trace:
[ 187.065959] ufshcd_resume+0x72/0x290 [ufshcd_core]
[ 187.065980] ufshcd_system_resume+0x54/0x140 [ufshcd_core]
[ 187.065993] ? pci_pm_restore+0xb0/0xb0
[ 187.066005] ufshcd_pci_resume+0x15/0x20 [ufshcd_pci]
[ 187.066017] pci_pm_thaw+0x4c/0x90
[ 187.066030] dpm_run_callback+0x5b/0x150
[ 187.066043] device_resume+0x11b/0x220
Voltage regulators are optional, so functions must check they exist
before dereferencing.
Note this issue is hidden if CONFIG_REGULATORS is not set, because the
offending code is optimised away.
Notes for stable:
The issue first appears in commit 57d104c153d3 ("ufs: add UFS power
management support") but is inadvertently fixed in commit 60f0187031c0
("scsi: ufs: disable vccq if it's not needed by UFS device") which in
turn was reverted by commit 730679817d83 ("Revert "scsi: ufs: disable vccq
if it's not needed by UFS device""). So fix applies v3.18 to v4.5 and
v5.1+
Fixes: 57d104c153d3 ("ufs: add UFS power management support")
Fixes: 730679817d83 ("Revert "scsi: ufs: disable vccq if it's not needed by UFS device"")
Cc: stable(a)vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter(a)intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index e274053109d0..029da74bb2f5 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -7062,6 +7062,9 @@ static inline int ufshcd_config_vreg_lpm(struct ufs_hba *hba,
static inline int ufshcd_config_vreg_hpm(struct ufs_hba *hba,
struct ufs_vreg *vreg)
{
+ if (!vreg)
+ return 0;
+
return ufshcd_config_vreg_load(hba->dev, vreg, vreg->max_uA);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7c7cfdcf7f1777c7376fc9a239980de04b6b5ea1 Mon Sep 17 00:00:00 2001
From: Adrian Hunter <adrian.hunter(a)intel.com>
Date: Wed, 14 Aug 2019 15:59:50 +0300
Subject: [PATCH] scsi: ufs: Fix NULL pointer dereference in
ufshcd_config_vreg_hpm()
Fix the following BUG:
[ 187.065689] BUG: kernel NULL pointer dereference, address: 000000000000001c
[ 187.065790] RIP: 0010:ufshcd_vreg_set_hpm+0x3c/0x110 [ufshcd_core]
[ 187.065938] Call Trace:
[ 187.065959] ufshcd_resume+0x72/0x290 [ufshcd_core]
[ 187.065980] ufshcd_system_resume+0x54/0x140 [ufshcd_core]
[ 187.065993] ? pci_pm_restore+0xb0/0xb0
[ 187.066005] ufshcd_pci_resume+0x15/0x20 [ufshcd_pci]
[ 187.066017] pci_pm_thaw+0x4c/0x90
[ 187.066030] dpm_run_callback+0x5b/0x150
[ 187.066043] device_resume+0x11b/0x220
Voltage regulators are optional, so functions must check they exist
before dereferencing.
Note this issue is hidden if CONFIG_REGULATORS is not set, because the
offending code is optimised away.
Notes for stable:
The issue first appears in commit 57d104c153d3 ("ufs: add UFS power
management support") but is inadvertently fixed in commit 60f0187031c0
("scsi: ufs: disable vccq if it's not needed by UFS device") which in
turn was reverted by commit 730679817d83 ("Revert "scsi: ufs: disable vccq
if it's not needed by UFS device""). So fix applies v3.18 to v4.5 and
v5.1+
Fixes: 57d104c153d3 ("ufs: add UFS power management support")
Fixes: 730679817d83 ("Revert "scsi: ufs: disable vccq if it's not needed by UFS device"")
Cc: stable(a)vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter(a)intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index e274053109d0..029da74bb2f5 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -7062,6 +7062,9 @@ static inline int ufshcd_config_vreg_lpm(struct ufs_hba *hba,
static inline int ufshcd_config_vreg_hpm(struct ufs_hba *hba,
struct ufs_vreg *vreg)
{
+ if (!vreg)
+ return 0;
+
return ufshcd_config_vreg_load(hba->dev, vreg, vreg->max_uA);
}
The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 9cdde85804833af77c6afbf7c53f0d959c42eb9f Mon Sep 17 00:00:00 2001
From: Hyungwoo Yang <hyungwoo.yang(a)intel.com>
Date: Wed, 29 May 2019 21:03:54 -0700
Subject: [PATCH] platform/chrome: cros_ec_ishtp: fix crash during suspend
Kernel crashes during suspend due to wrong conversion in
suspend and resume functions.
Use the proper helper to get ishtp_cl_device instance.
Cc: <stable(a)vger.kernel.org> # 5.2.x: b12bbdc5: HID: intel-ish-hid: fix wrong driver_data usage
Signed-off-by: Hyungwoo Yang <hyungwoo.yang(a)intel.com>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo(a)collabora.com>
diff --git a/drivers/platform/chrome/cros_ec_ishtp.c b/drivers/platform/chrome/cros_ec_ishtp.c
index e504d255d5ce..430731cdf827 100644
--- a/drivers/platform/chrome/cros_ec_ishtp.c
+++ b/drivers/platform/chrome/cros_ec_ishtp.c
@@ -707,7 +707,7 @@ static int cros_ec_ishtp_reset(struct ishtp_cl_device *cl_device)
*/
static int __maybe_unused cros_ec_ishtp_suspend(struct device *device)
{
- struct ishtp_cl_device *cl_device = dev_get_drvdata(device);
+ struct ishtp_cl_device *cl_device = ishtp_dev_to_cl_device(device);
struct ishtp_cl *cros_ish_cl = ishtp_get_drvdata(cl_device);
struct ishtp_cl_data *client_data = ishtp_get_client_data(cros_ish_cl);
@@ -722,7 +722,7 @@ static int __maybe_unused cros_ec_ishtp_suspend(struct device *device)
*/
static int __maybe_unused cros_ec_ishtp_resume(struct device *device)
{
- struct ishtp_cl_device *cl_device = dev_get_drvdata(device);
+ struct ishtp_cl_device *cl_device = ishtp_dev_to_cl_device(device);
struct ishtp_cl *cros_ish_cl = ishtp_get_drvdata(cl_device);
struct ishtp_cl_data *client_data = ishtp_get_client_data(cros_ish_cl);
Hello,
We ran automated tests on a patchset that was proposed for merging into this
kernel tree. The patches were applied to:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Commit: f7d5b3dc4792 - Linux 5.2.10
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
All kernel binaries, config files, and logs are available for download here:
https://artifacts.cki-project.org/pipelines/124342
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Merge testing
-------------
We cloned this repository and checked out the following commit:
Repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Commit: f7d5b3dc4792 - Linux 5.2.10
We grabbed the 4a6e874ddb67 commit of the stable queue repository.
We then merged the patchset with `git am`:
asoc-simple_card_utils.h-care-null-dai-at-asoc_simpl.patch
asoc-simple-card-fix-an-use-after-free-in-simple_dai.patch
asoc-simple-card-fix-an-use-after-free-in-simple_for.patch
asoc-audio-graph-card-fix-use-after-free-in-graph_da.patch
asoc-audio-graph-card-fix-an-use-after-free-in-graph.patch
asoc-audio-graph-card-add-missing-const-at-graph_get.patch
regulator-axp20x-fix-dcdca-and-dcdcd-for-axp806.patch
regulator-axp20x-fix-dcdc5-and-dcdc6-for-axp803.patch
asoc-samsung-odroid-fix-an-use-after-free-issue-for-.patch
asoc-samsung-odroid-fix-a-double-free-issue-for-cpu_.patch
asoc-intel-bytcht_es8316-add-quirk-for-irbis-nb41-ne.patch
hid-logitech-hidpp-add-usb-pid-for-a-few-more-suppor.patch
hid-add-044f-b320-thrustmaster-inc.-2-in-1-dt.patch
mips-kernel-only-use-i8253-clocksource-with-periodic.patch
mips-fix-cacheinfo.patch
libbpf-sanitize-var-to-conservative-1-byte-int.patch
netfilter-ebtables-fix-a-memory-leak-bug-in-compat.patch
asoc-dapm-fix-handling-of-custom_stop_condition-on-d.patch
asoc-sof-use-__u32-instead-of-uint32_t-in-uapi-heade.patch
spi-pxa2xx-balance-runtime-pm-enable-disable-on-erro.patch
bpf-sockmap-sock_map_delete-needs-to-use-xchg.patch
bpf-sockmap-synchronize_rcu-before-free-ing-map.patch
bpf-sockmap-only-create-entry-if-ulp-is-not-already-.patch
selftests-bpf-fix-sendmsg6_prog-on-s390.patch
asoc-dapm-fix-a-memory-leak-bug.patch
bonding-force-slave-speed-check-after-link-state-rec.patch
net-mvpp2-don-t-check-for-3-consecutive-idle-frames-.patch
selftests-forwarding-gre_multipath-enable-ipv4-forwa.patch
selftests-forwarding-gre_multipath-fix-flower-filter.patch
selftests-bpf-add-another-gso_segs-access.patch
libbpf-fix-using-uninitialized-ioctl-results.patch
can-dev-call-netif_carrier_off-in-register_candev.patch
can-mcp251x-add-error-check-when-wq-alloc-failed.patch
can-gw-fix-error-path-of-cgw_module_init.patch
asoc-fail-card-instantiation-if-dai-format-setup-fai.patch
staging-fbtft-fix-gpio-handling.patch
libbpf-silence-gcc8-warning-about-string-truncation.patch
st21nfca_connectivity_event_received-null-check-the-.patch
st_nci_hci_connectivity_event_received-null-check-th.patch
nl-mac-80211-fix-interface-combinations-on-crypto-co.patch
asoc-ti-davinci-mcasp-fix-clk-pdir-handling-for-i2s-.patch
asoc-rockchip-fix-mono-capture.patch
asoc-ti-davinci-mcasp-correct-slot_width-posed-const.patch
net-usb-qmi_wwan-add-the-broadmobi-bm818-card.patch
qed-rdma-fix-the-hw_ver-returned-in-device-attribute.patch
isdn-misdn-hfcsusb-fix-possible-null-pointer-derefer.patch
habanalabs-fix-f-w-download-in-be-architecture.patch
mac80211_hwsim-fix-possible-null-pointer-dereference.patch
net-stmmac-manage-errors-returned-by-of_get_mac_addr.patch
netfilter-ipset-actually-allow-destination-mac-addre.patch
netfilter-ipset-copy-the-right-mac-address-in-bitmap.patch
netfilter-ipset-fix-rename-concurrency-with-listing.patch
rxrpc-fix-potential-deadlock.patch
rxrpc-fix-the-lack-of-notification-when-sendmsg-fail.patch
nvmem-use-the-same-permissions-for-eeprom-as-for-nvm.patch
iwlwifi-mvm-avoid-races-in-rate-init-and-rate-perfor.patch
iwlwifi-dbg_ini-move-iwl_dbg_tlv_load_bin-out-of-deb.patch
iwlwifi-dbg_ini-move-iwl_dbg_tlv_free-outside-of-deb.patch
iwlwifi-fix-locking-in-delayed-gtk-setting.patch
iwlwifi-mvm-send-lq-command-always-async.patch
enetc-fix-build-error-without-phylib.patch
isdn-hfcsusb-fix-misdn-driver-crash-caused-by-transf.patch
net-phy-phy_led_triggers-fix-a-possible-null-pointer.patch
perf-bench-numa-fix-cpu0-binding.patch
spi-pxa2xx-add-support-for-intel-tiger-lake.patch
can-sja1000-force-the-string-buffer-null-terminated.patch
can-peak_usb-force-the-string-buffer-null-terminated.patch
asoc-amd-acp3x-use-dma_ops-of-parent-device-for-acp3.patch
net-ethernet-qlogic-qed-force-the-string-buffer-null.patch
enetc-select-phylib-while-config_fsl_enetc_vf-is-set.patch
nfsv4-fix-a-credential-refcount-leak-in-nfs41_check_.patch
nfsv4-when-recovering-state-fails-with-eagain-retry-.patch
nfsv4.1-fix-open-stateid-recovery.patch
nfsv4.1-only-reap-expired-delegations.patch
nfsv4-fix-a-potential-sleep-while-atomic-in-nfs4_do_.patch
nfs-fix-regression-whereby-fscache-errors-are-appear.patch
hid-quirks-set-the-increment_usage_on_duplicate-quir.patch
hid-input-fix-a4tech-horizontal-wheel-custom-usage.patch
drm-rockchip-suspend-dp-late.patch
smb3-fix-potential-memory-leak-when-processing-compo.patch
smb3-kernel-oops-mounting-a-encryptdata-share-with-c.patch
sched-deadline-fix-double-accounting-of-rq-running-b.patch
sched-psi-reduce-psimon-fifo-priority.patch
sched-psi-do-not-require-setsched-permission-from-th.patch
s390-protvirt-avoid-memory-sharing-for-diag-308-set-.patch
s390-mm-fix-dump_pagetables-top-level-page-table-wal.patch
s390-put-_stext-and-_etext-into-.text-section.patch
ata-rb532_cf-fix-unused-variable-warning-in-rb532_pa.patch
net-cxgb3_main-fix-a-resource-leak-in-a-error-path-i.patch
net-stmmac-fix-issues-when-number-of-queues-4.patch
net-stmmac-tc-do-not-return-a-fragment-entry.patch
drm-amdgpu-pin-the-csb-buffer-on-hw-init-for-gfx-v8.patch
net-hisilicon-make-hip04_tx_reclaim-non-reentrant.patch
net-hisilicon-fix-hip04-xmit-never-return-tx_busy.patch
net-hisilicon-fix-dma_map_single-failed-on-arm64.patch
nfsv4-ensure-state-recovery-handles-etimedout-correc.patch
libata-have-ata_scsi_rw_xlat-fail-invalid-passthroug.patch
libata-add-sg-safety-checks-in-sff-pio-transfers.patch
x86-lib-cpu-address-missing-prototypes-warning.patch
drm-vmwgfx-fix-memory-leak-when-too-many-retries-hav.patch
block-aoe-fix-kernel-crash-due-to-atomic-sleep-when-.patch
block-bfq-handle-null-return-value-by-bfq_init_rq.patch
perf-ftrace-fix-failure-to-set-cpumask-when-only-one.patch
perf-cpumap-fix-writing-to-illegal-memory-in-handlin.patch
perf-pmu-events-fix-missing-cpu_clk_unhalted.core-ev.patch
dt-bindings-riscv-fix-the-schema-compatible-string-f.patch
kvm-arm64-don-t-write-junk-to-sysregs-on-reset.patch
kvm-arm-don-t-write-junk-to-cp15-registers-on-reset.patch
selftests-kvm-adding-config-fragments.patch
iwlwifi-mvm-disable-tx-amsdu-on-older-nics.patch
Compile testing
---------------
We compiled the kernel for 3 architectures:
aarch64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
ppc64le:
Host 1:
✅ Boot test [0]
✅ Podman system integration test (as root) [1]
✅ Podman system integration test (as user) [1]
✅ LTP lite [2]
✅ Loopdev Sanity [3]
✅ jvm test suite [4]
✅ AMTU (Abstract Machine Test Utility) [5]
✅ LTP: openposix test suite [6]
✅ Ethernet drivers sanity [7]
✅ Networking socket: fuzz [8]
✅ audit: audit testsuite test [9]
✅ httpd: mod_ssl smoke sanity [10]
✅ iotop: sanity [11]
✅ tuned: tune-processes-through-perf [12]
✅ Usex - version 1.9-29 [13]
x86_64:
Host 1:
✅ Boot test [0]
✅ Podman system integration test (as root) [1]
✅ Podman system integration test (as user) [1]
✅ LTP lite [2]
✅ Loopdev Sanity [3]
✅ jvm test suite [4]
✅ AMTU (Abstract Machine Test Utility) [5]
✅ LTP: openposix test suite [6]
✅ Ethernet drivers sanity [7]
✅ Networking socket: fuzz [8]
✅ audit: audit testsuite test [9]
✅ httpd: mod_ssl smoke sanity [10]
✅ iotop: sanity [11]
✅ tuned: tune-processes-through-perf [12]
✅ pciutils: sanity smoke test [14]
✅ Usex - version 1.9-29 [13]
✅ stress: stress-ng [15]
Test source:
💚 Pull requests are welcome for new tests or improvements to existing tests!
[0]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[1]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/container/p…
[2]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[3]: https://github.com/CKI-project/tests-beaker/archive/master.zip#filesystems/…
[4]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/jvm
[5]: https://github.com/CKI-project/tests-beaker/archive/master.zip#misc/amtu
[6]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[7]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/networking/…
[8]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/networking/…
[9]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/aud…
[10]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/htt…
[11]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/iot…
[12]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/tun…
[13]: https://github.com/CKI-project/tests-beaker/archive/master.zip#standards/us…
[14]: https://github.com/CKI-project/tests-beaker/archive/master.zip#pciutils/san…
[15]: https://github.com/CKI-project/tests-beaker/archive/master.zip#stress/stres…
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
vgpu ppgtt notification was split into 2 steps, the first step is to
update PVINFO's pdp register and then write PVINFO's g2v_notify register
with action code to tirgger ppgtt notification to GVT side.
currently these steps were not atomic operations due to no any protection,
so it is easy to enter race condition state during the MTBF, stress and
IGT test to cause GPU hang.
the solution is to add a lock to make vgpu ppgtt notication as atomic
operation.
Cc: stable(a)vger.kernel.org
Signed-off-by: Xiaolin Zhang <xiaolin.zhang(a)intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 1 +
drivers/gpu/drm/i915/i915_gem_gtt.c | 4 ++++
drivers/gpu/drm/i915/i915_vgpu.c | 1 +
3 files changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index eb31c16..32e17c4 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -961,6 +961,7 @@ struct i915_frontbuffer_tracking {
};
struct i915_virtual_gpu {
+ struct mutex lock;
bool active;
u32 caps;
};
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 2cd2dab..ff0b178 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -833,6 +833,8 @@ static int gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool create)
enum vgt_g2v_type msg;
int i;
+ mutex_lock(&dev_priv->vgpu.lock);
+
if (create)
atomic_inc(px_used(ppgtt->pd)); /* never remove */
else
@@ -860,6 +862,8 @@ static int gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool create)
I915_WRITE(vgtif_reg(g2v_notify), msg);
+ mutex_unlock(&dev_priv->vgpu.lock);
+
return 0;
}
diff --git a/drivers/gpu/drm/i915/i915_vgpu.c b/drivers/gpu/drm/i915/i915_vgpu.c
index bf2b837..7493544 100644
--- a/drivers/gpu/drm/i915/i915_vgpu.c
+++ b/drivers/gpu/drm/i915/i915_vgpu.c
@@ -94,6 +94,7 @@ void i915_detect_vgpu(struct drm_i915_private *dev_priv)
dev_priv->vgpu.caps = readl(shared_area + vgtif_offset(vgt_caps));
dev_priv->vgpu.active = true;
+ mutex_init(&dev_priv->vgpu.lock);
DRM_INFO("Virtual GPU for Intel GVT-g detected.\n");
out:
--
2.7.4
From: Johannes Berg <johannes.berg(a)intel.com>
commit cfb21b11b891b08b79be07be57c40a85bb926668 upstream.
On older NICs, we occasionally see issues with A-MSDU support,
where the commands in the FIFO get confused and then we see an
assert EDC because the next command in the FIFO isn't TX.
We've tried to isolate this issue and understand where it comes
from, but haven't found any errors in building the A-MSDU in
software.
At least for now, disable A-MSDU support on older hardware so
that users can use it again without fearing the assert.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=203315.
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com>
---
drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
index 964c7baabede..3905770b8a1f 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
@@ -474,7 +474,19 @@ int iwl_mvm_mac_setup_register(struct iwl_mvm *mvm)
ieee80211_hw_set(hw, SUPPORTS_VHT_EXT_NSS_BW);
ieee80211_hw_set(hw, BUFF_MMPDU_TXQ);
ieee80211_hw_set(hw, STA_MMPDU_TXQ);
- ieee80211_hw_set(hw, TX_AMSDU);
+ /*
+ * On older devices, enabling TX A-MSDU occasionally leads to
+ * something getting messed up, the command read from the FIFO
+ * gets out of sync and isn't a TX command, so that we have an
+ * assert EDC.
+ *
+ * It's not clear where the bug is, but since we didn't used to
+ * support A-MSDU until moving the mac80211 iTXQs, just leave it
+ * for older devices. We also don't see this issue on any newer
+ * devices.
+ */
+ if (mvm->cfg->device_family >= IWL_DEVICE_FAMILY_9000)
+ ieee80211_hw_set(hw, TX_AMSDU);
ieee80211_hw_set(hw, TX_FRAG_LIST);
if (iwl_mvm_has_tlc_offload(mvm)) {
--
2.23.0.rc1
From: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Subject: mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y
The code like this:
ptr = kmalloc(size, GFP_KERNEL);
page = virt_to_page(ptr);
offset = offset_in_page(ptr);
kfree(page_address(page) + offset);
may produce false-positive invalid-free reports on the kernel with
CONFIG_KASAN_SW_TAGS=y.
In the example above we lose the original tag assigned to 'ptr', so
kfree() gets the pointer with 0xFF tag. In kfree() we check that 0xFF tag
is different from the tag in shadow hence print false report.
Instead of just comparing tags, do the following:
1) Check that shadow doesn't contain KASAN_TAG_INVALID. Otherwise it's
double-free and it doesn't matter what tag the pointer have.
2) If pointer tag is different from 0xFF, make sure that tag in the
shadow is the same as in the pointer.
Link: http://lkml.kernel.org/r/20190819172540.19581-1-aryabinin@virtuozzo.com
Fixes: 7f94ffbc4c6a ("kasan: add hooks implementation for tag-based mode")
Signed-off-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Reported-by: Walter Wu <walter-zh.wu(a)mediatek.com>
Reported-by: Mark Rutland <mark.rutland(a)arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl(a)google.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/common.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
--- a/mm/kasan/common.c~mm-kasan-fix-false-positive-invalid-free-reports-with-config_kasan_sw_tags=y
+++ a/mm/kasan/common.c
@@ -407,8 +407,14 @@ static inline bool shadow_invalid(u8 tag
if (IS_ENABLED(CONFIG_KASAN_GENERIC))
return shadow_byte < 0 ||
shadow_byte >= KASAN_SHADOW_SCALE_SIZE;
- else
- return tag != (u8)shadow_byte;
+
+ /* else CONFIG_KASAN_SW_TAGS: */
+ if ((u8)shadow_byte == KASAN_TAG_INVALID)
+ return true;
+ if ((tag != KASAN_TAG_KERNEL) && (tag != (u8)shadow_byte))
+ return true;
+
+ return false;
}
static bool __kasan_slab_free(struct kmem_cache *cache, void *object,
_
From: Henry Burns <henryburns(a)google.com>
Subject: mm/zsmalloc.c: fix race condition in zs_destroy_pool
In zs_destroy_pool() we call flush_work(&pool->free_work). However, we
have no guarantee that migration isn't happening in the background at that
time.
Since migration can't directly free pages, it relies on free_work being
scheduled to free the pages. But there's nothing preventing an
in-progress migrate from queuing the work *after*
zs_unregister_migration() has called flush_work(). Which would mean pages
still pointing at the inode when we free it.
Since we know at destroy time all objects should be free, no new
migrations can come in (since zs_page_isolate() fails for fully-free
zspages). This means it is sufficient to track a "# isolated zspages"
count by class, and have the destroy logic ensure all such pages have
drained before proceeding. Keeping that state under the class spinlock
keeps the logic straightforward.
In this case a memory leak could lead to an eventual crash if
compaction hits the leaked page. This crash would only occur if people
are changing their zswap backend at runtime (which eventually starts
destruction).
Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com
Fixes: 48b4800a1c6a ("zsmalloc: page migration support")
Signed-off-by: Henry Burns <henryburns(a)google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com>
Cc: Henry Burns <henrywolfeburns(a)gmail.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Jonathan Adams <jwadams(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/zsmalloc.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 59 insertions(+), 2 deletions(-)
--- a/mm/zsmalloc.c~mm-zsmallocc-fix-race-condition-in-zs_destroy_pool
+++ a/mm/zsmalloc.c
@@ -54,6 +54,7 @@
#include <linux/mount.h>
#include <linux/pseudo_fs.h>
#include <linux/migrate.h>
+#include <linux/wait.h>
#include <linux/pagemap.h>
#include <linux/fs.h>
@@ -268,6 +269,10 @@ struct zs_pool {
#ifdef CONFIG_COMPACTION
struct inode *inode;
struct work_struct free_work;
+ /* A wait queue for when migration races with async_free_zspage() */
+ struct wait_queue_head migration_wait;
+ atomic_long_t isolated_pages;
+ bool destroying;
#endif
};
@@ -1874,6 +1879,19 @@ static void putback_zspage_deferred(stru
}
+static inline void zs_pool_dec_isolated(struct zs_pool *pool)
+{
+ VM_BUG_ON(atomic_long_read(&pool->isolated_pages) <= 0);
+ atomic_long_dec(&pool->isolated_pages);
+ /*
+ * There's no possibility of racing, since wait_for_isolated_drain()
+ * checks the isolated count under &class->lock after enqueuing
+ * on migration_wait.
+ */
+ if (atomic_long_read(&pool->isolated_pages) == 0 && pool->destroying)
+ wake_up_all(&pool->migration_wait);
+}
+
static void replace_sub_page(struct size_class *class, struct zspage *zspage,
struct page *newpage, struct page *oldpage)
{
@@ -1943,6 +1961,7 @@ static bool zs_page_isolate(struct page
*/
if (!list_empty(&zspage->list) && !is_zspage_isolated(zspage)) {
get_zspage_mapping(zspage, &class_idx, &fullness);
+ atomic_long_inc(&pool->isolated_pages);
remove_zspage(class, zspage, fullness);
}
@@ -2042,8 +2061,16 @@ static int zs_page_migrate(struct addres
* Page migration is done so let's putback isolated zspage to
* the list if @page is final isolated subpage in the zspage.
*/
- if (!is_zspage_isolated(zspage))
+ if (!is_zspage_isolated(zspage)) {
+ /*
+ * We cannot race with zs_destroy_pool() here because we wait
+ * for isolation to hit zero before we start destroying.
+ * Also, we ensure that everyone can see pool->destroying before
+ * we start waiting.
+ */
putback_zspage_deferred(pool, class, zspage);
+ zs_pool_dec_isolated(pool);
+ }
reset_page(page);
put_page(page);
@@ -2094,8 +2121,8 @@ static void zs_page_putback(struct page
* so let's defer.
*/
putback_zspage_deferred(pool, class, zspage);
+ zs_pool_dec_isolated(pool);
}
-
spin_unlock(&class->lock);
}
@@ -2118,8 +2145,36 @@ static int zs_register_migration(struct
return 0;
}
+static bool pool_isolated_are_drained(struct zs_pool *pool)
+{
+ return atomic_long_read(&pool->isolated_pages) == 0;
+}
+
+/* Function for resolving migration */
+static void wait_for_isolated_drain(struct zs_pool *pool)
+{
+
+ /*
+ * We're in the process of destroying the pool, so there are no
+ * active allocations. zs_page_isolate() fails for completely free
+ * zspages, so we need only wait for the zs_pool's isolated
+ * count to hit zero.
+ */
+ wait_event(pool->migration_wait,
+ pool_isolated_are_drained(pool));
+}
+
static void zs_unregister_migration(struct zs_pool *pool)
{
+ pool->destroying = true;
+ /*
+ * We need a memory barrier here to ensure global visibility of
+ * pool->destroying. Thus pool->isolated pages will either be 0 in which
+ * case we don't care, or it will be > 0 and pool->destroying will
+ * ensure that we wake up once isolation hits 0.
+ */
+ smp_mb();
+ wait_for_isolated_drain(pool); /* This can block */
flush_work(&pool->free_work);
iput(pool->inode);
}
@@ -2357,6 +2412,8 @@ struct zs_pool *zs_create_pool(const cha
if (!pool->name)
goto err;
+ init_waitqueue_head(&pool->migration_wait);
+
if (create_cache(pool))
goto err;
_
From: Henry Burns <henryburns(a)google.com>
Subject: mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely
In zs_page_migrate() we call putback_zspage() after we have finished
migrating all pages in this zspage. However, the return value is ignored.
If a zs_free() races in between zs_page_isolate() and zs_page_migrate(),
freeing the last object in the zspage, putback_zspage() will leave the
page in ZS_EMPTY for potentially an unbounded amount of time.
To fix this, we need to do the same thing as zs_page_putback() does:
schedule free_work to occur. To avoid duplicated code, move the sequence
to a new putback_zspage_deferred() function which both zs_page_migrate()
and zs_page_putback() call.
Link: http://lkml.kernel.org/r/20190809181751.219326-1-henryburns@google.com
Fixes: 48b4800a1c6a ("zsmalloc: page migration support")
Signed-off-by: Henry Burns <henryburns(a)google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com>
Cc: Henry Burns <henrywolfeburns(a)gmail.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Jonathan Adams <jwadams(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/zsmalloc.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
--- a/mm/zsmalloc.c~mm-zsmallocc-migration-can-leave-pages-in-zs_empty-indefinitely
+++ a/mm/zsmalloc.c
@@ -1862,6 +1862,18 @@ static void dec_zspage_isolation(struct
zspage->isolated--;
}
+static void putback_zspage_deferred(struct zs_pool *pool,
+ struct size_class *class,
+ struct zspage *zspage)
+{
+ enum fullness_group fg;
+
+ fg = putback_zspage(class, zspage);
+ if (fg == ZS_EMPTY)
+ schedule_work(&pool->free_work);
+
+}
+
static void replace_sub_page(struct size_class *class, struct zspage *zspage,
struct page *newpage, struct page *oldpage)
{
@@ -2031,7 +2043,7 @@ static int zs_page_migrate(struct addres
* the list if @page is final isolated subpage in the zspage.
*/
if (!is_zspage_isolated(zspage))
- putback_zspage(class, zspage);
+ putback_zspage_deferred(pool, class, zspage);
reset_page(page);
put_page(page);
@@ -2077,14 +2089,13 @@ static void zs_page_putback(struct page
spin_lock(&class->lock);
dec_zspage_isolation(zspage);
if (!is_zspage_isolated(zspage)) {
- fg = putback_zspage(class, zspage);
/*
* Due to page_lock, we cannot free zspage immediately
* so let's defer.
*/
- if (fg == ZS_EMPTY)
- schedule_work(&pool->free_work);
+ putback_zspage_deferred(pool, class, zspage);
}
+
spin_unlock(&class->lock);
}
_
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, page_owner: handle THP splits correctly
THP splitting path is missing the split_page_owner() call that
split_page() has. As a result, split THP pages are wrongly reported in
the page_owner file as order-9 pages. Furthermore when the former head
page is freed, the remaining former tail pages are not listed in the
page_owner file at all. This patch fixes that by adding the
split_page_owner() call into __split_huge_page().
Link: http://lkml.kernel.org/r/20190820131828.22684-2-vbabka@suse.cz
Fixes: a9627bc5e34e ("mm/page_owner: introduce split_page_owner and replace manual handling")
Reported-by: Kirill A. Shutemov <kirill(a)shutemov.name>
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/huge_memory.c~mm-page_owner-handle-thp-splits-correctly
+++ a/mm/huge_memory.c
@@ -32,6 +32,7 @@
#include <linux/shmem_fs.h>
#include <linux/oom.h>
#include <linux/numa.h>
+#include <linux/page_owner.h>
#include <asm/tlb.h>
#include <asm/pgalloc.h>
@@ -2516,6 +2517,9 @@ static void __split_huge_page(struct pag
}
ClearPageCompound(head);
+
+ split_page_owner(head, HPAGE_PMD_ORDER);
+
/* See comment in __split_huge_page_tail() */
if (PageAnon(head)) {
/* Additional pin to swap cache */
_
From: Jason Xing <kerneljasonxing(a)linux.alibaba.com>
Subject: psi: get poll_work to run when calling poll syscall next time
Only when calling the poll syscall the first time can user receive POLLPRI
correctly. After that, user always fails to acquire the event signal.
Reproduce case:
1. Get the monitor code in Documentation/accounting/psi.txt
2. Run it, and wait for the event triggered.
3. Kill and restart the process.
The question is why we can end up with poll_scheduled = 1 but the work not
running (which would reset it to 0). And the answer is because the
scheduling side sees group->poll_kworker under RCU protection and then
schedules it, but here we cancel the work and destroy the worker. The
cancel needs to pair with resetting the poll_scheduled flag.
Link: http://lkml.kernel.org/r/1566357985-97781-1-git-send-email-joseph.qi@linux.…
Signed-off-by: Jason Xing <kerneljasonxing(a)linux.alibaba.com>
Signed-off-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reviewed-by: Caspar Zhang <caspar(a)linux.alibaba.com>
Reviewed-by: Suren Baghdasaryan <surenb(a)google.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/sched/psi.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/kernel/sched/psi.c~psi-get-poll_work-to-run-when-calling-poll-syscall-next-time
+++ a/kernel/sched/psi.c
@@ -1131,7 +1131,15 @@ static void psi_trigger_destroy(struct k
* deadlock while waiting for psi_poll_work to acquire trigger_lock
*/
if (kworker_to_destroy) {
+ /*
+ * After the RCU grace period has expired, the worker
+ * can no longer be found through group->poll_kworker.
+ * But it might have been already scheduled before
+ * that - deschedule it cleanly before destroying it.
+ */
kthread_cancel_delayed_work_sync(&group->poll_work);
+ atomic_set(&group->poll_scheduled, 0);
+
kthread_destroy_worker(kworker_to_destroy);
}
kfree(t);
_
From: Roman Gushchin <guro(a)fb.com>
Subject: mm: memcontrol: flush percpu vmevents before releasing memcg
Similar to vmstats, percpu caching of local vmevents leads to an
accumulation of errors on non-leaf levels. This happens because some
leftovers may remain in percpu caches, so that they are never propagated
up by the cgroup tree and just disappear into nonexistence with on
releasing of the memory cgroup.
To fix this issue let's accumulate and propagate percpu vmevents values
before releasing the memory cgroup similar to what we're doing with
vmstats.
Since on cpu hotplug we do flush percpu vmstats anyway, we can iterate
only over online cpus.
Link: http://lkml.kernel.org/r/20190819202338.363363-4-guro@fb.com
Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
--- a/mm/memcontrol.c~mm-memcontrol-flush-percpu-vmevents-before-releasing-memcg
+++ a/mm/memcontrol.c
@@ -3295,6 +3295,25 @@ static void memcg_flush_percpu_vmstats(s
}
}
+static void memcg_flush_percpu_vmevents(struct mem_cgroup *memcg)
+{
+ unsigned long events[NR_VM_EVENT_ITEMS];
+ struct mem_cgroup *mi;
+ int cpu, i;
+
+ for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
+ events[i] = 0;
+
+ for_each_online_cpu(cpu)
+ for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
+ events[i] += raw_cpu_read(
+ memcg->vmstats_percpu->events[i]);
+
+ for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
+ for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
+ atomic_long_add(events[i], &mi->vmevents[i]);
+}
+
#ifdef CONFIG_MEMCG_KMEM
static int memcg_online_kmem(struct mem_cgroup *memcg)
{
@@ -4718,10 +4737,11 @@ static void __mem_cgroup_free(struct mem
int node;
/*
- * Flush percpu vmstats to guarantee the value correctness
+ * Flush percpu vmstats and vmevents to guarantee the value correctness
* on parent's and all ancestor levels.
*/
memcg_flush_percpu_vmstats(memcg);
+ memcg_flush_percpu_vmevents(memcg);
for_each_node(node)
free_mem_cgroup_per_node_info(memcg, node);
free_percpu(memcg->vmstats_percpu);
_
From: Roman Gushchin <guro(a)fb.com>
Subject: mm: memcontrol: flush percpu vmstats before releasing memcg
Percpu caching of local vmstats with the conditional propagation by the
cgroup tree leads to an accumulation of errors on non-leaf levels.
Let's imagine two nested memory cgroups A and A/B. Say, a process
belonging to A/B allocates 100 pagecache pages on the CPU 0. The percpu
cache will spill 3 times, so that 32*3=96 pages will be accounted to A/B
and A atomic vmstat counters, 4 pages will remain in the percpu cache.
Imagine A/B is nearby memory.max, so that every following allocation
triggers a direct reclaim on the local CPU. Say, each such attempt will
free 16 pages on a new cpu. That means every percpu cache will have -16
pages, except the first one, which will have 4 - 16 = -12. A/B and A
atomic counters will not be touched at all.
Now a user removes A/B. All percpu caches are freed and corresponding
vmstat numbers are forgotten. A has 96 pages more than expected.
As memory cgroups are created and destroyed, errors do accumulate. Even
1-2 pages differences can accumulate into large numbers.
To fix this issue let's accumulate and propagate percpu vmstat values
before releasing the memory cgroup. At this point these numbers are
stable and cannot be changed.
Since on cpu hotplug we do flush percpu vmstats anyway, we can iterate
only over online cpus.
Link: http://lkml.kernel.org/r/20190819202338.363363-2-guro@fb.com
Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
--- a/mm/memcontrol.c~mm-memcontrol-flush-percpu-vmstats-before-releasing-memcg
+++ a/mm/memcontrol.c
@@ -3260,6 +3260,41 @@ static u64 mem_cgroup_read_u64(struct cg
}
}
+static void memcg_flush_percpu_vmstats(struct mem_cgroup *memcg)
+{
+ unsigned long stat[MEMCG_NR_STAT];
+ struct mem_cgroup *mi;
+ int node, cpu, i;
+
+ for (i = 0; i < MEMCG_NR_STAT; i++)
+ stat[i] = 0;
+
+ for_each_online_cpu(cpu)
+ for (i = 0; i < MEMCG_NR_STAT; i++)
+ stat[i] += raw_cpu_read(memcg->vmstats_percpu->stat[i]);
+
+ for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
+ for (i = 0; i < MEMCG_NR_STAT; i++)
+ atomic_long_add(stat[i], &mi->vmstats[i]);
+
+ for_each_node(node) {
+ struct mem_cgroup_per_node *pn = memcg->nodeinfo[node];
+ struct mem_cgroup_per_node *pi;
+
+ for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
+ stat[i] = 0;
+
+ for_each_online_cpu(cpu)
+ for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
+ stat[i] += raw_cpu_read(
+ pn->lruvec_stat_cpu->count[i]);
+
+ for (pi = pn; pi; pi = parent_nodeinfo(pi, node))
+ for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
+ atomic_long_add(stat[i], &pi->lruvec_stat[i]);
+ }
+}
+
#ifdef CONFIG_MEMCG_KMEM
static int memcg_online_kmem(struct mem_cgroup *memcg)
{
@@ -4682,6 +4717,11 @@ static void __mem_cgroup_free(struct mem
{
int node;
+ /*
+ * Flush percpu vmstats to guarantee the value correctness
+ * on parent's and all ancestor levels.
+ */
+ memcg_flush_percpu_vmstats(memcg);
for_each_node(node)
free_mem_cgroup_per_node_info(memcg, node);
free_percpu(memcg->vmstats_percpu);
_
From: David Rientjes <rientjes(a)google.com>
Subject: mm, page_alloc: move_freepages should not examine struct page of reserved memory
After commit 907ec5fca3dc ("mm: zero remaining unavailable struct pages"),
struct page of reserved memory is zeroed. This causes page->flags to be 0
and fixes issues related to reading /proc/kpageflags, for example, of
reserved memory.
The VM_BUG_ON() in move_freepages_block(), however, assumes that
page_zone() is meaningful even for reserved memory. That assumption is no
longer true after the aforementioned commit.
There's no reason why move_freepages_block() should be testing the
legitimacy of page_zone() for reserved memory; its scope is limited only
to pages on the zone's freelist.
Note that pfn_valid() can be true for reserved memory: there is a backing
struct page. The check for page_to_nid(page) is also buggy but reserved
memory normally only appears on node 0 so the zeroing doesn't affect this.
Move the debug checks to after verifying PageBuddy is true. This isolates
the scope of the checks to only be for buddy pages which are on the zone's
freelist which move_freepages_block() is operating on. In this case, an
incorrect node or zone is a bug worthy of being warned about (and the
examination of struct page is acceptable bcause this memory is not
reserved).
Why does move_freepages_block() gets called on reserved memory? It's
simply math after finding a valid free page from the per-zone free area to
use as fallback. We find the beginning and end of the pageblock of the
valid page and that can bring us into memory that was reserved per the
e820. pfn_valid() is still true (it's backed by a struct page), but since
it's zero'd we shouldn't make any inferences here about comparing its node
or zone. The current node check just happens to succeed most of the time
by luck because reserved memory typically appears on node 0.
The fix here is to validate that we actually have buddy pages before
testing if there's any type of zone or node strangeness going on.
We noticed it almost immediately after bringing 907ec5fca3dc in on
CONFIG_DEBUG_VM builds. It depends on finding specific free pages in
the per-zone free area where the math in move_freepages() will bring
the start or end pfn into reserved memory and wanting to claim that
entire pageblock as a new migratetype. So the path will be rare,
require CONFIG_DEBUG_VM, and require fallback to a different
migratetype.
Some struct pages were already zeroed from reserve pages before
907ec5fca3c so it theoretically could trigger before this commit. I
think it's rare enough under a config option that most people don't run
that others may not have noticed. I wouldn't argue against a stable
tag and the backport should be easy enough, but probably wouldn't
single out a commit that this is fixing.
Mel said:
: The overhead of the debugging check is higher with this patch although
: it'll only affect debug builds and the path is not particularly hot.
: If this was a concern, I think it would be reasonable to simply remove
: the debugging check as the zone boundaries are checked in
: move_freepages_block and we never expect a zone/node to be smaller than
: a pageblock and stuck in the middle of another zone.
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1908122036560.10779@chino.kir.corp…
Signed-off-by: David Rientjes <rientjes(a)google.com>
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Masayoshi Mizuma <m.mizuma(a)jp.fujitsu.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Pavel Tatashin <pavel.tatashin(a)microsoft.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 19 ++++---------------
1 file changed, 4 insertions(+), 15 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-move_freepages-should-not-examine-struct-page-of-reserved-memory
+++ a/mm/page_alloc.c
@@ -2238,27 +2238,12 @@ static int move_freepages(struct zone *z
unsigned int order;
int pages_moved = 0;
-#ifndef CONFIG_HOLES_IN_ZONE
- /*
- * page_zone is not safe to call in this context when
- * CONFIG_HOLES_IN_ZONE is set. This bug check is probably redundant
- * anyway as we check zone boundaries in move_freepages_block().
- * Remove at a later date when no bug reports exist related to
- * grouping pages by mobility
- */
- VM_BUG_ON(pfn_valid(page_to_pfn(start_page)) &&
- pfn_valid(page_to_pfn(end_page)) &&
- page_zone(start_page) != page_zone(end_page));
-#endif
for (page = start_page; page <= end_page;) {
if (!pfn_valid_within(page_to_pfn(page))) {
page++;
continue;
}
- /* Make sure we are not inadvertently changing nodes */
- VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
-
if (!PageBuddy(page)) {
/*
* We assume that pages that could be isolated for
@@ -2273,6 +2258,10 @@ static int move_freepages(struct zone *z
continue;
}
+ /* Make sure we are not inadvertently changing nodes */
+ VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
+ VM_BUG_ON_PAGE(page_zone(page) != zone, page);
+
order = page_order(page);
move_to_free_area(page, &zone->free_area[order], migratetype);
page += 1 << order;
_
From: Henry Burns <henryburns(a)google.com>
Subject: mm/z3fold.c: fix race between migration and destruction
In z3fold_destroy_pool() we call destroy_workqueue(&pool->compact_wq).
However, we have no guarantee that migration isn't happening in the
background at that time.
Migration directly calls queue_work_on(pool->compact_wq), if destruction
wins that race we are using a destroyed workqueue.
Link: http://lkml.kernel.org/r/20190809213828.202833-1-henryburns@google.com
Signed-off-by: Henry Burns <henryburns(a)google.com>
Cc: Vitaly Wool <vitalywool(a)gmail.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Jonathan Adams <jwadams(a)google.com>
Cc: Henry Burns <henrywolfeburns(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/z3fold.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 89 insertions(+)
--- a/mm/z3fold.c~mm-z3foldc-fix-race-between-migration-and-destruction
+++ a/mm/z3fold.c
@@ -41,6 +41,7 @@
#include <linux/workqueue.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
+#include <linux/wait.h>
#include <linux/zpool.h>
#include <linux/magic.h>
@@ -145,6 +146,8 @@ struct z3fold_header {
* @release_wq: workqueue for safe page release
* @work: work_struct for safe page release
* @inode: inode for z3fold pseudo filesystem
+ * @destroying: bool to stop migration once we start destruction
+ * @isolated: int to count the number of pages currently in isolation
*
* This structure is allocated at pool creation time and maintains metadata
* pertaining to a particular z3fold pool.
@@ -163,8 +166,11 @@ struct z3fold_pool {
const struct zpool_ops *zpool_ops;
struct workqueue_struct *compact_wq;
struct workqueue_struct *release_wq;
+ struct wait_queue_head isolate_wait;
struct work_struct work;
struct inode *inode;
+ bool destroying;
+ int isolated;
};
/*
@@ -769,6 +775,7 @@ static struct z3fold_pool *z3fold_create
goto out_c;
spin_lock_init(&pool->lock);
spin_lock_init(&pool->stale_lock);
+ init_waitqueue_head(&pool->isolate_wait);
pool->unbuddied = __alloc_percpu(sizeof(struct list_head)*NCHUNKS, 2);
if (!pool->unbuddied)
goto out_pool;
@@ -808,6 +815,15 @@ out:
return NULL;
}
+static bool pool_isolated_are_drained(struct z3fold_pool *pool)
+{
+ bool ret;
+
+ spin_lock(&pool->lock);
+ ret = pool->isolated == 0;
+ spin_unlock(&pool->lock);
+ return ret;
+}
/**
* z3fold_destroy_pool() - destroys an existing z3fold pool
* @pool: the z3fold pool to be destroyed
@@ -817,6 +833,22 @@ out:
static void z3fold_destroy_pool(struct z3fold_pool *pool)
{
kmem_cache_destroy(pool->c_handle);
+ /*
+ * We set pool-> destroying under lock to ensure that
+ * z3fold_page_isolate() sees any changes to destroying. This way we
+ * avoid the need for any memory barriers.
+ */
+
+ spin_lock(&pool->lock);
+ pool->destroying = true;
+ spin_unlock(&pool->lock);
+
+ /*
+ * We need to ensure that no pages are being migrated while we destroy
+ * these workqueues, as migration can queue work on either of the
+ * workqueues.
+ */
+ wait_event(pool->isolate_wait, !pool_isolated_are_drained(pool));
/*
* We need to destroy pool->compact_wq before pool->release_wq,
@@ -1307,6 +1339,28 @@ static u64 z3fold_get_pool_size(struct z
return atomic64_read(&pool->pages_nr);
}
+/*
+ * z3fold_dec_isolated() expects to be called while pool->lock is held.
+ */
+static void z3fold_dec_isolated(struct z3fold_pool *pool)
+{
+ assert_spin_locked(&pool->lock);
+ VM_BUG_ON(pool->isolated <= 0);
+ pool->isolated--;
+
+ /*
+ * If we have no more isolated pages, we have to see if
+ * z3fold_destroy_pool() is waiting for a signal.
+ */
+ if (pool->isolated == 0 && waitqueue_active(&pool->isolate_wait))
+ wake_up_all(&pool->isolate_wait);
+}
+
+static void z3fold_inc_isolated(struct z3fold_pool *pool)
+{
+ pool->isolated++;
+}
+
static bool z3fold_page_isolate(struct page *page, isolate_mode_t mode)
{
struct z3fold_header *zhdr;
@@ -1333,6 +1387,33 @@ static bool z3fold_page_isolate(struct p
spin_lock(&pool->lock);
if (!list_empty(&page->lru))
list_del(&page->lru);
+ /*
+ * We need to check for destruction while holding pool->lock, as
+ * otherwise destruction could see 0 isolated pages, and
+ * proceed.
+ */
+ if (unlikely(pool->destroying)) {
+ spin_unlock(&pool->lock);
+ /*
+ * If this page isn't stale, somebody else holds a
+ * reference to it. Let't drop our refcount so that they
+ * can call the release logic.
+ */
+ if (unlikely(kref_put(&zhdr->refcount,
+ release_z3fold_page_locked))) {
+ /*
+ * If we get here we have kref problems, so we
+ * should freak out.
+ */
+ WARN(1, "Z3fold is experiencing kref problems\n");
+ return false;
+ }
+ z3fold_page_unlock(zhdr);
+ return false;
+ }
+
+
+ z3fold_inc_isolated(pool);
spin_unlock(&pool->lock);
z3fold_page_unlock(zhdr);
return true;
@@ -1401,6 +1482,10 @@ static int z3fold_page_migrate(struct ad
queue_work_on(new_zhdr->cpu, pool->compact_wq, &new_zhdr->work);
+ spin_lock(&pool->lock);
+ z3fold_dec_isolated(pool);
+ spin_unlock(&pool->lock);
+
page_mapcount_reset(page);
put_page(page);
return 0;
@@ -1420,10 +1505,14 @@ static void z3fold_page_putback(struct p
INIT_LIST_HEAD(&page->lru);
if (kref_put(&zhdr->refcount, release_z3fold_page_locked)) {
atomic64_dec(&pool->pages_nr);
+ spin_lock(&pool->lock);
+ z3fold_dec_isolated(pool);
+ spin_unlock(&pool->lock);
return;
}
spin_lock(&pool->lock);
list_add(&page->lru, &pool->lru);
+ z3fold_dec_isolated(pool);
spin_unlock(&pool->lock);
z3fold_page_unlock(zhdr);
}
_
This is the start of the stable review cycle for the 4.4.190 release.
There are 78 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat 24 Aug 2019 05:18:13 PM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.190-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.190-rc1
YueHaibing <yuehaibing(a)huawei.com>
bonding: Add vlan tx offload to hw_enc_features
Xin Long <lucien.xin(a)gmail.com>
sctp: fix the transport error_count check
Huy Nguyen <huyn(a)mellanox.com>
net/mlx5e: Only support tx/rx pause setting for port owner
Ross Lagerwall <ross.lagerwall(a)citrix.com>
xen/netback: Reset nr_frags before freeing skb
Eric Dumazet <edumazet(a)google.com>
net/packet: fix race in tpacket_snd()
Matthias Kaehlcke <mka(a)chromium.org>
x86/boot: Disable the address-of-packed-member compiler warning
Joerg Roedel <jroedel(a)suse.de>
iommu/amd: Move iommu_init_pci() to .init section
Andy Lutomirski <luto(a)kernel.org>
x86/vdso: Remove direct HPET access through the vDSO
Doug Ledford <dledford(a)redhat.com>
IB/mlx5: Make coding style more consistent
Jason Gunthorpe <jgg(a)mellanox.com>
RDMA: Directly cast the sockaddr union to sockaddr
Hannes Reinecke <hare(a)suse.de>
scsi: fcoe: Embed fc_rport_priv in fcoe_rport structure
Arnd Bergmann <arnd(a)arndb.de>
asm-generic: default BUG_ON(x) to if(x)BUG()
YueHaibing <yuehaibing(a)huawei.com>
Input: psmouse - fix build error of multiple definition
Will Deacon <will(a)kernel.org>
arm64: compat: Allow single-byte watchpoints on all addresses
Miguel Ojeda <miguel.ojeda.sandonis(a)gmail.com>
include/linux/module.h: copy __init/__exit attrs to init/cleanup_module
Miguel Ojeda <miguel.ojeda.sandonis(a)gmail.com>
Backport minimal compiler_attributes.h to support GCC 9
Tony Lindgren <tony(a)atomide.com>
USB: serial: option: Add Motorola modem UARTs
Bob Ham <bob.ham(a)puri.sm>
USB: serial: option: add the BroadMobi BM818 card
Yoshiaki Okamoto <yokamoto(a)allied-telesis.co.jp>
USB: serial: option: Add support for ZTE MF871A
Rogan Dawes <rogan(a)dawes.za.net>
USB: serial: option: add D-Link DWM-222 device ID
Oliver Neukum <oneukum(a)suse.com>
usb: cdc-acm: make sure a refcount is taken early enough
Alan Stern <stern(a)rowland.harvard.edu>
USB: core: Fix races in character device registration and deregistraion
Ian Abbott <abbotti(a)mev.co.uk>
staging: comedi: dt3000: Fix rounding up of timer divisor
Ian Abbott <abbotti(a)mev.co.uk>
staging: comedi: dt3000: Fix signed integer overflow 'divider * base'
Qian Cai <cai(a)lca.pw>
asm-generic: fix -Wtype-limits compiler warnings
YueHaibing <yuehaibing(a)huawei.com>
ocfs2: remove set but not used variable 'last_hash'
Tony Luck <tony.luck(a)intel.com>
IB/core: Add mitigation for Spectre V1
Masahiro Yamada <yamada.masahiro(a)socionext.com>
kbuild: modpost: handle KBUILD_EXTRA_SYMBOLS only for external modules
Miquel Raynal <miquel.raynal(a)bootlin.com>
ata: libahci: do not complain in case of deferred probe
Don Brace <don.brace(a)microsemi.com>
scsi: hpsa: correct scsi command status issue after reset
Kees Cook <keescook(a)chromium.org>
libata: zpodd: Fix small read overflow in zpodd_get_mech_type()
Numfor Mbiziwo-Tiapo <nums(a)google.com>
perf header: Fix use of unitialized value warning
Vince Weaver <vincent.weaver(a)maine.edu>
perf header: Fix divide by zero error if f_header.attr_size==0
Lucas Stach <l.stach(a)pengutronix.de>
irqchip/irq-imx-gpcv2: Forward irq type to parent
YueHaibing <yuehaibing(a)huawei.com>
xen/pciback: remove set but not used variable 'old_state'
Denis Kirjanov <kda(a)linux-powerpc.org>
net: usb: pegasus: fix improper read if get_registers() fail
Oliver Neukum <oneukum(a)suse.com>
Input: iforce - add sanity checks
Oliver Neukum <oneukum(a)suse.com>
Input: kbtab - sanity check for endpoint type
Hillf Danton <hdanton(a)sina.com>
HID: hiddev: do cleanup in failure of opening a device
Hillf Danton <hdanton(a)sina.com>
HID: hiddev: avoid opening a disconnected device
Oliver Neukum <oneukum(a)suse.com>
HID: holtek: test for sanity of intfdata
Wenwen Wang <wenwen(a)cs.uga.edu>
ALSA: hda - Fix a memory leak bug
Miles Chen <miles.chen(a)mediatek.com>
mm/memcontrol.c: fix use after free in mem_cgroup_iter()
Yavuz, Tuba <tuba(a)ece.ufl.edu>
USB: gadget: f_midi: fixing a possible double-free in f_midi
Felipe F. Tonello <eu(a)felipetonello.com>
usb: gadget: f_midi: fail if set_alt fails to allocate requests
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
sh: kernel: hw_breakpoint: Fix missing break in switch statement
Suganath Prabu <suganath-prabu.subramani(a)broadcom.com>
scsi: mpt3sas: Use 63-bit DMA addressing on SAS35 HBA
Brian Norris <briannorris(a)chromium.org>
mwifiex: fix 802.11n/WPA detection
Steve French <stfrench(a)microsoft.com>
smb3: send CAP_DFS capability during session setup
Pavel Shilovsky <pshilov(a)microsoft.com>
SMB3: Fix deadlock in validate negotiate hits reconnect
Brian Norris <briannorris(a)chromium.org>
mac80211: don't WARN on short WMM parameters from AP
Wenwen Wang <wenwen(a)cs.uga.edu>
ALSA: firewire: fix a memory leak bug
Guenter Roeck <linux(a)roeck-us.net>
hwmon: (nct7802) Fix wrong detection of in4 presence
Tomas Bortoli <tomasbortoli(a)gmail.com>
can: peak_usb: pcan_usb_fd: Fix info-leaks to USB devices
Tomas Bortoli <tomasbortoli(a)gmail.com>
can: peak_usb: pcan_usb_pro: Fix info-leaks to USB devices
Leonard Crestez <leonard.crestez(a)nxp.com>
perf/core: Fix creating kernel counters for PMUs that override event->cpu
Peter Zijlstra <peterz(a)infradead.org>
tty/ldsem, locking/rwsem: Add missing ACQUIRE to read_failed sleep loop
Tyrel Datwyler <tyreld(a)linux.vnet.ibm.com>
scsi: ibmvfc: fix WARN_ON during event pool release
Junxiao Bi <junxiao.bi(a)oracle.com>
scsi: megaraid_sas: fix panic on loading firmware crashdump
Arnd Bergmann <arnd(a)arndb.de>
ARM: davinci: fix sleep.S build error on ARMv4
Arnaldo Carvalho de Melo <acme(a)redhat.com>
perf probe: Avoid calling freeing routine multiple times for same pointer
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ALSA: compress: Be more restrictive about when a drain is allowed
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ALSA: compress: Prevent bypasses of set_params
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ALSA: compress: Fix regression on compressed capture streams
Julian Wiedmann <jwi(a)linux.ibm.com>
s390/qdio: add sanity checks to the fast-requeue path
Wen Yang <wen.yang99(a)zte.com.cn>
cpufreq/pasemi: fix use-after-free in pas_cpufreq_cpu_init()
Björn Gerhart <gerhart(a)posteo.de>
hwmon: (nct6775) Fix register address and added missed tolerance for nct6106
Brian Norris <briannorris(a)chromium.org>
mac80211: don't warn about CW params when not using them
Thomas Tai <thomas.tai(a)oracle.com>
iscsi_ibft: make ISCSI_IBFT dependson ACPI instead of ISCSI_IBFT_FIND
Florian Westphal <fw(a)strlen.de>
netfilter: nfnetlink: avoid deadlock due to synchronous request_module
Stephane Grosjean <s.grosjean(a)peak-system.com>
can: peak_usb: fix potential double kfree_skb()
Suzuki K Poulose <suzuki.poulose(a)arm.com>
usb: yurex: Fix use-after-free in yurex_delete
Adrian Hunter <adrian.hunter(a)intel.com>
perf db-export: Fix thread__exec_comm()
Joerg Roedel <jroedel(a)suse.de>
mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()
Joerg Roedel <jroedel(a)suse.de>
x86/mm: Sync also unmappings in vmalloc_sync_all()
Joerg Roedel <jroedel(a)suse.de>
x86/mm: Check for pfn instead of page in vmalloc_sync_one()
Wenwen Wang <wenwen(a)cs.uga.edu>
sound: fix a memory leak bug
Oliver Neukum <oneukum(a)suse.com>
usb: iowarrior: fix deadlock on disconnect
-------------
Diffstat:
Makefile | 4 +-
arch/arm/mach-davinci/sleep.S | 1 +
arch/arm64/kernel/hw_breakpoint.c | 7 +--
arch/sh/kernel/hw_breakpoint.c | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/entry/vdso/vclock_gettime.c | 15 -------
arch/x86/entry/vdso/vdso-layout.lds.S | 5 +--
arch/x86/include/asm/clocksource.h | 3 +-
arch/x86/kernel/hpet.c | 1 -
arch/x86/kvm/trace.h | 3 +-
arch/x86/mm/fault.c | 15 +++----
drivers/ata/libahci_platform.c | 3 ++
drivers/ata/libata-zpodd.c | 2 +-
drivers/cpufreq/pasemi-cpufreq.c | 23 ++++------
drivers/firmware/Kconfig | 5 ++-
drivers/firmware/iscsi_ibft.c | 4 ++
drivers/hid/hid-holtek-kbd.c | 9 +++-
drivers/hid/usbhid/hiddev.c | 12 +++++
drivers/hwmon/nct6775.c | 3 +-
drivers/hwmon/nct7802.c | 6 +--
drivers/infiniband/core/addr.c | 15 +++----
drivers/infiniband/core/user_mad.c | 6 ++-
drivers/infiniband/hw/mlx5/mr.c | 6 +--
drivers/input/joystick/iforce/iforce-usb.c | 5 +++
drivers/input/mouse/trackpoint.h | 3 +-
drivers/input/tablet/kbtab.c | 6 ++-
drivers/iommu/amd_iommu_init.c | 2 +-
drivers/irqchip/irq-imx-gpcv2.c | 1 +
drivers/net/bonding/bond_main.c | 4 +-
drivers/net/can/usb/peak_usb/pcan_usb_core.c | 8 ++--
drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 2 +-
drivers/net/can/usb/peak_usb/pcan_usb_pro.c | 2 +-
.../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 3 ++
drivers/net/usb/pegasus.c | 2 +-
drivers/net/wireless/mwifiex/main.h | 1 +
drivers/net/wireless/mwifiex/scan.c | 3 +-
drivers/net/xen-netback/netback.c | 2 +
drivers/s390/cio/qdio_main.c | 12 ++---
drivers/scsi/fcoe/fcoe_ctlr.c | 33 ++++++--------
drivers/scsi/hpsa.c | 12 ++++-
drivers/scsi/ibmvscsi/ibmvfc.c | 2 +-
drivers/scsi/libfc/fc_rport.c | 5 ++-
drivers/scsi/megaraid/megaraid_sas_base.c | 3 ++
drivers/scsi/mpt3sas/mpt3sas_base.c | 12 ++---
drivers/staging/comedi/drivers/dt3000.c | 8 ++--
drivers/tty/tty_ldsem.c | 5 +--
drivers/usb/class/cdc-acm.c | 18 ++++----
drivers/usb/core/file.c | 10 ++---
drivers/usb/gadget/function/f_midi.c | 6 ++-
drivers/usb/gadget/u_f.h | 2 +
drivers/usb/misc/iowarrior.c | 7 +--
drivers/usb/misc/yurex.c | 2 +-
drivers/usb/serial/option.c | 10 +++++
drivers/xen/xen-pciback/conf_space_capability.c | 3 +-
fs/cifs/smb2pdu.c | 7 ++-
fs/ocfs2/xattr.c | 3 --
include/asm-generic/bug.h | 2 +-
include/asm-generic/getorder.h | 50 +++++++++------------
include/linux/compiler.h | 16 +++++++
include/linux/module.h | 4 +-
include/scsi/libfcoe.h | 1 +
include/sound/compress_driver.h | 5 +--
kernel/events/core.c | 2 +-
mm/memcontrol.c | 41 ++++++++++++-----
mm/vmalloc.c | 9 ++++
net/mac80211/driver-ops.c | 13 ++++--
net/mac80211/mlme.c | 10 +++++
net/netfilter/nfnetlink.c | 2 +-
net/packet/af_packet.c | 7 +++
net/sctp/sm_sideeffect.c | 2 +-
scripts/Makefile.modpost | 2 +-
sound/core/compress_offload.c | 52 +++++++++++++++++-----
sound/firewire/packets-buffer.c | 2 +-
sound/pci/hda/hda_generic.c | 2 +-
sound/sound_core.c | 3 +-
tools/perf/builtin-probe.c | 10 +++++
tools/perf/util/header.c | 9 +++-
tools/perf/util/thread.c | 12 ++++-
78 files changed, 387 insertions(+), 223 deletions(-)
This is the start of the stable review cycle for the 4.19.68 release.
There are 85 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat 24 Aug 2019 05:15:49 PM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.68-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.68-rc1
Michal Simek <michal.simek(a)xilinx.com>
mmc: sdhci-of-arasan: Do now show error message in case of deffered probe
Maxim Mikityanskiy <maximmi(a)mellanox.com>
net/mlx5e: Use flow keys dissector to parse packets for ARFS
Huy Nguyen <huyn(a)mellanox.com>
net/mlx5e: Only support tx/rx pause setting for port owner
Ross Lagerwall <ross.lagerwall(a)citrix.com>
xen/netback: Reset nr_frags before freeing skb
Chris Packham <chris.packham(a)alliedtelesis.co.nz>
tipc: initialise addr_trail_end when setting node addresses
YueHaibing <yuehaibing(a)huawei.com>
team: Add vlan tx offload to hw_enc_features
Xin Long <lucien.xin(a)gmail.com>
sctp: fix the transport error_count check
zhengbin <zhengbin13(a)huawei.com>
sctp: fix memleak in sctp_send_reset_streams
Eric Dumazet <edumazet(a)google.com>
net/packet: fix race in tpacket_snd()
Wenwen Wang <wenwen(a)cs.uga.edu>
net/mlx4_en: fix a memory leak bug
Chen-Yu Tsai <wens(a)csie.org>
net: dsa: Check existence of .port_mdb_add callback before calling it
YueHaibing <yuehaibing(a)huawei.com>
bonding: Add vlan tx offload to hw_enc_features
Manish Chopra <manishc(a)marvell.com>
bnx2x: Fix VF's VLAN reconfiguration in reload.
Joerg Roedel <jroedel(a)suse.de>
iommu/amd: Move iommu_init_pci() to .init section
YueHaibing <yuehaibing(a)huawei.com>
Input: psmouse - fix build error of multiple definition
Dirk Morris <dmorris(a)metaloft.com>
netfilter: conntrack: Use consistent ct id hash calculation
Will Deacon <will(a)kernel.org>
arm64: ftrace: Ensure module ftrace trampoline is coherent with I-side
Mike Snitzer <snitzer(a)redhat.com>
dm: disable DISCARD if the underlying storage no longer supports it
Rodrigo Vivi <rodrigo.vivi(a)intel.com>
drm/i915/cfl: Add a new CFL PCI ID.
Tony Lindgren <tony(a)atomide.com>
USB: serial: option: Add Motorola modem UARTs
Bob Ham <bob.ham(a)puri.sm>
USB: serial: option: add the BroadMobi BM818 card
Yoshiaki Okamoto <yokamoto(a)allied-telesis.co.jp>
USB: serial: option: Add support for ZTE MF871A
Rogan Dawes <rogan(a)dawes.za.net>
USB: serial: option: add D-Link DWM-222 device ID
Oliver Neukum <oneukum(a)suse.com>
USB: CDC: fix sanity checks in CDC union parser
Oliver Neukum <oneukum(a)suse.com>
usb: cdc-acm: make sure a refcount is taken early enough
Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
usb: gadget: udc: renesas_usb3: Fix sysfs interface of "role"
Alan Stern <stern(a)rowland.harvard.edu>
USB: core: Fix races in character device registration and deregistraion
Jacopo Mondi <jacopo+renesas(a)jmondi.org>
iio: adc: max9611: Fix temperature reading in probe
Ian Abbott <abbotti(a)mev.co.uk>
staging: comedi: dt3000: Fix rounding up of timer divisor
Ian Abbott <abbotti(a)mev.co.uk>
staging: comedi: dt3000: Fix signed integer overflow 'divider * base'
Marc Zyngier <maz(a)kernel.org>
KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block
Anders Roxell <anders.roxell(a)linaro.org>
arm64: KVM: regmap: Fix unexpected switch fall-through
Qian Cai <cai(a)lca.pw>
asm-generic: fix -Wtype-limits compiler warnings
YueHaibing <yuehaibing(a)huawei.com>
ocfs2: remove set but not used variable 'last_hash'
Yang Shi <yang.shi(a)linux.alibaba.com>
Revert "kmemleak: allow to coexist with fault injection"
Colin Ian King <colin.king(a)canonical.com>
drm/exynos: fix missing decrement of retry counter
Jeffrey Hugo <jeffrey.l.hugo(a)gmail.com>
drm: msm: Fix add_gpu_components
Jack Morgenstein <jackm(a)dev.mellanox.co.il>
IB/mad: Fix use-after-free in ib mad completion handling
Guy Levi <guyle(a)mellanox.com>
IB/mlx5: Fix MR registration flow to use UMR properly
Tony Luck <tony.luck(a)intel.com>
IB/core: Add mitigation for Spectre V1
Qian Cai <cai(a)lca.pw>
arm64/mm: fix variable 'pud' set but not used
Masami Hiramatsu <mhiramat(a)kernel.org>
arm64: unwind: Prohibit probing on return_address()
Qian Cai <cai(a)lca.pw>
arm64/efi: fix variable 'si' set but not used
Stephen Boyd <swboyd(a)chromium.org>
kbuild: Check for unknown options with cc-option usage in Kconfig and clang
Masahiro Yamada <yamada.masahiro(a)socionext.com>
kbuild: modpost: handle KBUILD_EXTRA_SYMBOLS only for external modules
Miquel Raynal <miquel.raynal(a)bootlin.com>
ata: libahci: do not complain in case of deferred probe
Wang Xiayang <xywang.sjtu(a)sjtu.edu.cn>
drm/amdgpu: fix a potential information leaking bug
Jia-Ju Bai <baijiaju1990(a)gmail.com>
scsi: qla2xxx: Fix possible fcport null-pointer dereferences
Don Brace <don.brace(a)microsemi.com>
scsi: hpsa: correct scsi command status issue after reset
Filipe Manana <fdmanana(a)suse.com>
Btrfs: fix deadlock between fiemap and transaction commits
YueHaibing <yuehaibing(a)huawei.com>
drm/bridge: lvds-encoder: Fix build error while CONFIG_DRM_KMS_HELPER=m
Kees Cook <keescook(a)chromium.org>
libata: zpodd: Fix small read overflow in zpodd_get_mech_type()
Numfor Mbiziwo-Tiapo <nums(a)google.com>
perf header: Fix use of unitialized value warning
Vince Weaver <vincent.weaver(a)maine.edu>
perf header: Fix divide by zero error if f_header.attr_size==0
Lucas Stach <l.stach(a)pengutronix.de>
irqchip/irq-imx-gpcv2: Forward irq type to parent
Nianyao Tang <tangnianyao(a)huawei.com>
irqchip/gic-v3-its: Free unused vpt_page when alloc vpe table fail
YueHaibing <yuehaibing(a)huawei.com>
xen/pciback: remove set but not used variable 'old_state'
Geert Uytterhoeven <geert+renesas(a)glider.be>
clk: renesas: cpg-mssr: Fix reset control race condition
Chunyan Zhang <chunyan.zhang(a)unisoc.com>
clk: sprd: Select REGMAP_MMIO to avoid compile errors
Codrin Ciubotariu <codrin.ciubotariu(a)microchip.com>
clk: at91: generated: Truncate divisor to GENERATED_MAX_DIV + 1
Vincent Chen <vincent.chen(a)sifive.com>
riscv: Make __fstate_clean() work correctly.
Florian Westphal <fw(a)strlen.de>
netfilter: ebtables: also count base chain policies
Denis Kirjanov <kda(a)linux-powerpc.org>
net: usb: pegasus: fix improper read if get_registers() fail
Oliver Neukum <oneukum(a)suse.com>
Input: iforce - add sanity checks
Oliver Neukum <oneukum(a)suse.com>
Input: kbtab - sanity check for endpoint type
Hillf Danton <hdanton(a)sina.com>
HID: hiddev: do cleanup in failure of opening a device
Hillf Danton <hdanton(a)sina.com>
HID: hiddev: avoid opening a disconnected device
Oliver Neukum <oneukum(a)suse.com>
HID: holtek: test for sanity of intfdata
Hui Wang <hui.wang(a)canonical.com>
ALSA: hda - Let all conexant codec enter D3 when rebooting
Hui Wang <hui.wang(a)canonical.com>
ALSA: hda - Add a generic reboot_notify
Wenwen Wang <wenwen(a)cs.uga.edu>
ALSA: hda - Fix a memory leak bug
Takashi Iwai <tiwai(a)suse.de>
ALSA: hda - Apply workaround for another AMD chip 1022:1487
Hui Peng <benquike(a)gmail.com>
ALSA: usb-audio: Fix an OOB bug in parse_audio_mixer_unit
Hui Peng <benquike(a)gmail.com>
ALSA: usb-audio: Fix a stack buffer overflow bug in check_input_term
Takashi Iwai <tiwai(a)suse.de>
ALSA: hda/realtek - Add quirk for HP Envy x360
Max Filippov <jcmvbkbc(a)gmail.com>
xtensa: add missing isync to the cpu_reset TLB code
Viresh Kumar <viresh.kumar(a)linaro.org>
cpufreq: schedutil: Don't skip freq update when limits change
Fabrice Gasnier <fabrice.gasnier(a)st.com>
Revert "pwm: Set class for exported channels in sysfs"
Isaac J. Manjarres <isaacm(a)codeaurora.org>
mm/usercopy: use memory range to be accessed for wraparound check
Miles Chen <miles.chen(a)mediatek.com>
mm/memcontrol.c: fix use after free in mem_cgroup_iter()
Yang Shi <yang.shi(a)linux.alibaba.com>
mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind
Yang Shi <yang.shi(a)linux.alibaba.com>
mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified
Ralph Campbell <rcampbell(a)nvidia.com>
mm/hmm: fix bad subpage pointer in try_to_unmap_one
NeilBrown <neilb(a)suse.com>
seq_file: fix problem when seeking mid-record
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
sh: kernel: hw_breakpoint: Fix missing break in switch statement
-------------
Diffstat:
Makefile | 4 +-
arch/arm64/include/asm/efi.h | 6 +-
arch/arm64/include/asm/pgtable.h | 4 +-
arch/arm64/kernel/ftrace.c | 21 +++--
arch/arm64/kernel/return_address.c | 3 +
arch/arm64/kernel/stacktrace.c | 3 +
arch/arm64/kvm/regmap.c | 5 ++
arch/riscv/include/asm/switch_to.h | 2 +-
arch/sh/kernel/hw_breakpoint.c | 1 +
arch/xtensa/kernel/setup.c | 1 +
drivers/ata/libahci_platform.c | 3 +
drivers/ata/libata-zpodd.c | 2 +-
drivers/clk/at91/clk-generated.c | 2 +
drivers/clk/renesas/renesas-cpg-mssr.c | 16 +---
drivers/clk/sprd/Kconfig | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
drivers/gpu/drm/bridge/Kconfig | 1 +
drivers/gpu/drm/exynos/exynos_drm_scaler.c | 4 +-
drivers/gpu/drm/msm/msm_drv.c | 3 +-
drivers/hid/hid-holtek-kbd.c | 9 +-
drivers/hid/usbhid/hiddev.c | 12 +++
drivers/iio/adc/max9611.c | 2 +-
drivers/infiniband/core/mad.c | 20 ++---
drivers/infiniband/core/user_mad.c | 6 +-
drivers/infiniband/hw/mlx5/mr.c | 27 ++----
drivers/input/joystick/iforce/iforce-usb.c | 5 ++
drivers/input/mouse/trackpoint.h | 3 +-
drivers/input/tablet/kbtab.c | 6 +-
drivers/iommu/amd_iommu_init.c | 2 +-
drivers/irqchip/irq-gic-v3-its.c | 2 +-
drivers/irqchip/irq-imx-gpcv2.c | 1 +
drivers/md/dm-core.h | 1 +
drivers/md/dm-rq.c | 11 ++-
drivers/md/dm.c | 20 ++++-
drivers/mmc/host/sdhci-of-arasan.c | 3 +-
drivers/net/bonding/bond_main.c | 2 +
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 7 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 2 +
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 17 ++--
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 3 +-
drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 97 +++++++-------------
.../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 3 +
drivers/net/team/team.c | 2 +
drivers/net/usb/pegasus.c | 2 +-
drivers/net/xen-netback/netback.c | 2 +
drivers/pwm/sysfs.c | 1 -
drivers/scsi/hpsa.c | 12 ++-
drivers/scsi/qla2xxx/qla_init.c | 2 +-
drivers/staging/comedi/drivers/dt3000.c | 8 +-
drivers/usb/class/cdc-acm.c | 12 +--
drivers/usb/core/file.c | 10 +--
drivers/usb/core/message.c | 4 +-
drivers/usb/gadget/udc/renesas_usb3.c | 5 +-
drivers/usb/serial/option.c | 10 +++
drivers/xen/xen-pciback/conf_space_capability.c | 3 +-
fs/btrfs/backref.c | 2 +-
fs/btrfs/transaction.c | 22 ++++-
fs/btrfs/transaction.h | 3 +
fs/ocfs2/xattr.c | 3 -
fs/seq_file.c | 2 +-
include/asm-generic/getorder.h | 50 +++++------
include/drm/i915_pciids.h | 1 +
include/kvm/arm_vgic.h | 1 +
kernel/sched/cpufreq_schedutil.c | 14 ++-
mm/kmemleak.c | 2 +-
mm/memcontrol.c | 39 +++++---
mm/mempolicy.c | 100 +++++++++++++++------
mm/rmap.c | 8 ++
mm/usercopy.c | 2 +-
net/bridge/netfilter/ebtables.c | 28 +++---
net/dsa/switch.c | 3 +
net/netfilter/nf_conntrack_core.c | 16 ++--
net/packet/af_packet.c | 7 ++
net/sctp/sm_sideeffect.c | 2 +-
net/sctp/stream.c | 1 +
net/tipc/addr.c | 1 +
scripts/Kconfig.include | 2 +-
scripts/Makefile.modpost | 2 +-
sound/pci/hda/hda_generic.c | 21 ++++-
sound/pci/hda/hda_generic.h | 1 +
sound/pci/hda/hda_intel.c | 3 +
sound/pci/hda/patch_conexant.c | 15 +---
sound/pci/hda/patch_realtek.c | 12 +--
sound/usb/mixer.c | 37 ++++++--
tools/perf/util/header.c | 9 +-
virt/kvm/arm/arm.c | 11 +++
virt/kvm/arm/vgic/vgic-v2.c | 9 +-
virt/kvm/arm/vgic/vgic-v3.c | 7 +-
virt/kvm/arm/vgic/vgic.c | 11 +++
virt/kvm/arm/vgic/vgic.h | 2 +
90 files changed, 551 insertions(+), 316 deletions(-)
If a program attempts to punch a hole on an inline data file, we need
to convert it to a normal file first.
This was detected using ext4/032 using the adv configuration. Simple
reproducer:
mke2fs -Fq -t ext4 -O inline_data /dev/vdc
mount /vdc
echo "" > /vdc/testfile
xfs_io -c 'truncate 33554432' /vdc/testfile
xfs_io -c 'fpunch 0 1048576' /vdc/testfile
umount /vdc
e2fsck -fy /dev/vdc
Cc: stable(a)vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
---
fs/ext4/inode.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2b1c58da8d1e..e567f0229d4e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4236,6 +4236,15 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
trace_ext4_punch_hole(inode, offset, length, 0);
+ ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
+ if (ext4_has_inline_data(inode)) {
+ down_write(&EXT4_I(inode)->i_mmap_sem);
+ ret = ext4_convert_inline_data(inode);
+ up_write(&EXT4_I(inode)->i_mmap_sem);
+ if (ret)
+ return ret;
+ }
+
/*
* Write out all dirty pages to avoid race conditions
* Then release them.
--
2.23.0
Before 32-bit support, pti_clone_pmds() always adds PMD_SIZE to addr.
This behavior changes after the 32-bit support: pti_clone_pgtable()
increases addr by PUD_SIZE for pud_none(*pud) case, and increases addr by
PMD_SIZE for pmd_none(*pmd) case. However, this is not accurate because
addr may not be PUD_SIZE/PMD_SIZE aligned.
Fix this issue by properly rounding up addr to next PUD_SIZE/PMD_SIZE
in these two cases.
The following explains how we debugged this issue:
We use huge page for hot text and thus reduces iTLB misses. As we
benchmark 5.2 based kernel (vs. 4.16 based), we found ~2.5x more
iTLB misses.
To figure out the issue, I use a debug patch that dumps page table for
a pid. The following are information from the workload pid.
For the 4.16 based kernel:
host-4.16 # grep "x pmd" /sys/kernel/debug/page_tables/dump_pid
0x0000000000600000-0x0000000000e00000 8M USR ro PSE x pmd
0xffffffff81a00000-0xffffffff81c00000 2M ro PSE x pmd
For the 5.2 based kernel before this patch:
host-5.2-before # grep "x pmd" /sys/kernel/debug/page_tables/dump_pid
0x0000000000600000-0x0000000000e00000 8M USR ro PSE x pmd
The 8MB text in pmd is from user space. 4.16 kernel has 1 pmd for the
irq entry table; while 4.16 kernel doesn't have it.
For the 5.2 based kernel after this patch:
host-5.2-after # grep "x pmd" /sys/kernel/debug/page_tables/dump_pid
0x0000000000600000-0x0000000000e00000 8M USR ro PSE x pmd
0xffffffff81000000-0xffffffff81e00000 14M ro PSE GLB x pmd
So after this patch, the 5.2 based kernel has 7 PMDs instead of 1 PMD
in 4.16 kernel. This further reduces iTLB miss rate
Cc: stable(a)vger.kernel.org # v4.19+
Fixes: 16a3fe634f6a ("x86/mm/pti: Clone kernel-image on PTE level for 32 bit")
Reviewed-by: Rik van Riel <riel(a)surriel.com>
Signed-off-by: Song Liu <songliubraving(a)fb.com>
Cc: Joerg Roedel <jroedel(a)suse.de>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
---
arch/x86/mm/pti.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index b196524759ec..1337494e22ef 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -330,13 +330,13 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
pud = pud_offset(p4d, addr);
if (pud_none(*pud)) {
- addr += PUD_SIZE;
+ addr = round_up(addr + 1, PUD_SIZE);
continue;
}
pmd = pmd_offset(pud, addr);
if (pmd_none(*pmd)) {
- addr += PMD_SIZE;
+ addr = round_up(addr + 1, PMD_SIZE);
continue;
}
--
2.17.1
Currently, we don't call dma_set_max_seg_size() for i915 because we
intentionally do not limit the segment length that the device supports.
However, this results in a warning being emitted if we try to map
anything larger than SZ_64K on a kernel with CONFIG_DMA_API_DEBUG_SG
enabled:
[ 7.751926] DMA-API: i915 0000:00:02.0: mapping sg segment longer
than device claims to support [len=98304] [max=65536]
[ 7.751934] WARNING: CPU: 5 PID: 474 at kernel/dma/debug.c:1220
debug_dma_map_sg+0x20f/0x340
This was originally brought up on
https://bugs.freedesktop.org/show_bug.cgi?id=108517 , and the consensus
there was it wasn't really useful to set a limit (and that dma-debug
isn't really all that useful for i915 in the first place). Unfortunately
though, CONFIG_DMA_API_DEBUG_SG is enabled in the debug configs for
various distro kernels. Since a WARN_ON() will disable automatic problem
reporting (and cause any CI with said option enabled to start
complaining), we really should just fix the problem.
Note that as me and Chris Wilson discussed, the other solution for this
would be to make DMA-API not make such assumptions when a driver hasn't
explicitly set a maximum segment size. But, taking a look at the commit
which originally introduced this behavior, commit 78c47830a5cb
("dma-debug: check scatterlist segments"), there is an explicit mention
of this assumption and how it applies to devices with no segment size:
Conversely, devices which are less limited than the rather
conservative defaults, or indeed have no limitations at all
(e.g. GPUs with their own internal MMU), should be encouraged to
set appropriate dma_parms, as they may get more efficient DMA
mapping performance out of it.
So unless there's any concerns (I'm open to discussion!), let's just
follow suite and call dma_set_max_seg_size() with UINT_MAX as our limit
to silence any warnings.
Changes since v3:
* Drop patch for enabling CONFIG_DMA_API_DEBUG_SG in CI. It looks like
just turning it on causes the kernel to spit out bogus WARN_ONs()
during some igt tests which would otherwise require teaching igt to
disable the various DMA-API debugging options causing this. This is
too much work to be worth it, since DMA-API debugging is useless for
us. So, we'll just settle with this single patch to squelch WARN_ONs()
during driver load for users that have CONFIG_DMA_API_DEBUG_SG turned
on for some reason.
* Move dma_set_max_seg_size() call into i915_driver_hw_probe() - Chris
Wilson
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Reviewed-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: <stable(a)vger.kernel.org> # v4.18+
---
drivers/gpu/drm/i915/i915_drv.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index b5b2a64753e6..020696726f9e 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1279,6 +1279,12 @@ static int i915_driver_hw_probe(struct drm_i915_private *dev_priv)
pci_set_master(pdev);
+ /*
+ * We don't have a max segment size, so set it to the max so sg's
+ * debugging layer doesn't complain
+ */
+ dma_set_max_seg_size(&pdev->dev, UINT_MAX);
+
/* overlay on gen2 is broken and can't address above 1G */
if (IS_GEN(dev_priv, 2)) {
ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(30));
--
2.21.0
Currently, we don't call dma_set_max_seg_size() for i915 because we
intentionally do not limit the segment length that the device supports.
However, this results in a warning being emitted if we try to map
anything larger than SZ_64K on a kernel with CONFIG_DMA_API_DEBUG_SG
enabled:
[ 7.751926] DMA-API: i915 0000:00:02.0: mapping sg segment longer
than device claims to support [len=98304] [max=65536]
[ 7.751934] WARNING: CPU: 5 PID: 474 at kernel/dma/debug.c:1220
debug_dma_map_sg+0x20f/0x340
This was originally brought up on
https://bugs.freedesktop.org/show_bug.cgi?id=108517 , and the consensus
there was it wasn't really useful to set a limit (and that dma-debug
isn't really all that useful for i915 in the first place). Unfortunately
though, CONFIG_DMA_API_DEBUG_SG is enabled in the debug configs for
various distro kernels. Since a WARN_ON() will disable automatic problem
reporting (and cause any CI with said option enabled to start
complaining), we really should just fix the problem.
Note that as me and Chris Wilson discussed, the other solution for this
would be to make DMA-API not make such assumptions when a driver hasn't
explicitly set a maximum segment size. But, taking a look at the commit
which originally introduced this behavior, commit 78c47830a5cb
("dma-debug: check scatterlist segments"), there is an explicit mention
of this assumption and how it applies to devices with no segment size:
Conversely, devices which are less limited than the rather
conservative defaults, or indeed have no limitations at all
(e.g. GPUs with their own internal MMU), should be encouraged to
set appropriate dma_parms, as they may get more efficient DMA
mapping performance out of it.
So unless there's any concerns (I'm open to discussion!), let's just
follow suite and call dma_set_max_seg_size() with UINT_MAX as our limit
to silence any warnings.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: <stable(a)vger.kernel.org> # v4.18+
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0b81e0b64393..a1475039d182 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3152,6 +3152,11 @@ static int ggtt_probe_hw(struct i915_ggtt *ggtt, struct intel_gt *gt)
if (ret)
return ret;
+ /* We don't have a max segment size, so set it to the max so sg's
+ * debugging layer doesn't complain
+ */
+ dma_set_max_seg_size(ggtt->vm.dma, UINT_MAX);
+
if ((ggtt->vm.total - 1) >> 32) {
DRM_ERROR("We never expected a Global GTT with more than 32bits"
" of address space! Found %lldM!\n",
--
2.21.0
Commit 871f1f2bcb01 ("platform/x86: intel_int0002_vgpio: Only implement
irq_set_wake on Bay Trail") removed the irq_set_wake method from the
struct irq_chip used on Cherry Trail, but it did not set
IRQCHIP_SKIP_SET_WAKE causing kernel/irq/manage.c: set_irq_wake_real()
to return -ENXIO.
This causes the kernel to no longer see PME events reported through the
INT0002 device as wakeup events. Which e.g. breaks wakeup by the (USB)
keyboard on many Cherry Trail 2-in-1 devices.
Cc: stable(a)vger.kernel.org
Fixes: 871f1f2bcb01 ("platform/x86: intel_int0002_vgpio: Only implement irq_set_wake on Bay Trail")
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/platform/x86/intel_int0002_vgpio.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/platform/x86/intel_int0002_vgpio.c b/drivers/platform/x86/intel_int0002_vgpio.c
index d9542c661ddc..9ea1a2a19f86 100644
--- a/drivers/platform/x86/intel_int0002_vgpio.c
+++ b/drivers/platform/x86/intel_int0002_vgpio.c
@@ -144,6 +144,7 @@ static struct irq_chip int0002_cht_irqchip = {
* No set_wake, on CHT the IRQ is typically shared with the ACPI SCI
* and we don't want to mess with the ACPI SCI irq settings.
*/
+ .flags = IRQCHIP_SKIP_SET_WAKE,
};
static const struct x86_cpu_id int0002_cpu_ids[] = {
--
2.22.0
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: b63f20a778c88b6a04458ed6ffc69da953d3a109
Gitweb: https://git.kernel.org/tip/b63f20a778c88b6a04458ed6ffc69da953d3a109
Author: Sean Christopherson <sean.j.christopherson(a)intel.com>
AuthorDate: Thu, 22 Aug 2019 14:11:22 -07:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Fri, 23 Aug 2019 17:38:13 +02:00
x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386
Use 'lea' instead of 'add' when adjusting %rsp in CALL_NOSPEC so as to
avoid clobbering flags.
KVM's emulator makes indirect calls into a jump table of sorts, where
the destination of the CALL_NOSPEC is a small blob of code that performs
fast emulation by executing the target instruction with fixed operands.
adcb_al_dl:
0x000339f8 <+0>: adc %dl,%al
0x000339fa <+2>: ret
A major motiviation for doing fast emulation is to leverage the CPU to
handle consumption and manipulation of arithmetic flags, i.e. RFLAGS is
both an input and output to the target of CALL_NOSPEC. Clobbering flags
results in all sorts of incorrect emulation, e.g. Jcc instructions often
take the wrong path. Sans the nops...
asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"
0x0003595a <+58>: mov 0xc0(%ebx),%eax
0x00035960 <+64>: mov 0x60(%ebx),%edx
0x00035963 <+67>: mov 0x90(%ebx),%ecx
0x00035969 <+73>: push %edi
0x0003596a <+74>: popf
0x0003596b <+75>: call *%esi
0x000359a0 <+128>: pushf
0x000359a1 <+129>: pop %edi
0x000359a2 <+130>: mov %eax,0xc0(%ebx)
0x000359b1 <+145>: mov %edx,0x60(%ebx)
ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
0x000359a8 <+136>: mov -0x10(%ebp),%eax
0x000359ab <+139>: and $0x8d5,%edi
0x000359b4 <+148>: and $0xfffff72a,%eax
0x000359b9 <+153>: or %eax,%edi
0x000359bd <+157>: mov %edi,0x4(%ebx)
For the most part this has gone unnoticed as emulation of guest code
that can trigger fast emulation is effectively limited to MMIO when
running on modern hardware, and MMIO is rarely, if ever, accessed by
instructions that affect or consume flags.
Breakage is almost instantaneous when running with unrestricted guest
disabled, in which case KVM must emulate all instructions when the guest
has invalid state, e.g. when the guest is in Big Real Mode during early
BIOS.
Fixes: 776b043848fd2 ("x86/retpoline: Add initial retpoline support")
Fixes: 1a29b5b7f347a ("KVM: x86: Make indirect calls in emulator speculation safe")
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20190822211122.27579-1-sean.j.christopherson@inte…
---
arch/x86/include/asm/nospec-branch.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 109f974..80bc209 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -192,7 +192,7 @@
" lfence;\n" \
" jmp 902b;\n" \
" .align 16\n" \
- "903: addl $4, %%esp;\n" \
+ "903: lea 4(%%esp), %%esp;\n" \
" pushl %[thunk_target];\n" \
" ret;\n" \
" .align 16\n" \
Hello,
We ran automated tests on a patchset that was proposed for merging into this
kernel tree. The patches were applied to:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Commit: aad39e30fb9e - Linux 5.2.9
The results of these automated tests are provided below.
Overall result: FAILED (see details below)
Merge: OK
Compile: OK
Tests: FAILED
All kernel binaries, config files, and logs are available for download here:
https://artifacts.cki-project.org/pipelines/116984
One or more kernel tests failed:
aarch64:
❌ LTP lite
❌ Loopdev Sanity
We hope that these logs can help you find the problem quickly. For the full
detail on our testing procedures, please scroll to the bottom of this message.
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Merge testing
-------------
We cloned this repository and checked out the following commit:
Repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Commit: aad39e30fb9e - Linux 5.2.9
We grabbed the 8a2474fee8e4 commit of the stable queue repository.
We then merged the patchset with `git am`:
keys-trusted-allow-module-init-if-tpm-is-inactive-or-deactivated.patch
sh-kernel-hw_breakpoint-fix-missing-break-in-switch-statement.patch
seq_file-fix-problem-when-seeking-mid-record.patch
mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one.patch
mm-mempolicy-make-the-behavior-consistent-when-mpol_mf_move-and-mpol_mf_strict-were-specified.patch
mm-mempolicy-handle-vma-with-unmovable-pages-mapped-correctly-in-mbind.patch
mm-z3fold.c-fix-z3fold_destroy_pool-ordering.patch
mm-z3fold.c-fix-z3fold_destroy_pool-race-condition.patch
mm-memcontrol.c-fix-use-after-free-in-mem_cgroup_iter.patch
mm-usercopy-use-memory-range-to-be-accessed-for-wraparound-check.patch
mm-vmscan-do-not-special-case-slab-reclaim-when-watermarks-are-boosted.patch
cpufreq-schedutil-don-t-skip-freq-update-when-limits-change.patch
drm-amdgpu-fix-gfx9-soft-recovery.patch
drm-nouveau-only-recalculate-pbn-vcpi-on-mode-connector-changes.patch
xtensa-add-missing-isync-to-the-cpu_reset-tlb-code.patch
arm64-ftrace-ensure-module-ftrace-trampoline-is-coherent-with-i-side.patch
alsa-hda-realtek-add-quirk-for-hp-envy-x360.patch
alsa-usb-audio-fix-a-stack-buffer-overflow-bug-in-check_input_term.patch
alsa-usb-audio-fix-an-oob-bug-in-parse_audio_mixer_unit.patch
alsa-hda-apply-workaround-for-another-amd-chip-1022-1487.patch
alsa-hda-fix-a-memory-leak-bug.patch
alsa-hda-add-a-generic-reboot_notify.patch
alsa-hda-let-all-conexant-codec-enter-d3-when-rebooting.patch
hid-holtek-test-for-sanity-of-intfdata.patch
hid-hiddev-avoid-opening-a-disconnected-device.patch
hid-hiddev-do-cleanup-in-failure-of-opening-a-device.patch
input-kbtab-sanity-check-for-endpoint-type.patch
input-iforce-add-sanity-checks.patch
net-usb-pegasus-fix-improper-read-if-get_registers-fail.patch
bpf-fix-access-to-skb_shared_info-gso_segs.patch
netfilter-ebtables-also-count-base-chain-policies.patch
riscv-correct-the-initialized-flow-of-fp-register.patch
riscv-make-__fstate_clean-work-correctly.patch
revert-i2c-imx-improve-the-error-handling-in-i2c_imx_dma_request.patch
blk-mq-move-cancel-of-requeue_work-to-the-front-of-blk_exit_queue.patch
io_uring-fix-manual-setup-of-iov_iter-for-fixed-buffers.patch
rdma-hns-fix-sg-offset-non-zero-issue.patch
ib-mlx5-replace-kfree-with-kvfree.patch
clk-at91-generated-truncate-divisor-to-generated_max.patch
clk-sprd-select-regmap_mmio-to-avoid-compile-errors.patch
clk-renesas-cpg-mssr-fix-reset-control-race-conditio.patch
dma-mapping-check-pfn-validity-in-dma_common_-mmap-g.patch
platform-x86-pcengines-apuv2-fix-softdep-statement.patch
platform-x86-intel_pmc_core-add-icl-nnpi-support-to-.patch
mm-hmm-always-return-ebusy-for-invalid-ranges-in-hmm.patch
xen-pciback-remove-set-but-not-used-variable-old_sta.patch
irqchip-gic-v3-its-free-unused-vpt_page-when-alloc-v.patch
irqchip-irq-imx-gpcv2-forward-irq-type-to-parent.patch
f2fs-fix-to-read-source-block-before-invalidating-it.patch
tools-perf-beauty-fix-usbdevfs_ioctl-table-generator.patch
perf-header-fix-divide-by-zero-error-if-f_header.att.patch
perf-header-fix-use-of-unitialized-value-warning.patch
rdma-qedr-fix-the-hca_type-and-hca_rev-returned-in-d.patch
alsa-pcm-fix-lost-wakeup-event-scenarios-in-snd_pcm_.patch
libata-zpodd-fix-small-read-overflow-in-zpodd_get_me.patch
powerpc-nvdimm-pick-nearby-online-node-if-the-device.patch
drm-bridge-lvds-encoder-fix-build-error-while-config.patch
drm-bridge-tc358764-fix-build-error.patch
btrfs-fix-deadlock-between-fiemap-and-transaction-co.patch
scsi-hpsa-correct-scsi-command-status-issue-after-re.patch
scsi-qla2xxx-fix-possible-fcport-null-pointer-derefe.patch
tracing-fix-header-include-guards-in-trace-event-hea.patch
drm-amdkfd-fix-byte-align-on-vegam.patch
drm-amd-powerplay-fix-null-pointer-dereference-aroun.patch
drm-amdgpu-fix-error-handling-in-amdgpu_cs_process_f.patch
drm-amdgpu-fix-a-potential-information-leaking-bug.patch
ata-libahci-do-not-complain-in-case-of-deferred-prob.patch
kbuild-modpost-handle-kbuild_extra_symbols-only-for-.patch
kbuild-check-for-unknown-options-with-cc-option-usag.patch
arm64-efi-fix-variable-si-set-but-not-used.patch
riscv-fix-perf-record-without-libelf-support.patch
arm64-lower-priority-mask-for-gic_prio_irqon.patch
arm64-unwind-prohibit-probing-on-return_address.patch
arm64-mm-fix-variable-pud-set-but-not-used.patch
arm64-mm-fix-variable-tag-set-but-not-used.patch
ib-core-add-mitigation-for-spectre-v1.patch
ib-mlx5-fix-mr-registration-flow-to-use-umr-properly.patch
rdma-restrack-track-driver-qp-types-in-resource-trac.patch
ib-mad-fix-use-after-free-in-ib-mad-completion-handl.patch
rdma-mlx5-release-locks-during-notifier-unregister.patch
drm-msm-fix-add_gpu_components.patch
rdma-hns-fix-error-return-code-in-hns_roce_v1_rsv_lp.patch
drm-exynos-fix-missing-decrement-of-retry-counter.patch
arm64-kprobes-recover-pstate.d-in-single-step-except.patch
arm64-make-debug-exception-handlers-visible-from-rcu.patch
revert-kmemleak-allow-to-coexist-with-fault-injectio.patch
ocfs2-remove-set-but-not-used-variable-last_hash.patch
page-flags-prioritize-kasan-bits-over-last-cpuid.patch
asm-generic-fix-wtype-limits-compiler-warnings.patch
tpm-tpm_ibm_vtpm-fix-unallocated-banks.patch
arm64-kvm-regmap-fix-unexpected-switch-fall-through.patch
staging-comedi-dt3000-fix-signed-integer-overflow-divider-base.patch
staging-comedi-dt3000-fix-rounding-up-of-timer-divisor.patch
iio-adc-max9611-fix-temperature-reading-in-probe.patch
usb-core-fix-races-in-character-device-registration-and-deregistraion.patch
usb-gadget-udc-renesas_usb3-fix-sysfs-interface-of-role.patch
usb-cdc-acm-make-sure-a-refcount-is-taken-early-enough.patch
usb-cdc-fix-sanity-checks-in-cdc-union-parser.patch
usb-serial-option-add-d-link-dwm-222-device-id.patch
usb-serial-option-add-support-for-zte-mf871a.patch
usb-serial-option-add-the-broadmobi-bm818-card.patch
usb-serial-option-add-motorola-modem-uarts.patch
usb-setup-authorized_default-attributes-using-usb_bus_notify.patch
netfilter-conntrack-use-consistent-ct-id-hash-calculation.patch
iwlwifi-add-support-for-sar-south-korea-limitation.patch
input-psmouse-fix-build-error-of-multiple-definition.patch
bnx2x-fix-vf-s-vlan-reconfiguration-in-reload.patch
bonding-add-vlan-tx-offload-to-hw_enc_features.patch
net-dsa-check-existence-of-.port_mdb_add-callback-before-calling-it.patch
net-mlx4_en-fix-a-memory-leak-bug.patch
net-packet-fix-race-in-tpacket_snd.patch
net-sched-sch_taprio-fix-memleak-in-error-path-for-sched-list-parse.patch
sctp-fix-memleak-in-sctp_send_reset_streams.patch
sctp-fix-the-transport-error_count-check.patch
team-add-vlan-tx-offload-to-hw_enc_features.patch
tipc-initialise-addr_trail_end-when-setting-node-addresses.patch
xen-netback-reset-nr_frags-before-freeing-skb.patch
net-mlx5e-only-support-tx-rx-pause-setting-for-port-owner.patch
bnxt_en-fix-vnic-clearing-logic-for-57500-chips.patch
bnxt_en-improve-rx-doorbell-sequence.patch
bnxt_en-fix-handling-frag_err-when-nvm_install_update-cmd-fails.patch
bnxt_en-suppress-hwrm-errors-for-hwrm_nvm_get_variable-command.patch
bnxt_en-use-correct-src_fid-to-determine-direction-of-the-flow.patch
bnxt_en-fix-to-include-flow-direction-in-l2-key.patch
net-sched-update-skbedit-action-for-batched-events-operations.patch
tc-testing-updated-skbedit-action-tests-with-batch-create-delete.patch
netdevsim-restore-per-network-namespace-accounting-for-fib-entries.patch
net-mlx5e-ethtool-avoid-setting-speed-to-56gbase-when-autoneg-off.patch
net-mlx5e-fix-false-negative-indication-on-tx-reporter-cqe-recovery.patch
net-mlx5e-remove-redundant-check-in-cqe-recovery-flow-of-tx-reporter.patch
net-mlx5e-use-flow-keys-dissector-to-parse-packets-for-arfs.patch
net-tls-prevent-skb_orphan-from-leaking-tls-plain-text-with-offload.patch
net-phy-consider-an_restart-status-when-reading-link-status.patch
netlink-fix-nlmsg_parse-as-a-wrapper-for-strict-message-parsing.patch
Compile testing
---------------
We compiled the kernel for 3 architectures:
aarch64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test [0]
✅ xfstests: xfs [1]
✅ selinux-policy: serge-testsuite [2]
✅ lvm thinp sanity [3]
✅ storage: software RAID testing [4]
🚧 ✅ Storage blktests [5]
Host 2:
✅ Boot test [0]
✅ Podman system integration test (as root) [6]
✅ Podman system integration test (as user) [6]
❌ LTP lite [7]
❌ Loopdev Sanity [8]
✅ jvm test suite [9]
✅ AMTU (Abstract Machine Test Utility) [10]
✅ LTP: openposix test suite [11]
✅ Ethernet drivers sanity [12]
✅ Networking socket: fuzz [13]
✅ Networking sctp-auth: sockopts test [14]
✅ Networking TCP: keepalive test [15]
✅ audit: audit testsuite test [16]
✅ httpd: mod_ssl smoke sanity [17]
✅ iotop: sanity [18]
✅ tuned: tune-processes-through-perf [19]
✅ Usex - version 1.9-29 [20]
✅ storage: SCSI VPD [21]
✅ stress: stress-ng [22]
ppc64le:
Host 1:
✅ Boot test [0]
✅ xfstests: xfs [1]
✅ selinux-policy: serge-testsuite [2]
✅ lvm thinp sanity [3]
✅ storage: software RAID testing [4]
🚧 ✅ Storage blktests [5]
Host 2:
✅ Boot test [0]
✅ Podman system integration test (as root) [6]
✅ Podman system integration test (as user) [6]
✅ LTP lite [7]
✅ Loopdev Sanity [8]
✅ jvm test suite [9]
✅ AMTU (Abstract Machine Test Utility) [10]
✅ LTP: openposix test suite [11]
✅ Ethernet drivers sanity [12]
✅ Networking socket: fuzz [13]
✅ Networking sctp-auth: sockopts test [14]
✅ Networking TCP: keepalive test [15]
✅ audit: audit testsuite test [16]
✅ httpd: mod_ssl smoke sanity [17]
✅ iotop: sanity [18]
✅ tuned: tune-processes-through-perf [19]
✅ Usex - version 1.9-29 [20]
x86_64:
Host 1:
✅ Boot test [0]
✅ xfstests: xfs [1]
✅ selinux-policy: serge-testsuite [2]
✅ lvm thinp sanity [3]
✅ storage: software RAID testing [4]
🚧 ✅ Storage blktests [5]
Host 2:
✅ Boot test [0]
✅ Podman system integration test (as root) [6]
✅ Podman system integration test (as user) [6]
✅ LTP lite [7]
✅ Loopdev Sanity [8]
✅ jvm test suite [9]
✅ AMTU (Abstract Machine Test Utility) [10]
✅ LTP: openposix test suite [11]
✅ Ethernet drivers sanity [12]
✅ Networking socket: fuzz [13]
✅ Networking sctp-auth: sockopts test [14]
✅ Networking TCP: keepalive test [15]
✅ audit: audit testsuite test [16]
✅ httpd: mod_ssl smoke sanity [17]
✅ iotop: sanity [18]
✅ tuned: tune-processes-through-perf [19]
✅ pciutils: sanity smoke test [23]
✅ Usex - version 1.9-29 [20]
✅ storage: SCSI VPD [21]
✅ stress: stress-ng [22]
Test source:
💚 Pull requests are welcome for new tests or improvements to existing tests!
[0]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[1]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/filesystems…
[2]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/packages/se…
[3]: https://github.com/CKI-project/tests-beaker/archive/master.zip#storage/lvm/…
[4]: https://github.com/CKI-project/tests-beaker/archive/master.zip#storage/swra…
[5]: https://github.com/CKI-project/tests-beaker/archive/master.zip#storage/blk
[6]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/container/p…
[7]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[8]: https://github.com/CKI-project/tests-beaker/archive/master.zip#filesystems/…
[9]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/jvm
[10]: https://github.com/CKI-project/tests-beaker/archive/master.zip#misc/amtu
[11]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[12]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/networking/…
[13]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/networking/…
[14]: https://github.com/CKI-project/tests-beaker/archive/master.zip#networking/s…
[15]: https://github.com/CKI-project/tests-beaker/archive/master.zip#networking/t…
[16]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/aud…
[17]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/htt…
[18]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/iot…
[19]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/tun…
[20]: https://github.com/CKI-project/tests-beaker/archive/master.zip#standards/us…
[21]: https://github.com/CKI-project/tests-beaker/archive/master.zip#storage/scsi…
[22]: https://github.com/CKI-project/tests-beaker/archive/master.zip#stress/stres…
[23]: https://github.com/CKI-project/tests-beaker/archive/master.zip#pciutils/san…
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Turns out a cfs_rq->runtime_remaining can become positive in
assign_cfs_rq_runtime(), but this codepath has no call to
unthrottle_cfs_rq().
This can leave us in a situation where we have a throttled cfs_rq with
positive ->runtime_remaining, which breaks the math in
distribute_cfs_runtime(): this function expects a negative value so that
it may safely negate it into a positive value.
Add the missing unthrottle_cfs_rq(). While at it, add a WARN_ON where
we expect negative values, and pull in a comment from the mailing list
that didn't make it in [1].
[1]: https://lkml.kernel.org/r/BANLkTi=NmCxKX6EbDQcJYDJ5kKyG2N1ssw@mail.gmail.com
Cc: <stable(a)vger.kernel.org>
Fixes: ec12cb7f31e2 ("sched: Accumulate per-cfs_rq cpu usage and charge against bandwidth")
Reported-by: Liangyan <liangyan.peng(a)linux.alibaba.com>
Signed-off-by: Valentin Schneider <valentin.schneider(a)arm.com>
---
kernel/sched/fair.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1054d2cf6aaa..219ff3f328e5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4385,6 +4385,11 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq)
return rq_clock_task(rq_of(cfs_rq)) - cfs_rq->throttled_clock_task_time;
}
+static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
+{
+ return cfs_bandwidth_used() && cfs_rq->throttled;
+}
+
/* returns 0 on failure to allocate runtime */
static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
{
@@ -4411,6 +4416,9 @@ static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
cfs_rq->runtime_remaining += amount;
+ if (cfs_rq->runtime_remaining > 0 && cfs_rq_throttled(cfs_rq))
+ unthrottle_cfs_rq(cfs_rq);
+
return cfs_rq->runtime_remaining > 0;
}
@@ -4439,11 +4447,6 @@ void account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec)
__account_cfs_rq_runtime(cfs_rq, delta_exec);
}
-static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
-{
- return cfs_bandwidth_used() && cfs_rq->throttled;
-}
-
/* check whether cfs_rq, or any parent, is throttled */
static inline int throttled_hierarchy(struct cfs_rq *cfs_rq)
{
@@ -4628,6 +4631,10 @@ static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b, u64 remaining)
if (!cfs_rq_throttled(cfs_rq))
goto next;
+ /* By the above check, this should never be true */
+ WARN_ON(cfs_rq->runtime_remaining > 0);
+
+ /* Pick the minimum amount to return to a positive quota state */
runtime = -cfs_rq->runtime_remaining + 1;
if (runtime > remaining)
runtime = remaining;
--
2.22.0
vgpu ppgtt notification was split into 2 steps, the first step is to
update PVINFO's pdp register and then write PVINFO's g2v_notify register
with action code to tirgger ppgtt notification to GVT side.
currently these steps were not atomic operations due to no any protection,
so it is easy to enter race condition state during the MTBF, stress and
IGT test to cause GPU hang.
the solution is to add a lock to make vgpu ppgtt notication as atomic
operation.
Cc: stable(a)vger.kernel.org
Signed-off-by: Xiaolin Zhang <xiaolin.zhang(a)intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 1 +
drivers/gpu/drm/i915/i915_gem_gtt.c | 4 ++++
drivers/gpu/drm/i915/i915_vgpu.c | 1 +
3 files changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index eb31c16..32e17c4 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -961,6 +961,7 @@ struct i915_frontbuffer_tracking {
};
struct i915_virtual_gpu {
+ struct mutex lock;
bool active;
u32 caps;
};
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 2cd2dab..1bb93b7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -833,6 +833,8 @@ static int gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool create)
enum vgt_g2v_type msg;
int i;
+ mutex_lock(&dev_priv->vgpu.lock);
+
if (create)
atomic_inc(px_used(ppgtt->pd)); /* never remove */
else
@@ -860,6 +862,8 @@ static int gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool create)
I915_WRITE(vgtif_reg(g2v_notify), msg);
+ mutex_lock(&dev_priv->vgpu.lock);
+
return 0;
}
diff --git a/drivers/gpu/drm/i915/i915_vgpu.c b/drivers/gpu/drm/i915/i915_vgpu.c
index bf2b837..7493544 100644
--- a/drivers/gpu/drm/i915/i915_vgpu.c
+++ b/drivers/gpu/drm/i915/i915_vgpu.c
@@ -94,6 +94,7 @@ void i915_detect_vgpu(struct drm_i915_private *dev_priv)
dev_priv->vgpu.caps = readl(shared_area + vgtif_offset(vgt_caps));
dev_priv->vgpu.active = true;
+ mutex_init(&dev_priv->vgpu.lock);
DRM_INFO("Virtual GPU for Intel GVT-g detected.\n");
out:
--
2.7.4
The patch titled
Subject: mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y
has been added to the -mm tree. Its filename is
mm-kasan-fix-false-positive-invalid-free-reports-with-config_kasan_sw_tags=y.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-kasan-fix-false-positive-invali…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-kasan-fix-false-positive-invali…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Subject: mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y
The code like this:
ptr = kmalloc(size, GFP_KERNEL);
page = virt_to_page(ptr);
offset = offset_in_page(ptr);
kfree(page_address(page) + offset);
may produce false-positive invalid-free reports on the kernel with
CONFIG_KASAN_SW_TAGS=y.
In the example above we lose the original tag assigned to 'ptr', so
kfree() gets the pointer with 0xFF tag. In kfree() we check that 0xFF tag
is different from the tag in shadow hence print false report.
Instead of just comparing tags, do the following:
1) Check that shadow doesn't contain KASAN_TAG_INVALID. Otherwise it's
double-free and it doesn't matter what tag the pointer have.
2) If pointer tag is different from 0xFF, make sure that tag in the
shadow is the same as in the pointer.
Link: http://lkml.kernel.org/r/20190819172540.19581-1-aryabinin@virtuozzo.com
Fixes: 7f94ffbc4c6a ("kasan: add hooks implementation for tag-based mode")
Signed-off-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Reported-by: Walter Wu <walter-zh.wu(a)mediatek.com>
Reported-by: Mark Rutland <mark.rutland(a)arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl(a)google.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/common.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
--- a/mm/kasan/common.c~mm-kasan-fix-false-positive-invalid-free-reports-with-config_kasan_sw_tags=y
+++ a/mm/kasan/common.c
@@ -407,8 +407,14 @@ static inline bool shadow_invalid(u8 tag
if (IS_ENABLED(CONFIG_KASAN_GENERIC))
return shadow_byte < 0 ||
shadow_byte >= KASAN_SHADOW_SCALE_SIZE;
- else
- return tag != (u8)shadow_byte;
+
+ /* else CONFIG_KASAN_SW_TAGS: */
+ if ((u8)shadow_byte == KASAN_TAG_INVALID)
+ return true;
+ if ((tag != KASAN_TAG_KERNEL) && (tag != (u8)shadow_byte))
+ return true;
+
+ return false;
}
static bool __kasan_slab_free(struct kmem_cache *cache, void *object,
_
Patches currently in -mm which might be from aryabinin(a)virtuozzo.com are
mm-kasan-fix-false-positive-invalid-free-reports-with-config_kasan_sw_tags=y.patch
mm-vmscan-remove-unused-lru_pages-argument.patch
On Thu, Aug 22, 2019 at 3:21 PM Andrew Morton <akpm(a)linux-foundation.org> wrote:
>
> On Wed, 21 Aug 2019 11:26:25 +0800 Joseph Qi <joseph.qi(a)linux.alibaba.com> wrote:
>
> > Only when calling the poll syscall the first time can user
> > receive POLLPRI correctly. After that, user always fails to
> > acquire the event signal.
> >
> > Reproduce case:
> > 1. Get the monitor code in Documentation/accounting/psi.txt
> > 2. Run it, and wait for the event triggered.
> > 3. Kill and restart the process.
> >
> > The question is why we can end up with poll_scheduled = 1 but the work
> > not running (which would reset it to 0). And the answer is because the
> > scheduling side sees group->poll_kworker under RCU protection and then
> > schedules it, but here we cancel the work and destroy the worker. The
> > cancel needs to pair with resetting the poll_scheduled flag.
>
> Should this be backported into -stable kernels?
Adding GregKH and stable(a)vger.kernel.org
I was able to cleanly apply this patch to stable master and
linux-5.2.y branches (these are the only branches that have psi
triggers).
Greg, Andrew got this patch into -mm tree. Please advise on how we
should proceed to land it in stable 5.2.y and master.
Thanks,
Suren.
The patch titled
Subject: mm, page_owner: handle THP splits correctly
has been added to the -mm tree. Its filename is
mm-page_owner-handle-thp-splits-correctly.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-page_owner-handle-thp-splits-co…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-page_owner-handle-thp-splits-co…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, page_owner: handle THP splits correctly
THP splitting path is missing the split_page_owner() call that
split_page() has. As a result, split THP pages are wrongly reported in
the page_owner file as order-9 pages. Furthermore when the former head
page is freed, the remaining former tail pages are not listed in the
page_owner file at all. This patch fixes that by adding the
split_page_owner() call into __split_huge_page().
Link: http://lkml.kernel.org/r/20190820131828.22684-2-vbabka@suse.cz
Fixes: a9627bc5e34e ("mm/page_owner: introduce split_page_owner and replace manual handling")
Reported-by: Kirill A. Shutemov <kirill(a)shutemov.name>
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/huge_memory.c~mm-page_owner-handle-thp-splits-correctly
+++ a/mm/huge_memory.c
@@ -32,6 +32,7 @@
#include <linux/shmem_fs.h>
#include <linux/oom.h>
#include <linux/numa.h>
+#include <linux/page_owner.h>
#include <asm/tlb.h>
#include <asm/pgalloc.h>
@@ -2516,6 +2517,9 @@ static void __split_huge_page(struct pag
}
ClearPageCompound(head);
+
+ split_page_owner(head, HPAGE_PMD_ORDER);
+
/* See comment in __split_huge_page_tail() */
if (PageAnon(head)) {
/* Additional pin to swap cache */
_
Patches currently in -mm which might be from vbabka(a)suse.cz are
mm-page_owner-handle-thp-splits-correctly.patch
mm-page_owner-record-page-owner-for-each-subpage.patch
mm-page_owner-keep-owner-info-when-freeing-the-page.patch
mm-page_owner-debug_pagealloc-save-and-dump-freeing-stack-trace.patch
mm-compaction-clear-total_migratefree_scanned-before-scanning-a-new-zone-fix-2.patch
mm-reclaim-cleanup-should_continue_reclaim.patch
mm-compaction-raise-compaction-priority-after-it-withdrawns.patch
The patch titled
Subject: userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx
has been added to the -mm tree. Its filename is
userfaultfd_release-always-remove-uffd-flags-and-clear-vm_userfaultfd_ctx.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/userfaultfd_release-always-remove-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/userfaultfd_release-always-remove-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Oleg Nesterov <oleg(a)redhat.com>
Subject: userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx
userfaultfd_release() should clear vm_flags/vm_userfaultfd_ctx even
if mm->core_state != NULL.
Otherwise a page fault can see userfaultfd_missing() == T and use an
already freed userfaultfd_ctx.
Link: http://lkml.kernel.org/r/20190820160237.GB4983@redhat.com
Fixes: 04f5866e41fb ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping")
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Reported-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Reviewed-by: Andrea Arcangeli <aarcange(a)redhat.com>
Tested-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/userfaultfd.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)
--- a/fs/userfaultfd.c~userfaultfd_release-always-remove-uffd-flags-and-clear-vm_userfaultfd_ctx
+++ a/fs/userfaultfd.c
@@ -880,6 +880,7 @@ static int userfaultfd_release(struct in
/* len == 0 means wake all */
struct userfaultfd_wake_range range = { .len = 0, };
unsigned long new_flags;
+ bool still_valid;
WRITE_ONCE(ctx->released, true);
@@ -895,8 +896,7 @@ static int userfaultfd_release(struct in
* taking the mmap_sem for writing.
*/
down_write(&mm->mmap_sem);
- if (!mmget_still_valid(mm))
- goto skip_mm;
+ still_valid = mmget_still_valid(mm);
prev = NULL;
for (vma = mm->mmap; vma; vma = vma->vm_next) {
cond_resched();
@@ -907,19 +907,20 @@ static int userfaultfd_release(struct in
continue;
}
new_flags = vma->vm_flags & ~(VM_UFFD_MISSING | VM_UFFD_WP);
- prev = vma_merge(mm, prev, vma->vm_start, vma->vm_end,
- new_flags, vma->anon_vma,
- vma->vm_file, vma->vm_pgoff,
- vma_policy(vma),
- NULL_VM_UFFD_CTX);
- if (prev)
- vma = prev;
- else
- prev = vma;
+ if (still_valid) {
+ prev = vma_merge(mm, prev, vma->vm_start, vma->vm_end,
+ new_flags, vma->anon_vma,
+ vma->vm_file, vma->vm_pgoff,
+ vma_policy(vma),
+ NULL_VM_UFFD_CTX);
+ if (prev)
+ vma = prev;
+ else
+ prev = vma;
+ }
vma->vm_flags = new_flags;
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
}
-skip_mm:
up_write(&mm->mmap_sem);
mmput(mm);
wakeup:
_
Patches currently in -mm which might be from oleg(a)redhat.com are
userfaultfd_release-always-remove-uffd-flags-and-clear-vm_userfaultfd_ctx.patch
aio-simplify-read_events.patch
Currently, we don't call dma_set_max_seg_size() for i915 because we
intentionally do not limit the segment length that the device supports.
However, this results in a warning being emitted if we try to map
anything larger than SZ_64K on a kernel with CONFIG_DMA_API_DEBUG_SG
enabled:
[ 7.751926] DMA-API: i915 0000:00:02.0: mapping sg segment longer
than device claims to support [len=98304] [max=65536]
[ 7.751934] WARNING: CPU: 5 PID: 474 at kernel/dma/debug.c:1220
debug_dma_map_sg+0x20f/0x340
This was originally brought up on
https://bugs.freedesktop.org/show_bug.cgi?id=108517 , and the consensus
there was it wasn't really useful to set a limit (and that dma-debug
isn't really all that useful for i915 in the first place). Unfortunately
though, CONFIG_DMA_API_DEBUG_SG is enabled in the debug configs for
various distro kernels. Since a WARN_ON() will disable automatic problem
reporting (and cause any CI with said option enabled to start
complaining), we really should just fix the problem.
Note that as me and Chris Wilson discussed, the other solution for this
would be to make DMA-API not make such assumptions when a driver hasn't
explicitly set a maximum segment size. But, taking a look at the commit
which originally introduced this behavior, commit 78c47830a5cb
("dma-debug: check scatterlist segments"), there is an explicit mention
of this assumption and how it applies to devices with no segment size:
Conversely, devices which are less limited than the rather
conservative defaults, or indeed have no limitations at all
(e.g. GPUs with their own internal MMU), should be encouraged to
set appropriate dma_parms, as they may get more efficient DMA
mapping performance out of it.
So unless there's any concerns (I'm open to discussion!), let's just
follow suite and call dma_set_max_seg_size() with UINT_MAX as our limit
to silence any warnings.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: <stable(a)vger.kernel.org> # v4.18+
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0b81e0b64393..a1475039d182 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3152,6 +3152,11 @@ static int ggtt_probe_hw(struct i915_ggtt *ggtt, struct intel_gt *gt)
if (ret)
return ret;
+ /* We don't have a max segment size, so set it to the max so sg's
+ * debugging layer doesn't complain
+ */
+ dma_set_max_seg_size(ggtt->vm.dma, UINT_MAX);
+
if ((ggtt->vm.total - 1) >> 32) {
DRM_ERROR("We never expected a Global GTT with more than 32bits"
" of address space! Found %lldM!\n",
--
2.21.0
pnv_tce() returns a pointer to a TCE entry and originally a TCE table
would be pre-allocated. For the default case of 2GB window the table
needs only a single level and that is fine. However if more levels are
requested, it is possible to get a race when 2 threads want a pointer
to a TCE entry from the same page of TCEs.
This adds cmpxchg to handle the race. Note that once TCE is non-zero,
it cannot become zero again.
CC: stable(a)vger.kernel.org # v4.19+
Fixes: a68bd1267b72 ("powerpc/powernv/ioda: Allocate indirect TCE levels on demand")
Signed-off-by: Alexey Kardashevskiy <aik(a)ozlabs.ru>
---
The race occurs about 30 times in the first 3 minutes of copying files
via rsync and that's about it.
This fixes EEH's from
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=110810
---
Changes:
v2:
* replaced spin_lock with cmpxchg+readonce
---
arch/powerpc/platforms/powernv/pci-ioda-tce.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index e28f03e1eb5e..8d6569590161 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -48,6 +48,9 @@ static __be64 *pnv_alloc_tce_level(int nid, unsigned int shift)
return addr;
}
+static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
+ unsigned long size, unsigned int levels);
+
static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)
{
__be64 *tmp = user ? tbl->it_userspace : (__be64 *) tbl->it_base;
@@ -57,9 +60,9 @@ static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)
while (level) {
int n = (idx & mask) >> (level * shift);
- unsigned long tce;
+ unsigned long oldtce, tce = be64_to_cpu(READ_ONCE(tmp[n]));
- if (tmp[n] == 0) {
+ if (!tce) {
__be64 *tmp2;
if (!alloc)
@@ -70,10 +73,15 @@ static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)
if (!tmp2)
return NULL;
- tmp[n] = cpu_to_be64(__pa(tmp2) |
- TCE_PCI_READ | TCE_PCI_WRITE);
+ tce = __pa(tmp2) | TCE_PCI_READ | TCE_PCI_WRITE;
+ oldtce = be64_to_cpu(cmpxchg(&tmp[n], 0,
+ cpu_to_be64(tce)));
+ if (oldtce) {
+ pnv_pci_ioda2_table_do_free_pages(tmp2,
+ ilog2(tbl->it_level_size) + 3, 1);
+ tce = oldtce;
+ }
}
- tce = be64_to_cpu(tmp[n]);
tmp = __va(tce & ~(TCE_PCI_READ | TCE_PCI_WRITE));
idx &= ~mask;
--
2.17.1
This is a note to let you know that I've just added the patch titled
staging: erofs: detect potential multiref due to corrupted images
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From e12a0ce2fa69798194f3a8628baf6edfbd5c548f Mon Sep 17 00:00:00 2001
From: Gao Xiang <gaoxiang25(a)huawei.com>
Date: Wed, 21 Aug 2019 22:01:52 +0800
Subject: staging: erofs: detect potential multiref due to corrupted images
As reported by erofs-utils fuzzer, currently, multiref
(ondisk deduplication) hasn't been supported for now,
we should forbid it properly.
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: <stable(a)vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25(a)huawei.com>
Reviewed-by: Chao Yu <yuchao0(a)huawei.com>
Link: https://lore.kernel.org/r/20190821140152.229648-1-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/erofs/zdata.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/drivers/staging/erofs/zdata.c b/drivers/staging/erofs/zdata.c
index 4d6faaab04f5..60d7c20db87d 100644
--- a/drivers/staging/erofs/zdata.c
+++ b/drivers/staging/erofs/zdata.c
@@ -798,6 +798,7 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
for (i = 0; i < nr_pages; ++i)
pages[i] = NULL;
+ err = 0;
z_erofs_pagevec_ctor_init(&ctor, Z_EROFS_NR_INLINE_PAGEVECS,
cl->pagevec, 0);
@@ -819,8 +820,17 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
pagenr = z_erofs_onlinepage_index(page);
DBG_BUGON(pagenr >= nr_pages);
- DBG_BUGON(pages[pagenr]);
+ /*
+ * currently EROFS doesn't support multiref(dedup),
+ * so here erroring out one multiref page.
+ */
+ if (unlikely(pages[pagenr])) {
+ DBG_BUGON(1);
+ SetPageError(pages[pagenr]);
+ z_erofs_onlinepage_endio(pages[pagenr]);
+ err = -EFSCORRUPTED;
+ }
pages[pagenr] = page;
}
z_erofs_pagevec_ctor_exit(&ctor, true);
@@ -828,7 +838,6 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
overlapped = false;
compressed_pages = pcl->compressed_pages;
- err = 0;
for (i = 0; i < clusterpages; ++i) {
unsigned int pagenr;
@@ -852,7 +861,12 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
pagenr = z_erofs_onlinepage_index(page);
DBG_BUGON(pagenr >= nr_pages);
- DBG_BUGON(pages[pagenr]);
+ if (unlikely(pages[pagenr])) {
+ DBG_BUGON(1);
+ SetPageError(pages[pagenr]);
+ z_erofs_onlinepage_endio(pages[pagenr]);
+ err = -EFSCORRUPTED;
+ }
pages[pagenr] = page;
overlapped = true;
--
2.23.0
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
We're not allowed to create new properties after device registration
so for MST connectors we need to either create the max_bpc property
earlier, or we reuse one we already have. Let's do the latter apporach
since the corresponding SST connector already has the prop and its
min/max are correct also for the MST connector.
The problem was highlighted by commit 4f5368b5541a ("drm/kms:
Catch mode_object lifetime errors") which results in the following
spew:
[ 1330.878941] WARNING: CPU: 2 PID: 1554 at drivers/gpu/drm/drm_mode_object.c:45 __drm_mode_object_add+0xa0/0xb0 [drm]
...
[ 1330.879008] Call Trace:
[ 1330.879023] drm_property_create+0xba/0x180 [drm]
[ 1330.879036] drm_property_create_range+0x15/0x30 [drm]
[ 1330.879048] drm_connector_attach_max_bpc_property+0x62/0x80 [drm]
[ 1330.879086] intel_dp_add_mst_connector+0x11f/0x140 [i915]
[ 1330.879094] drm_dp_add_port.isra.20+0x20b/0x440 [drm_kms_helper]
...
Cc: stable(a)vger.kernel.org
Cc: Lyude Paul <lyude(a)redhat.com>
Cc: sunpeng.li(a)amd.com
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Sean Paul <sean(a)poorly.run>
Fixes: 5ca0ef8a56b8 ("drm/i915: Add max_bpc property for DP MST")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/display/intel_dp_mst.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index 83faa246e361..9748581c1d62 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -536,7 +536,15 @@ static struct drm_connector *intel_dp_add_mst_connector(struct drm_dp_mst_topolo
intel_attach_force_audio_property(connector);
intel_attach_broadcast_rgb_property(connector);
- drm_connector_attach_max_bpc_property(connector, 6, 12);
+
+ /*
+ * Reuse the prop from the SST connector because we're
+ * not allowed to create new props after device registration.
+ */
+ connector->max_bpc_property =
+ intel_dp->attached_connector->base.max_bpc_property;
+ if (connector->max_bpc_property)
+ drm_connector_attach_max_bpc_property(connector, 6, 12);
return connector;
--
2.21.0
This is a note to let you know that I've just added the patch titled
usb-storage: Add new JMS567 revision to unusual_devs
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 08d676d1685c2a29e4d0e1b0242324e564d4589e Mon Sep 17 00:00:00 2001
From: Henk van der Laan <opensource(a)henkvdlaan.com>
Date: Fri, 16 Aug 2019 22:08:47 +0200
Subject: usb-storage: Add new JMS567 revision to unusual_devs
Revision 0x0117 suffers from an identical issue to earlier revisions,
therefore it should be added to the quirks list.
Signed-off-by: Henk van der Laan <opensource(a)henkvdlaan.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20190816200847.21366-1-opensource@henkvdlaan.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/storage/unusual_devs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/storage/unusual_devs.h b/drivers/usb/storage/unusual_devs.h
index ea0d27a94afe..1cd9b6305b06 100644
--- a/drivers/usb/storage/unusual_devs.h
+++ b/drivers/usb/storage/unusual_devs.h
@@ -2100,7 +2100,7 @@ UNUSUAL_DEV( 0x14cd, 0x6600, 0x0201, 0x0201,
US_FL_IGNORE_RESIDUE ),
/* Reported by Michael Büsch <m(a)bues.ch> */
-UNUSUAL_DEV( 0x152d, 0x0567, 0x0114, 0x0116,
+UNUSUAL_DEV( 0x152d, 0x0567, 0x0114, 0x0117,
"JMicron",
"USB to ATA/ATAPI Bridge",
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
--
2.23.0
This is a note to let you know that I've just added the patch titled
usb: chipidea: udc: don't do hardware access if gadget has stopped
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From cbe85c88ce80fb92956a0793518d415864dcead8 Mon Sep 17 00:00:00 2001
From: Peter Chen <peter.chen(a)nxp.com>
Date: Tue, 20 Aug 2019 02:07:58 +0000
Subject: usb: chipidea: udc: don't do hardware access if gadget has stopped
After _gadget_stop_activity is executed, we can consider the hardware
operation for gadget has finished, and the udc can be stopped and enter
low power mode. So, any later hardware operations (from usb_ep_ops APIs
or usb_gadget_ops APIs) should be considered invalid, any deinitializatons
has been covered at _gadget_stop_activity.
I meet this problem when I plug out usb cable from PC using mass_storage
gadget, my callstack like: vbus interrupt->.vbus_session->
composite_disconnect ->pm_runtime_put_sync(&_gadget->dev),
the composite_disconnect will call fsg_disable, but fsg_disable calls
usb_ep_disable using async way, there are register accesses for
usb_ep_disable. So sometimes, I get system hang due to visit register
without clock, sometimes not.
The Linux Kernel USB maintainer Alan Stern suggests this kinds of solution.
See: http://marc.info/?l=linux-usb&m=138541769810983&w=2.
Cc: <stable(a)vger.kernel.org> #v4.9+
Signed-off-by: Peter Chen <peter.chen(a)nxp.com>
Link: https://lore.kernel.org/r/20190820020503.27080-2-peter.chen@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/chipidea/udc.c | 32 ++++++++++++++++++++++++--------
1 file changed, 24 insertions(+), 8 deletions(-)
diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c
index 6a5ee8e6da10..67ad40b0a05b 100644
--- a/drivers/usb/chipidea/udc.c
+++ b/drivers/usb/chipidea/udc.c
@@ -709,12 +709,6 @@ static int _gadget_stop_activity(struct usb_gadget *gadget)
struct ci_hdrc *ci = container_of(gadget, struct ci_hdrc, gadget);
unsigned long flags;
- spin_lock_irqsave(&ci->lock, flags);
- ci->gadget.speed = USB_SPEED_UNKNOWN;
- ci->remote_wakeup = 0;
- ci->suspended = 0;
- spin_unlock_irqrestore(&ci->lock, flags);
-
/* flush all endpoints */
gadget_for_each_ep(ep, gadget) {
usb_ep_fifo_flush(ep);
@@ -732,6 +726,12 @@ static int _gadget_stop_activity(struct usb_gadget *gadget)
ci->status = NULL;
}
+ spin_lock_irqsave(&ci->lock, flags);
+ ci->gadget.speed = USB_SPEED_UNKNOWN;
+ ci->remote_wakeup = 0;
+ ci->suspended = 0;
+ spin_unlock_irqrestore(&ci->lock, flags);
+
return 0;
}
@@ -1303,6 +1303,10 @@ static int ep_disable(struct usb_ep *ep)
return -EBUSY;
spin_lock_irqsave(hwep->lock, flags);
+ if (hwep->ci->gadget.speed == USB_SPEED_UNKNOWN) {
+ spin_unlock_irqrestore(hwep->lock, flags);
+ return 0;
+ }
/* only internal SW should disable ctrl endpts */
@@ -1392,6 +1396,10 @@ static int ep_queue(struct usb_ep *ep, struct usb_request *req,
return -EINVAL;
spin_lock_irqsave(hwep->lock, flags);
+ if (hwep->ci->gadget.speed == USB_SPEED_UNKNOWN) {
+ spin_unlock_irqrestore(hwep->lock, flags);
+ return 0;
+ }
retval = _ep_queue(ep, req, gfp_flags);
spin_unlock_irqrestore(hwep->lock, flags);
return retval;
@@ -1415,8 +1423,8 @@ static int ep_dequeue(struct usb_ep *ep, struct usb_request *req)
return -EINVAL;
spin_lock_irqsave(hwep->lock, flags);
-
- hw_ep_flush(hwep->ci, hwep->num, hwep->dir);
+ if (hwep->ci->gadget.speed != USB_SPEED_UNKNOWN)
+ hw_ep_flush(hwep->ci, hwep->num, hwep->dir);
list_for_each_entry_safe(node, tmpnode, &hwreq->tds, td) {
dma_pool_free(hwep->td_pool, node->ptr, node->dma);
@@ -1487,6 +1495,10 @@ static void ep_fifo_flush(struct usb_ep *ep)
}
spin_lock_irqsave(hwep->lock, flags);
+ if (hwep->ci->gadget.speed == USB_SPEED_UNKNOWN) {
+ spin_unlock_irqrestore(hwep->lock, flags);
+ return;
+ }
hw_ep_flush(hwep->ci, hwep->num, hwep->dir);
@@ -1559,6 +1571,10 @@ static int ci_udc_wakeup(struct usb_gadget *_gadget)
int ret = 0;
spin_lock_irqsave(&ci->lock, flags);
+ if (ci->gadget.speed == USB_SPEED_UNKNOWN) {
+ spin_unlock_irqrestore(&ci->lock, flags);
+ return 0;
+ }
if (!ci->remote_wakeup) {
ret = -EOPNOTSUPP;
goto out;
--
2.23.0
This is a note to let you know that I've just added the patch titled
usbtmc: more sanity checking for packet size
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From de7b9aa633b693e77942e12f1769506efae6917b Mon Sep 17 00:00:00 2001
From: Oliver Neukum <oneukum(a)suse.com>
Date: Tue, 20 Aug 2019 11:28:25 +0200
Subject: usbtmc: more sanity checking for packet size
A malicious device can make the driver divide ny zero
with a nonsense maximum packet size.
Signed-off-by: Oliver Neukum <oneukum(a)suse.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20190820092826.17694-1-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/class/usbtmc.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/usb/class/usbtmc.c b/drivers/usb/class/usbtmc.c
index 4942122b2346..36858ddd8d9b 100644
--- a/drivers/usb/class/usbtmc.c
+++ b/drivers/usb/class/usbtmc.c
@@ -2362,8 +2362,11 @@ static int usbtmc_probe(struct usb_interface *intf,
goto err_put;
}
+ retcode = -EINVAL;
data->bulk_in = bulk_in->bEndpointAddress;
data->wMaxPacketSize = usb_endpoint_maxp(bulk_in);
+ if (!data->wMaxPacketSize)
+ goto err_put;
dev_dbg(&intf->dev, "Found bulk in endpoint at %u\n", data->bulk_in);
data->bulk_out = bulk_out->bEndpointAddress;
--
2.23.0
This is a note to let you know that I've just added the patch titled
staging: erofs: detect potential multiref due to corrupted images
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the staging-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From e12a0ce2fa69798194f3a8628baf6edfbd5c548f Mon Sep 17 00:00:00 2001
From: Gao Xiang <gaoxiang25(a)huawei.com>
Date: Wed, 21 Aug 2019 22:01:52 +0800
Subject: staging: erofs: detect potential multiref due to corrupted images
As reported by erofs-utils fuzzer, currently, multiref
(ondisk deduplication) hasn't been supported for now,
we should forbid it properly.
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: <stable(a)vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25(a)huawei.com>
Reviewed-by: Chao Yu <yuchao0(a)huawei.com>
Link: https://lore.kernel.org/r/20190821140152.229648-1-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/erofs/zdata.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/drivers/staging/erofs/zdata.c b/drivers/staging/erofs/zdata.c
index 4d6faaab04f5..60d7c20db87d 100644
--- a/drivers/staging/erofs/zdata.c
+++ b/drivers/staging/erofs/zdata.c
@@ -798,6 +798,7 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
for (i = 0; i < nr_pages; ++i)
pages[i] = NULL;
+ err = 0;
z_erofs_pagevec_ctor_init(&ctor, Z_EROFS_NR_INLINE_PAGEVECS,
cl->pagevec, 0);
@@ -819,8 +820,17 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
pagenr = z_erofs_onlinepage_index(page);
DBG_BUGON(pagenr >= nr_pages);
- DBG_BUGON(pages[pagenr]);
+ /*
+ * currently EROFS doesn't support multiref(dedup),
+ * so here erroring out one multiref page.
+ */
+ if (unlikely(pages[pagenr])) {
+ DBG_BUGON(1);
+ SetPageError(pages[pagenr]);
+ z_erofs_onlinepage_endio(pages[pagenr]);
+ err = -EFSCORRUPTED;
+ }
pages[pagenr] = page;
}
z_erofs_pagevec_ctor_exit(&ctor, true);
@@ -828,7 +838,6 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
overlapped = false;
compressed_pages = pcl->compressed_pages;
- err = 0;
for (i = 0; i < clusterpages; ++i) {
unsigned int pagenr;
@@ -852,7 +861,12 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
pagenr = z_erofs_onlinepage_index(page);
DBG_BUGON(pagenr >= nr_pages);
- DBG_BUGON(pages[pagenr]);
+ if (unlikely(pages[pagenr])) {
+ DBG_BUGON(1);
+ SetPageError(pages[pagenr]);
+ z_erofs_onlinepage_endio(pages[pagenr]);
+ err = -EFSCORRUPTED;
+ }
pages[pagenr] = page;
overlapped = true;
--
2.23.0
Detecting the ATS capability of the SMMU at probe time introduces a
spinlock into the ->unmap() fast path, even when ATS is not actually
in use. Furthermore, the ATC invalidation that exists is broken, as it
occurs before invalidation of the main SMMU TLB which leaves a window
where the ATC can be repopulated with stale entries.
Given that ATS is both a new feature and a specialist sport, disable it
for now whilst we fix it properly in subsequent patches. Since PRI
requires ATS, disable that too.
Cc: <stable(a)vger.kernel.org>
Fixes: 9ce27afc0830 ("iommu/arm-smmu-v3: Add support for PCI ATS")
Signed-off-by: Will Deacon <will(a)kernel.org>
---
drivers/iommu/arm-smmu-v3.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 3402b1bc8e94..7a368059cd7d 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -3295,11 +3295,13 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
}
/* Boolean feature flags */
+#if 0 /* ATS invalidation is slow and broken */
if (IS_ENABLED(CONFIG_PCI_PRI) && reg & IDR0_PRI)
smmu->features |= ARM_SMMU_FEAT_PRI;
if (IS_ENABLED(CONFIG_PCI_ATS) && reg & IDR0_ATS)
smmu->features |= ARM_SMMU_FEAT_ATS;
+#endif
if (reg & IDR0_SEV)
smmu->features |= ARM_SMMU_FEAT_SEV;
--
2.11.0
This is a note to let you know that I've just added the patch titled
staging: erofs: cannot set EROFS_V_Z_INITED_BIT if fill_inode_lazy
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 3407a4198faf01c9d7596c45b8606834b8dfc2b7 Mon Sep 17 00:00:00 2001
From: Gao Xiang <gaoxiang25(a)huawei.com>
Date: Mon, 19 Aug 2019 18:34:22 +0800
Subject: staging: erofs: cannot set EROFS_V_Z_INITED_BIT if fill_inode_lazy
fails
As reported by erofs-utils fuzzer, unsupported compressed
clustersize will make fill_inode_lazy fail, for such case
we cannot set EROFS_V_Z_INITED_BIT since we need return
failure for each z_erofs_map_blocks_iter().
Fixes: 152a333a5895 ("staging: erofs: add compacted compression indexes support")
Cc: <stable(a)vger.kernel.org> # 5.3+
Signed-off-by: Gao Xiang <gaoxiang25(a)huawei.com>
Reviewed-by: Chao Yu <yuchao0(a)huawei.com>
Link: https://lore.kernel.org/r/20190819103426.87579-3-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/erofs/zmap.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/staging/erofs/zmap.c b/drivers/staging/erofs/zmap.c
index b61b9b5950ac..7408e86823a4 100644
--- a/drivers/staging/erofs/zmap.c
+++ b/drivers/staging/erofs/zmap.c
@@ -85,12 +85,11 @@ static int fill_inode_lazy(struct inode *inode)
vi->z_physical_clusterbits[1] = vi->z_logical_clusterbits +
((h->h_clusterbits >> 5) & 7);
+ set_bit(EROFS_V_Z_INITED_BIT, &vi->flags);
unmap_done:
kunmap_atomic(kaddr);
unlock_page(page);
put_page(page);
-
- set_bit(EROFS_V_Z_INITED_BIT, &vi->flags);
out_unlock:
clear_and_wake_up_bit(EROFS_V_BL_Z_BIT, &vi->flags);
return err;
--
2.23.0
This is a note to let you know that I've just added the patch titled
staging: erofs: avoid endless loop of invalid lookback distance 0
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 598bb8913d015150b7734b55443c0e53e7189fc7 Mon Sep 17 00:00:00 2001
From: Gao Xiang <gaoxiang25(a)huawei.com>
Date: Mon, 19 Aug 2019 18:34:26 +0800
Subject: staging: erofs: avoid endless loop of invalid lookback distance 0
As reported by erofs-utils fuzzer, Lookback distance should
be a positive number, so it should be actually looked back
rather than spinning.
Fixes: 02827e1796b3 ("staging: erofs: add erofs_map_blocks_iter")
Cc: <stable(a)vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25(a)huawei.com>
Reviewed-by: Chao Yu <yuchao0(a)huawei.com>
Link: https://lore.kernel.org/r/20190819103426.87579-7-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/erofs/zmap.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/staging/erofs/zmap.c b/drivers/staging/erofs/zmap.c
index 7408e86823a4..774dacbc5b32 100644
--- a/drivers/staging/erofs/zmap.c
+++ b/drivers/staging/erofs/zmap.c
@@ -350,6 +350,12 @@ static int vle_extent_lookback(struct z_erofs_maprecorder *m,
switch (m->type) {
case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
+ if (unlikely(!m->delta[0])) {
+ errln("invalid lookback distance 0 at nid %llu",
+ vi->nid);
+ DBG_BUGON(1);
+ return -EFSCORRUPTED;
+ }
return vle_extent_lookback(m, m->delta[0]);
case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
map->m_flags &= ~EROFS_MAP_ZIPPED;
--
2.23.0
This is a note to let you know that I've just added the patch titled
staging: erofs: add two missing erofs_workgroup_put for corrupted
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 138e1a0990e80db486ab9f6c06bd5c01f9a97999 Mon Sep 17 00:00:00 2001
From: Gao Xiang <gaoxiang25(a)huawei.com>
Date: Mon, 19 Aug 2019 18:34:23 +0800
Subject: staging: erofs: add two missing erofs_workgroup_put for corrupted
images
As reported by erofs-utils fuzzer, these error handling
path will be entered to handle corrupted images.
Lack of erofs_workgroup_puts will cause unmounting
unsuccessfully.
Fix these return values to EFSCORRUPTED as well.
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: <stable(a)vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25(a)huawei.com>
Reviewed-by: Chao Yu <yuchao0(a)huawei.com>
Link: https://lore.kernel.org/r/20190819103426.87579-4-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/erofs/zdata.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/erofs/zdata.c b/drivers/staging/erofs/zdata.c
index 87b0c96caf8f..23283c97fd3b 100644
--- a/drivers/staging/erofs/zdata.c
+++ b/drivers/staging/erofs/zdata.c
@@ -357,14 +357,16 @@ static struct z_erofs_collection *cllookup(struct z_erofs_collector *clt,
cl = z_erofs_primarycollection(pcl);
if (unlikely(cl->pageofs != (map->m_la & ~PAGE_MASK))) {
DBG_BUGON(1);
- return ERR_PTR(-EIO);
+ erofs_workgroup_put(grp);
+ return ERR_PTR(-EFSCORRUPTED);
}
length = READ_ONCE(pcl->length);
if (length & Z_EROFS_PCLUSTER_FULL_LENGTH) {
if ((map->m_llen << Z_EROFS_PCLUSTER_LENGTH_BIT) > length) {
DBG_BUGON(1);
- return ERR_PTR(-EIO);
+ erofs_workgroup_put(grp);
+ return ERR_PTR(-EFSCORRUPTED);
}
} else {
unsigned int llen = map->m_llen << Z_EROFS_PCLUSTER_LENGTH_BIT;
--
2.23.0
This is a note to let you know that I've just added the patch titled
staging: erofs: some compressed cluster should be submitted for
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From ee45197c807895e156b2be0abcaebdfc116487c8 Mon Sep 17 00:00:00 2001
From: Gao Xiang <gaoxiang25(a)huawei.com>
Date: Mon, 19 Aug 2019 18:34:21 +0800
Subject: staging: erofs: some compressed cluster should be submitted for
corrupted images
As reported by erofs_utils fuzzer, a logical page can belong
to at most 2 compressed clusters, if one compressed cluster
is corrupted, but the other has been ready in submitting chain.
The chain needs to submit anyway in order to keep the page
working properly (page unlocked with PG_error set, PG_uptodate
not set).
Let's fix it now.
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: <stable(a)vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25(a)huawei.com>
Reviewed-by: Chao Yu <yuchao0(a)huawei.com>
Link: https://lore.kernel.org/r/20190819103426.87579-2-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/erofs/zdata.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/staging/erofs/zdata.c b/drivers/staging/erofs/zdata.c
index 2d7aaf98f7de..87b0c96caf8f 100644
--- a/drivers/staging/erofs/zdata.c
+++ b/drivers/staging/erofs/zdata.c
@@ -1307,19 +1307,18 @@ static int z_erofs_vle_normalaccess_readpage(struct file *file,
err = z_erofs_do_read_page(&f, page, &pagepool);
(void)z_erofs_collector_end(&f.clt);
- if (err) {
+ /* if some compressed cluster ready, need submit them anyway */
+ z_erofs_submit_and_unzip(inode->i_sb, &f.clt, &pagepool, true);
+
+ if (err)
errln("%s, failed to read, err [%d]", __func__, err);
- goto out;
- }
- z_erofs_submit_and_unzip(inode->i_sb, &f.clt, &pagepool, true);
-out:
if (f.map.mpage)
put_page(f.map.mpage);
/* clean up the remaining free pages */
put_pages_list(&pagepool);
- return 0;
+ return err;
}
static bool should_decompress_synchronously(struct erofs_sb_info *sbi,
--
2.23.0
This is a note to let you know that I've just added the patch titled
staging: erofs: fix an error handling in erofs_readdir()
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From acb383f1dcb4f1e79b66d4be3a0b6f519a957b0d Mon Sep 17 00:00:00 2001
From: Gao Xiang <gaoxiang25(a)huawei.com>
Date: Sun, 18 Aug 2019 20:54:57 +0800
Subject: staging: erofs: fix an error handling in erofs_readdir()
Richard observed a forever loop of erofs_read_raw_page() [1]
which can be generated by forcely setting ->u.i_blkaddr
to 0xdeadbeef (as my understanding block layer can
handle access beyond end of device correctly).
After digging into that, it seems the problem is highly
related with directories and then I found the root cause
is an improper error handling in erofs_readdir().
Let's fix it now.
[1] https://lore.kernel.org/r/1163995781.68824.1566084358245.JavaMail.zimbra@no…
Reported-by: Richard Weinberger <richard(a)nod.at>
Fixes: 3aa8ec716e52 ("staging: erofs: add directory operations")
Cc: <stable(a)vger.kernel.org> # 4.19+
Reviewed-by: Chao Yu <yuchao0(a)huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25(a)huawei.com>
Link: https://lore.kernel.org/r/20190818125457.25906-1-hsiangkao@aol.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/erofs/dir.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/erofs/dir.c b/drivers/staging/erofs/dir.c
index 5f38382637e6..77ef856df9f3 100644
--- a/drivers/staging/erofs/dir.c
+++ b/drivers/staging/erofs/dir.c
@@ -82,8 +82,15 @@ static int erofs_readdir(struct file *f, struct dir_context *ctx)
unsigned int nameoff, maxsize;
dentry_page = read_mapping_page(mapping, i, NULL);
- if (IS_ERR(dentry_page))
- continue;
+ if (dentry_page == ERR_PTR(-ENOMEM)) {
+ err = -ENOMEM;
+ break;
+ } else if (IS_ERR(dentry_page)) {
+ errln("fail to readdir of logical block %u of nid %llu",
+ i, EROFS_V(dir)->nid);
+ err = -EFSCORRUPTED;
+ break;
+ }
de = (struct erofs_dirent *)kmap(dentry_page);
--
2.23.0