Hi Greg,
Upstream commit 821cdad5c46c ("PCI: Wait up to 60 seconds for
device to become ready after FLR") fixes a virtualization issue
with the Intel 750 NVMe drive, and potentially with other PCIe devices that
take longer to recover from function level resets.
The problem description below is from the commit:
'Sporadic reset issues have been observed with an Intel 750 NVMe drive while
assigning the physical function to the guest machine. The sequence of
events observed is as follows:
- perform a Function Level Reset (FLR)
- sleep up to 1000ms total
- read ~0 from PCI_COMMAND (CRS completion for config read)
- warn that the device didn't return from FLR
- touch the device before it's ready
- device drops config writes when we restore register settings (there's
no mechanism for software to learn about CRS completions for writes)
- incomplete register restore leaves device in inconsistent state
- device probe fails because device is in inconsistent state
After reset, an endpoint may respond to config requests with Configuration
Request Retry Status (CRS) to indicate that it is not ready to accept new
requests. See PCIe r3.1, sec 2.3.1 and 6.6.2.'
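The sequence above comes down to how long software keeps polling before it touches the device. The following is a minimal userspace sketch of polling with exponential backoff, not the code from the upstream commit: struct fake_dev, fake_read_command(), fake_sleep_ms() and wait_for_device() are hypothetical names, and the device and clock are simulated.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical simulated device: it answers config reads with all ones
 * (what software sees for a CRS completion) until ready_at_ms
 * milliseconds have passed since the reset.
 */
struct fake_dev {
	unsigned int ready_at_ms;
	unsigned int now_ms;	/* simulated clock, not real time */
};

/* Stand-in for reading PCI_COMMAND from config space. */
static uint32_t fake_read_command(const struct fake_dev *dev)
{
	return dev->now_ms >= dev->ready_at_ms ? UINT32_C(0x0006) : UINT32_MAX;
}

/* Stand-in for a sleep: only advances the simulated clock. */
static void fake_sleep_ms(struct fake_dev *dev, unsigned int ms)
{
	dev->now_ms += ms;
}

/*
 * Poll with exponential backoff. Returns true once the device answers
 * with something other than all ones, false if it is still not ready
 * after at least timeout_ms of waiting.
 */
static bool wait_for_device(struct fake_dev *dev, unsigned int timeout_ms)
{
	unsigned int delay = 1;
	unsigned int waited = 0;

	while (fake_read_command(dev) == UINT32_MAX) {
		if (waited >= timeout_ms)
			return false;
		fake_sleep_ms(dev, delay);
		waited += delay;
		delay *= 2;
	}
	return true;
}
```

With a simulated device that needs 5000 ms to recover, wait_for_device(&dev, 1000) gives up while wait_for_device(&dev, 60000) succeeds, which mirrors the difference between the old 1000 ms limit and the 60 second limit.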
Please apply commit 821cdad5c46c to fix the resulting regression.
Thanks,
Sinan
--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Memory hotplug and hotremove operate with per-block granularity. If a
machine has a large amount of memory (more than 64G), a memory block
can span multiple sections. By mistake, during hotremove we set
only the first section to the offline state.
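The effect can be reproduced in a small userspace model. This is only an illustration: the section_online[] array, NR_SECTIONS, the PAGES_PER_SECTION value and the helper names are assumptions made for the sketch, and pfn_to_section_nr() here is a simplified stand-in for the kernel helper.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; the real ones depend on the architecture. */
#define PAGES_PER_SECTION 32768UL
#define NR_SECTIONS 8

static bool section_online[NR_SECTIONS];

/* Simplified stand-in for the kernel helper of the same name. */
static unsigned long pfn_to_section_nr(unsigned long pfn)
{
	return pfn / PAGES_PER_SECTION;
}

static void online_all(void)
{
	for (size_t i = 0; i < NR_SECTIONS; i++)
		section_online[i] = true;
}

static unsigned int count_online(void)
{
	unsigned int n = 0;

	for (size_t i = 0; i < NR_SECTIONS; i++)
		n += section_online[i];
	return n;
}

/* Buggy loop: the section number is always computed from start_pfn. */
static void offline_sections_buggy(unsigned long start_pfn, unsigned long end_pfn)
{
	for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION)
		section_online[pfn_to_section_nr(start_pfn)] = false;
}

/* Fixed loop: the section number follows the loop variable. */
static void offline_sections_fixed(unsigned long start_pfn, unsigned long end_pfn)
{
	for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION)
		section_online[pfn_to_section_nr(pfn)] = false;
}
```

For a block spanning four sections, the buggy loop clears only the first entry and leaves the other three marked online, while the fixed loop clears all four.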
The bug was discovered because a kernel selftest started to fail after
commit "mm/memory_hotplug: optimize probe routine":
https://lkml.kernel.org/r/20180423011247.GK5563@yexl-desktop
The bug is older than that commit, however. That optimization also added
a check that sections are in the proper state during a hotplug operation.
Fixes: 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes")
Signed-off-by: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
---
mm/sparse.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/sparse.c b/mm/sparse.c
index 62eef264a7bd..73dc2fcc0eab 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -629,7 +629,7 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
unsigned long pfn;
for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
- unsigned long section_nr = pfn_to_section_nr(start_pfn);
+ unsigned long section_nr = pfn_to_section_nr(pfn);
struct mem_section *ms;
/*
--
2.17.0
Add support to specify platform specific transition_delay_us instead
of using the transition delay derived from PCC.
Since commit 3d41386d556d ("cpufreq: CPPC: Use transition_delay_us
depending transition_latency") we set transition_delay_us directly
and do not apply the LATENCY_MULTIPLIER. As a result, on Qualcomm
Centriq we can end up with a very high rate of frequency change requests
when using the schedutil governor (default rate_limit_us=10 compared to
an earlier value of 10000).
The PCC subspace describes the rate at which the platform can accept
commands on the CPPC's PCC channel. This includes read and write
commands on the PCC channel that can be used for reasons other than
frequency transitions. Moreover, the same PCC subspace can be used by
multiple freq domains, so deriving transition_delay_us from it as we do
now can be sub-optimal.
Furthermore, if a platform does not use PCC for the desired_perf register,
there is no way to compute the transition latency or the delay_us.
CPPC does not have a standard-defined mechanism to get the transition
rate or the latency at the moment.
Given the above limitations, it is simpler to have a platform specific
transition_delay_us and rely on PCC derived value only if a platform
specific value is not available.
Signed-off-by: Prashanth Prakash <pprakash(a)codeaurora.org>
Cc: Viresh Kumar <viresh.kumar(a)linaro.org>
Cc: Rafael J. Wysocki <rjw(a)rjwysocki.net>
Cc: 4.14+ <stable(a)vger.kernel.org>
Fixes: 3d41386d556d ("cpufreq: CPPC: Use transition_delay_us depending transition_latency")
---
v2:
* Return final delay_us from cppc_cpufreq_get_transition_delay_us (Viresh)
---
drivers/cpufreq/cppc_cpufreq.c | 43 ++++++++++++++++++++++++++++++++++++++++--
1 file changed, 41 insertions(+), 2 deletions(-)
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index bc5fc16..b1e32ad 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -126,6 +126,46 @@ static void cppc_cpufreq_stop_cpu(struct cpufreq_policy *policy)
cpu->perf_caps.lowest_perf, cpu_num, ret);
}
+/*
+ * The PCC subspace describes the rate at which platform can accept commands
+ * on the shared PCC channel (including READs which do not count towards freq
+ * transition requests), so ideally we need to use the PCC values as a fallback
+ * if we don't have a platform specific transition_delay_us
+ */
+#if defined(CONFIG_ARM64)
+#include <asm/cputype.h>
+
+static unsigned int cppc_cpufreq_get_transition_delay_us(int cpu)
+{
+ unsigned long implementor = read_cpuid_implementor();
+ unsigned long part_num = read_cpuid_part_number();
+ unsigned int delay_us = 0;
+
+ switch (implementor) {
+ case ARM_CPU_IMP_QCOM:
+ switch (part_num) {
+ case QCOM_CPU_PART_FALKOR_V1:
+ case QCOM_CPU_PART_FALKOR:
+ delay_us = 10000;
+ break;
+ }
+ break;
+ }
+
+ if (!delay_us)
+ delay_us = cppc_get_transition_latency(cpu) / NSEC_PER_USEC;
+
+ return delay_us;
+}
+
+#else
+
+static unsigned int cppc_cpufreq_get_transition_delay_us(int cpu)
+{
+ return cppc_get_transition_latency(cpu) / NSEC_PER_USEC;
+}
+#endif
+
static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
{
struct cppc_cpudata *cpu;
@@ -162,8 +202,7 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
cpu->perf_caps.highest_perf;
policy->cpuinfo.max_freq = cppc_dmi_max_khz;
- policy->transition_delay_us = cppc_get_transition_latency(cpu_num) /
- NSEC_PER_USEC;
+ policy->transition_delay_us = cppc_cpufreq_get_transition_delay_us(cpu_num);
policy->shared_type = cpu->shared_type;
if (policy->shared_type == CPUFREQ_SHARED_TYPE_ANY) {
--
Qualcomm Datacenter Technologies on behalf of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
Richard Jones has reported that using med_power_with_dipm on a T450s
with a Sandisk SD7UB3Q256G1001 SSD (firmware version X2180501) is
causing the machine to hang.
Switching the LPM to max_performance fixes this, so it seems that
this Sandisk SSD does not handle LPM well.
Note that in the past there have been bug reports about the following
Sandisk models not working with min_power, so we may need to extend
the quirk list in the future:
name - firmware
Sandisk SD6SB2M512G1022I - X210400
Sandisk SD6PP4M-256G-1006 - A200906
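The quirk entry in the diff below uses a wildcard in the model string so that it is not tied to one capacity. As a rough userspace illustration of that kind of pattern match, the sketch below uses POSIX fnmatch() as a stand-in for the kernel's own matcher; nolpm_models[] and model_needs_nolpm() are hypothetical names, not libata code.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table holding the model pattern from the quirk entry. */
static const char *const nolpm_models[] = {
	"SanDisk SD7UB3Q*G1001",
};

/* Return true if the model string matches any pattern in the table. */
static bool model_needs_nolpm(const char *model)
{
	size_t n = sizeof(nolpm_models) / sizeof(nolpm_models[0]);

	for (size_t i = 0; i < n; i++) {
		/* fnmatch() returns 0 when the string matches the pattern */
		if (fnmatch(nolpm_models[i], model, 0) == 0)
			return true;
	}
	return false;
}
```

In this sketch, "SanDisk SD7UB3Q256G1001" matches the pattern, while the older models listed above do not and would each need their own entry.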
Cc: stable(a)vger.kernel.org
Cc: Richard W.M. Jones <rjones(a)redhat.com>
Reported-and-tested-by: Richard W.M. Jones <rjones(a)redhat.com>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/ata/libata-core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 6e400ff2b5db..68596bd4cf06 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4552,6 +4552,9 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
/* This specific Samsung model/firmware-rev does not handle LPM well */
{ "SAMSUNG MZMPC128HBFU-000MV", "CXM14M1Q", ATA_HORKAGE_NOLPM, },
+ /* Sandisk devices which are known to not handle LPM well */
+ { "SanDisk SD7UB3Q*G1001", NULL, ATA_HORKAGE_NOLPM, },
+
/* devices that don't properly handle queued TRIM commands */
{ "Micron_M500_*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
ATA_HORKAGE_ZERO_AFTER_TRIM, },
--
2.17.0
Please add this patch to stable 4.14
commit f54450ad1942287cc76b38021c0441fc4901d2de
Author: Kees Cook <keescook(a)chromium.org>
Date: Tue Feb 27 13:11:21 2018 -0800
console: Drop added "static" for newport_con
Commit 4fe505119778 ("console: Expand dummy functions for CFI") accidentally
added "static" to newport_con instance of struct consw, while trying to
normalize the declarations. This, however, needed to stay non-static as it
has an extern.
Reported-by: kbuild test robot <fengguang.wu(a)intel.com>
Fixes: 4fe505119778 ("console: Expand dummy functions for CFI")
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Hi Greg,
Could you cherry-pick commit 55cc11da6989
Revert "ath10k: send (re)assoc peer command when NSS changed"
for 4.16.y?
We've got a regression report on openSUSE Tumbleweed, and this
upstream commit was confirmed to fix the issue:
http://bugzilla.suse.com/show_bug.cgi?id=1090458
Thanks!
Takashi